PySpark: Reversing the order of lists in a DataFrame column

PySpark @ Freshers.in

pyspark.sql.functions.reverse

Collection function: returns a reversed string or an array with reverse order of elements.

To reverse the order of lists in a DataFrame column, we can use the PySpark function reverse() from pyspark.sql.functions. Here’s an example.

Let’s start by creating a sample DataFrame with a column that holds lists of strings.

from pyspark.sql import SparkSession
from pyspark.sql.functions import reverse
spark = SparkSession.builder.getOrCreate()
# Sample data: each row is (name, list of technologies)
data = [("Sachin", ["Python", "C", "Go"]),
        ("Renjith", ["RedShift", "Snowflake", "Oracle"]),
        ("Ahamed", ["Android", "MacOS", "Windows"])]
# Create the DataFrame; "Techstack" is inferred as an array<string> column
df = spark.createDataFrame(data, ["Name", "Techstack"])
# show() truncates cells longer than 20 characters by default
df.show()

Output

+-------+--------------------+
|   Name|           Techstack|
+-------+--------------------+
| Sachin|     [Python, C, Go]|
|Renjith|[RedShift, Snowfl...|
| Ahamed|[Android, MacOS, ...|
+-------+--------------------+

Now, we can apply the reverse() function to the “Techstack” column to reverse the order of the list.

# Reusing the existing column name replaces "Techstack" in place
df_reversed = df.withColumn("Techstack", reverse(df["Techstack"]))
df_reversed.show()

Output

+-------+--------------------+
|   Name|           Techstack|
+-------+--------------------+
| Sachin|     [Go, C, Python]|
|Renjith|[Oracle, Snowflak...|
| Ahamed|[Windows, MacOS, ...|
+-------+--------------------+

As you can see, the order of the elements in each list in the “Techstack” column has been reversed. The withColumn() function adds a new column or replaces an existing column that has the same name. Here, because “Techstack” already exists, it is replaced with a column in which the lists have been reversed.

Author: user

Leave a Reply