PySpark: How to sort a DataFrame column in ascending order while putting the null values first?


pyspark.sql.Column.asc_nulls_first

In PySpark, the asc_nulls_first() function is used to sort a column in ascending order while placing the null values first. It is part of the pyspark.sql.functions module (and is also available as the pyspark.sql.Column.asc_nulls_first() method) and can be used with the orderBy() method to sort a DataFrame. In other words, it returns a sort expression based on the column’s ascending order, with null values returned before non-null values.

Here is an example of using the asc_nulls_first() function in PySpark:

from pyspark.sql import SparkSession
from pyspark.sql.functions import asc_nulls_first
# Create a SparkSession
spark = SparkSession.builder.appName("Ascending order with nulls first").getOrCreate()
# Create a DataFrame with some sample data
data = [("Winter Alice", 25), ("Cherry Bob", 30), ("Jacob Charlie", None), ("David Peter", 27), (None, 32)]
df = spark.createDataFrame(data, ["client_name", "age"])
# Sort the DataFrame by the "age" column in ascending order with nulls first
df = df.orderBy(asc_nulls_first("age"))
# Show the result
df.show()

This code creates a SparkSession and a DataFrame with two columns, “client_name” and “age”, containing some sample data. It then uses the orderBy() method in conjunction with the asc_nulls_first() function to sort the DataFrame by the “age” column in ascending order, with the null values appearing first. The sorted DataFrame is then displayed.
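As the heading above suggests, the same sort expression can also be written with the equivalent Column method, asc_nulls_first(); this one-liner is a minimal sketch of that form and produces the same result as the function-based version:

# Equivalent sort using the Column method form instead of the function
df = df.orderBy(df["age"].asc_nulls_first())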

Output

+-------------+----+
|  client_name| age|
+-------------+----+
|Jacob Charlie|null|
| Winter Alice|  25|
|  David Peter|  27|
|   Cherry Bob|  30|
|         null|  32|
+-------------+----+

As you can see from the output, the null values come first in the sorted result, and the remaining rows are sorted by the age column in ascending order.
asc_nulls_first() is useful when you want to sort a DataFrame and handle null values differently from non-null values; its sibling functions asc_nulls_last(), desc_nulls_first(), and desc_nulls_last() cover the other ordering combinations, as shown below.
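For instance, here is a minimal sketch of two of those variants, reusing the same df created above:

from pyspark.sql.functions import asc_nulls_last, desc_nulls_first
# Ascending order, but with null values placed last
df.orderBy(asc_nulls_last("age")).show()
# Descending order, with null values placed first
df.orderBy(desc_nulls_first("age")).show()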
