Explain dense_rank. How to use the dense_rank function in PySpark?

user January 16, 2023

In PySpark, dense_rank is a window function that assigns a rank to each row within a window partition, based on the values of one or more ordering columns. Rows with equal values in the ordering columns receive the same rank, and the sequence of rank values has no gaps.

The rank is dense rather than unique: tied rows share a rank, and the next distinct value always receives a rank exactly one greater than the previous one. For example, if three rows tie for rank 1, all three are assigned rank 1 and the next row is assigned rank 2. This differs from the rank function, which would skip ahead and assign rank 4 to that row. The dense_rank function is applied over a window specification whose orderBy clause (the equivalent of ORDER BY in SQL) determines the column(s) used for ranking.
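To make the tie handling concrete, here is a minimal plain-Python sketch (no Spark required) of how dense ranking numbers a sorted list of values, with ordinary ranking alongside for contrast. The helper names dense_rank_values and rank_values are hypothetical, introduced only for illustration; they are not part of PySpark, and this is not how Spark implements the function internally.

```python
def dense_rank_values(values):
    """Return (value, dense_rank) pairs in ascending order of value."""
    ranked = []
    current_rank = 0
    previous = object()  # sentinel that compares unequal to every real value
    for value in sorted(values):
        if value != previous:
            current_rank += 1  # a new distinct value advances the rank by exactly one
            previous = value
        ranked.append((value, current_rank))
    return ranked


def rank_values(values):
    """Return (value, rank) pairs; ties share a rank and leave gaps after them."""
    ranked = []
    current_rank = 0
    previous = object()
    for position, value in enumerate(sorted(values), start=1):
        if value != previous:
            current_rank = position  # a new distinct value jumps to its row position
            previous = value
        ranked.append((value, current_rank))
    return ranked


ages = [25, 30, 25, 22, 30]
print(dense_rank_values(ages))  # [(22, 1), (25, 2), (25, 2), (30, 3), (30, 3)]
print(rank_values(ages))        # [(22, 1), (25, 2), (25, 2), (30, 4), (30, 4)]
```

The two results differ only after a tie: dense ranking continues with 3, while ordinary ranking skips to 4.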

Here is an example of how to use the dense_rank function in PySpark:

from pyspark.sql import SparkSession
from pyspark.sql import Window
from pyspark.sql.functions import dense_rank

spark = SparkSession.builder.appName("dense_rank").getOrCreate()
data = [("Peter John", 25), ("Wisdon Mike", 30), ("Sarah Johns", 25), ("Bob Beliver", 22), ("Lucas Marget", 30)]

df = spark.createDataFrame(data, ["name", "age"])

# Rank rows within each age group, ordered alphabetically by name
window_spec = Window.partitionBy("age").orderBy("name")
df2 = df.select("name", "age", dense_rank().over(window_spec).alias("rank"))
df2.show()

In this example, the rows are partitioned by the “age” column, and dense_rank assigns each row a rank within its age group based on the order of the “name” column. The ranking restarts at 1 for each distinct age. The output will be:

+------------+---+----+
|        name|age|rank|
+------------+---+----+
| Bob Beliver| 22|   1|
|  Peter John| 25|   1|
| Sarah Johns| 25|   2|
|Lucas Marget| 30|   1|
| Wisdon Mike| 30|   2|
+------------+---+----+

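This means that Peter John and Sarah Johns, who have the same age, fall into the same partition: Peter John receives rank 1 and Sarah Johns receives rank 2, because “Peter John” sorts before “Sarah Johns”. The same applies to Lucas Marget and Wisdon Mike at age 30, while Bob Beliver is alone in his partition and receives rank 1. Because no two names within an age group are equal, no rank is shared in this output.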
