The WHEN clause is used in Spark SQL to conditionally execute expressions. It’s similar to a CASE statement in SQL and can be used to perform conditional logic in your Spark SQL queries.
Here’s the syntax of the WHEN clause:
CASE WHEN condition THEN value
The condition is a Boolean expression that evaluates to either true or false. If the condition is true, then the value is returned. If the condition is false, then the next WHEN clause (if any) is evaluated. If none of the WHEN clauses evaluate to true, then the ELSE clause (if any) is executed.
Example
Let our data be
+------+---+
| name |age|
+------+---+
|Sachin| 12|
|Barry | 25|
|Suzy | 72|
+------+---+
We can use the WHEN clause to add an age_group column to the table that classifies each person as a child, adult, or senior based on their age:
from pyspark.sql.functions import when
spark = SparkSession.builder.appName("WHEN Clause Freshers.in").getOrCreate()
data = [("Sachin", 12), ("Barry", 25), ("Suzy", 72)]
df = spark.createDataFrame(data, ["name", "age"])
df.withColumn("age_group",
when(df["age"] < 18, "Child")
.when(df["age"] < 65, "Adult")
.otherwise("Senior")
).show()
Output
+------+---+---------+
|name|age|age_group |
+------+---+---------+
|Sachin| 12| Child|
|Barry | 25| Adult|
|Suzy | 72| Senior|
+------+---+---------+
In this example, we’re using the withColumn function to add a new column named age_group to the df DataFrame. We’re using the when function to determine the age_group of each person based on their age. If a person is under 18, they’re classified as a child. If they’re between 18 and 65, they’re classified as an adult. If they’re 65 or older, they’re classified as a senior.
Note that we’re using the otherwise function to specify what value to return if none of the when conditions evaluate to true. In this example, if a person’s age is greater than or equal to 65, the otherwise function is executed and returns the value ‘Senior’.
How to do the same using Spark SQL after creating temporary table, from the above spark program
df.createOrReplaceTempView("people_temp")
The below is the SQL that you need to execute thru Spark
SELECT name, age,
CASE
WHEN age < 18 THEN 'Child'
WHEN age < 65 THEN 'Adult'
ELSE 'Senior' END AS age_group
FROM people_temp
Spark Code
spark.sql("SELECT name, age, CASE WHEN age < 18 THEN 'Child' WHEN age < 65 THEN 'Adult' ELSE 'Senior' END AS age_group FROM people_temp").show()
This will output:
+------+---+---------+
|name|age|age_group |
+------+---+---------+
|Sachin| 12| Child|
|Barry | 25| Adult|
|Suzy | 72| Senior|
+------+---+---------+
In this example, we’re using the CASE WHEN clause to determine the age_group of each person in the my_table table based on their age. If a person is under 18, they’re classified as a child. If they’re between 18 and 65, they’re classified as an adult. If they’re 65 or older, they’re classified as a senior.
Note that the WHEN clause is part of a larger expression that includes an ELSE clause. The ELSE clause specifies what value to return if none of the WHEN conditions evaluate to true. In this example, if a person’s age is greater than or equal to 65, the ELSE clause is executed and returns the value ‘Senior’.
You can also use multiple WHEN clauses in a single expression, as shown in the example above. The WHEN clauses are evaluated in order, so it’s important to order them appropriately to ensure that the correct value is returned.
CASE WHEN clause in Spark SQL is a useful tool for performing conditional logic in your queries. By using the WHEN clause with ELSE, you can specify what value to return based on the result of the condition. This can help you to write more complex queries that can handle a variety of scenarios.