PySpark’s map_values Function: Extract the values from a map column.


In PySpark, the map_values function extracts the values from a map column and returns them as an array column. It is the column-level counterpart of calling .values() on a Python dictionary: rather than acting on a single dictionary, map_values is applied to a map column and evaluated for every row of the DataFrame.

Use map_values for:

Value Analysis: To understand the distribution or characteristics of values in a map column.

Data Transformation: Before reshaping values into distinct columns or rows.

Filtering Data: To select rows based on the presence or absence of specific values in a map column.

Advantages of map_values:

Performance: map_values is a built-in Spark SQL function, so it is evaluated inside Spark’s engine across the cluster and avoids the serialization overhead of a Python UDF.

Intuitive: Its use brings clarity and precision to PySpark code, enhancing readability.

Flexibility: Seamless integration with other DataFrame operations allows for comprehensive data processing.

from pyspark.sql import SparkSession
from pyspark.sql.functions import map_values

# Set up the Spark session
spark = SparkSession.builder.appName("map_values_demo").getOrCreate()

# Create a DataFrame with a map column
data = [(1, {"Sachin": 10, "India": 20}),
        (2, {"Ramesh": 30, "USA": 40}),
        (3, {"Raju": 50, "Ireland": 60})]
df = spark.createDataFrame(data, ["id", "country"])
df.show(truncate=False)

# Use map_values to extract the values from the map column as an array
df_values = df.select("id", map_values(df["country"]).alias("age"))
df_values.show(truncate=False)


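+---+---------------------------+
|id |country                    |
+---+---------------------------+
|1  |{India -> 20, Sachin -> 10}|
|2  |{USA -> 40, Ramesh -> 30}  |
|3  |{Raju -> 50, Ireland -> 60}|
+---+---------------------------+

+---+--------+
|id |age     |
+---+--------+
|1  |[20, 10]|
|2  |[40, 30]|
|3  |[50, 60]|
+---+--------+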

Author: user