PySpark : How to create a map from a column of structs : map_from_entries


pyspark.sql.functions.map_from_entries

map_from_entries(col) is a function in PySpark that creates a map column from a column of arrays of structs, where each struct has two fields: a key and a value. It is a collection function that returns a map built from the given array of entries.

from pyspark.sql import SparkSession
from pyspark.sql.functions import map_from_entries

# Create the SparkSession used below
spark = SparkSession.builder.getOrCreate()

# The ages are given as strings so that all entry values share one type;
# mixing string and integer values in the same array breaks schema inference
df2 = spark.createDataFrame([
    (1, "John", 25000, [("name", "John"), ("age", "25")]),
    (2, "Mike", 30000, [("name", "Mike"), ("age", "30")]),
    (3, "Sophia", 35000, [("name", "Sophia"), ("age", "35")])
], ["id", "name", "salary", "person_map"])
df2 = df2.select("id", "name", "salary", map_from_entries("person_map").alias("map_col"))
df2.show(20, False)

In this example, we first import the necessary functions and create a SparkSession. We then create a DataFrame with a column called “person_map”, which contains an array of (key, value) entries.

We then use the map_from_entries() function to build a new column called “map_col” from the array of entries, using the alias() function to name the new column.

Within each entry, the first field becomes the map key and the second field becomes the map value.

The final DataFrame has four columns: “id”, “name”, “salary” and “map_col”, where “map_col” contains a map created from the entries in “person_map”.

For reference, the schema will be:

root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- salary: long (nullable = true)
 |-- map_col: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

Result

+---+------+------+---------------------------+
|id |name  |salary|map_col                    |
+---+------+------+---------------------------+
|1  |John  |25000 |[name -> John, age -> 25]  |
|2  |Mike  |30000 |[name -> Mike, age -> 30]  |
|3  |Sophia|35000 |[name -> Sophia, age -> 35]|
+---+------+------+---------------------------+

In PySpark, creating a map column from entries lets you convert an array of (key, value) structs into a map, where each entry in the array becomes a key-value pair in the resulting map. This can be useful for organizing and structuring data in a more readable and accessible way. Additionally, the map column can then be used in operations such as filtering, aggregation and joins.
