PySpark: Transforming a column of arrays or maps into multiple rows with explode() and explode_outer()



In PySpark, the explode() function transforms a column of arrays or maps into multiple rows, producing one row for each element in the array or map. The explode_outer() function is similar, but where explode() drops rows whose array or map is null or empty, explode_outer() keeps them and returns null in the exploded column.

Every element of the specified array or map receives its own row in the result. In contrast to explode(), a row with null is produced if the array or map is empty or null. Unless column names are provided explicitly, Spark uses key and value for map elements and the default column name col for array elements.

Here is an example of using explode_outer() to transform a DataFrame with a column of arrays:

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode_outer

spark = SparkSession.builder.getOrCreate()

# Create a DataFrame with a column of arrays
data = [
    (1, ["BMW", "Audi", "Merc"]),
    (2, ["Maruti", "Toyota"]),
    (3, None),
    (4, ["Volkswagen"]),
]
df = spark.createDataFrame(data, ["id", "cars"])

# Use explode_outer to transform the column of arrays
exploded_df = df.select("id", explode_outer("cars"))

# Show the resulting DataFrame
exploded_df.show()
This will output:

+---+----------+
| id|       col|
+---+----------+
|  1|       BMW|
|  1|      Audi|
|  1|      Merc|
|  2|    Maruti|
|  2|    Toyota|
|  3|      null|
|  4|Volkswagen|
+---+----------+

Here the column “cars” is exploded so that each element becomes its own row. The row with a null value in “cars” (id 3) is also retained, with null in the output column.

Spark important urls

  1. Spark Examples
  2. PySpark Blogs
  3. Bigdata Blogs
  4. Spark Interview Questions
  5. Official Page