PySpark : Concatenating elements of an array into a single string.

user January 24, 2023

pyspark.sql.functions.array_join

PySpark’s array_join function concatenates the elements of an array column into a single string, with the elements separated by a specified delimiter. It takes two required arguments, the array column and the delimiter, plus an optional third argument: a string to substitute for null elements. If the third argument is omitted, null elements are skipped.

Syntax
array_join(array, delimiter [, nullReplacement])

Here is an example of how to use the array_join function in PySpark:

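from pyspark.sql import SparkSession
from pyspark.sql.functions import array_join

# Create (or reuse) a Spark session; in the pyspark shell, `spark` already exists
spark = SparkSession.builder.getOrCreate()

# Create a sample dataframe
data = [("John", ["apple", "banana", "orange"]), ("Jane", ["grapes", "pineapple", "kiwi"])]
df = spark.createDataFrame(data, ["name", "fruits"])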

# Use the array_join function to concatenate the elements of the "fruits" column into a single string
df = df.withColumn("fruits_list", array_join("fruits", ","))

# Show the result
df.show(20, False)

This will output:

+----+-------------------------+---------------------+
|name|fruits                   |fruits_list          |
+----+-------------------------+---------------------+
|John|[apple, banana, orange]  |apple,banana,orange  |
|Jane|[grapes, pineapple, kiwi]|grapes,pineapple,kiwi|
+----+-------------------------+---------------------+

In this example, the array_join function concatenates the elements of the “fruits” column, which is an array of strings, into a single string, using a comma as the delimiter. The result is stored in a new column named “fruits_list”.

You can also call array_join as a SQL expression, for example through selectExpr:

df.selectExpr("name", "array_join(fruits, ',') as fruits_list").show(20, False)

+----+---------------------+
|name|fruits_list          |
+----+---------------------+
|John|apple,banana,orange  |
|Jane|grapes,pineapple,kiwi|
+----+---------------------+

This produces the same fruits_list values as the previous example. Here array_join is written as a Spark SQL expression inside selectExpr, and only the name and fruits_list columns are selected.

Note that array_join works only on array columns (here, an array of strings), and the resulting column is always of type string. The delimiter must also be a string.
