pyspark.sql.functions.array_join
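PySpark’s array_join function concatenates the elements of an array into a single string, with the elements separated by a specified delimiter. It takes two required arguments, the array column and the delimiter, plus an optional third argument, nullReplacement, a string substituted for null elements. If the third argument is omitted, null elements are skipped.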
Syntax
array_join(array, delimiter [, nullReplacement])
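To make the three parameters concrete, the following plain-Python model mimics how the elements, the delimiter, and the optional null replacement interact. It is only an illustrative sketch and not PySpark's implementation: the helper name array_join_model is hypothetical, and Python's None stands in for a null element.

```python
from typing import List, Optional


def array_join_model(
    values: List[Optional[str]],
    delimiter: str,
    null_replacement: Optional[str] = None,
) -> str:
    """Plain-Python model of array_join (hypothetical helper, not a Spark API)."""
    parts = []
    for value in values:
        if value is None:
            # Without a replacement, null elements are skipped entirely.
            if null_replacement is None:
                continue
            value = null_replacement
        parts.append(value)
    return delimiter.join(parts)


print(array_join_model(["apple", None, "kiwi"], ","))         # apple,kiwi
print(array_join_model(["apple", None, "kiwi"], ",", "n/a"))  # apple,n/a,kiwi
```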
Here is an example of how to use the array_join function in PySpark:
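The output contains the original “fruits” column alongside the new “fruits_list” column, which holds the joined string for each row.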
In this example, the array_join function concatenates the elements of the “fruits” column, which is an array of strings, into a single string. The delimiter is a comma. The result is stored in a new column named “fruits_list”.
You can also use the array_join function on a specific column, like this:
This gives the same output as the previous example; the difference is that the column name is passed directly as the argument to the function.
Note that the array_join function only works on columns of type array, and the resulting column is always of type string. The delimiter passed to the function must also be a string.