PySpark : Removing all occurrences of a specified element from an array column in a DataFrame

user January 24, 2023

pyspark.sql.functions.array_remove

Syntax

pyspark.sql.functions.array_remove(col, element)

pyspark.sql.functions.array_remove is a collection function that removes every element equal to element from the array in col, returning the shortened array for each row. For example, if you have a DataFrame with a column named “colors” that contains arrays of strings, you can use array_remove to remove the string “red” from all arrays in that column:

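from pyspark.sql import SparkSession
from pyspark.sql.functions import array_remove
# Reuse the active session if one exists (e.g. in a notebook), otherwise create one
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, ["red", "blue", "green"]), (2, ["yellow", "red", "purple"])], ["id", "colors"])
df.show(20, False)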

+---+---------------------+
|id |colors               |
+---+---------------------+
|1  |[red, blue, green]   |
|2  |[yellow, red, purple]|
+---+---------------------+

Now we need to remove “red” from the column “colors”:

df.select("id", array_remove("colors", "red").alias("new_colors")).show()

Result

+---+----------------+
| id|      new_colors|
+---+----------------+
|  1|   [blue, green]|
|  2|[yellow, purple]|
+---+----------------+
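
The sample data above contains “red” at most once per row, so it does not show what “all occurrences” means. The following is a plain-Python sketch of the per-row behavior described above, not Spark's implementation. The helper name remove_all and the sample rows are hypothetical and chosen only for illustration, and null handling is left out.

```python
def remove_all(values, element):
    """Hypothetical helper: return a new list without any item equal to `element`."""
    return [v for v in values if v != element]


# Illustrative rows shaped like the (id, colors) DataFrame in the article
rows = [
    (1, ["red", "blue", "red", "green"]),  # "red" appears twice
    (2, ["red", "red"]),                   # nothing but "red"
    (3, ["blue"]),                         # no "red" at all
]

# Apply the removal to every row, keeping the id next to the cleaned list
cleaned = [(row_id, remove_all(colors, "red")) for row_id, colors in rows]

for row_id, colors in cleaned:
    print(row_id, colors)
# 1 ['blue', 'green']
# 2 []
# 3 ['blue']
```

Every matching element is dropped rather than only the first, a row made up entirely of the target value becomes an empty list, and a row without the target is returned unchanged. The order of the remaining elements is preserved.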

Author: user