PySpark : Removing all occurrences of a specified element from an array column in a DataFrame

user January 24, 2023

pyspark.sql.functions.array_remove

Syntax

pyspark.sql.functions.array_remove(col, element)
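As a rough mental model, the per-row behaviour can be sketched in plain Python. This is not Spark's implementation. The helper `array_remove_model` is hypothetical, and its null handling (a null array or a null element gives null) is an assumption modelled on Spark's usual null-in, null-out convention:

```python
from typing import Any, List, Optional


def array_remove_model(arr: Optional[List[Any]], element: Any) -> Optional[List[Any]]:
    """Hypothetical plain-Python model of array_remove applied to a single row."""
    # Assumption: like most Spark functions, a null input produces a null result.
    if arr is None or element is None:
        return None
    # Drop every item equal to element (all occurrences, not just the first)
    # and keep the remaining items in their original order.
    return [x for x in arr if x != element]


print(array_remove_model(["red", "blue", "red", "green"], "red"))
print(array_remove_model(None, "red"))
```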

pyspark.sql.functions.array_remove is a collection function that removes all elements equal to element from the given array column of a DataFrame. For example, if you have a DataFrame with a column named “colors” that contains arrays of strings, you can use array_remove to remove the string “red” from every array in that column:

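from pyspark.sql import SparkSession
from pyspark.sql.functions import array_remove
spark = SparkSession.builder.getOrCreate()  # reuses the shell's session if one already exists
df = spark.createDataFrame([(1, ["red", "blue", "green"]), (2, ["yellow", "red", "purple"])], ["id", "colors"])
df.show(20, False)  # show up to 20 rows without truncating column values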

+---+---------------------+
|id |colors               |
+---+---------------------+
|1  |[red, blue, green]   |
|2  |[yellow, red, purple]|
+---+---------------------+

Now we remove “red” from the “colors” column:

df.select("id", array_remove("colors", "red").alias("new_colors")).show()

Result

+---+----------------+
| id|      new_colors|
+---+----------------+
|  1|   [blue, green]|
|  2|[yellow, purple]|
+---+----------------+


