How to check whether an array contains a given value or values using PySpark (PySpark search in array)


array_contains

You can check whether an array contains a specific value (or values) using the Spark SQL function array_contains. array_contains(array, value) returns true if the array contains the value, false if it does not, and null if the array is null. array_contains is a collection function.

Syntax : array_contains(array, value)
Example : SELECT array_contains(array(55,77,99), 99);
Result : true

Sample data:

name                   color_code  grp_num
["Sam","Tom","Ben"]    RED         1000
["Ram","Ben","Tom"]    TAN         2000
["Ben","Tim","Sam"]    HUE         3000
["Rex","Tom","Sam"]    DUN         4000
["Abe","Geo","Oli"]    BAY         5000

Example Code :

from pyspark.sql import SparkSession
from pyspark.sql.functions import array_contains, lit

spark = SparkSession.builder.appName('www.freshers.in training').getOrCreate()
df = spark.createDataFrame([
    (["Sam", "Tom", "Ben"], "RED", 1000),
    (["Ram", "Ben", "Tom"], "TAN", 2000),
    (["Ben", "Tim", "Sam"], "HUE", 3000),
    (["Rex", "Tom", "Sam"], "DUN", 4000),
    (["Abe", "Geo", "Oli"], "BAY", 5000),
], ['name', 'color_code', 'grp_num'])
df.show()

# Arrays containing the word "Sam"
df2 = df.select(df.name, df.color_code, df.grp_num,
                array_contains(df.name, lit("Sam")).alias("array_with_sam"))
df2.show()

# Arrays containing both "Sam" and "Ben"
df3 = df.select(df.name, df.color_code, df.grp_num,
                (array_contains(df.name, lit("Sam"))
                 & array_contains(df.name, lit("Ben"))).alias("array_with_sam_and_ben"))
df3.show()

# Arrays containing either "Sam" or "Ben"
df4 = df.select(df.name, df.color_code, df.grp_num,
                (array_contains(df.name, lit("Sam"))
                 | array_contains(df.name, lit("Ben"))).alias("array_with_sam_or_ben"))
df4.show()



