array_contains
You can check whether an array contains a specific value using the Spark SQL function array_contains. array_contains(array, value) returns true if the array contains the value, false if it does not, and null if the array itself is null. array_contains is a collection function.
Syntax : array_contains(array, value)
Example : SELECT array_contains(array(55,77,99), 99);
Result : true
Sample data:
name                | color_code | grp_num
["Sam","Tom","Ben"] | RED        | 1000
["Ram","Ben","Tom"] | TAN        | 2000
["Ben","Tim","Sam"] | HUE        | 3000
["Rex","Tom","Sam"] | DUN        | 4000
["Abe","Geo","Oli"] | BAY        | 5000
Example Code :
from pyspark.sql import SparkSession
from pyspark.sql.functions import array_contains, lit

spark = SparkSession.builder.appName('www.freshers.in training').getOrCreate()

df = spark.createDataFrame([
    (["Sam", "Tom", "Ben"], "RED", 1000),
    (["Ram", "Ben", "Tom"], "TAN", 2000),
    (["Ben", "Tim", "Sam"], "HUE", 3000),
    (["Rex", "Tom", "Sam"], "DUN", 4000),
    (["Abe", "Geo", "Oli"], "BAY", 5000),
], ['name', 'color_code', 'grp_num'])
df.show()

# Array containing the word "Sam"
df2 = df.select(df.name, df.color_code, df.grp_num,
                array_contains(df.name, lit("Sam")).alias("Array With Sam"))
df2.show()

# Array containing both "Sam" and "Ben"
df3 = df.select(df.name, df.color_code, df.grp_num,
                (array_contains(df.name, lit("Sam")) &
                 array_contains(df.name, lit("Ben"))).alias("Array With Sam AND Ben"))
df3.show()

# Array containing "Sam" or "Ben"
df4 = df.select(df.name, df.color_code, df.grp_num,
                (array_contains(df.name, lit("Sam")) |
                 array_contains(df.name, lit("Ben"))).alias("Array With Sam OR Ben"))
df4.show()