Tag: Big Data

PySpark @ Freshers.in

How to create a UDF in PySpark? What are the different ways you can call a PySpark UDF? (With example)

PySpark UDF: A PySpark UDF is used to extend PySpark's built-in capabilities. UDFs (User Defined Functions) are used to…

PySpark @ Freshers.in

How to convert MapType to multiple columns based on key using PySpark?

Use case: converting a map to multiple columns. Raw data can contain a MapType column with multiple key-value pairs…

Hive @ Freshers.in

How to insert from a non-partitioned table into a partitioned table in Hive?

You can insert data from a non-partitioned table into a partitioned table. In short, if you want to have…

PySpark @ Freshers.in

What is the difference between concat and concat_ws in PySpark?

concat vs concat_ws. Syntax: pyspark.sql.functions.concat(*cols) and pyspark.sql.functions.concat_ws(sep, *cols). concat: concatenates multiple input columns together into a single column. The…
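One commonly cited difference between the two functions is how they treat nulls: Spark's concat returns null when any input is null, while concat_ws skips null inputs and joins the rest with the separator. The sketch below is a plain-Python model of that behavior only, not PySpark code; the two helper functions merely borrow the Spark names, None stands in for a SQL null, and the sample row is made up for illustration.

```python
from typing import Optional


def concat(*cols: Optional[str]) -> Optional[str]:
    """Model of concat: the result is null (None) if any input is null."""
    if any(c is None for c in cols):
        return None
    return "".join(cols)


def concat_ws(sep: str, *cols: Optional[str]) -> str:
    """Model of concat_ws: null inputs are skipped, the rest are joined with sep."""
    return sep.join(c for c in cols if c is not None)


# Hypothetical row with a missing middle name.
row = ("John", None, "Smith")

print(concat(*row))          # None
print(concat_ws(" ", *row))  # John Smith
```

The model assumes string inputs and a non-null separator; type casting and the other edge cases that Spark handles are left out.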

PySpark @ Freshers.in

How to add a new column in PySpark using withColumn

withColumn. Syntax: DataFrame.withColumn(colName, col). withColumn is commonly used to add a column to an existing DataFrame. withColumn returns a new…

PySpark @ Freshers.in

How to use filter or where condition in PySpark

filter / where: filter() keeps the rows that satisfy one or more conditions. where() is an alias for filter(). In…

PySpark @ Freshers.in

Explain complex data types in PySpark (ArrayType, MapType, StructType)

There are three complex data types in PySpark: (1) ArrayType, (2) MapType, and (3) StructType. ArrayType: ArrayType represents values comprising a sequence…

PySpark @ Freshers.in

How to create tables from a Spark DataFrame and join the tables (createOrReplaceTempView)

createOrReplaceTempView: There are many scenarios in which you can do the transformation using SQL instead of direct Spark DataFrame operations…