Recent Posts

PySpark @ Freshers.in

PySpark : What happens when you run a spark-submit command?

When you submit a Spark application using the spark-submit command, a series of steps occur to start and execute the…
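A submission can look like the following sketch; the master URL, configuration values, and script path are placeholders for illustration, not values taken from the post:

```shell
# Submit a PySpark application; all options below are illustrative examples
spark-submit \
  --master local[4] \
  --name example-app \
  --conf spark.executor.memory=2g \
  my_app.py
```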

PySpark @ Freshers.in

PySpark : What is predicate pushdown in Spark and how to enable it?

Predicate pushdown is a technique used in Spark to filter data as early as possible in the query execution process,…

PySpark @ Freshers.in

PySpark : What is a map-side join and how to perform one in PySpark

Map-side join is a method of joining two datasets in PySpark where one dataset is broadcast to all executors, and…

PySpark @ Freshers.in

Installing Apache Spark standalone on Linux

Installing Spark on a Linux machine can be done in a few steps. The following is a detailed guide on…
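A hypothetical sequence of those steps; the version number, download mirror, and install path are examples only, not the guide's own values:

```shell
# Download and unpack a Spark release (version and paths are illustrative)
wget https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz
tar -xzf spark-3.4.1-bin-hadoop3.tgz
sudo mv spark-3.4.1-bin-hadoop3 /opt/spark

# Put Spark on the PATH for future shells
echo 'export SPARK_HOME=/opt/spark' >> ~/.bashrc
echo 'export PATH=$PATH:$SPARK_HOME/bin' >> ~/.bashrc
source ~/.bashrc

spark-shell --version   # verify the installation
```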

SQL @ Freshers.in

SQL : How to execute a large dynamic query in SQL

There are a few ways to execute large dynamic queries in SQL, but one common method is to use a…

PySpark @ Freshers.in

How to use an if condition in Spark SQL, explained with an example

In PySpark, you can use the IF function within a SQL query to conditionally return a value based on a…

PySpark @ Freshers.in

What is GC (Garbage Collection) time in the Spark UI?

In the Spark UI, GC (Garbage Collection) time refers to the amount of time spent by the JVM (Java Virtual…

Apache Parquet @ Freshers.in

Advantages of using the Parquet file format

Parquet is a columnar storage format that is designed to work with big data processing frameworks like Apache Hadoop and…

PySpark @ Freshers.in

PySpark : How do I read a Parquet file in Spark?

To read a Parquet file in Spark, you can use the spark.read.parquet() method, which returns a DataFrame. Here is an…