How to run dataframe as Spark SQL – PySpark

If you can express the result you need in SQL (or you already have the SQL written), you can register the DataFrame as a table and run the query directly against it. Convert the DataFrame to a table as below:

from pyspark.sql import SparkSession

# SparkSession creates the SparkContext internally; no separate SparkContext() needed
spark = SparkSession.builder.getOrCreate()

# Sample data: (name, salary, cnt, department, country)
myDF = spark.createDataFrame(
    [("Tom", 400, 50, "Teacher", "IND"),
     ("Jack", 420, 60, "Finance", "USA"),
     ("Brack", 500, 10, "Teacher", "IND"),
     ("Jim", 700, 80, "Finance", "JAPAN")],
    ("name", "salary", "cnt", "department", "country"))

# Register the DataFrame as a temporary view
# (createOrReplaceTempView replaces the deprecated registerTempTable)
myDF.createOrReplaceTempView("sql_df")

tot_salary = spark.sql("select department, sum(salary) as total_salary from sql_df group by department")
tot_salary.show(30, False)

+----------+------------+
|department|total_salary|
+----------+------------+
|Teacher   |900         |
|Finance   |1120        |
+----------+------------+

You can also use either of the below to get all the columns from a DataFrame:

# Both return every column of the DataFrame
tot_salary.selectExpr('*').show()
tot_salary.select('*').show()
