How to run dataframe as Spark SQL – PySpark


If you are in a situation where the result is easy to get with SQL, or the SQL already exists, you can register the DataFrame as a table (temporary view) and run your query on top of it. Registering the DataFrame and querying it looks like the below:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("DataFrameSQL").getOrCreate()

myDF = spark.createDataFrame(
    [("Tom", 400, 50, "Teacher", "IND"),
     ("Jack", 420, 60, "Finance", "USA"),
     ("Brack", 500, 10, "Teacher", "IND"),
     ("Jim", 700, 80, "Finance", "JAPAN")],
    ("name", "salary", "cnt", "department", "country"))

# Register the DataFrame as a temporary view so SQL can reference it by name
myDF.createOrReplaceTempView("sql_df")

tot_salary = spark.sql("select department, sum(salary) as total_salary from sql_df group by department")
tot_salary.show(truncate=False)

+----------+------------+
|department|total_salary|
+----------+------------+
|Teacher   |900         |
|Finance   |1120        |
+----------+------------+

You can also try the bellow to get all the column from data frame
