Category: spark

Spark User full article

PySpark-How to create and RDD from a List and from AWS S3

In this article you will learn , what an RDD is ?  How can we create an RDD from a…

How to run dataframe as Spark SQL – PySpark

If you have a situation that you can easily get the result using SQL/ SQL already existing , then you…

How to get all combination of columns using PySpark? What is Cube in Spark ?

A cube is a multi-dimensional generalization of a two- or three-dimensional spreadsheet. Cube is a shorthand for multidimensional dataset, given…

How to remove csv header using Spark (PySpark)

A common use case when dealing with CSV file is to remove the header from the source to do data…