PySpark – How to read a text file as an RDD using Spark 3 and display the result on Windows 10

Here we will see how to read a sample text file as an RDD using Spark.

Environment and versions used here:

Spark  : 3.0.3
Python : version 3.8.10
Java   : 11.0.13 2021-10-19 LTS
My OS  : Windows 10 Pro
Use case : Read data from a local file and print it to the console

My Local data set  : D:\\Learning\\PySpark\\SourceCode\\sample_data.txt

from pyspark import SparkContext

# Get an existing SparkContext, or create one with default options
sc = SparkContext.getOrCreate()

# textFile reads the file as an RDD with one element per line
textFile = sc.textFile("D:\\Learning\\PySpark\\SourceCode\\sample_data.txt")

# collect() brings every line back to the driver so we can print them
print(textFile.collect())

getOrCreate : Gets an existing SparkContext or, if there is no existing one, creates a new one. It can optionally take a SparkConf with configuration options; here we pass no options, so the defaults are used.

PySpark collect() : collect() is an action on an RDD or DataFrame that retrieves all of its elements. It gathers the rows from every partition and brings them back to the driver node/program, so it should be used with care on large datasets.

Author: user