To read a Parquet file in Spark, use the spark.read.parquet()
method, which returns a DataFrame. Here is an example of reading a Parquet file and displaying its contents:
You can also read a Parquet file from an HDFS directory by passing a fully qualified hdfs:// path:
You can also filter rows while reading by chaining the where() (or its alias filter()) method:
In addition to reading a single Parquet file, you can also read a directory containing multiple Parquet files by specifying the directory path instead of a file path, like this:
You can also provide an explicit schema with the reader's schema()
method:
By providing the schema up front, Spark skips inferring it from the Parquet file footers, which can save noticeable time when a directory contains many files.