PySpark code to read and write data from and to Google BigQuery


Here is some sample PySpark code that demonstrates how to read data from and write data to Google BigQuery:

from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder \
    .appName("BigQuery to/from Spark example") \
    .getOrCreate()

# Read data from a BigQuery table
df = spark.read.format("bigquery") \
    .option("table", "bigquery-public-data.hacker_news.comments") \
    .load()

# Show the dataframe
df.show()

# Write data to a BigQuery table
df.write.format("bigquery") \
    .option("table", "your_dataset.your_table") \
    .mode("append") \
    .save()

In this example, spark.read.format("bigquery") is used to read data from a BigQuery table, and .option("table", "bigquery-public-data.hacker_news.comments") specifies the fully qualified table name (project.dataset.table).
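The connector also pushes column pruning and row filters down to BigQuery, so only the selected columns and matching rows are scanned. A small sketch of this, with the caveat that the column names below are illustrative; check the table's schema before running:

# Only the selected columns (and rows passing the filter) are
# read from BigQuery; column names here are illustrative
comments = spark.read.format("bigquery") \
    .option("table", "bigquery-public-data.hacker_news.comments") \
    .load() \
    .select("author", "text") \
    .where("ranking > 10")

comments.show(5)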

Similarly, df.write.format("bigquery") is used to write data to a BigQuery table, and .option("table", "your_dataset.your_table") specifies the destination table. The .mode("append") option appends the data to the table if it already exists, instead of overwriting it.
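Note that, depending on the connector version and write method, writes are staged through a Google Cloud Storage bucket before being loaded into BigQuery. A sketch of such an indirect write, assuming a writable bucket exists (the bucket name your-temp-bucket is a placeholder):

# Indirect write: rows are staged in the GCS bucket first, then
# loaded into the BigQuery table; the bucket name is a placeholder
df.write.format("bigquery") \
    .option("table", "your_dataset.your_table") \
    .option("temporaryGcsBucket", "your-temp-bucket") \
    .mode("append") \
    .save()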

You will also need to set up authentication. The best practice is to rely on Application Default Credentials, for example the service account attached to your Dataproc cluster or VM.
You can find more details and options here: https://cloud.google.com/bigquery/docs/pyspark-bigquery-local-mode
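If you cannot rely on the default credentials, for example when running locally, the connector can also be pointed at a service account key file. A minimal sketch, where /path/to/key.json is a placeholder for your own key file:

import os

# Option 1: standard Application Default Credentials via an
# environment variable (the path is a placeholder)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/key.json"

# Option 2: hand the key file to the connector directly
df = spark.read.format("bigquery") \
    .option("credentialsFile", "/path/to/key.json") \
    .option("table", "bigquery-public-data.hacker_news.comments") \
    .load()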

Also, add the spark-bigquery connector dependency when submitting your PySpark job:

--packages com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.17.1
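If you prefer to configure this in code rather than on the command line, the same artifact can be pulled in when building the session. A minimal sketch, with the version pinned to match the --packages line above:

from pyspark.sql import SparkSession

# Equivalent to the --packages flag above; Spark downloads the
# connector jar from Maven when the session starts
spark = SparkSession.builder \
    .appName("BigQuery example") \
    .config("spark.jars.packages",
            "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.17.1") \
    .getOrCreate()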
