PySpark code to read and write data from and to Google BigQuery


Here is some sample PySpark code that demonstrates how to read data from and write data to Google BigQuery:

from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder \
    .appName("BigQuery to/from Spark example") \
    .getOrCreate()

# Read data from a BigQuery table
df = spark.read.format("bigquery") \
    .option("table", "bigquery-public-data.hacker_news.comments") \
    .load()

# Show the dataframe
df.show()

# Write data to a BigQuery table
df.write.format("bigquery") \
    .option("table", "your_dataset.your_table") \
    .mode("append") \
    .save()

In this example, spark.read.format("bigquery") is used to read data from a BigQuery table, and .option("table", "bigquery-public-data.hacker_news.comments") specifies the fully qualified table name (project.dataset.table).
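The connector also pushes column pruning and row filters down to BigQuery, so only the selected columns and matching rows are scanned. A small sketch of this, with the caveat that the column names below are illustrative; check the table's schema before running:

# Only the selected columns (and rows passing the filter) are
# read from BigQuery; column names here are illustrative
comments = spark.read.format("bigquery") \
    .option("table", "bigquery-public-data.hacker_news.comments") \
    .load() \
    .select("author", "text") \
    .where("ranking > 10")

comments.show(5)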

Similarly, df.write.format("bigquery") is used to write data to a BigQuery table, and .option("table", "your_dataset.your_table") specifies the destination table. The .mode("append") option appends the data to the table if it already exists, instead of overwriting it.
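Note that, depending on the connector version and write method, writes are staged through a Google Cloud Storage bucket before being loaded into BigQuery. A sketch of such an indirect write, assuming a writable bucket exists (the bucket name your-temp-bucket is a placeholder):

# Indirect write: rows are staged in the GCS bucket first, then
# loaded into the BigQuery table; the bucket name is a placeholder
df.write.format("bigquery") \
    .option("table", "your_dataset.your_table") \
    .option("temporaryGcsBucket", "your-temp-bucket") \
    .mode("append") \
    .save()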

You will also need to set up authentication. The best practice is to rely on Application Default Credentials, for example the service account attached to your Dataproc cluster or VM.
You can find more details and options here: https://cloud.google.com/bigquery/docs/pyspark-bigquery-local-mode
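If you cannot rely on the default credentials, for example when running locally, the connector can also be pointed at a service account key file. A minimal sketch, where /path/to/key.json is a placeholder for your own key file:

import os

# Option 1: standard Application Default Credentials via an
# environment variable (the path is a placeholder)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/key.json"

# Option 2: hand the key file to the connector directly
df = spark.read.format("bigquery") \
    .option("credentialsFile", "/path/to/key.json") \
    .option("table", "bigquery-public-data.hacker_news.comments") \
    .load()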

Also, add the spark-bigquery connector dependency when submitting your PySpark job:

--packages com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.17.1
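If you prefer to configure this in code rather than on the command line, the same artifact can be pulled in when building the session. A minimal sketch, with the version pinned to match the --packages line above:

from pyspark.sql import SparkSession

# Equivalent to the --packages flag above; Spark downloads the
# connector jar from Maven when the session starts
spark = SparkSession.builder \
    .appName("BigQuery example") \
    .config("spark.jars.packages",
            "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.17.1") \
    .getOrCreate()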
