Step-by-step guide on executing PySpark code from Snowflake Snowpark to read a DataFrame:

1. Open Snowsight and create a new Python worksheet.
2. Import the Session class from snowflake.snowpark.
3. Create a Session object using your connection parameters.
4. Use the session.table method to create a DataFrame from a table in Snowflake.
5. Call show on the DataFrame to see the results.

Step 1: Setup

Before executing any PySpark code, ensure you have the following:

  • A Snowflake account with the required permissions.
  • The Snowpark library (snowflake-snowpark-python) and its dependencies installed.
  • Connection configuration for Snowflake (account identifier, authentication credentials).
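The Snowpark library can be installed with pip install snowflake-snowpark-python. To keep credentials out of source code, connection parameters are often assembled from environment variables. The helper below is a minimal sketch of that pattern; the SNOWFLAKE_* variable names and the build_connection_parameters function are illustrative conventions, not part of Snowpark itself:

```python
import os

# Illustrative helper: assemble Snowpark connection parameters from
# environment variables so credentials never appear in source code.
REQUIRED = ("ACCOUNT", "USER", "PASSWORD")
OPTIONAL = ("WAREHOUSE", "DATABASE", "SCHEMA")

def build_connection_parameters(env=os.environ):
    # Fail early if any mandatory setting is absent.
    missing = [k for k in REQUIRED if f"SNOWFLAKE_{k}" not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {missing}")
    params = {}
    for key in REQUIRED + OPTIONAL:
        value = env.get(f"SNOWFLAKE_{key}")
        if value:
            params[key.lower()] = value
    return params
```

The resulting dictionary can then be passed to Session.builder.configs(...) in the next step.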

Step 2: Establish a Connection

Start by setting up a connection to your Snowflake instance.

from snowflake.snowpark import Session

# Snowpark takes a dictionary of connection parameters; there is no
# Spark-style appName/master builder or JDBC connection string here.
connection_parameters = {
    "account": "<account_identifier>", "user": "<username>",
    "password": "<password>", "warehouse": "<warehouse>",
    "database": "<database>", "schema": "<schema>",
}
session = Session.builder.configs(connection_parameters).create()

Step 3: Reading Data

Now, read the data from the freshers_in_view table in Snowflake into a DataFrame.

# Snowpark reads tables directly through the session; there is no
# spark.read.format("snowflake") option chain as in a Spark connector setup.
df = session.table("freshers_in_view")
df.show()
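The table name here is a plain string that ends up in generated SQL. If it ever comes from user input, it is worth validating before use; the helper below is a hypothetical guard (not part of Snowpark), based on Snowflake's rules for unquoted identifiers:

```python
import re

# Unquoted Snowflake identifiers may contain letters, digits,
# underscores and dollar signs, and must not start with a digit
# or a dollar sign.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")

def safe_table_name(name: str) -> str:
    """Return name unchanged if it is a valid unquoted identifier."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"Invalid table name: {name!r}")
    return name
```

A call like session.table(safe_table_name(user_supplied_name)) would then reject malformed or malicious names up front.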

Step 4: Perform Operations (if needed)

You can perform operations on the DataFrame using Snowpark's DataFrame API, which closely mirrors PySpark's. For example, let's count the number of rows in the DataFrame.

count = df.count()
print(f"The number of rows in freshers_in_view is: {count}")

Step 5: (Optional) Writing Data Back to Snowflake

If you’ve transformed the data and want to store the result back in Snowflake, you can do so.

# Snowpark writes through save_as_table rather than a
# format("snowflake") option chain.
df.write.mode("overwrite").save_as_table("processed_freshers_in_view")

This will write the DataFrame back to a Snowflake table named processed_freshers_in_view.

Step 6: Closing the Session

Finally, close the session to release resources:

session.close()

