Insert a dataframe into Snowflake using Python – Source code included

In this article we will see how to insert a pandas DataFrame into Snowflake using Python. The pandas library manages the data, and the snowflake-connector-python package connects to Snowflake and performs the insertion.

Here’s a step-by-step guide:

First, ensure you have the required libraries installed:

pip install pandas snowflake-connector-python

Sample code to insert dataframe into Snowflake

import pandas as pd
import snowflake.connector

# Create a simple dataframe
data = {'Name': ['Sachin', 'Rohan', 'Amit'], 'Age': [28, 24, 22]}
df = pd.DataFrame(data)

# Snowflake connection parameters
snowflake_params = {
    'user': 'freshers_dev_sf_usr',
    'password': '@3JqmRlex!',
    'account': 'fresers_training_acct',
    'warehouse': 'freshers_dwdb',
    'database': 'freshers_db',
    'schema': 'freshers_schema',
    'role': 'freshers_training_role_dev',
}

# Create a connection to Snowflake
conn = snowflake.connector.connect(**snowflake_params)

# Create a cursor object
cur = conn.cursor()

# Create a table in Snowflake (if it doesn't exist)
create_table_sql = """
CREATE TABLE IF NOT EXISTS sample_table(
    Name STRING,
    Age INTEGER
);
"""
cur.execute(create_table_sql)

# Insert the DataFrame into Snowflake
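# Values are bound as parameters (%s) rather than formatted into the SQL text,
# so names containing quotes cannot break or alter the statement
insert_sql = "INSERT INTO sample_table(Name, Age) VALUES (%s, %s)"
for _, row in df.iterrows():
    cur.execute(insert_sql, (row['Name'], int(row['Age'])))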

# Commit the transaction
conn.commit()

# Close the cursor and connection
cur.close()
conn.close()

The sample code assumes your role has permission to create tables and insert data into them. If it does not, remove the CREATE TABLE step and insert into a table that already exists.
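The sample keeps the password directly in the script. A minimal sketch of one alternative, assuming the credentials are exported as environment variables; the SNOWFLAKE_* variable names and the load_snowflake_params helper are hypothetical and not part of the connector:

```python
import os

# Hypothetical environment variable names; rename them to match your setup.
ENV_KEYS = {
    'user': 'SNOWFLAKE_USER',
    'password': 'SNOWFLAKE_PASSWORD',
    'account': 'SNOWFLAKE_ACCOUNT',
    'warehouse': 'SNOWFLAKE_WAREHOUSE',
    'database': 'SNOWFLAKE_DATABASE',
    'schema': 'SNOWFLAKE_SCHEMA',
    'role': 'SNOWFLAKE_ROLE',
}


def load_snowflake_params(env=None):
    """Build the connect() keyword arguments from environment variables."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_KEYS.values() if var not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in ENV_KEYS.items()}


# Usage, once the variables are set in your shell:
# conn = snowflake.connector.connect(**load_snowflake_params())
```

The resulting dictionary has the same keys as snowflake_params above, so the rest of the sample does not need to change.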

This is a basic example that inserts the rows one at a time, with one statement sent to Snowflake per row. If you have a very large DataFrame, consider a bulk approach instead, such as the connector's write_pandas function or a staged COPY INTO load.
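As a rough sketch of the bulk route, the connector ships a write_pandas function in snowflake.connector.pandas_tools. It needs the pandas extra (pip install "snowflake-connector-python[pandas]"). The bulk_insert and to_snowflake_columns helpers below are hypothetical names used for illustration. By default write_pandas quotes identifiers, so the sketch upper-cases the table and column names to match a table created with unquoted names, as in the sample above.

```python
def to_snowflake_columns(columns):
    """Upper-case column names so they match unquoted Snowflake identifiers."""
    return [str(col).upper() for col in columns]


def bulk_insert(conn, df, table_name):
    """Load a whole DataFrame in one call instead of row-by-row INSERTs."""
    # Imported here so the helper above can be used without the connector installed.
    from snowflake.connector.pandas_tools import write_pandas

    upload = df.copy()  # leave the caller's DataFrame untouched
    upload.columns = to_snowflake_columns(upload.columns)

    # write_pandas returns (success, number of chunks, number of rows, COPY output)
    success, _num_chunks, num_rows, _output = write_pandas(
        conn, upload, table_name.upper()
    )
    return success, num_rows


# Usage, with the conn and df from the sample above:
# ok, rows = bulk_insert(conn, df, 'sample_table')
```

The target table must already exist unless you also pass auto_create_table=True to write_pandas.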

To insert the rows of a DataFrame into a Snowflake table without specifying each column name individually, you can dynamically generate the SQL query string. Here’s how you can do it:

  1. Extract column names from the DataFrame.
  2. Format the SQL string using these column names and the corresponding row values.
# Reuse the connection (conn) and DataFrame (df) from the sample above
sf_cursor = conn.cursor()
table = 'sample_table'  # name of the target table in the dqm schema

# Get column names from the dataframe and join them with commas
# (they are the same for every row, so build the string once)
cols = ','.join(df.columns)

for _, row in df.iterrows():
    # Get values from the current row and format them
    # repr() wraps string values in quotes
    vals = ','.join(repr(row[col]) for col in df.columns)

    query = f"INSERT INTO dqm.{table}({cols}) VALUES ({vals})"
    print(query)

    sf_cursor.execute(query)

conn.commit()
sf_cursor.close()

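The repr() function returns the Python literal for each value, so strings come out wrapped in quotes. This is not SQL escaping: a string that contains a single quote is rendered in double quotes (which Snowflake reads as an identifier), None and NaN are rendered as None and nan rather than NULL, and with NumPy 2 a NumPy scalar can be rendered as np.int64(28). Use this approach only for trusted, simple values.
The code assumes the variable table holds the name of your Snowflake table and that the table lives in the dqm schema.
Ensure that the DataFrame columns and the Snowflake table columns match in name and datatype.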
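A safer variant of the same idea generates only the column list and the placeholders dynamically and lets the connector bind the values. The sketch below assumes the connector's default pyformat parameter style (%s); build_insert_sql and insert_dataframe are hypothetical helper names, and the identifier check is a deliberately strict assumption that rejects anything other than plain unquoted names.

```python
import re

# Plain unquoted identifiers only: letters, digits, underscore, dollar sign.
_IDENTIFIER = re.compile(r'^[A-Za-z_][A-Za-z0-9_$]*$')


def build_insert_sql(table, columns):
    """Return an INSERT statement with one %s placeholder per column."""
    for name in [*table.split('.'), *columns]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"Unsafe identifier: {name!r}")
    cols = ', '.join(columns)
    placeholders = ', '.join(['%s'] * len(columns))
    return f"INSERT INTO {table} ({cols}) VALUES ({placeholders})"


def insert_dataframe(cursor, df, table):
    """Insert every row of df, binding the values instead of formatting them."""
    sql = build_insert_sql(table, list(df.columns))
    # itertuples(index=False, name=None) yields one plain tuple per row
    rows = list(df.itertuples(index=False, name=None))
    cursor.executemany(sql, rows)


# Usage, with the cursor and df from the samples above:
# insert_dataframe(sf_cursor, df, 'dqm.sample_table')
```

Because the values never become part of the SQL text, quotes inside strings need no special handling, and None is sent as NULL.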

Casting all the DataFrame columns to strings

If you want all the values to be in string format, irrespective of their actual data type in the DataFrame, you can simply cast each value to a string and then use repr() to ensure it’s safely formatted for SQL insertion.

vals = ','.join(repr(str(row[col])) for col in df.columns)

With this modification, every value from the DataFrame will be cast to a string before being formatted for SQL insertion.
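One thing to watch with a blanket cast: str(None) is 'None' and str(float('nan')) is 'nan', so missing values would be stored as those literal strings. The sketch below is one way to cast everything else to a string while keeping missing values as None, on the assumption that you want them to arrive as NULL and that the rows are sent with bound parameters. The helper names are hypothetical.

```python
import math


def to_sql_string(value):
    """Cast a value to str, but keep missing values as None (bound as NULL)."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    return str(value)


def stringify_row(row):
    """Apply to_sql_string to every value of one row."""
    return tuple(to_sql_string(value) for value in row)


# Usage, with the df from the sample above:
# rows = [stringify_row(r) for r in df.itertuples(index=False, name=None)]
```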

Author: user