Snowflake’s Snowpipe to ingest streaming data from an AWS S3 bucket

Snowflake

Snowpipe to ingest streaming data

Setting up Snowflake’s Snowpipe to ingest streaming data from an AWS S3 bucket into a Snowflake table involves several steps. We will  provide a detailed guide below:

Prerequisites:

  1. AWS Account: You need access to an AWS account to create an S3 bucket and an IAM role with permissions to access the bucket.
  2. Snowflake Account: You need a Snowflake account with sufficient permissions to create and manage objects, including databases, stages, pipes, and tables.
  3. Snowflake Client: Access to Snowflake’s UI, SnowSQL, or a Snowflake API client.

Steps:

Create an S3 Bucket: Go to the AWS Management Console, navigate to S3, and create a new bucket where your JSON files will be landing.

Create an IAM Role: Create an IAM role in AWS IAM Console with permissions to read objects from the S3 bucket you created. Attach the policy AmazonS3ReadOnlyAccess to this role.

Set Up a Stage in Snowflake: A stage is a location in Snowflake that points to your S3 bucket. You’ll need to create a stage with the IAM role you created in the previous step.

CREATE STAGE IF NOT EXISTS my_s3_stage
URL = 's3://your-bucket-name/'
CREDENTIALS = (
  AWS_ROLE_ARN='arn:aws:iam::123456789012:role/your-iam-role'
);

Create a Table to Ingest Data Into: Create a table in Snowflake to hold your streaming data.

CREATE TABLE IF NOT EXISTS freshers_stream_tbl (
  id INT,
  data VARIANT,
  ingestion_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
);

Create a Pipe: A pipe continuously ingests data from the stage into the Snowflake table.

CREATE PIPE IF NOT EXISTS my_stream_pipe
AUTO_INGEST = TRUE
COMMENT = 'Pipe for ingesting data from S3 into freshers_stream_tbl'
AS
COPY INTO freshers_stream_tbl
FROM (SELECT $1:id, $1:data FROM @my_s3_stage)
FILE_FORMAT = (TYPE = 'JSON');

Grant Necessary Permissions: Grant necessary permissions to your user or role to execute the pipe.

GRANT USAGE ON WAREHOUSE <your_warehouse> TO ROLE <your_role>;
GRANT USAGE ON DATABASE <your_database> TO ROLE <your_role>;
GRANT USAGE ON SCHEMA <your_schema> TO ROLE <your_role>;
GRANT EXECUTE ON PIPE my_stream_pipe TO ROLE <your_role>;

Start the Snowpipe: Start the pipe to begin ingesting data.

ALTER PIPE my_stream_pipe RESUME;

Ensure the IAM role attached to the stage has the necessary permissions to access the S3 bucket.

Replace placeholders like <your_bucket_name>, <your_iam_role>, <your_warehouse>, <your_database>, <your_schema>, <your_role> with your actual values.

Adjust the table schema and file format based on your JSON structure and requirements.

Snowpipe automatically monitors the stage for new files and ingests them as they arrive.

Snowflake important urls to refer

Snowpipe official doc

Author: user