Snowflake : Combining multiple arrays into a single array, removing duplicate elements in Snowflake

Snowflake

ARRAY_UNION_AGG

ARRAY_UNION_AGG is a built-in function in Snowflake that combines multiple arrays into a single array, removing duplicate elements. This function is particularly useful when you have multiple arrays of data that you want to merge into a single array, eliminating any duplicate values in the process. ARRAY_UNION_AGG function is a powerful tool for aggregating arrays, especially when dealing with data that needs to be merged and deduplicated. In this article, we explored the syntax and usage of ARRAY_UNION_AGG with a real-world example involving website statistics.

The syntax for ARRAY_UNION_AGG is as follows:

ARRAY_UNION_AGG([DISTINCT] array_expression)

DISTINCT (optional): This keyword ensures that only distinct elements are included in the resulting array. If omitted, ARRAY_UNION_AGG includes all elements, including duplicates.

array_expression: This is the array or array-like expression you want to merge.

Example:

Let’s illustrate the usage of ARRAY_UNION_AGG with a practical example involving website statistics. Assume you have a website with user data, and you want to combine the unique page views of users over a specific time frame.

First, let’s create a sample table in Snowflake to represent our data:

-- Create a sample table for website statistics
CREATE OR REPLACE TABLE website_stats (
    user_id INT,
    page_views ARRAY<STRING>
);

-- Insert sample data
INSERT INTO website_stats (user_id, page_views)
VALUES
    (1, ARRAY['home', 'about', 'contact']),
    (2, ARRAY['home', 'products', 'contact']),
    (3, ARRAY['home', 'about']),
    (4, ARRAY['home', 'products', 'contact']),
    (5, ARRAY['home']);

Let’s use ARRAY_UNION_AGG to calculate the total unique page views for our website:

-- Calculate total unique page views
SELECT ARRAY_UNION_AGG(page_views) AS total_unique_page_views
FROM website_stats;
-- Calculate total unique page views
SELECT ARRAY_UNION_AGG(page_views) AS total_unique_page_views
FROM website_stats;

ARRAY_UNION_AGG combines the arrays of page views for each user and returns a single array containing the unique page views. The result would be:

+------------------------+
| TOTAL_UNIQUE_PAGE_VIEWS|
+------------------------+
| ['home', 'about', 'contact', 'products']|
+------------------------+

ARRAY_UNION_AGG efficiently merged the page views from all users while removing duplicates.

Additional features and tips

Using DISTINCT

If you want to ensure that the resulting array contains only distinct values, you can use the DISTINCT keyword. For example:

SELECT ARRAY_UNION_AGG(DISTINCT page_views) AS distinct_page_views
FROM website_stats;

Handling Null Values

ARRAY_UNION_AGG handles null values gracefully. If any of the input arrays contain null values, those null values will be ignored in the aggregation process.

Snowflake important urls to refer

Author: user