Array handling in Google BigQuery: Storing, Querying, and Unnesting

Google Big Query @ Freshers.in

In Google BigQuery, an array is a homogenous collection of elements, allowing for more complex data storage and operations. An array can contain elements of integer, string, or even other arrays and structures. Handling arrays properly can result in more efficient queries and enable you to work with multi-valued attributes seamlessly. This article dives into storing, querying, and unnesting arrays in Google BigQuery, focusing on tracking the most visited pages on a website. Arrays in Google BigQuery offer a powerful way to handle multi-valued attributes and complex data types. The platform provides various ways to query and manipulate these arrays, including the essential UNNEST function, making your data operations more flexible and efficient.

How to Store Data in Arrays

Storing data in arrays can make your database schema cleaner and can help group related values together.

CREATE TABLE freshers_in_most_visited_pages (
    id INT64,
    page_name STRING,
    visitor_ids ARRAY<INT64>
);
INSERT INTO freshers_in_most_visited_pages (id, page_name, visitor_ids)
VALUES
(1, 'Home', [1, 2, 3]),
(2, 'About', [2, 4, 5]),
(3, 'Contact', [1, 3, 6]),
(4, 'Services', [4, 5, 6]);

Querying Data Within Arrays

You can run queries to search for specific elements within an array or to perform operations on array data.

Example 1: Finding a Specific Element

To find pages visited by a specific user with visitor_id 1:

SELECT page_name
FROM freshers_in_most_visited_pages,
UNNEST(visitor_ids) AS individual_visitor_id
WHERE individual_visitor_id = 1;
Counting Array Elements

To count the number of visitors for each page:

SELECT page_name, ARRAY_LENGTH(visitor_ids) AS num_of_visitors
FROM freshers_in_most_visited_pages;
SELECT page_name, ARRAY_LENGTH(visitor_ids) AS num_of_visitors
FROM freshers_in_most_visited_pages;

The UNNEST Function to Work with Array Data

The UNNEST function takes an array and returns a table, with each element of the array turned into a row.

Example 1: Basic UNNEST
SELECT page_name, individual_visitor_id
FROM freshers_in_most_visited_pages,
UNNEST(visitor_ids) AS individual_visitor_id;
UNNEST with Array Index

To get the visitor IDs along with their position in the array:

SELECT page_name, individual_visitor_id, array_index
FROM freshers_in_most_visited_pages,
UNNEST(visitor_ids) AS individual_visitor_id WITH OFFSET AS array_index;
Output
+-----------+--------------------+-------------+
| page_name | individual_visitor_id | array_index |
+-----------+--------------------+-------------+
| Home      | 1                    | 0           |
| Home      | 2                    | 1           |
| Home      | 3                    | 2           |
| About     | 2                    | 0           |
| About     | 4                    | 1           |
| About     | 5                    | 2           |
| ...       | ...                  | ...         |
+-----------+--------------------+-------------+

BigQuery import urls to refer

Author: user

Leave a Reply