Google BigQuery: Filling null dates with previous row value

Google Big Query @ Freshers.in

In the data world, it’s not uncommon to encounter scenarios where certain records might have missing date values. In BigQuery, there’s a way to handle this elegantly using window functions. Let’s explore how.

Setup

Let’s assume we’re working with a table named freshers_in_page_views that tracks daily page view counts.

1. DDL: Create the Table

CREATE TABLE freshers_in_page_views (
    view_date DATE,
    view_count INT64
);

2. Insert Sample Data

Here’s some sample data with missing view_date values:

INSERT INTO freshers_in_page_views (view_date, view_count) VALUES 
    ('2023-01-01', 100),
    (NULL, 120),
    ('2023-01-03', 140),
    (NULL, 110),
    ('2023-01-05', 90);

In the above data, the view_date for some rows is missing (NULL).

3. Select Statement to Fill Missing Dates

To fill the missing view_date with the date from the previous row, we’ll use the LAG() window function:

WITH FilledDates AS (
    SELECT 
        view_date,
        view_count,
        LAG(view_date) OVER (ORDER BY IFNULL(view_date, '9999-12-31')) AS previous_date
    FROM freshers_in_page_views
)

SELECT 
    CASE 
        WHEN view_date IS NULL THEN previous_date
        ELSE view_date
    END AS adjusted_view_date,
    view_count
FROM FilledDates
ORDER BY adjusted_view_date;

We’re using a common table expression (WITH clause) named FilledDates to determine the previous date using LAG().

We’re then using a CASE statement to replace NULL values with the date from the previous row.

The ORDER BY IFNULL(view_date, ‘9999-12-31’) ensures that the ordering is based on the existing dates while placing the NULL values at the end, so the LAG function doesn’t consider them.

BigQuery import urls to refer
Author: user

Leave a Reply