dbt (data build tool) supports incremental data loading through its built-in incremental materialization, which can be combined with dbt's filtering and macro capabilities and with a database's own incremental loading functionality. A typical workflow uses the following steps:
- Use a database’s incremental loading feature (e.g. INSERT INTO … ON DUPLICATE KEY UPDATE) to only load new or updated rows into a staging table.
- In dbt, create a model that filters the data from the staging table to only include new or updated rows. This can be done with Jinja templating or a macro that generates a SQL statement selecting rows from the staging table based on a timestamp or another column that identifies new and changed rows.
- Create a dbt model that transforms the incremental data and loads it into the final table. This can be done by giving the model the materialized='incremental' configuration, so that each run loads only new rows; adding a unique_key lets dbt update existing rows as well.
- Run dbt with the target models in each run. This can be done by passing the --select option (or its older alias -m) with the name of the model to run.
- Schedule the dbt run in a cron job or cloud function to update the final table periodically.
Here is an example of how you might use dbt to handle incremental data loading:
- Use a database’s incremental loading feature (e.g. INSERT INTO … ON DUPLICATE KEY UPDATE) to only load new or updated rows into a staging table.
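As a rough illustration of this step, the statement below uses MySQL's upsert form. The column list and the sample row are made up for the example, and it assumes staging_table has a primary or unique key on id. Other databases use different syntax for the same idea (for example MERGE, or INSERT … ON CONFLICT).

```sql
-- Hypothetical staging load: insert the row, or update it if the id already exists.
-- Assumes a PRIMARY KEY or UNIQUE index on staging_table.id.
INSERT INTO staging_table (id, name, address, updated_at)
VALUES (1, 'Ada', '1 Main St', '2024-01-01 00:00:00')
ON DUPLICATE KEY UPDATE
    -- VALUES(col) is the value the INSERT would have written;
    -- newer MySQL versions prefer a row alias for this.
    name = VALUES(name),
    address = VALUES(address),
    updated_at = VALUES(updated_at);
```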
- In dbt, create a model (here named incremental_data) that filters the data from the staging table to only include new or updated rows.
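-- models/incremental_data.sql
select *
from {{ ref('staging_table') }}
-- keep only rows newer than the latest one already loaded;
-- assumes incremental_table already exists in the target schema
where updated_at > (
    select max(updated_at) from {{ this.schema }}.incremental_table
)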
- Create a dbt model (here named final_table) that transforms the incremental data and loads it into the final table.
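{# unique_key='id' is an assumption: it treats id as the row identifier so existing rows are updated #}
{{ config(materialized='incremental', unique_key='id') }}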
select
id,
name,
address
from {{ref('incremental_data')}}
- Run dbt with the target models in each run.
dbt run -m incremental_data
dbt run -m final_table
- Schedule the dbt run with incremental data loading in a cron job or cloud function to update the final table periodically.
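The filter and load steps above can also be folded into a single model using dbt's incremental materialization and the is_incremental() macro. The following is a minimal sketch rather than a definitive implementation: the file name is hypothetical, and it assumes staging_table exposes id, name, address, and updated_at, with id identifying a row.

```sql
-- models/final_table.sql (hypothetical file name)
{{ config(materialized='incremental', unique_key='id') }}

select
    id,
    name,
    address,
    updated_at  -- kept in the output so later runs can compare against it
from {{ ref('staging_table') }}

{% if is_incremental() %}
-- Applied only when the target table already exists and the run is not a
-- full refresh: pick up rows newer than the latest row already loaded.
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

On the first run, or when dbt is run with --full-refresh, the filter is skipped and the table is built from all staging rows. On later runs only the newer rows are selected, and rows whose id already exists are updated rather than duplicated.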