DBT – Best practices
The article will give you the cumulative knowledge of dbt’s seasoned users on how to use it most effectively in your analytics work. Following these best practices that will enable your analytics team to operate as efficiently as possible.
Make use the ref function religiously
The power of dbt is due to the ref function. Dbt may infer dependencies with the ref function, ensuring that models are constructed in the proper sequence. Additionally, it makes sure that your current model uses upstream tables and views in the same workspace. When choosing a model from another, always utilize the ref function rather than the direct connection reference
Rename and recast fields where ever necessary
During the initial development itself rename and recast fields where ever necessary. Raw data is often saved in a source-conformed format, which means that it adheres to the naming conventions and schema defined by the source. This format will vary between sources, but it will also probably vary from the naming standards you want to employ for analytics.
In a dbt project, the initial layer of transformations should:
(a)Choose just from one source.
(b)Tables and fields should be renamed to conform to the conventions you want to employ for your project. Change fields to the appropriate data type, such as converting dates to UTC and prices to dollars.
(c)These models should serve as the foundation for all later data models, minimising the amount of duplicative code.
Divide up complicated models into manageable chunks.
Multiple Common Table Expressions are frequently included in complex models (CTEs). Instead, you may divide these CTEs into independent models in dbt that build upon one another. Complex models should frequently be divided into smaller parts when the CTE is repeated in two models. By decomposing the CTE into a distinct model, you may use it as a reference in any number of downstream models, resulting in less duplicative code. The granularity of the data that a CTE chooses from is altered. Testing any modifications that alter the granularity of your data is frequently helpful. You may test this transformation independently of a bigger model by breaking a CTE into a separate model. There are several lines of SQL in a query. When another dbt user (or your future self) is reviewing the code, cognitive strain can be decreased by breaking CTEs into independent models.
Group your models in directories
You are allowed to have as many nested subdirectories as you like inside of your models/ directory. We make extensive use of directories because nested structures within directories make it simpler to:
(a)Specify settings in your dbt project.yml file to configure groups of models.
(b)Utilize your DAG’s subsections by applying the model selection syntax.
(c)Share the modelling process with your team.
(d)Establish rules on a model’s permitted upstream dependencies, such as “models in the marts directory can only choose from other models in the marts directory, or from models in the staging directory.”
Adding tests to your models
A framework for testing assumptions about a model’s output is provided by dbt. By including tests in a project, you can ensure that your SQL is processing data the way you intend it to and that your source data contains the values you anticipate.
Give importance to the information architecture of your data warehouse.
Users frequently rely on the names of schemas, relations, and columns when they connect to a data warehouse using a SQL client to understand the data that is provided to them. to upgrade a data warehouse’s information architecture. To divide relations into logical groupings, use custom schemas, or bury intermediary models in a different schema. Typically, these unique schemas are specified from the dbt project.yml file and match the directories we use to organise our models. Table names should have prefixes to indicate which relations end users should be able to query, such as stg_, fct_, and dim_.
Use materializations wisely
Through configuration, materialization controls how models are constructed. In general, views are quicker to develop than tables, but slower to query. Although incremental models are quicker to construct than table materialization and offer the same query performance as tables, they add complexity to projects.
For local execution, use the model selection syntax
It is frequently advisable to run only the model you are currently working on and any downstream models when developing. Using the model selection syntax, you can pick which models to run.
Run only modified models to test changes
You need to be certain that any code changes you make won’t break other parts of your project before you can confidently merge them. Therefore, as an automatic check in your git workflow, we advise running models and tests in a sandboxed environment that is distinct from your production data. (Read about setting up CI jobs if you use GitHub and dbt Cloud.)
To manage permissions on objects that dbt creates, use hooks
To make sure that permissions are applied to the objects created by dbt, use grant statements from hooks. These grant statements can be codified in hooks so that they can be applied repeatedly under version control.