DBT Tests: Ensuring Data Quality in Your DBT Project . Write and run custom data tests.

getDbt

One crucial aspect of data transformation is ensuring that your output data meets certain quality standards. DBT tests are an integral part of this process, enabling you to validate your data at various stages of the transformation pipeline. In this article, we’ll delve into the concept of DBT tests, how to write and run custom data tests, and provide a detailed example.

1. Understanding DBT tests:

DBT tests are SQL queries that validate your data against predefined quality checks or business rules. These tests can be applied to raw data (sources), intermediate models, or final output tables. DBT comes with built-in tests, such as unique, not_null, and accepted_values, but you can also create custom tests to suit your specific needs. Tests are executed when you run the dbt test command, and the results are reported in the command line or log files.

2. Built-in DBT tests:

DBT offers several built-in tests that can be applied to your models or sources:

unique: Ensures that a column has unique values.
not_null: Ensures that a column does not contain null values.
accepted_values: Ensures that a column contains only specific, allowed values.
To apply built-in tests, you need to define them in your schema.yml file, specifying the model or source, the column(s), and the test(s) to apply.

Example schema.yml for built-in tests:

version: 2

models:
  - name: freshers_in_orders_daily
    columns:
      - name: order_date
        tests:
          - unique
          - not_null
      - name: daily_revenue
        tests:
          - not_null

sources:
  - name: e-commerce
    tables:
      - name: freshers_in_orders
        columns:
          - name: order_id
            tests:
              - unique
              - not_null

3. Writing custom DBT tests:

To create custom tests in DBT, follow these steps:

Step 1: Create a new directory named tests within your project’s root directory.

Step 2: Within the tests directory, create a new SQL file for your custom test, e.g., revenue_positive.sql. Write your custom test as a SQL query that returns the records that fail the test.

Example custom test revenue_positive.sql:

SELECT *
FROM {{ ref('freshers_in_orders_daily') }}
WHERE daily_revenue < 0

Step 3: Define the custom test in your schema.yml file by specifying the model, column, and test name.

Example schema.yml for custom tests:

version: 2

models:
  - name: freshers_in_orders_daily
    columns:
      - name: order_date
        tests:
          - unique
          - not_null
      - name: daily_revenue
        tests:
          - not_null
          - test:
              name: revenue_positive

Running DBT tests:

To run DBT tests, execute the dbt test command in your command line or terminal. This command will run all tests defined in your project, both built-in and custom. The test results will be displayed in the command line, including the number of passed and failed tests, as well as detailed information about any failures.

Example command:

dbt test

Using custom tests with sources:

Custom tests can also be applied to your sources to validate raw data before it’s used in your transformations. To apply custom tests to sources, follow the same process as for models, but define the tests in the sources.yml file instead.

Example custom test order_id_length.sql:

SELECT *
FROM {{ source('e-commerce', 'freshers_in_orders') }}
WHERE LENGTH(order_id::VARCHAR) <> 8

Example sources.yml for custom tests:

version: 2

sources:
  - name: e-commerce
    tables:
      - name: freshers_in_orders
        columns:
          - name: order_id
            tests:
              - unique
              - not_null
              - test:
                  name: order_id_length

DBT tests are a powerful feature that ensures data quality and consistency throughout your data transformation pipeline. By leveraging built-in tests and creating custom tests tailored to your specific requirements, you can maintain high-quality data for your analytics and reporting needs. Remember to run your tests regularly and address any failures promptly to ensure the continued reliability of your data.

Get more useful articles on dbt

  1. ,
Author: user

Leave a Reply