DBT : Using DBT (Data Build Tool) for Testing and Capturing Test Results

getDbt

One of the great features of DBT is its testing framework. DBT allows us to validate the correctness and reliability of the transformations, reducing the risk of data issues going unnoticed.

DBT Test Command

DBT includes out-of-the-box support for several types of data tests, such as unique, not_null, accepted_values, relationships, etc. However, it also allows you to define your own custom data tests.

You run your tests in DBT using the dbt test command.

dbt test

This will execute all defined tests in the DBT project.

Capturing Test Results

DBT does not have a built-in feature to capture test results in a database or output file. However, DBT does return JSON output of the test results which you can capture and store.

You can use the –output option with dbt test to output the test results to a JSON file:

dbt test --output results.json

Then, you can parse this JSON file and store it in your database or use it however you need.

Example

Let’s assume we have a DBT project with a model called orders in our models directory. We want to ensure that the order_id column is unique and not null.

In our tests directory, we create a schema.yml file with the following content:

version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null

Now, we can run our tests:

dbt test

DBT will provide output on the console indicating which tests passed and which failed. If we want to capture these results, we can output it to a JSON file:

dbt test --output test_results.json

The test_results.json file will contain detailed information about each test, such as the test name, model, status (pass/fail), failure count, total rows, and more.

Sample JSON result:

{
    "metadata": {...},
    "results": [
        {
            "status": "pass",
            "message": null,
            "execution_time": 0.06402707099914551,
            "fail": null,
            "warn": null,
            "error": null,
            "skip": false,
            "test_metadata": {
                "name": "unique_order_id",
                "kwargs": {...},
                "total_rows": 1000,
                "passed_rows": 1000
            }
        },
        {
            "status": "pass",
            "message": null,
            "execution_time": 0.06440496444702148,
            "fail": null,
            "warn": null,
            "error": null,
            "skip": false,
            "test_metadata": {
                "name": "not_null_order_id",
                "kwargs": {...},
                "total_rows": 1000,
                "passed_rows": 1000
            }
        },
    ]
}

By capturing test results this way, we have the flexibility to parse and store them in a database, send them to a monitoring system, or do anything else we need. This is a great way to maintain a record of data quality over time and enhance the reliability of your data pipeline.

Get more useful articles on dbt

  1. ,
Author: user

Leave a Reply