select statement or a Python DataFrame that returns the dataset you need, and dbt takes care of materialization.

As a dbt user, your main focus will be on writing models (i.e. select queries) that reflect core business logic. There’s no need to write boilerplate code to create tables and views, or to define the order of execution of your models. Instead, dbt handles turning these models into objects in your warehouse for you.
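To make the idea of materialization concrete, here is a minimal sketch of what "wrapping a select query in boilerplate SQL" can look like. It is not dbt's implementation: the `materialize` helper, the `raw_orders` table, and the `orders_doubled` model name are all hypothetical, and an in-memory SQLite database stands in for a warehouse.

```python
import sqlite3

# Hypothetical model: only the select statement an analyst would write.
MODEL_SQL = "select id, amount * 2 as doubled_amount from raw_orders"


def materialize(conn, name, select_sql, materialization="view"):
    """Wrap a select statement in DDL, loosely mimicking a materialization.

    Assumption: only 'view' and 'table' strategies are sketched here.
    """
    if materialization not in ("view", "table"):
        raise ValueError(f"unsupported materialization: {materialization}")

    # Drop an existing relation of the same name so the build is re-runnable.
    # SQLite needs the matching DROP statement for the relation's type.
    row = conn.execute(
        "select type from sqlite_master where name = ?", (name,)
    ).fetchone()
    if row is not None and row[0] in ("view", "table"):
        conn.execute(f"drop {row[0]} {name}")

    # The boilerplate: the analyst's select is wrapped in a create statement.
    conn.execute(f"create {materialization} {name} as {select_sql}")


# Stand-in for raw data already loaded into the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("create table raw_orders (id integer, amount integer)")
conn.executemany("insert into raw_orders values (?, ?)", [(1, 10), (2, 25)])

materialize(conn, "orders_doubled", MODEL_SQL, materialization="table")
rows = conn.execute(
    "select id, doubled_amount from orders_doubled order by id"
).fetchall()
print(rows)  # [(1, 20), (2, 50)]
```

Switching the `materialization` argument from `"table"` to `"view"` changes how the relation is built without touching the select statement, which is the separation the paragraph above describes.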
Feature | Description |
---|---|
Handle boilerplate code to materialize queries as relations | For each model you create, you can easily configure a materialization. A materialization represents a build strategy for your select query – the code behind a materialization is robust, boilerplate SQL that wraps your select query in a statement to create a new, or update an existing, relation. Read more about Materializations. |
Use a code compiler | SQL files can contain Jinja, a lightweight templating language. Using Jinja in SQL provides a way to use control structures, such as if statements and for loops, in your queries. It also enables repeated SQL to be shared through macros. Read more about Macros. |
Determine the order of model execution | Often, when transforming data, it makes sense to do so in a staged approach. dbt provides a mechanism to implement transformations in stages through the ref function. Rather than selecting from existing tables and views in your warehouse, you can select from another model. |
Document your dbt project | dbt provides a mechanism to write, version-control, and share documentation for your dbt models. You can write descriptions (in plain text or Markdown) for each model and field. In dbt Cloud, you can auto-generate the documentation when your dbt project runs. Read more about Documentation. |
Test your models | Tests provide a way to improve the integrity of the SQL in each model by making assertions about the results generated by a model. Read more about writing tests for your models in Testing. |
Manage packages | dbt ships with a package manager, which allows analysts to use and publish both public and private repositories of dbt code that others can then reference. Read more about Package Management. |
Load seed files | Often in analytics, raw values need to be mapped to a more readable value (for example, converting a country code to a country name) or enriched with static or infrequently changing data. These data sources, known as seed files, can be saved as a CSV file in your project and loaded into your data warehouse using the seed command. Read more about Seeds. |
Snapshot data | Often, records in a data source are mutable: they change over time. This can be difficult to handle in analytics if you want to reconstruct historic values. dbt provides a mechanism to snapshot raw data at a point in time through the use of snapshots. |
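
The staged approach described in the table, where one model selects from another through the ref function, implies a dependency graph between models. The sketch below illustrates that idea only: it is not how dbt is implemented, the model names and SQL are hypothetical, and a regular expression stands in for real Jinja compilation.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical project: model name -> SQL, with dependencies declared via ref().
MODELS = {
    "customer_summary": (
        "select customer_id, count(*) as order_count "
        "from {{ ref('customer_orders') }} group by 1"
    ),
    "customer_orders": (
        "select o.*, c.name from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c on o.customer_id = c.id"
    ),
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
}

# Simplification: match {{ ref('name') }} textually instead of rendering Jinja.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def find_refs(sql):
    """Return the set of model names referenced in a SQL string."""
    return set(REF_PATTERN.findall(sql))


def execution_order(models):
    """Order models so that each one runs after the models it refs."""
    graph = {name: find_refs(sql) for name, sql in models.items()}
    # TopologicalSorter takes a mapping of node -> predecessors and raises
    # graphlib.CycleError if the refs form a cycle.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(MODELS)
print(order)
```

The relative order of independent models (here `stg_orders` and `stg_customers`) is not significant. What matters is that every model appears after the models it selects from, so the analyst never has to declare the execution order by hand.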