Using dbt and Jinja to Write Custom SQL - Unioning Data

Using Jinja for Dynamic Unions in dbt

Revolutionizing SQL with Jinja

In data preparation, consolidating multiple datasets with identical structures is a frequent task. SQL's UNION operation is commonly used to stack data from these datasets into one consolidated table. However, as the number of datasets increases, maintaining and updating these SQL scripts becomes cumbersome. This is where dbt and Jinja come into play, revolutionizing how SQL queries are written and maintained.

Sample Input Data

For the purpose of the demonstration I have kept the actual data in the tables very simple (1 line with same data in it repeated across columns so we can clearly see the union occur successfully). I have opted to store static .csvs in the project and load them with a dbt seed command. Thus, the data can be generated quickly from within the project. All that is required after setting up the project is to run the dbt seed command and then run the models. sales_2022, sales_2023, and sales_2024 tables:

`sales_2022` Table:

Jan	Feb	etc.
1	1	1

`sales_2023` Table:

Jan	Feb	etc.
2	2	2

`sales_2024` Table:

Jan	Feb	etc.
3	3	3

Obviously its unlikely you would want to store these wide tables with a column for each month in your database. It would make far more sense to unpivot the data first and then union. This example has been picked for its familiarity with less mature data users to clearly demonstrate the union occuring. Nevertheless, the process and approach works for more reasonable situations like a table for individual countries or stores (that can be manually specified or even queried from a table).

The Challenge: Tedious Manual Unions

When consolidating data—for instance, combining tables for different years (sales_2022, sales_2023, etc.)—you often have to manually construct a query:

SELECT * FROM sales_2022
UNION ALL
SELECT * FROM sales_2023
UNION ALL
SELECT * FROM sales_2024;

Adding a new dataset (e.g., sales_2025) requires modifying the query. This manual process is repetitive, error-prone, and not scalable. Tools like Tableau Prep or Alteryx offer quick solutions for such tasks through wild-card unions, but SQL lacks a direct equivalent—unless you bring in Jinja.

Jinja in dbt: Revolutionizing SQL Queries

Jinja, the templating language used by dbt, allows dynamic SQL generation. At runtime, it interprets and compiles templates into executable SQL queries. This flexibility automates repetitive tasks like unions and significantly simplifies your workflows.

Example Jinja Code for Unions:

{% set years = ['2022', '2023', '2024'] %}

{% for year in years %}
  SELECT 
      *,
      '{{ year }}' AS year
  FROM {{ ref('sales_' ~ year) }}
{% if not loop.last -%} UNION ALL {%- endif %}
{% endfor %}

How It Works:

Variable Declaration: The years list stores the target years.
For Loop: Iterates over each year, generating a SELECT query for each dataset (e.g., sales_2022).
Dynamic UNION: Automatically adds UNION ALL between queries, skipping it for the last iteration.

Compiled SQL Output:

SELECT 
    *,
    '2022' AS year
FROM sales_2022
UNION ALL
SELECT 
    *,
    '2023' AS year
FROM sales_2023
UNION ALL
SELECT 
    *,
    '2024' AS year
FROM sales_2024;

Making the Jinja Code Dynamic

We can make the approach more dynamic by using Python's built-in datetime module and the range function in Jinja, this can make the generation of a date list more dynamic.

Dynamic Jinja Code:

{% set current_year = modules.datetime.datetime.now().year %}
{% set years = range(2022, current_year + 1) %}

{% for year in years %}
  SELECT 
      *,
      '{{ year }}' AS year
  FROM {{ ref('sales_' ~ year) }}
{% if not loop.last -%} UNION ALL {%- endif %}
{% endfor %}

How It Works:

modules.datetime.datetime.now().year: Gets the current year dynamically.
range(2022, current_year + 1): Generates a list of years starting from 2022 up to the current year (inclusive).

The range function is exclusive of the upper bound, so current_year + 1 is used to include the current year in the list.

Flexible Loop: The loop now adapts automatically as new years are added without modifying the code.

The compiled output will automatically include all years from 2022 up to the current year, eliminating the need for manual updates.

Result: Consolidated Unioned Table

After running the dbt model, the final output is a single table with all datasets stacked together, ready for analysis. This resolution saves tedious repetition of statements with minor variation but is not yet dynamic given the need to add new variables to the list at the start of the model.

Example Output (Unioned Data):

Column1	Column2	Column3	Year
1	1	1	2022
2	2	2	2023
3	3	3	2024

Conclusion

Using Jinja in dbt for dynamic unions transforms tedious SQL scripts into scalable, automated scripts. Whether you're consolidating yearly sales data or combining tables across multiple sources (e.g. a table per country or store), this approach reduces complexity, saves time, and enhances the efficiency of your data preparation process.

Explore the repository here and keep an eye out for expanded examples: https://github.com/edxhayter/dbt_jinja_union

Author:

Edward Hayter

View Profile

Using dbt and Jinja to Write Custom SQL - Unioning Data

Revolutionizing SQL with Jinja

Sample Input Data

sales_2022 Table:

sales_2023 Table:

sales_2024 Table:

The Challenge: Tedious Manual Unions

Jinja in dbt: Revolutionizing SQL Queries

Example Jinja Code for Unions:

How It Works:

Compiled SQL Output:

Making the Jinja Code Dynamic

Dynamic Jinja Code:

How It Works:

Result: Consolidated Unioned Table

Example Output (Unioned Data):

Conclusion

`sales_2022` Table:

`sales_2023` Table:

`sales_2024` Table: