Revolutionizing SQL with Jinja
In data preparation, consolidating multiple datasets with identical structures is a frequent task.
SQL's UNION
operation is commonly used to stack data from these datasets into one consolidated table.
However, as the number of datasets increases, maintaining and updating these SQL scripts becomes cumbersome.
This is where dbt and Jinja come into play, revolutionizing how SQL queries are written and maintained.
Sample Input Data
For the purpose of the demonstration I have kept the actual data in the tables very simple (1 line with same data in it repeated across columns so we can clearly see the union occur successfully). I have opted to store static .csv
s in the project and load them with a dbt seed
command. Thus, the data can be generated quickly from within the project. All that is required after setting up the project is to run the dbt seed command and then run the models.
sales_2022
, sales_2023
, and sales_2024
tables:
sales_2022
Table:
Jan | Feb | etc. |
---|---|---|
1 | 1 | 1 |
sales_2023
Table:
Jan | Feb | etc. |
---|---|---|
2 | 2 | 2 |
sales_2024
Table:
Jan | Feb | etc. |
---|---|---|
3 | 3 | 3 |
The Challenge: Tedious Manual Unions
When consolidating data—for instance, combining tables for different years (sales_2022
,
sales_2023
, etc.)—you often have to manually construct a query:
SELECT * FROM sales_2022
UNION ALL
SELECT * FROM sales_2023
UNION ALL
SELECT * FROM sales_2024;
Adding a new dataset (e.g., sales_2025
) requires modifying the query.
This manual process is repetitive, error-prone, and not scalable.
Tools like Tableau Prep or Alteryx offer quick solutions for such tasks through wild-card unions,
but SQL lacks a direct equivalent—unless you bring in Jinja.
Jinja in dbt: Revolutionizing SQL Queries
Jinja, the templating language used by dbt, allows dynamic SQL generation. At runtime, it interprets and compiles templates into executable SQL queries. This flexibility automates repetitive tasks like unions and significantly simplifies your workflows.
Example Jinja Code for Unions:
{% set years = ['2022', '2023', '2024'] %}
{% for year in years %}
SELECT
*,
'{{ year }}' AS year
FROM {{ ref('sales_' ~ year) }}
{% if not loop.last -%} UNION ALL {%- endif %}
{% endfor %}
How It Works:
- Variable Declaration: The
years
list stores the target years. - For Loop: Iterates over each year, generating a
SELECT
query for each dataset (e.g.,sales_2022
). - Dynamic
UNION
: Automatically addsUNION ALL
between queries, skipping it for the last iteration.
Compiled SQL Output:
SELECT
*,
'2022' AS year
FROM sales_2022
UNION ALL
SELECT
*,
'2023' AS year
FROM sales_2023
UNION ALL
SELECT
*,
'2024' AS year
FROM sales_2024;
Making the Jinja Code Dynamic
We can make the approach more dynamic
by using Python's built-in datetime
module and the range
function in Jinja, this can make the generation of a date list more dynamic.
Dynamic Jinja Code:
{% set current_year = modules.datetime.datetime.now().year %}
{% set years = range(2022, current_year + 1) %}
{% for year in years %}
SELECT
*,
'{{ year }}' AS year
FROM {{ ref('sales_' ~ year) }}
{% if not loop.last -%} UNION ALL {%- endif %}
{% endfor %}
How It Works:
modules.datetime.datetime.now().year
: Gets the current year dynamically.range(2022, current_year + 1)
: Generates a list of years starting from 2022 up to the current year (inclusive).- The
range
function is exclusive of the upper bound, socurrent_year + 1
is used to include the current year in the list. - Flexible Loop: The loop now adapts automatically as new years are added without modifying the code.
The compiled output will automatically include all years from 2022 up to the current year, eliminating the need for manual updates.
Result: Consolidated Unioned Table
After running the dbt model, the final output is a single table with all datasets stacked together, ready for analysis. This resolution saves tedious repetition of statements with minor variation but is not yet dynamic given the need to add new variables to the list at the start of the model.
Example Output (Unioned Data):
Column1 | Column2 | Column3 | Year |
---|---|---|---|
1 | 1 | 1 | 2022 |
2 | 2 | 2 | 2023 |
3 | 3 | 3 | 2024 |
Conclusion
Using Jinja in dbt for dynamic unions transforms tedious SQL scripts into scalable, automated scripts. Whether you're consolidating yearly sales data or combining tables across multiple sources (e.g. a table per country or store), this approach reduces complexity, saves time, and enhances the efficiency of your data preparation process.
Explore the repository here and keep an eye out for expanded examples: https://github.com/edxhayter/dbt_jinja_union