Introduction
During my initial learning of dbt, I connected to a Snowflake instance to complete the dbt Fundamentals course and other personal projects, including solving Preppin' Data challenges and completing an internal SQL hackathon. Having passed the dbt Analytics Engineering Certification, my next goal was to work with live company data.
The focus was on our sales pipeline, which had preliminary API connections feeding into our Microsoft SQL Server database. This presented a new challenge: using dbt with a database I had no prior experience with. Since dbt is a transformation tool rather than a data movement tool, migrating the tables to Snowflake wasn't the right approach; it made more sense to point dbt at where the data already lives. This blog outlines the challenges and learnings encountered while setting up dbt with Microsoft SQL Server.
Local dbt Setup
Overview
Although not the blog's main focus, a brief look at my local dbt setup may help others with best practices or prompt a better understanding. Two key components:
- profiles.yml configuration: I followed dbt best practices by keeping connection details and confidential credentials on the local machine, separate from my projects (dbt looks for this file in the ~/.dbt/ directory by default).
- Virtual Environment: I set up a virtual environment to isolate the Python instance and its dependencies:

python -m venv dbt-env                                        # create the virtual environment
dbt-env\Scripts\activate                                      # activate it (Windows) so pip installs into the environment
python -m pip install dbt-core dbt-sqlserver dbt-snowflake    # install dbt and the adapters
Key Takeaways
I initially set up dbt-core with the Snowflake adapter and ran into dependency issues when adding the SQL Server adapter. Rebuilding the virtual environment from scratch resolved them. Consequently, I'd recommend a separate virtual environment for each project to avoid conflicts between the versions required by different dependencies; a sketch of that setup follows.
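A minimal sketch of the per-project approach (the environment and project names here are placeholders):

python -m venv dbt-env__project_1              # one environment per project
dbt-env__project_1\Scripts\activate            # activate it (Windows)
python -m pip install dbt-core dbt-sqlserver   # install only the adapter this project needs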
Project Organization
I store my dbt projects separately from the virtual environments. This allows easy resets of the environments without affecting projects and keeps them out of the version-controlled project directories. It does require some extra navigation on the command line, starting from the directory that holds the virtual environments:
cd 'DBT Projects'                        # navigate to the overhead directory
dbt-env__project_1/Scripts/Activate      # activate the virtual environment
cd dbt-project_1                         # navigate to the project folder
This activates the Python environment and then moves into the project being worked on.
When I work on a project in VS Code, I make sure the interpreter is set to the correct Python instance: Ctrl+Shift+P (Cmd+Shift+P on macOS) brings up the command palette, where I search for 'Python: Select Interpreter' and enter the path to the virtual environment's Python executable.
Configuring a Profile for Microsoft SQL Server
For my profiles.yml file, I configured the connection ahead of time (as this is not my first dbt project, a profiles.yml file already exists and can be edited):
conn__sql_server:
  outputs:
    dev:
      type: sqlserver
      server: <your_server>
      port: <port number>
      user: <your_user>
      password: <your_password>
      database: <database to write to>
      schema: <schema to write to>
      trust_cert: true
  target: dev
This profile directs dbt to write project objects to the specified schema and database.
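For context, a project's dbt_project.yml references the profile by name; as I understand it, running dbt init with the --profile flag (below) sets this for you. A minimal sketch, with a hypothetical project name:

name: 'dbt_project_1'          # hypothetical project name
profile: 'conn__sql_server'    # must match the profile name in profiles.yml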
I can initialize a project with an already created profile using the command:
dbt init --profile conn__sql_server
Testing the connection can be done using:
dbt debug
I followed the documentation here. Note that the YAML configuration for SQL Server is distinct from that of other connection types.
Updating dbt Power User in VS Code for Microsoft SQL Server
I encountered SQL errors when previewing queries with dbt Power User, presumably because the extension's default query template uses a LIMIT clause, which SQL Server doesn't support. After some googling, I identified the template I needed to start using here. However, the query template configuration itself proved elusive: it lives in the extension settings. Open the settings from the gear icon in the bottom left, search 'Power User' to narrow the list, and set the query template to the one the documentation suggests:
set rowcount {limit}; {query}
Here, set rowcount {limit} caps the number of rows SQL Server returns, playing the role that LIMIT plays in other dialects. From here I was able to preview my source tables and begin writing logic to produce staging tables, then apply transformations to create a reporting data mart in Microsoft SQL Server.
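As an illustration, a first staging model might look like the sketch below. The source, model, and column names are hypothetical placeholders rather than our actual pipeline, and the source() reference assumes the source has been declared in a sources .yml file:

-- models/staging/stg_opportunities.sql (hypothetical file)
-- column and source names are assumptions for illustration
select
    opportunity_id,
    account_name,
    cast(created_at as date) as created_date,
    amount
from {{ source('sales_pipeline', 'opportunities') }}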