SQL and Snowflake #4: Reshaping Data with unions, pivots and date functions

When learning a new skill, it is always important to practice, practice and practice! Thus, I have been using the Preppin' Data Challenges to hone my skills in SQL and Snowflake. These are weekly exercises to help you learn and develop data preparation skills. The challenges are designed for learning Tableau Prep but they can be completed in R, Python, SQL, DBT, EasyMorph and many other tools.

To start, I will be completing challenge 2023- Week 4. This challenge involves restructuring customer data for a fictional bank. The link to the challenge is: https://preppindata.blogspot.com/2023/01/2023-week-4-new-customers.html

To start, I brought the data in to inspect it, using the select * to bring in all columns from the tables. This challenge has multiple inputs, one for each month. They are all structured the same, each dataset has an ID, JOINING_DAY, DEMOGRAPHIC and VALUE column.

Therefore, we can stack all the monthly datasets together.

To union the datasets, we first need to make a date column called MonthDate, where we have a day, month of the table and a set year 2023. This is so we can make a joining date column after everything is unioned.

Now we want to reshape our data so we have a field for each demographic, for each new customer. This is how we structure a pivot in SQL:

We also need to aggregate our values when pivoting, which isn't very intuitive when we have strings. To keep everything the same we can use the max or min of the string.

Putting the whole query together, we select the ID and new pivoted columns. We also form the new joining date column, where we combine the JOINING_DAY and MonthDay.

After the select, we add all the unions.

Finally, we pivot the unioned data and change the header names.

This is the complete cleaned dataset.

And that's everything! This Preppin' Data Challenge was useful for practicing unions, date functions and pivots.

Author:

Harvey Joyce

View Profile