I wrote a blog about comparing the calculations between Power BI and Tableau. In 2 blogs, I shared how to write DAX in Power BI and compare it with the calculations in Tableau through some use cases (Part 1, Part 2). In this blog, I will share some calculations I used in Tableau and SQL through some use cases.
1/ LOD in Tableau
2/ Table Calculations in Tableau
- Difference from the previous/ next row
- Rank
- Running total
- Moving Calculation
In this blog, I use the sample SuperStore dataset as an example. I will discuss how to do it in Tableau first and how to write SQL queries in Snowflake.
LOD in Tableau
Use Case: Find the sum of sales by each sub-category and the sum of sales by each category.
In Tableau, I drag Category, Sub-Category, and SUM([Sales]) into Rows. The SUM([Sales]) calculation will calculate the total sales of each sub-category in the category where it belongs.
Then, I double-click on the blank space in the Rows and type the calculation:
{FIXED [Category]: SUM([Sales])}
It means calculating the total sales group by each category. Now, in the level of row, it's helpful to find how many percent of sales of each sub-category in the category.
In SQL: there are 2 ways to write the SQL query, but the logic is the same.
Option 1: Not using CTE
Firstly, I need to find the sum of sales for each sub-category.
SELECT category,
sub_category,
SUM(sales) as sales_subcatg
FROM SUPERSTORE
GROUP BY category,
sub_category
ORDER BY category,
sub_category;
From the SQL query above, it will return the sum of sales groups by each sub-category in the category. It is the same as SUM([Sales]) in Tableau above.
From the Sum of Sales by each sub-category temp table, I need to find the sum of sales for each category. To do that, I will sum all values of the sum of sales by each sub-category group by category.
From Fig.2, I declare another SELECT statement outside.
SELECT category,
sub_category,
sales_subcatg,
SUM(sales_subcatg) OVER (PARTITION BY category) AS sales_category
FROM (
< The table returns the sum of sales by each sub-category above>
)
ORDER BY category,
sub_category;
From the table that returns the sum of sales by each sub-category above, I will output the category, sub_category, sales_subcatg and then find the sum of sales by each category. To do that, I will use the WINDOW function in SQL.
SUM(sales_subcatg) OVER (PARTITION BY category) AS sales_category
That statement will sum values of sales_subcatg grouped by category. It is represented as "PARTITION BY category". I add the ORDER BY clause to show the result table in order as the result table in Tableau.
Option 2: By using CTE
The same logic as the first option, but in this option, I separate the sum of sales by each sub-category table in a CTE.
WITH sales_each_subcatg as (
SELECT category,
sub_category,
SUM(sales) as sales_subcatg
FROM superstore
GROUP BY category,
sub_category
)
SELECT category,
sub_category,
sales_subcatg,
SUM(sales_subcatg) OVER (PARTITION BY category) AS sales_category
From sales_each_subcatg
ORDER BY category,
sub_category;
After you get the value for sales of each sub-category and sales by each category, it's easy to find the percentage sales by each sub-category in the row level.
You can also apply this in many cases as Cohort Analysis, New Customer Acquisition, ...
Table Calculations in Tableau
1a/ Difference from the previous row
In Tableau
In Tableau, I drag the Month of Order Date in Discrete, Sum of Sales, and copy the Sum of Sales into Rows. I double-click on the copy of the Sum of Sales and choose Add Table Calculation.
The Table Calculation window appears, I choose Difference From in the Calculations Type, Table (down) in Compute Using, and Relative to Previous from the dropdown menu.
In Fig. 4, the result table will have 3 columns: Month of Order Date, Sales, and Difference in Sales. The first row in the table is Null because no previous value for the first row.
In SQL:
To find the difference in Sales in SQL, I need to do it in 2 steps:
- Step 1: Find the sum of sales by month in the previous row
- Step 2: Find the difference in sales from the previous row
Step 1: Find the sum of sales by month in the previous row
SELECT DATE_TRUNC('month',order_date) as month_orderdate,
ROUND(SUM(sales),2) as sum_sales,
LAG(ROUND(SUM(sales),2)) OVER (ORDER BY month_orderdate) as previous_sales
FROM superstore
GROUP BY month_orderdate
ORDER BY month_orderdate;
From the SQL query above, I will retrieve the month of order date, the sum of sales by each month, and the sum of sales in the previous month.
For the Sum of sales (sum_sales), I use the aggregate function SUM inside the ROUND function to limit the number of decimals.
For the Sum of Sales in the previous month, I used the LAG function for the sum of sales ordered by the month of order date in ascending (same as computed using Table Down with the month of order in Tableau). The LAG function returns the data in the previous row without having to join the table to itself.
In Fig.5, the result table returns the previous row value in the Previous Sales column.
Step 2: Find the difference in sales from the previous row
To find the difference in sales from the previous row, I will put the query above in FROM clause as a subquery.
To return the value, I will select the month of the order date, and the sum of sales, and the difference in sales is calculated as sum_sales - previous_sales.
In Fig. 6, the result table returns data the same as Tableau. You can also create a CTE instead of using a subquery.
Note: the superstore dataset that I am using in SQL is old, so the value is quite different from the data in Tableau.
- If I don't use the LAG function, is there any way to find the difference from the previous row? The answer is Yes
Another way to return the value from the previous row without using the LAG function is by joining the table by itself (or Self Join). There are 2 steps. The logic is the same as the way above.
I create a CTE to find the sum of sales by each month. From that CTE, I do Left Join with its CTE. The key to join is the order date from the left table joins with the order date from the right table plus 1 month (which means the next row to match). For calculation, the previous sum of sales is subtracted from the sum of sales from the current month.
WITH sales_by_month AS (
SELECT DATE_TRUNC('month',order_date) as month_orderdate,
ROUND(SUM(sales),2) as sum_sales
FROM superstore
GROUP BY 1
ORDER BY 1
)
SELECT
curr.month_orderdate,
curr.sum_sales,
curr.sum_sales - prev.sum_sales AS difference_in_sales
FROM sales_by_month AS curr
LEFT JOIN sales_by_month AS prev
ON curr.month_orderdate = prev.month_orderdate + INTERVAL '1 month'
ORDER BY curr.month_orderdate;
1b/ Difference from the next row
In Tableau, I only need to switch the option in "Relative to" to Next. Then, it will show the difference in Sales from the next row.
In SQL, I change the LAG function to the LEAD function to get the data from the next row and rename the alias. Opposite to the LAG function, the LEAD function will return the value in the next row.
SELECT month_orderdate,
sum_sales,
sum_sales - next_sales as difference_in_sales
FROM (
SELECT DATE_TRUNC('month',order_date) as month_orderdate,
ROUND(SUM(sales),2) as sum_sales,
LEAD(ROUND(SUM(sales),2)) OVER (ORDER BY month_orderdate) as next_sales
FROM superstore
GROUP BY 1
ORDER BY 1
);
Without using the LEAD function in SQL, I keep the same SQL as I did in difference from the previous row. The only different point here is I change the alias from prev into next. However, in this case, the next order date equals the current order date plus 1 month (for the next row) when joining.
WITH sales_by_month AS (
SELECT DATE_TRUNC('month',order_date) as month_orderdate,
ROUND(SUM(sales),2) as sum_sales
FROM superstore
GROUP BY 1
ORDER BY 1
)
SELECT
next.month_orderdate,
next.sum_sales,
curr.sum_sales - next.sum_sales AS difference_in_sales
FROM sales_by_month AS curr
LEFT JOIN sales_by_month AS next
ON next.month_orderdate = curr.month_orderdate + INTERVAL '1 month'
ORDER BY curr.month_orderdate;
This blog is long, so I will split the topic into 2 parts. In this first part, I shared how to do LOD calculation in Tableau and WINDOW function in SQL. I also introduced how to apply Table Calculation in Tableau to find the difference from the previous/ next row. In SQL, I introduced the LAG, LEAD function to find the previous and next row data. Furthermore, I shared how to apply self-join to find the previous and next row data in SQL.
In the next part, I will share the calculations for finding RANK, Running Total, and Moving Calculation in both Tableau and SQL.
I hope you enjoy this blog and see you in the next blog!