Today we were set a challenge to create a 6-month moving average of sales by state in the Sample-Superstore.xlsx dataset in Tableau Prep.
This challenge seemed easy at first to just go through table calculations and create a moving average grouped by State/Province.
However, not every state has an order every month. So the first step is to add these months by scaffolding the data, the old method to do this would have been to create a date table and join that onto the data we are working with.
Yet, since Tableau Prep version 2021.3 the new rows feature was added and now we are able to scaffold the data without the need of a date table.
We first aggregate the data to have just the fields; [State/Province], [Order Date] (which has been grouped to month start), [Sales]. We do this so that there are not as many fields when going through the steps to help prevent mistakes.
When we try to add a New Rows step we need a field to scaffold it on and that field should be [Order Date]. We cannot however select it, so we have to go back a step and look at the source. After aggregating to Month start for Order Date the data type has changed to datetime so we need to change this to a Date type so we can use it in the new rows step.
With the New rows step set up like this, the data is still not scaffolded correctly as it is not taking the minimum order date for this state it is taking the minimum order date from the whole data set, and the same with the maximum.
To remedy this problem we need to create a fixed LOD for the minimum and maximum dates for each state.
After this step we try to use the new rows function, however it still is not working as we are creating a new row with the sales field already there so we would want to remove this so that we can have just the [State/Province], [Minimum Order Date], and [Maximum Order Date] fields to get every month between these 2 points for each state.
Before the New Rows step we need to add an Aggregation step to get down to just these fields.
The new Rows step should now work and you should have all of the months that each state regardless of whether there was a data point for that month originally.
We then rejoin on the data set from before the 2nd aggregation so that we can have the sales data back in the data.
When we join back on the data we have to make sure that it is a left join, as this will mean that the unmatched data from the data set we scaffolded will be added to the original data set. This means that the dates without sales data will be pulled through.
The data is now scaffolded.