For a personal project, I had the idea of visualising Mario Kart statistics in order to find the best combination in the game. In order to race, the player has to select a Driver, Kart body, Wheel type and Glider. Each part has a distribution of 14 stats, here is an example using Mario!
To visualise and compare each combination I need to combine EVERY Driver with EVERY Kart with EVERY Wheel and EVERY Glider...
Thus, I need a data manipulation tool to achieve this, I decided to go with Tableau Prep!
To start, I downloaded the data from mariowiki.com added each sheet to Tableau Prep. Starting with the Drivers sheet, I added a cleaning step to check if the headers was implemented okay. I then added a new 'Join' column which just had the integer '1' for every row. I repeated these steps from the Kart sheet.
Next, we need to join the Driver and Kart sheets. The applied join clause was combining the 'Join' columns from each sheet. From 49 Drivers and 41 Karts, we now have 2,009 combinations!
From the join, we now have two lists of the same stats e.g WG from the Driver sheet and WG-1 from the joined Kart sheet. These need to be the combined, therefore we need to pivot the stats together.
After the pivot, we have to clean the Pivot1 Names column to remove the "-1" from each stat, we are now back to 14 stats.
After cleaning, we need to aggregate the data. We need the sum the Pivot1 Values for each Stat (Pivot1 Names), Driver and Kart type.
The final step is to re-pivot the data to make it look like how it started with each stat being a column. This brings the number of rows down from 28K back to 2,009 combinations.
We now repeat all the previous steps again such as adding another join column to this data and the wheel sheet etc...
We will eventually end up with over 660k combinations!
It was important to note that during the final step of repivoting the data, a couple nulls started to appear...
Luckily, this was just a superfluous mistake due to Tableau Prep having to read so much data and didn't impact the exporting of the data flow!
And that's it! I hope you find this useful if you want to combine data sets where each row is a unique combination.