Data Preparation... for Dummies

Last Friday DS13 was assigned a project. In the space of 2 hours (which later became 4), we had to re-create our application Viz. We also had to find a new dataset, prepare it with Alteryx or Tableau Prep, and add it to the original Viz.

Thankfully I had in mind which dataset I could add, although I was pretty worried about the timing. When I originally designed my Viz, I spent hours manually cleaning the data. The mere thought of repeating this process, plus doing it for a new dataset, discouraged me. Despite that, I started a workflow in Alteryx to re-build my dataset from scratch.

Against all my expectations, the cleansing process took me only around 45 minutes. In this post, I would like to describe how I used to clean up data before joining the Data School and how this process can be streamlined using a tool such as Alteryx or Tableau Prep.

The original dataset I used for the Viz was the Pokémon database file, retrieved from https://public.tableau.com/en-us/s/resources.

The first thing I changed in the original dataset was to get rid of all the duplicates (Pokémon with 2 different types appeared twice, and I just wanted to keep the original entry). Originally, I performed this task manually in Excel, deleting every duplicate row.

In Alteryx this action can be performed with a single tool, called Unique.

Just selecting the field to group the data by allowed me to perform this operation in a matter of seconds, without the risk of missing rows of data.
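For readers who think in code, here is a minimal Python sketch of what the Unique tool does: it keeps the first row for each value of the selected field and sends the rest to a separate duplicates output. The field names (`Number`, `Name`, `Type`), the sample rows, and the helper `split_unique` are made up for illustration and are not taken from the actual file.

```python
# Hypothetical sample rows: a Pokémon with two types appears twice.
rows = [
    {"Number": 1, "Name": "Bulbasaur", "Type": "Grass"},
    {"Number": 1, "Name": "Bulbasaur", "Type": "Poison"},
    {"Number": 4, "Name": "Charmander", "Type": "Fire"},
    {"Number": 6, "Name": "Charizard", "Type": "Fire"},
    {"Number": 6, "Name": "Charizard", "Type": "Flying"},
]


def split_unique(rows, key_fields):
    """Split rows into first occurrences and later duplicates of the key."""
    seen = set()
    unique, duplicates = [], []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key in seen:
            duplicates.append(row)  # same key already seen: a duplicate
        else:
            seen.add(key)
            unique.append(row)  # first time this key appears: keep it
    return unique, duplicates


# Group by the Pokémon number, so only the first type of each one is kept.
unique_rows, duplicate_rows = split_unique(rows, ["Number"])

for row in unique_rows:
    print(row["Number"], row["Name"], row["Type"])
```

Because the first row wins, the order of the input matters: the sketch assumes the type I wanted to keep is listed first for each Pokémon.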

The dataset I decided to add contained the Pokémon evolutions. I originally planned to include this in my application Viz, but with the little knowledge I had at the time, I did not manage to join these 2 datasets properly.

Again, this took only seconds: all I had to do was join the 2 datasets and union the L and J outputs of the Join tool.
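Taking the union of the L (left rows with no match) and J (matched rows) outputs amounts to a left join: every Pokémon from the original dataset is kept, with or without an evolution. Below is a rough Python sketch of that idea. The field names (`Name`, `Type`, `Evolves To`), the sample rows, and the helper `join_then_union` are hypothetical and only there to show the logic.

```python
# Hypothetical sample data: the original dataset (left input)...
pokemon = [
    {"Name": "Bulbasaur", "Type": "Grass"},
    {"Name": "Charmander", "Type": "Fire"},
    {"Name": "Mew", "Type": "Psychic"},
]
# ...and the evolutions dataset (right input). Mew has no evolution.
evolutions = [
    {"Name": "Bulbasaur", "Evolves To": "Ivysaur"},
    {"Name": "Charmander", "Evolves To": "Charmeleon"},
]


def join_then_union(left, right, key):
    """Join on `key`, then stack matched rows (J) on top of unmatched left rows (L)."""
    # Index the right-hand rows by key so each lookup is quick.
    lookup = {}
    for row in right:
        lookup.setdefault(row[key], []).append(row)

    joined, left_only = [], []
    for row in left:
        matches = lookup.get(row[key])
        if matches:
            for match in matches:
                joined.append({**row, **match})  # J output
        else:
            left_only.append(row)  # L output

    # Union: unmatched rows get empty values for the fields they lack.
    extra_fields = sorted({f for r in right for f in r} - {key})
    padded = [{**row, **{f: None for f in extra_fields}} for row in left_only]
    return joined + padded


combined = join_then_union(pokemon, evolutions, "Name")

for row in combined:
    print(row["Name"], row["Type"], row["Evolves To"])
```

Rows that exist only in the evolutions dataset (the R output) are dropped on purpose, since the Viz is built around the original list of Pokémon.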

This new union allowed me to display Pokémon evolutions in the tooltip, a task that I could not perform before.

The most time-consuming task during the original creation of this Viz was to add the Pokémon descriptions. At the time I actually had to copy and paste this text from a webpage (yes, this means that I copied and pasted 151 different strings manually).

This operation could be streamlined using an Alteryx macro. Unfortunately, we haven’t covered this topic yet, but I look forward to learning it and using it for other projects.

Reflecting on how I used to clean data makes me feel like a dummy, although I know it is all part of the learning process. I look forward to improving my skills further and laughing about all the time I spent on such simple operations.

Author:
Alessandro Costanzo