Drawing out your Data Journey

…and you don't need to be an artist! A huge lesson I learnt in my very first week at The Data School is that a plan is what takes you from chaos to clarity when preparing and cleaning a dataset. A plan can make or break your data preparation. I felt this especially when Carl (our head coach, UK) threw some dirty data from his very real, luxurious soap company at us. [a moment to appreciate the irony]


A dirty dataset is one that breaks the general rules of data structure, which are:

  • single data field for each category or measure
  • single data type for each field of data
  • one row per record, holding its value for every field (where possible!)
  • single column for dates (unless different dates describe unique instances, such as employment start date and end date)

Fixing this may mean you have to filter data, split columns, join different tables and a whole lot more! Left as it is, dirty data can get in the way when you want to dig insights out of your analysis. Therefore, having a well-structured dataset (squeaky clean data) is essential in your data journey.
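To make the date and measure rules concrete, here is a minimal Python sketch that reshapes a table with one column per month into a single Month field and a single Sales measure (often called unpivoting). The soap sales table, the field names and the `unpivot` helper are all hypothetical, invented for illustration rather than taken from Carl's dataset.

```python
# Hypothetical "dirty" table: one column per month breaks the
# "single column for dates" and "single field for each measure" rules.
wide_rows = [
    {"Product": "Lavender Bar", "Jan": "120", "Feb": "95"},
    {"Product": "Rose Bar", "Jan": "80", "Feb": "110"},
]


def unpivot(rows, id_field, var_name, value_name):
    """Hypothetical helper: turn every non-id column into its own row."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_field:
                continue  # keep the identifier as a normal field
            long_rows.append({
                id_field: row[id_field],
                var_name: key,            # the old column header becomes a value
                value_name: int(value),   # enforce a single data type for the measure
            })
    return long_rows


clean_rows = unpivot(wide_rows, "Product", "Month", "Sales")
for r in clean_rows:
    print(r)
```

Each product now appears once per month, so adding a new month adds rows rather than columns.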

But where to even begin? AH!


Well, Stan, you need a plan.

Your plan is a guide to what you want your dataset to look like, so that it is as useful as possible for your analysis. Once you can visualise the ideal dataset, you can break the preparation down into a step-by-step list to get there. I tend to sketch what the dataset currently looks like at the top of the page and what I want it to look like at the bottom. Then I fill in the middle with each step that changes the table, until it evolves into the end sketch! As a visual person, this works well for me; however, you may prefer to plan differently (numbered lists, sentences, flow charts, etc.).
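If you like writing your plan as a numbered list, the same idea carries over to code: each step is a small function, and the plan is the ordered list of steps that takes the starting table to the target one. This is a minimal Python sketch on a hypothetical three-row soap table; the step names, the fields and the "TEST" store marker are all invented for illustration.

```python
# Hypothetical starting table (the "top of the page" sketch).
raw = [
    {"Item": "Soap Bar - Lavender", "Price": "4.50", "Store": "London"},
    {"Item": "Soap Bar - Rose", "Price": "5.00", "Store": "TEST"},
    {"Item": "Bath Bomb - Citrus", "Price": "3.25", "Store": "Leeds"},
]


def remove_test_rows(rows):
    """Step 1: filter out rows that are not real sales."""
    return [r for r in rows if r["Store"] != "TEST"]


def split_item(rows):
    """Step 2: split one combined field into two single-category fields."""
    out = []
    for r in rows:
        product_type, scent = (part.strip() for part in r["Item"].split(" - ", 1))
        new_row = {k: v for k, v in r.items() if k != "Item"}
        new_row.update({"Product Type": product_type, "Scent": scent})
        out.append(new_row)
    return out


def cast_price(rows):
    """Step 3: give the Price field a single numeric data type."""
    return [{**r, "Price": float(r["Price"])} for r in rows]


# The plan: an ordered list of steps from the start sketch to the end sketch.
plan = [
    ("Remove test rows", remove_test_rows),
    ("Split Item into Product Type and Scent", split_item),
    ("Convert Price to a number", cast_price),
]

table = raw
for number, (description, step) in enumerate(plan, start=1):
    table = step(table)
    print(f"Step {number}: {description} -> {len(table)} rows")

print(table)
```

Keeping each step separate makes it easy to check the table after every change, much like comparing each stage of the sketch against the end goal.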

Photo by charlesdeluvio / Unsplash

Have a whirl!

Try out some dirty-data challenges (https://www.preppindata.com/) and test yourself: can you find the logical steps in your plan to get from the input dataset to the output? This definitely helped me get to grips with data preparation and cleaning.

Best of luck with the start of your data journey!

Author:
Numa Begum
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
© 2025 The Information Lab