How to clean data?

On day 3 at The Data School, we investigated cleaning data, a fundamental concept in data prep. To know what clean data is, we must first understand what it means for data to be dirty. Data is dirty if it breaks one of Carl's 4 rules:

· One data field for each category or measure.

· One data type for each data field.

· A single data column where possible.

· One row should be a record containing all values in each data field.

Data is also dirty if it is out of date or has null values.
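To make these checks concrete, here is a minimal Python sketch of how we might spot two of the problems above: null values, and a field that holds more than one data type. The sample records, field names, and helper functions (`find_nulls`, `find_mixed_types`) are made up for illustration; they are not part of Tableau Prep or of the course material.

```python
from collections import defaultdict

# Hypothetical sample records; field names and values are invented for illustration.
records = [
    {"city": "London", "sales": 120},
    {"city": "Leeds", "sales": "95"},   # sales stored as text: a second data type in the field
    {"city": None, "sales": 80},        # null value
]

def find_nulls(rows):
    """Return (row index, field name) pairs where the value is missing."""
    return [
        (i, field)
        for i, row in enumerate(rows)
        for field, value in row.items()
        if value is None
    ]

def find_mixed_types(rows):
    """Return fields whose non-null values use more than one Python type."""
    types_seen = defaultdict(set)
    for row in rows:
        for field, value in row.items():
            if value is not None:
                types_seen[field].add(type(value).__name__)
    return {
        field: sorted(names)
        for field, names in types_seen.items()
        if len(names) > 1
    }

null_cells = find_nulls(records)
mixed_fields = find_mixed_types(records)
print(null_cells)
print(mixed_fields)
```

Running checks like these before cleaning tells us which fields break the rules, so we can decide how to fix each one rather than guessing.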

How do we solve these issues? Checking spelling removes the errors caused by misspelled values. Filtering is also a good way to clean data, but we need to be cautious about what we filter out, because filtering removes all record of that data. When a field holds more than one data type or category, we can split the column into separate fields. Be wary when splitting dates, as we don't want the split itself to break one of the 4 rules.
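The three fixes above can be sketched in plain Python. This is only an illustration under assumptions: the sample rows, the `SPELLING_FIXES` lookup, and the helper names are hypothetical, and the same steps would look different in Tableau Prep.

```python
# Hypothetical sample rows; names and values are invented for illustration.
rows = [
    {"city": "Lodnon", "customer": "Ada Lovelace", "sales": 120},
    {"city": "London", "customer": "Alan Turing", "sales": None},
    {"city": "Leeds", "customer": "Grace Hopper", "sales": 95},
]

# Assumed lookup of known misspellings and their corrections.
SPELLING_FIXES = {"Lodnon": "London"}

def fix_spelling(row):
    """Return a copy of the row with known city misspellings corrected."""
    fixed = dict(row)
    fixed["city"] = SPELLING_FIXES.get(row["city"], row["city"])
    return fixed

def split_customer(row):
    """Split one column holding two values into two separate fields."""
    first, last = row["customer"].split(" ", 1)
    out = {key: value for key, value in row.items() if key != "customer"}
    out["first_name"] = first
    out["last_name"] = last
    return out

cleaned = [split_customer(fix_spelling(row)) for row in rows]

# Filter into new lists instead of deleting, so removed rows stay on record.
kept = [row for row in cleaned if row["sales"] is not None]
removed = [row for row in cleaned if row["sales"] is None]

print(kept)
print(removed)
```

Keeping the filtered-out rows in a separate list is one way to respect the caution about filtering: the original `rows` list is untouched, and we can still inspect what was dropped.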

In coming weeks, we will apply the theory we have learned to Tableau Prep.

Author:
Saampave Sanmuhanathan
Powered by The Information Lab