What to do with unclean data?

Unclean data is like a jigsaw puzzle missing important pieces. It confuses your overall picture, making it hard to see the complete and accurate image. This is why cleaning your data before analysis is crucial.

Here are some key things to watch out for:

Field Names / Column Headers
Are they clear and meaningful? For instance, a field just named "total" can be vague. Make them more specific and concise; for example, "Total_Sales."
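
As a rough illustration of renaming vague headers, here is a minimal Python sketch. It assumes the data has been loaded as a list of dictionaries (as `csv.DictReader` would produce). The `RENAMES` mapping, the "dt" and "Region" fields, and the `rename_fields` helper are hypothetical names used only for this example.

```python
# Hypothetical mapping from vague headers to more descriptive ones.
RENAMES = {"total": "Total_Sales", "dt": "Order_Date"}


def rename_fields(rows, renames):
    """Return new rows with keys renamed; headers not in the mapping are kept."""
    return [
        {renames.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


# Made-up sample row for illustration only.
rows = [{"total": 120.5, "dt": "2023-01-10", "Region": "London"}]
cleaned = rename_fields(rows, RENAMES)
print(cleaned)
```

Keeping the mapping in one place means the renaming decisions are documented and can be reviewed alongside the data.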

Splitting Columns
Sometimes, data in one column needs to be separated. For instance, an address like "10 Downing Street, City of Westminster, SW1A 2AA" is more useful once it is split into separate street, city, and postcode fields.
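
The split can be sketched in a few lines of Python. This assumes every address has exactly three comma-separated parts, which real address data often will not; the `split_address` helper and the output field names are assumptions made for this example.

```python
def split_address(address):
    """Split a 'part, part, part' address on commas and trim whitespace."""
    parts = [part.strip() for part in address.split(",")]
    if len(parts) != 3:
        # Flag unexpected shapes rather than guessing which part is which.
        raise ValueError(
            f"expected 3 comma-separated parts, got {len(parts)}: {address!r}"
        )
    street, city, postcode = parts
    return {"Street": street, "City": city, "Postcode": postcode}


record = split_address("10 Downing Street, City of Westminster, SW1A 2AA")
print(record)
```

Raising an error on unexpected input is a deliberate choice here: a row that does not fit the pattern is worth inspecting by hand.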

Assessing Inconsistencies
Missing and null values are common. Work out whether data is missing intentionally or by mistake. For example, a product with no sales recorded for the current month may not have missing data at all; it may simply have been discontinued. If you find gaps, consult subject matter experts for clarification.
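
Before asking anyone about gaps, it helps to know where they are. The sketch below counts missing values per field. The set of strings treated as "missing" and the sample rows are assumptions for illustration; your data may use different markers.

```python
# Assumed markers for "missing"; adjust to match how your source encodes gaps.
MISSING_MARKERS = {"", "null", "n/a", "na", "none"}


def is_missing(value):
    """Treat None and any assumed marker string as missing."""
    return value is None or str(value).strip().lower() in MISSING_MARKERS


def missing_report(rows):
    """Count missing values per field across all rows."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            counts[field] = counts.get(field, 0) + (1 if is_missing(value) else 0)
    return counts


# Made-up sample data for illustration only.
rows = [
    {"Product": "A", "Sales_Jan": "100", "Sales_Feb": "110"},
    {"Product": "B", "Sales_Jan": "80", "Sales_Feb": ""},
    {"Product": "C", "Sales_Jan": "NULL", "Sales_Feb": None},
]
report = missing_report(rows)
print(report)
```

A report like this only shows where the gaps are. Whether a gap is an error or a genuine absence still needs domain knowledge.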

Misspellings
Data isn't always entered consistently, especially when done manually. Variations like "Downing Street," "Dawning Street," "Downing St," and so on can create confusion.
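
One possible way to reconcile such variants is fuzzy matching against a list of known-good values. The sketch below uses `difflib.get_close_matches` from the Python standard library. The `CANONICAL` list, the `normalise_street` helper, and the 0.8 similarity cutoff are assumptions for this example, not a recommended setting.

```python
import difflib

# Hypothetical list of known-good street names.
CANONICAL = ["Downing Street", "Watling Street"]


def normalise_street(name, canonical=CANONICAL, cutoff=0.8):
    """Map a possibly misspelled street name to its closest canonical form."""
    candidate = name.strip().rstrip(".")
    # Expand the common "St" abbreviation before comparing.
    if candidate.lower().endswith(" st"):
        candidate = candidate[:-2] + "Street"
    matches = difflib.get_close_matches(candidate, canonical, n=1, cutoff=cutoff)
    # Leave the value unchanged when nothing is similar enough.
    return matches[0] if matches else name


for raw in ["Downing Street", "Dawning Street", "Downing St"]:
    print(raw, "->", normalise_street(raw))
```

Fuzzy matching can also merge values that are genuinely different, so it is worth reviewing the proposed corrections before applying them.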

Standardized Formats
Different regions may write dates differently (e.g., 10/01/2023 vs. 01/10/2023). Be aware of these variations, especially when dealing with data from various sources. Additional considerations include inconsistent decimal separators (e.g., 1.5 vs. 1,5), a mix of imperial and metric units, and different scales of measurement (e.g., kilograms vs. grams). These common inconsistencies can distort your analysis if they are not addressed in the cleaning step.
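
To make the date ambiguity concrete, the sketch below parses the same string under two conventions and converts weights to a single unit. It assumes you know which convention each source uses; the "UK" and "US" labels, the `DATE_FORMATS` and `TO_KILOGRAMS` tables, and both helper functions are hypothetical names for this example.

```python
from datetime import datetime

# Assumed source labels mapped to the date format each one uses.
DATE_FORMATS = {"UK": "%d/%m/%Y", "US": "%m/%d/%Y"}

# Conversion factors to kilograms.
TO_KILOGRAMS = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def parse_date(text, source):
    """Parse a date string using the format declared for its source."""
    return datetime.strptime(text, DATE_FORMATS[source]).date()


def to_kilograms(value, unit):
    """Convert a weight in the given unit to kilograms."""
    return value * TO_KILOGRAMS[unit]


uk = parse_date("10/01/2023", "UK")  # read as 10 January 2023
us = parse_date("10/01/2023", "US")  # read as 1 October 2023
print(uk.isoformat(), us.isoformat())
print(to_kilograms(2500, "g"))
```

Storing dates in an unambiguous form such as ISO 8601 (YYYY-MM-DD) after parsing avoids carrying the ambiguity further into the analysis.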

In conclusion, cleaning your data is a vital step in the data analysis process. It ensures that your data is accurate, consistent, and ready for meaningful insights. By addressing issues like unclear field names, data inconsistencies, and misspellings, you lay the foundation for more reliable and accurate analyses.

Author:
Dan Wade
© 2025 The Information Lab