What is Data Prep and Why Do We Prep Data?
Imagine walking into a bedroom that’s completely cluttered. Clothes are strewn all over the floor, shoes are scattered under the bed, and you can’t quite remember where you left that one favourite shirt. You know there’s a lot of good stuff in there, but finding exactly what you need? That’s a whole different story. Now, picture trying to answer a simple question in that mess: How many shirts do you actually have? You’d probably have to dig through piles, wasting time and energy, and still not be confident you’ve got the right number.
This is exactly what working with unprepared data feels like.
So, what’s data prep, and why do we need it?
Data prep is like tidying up that chaotic bedroom. It’s the process of cleaning, transforming, and reshaping data so that it’s ready for analysis. Just like you wouldn’t try to count your shirts before putting everything in its place, you wouldn’t want to analyse data before getting it organised.
Cleaning is the first crucial step. It’s like sorting through your clothes: separating socks from shirts, putting shoes together, and making sure everything’s in the right category. In data terms, it means ensuring that your data types are correct, your values are consistent, and you’re not dealing with duplicates or missing information. It’s about making sure that a date is recognised as a date, a number as a number, and a name as a name.
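To make the cleaning step concrete, here is a minimal sketch in Python using only the standard library. The clothing inventory, its field names (`item`, `bought`, `price`) and the `clean` helper are all hypothetical, made up for illustration. Data prep tools and libraries have their own ways of doing this. The sketch trims and lower-cases labels so that "Shirt", " shirt" and "SHIRT" fall into one category. It also turns text into real dates and numbers, marks a blank price as missing, and skips rows that duplicate an earlier one.

```python
from datetime import date

# Hypothetical raw inventory: every field arrives as text, labels are written
# inconsistently, and one price is missing.
raw_rows = [
    {"item": "Shirt",  "bought": "2023-05-01", "price": "19.99"},
    {"item": " shirt", "bought": "2023-06-12", "price": "24.50"},
    {"item": "SHIRT",  "bought": "2023-06-12", "price": "24.50"},
    {"item": "Socks",  "bought": "2023-07-03", "price": ""},
    {"item": "Shoes",  "bought": "2023-08-20", "price": "59.00"},
]


def clean(rows):
    """Return rows with consistent labels, real types, and no exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        item = row["item"].strip().lower()          # consistent values: " SHIRT" -> "shirt"
        bought = date.fromisoformat(row["bought"])  # a date is recognised as a date
        price_text = row["price"].strip()
        price = float(price_text) if price_text else None  # a number as a number; blank -> missing
        key = (item, bought, price)
        if key in seen:                             # drop rows that duplicate an earlier one
            continue
        seen.add(key)
        cleaned.append({"item": item, "bought": bought, "price": price})
    return cleaned


clean_rows = clean(raw_rows)
shirt_count = sum(1 for r in clean_rows if r["item"] == "shirt")
print(shirt_count)  # 2
```

Counting the raw rows with an exact match on "Shirt" would find only one shirt. After cleaning, the count is two, because the third entry turns out to be a duplicate of the second.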
Next comes transformation—the equivalent of folding those clothes. It’s about getting things into a usable state, ironing out the wrinkles, so to speak. Maybe you need to combine data from different sources, adjust the format, or create new fields that better represent what you’re looking for. It’s all about making the data more functional and easier to work with.
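Transformation can be sketched the same way, again with hypothetical data. Two made-up sources, a `purchases` list and an `items` lookup table, are combined on a shared `item_id`. Dates written as day/month/year text are converted into proper dates, and a new `month` field is derived for later grouping. The `transform` helper is illustrative only. It assumes every purchase has a matching entry in the lookup table, whereas a real join would need to decide what to do with rows that don’t match.

```python
from datetime import datetime

# Hypothetical source 1: purchases, with dates stored as day/month/year text.
purchases = [
    {"item_id": 1, "bought": "01/05/2023", "price": 19.99},
    {"item_id": 2, "bought": "03/07/2023", "price": 4.50},
    {"item_id": 1, "bought": "12/06/2023", "price": 24.50},
]

# Hypothetical source 2: a lookup table describing each item.
items = {
    1: {"name": "shirt", "category": "tops"},
    2: {"name": "socks", "category": "accessories"},
}


def transform(purchases, items):
    """Combine both sources, fix the date format, and add a derived month field."""
    result = []
    for p in purchases:
        info = items[p["item_id"]]  # combine: assumes every item_id has a match
        bought = datetime.strptime(p["bought"], "%d/%m/%Y").date()  # adjust the format
        result.append({
            "name": info["name"],
            "category": info["category"],
            "bought": bought,
            "month": bought.strftime("%Y-%m"),  # new field derived from the date
            "price": p["price"],
        })
    return result


rows = transform(purchases, items)
for r in rows:
    print(r["name"], r["category"], r["month"], r["price"])
```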
Finally, reshaping is the step where you organise everything into drawers and wardrobes. This is where you structure the data so that it’s easy to access and analyse. Just like knowing which drawer holds your shirts and which shelf has your shoes, reshaped data lets you quickly find the information you need to answer those business questions without getting lost in the clutter.
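Reshaping can take many forms. One common example is a pivot, which turns "long" data (one row per purchase) into a "wide" table with one row per category and one column per month. Below is a hypothetical, standard-library version of that idea. The `pivot_sum` helper and the sample rows are made up for illustration, and spreadsheet and data prep tools provide their own pivot features. The helper adds up the prices that land in each cell.

```python
from collections import defaultdict

# Hypothetical "long" data: one row per purchase.
long_rows = [
    {"category": "tops", "month": "2023-05", "price": 19.99},
    {"category": "tops", "month": "2023-06", "price": 24.50},
    {"category": "accessories", "month": "2023-07", "price": 4.50},
    {"category": "tops", "month": "2023-06", "price": 10.00},
]


def pivot_sum(rows, index, columns, values):
    """Reshape long rows into a nested dict: table[index][column] = sum of values."""
    table = defaultdict(lambda: defaultdict(float))
    for r in rows:
        table[r[index]][r[columns]] += r[values]  # each cell is a "drawer" of totals
    return {key: dict(cells) for key, cells in table.items()}


wide = pivot_sum(long_rows, index="category", columns="month", values="price")
print(wide)
```

Once the data is in this shape, a question like "how much did I spend on tops in June?" becomes a single lookup: `wide["tops"]["2023-06"]`.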
Organising clothes might not be the most fun activity, and the same can be said of data prep. But in both cases it’s the most important step: a tidy wardrobe is what lets you look good, and well-prepared data is what lets you make good decisions and discover amazing insights.