How do we get from data in storage to using it?
In the first week of the Data School, we were introduced to a concept called ETL (Extract, Transform, Load): a process that describes how we get a dataset to a point where we can analyse it.
There are three main steps to this process:
Extract: pull data from a source, typically a database or a flat file.
Transform: reshape, aggregate or clean your data into a form that is usable for analytics.
Load: upload the cleaned dataset to an accessible place, so that analysts, or anyone else who needs it, can use it!
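To make the three steps concrete, here is a minimal sketch using only Python's standard library. It is an illustration rather than how any particular tool does it: the sample data, the column names (region, sales), the function names and the table name are all hypothetical, and a real pipeline would read from an actual file or database instead of an in-memory string.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for a flat file (e.g. a CSV export).
RAW_CSV = """region,sales
North, 100
South,250
North,50
East,
South,75
"""


def extract(source):
    """Extract: read rows from a CSV source into a list of dicts."""
    return list(csv.DictReader(source))


def transform(rows):
    """Transform: clean the rows, then aggregate total sales per region."""
    totals = {}
    for row in rows:
        region = row["region"].strip()
        sales = row["sales"].strip()
        if not region or not sales:
            continue  # clean: drop incomplete rows
        totals[region] = totals.get(region, 0) + int(sales)
    return sorted(totals.items())


def load(records, conn):
    """Load: write the cleaned, aggregated records into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT, total INTEGER)"
    )
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", records)
    conn.commit()


# An in-memory SQLite database plays the role of the "accessible place".
conn = sqlite3.connect(":memory:")
load(transform(extract(io.StringIO(RAW_CSV))), conn)
print(conn.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())
# [('North', 150), ('South', 325)]
```

Each step is its own function, so you can swap the source in extract() or the destination in load() without touching the cleaning logic.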
The order of these steps isn't fixed, so you could end up extracting, loading and then transforming (often called ELT). However, what happens in each step generally stays the same!
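As a rough illustration of the reordered version, the sketch below loads the raw rows untouched into a staging table first and only then transforms them with SQL inside the database. Again, the data, column names and table names (raw_sales, sales_by_region) are hypothetical, and SQLite is just a convenient stand-in for a real destination.

```python
import csv
import io
import sqlite3

# Hypothetical raw data, same shape as a small CSV export.
RAW_CSV = """region,sales
North, 100
South,250
North,50
East,
South,75
"""

conn = sqlite3.connect(":memory:")

# Extract: read the raw rows without cleaning them.
rows = [(r["region"], r["sales"]) for r in csv.DictReader(io.StringIO(RAW_CSV))]

# Load: store the raw rows as-is in a staging table.
conn.execute("CREATE TABLE raw_sales (region TEXT, sales TEXT)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)

# Transform: clean (trim, drop blanks) and aggregate inside the database.
conn.execute("""
    CREATE TABLE sales_by_region AS
    SELECT TRIM(region) AS region,
           SUM(CAST(TRIM(sales) AS INTEGER)) AS total
    FROM raw_sales
    WHERE TRIM(sales) <> ''
    GROUP BY TRIM(region)
""")
print(conn.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())
# [('North', 150), ('South', 325)]
```

The result is the same as before; the difference is that the raw data is already sitting in the destination, so it can be re-transformed later without extracting it again.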
Below is a comic doodle that helps me remember what each step of this process involves. I hope you find it useful too!

