Our second day of training in DS 39 was spent learning about the different approaches businesses take to extracting, storing, preparing and reporting data. The six stages we looked at were raw data, ingestion, central storage, prepared data, trusted data and visualized data. It was a lot of information to take in, so to better understand the concepts we each had to pick a business or department and present how we think its analytics pipeline works. I picked Tesco, did some research into their analytics process, and it turns out they use Tableau! In this blog post I will explain each analytics stage using Tesco as an example. There wasn't much public information about Tesco's process at some stages, so some guesswork was involved.
Raw data: unfiltered collected data
Tesco collects data on a lot of different things, but for simplicity I'm going to focus on sales. Sales figures are stored in a Teradata data warehouse with a 100-terabyte capacity because of the large amount of data that needs storing.
Ingestion: moving data to a storage space
I couldn't find much information on which ingestion method Tesco uses, but I'm guessing it is APIs. An API is the connection between a client and a server: in this case, the Tesco Clubcard app is the client and the database that stores all the Clubcard discounts is the server. Think of an API as a process rather than a piece of software.
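To make the client and server idea a bit more concrete, here is a minimal sketch in Python. It is not Tesco's actual API: the function names, the Clubcard ID and the discount data are all made up for illustration, and the "server" is just a function in the same script rather than something reached over a network.

```python
import json

# Hypothetical server-side data: Clubcard ID -> list of discounts.
DISCOUNTS = {
    "12345": [{"product": "milk", "discount_pct": 10}],
}

def server_handle(request_json):
    """Server side: read the request, look up the discounts, send a reply."""
    request = json.loads(request_json)
    offers = DISCOUNTS.get(request["clubcard_id"], [])
    return json.dumps({"status": 200, "offers": offers})

def client_get_offers(clubcard_id):
    """Client side (standing in for the app): send a request, read the reply."""
    request_json = json.dumps({"clubcard_id": clubcard_id})
    response = json.loads(server_handle(request_json))
    return response["offers"]

offers = client_get_offers("12345")
no_offers = client_get_offers("99999")
```

The client never touches the discount data directly. It only sends a request and reads the response, which is why it helps to think of an API as a process.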
Storage: where all data is kept
Central storage holds all of the data, including prepared and trusted data. Prepared data has been cleaned but has not yet been reviewed, while trusted data is data that is known to be right. For Tesco, this is also stored in the Teradata database.
Preparing data: cleaning, transforming and formatting data
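This is done using Alteryx with an Extract, Transform and Load (ETL) approach. This approach is beneficial because the data is transformed before it is loaded, so only the cleaned version needs storing, which cuts down storage costs. However, it carries security risks and a risk of errors.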
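The three ETL steps can be sketched in a few lines of Python. This is only an illustration of the idea, not how Alteryx or Tesco do it: the sales rows, column names and the list standing in for a warehouse table are all made up.

```python
import csv
import io

# Hypothetical raw sales export; columns and values are invented.
RAW_CSV = """store,product,units,price
Leeds, milk ,2,1.20
leeds,Bread,,0.95
York,milk,3,1.20
"""

def extract(text):
    """Extract: read the raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop incomplete rows, tidy the text, work out revenue."""
    cleaned = []
    for row in rows:
        if not row["units"]:
            continue  # skip rows with a missing unit count
        units = int(row["units"])
        price = float(row["price"])
        cleaned.append({
            "store": row["store"].strip().title(),
            "product": row["product"].strip().lower(),
            "units": units,
            "revenue": round(units * price, 2),
        })
    return cleaned

def load(rows, warehouse):
    """Load: append the prepared rows to the target table."""
    warehouse.extend(rows)
    return len(rows)

warehouse_table = []  # a plain list standing in for a warehouse table
loaded = load(transform(extract(RAW_CSV)), warehouse_table)
```

Because the row with the missing unit count is dropped in the transform step, it never reaches the warehouse, which is where the storage saving comes from.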
Visualizing data: displaying data in an understandable format to form conclusions
Tesco has been using Tableau to visualize their data since 2018, which would explain why they use Alteryx for the preparation.
Using a business as an example really helped me understand each step clearly and see how different businesses may use different approaches. This information will be useful on placements, especially understanding the pros and cons of the different software that can be used at each step. For example, Excel can do a lot of these steps but may be time-consuming, whereas other software can do the same task much faster.