In my first week at The Data School, I learned about Modern Analytics Architecture. But before digging into the architecture of modern analytics, let's figure out what Analytics Architecture means.
Analytics Architecture is the set of tools and technologies that enables companies and organizations to collect, store, process, prepare, analyze, and visualize data. A company or organization uses that framework to drive its business and to support data-driven decisions.
There are 2 types of Analytics Architecture: Traditional and Modern. Traditional Analytics follows waterfall development, while Modern Analytics follows agile development. So what is the difference between those 2 types, and which type is more effective?
In waterfall development, the stages run in a fixed order: only when one stage is fully completed does the next stage begin. In agile development, the stages can be worked on independently, which is more convenient. To explain this, I drew a diagram on Excalidraw comparing waterfall development and agile development.
Consider that I have 3 stages: A, B and C. I assume that I need:
- 5 hours to complete stage A
- 7 hours to complete stage B
- 3 hours to complete stage C.
In Waterfall Development, I have to start with stage A and wait 5 hours to complete it. After stage A is fully completed, I start stage B. When stage B is fully completed 7 hours later, I start stage C, which takes 3 more hours. The whole process takes 5 + 7 + 3 = 15 hours.
In Agile Development, I could start working on stages A, B, and C from the beginning. I don't need to wait for stage A to be fully completed to start stage B or stage C. While working on stage A, I could work on stage B or C in parallel. The total time would be shorter than in waterfall development: in the best case, where all three stages run fully in parallel, the elapsed time is set by the longest stage, B, at 7 hours.
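To put rough numbers on the comparison, here is a minimal Python sketch using the stage durations from the example above. The function names are made up for illustration, and the parallel figure rests on an assumption: the stages are fully independent and there is enough capacity to work on all of them from hour 0. Real projects usually land somewhere between the two numbers.

```python
# Stage durations in hours, taken from the example above.
stage_hours = {"A": 5, "B": 7, "C": 3}


def waterfall_total(hours):
    """Stages run one after another, so elapsed time is the sum."""
    return sum(hours.values())


def parallel_best_case(hours):
    """Assumption: every stage starts at hour 0 and none depends on another,
    so elapsed time is set by the longest stage."""
    return max(hours.values())


waterfall = waterfall_total(stage_hours)
parallel = parallel_best_case(stage_hours)

print(f"Waterfall: {waterfall} hours")
print(f"Fully parallel (best case): {parallel} hours")
```

The total amount of work is still 15 hours in both cases. What changes is the elapsed time, because the work overlaps instead of queuing up.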
Modern Analytics therefore saves a lot of time. Although waterfall development takes longer than agile development, it guarantees that every stage is fully completed before the next one starts.
The stages of modern analytics together make up the Modern Analytics pipeline, which includes:
- Raw Data: the very first stage, where data is collected from multiple sources.
- Ingestion: data is moved to a storage space, which could be in the cloud or on-premises.
- Central Storage: data is stored in a place where people can extract it. Data could be stored in the company database or cloud, AWS, Google Drive, ...
- Prepared Data: at this stage, the data analyst uses a tool to clean, filter, aggregate, transpose, ... Some tools that can be used to prepare data: Python, MySQL, Alteryx, Tableau Prep, ...
- Trusted Data: the data is checked to make sure it is correct, and it can then be stored back in the central storage.
- Visualized Data: data is displayed as maps, charts, line graphs, ... Some tools to visualize data: Tableau Public, Tableau Desktop, Power BI, Excel, the matplotlib library in Python, ...
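The stages above can be sketched as a tiny end-to-end pipeline in plain Python. This is only an illustration under stated assumptions: the CSV text, the function names, and the `central_storage` dictionary are all hypothetical stand-ins for real sources, real storage, and real tools, and the "chart" is printed as text so the example needs nothing beyond the standard library.

```python
import csv
import io
from collections import defaultdict

# Raw Data: hypothetical CSV text standing in for a real source.
RAW_CSV = """region,sales
North,120
South,80
North,
south,40
East,abc
"""


def ingest(raw_text):
    """Ingestion: read the raw rows into memory (a stand-in for moving data to storage)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def prepare(rows):
    """Prepared Data: clean region names, filter bad rows, aggregate sales per region."""
    totals = defaultdict(int)
    for row in rows:
        region = (row["region"] or "").strip().title()
        sales = (row["sales"] or "").strip()
        if not region or not sales.isdigit():
            continue  # drop rows with a missing region or non-numeric sales
        totals[region] += int(sales)
    return dict(totals)


def validate(totals):
    """Trusted Data: a simple sanity check before anyone relies on the numbers."""
    return bool(totals) and all(value >= 0 for value in totals.values())


def visualize(totals, unit=20):
    """Visualized Data: a text bar chart with one '#' per `unit` of sales."""
    return [
        f"{region:<6}{'#' * (value // unit)} {value}"
        for region, value in sorted(totals.items())
    ]


# Central Storage: a dictionary standing in for a database or cloud bucket.
central_storage = {"raw": ingest(RAW_CSV)}

prepared = prepare(central_storage["raw"])
chart = []
if validate(prepared):
    # Trusted data is written back to central storage, as described above.
    central_storage["trusted"] = prepared
    chart = visualize(central_storage["trusted"])

print("\n".join(chart))
```

In a real project each function would be replaced by a proper tool, for example Alteryx or Tableau Prep for preparation and Tableau or matplotlib for the chart, but the order of the steps stays the same.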
After learning about modern analytics architecture, I understand how data moves through each stage of the pipeline, as well as the pros and cons of traditional and modern analytics.