My first day of training at the data school was information overload, with so many technical terms and so much new vocabulary to learn. I thought this would be the perfect opportunity to write my first blog post, about the terminology I will hear often throughout my time here.
The first one I know will come up a lot is data.
Data is facts or numbers collected from observations for the purpose of understanding a particular subject better.
This is what is going to take up most of my time here at the data school.
Another key piece of terminology I will come across is the way this data is stored, for example in a flat file.
A flat file stores data as a single table of rows and columns. Examples include Excel and CSV files.
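To make the rows-and-columns idea concrete, here is a minimal sketch in Python that reads a tiny CSV. The column names and values are made up for illustration, and the text is read from a string rather than a real file so the example runs on its own.

```python
import csv
import io

# Hypothetical sample data; a real flat file would live on disk, e.g. as sales.csv.
csv_text = """order_id,region,amount
1,North,120.50
2,South,75.00
3,North,42.25
"""

# csv.DictReader treats the first line as column names and yields one dict per row.
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

print(reader.fieldnames)   # the columns
print(len(rows))           # the number of data rows
print(rows[0]["region"])   # a single cell: first row, "region" column
```

Note that every value read from a CSV arrives as text, so numbers such as the amount still need converting before you can do maths with them.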
The next most common type seems to be databases and data warehouses. These are places that hold structured data, most commonly in relational tables that are queried with SQL.
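As a rough illustration of structured data being queried with SQL, the sketch below uses Python's built-in sqlite3 module with a throwaway in-memory database. The table name, columns and values are assumptions invented for the example, not anything from my training.

```python
import sqlite3

# An in-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Structured data: every column has a name and a declared type.
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "North", 120.5), (2, "South", 75.0), (3, "North", 42.25)],
)

# SQL lets you ask questions of the data, here the total amount per region.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(totals)
```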
Another way that data can be stored is in a cube. This is where lots of flat files are stacked on top of one another in a cube structure, which allows you to access a specific part of the data.
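One way I find it helpful to picture a cube is as a lookup keyed by several dimensions at once. The sketch below is only a toy model built from a Python dictionary, with made-up dimensions (year, region, product) and a hypothetical helper called slice_total. Real cube technology works very differently under the hood.

```python
# Toy cube: each cell is addressed by (year, region, product) and holds a sales amount.
cube = {
    (2023, "North", "Widget"): 100,
    (2023, "South", "Widget"): 80,
    (2024, "North", "Widget"): 120,
    (2024, "North", "Gadget"): 60,
}


def slice_total(cube, year=None, region=None, product=None):
    """Sum the cells that match every dimension value that was supplied."""
    total = 0
    for (y, r, p), amount in cube.items():
        if year is not None and y != year:
            continue
        if region is not None and r != region:
            continue
        if product is not None and p != product:
            continue
        total += amount
    return total


print(slice_total(cube, year=2024))                  # everything sold in 2024
print(slice_total(cube, region="North"))             # everything sold in the North
print(slice_total(cube, year=2023, region="South"))  # one specific part of the cube
```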
The final way I was taught that data can be stored is in a data lake or a data lakehouse. A data lake is a place where all data is dumped in its raw form with no structure imposed on it, so as a consultant you will have to extract the data that you need from it. A data lakehouse is a data lake attached to a database, which allows for some structure.
What can be stored in these databases?
Data can be stored in many different forms, known as data types. Strings are text values that can contain both letters and numbers. They are generally the slowest type to process, so they should be avoided where another type would do. A Boolean is any value stored as either true or false, and is generally the fastest to process.
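Speed aside, there is a second reason to avoid keeping numbers as strings: they stop behaving like numbers. The short Python sketch below shows this, then converts a made-up row of text values into proper types. The field names are hypothetical.

```python
# Numbers stored as strings are joined and compared as text, not as numbers.
as_text = "2" + "10"          # concatenation, not addition
as_numbers = 2 + 10           # real addition
text_compare = "9" < "10"     # compared character by character, so "9" sorts after "1"
number_compare = 9 < 10

print(as_text, as_numbers, text_compare, number_compare)

# A hypothetical row where every value arrived as a string.
raw_row = {"order_id": "17", "amount": "19.99", "is_returned": "false"}

# Convert each field to the type that matches its meaning.
typed_row = {
    "order_id": int(raw_row["order_id"]),
    "amount": float(raw_row["amount"]),
    "is_returned": raw_row["is_returned"].lower() == "true",  # becomes a Boolean
}

print(typed_row)
```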
Tables can be split into two types, fact and dimension. A fact table holds the records for the data, whereas a dimension table holds the details about the categorical fields. The way these tables are linked together is described by a schema.
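Here is a small sketch of how a fact table and a dimension table might link together, again using Python's built-in sqlite3 module. The table names (fact_sales, dim_product), the columns and the data are all invented for illustration. The shared product_id column is the link between the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table: descriptive details about each product.
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category     TEXT
    );
    -- Fact table: one record per sale, pointing at a product by its id.
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity   INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO fact_sales VALUES (10, 1, 3), (11, 2, 1), (12, 1, 2);
    """
)

# Join the facts to the dimension to report quantity sold per category.
per_category = conn.execute(
    """
    SELECT d.category, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
    """
).fetchall()
conn.close()

print(per_category)
```

Keeping the product details in one place means the fact table only has to store a small id for each sale, rather than repeating the product name and category on every row.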