Dividing the Data

Whilst it might appear initially convenient to store as much information as possible for each record within a single table, there are several compelling reasons to NOT use this approach. Some of these reasons include:

1) Storage - When dealing with small datasets, storage concerns are typically negligible. However, as a dataset grows in size, so do the storage and resource demands. Storing all the information within a single file can lead to inefficiencies and increased storage requirements.

2) Performance - Executing processes, such as aggregation or querying, can become significantly slower and more resource-intensive as tables contain an increasing number of columns and rows.

So... How can we solve this problem?

Splitting flat datasets into smaller, more reusable datasets, called fact and dimension tables, can help to address these problems.

  • Fact tables - Fact tables hold the main records of the dataset.
Example of fact table


  • Dimension tables - Dimension tables hold details regarding the categorical fields in the dataset.
Example of dimension table

Whilst a Star Schema for a database incorporates these ideas by featuring one fact table and multiple dimension tables, a Snowflake Schema takes this concept one step further by introducing dimension tables that reference other dimension tables.

These dimension and fact tables are connected through the use of foreign keys. Whilst a primary key in a table serves as a unique identifier for that table, a foreign key references the unique identifier (primary key) of another table. This facilitates the linkage of fact tables to multiple-dimension tables, effectively dividing and organising the dataset.

Ultimately, there are numerous challenges associated with using a single large table, many of which can be mitigated by splitting up the data into fact and dimension tables - a concept I found particularly intriguing when introduced to it during one of my first lessons at the Data School!

Author:
Thomas Smith
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2025 The Information Lab