Database vs Data Warehouse vs Data Lake

In today’s data pipeline, data often travels through different repositories as it is processed for various purposes. This blog explores the primary types of data repositories: databases, data warehouses, and data lakes.

1. What is a Database?

A database is a structured collection of data used for storing, managing, and retrieving information efficiently. Databases are optimized for everyday transactional operations.

Databases are usually defined by two features:

  • Databases are often normalized. Normalized data reduces redundancy by organizing data into multiple tables.
  • Databases typically support Online Transaction Processing (OLTP) systems. OLTP systems prioritize write operations and handling small, real-time transactions.

Customer ID

Name

1

Bob

2

John

Order ID

Sales ($)

Customer ID

1

100

1

2

200

2


In this example, the database has separate tables for customers and orders. With data split into multiple tables this way, it makes it easy to insert, update, and delete information.

We should note that an OLTP system isn’t always normalized, although it often is. OLTP refers to the purpose of the database, while normalization refers to its structure.

Examples:
- MySQL
- PostgreSQL
- Microsoft SQL Server
- Oracle Database

2. What is a Data Warehouse?

A data warehouse is a centralized system designed for reporting and data analysis. It stores historical data from various sources to support business decisions.

Data warehouses are usually defined by two features:

  • Data warehouses are often denormalized. Denormalized data reduces redundancy by organizing data into multiple tables.
  • Data warehouses typically support Online Transaction Analytical Processing (OLAP) systems. OLTP systems prioritize read operations and enabling complex queries.


Examples:
- Amazon Redshift
- Snowflake
- Google BigQuery
- Microsoft Azure Synapse

3. What is a Data Lake?

A data lake is a large repository that stores raw, unprocessed data in various formats including structured, semi-structured, and unstructured.

Because of the flexibility of data lakes, they are often used as a pre-processing staging layer in modern data pipelines.

Examples:
- Amazon S3
- Azure Data Lake Storage
- Google Cloud Storage
- Hadoop Distributed File System (HDFS)

Summary

Feature

Database

Data Warehouse

Data Lake

Purpose

Day-to-day transactions (OLTP)

Business analysis & reporting (OLAP)

Store raw data for analytics & ML

Data Type

Structured

Structured & processed

Structured, semi-structured, unstructured

Data State

Current, real-time

Historical, cleaned

Raw and unprocessed

Users

Developers, operations teams

Analysts, BI teams

Data scientists, engineers

Examples

MySQL, PostgreSQL

Snowflake, Redshift

S3, Azure Data Lake

Author:
Charles Yi
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2025 The Information Lab