Admittedly, I hadn't given much thought to whether data and information are the same before I started my training to become a data consultant. But during my first week I saw the following definition of data visualization:
A data visualization is a display of data designed to enable analysis, exploration, and discovery.
(Alberto Cairo)
And suddenly I asked myself: Would this definition still convey the same meaning if I switched out data for information? No, it would change its whole meaning! Why? Because data and information are not the same!
So what is data?
Data is anything that we can collect, e.g. through observation, tests, or logic. We can count the number of sales of a certain product. Or observe how many cars run a red light. We can calculate the percentage of a population who contracted COVID-19. Or we test a new medication and record the reduction in self-reported symptoms in patients.
These are facts, numbers, or observations. By themselves, they are not very useful. The context is missing. How do we know that these numbers are good or bad? How does this number relate to other numbers? How does the data fit into the larger context?
And what, then, is information?
Information takes data and processes it, puts it together, organizes it, and puts it into context. For example, does the new medication reduce more symptoms than an already existing drug? Do more people run a red light in New York than in London? Are COVID-19 rates different across countries? This would be data put into context, which creates understanding and meaning: information!
So data is the foundation for information. Without data, we cannot have information. But data can exist by itself in a more unstructured way without context. When we put it into context we create information.
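To make the red-light example a bit more concrete, here is a minimal Python sketch. The city names and counts are made up purely for illustration (they are not real statistics), and the helper name `violation_rate` is my own. The raw counts are the data; relating them to the number of cars observed and comparing the two cities is what turns them into information.

```python
# Made-up counts, purely for illustration (not real statistics).
observations = {
    "City A": {"red_light_runs": 120, "cars_observed": 4000},
    "City B": {"red_light_runs": 90, "cars_observed": 1500},
}


def violation_rate(record):
    """Return the share of observed cars that ran a red light, in percent."""
    return 100 * record["red_light_runs"] / record["cars_observed"]


# Put each raw count into context by relating it to the cars observed.
rates = {city: violation_rate(record) for city, record in observations.items()}

# Compare across cities: which one has the highest rate?
highest = max(rates, key=rates.get)

for city, rate in rates.items():
    print(f"{city}: {rate:.1f}% of observed cars ran a red light")
print(f"Highest rate: {highest}")
```

In this invented example, City A has the larger raw count, but City B has the higher rate. The count alone would have pointed us in the wrong direction; the context changes what the number means.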
Here is another example:
Assume that you have a box of legos. The individual lego pieces are all mixed up. You have different pieces of data. Just like you can have a dataset with many different numbers.
Now let's assume that you were given instructions for your lego set. You start to sort the different lego pieces that you need: the red ones in one pile, the green ones in a different pile, etc. In your dataset, you would also have a separate date column, a price column, and a quantity column. Maybe you can spot some kind of vague pattern in both your lego progression and in your dataset. But we still don't see how it all fits together or what it all means.
Now we put the legos together according to the instructions. And now we can see that all the individual lego pieces actually form a lego car. Our lego car is our information. When thinking about datasets, we may create a data visualization where we plot the profit for each date to see whether our profit increased or decreased over time. We added meaning and context to our data and received information!
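The dataset side of the lego analogy could be sketched like this in Python. The rows are invented, the function names `revenue_by_date` and `trend` are hypothetical, and because the example dataset only has date, price, and quantity columns (no costs), revenue (price times quantity) stands in for profit here. Treat it as a rough sketch, not a recipe.

```python
from collections import defaultdict

# Hypothetical rows: (date, price, quantity). All values are invented.
rows = [
    ("2023-01-01", 10.0, 3),
    ("2023-01-01", 4.0, 5),
    ("2023-01-02", 10.0, 6),
    ("2023-01-03", 4.0, 5),
    ("2023-01-03", 10.0, 7),
]


def revenue_by_date(rows):
    """Sum price * quantity per date and return the totals sorted by date."""
    totals = defaultdict(float)
    for date, price, quantity in rows:
        totals[date] += price * quantity
    return dict(sorted(totals.items()))


def trend(totals):
    """Compare the last date with the first one."""
    values = list(totals.values())
    if values[-1] > values[0]:
        return "increased"
    if values[-1] < values[0]:
        return "decreased"
    return "stayed the same"


daily = revenue_by_date(rows)
for date, total in daily.items():
    print(date, total)
print("Over the period, revenue", trend(daily))
```

The individual rows are the loose lego pieces. The totals per date, and the statement about whether they went up or down, are the lego car. These per-date totals are also what a line chart over time would plot.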
So back to the definition from the beginning.
A data visualization is a display of data designed to enable analysis, exploration, and discovery.
(Alberto Cairo)
We want to display the data and design it in a way that lets the user analyze and explore it, retrieve meaning and understanding, and discover insights. If we had the information to begin with, we wouldn't need to analyze or explore anymore. So swapping in information for data in the above definition does not really make any sense! We want to explore the data to retrieve the information!
So, the next time you find yourself talking about data or information, think about whether you are talking about data without any context or about information that attaches meaning and context to data.