Importance of Data Structures and Pivoting

Understanding types of data structures, why they are important, and when you may need to transform your data.

Data Structures

There are three main data structures:

Classic Spreadsheet

  • Wide structure
  • Title, totals column, years as headers
  • Measures as rows
  • Easy to read for User, Difficult to read for Computer

Nicely Formatted Spreadsheet

  • Tall structure
  • Measures as headers
  • Each row is a different year
  • Fairly easy to read for User, fairly easy to read for Computer

Database

  • Tall Structure
  • Headers explain values (measures and values)
  • A row per data point
  • Hard to read for User, easy to read for Computer

A taller data structure is better because it allows for aggregations or cross field calculations.

WHY PIVOT?

First off, what is pivoting?

Pivoting is transforming your data structure by reshaping tables.

It can be converted by transposing (transforming columns to rows) or cross tabing (transforming rows to columns).

Blogs about using the Transpose and Cross Tab tools on Alteryx will be posted next and linked accordingly.

Why do we pivot?

After profiling the data, sometimes it is not ready for analysis. There can be duplicated rows, unnecessary columns, or, as just explained, improperly formatted data structures. In these instances, pivoting may be the fix. The main use case is when dates are as headers in different columns.

Another reason is because computers like tall data, especially Tableau. I will give an example next.

💡
Remember: Pivots are not forever, so do not be afraid to use them!

Use-Case on Tableau

The two charts below are trying to show the same data, however, the difference is that in the first chart, the data source has dates in columns and in the second, the date fields are in rows.

The issue with this chart is that since the data has the dates in columns, an overtime analysis is not possible. A user is unable to change the column into a date data type since there are no list of dates in one column.

However, once the data is transposed, the dates that were in columns populate rows in just one date column. Once the data is properly structured, an overtime analysis is now achievable.

Now that we have an understanding of data structures and the basics of pivoting, we are ready to actually transform our data. Stay tuned for my next two blogs: one about the Transpose tool and the second about the Cross Tab tool on Alteryx.

Let me know if you have any questions at sabrina.vazquez@theinformationlab.com.

Author:
Sabrina Vazquez
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2024 The Information Lab