Understanding types of data structures, why they are important, and when you may need to transform your data.
Data Structures
There are three main data structures:
Classic Spreadsheet
- Wide structure
- Title, totals column, years as headers
- Measures as rows
- Easy to read for User, Difficult to read for Computer
Nicely Formatted Spreadsheet
- Tall structure
- Measures as headers
- Each row is a different year
- Fairly easy to read for User, fairly easy to read for Computer
Database
- Tall Structure
- Headers explain values (measures and values)
- A row per data point
- Hard to read for User, easy to read for Computer
A taller data structure is better because it allows for aggregations or cross field calculations.
WHY PIVOT?
First off, what is pivoting?
Pivoting is transforming your data structure by reshaping tables.
It can be converted by transposing (transforming columns to rows) or cross tabing (transforming rows to columns).
Blogs about using the Transpose and Cross Tab tools on Alteryx will be posted next and linked accordingly.
Why do we pivot?
After profiling the data, sometimes it is not ready for analysis. There can be duplicated rows, unnecessary columns, or, as just explained, improperly formatted data structures. In these instances, pivoting may be the fix. The main use case is when dates are as headers in different columns.
Another reason is because computers like tall data, especially Tableau. I will give an example next.
Use-Case on Tableau
The two charts below are trying to show the same data, however, the difference is that in the first chart, the data source has dates in columns and in the second, the date fields are in rows.
The issue with this chart is that since the data has the dates in columns, an overtime analysis is not possible. A user is unable to change the column into a date data type since there are no list of dates in one column.
However, once the data is transposed, the dates that were in columns populate rows in just one date column. Once the data is properly structured, an overtime analysis is now achievable.
Now that we have an understanding of data structures and the basics of pivoting, we are ready to actually transform our data. Stay tuned for my next two blogs: one about the Transpose tool and the second about the Cross Tab tool on Alteryx.
Let me know if you have any questions at sabrina.vazquez@theinformationlab.com.