Connecting to Data: Extract or Live?

by Ellen Blackburn

Extract or Live?

Extracts

Tableau Data Extracts: (tde or hyper) These are snapshots of data optimised for aggregation. These snapshots are loaded into Tableau’s memory to be quickly recalled/queried for visualisation, accordingly, the database isn’t required to build your viz.

What is a TDE? A TDE is a columnar store. A rudimentary definition of a columnar store is a database that stores values together within a column rather than a row. By only reading in the relevant data (columns) necessary to answer the question, the input and output necessary to query and aggregate data is largely decreased. Therefore, extracts are often faster (but not always!).

Extracts often have several benefits:

  • Speeding up the workbook through optimisation – The extract is embedded within the workbook (see above). Thus, extracts are often much faster than live connections, especially when concerning complex visualizations e.g. large data sets or many filters. Therefore, extracts are great for the flow of visual analytics.
  • Offline – As the extract is embedded, your data source can be utilised when you’re offline or your network connection is poor.
  • Privacy – By hiding certain fields within your data source and then creating an extract, you can hide the fields that aren’t used, thereby ensuring data can only be viewed by the people intended.

And several downsides:

  • Snapshot – Given the nature of the extract, the data will remain the same, unless the extract is refreshed. This can be scheduled to take place (either fully or incrementally) via cloud applications such as Salesforce, Google Analytics and Google Sheets.
  • Size/Structure: Extracts can become slow to refresh and query depending on the data structure, e.g. many columns and many rows.

Example of use: If you want your visualizations to show weekly/monthly/annual trends. Extracts are also recommended for when you’re publishing a multi-connection data source.

The Guiding Principle: If workbook performance is more important than data freshness, then use an extract.

The Guiding principle: If your workbook contains sensitive data, an extract may be best.

 

Live

Live connections: Data source that contains direct connection to underlying data. relies on database for all queries and undergoes real-time updates.

  • Real-time updates – As your viz is directly connected to the underlying data, this ensures data freshness.
  • Databases are not always optimized for fast performance (unlike extracts) – As data queries go through the database, they can only as fast as the database itself. Accordingly, working with a live connection may be slow.
  • Other factors can affect speed – e.g. Poor network speed and network traffic can slow down your workbook.
  • Stress – Live connections, especially within complex workbooks, can stress some traditional databases.

Example of use: A business that needs incoming sales data to make real-time decisions would require a live database connection.

The Guiding principle: If business decisions need to be predicated upon real-time data, utilise a live connection.

See these two blogs by Gordon Rose and Diego Medrano for more information.