I’ve had two main questions on this topic throughout training:
- What’s the difference between a LIVE data source and an EXTRACT in Tableau?
- What does the refresh option for each of these actually do?
My understanding of the first question came from understanding the answer to the second, so this blog focuses solely on the second question. For a different perspective on the first question, I recommend this blog by Ellen Blackburn.
LIVE DATA SOURCE
By default, when you connect to data in Tableau, whether it’s a file stored locally on your computer or a cloud database, a live data connection is created. This means that every change you make in Tableau Desktop will cause a new query to be sent to the data source.
Refreshing a Live Data Source
With a live data source, your visualizations will be updated every time you open your workbook or manually refresh the data source.
When you swap your Data Connection from ‘Live’ to ‘Extract’, a static snapshot of the data is taken. The extract is embedded in the workbook and becomes available offline. Queries now run against this snapshot rather than the original data, so they can be much faster.
Refreshing an Extract
Once you have created an extract there are now two places where you can Refresh:
1. The first option is the same as for the Live Data Connection. This will not update the data in your workbook if any changes are made to the original data.
2. The second option refreshes the extract, specifically. This will create a new extract (aka: a new snapshot of the original data) and therefore it will update the data in your workbook.
Why does this happen?
Option 1 only refreshes the connection between the extract and the workbook. Since the extract is static, it won’t have changed and so the refresh does nothing.
Option 2 will go back to the original data source and create a new snapshot of the data. If the original data was changed since the last time you created an extract then the new extract will contain these changes.
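To make the difference between the two refresh options concrete, here is a minimal Python sketch. It is a toy model, not Tableau's API: the class names (`SourceData`, `LiveConnection`, `Extract`) and method names are hypothetical and exist only to illustrate the behaviour described above.

```python
class SourceData:
    """Hypothetical stand-in for the original data (a file or database table)."""

    def __init__(self, rows):
        self.rows = list(rows)


class LiveConnection:
    """Toy live connection: every query reads the source directly."""

    def __init__(self, source):
        self.source = source

    def query(self):
        return list(self.source.rows)


class Extract:
    """Toy extract: a static snapshot copied from the source at creation time."""

    def __init__(self, source):
        self.source = source
        self.snapshot = list(source.rows)

    def query(self):
        return list(self.snapshot)

    def refresh_connection(self):
        # Option 1: re-read the existing snapshot. The source is never touched,
        # so changes made to the original data do not appear.
        return self.query()

    def refresh_extract(self):
        # Option 2: go back to the source and take a brand new snapshot.
        self.snapshot = list(self.source.rows)
        return self.query()


source = SourceData([10, 20])
live = LiveConnection(source)
extract = Extract(source)

# The original data changes after the extract was created.
source.rows.append(30)

live_view = live.query()                      # sees the new row
option_1_view = extract.refresh_connection()  # still the old snapshot
option_2_view = extract.refresh_extract()     # new snapshot with the new row
```

In this sketch, only `refresh_extract` reaches back to the source, which is why Option 1 appears to do nothing when the original data has changed.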
Everything up until this point covers my understanding throughout most of training at the Data School. Everything that follows is what I learned today in a Tableau Server refresher session.
You can automate the process of refreshing your data extract by publishing your data source to Tableau Server and scheduling when and how often you want your extract to refresh.
When you publish the extract to the server, it will no longer be embedded in your workbook, but there are many potential benefits to doing this, some of which are described here.
The interaction between your workbook and the data source now looks something like this:
Your workbook will only query and interact with the extract (which is now on the server), but an updated extract can be regularly created from the original data source. This allows you to streamline the performance of your dashboards and worksheets without losing the ability to display the most up-to-date data.
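The scheduled refresh can be pictured with another small Python sketch. Again, this is a toy simulation rather than anything Tableau Server actually runs: the `PublishedExtract` class, the `tick` method, and the 6-hour interval are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PublishedExtract:
    """Hypothetical published extract that re-snapshots its source on a schedule."""

    source: list
    interval_hours: int
    snapshot: list = field(default_factory=list)
    refresh_count: int = 0

    def __post_init__(self):
        # The first snapshot is taken when the extract is published.
        self.snapshot = list(self.source)

    def tick(self, hour):
        # Refresh only when the schedule says so (every `interval_hours`).
        if hour > 0 and hour % self.interval_hours == 0:
            self.snapshot = list(self.source)
            self.refresh_count += 1


source_rows = ["row-0"]
published = PublishedExtract(source=source_rows, interval_hours=6)

rows_seen_by_workbook = {}
for hour in range(1, 13):
    source_rows.append(f"row-{hour}")  # the original data gains a row every hour
    published.tick(hour)
    # The workbook only ever reads the snapshot, never the source itself.
    rows_seen_by_workbook[hour] = len(published.snapshot)
```

Between scheduled refreshes the workbook keeps seeing the same snapshot, and it catches up with the original data each time the schedule fires. That is the trade-off in miniature: fast queries against a static extract, with freshness limited by how often you schedule the refresh.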
There are a lot of helpful blogs and resources out there to help you dive deeper into this topic but I hope this blog has provided you with a good visual representation of what’s going on behind the scenes.
If you have any questions, feel free to reach out to me on Twitter @_hughej