How Do Relationships Differ From Joins?

Tableau has taken over week 4 of training and one of the things that we have covered was Relationships and Joins (and Blends) in Tableau. At the beginning, it was quite difficult to figure out what are the differences between them, but as the day went along, we were able to consolidate the knowledge of the differences between the two. Let's get into it!

To begin, let's introduce the Tableau Data Model. The Data Model has two different layers: the logical layer and the physical layer.

The logical layer of the data source is the first view that you see when you open a Data Source page canvas. This is where the data can be combined using Relationships.

The physical layer is the next layer after the logical one. This is where the data can be combined using Joins and Unions and each logical table would have at least one physical table in this particular layer.

Essentially, the easiest way to remember this is that the logical layer is the Relationship canvas in the Data Source pane and the physical layer is the Join or Union canvas in the Data Source pane.

In the exercise, we have used a CSV file with sales data and target data. This is how the logical and physical layers look in Tableau:

Logical Layer
Physical Layer (Inside the Logical Layer)

In a way, the physical layer resides within the logical layer. This is because when you drag a data source onto the canvas (the logical layer) and double click on it, it opens up the canvas where the data sets can be joined with each other (the physical layer).

Now, let's look at the Relationships and Joins and the differences between them.

For Relationships, the data tables that get dragged onto the logical layer remain distinct and not merged in the data, connected by flexible noodles. On the other hand, as Joins operate in the physical layer, the data tables in that layer get merged into one table that then is displayed in the logical layer and the connection between two data tables in the physical layer are shown with Venn diagrams.

Another difference between the two is that for Relationships, you are required to choose the matching fields between the tables but you do not have to select the join types. Whereas for Joins, you are required to choose the join type and also their clauses.

Also, for Relationships, the unmatched measure values get kept (when in the performance options, we select that Some Records Match) and they do not duplicate values (when in the performance options, we select many - to - many option). However, for Joins, they may drop the unmatched values unless you're creating a full outer join which includes the unmatched values. Additionally, they might duplicate values that are aggregated when different fields are at different levels of detail.

At the end of the day, the advantages of using Relationships is that they make your data source more easier to define, to change and then to reuse the way you want to. They make the analysis more easier and more intuitive. They do not require level of detail calculations for the analysis that you may want to do at different levels of data whereas Joins require that.

While the limitations of Relationships may be that if the data tables would have a lot of unmatched data, it would make it more difficult to do some analysis of the data.

Hope it makes more sense now!

Author:
Gerda Staurylaite
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2024 The Information Lab