Connecting Data: Joins

by Matthew Armstrong

Joining describes the process of connecting 2 tables together from the same data-source, generating a new table with ALL fields from both tables, and a selection of rows determined by the join condition. In Tableau, it looks like this:

Where we have:

  • Our Left Table (Yellow)
  • Our Right Table (Blue)
  • Our join condition (Black)

We can modify the join by choosing which type of join we want (by clicking on the Venn diagrams), which fields we want to match (by clicking the fields and choosing from the drop-down), and the condition on which we want to match the fields (by clicking on the operator symbol).
So for the example above, we have an inner join (more on what that means later), matching Order ID on Order ID, and we want the Order ID’s in each table to equal each other.

Our join type and conditions specify which of the rows we keep from each table.

To look into the logic behind joins and the different join types a bit more, we’ll look at a simplified example in Excel:

Let us suppose we want the join condition to match records where the Order ID’s are equal to each other. We have 4 types of join we can create in Tableau:

  • Inner- we keep only the rows that satisfy the join condition, and populate the new table with the corresponding fields:

Here, we return 2 rows only (where both Order ID’s equal 1 and 3), and populate the Profit and Sales columns with the values in those rows in the original tables.

Left-inner- we keep the rows that satisfy the join condition AND ALL the rows in the left table:

Here we get the same table as the previous example, with 2 additional rows (from the left table), which are populated with the Profit values from the left table, and null values from the right table (since there were no Sales field (and values) for Order ID’s 5 and 7.

  • Right-inner- we keep the we keep the rows that satisfy the join condition AND ALL the rows in the right table:

Similar to left-inner, we’ve got the first generated table (green), and the rows from the right table populated accordingly.

  • Full-outer – we keep all the rows that satisfy the join condition AND ALL the rows in the right table, AND ALL the rows in the left table:

Here we’ve got all the rows from both tables, populated with the values from the corresponding fields in each table.

***BEWARE DUPLICATION***

Let us suppose that instead of Order ID, we had Customer ID and that one of our customers made more than one order:

The inner join will retrieve every combination of matching Customer ID’s equal to 3, which duplicates the values in our final table. This can lead to errors in calculations if we were to then SUM either Profit or Sales in Tableau afterwards.

Hopefully his gives you a better understanding of what joining means, and how to join data in Tableau!

Avatar

Matthew Armstrong