In the data world, a join is when two or more separate tables are combined based on matching identifiers. This can be useful when you have data across multiple tables; that you want to visualize together. Today, we are going to use Tableau Prep Builder to combine 2 tables and explore the different ways they can be joined. Tableau Prep is a great way to clean and join data, but this can be done in other tools like Tableau Desktop and Excel. Although, Prep is my preferred way to demonstrate different joins and their outcomes.
Let’s take a look at the sample data we are going to use.
Table 1 is a simples 2 attribute set of data. The fields of this table are: “Name” and “Age.” There are 7 rows and 2 columns.
Table 2 has attributes: “Name,” “Gender,” “Location,” and “Occupation.” There are 5 rows and 4 columns.
The goal is to take both tables and combine them into one complete dataset for analysis.
Now we can take a closer look at the data. Both tables have one common attribute, “Name.” This is going to be the identifier to match the rows of both tables. When we load the data into Tableau Prep, the tables are assigned a color as a visual indicator to represent the individual tables (blue and yellow) and the join (green).
The way I like to visualize the tables is that the first table is the primary data also known as the left table. Therefore, table 2 is the secondary dataset. When you join two tables together you attach the secondary table to the primary on the right, expanding the set horizontally. Hopefully this sketch enforces the mental picture of what is going on.
In Prep, we are given options on how we want to handle the join. First off, we will tell Prep the attributes that connect. In this instance, both tables are labeled, “Name.” However, the names of the columns do not need to be the same, just the same data.
The Venn diagram visual indicator for the join process will look different depending on what type of join we are performing.
Types of joining
Inner Join
An inner join is particularly useful when you want to see the values that were matched in both tables. Beware that some values from your primary dataset (table 1) can be lost in this type of join. When the “Name” value is not found in BOTH tables, the value is dropped.
In the left-hand window, “Join Clauses,” the 2 tables are listed with their “Name” values printed out. The names that are highlighted in red are values that cannot be matched. In the instance of an inner join, only the black colored values will be included. The “Join Results” pane shows visually what data came from which table (yellow or blue). Below this window, the inner join table results are displayed, leaving 3 rows and 6 columns.
Left Join
A left join is a great way to preserve your primary dataset (Table 1). Left join returns all the values from table 1 with corresponding values from table 2. However, when the value in table 1 does not have a matching value in table 2, null values are filled in to complete the joined table.
Here we have retained the people from Table 1 and where we see null values, there was no match to fill in the data. There are 7 rows and 6 columns.
Right Join
A right join is very similar to a left join, but flipped.
Here we retain the values from the secondary data source (Table 2) and are left with 5 rows and 6 columns.
Full Outer
A full outer join includes ALL values from BOTH tables. Even when the value does not have a match from the other table, it will appear in the join with null values.
Though there are many null values, we did not drop any information while creating the join. We are left with a dataset with 9 rows and 6 columns.
More than 1 join clause
What happens when we have a directory of names and need to look up their ages? What happens when 2 people have the same first name. These values are not unique so we will have to have more than 1 clause.
Table A is just a directory of first and last names.
Table B is a dataset with first names, last names, and ages.
Table A has two different people with the same last name and Table B has people with the same first name. When we join Table B to the directory, we will need to connect both first and last names together. In order to do this, we will have to have 2 clauses. I clicked the plus button at the top right corner of the "Join Clause" section.
We will create a unique identifier out of BOTH first and last names. That way we will not get Sam Waterson’s age attributed to Sam Thompson. We will only receive Same Thompson’s age for Sam Thompson specifically.
After a left join the directory is completed. All the names provided in Table A that matched a value in Table B returned the age that matched.
This should get you started with joining datasets in Tableau Prep and I hope you feel free to explore the different types of joining and figuring out when it is best to use them.