A Beginner's Guide to Unions and Joins in Tableau Prep

Having used Tableau Prep for the past couple of weeks on client projects, I wanted to share how easy it is to combine multiple data sources using the Join and Union functions - particularly the ease of configuring these tools before actually applying these or outputting new files.

For this demonstration, I'm going to be using two data sources which are both Excel Spreadsheets (many other file types can be used these were just how I downloaded these). One shows the GDP Growth of various countries whilst the other shows CO2 Emissions of the same countries. See below for how this looks in Tableau Prep before using any tools.

Union

When performing a union, the first thing I would do is to apply a clean step to each sheet so that I am able to see how many fields I should be expecting from the union. From the below screenshot, we can see that one of the sheets has around 3,000 rows. The other sheet also has around 3,000 rows so if the union is working as planned this should leave us with around 6,000. This seems simple however this kind of QA can be very useful when dealing with larger datasets.

After this, there are a couple of ways that a Union can be performed. My favourite is shown in the screenshot below when you can drag one Clean Step towards another and drop it onto the 'Union' section that pops up. The alternative way is to press the + symbol in front of the Clean Step, select Union and then drag the other data up to the 'Add' option which will appear to the left of the Union and then connect these.

After this, you can see how the Union tool is working with the data it has been given. In this instance, the datasets are identical in shape so the Union is easy enough. If there were any columns with different headers you can manually assign these to union despite the different header names. Another handy function of the Union tool in Tableau Prep is that it automatically adds a 'Table Names' field to the new data so there's no confusion over the origin of the output data. The Clean Step after this once more allows us to have an overview of the output which looks good - success!

Joins

The Join tool in this scenario has a few more functions to review. Firstly to get the Join started is essentially the same method as shown in my example for the Union, however just drag and drop your new data onto the 'Join' area instead.

In this example, I'd like to join this data on the 'Country Code' and 'Year' fields to avoid duplicating my data too much. This second join clause can be added by pressing the + symbol next to the 'Applied Join Clauses' section.

One area of the Join tool to draw your attention to is the 'Join Type' diagram - a really useful tool for those trying to get an understanding of the various different types of join. As seen below, this is set to an inner join so that all of the values in common will be joined. However, this can easily be changed to a right inner, right outer, left inner, left outer or full join by simply clicking on different areas of the diagram. Again, it's worth reiterating that no changes will actually be made to the output at this stage, so it's a really useful way of learning how different joins affect the output.

After applying a final Clean Step we can see how the data has joined and I could then make any final changes to this data (changing column headers, data types etc) before outputting my file to visualise.

Hopefully this has helped for anyone starting out on Tableau Prep and many different data sources can be combined for analysis moving forward!

Author:
Sam Search
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2024 The Information Lab