Alteryx: Join Tool

Before breaking down the Join Tool within the context of Alteryx, it is important to recognize that the joins do not work in the same way as joins in SQL.

What is a Join?

The Join Tool in Alteryx. 
  • A tool that can be used to bring together two datasets to create one cohesive dataset.
  • To do a join there must be at least one common field.
  • The join tool is normally used to add information/ fill in information to existing records.
  • Unlike unions,  joins make your dataset wider.
  • It is important to note that unlike the union tool you cannot add as many datasets into the input anchor. You MUST ALWAYS have one in the left anchor and one in the right anchor. (See image above.)          

The two join types in Alteryx          

As you can see in the image above there are two ways to configure a join.

Join by Record Position

  • The output contains the rows that matched based on the position of the data within the 2 tables.
  • Use caution when using this method as any changes will affect the results of your joins.

Join by Specific Fields

  • This will produce results based on the specific fields you decide to join on.

Understanding the Outputs

There are 3 different outputs that come out of the Join Tool. For the examples below I used two tables one consisting of 500 records and the other consisting of 1,000 records. I will be joining them by specific fields and matching them on customer_id.

Left Output

  • Returns everything that doesn't match on the right will appear.
Here we can see that 85 records from the left table did not match with the right table and only 6 fields are being displayed.

Join Output

  • Returns everything that matches in both the left and right table will appear.
Here we can see that 966 records of matches from the left and right table and now we have 10 fields of data. Note: We have 10 because I unselected a repeated column, more on that below. 

Right Output

  • Everything that doesn't match on the left will appear.
Here we can see that there are 34 records from the right table that didn't match with the left table and only 5 fields are displayed. 

When you add together all the output results you will see that all 1,500 records have passed through in some capacity.

Additional Elements

Options in the Configuration pane.

Once you begin a join this will be the configuration pane you will see. The lower half is what I use the most. Here's why:

  • You can move fields up or down using the arrows to the left of  'Options'. In order to make that happen, you must have the field highlighted.
  • There is the option to uncheck field you do not want to have transfered over in your results. In this example I joined on customer_id which appears in both the left and right table. To unclutter my results I unchecked the duplicate input, since the information would still exist.
  • There is also the option to change the data type, but I highly suggest using other tools to configure this portion for easier handoff on projects. Also, by changing the Type you will also change the size allowed in that field.
  • You can also rename your columns should you want by typing in the new name next to the field under 'Rename'.
  • Lastly, you can add descriptions to each field here under 'Description'.
Author:
Tabitha Diaz
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2024 The Information Lab