Understanding the Tools: Blend Your Data Using the Union Tool in Alteryx

The Union tool takes two or more datasets with similar structures and stacks them one on top of the other. Notice that the Union tool has a grey input anchor on the left marked with two angle brackets (>>). This means that multiple datasets, not just two, can connect to the same input anchor.

Auto configure by name:

By default, Alteryx Designer will Auto Config by Name. This means that Alteryx stacks the data based on column name, regardless of the order in which the columns appear in each dataset.

However, this also means that columns with different names will not line up, even if they contain the same kind of data. For example, if I label the column in the bottom dataset as Location instead of City, let’s see how that changes:

Notice how instead of having one column for the cities, there is now one column for City and another for Location, and wherever a dataset lacks one of those columns, its rows show NULL values.
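If it helps to see the same idea outside of Designer, here is a rough Python sketch of stacking by name. It is not how Alteryx implements the tool; the function name `union_by_name` and the sample values are made up for illustration, and `None` stands in for NULL.

```python
def union_by_name(*datasets):
    """Stack rows from several datasets, matching columns by name."""
    # Collect every column name in first-seen order across all inputs.
    columns = []
    for rows in datasets:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # A row that lacks a column gets None there (Alteryx would show NULL).
    return [{name: row.get(name) for name in columns}
            for rows in datasets
            for row in rows]


# Hypothetical sample data: the bottom dataset uses Location instead of City.
top = [{"Year": 2019, "City": "London"}, {"Year": 2020, "City": "Paris"}]
bottom = [{"Location": "Tokyo", "Year": 2021}]

stacked_by_name = union_by_name(top, bottom)
for row in stacked_by_name:
    print(row)
```

Year lines up even though it sits in a different position in the bottom dataset, while City and Location end up as separate columns with gaps.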

Auto configure by position:

Now, let’s look at Auto Config by Position instead. This means that the columns stack on top of each other in column order, regardless of what they are named. For this example, I will keep the second column as Location.

Now click the drop-down in the Configuration window, change it to Auto Config by Position, and hit Run:

Notice how Location doesn’t get its own column anymore. Instead, the second column takes its name from the second column in the first dataset. Since the first dataset has the two columns Year and City, those are the column names in the output.
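The same behaviour can be sketched in Python. Again, this is only an illustration under my own assumptions: `union_by_position` is a made-up name, each dataset is a (header, rows) pair with hypothetical values, and every dataset is assumed to have the same number of columns.

```python
def union_by_position(first, *others):
    """Stack rows by column position, keeping the first dataset's names."""
    header, rows = first
    stacked = [dict(zip(header, row)) for row in rows]
    for _other_header, other_rows in others:
        # The other headers are ignored: only the position of a value matters.
        stacked.extend(dict(zip(header, row)) for row in other_rows)
    return stacked


# Hypothetical sample data: the second dataset calls its second column Location.
first_input = (["Year", "City"], [(2019, "London"), (2020, "Paris")])
second_input = (["Year", "Location"], [(2021, "Tokyo")])

stacked_by_position = union_by_position(first_input, second_input)
for row in stacked_by_position:
    print(row)
```

Tokyo lands under City because it sits in the second position, and the name Location never appears in the output.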

Manually configure columns:

Now let’s say we want to arrange the columns our own way, or the data can’t be stacked based on name or position. There’s a third option to manually configure how the data stacks:

By default, Alteryx Designer will have the data stacked based on Name. Clicking on the Reset button in the Configuration window will allow you to change the way it’s stacked:

To move a field left or right, select its cell, then click the arrows in the Configuration window next to the Reset drop-down. Just for funsies, I decided to stack City on top of Year and Year on top of Location, then hit Run.
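To picture what that manual layout does, here is a small Python sketch. The `layout` structure and the `union_manual` function are hypothetical, the values are made up, and naming the output columns after the first input's row is an assumption for the sketch rather than something taken from Designer.

```python
def union_manual(datasets, layout):
    """Stack rows using an explicit column layout.

    layout[i] lists, for dataset i, which source column feeds each output slot.
    """
    # Assumption for this sketch: output names follow the first input's layout.
    output_names = layout[0]
    stacked = []
    for rows, source_columns in zip(datasets, layout):
        for row in rows:
            stacked.append({out: row[src]
                            for out, src in zip(output_names, source_columns)})
    return stacked


# Hypothetical sample data.
first_rows = [{"Year": 2019, "City": "London"}]
second_rows = [{"Year": 2021, "Location": "Tokyo"}]

# City sits on top of Year, and Year sits on top of Location.
layout = [["City", "Year"], ["Year", "Location"]]

stacked_manually = union_manual([first_rows, second_rows], layout)
for row in stacked_manually:
    print(row)
```

The second row ends up with a year under City and a city under Year, which is exactly the kind of mix-up manual configuration lets you create (or fix).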

Output Order:

At the bottom of the Configuration window, there’s a section labeled “Output Order”. Checking the box lets you set a specific output order, that is, which input dataset’s rows appear first in the output.

So let’s say I want dataset 2 to be shown first. I’ll check the box, select either #1 or #2, and use the arrows to change the order of the datasets (don’t forget to hit Run!).
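In code terms, the output order only changes which dataset's rows come first. A minimal Python sketch, with a made-up function name `union_in_order` and hypothetical values:

```python
def union_in_order(datasets, order):
    """Stack datasets in the given order (1-based, like #1 and #2 in the tool)."""
    return [row for number in order for row in datasets[number - 1]]


# Hypothetical sample data with matching column names.
dataset_1 = [{"Year": 2019, "City": "London"}]
dataset_2 = [{"Year": 2021, "City": "Tokyo"}]

# Show dataset 2 first, then dataset 1.
reordered = union_in_order([dataset_1, dataset_2], order=[2, 1])
for row in reordered:
    print(row)
```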

And that’s it for today’s quick lesson on the Union Tool! Thanks for reading and until next time, let’s analyze away!

Author:
Jessica Kwan