“This won’t come up on the exam much, right?” Famous last words.
Although I did manage to pass my Alteryx certification exam, I had some trouble with the questions involving the Unique tool. Admittedly, I overlooked it a little during my preparation, which led to a misunderstanding of how it worked when I was under time pressure in the exam. Naturally, now is a good time to revisit the tool and understand it properly.
So, what does the Unique tool do? My misunderstanding during the exam (and to be fair, it was early in the morning!) was that it removes every row that is not unique, sending all copies of a duplicated record to the “D” output for the user to review. This is not the case. Instead, when the tool finds a record that has duplicates, it keeps one of those rows in the “U” output with the rest of the data and sends only the remaining copies to the “D” output. It is more accurate to see it as a tool that makes the data unique (one row per distinct combination of the selected fields) rather than one that filters out everything that repeats.
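To make that behaviour concrete, here is a minimal Python sketch of the idea, assuming records arrive as a list of dictionaries in data order. The function name `unique_split` and the sample rows are hypothetical; the function only mimics the tool's two outputs, with the first record for each key going to U and every later copy going to D.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def unique_split(
    records: Iterable[Record], key_fields: List[str]
) -> Tuple[List[Record], List[Record]]:
    """Hypothetical emulation of the Unique tool's U and D outputs."""
    seen = set()
    unique: List[Record] = []
    duplicates: List[Record] = []
    for record in records:
        # Compare records using only the selected fields
        key = tuple(record[field] for field in key_fields)
        if key in seen:
            duplicates.append(record)  # later copies go to the D output
        else:
            seen.add(key)
            unique.append(record)  # first occurrence goes to the U output
    return unique, duplicates


# Made-up sample: "Ann" appears three times
rows = [{"Name": "Ann"}, {"Name": "Bob"}, {"Name": "Ann"}, {"Name": "Ann"}]
u, d = unique_split(rows, ["Name"])
print([r["Name"] for r in u])  # ['Ann', 'Bob']
print([r["Name"] for r in d])  # ['Ann', 'Ann']
```

Note that one "Ann" survives in U: the duplicates are thinned out to a single copy, not removed entirely.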
However, the Unique tool only compares the fields you select in the configuration window, so records it treats as duplicates may still differ in the fields you didn't select.
For example, the above workflow deals with data on retail customers and their reward cards. To ensure that no customer has multiple records, the Unique tool is configured so that records with the same combination of First Name, Last Name and State are considered duplicates. Including State means that different customers who happen to share a name, but live in different states, are not flagged as duplicates.
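The effect of that three-field key can be sketched in Python. The records below are invented for illustration, and `first_per_key` and the "Card Number" field are hypothetical stand-ins, assuming the tool keeps the first record per key in data order. Two customers named Jane Smith in different states stay separate, while a second Jane Smith in the same state is treated as a duplicate even though her card number differs.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# The fields selected in the (hypothetical) Unique tool configuration
KEY_FIELDS = ["First Name", "Last Name", "State"]


def first_per_key(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Keep the first record per key; return (kept, dropped)."""
    seen = set()
    kept: List[Record] = []
    dropped: List[Record] = []
    for rec in records:
        key = tuple(rec[f] for f in KEY_FIELDS)
        if key in seen:
            dropped.append(rec)
        else:
            seen.add(key)
            kept.append(rec)
    return kept, dropped


customers = [
    {"First Name": "Jane", "Last Name": "Smith", "State": "CA", "Card Number": "A-100"},
    {"First Name": "Jane", "Last Name": "Smith", "State": "NY", "Card Number": "B-200"},
    {"First Name": "Jane", "Last Name": "Smith", "State": "CA", "Card Number": "C-300"},
]

kept, dropped = first_per_key(customers)
print([c["Card Number"] for c in kept])     # ['A-100', 'B-200']
print([c["Card Number"] for c in dropped])  # ['C-300']
```

Because "Card Number" is not part of the key, the dropped row differs from the kept one in that field, which is exactly the situation described next.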
These are some of the duplicate records found within the data. If we filter the uncleaned data to take a closer look at a couple of the customers, we see that the records are not identical in every field. It seems these customers were entered twice because they switched to a different reward card.
The Unique tool decided which record to keep simply by choosing the first one in the data order, which is worth keeping in mind if you want to include the type of reward card in your analysis. If you only want to remove rows that are exact duplicates, you can additionally select “Previous Year Tier” in the tool configuration, or simply select by the reward card number. However, if your analysis focuses on the customers themselves (for example, counting customers per state), you will want one record per customer, and the tier will not matter.
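Both options can be sketched in Python, assuming records are dictionaries in data order and the first record per key is kept. The helper `dedupe`, the "Card Number" field, and the tier values "Silver" and "Gold" are hypothetical. Adding "Previous Year Tier" to the key keeps both rows for a customer who changed tier; keying on the customer alone keeps whichever row comes first, so sorting beforehand (in Alteryx, typically a Sort tool placed before the Unique tool) decides which record survives.

```python
from typing import Dict, List

Record = Dict[str, str]


def dedupe(records: List[Record], key_fields: List[str]) -> List[Record]:
    """Keep the first record for each key, in the order given (hypothetical helper)."""
    seen = set()
    kept: List[Record] = []
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


# Invented example: one customer entered twice after switching cards
records = [
    {"First Name": "Sam", "Last Name": "Lee", "State": "TX",
     "Previous Year Tier": "Silver", "Card Number": "S-1"},
    {"First Name": "Sam", "Last Name": "Lee", "State": "TX",
     "Previous Year Tier": "Gold", "Card Number": "G-2"},
]
customer_key = ["First Name", "Last Name", "State"]

# One record per customer: the first in data order (Silver) is kept
per_customer = dedupe(records, customer_key)

# Adding the tier to the key keeps both rows, since their tiers differ
per_tier = dedupe(records, customer_key + ["Previous Year Tier"])

# Sorting first controls which row survives; preferring Gold is an
# arbitrary rule chosen for illustration (Python's sort is stable)
TIER_RANK = {"Gold": 0, "Silver": 1}
ordered = sorted(records, key=lambda r: TIER_RANK[r["Previous Year Tier"]])
preferred = dedupe(ordered, customer_key)

print([r["Card Number"] for r in per_customer])  # ['S-1']
print([r["Card Number"] for r in per_tier])      # ['S-1', 'G-2']
print([r["Card Number"] for r in preferred])     # ['G-2']
```

For a per-customer count, `per_customer` and `preferred` give the same answer; the sort only matters when the other fields feed into the analysis.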