Today we'll be going through Alteryx Challenge 9 together, which you can find here.
The challenge data contains people's names and a Count field. The objective is to filter the data so that only the people whose counts are among the highest 5 remain. The catch is that multiple people can have the same value for Count, meaning that you can't just use a Sample tool to keep the first five rows after sorting. Moreover, the ranking is not done in the traditional way, in which scores of 15, 15, and 10 result in ranks of 1, 1, and 3 respectively. Instead, dense ranking is used, meaning that ties are recognized with the same rank but no ranking number is skipped. So scores of 15, 15, and 10 would result in ranks of 1, 1, and 2. If that's still not clear, it may help to take a look at the input and the desired output. Take a look at the input data below:
And here's the desired output:
Let's get started!
Step 1: Sorting
In order to be able to rank the Count field, we'll first need to sort the data in descending order by Count so that the highest values for Count are at the top.
Step 2: Creating the Rank Field
You may have noticed that the desired output contains a field called Rank that didn't exist in the original data. The presence of this field is not just an additional piece of the challenge; it's also a hint for how to proceed. We'll use this Rank field to filter out records with values greater than 5.
Sometimes a Record ID tool will suffice for creating ranks after sorting in the desired order. Unfortunately, that won't work here because there are instances of multiple people having the same value for Count. Instead, we'll have to check whether or not the Count value for one row is equal to the Count value of the preceding row. If it is, we'll repeat the same Rank. If it isn't, we'll need to add 1 to the Rank of the preceding row.
In order to compare different rows to each other, a Multi-Row Formula tool is required. We'll use the logic I outlined above to create a new field called Rank. I'll configure the tool as follows:
The most important part here is the Expression window in the bottom left of the screen. This adds either 1 or 0 to the Rank of the previous row. It adds 0 if the Count of the row equals the Count of the previous row. Otherwise it adds 1. Note that for Values for Rows that don't Exist, 0 or Empty is selected, meaning that the row prior to the first row will have a value of 0 (since this field is numeric). That's how the Rank for the first row is calculated, which then allows the values for the rest of the rows to be determined. As you can see from the results above, the Rank field is being populated with the same values seen in the desired output. Now all we have left to do is filter out the unneeded rows.
Step 3: Filtering
The task was to get only the people with values for Rank in the top 5. Now that we've created the Rank field, we can use that to filter out the rows with a Rank greater than 5. For this, a Filter tool is best.
The Results window at the bottom of the screen now contains the exact same data from the desired output so we're all done! I hope this post was helpful and if you're looking for more Alteryx challenges, make sure to check out this post on the Alteryx community site that gives information on the topics covered and difficulty level of Alteryx challenges 1-300.