Mode Functions Have Limitations - My First Alteryx Community Challenge

After being tasked with attempting a weekly challenge set on https://community.alteryx.com/t5/Weekly-Challenge/Weekly-Challenge-Index-amp-Welcome/td-p/48275, I chose one with a dataset that I was curious about, and chose challenge #189 which looked at the "Billboard top 100" from 1965-2015.

The aim: To find which artists had the most top 100 hits in each decade from the Billboard top 100 ranking.

Data Prep: By using the select tool, I was able to filter out unnecessary fields, for example: a field with all the lyrics of each song - something not useful for this challenge. A unique tool was carried out to check any duplicates in the dataset, as well as a summarize tool which checked that the ranking system was correct - some basic data quality.

Changing the Years into Decades: In order to determine the most popular artists by decade, I first had to change the field from showing individual years, to the decade it was from. To do this I used a simple formula which involved a substring - something I had not learnt but rather stumbled across online, which seemed to be a greatly simplified way of obtaining the decade (as opposed to the 8 line if statement I had attempted before).

It uses the text in the "Year" field, and starts at the 2nd character, before taking the next '1' characters, which then with a simple concatenation changes "1960" to "60's".

the format: Substring(String, start, length)

Finding the Best Artist: The next challenge was to find the artist that appeared the most over each decade, and initially a quick summarize mode function was used:

However, this was not accurate, as a result of the intricacies of the way that Alteryx works, and it only yielded one mode for each decade (even when there were 2 for some decades).

In order to fix this, a different approach was needed - which turned out to be more lengthy and confusing, but I finally arrived at the correct outcome. The approach involved using a summarize tool to first make a ranking of all the artists, then a sort tool to rank the data in a more useful order.

The next step involved using a second summarize tool which found the highest rank for each decade, which was then matched against the original flow with a join tool, (matching the highest rank to the artists) ultimately creating the list that was required.

Finally a concatenate function within the summarize tool allowed the results to be shown on the same row:

Summary: The challenge was fun and allowed me to use some tools that I had not come across, as well as to understand some of the limitations of Alteryx. It was also very beneficial to use Alteryx in the context of solving a problem - something that will be very useful to me in the coming weeks and months.

Author:
Stephen Moyse
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2025 The Information Lab