After being tasked with attempting a weekly challenge set on https://community.alteryx.com/t5/Weekly-Challenge/Weekly-Challenge-Index-amp-Welcome/td-p/48275, I chose one with a dataset that I was curious about, and chose challenge #189 which looked at the "Billboard top 100" from 1965-2015.
The aim: To find which artists had the most top 100 hits in each decade from the Billboard top 100 ranking.
Data Prep: By using the select tool, I was able to filter out unnecessary fields, for example: a field with all the lyrics of each song - something not useful for this challenge. A unique tool was carried out to check any duplicates in the dataset, as well as a summarize tool which checked that the ranking system was correct - some basic data quality.
![](https://www.thedataschool.co.uk/content/images/2022/10/image-73.png)
Changing the Years into Decades: In order to determine the most popular artists by decade, I first had to change the field from showing individual years, to the decade it was from. To do this I used a simple formula which involved a substring - something I had not learnt but rather stumbled across online, which seemed to be a greatly simplified way of obtaining the decade (as opposed to the 8 line if statement I had attempted before).
It uses the text in the "Year" field, and starts at the 2nd character, before taking the next '1' characters, which then with a simple concatenation changes "1960" to "60's".
![](https://www.thedataschool.co.uk/content/images/2022/10/image-74.png)
the format: Substring(String, start, length)
Finding the Best Artist: The next challenge was to find the artist that appeared the most over each decade, and initially a quick summarize mode function was used:
![](https://www.thedataschool.co.uk/content/images/2022/10/image-78.png)
However, this was not accurate, as a result of the intricacies of the way that Alteryx works, and it only yielded one mode for each decade (even when there were 2 for some decades).
In order to fix this, a different approach was needed - which turned out to be more lengthy and confusing, but I finally arrived at the correct outcome. The approach involved using a summarize tool to first make a ranking of all the artists, then a sort tool to rank the data in a more useful order.
![](https://www.thedataschool.co.uk/content/images/2022/10/image-85.png)
The next step involved using a second summarize tool which found the highest rank for each decade, which was then matched against the original flow with a join tool, (matching the highest rank to the artists) ultimately creating the list that was required.
![](https://www.thedataschool.co.uk/content/images/2022/10/image-91.png)
Finally a concatenate function within the summarize tool allowed the results to be shown on the same row:
![](https://www.thedataschool.co.uk/content/images/2022/10/image-86.png)
Summary: The challenge was fun and allowed me to use some tools that I had not come across, as well as to understand some of the limitations of Alteryx. It was also very beneficial to use Alteryx in the context of solving a problem - something that will be very useful to me in the coming weeks and months.