Dashboard week day 4: Cheeses, cheeses everywhere...

by Eliott Sacau

Dashboard week day four. No dataset per se today, simply an instruction: find some data about cheeses and create a visualisation. Andy handed us the details of an API we could query to obtain data about all the cheeses.

The first problem we encountered was simple: the API did not work. Since it is dashboard week and we needed to have something to present by the end of the day, we quickly found an alternative way of obtaining the data: scraping it from the cheese.com website.

This was not without its complications, however. We had to gather the characteristics of every single cheese on the website. This was another occasion on which Alteryx managed to amaze me. The first step involved gathering a list of all the cheeses available on cheese.com, which amounted to 1831 cheeses. The next step was to download the characteristics of each of those cheeses: their taste, texture and smell, as well as their country of origin and many other attributes.

Step 1: Gathering the cheese names

Scraping the data from the front page made us realise that the cheese names could be listed alphabetically, with up to 100 of them per page. The tough bit was sorting through the HTML to find the tag where those names were hidden. Our names turned out to be hiding in <h3> tags within the main body section of the HTML.

A little RegEx magic with a simple formula returned the names of our sought-after cheeses. Repeating this for all 19 pages gave us a complete list of the available cheeses and let us move on to step 2.
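The original RegEx formula is not reproduced in this post, so the sketch below is only an assumption of what that step might look like, written in Python rather than Alteryx. It pulls the text of every <h3> element out of a page of HTML and works out how many pages of 100 names are needed to cover 1831 cheeses. The sample HTML and the function names are hypothetical, and the real markup on cheese.com may well differ.

```python
from __future__ import annotations

import math
import re

# Hypothetical pattern: capture the inner text of each <h3> element,
# tolerating attributes on the tag and nested tags such as <a>.
H3_PATTERN = re.compile(r"<h3[^>]*>(.*?)</h3>", re.IGNORECASE | re.DOTALL)
TAG_PATTERN = re.compile(r"<[^>]+>")


def extract_cheese_names(html: str) -> list[str]:
    """Return the stripped text of every <h3> element in an HTML page."""
    names = []
    for inner in H3_PATTERN.findall(html):
        # Remove nested tags (e.g. a link wrapping the name) and outer whitespace.
        text = TAG_PATTERN.sub("", inner).strip()
        if text:
            names.append(text)
    return names


def pages_needed(total_items: int, per_page: int) -> int:
    """Number of listing pages required to show every item."""
    return math.ceil(total_items / per_page)


if __name__ == "__main__":
    # Made-up sample markup, for illustration only.
    sample = """
    <main>
      <h3><a href="/abbaye-de-belloc/">Abbaye de Belloc</a></h3>
      <h3 class="name">Adelost</h3>
    </main>
    """
    print(extract_cheese_names(sample))  # ['Abbaye de Belloc', 'Adelost']
    print(pages_needed(1831, 100))       # 19
```

Regular expressions are fine for a quick one-off scrape like this one, but they are brittle if the page layout changes; a proper HTML parser is the safer choice for anything longer-lived.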

Step 2: Cleaning the HTML for each cheese, to get to the fragrant bits of usable data.

We were quite perplexed about how to extract all the data from the page, until George noticed that everything we needed was hidden in the <p> tags. A quick parse on <p> (while ignoring the errors!) with the XML Parse tool gave us a mostly complete dataset, albeit a horrible one, unusable as is. A little RegEx, some data cleansing and a pivot later, we had the data in a tidy format, ready for some Tableau visualising.
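The workflow itself used Alteryx's XML Parse tool; as a rough Python equivalent, the sketch below swaps in the standard library's lenient `html.parser` to collect every <p> paragraph, then uses a RegEx to split "Label: value" paragraphs and pivots them into one record per cheese. The "Label: value" layout, the lowercased field names and the helper names are all assumptions for illustration, not the confirmed structure of the cheese.com pages.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> element, tolerating unclosed tags."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: list[str] = []
        self._in_p = False
        self._buffer: list[str] = []

    def _flush(self) -> None:
        # Normalise whitespace and keep non-empty paragraphs only.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self._in_p:  # an unclosed <p> is implicitly ended by the next one
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._flush()

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)

    def close(self) -> None:
        super().close()
        if self._in_p:  # flush a trailing unclosed <p>
            self._flush()


# Hypothetical format: each attribute paragraph looks like "Label: value".
FIELD_PATTERN = re.compile(r"^([^:]+):\s*(.+)$")


def cheese_attributes(html: str) -> dict[str, str]:
    """Turn one cheese page into a {field: value} record."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    record = {}
    for paragraph in collector.paragraphs:
        match = FIELD_PATTERN.match(paragraph)
        if match:  # skip paragraphs that are not "Label: value" pairs
            record[match.group(1).strip().lower()] = match.group(2).strip()
    return record


def to_wide_table(records: dict[str, dict[str, str]]) -> list[dict[str, str]]:
    """Pivot to one row per cheese and one column per field ('' if missing)."""
    columns = sorted({field for rec in records.values() for field in rec})
    return [
        {"name": name, **{col: rec.get(col, "") for col in columns}}
        for name, rec in records.items()
    ]
```

Filling missing attributes with an empty string means every cheese ends up with the same set of columns, which gives the kind of flat table that is easy to load into Tableau.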

Step 3: the Cheese Cheat Sheet

Once all the data was cleaned, there was not much time left in the day. I decided to go for a quick overview of the cheeses by region and country, showing which cheeses were the favourites based on texture, flavour, smell and type, either by fat content (%) or by number of cheeses.
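The dashboard itself was built in Tableau, but the two measures behind it (number of cheeses and fat content) boil down to a simple group-by. The Python sketch below is only an illustration under assumed field names ("country of origin", "fat content" and a grouping field such as "texture") and an assumed fat format like "45%"; it counts cheeses and averages the fat percentage per country and characteristic, skipping cheeses with no fat figure.

```python
from __future__ import annotations

import re
from collections import defaultdict
from statistics import mean

# Assumed format: the first number followed by '%' is the fat content.
FAT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def parse_fat(text: str | None) -> float | None:
    """Extract the first percentage from a string such as '45%', else None."""
    match = FAT_PATTERN.search(text or "")
    return float(match.group(1)) if match else None


def summarise(rows: list[dict[str, str]], group_field: str) -> dict:
    """Per (country, group_field): number of cheeses and average fat %."""
    counts: dict = defaultdict(int)
    fats: dict = defaultdict(list)
    for row in rows:
        key = (row.get("country of origin", ""), row.get(group_field, ""))
        counts[key] += 1
        fat = parse_fat(row.get("fat content", ""))
        if fat is not None:  # cheeses without a fat figure still count
            fats[key].append(fat)
    return {
        key: {"cheeses": n, "avg_fat_pct": mean(fats[key]) if fats[key] else None}
        for key, n in counts.items()
    }
```

Calling `summarise(rows, "texture")` or `summarise(rows, "flavour")` gives one small table per characteristic, mirroring the option of viewing either the number of cheeses or their fat content.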

The aim was to show, on the one hand, which types of cheese are most popular in any given region of the world and, on the other, how relatively healthy particular types of cheese are in any region or country, while also giving the end user the option to drill down and discover individual cheeses.

The dashboard was simple, but it attempted to convey a lot of information in a clear and informative way.

You can find it here: https://public.tableau.com/profile/eliott.sacau#!/vizhome/Cheeses_Thursday/CheeseCheatSheet?publish=yes