Wednesday brought in a bit of web-scraping, which is a really powerful way of accessing information from the web. We covered web-scraping a couple times during training with PGB, where we accessed IMDB data to set up a data source with the top 250 movies. Web-scraping in Alteryx (and in general) usually involves 3 steps:
For today's project, we took a look at NCD Risk Factor Collaboration data, and I focused specifically on the data pertaining to diabetes. To scrape the required data (available through the individual countries tab), I first needed to create the list of countries. The list was stored in a JavaScript pop-up, so I got it by inspecting the page and editing the section containing all the countries as HTML, which let me copy the underlying markup.
Once I had copied this list, I added it as a text input in Alteryx and parsed it out in order to get a clean list of countries:
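For anyone who wants to try the same parsing step outside Alteryx, here is a minimal Python sketch. It is not the workflow I built; it assumes the copied markup is a series of `<option>` elements, and the `SAMPLE_HTML` fragment, the `CountryListParser` class and the `parse_countries` helper are hypothetical names made up for illustration.

```python
from html.parser import HTMLParser


class CountryListParser(HTMLParser):
    """Collect the text found inside every <option> element."""

    def __init__(self):
        super().__init__()
        self._in_option = False
        self._buffer = []
        self.countries = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._in_option = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_option:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._in_option:
            name = "".join(self._buffer).strip()
            if name:  # skip empty entries
                self.countries.append(name)
            self._in_option = False


def parse_countries(html_fragment):
    """Return the country names contained in the copied HTML fragment."""
    parser = CountryListParser()
    parser.feed(html_fragment)
    parser.close()
    return parser.countries


# Hypothetical fragment standing in for the markup copied from the page.
SAMPLE_HTML = """
<select>
  <option value="afghanistan">Afghanistan</option>
  <option value="south-africa"> South Africa </option>
  <option value="united-kingdom">United Kingdom</option>
</select>
"""

countries = parse_countries(SAMPLE_HTML)
print(countries)
```

If the real markup uses a different tag (for example `<li>` or `<a>`), only the tag name checked in the parser would need to change.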
I then accessed the URL of the page in order to request the data for each specific country:
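The same idea, building one request per country from the cleaned list, could be sketched in Python roughly as follows. The base URL is a placeholder and the lower-case, hyphenated slug is an assumption; neither reflects the real site's URL scheme. `build_country_url` and `fetch_country_page` are hypothetical helper names.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Placeholder base URL: an assumption for illustration, not the real endpoint.
BASE_URL = "https://example.org/diabetes/country/"


def build_country_url(country, base_url=BASE_URL):
    """Turn a country name into a request URL (assumed slug format)."""
    slug = country.strip().lower().replace(" ", "-")
    return base_url + quote(slug)


def fetch_country_page(country, timeout=10):
    """Download the page for one country and return its text."""
    with urlopen(build_country_url(country), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Build one URL per country; fetching is left to the caller.
countries = ["Afghanistan", "South Africa", "United Kingdom"]
urls = [build_country_url(name) for name in countries]
print(urls)
```

Keeping URL construction separate from the download makes it easy to check the generated URLs before sending any requests.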
I then did a bit of reshaping, and also added in some population data to enrich the data set. Finally, I output the data to a .hyper file so I could begin visualizing it.
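As a rough illustration of what "reshaping" and adding population data can mean, the sketch below pivots one-column-per-year rows into one row per country and year, then looks up a population figure for each row. The field names, the `wide_to_long` and `add_population` helpers, and all values are made-up placeholders, not the scraped data or my actual Alteryx steps.

```python
def wide_to_long(rows, id_field, value_name):
    """Turn one-column-per-year rows into one row per (id, year)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue
            long_rows.append(
                {id_field: row[id_field], "year": int(column), value_name: value}
            )
    return long_rows


def add_population(long_rows, population, id_field):
    """Attach a population figure to each row; None when no match exists."""
    return [
        dict(row, population=population.get((row[id_field], row["year"])))
        for row in long_rows
    ]


# Placeholder values only, chosen to show the shape of the data.
wide = [{"country": "Country A", "2013": 1.0, "2014": 2.0}]
population = {("Country A", 2013): 100}

long_rows = wide_to_long(wide, id_field="country", value_name="prevalence")
enriched = add_population(long_rows, population, id_field="country")
print(enriched)
```

Rows without a matching population entry keep a `None` value rather than being dropped, which makes the gaps easy to spot later.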
For the dashboarding part, I wanted to work on my dashboard design and incorporate some new ideas. I decided to try setting up my dashboard as a 'Patient Form', where each patient would be a country and you could look at their status in terms of diabetes. The outcome is as follows:
It's a simple dashboard but gives quality insight into the diabetes status of specific countries!