Dashboard Week - Day 1 - ASOS Web-Scraping

by Georgie Grgec

Dashboard Week is here for DS15! On Day 1 we were tasked with web-scraping the ASOS website for all the new Men’s and Women’s clothing items, along with their brand, cost, description and product type.

Andy posted the details of our challenge on the Data School blog at 9am this morning.

We first looked at web-scraping two weeks ago, so it was good to put these new skills to the test. Below is a summary of how the day went:

  1. Alteryx Workflow

The majority of the day was spent in Alteryx, downloading and parsing all the data required.

I started by downloading the data for the ASOS Men’s New in items, then repeated the process for the Women’s New in items and joined the data together.

There was a fair bit of Regex involved to extract the details we required. To find where the relevant data was stored on the web pages, I used a combination of the browser’s inspect feature and viewing the page source.
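To give a feel for this kind of Regex parsing outside Alteryx, here is a minimal Python sketch. It is not the workflow I built: the HTML snippet, the tag and class names and the function name are all made up for illustration, and the real ASOS page source is structured differently.

```python
import re

# Hypothetical markup, invented for this example.
# The real ASOS page source will not look like this.
SAMPLE_HTML = """
<article data-product-id="101">
  <p class="description">Oversized t-shirt in white</p>
  <span class="price">£18.00</span>
</article>
<article data-product-id="102">
  <p class="description">Slim fit jeans in blue</p>
  <span class="price">£35.50</span>
</article>
"""

# One pattern per product: id, then description, then price.
# re.DOTALL lets ".*?" span the line breaks between the tags.
PRODUCT_PATTERN = re.compile(
    r'<article data-product-id="(?P<product_id>\d+)">.*?'
    r'<p class="description">(?P<description>.*?)</p>.*?'
    r'<span class="price">£(?P<price>\d+\.\d{2})</span>',
    re.DOTALL,
)


def extract_products(html):
    """Return one dict per product found in the (assumed) markup."""
    return [
        {
            "product_id": match["product_id"],
            "description": match["description"].strip(),
            "price": float(match["price"]),
        }
        for match in PRODUCT_PATTERN.finditer(html)
    ]


products = extract_products(SAMPLE_HTML)
```

The named groups play the same role as the output columns of a Regex parse step: each one becomes a field on the product record.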

The most challenging part came when I needed to create a Macro in order to scrape all the pages for both men and women. I created an iterative macro to run for every page until there were no more pages. This took me a lot longer than expected.
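The idea behind the iterative macro (keep requesting the next page until one comes back empty) can be sketched as a simple loop in Python. This is only an analogy, not the Alteryx macro itself: `scrape_all_pages`, `fake_fetch` and the `max_pages` safety limit are hypothetical names, and the fake site stands in for real page downloads so the example runs without a network connection.

```python
def scrape_all_pages(fetch_page, max_pages=500):
    """Collect items page by page until a page returns nothing.

    fetch_page is any function that takes a page number and returns
    a list of items. max_pages is an assumed safety limit so the loop
    cannot run forever if the stop condition is never met.
    """
    all_items = []
    page = 1
    while page <= max_pages:
        items = fetch_page(page)
        if not items:
            # An empty page means there are no more pages to scrape.
            break
        all_items.extend(items)
        page += 1
    return all_items


# Stand-in for the website: two pages of items, then nothing.
FAKE_SITE = {1: ["shirt", "jeans"], 2: ["dress"]}


def fake_fetch(page):
    """Pretend to download a page; unknown pages come back empty."""
    return FAKE_SITE.get(page, [])


all_items = scrape_all_pages(fake_fetch)
```

Passing the fetch function in as an argument means the same loop could be reused for the men’s and the women’s listings by swapping in a different fetcher.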

My workflow:

After building the iterative macro, obtaining the full data and exporting it in .hyper format, it was time to quickly jump into Tableau.

Unfortunately building the macro took me a lot longer than expected and meant the time remaining for my dashboard was limited!

Day one down, let’s see what day two brings!