How to Easily Download Data From the Internet with Alteryx

by Jeremy Kneebone

Have you ever tried to download some data from the Internet? Occasionally websites make it easy and give you a csv, but much of the time that option’s not available. The website might offer something called an API, but you have no idea what that is or how to use it.

 

Here’s what to do with an API. (For the time being I’ll focus on APIs in JSON format, just as an example.)

 

The API will be a hyperlink. If you try and open that link in a web browser, you’ll get a big text file full of gobbledegook. You can’t read it, but Alteryx can.

 

In Alteryx:

  1. Open a new workflow and drag the Text Input icon to the canvas (it’s a different icon from your usual Input Data one).

2. Copy and paste the hyperlink into the field of the Text Input icon. This might take a couple of attempts for the link to appear.

3. Find the Download icon (you can use the Search bar) and drag it to the canvas. You don’t need to change the configuration.

4. Drag the appropriate Parse tool to the canvas. For example, if the data is in JSON format, use the JSON Parse tool.

5. Select the field that your data is in – it’ll appear as something like ‘DownloadData’.

6. Run the workflow to parse your data. You’ll end up with the data in your final tool.

7. Now comes the long bit. You’ll need to clean the data, working out which fields are the values, which the headers, and which are unnecessary. Generally, there’s some pattern to the data: find that pattern, use Text To Columns to separate the field with the data out into columns, and use the Crosstab tool to reshape the values into separate fields.

 

And there you go. It should take about five tools: Text Input, Download, JSON Parse, Text To Columns and Crosstab, with a Select or two thrown in for renaming fields.