Jesus turned water into wine. Alteryx's JSON tools can turn 1 string into 325 beers.
This week in training, we've been learning about APIs and web scraping. In this blog, I'll discuss what an API is, when to use them, and provide a tutorial on how to use the Punk API—the unofficial Brewdog beer list—and prepare it for analysis using the JSON tool.
What is an API?
API stands for Application Programming Interface and is effectively a way for two programs to communicate, acting like a translator. APIs are used across the digital world and are behind the blog you're reading, the social media platforms you scroll and websites you shop at to facilitate the transfer of information or data.
An API works by sending a request to a server, which then returns a response, similar to ordering food in a restaurant. You give your food request to a waiter or waitress who passes the order onto the chefs, and they return your request. However, not all APIs work the same way, and often the server will return the data in an unusable format. That's when additional steps are required to prepare your data before beginning any data analysis. Using the restaurant analogy, this would be like the waiter or waitress returning to your table with all the raw ingredients for you to prepare your meal yourself.
So, what is this unusable format? The request is often returned in JSON format, a large piece of text containing all your desired information. In this format, you need to chop it up, move it around, and remove the parts you don't need. See the example below.
data:image/s3,"s3://crabby-images/c7550/c75508e0f814ace280880d1a9286ff8d0f13d8b1" alt=""
When to use APIs?
As mentioned, APIs are everywhere in the digital world, mostly operating behind the scenes. However, many websites and companies allow their users to access data and information via an API. This enables the movement of data between systems for businesses. For example, using APIs in an e-commerce brand would allow the central storage of information, even if the data comes from different systems. Google Analytics, Shopify, and Klavyio, for instance, hold data for web analytics, orders, and customer engagement, respectively. Manually retrieving this data from each platform or downloading it as a CSV or .xls file (if available) is time-consuming. However, by using APIs and building a workflow in tools like Alteryx, an e-commerce brand can automate data retrieval, pulling it into a central database. This approach becomes even more effective when different data sources can be joined by customer, product, or date, allowing companies to advance their use of data and make more informed decisions.
There are many other popular APIs to practice your skills, which you can find here. This can be done in Python, Alteryx, or other tools like Postman.
PUNK API TUTORIAL
In this tutorial, I'll walk through the Punk API as well as the cleaning steps to organize my data. The Punk API contains Brewdog's expansive back catalog of beer from their DIY Dog homebrew beer list.
(It's essential to read the reference documentation provided. Note that not all reference documentation for each API is clear and helpful, and sometimes you may find yourself scratching your head).
You can find the Punk API reference documentation here.
data:image/s3,"s3://crabby-images/b1380/b1380ce47cb90a99967ca150070d591292b034a2" alt=""
Alteryx Tutorial
Step 1
Once you open up a new workflow in Alteryx, use a Text Input tool to create a record for the URL which is taken from the reference document. Name the header URL and add the URL link to the row.
data:image/s3,"s3://crabby-images/5efd5/5efd5b95c5622a0cf5fa44dc82ffeb35e44a6b23" alt=""
Step 2
If we were to return this single link, the API request will only return 25 records which is the first page of data. We have 325 records available that span across 13 webpages. To retrieve these records we use the Generate Rows tool to create a new row for each unique page URL. We have 13 pages of data to retrieve so we generate 13 rows in total.
data:image/s3,"s3://crabby-images/f39a6/f39a67084623d619596c5e7c344073fa91cf3df4" alt=""
Step 3
Next, we use a Formula tool to create a new field. This field takes our original URL and adds the page number on the end, creating our unique URLs for our 13 pages. We wrap the page number in a TOSTRING() function to convert our page numbers from integers to a string so that we can concatenate this with our new URL. We name this columns "New URL".
data:image/s3,"s3://crabby-images/eb5b0/eb5b04eb88a707b39b35af01a848322c206d1ee8" alt=""
Step 4
We then use the Download Tool to retrieve our data from each URL. Here we make sure to select our New URL column and we output this to a string.
data:image/s3,"s3://crabby-images/56638/5663844a7601562a5826b22138fec18a13f4a926" alt=""
Step 5
Our next step is to use the JSON Parse Tool. All our data is contained in one long string so we want to split this up. This tool extracts the string/value pairs from a single record and pivots them into their own columns.
data:image/s3,"s3://crabby-images/ba657/ba6578f78bf421498362ced207b0fc364497e2ec" alt=""
data:image/s3,"s3://crabby-images/968af/968afe9d743ffc712c420c46ce88f62ce10a4e93" alt=""
data:image/s3,"s3://crabby-images/61681/616819d0d97d175cb5aca6bffbb2f234dd3ff5af" alt=""
Step 6
We then use a Text to Column Tool to split the number from our field names.
data:image/s3,"s3://crabby-images/30f5a/30f5a26c527049b6ef7c0f1b97b48d8cf53acd5b" alt=""
data:image/s3,"s3://crabby-images/cf274/cf2742f0b8e082f2265f41863d85be5ac7077685" alt=""
Step 7
Next we use the Select Tool to reorder and rename the existing column headers and remove unwanted columns.
data:image/s3,"s3://crabby-images/642a1/642a12a5bedf044b713a2c2b3eaefc97bf9566d1" alt=""
Step 8
Now we're interested in creating a Unique ID for each beer. We want this so we can have a single row for each beer when we later come to pivot our field names to columns. There are a few ways to do this, but this step gets to the desired in a single step rather than multiple pivots and crosstabs for example.
We can overcome this with a Multi-Row Formula Tool
data:image/s3,"s3://crabby-images/c85e4/c85e429789ca51c675af2cbb3ef0bd38cd7ab3c8" alt=""
Multi-Row Formula Tool Expression:
IF ISNULL([Row-1:Beer ID]) THEN 1:
This says, "If the Beer ID in the row just above the current one is empty (or doesn't exist), then assign the value 1 to the current row's Beer ID."
ELSEIF [Header] = "id" THEN [Row-1:BeerID]+1:
- This part checks if the column header is "id." If it is, then it says, "If the column header is 'id,' then make the Beer ID in the current row one more than the Beer ID in the row just above it."
ELSE [Row-1:BeerID]:
- If none of the above conditions are true, it means the current row is not the first row, and the column header is not "id." In that case, it says, "Just copy the Beer ID from the row just above the current one."
ENDIF:
- This is the end of the "if-else" instructions.
Within out output we now currently have something that looks like this:
data:image/s3,"s3://crabby-images/b2969/b2969bd8fd56776f7d4664b0c759e9447bdfdc59" alt=""
Step 9
If you wish to explore ingredients of each beer, additional cleaning steps are required to start using this data. For the purpose of this blog I'm only interested in a handful of fields so I'm going to filter my ingredients fields out here, but the F-output from this step will enable to you go on to cleaning the ingredients fields.
data:image/s3,"s3://crabby-images/e78f0/e78f0ffc4053c0ce7d4ea29e74b2247da1d7946a" alt=""
Step 10
Next we use the Crosstab Tool to pivot our field names (Headers) onto columns. We group our pivot by our Beer ID which will fix our Beer IDs into their own column, moving all our data (Values) for each beer into a single row under the corresponding header.
data:image/s3,"s3://crabby-images/1a036/1a0365b122422c9e0b6c6dac37285d253a22a77c" alt=""
Step 11
Next we use the select tool to rename, reorder and keep the desired fields.
data:image/s3,"s3://crabby-images/f18b1/f18b17f5e9d488f0df6107f3d4e765f0ad03c9dc" alt=""
Step 12
Finally, we can output our data as a CSV using the output tool and Data is now ready for analysis with a beer representing a row of data.
data:image/s3,"s3://crabby-images/f7fe8/f7fe83ddb6373dbabbbaae0910877bdf09d3d18c" alt=""
Summary
APIs play a crucial role by facilitating smooth communication between various computer programs, enabling seamless data and function exchange. Alteryx, excels in efficiently handling APIs and data transformation. Its user-friendly interface simplifies complex tasks, streamlining the processes of data preparation and analysis. for this Punk API but can be easily replicated in professional contexts too.
For more check out the other API blogs here.