API stands for Application Programming Interface. According to the Postman page, API is a set of protocols where different software components communicate and transfer data. The client will send the request when they need to get the data. Then, depending on the type of API (Private APIs, Public APIs, Partner APIs), the API will receive the request and transfer it to the server. The server will check the authentication, validate data, and retrieve the data. After that, the server will respond to the client. If the authentication, requests are good, the client will receive a status code (200), and retrieve the data. If there is something wrong during sending request, or problems in authentication, depending on the problem, the client will receive the status code. Based on the status code, the client knows the reason why they cannot retrieve the data. To get more info about API, you can read more at the Postman page at: https://www.postman.com/what-is-an-api/
Usually, the client will get the JSON data format after retrieving data from API. In this blog, I will share:
1/ What is JSON? What is the JSON structure?
2/ Extracting data from API with the requests and pandas package
3/ Extracting data directly from API in Tableau Desktop with TabPy
Are you ready to extract data from API? Let's get started!
1/ What is JSON? What is the JSON Structure?
JSON stands for Javascript Object Notation. According to the w3resource page, JSON is used to serialize and transfer data over a network connection. The Relational Database stores data in tables with rows and columns, while JSON stores data with objects and arrays.
In an array, it could have one or more objects. An object contains key-value pair(s) and is separated by a comma if more than 1 pair. The value of each object could be strings, numbers, Booleans, and null. In some cases, the value of a key could be an array containing multiple objects.
- The JSON object contains key-value pairs surrounded by curly braces {}.
- Key and value is separated by a semi-colon ":". Each key-value pair is separated by a comma ",".
- An array contains values or objects and is surrounded by a square bracket [].
For example, I have a JSON structure in Fig. 1. There is one big object contains:
- 6 keys: title, name, role, company, skills, and blog_post
- For the skills key, this is an array containing 5 values (Python, SQL, Tableau, Alteryx, R)
- For the blog post, this is also an array containing 3 objects
- Object 1 contains 3 keys (tableau, tool, released_year). The tableau array contains 3 text values
- Object 2 contains 3 keys (alteryx, tool, released_year). The alteryx array contains 2 text values
- Object 3 contains 3 keys (SQL, tool, released_year). The SQL array contains 2 text values
Does it seem like a tree structure? :) In Fig. 2, I sketch a tree structure for that JSON data.
Now, the question is: How can I reconstruct the JSON data to a table with rows and columns?
The answer will be in the next section!
2/ Extracting data from API with the requests and pandas package
In this section, I will use Pokémon API as an example. The Pokémon API is free and open source where you can send requests to retrieve data without authentication. This API limits the method, so you can only send the GET request to retrieve data. Before sending requests to retrieve data from API, I recommend to read the documentation page first to know how to use API and also the policy, rules.
We will retrieve data from this link: https://pokeapi.co/api/v2/ability?offset=0&limit=367
Fig. 3 shows the JSON format when we retrieve the data. I will open Jupyter Notebook to work on this, but you can use any Python IDEs (PyCharm, VS Code, ...) to do.
First, I need to import all libraries I need. I imported requests, pandas, and json library. In Fig. 4, I sent a request to retrieve data from the link above and store the result in the res variable. The PokemonAPI is free and open source, so I don't need to set username, password or token when sending the request. Then, I can check the status code and the data type.
If I type type(res.json()), it will return the data type of res.json() is a dictionary in Python. In Python, the dictionary data type is the same as the object in JSON. A dictionary also includes key and value pair(s).
Then, you can print the data by printing res.json(). Notice that, the result is a dictionary in Python.
However, the output format doesn't look good, so I can use the dumps function from json package to print out the result in an organized way to read it easier. The json.dumps() function will convert from Python dictionary to JSON and print it out as a string. I know it's a little confused, but don't worry much about it.
We only need to focus on the key "results". I want to store all the ability name and url into a Data frame. If you notice from fig.6, the results key has an array with multiple objects. Each object is a different name and url.
To store the key and value pairs from the results key into a data frame, I used the DataFrame() function from pandas package. Inside that function, I only need to list the key I want to store from res.json() result. Finally, I assigned the result to ability_df variable and print the result (Fig. 7).
Nice! Let's switch to Tableau Desktop to test.
3/ Extracting data directly from API in Tableau Desktop with TabPy
This step requires that you already installed TabPy on your local machine or Tableau Server/ Cloud. I will activate TabPy and test it on my local machine. If you haven't installed TabPy, you can follow my instruction from my previous blog here.
First, I need to create an input file to store my API link. I open Excel and copy the link into it with the header is API link and save it as xlsx format.
Then, I open Tableau Desktop (I'm using version 2024.3) to connect to the Excel file I just created. Then, I also check the connection with TabPy.
Tableau automatically brings Sheet 1 into canvas. I right click on Sheet 1 and remove it. On the left hand side, I drag and drop New Table Extension into the blank canvas (Fig. 10).
Then, drag and drop Sheet 1 into Table Extension window. Make sure that the data from the sheet shows in the Input Table tab (Fig. 11).
In the Current Analytics Connection window, there is a host name and the port number to show the connection between TabPy and Tableau Desktop. Inside the box, you can start typing the Python script.
Same as what I did in Jupyter Notebook, I import all libraries I need. Then, send the GET request to retrieve data and store the result in the res variable. I only need the value from the results key, so I type res.json()['results'] to retrieve all data in the results key and store them in the data variable. Then, I store data in a pandas data frame before return it.
Finally, I click on the Apply button. Then, select the Output Table tab to show the result. In the Output Table, you can see there are 2 columns (name and url).
Congratulations!!! You just retrieved data from API in Tableau Desktop!
This blog is quite long, but I have many more things to share that we can do with API in Tableau Desktop. From this blog, I hope you get the info about what is API, JSON and JSON structure. From my instruction in Python and Tableau, you know how to send request to retrieve data and store it in the data frame.
In the next blog, let's increase the difficult level a little bit. As you see, the data from the results key is easy. It contains multiple objects with name and url key value pair. If you copy any url from the ability name and paste in the browser, it will show another JSON data belongs to that ability (Fig. 13).
In Fig. 13, that is another JSON data but it contains nested objects and arrays. It would be more complicated than what we did so far. In the next blog, I will share how you can extract all nested objects, array from JSON in Tableau Desktop. Fig. 14 is one of the result tables.
I hope you enjoy this blog and see you again soon!