At the end of October 2024, Alteryx Designer updated the latest version to 2024.2. If you haven't updated it yet, click on this link to download and update Alteryx Designer to the latest version. Many of the updated features in this latest version are very helpful in preparing data. Alteryx released a new tool called Rank tool, adding the Group By function in the Record ID tool, and many new updates ... You can check Alteryx Designer 2024.2 Release Notes here.
Extracting data by web-scraping data from web pages and retrieving data from API are my favorite topics. In this latest version, Alteryx Designer updated some helpful features in the JSON Parse tool. Those new features save so much time when I need to parse JSON after retrieving data from API, flatten nested objects, and arrays without typing any lines of code. In this blog, I will walk through:
1/ JSON structure
2/ Flatten nested objects and nested arrays with the JSON Parse tool
(Using Unnest JSON field and Flatten Array options)
3/ Output values into data type specific values
4/ Output values into single string field
Ah! Remember to update your Alteryx Designer version to 2024.2 to follow the steps I will show later in this blog.
1/ JSON Structure
JSON object contains key-value pair(s). The key and value are separated by a colon mark ":". If there are more than one key-value pair, each pair will be separated by a comma. The data type of each object could be strings, numbers, Booleans, and null.
In some cases, an object contains multiple nested objects or nested arrays. The object is surrounded by a curly braces {}. An array is surrounded by a square bracket [].
In the example above, there is a big object which contains keys:
- title (the value is a string)
- name (the value is a string)
- role (the value is a string)
- company (the value is a string)
- skills (this is an array when the value is surrounded by a square bracket [], each value in this array is a string and separated by a comma )
- blog_post (is also an array with 3 nested objects because there are 3 curly braces pairs. For each object, there are 3 key-value pairs inside. For example, in Object 1 in Fig. 1, tableau, Alteryx, and SQL are an array when their values are listed in a square bracket and each value is separated by a comma)
If I sketch it in a tree diagram, it would look like this:
2/ Flatten Nested Objects and Nested Arrays with the JSON Parse Tool
In this session, I will demo how to send the request to extract data from API and parse JSON with the updated JSON Parse Tool.
The API I use is from the Star Wars API (SWAPI). You can read the API documentation page here. The API link that I use is: https://swapi.dev/api/planets/
The data is about the planets that appeared in the Star Wars series including the name of the planet, rotation period, diameter, climate, gravity, films, ...
If I display that JSON structure under the tree diagram, you will see the structure clearly with nested objects and nested arrays.
Let's open Alteryx Designer 2024.2!
Drag and drop the Text Input tool into the canvas and copy the API link above to a cell in the Text Input tool.
Then, I drag and drop the Download tool from the Developer palette to connect to the Text Input tool. The Download tool will help us to send the request to the server to retrieve data from API. The API we are using is free and public, so I don't need to set the Headers. If you are using API with the access token or API token, you should set it in the Headers tab in the Download tool.
In the Configuration window of the Download tool, I unchecked the box Use Data Connection Manager (DCM) and ensure the Field is selecting the field url.
If your request to the server succeeds, you will get the response in the Result window. The code in DownloadHeaders will be 200 OK. Your retrieved data will be in the Download Data column. If you use the Browse tool to connect to the output of the Download tool, you will see the JSON structure as Fig. 3 above.
Next, I drag and drop the JSON Parse tool (also from the Developer palette) to connect to the output anchor of the Download tool. Some options in the JSON Parse Tool require running with AMP. By default, Alteryx Designer will turn on AMP for you. If you are not sure, you can check by clicking on the blank canvas. Then, select the Runtime tab, and make sure that the box Use AMP Engine is checked.
Unnest JSON field (AMP Only)
This option only works with AMP on. This option will help to unnest JSON objects (One level deeper).
As I mentioned in the first part, the JSON object is surrounded by curly braces {}. If you look at the value in the DownloadData column, the leftmost mark is a curly brace "{" which represents an object.
When choosing the Unnest JSON field, it will flatten the object to a deeper level. In Fig. 8, the tree diagram on the right shows the keys in the object after flattening.
Run the workflow, the result will return 4 keys in 4 separate columns. The DownloadData will be concatenated at the front of the key (Fig. 9).
As you know from the tree diagram in Fig. 4, the results key is an array. The leftmost mark of the value in the DownloadData.results is a square bracket "[".
Flatten array (AMP Only)
Next, I want to flatten the array. The JSON Parse tool has a feature to flatten the array only. It also requires AMP on.
I connect a new JSON Parse tool to connect to the previous JSON Parse tool. In the Configuration window, at the JSON field, I selected DownloadData.results and chose the option Flatten Array (AMP Only) as I wanted to flatten the Results array.
From the tree diagram, in the results array, there are 10 nested objects. In the Results window in Fig. 10, there are 2 columns for results. The DownloadData.results_flatten will store 10 nested objects from the results array. The DownloadData.results_idx will store the index of each nested object after flattening sorted by ascending.
Repeat the step to unnest the object and flatten the array
If you notice the value in the DownloadData.results_flatten, the leftmost mark is a curly brace "{" which is an object. From here, you can use the Unnest JSON field (AMP Only) option from another JSON Parse tool to unnest the object (Fig. 11a).
After unnesting the DownloadData.results_flatten field, I got some new columns which are keys of the nested object in the result array. From the tree diagram, you notice that we have 2 nested arrays (residents and films). In the result window, you also see the leftmost mark value of those columns is a square bracket "[".
Splitting the branch
Currently, we have 10 rows that contain 10 nested objects from the result array. For each nested object, there is an array for resdients and an array for films.
In Fig. 12, that is an example in the first result object containing the resident 1 and film 1. The Resident 1 contains 10 values. The film 1 contains 5 values.
If you flatten either one array (Residents or Films) first and flatten the other array after that, it will explode your dataset. Let me show in the sketch.
In Fig. 13, I flattened the Residents array first, and then flattened the Film array after. I assumed that the number of elements in Residents is the same for all Residents and the number of values in Films is the same for all Films.
The number of rows if I do that = 10 Residents total * 10 residents for each big Resident * 10 films array for each resident value * 5 films for each film array = 5000 rows.
However, we only want each value in the Residents_1 array to match with the Films_1 array (Fig. 14). By doing this, the number of rows is less than 10 times and we don't need to prepare the data again.
Alteryx already created the result index column. Based on this index, we can match the index of the Resident value with the index of the Film value (Fig. 15).
Come back to our workflow, to return the result as I want in Fig. 14, I need to split the main branch into 2 small branches.
Now, I only join those 2 results back with the key is the result index from Fig. 15.
If I sort the D0wnloadData.result_idx by ascending, you will see each residents_idx in the residents array will match with 5 films value from the films array.
Then, you can rename the column name, and remove unnecessary columns with the Select tool.
3/ Output values into data type specific fields
This option does not require AMP. This option will organize data in each separate column (depending on the data type of the value).
4/ Output values into single string field
This option also doesn't require AMP. This option will flatten all levels in your JSON structure.
In the Results window, each nested object will be returned in each separate row. If there are many values in the same object or array, Alteryx will automatically create a unique index id at the end.
One thing to notice is that all values in the JSON_ValueString are in the string data type.
In this blog, I shared the JSON structure after retrieving data from API. In the second part, I introduced the latest features from JSON Parse tool to flatten nested objects, arrays into each row by using 2 options (Unnest JSON field and Flatten Array) with AMP. The third and fourth part is 2 options that you can flatten all levels in the JSON structure into a separate row.
This new feature for the JSON Parse tool is very helpful; especially in the complex JSON structure. The user doesn't need to understand much about the JSON structure or have to count how many curly braces, and square brackets. It's also useful for those who don't have a programming background. Users only need to connect the tools and select options in the JSON Parse tool. Quickly, the JSON structure will transform to a data table. I really like this new feature from Alteryx.
I hope this blog is helpful to you when you need to parse JSON and prepare data. See you soon in the next blog!