In the previous blog, I discussed what XML is and its basic structure. Today I am going to talk about how to use the XML parse tool to put the information wrapped by specific tags into columns. Please make sure you are familiar with the terminology like root, child and XML tree. If you are not sure, go to check out my previous blog!
The configuration of the XML parse tool is very simple. After you select the field that contains the XML data from the input data stream, you have to select 'XML Element to Parse'. It basically means to which level or parent of the XML trees you want to extract the information from. There are three options:
- Root: the root of the XML tree, which means the outermost tag of the tree
- Auto Detect Child: Default is the tag that appears most in the whole XML tree. If every tag only appears once, it points to the first child of the root.
- Specific Child Name: literally, you specify the child you want to parse
data:image/s3,"s3://crabby-images/746bc/746bcc0e65d785cbcaf73f4e4a3fb33ddb1330c8" alt=""
Choosing the options under 'XML Element to Parse' will give you the tag name as a column name. As many parent tags don't have any text information (they are the tags wrapping other XML elements), the value in the column will likely be NULL. It will also return the attribute as a column if the tag has an attribute.
The additional options give you control over what should be returned.
- Return Child Value: return the text of the child one level under the parent you select in 'XML Element to Parse'
- Return Outer XML: return the trimmed XML tree of which the root is the level you select in 'XML Element to Parse'. It is useful when you need to further parse the XML trees and have finer control.
Let's say we have the following XML element in the input data stream:
data:image/s3,"s3://crabby-images/1cc03/1cc03adedf9fd8293bd757faddaa47a3334a7105" alt=""
By only selecting 'Auto Detect Child', the most common tag is parsed, here is <book>. Because the <book> tags have their own attribute, the attribute was also returned as an extra column.
data:image/s3,"s3://crabby-images/7bf06/7bf061746fca0f2755c6352af4b7a88043d998fc" alt=""
By selecting 'Auto Detect Child' and 'Return Child Values', the text of the children tags (here are the <title>, <author>, and <year>) under <book> were all returned to the corresponding columns.
data:image/s3,"s3://crabby-images/65c69/65c69676c31ff61563feadd733a75dc4a9a788d0" alt=""
By also selecting the 'Return Outer XML', the XML tree at the <book> level was returned as well. When the XML tree is complicated nested, we can use this option and chain another XML parse tool to drill down to the particular child we need.
data:image/s3,"s3://crabby-images/fd75b/fd75b8539f917bd46b530604f4c3b9bb75ba7f5f" alt=""
That's it for this tutorial on the XML parse tool in Alteryx. Hope you take something away.
Feel free to check out my other posts and connect with me on Linkedin - learn and improve together!
https://www.thedataschool.co.uk/blog/stanley-chan