RegEx output methods in Alteryx

RegEx stands for Regular Expression and is supported in many programming languages. It has several applications, for example cleaning data or searching for specific patterns in order to replace or parse them. In Alteryx, the RegEx tool can be found in the “Parse” tool group. The tool offers four output methods: Parse, Replace, Tokenize and Match. This post walks you through each of them with a simple example.

1. Parse

Parse is used to extract values from one column into one or more new columns. In the following example, I split the Phone number into two separate columns: Area Code, which is the first part of the number, and Local Number. The area code is either separated from the local number by a blank space or enclosed in round brackets. The expression for the whole phone number is written in the “Regular Expression” field, and each pair of round brackets in it defines one of the two new output fields, in order.
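For readers who want to try the same idea outside Alteryx, here is a minimal Python sketch of the Parse logic. The pattern, the sample phone numbers and the helper name parse_phone are my own assumptions for illustration, since the original expression is only visible in the screenshot. Each pair of round brackets becomes one output value, just like the output fields in the tool.

```python
import re

# Assumed pattern: optional round brackets around a 3-digit area code,
# optional whitespace, then a local number such as 555-0147.
PHONE_PATTERN = re.compile(r"\(?(\d{3})\)?\s*(\d{3}-\d{4})")


def parse_phone(value):
    """Return (area_code, local_number), or (None, None) if nothing matches."""
    match = PHONE_PATTERN.search(value)
    if match is None:
        return (None, None)
    # groups() returns the text captured by each pair of round brackets, in order.
    return match.groups()


# Hypothetical sample values covering both formats described above.
for phone in ["(212) 555-0147", "646 555-0199"]:
    print(parse_phone(phone))
```

Both sample values yield the area code and the local number as two separate strings, whether or not the area code is wrapped in brackets.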

2. Replace

Replace is used to replace the value in a column, or a part of it, with the defined replacement text. Look at the ID field: it is formatted as Gender-Surname-“nm”ID Number. We want to replace the whole Gender-Surname-“nm” part with the Surname only, so that we end up with Surname-ID Number. The Regular Expression field contains the whole text that should be replaced. I want to keep the Surname from that text, so I group it using round brackets. Because there is only one group, it is group 1. In the Replacement Text field, I tell Alteryx to insert the content of group 1 by typing $1.
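The same replacement can be sketched in Python. Note that Python writes the group reference as \1 where Alteryx uses $1. The exact ID layout (for example F-Nguyen-nm-1042), the pattern and the helper name keep_surname are assumptions made for this sketch, because the original data is only shown in the screenshot.

```python
import re

# Assumed ID layout: Gender-Surname-nm-Number, e.g. "F-Nguyen-nm-1042".
# The pattern covers the Gender-Surname-nm part; the surname is group 1.
ID_PATTERN = re.compile(r"^[A-Za-z]+-([A-Za-z]+)-nm")


def keep_surname(id_value):
    """Replace the matched Gender-Surname-nm prefix with the surname only."""
    # r"\1" is Python's spelling of the $1 group reference used in Alteryx.
    return ID_PATTERN.sub(r"\1", id_value)


print(keep_surname("F-Nguyen-nm-1042"))  # hypothetical sample value
```

Values that do not fit the assumed layout are returned unchanged, because nothing in them matches the pattern.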

3. Tokenize

Tokenize splits a column into multiple columns or rows. It works like the “Text To Columns” tool, but gives users more flexibility through RegEx. Consider the long text containing URLs shown below.

What we want now is to have each URL in its own row. Be sure to select “Split to Rows” in the Output Method section. The URLs are inside quotation marks (“ ”) and separated by commas (,), so the regular expression matches any run of characters other than quotation marks and commas.

The URLs will be extracted to rows like this:
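As a rough Python equivalent, re.findall returns every match of a pattern, which is close to what Tokenize does when it splits to rows. The sample text, the use of straight quotation marks and the helper name tokenize_urls are assumptions for this sketch. The character class [^",]+ means "one or more characters that are neither a quotation mark nor a comma".

```python
import re


def tokenize_urls(text):
    """Return every run of characters that contains no quotation mark or comma."""
    return re.findall(r'[^",]+', text)


# Hypothetical sample: quoted URLs separated by commas, with no extra spaces.
sample = '"https://regexone.com/","https://regexr.com/","https://regex101.com/"'

for url in tokenize_urls(sample):
    print(url)  # one URL per line, like one URL per row
```

If the real text had spaces after the commas, those spaces would come back as tokens of their own, so the pattern would need to exclude whitespace as well.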

4. Match

Match returns a Boolean value indicating whether the whole expression matches the value or not. This output method is pretty easy to understand. The first three rows in the left column return “True” for the expression:

\d\.\s+abc.

But the fourth row returns "False", because the expression requires one or more whitespace characters (\s+).
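Here is a small Python sketch of the same check. The four sample values are hypothetical, since the original rows are only visible in the screenshot. The expression is copied as printed, so the final dot is treated as "any single character", and using re.fullmatch is my assumption about how strictly the value has to match.

```python
import re

# Expression as printed in the post: a digit, a literal dot, one or more
# whitespace characters, "abc", then any single character.
PATTERN = re.compile(r"\d\.\s+abc.")


def is_match(value):
    """Return True only if the whole value fits the expression."""
    return PATTERN.fullmatch(value) is not None


# Hypothetical rows: the last one has no whitespace after the dot.
samples = ["1. abcd", "2.   abcd", "3.\tabcd", "4.abcd"]

for value in samples:
    print(repr(value), is_match(value))
```

Only the last sample fails, because there is nothing between the dot and "abc" for \s+ to match.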

The tricky part is writing an expression precise enough to return only the results you need and skip the others.

Start the fun

Does that sound good to you? If you want to have more fun with RegEx, go to https://regexone.com/ and do some exercises, which range from basic to advanced levels. If you want to test an expression before you run it in Alteryx, there are several websites where you can enter your text and try the expression out to see how it matches, for instance https://regexr.com/ and https://regex101.com/.

Have fun!

Author:

Nhung Le
