Today Philip explained how to proficiently use Regular Expressions in both Tableau and Alteryx for extracting and restructuring data. It is possible to use Regular Expressions in Alteryx with both the Formula and the Regex tools.
In Tableau
Regex formula is available in Tableau with 3 different formulas:
Regexp_extract (String, pattern) –> returns the part matching
In this case the formula looks at the characters between a-d followed by a space and a digit.
Regexp_match (String, pattern) –> if it is True it matches
In this case the formula looks for matches for all characters starting with a capital letter.
Regexp_replace (String, pattern, replacement) –> replace everything that matches
In this case the formula replaces the space with a dash.
In Alteryx
With the Formula tool is possible to use only three different options:
Regex count matches (string, pattern) –> gives a number as output
Regex match –> true or false value in the -1 or 0 output
Regex replace (string, pattern, replace) –> the output is a string
Using the Regex tool in Alteryx gives much more freedom compared to the formula tool:
Replace –> replace everything matched
Match –> Boolean statement
Parse –> extract just what wanted
Tokenize –> returns the value separated in either columns or rows
By looking at the Regular expressions in the drop-down, it is possible to have a basic overview of the combinations to be used in the Regex tool.
In addition to this there are a couple more to keep in mind.
\b –> border between alphanumeric characters and punctuation
– –> used for ranges
| –> or, either
? –> returns zero or one
.* –> returns everything, 1 or more
^ –> not in the set
{ } –> quantifier. First “(\w{3})” Inclusive ” (\w{2,5})” At least “(\w{6,})”
\w+ –> all alphanumeric characters
^\w+ –> word at the start of the string “^(\w+)”
\w+$ –> word at the end of the string “\w+$”
A very good website for exercises, examples or just practice is RegeExr .