Regex Day

Today's training was about Regex, and it was the first time I had heard of it. So what is Regex?

Regex is short for regular expression. A regex is a string of text that defines a pattern you can use to match, parse, and manage text. We used Regex in Alteryx and did some exercises, so I am going to show how it works in Alteryx with a few simple expressions.

There are four tasks in the picture. The first is to remove any spaces from the phone number. The Results window shows the original data, and we can see that there are spaces in the number column. In the Configuration pane, I put \s in the Regular Expression box, set the Output Method to Replace, and left the Replacement Text box empty. In Regex, \s matches any whitespace character (a space, a tab, and so on), so every space is replaced with nothing. As a result, the output shows the numbers without spaces, as below:
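For readers who want to try the same idea outside Alteryx, here is a minimal sketch in Python. The phone numbers are made-up sample values (the original data is not reproduced in this post), and Python's re module stands in for the Alteryx RegEx tool.

```python
import re

# Hypothetical sample values; the original data is not shown in the post.
phone_numbers = ["020 7946 0123", "0161 496 0000"]

# \s matches any whitespace character. Replacing each match with an
# empty string mirrors leaving the Replacement Text box empty.
cleaned = [re.sub(r"\s", "", number) for number in phone_numbers]

print(cleaned)
```

Each number comes back with its digits unchanged and the spaces removed.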

The second task is to keep only the time in the time field by removing AM or PM. In Regex, | means "or". In the Configuration pane, the Column is time, the Regular Expression is AM|PM, and the Output Method is Replace with an empty Replacement Text box, as above.

The result is shown below:
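The same alternation can be sketched in Python. The time values here are hypothetical, and the second pattern is my own variation rather than something from the training: it also removes the space in front of AM or PM, which the plain AM|PM expression leaves behind.

```python
import re

# Hypothetical sample values; the original data is not shown in the post.
times = ["09:30 AM", "11:45 PM"]

# AM|PM matches either "AM" or "PM"; each match is replaced with nothing.
stripped = [re.sub(r"AM|PM", "", value) for value in times]

# Variation (an assumption, not from the exercise): \s* also consumes
# any whitespace that sits before the AM/PM marker.
trimmed = [re.sub(r"\s*(?:AM|PM)", "", value) for value in times]

print(stripped)
print(trimmed)
```

With these sample values the first list keeps a trailing space after each time, while the second does not.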

The third task is to create a column for each part of the product ID. The product ID is a long string that mixes numbers and -, so we need to split it at each -. Regex works a bit differently from the Alteryx Text To Columns tool. In Text To Columns we split text on delimiters, so we would put - in the delimiter box. In Regex we write down only what we want to keep, so we leave - out of the expression. For this task we don't use Match or Replace; instead we choose Tokenize, which behaves a bit like Text To Columns.

\w in Regex matches any letter, digit, or underscore (_), and + means one or more. Hence, \w+ matches each run of one or more of these characters, and Tokenize puts each match in its own column, giving five columns here. The result is shown underneath.
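A rough Python equivalent of Tokenize is re.findall, which returns every non-overlapping match of the pattern. The product ID below is a made-up example with five parts; the real IDs from the exercise are not reproduced here.

```python
import re

# Hypothetical product ID with five parts separated by hyphens.
product_id = "AB-123-CD-456-EF"

# \w+ matches each run of letters, digits, or underscores. The hyphens
# are never matched, so they act as separators without being named.
tokens = re.findall(r"\w+", product_id)

print(tokens)
```

Each item in the list corresponds to one of the columns that Tokenize would create.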

The final challenge is to extract the initial and the surname from the email addresses. The email pattern is fairly simple, just an initial followed by a surname, and every address follows the same pattern, so we can use capture groups. In Alteryx, \l matches a lowercase letter, and ( ) marks a capture group; the same capture is applied to every record. Since the captured values go into new columns, we use Parse as the Output Method this time.

The result is shown below:
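Here is a hedged Python sketch of the capture-group idea. The exact expression used in the exercise is not given above, so the pattern and the email addresses are assumptions: the addresses are taken to be a lowercase initial followed directly by a lowercase surname before the @. Python's re module does not support \l, so [a-z] is used in its place.

```python
import re

# Hypothetical addresses: initial + surname, all lowercase, before the @.
emails = ["jsmith@example.com", "lchen@example.com"]

# Group 1 captures the single-letter initial, group 2 the surname.
# [a-z] stands in for \l, which Python's re module does not provide.
pattern = re.compile(r"^([a-z])([a-z]+)@")

parsed = []
for email in emails:
    match = pattern.match(email)
    # Keep a row even when the address does not fit the assumed shape.
    parsed.append(match.groups() if match else (None, None))

print(parsed)
```

Each tuple holds the two captured values, in the same way that Parse writes each capture group to its own new column.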

Regex is very useful, especially when we deal with large amounts of text. Hopefully, these simple examples give you some ideas about Regex.

Author:
Longyan Chen
Powered by The Information Lab