Today at The Data School, I learned Regular Expressions (Regex) for the first time, and the concept was entirely new to me. As a data professional, I was excited and curious to explore such a powerful tool. The underlying principles were challenging to grasp at first, but after theoretical discussions and hands-on practice I began to understand its basic building blocks.
In this blog, I aim to reflect on my learning with Regex and share some of the valuable expressions I discovered along the way.
Regex, short for Regular Expression, is a pattern-matching tool used in data processing to search for, extract, and manipulate specific sequences of characters within text. It is more versatile than a plain text search and is supported, with minor syntax differences, across many programming and data analytics platforms such as Alteryx, Tableau, Python, and R.
Here are some of the major expressions I learned today to find specific patterns of characters within the text.
Qualifier Expressions
Qualifiers, more commonly called character classes, specify which kind of character to match at a given position in the text.
| Qualifiers | Represents |
| --- | --- |
| . | Wildcard (any character, usually excluding a newline) |
| \w | Alphanumeric character or _ |
| \d | Digit (0-9) |
| \s | Whitespace (space, tab, newline) |
| \W | Not alphanumeric or _ |
| \D | Not a digit |
| \S | Not whitespace |
| [A-Z] | Uppercase letters from A to Z |
| [a-z] | Lowercase letters from a to z |
| [0-9] | Digits from 0 to 9 |
| [abc] | a or b or c (any one character from the set) |
| [^abc] | Any one character not in the set |
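To see a few of these in action, here is a minimal sketch in Python using the standard `re` module. The choice of Python and the variable names are my own assumptions for illustration; the same patterns should behave similarly in other tools, though details can differ.

```python
import re

# Sample sentence used for illustration
text = "I learned Regex in The Data School on July 24."

# \d matches a single digit, so each digit is returned separately
digits = re.findall(r"\d", text)

# [A-Z] matches a single uppercase letter
capitals = re.findall(r"[A-Z]", text)

# \W matches anything that is not alphanumeric or _ (here: spaces and the period)
non_word = re.findall(r"\W", text)

# [^a-z\s] matches any one character that is neither a lowercase letter nor whitespace
not_lower = re.findall(r"[^a-z\s]", text)

print(digits)     # ['2', '4']
print(capitals)   # ['I', 'R', 'T', 'D', 'S', 'J']
print(len(non_word))  # 10
print(not_lower)  # ['I', 'R', 'T', 'D', 'S', 'J', '2', '4', '.']
```

Note that each of these patterns matches exactly one character at a time, which is why "24" comes back as two separate digits.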
Quantifier Expressions
Quantifiers specify how many times the preceding character or pattern may occur in the text.
| Quantifiers | Represents |
| --- | --- |
| + | One or more |
| * | Zero or more |
| ? | Zero or one (when used with Qualifiers) |
| ? | As few as possible (when used after another Quantifier) |
| {x} | Exactly x times |
| {x,y} | Between x and y times |
| {x,} | At least x times |
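The difference between these quantifiers is easier to see with a small experiment. The following is a rough sketch in Python's `re` module (my own choice of language and variable names, not something covered in the session), showing each quantifier on a short sentence, including the greedy versus "as few as possible" behaviour of `?`.

```python
import re

text = "I learned Regex in The Data School on July 24."

# + : one or more digits in a row
number = re.findall(r"\d+", text)

# * : a capital letter followed by zero or more word characters
capitalised = re.findall(r"[A-Z]\w*", text)

# ? : the trailing "s" is optional (zero or one)
optional_s = re.findall(r"Schools?", "School and Schools")

# {x} : exactly two "o" characters in a row
double_o = re.findall(r"o{2}", text)

# {x,} : runs of at least five word characters
long_words = re.findall(r"\w{5,}", text)

# Greedy (.+) grabs as much as possible; lazy (.+?) grabs as little as possible
greedy = re.search(r"T.+\s", text).group()
lazy = re.search(r"T.+?\s", text).group()

print(number)       # ['24']
print(capitalised)  # ['I', 'Regex', 'The', 'Data', 'School', 'July']
print(optional_s)   # ['School', 'Schools']
print(double_o)     # ['oo']
print(long_words)   # ['learned', 'Regex', 'School']
print(repr(greedy)) # 'The Data School on July '
print(repr(lazy))   # 'The '
```

The greedy pattern runs all the way to the last whitespace in the sentence, while the lazy one stops at the first whitespace it can reach.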
Others
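| Others | Represents |
| --- | --- |
| \| | OR operator |
| ^ | Start anchor |
| $ | End anchor |
| (…) | Capture group |
| \ | Escape character (matches the following special character literally, e.g. \. for a period) |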
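As a quick illustration of these remaining symbols, here is a small sketch using Python's `re` module. The language and the variable names are assumptions on my part, chosen only to make the behaviour visible.

```python
import re

text = "I learned Regex in The Data School on July 24."

# | : match either alternative
month = re.findall(r"July|June", text)

# ^ : the pattern must match at the very start of the text
starts_with_i = re.search(r"^I", text) is not None
starts_with_regex = re.search(r"^Regex", text) is not None

# \. is an escaped dot (a literal period); $ anchors it to the end of the text
ends_with_period = re.search(r"\.$", text) is not None

# (...) : capture groups pull out parts of the match separately
m = re.search(r"(\w+)\s(\d+)\.$", text)
month_name, day = m.group(1), m.group(2)

print(month)              # ['July']
print(starts_with_i)      # True
print(starts_with_regex)  # False
print(ends_with_period)   # True
print(month_name, day)    # July 24
```

Without the backslash, the dot in the last two patterns would act as a wildcard rather than a literal period.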
Let's see some examples based on the following text:
I learned Regex in The Data School on July 24.
- Expression to capture the text "Regex" is [A-Z]\w.+x
- Expression to capture the number "24" is \d\d
- Expression to capture the text "July" is (J\w.+)\s
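The three expressions above can be checked with a short script. This is a minimal sketch assuming Python's `re` module (other tools such as Alteryx may differ slightly), and the two "tighter" patterns at the end are my own suggested alternatives rather than something from the session.

```python
import re

text = "I learned Regex in The Data School on July 24."

# The three expressions from the examples above
regex_word = re.search(r"[A-Z]\w.+x", text).group()
day = re.search(r"\d\d", text).group()
month = re.search(r"(J\w.+)\s", text).group(1)  # group(1) excludes the trailing space

print(regex_word)  # Regex
print(day)         # 24
print(month)       # July

# Hypothetical tighter alternatives that avoid the greedy wildcard
regex_word_tight = re.search(r"R\w+", text).group()
month_tight = re.search(r"J\w+", text).group()

print(regex_word_tight)  # Regex
print(month_tight)       # July
```

The original patterns work on this sentence because `.+` is greedy and then backtracks: to the only "x" in the first case, and to the last space in the third. On a longer text they could capture more than intended, which is why restricting the match to word characters with `\w+` may be safer.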