RegEx Explained

What is a Regular Expression?

A regular expression is a pattern that describes a set of strings.

It's like a secret code for matching and manipulating text. Whether you want to validate an email address, extract phone numbers from a document, or replace all occurrences of a word, regex is your go-to tool.
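To make those tasks concrete, here is a minimal sketch using Python's built-in re module. The sample text is made up, and the phone format (NNN-NNN-NNNN) is an assumption chosen for illustration. The \b (word boundary) token is not in the cheat sheet below; it is used here so that only the whole word "cat" is replaced.

```python
import re

text = "Call 555-123-4567 or 555-987-6543. The cat sat near another cat."

# Extract: find every substring shaped like NNN-NNN-NNNN (an assumed phone format).
phones = re.findall(r"\d{3}-\d{3}-\d{4}", text)
print(phones)  # ['555-123-4567', '555-987-6543']

# Replace: swap every whole-word "cat" for "dog"; \b marks a word boundary.
replaced = re.sub(r"\bcat\b", "dog", text)
print(replaced)  # Call 555-123-4567 or 555-987-6543. The dog sat near another dog.
```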

Example: a simplified regex for matching email addresses: ([\w\.]+@[\w\.]+)
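Here is a short Python sketch of that email pattern in use, with made-up sample text. It also shows the pattern's main limitation: it is deliberately loose, so it accepts strings such as "a@b" that are not real addresses. Treat it as a quick extraction tool rather than a strict validator.

```python
import re

# The simplified email pattern from above, written as a Python raw string.
EMAIL = r"([\w\.]+@[\w\.]+)"

text = "Contact jane.doe@example.com or support@test.org for help."
emails = re.findall(EMAIL, text)
print(emails)  # ['jane.doe@example.com', 'support@test.org']

# The pattern is permissive: a string with no dot in the domain still matches.
print(bool(re.fullmatch(EMAIL, "a@b")))  # True
```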

What’s so great about RegEx?

Regular expressions are helpful for efficiently finding and extracting specific information from large text datasets.

They can find a needle in a haystack automatically, so people no longer have to search through text by hand, making it possible for organizations to allocate their human resources more effectively.

They are also portable: the same patterns work, with minor syntax differences between regex flavors, in programming languages such as JavaScript, Python, and R, and in tools such as Alteryx and Tableau.

Additionally, a single regular expression can often replace many lines of hand-written string-processing code, which can make text-handling logic quicker to write, test, and debug.

RegEx Cheat Sheet

Character Classes (Specify which type of character must be present at a given position in the input for a match to be found)

. matches any single character except a newline (wildcard)

\w matches any word character (letters, digits, and the underscore)

\d matches any digit character.

\s matches any whitespace character, such as a space, tab, or newline.

\W matches any non-word character.

\D matches any non-digit character.

\S matches any non-whitespace character.

[A-Z] Matches any single uppercase letter from A to Z

[a-z] Matches any single lowercase letter from a to z

[0-9] Matches any single digit from 0 to 9

[abc] Matches a, b, or c (any one character from the set)

[^abc] Matches any single character except a, b, or c (negated set)
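A minimal Python sketch of the character classes above, using the built-in re.findall, which returns every non-overlapping match as a list. The sample strings are made up for illustration.

```python
import re

sample = "Room 42B, floor 7\tcode: x_9!"

digits = re.findall(r"\d", sample)              # \d: each digit character
words = re.findall(r"\w+", sample)              # \w+: runs of letters, digits, underscores
spaces = re.findall(r"\s", sample)              # \s: the spaces and the tab
upper = re.findall(r"[A-Z]", sample)            # [A-Z]: uppercase letters only
in_set = re.findall(r"[abc]", "cabbage")        # [abc]: any one of a, b, c
not_in_set = re.findall(r"[^abc]", "cabbage")   # [^abc]: anything except a, b, c
wildcard = re.findall(r"b.t", "bat bit b t but")  # .: any single character
no_newline = re.findall(r"b.t", "b\nt")         # . skips newlines by default

print(digits)      # ['4', '2', '7', '9']
print(words)       # ['Room', '42B', 'floor', '7', 'code', 'x_9']
print(upper)       # ['R', 'B']
print(not_in_set)  # ['g', 'e']
print(wildcard)    # ['bat', 'bit', 'b t', 'but']
print(no_newline)  # []
```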

Quantifiers (Specify how many instances of a character, group, or character class must be present in the input for a match to be found)

+ Matches one or more occurrences of the preceding element (a character, class, or group)

* Matches zero or more occurrences of the preceding element

? Matches zero or one occurrence of the preceding element (optional)

{x} Matches the preceding element exactly x times

{x,y} Matches the preceding element at least x and at most y times

{x,} Matches the preceding element x or more times
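The quantifiers can be compared side by side with a small Python sketch. The test strings are made up for illustration; re.fullmatch is used for the brace quantifiers so that each pattern must cover the whole string rather than just part of it.

```python
import re

text = "color colour colouur"

optional = re.findall(r"colou?r", text)      # ? : zero or one "u"
one_or_more = re.findall(r"colou+r", text)   # + : at least one "u"
zero_or_more = re.findall(r"colou*r", text)  # * : any number of "u"

nums = ["1", "12", "123", "1234"]
exactly_3 = [n for n in nums if re.fullmatch(r"\d{3}", n)]    # {3}: exactly 3 digits
two_to_3 = [n for n in nums if re.fullmatch(r"\d{2,3}", n)]   # {2,3}: 2 or 3 digits
at_least_2 = [n for n in nums if re.fullmatch(r"\d{2,}", n)]  # {2,}: 2 or more digits

print(optional)      # ['color', 'colour']
print(one_or_more)   # ['colour', 'colouur']
print(zero_or_more)  # ['color', 'colour', 'colouur']
print(exactly_3, two_to_3, at_least_2)  # ['123'] ['12', '123'] ['12', '123', '1234']
```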

Note:

+, *, ?, and {x,y} are greedy: they match as much of the string as possible

+?, *?, ??, and {x,y}? are lazy (non-greedy): they match as little as possible
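The difference is easiest to see on text with several closing characters. In this Python sketch (the HTML-like string is made up for illustration), the greedy .+ runs to the last ">" in the string, while the lazy .+? stops at the first one it can.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", html)  # .+ grabs as much as it can
lazy = re.findall(r"<.+?>", html)   # .+? stops at the first possible ">"

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```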

Others

| Matches either the expression on its left or the one on its right (OR operator)

^ Match must occur at the start of the string (start anchor); inside brackets, as in [^abc], it instead negates the set

$ Match must occur at the end of the string (end anchor)

(…) Groups part of a pattern into a single unit, so that a quantifier such as + applies to the whole group; the matched text is also saved for later use (capture group)

\ Escapes a special character so it is matched literally, e.g. \+ matches an actual ‘+’ and \. matches an actual ‘.’ (escaped character)
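A short Python sketch of these remaining symbols, with made-up sample strings. The five-digit check is only an illustration of anchoring, not a complete postal-code validator; match.group(n) returns the text captured by the n-th group.

```python
import re

# | alternation: either "cat" or "dog"
pets = re.findall(r"cat|dog", "cat, dog, bird")

# ^ and $ anchors: the whole string must be exactly five digits
is_five = bool(re.search(r"^\d{5}$", "90210"))
not_five = bool(re.search(r"^\d{5}$", "90210-1234"))

# (...) grouping: + repeats the whole "ha" unit
laugh = re.fullmatch(r"(ha)+!", "hahaha!")

# (...) capturing: pull out the two numbers of a page range
m = re.search(r"(\d+)-(\d+)", "pages 10-25")
first, last = m.group(1), m.group(2)

# \ escape: \. matches only a literal dot, while . matches any character
literal = re.findall(r"3\.14", "3.14 and 3x14")
loose = re.findall(r"3.14", "3.14 and 3x14")

print(pets)                # ['cat', 'dog']
print(is_five, not_five)   # True False
print(first, last)         # 10 25
print(literal, loose)      # ['3.14'] ['3.14', '3x14']
```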

Author:

Erin Potter
