RegEx - An Introduction and Cheat Sheet

What is Regex?

  • Regular Expressions
  • a way to specify part of a string that matches something
  • a way to check if a string contains something

Why Regex?

  • More versatile than basic string functions
  • Universal - available in Tableau & Alteryx
  • It is used for finding, removing or replacing parts of a string.
  • Example: \w{3} - 3 word-like characters - in “My cat is called Bob” it matches every 3 letters next to each other: ‘cat’, ‘cal’, ‘led’, ‘Bob’
  • String processing
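
The \w{3} behaviour can be tried outside Tableau or Alteryx too. The sketch below uses Python's re module as a stand-in (an assumption, not the tool this post is about) to list every non-overlapping run of 3 word characters.

```python
import re

sentence = "My cat is called Bob"

# findall scans left to right and returns non-overlapping matches.
# "My" and "is" are too short; "called" splits into two runs of 3.
three_char_runs = re.findall(r"\w{3}", sentence)

print(three_char_runs)  # ['cat', 'cal', 'led', 'Bob']
```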

How to work well with Regex?

  • Be vague enough to capture everything you want
  • Build up expressions for trial and testing
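
One way to build up an expression is to add a piece at a time and look at what it matches after each step. The sketch below does this in Python (a stand-in for the tools above) for a postcode-style pattern. The sample text is hypothetical, and [A-Z] is used for "uppercase letter" because Python's re module has no \u class.

```python
import re

sample = "Office: EC4M 9BR, London"  # hypothetical sample text

# Grow the pattern one piece at a time and check the match after each step.
steps = [
    r"[A-Z]",                              # one capital letter
    r"[A-Z][A-Z]?",                        # optional second capital
    r"[A-Z][A-Z]?\d",                      # then a digit
    r"[A-Z][A-Z]?\d[A-Z]?",                # optional trailing capital
    r"[A-Z][A-Z]?\d[A-Z]?\s\d[A-Z][A-Z]",  # space, digit, two capitals
]

for pattern in steps:
    match = re.search(pattern, sample)
    print(pattern, "->", match.group(0) if match else None)

final_match = re.search(steps[-1], sample).group(0)
```

The early steps match the "O" of "Office", which is a useful reminder that a vague pattern catches more than intended until the rest of it is in place.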

Quantifiers - how many of something?

Special Characters:

.[]{}()\*+?|^$
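
Because these characters have a special meaning, matching one of them literally needs a backslash in front of it. A minimal sketch in Python's re module (used here as a stand-in, with made-up sample strings):

```python
import re

price = "Total: 3.50 (approx)"  # hypothetical sample text

# Unescaped, "." means "any character", so the pattern 3.50 also matches "3x50".
loose = re.search(r"3.50", "3x50") is not None

# Escaped, "\." matches only a literal full stop.
strict = re.search(r"3\.50", "3x50") is not None
literal = re.search(r"3\.50", price).group(0)

# Brackets need the same treatment to be matched literally.
bracketed = re.search(r"\(approx\)", price).group(0)

print(loose, strict, literal, bracketed)
```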

Character Classes:

. = any single character

\w = word character (letter, digit or underscore)

\d = digit

\s = whitespace
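
The four classes can be compared side by side on one short string. This is a sketch in Python's re module, used as a stand-in, and the sample text is made up for illustration.

```python
import re

text = "Bob, born 10th May"  # hypothetical sample text

any_chars = re.findall(r".", text)    # every character, one at a time
word_chars = re.findall(r"\w", text)  # letters, digits and underscore only
digits = re.findall(r"\d", text)      # digits only
spaces = re.findall(r"\s", text)      # whitespace only

print(len(any_chars), len(word_chars), digits, len(spaces))
```

The comma is counted by "." but not by \w, which is why the first two counts differ by more than the number of spaces.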

Formula:

REGEX_CountMatches(String, Pattern)

How many times does this pattern show up in this string? Output is a number.
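
The counting idea can be sketched in Python. The count_matches helper below is hypothetical and only mirrors the shape of the formula. Python matching is case sensitive unless told otherwise, which may differ from the tool's default.

```python
import re

def count_matches(string, pattern):
    """Hypothetical helper: count non-overlapping matches of pattern in string."""
    return sum(1 for _ in re.finditer(pattern, string))

reference = "My cat is called Bob and born 10th May. I have Many cats."

m_words = count_matches(reference, r"M\w*y")  # My, May, Many
cat_hits = count_matches(reference, r"cat")   # "cat" and the start of "cats"
digit_hits = count_matches(reference, r"\d")  # the 1 and the 0 in "10th"

print(m_words, cat_hits, digit_hits)  # 3 2 2
```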

REGEX_Replace(String, Pattern, Replace)

In this string, find this pattern and replace it with this string. Output is the string with the replacements made.
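
A find-and-replace of this kind looks like the following in Python, where re.sub plays the same role. The regex_replace wrapper is hypothetical and exists only to keep the argument order of the formula above.

```python
import re

def regex_replace(string, pattern, replace):
    """Hypothetical wrapper around re.sub with the formula's argument order."""
    return re.sub(pattern, replace, string)

masked = regex_replace("born 10th May", r"\d", "#")  # every digit becomes '#'
trimmed = regex_replace("My   cat", r"\s+", " ")     # runs of whitespace collapse

print(masked)   # born ##th May
print(trimmed)  # My cat
```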

REGEX_Match(String, Pattern)

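Does this string match this pattern? The whole string has to match, so add .* to either end of the pattern to test whether the string merely contains it. Output is -1 or 0 (True or False).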
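
Python keeps two related questions apart: re.search asks whether the pattern occurs anywhere in the string, while re.fullmatch asks whether the entire string fits the pattern. The sketch below uses Python as a stand-in. Which of the two a given tool's match function implements is worth checking against its documentation.

```python
import re

text = "My cat is called Bob"

# Found somewhere inside the string.
contains_bob = re.search(r"Bob", text) is not None

# The entire string would have to be exactly "Bob".
whole_is_bob = re.fullmatch(r"Bob", text) is not None

# Padding the pattern with .* turns a whole-string test into a "contains" test.
whole_with_wildcards = re.fullmatch(r".*Bob.*", text) is not None

print(contains_bob, whole_is_bob, whole_with_wildcards)  # True False True
```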

Examples:

"My cat is called Bob and born 10th May. I have Many cats." - For reference

  1. “.” - any single character - every character is matched separately
  2. “.*” - zero or more of any character - the sentence as a whole
  3. “.+” - one or more of any character - the sentence as a whole
  4. “\w” - one alphanumeric character or ‘_’ - matches every letter and digit separately, but not whitespace or punctuation
  5. “\w+” - one or more word characters - each word (greedy)
  6. “\w{3}” - exactly 3 word-like characters in a row - in “My cat is called Bob” it matches ‘cat’, ‘cal’, ‘led’, ‘Bob’
  7. “B\w+” - words beginning with B
  8. “[A-Z]\w+” - [ ] holds a list of characters to look for, here any capital from A to Z - words beginning with a capital letter
  9. “[A-Z]*\w+” - words beginning with 0 or more capitals
  10. “^\w+” - ^ anchors to the start of the string - a word at the very start
  11. “\w+$” - $ anchors to the end of the string - a word at the very end
  12. “My” - the word “My”
  13. “M.y” - M, exactly 1 of any character, then y - e.g. ‘May’
  14. “M..y” - M, exactly 2 of any character, then y - e.g. ‘Many’
  15. “M.+y” - M, 1 or more of any character, then y (greedy) - runs from the first M to the last y, so it takes almost the whole sentence
  16. “M.+?y” - ‘?’ makes it not greedy - M, 1 or more of any character, then the nearest y - the first match stops at ‘May’
  17. “M\w*y” - M, 0 or more alphanumerics, y (greedy) - My, May, Many
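  18. “( )” - group - treats the part of the pattern inside the brackets as a single unit; quantifiers inside it are still greedy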
  19. “\s” - whitespace - matches one whitespace character at a time, so every space is matched separately
  20. “\d” - digit - 0-9
  21. “\d{2}” - exactly 2 digits in a row - e.g. the ‘10’ in ‘10th’
  22. “\u” - uppercase letter
  23. “\u\u?\d\u?\s\d\u\u” - ‘?’ makes the character before it optional - e.g. postcodes (because their format is unique)
  24. “\n” - New Line
  25. “.+” - one or more of anything (any character)
  26. “\S” - NOT whitespace - any single character that is not whitespace
  27. “[^s]” - any single character that is not an ‘s’
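
The difference between greedy and non-greedy matching is easiest to see by running the three M...y patterns against the reference sentence. A minimal sketch in Python's re module, used as a stand-in for the tools above:

```python
import re

reference = "My cat is called Bob and born 10th May. I have Many cats."

# Greedy: .+ grabs as much as it can, so the match runs to the last y.
greedy = re.findall(r"M.+y", reference)

# Not greedy: .+? stops at the nearest y that still lets the pattern match.
lazy = re.findall(r"M.+?y", reference)

# \w* cannot cross a space, so each match stays inside one word.
words = re.findall(r"M\w*y", reference)

print(greedy)
print(lazy)
print(words)  # ['My', 'May', 'Many']
```

The non-greedy version does not return ‘My’ on its own, because .+? still needs at least one character between the M and the y.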

Author: Sherina Mahtani