RegEx: what, why, and how?

by Natalia Miteva

What is RegEx?

RegEx (Regular Expression) is a sequence of characters that defines a pattern to search for in a text field. The pattern can describe words, numbers, symbols, punctuation, Unicode characters, etc. It is used for extracting, replacing, and matching text.

For example, a simple . finds any character except a new line in RegEx.
Notice how everything except for the new line in the middle of the two sentences is highlighted.


\w finds any word character: a letter, a digit, or an underscore


\d finds digits


\s finds whitespace: spaces, tabs, and new lines
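The four patterns above can also be tried outside a RegEx tester. Here is a minimal sketch using Python's built-in re module; the sample strings are made up for illustration.

```python
import re

# Made-up sample text with a new line in the middle
text = "Order 66 shipped.\nTotal: 42 GBP"

# . matches every character except the new line
dot_matches = re.findall(r".", text)
print(len(text) - len(dot_matches))  # 1 (only the new line is skipped)

# \w matches letters, digits and the underscore, but not spaces or hyphens
word_chars = re.findall(r"\w", "a_1 b-2")
print(word_chars)  # ['a', '_', '1', 'b', '2']

# \d matches digits
digits = re.findall(r"\d", text)
print(digits)  # ['6', '6', '4', '2']

# \s matches whitespace: spaces, tabs and new lines
whitespace = re.findall(r"\s", text)
print(whitespace)  # [' ', ' ', '\n', ' ', ' ']
```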

Why is it useful?

RegEx is used in the software world, largely by programmers and analysts. It can solve a wide range of problems and will become your best friend if you are working with any of the following:

  • free input text fields
  • data validation
  • inconsistent data
  • web scraping

It also comes in handy when you need to extract information from a text field, such as email addresses, post codes, various date formats, names, or ID numbers.

Here’s an example:

Extracting an email address with RegEx

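The syntax \w+@\w+\.\w+ matches the email address above. It reads: find one or more word characters followed by @, followed by one or more word characters, a literal . and one or more word characters. The dot is escaped as \. because an unescaped . matches any character, not just a full stop.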
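The email pattern can be tried in Python's built-in re module. The sketch below is only illustrative: the sample sentence and the address are made up, and the dot is written as \. so that it matches a literal full stop only.

```python
import re

# Hypothetical sample text; the address is made up for illustration
text = "Please contact jane_doe@example.com for details."

# \. matches a literal dot; a bare . would match any character
email_pattern = re.compile(r"\w+@\w+\.\w+")

match = email_pattern.search(text)
email = match.group(0) if match else None
print(email)  # jane_doe@example.com

# With an unescaped dot, text that contains no dot at all still matches
loose = re.search(r"\w+@\w+.\w+", "user@host name")
print(loose.group(0))  # user@host name
```

This is a deliberately simple pattern. Because \w does not cover dots or hyphens, an address such as first.last@example.co.uk would only be partly matched.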

Extracting a UK post code with RegEx

To find a postcode, I included the \d pattern which finds digits.

\w+\d+\s\d\w+ reads: find one or more word characters followed by one or more digits, a space, a single digit, and one or more word characters.
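Here is a short Python sketch of the same postcode pattern, using the built-in re module. The address is a made-up example, and the second check shows one limit of such a simple pattern.

```python
import re

# Hypothetical address used only for illustration
text = "Send it to 10 Example Road, Manchester M1 1AE."

postcode_pattern = re.compile(r"\w+\d+\s\d\w+")

match = postcode_pattern.search(text)
postcode = match.group(0) if match else None
print(postcode)  # M1 1AE

# A postcode whose first half ends in a letter is not matched,
# because the pattern expects a digit right before the space
print(postcode_pattern.search("SW1A 1AA"))  # None
```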

Note: Overusing RegEx is a great way to make your co-workers very angry with you. Do not use it if there is a simpler alternative that does not require complex syntax (e.g. a text field that can be split into two columns on a common delimiter).
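As an illustration of the simpler route, a field with a consistent delimiter can be split with an ordinary string method. This sketch assumes a made-up "city, country" field; no RegEx is involved.

```python
# Hypothetical "city, country" values with a consistent ", " delimiter
rows = ["London, United Kingdom", "Sydney, Australia"]

# str.split breaks each value at the first ", " into two parts
split_rows = [row.split(", ", maxsplit=1) for row in rows]

cities = [parts[0] for parts in split_rows]
countries = [parts[1] for parts in split_rows]

print(cities)     # ['London', 'Sydney']
print(countries)  # ['United Kingdom', 'Australia']
```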

How to learn it?

As Craig Dewar from The Data School Australia puts it: ‘Learning Regex is a bit like learning a second language – except you don’t speak it – you just think it.’

Luckily, there are numerous resources online, so you do not need to learn the RegEx syntax by heart. You can find a full reference ‘cheatsheet’ listing all of it on www.regexr.com.

RegEx cheatsheet

This great website also allows you to write and test your RegEx. Another useful site is www.regexone.com, which provides bite-size lessons and practice examples.

If you have any questions about RegEx, or would like to chat about it in general, do not hesitate to get in touch on Twitter @nataliatamiteva.
