What is RegEx?
RegEx (Regular Expression) is a sequence of characters that defines a search pattern to be matched against text. The pattern can describe words, numbers, symbols, punctuation, Unicode characters, etc. It is used for extracting, replacing, and matching text.
For example, a simple . finds any character except a new line in RegEx.
Notice how everything except for the new line in the middle of the two sentences is highlighted.
![](https://www.thedataschool.co.uk/content/images/wordpress/2019/05/image-12.png)
\w finds any word character: a letter, a digit, or an underscore
![](https://www.thedataschool.co.uk/content/images/wordpress/2019/05/image-13.png)
\d finds digits
![](https://www.thedataschool.co.uk/content/images/wordpress/2019/05/image-14.png)
\s finds whitespace: spaces, tabs, and line breaks
![](https://www.thedataschool.co.uk/content/images/wordpress/2019/05/image-15.png)
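The four patterns above can also be tried in code. Here is a minimal sketch using Python's built-in `re` module; the sample text is made up for illustration, and other RegEx engines may treat a few edge cases differently.

```python
import re

# Made-up sample text with a line break in the middle
text = "Order 66\nwas sent."

# .  matches any character except a new line
dot_matches = re.findall(r".", text)

# \w matches word characters: letters, digits, and underscore
word_matches = re.findall(r"\w", text)

# \d matches digits
digit_matches = re.findall(r"\d", text)

# \s matches whitespace, which includes the line break
space_matches = re.findall(r"\s", text)

print(len(text), len(dot_matches))  # the dot skips exactly one character
print(digit_matches)
print(space_matches)
```

Note that the line break is missing from the `.` matches but present in the `\s` matches.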
Why is it useful?
RegEx is used in the software world, largely by programmers and analysts. It can solve a wide range of problems and will become your best friend if you are working with any of the following:
- free input text fields
- data validation
- inconsistent data
- web scraping
It also comes in handy when you need to extract information from a text field, such as email addresses, postcodes, various date formats, names, and ID numbers.
Here’s an example:
![](https://www.thedataschool.co.uk/content/images/wordpress/2019/05/image-16.png)
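The syntax \w+@\w+.\w+ successfully matched the email address above. It reads: find one or more word characters followed by @, followed by one or more word characters, a . and one or more word characters. Strictly speaking, an unescaped . matches any character, not only a full stop; to match a literal dot, write \. instead.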
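To try the email pattern outside a RegEx tester, here is a small Python sketch using the built-in `re` module. The sample strings and variable names are invented for illustration, and the escaped variant is only a slightly stricter version of the same idea, not a complete email validator.

```python
import re

# Pattern as written in the post: the dot is unescaped, so it matches any character
loose_pattern = r"\w+@\w+.\w+"

# Variant with an escaped dot, which only matches a literal "."
strict_pattern = r"\w+@\w+\.\w+"

# Made-up sample sentence
text = "Contact us at jane@example.com for details."

loose = re.search(loose_pattern, text)
strict = re.search(strict_pattern, text)
print(loose.group())   # jane@example.com
print(strict.group())  # jane@example.com

# A string with no dot after the @ still satisfies the loose pattern
no_dot = "jane@example_com"
print(re.search(loose_pattern, no_dot) is not None)  # True
print(re.search(strict_pattern, no_dot) is None)     # True
```

Both patterns find the address in the sample sentence; only the escaped version rejects the string that has no dot after the @.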
![](https://www.thedataschool.co.uk/content/images/wordpress/2019/05/image-17.png)
To find a postcode, I included the \d pattern which finds digits.
\w+\d+\s\d\w+ reads: find one or more word characters followed by one or more digits, followed by a whitespace character, another digit and one or more word characters.
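The postcode pattern can be checked the same way. This is a sketch with Python's built-in `re` module; "AB12 3CD" is a made-up postcode used only as sample data. It also shows one limitation worth knowing: because the pattern expects a digit immediately before the space, a postcode whose first half ends in a letter is not matched.

```python
import re

postcode_pattern = r"\w+\d+\s\d\w+"

# Made-up sample sentence containing a made-up postcode
text = "Ship to AB12 3CD today."

match = re.search(postcode_pattern, text)
print(match.group())  # AB12 3CD

# The first half here ends in a letter, so \d+\s cannot match
letter_ending = re.search(postcode_pattern, "SW1A 1AA")
print(letter_ending)  # None
```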
Note: Overusing RegEx is a great way to make your co-workers very angry with you. Do not use it when a simpler alternative exists that does not require complex syntax (e.g. a text field that can be split into two columns on a common delimiter).
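As a rough illustration of the simpler route, here is how a delimiter split might look in Python with no RegEx at all. The record and the column names are hypothetical.

```python
# Hypothetical record with a consistent ", " delimiter
record = "Smith, Jane"

# A plain string split is enough; no pattern matching required
last_name, first_name = record.split(", ")

print(first_name, last_name)  # Jane Smith
```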
How to learn it?
As Craig Dewar from The Data School Australia puts it: ‘Learning Regex is a bit like learning a second language – except you don’t speak it – you just think it.’
Luckily, there are numerous resources online, so you do not need to learn the RegEx syntax by heart. You can find a full reference ‘cheatsheet’ listing all of it on www.regexr.com.
![](https://www.thedataschool.co.uk/content/images/wordpress/2019/05/image-11-530x1024.png)
This great website also allows you to write and test your RegEx. Another one is www.regexone.com which provides bite-size lessons and practice examples.
If you have any questions about RegEx, or would like to chat about it in general, do not hesitate to get in touch on Twitter @nataliatamiteva.