


For our TrustED Conf 2021 VR World Tour, we heard from Andrew Patterson, a Senior Data and DevOps Engineer with over a decade of experience in software engineering, on how Regular Expressions can be used for powerful pattern matching, including in Search Console and Google Analytics.

What are Regular Expressions?

Regular Expressions are a syntax used to define a search pattern. They are commonly used in “find” and “find and replace” operations. A plain find works with literal text, and regular expressions extend this to let you search for text that matches a pattern instead.
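As a quick illustration, here is a small Python sketch using the standard re module. Python is our choice for these examples, not something from the talk, and the sample sentence is made up. A literal search finds only the exact text "cat", while the pattern c.t finds anything shaped like "c, any one character, t":

```python
import re

text = "The cat sat on the cot next to the cut."

# Literal find: locates only the exact text "cat"
literal_position = text.find("cat")

# Pattern search: "c", then any single character, then "t"
pattern_matches = re.findall(r"c.t", text)

print(literal_position)  # 4
print(pattern_matches)   # ['cat', 'cot', 'cut']
```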

Want to find the email addresses in some text? You could try:

\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b

But there are many ways to do this!
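To show what "many ways" can look like, here is a Python sketch comparing a deliberately loose pattern (\S+@\S+, our own illustrative choice) with the stricter email pattern. The sample text is hypothetical. The loose version also picks up the trailing punctuation, while the stricter one stops at a word boundary after the 2 to 6 letter ending:

```python
import re

text = "Email alice@example.com, or bob@example.org."

# Loose: any run of non-space characters on either side of an @
loose = re.findall(r"\S+@\S+", text)

# Stricter: local part, @, domain, a dot, then 2-6 letters at a word boundary
strict = re.findall(r"\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b", text)

print(loose)   # ['alice@example.com,', 'bob@example.org.']
print(strict)  # ['alice@example.com', 'bob@example.org']
```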

And I care because …

You can do cool stuff with Regular Expressions! Both Google Analytics and Google Search Console use them. You’ll find them in plenty of other tools too:

  • Text editors like Notepad++, EditPad
  • Google Docs, Sheets

They are also found extensively in programming languages.

Here be dragons


Because people seldom agree on anything, there are different versions of regular expressions.

  • Not all RegEx are equal
    • POSIX and Perl
    • Basic and Extended (and Simple)
    • Different tools use different versions, so it’s best to check the reference for the tool you’re using

Regular Expressions (fun)damentals

Syntax

No need to worry about remembering all of this syntax. The key is to understand the concepts. Generally, I look up the reference for the tool I’m using, as there can be differences between them.

[Image: regular expressions syntax]

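  1. . – is a wild card, and will match any single character (except, by default, a newline)
  2. ? – is existential: does it exist or not? It makes the preceding element optional, so colou?r matches both color and colour
  3. * (zero or more) and + (one or more) will eat as much as they can
  4. {n,m} – matches between n and m repetitions. You can omit one of the bounds: {n,} means n or more, and {,m} means an upper bound of m (not every tool supports this last form)
  5. | – has an identity crisis: cat|dog, is it a cat or a dog? It matches either one
  6. ( ) – grouping is interesting because it has two purposes: it can affect the order of operations, and it is also used to extract sections of matching text
  7. [ ] – just wants to hug everyone: it matches any one of the characters inside it, so gr[ae]y matches gray and grey, while a leading ^ inside the brackets, as in [^0-9], negates the set
  8. ^ and $ – anchor the match to the start and end of the text; handy if you know how the pattern you’re looking for starts or ends
  9. \ – escapes any of these special characters so they are treated as literal characters, e.g. \. matches an actual full stop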
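Here is a Python sketch that tries most of the items in the list against a made-up sample string each. It uses re.search, so a pattern only has to match somewhere in the text unless it is anchored with ^ and $. Other tools may differ in the details, as noted above.

```python
import re

# Each entry: (pattern, sample text, what the pattern illustrates)
EXAMPLES = [
    (r"c.t", "cut", ". matches any one character"),
    (r"colou?r", "color", "? makes the u optional"),
    (r"ab*c", "ac", "* allows zero b's"),
    (r"ab+c", "ac", "+ needs at least one b"),
    (r"[0-9]{2,4}", "123", "{2,4} allows two to four digits"),
    (r"cat|dog", "dog", "| matches either side"),
    (r"gr[ae]y", "grey", "[ ] matches one character from the set"),
    (r"^Hello$", "Hello", "^ and $ anchor the start and end"),
    (r"3\.14", "3x14", r"\. only matches a literal full stop"),
]


def matches(pattern: str, text: str) -> bool:
    """True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


for pattern, text, note in EXAMPLES:
    print(f"{pattern:12} vs {text:6} -> {matches(pattern, text)}  ({note})")
```

Every line prints True except ab+c against "ac" and 3\.14 against "3x14".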

The syntax of regular expressions is built from characters that form expression elements. These elements can represent:

  • The possible characters
  • A quantifier of how many of these characters are allowed to match
  • Grouping to define scope and precedence
  • Anchoring to the start or end
  • A Boolean OR operation (alternation)
  • An escape character, to allow for special characters to be made literal
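Grouping plays both of its roles (scope and precedence, plus extraction) in this Python sketch. The helper first_match and the sample strings are our own, made up for illustration. Because | has the lowest precedence, ^cat|dog$ means "^cat" or "dog$", and a group is needed to make the anchors apply to both alternatives:

```python
import re


def first_match(pattern: str, text: str):
    """Return the first matched text, or None if there is no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# | has the lowest precedence, so ^cat|dog$ means "^cat" OR "dog$"
print(first_match(r"^cat|dog$", "catapult"))    # cat
# The group narrows the scope: the whole text must be exactly cat or dog
print(first_match(r"^(cat|dog)$", "catapult"))  # None

# Grouping also changes what a quantifier repeats
print(first_match(r"^ha+$", "hahaha"))          # None (+ repeats only the a)
print(first_match(r"^(ha)+$", "hahaha"))        # hahaha

# Extraction: each ( ) group captures the text it matched
m = re.search(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", "Report dated 2021-03-15")
year, month, day = m.groups()
print(year, month, day)  # 2021 03 15
```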

Character classes

Character classes use the escape syntax as a shorthand for common sets of characters, such as \d (a digit), \w (a word character: a letter, digit or underscore) and \s (whitespace). We go through some of the commonly used ones.

[Image: regular expressions character classes]
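Here is a short Python sketch of the shorthand classes on a made-up string. It also uses \D, since the uppercase form of a class matches everything the lowercase one doesn't. For str patterns, Python’s \d and \w are Unicode-aware, so they also match non-ASCII digits and letters. That makes no difference for this ASCII sample.

```python
import re

text = "Order #42 shipped on 2021-11-04 to Room_7"

digits = re.findall(r"\d+", text)   # runs of digits
words = re.findall(r"\w+", text)    # runs of letters, digits and underscores
spaces = re.findall(r"\s", text)    # each whitespace character
digits_only = re.sub(r"\D", "", text)  # delete every non-digit

print(digits)       # ['42', '2021', '11', '04', '7']
print(words)        # ['Order', '42', 'shipped', 'on', '2021', '11', '04', 'to', 'Room_7']
print(len(spaces))  # 6
print(digits_only)  # 42202111047
```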

Note that not all tools support the same character classes. POSIX is a little different: its classes look like [:digit:] and can only be used within bracket expressions, e.g. [[:digit:]].

Modifiers

Modifiers are used to change how a regular expression is run against the target text. Some tools allow for a wide range of these, others will provide some of them as checkbox items, while others won’t have this functionality. We take a look at some of the common modifiers and how they change the operation of the regular expression.

[Image: regular expressions modifiers]

Single line – modifies . so that it also matches the newline character. You can think of it as treating all the text as a single line: abc.*xyz can then match from abc on the first line to xyz on a later line, even if each line is only three characters long.
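Python calls single-line mode re.DOTALL (alias re.S), and the same flag can be written inline as (?s) at the start of the pattern. This sketch uses a made-up three-line text to show the difference:

```python
import re

text = "abc\n123\nxyz"  # three lines, three characters each

# By default . does not match a newline, so nothing is found
default = re.search(r"abc.*xyz", text)

# re.DOTALL lets . match newlines too, so the match spans all three lines
single_line = re.search(r"abc.*xyz", text, re.DOTALL)

# The same flag written inline at the start of the pattern
inline = re.search(r"(?s)abc.*xyz", text)

print(default)                       # None
print(single_line.group(0) == text)  # True
print(inline.group(0) == text)       # True
```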

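These are generally found in programming languages and more advanced tools, and are appended to the end of the expression with some additional syntax, like /[a-z]+/gi. Here g (global) finds all matches rather than just the first, and i makes the match case-insensitive.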
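Python doesn’t use the /pattern/flags literal syntax. This sketch shows rough equivalents on a made-up string. re.findall returning every match plays the role of g. re.IGNORECASE, or the inline (?i), plays the role of i.

```python
import re

text = "Regex IS Fun"

# Roughly /[a-z]+/ : first match only, case-sensitive
first = re.search(r"[a-z]+", text).group(0)

# Roughly /[a-z]+/g : findall returns every non-overlapping match
all_matches = re.findall(r"[a-z]+", text)

# Roughly /[a-z]+/gi : every match, ignoring case
all_ignore_case = re.findall(r"[a-z]+", text, re.IGNORECASE)

# The inline (?i) flag does the same from inside the pattern
all_inline = re.findall(r"(?i)[a-z]+", text)

print(first)            # egex
print(all_matches)      # ['egex', 'un']
print(all_ignore_case)  # ['Regex', 'IS', 'Fun']
```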

There are also more advanced features, such as assertions (lookahead and lookbehind), conditionals, and the ability to specify characters using octal or hexadecimal codes. You won’t need these much.

Lazy and greedy

Text: Regular expressions can be lazy or greedy

Pattern: ^R.*y

Lazy match: Regular expressions can be lazy

Greedy match: Regular expressions can be lazy or greedy

By default, regular expressions are greedy!

The question mark has another purpose: placed after a quantifier, it makes that quantifier lazy, so ^R.*?y will use lazy matching.

In some tools, you can also switch to lazy matching globally with a modifier, like U (for ungreedy).
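Here is the same greedy versus lazy comparison as a Python sketch. In Python’s re module laziness is chosen per quantifier with ?. It has no global ungreedy modifier, and its re.U flag means Unicode, not ungreedy.

```python
import re

text = "Regular expressions can be lazy or greedy"

greedy = re.search(r"^R.*y", text).group(0)  # .* runs on to the last possible y
lazy = re.search(r"^R.*?y", text).group(0)   # .*? stops at the first y

print(greedy)  # Regular expressions can be lazy or greedy
print(lazy)    # Regular expressions can be lazy
```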

Walkthrough

Let’s walk through the email example from earlier.

\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b

[Image: regular expressions example walkthrough]

What will match in the text below?

“Some example email addresses are [email protected], [email protected]. Some more examples include: [email protected], [email protected] and really%this(is)@your.example.nz”
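To check an answer, you can run the pattern yourself. This Python sketch uses hypothetical addresses plus the last address from the sample text. The pattern picks up a plus sign in the local part and a multi-part domain. It ignores really%this(is)@your.example.nz entirely, because the character just before the @ is ), which is not in [\w.%+-].

```python
import re

EMAIL = re.compile(r"\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b")

# Hypothetical sample text, ending with the tricky address from the example
text = ("Contact jane.doe@example.com or sales+uk@shop.example.co.uk. "
        "Also really%this(is)@your.example.nz")

# No capture groups, so findall returns each whole match
found = EMAIL.findall(text)
print(found)  # ['jane.doe@example.com', 'sales+uk@shop.example.co.uk']
```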

Recently used Regular Expressions

I have used regular expressions a lot in programming and in various tools. Most recently I’ve used them in finding and extracting tokens in URLs, finding specific lines in CSV files, in Google Search Console, and in Google Analytics.

[Image: regex in Google Analytics]

[Image: regex in Google Search Console]
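Here is a Python sketch of two of the everyday uses mentioned above: extracting a token from a URL and picking out specific lines from a CSV file. The URLs, the utm_campaign parameter, the campaign helper and the CSV rows are all hypothetical. Google Analytics and Search Console have their own regex flavour, so check their reference before reusing a pattern there.

```python
import re

# Hypothetical: capture the value of a utm_campaign query parameter
CAMPAIGN = re.compile(r"[?&]utm_campaign=([^&#]+)")


def campaign(url: str):
    """Return the utm_campaign value, or None if the URL has none."""
    m = CAMPAIGN.search(url)
    return m.group(1) if m else None


print(campaign("https://example.com/page?utm_source=news&utm_campaign=spring_sale"))  # spring_sale
print(campaign("https://example.com/blog/post-1"))  # None

# Hypothetical CSV export: keep rows whose second column starts with /blog/
rows = [
    "date,page,clicks",
    "2021-11-01,/blog/regex,12",
    "2021-11-01,/about,3",
]
blog_rows = [row for row in rows if re.match(r"^[^,]*,/blog/", row)]
print(blog_rows)  # ['2021-11-01,/blog/regex,12']
```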

Testing it out

Regular Expression testing sites:

Here are some useful regular expression testing sites, which you can use to test examples and build your own regular expressions.

Kirsten Tanner


Editor in Chief at In Marketing We Trust. Passionate about content marketing and dogs. Loves creating long-form, evergreen and 10x content. Is mentioned in Guy Kawasaki's latest book.
