Python Regex Lookahead

Summary: in this tutorial, you’ll learn about Python regex lookahead and negative lookahead.

Introduction to the Python regex lookahead

Sometimes, you want to match X but only if it is followed by Y. In this case, you can use the lookahead in regular expressions.

The syntax of the lookahead is as follows:

X(?=Y)

This pattern matches X only if it is immediately followed by Y. The engine checks Y but does not include it in the match.

For example, suppose you have the following string:

'1 Python is about 4 feet long'

Suppose you want to match the number 4, which is followed by a space and the literal string feet, but not the number 1. In this case, you can use the following pattern that contains a lookahead:

\d+(?=\s*feet)

In this pattern:

  • \d+ is the combination of the digit character set with the + quantifier that matches one or more digits.
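  • (?=...) is the lookahead syntax. The pattern inside the parentheses must match at this position, but it is not included in the result.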
  • \s* is the combination of the whitespace character set and * quantifier that matches zero or more whitespaces.
  • feet matches the literal string feet.

The following code uses the above pattern to match the number that is followed by zero or more spaces and the literal string feet:

import re

s = '1 Python is about 4 feet long'
# Use a raw string so that \d and \s reach the regex engine unchanged
pattern = r'\d+(?=\s*feet)'

matches = re.finditer(pattern, s)
for match in matches:
    print(match.group())

Output:

4
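Because a lookahead only checks what comes next, the text it inspects is not consumed. The following minimal sketch illustrates this with the same string and pattern; the variable names match and replaced are chosen here for illustration only:

```python
import re

s = '1 Python is about 4 feet long'
pattern = r'\d+(?=\s*feet)'

# The match covers only the digit, not the " feet" that the lookahead checked
match = re.search(pattern, s)
print(match.group())  # 4
print(match.span())   # (18, 19)

# A substitution therefore replaces only the number and leaves " feet" intact
replaced = re.sub(pattern, '5', s)
print(replaced)       # 1 Python is about 5 feet long
```

This is why a lookahead is handy with re.sub(): you can target the number without having to put the unit back into the replacement string.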

Regex multiple lookaheads

Regex allows you to have multiple lookaheads with the following syntax:

X(?=Y)(?=Z)

In this syntax, the regex engine will perform the following steps:

  1. Find X
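  2. Test if Y is immediately after X; skip if it isn’t.
  3. Test if Z is also immediately after X; skip if it isn’t. A lookahead does not consume any characters, so the second test starts at the same position as the first one.
  4. If both tests pass, X is a match; otherwise, search for the next match.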

So the X(?=Y)(?=Z) pattern matches X only when both Y and Z match starting at the position right after X.
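As a rough illustration of multiple lookaheads, the sketch below reuses the sample string from this tutorial. The three patterns and their variable names (both, chained, sequence) are made up for this example:

```python
import re

s = '1 Python is about 4 feet long'

# Both lookaheads are tested from the position right after the digits:
# "feet" must come next, and "long" must appear somewhere later.
both = r'\d+(?=\s*feet)(?=.*long)'
print(re.findall(both, s))      # ['4']

# The second lookahead does not start after "feet". It also starts right
# after the digits, where " feet" is found instead of " long".
chained = r'\d+(?=\s*feet)(?=\s*long)'
print(re.findall(chained, s))   # []

# To require feet and then long in sequence, use a single lookahead
sequence = r'\d+(?=\s*feet\s*long)'
print(re.findall(sequence, s))  # ['4']
```

In other words, stacked lookaheads act like an AND of independent conditions on the same position, not like a sequence.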

Regex negative lookaheads

Suppose you want to match only the number 1 in the following text but not the number 4:

'1 Python is about 4 feet long'

To do that, you can use the negative lookahead syntax:

X(?!Y)

The X(?!Y) pattern matches X only if it is not followed by Y. In the following example, the pattern matches \d+ only when it is not followed by optional whitespace and the literal string feet:

import re

s = '1 Python is about 4 feet long'
# Use a raw string so that \d and \s reach the regex engine unchanged
pattern = r'\d+(?!\s*feet)'

matches = re.finditer(pattern, s)
for match in matches:
    print(match.group())

Output:

1
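One caveat is worth a quick sketch: with numbers that have more than one digit, the regex engine can backtrack and match only part of a number. The string below (with 14 instead of 4) and the names naive and strict are assumptions made for this illustration, and the adjusted pattern is one possible fix rather than the only one:

```python
import re

s = '1 Python is about 14 feet long'

# \d+ first takes "14", fails the negative lookahead, then backtracks to "1".
# At that point the next character is "4", not "feet", so "1" is matched.
naive = r'\d+(?!\s*feet)'
print(re.findall(naive, s))   # ['1', '1']

# Also rejecting a following digit stops the engine from splitting a number
strict = r'\d+(?!\d|\s*feet)'
print(re.findall(strict, s))  # ['1']
```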
