Python Regex Flags

Summary: in this tutorial, you’ll learn about the Python regex flags and how they change the behavior of the regex engine for pattern matching.

Introduction to the Python regex flags

Regular expression functions such as findall, finditer, search, match, split, and sub accept a flags parameter that takes one or more regex flags.

Since Python 3.6, regex flags are instances of the RegexFlag enumeration class in the re module. The following table shows the available regex flags and their meanings:
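A quick check in the interpreter illustrates this. The following is a minimal sketch that assumes Python 3.6 or later; the variable name combined is only for illustration:

```python
import re

# Each flag is a member of the re.RegexFlag enumeration.
print(isinstance(re.IGNORECASE, re.RegexFlag))  # True

# The short alias refers to the same member as the long name.
print(re.I is re.IGNORECASE)  # True

# Combining flags with | yields another RegexFlag value.
combined = re.IGNORECASE | re.MULTILINE
print(isinstance(combined, re.RegexFlag))  # True
```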

| Flag | Alias | Inline flag | Meaning |
|---|---|---|---|
| re.ASCII | re.A | (?a) | Relevant to Unicode (str) patterns only; it is ignored for bytes patterns. It makes \w, \W, \b, \B, \d, \D, \s, and \S perform ASCII-only matching instead of full Unicode matching. |
| re.DEBUG | N/A | N/A | Shows debug information about the compiled pattern. |
| re.IGNORECASE | re.I | (?i) | Performs case-insensitive matching. For example, [A-Z] will also match lowercase letters. |
| re.LOCALE | re.L | (?L) | Relevant to bytes patterns only. It makes \w, \W, \b, \B, and case-insensitive matching dependent on the current locale. It is not compatible with the re.ASCII flag. |
| re.MULTILINE | re.M | (?m) | Makes ^ match at the beginning of the string and at the beginning of each line, and $ match at the end of the string and at the end of each line. |
| re.DOTALL | re.S | (?s) | By default, the dot (.) matches any character except a newline. re.DOTALL makes the dot (.) match every character, including a newline. |
| re.VERBOSE | re.X | (?x) | Allows you to organize a pattern into logical sections visually and add comments. |
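An inline flag is written at the start of the pattern instead of being passed as an argument. The following sketch compares the two forms for re.IGNORECASE; the sample string and variable names are chosen only for illustration:

```python
import re

s = 'Python is awesome'

# Pass the flag as an argument...
with_argument = re.findall('[a-z]+', s, re.IGNORECASE)

# ...or write it inline at the start of the pattern.
with_inline = re.findall('(?i)[a-z]+', s)

print(with_argument)                 # ['Python', 'is', 'awesome']
print(with_argument == with_inline)  # True
```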

To combine two or more flags, you use the | operator like this:

```python
re.A | re.M | re.S
```
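As a small illustration of combined flags in a call, the sketch below passes re.I | re.M to findall(). The sample string is made up for this example:

```python
import re

s = 'first line\nSECOND line'

# With no flags, ^ matches only at the start of the string
# and [a-z] matches only lowercase letters.
print(re.findall(r'^[a-z]+', s))  # ['first']

# re.I makes [a-z] case-insensitive; re.M lets ^ match after each newline.
words = re.findall(r'^[a-z]+', s, re.I | re.M)
print(words)  # ['first', 'SECOND']
```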

Python regex flags

Let’s take some examples of using the Python regex flags.

1) The re.IGNORECASE flag example

The following example uses the findall() function to match every run of lowercase letters in the set [a-z] in a string:

```python
import re

s = 'Python is awesome'
pattern = '[a-z]+'
l = re.findall(pattern, s)

print(l)
```

Output:

```
['ython', 'is', 'awesome']
```

Note that the letter P is not included in the result because it is not in the set [a-z].

The following example uses the re.IGNORECASE flag:

```python
import re

s = 'Python is awesome'
pattern = '[a-z]+'
l = re.findall(pattern, s, re.IGNORECASE)

print(l)
```

Output:

```
['Python', 'is', 'awesome']
```

Even though the pattern lists only the characters in the set [a-z], the re.IGNORECASE flag instructs the regex engine to also match the characters in the set [A-Z].

2) The re.MULTILINE flag example

The following example uses the ^ anchor to match one or more word characters at the beginning of a string:

```python
import re

s = '''Regex
Flags'''

pattern = r'^\w+'
l = re.findall(pattern, s)

print(l)
```

Output:

```
['Regex']
```

The s string has two lines. The ^ anchor matches only at the beginning of the string, as expected.

If you use the re.MULTILINE flag, the ^ will match at the beginning of each line. For example:

```python
import re

s = '''Regex
Flags'''

pattern = r'^\w+'
l = re.findall(pattern, s, re.MULTILINE)

print(l)
```

Output:

```
['Regex', 'Flags']
```
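The re.MULTILINE flag affects the $ anchor in the same way. The following sketch reuses the same two-line string, written here with an explicit \n, to show the difference:

```python
import re

s = 'Regex\nFlags'

# Without the flag, $ matches only at the end of the string.
print(re.findall(r'\w+$', s))  # ['Flags']

# With re.MULTILINE, $ also matches just before each newline.
print(re.findall(r'\w+$', s, re.MULTILINE))  # ['Regex', 'Flags']
```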

3) The re.DOTALL flag example

In this example, the .+ pattern matches one or more characters except for the newline:

```python
import re

s = '''Regex
Flags'''

pattern = '.+'
l = re.findall(pattern, s)

print(l)
```

Output:

```
['Regex', 'Flags']
```

If you use the re.DOTALL flag, the .+ pattern also matches the newline:

```python
import re

s = '''Regex
Flags'''

pattern = '.+'
l = re.findall(pattern, s, re.DOTALL)

print(l)
```

Output:

```
['Regex\nFlags']
```
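The same behavior matters when a single dot has to span a line break. The sketch below uses a pattern chosen only for illustration, Regex.Flags, in which the dot must match the newline between the two words:

```python
import re

s = 'Regex\nFlags'

# The dot cannot match the newline by default, so there is no match.
print(re.search(r'Regex.Flags', s))  # None

# With re.DOTALL, the dot matches the newline between the two words.
match = re.search(r'Regex.Flags', s, re.DOTALL)
print(match.group() == s)  # True
```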

4) The re.VERBOSE flag example

The following example shows how to use the re.VERBOSE flag to write a pattern in sections with comments:

```python
import re

s = 'Python 3'

pattern = r'''^(\w+)  # match one or more word characters at the beginning of the string
              \s*     # match zero or more whitespace characters
              (\d+)$  # match one or more digits at the end of the string'''

l = re.findall(pattern, s, re.VERBOSE)

print(l)
```

Output:

```
[('Python', '3')]
```

In this example, the re.VERBOSE flag allows us to add spaces and comments to the regular expression to explain each individual rule.
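Because re.VERBOSE ignores unescaped whitespace in the pattern, a literal space has to be escaped or placed in a character class. The following sketch shows the pitfall with a pattern made up for this example:

```python
import re

s = 'Python 3'

# In verbose mode the unescaped space is ignored, so the pattern
# is effectively 'Python3' and does not match.
print(re.search(r'Python 3', s, re.VERBOSE))  # None

# Escaping the space (or writing [ ]) keeps it significant.
match = re.search(r'Python\ 3', s, re.VERBOSE)
print(match.group())  # Python 3
```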

5) The re.ASCII flag example

The following example matches words with two characters:

```python
import re

s = '作法 is Pythonic in Japanese'
pattern = r'\b\w{2}\b'
l = re.findall(pattern, s)

print(l)
```

Output:

```
['作法', 'is', 'in']
```

However, if you use the re.ASCII flag, the matches will contain only ASCII characters:

```python
import re

s = '作法 is Pythonic in Japanese'
pattern = r'\b\w{2}\b'
l = re.findall(pattern, s, re.ASCII)

print(l)
```

Output:

```
['is', 'in']
```
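The re.ASCII flag affects \d as well as \w and \b. The sketch below uses a sample string, made up for this example, that contains the number 42 written once with Arabic-Indic digits and once with ASCII digits:

```python
import re

# '\u0664\u0662' is 42 written with Arabic-Indic digits.
s = 'Room \u0664\u0662 or room 42'

# By default, \d matches any Unicode decimal digit.
unicode_digits = re.findall(r'\d+', s)
print(len(unicode_digits))  # 2

# With re.ASCII, \d matches only the characters 0-9.
ascii_digits = re.findall(r'\d+', s, re.ASCII)
print(ascii_digits)  # ['42']
```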
