Summary: in this tutorial, you’ll learn about the Python regex flags and how they change the behavior of the regex engine for pattern matching.
Introduction to the Python regex flags
Regular expression functions such as findall, finditer, search, match, split, and sub have a flags parameter that accepts one or more regex flags.
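As a minimal sketch of how the flags parameter is passed, the snippet below gives the same flag once to re.search() and once to re.compile(). The sample text and variable names are made up for illustration only.

```python
import re

text = 'Hello World'  # sample text, chosen only for illustration

# Pass the flag directly to a module-level function
m = re.search(r'hello', text, re.IGNORECASE)
found = m.group() if m else None
print(found)  # Hello

# Or compile the pattern once with the flag and reuse the pattern object
compiled = re.compile(r'world', re.IGNORECASE)
words = compiled.findall(text)
print(words)  # ['World']
```

Compiling is usually worth it when the same pattern and flags are reused many times.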
Since Python 3.6, regex flags are instances of the RegexFlag enumeration class in the re module. The following table shows the available regex flags and their meanings:
Flag | Alias | Inline Flag | Meaning |
---|---|---|---|
re.ASCII | re.A | (?a) | The re.ASCII flag is relevant to Unicode (str) patterns only and is ignored for bytes patterns. It makes \w, \W, \b, \B, \d, \D, \s, and \S perform ASCII-only matching instead of full Unicode matching. |
re.DEBUG | N/A | N/A | The re.DEBUG flag displays debug information about the compiled pattern. |
re.IGNORECASE | re.I | (?i) | The re.IGNORECASE flag performs case-insensitive matching. For example, [A-Z] will also match lowercase letters. |
re.LOCALE | re.L | (?L) | The re.LOCALE flag can be used only with bytes patterns. It makes \w, \W, \b, \B, and case-insensitive matching dependent on the current locale. The re.LOCALE flag is not compatible with the re.ASCII flag. |
re.MULTILINE | re.M | (?m) | The re.MULTILINE flag makes ^ match at the beginning of the string and at the beginning of each line, and $ match at the end of the string and at the end of each line. |
re.DOTALL | re.S | (?s) | By default, the dot (.) matches any character except a newline. The re.DOTALL flag makes the dot (.) match any character, including a newline. |
re.VERBOSE | re.X | (?x) | The re.VERBOSE flag allows you to organize a pattern into logical sections visually and add comments. |
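To illustrate the Inline Flag column, here is a small sketch that compares passing a flag as an argument with writing the inline form (?i) at the start of the pattern. The sample string is only an illustration.

```python
import re

s = 'Python is awesome'

# Flag passed as an argument
with_argument = re.findall(r'[a-z]+', s, re.IGNORECASE)

# Equivalent inline flag placed at the start of the pattern
with_inline = re.findall(r'(?i)[a-z]+', s)

print(with_argument)  # ['Python', 'is', 'awesome']
print(with_inline)    # ['Python', 'is', 'awesome']
```

The inline form can be handy when you can supply only the pattern string, for example in a configuration value.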
To combine two or more flags, use the | operator like this:
re.A | re.M | re.S
Code language: Python (python)
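As a rough sketch of what combining flags looks like in practice, the snippet below builds a combined flag value, checks which flags it contains, and uses it in a call. The variable names and the sample string are assumptions made for this illustration.

```python
import re

# Combine two flags into a single RegexFlag value
flags = re.MULTILINE | re.IGNORECASE

print(isinstance(flags, re.RegexFlag))  # True
print(re.IGNORECASE in flags)           # True
print(re.DOTALL in flags)               # False

# Both flags apply: ^ and $ work per line, and case is ignored
s = 'regex\nFLAGS'
matches = re.findall(r'^[a-z]+$', s, flags)
print(matches)  # ['regex', 'FLAGS']
```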
Python regex flags
Let’s take some examples of using the Python regex flags.
1) The re.IGNORECASE flag example
The following example uses the findall() function to match runs of lowercase letters from the set [a-z] in a string:
import re

s = 'Python is awesome'
pattern = '[a-z]+'
l = re.findall(pattern, s)
print(l)
Code language: Python (python)
Output:
['ython', 'is', 'awesome']
Code language: Python (python)
Note that the letter P is not included in the result because it is not in the set [a-z].
The following example uses the re.IGNORECASE flag:
import re

s = 'Python is awesome'
pattern = '[a-z]+'
l = re.findall(pattern, s, re.IGNORECASE)
print(l)
Code language: Python (python)
Output:
['Python', 'is', 'awesome']
Code language: Python (python)
Even though the pattern matches only characters in the set [a-z], the re.IGNORECASE flag instructs the regex engine to also match characters in the set [A-Z].
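The same flag works with the other functions too. As a hedged sketch, the snippet below uses it with re.sub(). The sample sentence and the replacement text are invented for illustration. Note that with re.sub() the flag is passed by keyword, because the fourth positional argument of re.sub() is count.

```python
import re

s = 'Python is awesome. python is fun.'  # illustrative sample text

# flags is passed by keyword: the fourth positional argument
# of re.sub() is count, not flags
result = re.sub(r'python', 'Regex', s, flags=re.IGNORECASE)
print(result)  # Regex is awesome. Regex is fun.
```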
2) The re.MULTILINE flag example
The following example uses the ^ anchor to match one or more word characters at the beginning of a string:
import re

s = '''Regex
Flags'''
pattern = r'^\w+'
l = re.findall(pattern, s)
print(l)
Code language: Python (python)
Output:
['Regex']
Code language: Python (python)
The string s has two lines. As expected, ^ matches only at the beginning of the string.
If you use the re.MULTILINE flag, ^ also matches at the beginning of each line. For example:
import re

s = '''Regex
Flags'''
pattern = r'^\w+'
l = re.findall(pattern, s, re.MULTILINE)
print(l)
Code language: Python (python)
Output:
['Regex', 'Flags']
Code language: Python (python)
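The re.MULTILINE flag affects the $ anchor in the same way. The following sketch, which uses a made-up two-line string, matches the last word of the string by default and the last word of every line with the flag:

```python
import re

s = 'Regex Flags\nPython Tutorial'  # two-line sample string (illustrative)

# Without the flag, $ matches only at the end of the string
last_word_default = re.findall(r'\w+$', s)

# With re.MULTILINE, $ also matches at the end of each line
last_word_multiline = re.findall(r'\w+$', s, re.MULTILINE)

print(last_word_default)    # ['Tutorial']
print(last_word_multiline)  # ['Flags', 'Tutorial']
```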
3) The re.DOTALL flag example
In this example, the .+ pattern matches one or more characters except the newline:
import re

s = '''Regex
Flags'''
pattern = '.+'
l = re.findall(pattern, s)
print(l)
Code language: Python (python)
Output:
['Regex', 'Flags']
Code language: Python (python)
If you use the re.DOTALL flag, .+ also matches the newline:
import re

s = '''Regex
Flags'''
pattern = '.+'
l = re.findall(pattern, s, re.DOTALL)
print(l)
Code language: Python (python)
Output:
['Regex\nFlags']
Code language: Python (python)
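A typical case where this matters is a pattern that has to span several lines. The sketch below uses an invented three-line string: the search fails by default and succeeds with re.DOTALL.

```python
import re

s = 'start\nmiddle\nend'  # illustrative multi-line text

# Without the flag, the dot stops at the first newline, so nothing matches
no_match = re.search(r'start.*end', s)
print(no_match)  # None

# With re.DOTALL, the dot crosses newlines and the whole text matches
m = re.search(r'start.*end', s, re.DOTALL)
print(repr(m.group()))  # 'start\nmiddle\nend'
```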
4) The re.VERBOSE flag example
The following example shows how to use the re.VERBOSE flag to write a pattern in sections with comments:
import re

s = 'Python 3'
pattern = r'''^(\w+)  # match one or more word characters at the beginning of the string
              \s*     # match zero or more whitespace characters
              (\d+)$  # match one or more digits at the end of the string'''
l = re.findall(pattern, s, re.VERBOSE)
print(l)
Code language: Python (python)
Output:
[('Python', '3')]
Code language: Python (python)
In this example, the re.VERBOSE flag allows us to add whitespace and comments to the regular expression to explain each part of the pattern.
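One consequence of verbose mode is that unescaped whitespace in the pattern is ignored. The short sketch below, with made-up patterns, shows two common ways to still match a literal space: escaping it or putting it in a character class.

```python
import re

# In verbose mode, the unescaped space is ignored, so the pattern is really 'ab'
ignored = re.search(r'a b', 'a b', re.VERBOSE)
print(ignored)  # None

# Escape the space, or place it in a character class, to match it literally
escaped = re.search(r'a\ b', 'a b', re.VERBOSE)
in_class = re.search(r'a[ ]b', 'a b', re.VERBOSE)
print(escaped.group())   # a b
print(in_class.group())  # a b
```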
5) The re.ASCII flag example
The following example matches words with two characters:
import re

s = '作法 is Pythonic in Japanese'
pattern = r'\b\w{2}\b'
l = re.findall(pattern, s)
print(l)
Code language: Python (python)
Output:
['作法', 'is', 'in']
Code language: Python (python)
However, if you use the re.ASCII flag, the matches contain only ASCII characters:
import re

s = '作法 is Pythonic in Japanese'
pattern = r'\b\w{2}\b'
l = re.findall(pattern, s, re.ASCII)
print(l)
Code language: Python (python)
Output:
['is', 'in']
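The re.ASCII flag also changes what \d matches. As a small sketch with an invented sample string, the snippet below mixes ASCII digits with Arabic-Indic digits, written as Unicode escapes to keep the code unambiguous:

```python
import re

# '123' followed by the Arabic-Indic digits one, two, three
s = '123 \u0661\u0662\u0663'

# By default, \d matches any Unicode decimal digit
unicode_digits = re.findall(r'\d+', s)

# With re.ASCII, \d matches only 0-9
ascii_digits = re.findall(r'\d+', s, re.ASCII)

print(len(unicode_digits))  # 2
print(ascii_digits)         # ['123']
```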