9.2. Regex
String methods let you search and manipulate text to a certain extent. A regular expression (regex) is a sequence of characters that defines a search pattern, which you can use to search, match, and manipulate text in ways that go beyond simple string methods. For example:
| Task | Use |
|---|---|
| Check if a string starts with 'http' | string method (`startswith()`) |
| Replace all spaces with underscores | string method (`replace()`) |
| Extract all email addresses from text | regex |
| Validate a phone number format | regex |
| Find words matching a complex pattern | regex |
The rule of thumb: if the pattern is fixed and simple, use string methods. If the pattern is variable or complex, use regex.
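Here is a minimal sketch of that rule of thumb applied to three tasks from the table. The sample strings are made up for illustration, and the email pattern is a deliberately simplified assumption, not a complete email validator.

```python
import re

url = "https://example.com"
sentence = "replace all the spaces"
text = "Contact alice@example.com or bob.smith@example.org for details."

# Fixed, simple patterns: plain string methods are enough
starts_with_http = url.startswith("http")
underscored = sentence.replace(" ", "_")

# Variable pattern: a simplified (assumed) email regex, for illustration only
email_pattern = r"[\w.+-]+@[\w-]+\.[\w.]+"
emails = re.findall(email_pattern, text)

print(starts_with_http)  # True
print(underscored)       # replace_all_the_spaces
print(emails)            # ['alice@example.com', 'bob.smith@example.org']
```

No string method can express "some word characters, an @, then a domain", which is why the email task falls on the regex side.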
For example, to search for a pattern in a text, we can use the find() string method, which returns the index of the first occurrence if the pattern is found (or -1 if it is not).
```python
text = ("I am Dracula; and I bid you welcome, Mr. Harker, "
        "to my house.")
pattern = 'Dracula'
text.find(pattern)
```

```text
5
```
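For comparison, the same search can be done with `re.search()` from Python's standard `re` module, which returns a match object (or `None`) instead of an index. This sketch also shows a variable pattern that `find()` cannot express: the word following "Mr. ".

```python
import re

text = ("I am Dracula; and I bid you welcome, Mr. Harker, "
        "to my house.")

# re.search returns a Match object for the first match, or None if nothing matches
match = re.search(r"Dracula", text)
if match:
    print(match.start())  # 5, the same index str.find() returns
    print(match.group())  # Dracula

# A variable pattern: whatever word follows "Mr. " (the dot is escaped to be literal)
title_match = re.search(r"Mr\. (\w+)", text)
print(title_match.group(1))  # Harker
```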
9.2.1. Escape Sequences and Raw Strings
Before using the functions in Python's regular expression module, `re`, we need to understand regex escape sequences and raw strings.
Regex patterns use backslashes heavily. In regex, the backslash \ introduces an escape sequence, which either stands for a class of characters (for example, \d for any digit) or lets a metacharacter such as . or $ be matched as a literal character.
For example, common escape sequences include:
| Pattern | Meaning | Example match |
|---|---|---|
| `\d` | digit | `7` |
| `\w` | word character (letters, digits, underscore) | `a`, `Z`, `5`, `_` |
| `\s` | whitespace character | space, tab, new line |
| `\.` | literal dot | `.` |
| `\$` | literal dollar sign | `$` |
| `\\` | literal backslash | `\` |
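A small sketch that runs each escape sequence from the table through `re.findall()` on one made-up sample string. The sample is a normal (non-raw) string, so `"\\"` in it is a single backslash and `"\t"` is a tab.

```python
import re

# "\\" is one backslash character and "\t" is a tab in this normal string
sample = "Order #42 costs $9.99 at C:\\shop\tnow"

digits = re.findall(r"\d", sample)        # ['4', '2', '9', '9', '9']
words = re.findall(r"\w+", sample)        # runs of letters, digits, underscore
spaces = re.findall(r"\s", sample)        # five spaces and one tab
dots = re.findall(r"\.", sample)          # ['.']
dollars = re.findall(r"\$", sample)       # ['$']
backslashes = re.findall(r"\\", sample)   # one backslash character

print(digits)
print(words)   # ['Order', '42', 'costs', '9', '99', 'at', 'C', 'shop', 'now']
print(spaces)  # [' ', ' ', ' ', ' ', ' ', '\t']
```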
9.2.2. Raw Strings
A raw string is a string literal prefixed with `r`, which tells Python to treat backslashes \ as literal characters rather than as the start of escape sequences. Prefixing a pattern with `r` prevents Python from interpreting its backslashes before the regex engine sees the pattern; for example, `r'\.'` reaches the regex engine as `\.` exactly as written. Raw strings avoid double escaping and make patterns easier to read. Without raw strings, you often need extra backslashes, as in `'\\d+'` for the pattern `\d+`.
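A quick sketch confirming that the double-escaped normal string and the raw string produce the same three-character pattern. The sample text is made up.

```python
import re

escaped = "\\d+"   # normal string: the backslash itself must be escaped
raw = r"\d+"       # raw string: the backslash is kept as written

print(escaped == raw)  # True: both are the characters \ d +
print(len(raw))        # 3

numbers = re.findall(raw, "Room 101, floor 7")
print(numbers)         # ['101', '7']
```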
```python
regular = "\n"  # newline character
raw = r"\n"     # literally backslash + n (two characters)
print(regular)  # prints a newline (an empty line)
print(raw)      # prints \n
```

```text

\n
```
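The difference shows up in the string lengths. As a sketch, note that it does not change what the regex engine matches here: the engine itself understands the two-character sequence `\n` as "newline", so both patterns below find the newline in a made-up text.

```python
import re

regular = "\n"   # one character: a newline
raw = r"\n"      # two characters: a backslash followed by n
print(len(regular), len(raw))  # 1 2

# The regex engine interprets the two-character sequence \n as a newline,
# so both patterns find the newline in this text
text = "line one\nline two"
from_regular = re.findall(regular, text)
from_raw = re.findall(raw, text)
print(from_regular == from_raw == ["\n"])  # True
```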
Escape sequences are also needed to match special regex characters literally (for example \., \$, \?).
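A minimal sketch of why escaping metacharacters matters, using a made-up string. It also shows `re.escape()` from the standard `re` module, which inserts the needed backslashes for you.

```python
import re

prices = "Total: 3.50 or 3550?"

# Unescaped "." is a metacharacter: it matches any single character
any_char = re.findall(r"3.5", prices)      # ['3.5', '355']
# Escaped "\." matches only a literal dot
literal_dot = re.findall(r"3\.5", prices)  # ['3.5']
# "\?" matches a literal question mark; a bare "?" would mean "optional"
question = re.findall(r"\d+\?", prices)    # ['3550?']
# re.escape() backslash-escapes every special character in the string
escaped = re.escape("3.50?")

print(any_char, literal_dot, question)
print(escaped)  # 3\.50\?
```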
```python
print('\\\\')              # four backslashes in a normal string -> two backslash characters
print(r'\\')               # two backslashes in a raw string -> the same two characters
print(r'\\\n')             # raw string: three backslashes + n (four characters)
print(r'\\\n' == '\\\\n')  # False: raw string is three backslashes + n; normal string is two backslashes + n
```

```text
\\
\\
\\\n
False
```
A raw string tells Python to treat each backslash \ as an ordinary character rather than a special one, so the backslashes reach the regex engine unchanged.
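This matters most when matching a literal backslash: the regex engine must receive `\\` (two characters). The sketch below, using a made-up Windows-style path, splits on backslashes with a raw-string pattern and with the equivalent normal-string pattern.

```python
import re

path = r"C:\Users\dracula"  # raw string keeps each backslash as written

# To match one literal backslash, the regex engine must receive \\ (two characters)
parts_raw = re.split(r"\\", path)      # raw string: write \\ once
parts_normal = re.split("\\\\", path)  # normal string: four backslashes yield \\

print(parts_raw)                  # ['C:', 'Users', 'dracula']
print(parts_raw == parts_normal)  # True
```

The raw-string version is what the rule above recommends: the pattern reads the same way the regex engine sees it.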