The re Module

9.2.3. The `re` Module

Python’s built-in `re` module provides regular expression (regex) support. Its six most commonly used functions are:

Function	Description	Sample Syntax	Return
`re.search()`	Find first match anywhere in the string	`re.search(pattern, text)`	`Match` object or `None`
`re.match()`	Match only at the start of the string	`re.match(pattern, text)`	`Match` object or `None`
`re.findall()`	Find all matches; return as a list	`re.findall(pattern, text)`	`list` of strings
`re.sub()`	Find and replace matches	`re.sub(pattern, repl, text)`	`str`
`re.split()`	Split string on a pattern	`re.split(pattern, text)`	`list` of strings
`re.fullmatch()`	Match the entire string against the pattern	`re.fullmatch(pattern, text)`	`Match` object or `None`
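
The three functions that return a `Match` object differ only in where the match is allowed to occur. The following is a minimal sketch of that difference; the sample string and the variable names are made up for illustration and are not part of the chapter's examples.

```python
import re

text = "cat scatter cat"  # hypothetical sample string

found_anywhere = re.search(r"scatter", text)   # scans the whole string
at_start = re.match(r"scatter", text)          # only tries to match at index 0
whole = re.fullmatch(r"cat", text)             # must match the entire string
whole_ok = re.fullmatch(r"cat.*", text)        # '.*' consumes the rest of the string

print(found_anywhere.span())  # (4, 11)
print(at_start)               # None
print(whole)                  # None
print(whole_ok.span())        # (0, 15)
```

In short, `re.search()` is usually the right choice for "does this appear anywhere?", while `re.fullmatch()` suits validation, where the whole input has to fit the pattern.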

import sys
from pathlib import Path

# Find project root by looking for _config.yml
current = Path.cwd()
for parent in [current, *current.parents]:
    if (parent / '_config.yml').exists():
        project_root = parent
        break
else:
    project_root = Path.cwd().parent.parent

# Add project root to path
sys.path.insert(0, str(project_root))

# Import shared teaching helpers and cell magics
from shared import thinkpython, diagram, jupyturtle, structshape
from shared.download import download

9.2.3.1. The `Match` object

`re.search()`, `re.match()`, and `re.fullmatch()` each return a `Match` object when the pattern matches, and `None` otherwise.

For example, `re.search(pattern, text)` scans through `text` and returns:
- a `Match` object for the first location where `pattern` is found, or
- `None` if the pattern is not found anywhere in the string.

A `Match` object has the following commonly used attributes and methods:

Attribute / Method	Description	Example
`.group()`	Returns the matched substring	`m.group()` → `'Dracula'`
`.start()`	Index where the match begins	`m.start()` → `5`
`.end()`	Index just past the last matched character	`m.end()` → `12`
`.span()`	Tuple of `(start, end)`	`m.span()` → `(5, 12)`
`.string`	The original string that was searched	`m.string` → `'I am Dracula...'`

import re

text = "I am Dracula; and I bid you welcome, Mr. Harker, to my house."
pattern = 'Dracula'

result = re.search(pattern, text)     ### pattern: Dracula; text: the line
result                              ### the Match object

<re.Match object; span=(5, 12), match='Dracula'>

If the pattern appears in the text, search returns a Match object that contains the results of the search.

String: Among other information, it has an attribute named `string` that contains the text that was searched.
Group: It also provides a method called `group` that returns the part of the text that matched the pattern.
Span and Start/End: The `span` method returns a tuple of the indices where the match starts and ends; `start` and `end` return each index separately.

print(result.string)
print(result.group())
print(result.start())
print(result.end())
print(result.span())

I am Dracula; and I bid you welcome, Mr. Harker, to my house.
Dracula
5
12
(5, 12)

Note

.group() returns the matched substring from the text — the portion of the text that the pattern matched against. In simple cases like re.search('Dracula', text), the match equals the pattern string. But with a regex like r'\$[\d.]+', .group() would return something like '$42.99' — the actual text that matched, not the pattern expression itself.

If the pattern doesn’t appear in the text, the return value from search is None. So we can check whether the search was successful by checking whether the result is None.

result = re.search('Count', text)
print(result)

None

result is None

True
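
Because a failed search returns `None`, calling `.group()` directly on the result raises an `AttributeError` when there is no match. One common way to guard against this is sketched below; `first_match` is a hypothetical helper written for this illustration, not a function in the `re` module.

```python
import re

text = "I am Dracula; and I bid you welcome, Mr. Harker, to my house."

def first_match(pattern, text):
    """Return the first matched substring, or None when the pattern is absent."""
    m = re.search(pattern, text)
    if m is None:
        # No match: m.group() would raise AttributeError here
        return None
    return m.group()

print(first_match("Dracula", text))  # Dracula
print(first_match("Count", text))    # None
```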

s = "This is a test of the regular expression system."
print(re.findall('is', s))    # ['is', 'is']
print(re.findall('is.', s))   # ['is ', 'is ']  'is' followed by any one character (a space here)
print(re.findall('is.?', s))  # ['is ', 'is ']  'is' followed by zero or one character (a space here)
print(re.findall('is.?', s, re.IGNORECASE))  # ['is ', 'is ']  same as above, but case-insensitive
# re.DOTALL also lets '.' match newline characters (no effect here, since s has no newlines)
print(re.findall('is.?', s, re.IGNORECASE | re.DOTALL))  # ['is ', 'is ']

['is', 'is']
['is ', 'is ']
['is ', 'is ']
['is ', 'is ']
['is ', 'is ']
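
In the example above the flags leave the results unchanged, because the string contains neither an uppercase "IS" nor a newline. The sketch below uses a made-up string (`sample`, an assumption for illustration) in which each flag does change what `findall` returns.

```python
import re

sample = "Is this\nfine?"  # hypothetical string with a capital 'I' and a newline

case_sensitive = re.findall("is", sample)                  # 'Is' is skipped
case_insensitive = re.findall("is", sample, re.IGNORECASE) # 'Is' now matches too

no_dotall = re.findall("s.f", sample)                      # '.' does not match '\n'
with_dotall = re.findall("s.f", sample, re.DOTALL)         # '.' matches '\n'

# Flags are combined with the bitwise OR operator
combined = re.findall("S.F", sample, re.IGNORECASE | re.DOTALL)

print(case_sensitive)    # ['is']
print(case_insensitive)  # ['Is', 'is']
print(no_dotall)         # []
print(with_dotall)       # ['s\nf']
print(combined)          # ['s\nf']
```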

The `+` in the pattern below means one or more occurrences of the preceding element, so `[\d.]+` matches a run of one or more digits or dots.

import re

text = "The price is $42.99 and $7.50"

# findall — get all matches
print(re.findall(r"\$[\d.]+", text))         # ['$42.99', '$7.50']

# search — first match object
m = re.search(r"\$[\d.]+", text)
print(m.group())                              # '$42.99'
print(m.start(), m.end())                     # position in string

# sub — replace
print(re.sub(r"\$[\d.]+", "PRICE", text))    # 'The price is PRICE and PRICE'

# split
print(re.split(r"\s+", "one  two   three"))  # ['one', 'two', 'three']

['$42.99', '$7.50']
$42.99
13 19
The price is PRICE and PRICE
['one', 'two', 'three']
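
To see how `+` differs from the related quantifiers `*` (zero or more) and `?` (zero or one), the sketch below runs three patterns over the same made-up string. The string and variable names are assumptions for illustration only.

```python
import re

text = "a1 b22 c333 d"  # hypothetical sample string

one_or_more = re.findall(r"[a-z]\d+", text)   # letter followed by at least one digit
zero_or_more = re.findall(r"[a-z]\d*", text)  # digits are optional, any number allowed
zero_or_one = re.findall(r"[a-z]\d?", text)   # at most one digit is consumed

print(one_or_more)   # ['a1', 'b22', 'c333']
print(zero_or_more)  # ['a1', 'b22', 'c333', 'd']
print(zero_or_one)   # ['a1', 'b2', 'c3', 'd']
```

Note that only the `+` version skips the bare `d`, since it requires at least one digit after the letter.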

### EXERCISE: Regex Escape Sequences
# Difficulty: Basic
s = "Price: $19.95, code=A_7, spaces here"
# 1. Extract all digit sequences
# 2. Extract all word tokens
# 3. Extract literal '$' and literal '.' matches
### Your code starts here:



### Your code ends here.

['19', '95', '7']
['Price', '19', '95', 'code', 'A_7', 'spaces', 'here']
['$']
['.']

### EXERCISE: The Match Object
# Difficulty: Basic
import re
text = "Customer ID: 4892, Order date: 2024-03-15"
# 1. Use re.search() to find the first 4-digit number in text
# 2. Print the matched string, start index, end index, and span
### Your code starts here:



### Your code ends here.
