Chapter 8

Strings

8.0 Intro · 8.1 Strings · 8.2 Regex · 8.3 Text Analysis

8.1 Strings

Methods, comparison, looping, sorting

String Methods

s = "  Hello, World!  "

print(s.strip())          # 'Hello, World!'
print(s.lower())          # '  hello, world!  '
print(s.upper())          # '  HELLO, WORLD!  '
print(s.replace("World", "Python"))  # '  Hello, Python!  '
print(s.split(", "))      # ['  Hello', 'World!  ']
print(",".join(["a","b","c"])) # 'a,b,c'
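A common use of these methods is chaining split() and join() to tidy whitespace. The sketch below is illustrative only; the variable names raw and normalized are not from the slides.

```python
raw = "  Hello,   World!  "

# split() with no argument splits on any run of whitespace and
# ignores leading/trailing whitespace, so join() rebuilds a tidy string.
normalized = " ".join(raw.split())
print(normalized)                      # 'Hello, World!'

# Strings are immutable: the methods return new strings, raw is unchanged.
print(raw == "  Hello,   World!  ")    # True
```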

Testing methods

"hello".startswith("he")   # True
"hello123".isalnum()        # True
"   ".isspace()             # True
"Hello".istitle()           # True

# Searching and membership
"ell" in "hello"            # True
"hello".find("ll")          # 2  (-1 if not found)
"hello".count("l")          # 2
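Because find() signals a miss with -1 rather than an error, the result should be checked before it is used as an index. A minimal sketch, assuming a hypothetical helper named position_of:

```python
def position_of(haystack, needle):
    """Return the index of needle in haystack, or None if it is absent."""
    idx = haystack.find(needle)        # -1 means "not found"
    return idx if idx != -1 else None

print(position_of("hello", "ll"))      # 2
print(position_of("hello", "xyz"))     # None

# str.index() is the stricter sibling: it raises ValueError on a miss.
try:
    "hello".index("xyz")
except ValueError:
    print("not found")
```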

String Comparison & Sorting

Strings compare lexicographically using Unicode code points.

print("apple" < "banana")   # True
print("B" < "a")            # True  (uppercase < lowercase)
print(ord("A"), ord("a"))   # 65 97
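Since comparison works on code points, two strings that differ only in case are neither equal nor ordered the way a dictionary would order them. One way to compare without regard to case is str.casefold(), sketched here with illustrative variable names:

```python
a, b = "Apple", "apple"

print(a == b)                          # False
print(a < b)                           # True  ('A' is 65, 'a' is 97)

# casefold() is a more aggressive lower() intended for caseless matching.
print(a.casefold() == b.casefold())    # True
```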

Sorting strings

words = ["banana", "apple", "Cherry", "date"]

# Default: case-sensitive (uppercase first)
print(sorted(words))
# ['Cherry', 'apple', 'banana', 'date']

# Case-insensitive
print(sorted(words, key=str.lower))
# ['apple', 'banana', 'Cherry', 'date']

# By length
print(sorted(words, key=len))
# ['date', 'apple', 'banana', 'Cherry']
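The key function can also return a tuple to sort on several criteria at once, and reverse=True flips the order. A short sketch reusing the same word list; the variable names are illustrative:

```python
words = ["banana", "apple", "Cherry", "date"]

# Sort by length first, then alphabetically (ignoring case) to break ties.
by_len_then_alpha = sorted(words, key=lambda w: (len(w), w.lower()))
print(by_len_then_alpha)   # ['date', 'apple', 'banana', 'Cherry']

# Longest first; sorting is stable, so ties keep their original order.
longest_first = sorted(words, key=len, reverse=True)
print(longest_first)       # ['banana', 'Cherry', 'apple', 'date']
```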

8.2 Regular Expressions

Pattern matching with the re module

re Module Basics

Function              Returns
re.search(pat, s)     First match object, or None
re.findall(pat, s)    List of all matches
re.sub(pat, rep, s)   String with replacements
re.compile(pat)       Compiled pattern object

import re

text = "Call 555-1234 or 555-5678 for info."

# Find all phone numbers
phones = re.findall(r"\d{3}-\d{4}", text)
print(phones)   # ['555-1234', '555-5678']

# Replace
clean = re.sub(r"\d{3}-\d{4}", "XXX-XXXX", text)
print(clean)    # 'Call XXX-XXXX or XXX-XXXX for info.'
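The other two functions, re.search and re.compile, can be sketched on the same text. The name phone_re is illustrative; the important detail is that search() returns None on a miss, so the result is checked before use.

```python
import re

text = "Call 555-1234 or 555-5678 for info."

# Compile once, reuse many times.
phone_re = re.compile(r"\d{3}-\d{4}")

match = phone_re.search(text)
if match:                         # None when nothing matches
    print(match.group())          # '555-1234'
    print(match.span())           # (5, 13)

print(phone_re.search("no numbers here"))   # None
```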

Key Regex Patterns

Pattern       Matches
.             Any character except newline
\d            Digit (0–9) · \D non-digit
\w            Word char (letter/digit/_) · \W non-word
\s            Whitespace · \S non-whitespace
^ / $         Start / end of string
* + ?         0+, 1+, 0 or 1 occurrences
{n} {m,n}     Exactly n, or m to n occurrences
[abc]         Any of a, b, c
(...)         Capturing group
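Several of these pieces combine naturally. The sketch below uses anchors, counted repetition, and capturing groups to pick apart a date string; the pattern and sample strings are illustrative, not from the slides.

```python
import re

# ^ and $ anchor the pattern; each (...) captures one field.
date_re = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")

m = date_re.search("2024-03-15")
print(m.groups())                          # ('2024', '03', '15')
print(date_re.search("on 2024-03-15"))     # None (string does not start with a digit)

# With groups in the pattern, findall() returns tuples of the groups.
pairs = re.findall(r"(\d{3})-(\d{4})", "Call 555-1234 or 555-5678")
print(pairs)                               # [('555', '1234'), ('555', '5678')]

# A character class with + matches runs of vowels.
vowels = re.findall(r"[aeiou]+", "beautiful queue")
print(vowels)                              # ['eau', 'i', 'u', 'ueue']
```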

8.3 Text Analysis

Cleaning, word frequencies, Counter, Markov generation

Word Frequencies & Counter

Manual approach

word_counter = {}
for word in text.split():            # text: the string being analysed
    word = word.strip(".,!?").lower()
    if word:                         # skip tokens that were only punctuation
        word_counter[word] = word_counter.get(word, 0) + 1

# Sort by frequency
items = sorted(word_counter.items(),
               key=lambda t: t[1], reverse=True)

With Counter

from collections import Counter
import re

words = re.findall(r"\b[a-z]+\b", text.lower())
counts = Counter(words)

# Top 5 words
for word, freq in counts.most_common(5):
    print(freq, word, sep="\t")

Counter replaces the manual loop and adds .most_common() directly.
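As a cleaning step, very common words are often filtered out before counting. The sketch below assumes a tiny, hypothetical stop-word set and a made-up sample sentence; a real analysis would use a fuller list.

```python
import re
from collections import Counter

text = "The cat and the dog saw the bird. The bird saw the bird!"
stop_words = {"the", "and", "a", "of"}     # hypothetical, deliberately tiny

words = re.findall(r"[a-z]+", text.lower())

# Counter accepts any iterable, including a generator that filters words.
counts = Counter(w for w in words if w not in stop_words)
print(counts.most_common(2))               # [('bird', 3), ('saw', 2)]
```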

Chapter 8 — Quick Reference

Concept         Key syntax / notes
Strip / clean   .strip(), .lower(), .replace()
Split / join    .split(sep), sep.join(seq)
Membership      "sub" in s, .find(), .count()
Regex search    re.search(pattern, string)
Find all        re.findall(pattern, string)
Substitute      re.sub(pattern, repl, string)
Compile         re.compile(pattern) for reuse
Word freq       Counter(words).most_common(n)
Markov          {(w1,w2): [w3, ...]} bigram model
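
The Markov row describes a dictionary keyed on word pairs, where each value lists the words seen after that pair. A minimal sketch of that structure follows; the function names build_chain and generate, the sample sentence, and the seed are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

def build_chain(words):
    """Map each consecutive pair (w1, w2) to the list of words that follow it."""
    chain = defaultdict(list)
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        chain[(w1, w2)].append(w3)
    return chain

def generate(chain, start, length=10, rng=random):
    """Walk the chain from a starting pair; stop early at a dead end."""
    w1, w2 = start
    out = [w1, w2]
    while len(out) < length:
        followers = chain.get((w1, w2))
        if not followers:              # no known continuation
            break
        w3 = rng.choice(followers)
        out.append(w3)
        w1, w2 = w2, w3                # slide the two-word window forward
    return " ".join(out)

sample = "the cat sat on the mat and the cat ran to the mat".split()
chain = build_chain(sample)
print(chain[("the", "cat")])           # ['sat', 'ran']

# A seeded generator makes the random walk repeatable.
sentence = generate(chain, ("the", "cat"), length=8, rng=random.Random(0))
print(sentence)
```

Repeated followers are kept in the list on purpose, so random.choice picks frequent continuations more often.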

End of Chapter 8

Next: Chapter 9 — Object-Oriented Programming

classes · instances · inheritance · polymorphism