9.1.3. String Comparison

Observe the following operations.

### check out the comparisons here:

print("A" < 'a')
print("a" < 'banana')
print('Pineapple' > 'pineapple')
print('Pineapple' > 'banana')
True
True
False
False

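The relational operators work on strings, as seen above. Python compares two strings character by character, from left to right: the first position where they differ decides the result, and if one string is a prefix of the other, the shorter string comes first. Each character is compared by its numeric code (its Unicode code point). For the characters used here, those codes match the ASCII table below (this one is easier to read than the one presented in an earlier chapter). Note that digits come before uppercase letters, and uppercase letters come before lowercase letters:

  • the digit 0 is 48

  • A is 65

  • a is 97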

Dec Chr   Dec Chr   Dec Chr   Dec Chr   Dec Chr
0   NUL   26  SUB   52  4     78  N     104 h
1   SOH   27  ESC   53  5     79  O     105 i
2   STX   28  FS    54  6     80  P     106 j
3   ETX   29  GS    55  7     81  Q     107 k
4   EOT   30  RS    56  8     82  R     108 l
5   ENQ   31  US    57  9     83  S     109 m
6   ACK   32  SP    58  :     84  T     110 n
7   BEL   33  !     59  ;     85  U     111 o
8   BS    34  "     60  <     86  V     112 p
9   HT    35  #     61  =     87  W     113 q
10  LF    36  $     62  >     88  X     114 r
11  VT    37  %     63  ?     89  Y     115 s
12  FF    38  &     64  @     90  Z     116 t
13  CR    39  '     65  A     91  [     117 u
14  SO    40  (     66  B     92  \     118 v
15  SI    41  )     67  C     93  ]     119 w
16  DLE   42  *     68  D     94  ^     120 x
17  DC1   43  +     69  E     95  _     121 y
18  DC2   44  ,     70  F     96  `     122 z
19  DC3   45  -     71  G     97  a     123 {
20  DC4   46  .     72  H     98  b     124 |
21  NAK   47  /     73  I     99  c     125 }
22  SYN   48  0     74  J     100 d     126 ~
23  ETB   49  1     75  K     101 e     127 DEL
24  CAN   50  2     76  L     102 f
25  EM    51  3     77  M     103 g
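
The values in the table can be checked directly in Python. The short sketch below uses the built-in functions ord() and chr(); the variable name codes is only for illustration.

```python
# ord() gives the numeric code of a character; chr() goes the other way.
codes = {ch: ord(ch) for ch in '0Aa'}
print(codes)        # {'0': 48, 'A': 65, 'a': 97}

# chr() maps a code back to its character.
print(chr(98))      # b

# 'A' < 'a' is True because 65 < 97.
print(ord('A') < ord('a'))   # True
```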

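So we can use the code table to predict how two strings compare. The simplest comparison is equality, which is true only when both strings contain exactly the same characters in the same order: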

word = 'banana'

if word == 'banana':
    print('All right, banana.')
All right, banana.

The other relational operators are useful for putting words in alphabetical order:

def compare_word(word):
    if word < 'banana':
        print(word, 'comes before banana.')
    elif word > 'banana':
        print(word, 'comes after banana.')
    else:
        print('All right, banana.')
compare_word('apple')
apple comes before banana.
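
To see what the < operator does with strings, the comparison can be written out by hand. The function less_than below is a hypothetical helper for illustration only, not something Python provides; it is a sketch that should agree with the built-in operator on the sample pairs.

```python
def less_than(a, b):
    """Hypothetical helper: mimic a < b for strings, one character at a time."""
    for x, y in zip(a, b):
        if x != y:
            # The first differing position decides the result.
            return ord(x) < ord(y)
    # No difference within the shared length: the shorter string comes first.
    return len(a) < len(b)

pairs = [('a', 'banana'), ('Pineapple', 'banana'),
         ('ban', 'banana'), ('banana', 'banana')]
for a, b in pairs:
    # The last two columns should always match.
    print(a, b, less_than(a, b), a < b)
```

In practice the built-in operators are the right tool; the sketch only makes the rule visible.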

Python does not handle uppercase and lowercase letters the same way people do. All the uppercase letters come before all the lowercase letters, so:

compare_word('Pineapple')
Pineapple comes before banana.

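This is a problem whenever we want alphabetical order regardless of case. A common fix is to convert both strings to a standard format, such as all lowercase or all uppercase, before performing the comparison. Python also provides casefold(), a stronger form of lower() intended for case-insensitive comparison.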

compare_word('Pineapple'.lower())
pineapple comes after banana.
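
A minimal sketch of a case-insensitive equality test follows. The helper name same_ignoring_case is an assumption made for this example. It uses casefold(), which works like lower() but also handles characters such as the German ß that lower() leaves unchanged.

```python
def same_ignoring_case(a, b):
    """Hypothetical helper: compare two strings without regard to case."""
    return a.casefold() == b.casefold()

print(same_ignoring_case('Pineapple', 'pineapple'))   # True
print('Straße'.lower() == 'strasse')                  # False: lower() keeps the ß
print(same_ignoring_case('Straße', 'STRASSE'))        # True: casefold() maps ß to ss
```

For plain English words, lower() and casefold() give the same result, so either works for the examples in this section.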

### EXERCISE: String Comparison
# Difficulty: Challenge
words = ['Apple', 'apple', 'banana', 'Banana', 'cherry']
# 1. Print the list sorted as-is
# 2. Print the list sorted case-insensitively
# 3. Build and print a list of tuples: (word, word.casefold())
# 4. Print whether "Apple" and "apple" are equal under casefold
### Your code starts here:


### Your code ends here.

# Solution
words = ['Apple', 'apple', 'banana', 'Banana', 'cherry']
print(sorted(words))
print(sorted(words, key=str.casefold))
pairs = [(w, w.casefold()) for w in words]
print(pairs)
print('Apple'.casefold() == 'apple'.casefold())
['Apple', 'Banana', 'apple', 'banana', 'cherry']
['Apple', 'apple', 'banana', 'Banana', 'cherry']
[('Apple', 'apple'), ('apple', 'apple'), ('banana', 'banana'), ('Banana', 'banana'), ('cherry', 'cherry')]
True