8.1.5. Select Topics#
import sys
from pathlib import Path
# Find project root by looking for _config.yml
current = Path.cwd()
for parent in [current, *current.parents]:
if (parent / '_config.yml').exists():
project_root = parent
break
else:
project_root = Path.cwd().parent.parent
# Add project root to path
sys.path.insert(0, str(project_root))
# Import shared teaching helpers and cell magics
from shared import thinkpython, diagram, jupyturtle, structshape
from shared.download import download
8.1.5.1. Dictionary Comprehension#
Dictionary comprehensions provide a concise way to build dictionaries from existing data. The pattern is similar to a list comprehension, but each result contains a key and a value.
The basic syntax:
{key_expr: value_expr for item in iterable}
# Dictionary comprehensions
# Create a dictionary of squares
square_dict = {x: x**2 for x in range(5)}
print("Square dict:", square_dict)
# From two lists
keys = ['a', 'b', 'c', 'd']
values = [1, 2, 3, 4]
combined_dict = {k: v for k, v in zip(keys, values)}
print("Combined dict:", combined_dict)
# Filter and transform
prices = {'apple': 0.50, 'banana': 0.30, 'orange': 0.80, 'grape': 1.20}
expensive_fruits = {fruit: price for fruit, price in prices.items() if price > 0.50}
print("Expensive fruits:", expensive_fruits)
Square dict: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
Combined dict: {'a': 1, 'b': 2, 'c': 3, 'd': 4}
Expensive fruits: {'orange': 0.8, 'grape': 1.2}
### Exercise: Dictionary Comprehension
# 1. Use a dict comprehension to create a dictionary that maps
# each number 1-5 to its cube (n**3).
# 2. Given the prices dict below, use a dict comprehension to keep
# only fruits that cost more than $1.00.
prices = {'apple': 1.20, 'banana': 0.50, 'cherry': 2.00, 'grape': 0.80}
### Your code starts here.
### Your code ends here.
Cubes: {1: 1, 2: 8, 3: 27, 4: 64, 5: 125}
Pricey: {'apple': 1.2, 'cherry': 2.0}
8.1.5.2. Counting with Dictionaries#
Suppose you are given a string and you want to count how many times each letter appears. A dictionary is a good tool for this job. We’ll start with an empty dictionary.
counter = {}
As we loop through the letters in the string, suppose we see the letter 'a' for the first time.
We can add it to the dictionary like this.
counter['a'] = 1
The value 1 indicates that we have seen the letter once.
Later, if we see the same letter again, we can increment the counter like this.
counter['a'] += 1
Now the value associated with 'a' is 2, because we’ve seen the letter twice.
counter
{'a': 2}
The following function uses these features to count the number of times each letter appears in a string.
def value_counts(string):
counter = {}
for letter in string:
if letter not in counter:
counter[letter] = 1
else:
counter[letter] += 1
return counter
Each time through the loop, if letter is not in the dictionary, we create a new item with key letter and value 1.
If letter is already in the dictionary we increment the value associated with letter.
Here’s an example.
counter = value_counts('brontosaurus')
counter
{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}
The items in counter show that the letter 'b' appears once, 'r' appears twice, and so on.
### Exercise: Counting with a Dictionary
# 1. Use the value_counts() function defined above to count the
# frequency of each letter in the word "mississippi".
# 2. Print the letter that appears the most (highest count).
### Your code starts here.
### Your code ends here.
{'m': 1, 'i': 4, 's': 4, 'p': 2}
Most common letter: i -> 4
8.1.5.3. Iterating Through Dictionaries#
If you use a dictionary in a for statement, it traverses the keys of the dictionary.
To demonstrate, let’s make a dictionary that counts the letters in 'banana'.
counter = value_counts('banana')
counter
{'b': 1, 'a': 3, 'n': 2}
The following loop prints the keys, which are the letters.
for key in counter:
print(key)
b
a
n
To print the values, we can use the values method.
for value in counter.values():
print(value)
1
3
2
To print the keys and values, we can loop through the keys and look up the corresponding values.
for key in counter:
value = counter[key]
print(key, value)
b 1
a 3
n 2
### Exercise: Iterating Through a Dictionary
# Use this dictionary:
scores = {"alice": 92, "bob": 85, "charlie": 78}
# 1. Print only the names (keys).
# 2. Print only the scores (values).
# 3. Print each name and score together using .items().
### Your code starts here.
### Your code ends here.
alice
bob
charlie
92
85
78
alice 92
bob 85
charlie 78
8.1.5.4. Why Dictionary Lookup Is Fast#
The items in a Python dictionary are stored in a hash table, which is a way of organizing data that has a remarkable property: the in operator takes about the same amount of time no matter how many items are in the dictionary.
That makes it possible to write some remarkably efficient algorithms.
Note
Big O Notation
Big O notation is a way of describing how the time (or memory) an operation needs grows as the size of the data grows. We use the letter n to represent the number of items.
The two cases you’ll encounter most often here:
Notation |
Name |
Meaning |
|---|---|---|
O(1) |
Constant time |
Time stays the same regardless of n |
O(n) |
Linear time |
Time grows in proportion to n |
Imagine searching for a word in a book.
O(1) is like looking up a word in a dictionary: you go directly to the page. You know exactly which page to turn to, so the search takes the same time no matter how thick the book is, that’s O(1).
O(n) is like finding a word in a novel: you read until you find it. The worst case is you may have to scan every page until the end of the book, that’s O(n): the more pages you have, the more time it would likely to take.
Dictionary lookup is O(1) because Python computes a hash of the key and jumps directly to the right slot in memory, without scanning. List lookup with in is O(n) because Python checks each element in order until it finds a match (or reaches the end).
The full treatment of Big O notation — including O(log n), O(n²), and others — comes in the Algorithms chapter.
The following example builds a list and a dictionary with one million entries, then times a worst-case lookup in each. It shows the difference between O(n) linear search (list) and O(1) constant-time lookup (dictionary) in practice.
import time
# Build a large list and a large dictionary with the same 1 million keys
n = 1_000_000
big_list = list(range(n))
big_dict = {i: True for i in range(n)}
target = n - 1 # worst case: last element
# O(n): list search time scales up to n elements
start = time.time()
result = target in big_list
list_time = time.time() - start
# O(1): dict lookup jumps directly
start = time.time()
result = target in big_dict
dict_time = time.time() - start
print(f"List lookup (O(n)): {list_time:.6f} s")
print(f"Dict lookup (O(1)): {dict_time:.6f} s")
print(f"Dict was ~{list_time / dict_time:.0f}x faster")
List lookup (O(n)): 0.006787 s
Dict lookup (O(1)): 0.000025 s
Dict was ~274x faster
To demonstrate, we’ll compare two algorithms for finding pairs of words where one is the reverse of another – like stressed and desserts.
We’ll start by reading the word list.
from pathlib import Path
words_file = Path('../../data/words.txt')
if not words_file.exists():
download('https://raw.githubusercontent.com/AllenDowney/ThinkPython/v3/words.txt', words_file)
word_list = words_file.read_text().split()
print(len(word_list))
print(type(word_list))
113783
<class 'list'>
To check out the file content:
for word in word_list[:10]:
print(word)
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
aardvark
And here’s the reverse_word function.
def reverse_word(word):
return ''.join(reversed(word))
reverse_word('hello')
'olleh'
The following function loops through the words in the list. For each one, it reverses the letters and then checks whether the reversed word is in the word list.
def too_slow():
count = 0
for word in word_list:
if reverse_word(word) in word_list: ### O(n) lookup in a list is slow
count += 1
return count
To measure how long a function takes, we can use %time which is one of Jupyter’s “built-in magic commands”.
These commands are not part of the Python language, so they might not work in other development environments.
%time too_slow()
Output:
CPU times: user 1min 9s, sys: 182 ms, total: 1min 9s
Wall time: 1min 9s
885
Uncomment the line to run the command in Live Code mode or in your editor.
# %time too_slow()
This function takes more than a minute to run.
The problem is that the in operator checks the words in the list one at a time, starting at the beginning.
If it doesn’t find what it’s looking for – which happens most of the time – it has to search all the way to the end.
And the in operator is inside the loop, so it runs once for each word.
Since there are more than 100,000 words in the list, and for each one we check more than 100,000 words, the total number of comparisons is the number of words squared – roughly – which is almost 13 billion.
len(word_list)**2
12946571089
We can make this function much faster with a dictionary. The following loop creates a dictionary that contains the words as keys.
word_dict = {}
for word in word_list:
word_dict[word] = 1
Or, you can use dictionary comprehension:
word_dict = {word: 1 for word in word_list}
The values in word_dict are all 1, but they could be anything, because we won’t ever look them up – we will only use this dictionary to check whether a key exists.
Now here’s a version of the previous function that replaces word_list with word_dict.
def much_faster():
count = 0
for word in word_dict:
if reverse_word(word) in word_dict:
count += 1
return count
This function takes less than one hundredth of a second, so it’s about 10,000 times faster than the previous version.
In general, the time it takes to find an element in a list is proportional to the length of the list. The time it takes to find a key in a dictionary is almost constant – regardless of the number of items.
%time much_faster()
CPU times: user 23.3 ms, sys: 289 μs, total: 23.6 ms
Wall time: 23.6 ms
885
### Exercise: Dictionary vs. List Lookup
# Given the list of fruits below:
fruits = ['apple', 'banana', 'cherry', 'date', 'elderberry']
# 1. Create a dictionary with fruits as keys and 1 as values using dict comprehension.
# 2. Use the `in` operator to check if 'cherry' is in:
# a) the list b) the dictionary
# 3. Print both results.
### Your code starts here.
### Your code ends here.
True
True