dict
5 Other collections - Dictionaries and Sets
5.1 Dictionaries
We’ve discussed three built-in sequence collections — strings
, lists
and tuples
. Now, we consider the built-in non-sequence collections — dictionaries
and sets
. A dictionary
is an unordered collection which stores key–value pairs that map immutable keys to values, just as a conventional dictionary maps words to definitions. A set
is an unordered collection of unique immutable elements.
Like a list
, a dictionary
is a mutable collection of many values, but more general. In a list
, the index positions have to be integers; in a dictionary
, the indices can be any immutable data type. You can think of a dictionary as a mapping between a set of indices (which are called keys) and a set of values. Each key maps to a value. The association of a key and a value is called a key-value pair or sometimes an item.
A dictionary
’s keys must be immutable (such as strings
, integers
or tuples
) and unique (that is, no duplicates). However, multiple keys can have the same value.
It is noted that as of Python 3.7, dictionary items maintain the order in which they are inserted into the dictionary. However,
dictionaries
are considered unordered collections and do not write code that depends on the order of the key–value pairs.
As an example, we’ll build a dictionary that maps from subjects to grades, so the keys are string
while the values are integers
. The function dict
creates a new dictionary with no items.
To add/update items to the dictionary, you can again use subscript operator (square brackets):
grade['calculus'] = 85 # Key:'calculus', value: 85
print(grade) # Note that key and value are separate by colon
{'calculus': 85}
You can create a dictionary
that contains multiple items by enclosing in curly braces, {}
, a comma-separated list of key–value pairs, each of the form key:value
.
grade = {'calculus':85, 'introduction to mathematics':80, 'computer programming':90, 'linear algebra':95}
grade
{'calculus': 85,
'introduction to mathematics': 80,
'computer programming': 90,
'linear algebra': 95}
You can store them using separate lists
for subjects and scores, but the following update and maintenance will become tedious:
subjects = ['calculus', 'introduction to mathematics', 'computer programming', 'linear algebra']
score = [85, 80, 90, 95]
You can now use the keys to look up the corresponding values:
You can obtain the number of items using len()
Note that you can’t access items in them using integer indexes like
grade[0]
because it is unordered collections. (Consider the case when you use 4, 2, 1, 0 as the keys). Therefore, you can’t use slice syntax for dictionaries.
Trying to access a key that does not exist in a dictionary
will result in a KeyError
error message, much like a list’s
“out-of-range” IndexError
error message.
To add or delete an entry, it is similar to list
{'calculus': 85,
'introduction to mathematics': 80,
'computer programming': 90,
'linear algebra': 95,
'English': 100}
You can delete a key–value pair from a dictionary with the del
statement:
{'calculus': 85,
'introduction to mathematics': 80,
'computer programming': 90,
'linear algebra': 95}
5.1.1 The keys()
, values()
, and items()
Methods
There are three dictionary
methods that will return list
-like values of the dictionary
’s keys, values, or both keys and values: keys()
, values()
, and items()
. The values returned by these methods are not true lists, but these data types (dict_keys
, dict_values
, and dict_items
, respectively) can be used in for
loops (Just like range
object)!
If you want a true list from one of these methods, pass its list-like return value to the list()
function
['calculus', 'introduction to mathematics', 'computer programming', 'linear algebra']
[85, 80, 90, 95]
Here, a for
loop iterates over each of the values in the grade
dictionary. A for
loop can also iterate over the keys:
calculus
introduction to mathematics
computer programming
linear algebra
Note that by default, it will traverse over the keys!
Dictionaries have a method called items()
that returns a list of tuples, where each tuple is a key-value pair:
[('calculus', 85),
('introduction to mathematics', 80),
('computer programming', 90),
('linear algebra', 95)]
Combining items()
, multiple assignment, and for
, you can see a nice code pattern for traversing the keys and values of a dictionary in a single loop:
calculus 85
introduction to mathematics 80
computer programming 90
linear algebra 95
5.1.1.1 Checking Whether a Key or Value Exists in a Dictionary
Recall from the previous chapter that the in
and not in
operators can check whether a value exists in a list. You can also use these operators to see whether a certain key or value exists in a dictionary
Again, it will check keys by default. Therefore, in the previous example,
'calculus' in grade
is essentially a shorter version of writing'calculus' in grade.keys()
. This is always the case: if you ever want to check whether a value is (or isn’t) a key in thedictionary
, you can simply use thein
(ornot in
) keyword with thedictionary
itself.
5.1.1.2 Retrieve value uisng get()
Method
It’s tedious to check whether a key exists in a dictionary
before accessing that key’s value. Fortunately, dictionaries have a get()
method that takes two arguments: the key of the value to retrieve and a fallback value to return if that key does not exist.
picnicItems = {'apples': 5, 'cups': 2}
print('I am bringing ' + str(picnicItems.get('cups', 0)) + ' cups.')
print('I am bringing ' + str(picnicItems.get('eggs', 0)) + ' eggs.')
I am bringing 2 cups.
I am bringing 0 eggs.
Because there is no ‘eggs’ key in the picnicItems
dictionary, the default value 0 is returned by the get()
method. Without using get()
, the code would have caused a KeyError
message
5.1.1.3 Update value using setdefault()
Method
You’ll often have to set a value in a dictionary
for a certain key only if that key does not already have a value. The code looks something like this:
The setdefault()
method offers a way to do this in one line of code. The first argument passed to the method is the key to check for, and the second argument is the value to set at that key if the key does not exist.
The setdefault()
method is a nice shortcut to ensure that a key exists. Here is a short program that counts the number of occurrences of each letter in a string
.
message = 'It was a bright cold day in April, and the clocks were striking thirteen.'
count = {}
for character in message:
if character not in count:
count[character] = 0
count[character] = count[character] + 1
print(count)
{'I': 1, 't': 6, ' ': 13, 'w': 2, 'a': 4, 's': 3, 'b': 1, 'r': 5, 'i': 6, 'g': 2, 'h': 3, 'c': 3, 'o': 2, 'l': 3, 'd': 3, 'y': 1, 'n': 4, 'A': 1, 'p': 1, ',': 1, 'e': 5, 'k': 2, '.': 1}
message = 'It was a bright cold day in April, and the clocks were striking thirteen.'
count = {}
for character in message:
count.setdefault(character, 0)
count[character] = count[character] + 1
print(count)
{'I': 1, 't': 6, ' ': 13, 'w': 2, 'a': 4, 's': 3, 'b': 1, 'r': 5, 'i': 6, 'g': 2, 'h': 3, 'c': 3, 'o': 2, 'l': 3, 'd': 3, 'y': 1, 'n': 4, 'A': 1, 'p': 1, ',': 1, 'e': 5, 'k': 2, '.': 1}
You can view the execution of this program at https://autbor.com/setdefault. The program loops over each character in the message
variable’s string, counting how often each character appears. The setdefault()
method ensures that the key is in the count
dictionary
(with a default value of 0) so the program doesn’t throw a KeyError
error when count[character] = count[character] + 1
is executed!
From the output, you can see that the lowercase letter c appears 3 times, the space character appears 13 times, and the uppercase letter A appears 1 time.
5.1.2 Pretty Printing
If you import the pprint
module into your programs, you’ll have access to the pprint()
function that will “pretty print” a dictionary
’s values. This is helpful when you want a cleaner display of the items in a dictionary
than what print()
provides.
import pprint
message = 'It was a bright cold day in April, and the clocks were striking thirteen.'
count = {}
for character in message:
count.setdefault(character, 0)
count[character] = count[character] + 1
pprint.pprint(count)
{' ': 13,
',': 1,
'.': 1,
'A': 1,
'I': 1,
'a': 4,
'b': 1,
'c': 3,
'd': 3,
'e': 5,
'g': 2,
'h': 3,
'i': 6,
'k': 2,
'l': 3,
'n': 4,
'o': 2,
'p': 1,
'r': 5,
's': 3,
't': 6,
'w': 2,
'y': 1}
You can view the execution of this program at https://autbor.com/pprint/. This time, when the program is run, the output looks much cleaner, with the keys sorted. The pprint.pprint()
function is especially helpful when the dictionary itself contains nested lists
or dictionaries
.
5.1.2.1 Dictionary Comprehensions
Dictionary comprehensions provide a convenient notation for quickly generating dictionaries
, often by mapping one dictionary
to another. For example, in a dictionary
with unique values, you can reverse the key–value pairs:
{1: 'January', 2: 'February', 3: 'March'}
Curly braces delimit a dictionary comprehension, and the expression to the left of the for
specifies a key–value pair of the form key:value
. The comprehension iterates through months.items()
, unpacking each key–value pair tuple into the variables name
and number
. The expression number:name
reverses the key and value, so the new dictionary maps the month numbers to the month names.
A dictionary comprehension also can map a dictionary
’s values to new values. The following comprehension converts a dictionary
of names and lists
of grades into a dictionary
of names and grade-point averages. The variables k
and v
commonly mean key and value:
Note the above is nested structure, with a
list
in adictionary
!
5.1.3 Using Data Structures to Model Real-World Things
A tic-tac-toe
board looks like a large hash symbol (#
) with nine slots that can each contain an X
, an O
, or a blank. To represent the board with a dictionary
, you can assign each slot a key, as shown in below:
1 | 2 | 3 |
---|---|---|
4 | 5 | 6 |
7 | 8 | 9 |
You can use string values to represent what’s in each slot on the board: ‘X’, ‘O’, or ’ ’ (a space). Thus, you’ll need to store nine strings. You can use a dictionary of values for this. The string value with the key ‘3’ can represent the top-right corner, the string value with the key ‘7’ can represent the bottom-left corner, the string value with the key ‘5’ can represent the middle, and so on. This dictionary is a data structure that represents a tic-tac-toe board. Store this board-as-a-dictionary in a variable named board
.
A board where player O has won by placing Os across the top might look like this:
board = {'1': 'O', '2': 'O', '3': 'O',
'4': 'X', '5': 'X', '6': ' ',
'7': ' ', '8': ' ', '9': 'X'}
print(board)
{'1': 'O', '2': 'O', '3': 'O', '4': 'X', '5': 'X', '6': ' ', '7': ' ', '8': ' ', '9': 'X'}
Of course, the player should see only what is printed to the screen, not the contents of variables. Let’s create a function to print the board dictionary onto the screen:
def printBoard(board):
"""
The function that prints out the board in a square shape
"""
print(board['1'] + '|' + board['2'] + '|' + board['3'])
print('-+-+-')
print(board['4'] + '|' + board['5'] + '|' + board['6'])
print('-+-+-')
print(board['7'] + '|' + board['8'] + '|' + board['9'])
printBoard(board)
O|O|O
-+-+-
X|X|
-+-+-
| |X
The printBoard()
function can handle any tic-tac-toe data structure you pass it!
Now let’s add code that allows the players to enter their moves.
board = {'1': ' ', '2': ' ', '3': ' ',
'4': ' ', '5': ' ', '6': ' ',
'7': ' ', '8': ' ', '9': ' '}
board['1'] = 'O' # Update the board
printBoard(board)
board['5'] = 'X'
printBoard(board)
O| |
-+-+-
| |
-+-+-
| |
O| |
-+-+-
|X|
-+-+-
| |
We can also check whether player has won the game or not using the following code:
p = 'O' # Check whether 'O' has won the game or not
((board['1'] == board['2'] == board['3'] == p) or # Across top
(board['4'] == board['5'] == board['6'] == p) or # Across middle
(board['7'] == board['8'] == board['9'] == p) or # Across boardottom
(board['1'] == board['4'] == board['7'] == p) or # Down left
(board['2'] == board['5'] == board['8'] == p) or # Down middle
(board['3'] == board['6'] == board['9'] == p) or # Down right
(board['3'] == board['5'] == board['7'] == p) or # Diagonal
(board['1'] == board['5'] == board['9'] == p)) # Diagonal
False
Note that by enclosing the condiations with
()
, we do not have to add\
for multipline commands.
5.1.4 Exercise 1: Tic-tac-toe is a classic pencil-and-paper game played on a 3 × 3 grid. Players take turns placing their ‘X’ or ‘O’ marks, trying to get three in a row. Try to complete the following game design by complete three functions
getBlankBoard()
,isValidSpace()
andisBoardFull()
.
%%writefile ttt.py
def printBoard(board):
"""
The function that prints out the board in a square shape
"""
print(board['1'] + '|' + board['2'] + '|' + board['3'])
print('-+-+-')
print(board['4'] + '|' + board['5'] + '|' + board['6'])
print('-+-+-')
print(board['7'] + '|' + board['8'] + '|' + board['9'])
def updateBoard(board, space, mark):
"""Sets the space on the board to mark."""
board[space] = mark
def isWinner(board, player):
"""Return True if player is a winner on this TTTBoard."""
# Shorter variable names used here for readablility:
b, p = board, player
# Check for 3 marks across 3 rows, 3 columns, and 2 diagonals.
return ((b['1'] == b['2'] == b['3'] == p) or # Across top
(b['4'] == b['5'] == b['6'] == p) or # Across middle
(b['7'] == b['8'] == b['9'] == p) or # Across bottom
(b['1'] == b['4'] == b['7'] == p) or # Down left
(b['2'] == b['5'] == b['8'] == p) or # Down middle
(b['3'] == b['6'] == b['9'] == p) or # Down right
(b['3'] == b['5'] == b['7'] == p) or # Diagonal
(b['1'] == b['5'] == b['9'] == p)) # Diagonal
def getBlankBoard():
"""Create a new, blank tic-tac-toe board."""
# Map of space numbers: 1|2|3
# -+-+-
# 4|5|6
# -+-+-
# 7|8|9
# Keys are '1' through '9', the values are 'X', 'O', or ' ':
# Initialize all spaces of the board as blank string ' ' using loop or dictionary complehention
ALL_SPACES = ['1', '2', '3', '4', '5', '6', '7', '8', '9']
______________________________
return board
def isValidSpace(board, space):
"""Returns True if the space on the board is a valid space number (1-9)
and the space is blank."""
ALL_SPACES = ['1', '2', '3', '4', '5', '6', '7', '8', '9']
return ____________ and ________________
def isBoardFull(board):
"""Return True if every space on the board has been taken."""
for v in ________: # Traverse over the board to see if there is blank space
if v == ' ':
return False # If any space is blank, return False.
return True # No spaces are blank, so return True
# The logic of the game
print('Welcome to Tic-Tac-Toe!')
gameBoard = getBlankBoard() # Create a TTT board dictionary.
currentPlayer, nextPlayer = 'X', 'O' # X goes first, O goes next.
while True: # Main game loop.
# 1. Display the board on the screen:
printBoard(gameBoard)
# 2. Keep asking the player until they enter a number 1-9:
move = None
while not isValidSpace(gameBoard, move):
print('What is ' + currentPlayer + '\'s move? (1-9)')
move = input('> ')
updateBoard(gameBoard, move, currentPlayer) # Make the move.
# 3. Check if the game is over:
if isWinner(gameBoard, currentPlayer): # Check for a winner.
printBoard(gameBoard)
print(currentPlayer + ' has won the game!')
break
elif isBoardFull(gameBoard): # Check for a tie.
print(printBoard(gameBoard))
print('The game is a tie!')
break
# 4. Switch turns to the next player using multiple assignment:
currentPlayer, nextPlayer = nextPlayer, currentPlayer
print('Thanks for playing!')
Overwriting ttt.py
5.1.5 Nested Dictionaries and Lists
5.1.5.1 A List of Dictionaries
Consider a game featuring aliens that can have different colors and point values. This simple dictionary
stores information about a particular alien:
The alien_0
dictionary contains a variety of information about one alien, but it has no room to store information about a second alien, much less a screen full of aliens. How can you manage a fleet of aliens? One way is to make a list of aliens in which each alien is a dictionary
of information about that alien.
aliens = []
# Make 30 green aliens.
for alien_number in range(30):
new_alien = {'color': 'green', 'points': 5, 'speed': 'slow'}
aliens.append(new_alien)
# Show the first 5 aliens.
for alien in aliens[:5]:
print(alien)
print("...")
# Show how many aliens have been created.
print(f"Total number of aliens: {len(aliens)}")
{'color': 'green', 'points': 5, 'speed': 'slow'}
{'color': 'green', 'points': 5, 'speed': 'slow'}
{'color': 'green', 'points': 5, 'speed': 'slow'}
{'color': 'green', 'points': 5, 'speed': 'slow'}
{'color': 'green', 'points': 5, 'speed': 'slow'}
...
Total number of aliens: 30
These aliens all have the same characteristics, but Python
considers each one a separate object, which allows us to modify each alien individually. How might you work with a group of aliens like this? Imagine that one aspect of a game has some aliens changing color and moving faster as the game progresses. When it’s time to change colors, we can use a for
loop and an if
statement to change the color of the aliens. For example, to change the first three aliens to yellow, medium-speed aliens worth 10 points each, we could do this:
for alien in aliens[:3]:
if alien['color'] == 'green':
alien['color'] = 'yellow'
alien['speed'] = 'medium'
alien['points'] = 10
aliens[:10]
[{'color': 'yellow', 'points': 10, 'speed': 'medium'},
{'color': 'yellow', 'points': 10, 'speed': 'medium'},
{'color': 'yellow', 'points': 10, 'speed': 'medium'},
{'color': 'green', 'points': 5, 'speed': 'slow'},
{'color': 'green', 'points': 5, 'speed': 'slow'},
{'color': 'green', 'points': 5, 'speed': 'slow'},
{'color': 'green', 'points': 5, 'speed': 'slow'},
{'color': 'green', 'points': 5, 'speed': 'slow'},
{'color': 'green', 'points': 5, 'speed': 'slow'},
{'color': 'green', 'points': 5, 'speed': 'slow'}]
5.2 Sets
A set
is an unordered collection of unique values. Sets
may contain only immutable objects, like strings
, ints
, floats
and tuples
that contain only immutable elements.
The following code creates a set
of strings named colors
:
{'blue', 'green', 'orange', 'red', 'yellow'}
Notice that the duplicate string 'red'
was ignored (without causing an error). An important use of sets
is duplicate elimination, which is automatic when creating a set
. Also, the resulting set’s
values may not be displayed in the same order as they were listed! Though the color names are displayed in sorted order, sets are unordered. You should not write code that depends on the order of their elements!
Note that we also use curly bracket to create a set!
Though sets
are iterable, they are not sequences and do not support indexing and slicing with square brackets, []
.
You can determine the number of items in a set with the built-in len()
function:
You can check whether a set
contains a particular value using the in
and not in
operators:
Sets
are iterable, so you can process each set element with a for
loop:
green blue red yellow orange
Sets
are unordered, so there’s no significance to the iteration order!
5.2.1 Creating a Set
with the Built-In set()
Function
You can create a set
from another collection of values by using the built-in set()
function — here we create a list
that contains several duplicate integer values and use that list
as set
’s argument:
If you need to create an empty set
, you must use the set()
function with empty parentheses, rather than empty braces, {}
, which represent an empty dictionary
:
Python displays an empty
set
asset()
to avoid confusion with Python’s string representation of an emptydictionary
({}
).
5.2.2 Set
Operators and Methods
Sets
are mutable — you can add and remove elements, but set elements must be immutable. Therefore, a set
cannot have other sets
as elements.
5.2.2.1 Methods for Adding and Removing Elements
Here we first discuss operators and methods that modify an existing set
.
Set method update()
performs a union operation modifying the set in-place — the argument can be any iterable:
Set
method add()
inserts its argument if the argument is not already in the set; otherwise, the set
remains unchanged:
Set
method remove()
removes its argument from the set
— a KeyError
occurs if the value is not in the set
:
Method discard()
also removes its argument from the set
but does not cause an exception if the value is not in the set
.
5.2.3 Mathematical Set Operations
The operators and methods presented in this section each result in a new set!
5.2.3.1 Union
The union of two sets
is a set consisting of all the unique elements from both sets. You can calculate the union with the |
operator or with the set union()
method:
The operands of the binary set operators, like |
, must both be sets
. The corresponding set
methods may receive any iterable object as an argument — we passed a list
. When a mathematical set
method receives a non-set iterable argument, it first converts the iterable to a set, then applies the mathematical operation.
5.2.3.2 Intersection
The intersection of two sets
is a set consisting of all the unique elements that the two sets
have in common. You can calculate the intersection with the &
operator or with the set intersection()
method:
5.2.3.3 Difference
The difference between two sets
is a set consisting of the elements in the left operand that are not in the right operand. You can calculate the difference with the -
operator or with the set difference()
method:
5.2.3.4 Symmetric Difference
The symmetric difference between two sets is a set consisting of the elements of both sets that are not in common with one another. You can calculate the symmetric difference with the ^
operator or with the set symmetric_difference
method:
5.2.3.5 Disjoint
Two sets
are disjoint if they do not have any common elements. You can determine this with the set isdisjoint()
method:
5.2.4 Set Comprehensions
Like dictionary comprehensions, you define set comprehensions in curly braces. Let’s create a new set
containing only the unique even values in the list
numbers:
5.2.5 Sorting the set
and dictionary
As we mentioned last week, data types like tuples
don’t provide methods like sort()
. However Python
provides the built-in function sorted()
, which takes any sequence as a parameter and returns a new container with the same elements in a different order. You can also apply sorted
to the set, but the returning container will be list
.
Help on built-in function sorted in module builtins:
sorted(iterable, /, *, key=None, reverse=False)
Return a new list containing all items from the iterable in ascending order.
A custom key function can be supplied to customize the sort order, and the
reverse flag can be set to request the result in descending order.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
def rank_key(card):
return RANKS.index(card)
ori_set = {"A", "2", "7", "4", "Q"}
print(sorted(ori_set))
sorted_list = sorted(ori_set, key=rank_key)
# Each element will be replaced by the output of rank_key() and sorts!
print(sorted_list)
['2', '4', '7', 'A', 'Q']
['A', '2', '4', '7', 'Q']
Note that we have changed the behavior of the sorted()
function by providing the custom key that allows us to sort the data in a specific order using the predefined list
and the index()
function.
If you would like to sort the dictionary
, you need to use the items()
method (Otherwise, it will only return keys). The returning container will again be a list
:
grade = {'calculus':85, 'introduction to mathematics':80, 'introduction to computer science':90, 'linear algebra':95}
sorted_list = sorted(grade.items())
print(sorted_list)
[('calculus', 85), ('introduction to computer science', 90), ('introduction to mathematics', 80), ('linear algebra', 95)]
If you would like to sort by the value, use the following code:
def value_key(x):
return x[1]
grade = {'calculus':85, 'introduction to mathematics':80, 'computer programming':90, 'linear algebra':95}
sorted_dict = sorted(grade.items(), key=value_key)
print(sorted_dict)
[('introduction to mathematics', 80), ('calculus', 85), ('computer programming', 90), ('linear algebra', 95)]
5.2.6 Exercise 2: Try to design a program that counts the number of unique characters in a string. Be sure to exclude the punctuation and white space and the character with upper and lower cases are treated as different characters.
import string
# You can obtain all the punctuations using the following attribute
string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
# Sample text
text = "The quick brown fox jumped over the lazy dog. The dog didn't seem to mind!"
# Create a set of punctuation characters
punctuation = set(string.punctuation)
# Create a set of all characters included in the text
all_chars = _____________
# Find the difference between the two sets to get the non-punctuation characters
non_punctuation_chars = ____________________________
# Also remove the whitespace
_____________________________
# Count the number of non-punctuation characters
num_unique_chars = len(non_punctuation_chars)
print("Number of unique characters:", num_unique_chars)
# Sanity check
assert num_unique_chars == 27
Number of unique characters: 27
In this chapter, we discussed Python’s dictionary
and set
collections. They are both unorder, mutable and do not allow duplicates.
We said what a dictionary is and presented several examples. We showed the syntax of key–value pairs and showed how to use them to create dictionaries
with comma-separated lists of key–value pairs in curly braces, {}
. You also created dictionaries with dictionary comprehensions. You used square brackets, []
, to retrieve the value corresponding to a key, and to insert and update key–value pairs. You also used the dictionary method update to change a key’s associated value. You iterated through a dictionary’s keys, values and items.
You created sets
of unique immutable values. You combined sets
with set operators and methods, changed sets’ values with the mutable set operations and created sets
with set comprehensions. You saw that sets are mutable.
A short comparison of the containers is shown below:
Feature | List | Tuple | Dictionary | Set |
---|---|---|---|---|
Mutable (Can be modified in place) | Yes | No | Yes (keys are immutable) | Yes |
Iterable (Can be use in for loop) | Yes | Yes | Yes | Yes |
Ordered (Can access by index, slicing) | Yes | Yes | No | No |
Duplicate Values | Allowed | Allowed | Not in keys | Not allowed |