Dictionaries, Maps, and Hashtables
TL;DR
- Dictionaries are the central data structure in Python
- THe built-in
dict
type is “good enough” most of the time - Specialized implementations, like read-only or ordered dicts, are also available in the Python standard library.
In Python, dictionaries (or “dicts” for short)
- store an arbitrary number of objects, each identified by a unique dictionary key.
- also called maps, hashmaps, lookup tables, or associative arrays.
- allow for the efficient lookup, insertion, and deletion of any object associated with a given key.
dict
Python features a robust dictionary implementation that’s built directly into the core language: the dict
data type.
“Suyntactic sugar” for working with dicts:
curly-braces dict expression syntax
phonebook = { "bob": 7387, "alice": 3719, "jack": 7052, } >>> phonebook['alice'] 3719
dictionary comprehensions
squares = {x: x * x for x in range(6)} >>> squares {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
Python’s dictionaries are indexed by keys that can be of any hashable type.
- A hashable object has a hash value which never changes during its lifetime (see
__hash__
), and it can be compared to other objects (see__eq__
). - Hashable objects which compare as equal must have the same hash value.
For most use cases, Python’s built-in dictionary implementation will do everything you need.
- Are highly optimized and underlie many parts of the language (e.g. class attributes and variables in a stack frame are both stored internally in dictionaries)
- Are based on a well-tested and finely tuned hash table implementation that provides $O(1)$ time complexity for lookup, insert, update, and delete operations in the average case.
collections.OrderedDict
- Remembers the insertion order of keys added to it
- Must be imported from the
collections
module
Example
>>> from collections import OrderedDict
>>> d = OrderedDict(one=1, two=2, three=3)
>>> d
OrderedDict([('one', 1), ('two', 2), ('three', 3)])
>>> d["four"] = 4
>>> d
OrderedDict([('one', 1), ('two', 2), ('three', 3), ('four', 4)])
>>> d.keys()
odict_keys(['one', 'two', 'three'])
collections.defaultdict
- Accepts a callable in its constructor whose return value will be used if a requested key cannot be found
Example: Use list
as the default_factory
>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
... d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
- When each key is encountered for the first time, it is not already in the mapping; so an entry is automatically created using the
default_factory
function which returns an emptylist
. Thelist.append()
operation then attaches the value to the new list.- When keys are encountered again, the look-up proceeds normally (returning the list for that key) and the
list.append()
operation adds another value to the list.
Example: Setting the default_factory
to int
makes the defaultdict
useful for counting
from collections import defaultdict
s = "mississippi"
d = defaultdict(int)
for k in s:
d[k] += 1
>>> d.items()
dict_items([('m', 1), ('i', 4), ('s', 4), ('p', 2)])
- When a letter is first encountered, it is missing from the mapping, so the
default_factory
function callsint()
to supply a default count of zero.- The increment operation then builds up the count for each letter.
collections.ChainMap
Quickly links a number of mappings so they can be treated as a single unit. It is often much faster than creating a new dictionary and running multiple
update()
calls.Is an updatable view over multiple dicts, and it behaves just like a normal dict (All of the usual dictionary methods are supported).
Can be used to simulate nested scopes and is useful in templating.
Use cases
Searching through multiple dictionaries
Example:
>>> toys = {'Blocks': 30, 'Monopoly': 20} >>> computers = {'iMac': 1000, 'Chromebook': 800, 'PC': 400} >>> clothing = {'Jeans': 40, 'T-Shirt': 10}
>>> from collections import ChainMap >>> inventory = ChainMap(toys, computers, clothing)
Test some normal dict operations:
>>> inventory['Monopoly'] 20 >>> inventory.get('Mario Bros.') None >>> inventory.pop('Blocks') 200 >>> inventory['Blocks'] # KeyError: 'Blocks'
If we now add a toy to the
toys
dictionary, it will also be made available in the inventory. This is the updatable aspect of a ChainMap.>>> toys['Nintendo'] = 200 >>> inventory['Nintendo'] 200
A nice feature is that while in our example
toys
,computers
andclothing
are all in the same context (the interpreter), they could come from totally different modules or packages. This is because ChainMap stores the underlying dictionaries by reference.Providing a chain of default values
Performance-critical applications that frequently compute subsets of a dictionary
More see:
types.MappingProxyType
- A wrapper around a standard dictionary that provides a read-only view into the wrapped dictionary’s data. It can be used to create immutable proxy versions of dictionaries.
- Using
MappingProxyType
allows you to put these read-only restrictions in place without first having to create a full copy of the dictionary
Example
from types import MappingProxyType
writable = {"one": 1, "two": 2}
read_only = MappingProxyType(writable)
>>> read_only["one"]
1
The proxy is read-only. Trying to modify the proxy will cause an error:
>>> read_only["one"] = 23
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [19], line 1
----> 1 read_only["one"] = 23
TypeError: 'mappingproxy' object does not support item assignment
Updates to the original are reflected in the proxy:
>>> writable["one"] = 42
>>> read_only["one"]
42
Conclusion
If you’re looking for a general recommendation on which mapping type to use in your programs, it is recommended to use the built-in dict
data type. It’s a versatile and optimized hash table implementation that’s built directly into the core language.
It is ONLY recommended that you use one of the other data types listed here if you have special requirements that go beyond what’s provided by dict.