Generator
In Python, a generator is a function that returns an iterator that produces a sequence of values when iterated over.
Generators are useful when we want to produce a large sequence of values, but we don’t want to store all of them in memory at once.
Understanding Generators
Generator functions look and act just like regular functions, but with one defining characteristic: they use the Python yield keyword instead of return.
For example, generating an infinite sequence:
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1
This looks like a typical function definition, except for the yield statement and the code that follows it. yield indicates where a value is sent back to the caller, but unlike return, you don't exit the function afterward. Instead, the state of the function is remembered. That way, when next() is called on a generator object (either explicitly or implicitly within a for loop), the previously yielded variable num is incremented, and then yielded again.
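A quick sketch of this resume-after-yield behavior, using the infinite_sequence function above:

```python
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

gen = infinite_sequence()  # no code in the body runs yet; we just get a generator object

print(next(gen))  # 0 -- runs the body until the first yield
print(next(gen))  # 1 -- resumes right after yield, increments num, yields again
print(next(gen))  # 2
```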
Building generators
Define as function
Similar to defining a normal function, we can define a generator function using the def keyword, but instead of a return statement we use a yield statement.
def generator_name(arg):
    # statements
    yield something
When the generator function is called, it does not execute the function body immediately. Instead, it returns a generator object that can be iterated over to produce the values.
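A minimal illustration of this laziness (the greet_twice name is just for this example):

```python
def greet_twice(name):
    print("body started")      # runs only on the first next() call, not at call time
    yield f"Hello, {name}"
    yield f"Goodbye, {name}"

gen = greet_twice("Ada")       # nothing printed yet: the body has not run
print(type(gen).__name__)      # generator
print(next(gen))               # prints "body started", then yields "Hello, Ada"
print(next(gen))               # resumes after the first yield: "Goodbye, Ada"
```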
Define using comprehension
Like list comprehensions, generator expressions allow you to quickly create a generator object in just a few lines of code.
(expression for item in iterable)
The generator expression creates a generator object that produces the value of expression for each item in the iterable, one at a time, when iterated over.
Generator comprehensions are also useful in the same cases where list comprehensions are used, with an added benefit: you can create them without building and holding the entire object in memory before iteration.
Example: squaring numbers
num_squared_list = [num ** 2 for num in range(5)] # list comprehension
num_squared_generator = (num ** 2 for num in range(5)) # generator comprehension
>>> num_squared_list
[0, 1, 4, 9, 16]
>>> num_squared_generator
<generator object <genexpr> at 0x7f9c604c8430>
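Unlike the list, the generator object holds no values yet; iterating it produces them one at a time, and it is exhausted after a single pass:

```python
num_squared_generator = (num ** 2 for num in range(5))

print(next(num_squared_generator))   # 0
print(next(num_squared_generator))   # 1
print(list(num_squared_generator))   # [4, 9, 16] -- only the remaining values
print(list(num_squared_generator))   # [] -- already exhausted
```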
Profiling generator performance
Memory:
>>> import sys
>>> nums_squared_lc = [i ** 2 for i in range(10000)]
>>> sys.getsizeof(nums_squared_lc) # in bytes
87624
>>> nums_squared_gc = (i ** 2 for i in range(10000))
>>> print(sys.getsizeof(nums_squared_gc)) # in bytes
120
The list is over 700 times larger than the generator object!
Speed:
>>> import cProfile
>>> cProfile.run('sum([i * 2 for i in range(10000)])')
5 function calls in 0.001 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.001 0.001 <string>:1(<listcomp>)
1 0.000 0.000 0.001 0.001 <string>:1(<module>)
1 0.000 0.000 0.001 0.001 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {built-in method builtins.sum}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
>>> cProfile.run('sum((i * 2 for i in range(10000)))')
10005 function calls in 0.003 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
10001 0.002 0.000 0.002 0.000 <string>:1(<genexpr>)
1 0.000 0.000 0.003 0.003 <string>:1(<module>)
1 0.000 0.000 0.003 0.003 {built-in method builtins.exec}
1 0.001 0.001 0.003 0.003 {built-in method builtins.sum}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Summing across all values in the list comprehension took about a third of the time it took to sum across the generator.
If the list is smaller than the running machine’s available memory, then list comprehensions can be faster to evaluate than the equivalent generator expression.
Memory matters → Use generator comprehension.
Runtime matters → Use list comprehension.
Understanding the yield Statement
On the whole, yield is a fairly simple statement. Its primary job is to control the flow of a generator function in a way that's similar to return statements.
When the yield statement is reached, the program suspends function execution and returns the yielded value to the caller. (In contrast, return stops function execution completely.) When a function is suspended, its state is saved, including any variable bindings local to the generator, the instruction pointer, the internal stack, and any exception handling. This allows you to resume function execution whenever you call one of the generator's methods; evaluation then picks back up right after yield.
Example:
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1
        yield f"Next number to yield: {num}"
>>> inf_seq_generator = infinite_sequence()
>>> inf_seq_generator
<generator object infinite_sequence at 0x7f9c604c8b30>
# The `infinite_sequence` body runs until line 4,
# where execution is suspended by the yield statement.
# `num` (0 in this case) is yielded.
>>> next(inf_seq_generator)
0
# The function execution picks back up right after `yield num`.
# I.e., the program continues from line 5, `num += 1`. `num` is incremented to 1.
# Then the `yield` statement at line 6 is reached
# --> it suspends the program and yields "Next number to yield: 1".
>>> next(inf_seq_generator)
'Next number to yield: 1'
>>> next(inf_seq_generator)
1
>>> next(inf_seq_generator)
'Next number to yield: 2'
>>> next(inf_seq_generator)
2
Use of Python Generators
There are several reasons that make generators a powerful implementation.
Easy to Implement
Generators can be implemented in a clearer and more concise way than their iterator class counterparts.
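To see the difference, here is a hypothetical powers-of-two sequence written first as an iterator class and then as a generator. The class must implement __iter__ and __next__ and manage its own state; the generator handles all of that implicitly:

```python
class PowTwoIterator:
    """Iterator class: explicit state plus the iterator protocol methods."""
    def __init__(self, maximum):
        self.maximum = maximum
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.n > self.maximum:
            raise StopIteration
        result = 2 ** self.n
        self.n += 1
        return result

def pow_two_gen(maximum):
    """Generator: same sequence, state kept automatically between yields."""
    n = 0
    while n <= maximum:
        yield 2 ** n
        n += 1

print(list(PowTwoIterator(4)))  # [1, 2, 4, 8, 16]
print(list(pow_two_gen(4)))     # [1, 2, 4, 8, 16]
```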
Memory Efficient
A normal function that returns a sequence will create the entire sequence in memory before returning the result. This is overkill if the number of items in the sequence is very large.
In contrast, a generator implementation of such a sequence is memory friendly and preferred, since it produces only one item at a time.
Represent Infinite Stream
Generators are excellent mediums to represent an infinite stream of data. Infinite streams cannot be stored in memory, and since generators produce only one item at a time, they can represent an infinite stream of data.
E.g., the following generator function can generate all the even numbers (at least in theory).
def all_even():
    n = 0
    while True:
        yield n
        n += 2
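An infinite generator like all_even is safe to use as long as the consumer stops iterating at some point, for example with itertools.islice:

```python
from itertools import islice

def all_even():
    n = 0
    while True:
        yield n
        n += 2

# Take only the first five even numbers; the generator never runs unbounded.
print(list(islice(all_even(), 5)))  # [0, 2, 4, 6, 8]
```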
Pipelining Generators
Multiple generators can be used to pipeline a series of operations.
E.g., suppose we have one generator that produces the numbers in the Fibonacci series and another that squares numbers. To find the sum of squares of the numbers in the Fibonacci series, we can pipeline the output of the generator functions together:
def fibonacci_numbers(nums):
    x, y = 0, 1
    for _ in range(nums):
        x, y = y, x + y
        yield x

def square(nums):
    for num in nums:
        yield num ** 2

print(sum(square(fibonacci_numbers(10))))
# Output: 4895
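The same pipeline can also be written with a generator expression in place of the square function; only the fibonacci_numbers generator from above is assumed:

```python
def fibonacci_numbers(nums):
    x, y = 0, 1
    for _ in range(nums):
        x, y = y, x + y
        yield x

# A generator expression forms the second pipeline stage: still fully lazy.
total = sum(fib ** 2 for fib in fibonacci_numbers(10))
print(total)  # 4895
```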