The Vanishing Pattern: from iterators to generators in Python

The Vanishing Pattern from iterators to generators in Python Luciano Ramalho [email protected] @ramalhoorg

description

The core of the talk is refactoring a simple iterable class from the classic Iterator design pattern (as implemented in the GoF book) to compatible but less verbose implementations using generators. This provides a meaningful context to understand the value of generators. Along the way the behavior of the iter function, the Sequence protocol and the Iterable interface are presented. The motivating examples of this talk are database applications.

Transcript of The Vanishing Pattern: from iterators to generators in Python

Page 1: The Vanishing Pattern: from iterators to generators in Python

The Vanishing Pattern: from iterators to generators in Python

Luciano Ramalho

[email protected] @ramalhoorg

Page 2: The Vanishing Pattern: from iterators to generators in Python

Demo: laziness in the Django Shell

Page 3: The Vanishing Pattern: from iterators to generators in Python

    >>> from django.db import connection
    >>> q = connection.queries
    >>> q
    []
    >>> from municipios.models import *
    >>> res = Municipio.objects.all()[:5]
    >>> q
    []
    >>> for m in res: print m.uf, m.nome
    ...
    GO Abadia de Goiás
    MG Abadia dos Dourados
    GO Abadiânia
    MG Abaeté
    PA Abaetetuba
    >>> q
    [{'time': '0.000', 'sql': u'SELECT "municipios_municipio"."id", "municipios_municipio"."uf", "municipios_municipio"."nome", "municipios_municipio"."nome_ascii", "municipios_municipio"."meso_regiao_id", "municipios_municipio"."capital", "municipios_municipio"."latitude", "municipios_municipio"."longitude", "municipios_municipio"."geohash" FROM "municipios_municipio" ORDER BY "municipios_municipio"."nome_ascii" ASC LIMIT 5'}]

Page 6: The Vanishing Pattern: from iterators to generators in Python

Notes on the session shown on page 3:

Municipio.objects.all()[:5]: this expression makes a Django QuerySet

QuerySets are “lazy”: no database access so far (q is still empty after res is assigned)

the query is made only when we iterate over the results (the for loop)

Page 8: The Vanishing Pattern: from iterators to generators in Python

QuerySet is a lazy iterable

(“lazy” is a technical term)

Page 9: The Vanishing Pattern: from iterators to generators in Python

Lazy

• Avoids unnecessary work, by postponing it as long as possible

• The opposite of eager

In Computer Science, being “lazy” is often a good thing!
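The contrast between eager and lazy can be sketched in a few lines of Python 3. This is an illustration only: the function names are made up for this example, and the lazy version uses a generator function, a feature the talk introduces further on.

```python
def eager_squares(n):
    """Eager: builds the whole list before returning."""
    return [i * i for i in range(n)]


def lazy_squares(n):
    """Lazy: computes each square only when the caller asks for it."""
    for i in range(n):
        yield i * i


eager = eager_squares(5)  # all five squares exist right now
lazy = lazy_squares(5)    # nothing has been computed yet

# Only the two squares that are actually requested get computed.
first_two = [next(lazy), next(lazy)]

print(eager)      # [0, 1, 4, 9, 16]
print(first_two)  # [0, 1]
```

The remaining squares are never computed unless some code keeps asking for them.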

Page 10: The Vanishing Pattern: from iterators to generators in Python

Now, back to basics...

Page 11: The Vanishing Pattern: from iterators to generators in Python

Iteration: C and Python

    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int i;
        for(i = 0; i < argc; i++)
            printf("%s\n", argv[i]);
        return 0;
    }

    import sys

    for arg in sys.argv:
        print arg

Page 12: The Vanishing Pattern: from iterators to generators in Python

Iteration: Java (classic)

    class Arguments {
        public static void main(String[] args) {
            for (int i=0; i < args.length; i++)
                System.out.println(args[i]);
        }
    }

    $ java Arguments alfa bravo charlie
    alfa
    bravo
    charlie

Page 13: The Vanishing Pattern: from iterators to generators in Python

Iteration: Java ≥1.5

• Enhanced for (a.k.a. foreach), since 2004

    class Arguments2 {
        public static void main(String[] args) {
            for (String arg : args)
                System.out.println(arg);
        }
    }

    $ java Arguments2 alfa bravo charlie
    alfa
    bravo
    charlie

Page 14: The Vanishing Pattern: from iterators to generators in Python

Iteration: Java ≥1.5

• Enhanced for (a.k.a. foreach)

    class Arguments2 {
        public static void main(String[] args) {
            for (String arg : args)
                System.out.println(arg);
        }
    }

since 2004

    import sys

    for arg in sys.argv:
        print arg

since 1991

Page 15: The Vanishing Pattern: from iterators to generators in Python

You can iterate over many Python objects

• strings

• files

• XML: ElementTree nodes

• not limited to built-in types:

• Django QuerySet

• etc.

Page 16: The Vanishing Pattern: from iterators to generators in Python

So, what is an iterable?

• Informal, recursive definition:

• iterable: fit to be iterated

• just as: edible: fit to be eaten

Page 17: The Vanishing Pattern: from iterators to generators in Python

The for loop statement is not the only construct that handles iterables...

Page 18: The Vanishing Pattern: from iterators to generators in Python

List comprehension

• An expression that builds a list from any iterable

• Example: use all the elements: L2 = [n*10 for n in L]

    >>> s = 'abracadabra'
    >>> l = [ord(c) for c in s]
    >>> l
    [97, 98, 114, 97, 99, 97, 100, 97, 98, 114, 97]

input: any iterable object

output: a list (always)

Page 19: The Vanishing Pattern: from iterators to generators in Python

Set comprehension

• An expression that builds a set from any iterable

    >>> s = 'abracadabra'
    >>> set(s)
    {'b', 'r', 'a', 'd', 'c'}
    >>> {ord(c) for c in s}
    {97, 98, 99, 100, 114}

Page 20: The Vanishing Pattern: from iterators to generators in Python

Dict comprehensions

• An expression that builds a dict from any iterable

    >>> s = 'abracadabra'
    >>> {c:ord(c) for c in s}
    {'a': 97, 'r': 114, 'b': 98, 'c': 99, 'd': 100}

Page 21: The Vanishing Pattern: from iterators to generators in Python

Syntactic support for iterables

• Tuple unpacking, parallel assignment

    >>> a, b, c = 'XYZ'
    >>> a
    'X'
    >>> b
    'Y'
    >>> c
    'Z'

    >>> l = [(c, ord(c)) for c in 'XYZ']
    >>> l
    [('X', 88), ('Y', 89), ('Z', 90)]
    >>> for char, code in l:
    ...     print char, '->', code
    ...
    X -> 88
    Y -> 89
    Z -> 90

Page 22: The Vanishing Pattern: from iterators to generators in Python

Syntactic support for iterables (2)

• Function calls: exploding arguments with *

    >>> import math
    >>> def hypotenuse(a, b):
    ...     return math.sqrt(a*a + b*b)
    ...
    >>> hypotenuse(3, 4)
    5.0
    >>> sides = (3, 4)
    >>> hypotenuse(sides)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: hypotenuse() takes exactly 2 arguments (1 given)
    >>> hypotenuse(*sides)
    5.0

Page 23: The Vanishing Pattern: from iterators to generators in Python

Built-in iterable types

• basestring

• str

• unicode

• dict

• file

• frozenset

• list

• set

• tuple

• xrange

Page 24: The Vanishing Pattern: from iterators to generators in Python

Built-in functions that take iterable arguments

• all

• any

• filter

• iter

• len (only for iterables that also implement __len__)

• map

• max

• min

• reduce

• sorted

• sum

• zip (unrelated to compression)

Page 25: The Vanishing Pattern: from iterators to generators in Python

Classic iterables in Python

Page 26: The Vanishing Pattern: from iterators to generators in Python

Iterator is...

• a classic design pattern

Design Patterns
Gamma, Helm, Johnson & Vlissides
Addison-Wesley, ISBN 0-201-63361-2

Page 28: The Vanishing Pattern: from iterators to generators in Python

Head First Design Patterns Poster
O'Reilly, ISBN 0-596-10214-3

“The Iterator Pattern provides a way to access the elements of an aggregate object sequentially without exposing the underlying representation.”

Page 29: The Vanishing Pattern: from iterators to generators in Python

An iterable Train class

    >>> train = Train(4)
    >>> for car in train:
    ...     print(car)
    car #1
    car #2
    car #3
    car #4
    >>>

Page 30: The Vanishing Pattern: from iterators to generators in Python

An iterable Train with iterator

    class Train(object):  # the iterable

        def __init__(self, cars):
            self.cars = cars

        def __len__(self):
            return self.cars

        def __iter__(self):
            return TrainIterator(self)

    class TrainIterator(object):  # the iterator

        def __init__(self, train):
            self.train = train
            self.current = 0

        def __next__(self):  # Python 3
            if self.current < len(self.train):
                self.current += 1
                return 'car #%s' % (self.current)
            else:
                raise StopIteration()
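A slightly fuller sketch of the same classic pattern follows. It is not the code from the slide: it adds two things the slide leaves out, namely an `__iter__` method on the iterator that returns `self` (so the iterator is itself iterable, as the iterator protocol expects) and a `next` alias for Python 2.

```python
class Train(object):
    """The iterable: knows how many cars it has."""

    def __init__(self, cars):
        self.cars = cars

    def __len__(self):
        return self.cars

    def __iter__(self):
        # A fresh iterator for every loop over the same train.
        return TrainIterator(self)


class TrainIterator(object):
    """The iterator: keeps the position of one traversal."""

    def __init__(self, train):
        self.train = train
        self.current = 0

    def __iter__(self):
        # Iterators return themselves, so iter(iterator) works.
        return self

    def __next__(self):
        if self.current < len(self.train):
            self.current += 1
            return 'car #%s' % self.current
        raise StopIteration()

    next = __next__  # Python 2 spelling of the same method


cars = list(Train(3))
it = iter(Train(2))
print(cars)  # ['car #1', 'car #2', 'car #3']
```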

Page 31: The Vanishing Pattern: from iterators to generators in Python

Iterable ABC

• collections.Iterable abstract base class (spelled collections.abc.Iterable since Python 3.3)

• A concrete subclass of Iterable must implement .__iter__

• .__iter__ returns an Iterator

• You don’t usually call .__iter__ directly

• when needed, call iter(x)

Page 32: The Vanishing Pattern: from iterators to generators in Python

Iterator ABC

• Iterator provides .next (Python 2) or .__next__ (Python 3)

• .__next__ returns the next item

• You don’t usually call .__next__ directly

• when needed, call next(x) (Python ≥ 2.6)

Page 33: The Vanishing Pattern: from iterators to generators in Python

Train with iterator

for car in train:

• (1) calls iter(train) to obtain a TrainIterator

• (2) makes repeated calls to next(aTrainIterator) until it raises StopIteration

    >>> train = Train(3)
    >>> for car in train:
    ...     print(car)
    car #1
    car #2
    car #3
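The two steps performed by the for statement can be written out by hand. The sketch below is only an illustration of the mechanism (the function name is hypothetical); real code should simply use for.

```python
def loop_by_hand(iterable):
    """Do what `for item in iterable` does, step by step."""
    iterator = iter(iterable)      # step 1: obtain an iterator
    collected = []
    while True:
        try:
            item = next(iterator)  # step 2: ask for the next item...
        except StopIteration:      # ...until the iterator is exhausted
            break
        collected.append(item)     # the "body" of the loop
    return collected


print(loop_by_hand('XYZ'))  # ['X', 'Y', 'Z']
```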

Page 34: The Vanishing Pattern: from iterators to generators in Python

Photo: Richard Bartz/Wikipedia

Page 35: The Vanishing Pattern: from iterators to generators in Python

Iterable duck-like creatures

Page 36: The Vanishing Pattern: from iterators to generators in Python

Design patterns in dynamic languages

• Dynamic languages: Lisp, Smalltalk, Python, Ruby, PHP, JavaScript...

• Many features not found in C++, where most of the original 23 Design Patterns were identified

• Java is more dynamic than C++, but much more static than Lisp, Python etc.

The original 23 patterns are from Gamma, Helm, Johnson, Vlissides, a.k.a. the Gang of Four (GoF)

Page 37: The Vanishing Pattern: from iterators to generators in Python

Peter Norvig: “Design Patterns in Dynamic Languages”

Page 38: The Vanishing Pattern: from iterators to generators in Python

Dynamic types

• No need to declare types or interfaces

• It does not matter what an object claims to be, only what it is capable of doing

Page 39: The Vanishing Pattern: from iterators to generators in Python

Duck typing

“In other words, don't check whether it is-a duck: check whether it quacks-like-a duck, walks-like-a duck, etc, etc, depending on exactly what subset of duck-like behaviour you need to play your language-games with.”

Alex Martelli, comp.lang.python (2000)

Page 40: The Vanishing Pattern: from iterators to generators in Python

A Python iterable is...

• An object from which the iter function can produce an iterator

• The iter(x) call:

• invokes x.__iter__() to obtain an iterator (the Iterable interface)

• but, if x has no __iter__:

• iter makes an iterator which tries to fetch items from x by doing x[0], x[1], x[2]... (the sequence protocol)
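The fallback to the sequence protocol can be seen with a class that defines only `__getitem__`. The `Countdown` class below is a hypothetical example written for this note, not part of the talk.

```python
class Countdown(object):
    """Hypothetical class with __getitem__ only: no __iter__, no __len__."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        if 0 <= index <= self.start:
            return self.start - index
        # IndexError tells iter()'s fallback iterator where to stop.
        raise IndexError(index)


has_iter = hasattr(Countdown, '__iter__')  # False: nothing to invoke
items = list(Countdown(3))                 # iter() falls back to x[0], x[1]...

print(has_iter)  # False
print(items)     # [3, 2, 1, 0]
```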

Page 41: The Vanishing Pattern: from iterators to generators in Python

Train: a sequence of cars

train = Train(4)

train[0] train[1] train[2] train[3]

Page 42: The Vanishing Pattern: from iterators to generators in Python

Train: a sequence of cars

    >>> train = Train(4)
    >>> len(train)
    4
    >>> train[0]
    'car #1'
    >>> train[3]
    'car #4'
    >>> train[-1]
    'car #4'
    >>> train[4]
    Traceback (most recent call last):
      ...
    IndexError: no car at 4

    >>> for car in train:
    ...     print(car)
    car #1
    car #2
    car #3
    car #4

Page 43: The Vanishing Pattern: from iterators to generators in Python

Train: a sequence of cars

    class Train(object):

        def __init__(self, cars):
            self.cars = cars

        def __len__(self):
            # required by the len(self) call in __getitem__
            return self.cars

        def __getitem__(self, key):
            index = key if key >= 0 else self.cars + key
            if 0 <= index < len(self):  # index 2 -> car #3
                return 'car #%s' % (index + 1)
            else:
                raise IndexError('no car at %s' % key)

if __getitem__ exists, iteration “just works”

Page 44: The Vanishing Pattern: from iterators to generators in Python

The sequence protocol at work

    >>> t = Train(4)
    >>> len(t)              # __len__
    4
    >>> t[0]                # __getitem__
    'car #1'
    >>> t[3]
    'car #4'
    >>> t[-1]
    'car #4'
    >>> for car in t:       # __getitem__
    ...     print(car)
    car #1
    car #2
    car #3
    car #4

Page 45: The Vanishing Pattern: from iterators to generators in Python

Protocol

• protocol: a synonym for interface used in dynamic languages like Smalltalk, Python, Ruby, Lisp...

• not declared, and not enforced by static checks

Page 46: The Vanishing Pattern: from iterators to generators in Python

Sequence protocol

    class Train(object):

        def __init__(self, cars):
            self.cars = cars

        def __len__(self):
            return self.cars

        def __getitem__(self, key):
            index = key if key >= 0 else self.cars + key
            if 0 <= index < len(self):  # index 2 -> car #3
                return 'car #%s' % (index + 1)
            else:
                raise IndexError('no car at %s' % key)

__len__ and __getitem__ implement the immutable sequence protocol

Page 47: The Vanishing Pattern: from iterators to generators in Python

Sequence ABC

• collections.Sequence abstract base class (Python ≥ 2.6; spelled collections.abc.Sequence since Python 3.3)

    import collections

    class Train(collections.Sequence):

        def __init__(self, cars):
            self.cars = cars

        def __len__(self):
            return self.cars

        def __getitem__(self, key):
            index = key if key >= 0 else self.cars + key
            if 0 <= index < len(self):  # index 2 -> car #3
                return 'car #%s' % (index + 1)
            else:
                raise IndexError('no car at %s' % key)

abstract methods: __len__ and __getitem__

Page 48: The Vanishing Pattern: from iterators to generators in Python

Sequence ABC

• collections.Sequence abstract base class

implement these 2: __len__, __getitem__

Page 49: The Vanishing Pattern: from iterators to generators in Python

Sequence ABC

• collections.Sequence abstract base class

inherit these 5: __contains__, __iter__, __reversed__, index, count

Page 50: The Vanishing Pattern: from iterators to generators in Python

Sequence ABC

• collections.Sequence abstract base class

    >>> train = Train(4)
    >>> 'car #2' in train
    True
    >>> 'car #7' in train
    False
    >>> for car in reversed(train):
    ...     print(car)
    car #4
    car #3
    car #2
    car #1
    >>> train.index('car #3')
    2
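For readers on a current Python 3, here is a sketch of the same class using the collections.abc spelling of the ABC (the plain collections.Sequence alias was removed in Python 3.10). Only the import differs from the slides; the usage lines exercise the inherited methods.

```python
from collections.abc import Sequence


class Train(Sequence):

    def __init__(self, cars):
        self.cars = cars

    # The two abstract methods a concrete Sequence must implement:

    def __len__(self):
        return self.cars

    def __getitem__(self, key):
        index = key if key >= 0 else self.cars + key
        if 0 <= index < len(self):  # index 2 -> car #3
            return 'car #%s' % (index + 1)
        raise IndexError('no car at %s' % key)


train = Train(4)

# Inherited from Sequence: __contains__, __iter__, __reversed__, index, count
print('car #2' in train)       # True
print(list(reversed(train)))   # ['car #4', 'car #3', 'car #2', 'car #1']
print(train.index('car #3'))   # 2
```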

Page 51: The Vanishing Pattern: from iterators to generators in Python

Photo: U.S. NRC/Wikipedia

Page 52: The Vanishing Pattern: from iterators to generators in Python

Generators

Page 53: The Vanishing Pattern: from iterators to generators in Python

Iteration in C (example 2)

    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int i;
        for(i = 0; i < argc; i++)
            printf("%d : %s\n", i, argv[i]);
        return 0;
    }

    $ ./args2 alfa bravo charlie
    0 : ./args2
    1 : alfa
    2 : bravo
    3 : charlie

Page 54: The Vanishing Pattern: from iterators to generators in Python

Iteration in Python (ex. 2)

    import sys

    for i in range(len(sys.argv)):
        print i, ':', sys.argv[i]

    $ python args2.py alfa bravo charlie
    0 : args2.py
    1 : alfa
    2 : bravo
    3 : charlie

not Pythonic

Page 55: The Vanishing Pattern: from iterators to generators in Python

Iteration in Python (ex. 2)

    import sys

    for i, arg in enumerate(sys.argv):
        print i, ':', arg

    $ python args2.py alfa bravo charlie
    0 : args2.py
    1 : alfa
    2 : bravo
    3 : charlie

Pythonic!

Page 56: The Vanishing Pattern: from iterators to generators in Python

Iteration in Python (ex. 2)

    for i, arg in enumerate(sys.argv):
        print i, ':', arg

enumerate(sys.argv) returns a lazy iterable object

that object yields tuples (index, item) on demand, at each iteration

Page 58: The Vanishing Pattern: from iterators to generators in Python

What enumerate does

    >>> e = enumerate('Turing')
    >>> e
    <enumerate object at 0x...>
    >>> for item in e:
    ...     print item
    ...
    (0, 'T')
    (1, 'u')
    (2, 'r')
    (3, 'i')
    (4, 'n')
    (5, 'g')
    >>>

enumerate builds an enumerate object

and that is iterable

Page 59: The Vanishing Pattern: from iterators to generators in Python

What enumerate does

    >>> e = enumerate('Turing')
    >>> e
    <enumerate object at 0x...>
    >>> next(e)
    (0, 'T')
    >>> next(e)
    (1, 'u')
    >>> next(e)
    (2, 'r')
    >>> next(e)
    (3, 'i')
    >>> next(e)
    (4, 'n')
    >>> next(e)
    (5, 'g')
    >>> next(e)
    Traceback (most recent...):
      ...
    StopIteration

the enumerate object produces an (index, item) tuple for each next(e) call

• The enumerate object is an example of a generator

Page 60: The Vanishing Pattern: from iterators to generators in Python

Iterator vs. generator

• By definition (in GoF) an iterator retrieves successive items from an existing collection

• A generator implements the iterator interface (next) but produces items not necessarily in a collection

• a generator may iterate over a collection, but return the items decorated in some way, skip some items...

• it may also produce items independently of any existing data source (e.g. Fibonacci sequence generator)
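Both kinds of generator mentioned above can be sketched briefly. The function names are illustrative only, and the code uses generator functions (yield), which the following slides explain.

```python
from itertools import islice


def fibonacci():
    """Produces items from no existing collection, without end."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def shout_long_words(words, min_len=4):
    """Iterates over a collection, skipping some items and decorating others."""
    for word in words:
        if len(word) >= min_len:
            yield word.upper()


# islice takes a finite slice of the boundless generator.
first_ten = list(islice(fibonacci(), 10))
loud = list(shout_long_words(['alfa', 'bravo', 'abc']))

print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(loud)       # ['ALFA', 'BRAVO']
```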

Page 61: The Vanishing Pattern: from iterators to generators in Python

Faraday disc (Wikipedia)

Page 62: The Vanishing Pattern: from iterators to generators in Python

Very simple generators

Page 63: The Vanishing Pattern: from iterators to generators in Python

Generator function

• Any function that has the yield keyword in its body is a generator function

    >>> def gen_123():
    ...     yield 1
    ...     yield 2
    ...     yield 3
    ...
    >>> for i in gen_123(): print(i)
    1
    2
    3
    >>>

the keyword gen was considered for defining generator functions, but def prevailed

Page 64: The Vanishing Pattern: from iterators to generators in Python

Generator function

• When invoked, a generator function returns a generator object

    >>> def gen_123():
    ...     yield 1
    ...     yield 2
    ...     yield 3
    ...
    >>> for i in gen_123(): print(i)
    1
    2
    3
    >>> g = gen_123()
    >>> g
    <generator object gen_123 at ...>

Page 65: The Vanishing Pattern: from iterators to generators in Python

Generator function

• Generator objects implement the Iterator interface

    >>> def gen_123():
    ...     yield 1
    ...     yield 2
    ...     yield 3
    ...
    >>> g = gen_123()
    >>> g
    <generator object gen_123 at ...>
    >>> next(g)
    1
    >>> next(g)
    2
    >>> next(g)
    3
    >>> next(g)
    Traceback (most recent call last):
    ...
    StopIteration

Page 66: The Vanishing Pattern: from iterators to generators in Python

Generator behavior

• Note how the output of the generator function is interleaved with the output of the calling code

    >>> def gen_AB():
    ...     print('START')
    ...     yield 'A'
    ...     print('CONTINUE')
    ...     yield 'B'
    ...     print('END.')
    ...
    >>> for c in gen_AB():
    ...     print('--->', c)
    ...
    START
    ---> A
    CONTINUE
    ---> B
    END.
    >>>

Page 67: The Vanishing Pattern: from iterators to generators in Python

Generator behavior

• The body is executed only when next is called, and it runs only up to the following yield

    >>> def gen_AB():
    ...     print('START')
    ...     yield 'A'
    ...     print('CONTINUE')
    ...     yield 'B'
    ...     print('END.')
    ...
    >>> g = gen_AB()
    >>> next(g)
    START
    'A'
    >>>

Page 68: The Vanishing Pattern: from iterators to generators in Python

Generator behavior

• When the body of the function returns, the generator object raises StopIteration

• The for statement catches that for you

    >>> def gen_AB():
    ...     print('START')
    ...     yield 'A'
    ...     print('CONTINUE')
    ...     yield 'B'
    ...     print('END.')
    ...
    >>> g = gen_AB()
    >>> next(g)
    START
    'A'
    >>> next(g)
    CONTINUE
    'B'
    >>> next(g)
    END.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
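The same behavior can be checked in a script rather than at the console. In this sketch the print calls are replaced by appends to a list (an assumption made here so the order of events can be inspected afterwards).

```python
log = []


def gen_AB():
    log.append('START')
    yield 'A'
    log.append('CONTINUE')
    yield 'B'
    log.append('END.')


g = gen_AB()
log_after_call = list(log)   # calling the function runs none of its body

first = next(g)              # runs the body up to the first yield
log_after_first = list(log)

rest = list(g)               # drains the generator; list() handles StopIteration

print(log_after_call)   # []
print(log_after_first)  # ['START']
print(rest)             # ['B']
print(log)              # ['START', 'CONTINUE', 'END.']
```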

Page 69: The Vanishing Pattern: from iterators to generators in Python

Train with generator function

for car in train:

• (1) calls iter(train) to obtain a generator

• (2) makes repeated calls to next(generator) until the function returns, which raises StopIteration

    class Train(object):

        def __init__(self, cars):
            self.cars = cars

        def __iter__(self):
            for i in range(self.cars):
                # index 2 is car #3
                yield 'car #%s' % (i+1)

    >>> train = Train(3)
    >>> for car in train:
    ...     print(car)
    car #1
    car #2
    car #3

Page 70: The Vanishing Pattern: from iterators to generators in Python

Classic iterator vs. generator

Classic iterator: 2 classes, 12 lines of code

    class Train(object):

        def __init__(self, cars):
            self.cars = cars

        def __len__(self):
            return self.cars

        def __iter__(self):
            return TrainIterator(self)

    class TrainIterator(object):

        def __init__(self, train):
            self.train = train
            self.current = 0

        def __next__(self):  # Python 3
            if self.current < len(self.train):
                self.current += 1
                return 'car #%s' % (self.current)
            else:
                raise StopIteration()

Generator: 1 class, 3 lines of code

    class Train(object):

        def __init__(self, cars):
            self.cars = cars

        def __iter__(self):
            for i in range(self.cars):
                yield 'car #%s' % (i+1)

Page 71: The Vanishing Pattern: from iterators to generators in Python

    class Train(object):

        def __init__(self, cars):
            self.cars = cars

        def __iter__(self):
            for i in range(self.cars):
                yield 'car #%s' % (i+1)

The pattern just vanished

Page 72: The Vanishing Pattern: from iterators to generators in Python

“When I see patterns in my programs, I consider it a sign of trouble. The shape of a program should reflect only the problem it needs to solve. Any other regularity in the code is a sign, to me at least, that I'm using abstractions that aren't powerful enough -- often that I'm generating by hand the expansions of some macro that I need to write.”

Paul Graham, Revenge of the Nerds (2002)

Page 73: The Vanishing Pattern: from iterators to generators in Python

Generator expression (genexp)

    >>> g = (c for c in 'ABC')
    >>> g
    <generator object <genexpr> at 0x10045a410>
    >>> for l in g:
    ...     print(l)
    ...
    A
    B
    C
    >>>

Page 74: The Vanishing Pattern: from iterators to generators in Python

Generator expression (genexp)

• When evaluated, returns a generator object

    >>> g = (n for n in [1, 2, 3])
    >>> g
    <generator object <genexpr> at 0x...>
    >>> next(g)
    1
    >>> next(g)
    2
    >>> next(g)
    3
    >>> next(g)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration

Page 76: The Vanishing Pattern: from iterators to generators in Python

Train with generator expression

for car in train:

• (1) calls iter(train) to obtain a generator

• (2) makes repeated calls to next(generator) until the generator is exhausted, which raises StopIteration

    class Train(object):

        def __init__(self, cars):
            self.cars = cars

        def __iter__(self):
            return ('car #%s' % (i+1) for i in range(self.cars))

    >>> train = Train(3)
    >>> for car in train:
    ...     print(car)
    car #1
    car #2
    car #3

Page 77: The Vanishing Pattern: from iterators to generators in Python

Generator function vs. genexp

    class Train(object):

        def __init__(self, cars):
            self.cars = cars

        def __iter__(self):
            return ('car #%s' % (i+1) for i in range(self.cars))

    class Train(object):

        def __init__(self, cars):
            self.cars = cars

        def __iter__(self):
            for i in range(self.cars):
                yield 'car #%s' % (i+1)
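A quick way to convince yourself that the two versions behave alike is to run them side by side. In this sketch the classes are renamed (TrainGenFunc and TrainGenExp are names invented here) so both can live in one module.

```python
class TrainGenFunc(object):
    """__iter__ written as a generator function."""

    def __init__(self, cars):
        self.cars = cars

    def __iter__(self):
        for i in range(self.cars):
            yield 'car #%s' % (i + 1)


class TrainGenExp(object):
    """__iter__ returning a generator expression."""

    def __init__(self, cars):
        self.cars = cars

    def __iter__(self):
        return ('car #%s' % (i + 1) for i in range(self.cars))


a = TrainGenFunc(3)
b = TrainGenExp(3)

same = list(a) == list(b)
# Each iter() call builds a fresh generator, so a train can be looped over again.
twice = list(b) + list(b)

print(same)        # True
print(len(twice))  # 6
```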

Page 78: The Vanishing Pattern: from iterators to generators in Python

Built-in functions that return iterables, iterators or generators

• dict

• enumerate

• frozenset

• list

• reversed

• set

• tuple

Page 79: The Vanishing Pattern: from iterators to generators in Python

The itertools module

• boundless generators

• count(), cycle(), repeat()

• generators which combine several iterables:

• chain(), tee(), izip(), imap(), product(), compress()...

• generators which select or group items:

• compress(), dropwhile(), groupby(), ifilter(), islice()...

• generators producing combinations of items:

• product(), permutations(), combinations()...

Don’t reinvent the wheel, use itertools!

this was not reinvented: ported from Haskell

great for MapReduce
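A few of these functions in action, as a small Python 3 sketch. Note that izip(), imap() and ifilter() exist only in Python 2; in Python 3 the built-in zip, map and filter are already lazy.

```python
from itertools import chain, count, islice, takewhile

# count() is boundless, so islice() is used to take a finite piece of it.
evens = list(islice(count(0, 2), 5))

# chain() walks several iterables as if they were one.
merged = list(chain('AB', [1, 2]))

# takewhile() stops consuming as soon as the predicate fails.
small = list(takewhile(lambda n: n < 10, count(1, 3)))

print(evens)   # [0, 2, 4, 6, 8]
print(merged)  # ['A', 'B', 1, 2]
print(small)   # [1, 4, 7]
```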

Page 80: The Vanishing Pattern: from iterators to generators in Python

Generators in Python 3

• Several functions and methods of the standard library that used to return lists, now return generators and other lazy iterables in Python 3

• dict.keys(), dict.items(), dict.values()...

• range(...)

• like xrange in Python 2.x (more than a generator)

• If you really need a list, just pass the generator to the list constructor. E.g.: list(range(10))
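The remark that range is "more than a generator" can be illustrated in Python 3: a range object is lazy, yet it also supports len, indexing and membership tests, which a true generator does not.

```python
r = range(10)

size = len(r)      # a range knows its length
last = r[-1]       # and supports indexing
has_five = 5 in r  # and membership tests, without building a list

g = (n for n in r)  # a true generator over the same numbers
try:
    len(g)
    gen_has_len = True
except TypeError:   # generators have no __len__
    gen_has_len = False

as_list = list(range(10))  # build a real list only when one is needed

print(size, last, has_five)  # 10 9 True
print(gen_has_len)           # False
```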

Page 81: The Vanishing Pattern: from iterators to generators in Python

A practical example using generator functions

• Generator functions to decouple reading and writing logic in a database conversion tool designed to handle large datasets

https://github.com/ramalho/isis2json

Page 82: The Vanishing Pattern: from iterators to generators in Python

Main loop writes JSON file

Page 83: The Vanishing Pattern: from iterators to generators in Python

Another loop reads the input records

Page 84: The Vanishing Pattern: from iterators to generators in Python

One implementation: same loop reads/writes

Page 85: The Vanishing Pattern: from iterators to generators in Python

But what if we need to read another format?

Page 86: The Vanishing Pattern: from iterators to generators in Python

Functions in the script

• iterMstRecords*

• iterIsoRecords*

• writeJsonArray

• main

* generator functions

Page 87: The Vanishing Pattern: from iterators to generators in Python

main: read command line arguments

Page 88: The Vanishing Pattern: from iterators to generators in Python

main: determine input format

selected generator function is passed as an argument

input generator function is selected based on the input file extension

Page 89: The Vanishing Pattern: from iterators to generators in Python

writeJsonArray: write JSON records

Page 90: The Vanishing Pattern: from iterators to generators in Python

writeJsonArray: iterates over one of the input generator functions

selected generator function received as an argument...

and called to produce input generator
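The design can be sketched in miniature. This is not the isis2json code: the two reader functions, their input formats and all names below are hypothetical stand-ins, chosen only to show a writer that receives a generator function as an argument and calls it to obtain the input generator.

```python
import io
import json


def iter_kv_records(text):
    """Hypothetical reader: one 'key=value;key=value' record per line."""
    for line in text.splitlines():
        if line.strip():
            # a new dict is created in each iteration
            yield dict(pair.split('=', 1) for pair in line.split(';'))


def iter_json_lines(text):
    """Hypothetical reader for a second format: one JSON object per line."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


def write_json_array(iter_records, source, output):
    """Writer: knows nothing about input formats, only how to consume records."""
    output.write('[')
    # The generator function is called here to produce the input generator.
    for i, record in enumerate(iter_records(source)):
        if i > 0:
            output.write(',\n')
        output.write(json.dumps(record, sort_keys=True))
    output.write(']\n')


out = io.StringIO()
write_json_array(iter_kv_records, 'id=1;name=alfa\nid=2;name=bravo\n', out)
result = json.loads(out.getvalue())
print(result)
```

Supporting another input format then means writing one more generator function; the writer stays untouched, and only one record needs to be in memory at a time.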

Page 91: The Vanishing Pattern: from iterators to generators in Python

iterIsoRecords: read records from ISO-2709 format file

generator function!

Page 92: The Vanishing Pattern: from iterators to generators in Python

iterIsoRecords

yields one record, structured as a dict

creates a new dict in each iteration

Page 93: The Vanishing Pattern: from iterators to generators in Python

iterMstRecords: read records from ISIS .MST file

generator function!

Page 94: The Vanishing Pattern: from iterators to generators in Python

iterIsoRecords / iterMstRecords

yields one record, structured as a dict

creates a new dict in each iteration

Page 95: The Vanishing Pattern: from iterators to generators in Python

Generators at work

Page 98: The Vanishing Pattern: from iterators to generators in Python

We did not cover

• other generator methods:

• gen.close(): causes a GeneratorExit exception to be raised within the generator body, at the point where it is paused

• gen.throw(e): causes any exception e to be raised within the generator body, at the point where it is paused

Mostly useful for long-running processes. Often not needed in batch processing scripts.

Page 99: The Vanishing Pattern: from iterators to generators in Python

We did not cover

• generator delegation with yield from (Python ≥ 3.3)

• sending data into a generator function with the gen.send(x) method (instead of next(gen)), and using yield as an expression to get the data sent

• using generator functions as coroutines

not useful in the context of iteration

“Coroutines are not related to iteration”

David Beazley

Page 100: The Vanishing Pattern: from iterators to generators in Python

How to learn generators

• Forget about .send() and coroutines: that is a completely different subject. Look into that only after mastering and becoming really comfortable using generators for iteration.

• Study and use the itertools module

• Don’t worry about .close() and .throw() initially. You can be productive with generators without using these methods.

• yield from is only available in Python ≥ 3.3, and only relevant if you need to use .close() and .throw()