CIS 192: Lecture 5 Iterators and I/Ocis192/fall2014/files/lec5.pdf · 2015-12-22 · Iterables vs....
CIS 192: Lecture 5 Iterators and I/O
Lili Dworkin
University of Pennsylvania
String Formatting
Recall:
>>> a = 0
>>> b = 1
>>> "a = %d, b = %d" % (a, b)
'a = 0, b = 1'
String Formatting
How can we accomplish the following in one line?
>>> d = {'a':0, 'b': 1, 'c': 2}
>>> ...
'a: 0, c: 2, b: 1'
String Formatting
How can we accomplish the following in one line?
>>> d = {'a':0, 'b': 1, 'c': 2}
>>> ', '.join(['%s: %d' % (key, value) for
...            (key, value) in d.items()])
'a: 0, c: 2, b: 1'
String Formatting
Even better:
>>> d = {'a':0, 'b': 1, 'c': 2}
>>> ', '.join(['%s: %d' % x for x in d.items()])
'a: 0, c: 2, b: 1'
String Formatting
Things can sometimes get tedious:
os.system('scp class%d/lec%d/slides/lec%d.pdf '
          '[email protected]:~/html/files/' % (num, num, num))
String Formatting
Another (maybe preferable) option:
>>> num = 2
>>> 'scp class{0}/lec{0}/slides/lec{0}.pdf'.format(num)
'scp class2/lec2/slides/lec2.pdf'
String Formatting
More generally:
>>> template = "{0} is {1} years old. {0} is a girl."
>>> template.format("Annie", 20)
'Annie is 20 years old. Annie is a girl.'
String Formatting
And even more generally:
>>> template = "{name} is {age} years old. {name} is a girl."
>>> template.format(name="Annie", age=20)
'Annie is 20 years old. Annie is a girl.'
String Formatting
Missing args and kwargs?
>>> l = [1, 10, 100, 1000, 10000]
>>> template = 'First element: {0}, Fourth element: {3}'
>>> template.format(*l)
'First element: 1, Fourth element: 1000'
String Formatting
Missing args and kwargs?
>>> d = {"name": "John", "age": 45, "gender": "male"}
>>> template = "{name} is a {age} year old {gender}."
>>> template.format(**d)
'John is a 45 year old male.'
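The formatting ideas above can be pulled together in one short sketch. It is written in Python 3 syntax (an assumption, since the slides use Python 2), and the helper name describe and the sample data are hypothetical. The pairs are sorted so the output order is predictable.

```python
# Python 3 sketch; `describe` and the sample data are hypothetical.
def describe(pairs):
    """Join key/value pairs as 'key: value', sorted by key for a stable order."""
    return ', '.join('{0}: {1}'.format(key, value)
                     for key, value in sorted(pairs.items()))

scores = {'a': 0, 'b': 1, 'c': 2}
summary = describe(scores)

person = {'name': 'John', 'age': 45}
# ** unpacks the dict into keyword arguments
sentence = '{name} is {age} years old.'.format(**person)

values = [1, 10, 100, 1000]
# * unpacks the list into positional arguments
picked = 'First: {0}, fourth: {3}'.format(*values)
```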
Iterables vs. Iterators
Iterable objects can be used in a for loop because they have an __iter__ magic method, which returns an iterator object:
>>> l = [1,2,3]
>>> l.__iter__()
<listiterator object at 0x100a85590>
>>> iter(l)
<listiterator object at 0x100a85550>
Iterators
Iterators are objects with a next method:
>>> i = iter(l)
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
StopIteration
Iterators
for ... in ... is just syntactic sugar for the following:
1. Call __iter__ to create an iterator
2. Call next on the iterator
3. Catch StopIteration exceptions
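The three steps above can be sketched as an explicit loop. This is a Python 3 sketch (an assumption: there the built-in next(iterator) replaces the Python 2 iterator.next() call), and manual_for is a hypothetical name.

```python
def manual_for(iterable, action):
    """Apply action to each item, spelling out what a for loop does."""
    iterator = iter(iterable)      # step 1: calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)  # step 2: ask the iterator for the next value
        except StopIteration:      # step 3: the iterator is exhausted
            break
        action(item)

seen = []
manual_for([1, 2, 3], seen.append)
```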
Iterators
We can define our own iterators. Below is an object that is an iterable *and* an iterator:
class EveryOther:
    def __init__(self, seq):
        self.seq = seq
        self.index = 0
    def __iter__(self):
        return self
    def next(self):
        if self.index >= len(self.seq):
            raise StopIteration
        self.index += 2
        return self.seq[self.index - 2]
Iterators
>>> l = EveryOther([1,2,3,4])
>>> for x in l:
...     print x
...
1
3
Iterators
How about a “reverse” iterator?
class Reverse:
    def __init__(self, seq):
        self.seq = seq
        self.index = ?
    def __iter__(self):
        return ?
    def next(self):
        if ...:
            raise StopIteration
        self.index = ?
        return ?
Iterators
How about a “reverse” iterator?
class Reverse:
    def __init__(self, seq):
        self.seq = seq
        self.index = len(seq)
    def __iter__(self):
        return self
    def next(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.seq[self.index]
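For readers on Python 3, a minimal sketch of the same class follows. The only change assumed here is that the method is named __next__ instead of next. The class name Reverse3 is hypothetical.

```python
class Reverse3:
    """Python 3 variant of Reverse: the method is __next__ rather than next."""
    def __init__(self, seq):
        self.seq = seq
        self.index = len(seq)   # start one past the last element

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index -= 1
        return self.seq[self.index]

backwards = list(Reverse3([1, 2, 3, 4]))
```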
Generators
- Downside of iterators: lots of boilerplate and extra state (i.e. self.index). We hate that in Python!
- Generators are specific kinds of iterators (i.e. they have a next method)
- We create generators by writing functions that contain the yield keyword
Generators
def counter(x):
    print "Starting."
    while True:
        yield x
        print "Incrementing x."
        x = x + 1
>>> g = counter(5)
>>> g
<generator object counter at 0x100a87050>
Generators
Each time we call the next method on the generator, the function runs until it encounters a yield statement, then pauses and returns the value that was yielded. Next time, it resumes where it left off.
>>> g.next()
Starting.
5
>>> g.next()
Incrementing x.
6
Fibonacci Revisited
Return a sequence of the first n Fibonacci numbers:
def fib_iter(n):
    a, b = 0, 1
    l = [a]
    for i in range(n-1):
        a, b = b, a + b
        l.append(a)
    return l
def fib_gen(n):
    a, b = 0, 1
    for i in range(n):
        yield a
        a, b = b, a + b
Fibonacci Revisited
>>> l = fib_iter(5)
>>> l
[0, 1, 1, 2, 3]
>>> g = fib_gen(5)
>>> g
<generator object fib_gen at 0x100a87190>
>>> [i for i in g]
[0, 1, 1, 2, 3]
Fibonacci Revisited
Generators can only be used once! Afterwards, StopIteration keeps getting raised.
>>> for x in l: print x,
...
0 1 1 2 3
>>> for x in g: print x,
...
0 1 1 2 3
>>> for x in g: print x,
...
>>>
Generators
- Values are computed one at a time, as they’re needed
- Avoids storing the entire sequence in memory
- Good for aggregating (summing, counting) items
- Good for infinite sequences
- Bad if you need to inspect the individual values
>>> g[0]
TypeError
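A small sketch of these trade-offs, in Python 3 syntax (an assumption). The generator name squares_up_to is hypothetical. It shows that a generator is exhausted after one pass and cannot be indexed.

```python
def squares_up_to(n):
    """Yield the squares of 0..n-1 one at a time."""
    for i in range(n):
        yield i ** 2

g = squares_up_to(5)
first_pass = list(g)     # consumes every value
second_pass = list(g)    # the generator is exhausted, so nothing is left

try:
    squares_up_to(5)[0]  # generators do not support indexing
    indexable = True
except TypeError:
    indexable = False
```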
Summing Generators
What if we want to sum the first million squares? Storing all of them would use a lot of memory.
def squares(n):
    for i in range(n):
        yield i ** 2
>>> g = squares(100)
>>> sum(g)
328350
Generator Comprehensions
Can also create generators on the fly:
>>> g = (i ** 2 for i in range(100))
>>> g.next()
0
>>> g.next()
1
>>> sum(g)
328349 # why is this one less than before?
Generator Comprehensions
Using range defeats the purpose (why?). So we should use xrange instead, which also produces its values lazily! Find the sum of all multiples of 3 or 5 below 1000:
>>> g = (i for i in xrange(1000)
...      if i % 3 == 0 or i % 5 == 0)
>>> sum(g)
233168
Generator Comprehensions
How to calculate the length of a generator?
>>> g = (i for i in xrange(1000)
...      if i % 3 == 0 or i % 5 == 0)
>>> len(g)
TypeError
Can you think of another way using sum?
Generator Comprehensions
How to calculate the length of a generator?
>>> sum(1 for _ in g)
467
Infinite Generator
What about infinite generators, like the first one we saw?
def counter(x):
    while True:
        yield x
        x = x + 1
Can’t sum it or get the length. What if we want the first 3 elements?
Infinite Generator
Attempt 1:
>>> g = counter(5)
>>> l = [i for i in g]
>>> l[:3]
What goes wrong?
Infinite Generator
Attempt 2:
>>> g = counter(5)
>>> [g.next() for _ in range(3)]
[5, 6, 7]
Infinite Generator
Better / more flexible option:
>>> import itertools
>>> [i for i in itertools.islice(counter(5), 3)]
[5, 6, 7]
>>> [i for i in itertools.islice(counter(5), 5, 7)]
[10, 11]
This works for *all* iterators! We’ll see it again later.
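A short Python 3 sketch (an assumption, since the slides use Python 2) of islice applied to different kinds of iterators. It also uses the optional step argument of itertools.islice, which the slides do not cover.

```python
import itertools

def counter(x):
    """Infinite generator counting up from x."""
    while True:
        yield x
        x = x + 1

# islice(iterator, stop)
first_three = list(itertools.islice(counter(5), 3))
# islice(iterator, start, stop, step): positions 2 and 4 of 5, 6, 7, 8, 9, ...
stepped = list(itertools.islice(counter(5), 2, 6, 2))
# Works on any iterator, here one made from a string
from_string = list(itertools.islice(iter('abcdef'), 1, 4))
```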
Custom Generators
- We can also define our own generators in a class, as we did with iterators
- Now the __iter__ method should return a generator, which means it should have a yield statement
- This will save us from all the boilerplate and extra state we had before
Custom Generators
We can turn EveryOther into a generator:
class EveryOtherGen():
    def __init__(self, seq):
        self.seq = seq
    def __iter__(self):
        for index in range(0, len(self.seq), 2):
            yield self.seq[index]
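One consequence worth sketching: because __iter__ builds a fresh generator on every call, an EveryOtherGen object can be looped over more than once, unlike the earlier EveryOther, which keeps its index on self. The code is the same class in Python 3 syntax (an assumption), and the variable names are hypothetical.

```python
class EveryOtherGen:
    def __init__(self, seq):
        self.seq = seq

    def __iter__(self):
        # Each call creates a new generator, so no index is stored on self.
        for index in range(0, len(self.seq), 2):
            yield self.seq[index]

odds = EveryOtherGen([1, 2, 3, 4, 5])
first = list(odds)
second = list(odds)   # iterating again restarts from the beginning
```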
Exceptions
- KeyError: accessing a non-existent dictionary key
- AttributeError: accessing a non-existent attribute or method
- NameError: referencing a non-existent variable
- TypeError: mixing incompatible data types
- ValueError: right type, wrong value
- ImportError: module not available
- IOError: file does not exist
Exceptions
- Syntax:
  - Java: try...catch to handle, and throw to generate
  - Python: try...except to handle, and raise to generate
- When to use try...except:
  - Opening a file (may not exist)
  - User input (never trust anybody)
  - Connecting to a database (might be unavailable)
Catching Exceptions
Catch all exceptions, regardless of type:
def int_default_0(x):
    try:
        return int(x)
    except:
        return 0
>>> int_default_0('5')
5
>>> int_default_0('hi')
0 # would have raised a ValueError
>>> int_default_0([])
0 # would have raised a TypeError
Catching Exceptions
Catch only a specific type:
def int_default_0(x):
    try:
        return int(x)
    except ValueError:
        return 0
>>> int_default_0('hi')
0
>>> int_default_0([])
TypeError
Catching Exceptions
Catch multiple types together:
def int_default_0(x):
    try:
        return int(x)
    except (ValueError, TypeError):
        return 0
>>> int_default_0('hi')
0
>>> int_default_0([])
0
Catching Exceptions
Multiple except blocks:
def int_default_0(x):
    try:
        return int(x)
    except ValueError:
        print "Caught a ValueError."
    except TypeError:
        print "Caught a TypeError."
>>> int_default_0('hi')
Caught a ValueError.
>>> int_default_0([])
Caught a TypeError.
Catching Exceptions
Get a reference to the Exception class instance:
def int_default_0(x):
    try:
        return int(x)
    except (ValueError, TypeError) as e:
        print e
        return 0
>>> int_default_0('hi')
invalid literal for int() with base 10: 'hi'
0
>>> int_default_0([])
int() argument must be a string or a number, not 'list'
0
Catching Exceptions
If you wanted to know the type of error:
def int_default_0(x):
    try:
        return int(x)
    except (ValueError, TypeError) as e:
        print type(e).__name__
        return 0
>>> int_default_0('hi')
ValueError
0
>>> int_default_0([])
TypeError
0
Raising Exceptions
def raise_exception():
    raise Exception("silly exception")
def catch_exception():
    try:
        raise_exception()
    except Exception as e:
        print e
>>> catch_exception()
silly exception
Defining Custom Exceptions
Define a class that derives from the built-in Exception class:
class InvalidInputException(Exception):
    pass
def validate_input(input):
    if len(input) == 0:
        raise InvalidInputException("Input empty.")
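A minimal sketch of how calling code might catch the custom exception, in Python 3 syntax (an assumption). The wrapper safe_validate is hypothetical. validate_input is also changed slightly here to return its argument on success and to take a parameter named text.

```python
class InvalidInputException(Exception):
    pass

def validate_input(text):
    if len(text) == 0:
        raise InvalidInputException("Input empty.")
    return text

def safe_validate(text):
    """Hypothetical wrapper: report the problem instead of propagating it."""
    try:
        return validate_input(text)
    except InvalidInputException as e:
        # Only our own exception type is caught; other errors still propagate.
        return "rejected: " + str(e)
```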
Else and Finally Clauses
def divide(x, y):
    try:
        result = x/y
    except ZeroDivisionError:
        print "Division by zero!"
    else:
        print "Result is %d." % (result)
    finally:
        print "All done."
Else and Finally Clauses
>>> divide(10,2)
Result is 5.
All done.
>>> divide(10,0)
Division by zero!
All done.
Else and Finally Clauses
Why would you ever use the else clause?
- Why not put the code in the try block?
- Why not put the code after the entire try/except block?
Else and Finally Clauses
Why would you ever use the else clause?
- Why not put the code in the try block?
  - Well, that code might raise an exception too, but maybe you didn’t want to protect it! Always keep try blocks small.
- Why not put the code after the entire try/except block?
  - Then it will execute after the finally block, and it will also run when an exception was caught.
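The ordering of the clauses can be checked with a small sketch in Python 3 syntax (an assumption). The function divide_log is a hypothetical variant of divide that records which clauses ran instead of printing.

```python
def divide_log(x, y):
    """Hypothetical variant of divide that records which clauses ran."""
    events = []
    try:
        result = x / y
    except ZeroDivisionError:
        events.append('except')
    else:
        events.append('else')     # runs only if the try block raised nothing
    finally:
        events.append('finally')  # always runs
    events.append('after')        # runs in both cases, after finally
    return events
```

Code placed after the whole statement runs in both cases, while the else clause runs only on success.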
Forgiveness vs. Permission
Bad:
import os

def get_contents(file):
    if not os.path.exists(file):
        print "File not found."
        return None
    else:
        return open(file).read()
What if file got deleted between the call to os.path.exists and the call to open? Then we’ll get an uncaught IOError.
Forgiveness vs. Permission
Better:
def get_contents(file):
    try:
        return open(file).read()
    except IOError as e:
        print "Unable to open file: " + str(e)
        return None
Either the file gets opened and read, or an exception with all its information gets printed.
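A runnable sketch of the "forgiveness" version in Python 3 syntax (an assumption: there IOError is an alias of OSError). The file names are hypothetical, and a temporary directory is used so no real files are touched.

```python
import os
import tempfile

def get_contents(path):
    """Return the file's text, or None if it cannot be opened."""
    try:
        with open(path) as f:
            return f.read()
    except OSError as e:   # in Python 3, IOError is an alias of OSError
        print("Unable to open file: " + str(e))
        return None

with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, 'present.txt')   # hypothetical file names
    with open(existing, 'w') as f:
        f.write('hello\n')
    found = get_contents(existing)
    missing = get_contents(os.path.join(tmp, 'absent.txt'))
```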
User Input
>>> var = input("Enter: ")
Enter: 5
>>> var
5
>>> type(var)
<type 'int'>
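Not very safe: input evaluates whatever the user types as a Python expression. This behavior is removed in Python 3, where input acts like raw_input.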
User Input
>>> var = raw_input("Enter: ")
Enter: [1,2,3]
>>> var
'[1,2,3]'
>>> type(var)
<type 'str'>
Usually safer to use raw_input and manipulate it yourself. Don’t trust the user!
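One way to "manipulate it yourself" is sketched below in Python 3 syntax (an assumption). The helper names parse_int and parse_literal are hypothetical. ast.literal_eval is a standard-library function that accepts only Python literals, so it does not run arbitrary code the way Python 2's input does.

```python
import ast

def parse_int(text, default=0):
    """Hypothetical helper: convert raw user text to int, or fall back to default."""
    try:
        return int(text.strip())
    except ValueError:
        return default

def parse_literal(text):
    """Parse a literal such as '[1,2,3]'; non-literal input raises an exception."""
    return ast.literal_eval(text)
```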
Opening Files
open function takes a file name as input and returns a file object:
>>> f = open('test.txt', 'r')
>>> f.mode
'r'
>>> f.name
'test.txt'
If mode is not specified, defaults to 'r'.
Reading Files
read method (with no parameters) consumes all bytes of the file and returns a string with the data that was read:
>>> s = f.read()
>>> s
'This is a file.\nThis is another line.\n'
>>> print s
This is a file.
This is another line.
>>> s = f.read()
>>> s
''
Reading Files
readlines returns a list of strings of lines in the file:
>>> l = f.readlines()
>>> l
['This is a file.\n', 'This is another line.\n']
Reading Files
readlines puts the whole file in memory; if you have a large file, you can simply iterate over the file object itself:
>>> for l in f:
...     print l
...
This is a file.

This is another line.

>>>
Reading Files
What if we want just the first n lines of a file?
>>> f = open('test2.txt')
>>> for line in f:
...     print line.rstrip('\n')
...
First line.
Second line.
Third line.
Fourth line.
Reading Files
This doesn’t work:
>>> f = open('test2.txt')
>>> for line in f[:2]:
...     print line.rstrip('\n')
...
TypeError: 'file' object has no attribute '__getitem__'
Remember, f is an iterator, not a list!
Reading Files
One option:
>>> f = open('test2.txt')
>>> for line in [f.next() for _ in range(2)]:
...     print line.rstrip('\n')
...
First line.
Second line.
Reading Files
Even better:
>>> f = open('test2.txt')
>>> for line in itertools.islice(f, 2):
...     print line.rstrip('\n')
...
First line.
Second line.
Writing to Files
- “Append” mode:
  - open(file, 'a')
  - Add data to the end of the file
- “Write” mode:
  - open(file, 'w')
  - Overwrite the file
- Either will create the file if it does not already exist, whereas open(file) would raise an IOError
Writing to Files
>>> f = open('log.txt', 'w')
>>> f.write('First line.\n') # need new line
>>> f.close()
>>> f = open('log.txt', 'a')
>>> f.write('Second line.')
>>> f.close()
>>> f = open('log.txt')
>>> print f.read()
First line.
Second line.
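The difference between the two modes can be sketched in Python 3 syntax (an assumption). The file lives in a temporary directory, and the path is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'log.txt')   # hypothetical file location
    with open(path, 'w') as f:            # 'w' creates the file (or overwrites it)
        f.write('First line.\n')
    with open(path, 'a') as f:            # 'a' adds to the end
        f.write('Second line.')
    with open(path) as f:
        contents = f.read()
    with open(path, 'w') as f:            # reopening with 'w' discards the old data
        f.write('Replaced.')
    with open(path) as f:
        overwritten = f.read()
```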
Closing Files
Don’t forget:
f.close()
Why is this important?
Closing Files
But if you are forgetful ...
with open("test.txt") as f:
    data = f.read()
    print data
- No matter how we exit the block, f.close() will be called
- What a with-statement does depends on the object
- In fact, it works for any object if the magic methods __enter__ and __exit__ are defined
- More duck typing!
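A minimal sketch of an object that supports the with-statement, in Python 3 syntax (an assumption). The class Tracker is hypothetical. It only records when the block is entered and left, and shows that __exit__ runs even when the block raises.

```python
class Tracker:
    """Hypothetical context manager that records entering and leaving the block."""
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self                # this value is bound to the name after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('exit')
        return False               # False means: do not suppress the exception

t = Tracker()
try:
    with t:
        raise ValueError('boom')   # leave the block through an exception
except ValueError:
    pass
```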