CIS 192: Lecture 5 Iterators and I/Ocis192/fall2014/files/lec5.pdf · 2015-12-22 · Iterables vs....

CIS 192: Lecture 5 Iterators and I/O Lili Dworkin University of Pennsylvania

CIS 192: Lecture 5 Iterators and I/O

Lili Dworkin

University of Pennsylvania

String Formatting

Recall:

>>> a = 0

>>> b = 1

>>> "a = %d, b = %d" % (a, b)

'a = 0, b = 1'

String Formatting

How can we accomplish the following in one line?

>>> d = {'a':0, 'b': 1, 'c': 2}

>>> ...

'a: 0, c: 2, b: 1'

String Formatting

How can we accomplish the following in one line?

>>> d = {'a':0, 'b': 1, 'c': 2}

>>> ', '.join(['%s: %d' % (key, value) for
...            (key, value) in d.items()])

'a: 0, c: 2, b: 1'

String Formatting

Even better:

>>> d = {'a':0, 'b': 1, 'c': 2}

>>> ', '.join(['%s: %d' % x for x in d.items()])

'a: 0, c: 2, b: 1'
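The order of the pairs in the output above comes from the dictionary itself, which Python 2 does not keep in any guaranteed order. A minimal sketch, assuming a predictable order is wanted, sorts the items before joining them (the helper name `format_items` is made up for illustration):

```python
def format_items(d):
    # sorted() orders the (key, value) pairs by key, so the result
    # does not depend on the dictionary's internal ordering.
    return ', '.join('%s: %d' % item for item in sorted(d.items()))

d = {'a': 0, 'b': 1, 'c': 2}
print(format_items(d))
```

Passing a generator expression straight to join also avoids building the intermediate list.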

String Formatting

Things can sometimes get tedious:

os.system('scp class%d/lec%d/slides/lec%d.pdf

[email protected]:~/html/files/'% (num, num, num))

String Formatting

Another (maybe preferable) option:

>>> 'scp class{0}/lec{0}/slides/lec{0}.pdf'.format(num)

'scp class2/lec2/slides/lec2.pdf'

String Formatting

More generally:

>>> template = "{0} is {1} years old. {0} is a girl."

>>> template.format("Annie", 20)

'Annie is 20 years old. Annie is a girl.'

String Formatting

And even more generally:

>>> template = "{name} is {age} years old. {name} is a girl."

>>> template.format(name="Annie", age=20)

'Annie is 20 years old. Annie is a girl.'

String Formatting

Missing args and kwargs?

>>> l = [1, 10, 100, 1000, 10000]

>>> template = 'First element: {0}, Fourth element: {3}'

>>> template.format(*l)

'First element: 1, Fourth element: 1000'

String Formatting

Missing args and kwargs?

>>> d = {"name": "John", "age": 45, "gender": "male"}

>>> template = "{name} is a {age} year old {gender}."

>>> template.format(**d)

'John is a 45 year old male.'

Iterables vs. Iterators

Iterable objects can be used in a for loop because they have an __iter__ magic method, which returns an iterator object:

>>> l = [1,2,3]

>>> l.__iter__()

<listiterator object at 0x100a85590>

>>> iter(l)

<listiterator object at 0x100a85550>

Iterators

Iterators are objects with a next method:

>>> i = iter(l)

>>> i.next()

1

>>> i.next()

2

>>> i.next()

3

>>> i.next()

StopIteration

Iterators

for ... in ... is just syntactic sugar for the following:

1. Call __iter__ to create an iterator

2. Call next on the iterator repeatedly, binding each returned value to the loop variable

3. Catch the StopIteration exception and end the loop
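The three steps can be written out by hand. The sketch below is only an illustration of the idea, not how the interpreter is implemented; the function name `manual_for` and its `body` callback are made up here, and it uses the built-in next() (available from Python 2.6 on) rather than calling the .next method directly:

```python
def manual_for(iterable, body):
    iterator = iter(iterable)          # step 1: calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)      # step 2: ask the iterator for a value
        except StopIteration:          # step 3: the iterator is exhausted
            break
        body(item)                     # the body of the for loop

seen = []
manual_for([1, 2, 3], seen.append)
print(seen)
```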

Iterators

We can define our own iterators. Below is an object that is an iterable *and* an iterator:

class EveryOther:
    def __init__(self, seq):
        self.seq = seq
        self.index = 0

    def __iter__(self):
        return self

    def next(self):
        if self.index >= len(self.seq):
            raise StopIteration
        self.index += 2
        return self.seq[self.index - 2]

Iterators

>>> l = EveryOther([1,2,3,4])

>>> for x in l:

...     print x

...

1

3

Iterators

How about a “reverse” iterator?

class Reverse:
    def __init__(self, seq):
        self.seq = seq
        self.index = ?

    def __iter__(self):
        return ?

    def next(self):
        if ...
            raise StopIteration
        self.index = ?
        return ?

Iterators

How about a “reverse” iterator?

class Reverse:
    def __init__(self, seq):
        self.seq = seq
        self.index = len(seq)

    def __iter__(self):
        return self

    def next(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.seq[self.index]
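A quick way to check a hand-written iterator like this is to compare it with the built-in reversed(). The sketch below repeats the class so it is self-contained; as an assumption for portability it defines __next__ (the Python 3 name) and aliases next to it, so the same code runs on Python 2 and 3:

```python
class Reverse(object):
    def __init__(self, seq):
        self.seq = seq
        self.index = len(seq)

    def __iter__(self):
        return self

    def __next__(self):                # Python 3 spelling
        if self.index == 0:
            raise StopIteration
        self.index -= 1
        return self.seq[self.index]

    next = __next__                    # Python 2 spelling, as on the slides

print(list(Reverse([1, 2, 3])))
print(list(reversed([1, 2, 3])))       # built-in equivalent for sequences
```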

Generators

- Downside of iterators: lots of boilerplate and extra state (i.e. self.index). We hate that in Python!

- Generators are a specific kind of iterator (i.e. they have a next method)

- We create generators by writing functions that contain the yield keyword

Generators

def counter(x):
    print "Starting."
    while True:
        yield x
        print "Incrementing x."
        x = x + 1

>>> g = counter(5)

>>> g

<generator object counter at 0x100a87050>

Generators

Each time we call the next method on the generator, the function runs until it encounters a yield statement, and then it stops and returns the value that was yielded. Next time, it resumes where it left off.

>>> g.next()

Starting.

5

>>> g.next()

Incrementing x.

6

Fibonacci Revisited

Return a sequence of the first n Fibonacci numbers:

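def fib_iter(n):
    a, b = 0, 1
    l = [a]
    for i in range(n-1):
        a, b = b, a + b
        l.append(a)
    return l

def fib_gen(n):
    a, b = 0, 1
    # Count the values yielded; testing "a < n" instead would yield
    # the Fibonacci numbers below n, not the first n of them.
    for _ in range(n):
        yield a
        a, b = b, a + b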

Fibonacci Revisited

>>> l = fib_iter(5)

>>> l

[0, 1, 1, 2, 3]

>>> g = fib_gen(5)

>>> g

<generator object fib_gen at 0x100a87190>

>>> [i for i in g]

[0, 1, 1, 2, 3]

Fibonacci Revisited

Generators can only be used once! Afterwards, StopIteration keeps getting raised.

>>> for x in l: print x,

...

0 1 1 2 3

>>> for x in g: print x,

...

0 1 1 2 3

>>> for x in g: print x,

...

>>>
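The same behavior can be checked without a loop. In this sketch a count-based fib_gen is redefined so the example is self-contained; the variable names are chosen only for illustration:

```python
def fib_gen(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

g = fib_gen(5)
first_pass = list(g)             # consumes every value
second_pass = list(g)            # nothing left: g is exhausted
fresh_pass = list(fib_gen(5))    # calling the function again makes a new generator

print(first_pass)
print(second_pass)
```

If the values are needed more than once, either store them in a list or call the generator function again.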

Generators

- Values are computed one at a time, as they’re needed

- Avoids storing the entire sequence in memory

- Good for aggregating (summing, counting) items

- Good for infinite sequences

- Bad if you need to index into or revisit individual values

>>> g[0]

TypeError

Summing Generators

What if we want to sum the first million squares? Storing all of them would use a lot of memory.

def squares(n):
    for i in range(n):
        yield i ** 2

>>> g = squares(100)

>>> sum(g)

328350
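The slide's demo uses n = 100; the million-element case from the question works the same way. As a sanity check, the sketch below compares the generator sum against the closed form (n-1)n(2n-1)/6 for the sum of i² over i = 0..n-1. Note that in Python 2, range(n) inside squares still builds a full list, which is the point the xrange slide addresses:

```python
def squares(n):
    for i in range(n):
        yield i ** 2

n = 10 ** 6
total = sum(squares(n))                        # squares are never stored together
closed_form = (n - 1) * n * (2 * n - 1) // 6   # sum of i**2 for i in 0..n-1

print(total == closed_form)
```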

Generator Comprehensions

Can also create generators on the fly:

>>> g = (i ** 2 for i in range(100))

>>> g.next()

0

>>> g.next()

1

>>> sum(g)

328349 # why is this one less than before?

Generator Comprehensions

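Using range defeats the purpose (why?). So we should use xrange instead, which produces its values lazily rather than building a list (strictly speaking, it is a lazy sequence object rather than a generator). Find the sum of all multiples of 3 or 5 below 1000:

>>> g = (i for i in xrange(1000)
...      if i % 3 == 0 or i % 5 == 0)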

>>> sum(g)

233168

Generator Comprehensions

How to calculate the length of a generator?

>>> g = (i for i in xrange(1000)
...      if i % 3 == 0 or i % 5 == 0)

>>> len(g)

TypeError

Can you think of another way using sum?

Generator Comprehensions

How to calculate the length of a generator?

>>> sum(1 for _ in g)

467

Infinite Generator

What about infinite generators, like the first one we saw?

def counter(x):
    while True:
        yield x
        x = x + 1

Can’t sum it or get the length. What if we want the first 3 elements?

Infinite Generator

Attempt 1:

>>> g = counter(5)

>>> l = [i for i in g]

>>> l[:3]

What goes wrong?

Infinite Generator

Attempt 2:

>>> g = counter(5)

>>> [g.next() for _ in range(3)]

[5, 6, 7]

Infinite Generator

Better / more flexible option:

>>> import itertools

>>> [i for i in itertools.islice(counter(5), 3)]

[5, 6, 7]

>>> [i for i in itertools.islice(counter(5), 5, 7)]

[10, 11]

This works for *all* iterators! We’ll see it again later.
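islice cuts by position. When the stopping point depends on the values instead, itertools.takewhile is a related standard-library tool. This is a small sketch rather than part of the original slides; counter is repeated so the block is self-contained:

```python
import itertools

def counter(x):
    while True:
        yield x
        x = x + 1

# Cut by position: the first three values.
first_three = list(itertools.islice(counter(5), 3))

# Cut by value: keep taking values while the predicate holds.
below_ten = list(itertools.takewhile(lambda v: v < 10, counter(5)))

print(first_three)
print(below_ten)
```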

Custom Generators

- We can also define our own generators in a class, as we did with iterators

- Now the __iter__ method should return a generator, which means it should have a yield statement

- This will save us from all the boilerplate and extra state we had before

Custom Generators

We can turn EveryOther into a generator:

class EveryOtherGen():
    def __init__(self, seq):
        self.seq = seq

    def __iter__(self):
        for index in range(0, len(self.seq), 2):
            yield self.seq[index]
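One consequence of this design is worth checking: because __iter__ is a generator function, every for loop gets a fresh generator, so the object can be iterated more than once (unlike a single generator, or the earlier EveryOther, which keeps its own index). A short sketch with the class repeated for completeness:

```python
class EveryOtherGen(object):
    def __init__(self, seq):
        self.seq = seq

    def __iter__(self):
        # Each call creates a new generator with its own position.
        for index in range(0, len(self.seq), 2):
            yield self.seq[index]

e = EveryOtherGen([1, 2, 3, 4])
first = list(e)
second = list(e)     # works again: __iter__ is called anew

print(first)
print(second)
```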

Exceptions

- KeyError: accessing a non-existent dictionary key

- AttributeError: calling a non-existent method

- NameError: referencing a non-existent variable

- TypeError: mixing data-types

- ValueError: right type, wrong value

- ImportError: module not available

- IOError: file does not exist

Exceptions

- Syntax:
    - Java: try...catch to handle, and throw to generate
    - Python: try...except to handle, and raise to generate

- When to use try...except:
    - Opening a file (may not exist)
    - User input (never trust anybody)
    - Connecting to a database (might be unavailable)

Catching Exceptions

Catch all exceptions, regardless of type:

def int_default_0(x):
    try:
        return int(x)
    except:
        return 0

>>> int_default_0('5')
5
>>> int_default_0('hi')
0 # would have raised a ValueError
>>> int_default_0([])
0 # would have raised a TypeError

Catching Exceptions

Catch only a specific type:

def int_default_0(x):
    try:
        return int(x)
    except ValueError:
        return 0

>>> int_default_0('hi')
0
>>> int_default_0([])
TypeError

Catching Exceptions

Catch multiple types together:

def int_default_0(x):
    try:
        return int(x)
    except (ValueError, TypeError):
        return 0

>>> int_default_0('hi')
0
>>> int_default_0([])
0

Catching Exceptions

Multiple except blocks:

def int_default_0(x):
    try:
        return int(x)
    except ValueError:
        print "Caught a ValueError."
    except TypeError:
        print "Caught a TypeError."

>>> int_default_0('hi')
Caught a ValueError.
>>> int_default_0([])
Caught a TypeError.

Catching Exceptions

Get a reference to the Exception class instance:

def int_default_0(x):
    try:
        return int(x)
    except (ValueError, TypeError) as e:
        print e
        return 0

>>> int_default_0('hi')
invalid literal for int() with base 10: 'hi'
0
>>> int_default_0([])
int() argument must be a string or a number, not 'list'
0

Catching Exceptions

If you wanted to know the type of error:

def int_default_0(x):
    try:
        return int(x)
    except (ValueError, TypeError) as e:
        print type(e).__name__
        return 0

>>> int_default_0('hi')
ValueError
0
>>> int_default_0([])
TypeError
0

Raising Exceptions

def raise_exception():
    raise Exception("silly exception")

def catch_exception():
    try:
        raise_exception()
    except Exception as e:
        print e

>>> catch_exception()

silly exception

Defining Custom Exceptions

Define a class that derives from the built-in Exception class:

class InvalidInputException(Exception):
    pass

def validate_input(input):
    if len(input) == 0:
        raise InvalidInputException("Input empty.")
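A caller can then catch the custom exception by name, just like a built-in one. In this sketch the `describe` helper is hypothetical and exists only to show the catching side; the parameter is called `text` here to avoid shadowing the built-in input:

```python
class InvalidInputException(Exception):
    pass

def validate_input(text):
    if len(text) == 0:
        raise InvalidInputException("Input empty.")

def describe(text):
    # Hypothetical caller: turns the exception into a status string.
    try:
        validate_input(text)
    except InvalidInputException as e:
        return "rejected: %s" % e
    return "accepted"

print(describe(""))
print(describe("hello"))
```

Because InvalidInputException derives from Exception, a broader `except Exception` clause would catch it too.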

Else and Finally Clauses

def divide(x, y):
    try:
        result = x/y
    except ZeroDivisionError:
        print "Division by zero!"
    else:
        print "Result is %d." % (result)
    finally:
        print "All done."

Else and Finally Clauses

>>> divide(10,2)

Result is 5.

All done.

>>> divide(10,0)

Division by zero!

All done.

Else and Finally Clauses

Why would you ever use the else clause?

- Why not put the code in the try block?

- Why not put the code after the entire try/except block?

Else and Finally Clauses

Why would you ever use the else clause?

- Why not put the code in the try block?
    - Well, that code might raise an exception too, but maybe you didn’t want to protect it! Always keep try blocks small.

- Why not put the code after the entire try/except block?
    - Then it will execute after the finally block, and it will also run when an exception was caught and handled.
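The first point can be seen in a small sketch. The function name `parse_then_invert` is invented for illustration: only the int() call is protected, and an error raised in the else clause is deliberately not swallowed by the except clause:

```python
def parse_then_invert(text):
    try:
        value = int(text)          # only this line is protected
    except ValueError:
        return "not a number"
    else:
        # Runs only if int() succeeded. A ZeroDivisionError raised
        # here is NOT caught by the except clause above.
        return 1.0 / value

print(parse_then_invert("4"))
print(parse_then_invert("four"))
```

Calling parse_then_invert("0") raises ZeroDivisionError, which propagates to the caller instead of being mistaken for a parsing problem.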

Forgiveness vs. Permission

Bad:

def get_contents(file):
    if not os.path.exists(file):
        print "File not found."
        return None
    else:
        return open(file).read()

What if file got deleted between the call to os.path.exists and the call to open? Then we’ll get an error.

Forgiveness vs. Permission

Better:

def get_contents(file):
    try:
        return open(file).read()
    except IOError as e:
        print "Unable to open file: " + str(e)
        return None

Either the file is opened and read, or the exception is caught and its details are printed.
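A possible refinement, offered as a sketch rather than the lecture's own code: inspect the exception's errno so that only "file does not exist" is treated as an expected case, while other I/O problems (such as permission errors) are re-raised. The function name `get_contents_or_none` is made up for this example:

```python
import errno

def get_contents_or_none(path):
    try:
        f = open(path)
    except IOError as e:
        if e.errno == errno.ENOENT:    # the file does not exist
            return None
        raise                          # any other I/O problem: re-raise
    try:
        return f.read()
    finally:
        f.close()                      # close the file even if read() fails
```

Keeping the open call alone in the first try block follows the earlier advice to keep try blocks small.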

User Input

>>> var = input("Enter: ")

Enter: 5

>>> var

5

>>> type(var)

<type 'int'>

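Not very safe: input evaluates whatever the user types as a Python expression. This behavior is gone in Python 3, where input() acts like raw_input().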

User Input

>>> var = raw_input("Enter: ")

Enter: [1,2,3]

>>> var

'[1,2,3]'

>>> type(var)

<type 'str'>

Usually safer to use raw_input and manipulate it yourself. Don’t trust the user!
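One way to "manipulate it yourself" is ast.literal_eval from the standard library, which parses Python literals without running arbitrary code. The sketch below assumes the program expects a list of integers; the function name `parse_int_list` is hypothetical:

```python
import ast

def parse_int_list(text):
    """Return a list of ints parsed from text, or None if text is anything else."""
    try:
        value = ast.literal_eval(text)     # accepts literals only, never calls code
    except (ValueError, SyntaxError):
        return None
    if isinstance(value, list) and all(isinstance(x, int) for x in value):
        return value
    return None

print(parse_int_list('[1,2,3]'))
print(parse_int_list('__import__("os")'))
```

The second call returns None because a function call is not a literal, so nothing is executed.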

Opening Files

The open function takes a file name as input and returns a file object:

>>> f = open('test.txt', 'r')
>>> f.mode
'r'
>>> f.name
'test.txt'

If mode is not specified, it defaults to 'r'.

Reading Files

The read method (with no parameters) consumes all bytes of the file and returns a string with the data that was read. A second read returns an empty string, because the file position is already at the end:

>>> s = f.read()
>>> s
'This is a file.\nThis is another line.\n'
>>> print s
This is a file.
This is another line.

>>> s = f.read()
>>> s
''

Reading Files

readlines returns a list of strings of lines in the file:

>>> l = f.readlines()

>>> l

['This is a file.\n', 'This is another line.\n']

Reading Files

readlines puts the whole file in memory; if you have a large file, you can simply iterate over the file object itself:

>>> for l in f:
...     print l
...
This is a file.

This is another line.

>>>

Reading Files

What if we want just the first n lines of a file?

>>> f = open('test2.txt')
>>> for line in f:
...     print line.rstrip('\n')
...
First line.
Second line.
Third line.
Fourth line.

Reading Files

This doesn’t work:

>>> f = open('test2.txt')
>>> for line in f[:2]:
...     print line.rstrip('\n')
...
TypeError: 'file' object has no attribute '__getitem__'

Remember, f is an iterator, not a list!

Reading Files

One option:

>>> f = open('test2.txt')
>>> for line in [f.next() for _ in range(2)]:
...     print line.rstrip('\n')

...

First line.

Second line.

Reading Files

Even better:

>>> f = open('test2.txt')
>>> for line in itertools.islice(f, 2):
...     print line.rstrip('\n')

...

First line.

Second line.
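The islice approach packages naturally into a helper. This is a sketch under stated assumptions: the name `head` is invented, a throwaway temporary file stands in for the slide's test file, and the with statement (covered in the Closing Files slides) is used so the file is closed afterwards:

```python
import itertools
import os
import tempfile

def head(path, n):
    """Return the first n lines of the file at path, without trailing newlines."""
    with open(path) as f:
        # islice stops after n lines, so the rest of the file is never read.
        return [line.rstrip('\n') for line in itertools.islice(f, n)]

# Demo on a temporary file with four lines.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'w') as f:
    f.write('First line.\nSecond line.\nThird line.\nFourth line.\n')

first_two = head(path, 2)
all_lines = head(path, 10)     # asking for more lines than exist is fine
os.remove(path)

print(first_two)
```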

Writing to Files

- “Append” mode:
    - open(file, 'a')
    - Add data to the end of the file

- “Write” mode:
    - open(file, 'w')
    - Overwrite the file

- Either will create the file if it does not already exist, whereas open(file) would raise an IOError

Writing to Files

>>> f = open('log.txt', 'w')
>>> f.write('First line.\n') # need new line
>>> f.close()
>>> f = open('log.txt', 'a')
>>> f.write('Second line.')
>>> f.close()
>>> f = open('log.txt')
>>> f.read()
'First line.\nSecond line.'
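The difference between the two modes shows up when the same file is opened again. The sketch below works in a temporary directory (an assumption made so the example does not touch real files) and records the file contents after an append and after a second write:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'log.txt')

with open(path, 'w') as f:          # creates the file
    f.write('First line.\n')
with open(path, 'a') as f:          # keeps existing data, adds to the end
    f.write('Second line.\n')
with open(path) as f:
    after_append = f.read()

with open(path, 'w') as f:          # truncates: old contents are lost
    f.write('Replaced.\n')
with open(path) as f:
    after_write = f.read()

print(repr(after_append))
print(repr(after_write))
```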

Closing Files

Don’t forget:

f.close()

Why is this important?

Closing Files

But if you are forgetful ...

with open("test.txt") as file:
    data = file.read()
    print data

- No matter how we exit the block, file.close() will be called

- What a with-statement does depends on the object

- In fact, it works for any object if the magic methods __enter__ and __exit__ are defined

- More duck typing!
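To see the two magic methods at work, here is a minimal sketch of a custom context manager. The class name `Tracked` and its log list are invented for illustration; it simply records when the block is entered and exited, including when the block raises:

```python
class Tracked(object):
    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append('enter')
        return self                    # the value bound by "as", if used

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append('exit')        # runs however the block is left
        return False                   # False: do not suppress the exception

log = []
try:
    with Tracked(log):
        log.append('body')
        raise ValueError('boom')
except ValueError:
    log.append('caught')

print(log)
```

The 'exit' entry appears before 'caught', which mirrors how a file is closed before the exception reaches the caller.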