Python Crash Course File I/O Bachelors V1.0 dd 20-01-2015 Hour 2.

File I/O

• Types of input/output available
  – Interactive
    • Keyboard
    • Screen
  – Files
    • ASCII/text
      – txt
      – csv
    • Binary
    • Structured
      – FITS > pyFITS, astropy.io.fits
    • URL
    • Pipes

Interactive I/O, fancy output

>>> s = 'Hello, world.'
>>> str(s)
'Hello, world.'
>>> repr(s)
"'Hello, world.'"
>>> str(1.0/7.0)
'0.142857142857'
>>> repr(1.0/7.0)
'0.14285714285714285'
>>> x = 10 * 3.25
>>> y = 200 * 200
>>> s = 'The value of x is ' + repr(x) + ', and y is ' + repr(y) + '...'
>>> print s
The value of x is 32.5, and y is 40000...
>>> # The repr() of a string adds string quotes and backslashes:
... hello = 'hello, world\n'
>>> hellos = repr(hello)
>>> print hellos
'hello, world\n'
>>> # The argument to repr() may be any Python object:
... repr((x, y, ('spam', 'eggs')))
"(32.5, 40000, ('spam', 'eggs'))"

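A small Python 3 sketch (not part of the course material) that makes the str()/repr() contrast concrete. Python 3 prints str(1.0/7.0) with full precision as well, so the float difference shown above is Python 2 behaviour; the round trip through eval() is shown only for these built-in values and is not guaranteed for arbitrary objects.

```python
# Python 3 sketch: str() is meant for readable output, repr() for an
# unambiguous form that, for these built-in types, eval() can turn back
# into an equal object.
values = [1.0 / 7.0, 'hello, world\n', (32.5, 40000, ('spam', 'eggs'))]

for v in values:
    print('str :', str(v))
    print('repr:', repr(v))

# True for every value in the list: the repr() text round-trips through eval().
round_trips = [eval(repr(v)) == v for v in values]

# In Python 3, str() and repr() of a float give the same shortest form.
same_float_text = str(1.0 / 7.0) == repr(1.0 / 7.0)
```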
Interactive I/O, fancy output

Old string formatting:

>>> import math
>>> print 'The value of PI is approximately %5.3f.' % math.pi
The value of PI is approximately 3.142.

New string formatting:

>>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 7678}
>>> for name, phone in table.items():
...     print '{0:10} ==> {1:10d}'.format(name, phone)
...
Jack       ==>       4098
Dcab       ==>       7678
Sjoerd     ==>       4127
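For comparison, a Python 3 sketch producing the same text with both styles. The names old_style, new_style and rows are only for illustration; sorting the items makes the output order deterministic, unlike iterating over a Python 2 dictionary.

```python
# Python 3 sketch: the same output with old-style % and new-style str.format().
import math

old_style = 'The value of PI is approximately %5.3f.' % math.pi
new_style = 'The value of PI is approximately {:5.3f}.'.format(math.pi)

table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 7678}
# {0:10} left-aligns the name in 10 characters; {1:10d} right-aligns the integer.
rows = ['{0:10} ==> {1:10d}'.format(name, phone)
        for name, phone in sorted(table.items())]
for row in rows:
    print(row)
```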

Formatting I/O

A conversion specifier contains two or more characters and has the following components, which must occur in this order:

• The "%" character, which marks the start of the specifier.

• Mapping key (optional), consisting of a parenthesised sequence of characters (for example, (somename)).

• Conversion flags (optional), which affect the result of some conversion types.

• Minimum field width (optional). If specified as an "*" (asterisk), the actual width is read from the next element of the tuple in values, and the object to convert comes after the minimum field width and optional precision.

• Precision (optional), given as a "." (dot) followed by the precision. If specified as "*" (an asterisk), the actual precision is read from the next element of the tuple in values, and the value to convert comes after the precision.

• Length modifier (optional): h, l or L may be present but is ignored, so %ld is the same as %d.

• Conversion type.

>>> print '%(language)s has %(#)03d quote types.' % \
...       {'language': "Python", "#": 2}
Python has 002 quote types.
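A Python 3 sketch exercising each optional component listed above; the sample values are arbitrary. The % operator behaves the same way in Python 3 for these cases.

```python
# Python 3 sketch: components of a %-conversion specifier in action.
import math

examples = {
    # mapping key: values are looked up in a dict by name
    'mapping': '%(language)s has %(#)03d quote types.' % {'language': 'Python', '#': 2},
    # flags: '-' left-justifies, '+' forces a sign, '0' pads with zeros, '#' adds a prefix
    'left': '[%-5d]' % 42,
    'sign': '%+d' % 42,
    'zeros': '%05.1f' % 3.14159,
    'alt_hex': '%#x' % 255,
    # '*' reads width and precision from the tuple, before the value itself
    'star': '[%*.*f]' % (8, 2, math.pi),
    # a length modifier is accepted but ignored
    'length': '%ld' % 7,
}
for name, text in examples.items():
    print(name, '->', text)
```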

The conversion types are:

Conversion  Meaning
d           Signed integer decimal.
i           Signed integer decimal.
o           Signed octal value.
u           Obsolete type; identical to "d".
x           Signed hexadecimal (lowercase).
X           Signed hexadecimal (uppercase).
e           Floating point exponential format (lowercase).
E           Floating point exponential format (uppercase).
f           Floating point decimal format.
F           Floating point decimal format.
g           Same as "e" if the exponent is less than -4 or not less than the precision, "f" otherwise.
G           Same as "E" if the exponent is less than -4 or not less than the precision, "F" otherwise.
c           Single character (accepts an integer or a single-character string).
r           String (converts any Python object using repr()).
s           String (converts any Python object using str()).
%           No argument is converted; results in a "%" character in the result.
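A Python 3 sketch applying several conversion types to arbitrary sample values. It shows that octal output is signed, and where %g switches to exponential notation: below an exponent of -4, or once the exponent reaches the precision (6 by default).

```python
# Python 3 sketch: a few conversion types applied to sample values.
samples = [
    ('%d', 42), ('%o', 8), ('%o', -8), ('%x', 255), ('%X', 255),
    ('%e', 12345.678), ('%f', 2.5), ('%g', 0.0001), ('%g', 0.00001),
    ('%g', 123456), ('%g', 1234567), ('%c', 65), ('%r', 'hi'),
    ('%s', 'hi'), ('%d%%', 50),
]
# Map each (format, value) pair to the formatted text.
results = {(fmt, value): fmt % value for fmt, value in samples}
for (fmt, value), text in results.items():
    print('%-6s %-12r -> %s' % (fmt, value, text))
```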

Interactive I/O

>>> print "Python is great,", "isn't it?"
Python is great, isn't it?
>>> s = raw_input("Enter your input: ")
Enter your input: Hello Python
>>> print "Received input is:", s
Received input is: Hello Python
>>> s = input("Enter your input: ")
Enter your input: [x*5 for x in range(2,10,2)]
>>> print "Received input is:", s
Received input is: [10, 20, 30, 40]

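raw_input() returns the typed line as a string, while input() evaluates it as a Python expression (it is equivalent to eval(raw_input())), which is why the list comprehension comes back as a list. If the readline module was loaded, raw_input() will use it to provide elaborate line editing and history features. In Python 3, raw_input() was renamed input(), and the evaluating input() was removed.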
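Because Python 2's input() evaluates whatever the user types, it can run arbitrary code. A hedged Python 3 sketch of a safer pattern: read the raw text (Python 3's input() behaves like Python 2's raw_input()) and interpret it with ast.literal_eval, which only accepts literals. The helper name parse_user_value is hypothetical, and it deliberately rejects the list comprehension above, since that is an expression rather than a literal.

```python
import ast


def parse_user_value(text):
    """Hypothetical helper: turn typed text into a Python literal if possible.

    ast.literal_eval only accepts literals (numbers, strings, tuples, lists,
    dicts, sets, booleans, None), so no arbitrary code is executed.
    Anything that is not a literal is returned unchanged as a string.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text

# Interactive use (Python 3):
# value = parse_user_value(input("Enter your input: "))
```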

File I/O

>>> fname = 'myfile.dat'
>>> f = open(fname)
>>> lines = f.readlines()
>>> f.close()
>>> f = open(fname)
>>> firstline = f.readline()
>>> secondline = f.readline()
>>> f.close()
>>> f = open(fname)
>>> for line in f:
...     print line.split()[1]
...
>>> f.close()
>>> outfname = 'myoutput'
>>> outf = open(outfname, 'w')  # 'w' opens for writing, truncating any existing file
>>> outf.write('My very own file\n')
>>> outf.close()
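A self-contained Python 3 version of the reading patterns above. It assumes nothing about myfile.dat: the sketch first writes a hypothetical two-line file into a temporary directory, and it uses with blocks so every file is closed even if an error occurs.

```python
# Python 3 sketch: write a small text file, then read it back three ways.
import os
import tempfile

fname = os.path.join(tempfile.mkdtemp(), 'myfile.dat')  # hypothetical demo file

# 'w' opens for writing and truncates an existing file; 'with' closes it afterwards.
with open(fname, 'w') as outf:
    outf.write('1 10.5 a\n')
    outf.write('2 20.5 b\n')

with open(fname) as f:
    lines = f.readlines()        # all lines as a list, newlines included

with open(fname) as f:
    firstline = f.readline()     # one line per call
    secondline = f.readline()

with open(fname) as f:
    # Iterating over the file object yields one line at a time.
    second_column = [line.split()[1] for line in f]
```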

Read File I/O

>>> f = open("test.txt")
>>> # Read everything into a single string:
>>> content = f.read()
>>> len(content)
>>> print content
>>> f.read()  # at end of file, read() returns an empty string
''
>>> f.close()
>>> # f.read(20) reads (at most) 20 bytes

Using a with block (the file is closed automatically when the block ends):

>>> with open('test.txt', 'r') as f:
...     content = f.read()
...
>>> f.closed
True
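A Python 3 sketch checking the read() behaviour described above on a hypothetical test.txt created in a temporary directory. One difference from the Python 2 comment: in Python 3 text mode, read(20) returns at most 20 characters rather than bytes.

```python
# Python 3 sketch: read(n), read() and end-of-file behaviour.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')  # hypothetical demo file
with open(path, 'w') as f:
    f.write('abcdefghijklmnopqrstuvwxyz\n')

with open(path, 'r') as f:
    head = f.read(20)     # at most 20 characters
    rest = f.read()       # everything that is left
    at_eof = f.read()     # '' signals end of file

# Outside the with block the file has been closed.
print(repr(head), repr(rest), repr(at_eof), f.closed)
```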

CSV file:

>>> import csv
>>> ifile = open('photoz.csv', "rb")
>>> reader = csv.reader(ifile)
>>> for row in reader:
...     print row,
...
>>> ifile.close()
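In Python 3 the csv module expects files opened in text mode with newline='' instead of the Python 2 'rb' mode. A minimal sketch, assuming a small hypothetical photoz.csv written on the spot; every field comes back as a string, and a quoted field may contain commas.

```python
# Python 3 sketch: read a CSV file row by row.
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'photoz.csv')  # hypothetical demo file
with open(path, 'w', newline='') as f:
    f.write('id,name\n1,alpha\n2,"beta, gamma"\n')

# newline='' lets the csv module handle line endings itself.
with open(path, newline='') as ifile:
    rows = list(csv.reader(ifile))
print(rows)
```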

Read and write text file

>>> from numpy import *
>>> data = loadtxt("myfile.txt")  # myfile.txt contains 4 columns of numbers
>>> t, z = data[:,0], data[:,3]  # data is a 2D numpy array; t is the 1st column, z the 4th
>>> t, x, y, z = loadtxt("myfile.txt", unpack=True)  # unpack all columns automatically
>>> t, z = loadtxt("myfile.txt", usecols=(0,3), unpack=True)  # select just a few columns
>>> data = loadtxt("myfile.txt", skiprows=7)  # skip the first 7 rows of the file
>>> data = loadtxt("myfile.txt", comments='!')  # use '!' as comment character instead of '#'
>>> data = loadtxt("myfile.txt", delimiter=';')  # use ';' as column separator instead of whitespace
>>> data = loadtxt("myfile.txt", dtype=int)  # file contains integers instead of floats

>>> from numpy import *
>>> savetxt("myfile.txt", data)  # data is a 2D array
>>> savetxt("myfile.txt", x)  # if x is a 1D array, the file gets 1 column
>>> savetxt("myfile.txt", (x,y))  # x, y are 1D arrays: 2 rows in the file
>>> savetxt("myfile.txt", transpose((x,y)))  # x, y are 1D arrays: 2 columns in the file
>>> savetxt("myfile.txt", transpose((x,y)), fmt='%6.3f')  # use '%6.3f' instead of the default '%.18e'
>>> savetxt("myfile.txt", data, delimiter=';')  # use ';' instead of a space to separate columns
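A round-trip sketch of savetxt() and loadtxt(), assuming numpy is installed and imported as np rather than with from numpy import *. The file name and data are hypothetical; the header is written as a '#' comment line, which loadtxt() skips by default.

```python
# Sketch: savetxt()/loadtxt() round trip (assumes numpy is installed).
import os
import tempfile

import numpy as np

t = np.array([0.0, 1.0, 2.0])
z = np.array([0.5, 0.25, 0.125])
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')  # hypothetical demo file

# transpose((t, z)) turns two 1D arrays into two columns; fmt sets the number format.
# The header text is written after the comment prefix '# '.
np.savetxt(path, np.transpose((t, z)), fmt='%.3f', delimiter=';', header='t;z')

# Lines starting with '#' are skipped as comments by default.
t2, z2 = np.loadtxt(path, delimiter=';', unpack=True)

# usecols selects columns by index; a single column comes back as a 1D array.
z_only = np.loadtxt(path, delimiter=';', usecols=(1,))
```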

String formatting for output

>>> sigma = 6.76/2.354
>>> print('sigma is %5.3f metres' % sigma)
sigma is 2.872 metres
>>> d = {'bob': 1.87, 'fred': 1.768}
>>> for name, height in d.items():
...     print('%s is %.2f metres tall' % (name.capitalize(), height))
...
Bob is 1.87 metres tall
Fred is 1.77 metres tall
>>> nsweets = range(100)
>>> calories = [i * 2.345 for i in nsweets]
>>> fout = open('sweetinfo.txt', 'w')
>>> for n, cal in zip(nsweets, calories):
...     fout.write('%5i %8.3f\n' % (n, cal))
...
>>> fout.close()
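A Python 3 rewrite of the sweets example that can be run as is; the output goes to a temporary directory rather than the current one. Using zip() avoids indexing into nsweets, which in Python 3 is a range object rather than a list.

```python
# Python 3 sketch: write fixed-width columns, then read the lines back.
import os
import tempfile

nsweets = range(100)
calories = [i * 2.345 for i in nsweets]
path = os.path.join(tempfile.mkdtemp(), 'sweetinfo.txt')  # demo location

with open(path, 'w') as fout:
    # zip() pairs each count with its calorie value.
    for n, cal in zip(nsweets, calories):
        fout.write('%5i %8.3f\n' % (n, cal))

with open(path) as fin:
    lines = fin.read().splitlines()
```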

File I/O, CSV files

• CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases.

• Functions
  – csv.reader
  – csv.writer
  – csv.register_dialect
  – csv.unregister_dialect
  – csv.get_dialect
  – csv.list_dialects
  – csv.field_size_limit
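A Python 3 sketch of the dialect functions from this list; the dialect name 'semicolon' is made up for the example.

```python
# Python 3 sketch: register, inspect, use and remove a CSV dialect.
import csv
import io

csv.register_dialect('semicolon', delimiter=';')   # hypothetical dialect name
names = csv.list_dialects()                         # built-in names plus 'semicolon'
dialect = csv.get_dialect('semicolon')

rows = list(csv.reader(io.StringIO('a;b;c\n1;2;3\n'), dialect='semicolon'))

csv.unregister_dialect('semicolon')
limit = csv.field_size_limit()                      # current maximum field size
```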

File I/O, CSV files

• Reading CSV files

import csv  # imports the csv module

f = open('data1.csv', 'rb')  # opens the csv file
try:
    reader = csv.reader(f)  # creates the reader object
    for row in reader:  # iterates over the rows of the file in order
        print row  # prints each row
finally:
    f.close()  # closes the file, even if an error occurred

• Writing CSV files

import csv

ifile = open('test.csv', "rb")
reader = csv.reader(ifile)
ofile = open('ttest.csv', "wb")
writer = csv.writer(ofile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
for row in reader:
    writer.writerow(row)
ifile.close()
ofile.close()
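The same copy loop as a Python 3 sketch. In-memory io.StringIO objects stand in for test.csv and ttest.csv so the example is self-contained; the convert() name is hypothetical. With real files in Python 3, open them with newline='' instead of the Python 2 'rb'/'wb' modes.

```python
# Python 3 sketch: comma-separated in, fully quoted tab-separated out.
import csv
import io


def convert(src, dst):
    """Copy every row from a CSV source to a tab-delimited, fully quoted writer."""
    reader = csv.reader(src)
    writer = csv.writer(dst, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow(row)


src = io.StringIO('name,value\nalpha,1\n')
dst = io.StringIO()
convert(src, dst)
print(repr(dst.getvalue()))   # rows end with the writer's default '\r\n'
```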

File I/O, CSV files

• The csv module provides the following quoting options:
  – csv.QUOTE_ALL: quote every field, regardless of type.
  – csv.QUOTE_MINIMAL (the writer's default): quote only fields that contain special characters, such as the delimiter, the quote character or a line terminator.
  – csv.QUOTE_NONNUMERIC: quote all fields that are not integers or floats.
  – csv.QUOTE_NONE: do not quote anything on output.
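A Python 3 sketch writing the same hypothetical row with each quoting option, plus one line read back with QUOTE_NONNUMERIC, which on input converts every unquoted field to float. With QUOTE_NONE, a field containing the delimiter needs an escapechar.

```python
# Python 3 sketch: one row written with each quoting option.
import csv
import io

row = ['x', 1, 2.5, 'a,b']


def write_row(quoting, **extra):
    """Return the text csv.writer produces for row with the given quoting."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, **extra).writerow(row)
    return buf.getvalue()


outputs = {
    'QUOTE_ALL': write_row(csv.QUOTE_ALL),
    'QUOTE_MINIMAL': write_row(csv.QUOTE_MINIMAL),
    'QUOTE_NONNUMERIC': write_row(csv.QUOTE_NONNUMERIC),
    # the delimiter inside 'a,b' is escaped instead of quoted
    'QUOTE_NONE': write_row(csv.QUOTE_NONE, escapechar='\\'),
}

# Reading with QUOTE_NONNUMERIC turns unquoted fields into floats.
parsed = list(csv.reader(io.StringIO('"x",1,2.5\r\n'), quoting=csv.QUOTE_NONNUMERIC))
```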

File I/O, Pickle

• Pickle: a powerful algorithm for serializing and de-serializing a Python object structure
  – can transform a complex object into a byte stream
  – can transform the byte stream into an object with the same internal structure
  – the most obvious thing to do with these byte streams is to write them to a file
  – it is also conceivable to send them across a network or store them in a database

• The following types can be pickled:
  – None, True, and False
  – integers, long integers, floating point numbers, complex numbers
  – normal and Unicode strings
  – tuples, lists, sets, and dictionaries containing only picklable objects
  – functions defined at the top level of a module
  – built-in functions defined at the top level of a module
  – classes that are defined at the top level of a module
  – instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see the section "The pickle protocol" in the Python documentation for details).
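A Python 3 sketch of the list above: a dictionary mixing several picklable types survives a dumps()/loads() round trip, while a lambda, which has no top-level name, cannot be pickled. In Python 3, plain and long integers are a single type and all strings are Unicode. The try_pickle() helper is hypothetical.

```python
# Python 3 sketch: pickle a mix of the types listed above.
import math
import pickle

obj = {
    'constants': (None, True, False),
    'numbers': [1, 2 ** 70, 2.5, 4 + 6j],   # int has no size limit in Python 3
    'text': 'Unicode string \u00e9',
    'containers': ({1, 2}, [3, 4]),
    'function': math.sqrt,                  # stored by reference (module and name)
}
blob = pickle.dumps(obj)        # a bytes object
restored = pickle.loads(blob)


def try_pickle(value):
    """Hypothetical helper: report whether value can be pickled."""
    try:
        pickle.dumps(value)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
```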

File I/O, Pickle

• Example save

import pickle

data1 = {'a': [1, 2.0, 3, 4+6j],
         'b': ('string', u'Unicode string'),
         'c': None}

selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)

output = open('data.pkl', 'wb')

# Pickle dictionary using protocol 0 (the default in Python 2).
pickle.dump(data1, output)

# Pickle the list using the highest protocol available.
pickle.dump(selfref_list, output, -1)

output.close()

File I/O, Pickle

• Example load

import pprint, pickle

pkl_file = open('data.pkl', 'rb')

data1 = pickle.load(pkl_file)
pprint.pprint(data1)

data2 = pickle.load(pkl_file)
pprint.pprint(data2)

pkl_file.close()

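Contents of data.pkl (the protocol 0 pickle of data1 is readable text; the binary pickle of selfref_list that follows it is not shown):

(dp0
S'a'
p1
(lp2
I1
aF2.0
aI3
ac__builtin__
complex
p3
(F4.0
F6.0
tp4
Rp5
asS'c'
p6
NsS'b'
p7
(S'string'
p8
VUnicode string
p9
tp10
s.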
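A Python 3 version of the save and load examples, writing to a temporary directory instead of the current one. Two pickles written one after the other come back in the same order, and the self-reference inside selfref_list is restored.

```python
# Python 3 sketch: two pickles in one file, read back in the same order.
import os
import pickle
import tempfile

data1 = {'a': [1, 2.0, 3, 4 + 6j],
         'b': ('string', 'Unicode string'),
         'c': None}
selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)   # the list now contains itself

path = os.path.join(tempfile.mkdtemp(), 'data.pkl')  # demo location
with open(path, 'wb') as output:
    pickle.dump(data1, output, protocol=0)                      # text-based protocol
    pickle.dump(selfref_list, output, pickle.HIGHEST_PROTOCOL)  # compact binary

with open(path, 'rb') as pkl_file:
    loaded1 = pickle.load(pkl_file)
    loaded2 = pickle.load(pkl_file)
```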

File I/O, Pickle

• Real-life example: AstroWise cluster job submission
  – client-server model
  – exchanging code & data for remote processing

def dpu_packit(*args):
    return pickle.dumps(args)

def dpu_unpackit(data):
    return pickle.loads(data)

# Sender side:
def submitremotejobs(self, key, zip=None, jobs=[], env=None):
    if not len(jobs): return False
    return self.senddata(key, dpu_packit((zip, env), jobs))

# Receiver side:
data = self.get_data()
((code, env), jobdictlist) = dpu_unpackit(data)
make_code_file(key, code)
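A minimal Python 3 sketch of the same pack/unpack idea; pack(), unpack() and the sample payload are hypothetical stand-ins, not the AstroWise code. Because loading a pickle can execute code, this pattern is only safe between trusted machines, as in a private cluster.

```python
# Python 3 sketch of the pack/unpack pattern (hypothetical names and payload).
import pickle


def pack(*args):
    """Serialize any number of picklable arguments into one bytes message."""
    return pickle.dumps(args)


def unpack(data):
    """Rebuild the argument tuple; only use this on data from a trusted peer."""
    return pickle.loads(data)


code_zip = b'placeholder zip bytes'        # stands in for a zipped code archive
env = {'queue': 'default'}                 # hypothetical environment settings
jobs = [{'task': 'reduce', 'frame': 42}]   # hypothetical job descriptions

message = pack((code_zip, env), jobs)      # bytes that could be sent over a socket
(code, received_env), jobdictlist = unpack(message)
```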

URL

• URLs can be opened for reading, much like files:

>>> import urllib2
>>> url = 'http://python4astronomers.github.com/_downloads/data.txt'
>>> response = urllib2.urlopen(url)
>>> data = response.read()
>>> print data
RAJ        DEJ                            Jmag e_Jmag
2000 (deg) 2000 (deg) 2MASS              (mag)  (mag)
---------- ---------- ----------------- ------ ------
010.684737 +41.269035 00424433+4116085   9.453  0.052
010.683469 +41.268585 00424403+4116069   9.321  0.022
010.685657 +41.269550 00424455+4116103  10.773  0.069
010.686026 +41.269226 00424464+4116092   9.299  0.063
010.683465 +41.269676 00424403+4116108  11.507  0.056
010.686015 +41.269630 00424464+4116106   9.399  0.045
010.685270 +41.267124 00424446+4116016  12.070  0.035
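In Python 3, urllib2.urlopen became urllib.request.urlopen, and read() returns bytes that must be decoded. A hedged sketch: fetch_text() and parse_rows() are hypothetical helpers, parse_rows() assumes the three header lines and five whitespace-separated columns shown above, and the check below uses a data: URL so it runs without network access.

```python
# Python 3 sketch: read a URL like a file, then split the table into columns.
import urllib.request


def fetch_text(url, default_encoding='utf-8'):
    """Return the body of url decoded to str (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or default_encoding
        return response.read().decode(charset)


def parse_rows(text, header_lines=3):
    """Hypothetical parser: skip the header lines, split each row on whitespace."""
    rows = []
    for line in text.splitlines()[header_lines:]:
        if not line.strip():
            continue  # ignore blank lines
        ra, dec, name, jmag, e_jmag = line.split()
        rows.append((float(ra), float(dec), name, float(jmag), float(e_jmag)))
    return rows

# Usage with the URL from the slide (needs network access):
# rows = parse_rows(fetch_text('http://python4astronomers.github.com/_downloads/data.txt'))
```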

URL

• Some URLs need input data, such as POST data for a form; passing data to urllib2.Request turns the request into a POST:

import urllib
import urllib2

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
          'location' : 'Northampton',
          'language' : 'Python' }

data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
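The Python 3 equivalent, as a sketch that only builds the request: urlencode() moved to urllib.parse, and the encoded body must be bytes. The server URL is the placeholder from the slide, so the request is not sent.

```python
# Python 3 sketch: build (but do not send) a POST request.
import urllib.parse
import urllib.request

url = 'http://www.someserver.com/cgi-bin/register.cgi'  # placeholder from the slide
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}

# urlencode() returns str; a request body must be bytes in Python 3.
data = urllib.parse.urlencode(values).encode('ascii')
req = urllib.request.Request(url, data=data)

# Because data is set, the method is POST; urlopen(req) would send it.
method = req.get_method()
```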

URL

• And for GET-style parameter passing:

>>> import urllib2
>>> import urllib
>>> data = {}
>>> data['name'] = 'Somebody Here'
>>> data['location'] = 'Northampton'
>>> data['language'] = 'Python'
>>> url_values = urllib.urlencode(data)
>>> print url_values  # The order may differ.
name=Somebody+Here&language=Python&location=Northampton
>>> url = 'http://www.example.com/example.cgi'
>>> full_url = url + '?' + url_values
>>> handler = urllib2.urlopen(full_url)

Note that the full URL is created by adding a ? to the URL, followed by the encoded values.
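A Python 3 sketch of the same GET construction, plus the reverse step with urlsplit() and parse_qs(). Since Python 3.7 dictionaries keep insertion order, so the encoded order is predictable here; the URL is the example placeholder and is not opened.

```python
# Python 3 sketch: encode GET parameters into a URL and decode them again.
from urllib.parse import parse_qs, urlencode, urlsplit

data = {'name': 'Somebody Here', 'location': 'Northampton', 'language': 'Python'}
url = 'http://www.example.com/example.cgi'
full_url = url + '?' + urlencode(data)   # spaces become '+', pairs joined by '&'

query = urlsplit(full_url).query          # the part after '?'
decoded = parse_qs(query)                 # each value comes back as a list
```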

Introduction to language

End