(Python) Fundamentals

89
(Python) Fundamentals

description

(Python) Fundamentals. geek think. Problem : Calculate GC-content of a DNA molecule. GC-content : the percentage of nitrogenous bases on a DNA molecule which are either guanine or cytosine. http://en.wikipedia.org/wiki/GC-content. Given : DNA sequence, e.g. "ACTGCT...". - PowerPoint PPT Presentation

Transcript of (Python) Fundamentals

Page 1: (Python) Fundamentals

(Python) Fundamentals

Page 2: (Python) Fundamentals

[email protected], 18.05.10

geek think

GC-content: the percentage of nitrogenous bases on a DNA molecule which are either guanine or cytosine http://en.wikipedia.org/wiki/GC-content

Problem: Calculate GC-content of a DNA molecule

Given: DNA sequence, e.g. "ACTGCT..."= simple model of a DNA molecule=> implies a data structure to work with

How: use a formula/algorithm

algorithm: count nucleotide frequencies and apply formula

formula:

Page 3: (Python) Fundamentals

[email protected], 18.05.10

algorithm algorithm = protocol: a formal description of how to do something programming is:

development/implementation of an algorithm in some programming language

works but can we do better?

gc = sequence.count('G') + sequence.count('C')print "GC-content", 100.0*gc/len(sequence)

sequence = 'AAAGTCTGACTTTATCTATGG'

DNA model:

A = sequence.count('A')T = sequence.count('T')G = sequence.count('G')C = sequence.count('C')100*(G+C)/(A+T+G+C)

Algorithm:

Page 4: (Python) Fundamentals

[email protected], 18.05.10

What can possibly go wrong…

gc = 0for base in sequence: if base in 'GC': gc = gc + 1.0 if base in 'YRWSKM': gc = gc + 0.5 if base in 'DH': gc = gc + 0.33 if base in 'VB': gc = gc + 0.66

What about lower case letters, e.g. "actgcag"?

What about ambiguity codes, e.g. Y = C/T

sequence = sequence.upper()

ignore or count or use fraction ...

What about gaps, e.g. "ACTA--GCG-T"?sequence = sequence.remove('-')

Page 5: (Python) Fundamentals

[email protected], 18.05.10

IDLE

unix: #! /usr/bin/env python

Help: help(), help(...)Python doc: F1 save with extension .py

History previous: ALT+p History next: ALT+n

a python shell and editor Shell: allows you to execute Python commands Editor: allows you to write Python programs and run them

Shell Editor

Run program: F5Indentation has meaning!

see also PyPE editor: http://pype.sourceforge.net/index.shtml

Page 6: (Python) Fundamentals

[email protected], 18.05.10

simple data types

bool: Boolean, e.g. True, False int: Integer, e.g. 12, 23345 float: Floating point number, e.g. 3.1415, 1.1e-5 string: Character string, e.g. "This is a string" ...

Values of different type allow different operations

12 + 3

"12" + "3"

=> 15 12 - 3 => 9

=> "123"

"12" - "3"

=> ERROR!

"12"*3 => "121212"

Page 7: (Python) Fundamentals

[email protected], 18.05.10

structured typesstructured data types are composed of other simple or structured types

string: 'abc' list of integers: [1,2,3] list of strings: ['abc', 'def'] list of lists of integers: [[1,2,3],[4,5,6]] ...

there are much more advanced data structures...

data structures are models of the objects you want to work with

Page 8: (Python) Fundamentals

[email protected], 18.05.10

variables and references container for a value, name of a memory cell primitive values: numbers, booleans, ... reference values: address of a data structure, eg. a list, string, ...

age = 42

ages = [12,4,7]

42age

&0

42

4

1247

&0&1&2&3

&5&6

&4

...

memoryaddress

ages

12 74reference &44&2

Page 9: (Python) Fundamentals

[email protected], 18.05.10

variables and references 2

ages1 = [12,4,7]ages2 = ages1

42age1

age1 = 42age2 = age1

&0 42

age2&1

ages1

12 74&44&2

ages2

4&3ages1[1] = 9

age1 = 7

=> age2 is 42

=> ages2[1] is 9 !

try:id(age1)id(age2)7

age1&0 42

age2&1

9

Page 10: (Python) Fundamentals

[email protected], 18.05.10

data structures 1

List (Array)- collection of (many) things- constant access time - linear search time- changeable (=mutable)

colors = ['red', 'red', 'blue']colors[1] => 'red'

'blue' in colors

=> True

colors[1] = “pink”

organizing data depending on task and usage pattern

address = (3, 'Queen Street')

address[0]

=> 3

3 in address => True

address[0] = 4

Tuple- collection of (a few) things- constant access time - linear search time- not changeable (=immutable) => can be used as hash key

Page 11: (Python) Fundamentals

[email protected], 18.05.10

data structures 2 Set

- collection of unique things- no random access- constant search time- no guaranteed order of values

even = set([2,4,6,8])

even[1]

6 in even

=> True

for e in even: print e

tel = {'Stef':62606, 'Mel':62663}tel['Mel']

=> 62663

for name in tel: print name

'Stef' in tel => True

62663 in tel.values()

=> True

Dictionary (Hash)- maps keys to values- constant access time via key- constant search time for key- linear search time for value- no guaranteed order

Page 12: (Python) Fundamentals

[email protected], 18.05.10

data structures - tips

List- many similar items to store e.g, numbers, protein ids, sequences, ...- no need to find a specific item fast- fast access to items at a specific position in the list

Tuple- a few (<10), different items to store e.g. addresses, protein id and its sequence, ...- want to use it as dictionary key

Set- many, unique items to store e.g. unique protein ids- need to know quickly if a specific item is in the set

Dictionary- map from keys to values, is a look-up table e.g. telephone dictionary, amino acid letters to hydrophobicity values- need to get quickly the value for a key

when to use what?

Page 13: (Python) Fundamentals

[email protected], 18.05.10

1,2,3... action

statement- executes some function or operation, e.g. print 1+2

condition- describes when something is done, e.g. if number > 3: print "greater than 3"

iteration- describes how to repeat something, e.g. for number in [1,2,3]: print number

how to do something

Page 14: (Python) Fundamentals

[email protected], 18.05.10

conditionif x < 10: print "in range"

if x < 5: print "lower range"elif x < 10: print "upper range"else: print "out of range"

if condition : do_something

if condition : do_somethingelse: do_something_else

if x < 5: print "lower range"else: print "out of range"

if condition : do_somethingelif condition2 : do_something_else1 else: do_something_else2

Page 15: (Python) Fundamentals

[email protected], 18.05.10

iteration

for char in "some text": print char

for variable in sequence : do_something

for i in xrange(10): print i

for color in ["red","green","blue"]: print color

while condition : statement1 statement2 …

i = 0while i < 10 : print i i += 1

Page 16: (Python) Fundamentals

[email protected], 18.05.10

functions

def function(p1,p2,...): do_something return ...

break complex problems in manageable pieces encapsulate/generalize common functionalities

def divider(): print "--------"

def divider(ch,n): print ch*n

def output(text): print text

text = "outer"print textoutput("inner") print text

def add(a,b): return a+b

text = "outer"

text == "inner"

output(text)

Scope of a variable

Page 17: (Python) Fundamentals

[email protected], 18.05.10

complete example def count_gc(sequence): """Counts the nitrogenous bases of the given sequence. Ambiguous bases are counted fractionally. Sequence must be in upper case""" gc = 0 for base in sequence: if base in 'GC': gc += 1.0 elif base in 'YRWSKM': gc += 0.5 elif base in 'DH': gc += 0.33 elif base in 'VB': gc += 0.66 return gc

def gc_content(sequence): """Calculates the GC content of a DNA sequence. Mixed case, gaps and ambiguity codes are permitted""" sequence = sequence.upper().remove('-') if not sequence: return 0 return 100.0 * count_gc(sequence) / len(sequence)

print gc_content("actacgattagag")

Page 18: (Python) Fundamentals

[email protected], 18.05.10

tips

1. no tabs, use 4 spaces for indentation2. lines should not be longer than 80 characters3. break complex code into small functions4. do not duplicate code, create functions instead

Page 19: (Python) Fundamentals

[email protected], 18.05.10

questions

Page 20: (Python) Fundamentals

[email protected], 18.05.10

morning tea

with kind regards of

Page 21: (Python) Fundamentals

Python Basics

Page 22: (Python) Fundamentals

[email protected], 18.05.10

overview

9:00 - 9:45 Programming basics

Morning tea

10:00 - 11:45 Python basics

Break

12:00-12:45 Advanced Python

12:45-13:00 QFAB

... please ask questions!

Page 23: (Python) Fundamentals

[email protected], 18.05.10

let’s play

• load fasta sequences• print name, length, first 10 symbols• min, max, mean length• find shortest• plot lengths histogram• calc GC content• write GCs to file• plot GC histogram• calc correlation coefficient• scatter plot• scatter plot over many

Page 24: (Python) Fundamentals

[email protected], 18.05.10

survival kit

#! /usr/bin/env python $ chmod +x myscript.py

help(...), dir(...), google

Auto completion: CTRL+SPACE/TABCall tips: CTRL+BACKSLASH

Python doc: F1 IDLE:

Always 4 spaces, never tabs! Indentation has meaning!

def/if/for/while … : http://www.quuux.com/stefan/slides.htmlhttp://www.python.orghttp://www.java2s.com/Code/Python/CatalogPython.htmhttp://biopython.org http://matplotlib.sourceforge.net/http://www.scipy.org/Cookbook/Matplotlibhttp://cheeseshop.python.orghttp://www.scipy.org/Cookbook

History previous: ALT+p History next: ALT+n

Page 25: (Python) Fundamentals

[email protected], 18.05.10

data types

# simple typesstring : "string" integer : 42long : 4200000Lfloat : 3.145hex : 0xFFboolean : Truecomplex =:1+2j

# structured typeslist = [1,2,'a']tuple = (1,2,'a') dict = {"pi":3.14, "e":2.17}set([1,2,'a'])frozenset([1,2,'a'])func = lambda x,y: x+y

all data types are objects

dir(3.14) dir(float)help(float)help(float.__add__)help(string)

Page 26: (Python) Fundamentals

[email protected], 18.05.10

tuplestuples are not just round bracketstuples are immutable

(1,2,3)('red','green','blue')('red',)(1,) != (1) () # empty tuple

(1,2,3,4)[0] # -> 1(1,2,3,4)[2] # -> 3(1,2,3,4)[1:3] # -> (2,3)

a,b = b,a # swap

for i,c in [(1,'I'), (2,'II), (3,'III')]: print i,c

# vector additiondef add(v1, v2): x,y = v1[0]+v2[0], v1[1]+v2[1] return (x,y)

help(())dir(())

(a,b,c) = (1,2,3)(a,b,c) = 1,2,3 a,b,c = (1,2,3) a,b,c = [1,2,3]

Page 27: (Python) Fundamentals

[email protected], 18.05.10

lists lists are mutable*lists are arrays

dir([]), dir(list)help([])help([].sort)

[] == list()nums = [1,2,3,4]nums[0]nums[:]nums[0:2]nums[::2]nums[1::2]nums[1::2] = 0nums.append(5)nums + [5,6]

range(5)sum(nums)max(nums) [0]*5

*since lists are mutable you cannot use them as a dictionary keys!

nums.reverse() # in placenums2 = reversed(nums) # new list

nums.sort() # in placenums2 = sorted(nums) # new list

Page 28: (Python) Fundamentals

[email protected], 18.05.10

lists examples

l1 = ['a','b','c']l2 = [1,2,3] l3 = zip(l1,l2)zip(*l3)

l = [('a',3), ('b',2), ('c',1)]l.sort(key = lambda x: x[1])l.sort(key = lambda (c,n): n) l.sort(cmp = lambda x,y: x[1]-y[1])l.sort(cmp = lambda (c1,n1),(c2,n2): n1-n2)

colstr = ",".join(colors)

mat = [(1,2,3), (4,5,6)]flip = zip(*mat)flipback = zip(*flip)

colors = ['red','green','blue']colstr = ''for color in colors: colstr = colstr+','+color

Page 29: (Python) Fundamentals

[email protected], 18.05.10

slicing

slice[start:end:stride]

s[0:len(s)]s[:]s[2:7]s[-1]s[:-1]s[-6:]s[:-6]s[::2]s[::-1]

s = "another string"

slicing works the same for lists and tuples (<= sequences)

start inclusive end exclusive stride optional

from numpy import arraymat = array([[1,2,3], [4,5,6]])

mat[1][1]mat[:,:]mat[1:3, 0:2]mat[1:3, ...]

Page 30: (Python) Fundamentals

[email protected], 18.05.10

setsset([3,2,2,3,4])frozenset([3,2,2,3,4])

s = "my little string"set(s)

dir(set())help(set)help(set.add) help(frozenset)

sets are mutablefrozensets are immutable

s = set([2,3,3,34,51,1])max(s)min(s)sum(s)

s1 = set([1,2,3,4])s2 = set([3,4,5])s1.union(s2)s1.difference(s2)s1 - s2s1 or s2s1 and s2

s.remove('t')s.pop()

Page 31: (Python) Fundamentals

[email protected], 18.05.10

dictionariesdirectories are hashesonly immutables are allowed as keys

d = {}d = dict()d = {'pi':3.14, 'e':2.7}d = dict(pi=3.14, e=2.7)d = dict([('pi',3.14),('e',2.7)])

dir({})help({})help(dict)help(dict.values)help(dict.keys)

mat = [[0,1], [1,3], [2,0]]sparse = dict([((i,j),e) for i,r in enumerate(mat) for j,e in enumerate(r) if e])

d.get('two', 2)d.setdefault('one', 1)d.has_key('one')'one' in d

d['pi']d['pi'] = 3.0d['zero'] = 0.0 d[math.pi] = "pi"d[(1,2)] = "one and two"

Page 32: (Python) Fundamentals

[email protected], 18.05.10

data structures - tips

List- many similar items to store e.g, numbers, protein ids, sequences, ...- no need to find a specific item fast- fast access to items at a specific position in the list

Tuple- a few (<10), different items to store e.g. addresses, protein id and its sequence, ...- want to use it as dictionary key

Set- many, unique items to store e.g. unique protein ids- need to know quickly if a specific item is in the set

Dictionary- map from keys to values, is a look-up table e.g. telephone dictionary, amino acid letters to hydrophobicity values- need to get quickly the value for a key

when to use what?

Page 33: (Python) Fundamentals

[email protected], 18.05.10

boolean logicFalse: False, 0, None, [], (,)

True: everything else, e.g.: 1, True, ['blah'], ...A = 1B = 2

A and BA or Bnot A

1 in [1,2,3]"b" in "abc"

all([1,1,1])any([0,1,0])

l1 = [1,2,3]l2 = [4,5]

if not l1: print "list is empty or None"

if l1 and l2: print "both lists are filled"

Page 34: (Python) Fundamentals

[email protected], 18.05.10

comparisons(1, 2, 3) < (1, 2, 4) [1, 2, 3] < [1, 2, 4] 'C' < 'Pascal' < 'Perl' <'Python'(1, 2, 3, 4) < (1, 2, 4) (1, 2) < (1, 2, -1) (1, 2, 3) == (1.0, 2.0, 3.0) (1, 2, ('aa', 'ab')) < (1, 2, ('abc', 'a'), 4)

s1 = "string1"s2 = "string2"s1 = s3

comparison of complex objectschained comparisons

s1 == s2 # same contents1 is s3 # same reference

Java people watch out!

Page 35: (Python) Fundamentals

[email protected], 18.05.10

ifif 1 < x < 10: print "in range"

if 1 < x < 5: print "lower range"elif 5 < x < 10: print "upper range"else: print "out of range"

indentation has meaningthere is no switch() statement

# one-line ifif condition : statement

# conditional expressionfrac = 1/x if x>0 else 0

result = statement if condition else alternative

if condition : do_somethingelif condition2 : do_something_else1 else: do_something_else2

Page 36: (Python) Fundamentals

[email protected], 18.05.10

for

for ch in "mystring": print ch

for variable in sequence : statement1 statement2 …

for i in xrange(10): print i

help(range)help(xrange)

for i in xrange(10,0,-1): print i

for e in ["red","green","blue"]: print e

for line in open("myfile.txt"): print line

Page 37: (Python) Fundamentals

[email protected], 18.05.10

more for

for i,line in enumerate(open("myfile.txt")): print i,line

i = 0for ch in "mystring": print i,ch i = i + 1

for i,ch in enumerate("mystring"): print i,ch

for i in xrange(10): if 2<i<5 : continue print i

for ch in "this is a string": if ch == ' ': break print ch

nums = [1,2,3,4,5]for i in nums: if i==3: del nums[3] print i

Don't modify list while iterating over it!

Page 38: (Python) Fundamentals

[email protected], 18.05.10

while

while condition : statement1 statement2 …

i = 0while i < 10 : print i i += 1

i = 0while 1 : print i i += 1 if i >= 10: break

i = 0while 1 : i += 1 if i < 5: continue print i

Page 39: (Python) Fundamentals

[email protected], 18.05.10

strings

'apostrophes' 'You can "mix" them'

"""Text overmultiple lines"""

'or you \'escape\' them'

'''Or like this,if you like.'''

r"(a-z)+\.doc"

# äöüu"\xe4\xf6\xfc"

if you code in C/C++/Java/... as well, I suggest apostrophes for characters and quotes for strings, e.g: 'c' and "string"

"quotes"

"a tab \t and a newline \n"

strings are immutable

"repeat "*3

"a"+" "+"string"

Page 40: (Python) Fundamentals

[email protected], 18.05.10

string formatting

"height=",12," meters""height="+str(12)+" meters""height=%d meters" % 12 "%s=%.3f meters or %d cm" % ("height", 1.0, 100)

format codes (%d, %s, %f, …) similar to C/C++/Java

print "new line" print "same line",

# template stringsdic = {"prop1":"height", "len":100, "color":"green"} "%(prop1)s = %(len)d cm" % dic "The color is %(color)s" % dic

Page 41: (Python) Fundamentals

[email protected], 18.05.10

string methods

":".join(["red","green","blue"])

dir("")dir(str)help("".count)help(str)

len(s)s.find("string")s.count("t")s.strip()s.replace("my", "your")

s = " my little string "

str(3.14); float("3.14"); int("3")

s[4]s[4:10]

Page 42: (Python) Fundamentals

[email protected], 18.05.10

references

v1 = 10v2 = v1 -> v2 = 10 # content copiedv1 = 50 -> v2 = 10 # as expected

same for sets (and dictionaries) but not for tuples, strings or frozensets (<- immutable)

import copy help(copy.copy)help(copy.deepcopy)

l1 = [10]l2 = l1[:] -> l2 = [10] # content copiedl1[0] = 50 -> l2 = [10] # that's okay now

l1 = [10]l2 = l1 -> l2 = [10] # address copiedl1[0] = 50 -> l2 = [50] # oops

Page 43: (Python) Fundamentals

[email protected], 18.05.10

list comprehension

[x*x for x in xrange(10)] # square

[x for x in xrange(10) if not x%2] # even numbers

[(b,a) for a,b in [(1,2), (3,4)]] # swap

[expression for variable in sequence if condition]

condition is optional

s = "mary has a little lamb"[ord(c) for c in s][i for i,c in enumerate(s) if c==' ']

# what's this doing?[p for p in xrange(100) if not [x for x in xrange(2,p) if not p%x]]

Page 44: (Python) Fundamentals

[email protected], 18.05.10

generators

(x*x for x in xrange(10))

for n in (x*x for x in xrange(10)): print n

sum(x*x for x in xrange(10))

"-".join(c for c in "try this")

def xrange1(n): return (x+1 for x in xrange(n))

(expression for variable in sequence if condition)

def my_xrange(n): i = 0 while i<n : i += 1 yield i

def my_range(n): l = [] i = 0 while i<n : i += 1 l.append(i) return l

Page 45: (Python) Fundamentals

[email protected], 18.05.10

functionsdef add(a, b): return a+b

def list_add(l): return sum(l)

def inc(a, b=1): return a+b

def function(p1,p2,...): """ doc string """ ... return ...

def add(a, b): """adds a and b""" return a+b

help(add)

# duck typingadd(1,2)add("my", "string")add([1, 2], [3, 4])

def list_add(l1, l2): return [a+b for a,b in zip(l1,l2)]

Page 46: (Python) Fundamentals

[email protected], 18.05.10

functions - args

def add(*args): """example: add(1,2,3)""" return sum(args)

def scaled_add(c, *args): """example: scaled_add(2, 1,2,3)""" return c*sum(args)

def function(p1,p2,...,*args,*kwargs): return ...

def super_add(*args, **kwargs): """example: super_add(1,2,3, scale=2)""" scale = kwargs.get('scale', 1) offset = kwargs.get('offset', 0) return offset + scale * sum(args)

variable arguments are lists or dictionaries

def showme(*args): print args

def showmemore(**kwargs): print kwargs

Page 47: (Python) Fundamentals

[email protected], 18.05.10

Exceptions - handle

try: f = open("c:/somefile.txt") x = 1/yexcept ZeroDivisionError: print "cannot divide by zero"except IOError, msg: print "file error: ",msgexcept Exception, msg: print "ouch, surprise: ",msgelse: x = x+1finally: f.close()

try: f = open("c:/somefile.txt")except: print "cannot open file"

Page 48: (Python) Fundamentals

[email protected], 18.05.10

Exceptions - raise

try: # do something and raise an exception raise IOError, "Something went wrong"except IOError, error_text: print error_text

Page 49: (Python) Fundamentals

[email protected], 18.05.10

doctest

def add(a, b): """Adds two numbers or lists >>> add(1,2) 3 >>> add([1,2], [3,4]) [1, 2, 3, 4] """ return a+b

if __name__ == "__main__": import doctest doctest.testmod()

Page 50: (Python) Fundamentals

[email protected], 18.05.10

unittestimport unittest

def add(a,b): return a+bdef mult(a,b): return a*b

class TestCalculator(unittest.TestCase): def test_add(self): self.assertEqual( 4, add(1,3)) self.assertEqual( 0, add(0,0)) self.assertEqual(-3, add(-1,-2)) def test_mult(self): self.assertEqual( 3, mult(1,3)) self.assertEqual( 0, mult(0,3))

if __name__ == "__main__": unittest.main()

Page 51: (Python) Fundamentals

[email protected], 18.05.10

importimport mathmath.sin(3.14)

from math import sinsin(3.14)math.cos(3.14)

from math import * # careful!sin(3.14) cos(3.14)

import math as mm.sin(3.14)m.cos(m.pi)

from math import sin, cossin(3.14)cos(3.14) import math

help(math)dir(math)help(math.sin)

import moduleimport module as mfrom module import f1, f2from module import *

Page 52: (Python) Fundamentals

[email protected], 18.05.10

import example# module calculator.py

def add(a,b): return a+b

if __name__ == "__main__": print add(1,2) # test

# module do_calcs.py

import calculator

def main(): print calculator.add(3,4)

if __name__ == "__main__": main()

Page 53: (Python) Fundamentals

[email protected], 18.05.10

package example

calcpack/__init__.pycalcpack/calculator.pycalcpack/do_calcs.py

# in a different package

from calcpack.calculator import addx = add(1,2)

from calcpack.do_calcs import mainmain()

Page 54: (Python) Fundamentals

[email protected], 18.05.10

template""" This module implements some calculator functions"""

def add(a,b): """Adds two numbers a -- first number b -- second number returns the sum """ return a+b

def main(): """Main method. Adds 1 and 2""" print add(1,2)

if __name__ == "__main__": main()

Page 55: (Python) Fundamentals

[email protected], 18.05.10

regular expressionsimport re

text = "date is 24/07/2008"re.findall(r'(..)/(..)/(....)', text)re.split(r'[\s/]', text)re.match(r'date is (.*)', text).group(1)re.sub(r'(../)(../)', r'\2\1', text)

Perl addicts: only use regex if there is no other way.Tip: string methods and data structures

# compile pattern if used multiple timespattern = compile(r'(..)/(..)/(....)')pattern.findall(text)pattern.split(...)pattern.match(...)pattern.sub(...)

Page 56: (Python) Fundamentals

[email protected], 18.05.10

file reading/writingf = open(fname)for line in f: print linef.close()

def write_matrix(fname, mat): f = open(fname, 'w') f.writelines([' '.join(map(str, row))+'\n' for row in mat]) f.close()

def read_matrix(fname): return [map(float, line.split()) for line in open(fname)]

f = open(fname, 'w')f.write("blah blah")f.close()

open(fname).read()open(fname).readline()open(fname).readlines()

dir(file)help(file)

#skip header and first colf = open(fname)f.next() for line in f: print line[1:]f.close()

Page 57: (Python) Fundamentals

[email protected], 18.05.10

file handling

import osos.listdir('.')os.getcwd()

import globglob.glob("*.py")

import osdir(os)help(os.walk)

import os.pathdir(os.path)

import shutildir(shutil)help(shutil.move)

import os.path as pathpath.split("c:/myfolder/test.dat")path.join("c:/myfolder", "test.dat")

Page 58: (Python) Fundamentals

[email protected], 18.05.10

file processing examplesdef number_of_lines(fname): return len(open(fname).readlines())

def number_of_words(fname): return len(open(fname).read().split())

def enumerate_lines(fname): return [t for t in enumerate(open(fname))]

def shortest_line(fname): return min(enumerate(open(fname)), key=lambda (i,l): len(l))

def wordiest_line(fname): return max(enumerate(open(fname)), key=lambda (i,l): len(l.split()))

Page 59: (Python) Fundamentals

[email protected], 18.05.10

system

import sysdir(sys)sys.versionsys.path

import osdir(os)

help(os.sys)help(os.getcwd)help(os.mkdir)

import subprocess# run and do not waitsubprocess.Popen("mydir/blast -o %s" % fname, shell=True)

import sysif __name__ == "__main__": args = sys.argv print "script name: ", args[1] print "script args: ", args[1:]

import os# run and waitos.system("mydir/blast -o %s" % fname)

Page 60: (Python) Fundamentals

[email protected], 18.05.10

last famous words

1. line length < 802. complexity < 103. no code duplication4. value-adding comments5. use language idioms6. automated tests

Page 61: (Python) Fundamentals

[email protected], 18.05.10

questions

Page 62: (Python) Fundamentals

Advanced Python

Page 63: (Python) Fundamentals

[email protected], 18.05.10

overview Functional programming Object oriented programming BioPython Scientific python

Page 64: (Python) Fundamentals

[email protected], 18.05.10

Functional programming

makes some things easier limited support in Python

Functional Programming is a programming paradigm that emphasizes the application of functions and avoids state and mutable data, in contrast to the imperative programming style, which emphasizes changes in state.

Functions can be treated like any other type of data

def dayFormat(date): return "Day: %sd" % (date.day)

def timeFormat(date): return "%2d:%2d" % (date.hour,date.min)

def datePrinter(dates, format): for date in dates: print format(date)

datePrinter(dates, timeFormat)

def add(a,b): return a+bplus = addplus(1,2)

def inc_factory(n): def inc(a): return n+a return incinc2 = inc_factory(2)inc3 = inc_factory(3) inc3(7)

Page 65: (Python) Fundamentals

[email protected], 18.05.10

FP - lambda functions

#with lambda functionsl.sort(key = lambda (c,n): n)l.sort(cmp = lambda x,y: x[1]-y[1])

Lambda functions are anonymous functions. Typically for very short functions that are used only once.

#without lambda functionsdef key(x): return x[1]l.sort(key = key)

def cmp(x,y): return x[1]-y[1] l.sort(cmp = cmp)

l = [('a',3), ('b',2), ('c',1)]

Page 66: (Python) Fundamentals

[email protected], 18.05.10

functional programming

reduce(lambda a,b: a*b, [1,2,3,4,5])

filter(lambda x: x>3, [1,2,3,4,5])

map(str, [1,2,3,4,5]) l = []for n in [1,2,3,4,5]: l.append(str(n))

l = []for x in [1,2,3,4,5]: if x>3: l.append(n)

map applies a function to the elements of a sequence

filter extracts elements from a sequence depending on a predicate function

reduce iteratively applies a binary function, reducing a sequence to a single element

prod = 1for x in [1,2,3,4,5]: prod = prod * x

Page 67: (Python) Fundamentals

[email protected], 18.05.10

FP - example

Functionalrowsums = [sum(map(float,row.split())) for row in open(fname)]

Imperative

rowsums = []for row in open(fname): elems = row.split() rowsum = 0 for e in elems: rowsum += float(e) rowsums.append(rowsum)

1 2 34 5 6

615

Problemsum over matrix rows stored in a file

File List

rowsums = map(sum,map(lambda row:map(float,row.split()),open(fname)))

Extra Functional ;-)

Numpyrowsums = sum(loadtxt(fname),axis=1)

Page 68: (Python) Fundamentals

[email protected], 18.05.10

FP - more examplesnumbers = [1,2,3,4]numstr = ",".join(map(str,numbers))numbers = map(int,numstr.split(',')) v1 = [1,2,3]v2 = [3,4,5]dotprod = sum(map(lambda (x,y): x*y, zip(v1,v2)))dotprod = sum(x*y for x,y in zip(v1,v2))

Page 69: (Python) Fundamentals

[email protected], 18.05.10

Object Oriented Programming

brings data and functions together helps to manage complex code limited support in Python

Object-oriented programming (OOP) is a programming paradigm that uses "objects" – data structures consisting of data fields and associated methods.

text = "some text"text.upper()len(text) #not a method calltext.__len__()

f = open(filename)if not f.closed : lines = f.readlines()f.close()

Examples

Dot notationobject.attributeobject.method()

try:dir(file)help(file)

Class- attributes+ methods

Car- color- brand+ consumption(speed)

Page 70: (Python) Fundamentals

[email protected], 18.05.10

OO motivationProblem for some genes print out name, length and GC content

Object orientedfor gene in genes: print gene.name(), gene.length(), gene.gc_content()

Non-OO approach

for gene in genes: print gene[0], gene[2]-gene[1], gc_content(gene[3])

genes = [ ['gatA', 2108, 3583, 'agaccta'], ['yfgA', 9373, 9804, 'agaaa'], ... ]

def gc_content(seq): … return gc

Page 71: (Python) Fundamentals

[email protected], 18.05.10

OO definitionsClass: a template that defines attributes and functions of something, e.g. Car - brand - color - calc_fuel_consumption(speed)

Attributes, Fields, Properties: things that describe an object, e.g. Brand, Color

Methods, Operations, Functions: something that an object can do e.g. calc_fuel_consumption(speed)

Instance: an actual, specific object created from the template/class e.g. red BMW M5

Object: some unspecified class instance

Page 72: (Python) Fundamentals

[email protected], 18.05.10

BioSeq classclass BioSeq:

def __init__(self, name, letters): self.name = name self.letters = letters def toFasta(self): return ">"+ self.name+"\n"+self.letters

def __getslice__(self,start,end): return self.letters[start:end]

if __name__ == "__main__": seq = BioSeq("AC1004", "actgcaccca") print seq.name, seq.letters print seq.toFasta() print seq[0:11]

constructorattributes

method

special method

Page 73: (Python) Fundamentals

[email protected], 18.05.10

Inheritance

BioSeq- name- letters+ toFasta()

DNASeq+ revcomp()+ gc_content()+ transcribe()+ translate()

RNASeq+ invrepeats()+ translate()

AASeq+ hydrophobicity()

super-class

sub-classes

Page 74: (Python) Fundamentals

[email protected], 18.05.10

DNASeqclass DNASeq(BioSeq): _alpha = {'a':'t', 't':'a', 'c':'g', 'g':'c'}

def __init__(self, name, letters): BioSeq.__init__(self, name, letters.lower()) if not all(self._alpha.has_key(c) for c in self.letters): raise ValueError("Invalid nucleotide:"+c)

def revcomp(self): return "".join(self._alpha[c] for c in reversed(self.letters)) @classmethod def alphabet(cls): return cls._alpha.keys()if __name__ == "__main__": seq = DNASeq("AC1004", "TTGACA") print seq.revcomp() print DNASeq.alphabet()

Page 75: (Python) Fundamentals

[email protected], 18.05.10

special methodsclass BioSeq: def __init__(self, name, letters): self.name = name self.letters = letters def __getslice__(self,start,end): # seq[2:34] return self.letters[start:end] def __getitem__(self,index): # seq[4] return self.letters[index] def __eq__(self,other): # seq1 == seq2 return self.letters == other.letters def __add__(self,other): # seq1 + seq2 return BioSeq(self.name+"_"+other.name, self.letters+other.letters) def __str__(self): # print seq return self.name+":"+self.letters def __len__(self): # len(seq) return len(self.letters)

Page 76: (Python) Fundamentals

[email protected], 18.05.10

tips

• Functional programming• can help even for small problems• tends to be less efficient than imperative • can be hard to read

• Object oriented programming• brings data and functions together• helps to manage larger problems• code becomes easier to read

Page 77: (Python) Fundamentals

[email protected], 18.05.10

BioPythonhttp://biopython.org

Sequence analysis Parsers for various formats (Genbank, Fasta, SwissProt) BLAST (online, local) Multiple sequence alignment (ClustalW, MUSCLE, EMBOSS) Using online databases (InterPro, Entrez, PubMed, Medline) Structure models (PDB) Machine Learning (Logistic Regression, k-NN, Naïve Bayes, Markov Models) Graphical output (Genome diagrams, dot plots, ...) ...

http://biopython.org/DIST/docs/tutorial/Tutorial.html

Page 78: (Python) Fundamentals

[email protected], 18.05.10

BioPython - examplefrom Bio.Seq import Seq

dnaSeq = Seq("AGTACACTGGT")print dnaSeq # => 'AGTACACTGGT'print dnaSeq[3:7] # => 'ACAC'print dnaSeq.complement() # => 'TCATGTGACCA'print dnaSeq.reverse_complement() # => 'ACCAGTGTACT'

from Bio import SeqIO from Bio.SeqUtils import GC for seq_record in SeqIO.parse("orchid.gbk", "genbank"): print seq_record.id print len(seq_record) print GC(seq_record.seq)

Page 79: (Python) Fundamentals

[email protected], 18.05.10

Scientific Python

scipy

numpymatplotlib

pylab

• NumPy: a library for array and matrix types and basic operations on them.• SciPy: library that uses NumPy to do advanced math.• matplotlib: a library that facilitates plotting.• pylab: a thin wrapper to simplify the API (http://www.scipy.org/PyLab).

Page 80: (Python) Fundamentals

[email protected], 18.05.10

SciPy

• statistics • optimization • numerical integration • linear algebra • Fourier transforms • signal processing • image processing • genetic algorithms • ODE solvers • special functions

http://www.scipy.org/import olsfrom numpy.random import randndata = randn(100,5)y = data[:,0]x = data[:,1:]mymodel = ols.ols(y,x,'y',['x1','x2','x3','x4'])print mymodel.summary() ==================================================================variable coefficient std. Error t-statistic prob.==================================================================const 0.107348 0.107121 1.002113 0.318834x1 -0.037116 0.113819 -0.326100 0.745066x2 0.006657 0.114407 0.058183 0.953725...===================================================================Models stats Residual stats===================================================================R-squared 0.033047 Durbin-Watson stat 2.012949Adjusted R-squared -0.007667 Omnibus stat 5.664393F-statistic 0.811684 Prob(Omnibus stat) 0.058883Prob (F-statistic) 0.520770 JB stat 6.109005...

multi-variate regression model

http://www.scipy.org/Cookbook/OLS

Page 81: (Python) Fundamentals

[email protected], 18.05.10

NumPyhttp://www.scipy.org/

array([1,2,3]) array([1,2,3])

a = array([1,2,3])

M = array([[1, 2], [3, 4]])M.sum()M.sum(axis=1)M[M>2]

mydescriptor = {'names': ('gender','age','weight'), 'formats': ('S1', 'f4', 'f4')} M = array([('M', 64.0, 75.0), ('F', 25.0, 60.0)], dtype=mydescriptor)M['weight']

array([1,2,3])

http://www.scipy.org/Tentative_NumPy_Tutorial

and much, much more ...

http://www.scipy.org/NumPy_for_Matlab_Users?highlight=%28CategorySciPyPackages%29http://www.tramy.us/numpybook.pdf

Page 82: (Python) Fundamentals

[email protected], 18.05.10

speed?

Type of solution Time (sec) Python 1500.0 Python + Psyco 1138.0 Python + NumPy

Expression 29.3

Blitz 9.5 Inline 4.3 Fast Inline 2.3

for i in range(1, nx-1): for j in range(1, ny-1): u[i,j] = ((u[i-1, j] + u[i+1, j])*dy**2 + (u[i, j-1] + u[i, j+1])*dx**2)/(2.0*(dx**2 + dy**2))

http://www.scipy.org/PerformancePython

inner loop to solve a 2D Laplace equation using Gauss-Seidel iteration

Type of solution Time (sec) Python/Fortran 2.9 Pyrex 2.5 Matlab 29.0 Octave 60.0 Pure C++ 2.16

Page 83: (Python) Fundamentals

[email protected], 18.05.10

matplotlibhttp://matplotlib.sourceforge.netfrom pylab import *

t = arange(0.0, 2.0, 0.01)s = sin(2*pi*t)plot(t, s, linewidth=1.0)

xlabel('time (s)')ylabel('voltage (mV)')title('About as simple as it gets, folks')grid(True)show()http://matplotlib.sourceforge.net/users/screenshots.html

Page 84: (Python) Fundamentals

[email protected], 18.05.10

whatever you want ...• Scientific computing: SciPy, NumPy, matplotlib• Bioinformatics: BioPython• Phylogenetic trees: Mavric, Plone, P4, Newick• Microarrays: SciGraph, CompClust• Molecular modeling: MMTK, OpenBabel, CDK, RDKit, cinfony, mmLib• Dynamic systems modeling: PyDSTools• Protein structure visualization: PyMol, UCSF Chimera• Networks/Graphs: NetworkX, igraph• Symbolic math: SymPy, Sage• Wrapper for C/C++ code: SWIG, Pyrex, Cython • R/SPlus interface: RSPython, RPy• Java interface: Jython• Fortran to Python: F2PY• …

Page 85: (Python) Fundamentals

[email protected], 18.05.10

summary IMHO: Python becomes lingua franca in scientific computing has replaced Matlab, R, C, C++ for me many, many libraries (of varying quality) the difficulty is in finding what you need

(and installing it sometime) most libraries are in C and therefore fast interplay of versions can be difficult docstring documentation is often mediocre online documentation varies in quality many, many examples online Enthought Python Distribution (EPD) is excellent

(http://www.enthought.com/products/epd.php)

Page 86: (Python) Fundamentals

[email protected], 18.05.10

questions

Page 87: (Python) Fundamentals

[email protected], 18.05.10

QFAB services

Page 88: (Python) Fundamentals

[email protected], 18.05.10

links• Wikipedia – Python

http://en.wikipedia.org/wiki/Python• Instant Python

http://hetland.org/writing/instant-python.html• How to think like a computer scientist

http://openbookproject.net//thinkCSpy/• Dive into Python

http://www.diveintopython.org/• Python course in bioinformatics

http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html• Beginning Python for bioinformatics

http://www.onlamp.com/pub/a/python/2002/10/17/biopython.html• SciPy Cookbook

http://www.scipy.org/CookbookMatplotlib Cookbookhttp://www.scipy.org/Cookbook/Matplotlib

• Biopython tutorial and cookbookhttp://www.bioinformatics.org/bradstuff/bp/tut/Tutorial.html

• Huge collection of Python tutorialhttp://www.awaretek.com/tutorials.html

• What’s wrong with Perlhttp://www.garshol.priv.no/download/text/perl.html

• 20 Stages of Perl to Python conversionhttp://aspn.activestate.com/ASPN/Mail/Message/python-list/1323993

• Why Pythonhttp://www.linuxjournal.com/article/3882

Page 89: (Python) Fundamentals

[email protected], 18.05.10

booksPython for BioinformaticsSebastian Bassi2009

Bioinformatics Programming using PythonMitchell L. Model2009

Python for BioinformaticsJason Kinser2008

A Primer on Scientific Programming with PythonHans Petter Langtangen2009

Matplotlib for Python DevelopersSandro Tosi2009