Python course in Bioinformaticsxhx/courses/CS174/lectures/python_tutorial.pdf · Outline General...

35
Outline General Introduction Basic Types in Python Programming Exercises Python course in Bioinformatics Xiaohui Xie March 31, 2009 Xiaohui Xie Python course in Bioinformatics

Transcript of Python course in Bioinformaticsxhx/courses/CS174/lectures/python_tutorial.pdf · Outline General...

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Python course in Bioinformatics

    Xiaohui Xie

    March 31, 2009

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    General Introduction

    Basic Types in Python

    Programming

    Exercises

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Why Python?

    I Scripting language, raplid applications

    I Minimalistic syntax

    I Powerful

    I Flexiablel data structure

    I Widely used in Bioinformatics, and many other domains

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Where to get Python and learn more?

    I Main source of information: http://docs.python.org/

    I Tutorial: http://docs.python.org/tutorial/index.html

    I Biopython: http://biopython.org/wiki/Main Page

    Xiaohui Xie Python course in Bioinformatics

    http://docs.python.org/http://docs.python.org/tutorial/index.htmlhttp://biopython.org/wiki/Main_Page

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Invoking Python

    I To start: type python in command lineI It will look like

    Python 2.5.2 (r252:60911, Mar 25 2009, 00:12:33)

    [GCC 4.1.2 (Gentoo 4.1.2 p1.0.2)] on linux2

    Type "help", "copyright", "credits" or "license" for more information.

    >>>

    I You can now type commands in the line denoted by >>>

    I To leave: type end-of-file character ctrl-D on Unix, ctrl-zon Windows

    I This is called interactive mode

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Appetizer Example

    I Task: Print all numbers in a given fileI File: numbers.txt

    2.1

    3.2

    4.3

    I Code: print.py

    # Note: the code lines begin in the first column of the file. In

    # Python code indentation *is* syntactically relevant. Thus, the

    # hash # (which is a comment symbol, everything past a hash is

    # ignored on current line) marks the first column of the code

    data = open("numbers.txt", "r")

    for d in data:

    print d

    data.close()

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Appetizer Example cont’d

    I Task: Print the sum of all the data in the fileI Code: sum.py

    data = open("numbers.txt", "r")

    s = 0

    for d in data:

    s = s + float(d)

    print s

    data.close()

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Interative Mode

    I prompt >>> allows to enter command

    I command is ended by newline

    I variables need not be initialized or declared

    I a colon “:” opens a block

    I ... prompt denotes that block is expected

    I no prompt means python output

    I a block is indented

    I by ending indentation, block is ended

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Differences to Java or C

    I can be used interatively. This makes it much easier to testprograms and to debug

    I no declaration of variables

    I no brackets denote block, just indentation (Emacs supportsthe style)

    I a comment begins with a “#”. Everything after that isignored.

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Numbers

    I Example

    >>> 2+2

    4

    >>> # This is a comment

    ... 2+2

    4

    >>> 2+2 # and a comment on the same line as code

    4

    >>> (50-5*6)/4

    5

    >>> # Integer division returns the floor:

    ... 7/3

    2

    >>> 7/-3

    -3

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Numbers cont’d

    I Example

    >>> width = 20

    >>> height = 5*9

    >>> width * height

    900

    >>> # Variables must be defined (assigned a value) before they can be

    >>> # used, or an error will occur:

    >>> # try to access an undefined variable

    ... n

    Traceback (most recent call last):

    File "", line 1, in

    NameError: name ’n’ is not defined

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Strings

    I Strings can be enclosed in single quotes or double quotesI Example

    >>> ’spam eggs’

    ’spam eggs’

    >>> ’doesn\’t’

    "doesn’t"

    >>> "doesn’t"

    "doesn’t"

    >>> ’"Yes," he said.’

    ’"Yes," he said.’

    >>> "\"Yes,\" he said."

    ’"Yes," he said.’

    >>> ’"Isn\’t," she said.’

    ’"Isn\’t," she said.’

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Strings cont’d

    I Strings can be surrounded in a pair of matching triple-quotes:""" or ’’’. End of lines do not need to be escaped whenusing triple-quotes, but they will be included in the string.

    I Example

    print """

    Usage: thingy [OPTIONS]

    -h Display this usage message

    -H hostname Hostname to connect to

    """

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Strings cont’d

    I Strings can be concatenated (glued together) with the +operator, and repeated with *:

    I Example

    >>> word = ’Help’ + ’A’

    >>> word

    ’HelpA’

    >>> ’’

    ’’

    >>> ’str’ ’ing’ # >> ’str’.strip() + ’ing’ #

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Strings cont’d

    I Strings can be subscripted (indexed); like in C, the firstcharacter of a string has subscript (index) 0.

    I There is no separate character type; a character is simply astring of size one.

    I Substrings can be specified with the slice notation: twoindices separated by a colon.

    I Example

    >>> word = ’Help’ + ’A’

    >>> word[4]

    ’A’

    >>> word[0:2]

    ’He’

    >>> word[:2] # The first two characters

    ’He’

    >>> word[2:] # Everything except the first two characters

    ’lpA’

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Strings cont’d

    I Unlike a C string, Python strings cannot be changed.Assigning to an indexed position in the string results in anerror:

    I Example

    >>> word[0] = ’x’

    Traceback (most recent call last):

    File "", line 1, in ?

    TypeError: object doesn’t support item assignment

    >>> ’x’ + word[1:]

    ’xelpA’

    >>> ’Splat’ + word[4]

    ’SplatA’

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Strings cont’d

    I Example

    >>> from string import *

    >>> dna = ’gcatgacgttattacgactctg’

    >>> len(dna)

    22

    >>> ’n’ in dna

    False

    >>> count(dna,’a’)

    5

    >>> replace(dna, ’a’, ’A’)

    ’gcAtgAcgttAttAcgActctg’

    I Exercise: Calculate GC percent of dna

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Strings cont’d

    I Solution: Calculate GC percent

    >>> gc = (count(dna, ’c’) + count(dna, ’g’)) / float(len(dna)) * 100

    >>> "%.2f" % gc

    ’64.08’

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Strings cont’d

    I Exercise: Calculate the complement of DNA

    A - T

    C - G

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Lists

    I A list of comma-separated values (items) between squarebrackets.

    I List items need not all have the same type (compund datatypes)

    >>> a = [’spam’, ’eggs’, 100, 1234]

    >> a[0]

    ’spam’

    >>> a[3]

    1234

    >>> a[-2]

    100

    >>> a[1:-1]

    [’eggs’, 100]

    >>> a[:2] + [’bacon’, 2*2]

    [’spam’, ’eggs’, ’bacon’, 4]

    >>> 3*a[:3] + [’Boo!’]

    [’spam’, ’eggs’, 100, ’spam’, ’eggs’, 100, ’spam’, ’eggs’, 100, ’Boo!’]

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Lists cont’d

    I Unlike strings, which are immutable, it is possible to changeindividual elements of a list

    I Assignment to slices is also possible, and this can even changethe size of the list or clear it entirely

    I Example

    >>> a

    [’spam’, ’eggs’, 100, 1234]

    >>> a[2] = a[2] + 23

    >>> a[0:2] = [1, 12] # Replace some items:

    >>> a[0:2] = [] # Remove some:

    >>> a

    [123, 1234]

    >>> a[1:1] = [’bletch’, ’xyzzy’] # Insert some:

    >>> a

    [123, ’bletch’, ’xyzzy’, 1234]

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Lists cont’d

    I Functions returning a list

    >>> range(3)

    [0, 1, 2]

    >>> range(10,20,2)

    [10, 12, 14, 16, 18]

    >>> range(5,2,-1)

    [5, 4, 3]

    >>> aas = "ALA TYR TRP SER GLY".split()

    >>> aas

    [’ALA’, ’TYR’, ’TRP’, ’SER’, ’GLY’]

    >>> " ".join(aas)

    ’ALA TYR TRP SER GLY’

    >>> l = list(’atgatgcgcccacgtacga’)

    [’a’, ’t’, ’g’, ’a’, ’t’, ’g’, ’c’, ’g’, ’c’, ’c’, ’c’, ’a’,

    ’c’, ’g’, ’t’, ’a’, ’c’, ’g’, ’a’]

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Dictionaries

    I A dictionary is an unordered set of key: value pairs, with therequirement that the keys are unique

    I A pair of braces creates an empty dictionary: .I Placing a comma-separated list of key:value pairs within the

    braces adds initial key:value pairs to the dictionaryI The main operations on a dictionary are storing a value with

    some key and extracting the value given the keyI Example

    >>> tel = {’jack’: 4098, ’sape’: 4139}

    >>> tel[’guido’] = 4127

    >>> tel

    {’sape’: 4139, ’guido’: 4127, ’jack’: 4098}

    >>> tel[’jack’]

    4098

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Dictionaries cont’d

    I Example

    >>> tel = {’jack’: 4098, ’sape’: 4139, ’guido’ = 4127}

    >>> del tel[’sape’]

    >>> tel[’irv’] = 4127

    >>> tel

    {’guido’: 4127, ’irv’: 4127, ’jack’: 4098}

    >>> tel.keys()

    [’guido’, ’irv’, ’jack’]

    >>> ’guido’ in tel

    True

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Programming

    I Example

    a, b = 3, 4

    if a > b:

    print a + b

    else:

    print a - b

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Programming cont’d

    I Example

    >>> # Fibonacci series:

    ... # the sum of two elements defines the next

    ... a, b = 0, 1

    >>> while b < 10:

    ... print b

    ... a, b = b, a+b

    ...

    1

    1

    2

    3

    5

    8

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Programming features

    I multiple assignment: rhs evaluated before anything on theleft, and (in rhs) from left to right

    I while loop executes as long as condition is True (non-zero,not the empty string, not None)

    I block indentation must be the same for each line of block

    I need empty line in interactive mode to indicate end of block(not required in edited code)

    I use of print

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Printing

    I Example

    >>> i = 256*256

    >>> print ’The value of i is’, i

    The value of i is 65536

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Flow control

    I Example

    x = 35

    if x < 0:

    x = 0

    print ’Negative changed to zero’

    elif x == 0:

    print ’Zero’

    elif x == 1:

    print ’Single’

    else:

    print ’More’

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Iteration

    I Python for iterates over sequence (string, list, generatedsequence)

    I Example

    a = [’cat’, ’window’, ’defenestrate’]

    for x in a:

    print x, len(x)

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Iteration

    I Python for iterates over sequence (string, list, generatedsequence)

    I Example

    a = [’cat’, ’window’, ’defenestrate’]

    for x in a:

    print x, len(x)

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Definiting functions

    I Example

    def fib(n): # write Fibonacci series up to n

    """Print a Fibonacci series up to n."""

    a, b = 0, 1

    while b < n:

    print b,

    a, b = b, a+b

    # Now call the function we just defined:

    fib(2000)

    # will return:

    # 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Reverse Complement of DNA

    I Excercise: Find the reverse complement of a DNA sequenceI Example

    5’ - ACCGGTTAATT 3’ : forward strand

    3’ - TGGCCAATTAA 5’ : reverse strand

    So the reverse complement of ACCGGTTAATT is AATTAACCGGA

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Reverse Complement of DNA

    I Solution: Find the reverse complement of a DNA sequence

    from string import *

    def revcomp(dna):

    """ reverse complement of a DNA sequence """

    comp = dna.translate(maketrans("AGCTagct", "TCGAtcga"))

    lcomp = list(comp)

    lcomp.reverse()

    return join(lcomp, "")

    Xiaohui Xie Python course in Bioinformatics

  • OutlineGeneral Introduction

    Basic Types in PythonProgramming

    Exercises

    Translate a DNA sequence

    I Excercise: Translate a DNA sequence to an amino acidsequence

    I Genetic code

    standard = { ’ttt’: ’F’, ’tct’: ’S’, ’tat’: ’Y’, ’tgt’: ’C’,

    ’ttc’: ’F’, ’tcc’: ’S’, ’tac’: ’Y’, ’tgc’: ’C’,

    ’tta’: ’L’, ’tca’: ’S’, ’taa’: ’*’ , ’tca’: ’*’,

    ’ttg’: ’L’, ’tcg’: ’S’, ’tag’: ’*’, ’tcg’: ’W’,

    ’ctt’: ’L’, ’cct’: ’P’, ’cat’: ’H’, ’cgt’: ’R’,

    ’ctc’: ’L’, ’ccc’: ’P’, ’cac’: ’H’, ’cgc’: ’R’,

    ’cta’: ’L’, ’cca’: ’P’, ’caa’: ’Q’, ’cga’: ’R’,

    ’ctg’: ’L’, ’ccg’: ’P’, ’cag’: ’Q’, ’cgg’: ’R’,

    ’att’: ’I’, ’act’: ’T’, ’aat’: ’N’, ’agt’: ’S’,

    ’atc’: ’I’, ’acc’: ’T’, ’aac’: ’N’, ’agc’: ’S’,

    ’ata’: ’I’, ’aca’: ’T’, ’aaa’: ’K’, ’aga’: ’R’,

    ’atg’: ’M’, ’acg’: ’T’, ’aag’: ’K’, ’agg’: ’R’,

    ’gtt’: ’V’, ’gct’: ’A’, ’gat’: ’D’, ’ggt’: ’G’,

    ’gtc’: ’V’, ’gcc’: ’A’, ’gac’: ’D’, ’ggc’: ’G’,

    ’gta’: ’V’, ’gca’: ’A’, ’gaa’: ’E’, ’gga’: ’G’,

    ’gtg’: ’V’, ’gcg’: ’A’, ’gag’: ’E’, ’ggg’: ’G’ }

    Xiaohui Xie Python course in Bioinformatics

    OutlineGeneral IntroductionBasic Types in PythonProgrammingExercises