
Slides: cbdm-01.zdv.uni-mainz.de/~jfontain/files/mb17/slides/Python4Biologists.pdf

Introduction to Python for Biologists

Katerina Taškova (1), Jean-Fred Fontaine (1,2)

(1) Faculty of Biology, Johannes Gutenberg-Universität Mainz, Mainz, Germany

(2) Genomics and Computational Biology, Kernel Press, Mainz, Germany

https://cbdm.uni-mainz.de/mb17

March 21, 2017


Table of Contents

- Introduction
- Running code
- Literals and variables
- Numeric types
- Strings
- Exercise
- Lists, tuples and ranges
- Sets and dictionaries
- Convert and copy
- Loops
- Exercise
- Functions
- Branching
- Exercise
- Regular Expressions
- Exercise
- Annexes

March 21, 2017 Johannes Gutenberg-Universitat Mainz Taskova & Fontaine 2


Introduction to Python for Biologists – Introduction


What is Python?

- Python is a general-purpose programming language
  - created by Guido van Rossum (1991)
  - high-level (abstracts away the details of the computer)
  - interpreted (needs interpreter software to run)
- Python design philosophy
  - code readability
  - syntax brevity
- Python is widely used in biology
  - rich built-in features
  - powerful scientific extensions
  - plotting capabilities


Structured programming I

- Instructions are executed sequentially, one per line
- Conditional statements allow selective execution of code blocks
- Loops allow repeated execution of code blocks
- Functions allow on-demand execution of code blocks


Structured programming II

The listing below is pseudocode, not runnable Python:

    instruction1             # 1st instruction (the hash sign # starts a comment)
                             # blank line
    repeat 20 times          # 2nd instruction (a loop starts a block)
        instruction_a        # block defined by indentation (spaces or tabs)
        instruction_b        # 2nd instruction in block
                             # blank line
    if n>10                  # 3rd instruction (conditional statement)
        instruction_a        # 1st instruction in block
        instruction_b        # 2nd instruction in block
                             # blank line
    # backslashes join lines (in real Python nothing may follow the
    # backslash on the same line, not even a comment)
    instruction3 \           # 3rd instruction, part 1
        instruction3         # 3rd instruction, part 2

    # Expressions in (), {}, or [] can span multiple lines
    instruction4(1, 2, 3,    # 4th instruction, part 1
                 4, 5, 6)    # 4th instruction, part 2
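As a rough illustration, the pseudocode above could look as follows in actual Python. The variable names (n, total, long_sum, values) and the operations are invented for this sketch; loops and conditional statements are introduced properly in later sections.

```python
n = 12
total = 0

for i in range(20):          # loop: the indented block runs 20 times
    total += i               # block defined by indentation

if n > 10:                   # conditional statement
    print("n is greater than 10")

# a backslash joins two physical lines into one instruction
long_sum = 1 + 2 + \
    3

# expressions in (), {} or [] can span several lines without a backslash
values = [1, 2, 3,
          4, 5, 6]

print(total)      # 190
print(long_sum)   # 6
```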


Namespace

- Variables are names associated with data
  - e.g. a=2 assigns value 2 to variable a
- Functions are names associated with specific code blocks
  - built-in functions are available (see list on slide 100)
  - e.g. print(a) will display 2 on the screen
- The user namespace is the set of names available to the user
  - users can define new names of variables and functions in their namespace
  - imported modules can add names of variables and functions to the user namespace
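A small sketch of how names enter the user namespace, assuming the code is run at the top level of a script or a fresh interactive session. The built-in dir() function is not shown on the slide; it is used here only to list the names currently defined.

```python
a = 2                      # defines the name 'a' in the user namespace
print(a)                   # 2

print('a' in dir())        # True: 'a' is now a known name
print('sqrt' in dir())     # False: sqrt has not been imported yet

from math import sqrt      # adds the name 'sqrt' to the user namespace
print('sqrt' in dir())     # True
print(sqrt(16))            # 4.0
```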


Object-oriented programming

- Data is organized in classes and objects
  - a class is a template defining what objects can store and do
  - an object is an instance of a class
  - objects have attributes to store data and methods to perform actions
  - object namespaces are different from the user namespace
- Example: class "Human" is defined as follows
  - has a name (an attribute "name")
  - has an age (an attribute "age")
  - can introduce itself (a method "who")
  - example with 1 existing Human object P1:

    P1.name = "Mary"    # assigns value to attribute name
    P1.age = 26         # assigns value to attribute age
    P1.who()            # displays "My name is Mary I am 26!"
    who()               # error! not in the user namespace
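The slide uses an existing object P1 without showing the class itself. The following is one possible minimal definition of such a Human class, written for illustration only; the constructor arguments and the exact wording printed by who() are assumptions.

```python
class Human:
    """Hypothetical template for objects with a name and an age."""

    def __init__(self, name, age):
        self.name = name     # attribute "name"
        self.age = age       # attribute "age"

    def who(self):
        # method "who": the object introduces itself
        print("My name is", self.name, "I am", str(self.age) + "!")


P1 = Human("Mary", 26)       # P1 is an instance (object) of class Human
P1.who()                     # My name is Mary I am 26!

try:
    who()                    # 'who' lives in the object namespace only
except NameError as err:
    print("Error:", err)
```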


Modules

- Modules can add functionalities to Python
  - e.g. classes and functions
- Examples of available modules:
  - NumPy for scientific computing
  - Matplotlib for plotting
  - BioPython for Biology
- Modules have to be imported into the code

    # import the datetime module in its own namespace
    import datetime
    datetime.date.today()    # current date, e.g. 2017-03-16
    today()                  # error! (today is not in the current namespace)

    # import functions log2 and log10 from module math
    # into the current namespace
    from math import log2, log10
    log10(1)                 # 0.0


Introduction to Python for Biologists – Running code


Running code I

- From a terminal by using the interactive Python shell

    $ python3    # opens the Python shell
    a=2          # assigns 2 to a
    b=3          # assigns 3 to b
    exit()       # closes the Python shell

- From a terminal by running a script file
  - e.g. let's say myscript.py is a script file (a simple text file)
  - and it contains: print("hello world!")

    $ python3 myscript.py    # runs python3 and the script
    hello world!             # result of the script on the terminal


Running code II

- From Jupyter Notebook
  - web-based graphical interface
  - manage cells of code or text
  - see execution results in the same notebook
  - save/open notebooks


Documentation and messages I

Documentation and help:
- https://docs.python.org/3
- use the built-in help() function
  - e.g. help(print) to display help for function print()
- see the help menu or search online

Examples of error messages

    # Forgetting quotes
    print(Hello world)
    #   File "<stdin>", line 2
    #     print(Hello world)
    #                     ^
    # SyntaxError: invalid syntax


Documentation and messages II

    # Spelling mistakes
    prin("Hello world")
    # Traceback (most recent call last):
    #   File "<stdin>", line 2, in <module>
    # NameError: name 'prin' is not defined

    # Wrong line break within a string
    print("Hello
    World")
    #   File "<stdin>", line 2
    #     print("Hello
    #                ^
    # SyntaxError: EOL while scanning string literal
    # (Python 3.10 and later report this as: unterminated string literal)


Introduction to Python for Biologists – Literals and variables


Numeric and string literals I

    # Numeric literals
    123
    -123
    1.6E3                        # means 1600 (stored as the float 1600.0)

    # String literals
    'A string'                   # A string
    'A "string"'                 # A "string"
    "A 'string'"                 # A 'string'
    '''Three single quotes'''    # Three single quotes
    """Three double quotes"""    # Three double quotes
    'A \'string\''               # A 'string' (backslash escape sequence)
    r'A \'string\''              # A \'string\' (raw string)

Python stores literals in objects of the corresponding classes (class int for integers, float for floating-point numbers, and str for strings)
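The classes mentioned above can be checked with the built-in type() function, which is introduced later in the deck. A short sketch:

```python
print(type(123))           # <class 'int'>
print(type(1.6E3))         # <class 'float'>
print(type('A string'))    # <class 'str'>

print(1.6E3 == 1600)                    # True: same numeric value
print('A \'string\'' == "A 'string'")   # True: two spellings of one string
print(len(r'A \'string\''))             # 12: a raw string keeps the backslashes
```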


Numeric and string literals II

Printing numeric and string literals

    print(12)                                # 12
    print(1+2)                               # 3

    print('Hello World')                     # Hello World

    print('Hello World', 1+2)                # Hello World 3
    print('Hello World', 1+2, sep='-')       # Hello World-3
    print('Hello World', 1+2, sep='\t')      # Hello World and 3 separated by a tab
                                             # (\t: tab, \n: newline)

    print('AB', end='')                      # AB (avoid newline at the end)
    print('CD')                              # ABCD

    print('Max is', 12, 'and Min is', 3)     # Max is 12 and Min is 3


Variables I

Variables are names used to access objects
- the first character is a letter or an underscore (not a digit)
- no space characters allowed
- case-sensitive (variable name var is not Var)
- prefer alphanumeric characters (e.g. abc123)
  - avoid accents, non-alphanumeric and non-English characters
  - underscores may be used (e.g. abc_123)

The following keywords cannot be used as variable names (Python 3 list; print and exec were keywords in Python 2 but are built-in functions in Python 3)
- False, None, True, and, as, assert, break, class, continue
- def, del, elif, else, except, finally, for, from
- global, if, import, in, is, lambda, nonlocal, not, or, pass
- raise, return, try, while, with, yield
- async and await (keywords since Python 3.7)
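The naming rules can be checked from Python itself. This sketch uses the standard-library keyword module and the string method isidentifier(), neither of which appears on the slide; the candidate names are made up.

```python
import keyword

print(keyword.iskeyword('for'))      # True: reserved, cannot be a variable name
print(keyword.iskeyword('print'))    # False in Python 3: print is a function

print('abc_123'.isidentifier())      # True: letters, digits and underscores
print('1abc'.isidentifier())         # False: starts with a digit
print('my var'.isidentifier())       # False: contains a space

print(keyword.kwlist)                # full keyword list of the running Python version
```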


Variables II

    # Numeric types
    a=2                  # a is assigned an int object of value 2
    print(a)             # prints the object assigned to a (2)
    b=a                  # b is assigned the same object as a (2)
    print(b)             # 2
    a=5                  # a is assigned a new object of value 5
    print(a)             # 5
    print(b)             # 2 (b is still assigned to the object of value 2)

    # Strings
    c1='a'
    print(c1)            # a
    myName125 = 'abc'


Introduction to Python for Biologists – Numeric types


Numeric types I

    type(7)          # <class 'int'> (integer number)
    type(8.25)       # <class 'float'> (floating point)
    type(4.52e-3)    # <class 'float'> (floating point)

    # Operators (special built-in functions)
    1 + 3            # 4 (addition)
    4 - 1            # 3 (subtraction)
    3 * 2            # 6 (multiplication)
    9 / 2            # 4.5 (division)
    9 // 2           # 4 (integer division)
    9 % 2            # 1 (integer division remainder)
    2**3             # 8 (exponent)

    # Operator precedence, lowest to highest (equal if on the same line)
    +, -             # Addition, Subtraction
    *, /, //, %      # Multiplication, Divisions, Remainder
    +x, -x           # Positive, Negative
    **               # Exponentiation
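A few expressions that exercise the precedence table above. The numbers are arbitrary; the negative operand is included only to show that integer division rounds down rather than toward zero.

```python
print(2 + 3 * 4)        # 14: * is evaluated before +
print((2 + 3) * 4)      # 20: parentheses change the order
print(-2 ** 2)          # -4: ** is evaluated before the unary minus
print((-2) ** 2)        # 4

print(9 // 2, 9 % 2)    # 4 1
print(divmod(9, 2))     # (4, 1): quotient and remainder in one call
print(-9 // 2)          # -5: integer division rounds down (floor)
```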


Numeric types II

    # Built-in functions
    abs(-2.58)       # 2.58 (absolute value)
    round(2.5)       # 2 (rounds to the closest integer; ties go to the even one)

    # With variables
    a = 1            # 1
    b = 1 + 1        # 2
    c = a + b        # 3
    d = a+c*b        # 7 (precedence of * over +)
    d = (a+c)*b      # 8 (use parentheses to override precedence)

    # Short notations (valid for +, -, *, /, ...)
    a += 1           # a = a + 1
    a *= 5           # a = a * 5

    # Special float values
    float('NaN')     # nan (Not a Number)
    float('Inf')     # inf (positive infinity); float('-Inf') gives -inf (negative infinity)
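The special float values and the rounding rule behave in ways that often surprise newcomers. A brief sketch; math.isnan() comes from the standard math module and is not part of the slide.

```python
import math

nan = float('NaN')
inf = float('Inf')

print(nan == nan)                # False: nan is not equal to anything, even itself
print(math.isnan(nan))           # True: the reliable way to test for nan
print(inf > 1e308)               # True
print(-inf < -1e308)             # True

print(round(2.5), round(3.5))    # 2 4: ties are rounded to the even neighbour
print(round(2.567, 2))           # 2.57: optional second argument = number of decimals
```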


Introduction to Python for Biologists – Strings


Sequence types

Text sequence type:
- Strings: immutable sequences of characters

Basic sequence types:
- Lists: mutable sequences
- Tuples: immutable sequences
- Ranges: immutable sequences of numbers

Sequence operations:
- All sequence types support common sequence operations (slide 98)
- Mutable sequence types support specific operations (slide 99)


Strings I

    # Quotes
    'A string'                   # A string
    'A "string"'                 # A "string"
    "A 'string'"                 # A 'string'
    '''Three single quotes'''    # Three single quotes
    """Three double quotes"""    # Three double quotes

    # Escape sequences (see annexes)
    "A single quote '"           # A single quote '
    'A single quote \''          # A single quote '
    "A tabulation \t"
    "A newline \n"

- See other escape sequences in slide 97
- Triple-quoted strings may span multiple lines; all associated whitespace will be included in the string literal


Strings II

    # Operators
    'pipe' + 'tte'                  # = 'pipette' (concatenation)
    'A'*7                           # = 'AAAAAAA' (replication)
    'A'*3 + 'C'*2                   # = 'AAACC'
    'A' + str(2.0)                  # = 'A2.0' (convert number then concatenate)

    # Built-in functions
    len('A string of characters')   # 22 (length in characters)
    type('a')                       # <class 'str'> (string)

    # Slices [start:end:step] (0 is the index of the first character)
    "ABCDEFG"[2:5]                  # 'CDE' (F at index 5 excluded)
    "ABCDEFG"[:5]                   # 'ABCDE' (from the beginning)
    "ABCDEFG"[5:]                   # 'FG' (to the end)
    "ABCDEFG"[-2:]                  # 'FG' (-2 from the end: to the end)
    "ABCDEFG"[0:5:2]                # 'ACE' (every second letter with step=2)
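Slices map naturally onto sequence data. The sketch below applies them to a short, made-up DNA string; the negative step used to reverse the string is not shown on the slide but follows the same [start:end:step] notation.

```python
dna = "ATGGCCTAA"          # hypothetical 9-nucleotide sequence

print(dna[0:3])            # ATG (first codon)
print(dna[3:6])            # GCC (second codon)
print(dna[-3:])            # TAA (last codon)
print(dna[::3])            # AGT (first base of each codon)
print(dna[::-1])           # AATCCGGTA (step -1 reverses the string)
print(len(dna) // 3)       # 3 (number of complete codons)
```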


String methods I

Strings are immutable: new objects are created for changes

    seq = "ACGtCCAgTnAGaaGT"

    # Case
    seq.capitalize()            # 'Acgtccagtnagaagt'
    seq.casefold()              # 'acgtccagtnagaagt' (eszett => "ss")
    seq.lower()                 # 'acgtccagtnagaagt' (eszett => eszett)
    seq.swapcase()              # 'acgTccaGtNagAAgt'
    seq.upper()                 # 'ACGTCCAGTNAGAAGT'

    # Search and replace
    seq.count('a')              # 2 (case sensitive)
    seq.count('G', 0, 4)        # 1 (slice start and end indexes)
    seq.endswith('GT')          # True
    seq.endswith('G', 0, 4)     # False (slice start and end indexes)
    seq.find('GtC')             # 2 (1st hit index, -1 otherwise)
    seq.replace("aa", "tt")     # 'ACGtCCAgTnAGttGT' (case sensitive)
    seq.replace("A", "x", 2)    # 'xCGtCCxgTnAGaaGT' (2 first hits only)
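As one possible use of these methods, the GC content of the example sequence can be computed with upper() and count(). The variable names are chosen for this sketch only.

```python
seq = "ACGtCCAgTnAGaaGT"

upper_seq = seq.upper()                        # new string; seq itself is unchanged
gc_count = upper_seq.count('G') + upper_seq.count('C')
gc_fraction = gc_count / len(upper_seq)

print(gc_count)       # 7
print(gc_fraction)    # 0.4375
print(seq)            # ACGtCCAgTnAGaaGT (strings are immutable)
```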


String methods II

    seq = "ACGtCCAgTnAGaaGT"

    # Is functions
    seq.isalnum()           # True (Are all characters alphanumeric?)
    seq.isalpha()           # True (Are all characters alphabetic?)
    seq.islower()           # False (Are all characters lowercase?)
    seq.isnumeric()         # False (Are all characters numeric?)
    seq.isspace()           # False (Are all characters whitespace?)
    seq.isupper()           # False (Are all characters uppercase?)

    # Join and split
    "-".join(["A", "B"])    # 'A-B'
    "-".join(seq)           # 'A-C-G-t-C-C-A-g-T-n-A-G-a-a-G-T'
    seq.partition("aa")     # ('ACGtCCAgTnAG', 'aa', 'GT'): a tuple
    seq.split("aa")         # ['ACGtCCAgTnAG', 'GT']: a list
    '1\n2'.splitlines()     # ['1', '2'] (split at line boundaries \r, \n)


String methods III

    seq = "ACGtCCAgTnAGaaGT"

    # Deleting
    seq.lstrip()        # remove leading whitespace characters
    seq.rstrip()        # remove trailing whitespace characters
    seq.strip()         # remove whitespace characters from both ends

    seq.lstrip("AC")    # 'GtCCAgTnAGaaGT' (remove leading C's or A's)
    seq.lstrip("CA")    # 'GtCCAgTnAGaaGT' (remove leading C's or A's)
    seq.lstrip("C")     # 'ACGtCCAgTnAGaaGT' (no impact: seq does not start with C)
    # same for rstrip but from the right, and strip from both ends

    # Simple parsing of text lines from CSV files
    # (line stands for a string holding one line of the file)
    line.strip().split(',')    # remove newline and split CSV ('\t' if TSV)

    # translate (case sensitive)
    table = seq.maketrans('atcg', 'tagc')    # map characters by index
    seq.lower().translate(table)             # 'tgcaggtcantcttca'
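Combining translate() with a reversing slice gives a simple reverse complement, and strip().split() parses one line of a comma-separated file. This is a sketch: the CSV line is invented, and maketrans is called on the str class rather than on seq, which is equivalent.

```python
seq = "ACGtCCAgTnAGaaGT"

# complement each base, then reverse the result
table = str.maketrans('atcg', 'tagc')
complement = seq.lower().translate(table)
reverse_complement = complement[::-1]
print(complement)            # tgcaggtcantcttca
print(reverse_complement)    # acttctnactggacgt

# parse one hypothetical CSV line into its fields
line = "gene1,12,0.5\n"
fields = line.strip().split(',')
print(fields)                # ['gene1', '12', '0.5']
```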


Introduction to Python for Biologists – Exercise


Exercise

Create the following directory structure
- Dokumente
  - python
    - notebooks
    - data

Jupyter Notebook
- File: Literals.ipynb
- URL: https://cbdm.uni-mainz.de/mb17
- Download the file into the notebooks folder


Introduction to Python for Biologists – Lists, tuples and ranges


Lists I

A List is an ordered collection of objects

    List1 = []      # an empty list

    List1 = ['b', 'a', 1, 'cat', 'K', 'dog', 'F']
    List1[0]        # 'b' (access item of index 0)
    List1[1]        # 'a' (access item of index 1)
    List1[-1]       # 'F' (access the last item)
    List1[-2]       # 'dog' (access the second last item)

    # Slices [start:end:step]
    List1[2:5]      # [1, 'cat', 'K'] (index 5 excluded)
    List1[:5]       # ['b', 'a', 1, 'cat', 'K']
    List1[5:]       # ['dog', 'F']
    List1[-2:]      # ['dog', 'F']
    List1[0:5:2]    # ['b', 1, 'K']


Lists II

    # Built-in functions
    List2 = [1, 2, 3, 4, 5]
    len(List2)                # 5 (number of items; len(List1) would be 7)
    max(List2)                # 5
    min(List2)                # 1
    sum(List2)                # 15

    # List methods (comments show List2 after each call)
    List2 = []                # empty list
    List2.append(1)           # [1]
    List2.append('A')         # [1, 'A']
    List2.extend(['B', 2])    # [1, 'A', 'B', 2]
    List2.pop(2)              # [1, 'A', 2] (removes and returns the item at index 2)
    List2.insert(3, 'A')      # [1, 'A', 2, 'A'] (insert 'A' at index 3)
    List2.index('A')          # 1 (index of the 1st 'A')
    List2.count('A')          # 2 (number of 'A')
    List2.reverse()           # ['A', 2, 'A', 1]


Lists III

    # sorting
    List3 = [5, 3, 4, 1, 2]
    sorted(List3)         # [1, 2, 3, 4, 5] (builds a new sorted list)
    List3                 # [5, 3, 4, 1, 2] (List3 not changed)
    List3.sort()          # modifies the list in-place
    List3                 # [1, 2, 3, 4, 5] (.sort() did modify List3!)

    # nested lists / 2D lists / tables
    myList = [['b', 'a'],
              [1, 'cat']]    # a list of 2 lists
    myList[0]             # returns the first list ['b', 'a']
    myList[0][0]          # 'b' (1st item of the 1st list)
    myList[0][1]          # 'a' (2nd item of the 1st list)
    myList[1]             # returns the 2nd list [1, 'cat']
    myList[1][0] = 10     # [['b', 'a'], [10, 'cat']]
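A common stumbling block is the difference between what a list method returns and what it does to the list. The sketch below uses a made-up list of numbers; the reverse=True argument of sorted() is not shown on the slides.

```python
lengths = [5, 3, 4, 1, 2]

result = lengths.sort()       # sorts in place and returns None
print(result)                 # None
print(lengths)                # [1, 2, 3, 4, 5]

last = lengths.pop()          # removes the last item and returns it
print(last)                   # 5
print(lengths)                # [1, 2, 3, 4]

descending = sorted(lengths, reverse=True)   # new list, original untouched
print(descending)             # [4, 3, 2, 1]
print(lengths)                # [1, 2, 3, 4]
```

Writing lengths = lengths.sort() would therefore replace the list with None, which is a frequent beginner mistake.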


Lists IV

    myList = [['b', 'a'],
              [1, 'cat']]

    for sublist in myList:                 # loop over sublists
        for value in sublist:              # loop over values
            print(value)                   # print 1 value per line
    # b
    # a
    # 1
    # cat

    for sublist in myList:                 # loop over sublists
        newsublist = map(str, sublist)     # convert each item to string
        print('\t'.join(newsublist))       # print as TSV table
    # b    a
    # 1    cat


Tuples and ranges

A Tuple is an ordered, immutable collection of objects

    Tuple1 = ()                                         # empty tuple
    Tuple1 = ('b', 'a', 1, 'cat', 'K', 'dog', 'F')      # defined tuple

    Tuple1[0]      # 'b'
    Tuple1[1:3]    # ('a', 1) (index 3 excluded)

Ranges

    # range(start, stop[, step])
    range(10)                  # range(0, 10) (the values are not displayed; convert to a list to see them)
    list(range(10))            # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    list(range(0, 30, 5))      # [0, 5, 10, 15, 20, 25]
    list(range(0, -5, -1))     # [0, -1, -2, -3, -4]
    list(range(0))             # []
    list(range(1, 0))          # []
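Two properties implied above can be verified directly: tuples reject item assignment, and ranges support the common sequence operations without being converted to a list. A short sketch with a shortened tuple:

```python
Tuple1 = ('b', 'a', 1)

try:
    Tuple1[0] = 'x'            # tuples are immutable
except TypeError as err:
    print("Error:", err)

r = range(0, 30, 5)            # represents 0, 5, 10, 15, 20, 25
print(len(r))                  # 6
print(r[2])                    # 10
print(10 in r)                 # True
print(12 in r)                 # False
```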


Introduction to Python for Biologists – Sets and dictionaries


Sets I

A Set is a mutable unordered collection of unique objects

    S0 = set()                   # an empty set
    S0 = {'a', 1}                # a new set of 2 items
    S1 = {'a', 1, 'b', 'R'}      # a new set of 4 items
    S2 = {'a', 1, 'b', 'S'}      # a new set of 4 items
    len(S0)                      # 2

    # Operators
    'R' in S1                    # True
    'R' not in S2                # True
    S1 - S2                      # in S1 but not in S2 => {'R'}
    S1 | S2                      # in S1 or in S2 => {1, 'a', 'S', 'R', 'b'}
    S1 & S2                      # in S1 and in S2 => {1, 'b', 'a'}
    S1 ^ S2                      # in S1 or in S2 but not in both => {'R', 'S'}
    S0 <= S1                     # S0 is subset of S1 => True
    S1 >= S2                     # S1 is superset of S2 => False
    S1 >= S0                     # True
    S0.isdisjoint(S1)            # False
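As a hedged illustration of the operators above, the sketch below compares two gene lists. The gene names and the variable names experiment_a and experiment_b are made up for the example.

```python
# Hypothetical gene lists from two experiments (illustrative names only)
experiment_a = {'BRCA1', 'TP53', 'EGFR', 'MYC'}
experiment_b = {'TP53', 'MYC', 'KRAS'}

shared = experiment_a & experiment_b       # found in both
only_a = experiment_a - experiment_b       # found in A only
either = experiment_a | experiment_b       # found in at least one
exclusive = experiment_a ^ experiment_b    # found in exactly one

# sorted() gives a reproducible order, since sets are unordered
print(sorted(shared))            # ['MYC', 'TP53']
print(sorted(only_a))            # ['BRCA1', 'EGFR']
print(len(either))               # 5
print(sorted(exclusive))         # ['BRCA1', 'EGFR', 'KRAS']
print(shared <= experiment_a)    # True
```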


Sets II

    # Methods
    S0.copy()           # return a new set with a shallow copy of S0
    S0.add(item)        # add element item to the set
    S0.remove(item)     # remove element item from the set (KeyError if absent)
    S0.discard(item)    # remove element item from the set if present
    S0.pop()            # remove and return an arbitrary element
    S0.clear()          # remove all elements from the set


Dictionaries I

A Dictionary is a mutable indexed collection of objects (indexed by unique keys)

    d = {}                           # empty dictionary
    d = {'A': "ALA", 'C': "CYS"}     # dictionary with 2 items
    d['A']                           # 'ALA'
    d['C']                           # 'CYS'
    d['H'] = "HIS"                   # add new item
    d                                # {'A': 'ALA', 'C': 'CYS', 'H': 'HIS'}
    del d['A']                       # {'C': 'CYS', 'H': 'HIS'}

    'C' in d                         # True (key 'C' is in d)
    'A' not in d                     # True (key 'A' is not in d anymore)


Dictionaries II

Function                    Description
d[key]                      get value by key
d[key] = val                set value by key
del d[key]                  delete item by key
d.clear()                   delete all items
len(d)                      number of items
d.copy()                    make a shallow copy
d.keys()                    return a view of all keys
d.values()                  return a view of all values
d.items()                   return a view of all items (key, value)
d.update(d2)                add all items from dictionary d2
d.get(key[, val])           get value by key if it exists, otherwise val
d.setdefault(key[, val])    like d.get(key, val), also set d[key] = val if key not in d
d.pop(key[, default])       remove key and return its value, return default otherwise (KeyError if no default is given)
d.popitem()                 remove an item and return it as a (key, value) tuple (the last inserted item since Python 3.7)

Table: Functions for dictionaries
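A short sketch of how d.get() and d.setdefault() from the table are commonly used: counting items and grouping positions. The sequence and the names counts and positions are assumptions chosen for illustration.

```python
seq = "ATGCGATACG"   # illustrative sequence

# Count nucleotides with d.get(): missing keys start from 0
counts = {}
for nt in seq:
    counts[nt] = counts.get(nt, 0) + 1

# Record positions with d.setdefault(): missing keys start from an empty list
positions = {}
for i, nt in enumerate(seq):
    positions.setdefault(nt, []).append(i)

for nt, n in sorted(counts.items()):
    print(nt, n, positions[nt])
# A 3 [0, 5, 7]
# C 2 [3, 8]
# G 3 [2, 4, 9]
# T 2 [1, 6]
```

Both methods avoid the KeyError that d[key] would raise the first time a key is seen.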


Introduction to Python for Biologists – Convert and copy


Converting types I

Many Python functions are sensitive to the type of data. For example, you cannot concatenate a string with an integer:

    sign = 'You are ' + 21 + '-years-old'         # error!! (TypeError)
    sign = 'You are ' + str(21) + '-years-old'    # OK
    sign                                          # 'You are 21-years-old'

    # convert to int (from str or float)
    int('2014')         # from a string => 2014
    int(3.141592)       # from a float => 3 (truncated)

    # convert to float (from str or int)
    float('1.99')       # from a string => 1.99
    float(5)            # from an integer => 5.0


Converting types II

    # convert to str (from int, float, list, tuple, dict and set)
    str(3.141592)           # '3.141592'
    str([1, 2, 3, 4])       # '[1, 2, 3, 4]'

    # convert a sequence type to another
    # (str, list, tuple, and set functions)
    new_set = set(old_list)         # list to set
    new_tuple = tuple(old_list)     # list to tuple
    new_set = set("Hello")          # string to set {'H', 'o', 'e', 'l'}
    new_list = list("Hello")        # string to list ['H', 'e', 'l', 'l', 'o']
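Conversions can fail when the text does not have the expected form: int('3.99') raises ValueError even though float('3.99') works. The helper below, to_int, is a hypothetical name introduced for this sketch. It shows one possible way to handle such input when reading values from a file.

```python
def to_int(text):
    """Convert a string to int, going through float when needed.

    Returns None when the text is not a number.
    """
    try:
        return int(text)
    except ValueError:
        try:
            return int(float(text))    # truncates toward zero
        except ValueError:
            return None

print(to_int('2014'))     # 2014
print(to_int('3.99'))     # 3
print(to_int('-3.99'))    # -3
print(to_int('n/a'))      # None
```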


Copy I

• Assignments (=) do not copy objects, they create bindings between a target and an object.

    # Numeric types (immutable)
    a = 1           # a binds the object 1
    b = a           # b binds the object 1
    b = b + 1       # b binds a new object created by the sum
    a               # 1
    b               # 2

    # Strings (immutable)
    a = "Hello"                         # a binds the object "Hello"
    b = a                               # b binds the object "Hello"
    a = a.replace('o', 'o World!')      # a binds a new object
    a                                   # 'Hello World!'
    b                                   # 'Hello'


Copy II

• For collections that are mutable or contain mutable items, a shallow copy is sometimes needed so one can change one copy without changing the other.

    # Dictionary (mutable)
    d1 = {'A': "ALA", 'C': "CYS"}     # d1 binds the object
    d2 = d1                           # d2 binds the same object
    d2['H'] = "HIS"                   # add item to the object
    d1                                # {'A': 'ALA', 'C': 'CYS', 'H': 'HIS'}
    d2                                # {'A': 'ALA', 'C': 'CYS', 'H': 'HIS'}

    d2 = d1.copy()                    # d2 binds a shallow copy of the object
    d2['P'] = "PRO"                   # add item to the copied object
    d1                                # {'A': 'ALA', 'C': 'CYS', 'H': 'HIS'}
    d2                                # {'A': 'ALA', 'C': 'CYS', 'H': 'HIS', 'P': 'PRO'}


Copy III

    # List (mutable)
    l1 = ['A', 'H', 'C']
    l2 = l1
    l2.append('P')
    l1                  # ['A', 'H', 'C', 'P']
    l2                  # ['A', 'H', 'C', 'P']

    l2 = l1[:]          # shallow copy by assigning a slice of the whole list
    l2.append('V')
    l1                  # ['A', 'H', 'C', 'P']
    l2                  # ['A', 'H', 'C', 'P', 'V']


Copy IV

• Convert types to get copies

    new_list = list(old_list)       # shallow copy
    new_dict = dict(old_dict)       # shallow copy
    new_set = set(old_list)         # copy list as a set
    new_tuple = tuple(old_list)     # copy list as a tuple

• The copy module

    import copy
    copy.copy(x)        # shallow copy of x
    copy.deepcopy(x)    # deep copy of x, including embedded objects
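The difference between a shallow and a deep copy only shows up when the collection contains mutable items. The following sketch uses a nested list (the name nested is illustrative) to make that visible.

```python
import copy

nested = [['A', 'C'], ['G', 'T']]    # a list containing mutable items

shallow = copy.copy(nested)          # new outer list, same inner lists
deep = copy.deepcopy(nested)         # new outer list and new inner lists

nested[0].append('N')                # modify an embedded list

print(shallow[0])                    # ['A', 'C', 'N']  (inner list is shared)
print(deep[0])                       # ['A', 'C']       (inner list was copied)
print(shallow[0] is nested[0])       # True
print(deep[0] is nested[0])          # False
```

A shallow copy is enough for flat collections of immutable items such as strings or numbers; nested structures usually call for deepcopy.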


Introduction to Python for Biologists – Loops


For loop I

    # For items in a list
    for person in ['Isabel', 'Kate', 'Michael']:
        print("Hi", person)
    # Hi Isabel
    # Hi Kate
    # Hi Michael

    # For items in a dictionary
    seq = ''                            # an empty string
    d = {'A': "ALA", 'C': "CYS"}        # a dictionary with 2 keys
    for k in d.keys():                  # loop over the keys
        seq += d[k]                     # append value to seq
    print(seq)                          # ALACYS (insertion order; arbitrary before Python 3.7)


For loop II

    # For items in a string
    for c in 'abc':
        print(c)
    # a
    # b
    # c

    # For items in a range
    for n in range(3):
        print(n)
    # 0
    # 1
    # 2

    # For items from any iterator
    for n in iterator:
        print(n)


Enumerate

    # loop getting index and value
    RNAs = ['miRNA', 'tRNA', 'mRNA']
    for i, rna in enumerate(RNAs):
        print(i, rna)
    # 0 miRNA
    # 1 tRNA
    # 2 mRNA

    # loop over 2 lists
    RNAtypes = ['micro', 'transfer', 'messenger']
    for i, t in enumerate(RNAtypes):
        r = RNAs[i]
        print(i, t, r)
    # 0 micro miRNA
    # 1 transfer tRNA
    # 2 messenger mRNA
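Looping over two lists in parallel can also be written with the built-in zip(), which is not covered on the slide; this is an alternative sketch rather than the course's own method. The list pairs is only there to collect the results.

```python
RNAs = ['miRNA', 'tRNA', 'mRNA']
RNAtypes = ['micro', 'transfer', 'messenger']

pairs = []
for i, (t, r) in enumerate(zip(RNAtypes, RNAs)):   # zip pairs items by position
    print(i, t, r)
    pairs.append((i, t, r))
# 0 micro miRNA
# 1 transfer tRNA
# 2 messenger mRNA
```

zip() stops at the end of the shorter list, so no IndexError occurs if the lists differ in length, but the extra items are silently ignored.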


While loop

    i = 0
    value = 1
    while value < 200:
        i += 1
        value *= i
        print(i, value)
    # 1 1
    # 2 2
    # 3 6
    # 4 24
    # 5 120
    # 6 720


Introduction to Python for Biologists – – Exercise –


Exercise

URL
• https://cbdm.uni-mainz.de/mb17

Jupyter Notebook
• File: Sequences.ipynb
• Download the file into the notebooks folder

Data file
• File: shrub dimensions.csv
• Download the file into the data folder


Introduction to Python for Biologists – Functions


Functions I

    from random import choice       # import function 'choice'

    # Simple function
    def kmerFixed():                # define function kmerFixed
        print("ACGTAGACGC")         # print predefined string

    kmerFixed()                     # display 'ACGTAGACGC'

    # Returning a value
    def kmer10():                       # define function kmer10
        seq = ""                        # define an empty string
        for count in range(10):         # repeat 10 times
            seq += choice("CGTA")       # add 1 random nt to string
        return seq                      # return string

    newKmer = kmer10()      # call the function and get its result into a variable
    print(newKmer)          # display the result, e.g. 'ACGGATACGC'


Functions II

    # One parameter
    def kmer(k):                        # define kmer with 1 param. k
        seq = ""
        for count in range(k):          # k is used to define the range
            seq += choice("CGTA")
        return seq

    print(kmer(k=4))        # e.g. 'TACC'
    print(kmer(20))         # e.g. 'CACAATGGGTACCCCGGACC'
    print(kmer(0))          # (empty string)
    print(kmer())           # TypeError: kmer() missing 1 required
                            # positional argument: 'k'


Functions III

    # Function with more parameters and default values
    def generic_kmer(alphabet="ACGT", k=10):
        seq = ""
        for count in range(k):
            seq += choice(alphabet)
        return seq

    generic_kmer("AB12", 15)                # e.g. '112AA1A12AA1121'
    generic_kmer("AB12")                    # e.g. '1AA1B1BA2A'
    generic_kmer(k=20)                      # e.g. 'GTGGGCTTGTGCCCTGCACT'
    generic_kmer()                          # e.g. 'CTTGCCGGGA'
    generic_kmer(k=8, alphabet="#$%&")      # e.g. '$$#&%$%$'
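Random sequences differ on every run, which can make results hard to reproduce. The sketch below adds a seed parameter as one possible remedy. The function seeded_kmer and its seed parameter are hypothetical extensions of the slide's example, not part of the course code.

```python
import random

def seeded_kmer(alphabet="ACGT", k=10, seed=None):
    """Return a random string of length k drawn from alphabet.

    seed is an illustrative extra parameter: the same seed gives
    the same sequence, seed=None gives a different one each run.
    """
    rng = random.Random(seed)       # private generator, global state untouched
    return "".join(rng.choice(alphabet) for _ in range(k))

first = seeded_kmer(k=12, seed=42)
second = seeded_kmer(k=12, seed=42)
print(first == second)              # True: same seed, same sequence
print(len(seeded_kmer(k=5)))        # 5
```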


Name spaces I

• Variable and function names defined globally can be seen in functions: this is the global namespace

    a = 10                  # global variable

    def myfunction():
        print(a)            # will use the global variable

    myfunction()            # 10 (the global a)
    print(a)                # 10 (the global a)


Name spaces II

• Names defined within a function cannot be seen outside: the function has its own namespace.

    a = 10                  # global variable

    def myfunction():
        a = 1               # local variable defined by assignment
        b = 2               # local variable defined by assignment
        print(a)

    myfunction()            # 1 (the local a)
    print(a)                # 10 (the global a)
    print(b)                # NameError: name 'b' is not defined


Name spaces III

• Use parameters and returned values to get and set variables outside the name space

    a = 10                      # global variable

    def myfunction(val):        # local variable val
        b = 2
        val = val + b
        return val

    print(a)                    # 10 (the global a)
    print(myfunction(a))        # 12
    print(a)                    # 10 (the global a unchanged)

    c = myfunction(a)           # set val to 10 and assign 10+2 to c
    print(c)                    # 12 (the value returned by the function)
    print(a)                    # 10 (global a was unchanged)

    a = myfunction(a)           # change global a with value 10+2


Introduction to Python for Biologists – Branching


Truth Value Testing I

Any object can be tested for truth value. The following values are considered false (other values are considered true):

• None
• False
• zero value: e.g. 0 or 0.0
• an empty sequence or mapping: e.g. '', (), [], {}

Operations and built-in functions that have a Boolean result return False or True, which behave like the integers 0 and 1. Exception: the Boolean operations "and" and "or" return one of their operands.
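The rules above can be verified with the built-in bool(). The list of test values below is an illustrative selection; note that a string containing a space and a list containing 0 are not empty and therefore count as true.

```python
values = [None, False, 0, 0.0, '', (), [], {}, set(), 'ATG', [0], ' ', -1]

for v in values:
    print(repr(v), bool(v))     # show each value with its truth value

falsy = [v for v in values if not v]
truthy = [v for v in values if v]
print(len(falsy), len(truthy))  # 9 4
```

In practice this allows short tests such as "if seq:" to check that a sequence is not empty.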


Boolean Operations I

A Boolean is equal to True or False

• a and b (true if a and b are true, false otherwise)
• a or b (true if a or b is true (one alone or both), false otherwise)
• a ^ b (true if either a or b is true (not both), false otherwise)
• not b (true if b is false, false otherwise)
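The four operations can be summarized as a truth table. This sketch builds it with itertools.product, a standard-library helper not used in the slides; the variable names are illustrative.

```python
from itertools import product

table = []
for a, b in product([True, False], repeat=2):
    row = (a, b, a and b, a or b, a ^ b, not b)
    table.append(row)
    print(row)
# (True, True, True, True, False, False)
# (True, False, False, True, True, True)
# (False, True, False, True, True, False)
# (False, False, False, False, False, True)

# With non-Boolean operands, "and" / "or" return one of the operands
default_name = '' or 'unnamed'
print(default_name)             # unnamed
```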


Boolean Operations II

All example tests below return True unless otherwise specified

    # let's set the values of 3 variables (single "=" symbol)
    a = True
    b = False
    c = True

    # simple tests using two "=" symbols (==)
    a == True
    b == False
    c == True


Boolean Operations III

    # let's set the values of 3 variables (one "=" symbol)
    a = True
    b = False
    c = True

    # order is irrelevant
    (a or b) == (b or a)
    (a and b) == (b and a)

    # neutral (whatever value of a)
    (a or False) == a
    (a and True) == a

    # always the same (whatever value of a)
    (a and False) == False
    (a or True) == True


Boolean Operations IV

    # let's set the values of 3 variables (one "=" symbol)
    a = True
    b = False
    c = True

    # precedence "==" > "not" > "and" > "or"
    (a and b or c) == ((a and b) or c)
    (not a == b) == (not (a == b))

    # equivalent expressions
    ((a or b) or c) == (a or (b or c)) == (a or b or c)
    (a or a or a) == a
    (b and b and b) == b

    b and b and b == b      # False and False and True => False!!

    # distributivity (outer parentheses needed because "==" binds tighter)
    (a and (b or c)) == ((a and b) or (a and c))
    (a or (b and c)) == ((a or b) and (a or c))


Comparisons

    Operations
    <                       # strictly less than
    <=                      # less than or equal
    >                       # strictly greater than
    >=                      # greater than or equal
    ==                      # equal (two symbols =)
    math.isclose(a, b)      # equal for floating points a and b
    !=                      # not equal
    is                      # object identity
    is not                  # negated object identity
    x < y <= z              # is equivalent to "x < y and y <= z"

• Comparisons between objects of the same class are supported if the operator is defined for the class.
• Different numerical types can be compared: e.g. 2 < 4.56
• Floating points cannot be compared exactly because limited precision cannot represent numbers with infinitely many digits, such as 1/3 = 0.33333...
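A minimal sketch of the floating-point issue: the sum 0.1 + 0.2 is not exactly 0.3, so == fails while math.isclose() succeeds. The abs_tol argument is part of the standard math.isclose signature and matters when comparing against zero.

```python
import math

total = 0.1 + 0.2

print(total == 0.3)                             # False
print(math.isclose(total, 0.3))                 # True
print(math.isclose(1e-10, 0.0))                 # False: the default tolerance is relative
print(math.isclose(1e-10, 0.0, abs_tol=1e-9))   # True
```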


Conditionals

• IF-ELIF-ELSE

    seq = 'ATGAnnATG'
    if 'n' in seq:
        print("sequence contains undefined bases (n)")
    elif 'x' in seq:
        print("sequence contains unknown bases x but not n")
    else:
        print("no undefined bases in sequence")

    # sequence contains undefined bases (n)

• ELIF and ELSE are optional
• multiple ELIF are possible


Introduction to Python for Biologists – – Exercise –


Exercise

URL
• https://cbdm.uni-mainz.de/mb17

Jupyter Notebook
• File: Conditionals.ipynb
• Download the file into the notebooks folder


Introduction to Python for Biologists – Regular Expressions


RE: Regular Expressions I

• Regular expressions (called REs, or regexes, or regex patterns) are a powerful language for matching text patterns (re module)

• In Python a regular expression search is typically written as:

    match = re.search(expression, string)

• The re.search() function takes a regular expression pattern and a string and searches for that pattern within the string.

• If the search is successful, re.search() returns a Match object (class 're.Match' since Python 3.7, '_sre.SRE_Match' before), or None otherwise.


RE: Regular Expressions II

    import re                                   # import re module
    text = 'an example word:cat!!'              # example string
    match = re.search(r'word:\w\w\w', text)     # search a pattern
    if match:
        print('found', match.group())           # 'found word:cat'
    else:
        print('did not find')

• In the pattern string, \w codes a character (letter, digit or underscore)

• The 'r' at the start of the pattern string designates a Python "raw" string which passes through backslashes without change.


RE: Basic Patterns

Pattern         Match
a, X, 9, <      ordinary characters match themselves exactly
.               a period matches any single character except newline
\w              matches a "word" character: a letter or digit or underbar [a-zA-Z0-9_]
\W              matches any non-word character
\b              boundary between word and non-word
\s              a single whitespace character: space, newline, return, tab, form feed [ \n\r\t\f]
\S              matches any non-whitespace character
\t              tab
\n              newline
\r              return
\d              decimal digit [0-9]
^               circumflex (top hat) matches the start of a string
$               dollar matches the end of a string
\               inhibits the "specialness" of a character. So, for example, use \. to match a period

Table: Regular expressions: basic patterns
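A few entries of the table in action: anchors, the word boundary and the escaped period. The string text is made up for this sketch.

```python
import re

text = 'gene.id: AT1G01010 cat concat'      # illustrative string

start = re.search(r'^gene', text)           # ^ anchors at the start
end = re.search(r'concat$', text)           # $ anchors at the end
word = re.search(r'\bcat\b', text)          # whole word "cat", not the "cat" inside "concat"
escaped = re.search(r'gene\.id', 'geneXid')     # \. only matches a real period
unescaped = re.search(r'gene.id', 'geneXid')    # . matches any character

print(start.group())        # gene
print(end.group())          # concat
print(word.span())          # (19, 22)
print(escaped)              # None
print(unescaped.group())    # geneXid
```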


RE: Basic examples I

The basic rules of RE search for a pattern within a string are:

• The search proceeds through the string from start to end, stopping at the first match found
• All of the pattern must be matched, but not all of the string
• If match = re.search(pat, text) is successful, match is not None and in particular match.group() is the matching text


RE: Basic examples II

    match = re.search(r'iii', 'piiig')          # found
    match.group() == "iii"                      # True

    match = re.search(r'igs', 'piiig')          # not found
    match is None                               # True

    match = re.search(r'..g', 'piiig')          # found
    match.group() == "iig"                      # True

    match = re.search(r'\d\d\d', 'p123g')       # found
    match.group() == "123"                      # True

    match = re.search(r'\w\w\w', '@@abcd!!')    # found
    match.group() == "abc"                      # True


RE: Repetitions I

Repetitions are defined using +, *, ? and { }
• + means 1 or more occurrences of the pattern to its left
    • e.g. i+ = one or more i's
• * means 0 or more occurrences of the pattern to its left
• ? means 0 or 1 occurrences of the pattern to its left
• Curly brackets specify an exact number of repetitions, or a range
    • e.g. A{5} for 5 A letters
    • A{6,10} for 6 to 10 A letters

Leftmost and Largest:
• First the search finds the leftmost match for the pattern, and second it tries to use up as much of the string as possible
• i.e. + and * go as far as possible (they are said to be "greedy").
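The "leftmost and largest" rule and the curly-bracket forms can be illustrated with a short sketch. The sequence below is made up for the example.

```python
import re

seq = "CGAATTAAAAAAAAGC"  # hypothetical sequence: a run of 2 A's, then a run of 8 A's

# Leftmost: A+ stops at the first run of A's, even though a longer run follows
leftmost = re.search(r"A+", seq).group()

# {6,10}: only the second run is long enough; greedy, so all 8 A's are taken
long_run = re.search(r"A{6,10}", seq).group()

# {5}: exactly 5 A's are taken from the first place where 5 in a row fit
exactly5 = re.search(r"A{5}", seq).group()

print(leftmost)  # AA
print(long_run)  # AAAAAAAA
print(exactly5)  # AAAAA
```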

RE: Repetitions II

    # simple repetitions
    re.search(r'pi+', 'piiig').group()      # piii
    re.search(r'pi?', 'ap').group()         # p
    re.search(r'pi?', 'apii').group()       # pi
    re.search(r'pi*', 'ap').group()         # p
    re.search(r'pi*', 'apii').group()       # pii
    re.search(r'pi{3}', 'apiiiii').group()  # piii
    re.search(r'i+', 'piigiiii').group()    # ii (1st hit only)

    # 3 digits possibly separated by whitespaces (\s*)
    re.search(r'\d\s*\d\s*\d', 'xx1 2 3xx').group()  # "1 2 3"
    re.search(r'\d\s*\d\s*\d', 'xx12 3xx').group()   # "12 3"
    re.search(r'\d\s*\d\s*\d', 'xx123xx').group()    # "123"

RE: Sets of characters I

• Square brackets indicate a set of characters
    • [ABC] matches 'A' or 'B' or 'C'.
• The codes \w, \s etc. work inside square brackets too, with the one exception that dot (.) just means a literal dot
• A dash indicates a range, or a literal dash if put at the end
    • [a-z] for lowercase alphabetic characters
    • [a-zA-Z] for alphabetic characters
    • [AB-] for A, B or dash
• A circumflex (^) at the start inverts the set
    • [^AB] for any character except A or B.
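A small sketch of character sets applied to a sequence read. The read is made up, and re.findall() is used here ahead of its description on the Functions slides.

```python
import re

read = "ACGTNNAC-GT"  # hypothetical read with ambiguous bases (N) and a gap (-)

# [ACGT]+ : first run made only of the four unambiguous bases
clean_prefix = re.search(r"[ACGT]+", read).group()

# [^ACGT] : first character that is NOT one of the four bases
first_other = re.search(r"[^ACGT]", read).group()

# [N-] : a dash placed last is a literal dash, so this matches 'N' or '-'
n_or_gap = re.findall(r"[N-]", read)

print(clean_prefix)  # ACGT
print(first_other)   # N
print(n_or_gap)      # ['N', 'N', '-']
```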

RE: Sets of characters II

    import re

    text = 'purple alice-b@google.com monkey dishwasher'
    match = re.search(r'\w+@\w+', text)
    if match:
        print(match.group())  # 'b@google'

    match = re.search(r'[\w.-]+@[\w.-]+', text)
    if match:
        print(match.group())  # 'alice-b@google.com'

RE: Functions I

RE module functions:
• re.match() returns a Match object if an occurrence is found at the beginning of the string, None otherwise
• re.search() returns a Match object for the 1st occurrence, None if not found
• re.findall() returns a list of matched substrings, an empty list if not found
• re.finditer() returns an iterator over Match objects of the occurrences, an empty iterator if not found

Match object methods:
• match.start() returns the start index
• match.end() returns the end index (the index just after the last matched character)
• match.span() returns the start and end index in a tuple
• match.group() returns the matched string

RE: Functions II

    import re
    seq = 'RPAPPDRAPDQX'  # A sequence
    expr = r'A.{1,2}D'    # A and D separated by 1 or 2 characters

    match = re.search(expr, seq)
    if match:
        print(
            match.start(),                   # start index
            match.end(),                     # end index
            match.span(),                    # start and end index
            match.group(),                   # the matched string
            seq[match.start():match.end()],  # the matched string
            sep=' - '
        )
    # 2 - 6 - (2, 6) - APPD - APPD

RE: Functions III

    import re
    seq = 'RPAPPDRAPDQX'  # A sequence
    expr = r'A.{1,2}D'    # A and D separated by 1 or 2 characters

    match = re.match(expr, seq)  # Not found at beginning
    print(match)
    # None

    matches = re.findall(expr, seq)  # Found 2 occurrences
    print(matches)
    # ['APPD', 'APD']

    matches = re.finditer(expr, seq)  # Found 2 occurrences
    for m in matches:                 # Iterate over Match objects
        print(m.span(), m.group())    # Use each Match object
    # (2, 6) APPD
    # (7, 10) APD

RE: Group Extraction

• Groups are defined with parentheses
• On a successful search
    • match.group(): the whole match text
    • match.group(1): match text of the group opened by the 1st left parenthesis
    • match.group(2): match text of the group opened by the 2nd left parenthesis
    • ...

    import re
    text = 'purple alice-b@google.com monkey dishwasher'
    match = re.search(r'([\w.-]+)@([\w.-]+)', text)
    if match:
        print(match.group())   # 'alice-b@google.com'
        print(match.group(1))  # 'alice-b'
        print(match.group(2))  # 'google.com'
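Group extraction is also handy for pulling fields out of annotation lines. The sketch below assumes a made-up header layout (name, chromosome, length); it does not follow any particular database format.

```python
import re

header = ">gene42|chr7 length=1530"  # hypothetical FASTA-style header

# Three groups: name, chromosome and length; \| is a literal pipe character
match = re.search(r">(\w+)\|(\w+) length=(\d+)", header)
if match:
    gene = match.group(1)         # text of the 1st group
    chrom = match.group(2)        # text of the 2nd group
    length = int(match.group(3))  # groups are strings, so convert explicitly
    print(gene, chrom, length)    # gene42 chr7 1530
```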

RE: Group Extraction and Findall

• If the pattern includes a single set of parentheses, then findall() returns a list of strings corresponding to that single group
• If the pattern includes 2 or more parenthesis groups, then instead of returning a list of strings, findall() returns a list of tuples. Each tuple represents one match of the pattern, and inside the tuple is the group(1), group(2) ... data.

    import re
    text = 'alice@google.com, monkey bob@abc.com dishwasher'
    tuples = re.findall(r'([\w\.-]+)@([\w\.-]+)', text)
    print(tuples)
    # [('alice', 'google.com'), ('bob', 'abc.com')]

    for t in tuples:
        print(t[0], t[1], sep=' | ')
    # alice | google.com
    # bob | abc.com
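For comparison, a short sketch of the single-group case and the no-group case on the same input string; the variable names are chosen for illustration.

```python
import re

text = "alice@google.com, monkey bob@abc.com dishwasher"

# One group: findall returns only the text captured by that group
users = re.findall(r"([\w.-]+)@[\w.-]+", text)

# No group: findall returns the whole matched text
addresses = re.findall(r"[\w.-]+@[\w.-]+", text)

print(users)      # ['alice', 'bob']
print(addresses)  # ['alice@google.com', 'bob@abc.com']
```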

RE: Options

The re functions take options to modify the behavior of the pattern match. The option flag is added as an extra argument to search(), findall() etc., e.g. re.search(pat, str, re.IGNORECASE).
• IGNORECASE ignores upper/lowercase differences for matching
• DOTALL allows dot (.) to match newline; normally it matches anything but newline.
    • Note that \s (whitespace) includes newlines
• MULTILINE allows ^ and $ to match the start and end of each line within a string made of many lines. Normally they just match the start and end of the whole string.
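The three flags can be compared side by side in a small sketch. The three-line string is made up for the example.

```python
import re

text = "atg\nGGC\nATG"  # hypothetical three-line string

# IGNORECASE: 'ATG' also matches the lowercase 'atg' on the first line
both_cases = re.findall(r"ATG", text, re.IGNORECASE)

# DOTALL: without it '.' cannot match the newline; with it '.' crosses lines
no_dotall = re.search(r"atg.GGC", text)
with_dotall = re.search(r"atg.GGC", text, re.DOTALL)

# MULTILINE: '^' matches at the start of every line, not only the string start
line_starts = re.findall(r"^\w", text, re.MULTILINE)
string_start = re.findall(r"^\w", text)

print(both_cases)    # ['atg', 'ATG']
print(no_dotall)     # None
print(line_starts)   # ['a', 'G', 'A']
print(string_start)  # ['a']
```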

Greedy vs. Non-Greedy

• .* or .+ return the largest possible match (they are "greedy")
• To get the shortest possible matches instead, e.g. each tag separately, use the non-greedy forms .*? or .+?

    import re
    string = '<b>foo</b> and <i>so on</i>'  # string with xml tags

    matches = re.findall(r'<.*>', string)   # greedy: <.*>
    print(matches)  # ['<b>foo</b> and <i>so on</i>']  # got all string

    matches = re.findall(r'<.*?>', string)  # non-greedy: <.*?>
    print(matches)  # ['<b>', '</b>', '<i>', '</i>']  # got each tag

Substitution

• re.sub(expression, replacement, string)

    import re
    text1 = 'alice@google.com and bob@abc.net'
    text2 = re.sub(r'\.\w+', r'.de', text1)
    print(text2)
    # alice@google.de and bob@abc.de

• \1, \2 ... in replacement refer to match group(1), group(2) ...

    text1 = 'alice@google.com and bob@abc.com'
    text2 = re.sub(
        r'([\w\.-]+)@([\w\.-]+)',  # Expression
        r'\2@\1',                  # Replacement string
        text1)                     # Input string
    print(text2)
    # google.com@alice and abc.com@bob
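Substitution with an empty replacement is a common way to clean up sequence text. A minimal sketch, assuming a made-up line with a position number and space-separated blocks of bases:

```python
import re

line = "       61 acgtacgtac gtacgtacgt"  # hypothetical numbered sequence line

# Replace every run of whitespace or digits with nothing, keeping only the bases
sequence = re.sub(r"[\s\d]+", "", line)

print(sequence)       # acgtacgtacgtacgtacgt
print(len(sequence))  # 20
```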

Exercise

URL
• https://cbdm.uni-mainz.de/mb17

Jupyter Notebook
• File: Regex.ipynb
• Download the file into the notebooks folder

Data file
• File: sequences.tsv
• Download the file into the data folder

Annexes

References

• Python documentation
    • https://docs.python.org
• Online tutorials (Python 2 or 3)
    • Google's Python Class
    • ProgrammingForBiologists.org

Escape sequences

Escape Sequence   Meaning
\newline          Backslash and newline ignored
\\                Backslash (\)
\'                Single quote (')
\"                Double quote (")
\a                ASCII Bell (BEL)
\b                ASCII Backspace (BS)
\f                ASCII Formfeed (FF)
\n                ASCII Linefeed (LF)
\r                ASCII Carriage Return (CR)
\t                ASCII Horizontal Tab (TAB)
\v                ASCII Vertical Tab (VT)
\ooo              Character with octal value ooo
\xhh              Character with hex value hh

Table: Escape sequences
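A few of these escape sequences in use, next to a raw string (the r'...' form used for the regular expressions) for contrast. The strings are made up for illustration.

```python
row = "geneA\t120\n"       # \t is one tab character, \n is one linefeed
raw_row = r"geneA\t120\n"  # raw string: each backslash is kept as a literal character
quoted = 'it\'s'           # \' puts a single quote inside a single-quoted string
hex_a = "\x41"             # character with hex value 41
octal_a = "\101"           # character with octal value 101

print(len(row), len(raw_row))  # 10 12
print(quoted)                  # it's
print(hex_a, octal_a)          # A A
```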

Common Sequence Operations

Operation             Result
x in s                True if an item of s is equal to x, else False
x not in s            False if an item of s is equal to x, else True
s + t                 the concatenation of s and t
s * n or n * s        equivalent to adding s to itself n times
s[i]                  ith item of s, origin 0
s[i:j]                slice of s from i to j
s[i:j:k]              slice of s from i to j with step k
len(s)                length of s
min(s)                smallest item of s
max(s)                largest item of s
s.index(x[, i[, j]])  index of the first occurrence of x in s (at or after index i and before index j)
s.count(x)            total number of occurrences of x in s

Table: Sequence operations sorted in ascending priority. s and t are sequences of the same type, n, i, j and k are integers and x is an arbitrary object that meets any type and value restrictions imposed by s.
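Several of these operations applied to a string, which is itself a sequence. The DNA string is made up for the example.

```python
seq = "ATGCGATAG"  # hypothetical DNA string

has_motif = "GCG" in seq                # membership test on a substring
doubled = seq * 2                       # the string repeated twice
first_codon = seq[0:3]                  # slice from index 0 up to (not including) 3
every_third = seq[0::3]                 # slice with step 3: first base of each codon
length = len(seq)
next_a = seq.index("A", 1)              # first 'A' at or after index 1
count_a = seq.count("A")                # number of 'A' characters
smallest, largest = min(seq), max(seq)  # alphabetical order for characters

print(has_motif, first_codon, every_third)  # True ATG ACT
print(length, next_a, count_a)              # 9 5 3
print(smallest, largest)                    # A T
```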

Operations on mutable sequence types

Operation              Result
s[i] = x               item i of s is replaced by x
s[i:j] = t             slice of s from i to j is replaced by the contents of the iterable t
del s[i:j]             same as s[i:j] = []
s[i:j:k] = t           the elements of s[i:j:k] are replaced by those of t
del s[i:j:k]           removes the elements of s[i:j:k] from the list
s.append(x)            appends x to the end of the sequence (same as s[len(s):len(s)] = [x])
s.clear()              removes all items from s (same as del s[:])
s.copy()               creates a shallow copy of s (same as s[:])
s.extend(t) or s += t  extends s with the contents of t (for the most part the same as s[len(s):len(s)] = t)
s *= n                 updates s with its contents repeated n times
s.insert(i, x)         inserts x into s at the index given by i (same as s[i:i] = [x])
s.pop([i])             retrieves the item at i and also removes it from s
s.remove(x)            removes the first item from s where s[i] == x
s.reverse()            reverses the items of s in place

Table: s is an instance of a mutable sequence type, t is any iterable object and x is an arbitrary object that meets any type and value restrictions imposed by s
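A short walk through some of these operations on a list, with the state of the list after each step shown in the comments. The values are made up for illustration.

```python
bases = ["A", "C", "G"]

bases.append("T")         # ['A', 'C', 'G', 'T']
bases[0] = "a"            # ['a', 'C', 'G', 'T']
bases[1:3] = ["x"]        # slice replaced: ['a', 'x', 'T']
bases.insert(1, "N")      # ['a', 'N', 'x', 'T']
bases.extend(["G", "G"])  # ['a', 'N', 'x', 'T', 'G', 'G']
bases.remove("G")         # removes the first 'G' only: ['a', 'N', 'x', 'T', 'G']
last = bases.pop()        # returns 'G'; bases is now ['a', 'N', 'x', 'T']
snapshot = bases.copy()   # shallow copy, unaffected by later changes to bases
bases.reverse()           # ['T', 'x', 'N', 'a']

print(last)      # G
print(snapshot)  # ['a', 'N', 'x', 'T']
print(bases)     # ['T', 'x', 'N', 'a']
```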

Built-in functions

abs()     Return the absolute value of a number.
all()     Return True if all elements of the iterable are true (or if the iterable is empty).
any()     Return True if any element of the iterable is true. If the iterable is empty, return False.
ascii()   Return a string containing a printable representation of an object (escape non-ASCII characters).
bin()     Convert an integer number to a binary string.
bool()    Convert a value to a Boolean.
chr()     Return the string representing a character.
dict()    Create a new dictionary.
dir()     Return the list of names in the current local scope.
float()   Convert a string or a number to floating point.
format()  Convert a value to a "formatted" representation.
help()    Invoke the built-in help system.
hex()     Convert an integer number to a hexadecimal string.

Table: Python built-in functions
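A quick tour of some of the built-in functions listed above; the input values are arbitrary and chosen only for illustration.

```python
values = [3, -7, 0]

magnitude = abs(values[1])          # absolute value of -7
all_true = all(values)              # False, because 0 counts as false
any_true = any(values)              # True, because 3 and -7 count as true
binary = bin(5)                     # binary string
hexa = hex(255)                     # hexadecimal string
letter = chr(65)                    # character with code point 65
as_float = float("2.5")             # string converted to floating point
formatted = format(0.12345, ".2f")  # two decimal places
empty_flag = bool("")               # an empty string converts to False
escaped = ascii("Universität")      # the non-ASCII character is escaped

print(magnitude, all_true, any_true)  # 7 False True
print(binary, hexa, letter)           # 0b101 0xff A
print(as_float, formatted)            # 2.5 0.12
print(escaped)                        # 'Universit\xe4t'
```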
