
An Introduction to Perl for bioinformatics

C. Tranchant-Dubreuil, F. Sabot

UMR DIADE, IRD

June 2015

Outline

1 Introduction
• Objectives
• What is programming?
• What is Perl? Why use Perl?
• Perl in bioinformatics
• Running a Perl script

2 Perl in detail

Objectives

• To demonstrate how Perl can be used in bioinformatics

• To empower you with the basic knowledge required to quickly and effectively create simple scripts to process biological data

• Write your own programs once, use them many times

What is programming

• Programs: a set of instructions telling the computer what to do

• Programming languages: bridges between human languages (A-Z) and machine languages (0 and 1)

• Compilers convert programming languages to machine languages

What is a program

Methods for programming

1 Write the program
2 Run the program
3 Look at the output
4 Correct the errors (debugging) and run the program again

Perl, Practical Extraction and Report Language

• Created in 1987 by Larry Wall

• Optimized for scanning text files and extracting information from them

• A massive library of reusable code (http://www.cpan.org/)

• Scripts are often faster and easier to write than fully compiled programs

Syntax is sometimes confusing to new users: "There's more than one way to do it"

Choose a language based on your needs

Perl is NOT suitable for
• significant computation
• sophisticated data structures that use large amounts of memory

Perl is suitable for
• Quick and dirty solutions (prototyping)
• Text processing
• Almost anything if performance is not an issue

Perl in Bioinformatics
How Perl saved the Human Genome Project

Lincoln Stein (1996) www.perl.org

• Open Source project that has recruited developers from all over the world

• A project to produce modules to process all known forms of biological data

• Perl allowed various genome centers to effectively communicate their data with each other

⇒ Bioperl project – www.bioperl.org

Bioperl code example

• Retrieve a FASTA sequence from a remote sequence database by accession

First steps

• Perl programs start with: #!/usr/bin/perl -w

• Perl statements end with a semicolon ';'

• # means comment
  - Anything after a # on a line is ignored
  - Comments are free
  - They help you and others understand your code
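The three rules above can be seen together in a very small script. This is only a sketch: the file name first_steps.pl and the message are made up for illustration.

```perl
#!/usr/bin/perl -w

# first_steps.pl (hypothetical file name, for illustration only)
# Everything after a '#' on a line is ignored by perl.

my $message = "Perl statements end with a semicolon";   # a trailing comment
print "$message\n";   # prints the message followed by a newline
```

Removing one of the semicolons and running the script again is a quick way to see what a syntax error looks like.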

4 lines of code

Listing 1 – retrieve-accession.pl

    #!/usr/bin/perl

    # retrieve-accession.pl

    # Load the BioPerl modules used
    use Bio::DB::GenBank;
    use Bio::SeqIO;

    # Fetch the sequence from GenBank by accession.version
    $gb = Bio::DB::GenBank->new();
    $seq = $gb->get_Seq_by_version('CK085358.1');

    # Write the sequence into a FASTA file
    $out = Bio::SeqIO->new(-file => ">CK085358.1.fa", -format => 'fasta');
    $out->write_seq($seq);

Running a Perl program

1 Put the path to the perl interpreter on the first line of the script

    #!/usr/local/bin/perl

2 Don't forget to make your program executable!

    chmod +x your_program.pl

3 Run your program!

    /path2program/your_program.pl

Practice

Task P1.1
1 Open a terminal
2 Connect to the remote server ([email protected])
3 Change to the directory /home/formation1/SCRIPTS
4 Run the program retrieve-accession.pl

Task P1.2
1 Change the accession
2 Run the program retrieve-accession.pl

Outline

1 Introduction

2 Perl in detail
• Writing a first program
• Variables
• Control structures
• while
• Input/output

Write your first program
The traditional first program

Listing 2 – helloWorld.pl

    # helloWorld.pl

    print "Hello world... or not\n";

print writes to the terminal

\n signifies a newline

Practice
Run your first program

Task P2.1
1 Write your script helloWorld.pl and save it in the SCRIPTS directory
2 Run it

Task P2.2
• Modify the program to output some other text.
• Add a few more print statements and experiment with what happens if you omit or add extra newlines.

Task P2.3
• Make a few deleterious mutations to your program. For example, leave off the semicolon or a ". Observe the error messages.

One of the most important aspects of programming is debugging. Probably more time is spent debugging than programming, so it's a good idea to start recognizing errors now.

What's a variable?

• Variables are used to store information to be referenced and manipulated in a computer program.

• The value of a variable can change, depending on conditions or on information passed to the program.

Variables

• Three types of variable
• scalar:
  - The most basic kind of variable
  - Single value: can be a string or a number
  - Prefixed with $, e.g. $dna
• array: list of values, prefixed with @
• hash: paired values (key and value), prefixed with %
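A short sketch may help to see the three variable types side by side. The gene names and lengths below are invented for the example and do not come from any real data set.

```perl
use strict;
use warnings;

# Scalar: a single value (a string or a number), prefixed with $
my $dna = 'ATGC';

# Array: an ordered list of values, prefixed with @
my @genes = ('geneA', 'geneB', 'geneC');          # hypothetical gene names

# Hash: key/value pairs, prefixed with %
my %length_of = (geneA => 1200, geneB => 850);    # hypothetical lengths

print "Scalar: $dna\n";
print "Array: @genes\n";
print "Second gene: $genes[1]\n";                 # one element is a scalar, hence the $
print "Length of geneA: $length_of{geneA}\n";     # hash values are looked up by key
```

Note that a single element of an array or hash is itself a scalar, which is why it is written with $ rather than @ or %.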

Assigning values

• = means "assign a value to a variable"

    $dna = 'ATATACGACGAGACATAG';

• No formal declaration of variables is necessary
  - Good practice to "use strict": it forces variable declaration

    #!/usr/local/bin/perl

    use strict;

    my $variable; # a variable declared with my
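To see what "use strict" buys you, the sketch below deliberately misspells a variable name. Normally this stops the program at compile time; here the faulty line is wrapped in a string eval (a construct not covered in these slides, used only so that the script can keep running and print the error). The names $dan and $copy are made up for the illustration.

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

my $dna = 'ATATACGACGAGACATAG';   # declared with my, so strict accepts it

# The misspelled name $dan was never declared. Under "use strict" this
# is a compile-time error; the string eval only serves to catch that
# error so the script can print it instead of stopping.
my $result = eval 'my $copy = $dan; 1';
my $error  = $@;

print "Sequence: $dna\n";
print "strict reported: $error" unless $result;
```

Without "use strict", the same typo would silently create a new, empty variable, which is much harder to track down.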

Variable naming

• You can use (almost) anything for your variable names: a-z, A-Z, 0-9, _

• No spaces
• Case sensitive
• Try to use names which are descriptive and not too long.

    $a = "ATCAGGG";            # $a is obscure
    $dna_sequence_variable;    # too long
    $sequence = "ATCAGGG";     # $sequence is better
    $dna = "ATCAGGG";          # $dna is even better

Scalar: string

Listing 4 – scalar-string.pl

    #!/usr/bin/perl

    # scalar-string.pl

    use strict;
    use warnings;

    # Variable declaration and initialization
    my $number = "12";

    my $composite = "There are $number sequences";
    print $composite, "\n";

[login@node1]$ ./scalar-string.pl
There are 12 sequences

Manipulate strings

Function   Meaning
substr     takes characters out of a string
length     gets the length of a string
.          concatenates two strings

Example (substr)

• Getting 5 letters from one position in a string variable

    $letter = substr($dnaSeq, $position, 5);

Example (length)

    print length($dnaSeq);

Example (concatenation)

    $accession = ">myseq\n";
    $sequence = "ATACGTCAGCTAGCTACTGCCCT\n";
    $mySeq = $accession . $sequence;
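The three string operations can be combined in one runnable sketch. It reuses the sequence from the slide; the variable names $first_five, $from_third, $size and $fasta are chosen here for illustration. Keep in mind that substr counts positions from 0.

```perl
use strict;
use warnings;

my $accession = ">myseq\n";
my $sequence  = "ATACGTCAGCTAGCTACTGCCCT";

# substr(STRING, OFFSET, LENGTH): offsets start at 0
my $first_five = substr($sequence, 0, 5);
my $from_third = substr($sequence, 2, 5);

# length counts the characters in the string
my $size = length($sequence);

# The dot operator joins strings end to end
my $fasta = $accession . $sequence . "\n";

print "First five letters: $first_five\n";           # ATACG
print "Five letters from offset 2: $from_third\n";   # ACGTC
print "Sequence length: $size\n";                    # 23
print $fasta;
```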

Practice
Use variables

Task P3.1
• Modify the program retrieve-accession.pl by adding the variables $accession and $fastaFile
• Once the sequence has been downloaded, print the following generic message on the terminal: The sequence CK085358.1 has been downloaded and saved in the file CK085358.1.fasta

Task P3.2
• Mutate your program. Delete a $, remove my and see what error message you get.

Task P3.3
• Modify the program by changing the contents of the variables. Observe the output. Try experimenting by creating more variables.

Scalar: number

Listing 6 – scalar-number.pl

    my $x = 3;
    my $y = 2;

    print "$x + $y is ", $x + $y, "\n";
    print "$x - $y is ", $x - $y, "\n";
    print "$x * $y is ", $x * $y, "\n";
    print "$x / $y is ", $x / $y, "\n";

• Numbers can be incremented by one with '++'

    $x = 2;
    $x++;
    print "$x\n"; # 3

• Numbers can be decremented by one with '--'

    $x = 2;
    $x--;
    print "$x\n"; # 1

Practice
Use scalar and array variables

Task P4.1
• Write a new program called average.pl that calculates the average of three numeric variables x, y and z, stores the result in a variable and prints the result.

Hint, for two variables:

    my $average = ($x + $y) / 2;

Array variables

• A named list of information
  - Each array element can be any scalar variable
  - An array is indexed by integers beginning with zero. The first element of an array is therefore the zero-th element

    my @gene_list = ('CK085358.1', 'CK085357.1', 'CK085356.1');
    print "$gene_list[1]\n"; # CK085357.1
    print "$gene_list[0]\n"; # CK085358.1
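Zero-based indexing is easier to remember with a few more cases. The sketch below reuses the accession list from the slide; negative indices and $#gene_list are standard Perl features that the slide does not mention, added here as a complement.

```perl
use strict;
use warnings;

my @gene_list = ('CK085358.1', 'CK085357.1', 'CK085356.1');

my $first      = $gene_list[0];      # index 0 is the first element
my $last       = $gene_list[-1];     # negative indices count from the end
my $last_index = $#gene_list;        # index of the last element
my $count      = scalar @gene_list;  # an array in scalar context gives its size

print "First: $first\n";                 # CK085358.1
print "Last: $last\n";                   # CK085356.1
print "Last index: $last_index\n";       # 2
print "Number of elements: $count\n";    # 3
```

Because counting starts at zero, the last index is always one less than the number of elements.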

Practice
Use array variables

Task P4.2
• Create and run the following program

Listing 7 – array.pl

    my @animals = ('cat', 'dog', 'pig');
    print "1st animal in array is: $animals[0]\n";
    print "2nd animal in array is: $animals[1]\n";
    print "Entire animals array contains: @animals\n";

Task P4.3
• Mutate your program as below and see what error message you get.

    print "@animals[0]\n";

Array variables
Functions

• Perl arrays are dynamic
• Adding or removing elements from a list is easy

Function   Meaning             Example
push       add to end          push(@array, "apple")
pop        remove from end     $pop_val = pop(@array)
unshift    add to front        unshift(@array, "some value")
shift      remove from front   $shift_val = shift(@array)
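The four functions in the table can be tried in sequence. This is a small sketch with an invented starting list; the comments show the content of @array after each call.

```perl
use strict;
use warnings;

my @array = ('banana', 'cherry');

push(@array, 'apple');           # banana cherry apple
unshift(@array, 'some value');   # some value banana cherry apple

my $pop_val   = pop(@array);     # removes and returns 'apple'
my $shift_val = shift(@array);   # removes and returns 'some value'

print "Popped: $pop_val\n";
print "Shifted: $shift_val\n";
print "Remaining: @array\n";     # banana cherry
```

push and pop work on the end of the array, unshift and shift on the front, so each pair undoes the other.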

38 / 98

Page 67: An Introduction to Perl for bioinformatics · Perl is NOT suitable for • significantcomputation • sophisticateddata structuresthatuselarge amountsofmemory Perl is suitable for

Array variablesFunctions

• Perl arrays are dynamic• add or remove entries/element from list is easy

Function Meaning Examplepush to add to end push(@array, "apple")

pop to remove from end $pop_val = pop(@array)unshift add to front unshift(@array, "some value")shift remove from front $shift_val = shift(@array)

38 / 98


Array variables: Functions

• Example with the push function

Listing 8 – array.pl

    # excerpt: @animals is assumed to be declared earlier in array.pl
    push @animals, "fox"; # the array is now longer
    my $length = @animals;
    print "The array now contains $length elements\n";
    print "Entire animals array contains: @animals\n";

Practice: Use array variables

Task P5.1
• Experiment with the array functions by adding some new lines to array.pl.
• Rather than just adding a text string to an array, try to see if you can use the push() or unshift() functions to add variables.
• For the shift() and pop() functions, try to see what happens if you don't assign the popped or shifted value to a variable.

    my $value = pop(@array);
    pop(@array);

Array variables: From strings to arrays and back

Function   Meaning
join       create a string from an array

Listing 9 – string-array.pl

    #!/usr/bin/perl

    # string-array.pl

    use strict;
    use warnings;

    my @gene_names = ("unc-10", "cyc-1", "act-1", "let-7", "dyf-2");
    my $joined_string = join(", ", @gene_names);
    print "$joined_string\n";


Array variables: From strings to arrays and back

Function   Meaning
join       create a string from an array
split      divide a string into an array

    my $line = ".....";
    my $separator = "\t";

    my @col = split($separator, $line);
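To make the round trip between strings and arrays concrete, here is a small sketch. The tab-separated record (chromosome, position, base) is a hypothetical example line, chosen only because tabular files are common in bioinformatics.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tab-separated record: chromosome, position, reference base
my $line      = "chr1\t12345\tA";
my $separator = "\t";

# split: one string becomes a list of fields
my @col   = split($separator, $line);
my $n_col = scalar @col;
print "Number of columns: $n_col\n";
print "Position: $col[1]\n";

# join: the list of fields becomes one string again, with a new separator
my $csv_line = join(",", @col);
print "$csv_line\n";
```

The first argument of split is treated as a pattern, so a plain string such as "\t" works for simple separators.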


Practice: Use the split function

Task P6.1
• Run the program string-array.pl.
• Add the following lines to this program and use the split function to digest the DNA sequence with the EcoRI enzyme. Print the result of the digestion.

    my $dna = "aaaaGAATTCttttttGAATTCggggggg";
    my $EcoRI = "GAATTC";

Array variables: Sorting

Task P7.1
• Create the following program and run it

Listing 11 – sorting-array.pl

    #!/usr/bin/perl

    # sorting-array.pl
    use strict;
    use warnings;

    my @list = ("c", "b", "a", "C", "B", "A", "a", "b", "c", 3, 2, 1); # an unsorted list
    my @sorted_list = sort @list;
    print "default sorting: @sorted_list\n";

Array variables: Sorting

Task P7.2
• Change the previous program to use an array of numbers and run it
• Try sorting with the numeric comparison operator <=> and print the sorted array

    @sorted_list = sort {$a <=> $b} @list;

Array variables: Sorting

Task P7.3
• Change the previous program to use an array of numbers and run it
• Try sorting in reverse order

    @sorted_list = sort {$b <=> $a} @list;
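The difference between the default sort and the numeric sorts is easiest to see side by side. The sketch below uses a hypothetical list of numbers; the default sort compares elements as strings, which is why 100 ends up before 9.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);   # hypothetical values

my @as_strings = sort @numbers;                 # string order:  1 10 100 9
my @ascending  = sort { $a <=> $b } @numbers;   # numeric order: 1 9 10 100
my @descending = sort { $b <=> $a } @numbers;   # reversed:      100 10 9 1

print "default:    @as_strings\n";
print "ascending:  @ascending\n";
print "descending: @descending\n";
```

$a and $b are special variables that sort fills in with the two elements being compared, so they should not be declared with my.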

Array variables: The famous array @ARGV

• When you execute a Perl script, you can pass it any arguments.

    ./detect_SNP.pl bam_directory vcf_file_out

How do you know which values were passed?
• By using the array @ARGV, created automatically by Perl
• @ARGV holds all the values from the command line.
  - If there are no parameters, the array will be empty.
  - If one parameter is passed, that value will be the only element in @ARGV.
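A short sketch of how a script might inspect @ARGV. It makes no assumption about what detect_SNP.pl actually does; it only reports what was passed on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $n_args = scalar @ARGV;   # number of command-line arguments
print "Number of arguments: $n_args\n";
print "Arguments: @ARGV\n";

# $ARGV[0] is the first argument, $ARGV[1] the second, and so on.
# Called as: ./detect_SNP.pl bam_directory vcf_file_out
# $ARGV[0] would hold "bam_directory" and $ARGV[1] "vcf_file_out".
# The script name itself is not in @ARGV (Perl keeps it in $0).
```

Run it once without arguments and once with two or three to see the array grow.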


Practice: Use scalar and array variables

Task P8.1
• In the program called average.pl, the average of three numeric variables x, y, z is calculated and the result is printed on the screen.
• Instead of assigning the variables inside the code, let the user give the three numbers as arguments when the script is executed. Save the script as average-arg.pl.

What's a control structure?

• A block of code that examines variables and chooses which direction to take based on given conditions.
• Different types
  - IF / ELSE: a simple control that tests whether a condition is true or false
  - Loop structure: repeats a group of statements until a condition is met (WHILE, FOR, FOREACH)


IF - ELSE

• A control expression in which
  - IF the condition is true, one statement block is executed,
  - ELSE a different statement block is executed (ifelse.pl).

    if (control_expression is TRUE)
    {
        do this;
        and this;
    }
    else
    {
        do that;
        and that;
    }
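The template above is pseudocode. A runnable version might look like the following sketch, where the variable $gc_percent and the threshold of 50 are hypothetical and not taken from ifelse.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $gc_percent = 62;   # hypothetical value
my $message;

if ($gc_percent > 50)
{
    $message = "GC-rich sequence";
}
else
{
    $message = "AT-rich or balanced sequence";
}
print "$message\n";
```

Only one of the two blocks runs; changing $gc_percent to 40 selects the else block.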


IF - ELSE

• If you want to test multiple conditions, you can combine else and if into 'elsif'

    if (condition1 is TRUE)
    {
        do this;
    }
    elsif (condition2 is TRUE)
    {
        do that;
    }
    else { do whatever; } # all tests have failed
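As an illustration of elsif, the sketch below sorts a sequence length into three classes. The variable names and the thresholds (100 and 1000) are arbitrary assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $length = 250;   # hypothetical sequence length
my $class;

if ($length < 100)
{
    $class = "short";
}
elsif ($length < 1000)   # only tested when the first condition is false
{
    $class = "medium";
}
else
{
    $class = "long";     # reached when all tests have failed
}
print "A sequence of $length bp is $class\n";
```

The conditions are tested from top to bottom and the first true one wins, so their order matters.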


Numerical comparison operators

Operator   Meaning                    Example
==         equal to                   if ($x == $y)
!=         not equal to               if ($x != $y)
>          greater than               if ($x > $y)
<          less than                  if ($x < $y)
>=         greater than or equal to   if ($x >= $y)
<=         less than or equal to      if ($x <= $y)

• Be careful!
  - Equality is tested with two equals signs (==)
  - A single = is used to assign a value to a variable
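The warning about = versus == is worth seeing in action. In this sketch (with hypothetical values 5 and 7), the first test compares the two variables, while the second one silently overwrites $x.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = 5;
my $y = 7;

my $compared = "different";
if ($x == $y)              # comparison: $x and $y are left untouched
{
    $compared = "equal";
}
print "After ==: x is $x, values are $compared\n";

my $assigned = "false";
if ($x = $y)               # assignment: $x becomes 7, and 7 counts as true
{
    $assigned = "true";
}
print "After = : x is $x, condition was $assigned\n";
```

After the second test $x has changed, which is almost never what was intended.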


String operators

Operator   Meaning         Example
eq         equal to        if ($x eq $y)
ne         not equal to    if ($x ne $y)
.          concatenation   $z = $x . $y
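A brief sketch of the string operators, using two hypothetical codons as sample data. Strings should be compared with eq and ne; the numeric operators == and != would convert both strings to numbers first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $start = "ATG";   # hypothetical sample strings
my $stop  = "TAA";

my $same = "no";
if ($start eq $stop)
{
    $same = "yes";
}
print "Identical codons? $same\n";

if ($start ne $stop)
{
    print "$start and $stop differ\n";
}

my $joined = $start . $stop;   # concatenation
print "$joined\n";
```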

Multiple comparisons: Logical/boolean operators

AND
• The overall expression is true only if both conditions are true.

Example
• I'll go if Peter goes AND Paul goes.
• "Peter goes" and "Paul goes" must both be true before I'll go.

    if (condition1 && condition2)   # in Perl, AND is written && (or the keyword and)
    {
        ...
    }


Multiple comparisons: Logical/boolean operators

OR
• The overall expression is true if at least one of the conditions is true.

Example
• I'll go if Peter goes OR Paul goes.

    if (condition1 || condition2)   # in Perl, OR is written || (or the keyword or)
    {
        ...
    }
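Both logical operators can be compared on the same pair of conditions. The sketch below uses hypothetical values and thresholds for a sequence length and a GC fraction.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $length = 120;    # hypothetical sequence length
my $gc     = 0.35;   # hypothetical GC fraction

my $both   = "no";
my $either = "no";

# AND: true only if the two conditions are true
if ($length >= 100 && $gc >= 0.4)
{
    $both = "yes";
}

# OR: true as soon as one of the conditions is true
if ($length >= 100 || $gc >= 0.4)
{
    $either = "yes";
}

print "Both conditions met: $both\n";
print "At least one condition met: $either\n";
```

Here the length test passes and the GC test fails, so only the OR version succeeds.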


Practice: Use if / elsif / else

Task P9.1
• Extend the script average-arg.pl: print a different message depending on whether the average is lower than 10, between 10 and 14, or greater than 14.

String matching

• One of the most useful features of Perl, and a common task
• Used to find a pattern within another string
• Characteristics
  - the pattern is contained in slashes,
  - matching occurs with the =~ operator

Listing 12 – matching.pl

    if ($sequence =~ m/GAATTC/)
    {
        print "EcoRI site found\n";
    }
    else
    {
        print "no EcoRI site found\n";
    }
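The listing assumes that $sequence already exists. A self-contained sketch, with two hypothetical sequences, shows both a successful match with =~ and the negated form !~.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sequences: only the first contains the EcoRI site GAATTC
my $with_site    = "aaaaGAATTCtttt";
my $without_site = "aaaaGATATCtttt";

my $found = 0;
if ($with_site =~ m/GAATTC/)       # true when the pattern occurs in the string
{
    $found = 1;
    print "EcoRI site found\n";
}

my $absent = 0;
if ($without_site !~ m/GAATTC/)    # true when the pattern does NOT occur
{
    $absent = 1;
    print "no EcoRI site found\n";
}
```

Matching is case-sensitive by default, so the lowercase flanking bases never match the uppercase pattern.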


Practice: Use matching operators

Task P10.1
• Add the following lines and observe what happens when you use the substitution operator.

    $sequence =~ s/GAATTC/gaaTtc/;
    print "$sequence\n";

• Add the following lines and find out what happens to $sequence

    $sequence =~ s/A/adenine/;
    print "$sequence\n";
    $sequence =~ s/C//;
    print "$sequence\n";

Practice: Use matching operators

Task P10.2
• With the previous program, we only replaced the first occurrence of the matching pattern.
• If you want to replace all occurrences, add the letter 'g' to the end of the substitution (global option)

    $sequence =~ s/C//g;

Practice: Use matching operators

Task P10.3
• Add the following lines to the script and try to work out what happens when you add an 'i' to the matching operator

    my $protein = "MVGGKKKTKICDKVSHEEDRISQLPEPLISEILFHLSTKDLWQSVPGLD";
    print "Protein contains proline\n" if ($protein =~ m/p/i);
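The effect of the 'g' and 'i' options can be checked on small inputs. In this sketch the DNA string and the short peptide "MKPLV" are hypothetical sample data; copies of the sequence are modified so that the original stays available for comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sequence = "AACTAGCGGAATTCCGACCGT";   # hypothetical DNA sequence

my $first_only = $sequence;
$first_only =~ s/C//;       # without g: only the first C is removed

my $all = $sequence;
$all =~ s/C//g;             # with g: every C is removed

print "original:   $sequence\n";
print "first only: $first_only\n";
print "global:     $all\n";

my $protein = "MKPLV";      # hypothetical peptide, uppercase only
my $case_sensitive   = ($protein =~ m/p/)  ? "match" : "no match";
my $case_insensitive = ($protein =~ m/p/i) ? "match" : "no match";

print "m/p/  : $case_sensitive\n";
print "m/p/i : $case_insensitive\n";
```

The ?: construct used for the last two variables is just a compact if/else that returns a value.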

Practice: Use matching operators

Task P10.4
• The substitution operator can also be used to count how many changes are made. Add the following lines to your script.

    my $sequence = "AACTAGCGGAATTCCGACCGT";
    my $g_count = ($sequence =~ s/G/G/g);   # the g option is needed to count every G
    print "The letter G occurs $g_count times in $sequence\n";

Practice: Use matching operators

Task P10.4

Explanation
• Perl evaluates the code inside the parentheses first, and this performs the substitution.
• The result is that G->G substitutions are made (one for every G when the global option 'g' is used), which leaves $sequence unchanged.
• The substitution operator returns the number of changes made, and this count is assigned to a variable (as in this example).
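As a complement to the explanation above, the sketch below counts G in two ways. The tr/// (transliteration) operator is not covered in these slides; it is shown here only as a commonly used alternative, since tr/G// with an empty replacement counts the matching characters without changing the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sequence = "AACTAGCGGAATTCCGACCGT";

# Counting with a global substitution, done on a copy of the sequence
my $copy       = $sequence;
my $subs_count = ($copy =~ s/G/G/g);

# Counting with tr///, which returns the number of characters matched
my $tr_count = ($sequence =~ tr/G//);

print "s///g counted $subs_count G\n";
print "tr/// counted $tr_count G\n";
```

Both counts agree, and $sequence is left untouched in either case.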

Matching operators

Operator   Meaning        Example
=~ m//     match          if ($seq =~ m/GAATTC/)
!~ m//     no match       if ($seq !~ m/GAATTC/)
=~ s///    substitution   $seq =~ s/thing/other/


Project 1: DNA composition

• At this point, we know enough Perl to write our first useful program.

How to write your program
• Think in terms of input, process and output.
• Identify what each of these is. For example, the input might be "three numbers given as arguments", a variable might be "average", the process is "calculate the average of x, y, z" and the output is "return the average".
• Figure out how to turn the algorithm into code. Express "instructions in words/equations" as "operations in code".

Project 1: DNA composition

• Your program will read a sequence stored in a variable and report the following:
  - The length of the sequence
  - The total number of A, C, G and T nucleotides
  - The fraction of A, C, G and T nucleotides
  - The GC fraction

Loops: For

• Generally iterates over integers, usually from zero to some other number.
• 3 steps:
  - initialization: provide a starting value for the loop counter. An initial expression is set up ($count = 0).
  - validation: provide a condition that determines when the loop should end. This test expression is evaluated at each iteration ($count < 10).
  - update: how the loop counter should change in each loop cycle ($count++).

    for (my $count = 0; $count < 10; $count++)
    {
        do this;
        do that;
    }
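The three steps can be followed in a small runnable sketch. The upper bound of 5 and the variable $visited are arbitrary choices for this example and are not taken from loops-for.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $visited = "";   # records the counter values seen by the loop

# initialization; validation; update
for (my $count = 0; $count < 5; $count++)
{
    print "Iteration $count\n";
    $visited = $visited . $count;
}

print "Counter values: $visited\n";
```

The loop body runs for the values 0 to 4; as soon as $count reaches 5 the validation test fails and the loop stops.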


Practice: Use scalar and array variables

Task P10.1
• Run the loops-for.pl program
• Try looping backwards from 50 to 40
• Try counting in steps of 10, from 0 to 100

Practice: Use scalar and array variables

Task P10.2
• Write a program calculateSum.pl that computes the sum of the integers from 1 to n, where n is a number given as an argument.

Practice: Use scalar and array variables

Task P10.3
• Update the script retrieve-accession.pl:
  - several sequence accessions will be given by the user as arguments
  - a for loop will be used to go through the array that contains the accessions and to get the FASTA sequences

Practice: Use scalar and array variables

Task P10.4
• One of the most common operations you will do as a programmer is to loop over arrays. To make it interesting, we will loop over two arrays simultaneously:

    my @animals = ("cat", "dog", "cow");
    my @sounds = ("Meow", "Woof", "Moo");
    for (my $i = 0; $i < @animals; $i++)
    {
        print "$i) $animals[$i] $sounds[$i]\n";
    }

• Write and run this program

While

• Performs a series of actions while some condition is true

    while (expression is true)
    {
        do this;
        do that;
        ...
    }
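A while loop is convenient when the number of iterations is not known in advance. The sketch below is a toy example with hypothetical numbers: it doubles a copy number, as in idealised PCR cycles, until a threshold is reached.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $copies = 1;   # hypothetical starting copy number
my $cycles = 0;

# Keep doubling while the threshold of 1000 copies has not been reached
while ($copies < 1000)
{
    $copies = $copies * 2;
    $cycles++;
}

print "Reached $copies copies after $cycles cycles\n";
```

The condition is tested before each pass, so something inside the block has to change it eventually, otherwise the loop never ends.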

Perl input/output: How your program talks to the rest of the "world"

• input: getting information into your program
• output: getting information out of your program
  - the terminal by default

Example
• open a file and read its contents
• write your results to a file


Page 132: An Introduction to Perl for bioinformatics · Perl is NOT suitable for • significantcomputation • sophisticateddata structuresthatuselarge amountsofmemory Perl is suitable for

Reading files
Opening a file

open() function: opens a file for reading or writing

    open(IN, MODE, $fileName);

• Three arguments
    ◦ The name of a filehandle: a special variable used to refer to a file once it has been opened
    ◦ The open mode
    ◦ The name of the file to open


Reading files
Opening a file

open() function: opens a file for reading or writing

• To read from a file

    open(IN, "<", $fileName);

• To create a file and write to it (an existing file is overwritten)

    open(OUT, ">", $fileName);

• To append to a file

    open(OUT, ">>", $fileName);
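The three modes can be tried together in one short script. This is only a sketch: it works on a temporary file created with the core File::Temp module, and it uses lexical filehandles ($out, $app, $in) instead of the bareword handles shown on the slide, which is a stylistic assumption rather than the course's convention.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary file so the example does not touch real data.
my ($tmp_fh, $fileName) = tempfile(UNLINK => 1);
close($tmp_fh);

# ">" creates the file (or empties it if it already exists).
open(my $out, ">", $fileName) or die "Can't write to $fileName: $!";
print $out "first line\n";
close($out);

# ">>" adds to the end of the file and keeps what is already there.
open(my $app, ">>", $fileName) or die "Can't append to $fileName: $!";
print $app "second line\n";
close($app);

# "<" opens the file for reading only.
open(my $in, "<", $fileName) or die "Can't read $fileName: $!";
my @lines = <$in>;
close($in);

print "The file contains ", scalar(@lines), " lines\n";
```

Opening the same file a second time with ">" instead of ">>" would leave only one line in it.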


Reading files
Error management

die function
• open returns a non-zero value on success and a false value (undef) on failure
• die stops your script and prints a message
• often used to prevent doing something regrettable, e.g. running your script on a file that doesn't exist or overwriting an existing file

    # $! holds the system error message explaining why open failed
    open(IN, "<", $fileName) or die("Can't open the file $fileName: $!");
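To see what happens when open fails, the sketch below tries to open a file that is assumed not to exist. The file name is hypothetical, and the eval block is only there so that the demonstration can report the message instead of stopping; in a normal script you would let die end the program.

```perl
use strict;
use warnings;

# Hypothetical file name that is assumed not to exist.
my $fileName = "no_such_file_for_this_example.fasta";

# eval traps the die so that this demonstration can continue;
# on failure eval returns undef and the message is stored in $@.
my $opened = eval {
    open(my $in, "<", $fileName)
        or die "Can't open the file $fileName: $!\n";
    close($in);
    1;
};

if (!$opened)
{
    print "open failed: $@";
}
```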


Reading files
Closing a file

close() function: closes a filehandle

    close(IN);


Reading files
Reading one line

• In scalar context, the input operator <> reads one line from the specified filehandle

    $line = <IN>;

• Note that, in the above example, $line still contains the trailing newline character; it can be removed with the chomp function

    chomp $line;
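The effect of chomp is easy to check by comparing the length of the line before and after. In this sketch the "file" is a string opened as an in-memory filehandle, an assumption made only to keep the example self-contained; the data is hypothetical.

```perl
use strict;
use warnings;

# In-memory "file" so the example is self-contained (hypothetical data).
my $content = ">seq1\nATGAAACCA\n";
open(my $in, "<", \$content) or die "Can't open in-memory file: $!";

my $line = <$in>;              # reads ">seq1\n", newline included
my $before = length($line);
chomp $line;                   # removes the trailing newline
my $after = length($line);
close($in);

print "Length before chomp: $before, after chomp: $after\n";
```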


Reading files

• while is often used to read a file one line at a time

Listing 16 – readFasta.pl

    open(IN, "<", $ARGV[0]) or die "Error opening $ARGV[0] for reading: $!";

    while (my $line = <IN>)
    {
        chomp $line;
        # FASTA header lines start with '>'
        if ($line =~ /^>/)
        {
            print "Sequence accession : $line\n";
        }
    }
    close IN;
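Writing results to a file follows the same pattern, with print taking the output filehandle as its first argument. The sketch below is an assumption-laden illustration, not the course's solution: the FASTA content is hypothetical and read from memory, and the output goes to a temporary file, where a real script would take both names from @ARGV.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical FASTA content, read from memory to keep the sketch self-contained.
my $fasta = ">seq1\nATGAAA\n>seq2\nCCAATG\n";
open(my $in, "<", \$fasta) or die "Can't open in-memory FASTA: $!";

# Temporary output file; a real script would take this name from @ARGV.
my ($out, $outName) = tempfile(UNLINK => 1);

my $written = 0;
while (my $line = <$in>)
{
    chomp $line;
    if ($line =~ /^>(.*)/)
    {
        # print FILEHANDLE LIST: there is no comma after the filehandle
        print $out "$1\n";
        $written++;
    }
}
close($in);
close($out) or die "Can't close $outName: $!";

print "$written accessions written to $outName\n";
```

Checking the return value of close on an output file is worthwhile because some write errors (a full disk, for example) are only reported at that point.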


Practice
Use scalar and array variables

Task P11.1
• Update the script retrieve-accession.pl:
    ◦ a filename will be given by the user as an argument
    ◦ this file contains a list of accessions
    ◦ a while loop will be used to read the file


Loops

foreach takes a list of values and assigns them, one at a time, to a scalar variable, executing a block of code for each value

    foreach $element (@list)
    {
        do this;
        do that;    # until there are no more elements
    }
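A runnable version of this pattern could look like the following sketch, which counts the G and C bases of each codon in a list. The list contents and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical list of codons.
my @codons = ('ATG', 'AAA', 'CCA');
my $gc_total = 0;

foreach my $codon (@codons)
{
    # tr/// returns the number of matching characters
    my $gc = ($codon =~ tr/GC//);
    print "$codon contains $gc G/C base(s)\n";
    $gc_total += $gc;
}

print "Total G/C bases: $gc_total\n";
```

Declaring the loop variable with my keeps it local to the loop, which avoids accidentally reusing a variable of the same name elsewhere in the script.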


Hash
First steps

• similar to the array
• instead of indexing with integers, the hash is indexed with text (the keys)

Listing 17 – hash.pl

    my %genetic_code = ('ATG' => 'Met',
                        'AAA' => 'Lys',
                        'CCA' => 'Pro');

    print "$genetic_code{'ATG'}\n";


Hash
Keys and values

• iterate over the keys with a foreach loop and the keys() function
• keys(): returns the list of keys (in no particular order)

Listing 19 – hash.pl

    foreach my $key (keys %genetic_code)
    {
        print "$key $genetic_code{$key}\n";
    }
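Because a hash does not keep its keys in any particular order, the output of such a loop can differ from one run to the next. A common remedy, sketched here on the same example hash, is to sort the keys first; the @sorted variable is introduced only for this illustration.

```perl
use strict;
use warnings;

my %genetic_code = ('ATG' => 'Met',
                    'AAA' => 'Lys',
                    'CCA' => 'Pro');

# sort gives a reproducible, alphabetical order of the keys
my @sorted = sort keys %genetic_code;
foreach my $key (@sorted)
{
    print "$key $genetic_code{$key}\n";
}
```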


Practice
Use hash variable

Task P12.1
• Write the following program

Listing 21 – hash.pl

    my %genetic_code = ('ATG' => 'Met',
                        'AAA' => 'Lys',
                        'CCA' => 'Pro');

    print "$genetic_code{'ATG'}\n";

    foreach my $key (keys %genetic_code)
    {
        print "$key $genetic_code{$key}\n";
    }


Practice
Use values() function

Task P12.2
• Use the values() function, which returns the list of values
• Add the following lines to your last program

    my @vals = values(%genetic_code);
    print "values: @vals\n";


Hash
Adding, Removing and Testing

• To add a value (or replace the value of an existing key):

    $genetic_code{CCG} = 'Pro';       # new key
    $genetic_code{AAA} = 'Lysine';    # existing key: the old value is replaced


Hash
Adding, Removing and Testing

• To remove both a key and its associated value from a hash:

    delete $genetic_code{AAA};


Hash
Adding, Removing and Testing

• Sometimes it is necessary to test whether a particular key already exists in a hash, for example before overwriting something.
• exists() function

    if (exists $genetic_code{AAA})
    {
        print "AAA codon has a value : $genetic_code{AAA}\n";
    }
    else
    {
        print "No value set for AAA codon\n";
    }
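Adding, removing and testing can be combined in one short script. This is a minimal sketch on a reduced version of the example hash; the $removed and $count variables are introduced only for the illustration.

```perl
use strict;
use warnings;

my %genetic_code = ('ATG' => 'Met', 'AAA' => 'Lys');

# Add a new key only if it is not already present.
if (!exists $genetic_code{'CCG'})
{
    $genetic_code{'CCG'} = 'Pro';
}

# delete returns the value that was removed.
my $removed = delete $genetic_code{'AAA'};

my $count = scalar(keys %genetic_code);
print "Removed value: $removed\n";
print "AAA still present? ", (exists $genetic_code{'AAA'} ? "yes" : "no"), "\n";
print "Number of keys: $count\n";
```

Note that exists tests the key itself, so it stays true even when the stored value is empty or 0, which is why it is preferred over testing the value directly.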


Practice
Add, remove and test hash keys

Task P12.3
• Write instructions adding a new key and value to %genetic_code
• Add lines removing a key
• Add lines testing whether a particular key already exists in the hash


Hash variables
Summary of hash-related functions

Function              Meaning
keys %hash            returns the list of keys
values %hash          returns the list of values
exists $hash{key}     returns true if the key exists
delete $hash{key}     removes the key and its value from the hash


EXERCISE
Project 2: Counting codons

• Your program will read a nucleic sequence given by the user as an argument and report the codon usage for the sequence
• How to write your program?
    ◦ what input?
    ◦ what output?
    ◦ which variables to use? scalar, hash
    ◦ list of main operations
    ◦ list of instructions
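One possible way to approach this project is sketched below; it is not the course's reference solution. It assumes that codons are read as non-overlapping triplets starting at the first base, that an incomplete triplet at the end is ignored, and that a default sequence is used when no argument is given. The helper name count_codons and the default sequence are hypothetical.

```perl
use strict;
use warnings;

# Count each codon (non-overlapping triplets, first reading frame).
# count_codons is a hypothetical helper name, not part of the course material.
sub count_codons
{
    my ($sequence) = @_;
    my %count;
    my $position = 0;
    $sequence = uc $sequence;    # treat lower-case and upper-case bases alike
    while ($position + 3 <= length($sequence))
    {
        $count{ substr($sequence, $position, 3) }++;
        $position += 3;
    }
    return %count;
}

# Sequence from the command line, or a hypothetical default for the demo.
my $sequence = @ARGV ? $ARGV[0] : "ATGAAACCAAAA";
my %usage = count_codons($sequence);

foreach my $codon (sort keys %usage)
{
    print "$codon\t$usage{$codon}\n";
}
```

The hash does the counting: the first time a codon is seen its entry is created, and each later occurrence increments the same entry.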


Interacting with other programs
2 ways

• The simplest one is the backticks operator ``: the output of the command is returned to you in an array or a scalar
• The second one is the system() function


Interacting with other programs
Backticks operator

Listing 22 – system.pl

    # the command is written between backticks
    my @files = `ls` or die "Problem with ls command";
    print "@files\n";
    # wc -l counts the lines of the ls output, i.e. the number of files
    my $file_count = `ls | wc -l`;
    print "$file_count\n";
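The difference between list and scalar context, and a way to check that the command succeeded, can be sketched as follows. The example assumes a Unix-like system where ls and wc are available; it lists a temporary directory holding two hypothetical files so that the result is predictable.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Temporary directory with two hypothetical files, so the result is predictable.
my $dir = tempdir(CLEANUP => 1);
foreach my $name ('a.fasta', 'b.fasta')
{
    open(my $out, ">", "$dir/$name") or die "Can't create $dir/$name: $!";
    close($out);
}

# List context: one element per output line, newline included.
my @files = `ls $dir`;
# $? holds the status of the last external command (0 means success).
die "Problem with ls command\n" if $? != 0;
chomp @files;

# Scalar context: the whole output as a single string.
my $file_count = `ls $dir | wc -l`;
chomp $file_count;
$file_count =~ s/^\s+//;    # some wc implementations pad the number with spaces

print "Files: @files\n";
print "Count: $file_count\n";
```

Testing $? after the backticks catches a failing command even when it printed something, which a test on the returned output alone would miss.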


Practice
Use scalar and array variables

Task P13.1
• Write and run the script system.pl
• Add the name of the directory to list as an argument given by the user


Interacting with other programs
system

    # system returns 0 when the command succeeds
    system("ls") == 0 or die "Command failed\n";
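Unlike backticks, system does not capture the output of the command; it returns its status. The sketch below uses the list form of system, where the command and its arguments are separate elements, and reads the exit code from $?. As an assumption made to keep the example portable, the "external program" is the running perl itself ($^X) rather than a bioinformatics tool.

```perl
use strict;
use warnings;

# List form: the command and each argument are separate elements,
# so no shell is involved and no quoting is needed.
# $^X is the path of the running perl, used here as a stand-in
# for an external program.
my @command = ($^X, "-e", "exit 0");
my $status = system(@command);

if ($status != 0)
{
    # The exit code of the child is stored in the high byte of $?.
    my $exit_code = $? >> 8;
    die "Command failed with exit code $exit_code\n";
}

print "Command succeeded\n";
```

The list form is the safer choice when file names come from the user, because spaces or special characters in an argument are passed to the program unchanged.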


Practice
Use scalar and array variables

Task P13.2
• In the script system.pl, run a blastx analysis on the fasta files (against the bank bank.fasta)


Module


Keep calm ! It’s your turn !
