Perl Introduction. Why Perl? Widely used scripting language Powerful text manipulation capabilities...

Perl Introduction


Page 1: Perl Introduction. Why Perl? Widely used scripting language Powerful text manipulation capabilities Relatively easy to use Has a wide range of libraries.

Perl

Introduction

Page 2:

Why Perl?

• Widely used scripting language
• Powerful text manipulation capabilities
• Relatively easy to use
• Has a wide range of libraries available
• Fast
• Good support for file and process operations

Page 3:

Less suitable for:

• Building large and complex applications – Java, C/C++, C#

• Applications with a GUI – Java, C/C++, C#

• High-performance/memory-efficient applications – Java, C/C++, C#, Fortran

• Statistics – R

Page 4:

Learning to script

Knowledge + Skills

Page 5:

Exercise

Determine the percentage GC content of human chromosome 22

Page 6:

open file
read lines
per line:
    skip if header line
    count Cs and Gs
    count all nucleotides
report percentage Cs and Gs
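The steps above can be sketched in Perl as follows. This is only one possible outline, not the course's solution: the helper name gc_percentage and the example lines are made up for illustration, only A, C, G and T are counted as nucleotides (an assumption, so Ns are ignored), and it uses substr and tr, which are introduced further on.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the GC percentage of FASTA-formatted lines.
sub gc_percentage {
    my @lines = @_;
    my $gc    = 0;    # number of C and G nucleotides
    my $total = 0;    # number of A, C, G and T nucleotides

    foreach my $line (@lines) {
        chomp($line);
        next if substr( $line, 0, 1 ) eq ">";    # skip header lines
        $gc    += ( $line =~ tr/CGcg// );        # count Cs and Gs
        $total += ( $line =~ tr/ACGTacgt// );    # count all nucleotides
    }
    return 0 if $total == 0;                     # avoid division by zero
    return 100 * $gc / $total;
}

# Made-up example data instead of a real chromosome file
my @example = ( ">chr_example\n", "ACGT\n", "GGCC\n" );
printf( "GC content: %.1f%%\n", gc_percentage(@example) );    # GC content: 75.0%
```

For a real file, the lines would be read from a file handle instead of from the example list.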

Page 7:

Hello World

Page 8:

Hello World….

Simple line of Perl code:
print "Hello World";

Run from a terminal:
perl -e 'print "Hello World";'

Now try this and notice the difference:
perl -e 'print "Hello World\n";'

Page 9:

\n

"backslash-n": the newline character, comparable to pressing the 'Enter' key

Page 10:

\t

"backslash-t": the tab character, comparable to pressing the 'Tab' key

Page 11:

Hello World (cont)

To create a text file with this line of Perl code:
echo 'print "Hello World\n";' > HelloWorld.pl

Run the script:
perl HelloWorld.pl

To edit the Perl code, type the following in the terminal window and then hit the Enter key:
kate HelloWorld.pl

Page 12:

Pythagoras' theorem

a² + b² = c²

3² + 4² = 5²

Page 13:

Pythagoras.pl

$a = 3;

$b = 4;

$a2 = $a * $a;

$b2 = $b * $b;

$c2 = $a2 + $b2;

$c = sqrt($c2);

print $c;

Page 14:

$a

a single value or scalar variable starts with a $ followed by its name


Page 16:

Running Pythagoras.pl prints: 5

Page 17:

Perl scripts

Add these lines at the top of each Perl script:

#!/usr/bin/perl

# author:

# description:

use strict;

use warnings;

Page 18:

perl Pythagoras.pl

Global symbol "$a2" requires explicit package name at Pythagoras.pl line 8.

Global symbol "$b2" requires explicit package name at Pythagoras.pl line 9.

Global symbol "$c2" requires explicit package name at Pythagoras.pl line 10.

Global symbol "$a2" requires explicit package name at Pythagoras.pl line 10.

Global symbol "$b2" requires explicit package name at Pythagoras.pl line 10.

Global symbol "$c" requires explicit package name at Pythagoras.pl line 11.

Global symbol "$c2" requires explicit package name at Pythagoras.pl line 11.

Global symbol "$c" requires explicit package name at Pythagoras.pl line 12.

Execution of Pythagoras.pl aborted due to compilation errors.

Page 19:

Pythagoras.pl

$a = 3;

$b = 4;

$a2 = $a * $a;

$b2 = $b * $b;

$c2 = $a2 + $b2;

$c = sqrt($c2);

print $c;

Page 20:

Pythagoras.pl

my $a = 3;

my $b = 4;

my $a2 = $a * $a;

my $b2 = $b * $b;

my $c2 = $a2 + $b2;

my $c = sqrt($c2);

print $c;

Page 21:

my

The first time a variable appears in the script, it should be declared using 'my'. Only the first time...

Page 22:

Pythagoras.pl

my($a,$b,$c,$a2,$b2,$c2);

$a = 3;

$b = 4;

$a2 = $a * $a;

$b2 = $b * $b;

$c2 = $a2 + $b2;

$c = sqrt($c2);

print $c;

Page 23:

Pythagoras.pl

$a = 3;

$b = 4;

$a2 = $a * $a;

$b2 = $b * $b;

$c2 = $a3 + $b2;

$c = sqrt($c2);

print $c;

Page 24:

Without 'use strict', this script prints 4 instead of 5: the mistyped variable $a3 has no value and counts as 0.


Page 26:

Pythagoras.pl

my $a = 3;

my $b = 4;

my $a2 = $a * $a;

my $b2 = $b * $b;

my $c2 = $a3 + $b2;

my $c = sqrt($c2);

print $c;

Page 27:

perl Pythagoras.pl

Global symbol "$a3" requires explicit package name at Pythagoras.pl line 10.

Execution of Pythagoras.pl aborted due to compilation errors.

Page 28:

Text or number

Variables can contain text (strings) or numbers

my $var1 = 1;
my $var2 = "2";
my $var3 = "three";

Try these four statements:
print $var1 + $var2;
print $var2 + $var3;
print $var1.$var2;
print $var2.$var3;

Page 29:

Text or number

Variables can contain text (strings) or numbers

my $var1 = 1;
my $var2 = "2";
my $var3 = "three";

Try these four statements:
print $var1 + $var2;    => 3
print $var2 + $var3;    => 2
print $var1.$var2;      => 12
print $var2.$var3;      => 2three

Page 30:

variables can be added, subtracted, multiplied, divided and modulo’d with:

+ - * / %

variables can be concatenated with: .
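A minimal sketch of the difference between the arithmetic operators and the concatenation operator. The variable names $sum, $joined and $rest are made up for this example.

```perl
use strict;
use warnings;

my $var1 = 1;
my $var2 = "2";

my $sum    = $var1 + $var2;    # numeric addition: "2" is used as the number 2
my $joined = $var1 . $var2;    # string concatenation: 1 is used as the text "1"
my $rest   = 7 % 3;            # modulo: the remainder of 7 divided by 3

print "sum: $sum\n";           # sum: 3
print "joined: $joined\n";     # joined: 12
print "rest: $rest\n";         # rest: 1
```

The operator, not the variable, decides whether a value is treated as a number or as text.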

Page 31:

sequence.pl

print "Please type a DNA sequence: ";

#this is a comment line
#Read a line from the standard input (keyboard)
my $DNAseq = <STDIN>;

#Remove the newline (Enter) from the typed text
chomp($DNAseq);

#Get the length of the text (DNA sequence)
my $length = length($DNAseq);
print "It has $length nucleotides\n";

Page 32:

Program flow is top-down: the statements of sequence.pl are executed one after the other, from the first line to the last.

Page 33:

<STDIN>

Reads the characters that are typed on the keyboard and stops after the Enter key is pressed.

Page 34:

<>

Same: STDIN is the default and can be left out (as long as the script is started without file names as arguments). Defaults that can be left out are a recurring and confusing theme in Perl...

Page 35:

sequence.pl

print "Please type a DNA sequence: ";

#this is a comment line
#Read a line from the standard input (keyboard)
my $DNAseq = <>;

#Remove the newline (Enter) from the typed text
chomp($DNAseq);

#Get the length of the text (DNA sequence)
my $length = length($DNAseq);
print "It has $length nucleotides\n";

Page 36:

$output = function($input)

input and output can be left out
parentheses are optional

Page 37:

$coffee = function($beans,$water)

Page 38:

sequence2.pl

print "Please type a DNA sequence: ";

my $DNAseq = <>;

chomp($DNAseq);

#Get the first three characters of $DNAseq

my $first3bases = substr($DNAseq,0,3);

print "The first 3 bases: $first3bases\n";

Page 39:

$frag = substr($text, $start, $num)

Extract a fragment of string $text starting at $start and with $num characters.

The first letter is at position 0!
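A short sketch of how the start position and the number of characters work together. The example sequence and the variable names $first, $second and $rest are made up for illustration.

```perl
use strict;
use warnings;

my $DNAseq = "ATGGCCTAA";

my $first  = substr( $DNAseq, 0, 3 );    # characters at positions 0, 1 and 2
my $second = substr( $DNAseq, 3, 3 );    # characters at positions 3, 4 and 5
my $rest   = substr( $DNAseq, 6 );       # no length given: everything from position 6

print "$first $second $rest\n";          # prints: ATG GCC TAA
```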

Page 40:

perldoc

perldoc -f substr

substr EXPR,OFFSET,LENGTH,REPLACEMENT
substr EXPR,OFFSET,LENGTH
substr EXPR,OFFSET

Extracts a substring out of EXPR and returns it. First character is at offset 0, .....

Page 41:

print

perldoc -f print

print FILEHANDLE LIST
print LIST
print

Prints a string or a list of strings.

If you leave out the FILEHANDLE, STDOUT is the destination: your terminal window.

Page 42:

print

In Perl, items in a list are separated by commas:
print "Hello World","\n";

Is the same as:
print "Hello World\n";

Page 43:

sequence3.pl

print "Please type a DNA sequence: ";

my $DNAseq = <>;

chomp($DNAseq);

#Get the second codon of $DNAseq

my $codon2 = substr($DNAseq,3,3);

print "The second codon: $codon2\n";

Page 44:

if, else, unless

Page 45:

sequence4.pl

print "Please type a DNA sequence: ";

my $DNAseq = <>;

chomp($DNAseq);

#Get the first three characters of $DNAseq

my $codon = substr($DNAseq,0,3);

if($codon eq "ATG") {

print "Found a start codon\n";

}

Page 46:

Conditional execution

if ( condition ) {
    do something
}

if ( condition ) {
    do something
} else {
    do something else
}

Page 47:

Conditional execution

if ( $number > 10 ) {
    print "larger than 10";
} elsif ( $number < 10 ) {
    print "smaller than 10";
} else {
    print "number equals 10";
}

unless ( $door eq "locked" ) {
    openDoor();
}
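The fragments above are not complete scripts. A runnable sketch of the same ideas could look like this; the helper compare_to_ten and the value of $door are made up for the example, and openDoor() is replaced by a print statement because that function is not defined anywhere in the slides.

```perl
use strict;
use warnings;

# Hypothetical helper that describes a number relative to 10.
sub compare_to_ten {
    my ($number) = @_;
    if ( $number > 10 ) {
        return "larger than 10";
    } elsif ( $number < 10 ) {
        return "smaller than 10";
    } else {
        return "number equals 10";
    }
}

foreach my $number ( 3, 10, 42 ) {
    print "$number: ", compare_to_ten($number), "\n";
}

my $door = "closed";
unless ( $door eq "locked" ) {    # same as: if ( $door ne "locked" )
    print "the door can be opened\n";
}
```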

Page 48:

Conditions are true or false

1 < 10 : true
21 < 10 : false

Page 49:

Comparison operators

Numeric test   String test   Meaning
==             eq            Equal to
!=             ne            Not equal to
>              gt            Greater than
>=             ge            Greater than or equal to
<              lt            Less than
<=             le            Less than or equal to
<=>            cmp           Compare

Page 50:

Examples

if ( 1 == 1 ) { # TRUE

if ( 1 == 2 ) { # FALSE

if ( 1 != 2 ) { # TRUE

if ( -1 > 10 ) { # FALSE

if ( "hi" eq "dag" ) { # FALSE

if ( "hi" gt "dag" ) { # TRUE

if ( "hi" == "dag" ) { # TRUE !!!

The last example may surprise you, as "hi" is not equal to "dag" and therefore should evaluate to FALSE. But for a numerical comparison they are both 0.
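A small sketch that shows the difference between the numeric and the string operators. The variable names are made up for this example, and the 'numeric' warning is switched off locally only to keep the output clean.

```perl
use strict;
use warnings;

my $string_equal = ( "hi" eq "dag" ) ? "true" : "false";    # compared as text
my $string_order = ( "hi" cmp "dag" );      # 1: "hi" sorts after "dag"
my $number_order = ( 2 <=> 10 );            # -1: 2 is smaller than 10
my $text_order   = ( "2" cmp "10" );        # 1: as text, "2" sorts after "10"

my $numeric_equal;
{
    no warnings 'numeric';    # silence the warning that "hi" is not a number
    $numeric_equal = ( "hi" == "dag" ) ? "true" : "false";    # both count as 0
}

print "eq: $string_equal, ==: $numeric_equal\n";    # eq: false, ==: true
print "cmp: $string_order, <=>: $number_order, text: $text_order\n";
```

With 'use warnings' switched on, Perl reports a numeric comparison of text such as "hi" == "dag", which helps to catch this mistake.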

Page 51:

numbers as conditions

0 : false
all other numbers : true

Page 52:

Numbers as conditions

if ( 1 ) { print "1 is true";

}

if ( 0 ) { print "this code will not be reached";

}

if ( $open ) { print "open is not zero";

}

Page 53:

repetition

Page 54:

sequence5.pl

print "Please type a DNA sequence: ";

my $DNAseq = <>;

chomp($DNAseq);

#Get all codons of $DNAseq

my $position = 0;

while($position < length($DNAseq)) {

my $codon = substr($DNAseq,$position,3);

print "The next codon: $codon\n";

$position = $position + 3;

}
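The same loop can be wrapped in a function so that it can be tried without typing. This is a sketch only: the helper name codons is made up, and it uses push, which is explained in the arrays part further on. Note that the last fragment may contain fewer than three bases.

```perl
use strict;
use warnings;

# Hypothetical helper: splits a DNA sequence into codons of 3 bases.
sub codons {
    my ($DNAseq) = @_;
    my @codons;
    my $position = 0;
    while ( $position < length($DNAseq) ) {
        push( @codons, substr( $DNAseq, $position, 3 ) );
        $position = $position + 3;
    }
    return @codons;
}

print join( " ", codons("ATGGCCTA") ), "\n";    # prints: ATG GCC TA
```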

Page 55:

the while loop

while ( condition ) {

do stuff

}

my $i = 0;

while ($i < 10) {

$i = $i + 1;

}

print $i;

Page 56:

$i = $i + 1

First the part to the right of the assignment operator '=' is calculated, then the result is stored in the variable on the left.

Page 57:

$i += 1

Same result as previous slide.

Page 58:

$i++

Same result as the previous slide: increments $i by 1.

Page 59:

++$i

Same as previous, but compare:
print $i++;
print ++$i;
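A minimal sketch of the difference, using the made-up variables $post and $pre to capture the value that each expression returns.

```perl
use strict;
use warnings;

my $i = 5;

my $post = $i++;    # $post gets the old value 5, then $i becomes 6
my $pre  = ++$i;    # $i first becomes 7, then $pre gets the new value 7

print "post: $post, pre: $pre, i: $i\n";    # post: 5, pre: 7, i: 7
```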

Page 60:

Exercise: Fibonacci numbers

Write a script that calculates and prints all Fibonacci numbers below one thousand.

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, etc.

F(n) = F(n-1) + F(n-2)

F(0) = 0, F(1) = 1

Page 61:

sequence5.pl

print "Please type a DNA sequence: ";

my $DNAseq = <>;

chomp($DNAseq);

#Copy the sequence to a new variable

my $asDNAseq = $DNAseq;

#'translate' a->t, c->g, g->c, t->a

$asDNAseq =~ tr/acgt/tgca/;

print "Complementary strand:\n$asDNAseq\n";

Page 62:

$asDNAseq =~ tr/acgt/tgca/;

=~ is a binding operator and means: perform the following action on this variable.

The operation tr/// translates each character from the first set of characters into the corresponding character in the second set:

acgt

||||

tgca

Page 63:

Counting

tr/// can also be used to count characters. If the second part is left empty, no translation takes place.

$numberOfNs = ($DNAseq =~ tr/N//);
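Both uses of tr/// can be tried in one small script. This is a sketch: the helper name complement and the example sequence are made up, and the character sets are extended with upper case letters, which the slide version does not handle.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the complementary strand (lower and upper case).
sub complement {
    my ($DNAseq) = @_;
    my $asDNAseq = $DNAseq;                  # work on a copy
    $asDNAseq =~ tr/acgtACGT/tgcaTGCA/;      # translate each base to its complement
    return $asDNAseq;
}

my $DNAseq     = "ggatccNNa";
my $numberOfNs = ( $DNAseq =~ tr/N// );      # counts the Ns, $DNAseq is not changed

print complement($DNAseq), "\n";             # prints: cctaggNNt
print "Ns: $numberOfNs\n";                   # prints: Ns: 2
```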

Page 64:

'automatic' typing

using a pipe "|":
echo ggatcc | perl sequence5.pl

or redirect using "<":
perl sequence5.pl < sequence.txt

Page 65:

Exercise 1.

Create a program that reads a DNA sequence from the keyboard, and reports the sequence length and the G/C content of the sequence (as a fraction)

Page 66:

perltidy

A program that properly formats your Perl script: indentation, spaces, etc.

perltidy yourscript.pl

Result is in:
yourscript.pl.tdy

Page 67:

@months

a list variable or array starts with an @ followed by its name

index 0 1 2 3

Page 68:

Arrays

my @fibonacci = (0,1,1,2);

print @fibonacci;

print $fibonacci[3];

$fibonacci[4] = 3;

$fibonacci[5] = 5;

$fibonacci[6] = 8;

Page 69:

@fibonacci

index 0 1 2 3

value 0 1 1 2

Page 70:

Arrays

my @hw = ("Hello ","World","\n");

print @hw;

my @months = ( "January",

"February",

"March");

Page 71:

Arrays

To access a single element of the list use the array name with $ instead of the @ and append the position of the element in: [ ]

print $months[1];    # prints February

$hw[1] = "Wur";

print @hw;

Page 72:

Arrays

To find the index of the last element in the list:
print $#months;    # prints 2

To find the number of elements in an array:
print $#months + 1;

or:
print scalar(@months);

Page 73:

Arrays

Note: as in many programming languages, the index of the first item in an array is not 1, but 0!

Note: $months ≠ $months[0] !!! $months is a separate scalar variable, while $months[0] is the first element of @months.
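A sketch that puts the array notes of the last slides together. The scalar $months and its text are made up, only to show that it is unrelated to the array @months.

```perl
use strict;
use warnings;

my @months = ( "January", "February", "March" );
my $months = "a separate scalar";    # has nothing to do with @months

print "first: ",      $months[0],        "\n";    # first: January
print "last index: ", $#months,          "\n";    # last index: 2
print "count: ",      scalar(@months),   "\n";    # count: 3
print "last: ",       $months[$#months], "\n";    # last: March
print "scalar: ",     $months,           "\n";    # scalar: a separate scalar
```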

Page 74:

Growing and shrinking arrays

push: add an item to the end of the list
pop: remove an item from the end of the list
shift: remove an item from the start of the list
unshift: add an item to the start of the list
splice: insert/remove one or more items

@out = splice(@array, start, length, @in);

Page 75:

@numbers

index 0 1 2 3 4

value 1 2 3 4 5

Page 76:

$last = pop(@numbers);

Before:
index 0 1 2 3 4
value 1 2 3 4 5

After:
index 0 1 2 3
value 1 2 3 4
$last = 5

Page 78:

push(@numbers, 6);

Before:
index 0 1 2 3
value 1 2 3 4

After:
index 0 1 2 3 4
value 1 2 3 4 6

Page 80:

$first = shift(@numbers);

Before:
index 0 1 2 3 4
value 1 2 3 4 6

After:
index 0 1 2 3
value 2 3 4 6
$first = 1

Page 82:

unshift(@numbers,7);

Before:
index 0 1 2 3
value 2 3 4 6

After:
index 0 1 2 3 4
value 7 2 3 4 6

Page 84:

@out = splice(@numbers,2,1,8,9);

Before:
index 0 1 2 3 4
value 7 2 3 4 6
inserted: 8 9

After:
index 0 1 2 3 4 5
value 7 2 8 9 4 6
@out = (3)
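The sequence of operations from the preceding slides can be replayed in one script. This sketch assumes the same starting values (1 to 5) as the @numbers slide; the comments show the array after each step.

```perl
use strict;
use warnings;

my @numbers = ( 1, 2, 3, 4, 5 );

my $last = pop(@numbers);        # @numbers: 1 2 3 4      $last: 5
push( @numbers, 6 );             # @numbers: 1 2 3 4 6
my $first = shift(@numbers);     # @numbers: 2 3 4 6      $first: 1
unshift( @numbers, 7 );          # @numbers: 7 2 3 4 6

# Replace 1 element, starting at index 2, by the two elements 8 and 9
my @out = splice( @numbers, 2, 1, 8, 9 );

print "numbers: @numbers\n";     # numbers: 7 2 8 9 4 6
print "out: @out\n";             # out: 3
```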

Page 86:

my ($x,$y,$z) = @coordinates;

Page 87:

my @words = split(" ", "Hello World");

$words[0] = "Hello"
$words[1] = "World"

Page 88:

More loops

my @plantList = ("rice", "potato", "tomato");

print $plantList[0];

print $plantList[1];

print $plantList[2];

Or:

foreach my $plant (@plantList) {

print $plant;

}

Page 89:

Loops

foreach variable ( list ) {
    do something with the variable
}

foreach my $i ( @lotto_numbers ) {
    print $i;
}

foreach my $i ( 1 .. 10, 20, 30 ) {
    print $i;
}

Page 90:

Loops

for variable ( list ) {
    do something with the variable
}

for my $i ( 1, 2, 3, 4, 5 ) {
    print $i;
}

for my $i ( 1 .. 10, 20, 30 ) {
    print $i;
}

Page 91:

Loops

while ( condition ) {

do something

}

my $i = 0;

while ($i < 10) {

print "$i < 10\n";

$i++;

}

Page 92:

Loops

for ( init; condition; increment ) {

do something

}

for (my $i = 0; $i < 10; $i++) {

print "$i < 10\n";

}

Page 93:

Loops

my $i = 0;

while ($i < 10) {

print "$i < 10\n";

$i++;

}

for (my $i = 0; $i < 10; $i++) {

print "$i < 10\n";

}
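As a rough guide for choosing between the loop types, the three forms can be compared side by side. This is a sketch with made-up tasks: foreach when only the elements matter, the three-part for when the index is needed, and while when the number of repetitions is not known in advance.

```perl
use strict;
use warnings;

my @plantList = ( "rice", "potato", "tomato" );

# foreach: visit every element of a list
foreach my $plant (@plantList) {
    print "$plant\n";
}

# for with init, condition and increment: useful when the index is needed
for ( my $i = 0 ; $i < scalar(@plantList) ; $i++ ) {
    print "$i: $plantList[$i]\n";
}

# while: repeat as long as the condition is true
my $sum = 0;
my $n   = 1;
while ( $n <= 10 ) {
    $sum += $n;
    $n++;
}
print "sum of 1..10: $sum\n";    # sum of 1..10: 55
```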

Page 94:

Exercise

Write a script that reverses a DNA sequence using an array.

Hint: Splitting on an empty string "" splits after every character.
@sequence = split("",$sequence);

Page 95:

%phonebook

a hash table variable starts with a % followed by its name

Name      Box
Crick     3
Franklin  1
Watson    0
Wilkins   2

Page 96:

Hash tables

Also called lookup tables, dictionaries or associative arrays

key/value combinations: keys are text, values can be anything

my %month_days = (
    "January"  => 31,
    "February" => 28,
    "March"    => 31
);

Page 97:

Hash tables

To access a value in the hash table, use the hash table name with $ instead of the % and append the key between { }

$month_days{"February"} = 29;

print $month_days{"January"};    # prints 31

Page 98:

Hash tables

The 'keys' function returns a list with the keys of the hash table. There is also a 'values' function.

@month_list = keys(%month_days);

# ("January", "February", "March"), in no guaranteed order

Page 99:

Hash tables

my %latin_name = (
    "rice"   => "Oryza sativa",
    "potato" => "Solanum tuberosum"
);

foreach my $common_name (keys(%latin_name)) {
    print "$common_name: ";
    print "$latin_name{$common_name}\n";
}

Output (the order may differ):
rice: Oryza sativa
potato: Solanum tuberosum

Page 100:

Hash tables

The keys have to be unique, the values do not.

The order of elements in a hash table is not reliable, first in is not necessarily first out.

You can use 'sort' to get the keys in an alphabetically ordered list:
@sorted = sort(keys(%latin_name));
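A sketch that combines the hash table operations of the last slides. The key "April" and the use of the built-in 'exists' function are additions for this example and do not appear in the slides.

```perl
use strict;
use warnings;

my %month_days = ( "January" => 31, "February" => 28, "March" => 31 );

$month_days{"February"} = 29;    # overwrite the value of an existing key
$month_days{"April"}    = 30;    # add a new key/value pair

# sort gives a predictable (alphabetical) order of the keys
foreach my $month ( sort( keys(%month_days) ) ) {
    print "$month: $month_days{$month}\n";
}

# exists tests whether a key is present in the hash table
if ( exists( $month_days{"May"} ) ) {
    print "May is known\n";
} else {
    print "May is not in the table\n";
}
```

Because the keys are sorted, the months are printed as April, February, January, March on every run.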

Page 101:

Exercise

Create a hash table with codons as keys and the corresponding amino acids as the values

Hint: search for the standard genetic code in the "genetic code" database at: http://srs.bioinformatics.nl/
Use the three lines for the first, second and third base and the line for the corresponding AA.

Page 102:

I/O: Input and Output

reading and writing files

Page 103:

Reading and writing files

open FASTA, "sequence.fa";

my $firstLine = <FASTA>;

my $secondLine = <FASTA>;

close FASTA;

Page 104:

Reading and writing files

Files need to be opened before use

Page 105:

Reading and writing files

Perl uses so-called “file handles” to attach to files for reading and writing

Page 106:

[Diagram: a file and the file handle attached to it]

Opening files

General:
open FileHandle, "mode", "filename"

Open for reading:
open LOG, "<", "/var/log/messages";
open LOG, "/var/log/messages";

Open for writing:
open WRT, ">", "newfile.txt";

Open for appending:
open APP, ">>", "existingfile.txt";
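
A short sketch that runs the three modes one after another on the same file. It assumes a temporary directory from the core module File::Temp so that no real files are touched; the file handle name IN and the text that is written are made up for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Temporary directory (an assumption of this sketch), removed at exit.
my $dir  = tempdir(CLEANUP => 1);
my $name = "$dir/newfile.txt";

# ">" creates the file, or empties it if it already exists.
open WRT, ">", $name or die "cannot open $name: $!\n";
print WRT "first line\n";
close WRT;

# ">>" keeps the existing content and writes at the end.
open APP, ">>", $name or die "cannot open $name: $!\n";
print APP "second line\n";
close APP;

# "<" opens the file for reading.
open IN, "<", $name or die "cannot open $name: $!\n";
my @lines = <IN>;
close IN;

print @lines;
```

Opening the same file with ">" a second time would discard both lines, which is the practical difference between writing and appending.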

Defensive programming

my $fastaName = "sequence.fa";

# $! holds the reason given by the operating system
open FASTA, $fastaName
    or die "cannot open $fastaName: $!\n";
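
To see the check at work, the sketch below tries to open a file that is assumed not to exist. The path is hypothetical, and eval is used here only so that the example can print the message instead of stopping; the special variable $! holds the reason reported by the operating system.

```perl
use strict;
use warnings;

# Hypothetical path that is assumed not to exist.
my $fastaName = "no_such_directory/sequence.fa";

# eval catches the die; the message ends up in $@.
my $ok = eval {
    open FASTA, "<", $fastaName
        or die "cannot open $fastaName: $!\n";
    close FASTA;
    1;
};
my $error = $ok ? "" : $@;

print "Error: $error" if $error;
```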

Reading from a file

reading from an open file via the filehandle:

$firstLine = <FASTA>;
$secondLine = <FASTA>;
@otherLines = <FASTA>;

<FASTA>

Reads one line if the result goes into a scalar:

$line = <FASTA>;

Reads all (remaining) lines if the result goes into an array:

@lines = <FASTA>;

file handles 'remember' the position in the file
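
A small sketch of both behaviours on one file handle. Instead of a real sequence.fa it opens a string as an in-memory file (an assumption made here so the example is self-contained); the sequence data is invented.

```perl
use strict;
use warnings;

# Stand-in for sequence.fa: Perl can open a string as if it were a file.
my $content = ">seq1\nACGT\nGGCC\nTTAA\n";
open FASTA, "<", \$content or die "cannot open in-memory file: $!\n";

my $firstLine  = <FASTA>;   # scalar: reads one line
my $secondLine = <FASTA>;   # continues where the previous read stopped
my @otherLines = <FASTA>;   # array: reads all remaining lines
my $afterEnd   = <FASTA>;   # nothing left to read: undef

close FASTA;

print "first:  $firstLine";
print "second: $secondLine";
print "remaining lines: ", scalar(@otherLines), "\n";
```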

Standard in and standard out

The keyboard and screen also have 'file' handles: remember STDIN and STDOUT.

Read from the keyboard:
$DNAseq = <STDIN>;

Write to the screen:
print STDOUT "Hello World\n";

Reading a file

open FASTA, "sequence.fa" or die;

my $sequence = "";
while (my $line = <FASTA>) {
    chomp($line);          # remove the newline at the end of the line
    $sequence .= $line;    # append the line to the sequence
}
close FASTA;

print $sequence,"\n";

(my $line = <FASTA>) is also a condition:

true: a line could be read
false: EOF, end of file
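
One detail worth knowing: in a while condition of this form Perl tests whether the value read is defined, not whether it is "true", so a last line that contains only 0 does not end the loop early. The sketch below illustrates this with an in-memory file (an assumption of the example; the file handle name NUMBERS and the data are invented).

```perl
use strict;
use warnings;

# In-memory stand-in for a file whose last line is "0" without a newline.
my $content = "1\n0";
open NUMBERS, "<", \$content or die "cannot open in-memory file: $!\n";

my $count = 0;
while (my $line = <NUMBERS>) {   # treated as: defined(my $line = <NUMBERS>)
    chomp($line);
    $count++;
}
close NUMBERS;

print "lines read: $count\n";
```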

Identical?

while (my $line = <FASTA>) {
    print $line;
}

for my $line (<FASTA>) {
    print $line;
}

Not completely

Read line by line:
while (my $line = <FASTA>) {
    print $line;
}

First read complete file into computer memory:
for my $line (<FASTA>) {
    print $line;
}
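
The difference can be made visible with eof, which reports whether a file handle has reached the end of its file. The sketch below checks this during the first iteration of each loop; it uses an in-memory file and invented names (TEXT, $whileAtEnd, $forAtEnd) so that it is self-contained.

```perl
use strict;
use warnings;

my $content = "line 1\nline 2\nline 3\n";

# while: only the first line has been read when the body runs.
open TEXT, "<", \$content or die "cannot open in-memory file: $!\n";
my $whileAtEnd;
while (my $line = <TEXT>) {
    $whileAtEnd = eof(TEXT) ? "yes" : "no";
    last;                        # only look at the first iteration
}
close TEXT;

# for: <TEXT> is read completely before the first iteration starts.
open TEXT, "<", \$content or die "cannot open in-memory file: $!\n";
my $forAtEnd;
for my $line (<TEXT>) {
    $forAtEnd = eof(TEXT) ? "yes" : "no";
    last;
}
close TEXT;

print "while: at end of file in first iteration? $whileAtEnd\n";
print "for:   at end of file in first iteration? $forAtEnd\n";
```

For a file the size of a chromosome this matters: the for loop holds every line in memory at once, the while loop only one.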

Writing to a file

open RANDOM, ">", "Random.txt";

for (1..50) {
    my $random = rand(6);
    print RANDOM "$random\n";
}

close RANDOM;

Writing to a file

open RANDOM, ">", "Random.txt";

for (1..50) {
    my $rnd = rand(6);
    $rnd = sprintf("%d\n",$rnd + 1);
    print RANDOM $rnd;
}

close RANDOM;
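
An alternative sketch of the same idea, using int() instead of sprintf to drop the fraction, and reading the file back to check the result. It writes to a temporary directory (an assumption of this example, via the core module File::Temp) rather than to the current directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Temporary directory (an assumption of this sketch), removed at exit.
my $dir  = tempdir(CLEANUP => 1);
my $name = "$dir/Random.txt";

open RANDOM, ">", $name or die "cannot open $name: $!\n";
for (1..50) {
    # rand(6) is at least 0 and less than 6; int() drops the fraction,
    # so the result is a whole number from 1 to 6.
    my $rnd = int(rand(6)) + 1;
    print RANDOM "$rnd\n";
}
close RANDOM;

# Read the file back to check what was written.
open RANDOM, "<", $name or die "cannot open $name: $!\n";
my @rolls = <RANDOM>;
close RANDOM;
chomp(@rolls);

print "number of rolls: ", scalar(@rolls), "\n";
```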

Closing the file

close filehandle;

close FASTA;

A file is automatically closed if you (re)open a file using the same filehandle, or if the Perl script is finished.

Minimalistic Perl

open FASTA, "sequence.fa" or die;

my $sequence = "";
while (my $line = <FASTA>) {
    chomp($line);
    $sequence .= $line;
}
close FASTA;

print $sequence,"\n";

Minimalistic Perl

open FASTA, "sequence.fa" or die;

my $sequence = "";
while (<FASTA>) {
    chomp;
    $sequence .= $_;
}
close FASTA;

print $sequence,"\n";

$_

The default scalar variable, used if no other variable is given. But only in selected cases...
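
A few of those cases in one sketch: a for loop without a loop variable, chomp without an argument, and a pattern match without a target all work on $_. The codon list and the variable names are invented for this example.

```perl
use strict;
use warnings;

my @codons = ("ATG\n", "AAA\n", "TGA\n");
my $startsWithA = 0;
my $joined      = "";

for (@codons) {               # no loop variable: each element goes into $_
    chomp;                    # same as chomp($_)
    $startsWithA++ if /^A/;   # same as $_ =~ /^A/
    $joined .= $_;            # here $_ has to be written out
}

print "$joined\n";
print "codons starting with A: $startsWithA\n";
```

The last line of the loop shows the limit: the .= operator has no default, so $_ must be named explicitly.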

Minimalistic Perl

open FASTA, "sequence.fa" or die;

my $sequence = "";
while ($_ = <FASTA>) {
    chomp($_);
    $sequence .= $_;
}
close FASTA;

print $sequence,"\n";

Exercises

2. Adapt the G/C script so multiple sequences in FASTA format are read from a file

3. Modify the script to process a file containing any number of sequences in EMBL format

4. Now let the program generate the reverse complement of the sequence(s), and report sequence length and G/C content
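
As a starting point for exercise 4, here is one possible way to handle a single sequence that is already in a string; reading the file formats is left out. The input sequence is hypothetical, and reverse plus tr is only one of several ways to do this.

```perl
use strict;
use warnings;

my $sequence = "AACGTTTG";   # hypothetical input sequence

# Reverse the string, then replace every base by its complement.
my $revcomp = scalar reverse($sequence);
$revcomp =~ tr/ACGTacgt/TGCAtgca/;

# tr with an empty replacement list only counts the matching characters.
my $gc         = ($sequence =~ tr/GCgc//);
my $length     = length($sequence);
my $percentage = 100 * $gc / $length;

print "reverse complement: $revcomp\n";
printf "length: %d, G/C content: %.1f%%\n", $length, $percentage;
```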

Exercises

5. Use the rand function of Perl to shuffle the nucleotides of the input sequence, while maintaining sequence composition; again report sequence length and G/C content
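
One way to approach exercise 5 is a Fisher-Yates shuffle, sketched below for a sequence that is already in a string. The slides do not prescribe this method; the input sequence and the variable names are assumptions. Because the nucleotides are only swapped, the composition stays the same.

```perl
use strict;
use warnings;

my $sequence = "AACGTTTGGC";          # hypothetical input sequence
my @nt = split(//, $sequence);        # one nucleotide per array element

# Fisher-Yates shuffle: walk from the last position to the second and
# swap each element with a randomly chosen one at or before it.
for (my $i = $#nt; $i > 0; $i--) {
    my $j = int(rand($i + 1));        # whole number from 0 to $i
    @nt[$i, $j] = @nt[$j, $i];
}

my $shuffled = join("", @nt);
print "original: $sequence\n";
print "shuffled: $shuffled\n";
```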