Welcome to lecture 3: An introduction to programming in PERL


Page 1: Welcome to lecture 3: An introduction to programming in PERL

Welcome to lecture 3: An introduction to programming in PERL

IGERT-Sponsored Bioinformatics Workshop Series
Michael Janis and Max Kopelevich, Ph.D.
Dept. of Chemistry & Biochemistry, UCLA

Page 2: Welcome to lecture 3: An introduction to programming in PERL

Last time…

• We covered a bit of material…
• Try to keep up with the reading – it’s all in there!
• How’s it coming along?
  – regex examples? (TATA box, palindrome)…
    • > grep -E --color 'TA(TAAA|TAAT|TATT|ATAA|ATAT)' *.fsa
    • > grep -E --color '(.)(.).\2\1'
  – Using emacs?
  – Let’s ignore the long version of the prosite match for now… we’ll deal with that soon…

Page 3: Welcome to lecture 3: An introduction to programming in PERL

Shell scripting is useful, but…

It does not port or scale well, and complex data structures are awkward to build. Having said that, shell scripting skills have many applications, including:

– Ability to automate tasks, such as
  • Backups
  • Administration tasks
  • Periodic operations on a database via cron
  • Any repetitive operations on files
– Increase your general knowledge of UNIX
  • Use of environment
  • Use of UNIX utilities
  • Use of features such as pipes and I/O redirection

Page 4: Welcome to lecture 3: An introduction to programming in PERL

For bioinformatics, we need a fully featured programming language

There’s a problem with our search of fasta files – can you guess what? We’ll be dealing with this using a programming language with arbitrarily complex data structures.

Perl is a scriptable, portable, interpreted and compiled language:

– Scriptable and portable, and networks well
  • The code remains in text format
  • The code is compiled to an internal form and then interpreted, all at runtime
  • The interpreter has been written for use on every (?) platform
  • Can control a vast number of other devices (files, programs, either local or remote)
– Drawbacks of the language
  • Since it’s compiled and interpreted at runtime rather than compiled ahead of time to machine code, it will generally run slower than equivalent C code
  • There’s a double-edged sword called TMTOWTDI (“There’s More Than One Way To Do It”)
  • Not truly OO; not the most elegant language for algorithm implementation (arguable!)

Page 5: Welcome to lecture 3: An introduction to programming in PERL

PERL: starting point for bioinformatics

• Easy to learn (a bit forgiving)
• Easy to process text files; good language for pattern searching
  – Most biological file formats are text files
  – Most sequence analysis tasks deal with pattern finding at some point
• Easy to run other programs and process their results
  – Similar to shell programming in this regard!

Page 6: Welcome to lecture 3: An introduction to programming in PERL

Extending the shell: Creating Our Own Commands

• Use a programming language to create the new command
• We will use perl
• TASK: write a PERL program that
  – A.) reads a fasta sequence file
  – B.) reverse complements the sequence
  – C.) prints the output to STDOUT
  – D.) Then modify the program to write to a file
    • 1. Using command line REDIRECTION
    • 2. Using PERL to open and write to OUTPUT FILE

Page 7: Welcome to lecture 3: An introduction to programming in PERL

PERL vocabulary – similar to bash functionality

• print
• chomp
• while
• open
• close
• $ARGV[0], $ARGV[1]
• $_
• if. . .else
• =~
• /^>/

Page 8: Welcome to lecture 3: An introduction to programming in PERL

PERL vocabulary. . .EXPLAINED

• print: works like the echo command
• chomp: removes the trailing ‘newline character’
• while: repeats a loop until the breaking condition is met
• open: used to open a file
• close: used to close a file
• $ARGV[0], $ARGV[1]: command line arguments
• $_: the default variable; in a while (<FILE>) loop it holds the current line from the in-file
• if. . .else: [if true perform a, else perform b]
• =~: binding operator (compares text with a reg. exp.)
• /^>/: matches “>” at the very beginning of the line ONLY

Page 9: Welcome to lecture 3: An introduction to programming in PERL

Running a perl script

1. Create a file
   – Specify location of perl
   – Write program
2. Make it executable
3. Run it!

Page 10: Welcome to lecture 3: An introduction to programming in PERL

Example: “Hello world!”

• Write the program:

#!/usr/bin/perl
print("Hello, world!\n");

The first line gives the location of PERL; the second line is a PERL command.

• Make it executable:

> chmod 744 hello.pl

This tells the computer to allow the user to read, write AND execute it. Others can only read it.

• Run it:

> hello.pl
Hello, world!
>

The first line runs the program; the second line is the output.

Page 11: Welcome to lecture 3: An introduction to programming in PERL

Data

• Data is stored in variables.
• A variable is like a box.
• We put values in it.
• There are three ways of storing data:
  – Scalar variables
  – Arrays
  – Hashes
• A single variable (a ‘scalar variable’) can be called anything, but must start with a ‘$’

Page 12: Welcome to lecture 3: An introduction to programming in PERL

Scalar variables: example

#!/usr/bin/perl

$dna = "TGACT";      # defining a variable
print("$dna\n");     # using it

> printVariable.pl
TGACT
>

Page 13: Welcome to lecture 3: An introduction to programming in PERL

Scalar variables (cont.)

• PERL doesn’t differentiate between strings (e.g. “Fred”), integers (e.g. “13”) or floating point numbers (e.g. “16.9”).

• If there’s one piece of information, it’s a scalar variable.

• PERL understands the context you’re working in.

Page 14: Welcome to lecture 3: An introduction to programming in PERL

Scalar variables (cont.)

#!/usr/bin/perl

$dna = "TGACT";               # defining a variable (here it's a string)
print("$dna\n");              # using it
$dna = 11;                    # redefine the variable
print(($dna + 2) . "\n");     # use it in an integer context

> printVariable.pl
TGACT
13
>

Perl worked out what to do.

Page 15: Welcome to lecture 3: An introduction to programming in PERL

Limitations of scalar variables

Imagine we want to find the average of a list of numbers.

• We could do it like this:

# program 1
$number1 = 5.4;
$number2 = 7.3;
$number3 = 4.1;
$average = ( $number1 + $number2 + $number3 ) / 3;

but this is obviously extremely limited: every extra number needs its own variable and a change to the formula.

Page 16: Welcome to lecture 3: An introduction to programming in PERL

Lists

Of course there is a way to make lists in Perl. You can always enclose a list of items in parentheses...

( 5.6, 8.22, 14.9 );              # list of floating point numbers
( "hello", "Canada" );            # list of strings
( "hello", $country );            # mixed list
( "blah", 18, 22, 'x', 3.14 );    # mixed list
( 0 .. 5 );                       # list of integers between 0 and 5
( 'a' .. 'z' );                   # list of strings a,b,c,d......

Page 17: Welcome to lecture 3: An introduction to programming in PERL

Array variables

There is a special type of variable in perl which can hold lists - The array

• Perl knows a variable is an array when we use a special character @
  – Remember, scalars (single valued variables) start with a dollar ($) sign, arrays start with an @ sign.

• Arrays can have as many elements as you need (up to the limits of your available memory, anyway)

@numbers = (5.6, 8.22, 14.9); # list of floating point numbers

Page 18: Welcome to lecture 3: An introduction to programming in PERL

Printing arrays

@words = ("Hello", "Canada!");

print "@words";    # prints Hello Canada!

print @words;      # prints HelloCanada!

• Double quoted strings will print array elements with spaces in between them.
  – No quotes will print array elements all smashed together!

Page 19: Welcome to lecture 3: An introduction to programming in PERL

Accessing array elements

An array wouldn't be very useful if we couldn't look at the individual members of the list.

print "Enter an index number between 0 and 25\n";

$index = <STDIN>;

chomp $index;

@letters = ('A'..'Z');

print "letter index $index = $letters[$index] \n";

What does it mean?

Page 20: Welcome to lecture 3: An introduction to programming in PERL

Accessing array elements

• Arrays are stored in perl's memory in order.
  – Each position (element) in the array has a number
  – This number is called the index
• Each element in an array is a single (scalar) value
• There is magic syntax for addressing individual array elements.
  – This syntax can be a bit bewildering.
• To access an element we type:
  – $array_name[element_number]
• Elements are numbered starting at zero, not one!!

Page 21: Welcome to lecture 3: An introduction to programming in PERL

Setting the values in an array

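Remember ‘ls -1’? We’ll use that here…

@files = `ls -1 *.CEL`;    # BACKQUOTES here
chomp(@files);             # strip the trailing newline from each filename

- the command output is a \n separated list; in list context the backquotes return one element per line
- Any delimiter is ok for other data (use split to break it up)
- Any element can be accessed as a scalar, and any function that acts upon a scalar can be used on it ($file = $files[2];)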

Page 22: Welcome to lecture 3: An introduction to programming in PERL

Indexing arrays with negative numbers

You can index from the end of an array backwards by using negative numbers:

@letters = ('A'..'Z');

print "last letter = $letters[-1] \n";

print "penultimate letter = $letters[-2] \n";

Page 23: Welcome to lecture 3: An introduction to programming in PERL

Getting the length of an array

• You can use the function scalar to evaluate an array in scalar context;
  – the resulting value is the number of elements in the array.

@numbers = (0..100);

print scalar(@numbers); # prints 101
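
The averaging problem from the scalar-variable slide becomes much less limited once the numbers live in an array. What follows is a minimal sketch, not part of the original slides; the sub name average is a hypothetical helper chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: mean of however many numbers are passed in.
sub average {
    my @values = @_;
    return unless @values;             # nothing to average: return undef
    my $total = 0;
    $total += $_ for @values;          # add up every element
    return $total / scalar(@values);   # divide by the number of elements
}

my @numbers = (5.4, 7.3, 4.1);
printf("average = %.2f\n", average(@numbers));
```

Adding a fourth number now only means adding it to @numbers; the formula does not change.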

Page 24: Welcome to lecture 3: An introduction to programming in PERL

Functions that act on arrays

push

Adds a value (or values) to the end of an array

@numbers = (1, 2, 3);

push(@numbers, 4, 5);

print "@numbers \n"; # prints 1 2 3 4 5

Page 25: Welcome to lecture 3: An introduction to programming in PERL

Functions that act on arrays

pop

Removes a single value from the end of an array

@words = ('the', 'quick', 'brown', 'fox');

print pop(@words);    # fox
print pop(@words);    # brown
print pop(@words);    # quick

Page 26: Welcome to lecture 3: An introduction to programming in PERL

Functions that act on arrays

shift

Removes a single value from the beginning of an array

@words = ('the', 'quick', 'brown', 'fox');

print shift(@words);    # the
print shift(@words);    # quick

Page 27: Welcome to lecture 3: An introduction to programming in PERL

Functions that act on arrays

unshift

Pushes a value (or values) onto the front of an array
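
The slide gives no example for unshift, so here is a small sketch in the style of the push example. The word list is just illustrative data.

```perl
use strict;
use warnings;

my @words = ('brown', 'fox');

# unshift adds the new values to the front, keeping them in the order given
unshift(@words, 'the', 'quick');

print "@words\n";    # the quick brown fox
```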

Page 28: Welcome to lecture 3: An introduction to programming in PERL

Functions that act on arrays

reverse

@words = ('the', 'quick', 'brown', 'fox');

print reverse(@words), "\n";

# foxbrownquickthe

Page 29: Welcome to lecture 3: An introduction to programming in PERL

Functions that act on arrays

sort

sort does what you think it does. You give it a list (or array), and it returns a list that is sorted in some way.

@words = ('The', 'quick', 'brown', 'fox', 'jumped');

@sorted = sort(@words);

print "sorted words = @sorted\n";

# The brown fox jumped quick
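
"Sorted in some way" deserves a closer look: by default sort compares elements as strings, which is why 'The' (capital T) comes first above. The sketch below, with made-up data, shows the default next to two common alternatives that pass a comparison block to sort.

```perl
use strict;
use warnings;

my @words   = ('The', 'quick', 'brown', 'fox', 'jumped');
my @numbers = (10, 9, 100, 1);

# Default: string comparison, so "10" and "100" sort before "9"
my @default = sort @numbers;

# Numeric comparison with the <=> operator
my @numeric = sort { $a <=> $b } @numbers;

# Case-insensitive string comparison
my @ignore_case = sort { lc($a) cmp lc($b) } @words;

print "default:     @default\n";        # 1 10 100 9
print "numeric:     @numeric\n";        # 1 9 10 100
print "ignore case: @ignore_case\n";    # brown fox jumped quick The
```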

Page 30: Welcome to lecture 3: An introduction to programming in PERL

Functions that act on arrays

join

@words = ('The', 'quick', 'brown', 'fox', 'jumped');

print join("+", @words), "\n";

# The+quick+brown+fox+jumped

You specify what string you want to join with as the first argument. You can use anything.

Page 31: Welcome to lecture 3: An introduction to programming in PERL

Array summary

• An array is a variable that has multiple values simultaneously.

• We refer to the different values using a number called the index.

Page 32: Welcome to lecture 3: An introduction to programming in PERL

Array example

#!/usr/bin/perl

# defining different entries of an array
$dna[0] = "TATA";
$dna[1] = "ATG";

# print them both
print("$dna[0]\n");
print("$dna[1]\n");

> arrayExample.pl
TATA
ATG
>

Note that square brackets enclose the index.

Page 33: Welcome to lecture 3: An introduction to programming in PERL

What is a hash?

Hashes are similar to arrays in many respects.

Remember, arrays are simple lists stored as a series of elements, and each element has a number (index). The elements are stored in numeric order. It is a bit like a shopping list.

Arrays are limited, in that you need to know which index position contains your value of interest. It might be nice if we could give these index positions names of our choice.

Page 34: Welcome to lecture 3: An introduction to programming in PERL

What is a hash?

Perl has a way to do this, it is called a hash. Perl denotes a hash with a % (percent) sign.

If arrays are shopping lists, hashes are telephone directories. You look up phone numbers by a person's name, not a unique number. They look something like this

%astronomy

value      | key      | to get the value:
-----------+----------+--------------------
'string'   | 'word'   | $astronomy{'word'}

Page 35: Welcome to lecture 3: An introduction to programming in PERL

Making a hash

%re_lookup = (
    'Eco47III' => 'AGCGCT',
    'EcoNI'    => 'CCTNNNNNAGG',
    'EcoRI'    => 'GAATTC',
    'EcoRII'   => 'CCWGG',
    'HincII'   => 'GTYRAC',
    'HindII'   => 'GTYRAC',
    'HindIII'  => 'AAGCTT',
    'HinfI'    => 'GANTC'
);

Page 36: Welcome to lecture 3: An introduction to programming in PERL

Accessing a hash

print "Enter restriction enzyme name\n";
$re = <STDIN>;
chomp $re;

$seq = $re_lookup{$re};
if (defined($seq)) {
    print "RE sequence for $re is: $seq\n";
} else {
    print "Sorry, I don't know about \"$re\"";
}

Page 37: Welcome to lecture 3: An introduction to programming in PERL

Changing values in a hash

Just like we can change individual elements in an array by referring to them by number, we can change values in a hash by referring to them by their key.

$space{'moon'} = 'Titan';

# change "Luna" to "Titan"

Page 38: Welcome to lecture 3: An introduction to programming in PERL

Useful Hash Functions

The keys function takes a hash as argument and returns a list of keys in that hash

The values function takes a hash as argument and returns a list of values in that hash

Page 39: Welcome to lecture 3: An introduction to programming in PERL

Useful Hash Functions

KEYS

%accession_hash = (

"BACR01A01" => "AC005555",

"BACR48E02" => "AC005577",

"BACR24K17" => "AC005101", );

# get all the keys in the hash

@clones = keys %accession_hash;

print "Clone IDs: @clones\n";

# prints the three clone IDs, e.g. BACR01A01 BACR48E02 BACR24K17 (hash order is not guaranteed)

Page 40: Welcome to lecture 3: An introduction to programming in PERL

Useful Hash Functions

VALUES

# get all the values in the hash (hash is a lookup for accessions):

@accs = values %accession_hash;

print "GenBank Accessions: @accs\n";

# prints the three accessions, e.g. AC005555 AC005577 AC005101 (hash order is not guaranteed)
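
Because keys and values come back in no particular order, a common idiom is to sort the keys and look each value up in turn. This is a sketch reusing the accession hash from the KEYS slide, not code from the original lecture.

```perl
use strict;
use warnings;

my %accession_hash = (
    "BACR01A01" => "AC005555",
    "BACR48E02" => "AC005577",
    "BACR24K17" => "AC005101",
);

# Sorting the keys gives a repeatable order for reports
my @report;
foreach my $clone (sort keys %accession_hash) {
    push(@report, "$clone => $accession_hash{$clone}");
}

print "$_\n" for @report;
```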

Page 41: Welcome to lecture 3: An introduction to programming in PERL

Removing elements from a hash

To remove a key value pair from a hash, you can use the delete function

delete $re_lookup{"EcoRI"};

If you just want to delete a value, but keep the key, you could do this:

$re_lookup{"EcoRI"} = "";    # set value to the empty string

Page 42: Welcome to lecture 3: An introduction to programming in PERL

Counting things with a hash

One of the most popular things to do with a hash is to count the number of times something has been seen.

Page 43: Welcome to lecture 3: An introduction to programming in PERL

Counting things with a hash

@things = qw(YOR382W YML383W YML280W);    # a list of accession numbers
%counting = ();                            # initialize a hash
foreach $item (@things) {
    $counting{$item}++;                    # increment the value associated with the key
}
foreach $key (keys %counting) {
    print "$key is found $counting{$key} times \n";
}
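
In the example above every accession occurs once, so each count is 1. The same pattern is more interesting when items repeat; the sketch below counts the bases in a short made-up DNA string (the sequence is illustrative only).

```perl
use strict;
use warnings;

my $dna = "TGACTTAGA";    # made-up example sequence
my %count;

# split with an empty pattern breaks the string into single characters
foreach my $base (split(//, $dna)) {
    $count{$base}++;      # a key that is not yet in the hash starts from zero
}

foreach my $base (sort keys %count) {
    print "$base is found $count{$base} times\n";
}
```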

Page 44: Welcome to lecture 3: An introduction to programming in PERL

Hashes summary

• Hashes are like arrays except instead of a numerical index, we use keys.

• A key can be any string. Numbers and other values used as keys are converted to strings.

• Until you learn to use hashes, you aren’t really using Perl!

Page 45: Welcome to lecture 3: An introduction to programming in PERL

Hashes: example

#!/usr/bin/perl

# defining different entries of the hash
$wife{"Fred"} = "Hannah";
$wife{"Bill"} = "Josephine";
print($wife{"Bill"} . "\n");
print($wife{"Fred"} . "\n");

> testHash.pl
Josephine
Hannah
>

Note that curly braces enclose the key.

Page 46: Welcome to lecture 3: An introduction to programming in PERL

More stuff on variables

• We’ve used the ‘$’ to talk about individual entries for hashes or arrays.
• But referring to the whole array, we use ‘@’.
• Referring to the whole hash, we use ‘%’.

Page 47: Welcome to lecture 3: An introduction to programming in PERL

More stuff on variables

• This becomes useful when looking at properties of an entire array or hash

• For example, the length of an array:

#!/usr/bin/perl

$names[0] = "Bill";
$names[1] = "Fred";
$names[2] = "Bartholomew";

print(scalar(@names) . "\n");    # '@' means we're referring to the whole array

> testScalar.pl
3
>

Page 48: Welcome to lecture 3: An introduction to programming in PERL

Control structures

• All our programs so far have run from start to finish. Each line has been executed in turn.

• What if we only want to run some lines some of the time?

• This is where control structures come in.

Page 49: Welcome to lecture 3: An introduction to programming in PERL

Control structures

• PERL has a number of control structures.
• I’ll talk about four:
  – if
  – while
  – for & foreach
• There are others (e.g. unless)

Page 50: Welcome to lecture 3: An introduction to programming in PERL

‘if’ control structure

#!/usr/bin/perl

$name = "Bill";
if ($name eq "Bill") {
    print("The name is Bill!\n");
} else {
    print("The name isn't Bill!\n");
}

> testIf.pl
The name is Bill!
>

Page 51: Welcome to lecture 3: An introduction to programming in PERL

‘if’ control structure

#!/usr/bin/perl

$name = "Fred";
if ($name eq "Bill") {
    print("The name is Bill!\n");
} else {
    print("The name isn't Bill!\n");
}

> testIf.pl
The name isn't Bill!
>

Page 52: Welcome to lecture 3: An introduction to programming in PERL

Perl has great regular expression support

• Usually, we compare two strings of characters using an equality test:

#!/usr/bin/perl

if ($name eq "Bill") {
    print("The name is Bill!\n");
}

Page 53: Welcome to lecture 3: An introduction to programming in PERL

The real world is fuzzier…

• Maybe we want to see if the name is ‘Bill’ OR ‘bill’.

• The if statement would need to be more complex:

#!/usr/bin/perl

if (($name eq "Bill") || ($name eq "bill")) {
    print("The name is Bill!\n");
}

Page 54: Welcome to lecture 3: An introduction to programming in PERL

This is where regular expressions come in.

• Regular expressions describe generalised patterns of strings instead of exact strings.

• For example, the first problem was:

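if (($name eq "Bill") || ($name eq "bill")) {
    print("The name is Bill!\n");
}

• But can be re-written:

if ($name =~ /[Bb]ill/) {
    print("The name is Bill!\n");
}

(As written, this pattern also matches any name that merely contains “Bill” or “bill”, such as “Billy”; the ‘^’ and ‘$’ anchors restrict it to the whole string.)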

Page 55: Welcome to lecture 3: An introduction to programming in PERL

Another example…

• The phone number pattern from before (using GREP) can also easily be tackled in perl:
• (clearly the pattern syntax is very similar… we only need to tell perl that the expression should be treated as a regular expression)
  – We do this by prepending and appending ‘/’ (forward slashes) to the expression

if ($number =~ /([0-9]{3} ){0,1}[0-9]{3} [0-9]{4}/) {
    print("The number is a valid phone number!\n");
}

Page 56: Welcome to lecture 3: An introduction to programming in PERL

First principles of regex in perl

if ($name =~ /red/) {
    print("Name contains the text 'red'!\n");
}

Here $name is the variable and /red/ is the regular expression.

Page 57: Welcome to lecture 3: An introduction to programming in PERL

Special characters (metachars)

(the following is a review of what we learned for grep!)

‘.’ is a wildcard and matches any character

$input = $ARGV[0];
if ($input =~ /.ed/) {
    print("Yes!\n");
}

> testRegExp.pl bed
Yes!
> testRegExp.pl red
Yes!
> testRegExp.pl head
> testRegExp.pl edward
>

(‘edward’ does not match: the ‘.’ needs some character in front of ‘ed’.)

Page 58: Welcome to lecture 3: An introduction to programming in PERL

Special characters(‘metacharacters’)

‘*’ means ‘zero or more of the previous character’.

$input = $ARGV[0];
if ($input =~ /be*d/) {
    print("Yes!\n");
}

> testRegExp.pl bed
Yes!
> testRegExp.pl red
> testRegExp.pl beeeed
Yes!
> testRegExp.pl bd
Yes!
>

Page 59: Welcome to lecture 3: An introduction to programming in PERL

Special characters(‘metacharacters’)

‘+’ means ‘one or more of the previous character’.

$input = $ARGV[0];
if ($input =~ /be+d/) {
    print("Yes!\n");
}

> testRegExp.pl bed
Yes!
> testRegExp.pl red
> testRegExp.pl beeeed
Yes!
> testRegExp.pl bd
>

Page 60: Welcome to lecture 3: An introduction to programming in PERL

Start and end of line

‘^’ designates the start of the line, ‘$’ the end.

$input = $ARGV[0];
if ($input =~ /bed/) {
    print("Yes!\n");
}

> testRegExp.pl bed
Yes!
> testRegExp.pl bedbed
Yes!
> testRegExp.pl xxxbedxxx
Yes!
>

$input = $ARGV[0];
if ($input =~ /^bed$/) {
    print("Yes!\n");
}

> testRegExp.pl bed
Yes!
> testRegExp.pl bedbed
> testRegExp.pl xxxbedxxx
>

Page 61: Welcome to lecture 3: An introduction to programming in PERL

Grouping with parentheses

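Parentheses group characters, so a quantifier applies to the whole group

$input = $ARGV[0];
if ($input =~ /^(bed)+$/) {
    print("Yes!\n");
}

> testRegExp.pl bed
Yes!
> testRegExp.pl bedbed
Yes!
> testRegExp.pl beddd
>

(The ‘^’ and ‘$’ anchors matter here: an unanchored /(bed)+/ would also match ‘beddd’, because that string contains ‘bed’.)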

Page 62: Welcome to lecture 3: An introduction to programming in PERL

Character classes

• The square brackets are used to denote whole groups of characters

$input = $ARGV[0];
if ($input =~ /[brf]ed/) {
    print("Yes!\n");
}

> testRegExp.pl bed
Yes!
> testRegExp.pl red
Yes!
> testRegExp.pl led
>

Page 63: Welcome to lecture 3: An introduction to programming in PERL

Character classes (cont)

• A hyphen designates a range:

$input = $ARGV[0];
if ($input =~ /[a-z]ed/) {
    print("Yes!\n");
}

> testRegExp.pl bed
Yes!
> testRegExp.pl fed
Yes!
> testRegExp.pl Bed
>

Page 64: Welcome to lecture 3: An introduction to programming in PERL

Character class shortcuts

• Some character classes are so common there are in-built shortcuts:

– [0-9] = \d
– [A-Za-z0-9_] = \w (note that the underscore is included)
– [\f\t\n\r ] = \s
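
To see the shortcuts working together, here is a minimal sketch that checks whether a line looks like "name, whitespace, number". The sub name looks_like_record and the sample lines are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical check: one or more word characters, some whitespace,
# then one or more digits, and nothing else on the line.
sub looks_like_record {
    my ($line) = @_;
    return $line =~ /^\w+\s+\d+$/ ? 1 : 0;
}

print looks_like_record("gene_1\t42"), "\n";    # 1: \w matches letters, digits and '_'
print looks_like_record("gene-1 42"), "\n";     # 0: '-' is not matched by \w
print looks_like_record("gene1 forty"), "\n";   # 0: \d needs digits
```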

Page 65: Welcome to lecture 3: An introduction to programming in PERL

Negating a character

• ‘^’ as the first character inside square brackets negates the character class. Note the context determines whether ‘^’ is negation or start-of-line!

$input = $ARGV[0];
if ($input =~ /[^b]ed/) {
    print("Yes!\n");
}

> testRegExp.pl red
Yes!
> testRegExp.pl bed
>

$input = $ARGV[0];
if ($input =~ /^bed/) {
    print("Yes!\n");
}

> testRegExp.pl red
> testRegExp.pl bed
Yes!
>

Page 66: Welcome to lecture 3: An introduction to programming in PERL

Quantifying

• Curly brackets quantify repeats more precisely than ‘*’ (0+) or ‘+’ (1+)

a{3,5} = three, four or five ‘a’s.

$input = $ARGV[0];
if ($input =~ /la{3,5}d/) {
    print("Yes!\n");
}

> testRegExp.pl laaaad
Yes!
> testRegExp.pl laaaaaaad
>

Page 67: Welcome to lecture 3: An introduction to programming in PERL

Using parentheses as memory

• Remember that parentheses group things? What they match is stored in variables $1, $2, $3…

$input = $ARGV[0];
if ($input =~ /^(.*)e(.)$/) {
    print("$1\n$2\n");
}

> testRegExp.pl fred
fr
d
> testRegExp.pl bad
>
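
Capture variables are handy for pulling fields out of a FASTA header line. The following is a sketch under stated assumptions: the header layout (">identifier description"), the sub name parse_header, and the sample header are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a FASTA header into identifier and description.
# $1 is the run of non-space characters after '>', $2 is the rest of the line.
sub parse_header {
    my ($line) = @_;
    if ($line =~ /^>(\S+)\s*(.*)$/) {
        return ($1, $2);
    }
    return;    # not a header line: return an empty list
}

my ($id, $description) = parse_header(">seq1 hypothetical example record");
print "id = $id\n";
print "description = $description\n";
```

Always test that the match succeeded before using $1 and $2; after a failed match they still hold whatever an earlier successful match left behind.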

Page 68: Welcome to lecture 3: An introduction to programming in PERL

Interpolating variables

• We can place variables inside regular expressions

$input = $ARGV[0];
$name = "fred";
if ($input =~ /$name/) {
    print("Contains $name!\n");
}

> testRegExp.pl fred
Contains fred!
> testRegExp.pl bill
>

Page 69: Welcome to lecture 3: An introduction to programming in PERL

Using regular expressions to substitute parts of strings.

• Another useful thing with regular expressions is to use them to substitute parts of a string for other parts.

• My favourite use: strip a trailing slash from a path:

$input = $ARGV[0];
$input =~ s/\/$//;
print("$input\n");

> testRegExp.pl /usr/bin/tmp/
/usr/bin/tmp
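
Adding the g modifier makes a substitution replace every match rather than just the first one. As a small sketch with a made-up sequence, this turns a DNA string into its RNA equivalent by replacing each T with U; the substitution also returns how many replacements it made.

```perl
use strict;
use warnings;

my $dna = "TGACT";    # made-up example sequence
my $rna = $dna;       # work on a copy so $dna is left untouched

# s///g replaces every T; the return value is the number of substitutions
my $changed = ($rna =~ s/T/U/g);

print "$dna -> $rna ($changed substitutions)\n";
```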

Page 70: Welcome to lecture 3: An introduction to programming in PERL

The ‘for’ control structure

• The ‘for’ control structure is ideal for looping through arrays

Page 71: Welcome to lecture 3: An introduction to programming in PERL

For Loops

Consider the standard while loop in pseudocode:

initialization code

while ( Test code ) {

Code to execute in body

} continue {

Update code

}

Page 72: Welcome to lecture 3: An introduction to programming in PERL

For Loops

This can be generalized into the concise for loop:

for ( initialization code; test code; update code ) {

body code

}

Page 73: Welcome to lecture 3: An introduction to programming in PERL

‘for’ example

#!/usr/bin/perl

$name[0] = "Bill";
$name[1] = "Fred";
$name[2] = "Bartholomew";

for ($nameIndex = 0; $nameIndex < scalar(@name); $nameIndex++) {
    print("$name[$nameIndex]\n");
}

> testFor.pl
Bill
Fred
Bartholomew
>

Page 74: Welcome to lecture 3: An introduction to programming in PERL

Foreach Loop has similar application

foreach will process each element of an array or list:

foreach $loop_variable ('item1','item2','item3') {
    print $loop_variable,"\n";
}

Page 75: Welcome to lecture 3: An introduction to programming in PERL

‘foreach’ example

#!/usr/bin/perl

$name[0] = "Bill";
$name[1] = "Fred";
$name[2] = "Bartholomew";

foreach $currentName (@name) {
    print("$currentName\n");
}

> testForeach.pl
Bill
Fred
Bartholomew
>

$currentName is assigned each value in the array @name in turn.

Page 76: Welcome to lecture 3: An introduction to programming in PERL

Opening files

• We can open other files with our PERL script.

• This is the real strength of PERL: processing text files.

• It’s easy!

Page 77: Welcome to lecture 3: An introduction to programming in PERL

Opening files (cont.)

• To open a file, we need to assign it a ‘file handle’ – this is the unique identifier we use to refer to the file:

open(INPUTFILE, "names.txt");

Here INPUTFILE is the filehandle, and "names.txt" is the name of the file we want to open and assign to the filehandle.

• When we’re finished, we should close the file:

close(INPUTFILE);
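
The open call above does not check whether the file could actually be opened. A common variation, sketched here and not taken from the slides, uses the three-argument form of open with a lexical filehandle and stops with an error message on failure. To keep the sketch self-contained it first writes a throwaway file using the core File::Temp module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small temporary file so the example can run anywhere
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
print $tmp_fh "Fred\nBill\nBartholomew\n";
close($tmp_fh);

# '<' means open for reading; $! holds the system error message on failure
open(my $in, '<', $file) or die "Cannot open $file: $!";

my @names;
while (my $line = <$in>) {
    chomp($line);
    push(@names, $line);
}
close($in);

print scalar(@names), " names read\n";
```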

Page 78: Welcome to lecture 3: An introduction to programming in PERL

While Loops

A while loop has a condition at the top. The code within the body will execute until the test becomes false.

while ( TEST ) {
    Code to execute
} continue {
    Optional code to execute at the end of each loop
}

Page 79: Welcome to lecture 3: An introduction to programming in PERL

The ‘while’ control structure

• The ‘while’ control structure keeps looping while a given condition is satisfied

#!/usr/bin/perl

while (1 == 1) {
    print("This is a really annoying infinite loop\n");
}

> whileTest.pl
This is a really annoying infinite loop
This is a really annoying infinite loop
This is a really annoying infinite loop
This is a really annoying infinite loop
This is a really annoying infinite loop

Ad nauseam…

Page 80: Welcome to lecture 3: An introduction to programming in PERL

Combining while loops with opening files

• ‘while’ and open files go together very well:

#!/usr/bin/perl

open(INPUTFILE, "names.txt");
while ($inputLine = <INPUTFILE>) {
    chomp($inputLine);          # remove the newline that was read from the file
    print("$inputLine\n");
}
close(INPUTFILE);

(names.txt looks like this)
Fred
Bill
Bartholomew

> whileTest.pl
Fred
Bill
Bartholomew
>

Page 81: Welcome to lecture 3: An introduction to programming in PERL

split

• A good use for regular expressions is to use them to define delimiting character(s).
• My favorite use: separating tab-delimited lines into an array:

while ($input = <STDIN>) {
    @lineContents = split(/\t/, $input);
    print($lineContents[0] . "\n");
}

(data.txt; the two columns are separated by a tab)
X    1
Y    3
Z    6

> testRegExp.pl < data.txt
X
Y
Z
>

Page 82: Welcome to lecture 3: An introduction to programming in PERL

Until Loops

Sometimes you want to loop until some condition becomes true, rather than until some condition becomes false. The until loop is easier to read than the equivalent while (!TEST).

my $counter = 5;
until ( $counter < 0 ) {
    print $counter--, "\n";
}

Page 83: Welcome to lecture 3: An introduction to programming in PERL

Executing external programs

• Another strength of PERL is that it can be used to run external programs.

• For example, say we have a C++ program that takes a PDB file and calculates inter-Cα distances, outputting them like this (tab separated):

1    10    9.23

The first column is one Cα, the second is the other Cα, and the third is the distance between them in angstroms.

Page 84: Welcome to lecture 3: An introduction to programming in PERL

Example

• We could write a PERL script to calculate the average inter-Cα distance:

#!/usr/bin/perl

$PDBFile = "1a8l.pdb";
@results = `getDistances $PDBFile`;    # the reverse quotes run the program and collect its output lines in @results
$total = 0;
$count = 0;

foreach $line (@results) {
    chomp($line);
    ($carbon1, $carbon2, $distance) = split(/\t/, $line);    # split the line at every tab
    $total = $total + $distance;
    $count++;
}
print("Average distance: " . ($total / $count) . "\n");

Page 85: Welcome to lecture 3: An introduction to programming in PERL

Our FASTA pattern problem

• Our problem with pattern matching across FASTA files is the lack of cohesive sequence (it runs across many lines)
• Furthermore, our DNA sequence download only has one strand direction (why? Think programmatically!)
• We need to solve that
  – To do so, we need to read in the file and choose a data structure appropriate for our needs
  – Which one should we use?

Page 86: Welcome to lecture 3: An introduction to programming in PERL

PERL data structures we can use

• $stringName – scalars – strings, perl handles datatype conversions
• @arrayName – arrays – indexed by position, starting at 0
• Function(@arrayName) – manipulation of arrays
• $arrayName[$index] – scalar access to a single array element
• %hashName – hashes – index non-sequentially (aka “associative arrays”) – we’ll talk more about these in coming lectures

Page 87: Welcome to lecture 3: An introduction to programming in PERL

Basic concept for our task

Use control structures to impose logic:

Read Command Line Arguments
Open Fasta File
While open {
  – Read each line of Fasta File
  – If line starts with “>”,
    • print to out file
  – Else,
    • reverse complement the line
}
Close Fasta File
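
The core of this task can be sketched as follows. This is one possible approach, not the official homework solution: the sub name reverse_complement and the sample record are made up, and unlike the per-line outline above, the sketch joins all sequence lines of a record before reverse complementing, so that a sequence spanning several lines comes out in the right order. It works on an in-memory list of lines; reading from a file and handling several records are left to the homework.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse a DNA string and swap each base for its partner
sub reverse_complement {
    my ($seq) = @_;
    my $rc = scalar reverse($seq);     # scalar context reverses the characters
    $rc =~ tr/ACGTacgt/TGCAtgca/;      # A<->T and C<->G, preserving case
    return $rc;
}

# A made-up single-record FASTA file, already split into chomped lines
my @fasta_lines = (">example_seq hypothetical record", "ATGCC", "GGTA");

my ($header, $sequence) = ('', '');
foreach my $line (@fasta_lines) {
    if ($line =~ /^>/) {
        $header = $line;               # header line: keep as is
    } else {
        $sequence .= $line;            # sequence line: append to the record
    }
}

print "$header\n";
print reverse_complement($sequence), "\n";
```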

Page 88: Welcome to lecture 3: An introduction to programming in PERL

Emacs commands

• http://sip.clarku.edu/tutorials/intro_emacs.html

• http://www.badgertronics.com/writings/cvs/emacs.html

(in your reading material -> copious emacs cmds)

Page 89: Welcome to lecture 3: An introduction to programming in PERL

Emacs text editor

• Use either term or GUI
  – (‘> emacs -nw’)
  – (‘> emacs’)
• Able to load ASCII and binary files and show metadata (windows conversions)
• Spell check, search, replace (see readings)
• Markup language handling for all file types, formatting (LaTeX, etc.)

Page 90: Welcome to lecture 3: An introduction to programming in PERL

Write Seq File & Program

[Screenshot of the program file in the editor, with callouts: date & version; program description; explanation of major steps; place holder for the remaining steps]

Page 91: Welcome to lecture 3: An introduction to programming in PERL

WATCH OUT FOR TYPOS!!

/^>/ vs. /^>?

Page 92: Welcome to lecture 3: An introduction to programming in PERL
Page 93: Welcome to lecture 3: An introduction to programming in PERL

Homework problem 3

• Finish writing the perl program for reverse complementing a fasta sequence
• Use cat “file_of_fields” | awk . . .
  – To reorder the first and last field on each line
  – To select just the 1st and 5th fields of each line
  – To select 1st and 5th field and add “human” as a field between the 1st and 5th fields
• Use cat “file_of_fields” | awk . . . | grep . . .
  – To select only lines containing ‘trans_factors’
  – Use redirection operator to write the output to a file called “human disease genes”

Estimated time:
– perl: 15 – 90 mins
– cat, awk, grep: 5 to 15 mins

Page 94: Welcome to lecture 3: An introduction to programming in PERL

Homework Set 4

• Use STDIN instead of command line argument to read file, make the program work using STDIN. (Hint. cat seq.fa | revcomp.pl)

while (<STDIN>) {
    ...
}

(Estimated time: 15 – 60 minutes)

Page 95: Welcome to lecture 3: An introduction to programming in PERL

Homework Set #5

• Modify the output portion of the program to make a 2nd command line argument ($ARGV[1]) provide the name of an output file for the reverse complemented sequence.

• open (OUTPUT, ">$out_put_file_name");
• print OUTPUT "$_\n";
• close (OUTPUT);

(Estimated time: 15 – 60 mins)

Page 96: Welcome to lecture 3: An introduction to programming in PERL

Important Advice!!!

• Save your program frequently!!
• cp revcomp.pl revcomp_BKUP.pl
• Save intermediate versions
  – cp revcomp.pl revcomp_STDIN.pl
  – cp revcomp.pl revcomp_FILEOUT.pl
  – Etc……