Perl for Bioinformatics Part 2
Stuart Brown
NYU School of Medicine
Sources
• Beginning Perl for Bioinformatics
  – James Tisdall, O’Reilly Press, 2000
• Using Perl to Facilitate Biological Analysis in Bioinformatics: A Practical Guide (2nd Ed.)
  – Lincoln Stein, Wiley-Interscience, 2001
• Introduction to Programming and Perl
  – Alan M. Durham, Computer Science Dept., Univ. of São Paulo, Brazil
Debugging
• Hopefully you were lucky enough to have some bugs in your programs from the first Perl exercise.
• Test each line as you write: insert extra print statements to check on variables
Perl Debugging Help
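• Add -w to the first line of your programs:
    #!/usr/local/bin/perl -w
  – provides ‘warnings’, for example when you use a variable that has no value yet
  – the path must start with / and match where perl is installed on your system
• Add use strict; as the 2nd line of your programs
  – requires every variable to be declared with my before it is used, which catches misspelled variable names
  – give each variable an initial value when you declare it (such as 0 or empty)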
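A minimal sketch of what use strict catches, assuming Perl 5 with the standard strict and warnings pragmas (use warnings is the in-script equivalent of -w). The variable names are made up for illustration, and the misspelled variable is placed inside a string eval only so that this demonstration can report the error instead of stopping.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $total = 10;          # declared with 'my', so strict is satisfied

# $totl is a deliberate typo. Under 'use strict' the code in the string
# fails to compile, and eval puts the error message in $@.
my $ok    = eval q{ $totl = $total + 1; 1 };
my $error = $@;
print "strict caught a problem: $error" unless $ok;
```

Without use strict, Perl would quietly create a second variable called $totl and the typo would go unnoticed.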
Variable “Interpolation”
• A variable holds a value
    my $value = 6;
• When you print the variable, Perl gives the value rather than the name of the variable.
    print $value;
    6
• If you put a variable inside double quotes, Perl substitutes the value (this is called variable interpolation)
    print "The result is $value\n";
    The result is 6
• If you use single quotes, the variable name is printed as written (interpolation is not used), and \n is not turned into a newline either
    print 'The result is $value\n';
    The result is $value\n
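The difference between the two kinds of quotes can be sketched as follows. The ${value} form with braces is an addition not covered on the slide; it is standard Perl syntax for marking where a variable name ends when letters follow it directly.

```perl
use strict;
use warnings;

my $value = 6;
my $double = "The result is $value";   # interpolated: The result is 6
my $single = 'The result is $value';   # literal text, no interpolation
# Braces mark where the variable name ends when letters follow it.
my $braces = "Length: ${value}bp";     # Length: 6bp

print "$double\n$single\n$braces\n";
```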
Input
• A Perl program can take input from the keyboard
  – The angle bracket operator (<>) takes input
  – Usually this is assigned to a variable
    print "Please type a number: ";
    my $num = <>;
    print "Your number is $num\n";
chomp
• When data is entered from the keyboard, Perl waits for the Enter key to be typed
• But the string which is captured includes a newline character at its end
• Perl uses the function chomp to remove the newline character:
    print "Enter your name: ";
    my $name = <>;
    print "Hello $name, happy to meet you!\n";    # line breaks right after the name
    chomp $name;
    print "Hello $name, happy to meet you!\n";    # prints on one line
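To see exactly what chomp removes, here is a small sketch that can run without anyone typing. It assumes an in-memory filehandle as a stand-in for the keyboard (the name typed is made up); in a real script you would read from <> or <STDIN> instead of <$in>.

```perl
use strict;
use warnings;

# Stand-in for the keyboard: an in-memory filehandle holding what a user
# might type, including the newline added by the Enter key.
my $typed = "Rosalind\n";
open(my $in, '<', \$typed) or die "Cannot open in-memory input: $!";

my $name   = <$in>;                # "Rosalind\n"
my $before = length $name;         # 9, newline included
chomp $name;                       # removes the trailing newline only
my $after  = length $name;         # 8

print "Hello $name, happy to meet you!\n";
close $in;
```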
Working with Text Files
• To do real work, Perl has to read data out of text files and write results into output files
• This is done in two steps
• First, you must give the file a name within the script - this is known as a filehandle
• Use the open command:
    open FILE1, '/u/schmoj01/Seqs/protein1.seq';
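The open command can fail, for instance when the file does not exist. A hedged sketch of checking for that, using the three-argument form of open with a lexical filehandle (a common alternative to the two-argument form on the slide, not the slide's own style). The path is hypothetical and chosen so that it should not exist.

```perl
use strict;
use warnings;

# Hypothetical path that should not exist, used to show the failure case.
my $path = '/no/such/dir/protein1.seq';

# Three-argument open with a lexical filehandle; '<' means read-only.
# open returns false on failure and sets $! to the system error message.
my $opened  = open(my $fh, '<', $path);
my $message = $opened ? "opened $path" : "Cannot open $path: $!";
print "$message\n";
close $fh if $opened;
```

In a real script the usual shorthand is open(...) or die "Cannot open $path: $!"; which stops the program with the same message.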
Read From the File
• Once the file is open, you can read from it using the <> operator
  – put the filehandle between the angle brackets
• Perl reads files one line at a time; each time you read from the file, the next line is returned:
    open FILE1, '/u/prot1.seq';
    my $line1 = <FILE1>;
    chomp $line1;
    my $line2 = <FILE1>;
    …etc
Write to a File
• Writing to a file is similar to reading from it
• Use the > operator to open a file for writing:
    open FILE1, '>/u/prot1.seq';
• This creates a new file with that name, or overwrites an existing file
• Use >> to append text to an existing file
• print to the file using the filehandle:
    print FILE1 $data1;
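A small sketch of overwrite versus append, under the assumption that a temporary file (created with the core File::Temp module) stands in for a real output file. It uses the three-argument form of open, where the mode ('>', '>>' or '<') is given separately from the file name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file stands in for a real output file.
my ($tmp, $filename) = tempfile(UNLINK => 1);
close $tmp;

open(my $out, '>', $filename) or die "Cannot write $filename: $!";
print $out "first run\n";      # '>' starts the file from scratch
close $out;

open($out, '>>', $filename) or die "Cannot append to $filename: $!";
print $out "second run\n";     # '>>' adds to the end
close $out;

open(my $in, '<', $filename) or die "Cannot read $filename: $!";
my @lines = <$in>;
close $in;
print @lines;
```

Note that there is no comma between the filehandle and the data in a print to a file.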
Making Decisions
• Useful programs must be able to make some decisions on their own
• The if statement is very powerful
• It is generally used together with numerical or string comparison operators
    numerical: ==, !=, >, <, >=, <=
    strings: eq, ne, gt, lt, ge, le
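Choosing the wrong family of operators is a common source of bugs, because the same two values can compare differently as numbers and as strings. A short illustration with made-up values:

```perl
use strict;
use warnings;

# Numeric operators compare values; string operators compare character by character.
my $num_equal = ('10' == '10.0') ? 'yes' : 'no';   # same number
my $str_equal = ('10' eq '10.0') ? 'yes' : 'no';   # different text
my $num_less  = (9 < 10)         ? 'yes' : 'no';   # 9 is smaller than 10
my $str_less  = ('9' lt '10')    ? 'yes' : 'no';   # '9' sorts after '1'

print "== $num_equal, eq $str_equal, < $num_less, lt $str_less\n";
```

For sequence data (letters), eq and ne are almost always the operators you want.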
True/False
• Perl relies on the concept of True/False decisions.
• A comparison is true if the relationship it states holds: 5 > 3 is true, 5 < 3 is false.
• The not operator ! reverses it
    print "not a negative number" if ! ($a < 0);
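Perl also applies true/false to plain values, not only to comparisons. As a sketch with an arbitrary list of test values: the number 0, the string '0', the empty string and undef are false, and every other scalar value is true.

```perl
use strict;
use warnings;

# Values Perl treats as false: 0, '0', the empty string, and undef.
# Every other scalar is true, including '0.0' and a single space.
my @tests    = (0, '0', '', undef, 1, -1, 'ACGT', '0.0', ' ');
my @verdicts = map { $_ ? 'true' : 'false' } @tests;

print join(', ', @verdicts), "\n";
```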
Conditional Blocks
• An if test can be used to control multiple lines of commands:
    print "Enter your age: ";
    my $age = <>;
    chomp $age;
    if ($age < 21) {
        print "You are too young for this kind of work!\n";
        die "too young";
    }
    print "You are old enough to know better!\n";
• If the test is true, execute all the command lines inside the {} brackets. If not, then go on past the closing } to the statements below.
• if evaluates the expression in parentheses (must be true or false)
• Note: conditional block is indented
  – Perl doesn’t care about indents, but it makes your code more human readable
• die is a special function - stops your script and prints its message
  – Often used to test if keyboard input data is valid or if an input file exists.
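One way to use die for checking input, sketched under some assumptions: check_age is a hypothetical helper, the pattern /^\d+$/ (a regular expression meaning "digits only") is an addition beyond what the slides cover, and eval is used here only so the demonstration can show the message and keep going.

```perl
use strict;
use warnings;

# check_age is a hypothetical helper: it dies on bad input and
# returns a message otherwise.
sub check_age {
    my ($age) = @_;
    die "not a whole number: '$age'\n" unless $age =~ /^\d+$/;
    if ($age < 21) {
        return "You are too young for this kind of work!";
    }
    return "You are old enough to know better!";
}

print check_age(30), "\n";

# eval catches the die so this demonstration can continue; a normal
# script would simply stop and print the message.
my $result = eval { check_age('abc') };
my $error  = $@;
print "die said: $error" unless defined $result;
```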
Else & Elsif
• Instead of just letting the script go on if it fails the if test, you can designate a second block of code for the “or else” condition
• You can also perform multiple tests using elsif (note the spelling: Perl has no elseif)
    if ($A == 10) {
        print "yadda yadda";    # do some stuff
    } elsif ($A > 10) {
        print "yowsa yowsa";    # do different stuff
    } elsif ($A < 10) {
        print "do this other stuff";
    } else {
        print "if it ain't =, >, or <, then I'm stumped";
        die "not a number";
    }
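The same chain of tests can be wrapped in a small function and tried on a few values. This is a sketch: compare_to_ten is a hypothetical name, and the final else simply covers everything the first two tests rejected.

```perl
use strict;
use warnings;

# compare_to_ten is a hypothetical helper name.
sub compare_to_ten {
    my ($n) = @_;
    if ($n == 10) {
        return 'equal to 10';
    } elsif ($n > 10) {
        return 'greater than 10';
    } else {
        return 'less than 10';
    }
}

print "$_ is ", compare_to_ten($_), "\n" for (3, 10, 42);
```

Perl checks the tests from top to bottom and runs only the first block whose test is true.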
Loops
• OK, we’ve got variables, input & output and decisions. Now we need Loops.
• Loops test a condition and repeat a block of code based on the result
  – while loops repeat while the condition is true
    my $count = 1;
    while ($count <= 10) {
        print "$count bottles of pop\n";
        $count = $count + 1;
    }
    print "POP!\n";
[Try this program yourself]
Read a File: line by line
    open FILE1, '/u/doej01/prot1.seq';
    my $my_sequence = '';
    while (my $line = <FILE1>) {
        chomp($line);
        $my_sequence = $my_sequence . $line;
    }
    close FILE1;
• Dumps the whole file into the variable $my_sequence
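The loop can be tried without a sequence file by reading from a string instead. In this sketch an in-memory filehandle stands in for prot1.seq and the sequence text is made up; the loop body is the same idea as on the slide, with a line counter added.

```perl
use strict;
use warnings;

# An in-memory "file" stands in for prot1.seq; the sequence is made up.
my $file_text = "MKTAYIAK\nQRQISFVK\nSHFSRQ\n";
open(my $fh, '<', \$file_text) or die "Cannot open in-memory file: $!";

my $my_sequence = '';              # start empty so there is something to add to
my $line_count  = 0;
while (my $line = <$fh>) {
    chomp $line;                   # drop the newline before joining
    $my_sequence .= $line;         # .= is shorthand for $x = $x . $line
    $line_count++;
}
close $fh;

print "$line_count lines, sequence: $my_sequence\n";
```

The loop ends by itself when <$fh> runs out of lines, because reading past the end of a file gives a false value.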
Arrays
• It is awkward to store a large DNA sequence in one variable, or to create many variables for a list of numbers
• Perl has a type of variable called an “array” that can store a list of data
  – multiple lines of a text file
  – a list of numbers
  – a list of words
• Array variables are referred to with an “@” symbol
    my @numbers = (1,2,45,234,11);
Bioinformatics Uses Arrays
• Bioinformatics data often comes in the form of arrays
  – tab delimited lists
  – multi-line text files
• Arrays are handy because the entries are indexed
  – You can grab the fourth number directly
    my @numbers = (1, 2, 45, 234, 11);
    print "$numbers[3]\n";
    234
    # Note - the index starts with zero!
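A few more ways of getting at array entries, as a sketch. Negative indexes, scalar and split are standard Perl but go slightly beyond the slide, and the tab-delimited record is made up for illustration.

```perl
use strict;
use warnings;

my @numbers = (1, 2, 45, 234, 11);

my $first  = $numbers[0];          # 1   (index 0 is the first element)
my $fourth = $numbers[3];          # 234
my $last   = $numbers[-1];         # 11  (negative indexes count from the end)
my $count  = scalar @numbers;      # 5   (an array in scalar context gives its size)

# A tab-delimited line becomes an array with split; the record is made up.
my @fields = split /\t/, "geneA\tchr1\t1500";

print "first=$first fourth=$fourth last=$last count=$count\n";
print "second field: $fields[1]\n";
```

A single element is written with $ rather than @, because one element is a scalar value.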
Read a File into an Array
• Rather than read a file one line at time into a scalar variable, it is often helpful to read the entire file into an array
    open FILE1, '/u/doej01/prot1.seq';
    my @DNA = <FILE1>;
• join combines the elements of an array into a single scalar variable (a string)
    my $DNA = join('', @DNA);
• substr takes characters out of a string
    my $letter = substr($DNA, $position, 1);
join & substr
• substr($DNA, $position, 1)
  – $DNA: which string
  – $position: where in the string
  – 1: how many letters to take
• join('', @DNA)
  – '': spacer (empty here)
  – @DNA: which array
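A worked example of join and substr on made-up data. One assumption worth stating: lines read from a file still end in a newline, so this sketch calls chomp on the whole array before joining; without it the newlines would end up inside the joined string.

```perl
use strict;
use warnings;

# Lines as they would arrive from a file, each ending in a newline (made-up data).
my @DNA = ("ACGT\n", "TTGA\n");
chomp @DNA;                          # chomp on an array strips every element

my $DNA    = join('', @DNA);         # "ACGTTTGA"
my $tabbed = join("\t", @DNA);       # same elements with a tab as the spacer

my $letter = substr($DNA, 2, 1);     # position 2 is the third letter: "G"
my $codon  = substr($DNA, 0, 3);     # first three letters: "ACG"

print "$DNA $letter $codon\n";
```

Positions in substr start at zero, just like array indexes.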
Exercise
• Read a DNA sequence from a text file
• Calculate the %GC content
• What about non-DNA characters in the file?
  – carriage returns and blank spaces
  – N’s or X’s or unexpected letters
• Write the output to the screen and to a file
  – use append so that the file will grow as you run this program on additional sequences