Page 1VI, March 2005
Practical Extraction and Report Language
« Perl is a language for getting your job done »
Larry Wall
« There is more than one way to do it »
Practical Extraction and Report Language
http://perl.oreilly.com
"Perl is both a programming language and an application on your computer that runs those programs"
Perl history
A few dates:
1969 UNIX was born at Bell Labs.
1970 Brian Kernighan suggested the name "Unix" and the operating system we know today was born.
1972 The programming language C is born at Bell Labs (C is one of Perl's ancestors).
1973 "grep" is introduced by Ken Thompson as an external utility: Global Regular Expression Print.
1976 Steve Jobs and Steve Wozniak found Apple Computer (1 April).
1977 The computer language awk is designed by Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan (awk is one of Perl's ancestors).
Perl history
1987 Perl 1.000 is unleashed upon the world
NAME
perl - Practical Extraction and Report Language

SYNOPSIS
perl [options] filename args

DESCRIPTION
Perl is an interpreted language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It's also a good language for many system management tasks. The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal). It combines (in the author's opinion, anyway) some of the best features of C, sed, awk, and sh, so people familiar with those languages should have little difficulty with it. (Language historians will also note some vestiges of csh, Pascal, and even BASIC-PLUS.) Expression syntax corresponds quite closely to C expression syntax. If you have a problem that would ordinarily use sed or awk or sh, but it exceeds their capabilities or must run a little faster, and you don't want to write the silly thing in C, then perl may be for you. There are also translators to turn your sed and awk scripts into perl scripts. OK, enough hype.
Perl history
1994 Perl5: last major release (Currently Perl 5.8.6).
1995 Creation of the CPAN repository of modules and documentation (Comprehensive Perl Archive Network).
2005 Perl 5.8.6
Supported Operating Systems: Unix systems / Macintosh (OS 7-9 and X) / Windows / VMS

Perl Features
- Perl's database integration interface (DBI) supports third-party databases including Oracle, Sybase, Postgres, MySQL and others.
- Perl works with HTML, XML, and other markup languages.
- Perl supports Unicode.
- Perl is Y2K compliant.
- Perl supports both procedural and object-oriented programming.
- Perl interfaces with external C/C++ libraries through XS or SWIG.
- Perl is extensible. There are over 500 third-party modules available from CPAN.
Perl history
Perl and the Web
Perl is the most popular web programming language due to its text manipulation capabilities and rapid development cycle.
Perl's CGI.pm module, part of Perl's standard distribution, makes handling HTML forms simple.
Perl can handle encrypted Web data, including e-commerce transactions.
Perl can be embedded into web servers (mod_perl) to speed up processing by as much as 2000%.
Perl's DBI package makes web-database integration easy.
Perl Hello world !
My first program (hello.pl):

#!/usr/local/bin/perl
use strict;
use warnings;

#tell the program to print "Hello world"
print "Hello world" ;

#tell the program to exit
exit ;

The first line of a Perl program is called the "command interpretation" line or "shebang line". It starts with "#!" followed by the path to the Perl interpreter, and tells the computer that this is a Perl program.

To find out whether you should use /usr/bin/perl or /usr/local/bin/perl, type "which perl" in your shell:

computerX: vioannid$ which perl
/usr/bin/perl

computerY: vioannid$ which perl
/usr/local/bin/perl
Perl Hello world !
My first program (hello.pl):
use strict;
A command like "use strict" is called a pragma. Pragmas are instructions to the Perl interpreter to do something special when it runs your program. "use strict" does two things that make it harder to write bad software:
It makes you declare all your variables, and it makes it harder for Perl to mistake your intentions when you are using subroutines.
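A minimal sketch of what "use strict" catches. The variable names ($count and the deliberately misspelled $cuont) are made up for this illustration, and the snippet is compiled with a string eval only so that the error can be printed instead of stopping the script:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $count = 1;

# The misspelled name $cuont was never declared with "my".
# Under "use strict" Perl refuses to compile such code; the string eval
# lets us catch that compile error and look at the message in $@.
my $ok = eval q{ $cuont = $count + 1; 1 };

if (!$ok) {
    print "strict refused the snippet:\n$@";
}
```

Without "use strict", the same typo would silently create a second, unrelated variable.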
ALL STATEMENTS END IN A SEMICOLON ";" (similar to the use of the period "." in the English language)
Perl Hello world !
My first program (hello.pl):
use warnings;
Comments are good, but the most important tool for writing good Perl is "warnings". Turning on warnings will make Perl yelp and complain at a huge variety of things that are almost always sources of bugs in your programs.
Perl normally takes a relaxed attitude toward things that may be problems: it assumes that you know what you're doing, even when you don't…
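As an illustration, here is a small sketch of two typical mistakes that "use warnings" reports. The variable names are hypothetical, and the $SIG{__WARN__} handler is used here only to collect the messages so they can be printed together; normally they simply go to STDERR:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Collect the warnings in an array instead of letting them go to STDERR
my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };

my $name;                          # declared, but never given a value
my $greeting = "Hello $name !";    # warning: use of uninitialized value

my $text  = "3 apples";
my $total = $text + 2;             # warning: argument isn't numeric

print "total is $total\n";
print "Perl warned about:\n", @caught;
```

Both statements still run (the total is 5), which is exactly why the warnings are useful: without them the program would carry on silently.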
Perl Hello world !
My first program (hello.pl):
Comments
All lines starting with "#" are not taken into account in the execution of the program.
Good comments are short, but instructive. They tell you things that aren't clear from reading the code.
Blank lines or spaces are also not taken into account in the execution of the program. However, they help in the reading of the code.
Perl Hello world !
My first program (hello.pl):
Print statement:
… prints !
By default, the standard output is the shell window from which the program is executed.
Perl Hello world !
My first program (hello.pl):
The exit statement:
Tells the computer to exit the program.
Although not explicitly required in Perl, it is definitely common.
Perl Hello world !
My first program (hello.pl):
(Do not forget to make the file executable: vioannid$ chmod a+x perl_01.pl )

output:
vioannid$ ./perl_01.pl
Hello worldvioannid$
Perl Hello world !!
Print:

#!/usr/local/bin/perl
use strict;
use warnings;

#play with the print statement

#words separated by newline
print "Hello\nworld\n" ;

#words separated by a tab & a final newline
print "Hello\tworld\n" ;

#usage of the period to concatenate strings
print "Hello"."world"."\n";

#tell the program to exit
exit ;

output:
vioannid$ ./perl_02.pl
Hello
world
Hello	world
Helloworld
vioannid$

Important: the end-of-line character depends on the operating system:
Unix & all Unix flavors: \n
Mac OS (classic, up to version 9): \r
Windows: \r\n
Perl variables
Perl has 3 data types: scalars / arrays / hashes
scalars
a single string (of any size, limited only by the available memory), or a number, or a reference to something
Scalar values are always named with '$' (even when referring to a scalar that is part of an array or a hash). The '$' symbol works semantically like the English word "the" in that it indicates a single value is expected.
my $variable_1 = "Hello world !\n"; #note the quotes
my $variable_two = 30; #note the absence of quotes
$marks[4]; # the fifth element of the array @marks
Perl variables
Perl has 3 data types: scalars / arrays / hashes
arrays (of scalars)
Normal arrays are ordered lists of scalars indexed by number (starting with 0).
Entire arrays are denoted by '@', which works much like the word "these" or "those" does in English, in that it indicates multiple values are expected.

my @numbers = ("One", "Two", "Three", "Four", "Five");

index  0    1    2      3     4
value  One  Two  Three  Four  Five

$numbers[0] = "One"; $numbers[1] = "Two"; … #single elements are scalars: '$' and no "my"
@numbers = (1..5); #same as "@numbers = (1, 2, 3, 4, 5);"
my @anyarray = (6, "hello", @numbers);
Perl variables
Perl has 3 data types:
hashes (associative arrays of scalars)
Hashes are unordered collections of scalar values indexed by their associated string key.
Entire hashes are denoted by '%'.

my %var = ("a","first","b","3");
my %codon3 = ("TTT" => "Phe","TTA" => "Leu",
);
print $codon3{'TTT'};

Key  Value
TTT  Phe
TCT  Ser
TGT  Cys
TAT  Tyr
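The hash operations above can be sketched as follows. The codon table is the one from the slide; the lookups ("TGT", "GGG") and the added "TTA" entry are just examples chosen for this illustration:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my %codon3 = (
    "TTT" => "Phe",
    "TCT" => "Ser",
    "TGT" => "Cys",
    "TAT" => "Tyr",
);

# A single value is a scalar, so it is written with '$' and curly braces
my $residue = $codon3{"TGT"};

# exists() tells whether a key is present without creating it
my $known = exists $codon3{"GGG"} ? "yes" : "no";

# Assigning to a new key adds a key/value pair
$codon3{"TTA"} = "Leu";

print "TGT codes for $residue\n";
print "Is GGG in the table? $known\n";

# Hashes are unordered, so sort the keys to get a stable listing
foreach my $codon (sort keys %codon3) {
    print "$codon => $codon3{$codon}\n";
}
```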
Perl special variables (small extract)
$_ The default input and pattern-searching space.
$& The string matched by the last successful pattern match.
$` The string preceding whatever was matched by the last successful pattern match.
$' The string following whatever was matched by the last successful pattern match.
$! If a system or library call fails, it sets this variable. This means that the value of $! is meaningful only immediately after a failure.
$/ The input record separator, newline by default.
$$ The process number of the Perl running this script.
@ARGV The command-line arguments (space separation by default).
note: $ARGV[0] is the first command-line argument …
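A short sketch showing some of these special variables at work. The sequence fragment, the pattern and the file path are made-up values for illustration only (the path is assumed not to exist):

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# $_ : foreach puts each element in $_ when no loop variable is named
my $joined = "";
foreach ("Phe", "Leu", "Ser") {
    $joined .= lc($_) . " ";
}
print "lower case: $joined\n";

# $` $& $' : the text before, inside and after the last successful match
my $sequence = "VVDFSATWCGPCKMIKPFF";
my ($before, $match, $after) = ("", "", "");
if ($sequence =~ /WCGPC/) {
    ($before, $match, $after) = ($`, $&, $');
}
print "before: $before\nmatch:  $match\nafter:  $after\n";

# $! : only meaningful right after a failed system call
my $error = "";
open(my $fh, "<", "/no/such/dir/no_such_file.txt") or $error = "$!";
print "open failed: $error\n";

# $$ and @ARGV
print "process number: $$\n";
print "number of command-line arguments: ", scalar(@ARGV), "\n";
```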
Perl variables
Programs using variables :

#!/usr/local/bin/perl
use strict;
use warnings;

my $name = "John Doe";
print "Hello $name !\n" ;
exit ;

#!/usr/local/bin/perl
use strict;
use warnings;

my $name = $ARGV[0];
print "Hello $name !\n" ;
exit ;

#!/usr/local/bin/perl
use strict;
use warnings;

print "\nEnter your name (then press \"return\" when done):\t";

#get information from the
#terminal window
my $name = <STDIN>;
print "Hello $name !\n" ;
exit ;

Interpolation & quoting:
single and double quotes have different meanings
…
my $price = '$100'; #single quotes: the text is taken literally
print "the price is $price"; #double quotes: $price is replaced by its value
#this is called interpolation
…
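The difference between the two kinds of quotes can be sketched like this (the variable names and the price are arbitrary examples):

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $price = 100;

# Double quotes: variables and escapes such as \n are interpolated
my $double = "The price is $price\n";

# Single quotes: everything is taken literally, including $price and \n
my $single = 'The price is $price\n';

# A backslash stops interpolation of a single '$' inside double quotes
my $escaped = "The price is \$$price\n";

print $double;           # The price is 100
print $single, "\n";     # The price is $price\n
print $escaped;          # The price is $100
```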
Perl variables
Program using variables :
#!/usr/local/bin/perl
use strict;
use warnings;

my @names = ("Pedro", "Claire", "Yemima", "Fabien" , "RochPhilippe", "Francisco", "Sandra Yukie",
"Simona", "Christophe", "Dominique", "Michaela", "Lionel", "Gabriele", "Michael", "Charlotte",
"Subhash", "Adam", "Sebastian", "Tu", "Sergey", "Olusegun", "Joel", "Uta", "Viviane", "Stanislav",
"Kyrill", "Petr", "Sebastien");

print "Hello\n @names !\n" ;
exit ;

Some array functions:
sort: returns the elements of an array in sorted order.
reverse: returns the elements of an array in reverse order.
shift, unshift: shift removes and returns the first element; unshift places an element at the first position of the array.
pop, push: pop removes and returns the last element; push places an element at the last position of the array.
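A small sketch of these functions on a shortened list of names; the comments show the content of the arrays after each step:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @names = ("Pedro", "Claire", "Yemima", "Fabien");

# sort and reverse return a new list; @names itself is not changed
my @sorted   = sort @names;       # Claire Fabien Pedro Yemima
my @reversed = reverse @names;    # Fabien Yemima Claire Pedro

# shift / unshift work at the beginning of the array
my $first = shift @names;         # $first is Pedro; @names: Claire Yemima Fabien
unshift @names, "Uta";            # @names: Uta Claire Yemima Fabien

# pop / push work at the end of the array
my $last = pop @names;            # $last is Fabien; @names: Uta Claire Yemima
push @names, "Joel";              # @names: Uta Claire Yemima Joel

print "sorted:   @sorted\n";
print "reversed: @reversed\n";
print "removed:  $first and $last\n";
print "now:      @names\n";
```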
Perl statement modifiers
Any simple statement may optionally be followed by a SINGLE modifier, just before the terminating semicolon (or block ending). The possible modifiers are:

STATEMENT if EXPR;
STATEMENT unless EXPR;
STATEMENT while EXPR;
STATEMENT until EXPR;
STATEMENT foreach LIST;

The same keywords also introduce compound statements with a block, for example if (EXPR) { } or while (EXPR) { }; this is the form used in the following examples.

The EXPR following the modifier is referred to as the "condition". Its truth or falsehood determines how the modifier will behave.

if executes the statement once if and only if the condition is true.
unless is the opposite: it executes the statement if the condition is false (unless the condition is true).
The foreach modifier is an iterator: it executes the statement once for each item in the LIST (with $_ aliased to each item in turn).
while repeats the statement while the condition is true.
until does the opposite: it repeats the statement until the condition is true (or while the condition is false).
The while and until modifiers have the usual "while loop" semantics (conditional evaluated first).
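The modifier form can be sketched as follows. All variable names and values ($debug, $quiet, the counters) are hypothetical and only serve the illustration:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @log;
my $debug = 1;
my $quiet = 0;

# if / unless: the statement runs once, depending on the condition
push @log, "debugging is on" if $debug;
push @log, "not in quiet mode" unless $quiet;

# foreach: the statement runs once per item, with $_ set to the item
push @log, "name: $_" foreach ("Pedro", "Claire");

# while / until: the condition is checked before each repetition
my $up = 0;
$up++ while $up < 3;        # stops when $up reaches 3

my $down = 10;
$down-- until $down <= 7;   # stops when $down reaches 7

print "$_\n" foreach @log;
print "up = $up, down = $down\n";
```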
Perl statement modifiers
if / if else / if elsif else

#!/usr/local/bin/perl
use strict;
use warnings;

print "\nEnter your name (then press \"return\" when done):\t";

#get information from the terminal window
my $name = <STDIN>;

#remove trailing "\n" if any
chomp $name;

if ($name eq "Couchepin") {
    print "Hello Mr President !\n" ;
}
else {
    print "Hello $name !\n" ;
}

exit ;
Perl statement modifiers
if / if else / if elsif else (name.pl) :

#!/usr/local/bin/perl
use strict;
use warnings;

print "\nEnter your name (then press \"return\" when done):\t";

#get information from the terminal window
my $name = <STDIN>;

#remove trailing "\n" if any
chomp $name;

if ($name eq "Couchepin") {
    print "Hello Mr President !\n" ;
}
elsif ($name eq "Falquet") {
    print "Good day to you Master $name !\n" ;
}
else {
    print "Hello $name !\n" ;
}

exit ;
Perl statement modifiers
Perl looping the for/foreach loop :

"Passing an array":
foreach my $element ( @array ) {
    # do something with the element
}

"Passing a hash":
foreach my $key (keys %hash) {
    print "The value of $key is $hash{$key}\n";
}

"Specify 3 EXPR inside the (): initial state, condition and loop expression":
for (my $i = 0; $i <= 10; $i = $i + 1 ) {
    #execute the contents of the block as long as $i is less than or equal to 10
}
Perl statement modifiers
Perl looping the for/foreach loop :
#!/usr/local/bin/perl
use strict;
use warnings;

my @names = ("Pedro", "Claire", "Yemima", "Fabien" , "RochPhilippe", "Francisco", "Sandra Yukie",
"Simona", "Christophe", "Dominique", "Michaela", "Lionel", "Gabriele", "Michael", "Charlotte",
"Subhash", "Adam", "Sebastian", "Tu", "Sergey", "Olusegun", "Joel", "Uta", "Viviane", "Stanislav",
"Kyrill", "Petr", "Sebastien", "Haleh");

foreach my $name (@names) {
    print "Hello $name !\n";
}

exit ;
Perl statement modifiers
Perl looping the for/foreach loop :
#!/usr/local/bin/perl
use strict;
use warnings;

my $counter;

for ($counter=1;$counter<=10;$counter++){
    print "I can count up to $counter !\n";
}

exit ;
Perl statement modifiers
Perl looping the while loop
while ( condition ) {
    #execute the contents of the block
}

ATTENTION: Infinite Loop !!!

while (1) {
    #execute the contents of the block forever !
}

True/False
In Perl some values are considered true:
- number with a nonzero value
- string with nonzero length (except the string "0")
- array with at least one element
- hash with at least one key/value pair

For example:
$lang = "Perl"; # true
$version = 5.6; # true
$zero = 0; # false
$empty = ""; # false
@states = (); # false
%table = (1 => "one"); # true
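These rules can be checked with a short sketch. The helper truth() is a hypothetical name introduced for this example; note the string "0", which is false although its length is not zero, and the string "0.0", which is true:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: reports how Perl evaluates a value in a condition
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

my @states = ();
my %table  = (1 => "one");

my @checks = (
    [ 'number 5.6',         truth(5.6) ],
    [ 'number 0',           truth(0) ],
    [ 'string "Perl"',      truth("Perl") ],
    [ 'string ""',          truth("") ],
    [ 'string "0"',         truth("0") ],
    [ 'string "0.0"',       truth("0.0") ],
    [ 'empty array',        truth(scalar @states) ],
    [ 'hash with one pair', truth(scalar %table) ],
);

foreach my $check (@checks) {
    printf "%-20s %s\n", @$check;
}
```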
Perl statement modifiers
Perl looping the while loop

In this version $number never changes, so the loop never ends:

#!/usr/local/bin/perl
use strict;
use warnings;

my $number = 1;

while ($number<=10) {
    print "I can count up to $number !\n";
}

exit ; #really ?

Tip:
To stop a "looping" script press CTRL+C …

Incrementing $number inside the block fixes the problem:

#!/usr/bin/perl
use strict;
use warnings;

my $number = 1;

while ($number<=10) {
    print "I can count up to $number !\n";
    $number+=1; #Ha !
}

exit ;
Perl statement modifiers
Perl looping while loop / do until
while loop: "Activity" may never be executed.
do until: "Activity" is executed at least once !
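The difference can be sketched by starting both loops with a condition that is already finished (the starting value 11 and the counter names are arbitrary choices for this example):

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# while: the condition is tested first, so the block may never run
my $number     = 11;
my $while_runs = 0;
while ($number <= 10) {
    $while_runs++;
    $number++;
}

# do ... until: the block runs first, the condition is tested afterwards
$number = 11;
my $do_runs = 0;
do {
    $do_runs++;
    $number++;
} until ($number > 10);

print "while block executed $while_runs time(s)\n";
print "do-until block executed $do_runs time(s)\n";
```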
Perl operators
Arithmetic
+ addition
- subtraction
* multiplication
/ division

Numeric comparison
== equality
!= inequality
< less than
> greater than
<= less than or equal
>= greater than or equal

String comparison
eq equality
ne inequality
lt less than
gt greater than
le less than or equal
ge greater than or equal

Why do we have separate numeric and string comparisons?
Because we don't have special variable types, and Perl needs to know whether to sort numerically (where 99 is less than 100) or alphabetically (where 100 comes before 99).
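A minimal sketch of the two ways of sorting the same values. It uses the comparison operators cmp (strings) and <=> (numbers), which are not in the table above but are the standard operators passed to sort; the list of values is an arbitrary example:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @values = (100, 99, 9, 1000);

# cmp compares as strings: "100" comes before "99" because "1" lt "9"
my @as_strings = sort { $a cmp $b } @values;

# <=> compares as numbers
my @as_numbers = sort { $a <=> $b } @values;

print "alphabetical: @as_strings\n";   # 100 1000 9 99
print "numerical:    @as_numbers\n";   # 9 99 100 1000
```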
Perl operators
#!/usr/local/bin/perl
use strict;
use warnings;

my $x = 100;
my $y = 99;

if ($x > $y) {
    print "\"$x\" is numerically greater than \"$y\"\n" ;
}
else {
    print "\"$x\" is numerically smaller than \"$y\"\n" ;
}

if ($x gt $y) {
    print "\"$x\" is alphabetically greater than \"$y\"\n" ;
}
else {
    print "\"$x\" is alphabetically smaller than \"$y\"\n" ;
}

exit ;

Output:
"100" is numerically greater than "99"
"100" is alphabetically smaller than "99"
Perl operators
Boolean logic
&& and
|| or
! not

Miscellaneous
= assignment
. string concatenation
x string multiplication
.. range operator (creates a list of numbers)

Many operators can be combined with a "=" as follows:
$a += 1; # same as $a = $a + 1, same as $a++
$a -= 1; # same as $a = $a - 1, same as $a--
$a .= "\n"; # same as $a = $a . "\n";
Perl functions
Functions in Perl are called subroutines
Functions are useful to avoid typing redundant code over and over.
Functions help in the clarity of scripts.
There are already many available functions in Perl:
http://search.cpan.org/~nwclark/perl-5.8.6/pod/perlfunc.pod
syntax of Perl subroutines:
sub name_of_subroutine {
    my (list of arguments) = @_;
    list of statements to execute
    return some value;
}
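A minimal sketch of a complete subroutine. The name add_numbers and its arguments are hypothetical; the point is that the arguments arrive in the special array @_ and the result is handed back with return:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical subroutine: adds two numbers
sub add_numbers {
    my ($first, $second) = @_;    # copy the arguments out of @_
    return $first + $second;
}

my $sum = add_numbers(2, 3);
print "2 + 3 = $sum\n";
```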
Perl functions
#!/usr/local/bin/perl
use strict;
use warnings;

my $height = 220;
my $weight = 120;

#to calculate the BFI you need the height in cm and the weight in kg
my $bfi = &cal($height, $weight);
print "$bfi\n";
exit;

sub cal {
    if (@_ != 2) {
        die "&cal should get exactly two arguments!\n" ;
    }
    my ($cm, $kg) = @_ ;
    my $index = ($kg)/(($cm / 100)*($cm / 100));
    return $index;
}

Output:
24.7933884297521

Notice on Body Fat Index (BFI; the formula used here, weight / height², is usually called the Body Mass Index, BMI):
BFI < 20 => weight is too low
20 < BFI < 25 => weight is correct
BFI > 25 => Oops !
Perl functions
#!/usr/local/bin/perl
use strict;
use warnings;

my @names = ("Pedro", "Claire", "Yemima", "Fabien", "Uta");

foreach (@names) {
    my $size = length($_);
    print "*"x($size+2),"\n";
    print "*$_*\n";
    print "*"x($size+2),"\n";
}

exit ;

Output:
*******
*Pedro*
*******
********
*Claire*
********
********
*Yemima*
********
********
*Fabien*
********
*****
*Uta*
*****

What if you need this "pretty print" more than once ?

my @names1 = ("Pedro", "Claire", "Yemima", "Fabien" ,"Uta");
my @names2 = ("Sandra Yukie", "Simona", "Christophe", "Dominique");
my @names3 = ("Lionel", "Michael", "Charlotte", "Subhash", "Adam");
my @names4 = ("Sebastian", "Tu", "Sergey", "Olusegun", "Joel", "Viviane");
my @names5 = ("Stanislav", "Kyrill", "Petr", "Sebastien", "Haleh");
Perl functions
#!/usr/local/bin/perl
use strict;
use warnings;

my @names1 = ("Pedro", "Claire", "Yemima", "Fabien" ,"Francisco");
my @names2 = ("Sandra Yukie", "Simona", "Christophe", "Dominique", "Michaela");
my @names3 = ("Lionel", "Gabriele", "Michael", "Charlotte", "Subhash", "Adam");
my @names4 = ("Sebastian", "Tu", "Sergey", "Olusegun", "Joel", "Uta", "Viviane");
my @names5 = ("Stanislav", "Kyrill", "Petr", "Sebastien", "Haleh");

&pretty_print(@names1);
&pretty_print(@names2);
&pretty_print(@names3);
&pretty_print(@names4);
&pretty_print(@names5);

exit ;

sub pretty_print {
    foreach (@_) {
        my $size = length($_);
        print '*'x($size+2),"\n";
        print "*$_*\n";
        print '*'x($size+2),"\n";
    }
}
Perl File handles
A "file handle" is a connection between your Perl script and the outside world.
You can open a file for input or output using the open() function.
open(INFILE, "input.txt") or die "Can't open input.txt: $!"; #read
open(OUTFILE, ">output.txt") or die "Can't open output.txt: $!"; #write (overwrites the file)
open(LOGFILE, ">>logfile") or die "Can't open logfile: $!"; #append

print() can also take an optional first argument specifying which filehandle to print to:

print STDERR "This is your final warning\n";
print OUTFILE $record;
print LOGFILE $logmessage;

Use whatever filehandle name you like, EXCEPT: STDIN, STDOUT, STDERR !
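The same three modes (write, append, read) can be sketched as a complete script. Note that this sketch uses a different style from the slide: the three-argument form of open() with filehandles stored in "my" variables, which is available in Perl 5.6 and later. The temporary file from the core module File::Temp is only there to make the example self-contained:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create an empty temporary file that is deleted when the script ends
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close $tmp_fh;

# ">" opens for writing (the file is overwritten)
open(my $out, ">", $filename) or die "Can't open $filename for writing: $!";
print $out "first line\n";
close $out;

# ">>" opens for appending
open(my $log, ">>", $filename) or die "Can't open $filename for appending: $!";
print $log "second line\n";
close $log;

# "<" opens for reading
open(my $in, "<", $filename) or die "Can't open $filename for reading: $!";
my @lines = <$in>;
close $in;

print "read ", scalar(@lines), " lines:\n", @lines;
```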
Perl File handles
Perl special file handles
There are three connections that always exist and are always "open" when your program starts:
STDIN, STDOUT, and STDERR.
Actually, these names are file handles. File handles are variables used to manipulate files.
STDIN reads from standard input, which is usually the keyboard in a normal Perl script (or input from a browser in a CGI script; cgi-lib.pl reads from this automatically).
STDOUT (Standard Output) and STDERR (Standard Error) by default write to a console (or a browser in CGI).
We have been using the STDOUT file handle without knowing it for every print() statement during this presentation. The print() function uses STDOUT as the default if no other file handle is specified.
Perl File handles
You can read from an open filehandle using the "<>" operator.
In scalar context it reads a single line (or a single record) from the filehandle, and in list context it reads the whole file in, assigning each line to an element of the list:

my $line = <INFILE>;
my @lines = <INFILE>;

Reading in the whole file at one time is called slurping. It can be useful but it may be a memory hog. Most text file processing can be done a line at a time with Perl's looping constructs.
The "<>" operator is most often seen in a while loop:

while (<INFILE>) { # assigns each line in turn to $_
    print "Just read in this line: $_";
}

When you're done with your filehandles, you should close() them (though Perl will clean up after you if you forget…):

close INFILE;

You can replace the default record separator "\n" with something else:
$/ = "\/\/\n"; #for a file containing SwissProt entries
$/ = ">"; #for a fasta file
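A sketch of reading fasta records with $/ set to ">". To keep the example self-contained, the "file" is a string opened as an in-memory filehandle (Perl 5.8 and later); the entry names seq1 and seq2 and their sequences are made up for the illustration:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Made-up fasta data; in a real script this would be a file on disk
my $fasta = ">seq1\nVKLIESKEAF\nQEALAAAGDK\n>seq2\nVKQIESKYAF\n";

open(my $in, "<", \$fasta) or die "Can't open in-memory file: $!";

my @records;
{
    local $/ = ">";    # one record per ">" instead of one per "\n"

    while (my $record = <$in>) {
        chomp $record;             # chomp removes the current $/, here ">"
        next if $record eq "";     # nothing precedes the first ">"

        # The first line is the header, the remaining lines are the sequence
        my ($header, @sequence_lines) = split /\n/, $record;
        push @records, [ $header, join("", @sequence_lines) ];
    }
}
close $in;

foreach my $entry (@records) {
    print "$entry->[0]: $entry->[1]\n";
}
```

Using "local" inside a block restores the normal line-by-line behaviour as soon as the block ends.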
Perl regular expressions
Idea: powerful way to search for text patterns …
>sw:THIO_RAT/110 VKLIESKEAFQEALAAAGDKLVVVDFSATWCGPCKMIKPFFHSLCDKY ……
>te:CB530525/66168 VKQIESKYAFQEALNSAGEKLVVVDFSATWCGPCKMIKPFFHSLSEKY ……
>tr:Q5R9M3_PONPY/210 VKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEKY ……
>tg:NT039170_956/56151 VKLIESKEAFQEALAAERDKLVMVDFSATWCGPCKMIKPFFHSSCDKY ……
>te:CV502349/88193 VSLITTKESWDQKLAEAKKegKIVIANFSASWCGPCRMISPFYCELKY ……
>sw:TRXL2_ARATH/98174 ITSAEQFLNALKDAGDRLVIVDFYGTWCGSCRAMFPKLCKFGHTAKEH ……
>te:OMY_1368_2/13111 ISSEEQWEEALSGPGLLVIEVYQRWCGPCKAVQNIFRKLRSHTHHTEY ……
>te:CA246724/110160 SKATYDEQWAAhkSSGKLMVIDFSASWCGPCRFIEPAFKELTHTASRF ……
>tr:Q84XR8_CHLRE/68169 ILTADTYHGFLEKNAEKLVVTDFYAVWCGPCKVIAPEIERTLANEMMT ……
>tg:AL772421_11/578 KLVVIEFGASWCEPSRRIAPVFAEYAKKMNKDKNDHDKDGDKDGMKEF ……
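As a first taste, here is a sketch that searches three of the sequences above for a short pattern. The pattern WCG[PS]C ("WCG", then P or S, then C) is an assumption chosen for this illustration because most of the listed sequences share this stretch; it is not taken from the slides. The identifiers are shortened to the part before the "/":

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my %sequences = (
    "sw:THIO_RAT"    => "VKLIESKEAFQEALAAAGDKLVVVDFSATWCGPCKMIKPFFHSLCDKY",
    "sw:TRXL2_ARATH" => "ITSAEQFLNALKDAGDRLVIVDFYGTWCGSCRAMFPKLCKFGHTAKEH",
    "tg:AL772421_11" => "KLVVIEFGASWCEPSRRIAPVFAEYAKKMNKDKNDHDKDGDKDGMKEF",
);

my %found;
foreach my $id (sort keys %sequences) {
    # [PS] matches one character that is either P or S;
    # the parentheses capture the matched text in $1
    if ($sequences{$id} =~ /(WCG[PS]C)/) {
        $found{$id} = $1;
        print "$id: found $1\n";
    }
    else {
        print "$id: pattern not found\n";
    }
}
```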
Perl