
Page 1: Perl Scripting for Biologists -

Perl Scripting for Biologists

Page 2: Perl Scripting for Biologists -

Perl (www.perl.org)

• A preferred programming language in bioinformatics

• Easy to learn and write

• Simple programs are quick to write, yet the language is very powerful

• Good for file and text manipulation, database access, graphical and web programming

• Derives from
– C language
– Unix shell
– sed
– awk

Page 3: Perl Scripting for Biologists -

www.cpan.org

Page 4: Perl Scripting for Biologists -

www.bioperl.org

Page 5: Perl Scripting for Biologists -

Where can I get Perl?

• Unix/Linux
– Installed as standard, or get the package

• Apple
– Included from OSX 10.3

• Microsoft
– www.perl.com: compiling from source
– Executable distributions
  • Strawberryperl.com
  • www.activestate.com/activeperl

Page 6: Perl Scripting for Biologists -

Editing programs

• Why not use Microsoft Word?
– Embedded control characters in file formats
– No syntax highlighting / auto indentation
– No integration with other development tools

• Some tools:
– Emacs
– Vi, vim, gvim
– Eclipse
– Xcode (Apple)

Page 7: Perl Scripting for Biologists -

My first program

• Editing the program
– Open emacs (or your favorite editor)
– First line should be #!/usr/bin/perl
– Enter the program code ( print "Hello World!\n"; )
– Save (helloworld.pl)

• Execute the program in the terminal

$ perl helloworld.pl

$ perl -c helloworld.pl # compile only

Page 8: Perl Scripting for Biologists -

Adding execute permissions

$ chmod +x helloworld.pl

• Check with ls -l

• Once execute permissions are set, the program can be run directly:

$ ./helloworld.pl

Page 9: Perl Scripting for Biologists -

Adding comments

• Any line starting with a # sign is a comment

• Everything after a # sign on a line is a comment (unless the # is inside a quoted string)

• For example

# this is a comment

print "Hello World\n"; # another comment

Page 10: Perl Scripting for Biologists -

Hello World program

#!/usr/bin/perl

# my first Perl program

print "Hello World\n";

Page 11: Perl Scripting for Biologists -

Development Cycle

Edit → Compile → Run / Test → (back to Edit)

Page 12: Perl Scripting for Biologists -

Program compilation/interpretation

• At runtime the program is first compiled, then executed

• Syntax errors: the program will not run

# this is a syntax error (the opening quote is missing)

print Hello World\n";

Execute the program in the terminal

$ perl helloworld.pl

$ perl -c helloworld.pl # compile only

Page 13: Perl Scripting for Biologists -

Program format

• First line is #!/usr/bin/perl

• The .pl file extension is optional

• Statements end with a semicolon ;

• Comments: lines beginning with #

• Syntax errors: program will not run

• Variables: scalars ($), arrays (@), hashes (%)

Page 14: Perl Scripting for Biologists -

Scalar variables

#!/usr/bin/perl

# my first Perl program

$message = "Hello World!";

print "$message\n";

Page 15: Perl Scripting for Biologists -

Scalar variables

#!/usr/bin/perl

$a = 2;

$b = 5;

#sum

$result = $a + $b;

# print it

print "Result is: $result\n";

Page 16: Perl Scripting for Biologists -

Scalar Variables

• Scalars are prefixed with a $ sign

• Valid variable names begin with a letter or an underscore, followed by any number of letters, digits and underscores (_).

$foo

$chromosome_number

$block13

$a123b

$test1A2

$test1_a_2

• Capital letters are legal, but not often used in Perl variable names.

• Most programmers don't use camelCase for Perl variables (such as $chromosomeNumber)

Page 17: Perl Scripting for Biologists -

Variable assignments

• Assignment operator is “=”

• Examples

$r = 4; # assigning an integer

$pi = 3.14159; # assigning a real

$foo = "hello"; # assigning a string (note the quotes)

$bar = 'Ciao!'; # alternate set of quotes

$area = $pi * ($r ** 2); # do some math: area of a circle

Page 18: Perl Scripting for Biologists -

Variables

• Numeric values can be written as integers, decimals, or in scientific notation, with or without a sign

$a = 134 ;

$a = -2004 ;

$a = 56.79 ;

$a = -56.7913 ;

$a = 7.25e24 ;

$a = -12E-29 ;

Page 19: Perl Scripting for Biologists -

Variables

• Write strings with double or single quotes

$a = ""; # empty string

$a = "BTI's bioinformatics course";

$a = 'BTI bioinformatics course';

Page 20: Perl Scripting for Biologists -

Variables

• Use double quotes for escape sequences and variable interpolation

$a = "BTI perl course\n"; # \n is a new line

$a = "TAB1\tTAB2\tTAB3\n"; # \t is a tab

$a = "$n students in the $course"; # variables are replaced by their values

• A literal double quote or backslash is escaped with a backslash (\" \\)

$a = "$n students in the \"BTI perl course\"";
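To make the difference between the two kinds of quotes concrete, here is a minimal sketch. The variable names $n and $course come from the slide; the values assigned to them are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative values, not from the course material
my $n      = 12;
my $course = "BTI perl course";

# Double quotes: variables are interpolated, \" and \n are escape sequences
my $double = "$n students in the \"$course\"\n";

# Single quotes: everything is taken literally, including $n and \n
my $single = '$n students in the $course\n';

print $double;
print $single, "\n";
```

The first print shows the values of the variables; the second shows the text exactly as typed, dollar signs and backslash included.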

Page 21: Perl Scripting for Biologists -

Writing safe code

• By default, no variable declaration is necessary in Perl

• BAD!!!!!!!!!

• Make variable declaration mandatory:

– use strict;

– At the beginning of the program (after #!/usr/bin/perl)

• Another good directive is

– use warnings;

– Turns on warnings during compile / execution.
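As a rough illustration of what use strict protects against, the sketch below compiles a line containing a misspelled variable name. Normally such an error stops the whole script before it runs; here the faulty line is kept in a string and compiled with eval so that the error message can be captured and shown. The names $total and $totl are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $total = 10;

# The code in this string has a typo ($totl instead of $total).
# eval compiles it at run time, so the strict error ends up in $@
# instead of stopping the script.
my $code   = 'my $doubled = $totl * 2; $doubled;';
my $result = eval $code;
my $error  = $@;

if ($error) {
    print "strict caught a problem:\n$error";
}
```

Without use strict, the misspelled variable would silently be treated as a new, empty variable and the result would be 0.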

Page 22: Perl Scripting for Biologists -

Writing safe code

Declaring variables

• Several ways

– Global variables
  • Variables are “global” in scope, valid anywhere in your program
  • Declaration: using the keyword our
  • our $foo;

– Local variables
  • Variables that are “lexically scoped”, i.e., valid only in the current block and the blocks enclosed in it
  • Declaration: using the keyword my
  • my $foo;
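The difference in scope can be sketched as follows. The variable names and values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $species = "Arabidopsis thaliana";   # global (package) variable
my  $count   = 1;                        # lexical: valid in the rest of this file

{
    # A new block: this $count is a different variable that hides the outer one
    my $count = 99;
    print "inside the block:  count = $count, species = $species\n";
}

# The inner $count no longer exists here; the outer one is unchanged
print "outside the block: count = $count, species = $species\n";
```

The block sees the global $species but works on its own private $count, so the outer $count still holds 1 afterwards.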

Page 23: Perl Scripting for Biologists -

Writing safe code

• Declare your variables with my

my $a = 1;

# declare variables without a value (they start out as undef):

my $a;

my ($a, $b, $c);

The my operator declares a variable or a list of variables to be local (private) to the enclosing block, subroutine or file (the “scope”).

● Variables from the same program will not “step on each other”

● Important if your code will be used with other programs with variable names unknown to you

Page 24: Perl Scripting for Biologists -

Writing safe code - summary

• Declare your variables with my

#!/usr/bin/perl
use strict;
use warnings;

my $a = 1;
my $b;                # declared, but has no value yet
my $sum = $a + $b;    # warning: use of uninitialized value

• Use the strict module to enforce variable declaration

• Use the warnings module to help with debugging

Page 25: Perl Scripting for Biologists -

Writing safe code - summary

• Declare your variables with my

#!/usr/bin/perl
use strict;
use warnings;

my $a = 1;
my $b = "this is a string";
my $sum = $a + $b;    # warning: the string is not numeric

• Use the strict module to enforce variable declaration

• Use the warnings module to help with debugging

Page 26: Perl Scripting for Biologists -

Writing safe code - summary

• Declare your variables with my

#!/usr/bin/perl
use strict;
use warnings;

my $a = 1;
my $b = 2;
my $c = 3;
my $sum = $a + $b + $d;    # error: $d was never declared, the program will not run

• Use the strict module to enforce variable declaration

• Use the warnings module to help with debugging
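To see the kind of messages use warnings produces, the sketch below collects them in an array instead of letting them go to the terminal. The __WARN__ handler is a standard Perl mechanism; the variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings in an array instead of printing them to STDERR
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $count = 1;
my $missing;                       # declared but undefined
my $text = "this is a string";

my $sum1 = $count + $missing;      # warns: uninitialized value
my $sum2 = $count + $text;         # warns: argument isn't numeric

print "sum1 = $sum1, sum2 = $sum2\n";
print "Captured ", scalar(@warnings), " warnings:\n";
print "  $_" foreach @warnings;
```

Both additions still produce a number (the undefined value and the text are treated as 0), which is exactly why the warnings are useful: the program keeps running with a result that is probably not what was intended.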

Page 27: Perl Scripting for Biologists -

Operations with scalars

• Assigning to other variables

– $a = $b;

• Math on variables that contain numbers

– my $foo = 2 * $bar;

– my $c = sqrt($a ** 2 + $b ** 2);

• Printing

– print "The total is $total\n";

– Note the double quotes (single quotes print the text literally)

• String operations

– Concatenation: $z = $x . $y . 'blabla';

• Reading from the keyboard

– my $name = <STDIN>; or my $name = <>; (the line that is read includes the trailing newline)
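A line read from the keyboard still ends with a newline character, which is usually removed with the built-in chomp function. The sketch below shows this; so that it runs without anyone typing, an in-memory filehandle ($keyboard, a made-up name) stands in for STDIN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the keyboard: an in-memory filehandle holding one typed line.
# In a real script you would read from <STDIN> instead.
my $typed = "Ada\n";
open(my $keyboard, '<', \$typed) or die "Cannot open in-memory input: $!";

my $name = <$keyboard>;   # the line still ends with "\n"
chomp($name);             # remove the trailing newline
close($keyboard);

print "Hello, $name!\n";
```

Without the chomp, the exclamation mark would be printed on the next line.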

Page 28: Perl Scripting for Biologists -

Perl function calls

my $result = foo($x, $y);

Where
– foo() is the function name
– $x, $y are the function parameters
– $result is the return value (can also be an @array or a %hash)

• More information about a specific function:

perldoc -f <function> (for example: perldoc -f print)

Page 29: Perl Scripting for Biologists -

Built-in functions

Math functions

• abs, atan2, cos, exp, hex, int, log, oct, rand, sin, sqrt

Math operators

• + - / * ** (power) % (modulus)

my $a = 10;
my $b = 15;
my $result = sqrt($a + $b);

Page 30: Perl Scripting for Biologists -

Built-in functions

Some functions operating on strings

• length() – length of a string

• uc() and lc() – convert to upper (lower) case

• Concatenation: "."

my $a = "Hello";
my $b = " world";
my $string = $a . $b;

print "$string\n";

# function calls are not expanded inside quotes, so concatenate them
print "string " . lc($string) . " has " . length($string) . " characters\n";
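The same string functions applied to a sequence, as a small sketch. The two sequence fragments are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sequence fragments, for illustration only
my $exon1 = "atggcc";
my $exon2 = "TTGTAA";

my $sequence = $exon1 . $exon2;      # concatenation: "atggccTTGTAA"
my $upper    = uc($sequence);        # "ATGGCCTTGTAA"
my $lower    = lc($sequence);        # "atggccttgtaa"
my $length   = length($sequence);    # 12

print "Sequence $upper has $length bases\n";
```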

Page 31: Perl Scripting for Biologists -

Arrays @

• Ordered list of variables

• The variable prefix '@' defines a list

• Each element can be accessed using a numeric index

• Declaration & notation

my @list; # the empty list

my @list = (1, 2, 3, 4, 5, 6); # a list of six integer elements

my @list = ("foo", "bar", "batz"); # a list of string values

Page 32: Perl Scripting for Biologists -

Arrays

Traversing a list

Using the foreach construct

my @countries = ("USA", "France", "England");

foreach my $country (@countries) {
    print "$country\n";
}

Page 33: Perl Scripting for Biologists -

Adding/removing list elements

my @list = ('a', 'b', 'c');

my $value = 'd';

# add an element to the end of the array:
push @list, $value;

# remove the last element of the array:
my $last = pop @list;

# add an element to the beginning of the array:
unshift @list, $value;

# remove an element from the beginning of the array:
shift @list;
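The sketch below traces the content of the array after each of the four operations. The element 'z' is an arbitrary value chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list = ('a', 'b', 'c');

push @list, 'd';              # @list is now (a b c d)
my $last = pop @list;         # $last is 'd', @list is (a b c)

unshift @list, 'z';           # @list is now (z a b c)
my $first = shift @list;      # $first is 'z', @list is (a b c)

print "last = $last, first = $first, list = @list\n";
```

push and pop work on the end of the array, unshift and shift on the beginning; pop and shift both return the element they remove.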

Page 34: Perl Scripting for Biologists -

Accessing individual list elements

• List elements are accessed using the index

• The index is zero-based !!!!

• When accessing an element, use a $ (scalar)

my @countries = ('France', 'China', 'Peru');

First element: $countries[0] (has the value of 'France')

• Assigning to a list element

$countries[3] = 'Morocco'; # index 3 did not exist yet, so the list grows to four elements
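A short sketch of index-based access. Negative indexes, which count from the end of the array, are not mentioned on the slide but are standard Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @countries = ('France', 'China', 'Peru');

my $first = $countries[0];       # 'France': the index starts at 0
my $last  = $countries[-1];      # 'Peru': negative indexes count from the end

$countries[3] = 'Morocco';       # index 3 did not exist yet: the array grows
my $count = scalar(@countries);  # 4

print "first = $first, last = $last, count = $count\n";
```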

Page 35: Perl Scripting for Biologists -

Other list operations

my @foo = (1, 2, 3, 4);

my @bar = (5, 6, 7, 8);

• Combine 2 lists:

my @combined = (@foo, @bar);

• Insert lists into another list:

my @x = (@foo, 12, 13, @bar, 45);

• Extract elements from a list:

my ($x, $y, @rest) = @foo;
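What these assignments produce can be checked with a short sketch. The names $first and $second are used here in place of $x and $y to avoid confusion with the array @x.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @foo = (1, 2, 3, 4);
my @bar = (5, 6, 7, 8);

my @combined = (@foo, @bar);            # lists are flattened: 8 elements
my @x = (@foo, 12, 13, @bar, 45);       # 11 elements in one flat list
my ($first, $second, @rest) = @foo;     # 1, 2 and the remainder (3, 4)

print "combined: @combined\n";
print "x has ", scalar(@x), " elements\n";
print "first = $first, second = $second, rest = @rest\n";
```

Lists placed inside a list are always flattened into a single list; the last array on the left side of an assignment takes all remaining elements.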

Page 36: Perl Scripting for Biologists -

“List” vs “scalar” context

• “List context”

– The list is treated as a list of its elements

• “Scalar context”

– The list is treated as a scalar.
– As a scalar, the value is the number of list elements.
– Sometimes subtle changes can change the context

• Try: print @list;

• Versus: print @list . "\n";

• The function scalar forces the list into scalar context: my $count = scalar(@list);
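The effect of context can be sketched as follows; the array content is arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list = ('a', 'b', 'c');

my $count   = @list;            # scalar context: number of elements (3)
my ($first) = @list;            # list context (note the parentheses): 'a'
my $joined  = "@list";          # inside double quotes: "a b c"
my $concat  = @list . "\n";     # concatenation forces scalar context: "3\n"

print @list;                    # list context: prints abc
print "\n";
print $concat;                  # prints 3
```

The only difference between the second and the first assignment is a pair of parentheses, which is the kind of subtle change that switches the context.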

Page 37: Perl Scripting for Biologists -

More functions on lists

• Sort a list

my @sorted = sort ('Zimbabwe', 'Japan', 'Spain');

• Join: convert a list to a string

my $list = join(", ", @countries);

• Split: convert a string to a list

my @list = split(" ", "hello world");
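One point worth a sketch: by default sort compares its elements as strings, so numbers need an explicit numeric comparison. The values below are made up; $a and $b inside the sort block are Perl's built-in sort variables, not variables declared by the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lengths = (10, 9, 100, 2);

my @as_strings = sort @lengths;                 # string order: 10 100 2 9
my @as_numbers = sort { $a <=> $b } @lengths;   # numeric order: 2 9 10 100

my $line   = join(", ", @as_numbers);           # "2, 9, 10, 100"
my @fields = split(", ", $line);                # back to a list of four elements

print "as strings: @as_strings\n";
print "as numbers: @as_numbers\n";
print "joined:     $line\n";
```

Because sort uses $a and $b internally, it is safer not to declare your own my $a or my $b in a script that sorts.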

Page 38: Perl Scripting for Biologists -

Transforming lists using map

my @numbers = (1, 2, 3, 4, 5);

my @squares = map { $_ ** 2 } @numbers;

print join(", ", @squares);

Prints

1, 4, 9, 16, 25

Page 39: Perl Scripting for Biologists -

Transforming lists using map

my @numbers = (1, 2, 3, 4, 5);

my @squares = map { $_ ** 2 } @numbers;

The same as

my @squares;

foreach my $n (@numbers) {
    push @squares, $n ** 2;
}
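A sketch of how map relates to a foreach loop. map returns a new list and leaves the original untouched, whereas assigning to the loop variable of a foreach changes the array itself, because the loop variable is an alias for each element. The names @squares_loop and @in_place are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (1, 2, 3, 4, 5);

# map builds a new list; @numbers is left untouched
my @squares = map { $_ ** 2 } @numbers;

# foreach with push gives the same result as map
my @squares_loop;
foreach my $n (@numbers) {
    push @squares_loop, $n ** 2;
}

# Assigning to the loop variable changes the array itself,
# because $n is an alias for each element
my @in_place = @numbers;        # work on a copy
foreach my $n (@in_place) {
    $n = $n ** 2;
}

print "numbers:  @numbers\n";
print "squares:  @squares\n";
print "in place: @in_place\n";
```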

Page 40: Perl Scripting for Biologists -

Arrays: summary

➢ Definition: my @numbers = (1, 7, 3, 9, 5);

➢ Traversal: foreach my $i (@numbers) { ... }

➢ print "@numbers\n"; print join(',', @numbers);

➢ my @sorted = sort(@numbers);

➢ my $length = scalar(@numbers);

➢ Add/extract array elements with shift, unshift, pop, push functions

➢ Access individual elements: $numbers[$index] ($index starts with 0)

➢ More array functions: join, split, map

(perldoc -f <built_in_function_name>)

Page 41: Perl Scripting for Biologists -

Hashes %

• Collection of key/value pairs

• Order of the key/value pairs in the hash is not important

• Declaration
– Use % as prefix

my %capitals; # the empty hash

my %capitals = ('Spain' => 'Madrid',
                'Japan' => 'Tokyo',
                'Peru'  => 'Lima' );

• Access hash element, assign, etc.

$capitals{'Spain'} = "Madrid";
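A few common operations on a hash, as a minimal sketch. The built-in functions exists and delete are not on the slide but are standard Perl; the extra country is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %capitals = ('Spain' => 'Madrid',
                'Japan' => 'Tokyo',
                'Peru'  => 'Lima');

my $capital = $capitals{'Japan'};            # look up a value by its key
$capitals{'France'} = 'Paris';               # add a new key/value pair

my $has_peru  = exists $capitals{'Peru'};    # true: the key is present
my $has_chile = exists $capitals{'Chile'};   # false: no such key

delete $capitals{'Spain'};                   # remove a key and its value
my $count = scalar(keys %capitals);          # 3 keys are left

print "Capital of Japan: $capital\n";
print "Number of countries: $count\n";
```

As with arrays, a single hash element is a scalar, so it is written with a $ and curly braces around the key.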

Page 42: Perl Scripting for Biologists -

Traversing hashes

• By key

foreach my $k (keys(%capitals)) {
    print "$k: $capitals{$k}\n";
}

• By value

foreach my $v (values(%capitals)) {
    print "$v\n";
}

• Hashes have no defined order of elements
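Because the order of the keys is not defined, output that has to look the same on every run is usually produced by sorting the keys first. A minimal sketch, using the hash from the earlier slide (the @lines array is a made-up helper):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %capitals = ('Spain' => 'Madrid',
                'Japan' => 'Tokyo',
                'Peru'  => 'Lima');

# Sorting the keys gives the same (alphabetical) order on every run
my @lines;
foreach my $country (sort keys %capitals) {
    push @lines, "$country: $capitals{$country}";
}

print "$_\n" foreach @lines;
```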

Page 43: Perl Scripting for Biologists -

Perl variables - reminder

Scalar: $var

Array: @var; list of indexed scalars, order matters.

$var[0]; $var[$index];

foreach my $element (@var) {
    print $element . "\n";
}

Hash: %var; elements ('values') and indexes ('keys') are scalars. Indexes are not ordered, but must be unique!

$var{'name'}; $var{$key};

foreach my $key (keys(%var)) {
    print "key = $key , value = $var{$key} \n";
}
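The three variable types can be combined in one small program. The sketch below counts how often each base occurs in a DNA sequence; the sequence is made up, and this is only one possible way of doing the count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ATGGCCATTGA";            # scalar: a made-up sequence

my @bases = split("", $dna);        # array: one element per base

my %count;                          # hash: base => number of occurrences
foreach my $base (@bases) {
    $count{$base}++;
}

foreach my $base (sort keys %count) {
    print "$base: $count{$base}\n";
}
```

Splitting on an empty string yields one character per element, and each base is used as a hash key whose value is incremented every time that base is seen.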