Introduction to Programming: Perl for Biologists
Timothy M. Kunau
Center for Biomedical Research Informatics
Academic Health Center
University of [email protected]
Bioinformatics Summer Institute 2007
Introduction to Programming: Day two
Day I
•Art and Programming
•Getting Started
•Biology and Computer Science
•Bioinformatics Data
•Perl basics:
    •Strings and Variables
    •Math and Logic
    •Looping, operators, and functions
Day II
•Assignment discussion
•Data from outside the program
•Writing out data
•Data into arrays and hashes
•Array operations
•Scope and Good practices
•RegEx
Day I: assignment review.
1. Calculate the reverse complement of a DNA strand using the tr/// operation.
2. Read about file handling. (Safari on-line documentation is available.)
3. Read about Regular Expressions (regex). (Safari)
4. Find CPAN.ORG and locate a module that would be useful to you as a biologist.
5. Read about that module and email me ([email protected]) the following details:
    1. Name of the module.
    2. The name of the person who wrote it.
    3. What it does.
    4. How it would be useful to you.
The tr/// operator (translate)
•Replaces each character listed in the first section with the character at the same position in the second section.
• $dna =~ tr/A-Z/a-z/; # lowercase
• $dna =~ tr/A-Z/B-ZA/; # shift cipher
• $dna =~ tr/ACGT/TGCA/; # complement
• $dna = reverse($dna); # reverse; with the line above, the reverse complement
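One property of tr/// worth knowing alongside the examples above: it returns the number of characters it matched, so it can count bases without changing the string. The sketch below is only an illustration; the subroutine name gc_fraction and the sample sequence are hypothetical and do not come from the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: fraction of G and C bases in a DNA string.
sub gc_fraction {
    my ($dna) = @_;
    return 0 unless length $dna;
    # With an empty replacement list and no modifiers, tr/// leaves the
    # string unchanged and returns how many characters matched.
    my $gc = ($dna =~ tr/GCgc//);
    return $gc / length($dna);
}

printf "GC fraction: %.2f\n", gc_fraction('ACGGGAGGAC');
```

Counting this way avoids writing a loop over every character of the sequence.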
s/// operator (substitute)
•Substitutes whatever is matched by the first section with the value in the second section. (See m//.)
• $sport =~ s/football/soccer/g;
• $tdfwinner =~ s/Lance Armstrong/Ivan Basso/g;
Reverse complement of a DNA strand
#!/usr/bin/perl -w
# Calculating the reverse complement of a strand of DNA

# The DNA
my $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
print "Here is the starting DNA:\n\n$DNA\n\n";

# Calculate the reverse complement
my $revcom = reverse $DNA;

# The Perl translate/transliterate command is just what we need:
$revcom =~ tr/ACGTacgt/TGCAtgca/;
print "Here is the reverse complement DNA:\n\n$revcom\n";
CPAN
Day I: assignment review, CPAN modules
1. Name of the module.
2. The name of the person who wrote it.
3. What it does.
4. How it would be useful to you.
Getting Data from Files
open(HANDLE, "contig2_MT.fa") || die $!;
while (defined($line = <HANDLE>)) {
    if ( $line =~ /^\>/ ) {
        print $line, "\n";
    }
}
close(HANDLE);
% ./file-handles.pl
>ContigId:Contig2 AssemblyProcessId:MtSC AssemblyProcessVersion:1
Getting Data from Files
open(HANDLE, "contig2_MT.fa") || die $!;
while (<HANDLE>) {
    if ( $_ =~ /^\>/ ) {      # is this a header line?
        print $_, "\n";       # print the header line
    }
}
close(HANDLE);
% ./file-handlesII.pl
>ContigId:Contig2 AssemblyProcessId:MtSC AssemblyProcessVersion:1
Getting Data from Files
open(HANDLE, "contig2_MT.fa") || die $!;
@slurp = <HANDLE>;
print @slurp;
close(HANDLE);
% ./file-handlesIII.pl
>ContigId:Contig2 AssemblyProcessId:MtSC AssemblyProcessVersion:1
GGGTATACTTCCTCCTCCATTGTTTGAGATATCACAAGACTTGAAATTGA
GCACGACCCATATTCTACTTCAAGGCGTTGAAGCAAAAACTCACCATGGG
AAACTAAACAGGTTAGTAAGTAGGCATCACCATCATTTTATATCGATATG
GATAATAATGCACAAGACTTTCAAAGTTATCTTCAGATTCTTCCCCCTGT
TGAGTTTGCTTGCGTTTATGGATCATCTCTTCATCCAACCAATCATGACA
AGACAACCATGGTTGATTATATTCTTGGAGTTTCTGACCCTATACAATGG
CATTCTGAGAATCCGAAAATGAATAAGCATCACTATGCGTCATGGATGGT
GCACCTTGGTGGAGAGAGGCTGATTACCGCAGATGCAGATAAAATTGGTG
TGGGAGTACATTTCAACCCTTTTG
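The three file-reading slides use bareword handles and two-argument open. A commonly recommended variant uses a lexical filehandle and the three-argument form of open. The sketch below assumes nothing about the course files: the helper name fasta_headers is hypothetical, and the script writes its own throwaway FASTA file so that it runs by itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the header lines (those starting with '>')
# of a FASTA file, with trailing newlines removed.
sub fasta_headers {
    my ($filename) = @_;
    # Three-argument open: the mode '<' is kept separate from the name.
    open(my $in, '<', $filename) or die "Cannot open $filename: $!";
    my @headers;
    while (my $line = <$in>) {
        chomp $line;
        push @headers, $line if $line =~ /^>/;
    }
    close($in);
    return @headers;
}

# Build a small throwaway FASTA file so the sketch is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} ">seq1 demo\nACGT\n>seq2 demo\nGGCC\n";
close($tmp_fh);

print "$_\n" for fasta_headers($tmp_name);
```

Reading line by line keeps only one line in memory at a time, which matters once sequence files grow large.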
Pass data into a program
while(<STDIN>) {
    print "stdin read: $_";
}
Pass data into a program
open(GREP, "grep '>' $filename |") || die $!;   # the trailing | reads the command's output
my $i = 0;
while (<GREP>) {
    $i++;
}
close(GREP);
print "$i sequences in file\n";
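When reading from another program, Perl also accepts a list form of open in which the mode '-|' means "read this command's output" and the program and its arguments are given separately. The filename then never passes through the shell. This is a sketch under assumptions: it relies on a grep executable being on the PATH, and count_with_grep and the temporary file are hypothetical names used only for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: count lines containing '>' by reading grep's output.
sub count_with_grep {
    my ($filename) = @_;
    # List form of a pipe open: no shell is involved, so odd characters
    # in $filename cannot be reinterpreted as shell syntax.
    open(my $grep, '-|', 'grep', '>', $filename)
        or die "Cannot run grep: $!";
    my $count = 0;
    while (my $line = <$grep>) {
        $count++;
    }
    close($grep);
    return $count;
}

# Throwaway input so the sketch runs on its own.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} ">seq1\nACGT\n>seq2\nGGCC\n";
close($tmp_fh);

print count_with_grep($tmp_name), " sequences in file\n";
```

Note that grep '>' matches the character anywhere on a line, not only at the start of a FASTA header.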
Writing out data
open(OUT, ">outname") || die $!;
print OUT "sequence report\n";
close(OUT);
Writing out data
# appending with >>
open(OUT, ">>outname") || die $!;
print OUT "append this\n";
close(OUT);
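To see the difference between '>' and '>>' in one place, the sketch below writes a file, appends to it, and reads it back. It uses lexical filehandles and three-argument open rather than the bareword style on the slides; the temporary directory and the file name report.txt are assumptions made so that the example cleans up after itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);   # throwaway directory
my $name = "$dir/report.txt";       # hypothetical output file

# '>' creates the file, or truncates it if it already exists.
open(my $out, '>', $name) or die "Cannot write $name: $!";
print {$out} "sequence report\n";
close($out) or die "Cannot close $name: $!";

# '>>' keeps the existing contents and adds to the end.
open($out, '>>', $name) or die "Cannot append to $name: $!";
print {$out} "append this\n";
close($out) or die "Cannot close $name: $!";

# Read the file back to confirm both lines are present.
open(my $in, '<', $name) or die "Cannot read $name: $!";
my @lines = <$in>;
close($in);
print @lines;
```

Checking the return value of close on an output handle catches errors such as a full disk, which open alone cannot report.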
Filehandles as variables
my $var = \*STDIN;
Filehandles as variables
open($fh, ">report.txt") || die $!;
print $fh "line 1\n";
Filehandles as variables
open($fh2, "report") || die $!;
$fh = $fh2;
while (<$fh>) {
    # something interesting goes here
}
Zero based economy...
•The first element is ‘0’ for an index or first character in a string
    •Computer scientists like it this way
    •As do most programming languages, including Perl
•Biologists often number the first base in a sequence as ‘1’
    •GenBank
    •BioPerl
•Interbase coordinates (Kent-UCSC, Chado-GMOD)
Coordinate systems
• Zero based, interbase coordinates
A A T G G G T A G A
0 1 2 3 4 5 6 7 8 9
• 1 based coordinates
A T G G G T A G A
1 2 3 4 5 6 7 8 9
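Converting between these systems comes down to a little arithmetic around substr, which takes a 0-based offset and a length. The sketch below is an illustration under stated assumptions: the subroutine names are hypothetical, 1-based ranges are taken to include both ends, and interbase positions are taken to count the gaps between bases.

```perl
use strict;
use warnings;

my $seq = 'ATGGGTAGA';

# 1-based, inclusive coordinates: bases $start through $end.
sub subseq_1based {
    my ($s, $start, $end) = @_;
    return substr($s, $start - 1, $end - $start + 1);
}

# Interbase coordinates: positions count the gaps between bases,
# so the length is simply $end - $start.
sub subseq_interbase {
    my ($s, $start, $end) = @_;
    return substr($s, $start, $end - $start);
}

print subseq_1based($seq, 1, 3), "\n";      # first three bases
print subseq_interbase($seq, 0, 3), "\n";   # the same three bases
```

Forgetting the "- 1" or "+ 1" in the first subroutine is the classic off-by-one error when moving data between Perl and 1-based sequence records.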
Arrays as Lists
• Lists are sets of items
• Can be mixed types of scalars (numbers, strings, floats)
• Perl uses lists extensively
• Array variables are prefixed by @
List operations
• reverse # reverse list order
• $list[$n] # get the item at index $n
• $two = $list[2]; # get which item?
List operations
• reverse # reverse list order
• $list[$n] # get the item at index $n
• $three = $list[2]; # get the third item
List operations
• scalar # get length of array
• $len = scalar @list;
• $last_index = $#list;
• delete $list[10]; # clear entry 10 (later items do not shift down; see splice)
Autovivification
• Autovivify: to bring oneself to life.
• Perl automatically allocates space for an array element when you assign to it:
$array[0] = 'apple';
$array[4] = 'elephant';
$array[25] = 'zebra';
delete $array[25];
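What happens to the array's length during those assignments can be checked directly. This is a small sketch of the same statements with measurements added; the variable names $len_before, $gap_filled, and $len_after are hypothetical. Perl's documentation discourages delete on array elements, so treat the last step as a demonstration rather than a recommended idiom.

```perl
use strict;
use warnings;

my @array;
$array[0]  = 'apple';
$array[4]  = 'elephant';
$array[25] = 'zebra';

# The array now spans indices 0 through 25, so it has 26 slots.
my $len_before = scalar @array;

# The slots that were never assigned hold undef.
my $gap_filled = defined $array[10] ? 1 : 0;

# Deleting the final element shrinks the array back to the highest
# index that still exists, which is index 4.
delete $array[25];
my $len_after = scalar @array;

print "before: $len_before, after: $len_after\n";
```

To remove an element from the middle and close the gap, splice is the usual tool.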
pop, push, shift, unshift
# remove last item
$last = pop @list;
# remove first item
$first = shift @list;
# add to end of list
push @list, $last;
# add to beginning of list
unshift @list, $first;
splicing an array
splice ARRAY,OFFSET,LENGTH,LIST
splice ARRAY,OFFSET,LENGTH
splice ARRAY,OFFSET
splice ARRAY
splicing an array
@list = ('alice', 'chad', 'rod');
($x, $y) = splice(@list, 1, 2);
splice(@list, 1, 0, ('marvin', 'alex'));
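Tracing the two splice calls step by step makes their effect easier to follow. The sketch below repeats the slide's statements under strict mode and records the state after each call in comments; nothing here goes beyond the slide's own data.

```perl
use strict;
use warnings;

my @list = ('alice', 'chad', 'rod');

# Remove two items starting at index 1; splice returns what it removed.
my ($x, $y) = splice(@list, 1, 2);
# Now $x is 'chad', $y is 'rod', and @list holds only 'alice'.

# A LENGTH of 0 removes nothing, so this inserts before index 1.
splice(@list, 1, 0, 'marvin', 'alex');
# Now @list is ('alice', 'marvin', 'alex').

print "removed: $x $y\n";
print "list: @list\n";
```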
Sorting with sort
@list = ('tree', 'frog', 'log');
@sorted = sort @list;
# reverse order
@sorted = sort { $b cmp $a } @list;
Sorting with arrays of numbers
@list = (25, 21, 12, 17, 9, 8);
# sort based on numerics
@sorted = sort { $a <=> $b } @list;
# reverse order of sort
@revsorted = sort { $b <=> $a } @list;
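The reason for the explicit <=> comparison is that sort with no block compares items as strings. The sketch below runs both on the slide's numbers; the variable names @as_strings and @as_numbers are hypothetical.

```perl
use strict;
use warnings;

my @list = (25, 21, 12, 17, 9, 8);

# With no comparison block, sort compares items as strings,
# so '12' comes before '8' because '1' sorts before '8'.
my @as_strings = sort @list;

# The <=> operator compares numerically.
my @as_numbers = sort { $a <=> $b } @list;

print "string order:  @as_strings\n";
print "numeric order: @as_numbers\n";
```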
LAB: files
% pico files2arrays.pl
#!/usr/bin/perl -w
#
# Reading protein sequence data file.

# File containing the sequence data
my $fastafilename = 'contig2_MT.fa';

# First we have to "open" the file
open(FASTAFILE, $fastafilename);

# Read the FASTA data from the file, and store it
# into the array variable @fasta
@fasta = <FASTAFILE>;

# Print the sequence onto the screen
print @fasta;

# Close the file.
close FASTAFILE;

exit;
LAB: files
% pico files2arrays.pl
#!/usr/bin/perl -w
#
# Reading protein sequence data file.

# File containing the sequence data
my $fastafilename = 'contig2_MT.fa';

# First we have to "open" the file
open(FASTAFILE, $fastafilename) || die $!;

# Read the FASTA data from the file, and store it
# into the array variable @fasta
@fasta = <FASTAFILE>;

# Print the sequence onto the screen
print @fasta;

# Close the file.
close FASTAFILE;

exit;
LAB: get a file in FASTA format
http://www.ncbi.nlm.nih.gov/
LAB: navigate to GenBank
LAB: search for your favorite protein
LAB: favorite protein entries, change display
LAB: change display to FASTA
LAB: we return to our program, already in progress
% pico kinase.fa
% pico files2arrays.pl
Add the name of the FASTA file you created to the program.
Run the program.
#!/usr/bin/perl -w
#
# Reading protein sequence data file.

# File containing the sequence data
my $fastafilename = 'kinase.fa';

# First we have to "open" the file
open(FASTAFILE, $fastafilename) || die $!;

# Read the FASTA data from the file, and store it
# into the array variable @fasta
@fasta = <FASTAFILE>;

# Print the sequence onto the screen
print @fasta;

# Close the file.
close FASTAFILE;

exit;
LAB: break it.
What happens when?:
1. You added the file?
2. Did the error message go away?
3. How would you protect your user from an error like this?
Did you think that was harder than it needed to be?
LAB: a safer method
% pico files2arrays.pl
% ./files2arrays.pl
Run the program.
#!/usr/bin/perl -w
# Reading data from a file using a loop

# File containing the sequence data
my $fastafilename = 'kinase.fa';

open(FASTAFILE, $fastafilename) || die $!;

# Read file one line at a time and print
while ($protein = <FASTAFILE>) {
    print $protein;
}

close FASTAFILE;

exit;
LAB: breaking it.
Why is this safer than reading the file into an array?
#!/usr/bin/perl -w
# Reading data from a file using a loop

# File containing the sequence data
my $fastafilename = 'kinase.fa';

open(FASTAFILE, $fastafilename) || die $!;

# Read file one line at a time and print
while ($protein = <FASTAFILE>) {
    print $protein;
}

close FASTAFILE;

exit;
A brief break
Scope
TM proctor & gamble
•Section or subsection of a program where a variable is valid.
•Defined by braces { }
•Use ‘my’ to declare variables.
• use strict; # mandates declaration of variables.
• use warnings; # or ‘-w’ on shebang line
Good practices
• The ‘my’ operator declares a variable or a list of variables to be local (private) to the enclosing block, subroutine, or file. It will also be recognized in blocks contained by that region.
• The region in which the private variable is recognized is called its scope; variables declared with ‘my’ are called lexically scoped variables.
• Lexical (private) variables are not recognized outside of their scope.
• A private variable of a function will not be recognized in another function called by that function. If you want that to happen, use a package (global) variable and give it a temporary value with ‘local’.
• It is recommended that you declare all of your variables with ‘my’.
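The difference between ‘my’ and ‘local’ is easiest to see when one subroutine calls another. The sketch below is an illustration only: the package variable $level and the three subroutine names are hypothetical.

```perl
use strict;
use warnings;

our $level = 'global';    # package variable (hypothetical name)

# Returns whatever the package variable currently holds.
sub show_level { return $level; }

sub with_local {
    # local gives the package variable a temporary value that called
    # subroutines also see; the old value returns when this sub exits.
    local $level = 'temporarily changed';
    return show_level();
}

sub with_my {
    # my creates a new private variable; show_level cannot see it
    # and still reads the package variable.
    my $level = 'private';
    return show_level();
}

print with_local(), "\n";
print with_my(), "\n";
print $main::level, "\n";
```

In everyday scripts ‘my’ is almost always the right choice; ‘local’ is mainly useful for temporarily changing Perl's built-in global variables.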
Someone else’s code
@list = ('aardvark', 'baboon', 'cat', 'dog', 'lamb', 'kangaroo');
for $animal ( @list ) {
    if ( length($animal) <= 3 ) {
        print "$animal is noisy\n";
    } else {
        print "$animal is quiet\n";
    }
}
Made more safe.
use warnings;
use strict;
my @list = ('aardvark', 'baboon', 'cat', 'dog', 'lamb', 'kangaroo');
for my $animal ( @list ) {
    if ( length($animal) <= 3 ) {
        print "$animal is noisy\n";
    } else {
        print "$animal is quiet\n";
    }
}
Associative arrays or Hashes
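[Figure: an array (indices 0 to 5) holding the names ‘john’, ‘steve’, ‘aaron’, ‘max’, ‘juan’, ‘sue’, shown next to a hash whose keys are fruit names (pear, apple, cherry, lemon, peach, kiwi) and whose values are counts.]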
Associative arrays or Hashes
• Like arrays, but instead of numbers as indices hashes use strings.
my @array = ('john', 'steve', 'aaron', 'max', 'juan', 'sue');
my %fruithash = (
    'apple'  => 12,
    'pear'   => 3,
    'cherry' => 30,
    'lemon'  => 2,
    'peach'  => 6,
    'kiwi'   => 3,
);
Using hashes
• { } operator
• Set a value
$fruithash{'cherry'} = 10;
• Access a value
print $fruithash{'cherry'}, "\n";
• Remove an entry
delete $fruithash{'cherry'};
Get the Keys
• ‘keys’ function will return a list of the hash keys
my @keys = keys %fruithash;
for my $key ( keys %fruithash ) {
    print "$key => $fruithash{$key}\n";
}
• produces: ‘apple’, ‘pear’, ...
• Order of keys is NOT guaranteed!
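Because the key order is not guaranteed, reports usually sort the keys first. The sketch below shows two common orderings using the fruit counts from the earlier slide; the array names @by_name and @by_count are hypothetical.

```perl
use strict;
use warnings;

my %fruithash = (
    apple => 12, pear  => 3, cherry => 30,
    lemon => 2,  peach => 6, kiwi   => 3,
);

# Alphabetical by key: the same order on every run.
my @by_name = sort keys %fruithash;

# Largest count first; ties are broken alphabetically by key.
my @by_count = sort {
    $fruithash{$b} <=> $fruithash{$a} or $a cmp $b
} keys %fruithash;

for my $key (@by_count) {
    print "$key => $fruithash{$key}\n";
}
```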
Get just the values
•Similarly:
# creates an array of hash values
my @fruitcnt = values %fruithash;
for my $itemcount ( @fruitcnt ) {
    print "val is $itemcount\n";
}
Iterate through a set
• Order is not guaranteed!
while ( my ($key, $value) = each %fruithash ) {
    print "$key => $value\n";
}
References
• Are “pointers” to the data object instead of object itself.
• A shorthand to refer to a variable and pass it around.
•Must “dereference” whatever is pointed at to get its actual value, the “reference” is just a location in memory.
Reference Operators
• \ in front gets its memory location
my $ptr = \@vals;
• Pointers can be assigned directly:
• [ ] for arrays, { } for hashes
my $ptr = [ ('owlmonkey', 'lemur') ];
my $hashptr = { 'cdrom' => 'III', 'start' => 23 };
Dereferencing
• Need to cast reference back to datatype:
my @list = @$ptr;
my %hash = %$hashptr;
• Can also use ‘{ }’ to clarify
my @list = @{$ptr};
my %hash = %{$hashptr};
Really not so hard...
my @list = ('fugu', 'human', 'worm', 'fly');
my $list_ref = \@list;
my $list_ref_copy = [@list];
for my $item ( @$list_ref ) {
    print "$item\n";
}
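The two references on this slide behave differently once the original array changes: \@list points at the array itself, while [@list] points at a separate copy made at that moment. The sketch below demonstrates this; the added item 'mouse' and the variables $via_ref and $via_copy are hypothetical.

```perl
use strict;
use warnings;

my @list = ('fugu', 'human', 'worm', 'fly');
my $list_ref      = \@list;     # refers to @list itself
my $list_ref_copy = [@list];    # refers to a new, separate copy

push @list, 'mouse';            # change the original array

my $via_ref  = scalar @$list_ref;        # sees the new item
my $via_copy = scalar @$list_ref_copy;   # still the old length

# The arrow syntax reads one element through a reference.
print "second item: $list_ref->[1]\n";
print "via reference: $via_ref items, via copy: $via_copy items\n";
```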
Why use references?
• Simplify argument passing to subroutines
• Allows updating data without making multiple copies.
• What if we wanted to pass in 2 arrays to a subroutine?
sub func { my (@v1,@v2) = @_; }
• How do we know when one stops and another starts?
Why use references?
• Passing in two arrays to intermix.
sub func {
    my ($v1, $v2) = @_;
    my @mixed;
    while ( @$v1 || @$v2 ) {
        push @mixed, shift @$v1 if @$v1;
        push @mixed, shift @$v2 if @$v2;
    }
    return \@mixed;
}
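One consequence of passing references: shift @$v1 removes items from the caller's own array, so after the call both input arrays are empty. If that is not wanted, indexing leaves the inputs alone. The sketch below is one possible variant, not the slide author's version; the name interleave and the sample arrays are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical non-destructive variant: reads through the references
# by index instead of shifting items off the caller's arrays.
sub interleave {
    my ($v1, $v2) = @_;
    my @mixed;
    my $longest = @$v1 > @$v2 ? scalar(@$v1) : scalar(@$v2);
    for my $i (0 .. $longest - 1) {
        push @mixed, $v1->[$i] if $i < @$v1;
        push @mixed, $v2->[$i] if $i < @$v2;
    }
    return \@mixed;
}

my @bases = ('A', 'C', 'G');
my @nums  = (1, 2);
my $mixed = interleave(\@bases, \@nums);

print "mixed: @$mixed\n";
print "bases still has ", scalar(@bases), " items\n";
```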
References also allow Arrays of Arrays
my @lst;
push @lst, ['milk', 'butter', 'cheese'];
push @lst, ['wine', 'sherry', 'port'];
push @lst, ['bread', 'bagels', 'croissants'];
my @matrix = ( [1, 0, 0], [0, 1, 0], [0, 0, 1] );
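Once the rows are stored as array references, individual cells are read with two subscripts, row first and then column. The sketch below builds an array of arrays with parentheses around the list of row references and reads from it; the variables $center and $trace are hypothetical.

```perl
use strict;
use warnings;

# Parentheses build the outer list; each [ ... ] is a reference to one row.
my @matrix = ( [1, 0, 0], [0, 1, 0], [0, 0, 1] );

my $center = $matrix[1][1];     # row 1, column 1

# Sum of the diagonal entries.
my $trace = 0;
$trace += $matrix[$_][$_] for 0 .. $#matrix;

my @lst = ( ['milk', 'butter', 'cheese'], ['wine', 'sherry', 'port'] );
for my $row (@lst) {
    print join(', ', @$row), "\n";   # dereference each row to list it
}
print "center: $center, trace: $trace\n";
```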
63
Hashes of arrays
my %hash;
$hash{'dogs'} = ['beagle', 'shepherd', 'lab'];
$hash{'cats'} = ['calico', 'tabby', 'siamese'];
$hash{'fish'} = ['gold', 'beta', 'tuna'];
for my $key (keys %hash ) {
    print "$key => ", join("\t", @{$hash{$key}}), "\n";
}
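A hash of arrays is also handy for grouping. The sketch below sorts a few gene names into lists by organism; the gene/organism pairs and the name %genes_for are assumptions chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical gene => organism pairs, used only for illustration.
my @pairs = ( ['unc-7', 'worm'], ['let-1', 'worm'], ['YFL001C', 'yeast'] );

my %genes_for;
for my $pair (@pairs) {
    my ($gene, $organism) = @$pair;
    # The array for a new key is created automatically on first push.
    push @{ $genes_for{$organism} }, $gene;
}

for my $organism (sort keys %genes_for) {
    my $count = scalar @{ $genes_for{$organism} };
    print "$organism ($count): @{ $genes_for{$organism} }\n";
}
```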
64
Subroutines
•A block of code that can be reused.
•Can also be referred to as procedures and functions.
•Often the result of re-factoring and refining your solution.
•Have little to do with submarines.
65
Defining a subroutine
• sub routine_name { } # declaring a subroutine
• Calling the routine:
routine_name(); # parentheses make this a call even if the sub is defined later in the file
&routine_name(); # & is optional
66
Passing data to a subroutine
• Pass in a list of data
&dosomething($var1,$var2);
sub dosomething {
    my ($v1,$v2) = @_;
}
sub dosomethingelse {
    my $v1 = shift @_;
    my $v2 = shift;    # shift works on @_ by default inside a subroutine
}
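Both styles of reading arguments can be seen side by side in this sketch. The subroutine names and the gene name are made up; the coordinates are simply two numbers to subtract.

```perl
use strict;
use warnings;

# Unpack all arguments at once from @_.
sub gene_length {
    my ($start, $end) = @_;
    return $end - $start + 1;
}

# Or take them off the front of @_ one at a time.
sub describe_gene {
    my $name  = shift;
    my $start = shift;
    my $end   = shift;
    return "$name: " . gene_length($start, $end) . " bp";
}

print describe_gene('gene1', 13907, 13985), "\n";
```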
67
Returning data from a subroutine
• The value of the last expression evaluated in the routine is its return value.
sub dothis {
    my $c = 10 + 20;
}
print dothis(), "\n"; # prints 30
• Better to state the return value explicitly with return, which also lets you leave the routine early on a condition.
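A small sketch of an explicit return with an early exit. The name gc_fraction is an assumption; the point is that a bare return hands back undef to the caller when there is nothing to compute.

```perl
use strict;
use warnings;

# GC fraction of a sequence; leaves early for a missing or empty string.
sub gc_fraction {
    my ($seq) = @_;
    return if !defined $seq || length($seq) == 0;    # early exit
    my $gc = ( $seq =~ tr/GCgc// );    # tr counts the matching characters
    return $gc / length($seq);
}

my $fraction = gc_fraction('ATGC');
print "GC fraction: $fraction\n";
print "empty input gives undef\n" unless defined gc_fraction('');
```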
68
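sub is_stopcodon {
    my $val = shift @_;
    if( length($val) != 3 ) {
        return -1;
    } elsif( $val eq 'TAA' || $val eq 'TAG' || $val eq 'TGA' ) {
        return 1;
    } else {
        return 0;
    }
}
Subroutine returns true (1) if codon is a stop codon
(standard genetic code), 0 if it is not, and -1 if the input is not three characters long. Because -1 also counts as true in a condition, compare the result with 1.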
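One way the subroutine might be used, as a sketch: step through a sequence three bases at a time and report the first in-frame stop codon. The helper first_stop and the test sequence are assumptions; is_stopcodon is repeated in compact form so the example runs on its own.

```perl
use strict;
use warnings;

sub is_stopcodon {
    my $val = shift;
    return -1 if length($val) != 3;
    return ( $val eq 'TAA' || $val eq 'TAG' || $val eq 'TGA' ) ? 1 : 0;
}

# Return the 0-based position of the first in-frame stop codon, or -1.
sub first_stop {
    my ($dna) = @_;
    for ( my $pos = 0; $pos + 3 <= length($dna); $pos += 3 ) {
        # Compare with 1 explicitly: the -1 error value is also "true".
        return $pos if is_stopcodon( substr($dna, $pos, 3) ) == 1;
    }
    return -1;
}

print first_stop('ATGGCATAAGGG'), "\n";
```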
69
LAB: subroutines
% pico subroutine.pl
#!/usr/bin/perl -w
# A program with a subroutine to append AAAAT to DNA

# The original DNA
$dna = 'CGACGTCTTCTCAGGCGA';

# The call to the subroutine "addPOLYA".
# argument passed in is $dna; result is $longer_dna
$longer_dna = addPOLYA($dna);

print "I added AAAAT to $dna and got $longer_dna\n\n";

# Here is the definition for subroutine "addPOLYA"
sub addPOLYA {
    my($dna) = @_;

    $dna .= 'AAAAT';
    return $dna;
}

exit;
70
LAB: break it.
Can you?:
1. Create better variable names?
2. Find a potential problem with subroutines and variable scope?
3. Get it to work with GLOBAL variables?
4. Explain why this might be a problem?
71
LAB: add to it.
Can you?:
1. Find another way to concatenate the strings?
2. Add a subroutine that provides a reverse transcription service?
3. Test for a poly-A tail before adding a poly-A tail and add one only if it isn’t already there?
4. Create a file of FASTA entries and run them through your program?
72
Funny operators
my @bases = qw(C A G T);
my $msg = <<EOF;
In his return from the ship to New York, he was discovered by the enemy as he passed near Governors Island, They took chase and in an effort to escape, Ezra Lee cast off the timed mine, as he imagined it retarded him in the heavy swells of the harbor. He was then spotted by his men waiting for his return on the shore and was safely retrieved. The freed magazine, which was set to go off at one hour, “drifted past Governors Island into the East River where it exploded with great violence, throwing large columns of water and pieces of wood high in the air.”
EOF
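A compact sketch of both operators. qw() builds a list of words without quotes or commas, and a here-document keeps everything up to the closing marker, which must sit alone on its line. The message text is made up.

```perl
use strict;
use warnings;

my @bases = qw(C A G T);           # same as ('C', 'A', 'G', 'T')
my $count = scalar @bases;

# Variables are interpolated in a here-document, as in a
# double-quoted string. The semicolon follows <<EOF, not the marker.
my $msg = <<EOF;
There are $count bases:
@bases
EOF

print $msg;
```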
73
Regular Expressions (reg’-ex)
• Part of the “amazing power” of Perl.
• Considered by some to be the heart and soul of Perl.
• Provide a set of very powerful and flexible facilities for parsing and manipulating text.
• Syntax can be tricky.
• Worth the effort to learn!
• Do not be afraid.
74
Regular Expressions: the secret
• Regular Expressions represent a small, nearly unrelated, programming language within the Perl programming language.
• ‘Regexes’ are symbiotic DNA.
• A state machine operating on strings.
• Do not be afraid.
75
A simple regex
if( $fruit eq 'apple' || $fruit eq 'Apple' || $fruit eq 'pear') {
    print "matched fruit $fruit\n";
}
# becomes this (^ and $ anchor the match, so 'pineapple' does not match)
if( $fruit =~ /^(?:[Aa]pple|pear)$/ ){
    print "matched fruit $fruit\n";
}
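A pattern without anchors matches anywhere in the string, so it accepts more than the eq tests do. A small sketch of the difference, using two made-up helper names:

```perl
use strict;
use warnings;

# Unanchored: the pattern may match anywhere inside the string.
sub loose_match {
    my ($fruit) = @_;
    return $fruit =~ /[Aa]pple|pear/ ? 1 : 0;
}

# Anchored: the whole string must be one of the alternatives.
sub exact_match {
    my ($fruit) = @_;
    return $fruit =~ /^(?:[Aa]pple|pear)$/ ? 1 : 0;
}

for my $fruit (qw(apple Apple pear pineapple)) {
    printf "%-10s unanchored=%d anchored=%d\n",
        $fruit, loose_match($fruit), exact_match($fruit);
}
```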
76
Regular Expression syntax
• use the =~ operator to match
• if( $var =~ /pattern/ ) {} # scalar context: true if the pattern matches
• my ($first,$second) = ( $var =~ /(\S+)\s+(\S+)/ ); # list context: returns the captured groups
• if( $var !~ m/pattern/ ) { } # true if pattern doesn’t match
• m/REGEXPHERE/ # match
• s/REGEXP/REPLACE/ # substitute
• tr/VALUES/NEWVALUES/ # translate
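The three operators on one short sequence, as a sketch; the sequence and variable names are made up for the example.

```perl
use strict;
use warnings;

my $dna = 'ATGGCGTTAA';

# m//: capture the first codon and the rest of the sequence.
my ($first_codon, $rest) = ( $dna =~ m/^(\w{3})(\w*)$/ );

# s///: substitute on a copy, turning the DNA string into RNA.
( my $rna = $dna ) =~ s/T/U/g;

# tr///: with an empty replacement it counts characters
# without changing the string.
my $gc = ( $dna =~ tr/GC// );

print "$first_codon $rest $rna $gc\n";
```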
77
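DNA ambiguity chars: (reverse complement)
• aMino - {A,C}, Keto - {G,T}
• puRines - {A,G}, pYrimidines - {C,T}
• Strong - {G,C}, Weak - {A,T}
• H (not G) - {A,C,T}, B (not A) - {C,G,T}, V (not T) - {A,C,G}, D (not C) - {A,G,T}
$str =~ tr/acgtrymkswhbvdnxACGTRYMKSWHBVDNX/tgcayrkmswdvbhnxTGCAYRKMSWDVBHNX/; # complement each base
$str = reverse $str; # then reverse the string to get the reverse complement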
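The tr/// line only complements each base; the string also has to be reversed to give the reverse complement. A sketch wrapping both steps in a subroutine (the name revcomp is an assumption):

```perl
use strict;
use warnings;

sub revcomp {
    my ($seq) = @_;
    $seq = reverse $seq;    # scalar context: reverse the characters
    $seq =~ tr/acgtrymkswhbvdnxACGTRYMKSWHBVDNX/tgcayrkmswdvbhnxTGCAYRKMSWDVBHNX/;
    return $seq;
}

print revcomp('ATGCR'), "\n";
```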
78
m// operator (match)
• Search a string for a pattern match
• If no string is specified, will match $_
• Pattern can contain variables, which will be interpolated (and the pattern recompiled)
while (<>) { print if /$pat/; }  # $pat is interpolated each time through the loop
while (<>) { print if /$pat/o; } # /o: interpolate and compile $pat only once
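A different route to the same goal as /o, shown here as an alternative rather than the slide's approach: compile the pattern once with qr// and reuse the compiled object. The search string (the EcoRI site GAATTC) and the reads are made-up sample data.

```perl
use strict;
use warnings;

my $pat = 'GAATTC';    # example search string (EcoRI recognition site)
my $re  = qr/$pat/;    # interpolate and compile the pattern once

my @reads = ( 'TTGAATTCAA', 'CCCCGGGG', 'GAATTCGAATTC' );
my @hits  = grep { $_ =~ $re } @reads;

print "$_\n" for @hits;
```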
79
Pattern extras: suffixes
• /i # case insensitive
• /g # global match (more than one)
• /x # extended regex (comments and whitespace)
• /o # compile regex once
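A sketch of three of the suffixes at work on a made-up sequence: /i and /g together find every start codon regardless of case, and /x lets the pattern carry its own comments.

```perl
use strict;
use warnings;

my $seq = 'atgccATGttATG';

# /i ignores case; /g in list context returns every match.
my @starts = ( $seq =~ /atg/gi );

# /x allows whitespace and comments inside the pattern.
my $has_orf_start = $seq =~ /
    ATG          # start codon
    [ACGT]{2}    # followed by two more bases
/xi ? 1 : 0;

print scalar(@starts), " start codons\n";
```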
80
Regex Operators
\        escape character: used to escape a metacharacter such as a period or a bracket
.        (period) match any character except newline
x        match any instance of x
[^x]     match any character except x
[x]      match any one character in the bracketed set: [abxyz] will match any instance of a, b, x, y, or z
|        (pipe) an OR operator: (x|y) will match an instance of x or y
()       used to group sequences of characters or matches
{}       used to define numeric quantifiers
{x}      match must occur exactly x times
{x,}     match must occur at least x times
{x,y}    match must occur at least x times, but no more than y times
?        preceding match is optional (zero or one), same as {0,1}
*        find 0 or more of preceding match, same as {0,}
+        find 1 or more of preceding match, same as {1,}
^        match the beginning of the line
$        match the end of a line
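Two points from the table are easy to trip over, so here is a quick sketch: inside square brackets the pipe is an ordinary character, and a quantifier such as {5,} applies to whatever comes just before it. The test strings are made up.

```perl
use strict;
use warnings;

# Inside brackets "|" is an ordinary character; use ( ) for OR.
my $pipe_in_class = ( '|'  =~ /^[x|y]$/ )   ? 1 : 0;
my $xy_alternate  = ( 'xy' =~ /^(?:x|y)$/ ) ? 1 : 0;
my $y_alternate   = ( 'y'  =~ /^(?:x|y)$/ ) ? 1 : 0;

# Quantifier: a run of at least five A's at the end of the string.
my $has_polya = ( 'CGACGAAAAAA' =~ /A{5,}$/ ) ? 1 : 0;

print "$pipe_in_class $xy_alternate $y_alternate $has_polya\n";
```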
81
Regex: Character Operators
\d    matches a digit, same as [0-9]
\D    matches a non-digit, same as [^0-9]
\s    matches a whitespace character (space, tab, newline, etc.)
\S    matches a non-whitespace character
\w    matches a word character
\W    matches a non-word character
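A sketch of the character operators pulling columns out of one whitespace-separated line. The line is laid out like an exon row of gene-prediction output; the variable names are assumptions.

```perl
use strict;
use warnings;

my $line = "   1    2  +  Internal   14117  14594  478";

# split ' ' breaks the line on runs of whitespace and
# ignores the leading blanks.
my @fields = split ' ', $line;

# \d+ matches a run of digits, \s+ the whitespace between columns.
# Anchoring at the end picks out the last three numeric columns.
my ($start, $end) = ( $line =~ /(\d+)\s+(\d+)\s+\d+\s*$/ );

print scalar(@fields), " fields, range $start..$end\n";
```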
82
Regex: POSIX Operators
[:alnum:]    alphabetic and numeric characters
[:alpha:]    alphabetic characters
[:blank:]    space and tab
[:cntrl:]    control characters
[:digit:]    digits
[:graph:]    non-blank (not spaces and control characters)
[:lower:]    lowercase alphabetic characters
[:print:]    any printable characters
[:punct:]    punctuation characters
[:space:]    all whitespace characters (includes [:blank:], newline, carriage return)
[:upper:]    uppercase alphabetic characters
[:xdigit:]   digits allowed in a hexadecimal number (i.e. 0-9, a-f, A-F)
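In Perl these classes are written inside a bracketed character class, which is why the sketch below has double brackets. The subroutine name and the identifiers tested are assumptions for the example.

```perl
use strict;
use warnings;

# True for identifiers made only of uppercase letters and digits,
# starting with an uppercase letter.
sub is_uppercase_id {
    my ($id) = @_;
    return $id =~ /^[[:upper:]][[:upper:][:digit:]]*$/ ? 1 : 0;
}

for my $id (qw(YFL001C unc-7)) {
    print "$id: ", is_uppercase_id($id), "\n";
}
```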
83
Regex: Additional Modules, 180+ found
POSIX::Regex - OO interface for the gnu regex engine (POSIX-Regex-0.89, 18 Aug 2006, Paul Miller)
Regexp::Common - Provide commonly requested regular expressions (Regexp-Common-2.120, 15 Mar 2005, Abigail)
Regexp::Common::CC - provide patterns for credit card numbers (Regexp-Common-2.120, 15 Mar 2005, Abigail)
Regexp::Common::IRC - provide patterns for parsing IRC messages (Regexp-Common-IRC-0.02, 18 Dec 2005, Chris Prather)
Regexp::Common::URI - provide patterns for URIs (Regexp-Common-2.120, 15 Mar 2005, Abigail)
Regexp::Common::number - provide regexes for numbers (Regexp-Common-2.120, 15 Mar 2005, Abigail)
Regexp::Common::profanity - provide regexes for profanity (Regexp-Common-2.120, 15 Mar 2005, Abigail)
Regexp::English - Perl module to create regular expressions more verbosely (Regexp-English-1.00, 10 Jul 2005, chromatic)
Regexp::Ethiopic - Regular Expressions Support for Ethiopic Script (Regexp-Ethiopic-0.15, 22 Nov 2006)
84
Simple regex
my $line = "aardvark";
if( $line =~ /aa/ ) { print "has a double aa\n" }
if( $line =~ /(a{2})/ ) { print "has double aa\n" }
if( $line =~ /(a+)/ ) { print "has 1 or more a\n" }
85
Matching gene names
# YFL001C YAR102W - yeast ORF names
# let-1, unc-7 - worm names
# ENSG000000101 - human Ensembl gene names
while(<IN>) {
    if( /^(Y([A-P])(R|L)(\d{3})(W|C)(\-\w)?)/ ) {
        # five placeholders need five values: $5 is the strand (W or C)
        printf "yeast gene %s, chrom %d,%s arm, %d %s strand\n",
            $1, (ord($2)-ord('A'))+1, $3, $4, $5;
    } elsif( /^(ENSG\d+)/ ) {
        print "human gene $1\n";
    } elsif( /^(\w{3,4}\-\d+)/ ) {
        print "worm gene $1\n";
    }
}
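The same patterns can be tried without an input file by wrapping them in a subroutine. This is a sketch: the name classify_gene and the extra name BRCA2 (included only as something that fits none of the patterns) are assumptions, and the format string is passed all five captured values.

```perl
use strict;
use warnings;

sub classify_gene {
    my ($name) = @_;
    if ( $name =~ /^(Y([A-P])(R|L)(\d{3})(W|C)(\-\w)?)/ ) {
        # chromosome letter A..P becomes the number 1..16
        return sprintf "yeast gene %s, chrom %d, %s arm, %d %s strand",
            $1, ( ord($2) - ord('A') ) + 1, $3, $4, $5;
    }
    elsif ( $name =~ /^(ENSG\d+)/ )      { return "human gene $1" }
    elsif ( $name =~ /^(\w{3,4}\-\d+)/ ) { return "worm gene $1" }
    return "unknown: $name";
}

print classify_gene($_), "\n" for qw(YFL001C ENSG000000101 unc-7 let-1 BRCA2);
```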
86
Regex: GenBank record into FASTA components
my ($anno, $dna) = ($rec =~ /^(LOCUS.*ORIGIN\s*\n)(.*)\/\/\n/s);
LOCUS appears at the beginning of the GenBank record,
followed by any number of characters including newlines
with .* (the /s suffix lets . match a newline), followed by the string ORIGIN, followed by possibly
some whitespace with \s*, followed by a newline \n.
This matches the annotation part of the GenBank record.
The second group, (.*), then captures the sequence lines up to the closing // line.
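To see the pattern work end to end, here is a sketch on a tiny made-up record (not a real GenBank entry). After the match, the position numbers and whitespace are stripped from the sequence part so it can be printed FASTA-style; that cleanup step and the record ID DEMO0001 are assumptions of the example.

```perl
use strict;
use warnings;

# A tiny, made-up GenBank-style record (not a real entry).
my $rec = <<'END_RECORD';
LOCUS       DEMO0001       24 bp    DNA
DEFINITION  Hypothetical example record.
ORIGIN
        1 atggcgtacg ttagcctgaa ctga
//
END_RECORD

my ($anno, $dna) = ( $rec =~ /^(LOCUS.*ORIGIN\s*\n)(.*)\/\/\n/s );

# Remove position numbers and whitespace from the sequence part.
$dna =~ s/[\s\d]//g;

print ">DEMO0001\n$dna\n";
```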
87
Putting it together
A parser for output from a gene prediction program
88
GlimmerM (Version 3.0)
Sequence name: BAC1Contig11
Sequence length: 31797 bp

Predicted genes/exons

Gene Exon  Strand  Exon        Exon Range      Exon
   #    #          Type                        Length

   1    1    +     Initial     13907  13985      79
   1    2    +     Internal    14117  14594     478
   1    3    +     Internal    14635  14665      31
   1    4    +     Internal    14746  15463     718
   1    5    +     Terminal    15497  15606     110

   2    1    +     Initial     20662  21143     482
   2    2    +     Internal    21190  21618     429
   2    3    +     Terminal    21624  21990     367

   3    1    -     Single      25351  25485     135

   4    1    +     Initial     27744  27804      61
   4    2    +     Internal    27858  27952      95
   4    3    +     Internal    28091  28576     486
   4    4    +     Internal    28636  28647      12
   4    5    +     Internal    28746  28792      47
   4    6    +     Terminal    28852  28954     103

   5    3    -     Terminal    29953  30037      85
   5    2    -     Internal    30152  30235      84
   5    1    -     Initial     30302  30318      17
89
Putting it together
my ($method, $version);
while(<>) {
    if( /^(Glimmer\S*)\s+\((.+)\)/ ) {
        $method = $1;
        $version = $2;
    } elsif( /^(?:Predicted genes|Gene|\s+\#)/ || /^\s+$/ ) {
        next;    # skip header and blank lines
    } elsif(     # glimmer 3.0 output
        /^\s+(\d+)\s+     # gene num
          (\d+)\s+        # exon num
          ([\+\-])\s+     # strand
          (\S+)\s+        # exon type
          (\d+)\s+(\d+)   # exon start, end
          \s+(\d+)        # exon length
        /ox ) {
        my ($genenum,$exonnum,$strand,$type,$start,$end,$len)
            = ($1,$2,$3,$4,$5,$6,$7);
    }
}
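A runnable sketch of the same idea that also keeps what it parses. It reads a few rows taken from the sample output instead of a file, stores each exon in a hash of arrays keyed by gene number, and prints a per-gene summary. The storage layout (%exons_for, the hash keys) is an assumption, not part of the original parser.

```perl
use strict;
use warnings;

# A few rows from the sample output, indented as the parser expects.
my @lines = split /\n/, <<'END';
GlimmerM (Version 3.0)
Sequence name: BAC1Contig11

Predicted genes/exons

   1    1  +  Initial    13907  13985   79
   1    2  +  Internal   14117  14594  478
   3    1  -  Single     25351  25485  135
END

my ($method, $version);
my %exons_for;    # gene number => list of exon records (hash refs)

for (@lines) {
    if ( /^(Glimmer\S*)\s+\((.+)\)/ ) {
        ($method, $version) = ($1, $2);
    }
    elsif ( /^\s+(\d+)\s+(\d+)\s+([+-])\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)/ ) {
        push @{ $exons_for{$1} },
            { exon => $2, strand => $3, type => $4,
              start => $5, end => $6, length => $7 };
    }
}

for my $gene ( sort { $a <=> $b } keys %exons_for ) {
    my $total = 0;
    $total += $_->{length} for @{ $exons_for{$gene} };
    printf "gene %d: %d exon(s), %d bp\n",
        $gene, scalar @{ $exons_for{$gene} }, $total;
}
```

Lines that match neither pattern (headers, blank lines) simply fall through, so no explicit skip rule is needed in this version.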
90
Day II: assignment.
1. Modify one of your existing programs to do something useful using a Regular Expression. (see the last lab)
2. Read about Perl DBI. (Safari on-line documentation is available.)
3. Read about BioPerl. (Safari and CPAN)
4. Write a paragraph describing what you hope to do with Perl in your BSI project and email it to me. ([email protected])
91
If you remember nothing else
•Biology is hard and messy: better tools will help.
•The key problems are social.
•Together we are smarter than any one of us.
•Technology is easy by comparison.
92
Questions?
93
Thank You.
94