Computer Programming for Biologists
Class 7
Nov 27th, 2014
Karsten Hokamp
http://bioinf.gen.tcd.ie/GE3M25/programming
Hash Variables
Description
- associative arrays
- list of key/value pairs
- keys and values are scalars
- access values by key names
- great for look-ups!
Hash Variables
Look-up Table
Look-up table in real life for translation:
AAA K
AAC N
AAG K
AAU N
…
…
UUG L
UUU F
Genetic code
In Perl use hash variable:
%genetic_code = (
    'AAA' => 'K',
    'AAC' => 'N',
    'AAG' => 'K',
    'AAU' => 'N',
    …
    'UUG' => 'L',
    'UUU' => 'F'
);
Keys are unique!
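The look-up idea above can be sketched as a tiny translation loop. This is a minimal sketch, not the course's solution: the table holds only the six codons listed in the table above, the sequence in `$rna` is a made-up example, and the `'X'` placeholder for codons missing from the partial table is an assumption for illustration.

```perl
use strict;
use warnings;

# Partial codon table: only the six codons shown in the look-up table.
my %genetic_code = (
    'AAA' => 'K', 'AAC' => 'N', 'AAG' => 'K',
    'AAU' => 'N', 'UUG' => 'L', 'UUU' => 'F',
);

# Hypothetical example RNA, chosen to use only the codons above.
my $rna = 'AAAAACUUGUUU';

my $protein = '';
# Walk through the sequence three bases at a time.
for (my $i = 0; $i + 3 <= length($rna); $i += 3) {
    my $codon = substr($rna, $i, 3);
    # Look up the amino acid; 'X' (an assumption) marks unknown codons.
    $protein .= exists $genetic_code{$codon} ? $genetic_code{$codon} : 'X';
}
print "$protein\n";    # prints KNLF
```

Each codon is used directly as the key, so no chain of `if`/`elsif` tests is needed.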
Hash Variables
Examples
%bases = ('a', 'purine', 'c', 'pyrimidine', 'g', 'purine', 't', 'pyrimidine');
%complement = ('a' => 't', 'c' => 'g', 'g' => 'c', 't' => 'a');
%letters = (1, 'a', 2, 'b', 3, 'c', 4, 'd');
Hashes: Lists with a special relationship between each pair of elements!
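The two notations in the examples (plain comma list and `=>`) build the same hash. The sketch below checks this for the complement table and also shows what "keys are unique" means in practice. The names `%plain`, `%arrow` and `%dup` are hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# The same hash written as a plain list and with the => operator.
my %plain = ('a', 't', 'c', 'g', 'g', 'c', 't', 'a');
my %arrow = ('a' => 't', 'c' => 'g', 'g' => 'c', 't' => 'a');

# Compare the value stored under every key.
my $same = 1;
foreach my $base (keys %arrow) {
    $same = 0 unless $plain{$base} eq $arrow{$base};
}
print "identical: $same\n";    # prints identical: 1

# A repeated key keeps only the last value, because keys are unique.
my %dup = ('a' => 'first', 'a' => 'second');
print "$dup{'a'}\n";           # prints second
```

The `=>` form is usually easier to read because it makes each key/value pair visible.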
Hash Variables
Storing Data
# count frequency of nucleotides:
my $As = 0; my $Cs = 0; my $Gs = 0; my $Ts = 0;
foreach my $nuc (split //, $dna) {
    if ($nuc eq 'A') {
        $As++;
    } elsif ($nuc eq 'C') {
        $Cs++;
    } elsif ($nuc eq 'G') {
        $Gs++;
    } elsif ($nuc eq 'T') {
        $Ts++;
    }
}
Hash Variables
Storing Data
# count frequency of nucleotides:
my %freq = ();
foreach my $nuc (split //, $dna) {
    $freq{$nuc}++;
}
Hash Variables
Storing Data
# count frequency of nucleotides:
my %freq = ();
foreach my $nuc (split //, 'ACTTGGGT') {
    $freq{$nuc}++;
}
key value
A 1
C 1
G 3
T 3
keys are stored in no specific order
auto-initialisation with '' or 0
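Auto-initialisation can be seen with both a numeric and a string operation. In this sketch a key that does not exist yet behaves as 0 under `++` and as '' under `.=`. The `%positions` hash and the `$pos` counter are hypothetical additions for illustration; the sequence is the one from the slide.

```perl
use strict;
use warnings;

my $dna = 'ACTTGGGT';    # sequence from the slide
my (%freq, %positions);
my $pos = 0;

foreach my $nuc (split //, $dna) {
    $pos++;
    $freq{$nuc}++;                  # missing value is treated as 0
    $positions{$nuc} .= "$pos ";    # missing value is treated as ''
}

print "T count: $freq{'T'}\n";              # prints T count: 3
print "T positions: $positions{'T'}\n";     # prints T positions: 3 4 8 (with a trailing space)
```

Neither hash needs its keys to be set up in advance, and neither operation triggers an "uninitialized value" warning.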
Hash Variables
Scalar vs Hash
Scalars, one variable per count:
$As = 0; $As++;    (As holds 1)
$Cs = 0; $Cs++;    (Cs holds 1)
$Gs = 0; $Gs++;    (Gs holds 1)
$Ts = 0; $Ts++;    (Ts holds 1)
Hash, one variable for all counts:
%freq = ();
$freq{'Gs'}++;
[Diagram: the four scalars As, Cs, Gs, Ts as separate boxes, next to a single box freq with keys Cs, As, Gs, Ts; value 1 shown]
Computer Programming for Biologists
Exercises
Practical:
http://bioinf.gen.tcd.ie/GE3M25/programming/class7
Hash Variables
Accessing Elements
General: $value = $hash{$key};
Special functions: keys and values
# get complement of a base
my $new_base = $complement{$base};
# get amino acid for a codon
my $aa = $genetic_code{$codon};
# list all the aa's that occurred (loop through all keys)
foreach my $aa (keys %list) {
    print "$aa was found!\n";
}
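The `keys` and `values` functions and the single-element look-up can be combined in one short sketch. The sequence in `$dna` and the variable names `@bases`, `@partners` and `$revcomp` are hypothetical; only `%complement` comes from the slides.

```perl
use strict;
use warnings;

my %complement = ('a' => 't', 'c' => 'g', 'g' => 'c', 't' => 'a');

# keys returns all keys, values returns all values; sort fixes the order.
my @bases    = sort keys %complement;
my @partners = sort values %complement;
print "keys: @bases\n";         # prints keys: a c g t
print "values: @partners\n";    # prints values: a c g t

# Look up each base of a made-up sequence and prepend its complement,
# which builds the reverse complement.
my $dna = 'aacgt';
my $revcomp = '';
foreach my $base (split //, $dna) {
    $revcomp = $complement{$base} . $revcomp;
}
print "$revcomp\n";             # prints acgtt
```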
Hash Variables
Retrieving a key/value pair
$freq = $freq{'Gs'};
print "Gs: $freq\n";
Output:
Gs: 3
[Diagram: %freq with keys Cs, As, Gs, Ts; value 3 shown]
Hash Variables
Retrieving a key/value pair
$nuc = 'Gs';
print "$nuc: $freq{$nuc}\n";
Output:
Gs: 3
[Diagram: %freq with keys Cs, As, Gs, Ts; value 3 shown]
Hash Variables
Retrieving a key/value pair
foreach my $nuc (keys %freq) {
    print "$nuc: $freq{$nuc}\n";
}
Output (keys come back in no specific order):
Cs: 1
Ts: 3
Gs: 3
As: 1
[Diagram: %freq with keys Cs, As, Gs, Ts; value 3 shown]
Hash Variables
Retrieving a key/value pair
foreach my $nuc (sort keys %freq) {
    print "$nuc: $freq{$nuc}\n";
}
Output (sorted by key):
As: 1
Cs: 1
Gs: 3
Ts: 3
[Diagram: %freq with keys Cs, As, Gs, Ts; value 3 shown]
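Sorting the keys alphabetically is one option; a report of frequencies often reads better when ordered by value. The following sketch sorts the keys by their counts using a custom comparison block. The tie-break by key name and the `@ordered` array are assumptions added for illustration; the counts are the ones from the slides.

```perl
use strict;
use warnings;

# Counts as shown on the slides.
my %freq = ('As' => 1, 'Cs' => 1, 'Gs' => 3, 'Ts' => 3);

# Highest count first; ties are broken alphabetically by key.
my @ordered = sort { $freq{$b} <=> $freq{$a} or $a cmp $b } keys %freq;

foreach my $nuc (@ordered) {
    print "$nuc: $freq{$nuc}\n";
}
# prints:
# Gs: 3
# Ts: 3
# As: 1
# Cs: 1
```

Inside the block, `$a` and `$b` are the two keys being compared, so the hash is used to look up their values.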
Hash Variables
Checking for keys/values
# does the key exist?
if (exists $hash{$key}) {}
# does the key have a defined value?
if (defined $hash{$key}) {}
# does the key have a true value (anything except undef, 0, '0' or '')?
if ($hash{$key}) {}
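The three checks give different answers for the same hash, which is easy to miss. This sketch runs all three on four cases. The hash contents and the `@report` array are hypothetical and chosen only to cover each case.

```perl
use strict;
use warnings;

# Hypothetical test hash: a normal value, a zero, and an undefined value.
my %hash = (
    'present' => 'value',
    'zero'    => 0,
    'undef'   => undef,
);

my @report;
foreach my $key ('present', 'zero', 'undef', 'missing') {
    my $exists  = exists  $hash{$key} ? 1 : 0;    # is the key in the hash?
    my $defined = defined $hash{$key} ? 1 : 0;    # is its value defined?
    my $true    = $hash{$key}         ? 1 : 0;    # is its value true?
    push @report, "$key: exists=$exists defined=$defined true=$true";
}
print "$_\n" foreach @report;
# prints:
# present: exists=1 defined=1 true=1
# zero: exists=1 defined=1 true=0
# undef: exists=1 defined=0 true=0
# missing: exists=0 defined=0 true=0
```

For counts, where 0 is a legitimate value, `exists` or `defined` is usually the safer test than plain truth.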
Computer Programming for Biologists
Exercises
Use hashes in your sequence analysis tool for:
- reporting frequencies of nucleotides or amino acids
- reporting the GC content
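As a starting point for the GC content exercise, one possible approach is to count all nucleotides in a hash and then add up the G and C entries. This is a sketch under assumptions, not the course's reference solution: the example sequence is reused from the counting slide, and the variable names `$gc` and `$gc_content` are made up for illustration.

```perl
use strict;
use warnings;

my $dna = 'ACTTGGGT';    # example sequence reused from the counting slide
my %freq;

# Count every nucleotide, converting to upper case first.
$freq{uc($_)}++ foreach split //, $dna;

# Sum the G and C counts; "|| 0" covers bases that never occurred.
my $gc = ($freq{'G'} || 0) + ($freq{'C'} || 0);

# Guard against an empty sequence before dividing.
my $gc_content = length($dna) ? 100 * $gc / length($dna) : 0;

printf "GC content: %.1f%%\n", $gc_content;    # prints GC content: 50.0%
```

The same `%freq` hash can also be printed key by key to report the individual nucleotide frequencies.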