Perl SortingHashes
7/24/2019 Perl SortingHashes
Sorting hashes
by key or by value
Sorting hashes, whether by key or by value, also starts with a list of the keys in the
hash. You can then sort the keys or values of the hash using syntax very similar to
that you already know for sorting arrays.
Sorting ASCIIbetically by keys
Consider the simplest case, where we want to sort the keys of a hash ASCIIbetically.
For example, the hash may contain gene sequences as values, and gene names as
keys, and you want to print out the sequences in fasta format, sorted by gene name.
You simply get a list of the keys, and then sort them using the same syntax as when
sorting arrays:
sort keys %hash
keys %hash returns a list, and sort sorts that list.
Just like with arrays, the sorting won't be preserved unless you do something with it.
With arrays, we typically assigned the sorted array to a variable name. You can also
do that with your sorted list of hash keys - just assign it to a new array. But because
hashes are intrinsically unordered, you can't "sort the hash" and have it stay sorted
until you do whatever you want to do to it. Instead, you can sort the keys within a
foreach loop, and do what you plan to do to each key or value as you loop across the
sorted list of keys:
foreach my $key (sort keys %hash) {
    # do something, like printing out the key and its value
    print "$key $hash{$key}\n";
}
So here we use keys %hash to get a list of keys, and sort to sort it. We then work
through this list, one element at a time, assigning each element in turn to the control
variable $key and working through the commands in the loop. Note that, as always,
your hash, control variable etc. can be called whatever you like - I have just called
them $key and %hash for simplicity.
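As a minimal sketch of both approaches, the script below stores the sorted keys in an array and also sorts them directly in the loop header. The hash %seq_for and its three genes and sequences are made up for illustration; they are not taken from the course data files.

```perl
use strict;
use warnings;

# Hypothetical example data: gene names as keys, sequences as values.
my %seq_for = (
    tubulin => 'ATGCGTGAA',
    actin   => 'ATGGATGAC',
    hsp70   => 'ATGGCCAAA',
);

# Keep the sorted list of keys in an array if you need it more than once.
my @sorted_names = sort keys %seq_for;

# Or sort in the loop header and use each key as you reach it.
# Each record is printed fasta-style: a ">" header line, then the sequence.
foreach my $name (sort keys %seq_for) {
    print ">$name\n$seq_for{$name}\n";
}
```

Note that %seq_for itself is unchanged by either sort; only the list of keys is ordered.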
Exercise
Download the file seqlengths23.txt. Write a script to read in this data (gene names and
their respective gene lengths), and print out the same info, but sorted ASCIIbetically
by gene name. [Solution]
http://www.woolfit.net/perl/23sortinghashes/seqlengths23.txt
http://www.woolfit.net/perl/23sortinghashes/ex23.1.pl
Sorting numeric keys
If the keys of your hash are numeric values rather than strings, again you use the
same syntax as you have already used on arrays to sort the list of keys:
foreach my $key (sort { $a <=> $b } keys %hash) {
    # do something
}
keys %hash returns a list of keys, and sort { $a <=> $b } tells Perl to sort the list
numerically. Remember that the spaceship operator <=> sorts numerically, and $a
and $b are the inbuilt variable names to which Perl assigns the elements of your list
in turn while it is doing a pairwise comparison of them. So in this sort, $a and $b are
keys in the list.
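To see why the numeric comparison matters, here is a small sketch that sorts the same keys both ways. The hash %minutes_for and its fly IDs and times are invented for this example, not read from flyids23.txt.

```perl
use strict;
use warnings;

# Hypothetical data: numeric fly IDs as keys, minutes flown as values.
my %minutes_for = (
    100 => 12,
    9   => 31,
    25  => 4,
);

# The default sort compares keys as strings, so "100" comes before "25".
my @as_strings = sort keys %minutes_for;

# The spaceship operator compares the keys as numbers instead.
my @as_numbers = sort { $a <=> $b } keys %minutes_for;

print "string order:  @as_strings\n";    # 100 25 9
print "numeric order: @as_numbers\n";    # 9 25 100
```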
Exercise
Download the file flyids23.txt. It contains data on how many minutes flies flew in a
one hour period, where the first column is the fly individual's ID, and the second
column is the number of minutes it flew. Print out the fly IDs and their flight times for
those flies that flew more than 10 minutes in the hour. Sort the output by fly ID,
smallest to largest. [Solution]
Sorting by numeric hash values
If you want to sort your hash by values rather than by keys (e.g. for the data from
the exercise above, you may want to sort the output by flying time rather than fly
ID), you modify the syntax slightly. Now, if you are sorting a hash by keys, and the
keys are numeric, you write:
foreach my $key (sort { $a <=> $b } keys %hash) { do something; }
If instead you want to sort by numeric hash values, you'd write:
foreach my $key (sort { $hash{$a} <=> $hash{$b} } keys %hash) { do something; }
In both cases, $a and $b refer to whatever two keys from the list of keys in the hash
are being compared at that point in time. When you type $hash{$a}, the value of
this variable name is the value associated with the key $a in the hash %hash, and
similarly $hash{$b} refers to the value in the hash that has the key $b. Since the
two variables on either side of the <=> are values, that is what Perl will sort your
keys by.
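Here is a short sketch of a value sort in action. The data in %minutes_for is hypothetical (fly ID as key, minutes flown as value); the point is that the sorted list still contains keys, just ordered by the values they point to.

```perl
use strict;
use warnings;

# Hypothetical data: fly ID => minutes flown.
my %minutes_for = (
    101 => 12,
    102 => 3,
    103 => 45,
);

# $a and $b are still keys; the block looks up and compares their values.
my @ids_by_minutes =
    sort { $minutes_for{$a} <=> $minutes_for{$b} } keys %minutes_for;

# Print each ID with its time, shortest flight first.
foreach my $id (@ids_by_minutes) {
    print "$id\t$minutes_for{$id}\n";
}
```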
Sorting by string hash values
The syntax modification works the same way for sorting values ASCIIbetically, but it
requires us to use the fuller syntax for ASCIIbetical sorting. Remember from sorting
arrays that you can sort the elements of an array ASCIIbetically using either of these
two syntaxes:
http://www.woolfit.net/perl/23sortinghashes/flyids23.txt
http://www.woolfit.net/perl/23sortinghashes/ex23.2.pl
my @sortedarray = sort @array;
OR
my @sortedarray = sort { $a cmp $b } @array;
Because ASCIIbetical sorting is the default sort in Perl, you can leave out the bit { $a
cmp $b } which tells Perl to sort ASCIIbetically in ascending order. The same is true
when you're sorting the keys of a hash ASCIIbetically - you can use either of these
formats:
foreach my $key (sort keys %hash) { do something; }
OR
foreach my $key (sort { $a cmp $b } keys %hash) { do something; }
What if you want to sort a hash ASCIIbetically by values? This is the syntax:
foreach my $key (sort { $hash{$a} cmp $hash{$b} } keys %hash) { do something; }
Again here, the two variables on either side of the comparison operator cmp are
values in the hash (the values associated with keys $a and $b), so that is what Perl
will sort your hash on.
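A small sketch of the same idea with string values follows. The hash %species_for, with gene names as keys and species names as values, is made up purely to show the ordering.

```perl
use strict;
use warnings;

# Hypothetical data: gene name => species name.
my %species_for = (
    gene3 => 'melanogaster',
    gene1 => 'simulans',
    gene2 => 'ananassae',
);

# cmp compares the values as strings; the keys come back in that order.
my @genes_by_species =
    sort { $species_for{$a} cmp $species_for{$b} } keys %species_for;

foreach my $gene (@genes_by_species) {
    print "$species_for{$gene}\t$gene\n";
}
```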
Syntax summary
Here's an overview of the syntax you use to sort by key or value, ASCIIbetically or
numerically. I've put extra spaces in (which Perl doesn't care about) to try and show
the commonalities across all the different cases.
key, ASCII:     foreach my $key (sort                             keys %hash) { }
key, ASCII:     foreach my $key (sort {       $a  cmp       $b  } keys %hash) { }
key, numeric:   foreach my $key (sort {       $a  <=>       $b  } keys %hash) { }
value, ASCII:   foreach my $key (sort { $hash{$a} cmp $hash{$b} } keys %hash) { }
value, numeric: foreach my $key (sort { $hash{$a} <=> $hash{$b} } keys %hash) { }
The consensus syntax is
foreach my $key (sort { comparison type } keys %hash) {
    do something;
}
where { comparison type } tells Perl whether to sort (a) on keys or values, (b)
numerically or ASCIIbetically, and (c) in ascending or descending order (just reverse
the order of $a and $b to get descending order).
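Swapping $a and $b for descending order might look like the sketch below. The hash %length_for and its gene names and lengths are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical data: gene name => gene length.
my %length_for = (
    geneA => 900,
    geneB => 2400,
    geneC => 1500,
);

# Values, numeric, descending: $b is on the left of <=>.
my @longest_first =
    sort { $length_for{$b} <=> $length_for{$a} } keys %length_for;

# Keys, ASCIIbetical, descending: $b is on the left of cmp.
my @names_z_to_a = sort { $b cmp $a } keys %length_for;

print "by length: @longest_first\n";    # geneB geneC geneA
print "by name:   @names_z_to_a\n";     # geneC geneB geneA
```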
Exercises
Using the input file flyids23.txt that you used for the previous exercise, write a script
to produce an output file containing the same data, but reformatted so that it is
sorted in ascending order by flight time (the values in the second column). [Solution]
Using the input file seqlengths23.txt from the first exercise at the top of the page,
write a script that prints out a list of genes, from longest to shortest. [Solution]
Download the file seqs23.txt. Write a script to read in this data, then print out a list
of genes, sorted by gene name, together with the length of the sequence for each
gene. [Solution]
Now, using seqs23.txt again, write a script that prints out a list of genes, sorted by
gene length. [Solution]
http://www.woolfit.net/perl/23sortinghashes/flyids23.txt
http://www.woolfit.net/perl/23sortinghashes/ex23.3.pl
http://www.woolfit.net/perl/23sortinghashes/seqlengths23.txt
http://www.woolfit.net/perl/23sortinghashes/ex23.4.pl
http://www.woolfit.net/perl/23sortinghashes/seqs23.txt
http://www.woolfit.net/perl/23sortinghashes/ex23.5.pl
http://www.woolfit.net/perl/23sortinghashes/ex23.6.pl