Perl SortingHashes


  • 7/24/2019 Perl SortingHashes


    Sorting hashes

    by key or by value

    Sorting hashes, whether by key or by value, also starts with a list of the keys in the

    hash. You can then sort the keys or values of the hash using syntax very similar to that you already know for sorting arrays.

    Sorting ASCIIbetically by keys

    Consider the simplest case, where we want to sort the keys of a hash ASCIIbetically.

    For example, the hash may contain gene sequences as values, and gene names as

    keys, and you want to print out the sequences in fasta format, sorted by gene name.

    You simply get a list of the keys, and then sort them using the same syntax as when

    sorting arrays:

    sort keys %hash

    keys %hash returns a list, and sort sorts that list.

    Just like with arrays, the sorting won't be preserved unless you do something with it.

    With arrays, we typically assigned the sorted array to a variable name. You can also

    do that with your sorted list of hash keys - just assign it to a new array. But because

    hashes are intrinsically unordered, you can't "sort the hash" and have it stay sorted

    until you do whatever you want to do to it. Instead, you can sort the keys within a

    foreach loop, and do what you plan to do to each key or value as you loop across the

    sorted list of keys:

    foreach my $key (sort keys %hash) {

    #do something, like printing out the key and its value

    print "$key $hash{$key}\n";

    }

    So here we use keys %hash to get a list of keys, and sort to sort it. We then work

    through this list, one element at a time, assigning each element in turn to the control

    variable $key and working through the commands in the loop. Note that, as always,

    your hash, control variable etc. can be called whatever you like - I have just called them $key and %hash for simplicity.
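As a small illustration of the loop described above, here is a minimal sketch that prints a hash of sequences in fasta format, sorted by gene name. The gene names and sequences are made up for the example, and %seqs and @lines are names chosen here, not ones from the exercises.

```perl
use strict;
use warnings;

# Hypothetical data: gene names as keys, sequences as values.
my %seqs = (
    'tub1' => 'ATGCGT',
    'Adh'  => 'ATGTCG',
    'act5' => 'ATGAAA',
);

# Build one fasta record per gene, visiting the keys in sorted order.
my @lines;
foreach my $gene (sort keys %seqs) {
    push @lines, ">$gene\n$seqs{$gene}\n";
}

print @lines;
```

Because the sort is ASCIIbetical rather than alphabetical, Adh is printed before act5: uppercase letters sort ahead of lowercase ones.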

    Exercise

    Download the file seqlengths23.txt. Write a script to read in this data (gene names and

    their respective gene lengths), and print out the same info, but sorted ASCIIbetically

    by gene name. [Solution]

    Data file: http://www.woolfit.net/perl/23sortinghashes/seqlengths23.txt

    Solution: http://www.woolfit.net/perl/23sortinghashes/ex23.1.pl

    Sorting numeric keys

    If the keys of your hash are numeric values rather than strings, again you use the

    same syntax as you have already used on arrays to sort the list of keys:

    foreach my $key (sort { $a <=> $b } keys %hash) {

    # do something

    }

    keys %hash returns a list of keys, and sort { $a <=> $b } tells Perl to sort the list

    numerically. Remember that the spaceship operator <=> compares numerically, and $a

    and $b are the inbuilt variable names to which Perl assigns the elements of your list

    in turn while it is doing a pairwise comparison of them. So in this sort, $a and $b are

    keys in the list.
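To see why the numeric comparison matters, the sketch below sorts the same keys both ways. The hash %minutes and its contents are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical data: numeric IDs as keys.
my %minutes = (9 => 12, 10 => 3, 100 => 45);

# The default sort compares the keys as strings, character by character.
my @as_strings = sort keys %minutes;

# The spaceship operator compares the keys as numbers.
my @as_numbers = sort { $a <=> $b } keys %minutes;

print "string order:  @as_strings\n";   # 10 100 9
print "numeric order: @as_numbers\n";   # 9 10 100
```

With the default sort, 9 lands after 100 because the character "9" sorts after the character "1".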

    Exercise

    Download the file flyids23.txt. It contains data on how many minutes flies flew in a

    one-hour period, where the first column is the fly individual's ID, and the second

    column is the number of minutes it flew. Print out the fly IDs and their flight times for

    those flies that flew more than 10 minutes in the hour. Sort the output by fly ID,

    smallest to largest. [Solution]

    Data file: http://www.woolfit.net/perl/23sortinghashes/flyids23.txt

    Solution: http://www.woolfit.net/perl/23sortinghashes/ex23.2.pl

    Sorting by numeric hash values

    If you want to sort your hash by values rather than by keys (e.g. for the data from

    the exercise above, you may want to sort the output by flying time rather than fly

    ID), you modify the syntax slightly. Now, if you are sorting a hash by keys, and the

    keys are numeric, you write:

    foreach my $key (sort { $a <=> $b } keys %hash) { do something; }

    If instead you want to sort by numeric hash values, you'd write:

    foreach my $key (sort { $hash{$a} <=> $hash{$b} } keys %hash) { do something; }

    In both cases, $a and $b refer to whatever two keys from the list of keys in the hash

    are being compared at that point in time. When you type $hash{$a}, the value of

    this variable name is the value associated with the key $a in the hash %hash, and

    similarly $hash{$b} refers to the value in the hash that has the key $b. Since the two variables on either side of the <=> are values, that is what Perl will sort your

    keys by.
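A minimal sketch of sorting by numeric value, using invented fly IDs and flight times (the hash %flight and the array @by_time are names chosen for this example):

```perl
use strict;
use warnings;

# Hypothetical data: fly IDs as keys, minutes flown as values.
my %flight = (101 => 25, 102 => 7, 103 => 48, 104 => 12);

# $a and $b are still keys; the block looks up and compares their values.
my @by_time = sort { $flight{$a} <=> $flight{$b} } keys %flight;

# Print each fly ID with its flight time, shortest flight first.
foreach my $id (@by_time) {
    print "$id\t$flight{$id}\n";
}
```

Note that the sorted list still contains keys, not values, which is why the loop can look each value up again with $flight{$id}.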

    Sorting by string hash values

    The syntax modification works the same way for sorting values ASCIIbetically, but it

    requires us to use the fuller syntax for ASCIIbetical sorting. Remember from sorting

    arrays that you can sort the elements of an array ASCIIbetically using either of these

    two syntaxes:


    my @sortedarray = sort @array;

    OR

    my @sortedarray = sort { $a cmp $b } @array;

    Because ASCIIbetical sorting is the default sort in Perl, you can leave out the bit { $a

    cmp $b } which tells Perl to sort ASCIIbetically in ascending order. The same is true

    when you're sorting the keys of a hash ASCIIbetically - you can use either of these

    formats:

    foreach my $key (sort keys %hash) { do something; }

    OR

    foreach my $key (sort { $a cmp $b } keys %hash) { do something; }

    What if you want to sort a hash ASCIIbetically by values? This is the syntax:

    foreach my $key (sort { $hash{$a} cmp $hash{$b} } keys %hash) { do something; }

    Again here, the two variables on either side of the comparison operator cmp are

    values in the hash (the values associated with keys $a and $b), so that is what Perl

    will sort your hash on.
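The same idea can be sketched for string values. The gene names and species names below are invented for illustration, as are the names %species and @by_species.

```perl
use strict;
use warnings;

# Hypothetical data: gene names as keys, species names as values.
my %species = (
    gene3 => 'melanogaster',
    gene1 => 'simulans',
    gene2 => 'ananassae',
);

# Compare the values as strings with cmp; the list being sorted is still the keys.
my @by_species = sort { $species{$a} cmp $species{$b} } keys %species;

foreach my $gene (@by_species) {
    print "$gene\t$species{$gene}\n";
}
```

Here the short form (sort keys %species) would not do, because without a comparison block Perl compares the keys themselves rather than their values.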

    Syntax summary

    Here's an overview of the syntax you use to sort by key or value, ASCIIbetically or

    numerically. I've put extra spaces in (which Perl doesn't care about) to try and show

    the commonalities across all the different cases.

    key, ASCII:      foreach my $key (sort                             keys %hash) { }
    key, ASCII:      foreach my $key (sort {       $a  cmp       $b  } keys %hash) { }
    key, numeric:    foreach my $key (sort {       $a  <=>       $b  } keys %hash) { }
    value, ASCII:    foreach my $key (sort { $hash{$a} cmp $hash{$b} } keys %hash) { }
    value, numeric:  foreach my $key (sort { $hash{$a} <=> $hash{$b} } keys %hash) { }

    The consensus syntax is

    foreach my $key (sort { comparison type } keys %hash) {

    do something;

    }

    where { comparison type } tells Perl whether to sort (a) on keys or values, (b)

    numerically or ASCIIbetically, and (c) in ascending or descending order (just reverse

    the order of $a and $b to get descending order).
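Swapping $a and $b can be sketched as follows, with invented gene lengths (the names %length, @ascending and @descending are chosen for this example):

```perl
use strict;
use warnings;

# Hypothetical data: gene names as keys, sequence lengths as values.
my %length = (geneA => 900, geneB => 1500, geneC => 300);

# Ascending: $a on the left of the operator, $b on the right.
my @ascending = sort { $length{$a} <=> $length{$b} } keys %length;

# Descending: the same comparison with $a and $b swapped.
my @descending = sort { $length{$b} <=> $length{$a} } keys %length;

print "ascending:  @ascending\n";    # geneC geneA geneB
print "descending: @descending\n";   # geneB geneA geneC
```

The same swap works with cmp, and with plain $a and $b when sorting on keys.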

    Exercises


    Using the input file flyids23.txt that you used for the previous exercise, write a script

    to produce an output file containing the same data, but reformatted so that it is

    sorted in ascending order by flight time (the values in the second column). [Solution]

    Using the input file seqlengths23.txt from the first exercise at the top of the page,

    write a script that prints out a list of genes, from longest to shortest. [Solution]

    Download the file seqs23.txt. Write a script to read in this data, then print out a list

    of genes, sorted by gene name, together with the length of the sequence for each

    gene. [Solution]

    Now, using seqs23.txt again, write a script that prints out a list of genes, sorted by

    gene length. [Solution]

    Data files:

    http://www.woolfit.net/perl/23sortinghashes/flyids23.txt

    http://www.woolfit.net/perl/23sortinghashes/seqlengths23.txt

    http://www.woolfit.net/perl/23sortinghashes/seqs23.txt

    Solutions:

    http://www.woolfit.net/perl/23sortinghashes/ex23.3.pl

    http://www.woolfit.net/perl/23sortinghashes/ex23.4.pl

    http://www.woolfit.net/perl/23sortinghashes/ex23.5.pl

    http://www.woolfit.net/perl/23sortinghashes/ex23.6.pl