Map grep sort

Post on 15-Jul-2015

Transcript of Map grep sort

Copyright 2014 Daina Pettit

map, grep, sort – slide 1

Streamlining and simplifying your Perl code using

Map, Grep, and Sort

Daina Pettit

dpettit@bluehost.com

daina@xmission.com


map, grep, sort – slide 2

“Perl culture” sometimes gets shortened to “Perl cult”.*

Larry Wall

*Wall, Larry, Perl, the first postmodern computer language, Linux World [Conference], March 3, 1999


map, grep, sort – slide 3

Overview

● What are map, grep, & sort and why should I care?
● map details
● grep details
● sort details
● Combining map, grep, & sort
● Advanced combinations
● Schwartzian Transform
● Orcish Maneuver
● Guttman-Rosler Transform
● Alternatives

map, grep, sort – slide 4

What are they?

map, grep, & sort are iterator functions that operate on lists or arrays.

1. map performs an action on each element and returns the results.

2. grep tests each element and returns those that pass.

3. sort orders the elements.
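As a rough illustration of the three roles, here is a minimal sketch. The sample data and variable names are made up for this example and do not come from the talk.

```perl
use strict;
use warnings;

# Hypothetical sample data for illustration only.
my @numbers = (12, 3, 25, 7);

my @doubled = map  { $_ * 2 }    @numbers;  # action on each element
my @small   = grep { $_ < 10 }   @numbers;  # keep elements that pass the test
my @ordered = sort { $a <=> $b } @numbers;  # numeric order

print "@doubled\n";  # 24 6 50 14
print "@small\n";    # 3 7
print "@ordered\n";  # 3 7 12 25
```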

map, grep, sort – slide 5

General Form

All have similar forms.

    @array = map  { exp } @list;
    @array = grep { exp } @list;
    @array = sort { exp } @list;

and

    @array = map  exp, @list;
    @array = grep exp, @list;
    @array = sort      @list;

map, grep, sort – slide 6

General Form—code blocks

Damian Conway in Perl Best Practices* recommends:

“Always use a block with a map and grep”

This is a syntactic aid: the block makes it obvious where the expression ends and the list begins, which helps you avoid errors in grouping arguments. Blocks do incur slightly more overhead than the expression form, but not much.

*Conway, Damian, Perl Best Practices, O'Reilly Media, Sebastopol, CA, 2005, pp 169-170.

    @array = map  { exp } @list;
    @array = grep { exp } @list;

map, grep, sort – slide 9

What is map?

Map is essentially a loop that processes a list, much like a foreach loop.

    foreach $line ( @lines ) {
        $line = uc $line;
    }

    @lines = map uc, @lines;

    @lines = map { uc } @lines;

map, grep, sort – slide 12

Aside—foreach inside-out

The alternate single-line foreach is as concise as map and slightly faster, but more cryptic.

    @lines = map uc, @lines;

    $_ = uc foreach @lines;

    foreach ( @lines ) {
        $_ = uc;
    }

map, grep, sort – slide 13

What is the best way to use map?

● map is best for creating new lists.
● foreach is best for transforming a list.

    @words = map { split } @lines;

    foreach ( @lines ) {
        $_ = uc;
    }
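The distinction can be sketched as follows. The input lines are hypothetical, and split is given explicit arguments here only for readability.

```perl
use strict;
use warnings;

# Hypothetical input lines for illustration only.
my @lines = ("the quick brown", "fox jumps");

# map builds a new, longer list: one element per word.
my @words = map { split ' ', $_ } @lines;

# foreach aliases $_ to each element, so this changes @lines in place.
foreach (@lines) {
    $_ = uc;
}

print scalar(@words), " words: @words\n";  # 5 words: the quick brown fox jumps
print join(" | ", @lines), "\n";           # THE QUICK BROWN | FOX JUMPS
```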

map, grep, sort – slide 16

Alternatives for dumping out a hash

    foreach ( sort keys %h ) {
        print "$_ => $h{$_}\n";
    }

    map {
        print "$_ => $h{$_}\n"
    } sort keys %h;

    print "$_ => $h{$_}\n"
        foreach sort keys %h;

map, grep, sort – slide 17

map {} is list context

Damian Conway in Perl Even-Better Practices* recommends:

"Use explicitly scalar map expressions"

*Thoughtstream Pty Ltd, 2013 pp 10-11

    @dates = map {
        localtime $_    # Wrong! In list context localtime returns nine fields per time
    } @epoch_times;

map, grep, sort – slide 18

map {} is list context

The fix is to force scalar context, so that localtime returns one date string per time:

    @dates = map {
        scalar localtime $_
    } @epoch_times;
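A small sketch of the difference, using two made-up epoch times. It only counts the resulting elements, because the formatted strings depend on the local time zone.

```perl
use strict;
use warnings;

# Hypothetical epoch times for illustration only.
my @epoch_times = (0, 86_400);

# List context: localtime returns nine fields per time, all flattened together.
my @fields = map { localtime $_ } @epoch_times;

# Scalar context: localtime returns one formatted string per time.
my @dates = map { scalar localtime $_ } @epoch_times;

print scalar(@fields), " elements\n";  # 18 elements
print scalar(@dates),  " elements\n";  # 2 elements
```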

map, grep, sort – slide 19

map {} is list context

The opposite mistake is forcing scalar context when you want the list:

    @words = map {
        scalar split    # Wrong! In scalar context split returns the number of fields
    } @lines;

map, grep, sort – slide 20

map {} is list context

Here the list context that map provides is exactly what you want:

    @words = map {
        split
    } @lines;

map, grep, sort – slide 21

map {} confusion

How does perl know whether { 6 } is a code block or an anonymous hash? It has to guess. To force a hash, use +{ 6 } followed by a comma. Without the +, perl may take the braces as a code block, and the comma then becomes a syntax error.

    map +{ 6 }, @stuff; # hash
    map  { 6 }  @stuff; # code block
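A slightly more realistic sketch of both forms follows. The names and hash keys are invented for illustration. The parenthesized form in the second map is one of several ways to make a block unambiguous.

```perl
use strict;
use warnings;

# Hypothetical names for illustration only.
my @names = qw(Alice Bob);

# +{ ... } forces an anonymous hash constructor; note the comma after it.
my @records = map +{ name => $_ }, @names;

# Parentheses inside the braces make it unambiguously a code block.
my %seen = map { ( lc($_) => 1 ) } @names;

print ref($records[0]), " $records[1]{name}\n";  # HASH Bob
print join(",", sort keys %seen), "\n";          # alice,bob
```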

map, grep, sort – slide 22

Using map in void context

● Frowned upon.
● Incurs extra overhead on perls before 5.8.1, which build the result list even though it is thrown away.

    map {
        print "$_ => $h{$_}\n"
    } sort keys %h;

map, grep, sort – slide 25

Creating a hash in map

    map { $age_of{$_} = -M } @files;

    foreach ( @files ) {
        $age_of{$_} = -M;
    }

    $age_of{$_} = -M for @files;

map, grep, sort – slide 26

Skipping in map

● Drop an item using an empty list.
● Do NOT use an explicit return. The block is not a subroutine, so return would exit the enclosing subroutine.

    @ones = map {
        $_ < 10 ? $_ : ();
    } @numbers;

map, grep, sort – slide 28

What is grep?

● Similar to the Unix command-line utility grep
● Given a list, grep returns only certain items

    @ones = map {
        $_ < 10 ? $_ : ();
    } @numbers;

    @ones = grep { $_ < 10 } @numbers;
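The equivalence of the two forms can be checked with a quick sketch. The numbers are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical numbers for illustration only.
my @numbers = (4, 15, 9, 23, 0);

# map with an empty list drops the elements that fail the test...
my @via_map  = map  { $_ < 10 ? $_ : () } @numbers;

# ...which is what grep does directly.
my @via_grep = grep { $_ < 10 } @numbers;

print "@via_map\n";   # 4 9 0
print "@via_grep\n";  # 4 9 0
```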

map, grep, sort – slide 29

Boolean Scalar Context

● Anywhere in perl where a true/false value is expected: if, while, and, or, not, &&, ||, !, etc.
● If the expression evaluates to 0, "0", 0.0, "", or undef, it is false. Everything else is true.

    if (   0     ) {} # False
    if ( 400     ) {} # True
    if (  -1     ) {} # True
    if ( "false" ) {} # True!
    if ( "00"    ) {} # True!
    undef $x;
    if (  $x     ) {} # False
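Because grep evaluates its test in boolean context, it makes a convenient way to try these rules out. The following is only a sketch; the table of labeled test values is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical test values; the labels are only for printing.
my @cases = (
    [ '0'       => 0       ],
    [ '400'     => 400     ],
    [ '-1'      => -1      ],
    [ '"false"' => "false" ],
    [ '"00"'    => "00"    ],
    [ '"0.0"'   => "0.0"   ],
    [ '""'      => ""      ],
    [ 'undef'   => undef   ],
);

# Keep the labels whose value is true in boolean context.
my @true = map { $_->[0] } grep { $_->[1] } @cases;

print "true: @true\n";  # true: 400 -1 "false" "00" "0.0"
```

Note that the string "0.0" is true even though the number 0.0 is false; only the strings "" and "0" are false.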

map, grep, sort – slide 30

Examples of grep

● Expression can be any valid perl expression.
● Expression is in scalar boolean context.

    @ones = grep { $_ < 10 } @numbers;

    @dirs = grep { -d } @files;

    @no_dup = grep { ! $h{$_}++ } @old;

    @errors = grep { /error/i } @log;

    @true = grep { $_ } @all;
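The duplicate-removal idiom is the least obvious of these, so here is a small sketch of how it behaves. The log lines and the %seen hash name are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical log lines for illustration only.
my @log = ("ok", "Error: disk", "ok", "error: net", "Error: disk");

# %seen counts each line; the post-increment returns the old count,
# so the test is true only the first time a line appears.
my %seen;
my @no_dup = grep { !$seen{$_}++ } @log;

# A regex match as the test; /i ignores case.
my @errors = grep { /error/i } @no_dup;

print scalar(@no_dup), "\n";      # 3
print join("; ", @errors), "\n";  # Error: disk; error: net
```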

map, grep, sort – slide 31

Sorting Basics

Sort can be called in three ways:

1. With no comparison directives

2. With a subroutine that returns comparison directives

3. With a code block (an anonymous subroutine) that returns comparison directives

    @sorted = sort         @unsorted;
    @sorted = sort   sub   @unsorted;
    @sorted = sort { exp } @unsorted;

map, grep, sort – slide 32

Sorting Basics

Sort needs a comparison that returns a negative number, zero, or a positive number (conventionally -1, 0, or 1) to say whether two elements, $a and $b, are in order (-1), the same (0), or out of order (1).

cmp and <=> conveniently provide this for string and numeric comparisons, respectively.

We don't have to use cmp and <=>. We just have to return a value with the right sign.

    $a <=> $b
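As a sketch of how the return value drives the ordering, the following sorts some made-up words by length and breaks ties alphabetically. A result of 0 from the first comparison falls through to the second.

```perl
use strict;
use warnings;

# Hypothetical words for illustration only.
my @words = qw(pear fig banana kiwi apple);

# Shorter words first; ties broken alphabetically.
# <=> and cmp each return -1, 0, or 1, and "or" falls through on 0.
my @sorted = sort {
    length($a) <=> length($b)
        or
    $a cmp $b
} @words;

print "@sorted\n";  # fig kiwi pear apple banana
```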

map, grep, sort – slide 33

Sorting Basics

Basic ASCII-betical sort:

    @sorted = sort @list;

Basic numeric sort:

    @sorted = sort { $a <=> $b } @list;

map, grep, sort – slide 34

Sorting Basics

Basic ASCII-betical sort:

    @sorted = sort { $a cmp $b } @list;

Basic numeric sort:

    @sorted = sort { $a <=> $b } @list;

map, grep, sort – slide 35

Sorting Basics--reverse

Reverse ASCII-betical sort:

    @sorted = sort { $b cmp $a } @list;

Reverse numeric sort:

    @sorted = sort { $b <=> $a } @list;

map, grep, sort – slide 36

Sorting Basics--reverse

Or just use the reverse function:

    @sorted = reverse sort @list;

Reverse numeric sort:

    @sorted = reverse sort { $a <=> $b } @list;

map, grep, sort – slide 37

Sorting Basics--subroutine

Using a subroutine instead of a code block

You can also use anonymous subroutines.

These subroutines cannot be recursive!

    sub compare {
        uc ( $a ) cmp uc ( $b );
    }

    $comp = \&compare;

    @sorted = sort $comp @list;
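Both the named and the anonymous form can be sketched like this. The subroutine names and the sample list are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical names for illustration only.
my @list = qw(banana Cherry apple);

# A named comparison subroutine: case-insensitive string order.
sub by_name_ci { uc($a) cmp uc($b) }

# An anonymous subroutine held in a scalar: reverse of the above.
my $reverse_ci = sub { uc($b) cmp uc($a) };

my @up   = sort by_name_ci  @list;
my @down = sort $reverse_ci @list;

print "@up\n";    # apple banana Cherry
print "@down\n";  # Cherry banana apple
```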

map, grep, sort – slide 38

Complicated Sorting

You can sort on anything you can get to through $a and $b.

    @sorted = sort {
        my @array_a = split / /, $a;
        my @array_b = split / /, $b;
        $array_a[5] cmp $array_b[5];    # compare the sixth space-separated field
    } @lines;

map, grep, sort – slide 39

Complicated Sorting

Sorting hash keys

    @sorted_keys = sort keys %hash;

Sorting hash keys by value

    @sorted_keys = sort {
        $hash{$a} cmp $hash{$b}
    } keys %hash;

map, grep, sort – slide 40

Complicated Sorting

We can sort with multiple keys, such as by year, then by month, then by day, even if the data is mm-dd-yyyy.

    @sorted_dates = sort {
        my ( $ma, $da, $ya ) = split /-/, $a;
        my ( $mb, $db, $yb ) = split /-/, $b;
        $ya <=> $yb || $ma <=> $mb || $da <=> $db;
    } @dates;
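Here is a runnable sketch of the multi-key sort with a few made-up dates. Each <=> that returns 0 lets || fall through to the next key.

```perl
use strict;
use warnings;

# Hypothetical mm-dd-yyyy dates for illustration only.
my @dates = ('07-04-2014', '12-25-2013', '01-15-2014', '07-01-2014');

my @sorted_dates = sort {
    my ($ma, $da, $ya) = split /-/, $a;
    my ($mb, $db, $yb) = split /-/, $b;
    # Year first; if equal (0), fall through to month, then day.
    $ya <=> $yb || $ma <=> $mb || $da <=> $db;
} @dates;

print "@sorted_dates\n";  # 12-25-2013 01-15-2014 07-01-2014 07-04-2014
```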

map, grep, sort – slide 41

Complicated Sorting

We don't have to always use the comparison operators. We can make up our own unique order.

    @order = sort {
        return -1 if $a eq 'King' && $b ne 'King';
        return  1 if $a ne 'King' && $b eq 'King';
        return  0;
    } @cards;    # King first, the rest doesn't matter.

map, grep, sort – slide 42

Combinations

Since map, grep, and sort all take and return lists, you can chain them together. The chain is evaluated from right to left: sort runs first, then grep, then map.

    @pics = map { lc }
            grep { /\.jpe?g$/i }
            sort @list;
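A quick sketch of the chain with invented file names shows the right-to-left flow.

```perl
use strict;
use warnings;

# Hypothetical file names for illustration only.
my @list = ('c.jpg', 'B.JPG', 'notes.txt', 'A.jpeg');

# Reads bottom-up: sort the names, keep the JPEGs, lowercase what is left.
my @pics = map  { lc }
           grep { /\.jpe?g$/i }
           sort @list;

print "@pics\n";  # a.jpeg b.jpg c.jpg
```

Because the sort happens before the lowercasing, the order reflects the original mixed-case names. Filtering before sorting would usually be cheaper on a large list.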

map, grep, sort – slide 43

Optimizing sort

Given a list of files, sort by the age of the files.

    chomp ( @files = `ls -1` );

“file1” “file7” “a.out” “x.pl” “5.dat”

map, grep, sort – slide 44

Optimizing sort

Sorts by name, but not by age.

@sorted = sort @files; # by name

“file1” “file7” “a.out” “x.pl” “5.dat”


map, grep, sort – slide 45

Optimizing sort

Sorts by date, but slow for large data sets.

-M is called twice every time sort compares!

    @sorted = sort {
        -M $a <=> -M $b
    } @files;

“file1” “file7” “a.out” “x.pl” “5.dat”

map, grep, sort – slide 46

Optimizing sort

We want to call -M once for each file, save the result, and reuse it each time sort needs to compare.

Map will do this for us!

    @order =
        map { [ $_, -M ] } @files;

“file1” “file7” “a.out” “x.pl” “5.dat”

1.2 2.9 3.1 1.1 2.9

map, grep, sort – slide 47

Optimizing sort

Then we want to sort based on just the date part.

But now we need to get rid of the date part.

    @order =
        sort { $a->[1] <=> $b->[1] }
        map { [ $_, -M ] } @files;

“x.pl” “file1” “5.dat” “file7” “a.out”

1.1 1.2 2.9 2.9 3.1

map, grep, sort – slide 48

Optimizing sort

Now use map to extract just element 0, and we are back to the original list, now sorted by date.

This is known as the Schwartzian Transform.*

*Perl idiom named for Randal Schwartz, author of Learning Perl; the name was coined by Tom Christiansen.

    @order =
        map { $_->[0] }
        sort { $a->[1] <=> $b->[1] }
        map { [ $_, -M ] } @files;

“x.pl” “file1” “5.dat” “file7” “a.out”
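To make the saving visible without depending on real files, the sketch below replaces -M with a hypothetical expensive_key subroutine that counts its own calls. The subroutine, the counter, and the word list are all assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an expensive key such as -M on a file.
# The counter shows how often the key is actually computed.
my $calls = 0;
sub expensive_key {
    my ($word) = @_;
    $calls++;
    return length $word;
}

my @words = qw(banana fig apple kiwi);

my @order =
    map  { $_->[0] }                    # 3. unwrap: keep the original item
    sort { $a->[1] <=> $b->[1] }        # 2. sort on the precomputed key
    map  { [ $_, expensive_key($_) ] }  # 1. wrap: [ item, key ]
    @words;

print "@order\n";                   # fig kiwi apple banana
print "$calls key computations\n";  # 4 key computations
```

The key is computed exactly once per element, however many comparisons sort performs.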

map, grep, sort – slide 52

Optimizing sort

Key points to remember for ST:

● map sort map idiom
● Use proper comparison
● Extract value to compare
● Everything else stays the same.

    @order =
        map { $_->[0] }
        sort { $a->[1] <=> $b->[1] }
        map { [ $_, -M ] } @files;

map, grep, sort – slide 53

Optimizing sort—Orcish Maneuver*

Uses an "or" cache (in a hash) to remember values already computed: ||=

● Simpler than ST
● Almost as fast as ST
● Faster if list contains duplicates

*Term coined by Joseph Hall in Effective Perl Programming, Addison-Wesley Professional, Boston, MA, 1998.

    @order = sort {
        ( $cache{$a} ||= -M $a ) <=>
        ( $cache{$b} ||= -M $b )
    } @files;
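The effect of the cache on duplicates can be sketched the same way, again with a hypothetical expensive_key subroutine standing in for -M and a made-up word list.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an expensive key such as -M on a file.
my $calls = 0;
sub expensive_key {
    my ($word) = @_;
    $calls++;
    return length $word;
}

# Note the duplicates: each distinct word is computed only once.
my @words = qw(banana fig banana kiwi fig);

my %cache;
my @order = sort {
    ( $cache{$a} ||= expensive_key($a) ) <=>
    ( $cache{$b} ||= expensive_key($b) )
} @words;

print "@order\n";                   # fig fig kiwi banana banana
print "$calls key computations\n";  # 3 key computations
```

One caveat: ||= recomputes the key whenever the cached value is false (0 or the empty string), so the cache only helps for keys that are true.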

map, grep, sort – slide 57

Optimizing sort—Orcish Maneuver

Key points to remember for OM:

● Only sort
● Compute comparison data
● Use proper comparison operator
● Everything else stays the same.

    @order = sort {
        ( $cache{$a} ||= -M $a ) <=>
        ( $cache{$b} ||= -M $b )
    } @files;

map, grep, sort – slide 58

Optimizing sort—Guttman-Rosler Transform*

This is a tweak on ST. It takes advantage of substr and sprintf being faster than array manipulation. It also uses the default string sort, which is slightly faster.

*A Fresh Look at Efficient Perl Sorting, Uri Guttman and Larry Rosler, approx. 1999.

    @order = map { substr $_, 10 }
        sort
        map { m#(\d{4})/(\d+)/(\d+)#;
            sprintf "%d-%02d-%02d%s",
                $1, $2, $3, $_
        } @dates;
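A runnable sketch of the same idea, assuming yyyy/m/d input strings like those the regex on the slide expects. The sample dates are invented, and the captures are copied into named variables only for readability.

```perl
use strict;
use warnings;

# Hypothetical yyyy/m/d dates for illustration only.
my @dates = ('2014/7/4', '2013/12/25', '2014/1/15');

my @order =
    map { substr $_, 10 }   # 3. strip the 10-character key
    sort                    # 2. plain string sort, no block
    map {                   # 1. prefix a fixed-width sortable key
        my ($y, $m, $d) = m#(\d{4})/(\d+)/(\d+)#;
        sprintf "%d-%02d-%02d%s", $y, $m, $d, $_;
    } @dates;

print "@order\n";  # 2013/12/25 2014/1/15 2014/7/4
```

The key must have a fixed width (here 4 + 1 + 2 + 1 + 2 = 10 characters) so that substr can recover the original string.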

map, grep, sort – slide 59

Optimizing sort—Guttman-Rosler Transform

Faster than ST

Harder to code and less readable

Not suitable for all sorts


map, grep, sort – slide 60

Further List & Sort Options

● List::Util: shuffle, reduce, any, first, max, min, ...
● List::MoreUtils: uniq, natatime, ...
● Sort::Key: may be faster than ST or GRT
● Sort::Naturally: automatically sorts numeric when appropriate
● Sort::Maker: internally uses OM, ST, or GRT
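A brief sketch of a few List::Util functions, using made-up numbers. sum is not listed on the slide but is exported by the same module.

```perl
use strict;
use warnings;
use List::Util qw(first max min sum reduce);

# Hypothetical numbers for illustration only.
my @numbers = (12, 3, 25, 7);

my $first_big = first { $_ > 10 } @numbers;   # first element passing the test
my $largest   = max @numbers;
my $smallest  = min @numbers;
my $total     = sum @numbers;
my $product   = reduce { $a * $b } @numbers;  # fold the list pairwise

print "$first_big $largest $smallest $total $product\n";  # 12 25 3 47 6300
```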

map, grep, sort – slide 61

Q&A

Comments?

Questions?


map, grep, sort – slide 62

Resources

http://www.perlmonks.org

http://www.cpan.org

http://www.hidemail.de/blog/perl_tutor.shtml

http://perldoc.perl.org/

http://www.stonehenge.com/writing.html

For profiling:

perldoc Devel::NYTProf

perldoc Benchmark