Neatly folding-a-tree

Neatly Folding a Tree: Functional Perl5 AWS Glacier Hashes

Steven Lembark, Workhorse Computing


Page 1: Neatly folding-a-tree

Neatly Folding a Tree: Functional Perl5 AWS Glacier Hashes

Steven Lembark

Workhorse Computing

Page 4: Neatly folding-a-tree

In the beginning...

There was Spaghetti Code.

And it was bad.

So we invented Objects.

Now we have Spaghetti Objects.

Page 5: Neatly folding-a-tree

Alternative: Functional Programming

Based on Lambda Calculus.

Few basic ideas:

Transparency.

Consistency.

Page 6: Neatly folding-a-tree

Basic rules

Constant data.

Transparent transforms.

Functions require input.

Output determined fully by inputs.

Avoid internal state & side effects.
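A minimal sketch of these rules in Perl5. The sub names (`next_label_impure`, `label_for`) are made up for illustration: the first depends on hidden state, the second is determined fully by its input, and `map` builds a new list without touching the original.

```perl
use strict;
use warnings;

# Impure: the result depends on hidden state that changes between calls.
my $counter = 0;
sub next_label_impure { return 'item-' . ++$counter }

# Pure: the output is determined entirely by the argument.
sub label_for
{
    my ($n) = @_;
    return "item-$n";
}

# Constant data, transparent transform: build a new list, leave the input alone.
my @input  = ( 1, 2, 3 );
my @labels = map { label_for($_) } @input;

print "@labels\n";
```

Calling `label_for(2)` always gives the same answer; calling `next_label_impure()` twice does not.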

Page 7: Neatly folding-a-tree

Catch: It doesn't always work.

time()

random()

readline()

fetchrow_array()

Result: State matters!

Fix: Apply reality.
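One possible reading of "apply reality", sketched with hypothetical names (`is_expired`, `is_expired_now`): keep the logic pure by taking the clock value as an argument, and read the stateful source in one thin wrapper at the edge.

```perl
use strict;
use warnings;

# Pure core: the timestamp is an ordinary argument.
sub is_expired
{
    my ( $expires_at, $now ) = @_;
    return $now >= $expires_at;
}

# Impure edge: the only place that reads the clock.
sub is_expired_now
{
    my ($expires_at) = @_;
    return is_expired( $expires_at, time() );
}

print is_expired( 100, 99 ) ? "expired\n" : "still valid\n";
```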

Page 8: Neatly folding-a-tree

Where it does work: Tree Hash

Used with AWS “Glacier” service.

$0.01/GiB/Month.

Large, cold data (discounts for EiB, PiB).

Uploads require lots of sha256 values.

Page 9: Neatly folding-a-tree

Digesting large chunks

Uploads chunked in multiples of 1MiB.

Digest for each chunk & entire upload.

Result: tree-hash.
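The leaves of the tree can be sketched as follows, assuming the whole upload is already in memory as one string. `leaf_hashes` and the `MiB` constant are illustrative names; `sha256` is the function exported by the core Digest::SHA module.

```perl
use strict;
use warnings;
use Digest::SHA qw( sha256 );

use constant MiB => 2**20;

# Split a buffer into 1MiB pieces (the last one may be short)
# and return one binary SHA-256 digest per piece.
sub leaf_hashes
{
    my ($buffer) = @_;

    return map { sha256($_) } unpack '(a' . MiB . ')*', $buffer;
}

my $buffer = 'x' x ( 2 * MiB + 10 );    # two full chunks plus 10 bytes
my @leaves = leaf_hashes( $buffer );

printf "%d leaf digests\n", scalar @leaves;
```

These per-chunk digests are the input list that the reduction steps later in the talk fold down to a single value.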

Page 10: Neatly folding-a-tree

Image from Amazon Developer Guide (API Version 2012-06-01) http://docs.aws.amazon.com/amazonglacier/latest/dev/checksum-calculations.html

Page 11: Neatly folding-a-tree

One solution from Net::Amazon::TreeHash

    sub calc_tree
    {
        my ($self) = @_;
        my $prev_level = 0;
        while (scalar @{ $self->{tree}->[$prev_level] } > 1) {
            my $curr_level = $prev_level+1;
            $self->{tree}->[$curr_level] = [];

            my $prev_tree = $self->{tree}->[$prev_level];
            my $curr_tree = $self->{tree}->[$curr_level];
            my $len = scalar @$prev_tree;
            for (my $i = 0; $i < $len; $i += 2) {
                if ($len - $i > 1) {
                    my $a = $prev_tree->[$i];
                    my $b = $prev_tree->[$i+1];
                    push @$curr_tree, {
                        joined => 0,
                        start  => $a->{start},
                        finish => $b->{finish},
                        hash   => sha256( $a->{hash}.$b->{hash} )
                    };
                } else {
                    push @$curr_tree, $prev_tree->[$i];
                }
            }
            $prev_level = $curr_level;
        }
    }

Page 12: Neatly folding-a-tree

Possibly simpler?

Trees are naturally recursive.

Two-step generation:

Split the buffer.

Reduce the hashes.

Page 14: Neatly folding-a-tree

Pass 1: Reduce the hashes

Reduce pairs.

Until one value remaining.

Catch:

Eats Stack

    sub reduce_hash
    {
        # undef for empty list
        @_ > 1 or return $_[0];

        my $count = @_ / 2 + @_ % 2;

        # parens: the sub is not yet declared while its body compiles
        reduce_hash
        (
            map
            {
                @_ > 1
                ? sha256 splice @_, 0, 2
                : shift
            }
            ( 1 .. $count )
        )
    }
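To see the stack being eaten, here is a runnable sketch of the recursive version with a nesting counter added. The `$depth` and `$max_depth` variables and the stand-in leaf digests are assumptions for the demo only, not part of the original code.

```perl
use strict;
use warnings;
use Digest::SHA qw( sha256 );

our $depth     = 0;    # current nesting, tracked only for this demo
our $max_depth = 0;    # deepest nesting seen

sub reduce_hash
{
    local $depth = $depth + 1;
    $max_depth < $depth and $max_depth = $depth;

    # undef for empty list
    @_ > 1 or return $_[0];

    my $count = int( @_ / 2 ) + @_ % 2;

    # each level stays on the stack until the one below it returns
    return reduce_hash(
        map { @_ > 1 ? sha256( splice @_, 0, 2 ) : shift } 1 .. $count
    );
}

my @leaves = map { sha256($_) } 'a' .. 'h';    # 8 stand-in leaf digests
my $root   = reduce_hash( @leaves );

print "nested calls: $max_depth\n";    # list sizes 8, 4, 2, 1
```

Eight leaves give four nested calls; every level keeps its own copy of the argument list alive until the innermost call returns.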

Page 15: Neatly folding-a-tree

Chasing your tail

Tail recursion is common.

"Tail call elimination" recycles stack.

"Fold" is a feature of FP languages.

Reduces the stack to a scalar.
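For comparison, Perl5 already ships a linear fold: `reduce` from the core List::Util module. This sketch only illustrates "list in, scalar out"; it combines items left to right rather than pairwise, so it is not the tree fold developed in the following slides.

```perl
use strict;
use warnings;
use List::Util qw( reduce );

# Left fold: $a is the running result, $b is the next item.
my $sum = reduce { $a + $b } 1 .. 10;

my $longest = reduce { length($a) >= length($b) ? $a : $b }
    qw( fold map reduce );

print "$sum $longest\n";    # 55 reduce
```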

Page 16: Neatly folding-a-tree

Fold in Perl5

Reset the stack.

Restart the sub.

    my $foo
    = sub
    {
        @_ > 1 or return $_[0];

        @_ = … ;

        # new in v5.16: needs "use v5.16" or "use feature 'current_sub'"
        goto __SUB__
    };
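A small runnable instance of the reset-and-restart pattern, using a pairwise sum so the result is easy to check by hand. The name `$pair_sum` and the choice of addition are illustrative assumptions.

```perl
use v5.16;      # enables __SUB__ and say
use warnings;

# Pairwise sum: each pass halves the list, then the sub restarts
# itself with the new @_ instead of making a nested call.
my $pair_sum = sub
{
    @_ > 1 or return $_[0];

    my $last = $#_;

    @_ = map
    {
        $_ < $last ? $_[$_] + $_[ $_ + 1 ] : $_[$_]
    }
    grep { $_ % 2 == 0 } 0 .. $last;

    goto __SUB__;
};

say $pair_sum->( 1 .. 5 );    # (1+2, 3+4, 5) -> (3+7, 5) -> 15
```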

Page 19: Neatly folding-a-tree

Pass 2: Reduce hashes

Voilà!

Stack shrinks.

@_ = is ugly.

goto scares people.

    sub reduce_hash
    {
        2 > @_ and return $_[0];

        my $count = @_ / 2 + @_ % 2;

        @_
        = map
        {
            @_ > 1
            ? sha256 splice @_, 0, 2
            : @_
        }
        ( 1 .. $count );

        goto __SUB__
    };
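A self-contained version that can be run and checked against a hand-built tree. It follows the slide, with two assumptions of my own: `shift` takes the lone leftover item (as in Pass 1), and the five stand-in leaves are arbitrary.

```perl
use v5.16;      # enables __SUB__
use warnings;
use Digest::SHA qw( sha256 sha256_hex );

sub reduce_hash
{
    2 > @_ and return $_[0];

    my $count = int( @_ / 2 ) + @_ % 2;

    # replace the argument list with the next level of the tree ...
    @_ = map
    {
        @_ > 1 ? sha256( splice @_, 0, 2 ) : shift
    }
    1 .. $count;

    # ... and restart in place: no new stack frame per level
    goto __SUB__;
}

my @leaves = map { sha256($_) } 'a' .. 'e';    # 5 stand-in leaf digests
my $root   = reduce_hash( @leaves );

say unpack 'H*', $root;
```

With five leaves the levels are 5, 3, 2 and 1 items long; the unpaired fifth digest is carried up unchanged until it finally has a partner.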

Page 20: Neatly folding-a-tree

"Fold" is an FP Pattern.

    use Keyword::Declare;

    keyword tree_fold ( Ident $name, Block $new_list )
    {
        qq  # this is source code, not a subref!
        {
            sub $name
            {
                \@_ or return;

                ( \@_ = do $new_list ) > 1
                and goto __SUB__;

                \$_[0]
            }
        }
    }

See the Keyword::Declare POD for {{{…}}} to avoid "\@_".

Page 21: Neatly folding-a-tree

Minimal syntax

    tree_fold reduce_hash
    {
        my $count = @_ / 2 + @_ % 2;

        map
        {
            @_ > 1
            ? sha256 splice @_, 0, 2
            : @_
        }
        ( 1 .. $count )
    }

User supplies generator, a.k.a. $new_list.


NQFP: Hacks the stack.

Page 24: Neatly folding-a-tree

Don't hack the stack

Replace splice with offsets.

Still messy: @_, stacked map.

    tree_fold reduce_hash
    {
        my $last = @_ / 2 + @_ % 2 - 1;

        map
        {
            $_[ $_ + 1 ]
            ? sha256 @_[ $_, $_ + 1 ]
            : $_[ $_ ]
        }
        map
        {
            2 * $_
        }
        ( 0 .. $last )
    }
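The offset approach can be tried without Keyword::Declare as a plain sub. This is a sketch under two assumptions of mine: `defined` replaces the truth test on the neighbour, and `$last` is computed from `$#_`.

```perl
use v5.16;      # enables __SUB__
use warnings;
use Digest::SHA qw( sha256 );

sub reduce_hash
{
    @_ > 1 or return $_[0];

    my $last = int( $#_ / 2 );    # number of the last pair (or lone item)

    # read by offset only: @_ is not modified while the map runs
    @_ = map
    {
        defined( $_[ $_ + 1 ] )
        ? sha256( @_[ $_, $_ + 1 ] )
        : $_[ $_ ]
    }
    map { 2 * $_ } 0 .. $last;

    goto __SUB__;
}

my @leaves = map { sha256($_) } 'a' .. 'e';
say unpack 'H*', reduce_hash( @leaves );
```

The inner `map` produces the even offsets 0, 2, 4 and the outer one hashes each offset with its right-hand neighbour when there is one.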

Page 25: Neatly folding-a-tree

Using lexical variables

Declare fold_hash with parameters.

Caller uses lexical vars.

    keyword tree_fold
    (
        Ident $name,
        List  $argz,
        Block $stack_op
    )
    {
        ...
    }

Page 26: Neatly folding-a-tree

Boilerplate for lexical variables

Extract lexical variables.

See also: PPI::Token

    my @varz    # ( '$foo', '$bar' )
    = map
    {
        $_->isa( 'PPI::Token::Symbol' )
        ? $_->{ content }
        : ()
    }
    map
    {
        $_->isa( 'PPI::Statement::Expression' )
        ? @{ $_->{ children } }
        : ()
    }
    @{ $argz->{ children } };

Page 27: Neatly folding-a-tree

Boilerplate for lexical variables

    my $lexical = join ',' => @varz;
    my $count   = @varz;
    my $offset  = $count - 1;

    sub $name
    {
        \@_ or return;

        my \$last
        = \@_ % $count
        ? int( \@_ / $count )
        : int( \@_ / $count ) - 1
        ;

        ...

Count & offset used to extract stack.

Page 28: Neatly folding-a-tree

Boilerplate for lexical variables

        \@_
        = map
        {
            my ( $lexical )
            = \@_[ \$_ .. \$_ + $offset ];

            do $stack_op
        }
        map
        {
            \$_ * $count
        }
        ( 0 .. \$last );

Interpolate lexicals, count, offset, stack op.
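The count and offset arithmetic that the keyword interpolates can be exercised with an ordinary closure instead of generated source. `make_tree_fold` is a hypothetical helper written for this sketch, not part of Keyword::Declare and not the talk's implementation; it swaps compile-time code generation for a run-time code reference.

```perl
use v5.16;      # enables __SUB__
use warnings;
use Digest::SHA qw( sha256 );

# Hypothetical helper: builds a fold that hands $count items at a time
# to $op and restarts until a single item is left.
sub make_tree_fold
{
    my ( $count, $op ) = @_;
    my $offset = $count - 1;

    return sub
    {
        @_ or return;

        my $last
        = @_ % $count
        ? int( @_ / $count )
        : int( @_ / $count ) - 1;

        @_ = map
        {
            # copy the group; a short final group is padded with undef
            my @group = @_[ $_ .. $_ + $offset ];
            $op->( @group )
        }
        map { $_ * $count } 0 .. $last;

        @_ > 1 and goto __SUB__;

        $_[0]
    };
}

my $reduce_hash = make_tree_fold( 2, sub
{
    my ( $left, $rite ) = @_;
    defined( $rite ) ? sha256( $left, $rite ) : $left
});

say unpack 'H*', $reduce_hash->( map { sha256($_) } 'a' .. 'e' );
```

The same helper folds in groups of three or more by changing the first argument, which is what deriving `$count` from the declared variables buys the keyword version.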

Page 29: Neatly folding-a-tree

Chop shop

Not much body left:

    tree_fold reduce_hash($left, $rite)
    {
        $rite ? sha256 $left, $rite : $left
    }

Page 30: Neatly folding-a-tree

Buffer Size vs. Usr Time

Explicit map, keyword with and without lexicals.

8-32MiB are good chunk sizes.

     MiB  Explicit  Implicit  Keyword
       1      0.02      0.01     0.02
       2      0.03      0.03     0.04
       4      0.07      0.07     0.07
       8      0.14      0.13     0.10
      16      0.19      0.18     0.17
      32      0.31      0.30     0.26
      64      0.50      0.51     0.49
     128      1.00      1.02     1.01
     256      2.03      2.03     2.03
     512      4.05      4.10     4.06
    1024      8.10      8.10     8.11

Page 31: Neatly folding-a-tree

Result: FP in Perl5

When FP works it is elegant.

Core Perl5 syntax helps:

lvalue

__SUB__

COW strings

Page 32: Neatly folding-a-tree

Result: FP in Perl5 & Perl6

When FP works it is elegant.

Keywords: True Laziness ® at its best.

Don't repeat boilerplate.

Multimethods in Perl5.