Post on 13-Feb-2017

Neatly Folding a Tree: Functional Perl5 AWS Glacier Hashes

Steven Lembark
Workhorse Computing
lembark@wrkhors.com

In the beginning...

There was Spaghetti Code.

And it was bad.

So we invented Objects.

Now we have Spaghetti Objects.

Alternative: Functional Programming

Based on Lambda Calculus.

Few basic ideas:

Transparency.

Consistency.

Basic rules

Constant data.

Transparent transforms.

Functions require input.

Output determined fully by inputs.

Avoid internal state & side effects.
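A minimal sketch of the rules above. The sub names are made up for illustration: the first sub is transparent, the second depends on hidden state.

```perl
use strict;
use warnings;

# Transparent: the result depends only on the arguments.
sub add_tax
{
    my ( $price, $rate ) = @_;

    return $price * ( 1 + $rate );
}

# Not transparent: a hidden counter changes the result
# between otherwise identical calls.
my $calls = 0;

sub next_label
{
    my ( $prefix ) = @_;

    return $prefix . ++$calls;
}

print add_tax( 100, 0.25 ), "\n";   # same inputs, same output
print add_tax( 100, 0.25 ), "\n";
print next_label( 'id' ), "\n";     # differs on each call
print next_label( 'id' ), "\n";
```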

Catch: It doesn't always work.

time()

random()

readline()

fetchrow_array()

Result: State matters!

Fix: Apply reality.

Where it does: Tree Hash

Used with AWS “Glacier” service.

$0.01/GiB/Month.

Large, cold data (discounts for EiB, PiB).

Uploads require lots of sha256 values.

Digesting large chunks

Uploads chunked in multiples of 1 MiB.

Digest for each chunk & entire upload.

Result: tree-hash.

Image from Amazon Developer Guide (API Version 2012-06-01) http://docs.aws.amazon.com/amazonglacier/latest/dev/checksum-calculations.html

One solution from Net::Amazon::TreeHash

    sub calc_tree
    {
        my ($self) = @_;
        my $prev_level = 0;
        while (scalar @{ $self->{tree}->[$prev_level] } > 1) {
            my $curr_level = $prev_level+1;
            $self->{tree}->[$curr_level] = [];

            my $prev_tree = $self->{tree}->[$prev_level];
            my $curr_tree = $self->{tree}->[$curr_level];
            my $len = scalar @$prev_tree;
            for (my $i = 0; $i < $len; $i += 2) {
                if ($len - $i > 1) {
                    my $a = $prev_tree->[$i];
                    my $b = $prev_tree->[$i+1];
                    push @$curr_tree, {
                        joined => 0,
                        start  => $a->{start},
                        finish => $b->{finish},
                        hash   => sha256( $a->{hash}.$b->{hash} )
                    };
                } else {
                    push @$curr_tree, $prev_tree->[$i];
                }
            }
            $prev_level = $curr_level;
        }
    }

Possibly simpler?

Trees are naturally recursive.

Two-step generation:

Split the buffer.

Reduce the hashes.
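The first step, splitting the buffer, might look like this. It is a sketch only: leaf_hashes and the sample buffer are illustrative names, and Digest::SHA is assumed as the source of sha256.

```perl
use strict;
use warnings;
use Digest::SHA qw( sha256 );

my $chunk_size = 2 ** 20;   # 1 MiB

# Split a buffer into 1 MiB chunks (the last one may be short)
# and digest each chunk. These digests are the leaves of the tree.
sub leaf_hashes
{
    my ( $buffer ) = @_;

    return map { sha256( $_ ) } unpack "(a$chunk_size)*", $buffer;
}

# Made-up sample data: two full chunks plus ten bytes.
my $buffer = 'x' x ( 2 * $chunk_size + 10 );
my @leaves = leaf_hashes( $buffer );

print scalar( @leaves ), " leaf hashes\n";
```

Reducing the leaves to a single root is the second step.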

Pass 1: Reduce the hashes

Reduce pairs.

Until one value remaining.

Catch:

Eats Stack

    sub reduce_hash
    {
        # undef for empty list
        @_ > 1 or return $_[0];

        my $count = @_ / 2 + @_ % 2;

        # parens: the sub is not yet declared inside its own body
        reduce_hash(
            map
            {
                @_ > 1
                ? sha256 splice @_, 0, 2
                : shift
            }
            ( 1 .. $count )
        )
    }
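A runnable sketch of the recursive version, assuming Digest::SHA for sha256. The three short strings stand in for 1 MiB chunks.

```perl
use strict;
use warnings;
use Digest::SHA qw( sha256 );

sub reduce_hash
{
    # one value left (or none): that is the root
    @_ > 1 or return $_[0];

    my $count = int( @_ / 2 ) + @_ % 2;

    # each level of the tree adds one stack frame
    return reduce_hash(
        map
        {
            @_ > 1
            ? sha256( splice @_, 0, 2 )   # digest a pair
            : shift                       # odd one passes through
        }
        ( 1 .. $count )
    );
}

my @leaves = map { sha256( $_ ) } qw( a b c );
my $root   = reduce_hash( @leaves );

print unpack( 'H*', $root ), "\n";
```

With three leaves the first two are digested as a pair and the third is carried up unchanged to the next level.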

Chasing your tail

Tail recursion is common.

"Tail call elimination" recycles stack.

"Fold" is a feature of FP languages.

Reduces the stack to a scalar.

Fold in Perl5

Reset the stack.

Restart the sub.

    my $foo
    = sub
    {
        @_ > 1 or return $_[0];

        @_ = … ;

        # new in v5.16
        goto __SUB__
    };
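To see the pattern run, here is a sketch with a made-up fold that sums adjacent pairs of numbers until one value is left. Only the reset-and-restart skeleton comes from the slide; the summing step is an illustration.

```perl
use v5.16;      # enables __SUB__ (feature 'current_sub')
use warnings;

my $sum_pairs
= sub
{
    @_ > 1 or return $_[0];

    # reset the stack: one sum per pair, odd element passes through
    @_
    = map
    {
        $_[ $_ ] + ( $_[ $_ + 1 ] // 0 )
    }
    grep
    {
        $_ % 2 == 0
    }
    ( 0 .. $#_ );

    # restart the sub without adding a stack frame
    goto __SUB__
};

say $sum_pairs->( 1 .. 10 );
```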

Pass 2: Reduce hashes

Voilà!

Stack shrinks.

@_ = is ugly.

goto scares people.

    sub reduce_hash
    {
        2 > @_ and return $_[0];

        my $count = @_ / 2 + @_ % 2;

        @_
        = map
        {
            @_ > 1
            ? sha256 splice @_, 0, 2
            : @_
        }
        ( 1 .. $count );

        goto __SUB__
    };

"Fold" is an FP Pattern.

    use Keyword::Declare;

    keyword tree_fold ( Ident $name, Block $new_list )
    {
        qq  # this is source code, not a subref!
        {
            sub $name
            {
                \@_ or return;

                ( \@_ = do $new_list ) > 1
                and goto __SUB__;

                \$_[0]
            }
        }
    }

See K::D POD for {{{…}}} to avoid "\@_".

Minimal syntax

    tree_fold reduce_hash
    {
        my $count = @_ / 2 + @_ % 2;

        map
        {
            @_ > 1
            ? sha256 splice @_, 0, 2
            : @_
        }
        ( 1 .. $count )
    }

User supplies generator.

NQFP: Hacks the stack.
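Without Keyword::Declare, the same split between boilerplate and generator can be approximated with a closure. This is a swapped-in technique, not the keyword itself: make_tree_fold is a hypothetical helper that wraps a user-supplied generator and returns a subref.

```perl
use v5.16;
use warnings;
use Digest::SHA qw( sha256 );

# Hypothetical helper: wrap a list generator in the fold
# boilerplate (reset @_, restart with goto __SUB__).
sub make_tree_fold
{
    my ( $new_list ) = @_;

    return sub
    {
        @_ or return;

        ( @_ = $new_list->( @_ ) ) > 1
        and goto __SUB__;

        $_[0]
    };
}

# The caller supplies only the generator.
my $reduce_hash
= make_tree_fold(
    sub
    {
        my $count = int( @_ / 2 ) + @_ % 2;

        map
        {
            @_ > 1
            ? sha256( splice @_, 0, 2 )
            : @_
        }
        ( 1 .. $count )
    }
);

my @leaves = map { sha256( $_ ) } qw( a b c );

say unpack 'H*', $reduce_hash->( @leaves );
```

The closure costs one extra sub call per level; the keyword avoids it by generating source code instead.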

Don't hack the stack

Replace splice with offsets.

Still messy: @_, stacked map.

    tree_fold reduce_hash
    {
        my $last = @_ / 2 + @_ % 2 - 1;

        map
        {
            $_[ $_ + 1 ]
            ? sha256 @_[ $_, $_ + 1 ]
            : $_[ $_ ]
        }
        map
        {
            2 * $_
        }
        ( 0 .. $last )
    }
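For readers without Keyword::Declare installed, the offset-based generator can be tried as a plain sub with the fold boilerplate written out by hand. This is a sketch of roughly what the keyword would generate, not its literal output; it uses defined() rather than a truth test for the missing right-hand value.

```perl
use v5.16;
use warnings;
use Digest::SHA qw( sha256 );

sub reduce_hash
{
    @_ or return;

    my $last = int( @_ / 2 ) + @_ % 2 - 1;

    # @_ is only read by offset here, never spliced
    @_
    = map
    {
        defined( $_[ $_ + 1 ] )
        ? sha256( @_[ $_, $_ + 1 ] )
        : $_[ $_ ]
    }
    map
    {
        2 * $_
    }
    ( 0 .. $last );

    @_ > 1 and goto __SUB__;

    $_[0]
}

my @leaves = map { sha256( $_ ) } qw( a b c d e );

say unpack 'H*', reduce_hash( @leaves );
```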

Using lexical variables

Declare tree_fold with parameters.

Caller uses lexical vars.

    keyword tree_fold
    (
        Ident $name,
        List  $argz,
        Block $stack_op
    )
    {
        ...
    }

Boilerplate for lexical variables

Extract lexical variables.

See also: PPI::Token

    my @varz    # ( '$foo', '$bar' )
    = map
    {
        $_->isa( 'PPI::Token::Symbol' )
        ? $_->{ content }
        : ()
    }
    map
    {
        $_->isa( 'PPI::Statement::Expression' )
        ? @{ $_->{ children } }
        : ()
    }
    @{ $argz->{ children } };

Boilerplate for lexical variables

    my $lexical = join ',' => @varz;
    my $count   = @varz;
    my $offset  = $count - 1;

    sub $name
    {
        \@_ or return;

        my \$last
        = \@_ % $count
        ? int( \@_ / $count )
        : int( \@_ / $count ) - 1;

        ...

Count & offset used to extract stack.

Boilerplate for lexical variables

    \@_
    = map
    {
        my ( $lexical )
        = \@_[ \$_ .. \$_ + $offset ];

        do $stack_op
    }
    map
    {
        \$_ * $count
    }
    ( 0 .. \$last );

Interpolate lexicals, count, offset, stack op.
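The count and offset arithmetic can be checked in plain Perl with a closure standing in for the generated source. This is a sketch: make_tree_fold_n is a hypothetical helper, its $count argument plays the role of the number of declared lexicals, and it assumes $count is at least 2 (with 1 the list would never shrink).

```perl
use v5.16;
use warnings;
use Digest::SHA qw( sha256 );

# Hypothetical helper: hand $stack_op one slice of $count
# values at a time until a single value remains.
sub make_tree_fold_n
{
    my ( $count, $stack_op ) = @_;
    my $offset = $count - 1;

    return sub
    {
        @_ or return;

        my $last
        = @_ % $count
        ? int( @_ / $count )
        : int( @_ / $count ) - 1;

        @_
        = map
        {
            # copy the slice; values past the end are undef
            my @args = @_[ $_ .. $_ + $offset ];

            $stack_op->( @args )
        }
        map
        {
            $_ * $count
        }
        ( 0 .. $last );

        @_ > 1 and goto __SUB__;

        $_[0]
    };
}

my $reduce_hash
= make_tree_fold_n(
    2,
    sub
    {
        my ( $left, $rite ) = @_;

        defined $rite
        ? sha256( $left, $rite )
        : $left
    }
);

my @leaves = map { sha256( $_ ) } qw( a b c d e );

say unpack 'H*', $reduce_hash->( @leaves );
```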

Chop shop

Not much body left:

    tree_fold reduce_hash( $left, $rite )
    {
        $rite ? sha256 $left, $rite : $left
    }

Buffer Size vs. Usr Time

Explicit map, keyword with and without lexicals.

8-32MiB are good chunk sizes.

     MiB  Explicit  Implicit  Keyword
       1      0.02      0.01     0.02
       2      0.03      0.03     0.04
       4      0.07      0.07     0.07
       8      0.14      0.13     0.10
      16      0.19      0.18     0.17
      32      0.31      0.30     0.26
      64      0.50      0.51     0.49
     128      1.00      1.02     1.01
     256      2.03      2.03     2.03
     512      4.05      4.10     4.06
    1024      8.10      8.10     8.11

Result: FP in Perl5

When FP works it is elegant.

Core Perl5 syntax helps:

lvalue

__SUB__

COW strings

Result: FP in Perl5 & Perl6

When FP works it is elegant.

Keywords: True Laziness ® at its best.

Don't repeat boilerplate.

Multimethods in Perl5.