Neatly Folding a Tree: Functional Perl5 AWS Glacier Hashes
Steven Lembark
Workhorse Computing
In the beginning...
There was Spaghetti Code.
And it was bad.
So we invented Objects.
Now we have Spaghetti Objects.
Alternative: Functional Programming
Based on Lambda Calculus.
Few basic ideas:
Transparency.
Consistency.
Basic rules
Constant data.
Transparent transforms.
Functions require input.
Output determined fully by inputs.
Avoid internal state & side effects.
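A minimal sketch of the difference these rules describe. The sub names and the tax arithmetic are made up for illustration and do not come from the talk:

```perl
use strict;
use warnings;

# Pure: the result depends only on the arguments.
sub add_tax
{
    my ( $price, $rate ) = @_;

    $price * ( 1 + $rate )
}

# Impure: the result also depends on hidden state,
# and every call changes that state.
my $calls = 0;

sub add_tax_counted
{
    my ( $price, $rate ) = @_;

    ++$calls;

    $price * ( 1 + $rate ) + $calls
}

# Same inputs, same output, every time.
print 'pure:   ', add_tax( 100, 0.5 ), ' ', add_tax( 100, 0.5 ), "\n";

# Same inputs, different output on each call.
print 'impure: ', add_tax_counted( 100, 0.5 ), ' ', add_tax_counted( 100, 0.5 ), "\n";
```

The pure version can be tested, cached, or reordered without knowing anything about the rest of the program; the counted version cannot.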
Catch: It doesn't always work.
time()
random()
readline()
fetchrow_array()
Result: State matters!
Fix: Apply reality.
Where it does: Tree Hash
Used with AWS “Glacier” service.
$0.01/GiB/Month.
Large, cold data (discounts for EiB, PiB).
Uploads require lots of sha256 values.
Digesting large chunks
Uploads are chunked in multiples of 1 MiB.
Digest for each chunk & entire upload.
Result: tree-hash.
Image from Amazon Developer Guide (API Version 2012-06-01) http://docs.aws.amazon.com/amazonglacier/latest/dev/checksum-calculations.html
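The digest tree described above can be sketched with a plain loop before any folding tricks are applied. This assumes the whole upload is already in memory; tree_hash_hex is a hypothetical helper name, and treating an empty buffer as a single empty chunk is an assumption rather than something stated in the slides:

```perl
use strict;
use warnings;
use Digest::SHA qw( sha256 sha256_hex );

my $MiB = 2 ** 20;

# Hypothetical helper, not part of any AWS module.
sub tree_hash_hex
{
    my ( $buffer ) = @_;

    # Leaf level: one binary digest per 1 MiB chunk
    # (the final chunk may be shorter).
    my @level = map { sha256( $_ ) } unpack "(a$MiB)*", $buffer;

    # Assumption: an empty buffer is hashed as one empty chunk.
    @level or @level = sha256( '' );

    # Combine neighbours until one digest is left;
    # an unpaired digest moves up a level unchanged.
    while( @level > 1 )
    {
        my @next;

        while( my @pair = splice @level, 0, 2 )
        {
            push @next, @pair > 1 ? sha256( @pair ) : $pair[0];
        }

        @level = @next;
    }

    unpack 'H*', $level[0]
}

# A buffer smaller than one chunk has a single leaf,
# so its tree hash is just its sha256.
print tree_hash_hex( 'abc' ), "\n";
```

Digest::SHA's sha256 joins its arguments before hashing, which is why a pair of digests can be passed as a list.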
One solution from Net::Amazon::TreeHash
sub calc_tree
{
    my ($self) = @_;
    my $prev_level = 0;
    while (scalar @{ $self->{tree}->[$prev_level] } > 1) {
        my $curr_level = $prev_level+1;
        $self->{tree}->[$curr_level] = [];
        my $prev_tree = $self->{tree}->[$prev_level];
        my $curr_tree = $self->{tree}->[$curr_level];
        my $len = scalar @$prev_tree;
        for (my $i = 0; $i < $len; $i += 2) {
            if ($len - $i > 1) {
                my $a = $prev_tree->[$i];
                my $b = $prev_tree->[$i+1];
                push @$curr_tree, {
                    joined => 0,
                    start  => $a->{start},
                    finish => $b->{finish},
                    hash   => sha256( $a->{hash}.$b->{hash} )
                };
            } else {
                push @$curr_tree, $prev_tree->[$i];
            }
        }
        $prev_level = $curr_level;
    }
}
Possibly simpler?
Trees are naturally recursive.
Two-step generation:
Split the buffer.
Reduce the hashes.
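A sketch of the first step for data that arrives through a filehandle rather than a single buffer. leaf_hashes is a hypothetical name, and the 4-byte chunk size in the demo is only there to keep the example small; the default is 1 MiB:

```perl
use strict;
use warnings;
use Digest::SHA qw( sha256 );

# Hypothetical helper: read fixed-size chunks and
# return one binary digest per chunk.
sub leaf_hashes
{
    my ( $fh, $size ) = @_;

    $size //= 2 ** 20;      # default: 1 MiB

    binmode $fh;

    my @leaves;

    # read returns 0 at end of file, which ends the loop.
    while( read $fh, my $chunk, $size )
    {
        push @leaves, sha256( $chunk );
    }

    @leaves
}

# Demo on an in-memory handle with a tiny chunk size:
# 'abcd', 'efgh', 'ij'.
my $data = 'abcdefghij';

open my $fh, '<', \$data
    or die "open: $!";

my @leaves = leaf_hashes( $fh, 4 );

print scalar( @leaves ), " leaves\n";
```

The list this returns is the input to the second step, reducing the hashes.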
Pass 1: Reduce the hashes
Reduce pairs.
Until one value remaining.
sub reduce_hash
{
    # undef for empty list
    @_ > 1 or return $_[0];

    my $count = @_ / 2 + @_ % 2;

    reduce_hash map
    {
        @_ > 1 ? sha256 splice @_, 0, 2 : shift
    }
    ( 1 .. $count )
}
Catch: the recursion eats stack.
Chasing your tail
Tail recursion is common.
"Tail call elimination" recycles stack.
"Fold" is a feature of FP languages.
Reduces the stack to a scalar.
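For comparison, the core List::Util module already ships a linear fold. This sketch only shows the general idea of collapsing a list into one scalar; it is a left-to-right fold, not the pairwise tree fold the talk goes on to build:

```perl
use strict;
use warnings;
use List::Util qw( reduce );

# $a holds the running result, $b the next element.
my $sum = reduce { $a + $b } 1 .. 10;

# Any two-argument rule works, e.g. keeping the longer string.
my $long = reduce { length( $a ) >= length( $b ) ? $a : $b } qw( fold reduce map );

print "sum:  $sum\n";
print "long: $long\n";
```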
Fold in Perl5
Reset the stack.
Restart the sub.
my $foo
= sub
{
    @_ > 1 or return $_[0];

    @_ = … ;

    # new in v5.16
    goto __SUB__
};
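A small runnable sketch of the reset-and-restart pattern. The countdown and the depth helper are invented for the demonstration; depth simply counts frames with caller so the effect on the stack is visible:

```perl
use v5.16;      # turns on strict, say and __SUB__
use warnings;

# Count the frames on the call stack.
sub depth
{
    my $i = 0;

    ++$i while caller $i;

    $i
}

my @seen;

my $countdown
= sub
{
    push @seen, depth;          # record the depth on each pass

    $_[0] > 0 or return 'done';

    @_ = ( $_[0] - 1 );         # reset the stack

    goto __SUB__                # restart the sub in the same frame
};

my $result = $countdown->( 3 );

say "result: $result";
say "depths: @seen";
```

Every pass reports the same depth. Replacing the last two statements with a plain recursive call, __SUB__->( $_[0] - 1 ), would make the recorded depth grow by one on each pass.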
Pass 2: Reduce hashes
Voilà!
Stack shrinks.
@_ = is ugly.
goto scares people.
sub reduce_hash
{
    2 > @_ and return $_[0];

    my $count = @_ / 2 + @_ % 2;

    @_
    = map
    {
        @_ > 1 ? sha256 splice @_, 0, 2 : @_
    }
    ( 1 .. $count );

    goto __SUB__
}
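To try the sub outside a slide it needs its imports and some input. The sketch below adds those and compares the result against a tree worked out by hand for three leaves; the leaf contents 'a', 'b' and 'c' are placeholders:

```perl
use v5.16;      # turns on strict, say and __SUB__
use warnings;
use Digest::SHA qw( sha256 );

sub reduce_hash
{
    2 > @_ and return $_[0];

    # The range operator truncates, so 2.5 behaves as 2.
    my $count = @_ / 2 + @_ % 2;

    @_
    = map
    {
        # Hash the next pair, or pass a lone leftover through.
        @_ > 1 ? sha256( splice @_, 0, 2 ) : @_
    }
    ( 1 .. $count );

    goto __SUB__
}

# Three leaves: the first two are paired, the third moves up alone.
my @leaf = map { sha256( $_ ) } qw( a b c );

my $root = reduce_hash( @leaf );
my $hand = sha256( sha256( $leaf[0] . $leaf[1] ) . $leaf[2] );

say unpack 'H*', $root;
say $root eq $hand ? 'matches hand-built tree' : 'MISMATCH';
```

The splice only empties the sub's own @_; the caller's @leaf array is left intact.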
"Fold" is an FP Pattern.
use Keyword::Declare;

keyword tree_fold ( Ident $name, Block $new_list )
{
    qq  # this is source code, not a subref!
    {
        sub $name
        {
            \@_ or return;

            ( \@_ = do $new_list ) > 1
            and goto __SUB__;

            \$_[0]
        }
    }
}
See K::D POD for {{{…}}} to avoid "\@_".
Minimal syntax
tree_fold reduce_hash
{
    my $count = @_ / 2 + @_ % 2;

    map
    {
        @_ > 1 ? sha256 splice @_, 0, 2 : @_
    }
    ( 1 .. $count )
}
User supplies generator, a.k.a. $new_list.
NQFP: Hacks the stack.
Don't hack the stack
Replace splice with offsets.
Still messy: @_, stacked map.
tree_fold reduce_hash
{
    my $last = @_ / 2 + @_ % 2 - 1;

    map
    {
        $_[ $_ + 1 ]
        ? sha256 @_[ $_, $_ + 1 ]
        : $_[ $_ ]
    }
    map
    {
        2 * $_
    }
    ( 0 .. $last )
}
Using lexical variables
Declare fold_hash with parameters.
Caller uses lexical vars.
keyword tree_fold
(
    Ident $name,
    List  $argz,
    Block $stack_op
)
{
    ...
}
Boilerplate for lexical variables
Extract lexical variables.
See also: PPI::Token
my @varz    # ( '$foo', '$bar' )
= map
{
    $_->isa( 'PPI::Token::Symbol' )
    ? $_->{ content }
    : ()
}
map
{
    $_->isa( 'PPI::Statement::Expression' )
    ? @{ $_->{ children } }
    : ()
}
@{ $argz->{ children } };
my $lexical = join ',' => @varz;
my $count   = @varz;
my $offset  = $count - 1;

sub $name
{
    \@_ or return;

    my \$last
    = \@_ % $count
    ? int( \@_ / $count )
    : int( \@_ / $count ) - 1;

    ...
Count & offset used to extract stack.
    \@_
    = map
    {
        my ( $lexical )
        = \@_[ \$_ .. \$_ + $offset ];

        do $stack_op
    }
    map
    {
        \$_ * $count
    }
    ( 0 .. \$last );
Interpolate lexicals, count, offset, stack op.
Chop shop
Not much body left:
tree_fold reduce_hash($left, $rite)
{
    $rite ? sha256 $left, $rite : $left
}
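Keyword::Declare is not a core module, so here is a rough approximation of the same shape using only a closure. This swaps the keyword for a higher-order sub; make_tree_fold is a hypothetical name, and passing the argument count explicitly is an assumption that stands in for the PPI-based extraction of the lexical names:

```perl
use v5.16;      # turns on strict, say and __SUB__
use warnings;
use Digest::SHA qw( sha256 );

# Hypothetical stand-in for the keyword: takes the number of
# arguments per group and a block, returns the folding sub.
sub make_tree_fold
{
    my ( $count, $op ) = @_;

    sub
    {
        @_ or return;

        # Index of the last group of $count items.
        my $last = int( ( @_ - 1 ) / $count );

        @_
        = map
        {
            my $i   = $_ * $count;
            my $end = $i + $count - 1;

            # A short final group is passed as-is.
            $end > $#_ and $end = $#_;

            $op->( @_[ $i .. $end ] )
        }
        ( 0 .. $last );

        @_ > 1 and goto __SUB__;

        $_[0]
    }
}

# The body is as short as the keyword version.
my $reduce_hash
= make_tree_fold( 2, sub
{
    my ( $left, $rite ) = @_;

    $rite ? sha256( $left, $rite ) : $left
});

my @leaf = map { sha256( $_ ) } 'a' .. 'e';

say unpack 'H*', $reduce_hash->( @leaf );
```

The closure pays for one extra sub call per group, which the generated source from the keyword avoids; the trade is no dependency outside core.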
Buffer Size vs. Usr Time
Explicit map, keyword with and without lexicals.
8-32MiB are good chunk sizes.
 MiB   Explicit   Implicit   Keyword
   1     0.02       0.01      0.02
   2     0.03       0.03      0.04
   4     0.07       0.07      0.07
   8     0.14       0.13      0.10
  16     0.19       0.18      0.17
  32     0.31       0.30      0.26
  64     0.50       0.51      0.49
 128     1.00       1.02      1.01
 256     2.03       2.03      2.03
 512     4.05       4.10      4.06
1024     8.10       8.10      8.11
Result: FP in Perl5
When FP works it is elegant.
Core Perl5 syntax helps:
lvalue
__SUB__
COW strings
Result: FP in Perl5 & Perl6
When FP works it is elegant.
Keywords: True Laziness ® at its best.
Don't repeat boilerplate.
Multimethods in Perl5.