Effective Testing of Apache Accumulo Iterators

Post on 09-Feb-2017

Effective Testing of Apache Accumulo Iterators
Josh Elser
Accumulo Summit 2016
2016/10/11

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Engineer at Hortonworks, Member of the Apache Software Foundation

Top-Level Projects
• Apache Accumulo®
• Apache Calcite™
• Apache Commons™
• Apache HBase®
• Apache Phoenix™

ASF Incubator
• Apache Fluo™
• Apache Gossip™
• Apache Pirk™
• Apache Rya™
• Apache Slider™

These Apache project names are trademarks or registered trademarks of the Apache Software Foundation.

A Novel Feature of Apache Accumulo

• SortedKeyValueIterator (SKVI or “Iterators”)
• Computation offload
• Reduced I/O
• Rumored to be called “cool” by Jeff Dean

[Figure labels: Transformations, Server-Side, Predicate-Pushdown, Filters, Aggregations, Combiners, Versioning, Security]

Apache Accumulo Iterators

• Column Slices (CfCqSliceFilter)
• Basic Statistics (StatsCombiner)
• Value/Array Concatenation (Summing[Array]Combiner)
• Aggregations (WholeRowIterator, WholeColumnFamilyIterator)
• In-Row operations (AndIterator, OrIterator)
• Filters (RegExFilter, GrepIterator, FirstEntryInRowIterator)

Reads

• Clients request a Range of data
• Key to Row to Tablet to TabletServer
• Sorted, merged-read of memory and files
• Computation offload and RPC boost

[Figure: a Client, Iterators, and a Tablet made up of Memory and several RFiles]
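The "sorted, merged-read of memory and files" can be pictured as a k-way merge over sources that are each already sorted. The sketch below is only an illustration of that idea in plain Java: MergedRead and Cursor are hypothetical names, the entries are bare strings rather than Accumulo Keys and Values, and nothing here is the TabletServer's actual read path.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not an Accumulo class: merges already-sorted sources
// (think: one in-memory map plus several RFiles) into one sorted stream.
final class MergedRead {

  // One cursor per source: its current head entry plus the rest of the source.
  private static final class Cursor implements Comparable<Cursor> {
    String head;
    final Iterator<String> rest;

    Cursor(Iterator<String> source) {
      this.rest = source;
      this.head = source.next();
    }

    @Override
    public int compareTo(Cursor other) {
      return head.compareTo(other.head);
    }
  }

  static List<String> merge(List<List<String>> sortedSources) {
    PriorityQueue<Cursor> heap = new PriorityQueue<>();
    for (List<String> source : sortedSources) {
      if (!source.isEmpty()) {
        heap.add(new Cursor(source.iterator()));
      }
    }
    List<String> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor smallest = heap.poll();
      merged.add(smallest.head);
      if (smallest.rest.hasNext()) {
        // Advance while the cursor is outside the heap, then re-insert it.
        smallest.head = smallest.rest.next();
        heap.add(smallest);
      }
    }
    return merged;
  }
}
```

An Iterator stack sits on top of a merged stream like this one, which is why it sees entries in sorted order no matter which source they came from.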

Reads with Iterators

• A poor-man’s “VIEW”
• Server-side transformation at query-time

Raw Key                      Value              Transformed Key           Value
3141592 siblings:brothers    Bobby,Steven       3141592 siblings:count    4
3141592 siblings:sisters     Sally,Francine
3141593 siblings:brothers    Frank              3141593 siblings:count    3
3141593 siblings:sisters     Amy,Loretta
3141594 siblings:brothers                       3141594 siblings:count    2
3141594 siblings:sisters     Rebecca,Savannah
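The table above can be reproduced with a small plain-Java sketch of the query-time transformation. SiblingCountView is a hypothetical name, and keys are modeled as "<row> <family>:<qualifier>" strings in a TreeMap, which is an assumption made for brevity. A real Iterator would implement SortedKeyValueIterator and work on Key/Value pairs instead.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a scan-time Iterator; not the SortedKeyValueIterator API.
final class SiblingCountView {

  // Input keys look like "<row> siblings:brothers"; values are comma-separated names.
  // The stored data is left untouched; only the returned view is different.
  static TreeMap<String, String> transform(TreeMap<String, String> raw) {
    TreeMap<String, Integer> countsByRow = new TreeMap<>();
    for (Map.Entry<String, String> entry : raw.entrySet()) {
      String row = entry.getKey().substring(0, entry.getKey().indexOf(' '));
      String names = entry.getValue();
      // "".split(",") has length 1, so an empty value must be handled separately.
      int n = names.isEmpty() ? 0 : names.split(",").length;
      countsByRow.merge(row, n, Integer::sum);
    }
    TreeMap<String, String> view = new TreeMap<>();
    for (Map.Entry<String, Integer> entry : countsByRow.entrySet()) {
      view.put(entry.getKey() + " siblings:count", String.valueOf(entry.getValue()));
    }
    return view;
  }
}
```

Fed the six raw entries from the table, this returns one siblings:count entry per row, with the values 4, 3, and 2.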

Compactions

• Bounds number of files and performance
• Iterators provide data optimization mechanism

[Figure: a Tablet's RFiles before and after a compaction, with Iterators applied]

Compactions with Iterators

• Deferred aggregation
• Rewrite application data in optimal form

Raw Key                      Value              Transformed Key              Value
3141592 siblings:brothers    Bobby,Steven       3141592 siblings:brothers    …
                                                3141592 siblings:count       4
3141592 siblings:sisters     Sally,Francine     3141592 siblings:sisters     …
3141593 siblings:brothers    Frank              3141593 siblings:brothers    …
                                                3141593 siblings:count       3
3141593 siblings:sisters     Amy,Loretta        3141593 siblings:sisters     …
3141594 siblings:brothers                       3141594 siblings:brothers    …
                                                3141594 siblings:count       2
3141594 siblings:sisters     Rebecca,Savannah   3141594 siblings:sisters     …
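The difference from the read-time case is that the compaction output is what gets written back to disk. A rough plain-Java sketch of the rewrite in the table above, again with string keys in a TreeMap as a simplifying assumption: SiblingCountCompaction and COUNT_COLUMN are hypothetical names, and this is not how Accumulo invokes a compaction Iterator.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a compaction-time Iterator; not Accumulo's API.
final class SiblingCountCompaction {

  private static final String COUNT_COLUMN = " siblings:count";

  // Returns what the rewritten file would hold: every raw entry plus one
  // materialized "<row> siblings:count" entry per row, in sorted key order.
  static TreeMap<String, String> compact(TreeMap<String, String> raw) {
    TreeMap<String, String> rewritten = new TreeMap<>();
    for (Map.Entry<String, String> entry : raw.entrySet()) {
      String key = entry.getKey();
      if (key.endsWith(COUNT_COLUMN)) {
        continue; // stale count; it is recomputed, so compacting twice is harmless
      }
      rewritten.put(key, entry.getValue());
      String row = key.substring(0, key.indexOf(' '));
      int names = entry.getValue().isEmpty() ? 0 : entry.getValue().split(",").length;
      // Fold this entry's name count into the row's running total.
      rewritten.merge(row + COUNT_COLUMN, String.valueOf(names),
          (a, b) -> String.valueOf(Integer.parseInt(a) + Integer.parseInt(b)));
    }
    return rewritten;
  }
}
```

Because "siblings:count" sorts between "siblings:brothers" and "siblings:sisters", the materialized entry lands in the middle of each row, as in the table. Later reads pick up the stored count instead of recomputing it.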

Better for Everyone

• Iterators are great
  – Abstraction for system-level filters and optimizations
  – Better performance for power-users

• Lots of things Iterators are not
  – Triggers
  – Hooks
  – Coprocessors
  – “Hammers”

• Iterators do not generally replace
  – Flink, Hive, Mesos, Presto, Storm, Spark, YARN, etc.
  – Can in some cases

On Building an Iterator

• The API is not particularly intuitive
• Hard to create/support SKVIv2
• Edge-cases in production are hard to understand
• Lots of things to not do in an Iterator
  – Trial and error
• Insight into production systems is difficult

Existing Testing Tools

Unit Testing
• Good
  – Fast
  – Concise/Simple
  – Given input, verify output
• Bad
  – Not end-to-end
  – Not representative invocation

MiniAccumuloCluster (MAC) Testing
• Good
  – Same server execution as production
  – Same client interaction as production
• Bad
  – Slow/Memory intensive
  – Pedantic to write tests
  – Might not catch production edge-cases
  – Impacted by environment

What’s the happy medium?

Iterator Testing Harness

• Testing harness designed to capture common pitfalls
  – ACCUMULO-626 in >=1.8.0
• Complementary
• The good parts
  – Fast
  – Generalized/Reusable tests
  – Extensible
• The bad parts
  – Not directly using TabletServer for invocation
  – Subtle failures

Iterator Testing Harness

• Testing an Iterator requires three things
  – Input data
  – Expected output
  – Collection of test cases to run
• Test cases found via reflection
  – Common edge cases provided
  – Easy to develop and run new test cases
• JUnit4 integration

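  // createIteratorInput(), createIteratorOutput(), and createTestCases()
  // are helpers written by the test author in the same test class.
  @Parameters
  public static Object[][] data() {
    // Data fed to the Iterator under test
    IteratorTestInput input = createIteratorInput();
    // Data the Iterator is expected to return for that input
    IteratorTestOutput expectedOutput = createIteratorOutput();
    // Test cases to run against the input/output pair
    List<IteratorTestCase> testCases = createTestCases();
    return BaseJUnit4IteratorTest.createParameters(input, expectedOutput, testCases);
  }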

Example Test Cases

• Iterator Instantiation
  – Does the Iterator have a visible no-args constructor?
• “DeepCopy” safety
  – Can a “deepCopy()” of an Iterator be used like the original?
• Stateless “hasTop()”
  – Do multiple invocations of “hasTop()” cause incorrect results/errors?
• Re-seek()’ing
  – Accumulo will re-instantiate scan sessions and use new Ranges
  – Does the Iterator still return correct results in this case?
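One of the pitfalls listed above, a hasTop() that changes state, can be shown with a toy model. ToySource, StatelessHasTop, StatefulHasTop, and HasTopCheck are hypothetical names invented for this sketch. They mimic the hasTop()/getTop()/next() shape but are not the SortedKeyValueIterator interface, and this is not the harness's own test case.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of the hasTop()/getTop()/next() contract; hypothetical, not the SKVI interface.
interface ToySource {
  boolean hasTop();
  String getTop();
  void next();
}

// Correct: hasTop() only inspects state, so it can be called any number of times.
final class StatelessHasTop implements ToySource {
  private final List<String> data;
  private int position = 0;

  StatelessHasTop(List<String> data) { this.data = data; }

  public boolean hasTop() { return position < data.size(); }
  public String getTop() { return data.get(position); }
  public void next() { position++; }
}

// Buggy: hasTop() advances as a side effect, so an extra call skips an entry.
final class StatefulHasTop implements ToySource {
  private final Iterator<String> data;
  private String top;

  StatefulHasTop(List<String> data) { this.data = data.iterator(); }

  public boolean hasTop() {
    if (data.hasNext()) {
      top = data.next();
      return true;
    }
    top = null;
    return false;
  }
  public String getTop() { return top; }
  public void next() { /* nothing to do: hasTop() already advanced */ }
}

final class HasTopCheck {
  // Drains the source, calling hasTop() `calls` times before each read.
  static List<String> drain(ToySource source, int calls) {
    List<String> seen = new ArrayList<>();
    while (true) {
      boolean more = false;
      for (int i = 0; i < calls; i++) {
        more = source.hasTop();
      }
      if (!more) {
        break;
      }
      seen.add(source.getTop());
      source.next();
    }
    return seen;
  }
}
```

Over the entries a, b, c, the buggy source looks fine when hasTop() is called once per entry, but returns only b when it is called twice. A hand-written unit test that happens to call hasTop() exactly once would never see the problem, which is the kind of gap a generic, reusable test case is meant to close.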

In an Ideal World

• Good testing means faster deployments
• Faster deployment means more value for customers
• Automated tests combat technical debt in code growth
• More automation reduces developer stress

Unit Tests + MiniAccumuloCluster + Iterator Testing Harness =

In an Ideal World

• Unit Tests (test lifecycle phase)
  – Fast verification given input/output
  – Validate impact of state
• Iterator Testing Harness (test lifecycle phase)
  – Catch common mistakes
  – Basic lifetime/API validation
  – Encourage best-practices
• MiniAccumuloCluster (integration-test lifecycle phase)
  – Functional/Acceptance tests
  – Does the ingest/query system function?
  – Real execution of Iterator by TabletServer

A Trio of Testing Approaches

• Standalone environment
  – The “laptop test”
  – Sanity check
• Staging environments
  – Small cluster with a subset of data
  – Correctness and performance

In an Ideal World

[Figure labels: Code, MAC, Iterator Test Harness, Unit Tests, Binary Artifacts, Standalone, Staging, Production, Deploy]

In an Ideal World

• No more “voodoo” and “black magic”
• Find common errors fast
• Catch bad Iterator design early
• Standardized testing methodology
• Community contributes new tests
• Increase in quality, reusability, and confidence

Thank You
Twitter: @josh_elser
Email: elserj@apache.org / jelser@hortonworks.com