Designing and Testing Accumulo Iterators

© Hortonworks Inc. 2014. Designing and Testing Accumulo Iterators. Josh Elser, Member of Technical Staff; PMC, Apache Accumulo. 10 November 2015. Apache, Accumulo, and Apache Accumulo are trademarks of the Apache Software Foundation.

Transcript of Designing and Testing Accumulo Iterators

Page 1: Designing and Testing Accumulo Iterators

Designing and Testing Accumulo Iterators

Josh Elser

Member of Technical Staff

PMC, Apache Accumulo

10 November 2015

Page 2: Designing and Testing Accumulo Iterators

Design

How do I know if my Iterator works?

What can I do in an Iterator?

How are these methods even called?!

Page 3: Designing and Testing Accumulo Iterators

Common Patterns

Only a certain subset of algorithms fit into Accumulo Iterators well.

(Avoid shoving a square peg into a round hole.)

• Filtering

• Reduction

• Bounded aggregations
  – Keep an upper bound on the number of elements being aggregated to avoid memory issues

• Transformations
  – Key sort-order must be retained
  – Best limited to the Value only
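The patterns above can be sketched as small stream functions. This is a language-agnostic model in Python, not the Accumulo Iterator API: entries are plain `(key, value)` tuples that are assumed to arrive sorted by key, and the names `filter_entries`, `collect_per_key`, and `transform_values` are hypothetical. Truncating a group at `limit` is one possible way to bound an aggregation; rejecting oversized groups would be another.

```python
from itertools import groupby, islice


def filter_entries(source, keep):
    """Filtering: drop entries that fail the predicate; order is never changed."""
    return (entry for entry in source if keep(entry))


def collect_per_key(source, limit):
    """Bounded aggregation: gather at most `limit` values for each key.

    Consecutive entries that share a key are folded into one entry.
    Values beyond `limit` are skipped, so memory use per key has an upper bound.
    """
    for key, group in groupby(source, key=lambda entry: entry[0]):
        values = [value for _, value in islice(group, limit)]
        yield key, values


def transform_values(source, fn):
    """Transformation: rewrite the value only, so key sort order is retained."""
    return ((key, fn(value)) for key, value in source)
```

Each function takes a sorted stream and returns a sorted stream, which is what allows them to be chained.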

Page 4: Designing and Testing Accumulo Iterators

Design

Josh’s Iterator Design Principles:

• Always make forward progress

• Think functional – avoid unnecessary state

• Operate only on the data you have

• Do one thing and do it efficiently

Page 5: Designing and Testing Accumulo Iterators

Design

Make Forward Progress

[Figure: diagram with the labels "Start" and "End"]
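One way to see why forward progress matters: as I understand it, Accumulo may tear down an Iterator between batches and re-create it, seeking to a range that begins just after the last key returned. The Python sketch below models that loop under simplifying assumptions (unique string keys, an in-memory sorted list, hypothetical names `scan_batch` and `scan_with_teardowns`); it is not Accumulo code. If a returned key ever failed to sort after the previous one, the re-seek would revisit old data and the scan might never reach the end.

```python
from bisect import bisect_right


def scan_batch(data, after, end, batch_size):
    """Model a freshly created iterator: seek just past `after`, return one batch."""
    keys = [key for key, _ in data]
    # With after=None the scan starts at the beginning of the data.
    start = 0 if after is None else bisect_right(keys, after)
    batch = []
    for key, value in data[start:]:
        if key > end or len(batch) == batch_size:
            break
        batch.append((key, value))
    return batch


def scan_with_teardowns(data, end, batch_size=2):
    """Scan up to `end`, discarding the 'iterator' after every batch."""
    results = []
    last_key = None
    while True:
        batch = scan_batch(data, last_key, end, batch_size)
        if not batch:
            return results
        for key, value in batch:
            # Forward progress: each key must sort strictly after the last one.
            if last_key is not None and key <= last_key:
                raise RuntimeError(f"no forward progress: {key!r} after {last_key!r}")
            last_key = key
            results.append((key, value))
```

Because the only thing carried across a teardown is the last key returned, the result is the same for any batch size.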

Page 6: Designing and Testing Accumulo Iterators

Think about your Iterator like a function

Unnecessary State

def sum_entries(entries):
    # The only state is a running total; nothing else is retained.
    total = 0
    for entry in entries:
        total += entry
    return total

• Avoid holding onto state when at all possible.

• Think in terms of a stream rather than chunks of data.

• Beware of memory implications when performing aggregations.
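A small illustration of the stream-versus-chunk point, written as a Python model rather than Accumulo code. Both hypothetical functions keep the first entry for each key from a stream that is assumed to be sorted by key. The first remembers every key it has seen; the second relies on the sort order and remembers only the previous key.

```python
def dedupe_with_set(entries):
    """Stateful version: memory grows with the number of distinct keys."""
    seen = set()
    for key, value in entries:
        if key not in seen:
            seen.add(key)
            yield key, value


def dedupe_sorted(entries):
    """Streaming version: sorted input means duplicates are adjacent,
    so remembering the previous key is enough."""
    previous = object()  # sentinel that never equals a real key
    for key, value in entries:
        if key != previous:
            previous = key
            yield key, value
```

On sorted input the two produce the same output, but only the second has constant memory use.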

Page 7: Designing and Testing Accumulo Iterators

Operate locally

Daily reminder: Iterators have no method in which to implement safe cleanup.

• Iterators cannot properly handle I/O-related issues with external systems.

• Slow external calls result in a slow Accumulo.

• Some problems are more safely implemented outside of an Accumulo Iterator. An Iterator is not a Coprocessor or a container.

Page 8: Designing and Testing Accumulo Iterators

Simplicity

Avoid doing multiple things in a single Iterator.

• Object-Oriented Design 101

• Iterators can be tricky to debug on their own

• Configuring multiple Iterators is a feature
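The idea of stacking several single-purpose Iterators can be modeled with plain Python generators. This is only a sketch: `drop_empty_values`, `uppercase_values`, and `build_stack` are hypothetical names, and the stages are applied in the order listed rather than through Accumulo's own configuration mechanism.

```python
from functools import reduce


def drop_empty_values(source):
    """Stage 1: a filter that does one thing."""
    return ((key, value) for key, value in source if value != "")


def uppercase_values(source):
    """Stage 2: a value-only transformation that does one thing."""
    return ((key, value.upper()) for key, value in source)


def build_stack(source, *stages):
    """Feed the output of each stage into the next, in the order given."""
    return reduce(lambda stream, stage: stage(stream), stages, source)
```

Each stage can be exercised in isolation, which tends to make a failure easier to localize than it would be in one Iterator that filters and transforms at once.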

Page 9: Designing and Testing Accumulo Iterators

Testing

You should always test your code before running it in any environment to ensure that it functions as intended.

Page 10: Designing and Testing Accumulo Iterators

Testing

HOW?

Page 11: Designing and Testing Accumulo Iterators

Testing

A framework designed for testing Iterators given input, a Range, options, and expected output.

Page 12: Designing and Testing Accumulo Iterators

Testing

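[Figure: a set of test cases, each built from an Iterator class, a Range, Iterator options, and sorted input data, and checked either by verification of output records or by a true/false check. A legend distinguishes user-provided parts from framework-provided parts.]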
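The shape of such a test case can be sketched in Python. This is not the framework from the talk: `IteratorTestCase`, `run_case`, and `prefix_filter` are hypothetical names, the range is assumed to be an inclusive pair of keys, and the iterator under test is modeled as a function over a sorted stream.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Entry = Tuple[str, str]


@dataclass
class IteratorTestCase:
    iterator: Callable            # the iterator under test
    input_data: List[Entry]       # sorted by key
    scan_range: Tuple[str, str]   # inclusive (start, end) keys
    expected: List[Entry]         # expected output records
    options: Dict[str, str] = field(default_factory=dict)


def run_case(case):
    """Framework side: restrict input to the range, run the iterator, compare."""
    start, end = case.scan_range
    in_range = [(k, v) for k, v in case.input_data if start <= k <= end]
    actual = list(case.iterator(in_range, case.options))
    return actual == case.expected


def prefix_filter(source, options):
    """Example iterator under test: keep values that start with options['prefix']."""
    prefix = options.get("prefix", "")
    return ((k, v) for k, v in source if v.startswith(prefix))
```

A test author would then supply only the data, for example `IteratorTestCase(prefix_filter, data, ("a", "c"), expected, {"prefix": "x"})`, and leave the comparison to `run_case`.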

Page 13: Designing and Testing Accumulo Iterators

Features

• Auto-discovery of test cases

• JUnit Parameterized test integration

• Provided generic tests
  – Default constructor
  – Re-seek (teardown)
  – Deep copy verification
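Two of the generic checks, the default constructor and deep copy verification, can be illustrated with a Python model. Everything here is an assumption made for the sketch: `ValueLengthFilter` is a hypothetical stand-in for an Iterator class, and `check_default_constructor` and `check_deep_copy` are not the framework's actual tests.

```python
class ValueLengthFilter:
    """Model iterator: keeps entries whose value has at least min_length characters."""

    def __init__(self):  # takes no arguments, so it can be created generically
        self.source = []
        self.min_length = 0

    def init(self, source, options):
        self.source = source
        self.min_length = int(options.get("min_length", 0))

    def deep_copy(self):
        clone = type(self)()
        clone.source = list(self.source)
        clone.min_length = self.min_length
        return clone

    def scan(self):
        return [(k, v) for k, v in self.source if len(v) >= self.min_length]


def check_default_constructor(iterator_class):
    """Pass if the class can be instantiated with no arguments."""
    try:
        iterator_class()
    except TypeError:
        return False
    return True


def check_deep_copy(iterator_class, source, options):
    """Pass if the copy behaves like the original and shares no state with it."""
    original = iterator_class()
    original.init(list(source), options)
    expected = original.scan()
    clone = original.deep_copy()
    if clone is original or clone.scan() != expected:
        return False
    clone.init([], {})                   # reconfigure the copy
    return original.scan() == expected   # the original must be unaffected
```

Because neither check depends on what the iterator computes, the same two functions could be run against any class that follows this model.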

Page 14: Designing and Testing Accumulo Iterators

Future Work

•More Iterator tests!

•A final resting place for the code

•Documentation

•Usability testing

https://issues.apache.org/jira/browse/ACCUMULO-626

Page 15: Designing and Testing Accumulo Iterators

[email protected]