Designing and Testing Accumulo Iterators

Post on 09-Feb-2017


© Hortonworks Inc. 2014

Designing and Testing Accumulo Iterators

Josh Elser

Member of Technical Staff
PMC, Apache Accumulo

November 10, 2015


Apache, Accumulo, and Apache Accumulo are trademarks of the Apache Software Foundation.


Design


How do I know if my Iterator works?

What can I do in an Iterator?

How are these methods even called?!


Common Patterns

Only a certain subset of algorithms fit into Accumulo Iterators well.

(Avoid shoving a square peg into a round hole.)

• Filtering
• Reduction
• Bounded aggregations
  – Keep an upper bound on the number of elements being aggregated to avoid memory issues
• Transformations
  – Key sort order must be retained
  – Best limited to the Value only
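The filtering and transformation patterns can be sketched as stream operations over sorted (key, value) pairs. This is a minimal Python illustration rather than the Accumulo API (real Iterators are Java classes); the tuple layout and the names `filter_entries` and `transform_values` are assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator, Tuple

Entry = Tuple[str, int]  # hypothetical (key, value) pair; real Keys have more parts


def filter_entries(entries: Iterable[Entry],
                   keep: Callable[[Entry], bool]) -> Iterator[Entry]:
    """Filtering: drop entries, never reorder them."""
    for entry in entries:
        if keep(entry):
            yield entry


def transform_values(entries: Iterable[Entry],
                     fn: Callable[[int], int]) -> Iterator[Entry]:
    """Transformation limited to the value, so key sort order is retained."""
    for key, value in entries:
        yield key, fn(value)


sorted_input = [("row1", 3), ("row2", 10), ("row3", 7)]
large_only = filter_entries(sorted_input, lambda e: e[1] >= 5)
result = list(transform_values(large_only, lambda v: v * 2))
```

Because neither step touches the key, the output stays in the same sorted order as the input.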


Design

Josh’s Iterator Design Principles:

• Always make forward progress

• Think functional – avoid unnecessary state

• Operate only on the data you have

• Do one thing and do it efficiently


Design


[Figure: Make Forward Progress, from Start to End]
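Accumulo may tear down an Iterator mid-scan and later re-create it, seeking to a range that begins just after the last key returned. A rough Python sketch of why forward progress matters under that behavior follows; it is a simulation, not Accumulo code, and `seek` and `scan_with_teardown` are hypothetical names.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Entry = Tuple[str, str]  # hypothetical (key, value) pair


def seek(data: List[Entry], after_key: Optional[str]) -> int:
    """Return the index of the first entry whose key is strictly after after_key."""
    if after_key is None:
        return 0
    keys = [k for k, _ in data]
    return bisect_right(keys, after_key)


def scan_with_teardown(data: List[Entry], batch_size: int) -> List[Entry]:
    """Simulate a scan that is torn down and re-seeked after every batch."""
    results: List[Entry] = []
    last_key: Optional[str] = None
    while True:
        start = seek(data, last_key)  # a fresh "iterator" on each pass
        batch = data[start:start + batch_size]
        if not batch:
            return results
        for key, value in batch:
            # Forward progress: every returned key must exceed the last one.
            if last_key is not None and key <= last_key:
                raise RuntimeError("no forward progress at key " + key)
            results.append((key, value))
            last_key = key


data = [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4"), ("e", "5")]
scanned = scan_with_teardown(data, batch_size=2)
```

An Iterator that returned a key at or before the last one it emitted would trip the check here; in a real scan the likely symptom is a loop that never reaches End.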


Think about your Iterator like a function

Unnecessary State


def sum_entries(entries):
    # Only a running total is held; entries are consumed as a stream.
    total = 0
    for entry in entries:
        total += entry
    return total

• Avoid holding onto state when at all possible.

• Think in terms of a stream rather than chunks of data.

• Beware of memory implications when performing aggregations.
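A sum needs only a running total, but some aggregations accumulate state as they go. The sketch below, which assumes entries sorted by row, counts distinct values per row, caps how much it will hold, and discards that state as soon as the row ends. The function name and the `max_distinct` parameter are hypothetical and not part of any Accumulo API.

```python
from itertools import groupby
from typing import Iterable, Iterator, Set, Tuple

Entry = Tuple[str, str]  # hypothetical (row, value) pair, sorted by row


def distinct_per_row(entries: Iterable[Entry],
                     max_distinct: int = 1000) -> Iterator[Tuple[str, int]]:
    """Emit (row, number of distinct values) with a hard cap on held state."""
    for row, group in groupby(entries, key=lambda e: e[0]):
        seen: Set[str] = set()  # state lives only for the current row
        for _, value in group:
            seen.add(value)
            if len(seen) > max_distinct:
                raise ValueError(f"row {row!r} exceeds {max_distinct} distinct values")
        yield row, len(seen)


counts = list(distinct_per_row([("r1", "a"), ("r1", "a"), ("r1", "b"), ("r2", "c")]))
```

Failing loudly at the cap is one possible policy; truncating or emitting a partial result are alternatives, depending on what the consumer can tolerate.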


Operate locally

Daily reminder: Iterators have no method for implementing safe cleanup.

• Iterators cannot properly handle I/O-related issues with external systems.

• Slow external calls result in a slow Accumulo.

• Some problems are more safely implemented outside of an Accumulo Iterator. Iterators are not a Coprocessor or a container.


Simplicity

Avoid doing multiple things in a single Iterator.

• Object-oriented design 101

• Iterators can be tricky to debug on their own

• Configuring multiple Iterators is a feature
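One way to picture several single-purpose Iterators working together is a stack of small stream functions applied in priority order, lowest number first (which is how Accumulo orders configured Iterators). The Python below is only an illustration; `apply_stack`, `drop_negative`, and `double_values` are hypothetical names.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Entry = Tuple[str, int]  # hypothetical (key, value) pair
Stage = Callable[[Iterable[Entry]], Iterator[Entry]]


def drop_negative(entries: Iterable[Entry]) -> Iterator[Entry]:
    """Does one thing: filters out entries with negative values."""
    return (entry for entry in entries if entry[1] >= 0)


def double_values(entries: Iterable[Entry]) -> Iterator[Entry]:
    """Does one thing: doubles each value, leaving keys alone."""
    return ((key, value * 2) for key, value in entries)


def apply_stack(source: Iterable[Entry],
                stack: List[Tuple[int, Stage]]) -> Iterable[Entry]:
    """Chain the stages so that the lowest priority number runs first."""
    stream: Iterable[Entry] = source
    for _, stage in sorted(stack, key=lambda item: item[0]):
        stream = stage(stream)
    return stream


stack = [(20, double_values), (10, drop_negative)]
output = list(apply_stack([("a", -1), ("b", 2), ("c", 3)], stack))
```

Each stage can be tested and debugged in isolation, which is much harder when one Iterator both filters and transforms.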


Testing

You should always test your code before running it in any environment to ensure that it functions as intended.


Testing

HOW?


Testing

A framework designed for testing Iterators given input, a Range, options, and expected output.


Testing


[Diagram: Tests built from an Iterator class, a Range, Iterator options, and sorted input data, checked by either verification of output records or a true/false check. Legend: User-Provided, Framework.]
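The shape of such a test case (iterator, Range, options, sorted input, expected output) can be mocked up in a few lines. This Python sketch is a hypothetical analog for illustration only; it is not the framework's API, and `ToyIteratorCase`, `run_case`, and `prefix_filter` are invented names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Entry = Tuple[str, str]  # hypothetical (key, value) pair
KeyRange = Tuple[Optional[str], Optional[str]]  # inclusive (start, end); None = unbounded
IteratorFn = Callable[[List[Entry], KeyRange, Dict[str, str]], Iterator[Entry]]


@dataclass
class ToyIteratorCase:
    iterator: IteratorFn
    key_range: KeyRange
    options: Dict[str, str]
    sorted_input: List[Entry]
    expected_output: List[Entry]


def run_case(case: ToyIteratorCase) -> bool:
    """Run the iterator over the input and compare against the expected records."""
    actual = list(case.iterator(case.sorted_input, case.key_range, case.options))
    return actual == case.expected_output


def prefix_filter(entries: List[Entry], key_range: KeyRange,
                  options: Dict[str, str]) -> Iterator[Entry]:
    """Toy iterator under test: keep in-range entries whose value has a prefix."""
    start, end = key_range
    prefix = options.get("prefix", "")
    for key, value in entries:
        if start is not None and key < start:
            continue
        if end is not None and key > end:
            break  # input is sorted, so nothing later can be in range
        if value.startswith(prefix):
            yield key, value


case = ToyIteratorCase(
    iterator=prefix_filter,
    key_range=("b", "d"),
    options={"prefix": "x"},
    sorted_input=[("a", "x1"), ("b", "x2"), ("c", "y3"), ("d", "x4"), ("e", "x5")],
    expected_output=[("b", "x2"), ("d", "x4")],
)
passed = run_case(case)
```

Keeping the case as plain data lets the same inputs be reused by several different checks.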


Features

• Auto-discovery of test cases

• JUnit Parameterized test integration

• Provided generic tests
  – Default constructor
  – Re-seek (teardown)
  – Deep copy verification
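Two of the generic checks, default constructor and deep copy verification, can be approximated against a toy iterator. This is a hedged Python sketch rather than the framework's implementation; `ToyIterator` and both check functions are hypothetical stand-ins.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, str]  # hypothetical (key, value) pair


class ToyIterator:
    """Hypothetical stand-in for an Iterator; not the Accumulo interface."""

    def __init__(self) -> None:  # must be constructible with no arguments
        self._data: List[Entry] = []
        self._pos = 0

    def init(self, data: List[Entry]) -> None:
        self._data = list(data)
        self._pos = 0

    def has_top(self) -> bool:
        return self._pos < len(self._data)

    def get_top(self) -> Entry:
        return self._data[self._pos]

    def next(self) -> None:
        self._pos += 1

    def deep_copy(self) -> "ToyIterator":
        copy = ToyIterator()
        copy.init(self._data)  # same source data, independent position
        return copy


def has_default_constructor(cls: type) -> bool:
    """Generic check: the class can be instantiated without arguments."""
    try:
        cls()
        return True
    except TypeError:
        return False


def deep_copy_is_independent(iterator: ToyIterator) -> bool:
    """Generic check: draining a deep copy must not move the original."""
    before: Optional[Entry] = iterator.get_top() if iterator.has_top() else None
    copy = iterator.deep_copy()
    while copy.has_top():
        copy.next()
    after: Optional[Entry] = iterator.get_top() if iterator.has_top() else None
    return before == after


original = ToyIterator()
original.init([("a", "1"), ("b", "2")])
checks = (has_default_constructor(ToyIterator), deep_copy_is_independent(original))
```

Checks like these apply to any Iterator regardless of what it computes, which is why a framework can provide them once for everyone.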


Future Work

• More Iterator tests!

• A final resting place for the code

• Documentation

• Usability testing


https://issues.apache.org/jira/browse/ACCUMULO-626


Thanks!

jelser@hortonworks.com