Designing and Testing Accumulo Iterators
© Hortonworks Inc. 2014
Josh Elser
Member of Technical Staff
PMC, Apache Accumulo
November 10, 2015
Apache, Accumulo, and Apache Accumulo are trademarks of the Apache Software Foundation.
Design
How do I know if my Iterator works?
What can I do in an Iterator?
How are these methods even called?!
Common Patterns
Only a certain subset of algorithms fit into Accumulo Iterators well.
(Avoid shoving a square peg into a round hole.)
• Filtering
• Reduction
• Bounded aggregations
–Keep an upper bound on the number of elements being aggregated to avoid memory issues
• Transformations
–Key sort order must be retained
–Best limited to the Value only
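To make the filtering and transformation patterns concrete, here is a minimal sketch in plain Python rather than against the Accumulo API. The `Entry` tuple and both function names are hypothetical stand-ins for a sorted Key/Value stream. The filter only drops entries and the transformation touches only the value, so the key order of the input is preserved in both cases.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Assumption: a (key, value) tuple stands in for Accumulo's Key/Value pair.
Entry = Tuple[str, int]


def filter_entries(source: Iterable[Entry],
                   keep: Callable[[Entry], bool]) -> Iterator[Entry]:
    """Filtering: drop entries, never reorder them."""
    for entry in source:
        if keep(entry):
            yield entry


def transform_values(source: Iterable[Entry],
                     fn: Callable[[int], int]) -> Iterator[Entry]:
    """Transformation limited to the value, so key sort order is untouched."""
    for key, value in source:
        yield key, fn(value)


sorted_input = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
evens = list(filter_entries(sorted_input, lambda e: e[1] % 2 == 0))
doubled = list(transform_values(sorted_input, lambda v: v * 2))
```

A transformation that rewrote keys would have to prove that the new keys still arrive in sorted order, which is why the slide recommends limiting it to the Value.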
Design
Josh’s Iterator Design Principles:
• Always make forward progress
• Think functional – Avoid unnecessary state
• Operate only on the data you have
• Do one thing and do it efficiently
Design
Make Forward Progress
[Diagram: progress from Start to End]
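One way to picture forward progress is a scan that throws its iterator away after every batch and rebuilds it, seeking just past the last key it returned. The sketch below models that in plain Python. `ListIterator`, `scan_with_teardowns`, and the batch size are hypothetical names for illustration, not Accumulo classes. If an iterator ever returned a key at or before the last one returned, a loop like this would never reach the end.

```python
import bisect
from typing import List, Optional, Tuple

Entry = Tuple[str, int]


class ListIterator:
    """Hypothetical stand-in for an iterator over sorted (key, value) entries."""

    def __init__(self, entries: List[Entry]) -> None:
        self._entries = entries
        self._keys = [k for k, _ in entries]
        self._pos = len(entries)  # unusable until seek() is called

    def seek(self, after_key: Optional[str]) -> None:
        # Position on the first key strictly greater than after_key;
        # None means "start of the data".
        self._pos = 0 if after_key is None else bisect.bisect_right(self._keys, after_key)

    def has_top(self) -> bool:
        return self._pos < len(self._entries)

    def top(self) -> Entry:
        return self._entries[self._pos]

    def next(self) -> None:
        self._pos += 1


def scan_with_teardowns(entries: List[Entry], batch_size: int) -> List[Entry]:
    """Rebuild the iterator after every batch (batch_size >= 1 assumed)."""
    results: List[Entry] = []
    last_key: Optional[str] = None
    while True:
        it = ListIterator(entries)  # fresh instance: no state survives a teardown
        it.seek(last_key)
        batch = 0
        while it.has_top() and batch < batch_size:
            key, value = it.top()
            # Forward progress: every key must sort after the last one returned.
            if last_key is not None and key <= last_key:
                raise RuntimeError(f"no forward progress at key {key!r}")
            results.append((key, value))
            last_key = key
            batch += 1
            it.next()
        if not it.has_top():
            return results
```

Whatever the batch size, the scan should return the same entries as a single uninterrupted pass.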
Think about your Iterator like a function
Unnecessary State
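def sum_entries(entries):
    # Names chosen so the built-ins sum() and list() are not shadowed.
    total = 0
    for entry in entries:
        total += entry  # the running total is the only state kept
    return total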
• Avoid holding onto state when at all possible.
• Think in terms of a stream rather than chunks of data.
• Beware of memory implications when performing aggregations.
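The streaming advice above can be sketched as a per-key sum that keeps only a running total instead of collecting every value first. This is plain Python, not Accumulo code. `sum_by_key` is a hypothetical name, and it assumes the input is already sorted by key so that equal keys are adjacent.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple

Entry = Tuple[str, int]


def sum_by_key(source: Iterable[Entry]) -> Iterator[Entry]:
    """Emit one (key, sum) per run of equal keys, holding only a running total."""
    for key, group in groupby(source, key=lambda entry: entry[0]):
        total = 0
        for _, value in group:
            total += value  # constant memory, however long the run is
        yield key, total


sums = list(sum_by_key([("a", 1), ("a", 2), ("b", 5), ("c", 1), ("c", 1)]))
```

Because `groupby` is lazy and the function is a generator, nothing is read from the source until the consumer asks for the next result.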
Operate locally
Daily Reminder: Iterators have no method in which to implement safe cleanup.
• Iterators cannot properly handle I/O-related issues with external systems.
• Slow external calls result in a slow Accumulo.
• Some problems are more safely implemented outside of an Accumulo Iterator. An Iterator is not a Coprocessor or a container.
Simplicity
Avoid doing multiple things in a single Iterator.
• Object-Oriented Design 101
• Iterators can be tricky to debug on their own
• Configuring multiple Iterators is a feature
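As a rough illustration of stacking small, single-purpose Iterators, the sketch below chains two generator stages in priority order. It is plain Python, not the Accumulo configuration API. `build_stack`, `drop_negative`, `double_values`, and the rule that a lower priority number sits closer to the data are assumptions made for this model.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Entry = Tuple[str, int]
Stage = Callable[[Iterable[Entry]], Iterator[Entry]]


def drop_negative(source: Iterable[Entry]) -> Iterator[Entry]:
    """One job only: filter out entries with a negative value."""
    return (entry for entry in source if entry[1] >= 0)


def double_values(source: Iterable[Entry]) -> Iterator[Entry]:
    """One job only: double each value, leaving keys alone."""
    return ((key, value * 2) for key, value in source)


def build_stack(source: Iterable[Entry],
                stages: List[Tuple[int, Stage]]) -> Iterator[Entry]:
    """Chain stages so the lowest priority number reads the data first."""
    stream: Iterator[Entry] = iter(source)
    for _, stage in sorted(stages, key=lambda item: item[0]):
        stream = stage(stream)
    return stream


data = [("a", -1), ("b", 2), ("c", 3)]
stacked = list(build_stack(data, [(20, double_values), (10, drop_negative)]))
```

Each stage can be tested in isolation, which is much easier than debugging one Iterator that filters and transforms at once.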
Testing
You should always test your code before running it in any environment
to ensure that it functions as intended.
Testing
HOW?
Testing
A framework designed for testing Iterators given input, a Range, options, and expected output.
Testing
[Diagram: several Test cases, each built from an Iterator class, a Range, Iterator options, and sorted input data, checked by either verification of output records OR a true/false check. Legend: User-Provided, Framework.]
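The shape of such a test case can be sketched in a few lines of plain Python. This is not the framework's actual API. `IteratorTestCase`, `run_case`, `min_value_filter`, the inclusive key range, and the `"min"` option are all hypothetical and exist only to show how an iterator, a range, options, sorted input, and expected output fit together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Entry = Tuple[str, int]
KeyRange = Tuple[str, str]  # inclusive (start, end) keys; a simplified Range
IteratorFn = Callable[[Iterable[Entry], Dict[str, str]], Iterator[Entry]]


@dataclass
class IteratorTestCase:
    iterator: IteratorFn          # the code under test
    key_range: KeyRange           # which keys the scan covers
    options: Dict[str, str]       # string options, as iterator settings usually are
    sorted_input: List[Entry]     # input data, already sorted by key
    expected_output: List[Entry]  # records the iterator should return


def run_case(case: IteratorTestCase) -> bool:
    """Feed the in-range input through the iterator and compare the records."""
    start, end = case.key_range
    in_range = (e for e in case.sorted_input if start <= e[0] <= end)
    actual = list(case.iterator(in_range, case.options))
    return actual == case.expected_output


def min_value_filter(source: Iterable[Entry],
                     options: Dict[str, str]) -> Iterator[Entry]:
    """Example iterator: keep entries whose value is at least options['min']."""
    threshold = int(options.get("min", "0"))
    return (e for e in source if e[1] >= threshold)


case = IteratorTestCase(
    iterator=min_value_filter,
    key_range=("a", "c"),
    options={"min": "2"},
    sorted_input=[("a", 1), ("b", 2), ("c", 3), ("d", 4)],
    expected_output=[("b", 2), ("c", 3)],
)
passed = run_case(case)
```

Keeping the case as plain data means the same harness can run many cases, which is the property a parameterized test runner relies on.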
Features
• Auto-discovery of test cases
• JUnit Parameterized test integration
• Provided generic tests
–Default constructor
–Re-seek (teardown)
–Deep copy verification
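A generic re-seek test might look like the following sketch, which compares one uninterrupted scan against a scan that tears the iterator down after every record. It is plain Python rather than the framework's code. The factory signature and every function name are assumptions made for illustration. The second example iterator keeps positional state, so its output changes once that state is lost on teardown.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Entry = Tuple[str, int]
# Hypothetical factory: builds a fresh iterator over entries with key > after_key.
Factory = Callable[[List[Entry], Optional[str]], Iterator[Entry]]


def full_scan(factory: Factory, data: List[Entry]) -> List[Entry]:
    return list(factory(data, None))


def scan_with_reseek(factory: Factory, data: List[Entry]) -> List[Entry]:
    """Tear down after every entry and re-seek just past the last key returned."""
    results: List[Entry] = []
    last_key: Optional[str] = None
    for _ in range(len(data) + 1):  # bound guards against an iterator that never advances
        first = next(factory(data, last_key), None)
        if first is None:
            break
        results.append(first)
        last_key = first[0]
    return results


def reseek_is_consistent(factory: Factory, data: List[Entry]) -> bool:
    return scan_with_reseek(factory, data) == full_scan(factory, data)


def even_values(data: List[Entry], after_key: Optional[str]) -> Iterator[Entry]:
    """Stateless filter: the result depends only on each entry."""
    return (e for e in data
            if (after_key is None or e[0] > after_key) and e[1] % 2 == 0)


def every_other(data: List[Entry], after_key: Optional[str]) -> Iterator[Entry]:
    """Stateful: depends on position in the stream, which a re-seek resets."""
    remaining = [e for e in data if after_key is None or e[0] > after_key]
    return iter(remaining[::2])


data = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
stateless_ok = reseek_is_consistent(even_values, data)
stateful_ok = reseek_is_consistent(every_other, data)
```

The stateless filter passes and the position-dependent iterator fails, which is the kind of bug the "avoid unnecessary state" principle is meant to prevent.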
Future Work
•More Iterator tests!
•A final resting place for the code
•Documentation
•Usability testing
https://issues.apache.org/jira/browse/ACCUMULO-626