Accumulo Summit 2015: Accumulo 2.0: A New Client API [API]

Accumulo 2.0.0 A New Client API Christopher Tubbs


Page 1: Accumulo 2.0.0: A New Client API

Christopher Tubbs

Page 2: Versions Overview

● Retired
  ○ 1.3 (last: 1.3.6)
  ○ 1.4 (last: 1.4.5)

● Current
  ○ 1.5 (latest: 1.5.2)*
  ○ 1.6 (latest: 1.6.2)
  ○ 1.7 ?

● Development
  ○ 1.8
  ○ 2.0

Page 3: Version Philosophy

Old: 1.x.y
  x: major*
  y: minor/bugfixes*
  * habit of removing deprecated code arbitrarily

New (1.6.2+): x.y.z
  x: major
  y: minor
  z: patch (bugfix)

Semantic Versioning 2.0 (http://semver.org/)
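Under the x.y.z scheme, the three components tell a user which upgrades should be safe. A minimal sketch of that rule follows; `SemVer` and `isCompatibleUpgrade` are hypothetical names for illustration, not part of Accumulo, and pre-release tags are not handled.

```java
// Hypothetical helper, not part of Accumulo: models an x.y.z version.
final class SemVer implements Comparable<SemVer> {
  final int major, minor, patch;

  SemVer(int major, int minor, int patch) {
    this.major = major;
    this.minor = minor;
    this.patch = patch;
  }

  // Parses a plain "x.y.z" string; tags such as "-alpha-1" are out of scope here.
  static SemVer parse(String s) {
    String[] p = s.split("\\.");
    if (p.length != 3) {
      throw new IllegalArgumentException("expected x.y.z: " + s);
    }
    return new SemVer(Integer.parseInt(p[0]), Integer.parseInt(p[1]), Integer.parseInt(p[2]));
  }

  // Code built against 'this' should keep working on 'other' when the major
  // version matches and 'other' is the same version or newer.
  boolean isCompatibleUpgrade(SemVer other) {
    return major == other.major && compareTo(other) <= 0;
  }

  @Override
  public int compareTo(SemVer o) {
    if (major != o.major) return Integer.compare(major, o.major);
    if (minor != o.minor) return Integer.compare(minor, o.minor);
    return Integer.compare(patch, o.patch);
  }
}
```

For example, `SemVer.parse("1.6.2").isCompatibleUpgrade(SemVer.parse("1.7.0"))` is true, while the same check against `2.0.0` is false because the major version changed.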

Page 4: Background: 1.x API

● Focus (or lack thereof)
  ○ Function > Usability
  ○ Limited forethought for integration

● Current API
  ○ a gradual evolution
  ○ biggest redesign in 2009
    ■ Instance / Connector
    ■ Permissions / Authenticator
  ○ lots of feature additions, deprecations, removals, but few fundamental design changes since

Page 5: Background: 1.x API (cont.)

Public API
● public and protected
  ○ in org.apache.accumulo.core.client
    ■ everything but impl packages
  ○ in org.apache.accumulo.core.data
    ■ Key, Mutation, Value, Range
    ■ Condition and ConditionalMutation (1.6+)
  ○ in org.apache.accumulo.minicluster
    ■ everything but impl packages

Page 6: Background: 1.x API (cont.)

Public API (1.7)
● public and protected
  ○ org.apache.accumulo.core.client
  ○ org.apache.accumulo.core.data
  ○ org.apache.accumulo.core.security
  ○ org.apache.accumulo.minicluster
● all but
  ○ *impl*, *thrift*, *crypto*

Page 7: Lessons Learned

1. Confusing entry point

    Instance i = new ZooKeeperInstance(…);
    Connector c;
    // c = new Connector(i, user, pass);
    c = i.getConnector(user, pass);

2. Too many overloaded methods

    BatchWriterConfig bwConf;
    bwConf = new BatchWriterConfig();
    bw = c.createBatchWriter(table, bwConf);

Page 8: What’s Better?

Perhaps:

    AccumuloClient.Builder builder = Accumulo.client();
    builder.setXXXX(…);
    AccumuloClient c = builder.build();

Page 9: Better Yet...

...make it fluent:

    AccumuloClient client = Accumulo.client().setXXXX(…).build();

1. More factories
2. More configuration containers / builders
3. Fluent
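The three points above (a factory, a configuration container, fluent chaining) can be sketched in a few self-contained lines. The class names follow the slide, but the setters `setInstanceName` and `setUser` and the two fields are assumptions standing in for the slide's `setXXXX(…)` placeholder; nothing here is a proposed or released Accumulo API.

```java
// Sketch only: setter names and fields are assumptions, not Accumulo's API.
final class AccumuloClient {
  final String instanceName;
  final String user;

  private AccumuloClient(Builder b) {
    this.instanceName = b.instanceName;
    this.user = b.user;
  }

  // Configuration container: collects settings, then builds an immutable client.
  static final class Builder {
    private String instanceName;
    private String user;

    // Each setter returns the builder itself, which is what makes chaining possible.
    Builder setInstanceName(String name) {
      this.instanceName = name;
      return this;
    }

    Builder setUser(String user) {
      this.user = user;
      return this;
    }

    AccumuloClient build() {
      if (instanceName == null || user == null) {
        throw new IllegalStateException("instanceName and user are required");
      }
      return new AccumuloClient(this);
    }
  }
}

// Single front-door factory, mirroring Accumulo.client() on the slide.
final class Accumulo {
  private Accumulo() {}

  static AccumuloClient.Builder client() {
    return new AccumuloClient.Builder();
  }
}
```

With this shape, a caller writes `Accumulo.client().setInstanceName("test").setUser("root").build()` and never has to choose among overloaded constructors or factory methods.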

Page 10: Resource Management

Current Problems:
● private static fields shared by clients
● no ability to close / clean up
● performance trade-offs

Page 11: Opaque Resources

Better:

    try (
        AccumuloClient.Resources r = Accumulo.clientResources();
        AccumuloClient client = Accumulo.client().with(r).build()) {
      /* do work with the client */
    } catch (Exception e) { … }
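The try-with-resources form relies on both objects implementing `AutoCloseable`; Java then closes them in reverse order of declaration, so the client is closed before the resources it depends on. A minimal sketch, using the hypothetical stand-ins `SharedResources` and `ManagedClient` in place of the slide's `AccumuloClient.Resources` and `AccumuloClient`:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the slide's AccumuloClient.Resources.
final class SharedResources implements AutoCloseable {
  final List<String> log;
  boolean open = true;

  SharedResources(List<String> log) {
    this.log = log;
  }

  @Override
  public void close() {
    open = false;
    log.add("resources closed");
  }
}

// Hypothetical stand-in for the slide's AccumuloClient.
final class ManagedClient implements AutoCloseable {
  private final SharedResources resources;

  ManagedClient(SharedResources resources) {
    this.resources = resources;
  }

  void doWork() {
    if (!resources.open) {
      throw new IllegalStateException("resources already closed");
    }
    resources.log.add("work done");
  }

  @Override
  public void close() {
    resources.log.add("client closed");
  }
}

final class ResourceDemo {
  // Returns the sequence of events recorded while the try block runs and unwinds.
  static List<String> run() {
    List<String> log = new ArrayList<>();
    try (SharedResources r = new SharedResources(log);
         ManagedClient client = new ManagedClient(r)) {
      client.doWork();
    }
    return log;
  }
}
```

Because the resources are explicit objects rather than private static fields, two clients can share one `SharedResources` instance or each use their own, and cleanup is deterministic in both cases.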

Page 12: What About Exceptions?

Current Problems:

    ... throws TableNotFoundException,
        AccumuloSecurityException, AccumuloException;

With Java 7, this gets a little better:

    catch (AccumuloSecurityException | AccumuloException e) { … }

Page 13: Exception Hierarchy

Better:

    public class TableNotFound
        extends AccumuloException {}

--------------------------------------------

    try (
        AccumuloClient client = Accumulo.client().build()) {
      /* do work with the client */
    } catch (AccumuloException e) {
    } catch (YourCodeException e) {…}
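The benefit of a single root exception is that methods declare one type and callers choose how specific to be when catching. A self-contained sketch follows; the `Sketch*` class names are hypothetical and only mirror the shape of the hierarchy shown on the slide.

```java
// Hypothetical names: one root exception with more specific subclasses.
class SketchAccumuloException extends Exception {
  SketchAccumuloException(String msg) {
    super(msg);
  }
}

class SketchTableNotFound extends SketchAccumuloException {
  SketchTableNotFound(String table) {
    super("table not found: " + table);
  }
}

class SketchSecurityException extends SketchAccumuloException {
  SketchSecurityException(String msg) {
    super(msg);
  }
}

final class ExceptionDemo {
  // Declares only the root type, however many subclasses it may throw.
  static void scan(String table) throws SketchAccumuloException {
    if (table.isEmpty()) {
      throw new SketchTableNotFound(table);
    }
    if (table.equals("secret")) {
      throw new SketchSecurityException("not authorized");
    }
  }

  static String describe(String table) {
    try {
      scan(table);
      return "ok";
    } catch (SketchTableNotFound e) {
      return "missing"; // handle the specific case when it matters
    } catch (SketchAccumuloException e) {
      return "failed"; // one catch covers every other subclass
    }
  }
}
```

Adding a new subclass later does not change the `throws` clause of `scan`, so existing callers keep compiling.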

Page 14: Leaking

Current Problems:
● Leaking non-public (implementation) classes
  ○ apilyzer-maven-plugin
  ○ Problem: requires users to instantiate, assign, or pass non-public classes in normal use
● Exposing too much implementation
  ○ MapReduce classes
  ○ Problem: makes it difficult to extend or evolve internal changes without affecting users.

Page 15: Dependency Exposure

Current Problems:
● Dependencies on unstable third-party classes
  ○ Guava “@Beta”-annotated classes
  ○ Hadoop “@LimitedPrivate”-annotated classes
● Dependencies with lots of transitive deps
  ○ Hadoop “Text”
  ○ “Writable” for serialization
● RPC serialization library in public API
  ○ Thrift

Page 16: Parameter Problems

Current Problems:
● Exposing implementation-specific classes
  ○ log4j “Level”
  ○ prevents using log4j2, slf4j, and logback
● “stringly” typed object parameters
  ○ table
  ○ tableName
  ○ tableId
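One common remedy for "stringly" typed parameters is a small wrapper type per concept, so the compiler can tell a table name from a table ID. The sketch below is an illustration of that idea only; `TableName`, `TableId`, and `TableOps` are hypothetical names, not types the talk proposes.

```java
import java.util.Objects;

// Hypothetical wrapper types: distinct classes for a table's name and its
// internal ID, so one cannot be passed where the other is expected.
final class TableName {
  private final String value;

  TableName(String value) {
    this.value = Objects.requireNonNull(value);
  }

  String get() {
    return value;
  }
}

final class TableId {
  private final String value;

  TableId(String value) {
    this.value = Objects.requireNonNull(value);
  }

  String get() {
    return value;
  }
}

final class TableOps {
  // A single describe(String) could not tell a name from an ID;
  // typed overloads make the caller's intent explicit.
  static String describe(TableName name) {
    return "by name: " + name.get();
  }

  static String describe(TableId id) {
    return "by id: " + id.get();
  }
}
```

A call such as `TableOps.describe(new TableId("2b"))` now documents itself, and passing a bare `String` is a compile error rather than a runtime surprise.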

Page 17: Encoding Problems

Current Problems:
● Fail to specify internal encoding
● serialize/deserialize mismatch
● UTF-8 or user-specified?
● Overloaded methods again
● Unexpected characters (Authorizations)
● The Accumulo shell (jline)
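The serialize/deserialize mismatch is easy to reproduce with the JDK alone. The sketch below is a generic illustration, not Accumulo code; `EncodingDemo` and its methods are hypothetical names. It shows why naming the charset on both sides matters.

```java
import java.nio.charset.StandardCharsets;

final class EncodingDemo {
  // Name the charset explicitly. The no-argument getBytes() and new String(bytes)
  // use the JVM default charset, which on older JVMs depends on the platform.
  static byte[] encode(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  static String decode(byte[] b) {
    return new String(b, StandardCharsets.UTF_8);
  }

  // Deliberate mismatch: bytes written as UTF-8 are read back as ISO-8859-1.
  static String decodeWrong(byte[] b) {
    return new String(b, StandardCharsets.ISO_8859_1);
  }
}
```

Round-tripping `"caf\u00e9"` through `encode` and `decode` returns the original string, while `decodeWrong` returns a different string because the two-byte UTF-8 sequence for the accented letter is read as two separate characters.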

Page 18: New Types

● Namespace
  ○ .getId()
  ○ .exists()
  ○ .tables()
  ○ .rename(String)
● Table
  ○ .getSplits()
  ○ .merge(Range)
  ○ .scanner(ScanOptions)
  ○ .compact()
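The idea behind these types is that operations hang off an object handle instead of taking a name string on every call. A toy sketch of the `Namespace` half follows. The method names come from the slide; the return types, the `InMemoryNamespace` class, and its `addTable` and `getName` helpers are assumptions made only so the example compiles and runs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Method names follow the slide; return types are assumptions.
interface Namespace {
  String getId();

  boolean exists();

  List<String> tables();

  void rename(String newName);
}

// Hypothetical in-memory implementation, for illustration only.
final class InMemoryNamespace implements Namespace {
  private final String id;
  private String name;
  private final List<String> tables = new ArrayList<>();

  InMemoryNamespace(String id, String name) {
    this.id = id;
    this.name = name;
  }

  void addTable(String table) {
    tables.add(table);
  }

  String getName() {
    return name;
  }

  @Override
  public String getId() {
    return id;
  }

  @Override
  public boolean exists() {
    return true; // always present in this toy model
  }

  @Override
  public List<String> tables() {
    return Collections.unmodifiableList(tables);
  }

  @Override
  public void rename(String newName) {
    this.name = newName; // the ID stays stable across renames
  }
}
```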

Page 19: API-only Artifact

● accumulo-api.jar (new!)
  ○ org.apache.accumulo.api
  ○ no dependencies on other accumulo jars
    ■ use Java’s ServiceLoader to bind to impl
  ○ minimal dependencies on stable libraries
    ■ commons
    ■ guava
● Not in accumulo-core.jar
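Binding through `java.util.ServiceLoader` lets the API jar look up an implementation at runtime without a compile-time dependency on it. A minimal sketch of the lookup side follows; `ClientProvider` and `Providers` are hypothetical names, and the actual service interface in the proposal is not shown on the slide.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

// Hypothetical service interface that would live in the API-only jar.
interface ClientProvider {
  String implementationName();
}

final class Providers {
  private Providers() {}

  // Finds an implementation registered in a classpath file named
  // META-INF/services/<fully qualified name of ClientProvider>.
  static ClientProvider load() {
    Iterator<ClientProvider> it = ServiceLoader.load(ClientProvider.class).iterator();
    if (!it.hasNext()) {
      throw new IllegalStateException("no ClientProvider implementation on the classpath");
    }
    return it.next();
  }
}
```

An implementation jar would ship the registration file naming its provider class; with no such jar present, `Providers.load()` fails fast with a clear message instead of a `ClassNotFoundException` deep inside the client.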

Page 20: 2.0.0 API Statement

Public API (new!)
● public and protected
  ○ org.apache.accumulo.api

Alternatively:
● public and protected
  ○ accumulo-api.jar

Page 21: Goals: A Summary

● Improved API stability
● Compatibility (semver)
  ○ Easy to check
  ○ Easy to track changes
● Helps users manage dependencies
● Separate API from implementation
● Possible ability to swap out implementation (mock replacement? in-process impl?)
● Intuitive “front-door”
● Fluent usage
● Resource management

Page 22: Release plan

Steps
● Finish implementation
● Initial reviews
● 2.0.0-alpha-1
  ○ Developer preview released to get feedback
● 2.0.0-beta-1 ?
  ○ Possibly another developer preview after stabilizing API changes
● 2.0.0 final release (Summer?)

Page 23: Contact

Me:
[email protected]
Fingerprint: 8CC4 F8A2 B29C 2B04 0F2B 835D 6F0C DAE7 00B6 899D

Us:
[email protected]@accumulo.apache.org

#accumulo on FreeNode IRC

Issue: https://issues.apache.org/jira/browse/ACCUMULO-2589