Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

Securely explore your data. SQRRL ENTERPRISE + APACHE ACCUMULO: A secure, scalable, real-time analysis framework. Adam Fuchs, CTO, Sqrrl Data, Inc. August 21, 2013

description

Adam Fuchs provides an overview of Accumulo and Sqrrl Enterprise at the 2013 NoSQL Now! conference.

Transcript of Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

Page 1: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

Securely explore your data

SQRRL ENTERPRISE +

APACHE ACCUMULO:

A secure, scalable, real-time analysis framework

Adam Fuchs, CTO

Sqrrl Data, Inc.

August 21, 2013

Page 2: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

OUTLINE

Two Halves of “Real-Time”

Accumulo and Sqrrl Technology

Data-Centric Security

Table Designs

Performance Benchmarks

Page 3: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

TWO HALVES OF REAL-TIME

Data-Driven Real-Time: reduce event-to-reaction time

Query-Driven Real-Time: reduce ingest-to-query latency

Page 4: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

1. SPE queries NoSQL to enrich streaming data

2. SPE persists results in NoSQL for future query

3. SPE takes action automatically

4. SPE issues data-driven alerts

5. Sqrrl provides context for dashboards

6. Analysis tools use Sqrrl to search and manipulate historical data

Data-Driven + Query-Driven Real-Time Ecosystem

[Diagram: Data, SPE, NoSQL+, Dashboards, Actions, and Interactive Analysis Tools (Discovery + Forensics), connected by arrows numbered 1-6 to match the steps above]

Page 5: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

This talk focuses on the database.

[Diagram: the same ecosystem and numbered steps 1-6 as on the previous slide]

Page 6: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

OUTLINE

Two Halves of “Real-Time”

Accumulo and Sqrrl Technology

Data-Centric Security

Table Designs

Performance Benchmarks

Page 7: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ACCUMULO DATA FORMAT

Accumulo Key/Value Example

An Accumulo key is a 5-tuple, consisting of:

- Row: Controls Atomicity
- Column Family: Controls Locality
- Column Qualifier: Controls Uniqueness
- Visibility Label: Controls Access
- Timestamp: Controls Versioning
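The 5-tuple key can be sketched as a small Python record. This is an illustration only, not Accumulo's Java API: the `Key` class, the `sort_key` helper, the visibility labels, and the older value "48" are hypothetical. It assumes Accumulo's usual ordering, ascending on the first four fields and descending on timestamp, so the newest version of a cell sorts first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """Hypothetical stand-in for the 5-tuple key described on the slide."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int

    def sort_key(self):
        # Ascending on row, family, qualifier, visibility;
        # descending on timestamp so newer versions come first.
        return (self.row, self.family, self.qualifier,
                self.visibility, -self.timestamp)


# Example data: labels and the older value are made up for illustration.
entries = [
    (Key("bill", "attribute", "age", "public", 10), "48"),
    (Key("bill", "attribute", "age", "public", 20), "49"),
    (Key("bill", "purchases", "sneakers", "finance", 15), "$100"),
]
entries.sort(key=lambda kv: kv[0].sort_key())
newest_age = entries[0][1]
```

Because both versions of `bill / attribute:age` are adjacent after sorting, a versioning step only needs to keep the first entry of each cell.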

Page 8: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ACCUMULO TABLETS

Collections of KV pairs form Tables

Tables are partitioned into Tablets

Metadata tablets hold info about other tablets, forming a 3-level hierarchy

A Tablet is a unit of work for a Tablet Server

Well-Known Location (zookeeper)

Root Tablet: -∞ to ∞

Metadata Tablet 1: -∞ to “Encyclopedia:Ocelot”
Metadata Tablet 2: “Encyclopedia:Ocelot” to ∞

Table: Adam’s Table
  Data Tablet: -∞ to thing
  Data Tablet: thing to ∞

Table: Encyclopedia
  Data Tablet: -∞ to Ocelot
  Data Tablet: Ocelot to Yak
  Data Tablet: Yak to ∞

Table: Foo
  Data Tablet: -∞ to ∞
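Finding the tablet that holds a row amounts to a search over a table's sorted split points. The sketch below is a minimal Python illustration using the Encyclopedia splits from the diagram. The function names are hypothetical, and it assumes each tablet includes its end row, so a range reads as (previous split, split].

```python
from bisect import bisect_left


def tablet_for_row(split_points, row):
    """Return the index of the tablet whose range contains `row`.

    Assumes `split_points` is sorted and tablet i covers
    (split_points[i-1], split_points[i]], unbounded at both ends.
    """
    return bisect_left(split_points, row)


def tablet_range(split_points, index):
    """Return (exclusive start, inclusive end); None means unbounded."""
    lo = split_points[index - 1] if index > 0 else None
    hi = split_points[index] if index < len(split_points) else None
    return lo, hi


encyclopedia_splits = ["Ocelot", "Yak"]
panda_tablet = tablet_for_row(encyclopedia_splits, "Panda")
panda_range = tablet_range(encyclopedia_splits, panda_tablet)
```

The same lookup applies at each level of the hierarchy: the root tablet locates a metadata tablet, which in turn locates a data tablet.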

Page 9: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ACCUMULO PROCESSES

[Diagram: three Applications, three Tablet Servers (each hosting a Tablet), a Master, three Zookeeper nodes, and HDFS. Arrow labels: Read/Write, Store/Replicate, Assign/Balance, Delegate Authority]

Page 10: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

TABLET DATA FLOW

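[Diagram: inside a Tablet, Writes pass through an Iterator Tree into the In-Memory Map and the Write Ahead Log (For Recovery); Minor Compaction passes through an Iterator Tree to produce a Sorted, Indexed File; Merging / Major Compaction passes through an Iterator Tree over the Sorted, Indexed Files; Reads pass through an Iterator Tree to serve a Scan]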
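The data flow on this slide can be approximated in a few lines of Python. This is a toy model under simplifying assumptions, not Accumulo code: the `Tablet` class, its method names, and the `memory_limit` trigger are all hypothetical, files are plain sorted lists, and the iterator trees are reduced to a merge that lets the newest value win.

```python
import heapq


class Tablet:
    """Toy model of a tablet: in-memory map, write-ahead log, sorted files."""

    def __init__(self, memory_limit=2):
        self.memory = {}      # in-memory map: key -> value
        self.log = []         # write-ahead log, only needed for recovery
        self.files = []       # sorted lists of (key, value), oldest first
        self.memory_limit = memory_limit

    def write(self, key, value):
        self.log.append((key, value))
        self.memory[key] = value
        if len(self.memory) >= self.memory_limit:
            self.minor_compact()

    def minor_compact(self):
        # Flush the in-memory map to a new sorted file; the log entries
        # for the flushed data are no longer needed.
        if self.memory:
            self.files.append(sorted(self.memory.items()))
            self.memory = {}
            self.log.clear()

    def major_compact(self):
        # Merge all files into one; later (newer) files overwrite older ones.
        merged = {}
        for sorted_file in self.files:
            merged.update(sorted_file)
        self.files = [sorted(merged.items())] if merged else []

    def scan(self):
        # Merge every source in key order; age 0 is the newest source,
        # so the first entry seen for a key is the one to return.
        sources = [sorted(self.memory.items())] + self.files[::-1]
        tagged = [[(k, age, v) for k, v in src]
                  for age, src in enumerate(sources)]
        last = object()
        for key, _age, value in heapq.merge(*tagged):
            if key != last:
                yield key, value
                last = key


tablet = Tablet(memory_limit=2)
tablet.write("b", "1")
tablet.write("a", "2")      # second entry triggers a minor compaction
tablet.write("b", "3")      # newer value still in memory
before = list(tablet.scan())
tablet.write("c", "4")      # triggers another minor compaction
files_before_major = len(tablet.files)
tablet.major_compact()
after = list(tablet.scan())
```

Scans return the same answer before and after compaction; compaction only changes how many sources have to be merged.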

Page 11: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

WORD COUNT:

Summing Aggregating Iterator

Input Corpus
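The slide's figure is not preserved in this transcript, but the idea of word count with a summing iterator can be sketched in Python. The sketch assumes the ingester writes one (word, 1) entry per occurrence without reading first, and that a combiner sums entries with the same key when they are merged. The function name and the corpus are hypothetical.

```python
from itertools import groupby


def summing_combiner(sorted_entries):
    """Collapse consecutive entries that share a key into one summed entry."""
    for key, group in groupby(sorted_entries, key=lambda kv: kv[0]):
        yield key, sum(value for _, value in group)


# Made-up input corpus for illustration.
corpus = ["the quick brown fox", "the lazy dog", "the fox"]

# Ingest: one (word, 1) entry per occurrence, no read before write.
entries = sorted((word, 1) for line in corpus for word in line.split())

# Scan or compaction time: the combiner folds duplicates into totals.
counts = dict(summing_combiner(entries))
```

Because the sum is associative, the same combiner can run at minor compaction, major compaction, and scan time and still produce the same totals.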

Page 12: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ITERATOR FRAMEWORK

Iterator Operations:

- File Reads
- Block Caching
- Merging
- Deletion
- Isolation
- Locality Groups
- Range Selection
- Column Selection
- Cell-level Security
- Versioning
- Filtering
- Aggregation
- Partitioned Joins
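Several of these operations compose naturally as a stack, each stage consuming the sorted stream produced by the one below it. The Python generators here are a rough sketch of that composition, not Accumulo's iterator interface. The function names are hypothetical, keys are plain tuples of (row, family, qualifier, visibility, timestamp), and the input is assumed to be sorted with the newest version of each cell first.

```python
def versioning(entries, max_versions=1):
    """Keep at most `max_versions` entries per cell (input is newest first)."""
    current, kept = None, 0
    for key, value in entries:
        cell = key[:4]                      # everything except the timestamp
        if cell != current:
            current, kept = cell, 0
        if kept < max_versions:
            kept += 1
            yield key, value


def column_selection(entries, families):
    """Pass through only entries whose column family is requested."""
    for key, value in entries:
        if key[1] in families:
            yield key, value


def filtering(entries, predicate):
    """Pass through only entries accepted by a user-supplied predicate."""
    for key, value in entries:
        if predicate(key, value):
            yield key, value


# Made-up sorted input; the older age "48" is illustrative only.
source = [
    (("bill", "attribute", "age", "", 20), "49"),
    (("bill", "attribute", "age", "", 10), "48"),
    (("bill", "purchases", "sneakers", "", 15), "$100"),
    (("george", "attribute", "age", "", 12), "27"),
    (("george", "purchases", "sneakers", "", 11), "$83"),
]

stack = filtering(
    column_selection(versioning(source), {"attribute"}),
    lambda key, value: int(value) > 30,
)
result = list(stack)
```

Each stage is lazy, so entries stream through the whole stack one at a time rather than being materialized between stages.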

Page 13: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ACCUMULO LATENCIES

[Diagram: Ingesters send Input through a BatchWriter to three Tablet Servers, each with an In-Memory Map, RFiles, Compaction Iterators, and Scan Iterators; Queriers receive Output through a Scanner/BatchScanner. Latency labels: ~ms, ~ms, ~ms, and ms - min]

Page 14: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ACCUMULO THROUGHPUT

[Diagram: Ingesters send Input through a BatchWriter to three Tablet Servers, each with an In-Memory Map, RFiles, Compaction Iterators, and Scan Iterators; Queriers receive Output through a Scanner/BatchScanner. Latency labels: ~ms, ~ms, ~ms, and ms - min]

Read-Modify-Write Latency: ~ms

>1K entries/s challenging with R-M-W

Ingest: up to 500K entries/s per node

Scan: up to 1M entries/s per node
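One plausible reading of the R-M-W figure is simple arithmetic: if every read-modify-write waits on a round trip of about a millisecond and the operations run one after another, the rate cannot exceed one divided by the latency. The Python below works through that bound. The 1 ms value and the single serial client are assumptions made for illustration; the 500K figure is the ingest number from the slide.

```python
def max_serial_ops_per_second(round_trip_latency_us):
    """Upper bound on throughput when each operation waits for the previous one."""
    return 1_000_000 // round_trip_latency_us


# Assumption: ~1 ms round trip per read-modify-write, issued serially.
rmw_rate = max_serial_ops_per_second(1_000)

# "Ingest: up to 500K entries/s per node" from the slide.
batched_ingest_rate = 500_000

# How much headroom batched, write-only ingest has over serial R-M-W.
ratio = batched_ingest_rate // rmw_rate
```

Under these assumptions the bound is 1,000 operations per second, which is consistent with the slide calling more than 1K entries/s challenging.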

Page 15: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

SQRRL ENTERPRISE

Built on Apache Accumulo

Sqrrl Server

Sqrrl API over Apache Thrift RPC (JSON, Graph, Aggregation, Search, etc.)

• Sqrrl proprietary
• Automated indexing
• Custom iterators
• Lucene integration
• Security extensions

Accumulo RPC (Sorted Key/Value I/O)

Hadoop RPC (File I/O)

• Open source (including Sqrrl contributions)

• Open source or commercial distributions

Graph + Document I/O

Exploratory / Operational Apps

Bulk Processing Integration

Page 16: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

OUTLINE

Two Halves of “Real-Time”

Accumulo and Sqrrl Technology

Data-Centric Security

Table Designs

Performance Benchmarks

Page 17: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

DATA-CENTRIC SECURITY

Definition: Data carries with it information that is required to make policy decisions on its releasability.

[Diagram: User 1, User 2, and Sqrrl/Accumulo]

Page 18: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

SECURITY

Example Accumulo Key/Value Pairs

Accumulo is the only NoSQL database with cell-level access controls
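Cell-level access control can be pictured as checking each entry's visibility label against the reader's authorizations. The evaluator below is an illustrative Python sketch, not Accumulo's implementation. The label syntax is modeled on visibility expressions built from `&`, `|`, and parentheses; the function names and the example labels are hypothetical. Like Accumulo, the sketch rejects expressions that mix `&` and `|` without parentheses.

```python
import re

# A token is either a label or one of the operators & | ( ).
TOKEN = re.compile(r"\s*([A-Za-z0-9_\-:./]+|[&|()])")


def tokenize(expression):
    tokens, pos = [], 0
    expression = expression.strip()
    while pos < len(expression):
        match = TOKEN.match(expression, pos)
        if not match:
            raise ValueError(f"bad character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expression, authorizations):
    """Return True if a reader holding `authorizations` may see the entry."""
    tokens = tokenize(expression)
    if not tokens:
        return True                      # an empty label is visible to all
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            inner = parse_expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return inner
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return token in authorizations

    def parse_expression():
        nonlocal pos
        result = parse_term()
        operator = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if operator is None:
                operator = tokens[pos]
            elif tokens[pos] != operator:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            right = parse_term()         # always parse, then combine
            result = (result and right) if operator == "&" else (result or right)
        return result

    visible = parse_expression()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return visible


# Hypothetical labels and authorizations.
allowed = is_visible("admin&(audit|finance)", {"admin", "finance"})
denied = is_visible("admin&(audit|finance)", {"admin"})
```

In this model a scan simply drops every entry for which `is_visible` returns False, so two users reading the same row can see different cells.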

Page 19: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

DATA-CENTRIC SECURITY ECOSYSTEM

[Diagram: Data Labeler, Sqrrl Enterprise, Apps, and End Users, together with User Attributes, Audits, Policies, Auth. Service, Policy Engine, and Key Mgmt]

Page 20: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

OUTLINE

Two Halves of “Real-Time”

Accumulo and Sqrrl Technology

Data-Centric Security

Table Designs

Performance Benchmarks

Page 21: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

HIERARCHICAL DECOMPOSITION

Row        Column Family   Column Qualifier   Value
<person>   attribute       age                <age>
           attribute       discount           <rate>
           purchases       sneakers           <cost>
           returns         hat                <cost>

Page 22: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

MATERIALIZED TABLE

Row      Column Family   Column Qualifier   Value
george   attribute       age                27
george   purchases       sneakers           $83
bill     attribute       age                49
bill     attribute       discount           40%
bill     purchases       sneakers           $100

Each line above is one Key/Value Pair.
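The step from a nested record to a materialized table can be sketched in Python using the values on this slide. The `materialize` and `scan_row` helpers are hypothetical names, and the sketch assumes keys are stored sorted by row, then column family, then column qualifier.

```python
# Nested records, using the values shown on the slide.
people = {
    "george": {
        "attribute": {"age": "27"},
        "purchases": {"sneakers": "$83"},
    },
    "bill": {
        "attribute": {"age": "49", "discount": "40%"},
        "purchases": {"sneakers": "$100"},
    },
}


def materialize(nested):
    """Flatten row -> family -> qualifier -> value into sorted key/value pairs."""
    pairs = [
        ((row, family, qualifier), value)
        for row, families in nested.items()
        for family, qualifiers in families.items()
        for qualifier, value in qualifiers.items()
    ]
    return sorted(pairs)


def scan_row(table, row):
    """Return every key/value pair that belongs to one row."""
    return [(key, value) for key, value in table if key[0] == row]


table = materialize(people)
george = scan_row(table, "george")
```

After sorting, everything about one person is contiguous, so reading a whole record is a single range scan over that row.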

Page 23: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

FORWARD AND INVERTED INDEX

Table            Row      Column Family   Column Qualifier   Value
Forward Index    <UUID>   <Type>          <Field>            <Term>
Inverted Index   <Term>   <UUID>          <Type+Field>       <Digest of Event>
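The interplay of the two indexes can be sketched with Python dictionaries. This is a simplification rather than the schema itself: the forward index is keyed by UUID, the inverted index by term, the digest value is omitted, and the function names, UUIDs, and events are hypothetical.

```python
from collections import defaultdict

forward = {}                    # UUID -> {(type, field): term}
inverted = defaultdict(set)     # term -> set of UUIDs containing that term


def index_event(uuid, event_type, fields):
    """Write one event to both indexes."""
    for field, term in fields.items():
        forward.setdefault(uuid, {})[(event_type, field)] = term
        inverted[term].add(uuid)


def search(term):
    """Look up the term in the inverted index, then fetch each full event."""
    return {uuid: forward[uuid] for uuid in sorted(inverted.get(term, ()))}


# Made-up events for illustration.
index_event("u1", "tweet", {"user": "alice", "word": "accumulo"})
index_event("u2", "tweet", {"user": "bob", "word": "accumulo"})

hits = search("accumulo")
alice = search("alice")
```

A term query touches one row of the inverted index and then one row of the forward index per hit, instead of scanning every event.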

Page 24: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

FORWARD AND INVERTED INDEX

Page 25: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

CUSTOM INDEXING

Table       Row         Column Family   Column Qualifier   Value
Geo Index   <GeoHash>   <Event Type>    <UUID>             <Digest of Event>

GeoHash bit interleaving:

Latitude:    10110101001
Longitude:   00111010010
Depth:       11010110110
Interleaved: 101001110111010101011100001011100
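The interleaved string on this slide can be reproduced by taking one bit at a time from each dimension in the order latitude, longitude, depth. The short Python sketch below does exactly that; the `interleave` function name is hypothetical, and the bit strings are the ones shown on the slide.

```python
def interleave(*bit_strings):
    """Interleave equal-length bit strings, one bit from each in turn."""
    if len({len(bits) for bits in bit_strings}) != 1:
        raise ValueError("all dimensions need the same number of bits")
    return "".join("".join(column) for column in zip(*bit_strings))


latitude = "10110101001"
longitude = "00111010010"
depth = "11010110110"

geohash = interleave(latitude, longitude, depth)
```

Because the most significant bits of every dimension come first, points that are close in all three dimensions tend to share a long prefix, which lets a sorted key/value store answer a bounding-box query with a small number of range scans.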

Page 26: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

D4M 2.0 SCHEMA FOR TWITTER DATA

Table    Row       Column Family   Column Qualifier   Value
Tedge    <UUID>    “stat”          <stat>             “1”
                   “time”          <time>             “1”
                   “user”          <user>             “1”
                   “word”          <word>             “1”
TedgeT   <value>   “stat”          <UUID>             “1”
                   “time”          <UUID>             “1”
                   “user”          <UUID>             “1”
                   “word”          <UUID>             “1”

Page 27: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

D4M 2.0 SCHEMA FOR TWITTER DATA

Table       Row       Column Family   Column Qualifier   Value
TedgeDegT   <value>   “stat”          “degree”           <count>
                      “time”          “degree”           <count>
                      “user”          “degree”           <count>
                      “word”          “degree”           <count>
Ttext       <UUID>    “text”          -                  <text>
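The four tables of this schema can be populated from a single record, as the Python sketch below illustrates. It follows the row, column family, and column qualifier layout listed on these slides, but it is only a model: the tables are dictionaries, the degree count is a plain counter rather than a server-side combiner, and the `ingest` and `tweets_with` helpers and the sample tweets are hypothetical.

```python
from collections import defaultdict

tedge = {}                        # (uuid, field, value) -> "1"
tedge_t = {}                      # (value, field, uuid) -> "1"
tedge_deg_t = defaultdict(int)    # (value, field, "degree") -> count
ttext = {}                        # (uuid, "text", "") -> raw text


def ingest(uuid, text, fields):
    """Explode one record into all four tables.

    `fields` maps a field name ("stat", "time", "user", "word")
    to the list of values that field takes in this record.
    """
    ttext[(uuid, "text", "")] = text
    for field, values in fields.items():
        for value in values:
            tedge[(uuid, field, value)] = "1"
            tedge_t[(value, field, uuid)] = "1"      # transpose of Tedge
            tedge_deg_t[(value, field, "degree")] += 1


def tweets_with(field, value):
    """Use the transpose table to find every record containing a value."""
    return sorted(uuid for (v, f, uuid) in tedge_t if v == value and f == field)


# Made-up tweets for illustration.
ingest("t1", "accumulo is fast",
       {"user": ["alice"], "word": ["accumulo", "is", "fast"]})
ingest("t2", "accumulo scales",
       {"user": ["bob"], "word": ["accumulo", "scales"]})

accumulo_degree = tedge_deg_t[("accumulo", "word", "degree")]
accumulo_tweets = tweets_with("word", "accumulo")
```

The degree table lets a query planner check how common a value is before touching the transpose table, so it can start with the rarest term.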

Page 28: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

D4M 2.0 SCHEMA FOR TWITTER DATA

Source: D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database, Kepner et al., HPEC 2013

Page 29: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

OUTLINE

Two Halves of “Real-Time”

Accumulo and Sqrrl Technology

Data-Centric Security

Table Designs

Performance Benchmarks

Page 30: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ACCUMULO WITH D4M 2.0 SCHEMA PERFORMANCE

Source: D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database, Kepner et al., HPEC 2013

Maximizing throughput on an 8-node, 192-core cluster:

Page 31: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ACCUMULO SCALABILITY: GRAPH500 BENCHMARK

source: http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf

Page 32: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ATOMIC INCREMENT PERFORMANCE COMPARISON

Read/Modify/Write (HBase) vs. Iterators/Combiners (Accumulo)

Page 33: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

QUESTIONS?

Adam Fuchs, CTO

Sqrrl Data, Inc.