Securely explore your data
SQRRL ENTERPRISE +
APACHE ACCUMULO:
A secure, scalable, real-time analysis framework
Adam Fuchs, CTO
Sqrrl Data, Inc.
August 21, 2013
© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential
OUTLINE
Two Halves of “Real-Time”
Accumulo and Sqrrl Technology
Data-Centric Security
Table Designs
Performance Benchmarks
TWO HALVES OF REAL-TIME
Data-Driven Real-Time: reduce event-to-reaction time
Query-Driven Real-Time: reduce ingest-to-query latency
1. SPE queries NoSQL to enrich streaming data
2. SPE persists results in NoSQL for future query
3. SPE takes action automatically
4. SPE issues data-driven alerts
5. Sqrrl provides context for dashboards
6. Analysis tools use Sqrrl to search and manipulate historical data
Data-Driven + Query-Driven Real-Time Ecosystem
[Diagram: Data, SPE, NoSQL+, Dashboards, Actions, and Interactive Analysis Tools (Discovery + Forensics), connected by the numbered flows 1-6 listed above.]
This talk focuses on the database.
ACCUMULO DATA FORMAT
Accumulo Key/Value Example
An Accumulo key is a 5-tuple, consisting of:
- Row: Controls Atomicity
- Column Family: Controls Locality
- Column Qualifier: Controls Uniqueness
- Visibility Label: Controls Access
- Timestamp: Controls Versioning
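The 5-tuple above can be sketched as a small Python type. This is only an illustration, not Accumulo's Java API: the `Key` class, the `sort_key` helper, and the sample rows are hypothetical, and plain string comparison stands in for Accumulo's byte-wise ordering. It shows the sort order that makes the newest version of a cell come first.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical stand-in for an Accumulo key (not the real Java class)."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_key(k: Key):
    # Keys sort ascending by row, family, qualifier, and visibility,
    # then descending by timestamp so the newest version sorts first.
    return (k.row, k.family, k.qualifier, k.visibility, -k.timestamp)


# Sample entries (illustrative values only).
entries = [
    (Key("bill", "attribute", "age", "public", 1), "48"),
    (Key("george", "attribute", "age", "public", 1), "27"),
    (Key("bill", "attribute", "age", "public", 2), "49"),
]
ordered = sorted(entries, key=lambda kv: sort_key(kv[0]))
values_in_order = [value for _, value in ordered]
```

Because the two versions of bill's age differ only in timestamp, they land next to each other with the newer one first, which is what lets a versioning step keep just the latest value.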
ACCUMULO TABLETS
Collections of KV pairs form Tables
Tables are partitioned into Tablets
Metadata tablets hold info about other tablets, forming a 3-level hierarchy
A Tablet is a unit of work for a Tablet Server
Well-Known Location (zookeeper)
  Root Tablet: -∞ to ∞
    Metadata Tablet 1: -∞ to “Encyclopedia:Ocelot”
    Metadata Tablet 2: “Encyclopedia:Ocelot” to ∞
Table: Adam’s Table
  Data Tablet: -∞ : thing
  Data Tablet: thing : ∞
Table: Encyclopedia
  Data Tablet: -∞ : Ocelot
  Data Tablet: Ocelot : Yak
  Data Tablet: Yak : ∞
Table: Foo
  Data Tablet: -∞ to ∞
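A minimal sketch of how a row could be mapped to a tablet, using the Encyclopedia split points from the diagram. The function names are hypothetical, and the sketch assumes each tablet covers the range from its previous split (exclusive) up to its own end row (inclusive).

```python
from bisect import bisect_left


def tablet_ranges(splits):
    """List the (start, end) range of each tablet for sorted split points."""
    bounds = ["-inf"] + list(splits) + ["+inf"]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def locate(splits, row):
    """Return the index of the tablet that holds `row`.

    bisect_left finds the first split point >= row, so a row equal to a
    split point stays in the tablet that ends at that split.
    """
    return bisect_left(splits, row)


splits = ["Ocelot", "Yak"]          # split points of the Encyclopedia table
ranges = tablet_ranges(splits)      # three tablets
```

For example, `locate(splits, "Panda")` selects the middle tablet (Ocelot : Yak). The metadata tablets play the same role one level up: they hold the sorted list of tablet boundaries that a client searches.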
ACCUMULO PROCESSES
Components: Applications, Tablet Servers (each hosting Tablets), Master, Zookeeper, HDFS
Interactions:
- Read/Write: Applications and Tablet Servers
- Store/Replicate: Tablet Servers and HDFS
- Assign/Balance: Master and Tablet Servers
- Delegate Authority: Zookeeper and Master
TABLET DATA FLOW
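Tablet:
- Writes: go to the In-Memory Map and the Write Ahead Log (For Recovery)
- Minor Compaction: an Iterator Tree writes the In-Memory Map out as a Sorted, Indexed File
- Merging / Major Compaction: an Iterator Tree merges Sorted, Indexed Files
- Scan: Reads pass through an Iterator Tree over the In-Memory Map and the Sorted, Indexed Files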
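The data flow above can be sketched in a few lines of Python. This is a simplified model, not Accumulo code: the `Tablet` class and its method names are hypothetical, sorted lists stand in for sorted, indexed files, and "newest source wins" stands in for timestamp-based versioning.

```python
import heapq


class Tablet:
    """Toy model of a tablet: in-memory map, write-ahead log, sorted files."""

    def __init__(self):
        self.wal = []        # write-ahead log, kept only for recovery
        self.memtable = {}   # in-memory map
        self.files = []      # oldest first; each file is a sorted list

    def write(self, key, value):
        self.wal.append((key, value))
        self.memtable[key] = value

    def minor_compact(self):
        # Flush the in-memory map to a new sorted file.
        if self.memtable:
            self.files.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.wal = []

    def major_compact(self):
        # Merge all sorted files into one.
        if self.files:
            self.files = [list(self._merge(self.files))]

    def scan(self):
        # Merge the files and the in-memory map into one sorted view.
        return list(self._merge(self.files + [sorted(self.memtable.items())]))

    @staticmethod
    def _merge(sources):
        # Tag entries so that, for equal keys, the newest source sorts first.
        tagged = [[(k, -age, v) for k, v in src]
                  for age, src in enumerate(sources)]
        last = object()
        for key, _, value in heapq.merge(*tagged):
            if key != last:
                yield (key, value)
                last = key


tablet = Tablet()
tablet.write("a", "1")
tablet.write("b", "2")
tablet.minor_compact()
tablet.write("a", "3")            # newer value still in memory
before = tablet.scan()
tablet.minor_compact()
tablet.major_compact()
after = tablet.scan()
```

The scan result is the same before and after compaction; compaction only changes how many sources the read path has to merge.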
WORD COUNT: Summing Aggregating Iterator
[Figure: an Input Corpus processed by a summing aggregating iterator.]
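A minimal sketch of word count with a summing aggregation step, assuming each ingested token is written as a separate entry with value "1" and the sum is applied later. The function names and the sample corpus are hypothetical; in Accumulo the summing would run inside an iterator at scan and compaction time rather than in client code.

```python
from collections import defaultdict


def ingest(corpus):
    """Emit one (word, "1") entry per token, with no read before the write."""
    return [(word, "1") for line in corpus for word in line.lower().split()]


def summing_combiner(entries):
    """Collapse all entries that share a key into a single summed entry."""
    totals = defaultdict(int)
    for key, value in entries:
        totals[key] += int(value)
    return sorted((key, str(total)) for key, total in totals.items())


corpus = ["the quick brown fox", "the lazy dog", "the fox"]  # sample input
counts = dict(summing_combiner(ingest(corpus)))
```

The design point is that ingest never has to look up the current count: it only appends, and the aggregation is deferred to the read and compaction paths.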
ITERATOR FRAMEWORK
Iterator Operations:
- File Reads
- Block Caching
- Merging
- Deletion
- Isolation
- Locality Groups
- Range Selection
- Column Selection
- Cell-level Security
- Versioning
- Filtering
- Aggregation
- Partitioned Joins
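Two of the operations listed above, column selection and versioning, can be sketched as stacked Python generators. This only illustrates the idea of composing iterators over a sorted stream; the function names, the 4-part key `(row, family, qualifier, timestamp)`, and the sample data are assumptions, not Accumulo's iterator interface.

```python
from itertools import groupby


def column_filter(stream, family):
    """Column selection: pass through only entries in one column family."""
    return (kv for kv in stream if kv[0][1] == family)


def versioning(stream, max_versions=1):
    """Versioning: keep the first `max_versions` entries of each cell.

    Assumes the stream is sorted with the newest timestamp first per cell.
    """
    for _, versions in groupby(stream, key=lambda kv: kv[0][:3]):
        for i, kv in enumerate(versions):
            if i < max_versions:
                yield kv


def scan(stream, family):
    # Stack the iterators: selection feeds versioning.
    return list(versioning(column_filter(stream, family)))


# Sorted sample stream: (row, family, qualifier, timestamp) -> value
stream = [
    (("bill", "attribute", "age", 2), "49"),
    (("bill", "attribute", "age", 1), "48"),
    (("bill", "purchases", "sneakers", 1), "$100"),
    (("george", "attribute", "age", 1), "27"),
]
result = scan(stream, "attribute")
```

Each stage consumes a sorted stream and produces a sorted stream, which is what allows the stages to be stacked in any reasonable order.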
ACCUMULO LATENCIES
[Diagram: Ingesters write Input through a BatchWriter to Tablet Servers. Each Tablet Server holds an In-Memory Map and RFiles, with Compaction Iterators and Scan Iterators. Queriers read Output through a Scanner/BatchScanner. Latency annotations: ~ms, ~ms, ~ms, and ms - min.]
ACCUMULO THROUGHPUT
[Diagram: the same ingester, tablet server, and querier pipeline as in ACCUMULO LATENCIES, with the same latency annotations (~ms, ~ms, ~ms, ms - min).]
Read-Modify-Write Latency: ~ms
>1K entries/s challenging with R-M-W
Ingest: up to 500K entries/s per node
Scan: up to 1M entries/s per node
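The link between the R-M-W latency and the ~1K entries/s figure is simple arithmetic, sketched here. It assumes a latency of exactly 1 ms and a single stream of updates in which each one waits for the previous round trip; both are illustrative assumptions, not measurements from the talk.

```python
def serial_rate(latency_seconds):
    """Operations per second when each operation waits for the previous one."""
    return 1.0 / latency_seconds


rmw_latency = 1e-3                    # assumed: 1 ms per read-modify-write
rmw_rate = serial_rate(rmw_latency)   # dependent updates per second
ingest_rate = 500_000                 # upper bound quoted above, per node
ratio = ingest_rate / rmw_rate        # how far blind writes can outpace R-M-W
```

Under these assumptions a serialized read-modify-write loop is limited to about a thousand updates per second, while write-only ingest at the quoted upper bound is roughly 500 times faster.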
SQRRL ENTERPRISE
Built on Apache Accumulo
Sqrrl Server
Sqrrl API over Apache Thrift RPC (JSON, Graph, Aggregation, Search, etc.)
• Sqrrl proprietary
• Automated indexing
• Custom iterators
• Lucene integration
• Security extensions
Accumulo RPC (Sorted Key/Value I/O)
Hadoop RPC (File I/O)
• Open source (including Sqrrl contributions)
• Open source or commercial distributions
Graph + Document I/O
Exploratory / Operational Apps
Bulk Processing Integration
DATA-CENTRIC SECURITY
Definition: Data carries with it information that is required to make policy decisions on its releasability.
[Diagram: User 1, User 2, Sqrrl/Accumulo]
SECURITY
Example Accumulo Key/Value Pairs
Accumulo is the only NoSQL database with cell-level access controls
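A minimal sketch of how a cell-level visibility label might be checked against a user's authorizations. It assumes labels are boolean expressions over tokens combined with `&`, `|`, and parentheses; the `visible` function and the label names are hypothetical, and the parser is more lenient than a real implementation (it evaluates mixed operators left to right instead of requiring parentheses).

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def visible(expr, auths):
    """Return True if a user holding `auths` may read a cell labeled `expr`."""
    tokens = tokenize(expr)
    if not tokens:
        return True          # an empty label is readable by everyone
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of label")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths

    def expression():
        nonlocal pos
        result = term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            rhs = term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    result = expression()
    if pos != len(tokens):
        raise ValueError("trailing tokens in label")
    return result


label = "admin&(finance|audit)"      # hypothetical label
```

With this label, a user holding `{"admin", "audit"}` can read the cell, while a user holding only `{"admin"}` cannot. The check runs per key/value pair, so two users scanning the same row can see different cells.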
DATA-CENTRIC SECURITY ECOSYSTEM
[Diagram components: Data Labeler, Sqrrl Enterprise, Apps, End Users, User Attributes, Audits, Policies, Auth. Service, Policy Engine, Key Mgmt]
HIERARCHICAL DECOMPOSITION
Row      | Column Family | Column Qualifier | Value
<person> | attribute     | age              | <age>
<person> | attribute     | discount         | <rate>
<person> | purchases     | sneakers         | <cost>
<person> | returns       | hat              | <cost>
MATERIALIZED TABLE
Each line below is one Key/Value Pair:
Row    | Column Family | Column Qualifier | Value
george | attribute     | age              | 27
george | purchases     | sneakers         | $83
bill   | attribute     | age              | 49
bill   | attribute     | discount         | 40%
bill   | purchases     | sneakers         | $100
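A minimal sketch of the materialized table as a sorted list of key/value pairs, with a prefix scan that reads one row or one column family within a row. The `scan_prefix` helper is hypothetical; tuples and binary search stand in for Accumulo's sorted keys and range scans.

```python
from bisect import bisect_left

# (row, column family, column qualifier) -> value, kept in sorted key order
table = sorted([
    (("george", "attribute", "age"), "27"),
    (("george", "purchases", "sneakers"), "$83"),
    (("bill", "attribute", "age"), "49"),
    (("bill", "attribute", "discount"), "40%"),
    (("bill", "purchases", "sneakers"), "$100"),
])


def scan_prefix(table, prefix):
    """Return all entries whose key starts with `prefix`, in sorted order."""
    keys = [key for key, _ in table]
    start = bisect_left(keys, prefix)      # seek to the first matching key
    matches = []
    for key, value in table[start:]:
        if key[:len(prefix)] != prefix:
            break                          # left the contiguous range
        matches.append((key, value))
    return matches


bill_attributes = scan_prefix(table, ("bill", "attribute"))
george_row = scan_prefix(table, ("george",))
```

Because the keys are sorted hierarchically, everything about one person, or one column family of one person, sits in a single contiguous range.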
FORWARD AND INVERTED INDEX
Table          | Row    | Column Family | Column Qualifier | Value
Forward Index  | <UUID> | <Type>        | <Field>          | <Term>
Inverted Index | <Term> | <UUID>        | <Type+Field>     | <Digest of Event>
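A minimal sketch of writing one event into both a forward and an inverted index. The placement of UUID and Type+Field within the inverted key is one reading of the slide and should be treated as an assumption, as should the `index_event` and `lookup` names, the sample events, and the use of a truncated SHA-1 as the "digest of event".

```python
import hashlib


def index_event(uuid, event_type, fields):
    """Return (forward, inverted) entries for one event."""
    # Assumed digest: a short hash of the event's fields.
    digest = hashlib.sha1(repr(sorted(fields.items())).encode()).hexdigest()[:8]
    forward, inverted = [], []
    for field, term in fields.items():
        # Forward index: look up an event's terms by its UUID.
        forward.append(((uuid, event_type, field), term))
        # Inverted index: look up events by term.
        inverted.append(((term, uuid, f"{event_type}.{field}"), digest))
    return forward, inverted


def lookup(inverted, term):
    """Return the UUIDs of all events that contain `term`."""
    return sorted({key[1] for key, _ in inverted if key[0] == term})


fwd1, inv1 = index_event("u1", "tweet", {"user": "adam", "word": "accumulo"})
fwd2, inv2 = index_event("u2", "tweet", {"user": "bob", "word": "accumulo"})
inverted_index = sorted(inv1 + inv2)
```

Here `lookup(inverted_index, "accumulo")` finds both events from the term alone, and the stored digest lets a query return a summary without a second read against the forward index.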
CUSTOM INDEXING
Table     | Row       | Column Family | Column Qualifier | Value
Geo Index | <GeoHash> | <Event Type>  | <UUID>           | <Digest of Event>
GeoHash construction (bit interleaving):
Latitude:  10110101001
Longitude: 00111010010
Depth:     11010110110
GeoHash:   101001110111010101011100001011100
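The bit strings on this slide are consistent with interleaving one bit at a time from latitude, longitude, and depth. A minimal sketch of that interleaving follows; the `interleave` function name is hypothetical, and how the coordinates are quantized into bits is outside its scope.

```python
def interleave(*bitstrings):
    """Interleave equal-length bit strings: first bit of each, then second, ..."""
    if len({len(bits) for bits in bitstrings}) != 1:
        raise ValueError("all dimensions must use the same number of bits")
    return "".join("".join(column) for column in zip(*bitstrings))


latitude = "10110101001"
longitude = "00111010010"
depth = "11010110110"
geohash = interleave(latitude, longitude, depth)
```

The result reproduces the 33-bit string shown on the slide. Because the leading bits of every dimension appear at the front of the hash, nearby points tend to share a prefix, so a region query can often be expressed as a small number of row ranges.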
D4M 2.0 SCHEMA FOR TWITTER DATA
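Table  | Row     | Column Family | Column Qualifier | Value
Tedge  | <UUID>  | “stat”        | <stat>           | “1”
Tedge  | <UUID>  | “time”        | <time>           | “1”
Tedge  | <UUID>  | “user”        | <user>           | “1”
Tedge  | <UUID>  | “word”        | <word>           | “1”
TedgeT | <value> | “stat”        | <UUID>           | “1”
TedgeT | <value> | “time”        | <UUID>           | “1”
TedgeT | <value> | “user”        | <UUID>           | “1”
TedgeT | <value> | “word”        | <UUID>           | “1”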
Table     | Row     | Column Family | Column Qualifier | Value
TedgeDegT | <value> | “stat”        | “degree”         | <count>
TedgeDegT | <value> | “time”        | “degree”         | <count>
TedgeDegT | <value> | “user”        | “degree”         | <count>
TedgeDegT | <value> | “word”        | “degree”         | <count>
Ttext     | <UUID>  | “text”        | -                | <text>
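A minimal sketch of exploding records into the four tables shown in the schema (Tedge, TedgeT, TedgeDegT, Ttext). The `ingest` function, the input format, and the sample tweets are hypothetical; the degree counts are computed in client code here, whereas in Accumulo they would more naturally be maintained by a summing iterator.

```python
from collections import Counter


def ingest(tweets):
    """Build Tedge, TedgeT, TedgeDegT, and Ttext entries from parsed tweets.

    `tweets` maps a UUID to (text, fields), where fields maps a field name
    such as "user" or "word" to a list of values.
    """
    tables = {"Tedge": [], "TedgeT": [], "Ttext": []}
    degrees = Counter()
    for uuid, (text, fields) in tweets.items():
        tables["Ttext"].append(((uuid, "text", ""), text))
        for field, values in fields.items():
            for value in values:
                tables["Tedge"].append(((uuid, field, value), "1"))
                tables["TedgeT"].append(((value, field, uuid), "1"))  # transpose
                degrees[(value, field)] += 1
    tables["TedgeDegT"] = [
        ((value, field, "degree"), str(count))
        for (value, field), count in sorted(degrees.items())
    ]
    return tables


tweets = {  # sample data
    "t1": ("accumulo rocks", {"user": ["adam"], "word": ["accumulo", "rocks"]}),
    "t2": ("accumulo scales", {"user": ["bob"], "word": ["accumulo", "scales"]}),
}
tables = ingest(tweets)
```

The degree table gives the number of records containing each value before a query touches TedgeT, which lets a query planner start from the rarest term.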
Source: D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database, Kepner et al., HPEC 2013
ACCUMULO WITH D4M 2.0 SCHEMA PERFORMANCE
Source: D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database, Kepner et al., HPEC 2013
Maximizing throughput on an 8-node, 192-core cluster:
ACCUMULO SCALABILITY: GRAPH500 BENCHMARK
Source: http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf
ATOMIC INCREMENT PERFORMANCE COMPARISON
Read/Modify/Write (HBase) vs. Iterators/Combiners (Accumulo)
QUESTIONS?
Adam Fuchs, CTO
Sqrrl Data, Inc.