Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters
Date posted: 21-Oct-2014
Category: Technology
![Page 1: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/1.jpg)
Four Orders of Magnitude: Running Large Scale Accumulo Clusters
Aaron Cordova
Accumulo Summit, June 2014
![Page 2: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/2.jpg)
Scale, Security, Schema
![Page 3: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/3.jpg)
Scale
![Page 4: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/4.jpg)
to scale¹ - (vt) to change the size of something
![Page 5: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/5.jpg)
“let’s scale the cluster up to twice the original size”
![Page 6: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/6.jpg)
to scale² - (vi) to function properly at a large scale
![Page 7: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/7.jpg)
“Accumulo scales”
![Page 8: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/8.jpg)
What is Large Scale?
![Page 9: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/9.jpg)
Notebook Computer
• 16 GB DRAM
• 512 GB Flash Storage
• 2.3 GHz quad-core i7 CPU
![Page 10: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/10.jpg)
Modern Server
• 100s of GB DRAM
• 10s of TB on disk
• 10s of cores
![Page 11: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/11.jpg)
Large Scale
[Chart: data size from 10 GB to 100 PB, in RAM and on disk, for a laptop, a server, and clusters of 10, 100, 1000, and 10,000 nodes]
![Page 12: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/12.jpg)
Data Composition
[Chart: data composition by month, January to April, vertical axis 0 to 180. Series: Original Raw, Derivative, QFDs, Indexes]
![Page 13: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/13.jpg)
Accumulo Scales
• From GB to PB, Accumulo keeps two things low:
• Administrative effort
• Scan latency
![Page 14: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/14.jpg)
Scan Latency
[Chart: scan latency (0 to 0.05) plotted over a horizontal axis from 0 to 1000]
![Page 15: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/15.jpg)
Administrative Overhead
[Chart: vertical axis 0 to 12, horizontal axis 0 to 1000. Series: Failed Machines, Admin Intervention]
![Page 16: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/16.jpg)
Accumulo Scales
• From GB to PB three things grow linearly:
• Total storage size
• Ingest Rate
• Concurrent scans
![Page 17: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/17.jpg)
Ingest Benchmark
[Chart: millions of entries per second (0 to 100) plotted over a horizontal axis from 0 to 1000]
![Page 18: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/18.jpg)
AWB Benchmark
http://sqrrl.com/media/Accumulo-Benchmark-10312013-1.pdf
![Page 19: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/19.jpg)
1000 machines
![Page 20: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/20.jpg)
100 M entries written per second
![Page 21: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/21.jpg)
408 terabytes
![Page 22: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/22.jpg)
7.56 trillion total entries
![Page 23: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/23.jpg)
Graph Benchmark
http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf
![Page 24: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/24.jpg)
1200 machines
![Page 25: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/25.jpg)
4.4 trillion vertices
![Page 26: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/26.jpg)
70.4 trillion edges
![Page 27: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/27.jpg)
149 M edges traversed per second
![Page 28: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/28.jpg)
1 petabyte
![Page 29: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/29.jpg)
Graph Analysis
[Chart: billions of edges, log scale]
Twitter: 1.5
Yahoo!: 6.6
Facebook: 1,000
Accumulo: 70,000
![Page 30: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/30.jpg)
Accumulo is designed after Google’s BigTable
![Page 31: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/31.jpg)
BigTable powers hundreds of applications at Google
![Page 32: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/32.jpg)
BigTable serves 2+ exabytes
http://hbasecon.com/sessions/#session33
![Page 33: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/33.jpg)
600 M queries per second organization wide
![Page 34: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/34.jpg)
From 10 to 10,000
![Page 35: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/35.jpg)
Starting with ten machines: 10¹
![Page 36: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/36.jpg)
One rack
![Page 37: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/37.jpg)
1 TB RAM
![Page 38: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/38.jpg)
10-100 TB Disk
![Page 39: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/39.jpg)
Hardware failures rare
![Page 40: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/40.jpg)
Test Application Designs
![Page 41: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/41.jpg)
Designing Applications for Scale
![Page 42: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/42.jpg)
Keys to Scaling
1. Live writes go to all servers
2. User requests are satisfied by few scans
3. Turning updates into inserts
![Page 43: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/43.jpg)
Keys to Scaling
Writes on all servers Few Scans
![Page 44: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/44.jpg)
Hash / UUID Keys
RowID Col Value
af362de4 Bob
b23dc4be Annie
b98de2ff Joe
c48e2ade $30
c7e43fb2 $25
d938ff3d 32
e2e4dac4 59
e98f2eab3 43
Key Value
userA:name Bob
userA:age 43
userA:account $30
userB:name Annie
userB:age 32
userB:account $25
userC:name Joe
userC:age 59
Uniform writes
![Page 45: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/45.jpg)
Monitor
Participating Tablet Servers
MyTable
Servers Hosted Tablets … Ingest
r1n1 1500 200k
r1n2 1501 210k
r2n1 1499 190k
r2n2 1500 200k
![Page 46: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/46.jpg)
Hash / UUID Keys
RowID Col Value
af362de4 Bob
b23dc4be Annie
b98de2ff Joe
c48e2ade $30
c7e43fb2 $25
d938ff3d 32
e2e4dac4 59
e98f2eab3 43
3 x 1-entry scans on 3 servers
get(userA)
![Page 47: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/47.jpg)
Keys to Scaling
Writes on all servers Few Scans
Hash / UUID Keys
![Page 48: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/48.jpg)
Group for Locality
Key Value
userA:name Bob
userA:age 43
userA:account $30
userB:name Annie
userB:age 32
userB:account $25
userC:name Joe
userC:age 59
RowID Col Value
af362de4 name Annie
af362de4 age 32
af362de4 account $25
c48e2ade name Joe
c48e2ade age 59
e2e4dac4 name Bob
e2e4dac4 age 43
e2e4dac4 account $30
Still fairly uniform writes
![Page 49: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/49.jpg)
Group for Locality
RowID Col Value
af362de4 name Annie
af362de4 age 32
af362de4 account $25
c48e2ade name Joe
c48e2ade age 59
e2e4dac4 name Bob
e2e4dac4 age 43
e2e4dac4 account $30
1 x 3-entry scan on 1 server
get(userA)
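The grouped layout can be sketched in a few lines of Python. This is an illustration only: the sorted in-memory list stands in for a table, `row_id` is a hypothetical helper that truncates an MD5 digest to eight hex characters, and nothing here uses the Accumulo client API.

```python
import hashlib
from bisect import bisect_left

def row_id(user: str) -> str:
    """Hypothetical helper: hash a user name into a short, evenly spread row ID."""
    return hashlib.md5(user.encode("utf-8")).hexdigest()[:8]

# Grouped layout: every field of a user shares one hashed row ID,
# so the entries for that user sort next to each other.
records = {
    "userA": {"name": "Bob", "age": "43", "account": "$30"},
    "userB": {"name": "Annie", "age": "32", "account": "$25"},
    "userC": {"name": "Joe", "age": "59"},
}
table = sorted(
    (row_id(user), col, value)
    for user, fields in records.items()
    for col, value in fields.items()
)

def get(user: str) -> dict:
    """Fetch all fields of one user with a single contiguous scan."""
    rid = row_id(user)
    start = bisect_left(table, (rid, "", ""))
    result = {}
    for r, col, value in table[start:]:
        if r != rid:
            break  # left the row, so the scan is finished
        result[col] = value
    return result

print(get("userA"))
```

Because the row ID is still a hash, different users land in different parts of the sorted key space, which is why writes stay fairly uniform while each lookup touches one contiguous range.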
![Page 50: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/50.jpg)
Keys to Scaling
Writes on all servers Few Scans
Grouped Keys
![Page 51: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/51.jpg)
Temporal Keys
Key Value
20140101 44
20140102 22
20140103 23
RowID Col Value
20140101 44
20140102 22
20140103 23
![Page 52: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/52.jpg)
Temporal Keys
Key Value
20140101 44
20140102 22
20140103 23
20140104 25
20140105 31
RowID Col Value
20140101 44
20140102 22
20140103 23
20140104 25
20140105 31
![Page 53: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/53.jpg)
Temporal Keys
Key Value
20140101 44
20140102 22
20140103 23
20140104 25
20140105 31
20140106 27
20140107 25
20140108 17
RowID Col Value
20140101 44
20140102 22
20140103 23
20140104 25
20140105 31
20140106 27
20140107 25
20140108 17
Always write to one server
![Page 54: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/54.jpg)
No write parallelism
![Page 55: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/55.jpg)
Temporal Keys
RowID Col Value
20140101 44
20140102 22
20140103 23
20140104 25
20140105 31
20140106 27
20140107 25
20140108 17
Fetching ranges uses few scans
get(20140101 to 201404)
![Page 56: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/56.jpg)
Keys to Scaling
Writes on all servers Few Scans
Temporal Keys
![Page 57: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/57.jpg)
Binned Temporal Keys
Key Value
20140101 44
20140102 22
20140103 23
RowID Col Value
0_20140101 44
1_20140102 22
2_20140103 23
Uniform Writes
![Page 58: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/58.jpg)
Binned Temporal Keys
Key Value
20140101 44
20140102 22
20140103 23
20140104 25
20140105 31
20140106 27
RowID Col Value
0_20140101 44
0_20140104 25
1_20140102 22
1_20140105 31
2_20140103 23
2_20140106 27
Uniform Writes
![Page 59: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/59.jpg)
Binned Temporal Keys
Key Value
20140101 44
20140102 22
20140103 23
20140104 25
20140105 31
20140106 27
20140107 25
20140108 17
RowID Col Value
0_20140101 44
0_20140104 25
0_20140107 25
1_20140102 22
1_20140105 31
1_20140108 17
2_20140103 23
2_20140106 27
Uniform Writes
![Page 60: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/60.jpg)
Binned Temporal Keys
RowID Col Value
0_20140101 44
0_20140104 25
0_20140107 25
1_20140102 22
1_20140105 31
1_20140108 17
2_20140103 23
2_20140106 27
One scan per bin
get(20140101 to 201404)
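A minimal sketch of binning, assuming three bins and a CRC32 hash to pick the bin (the slides do not say how bins are assigned, so the resulting prefixes will differ from the table above). The helper names `binned_row_id` and `scan_ranges` are hypothetical, and a plain dict stands in for the table.

```python
import zlib

NUM_BINS = 3  # assumption: single-digit bin prefixes, e.g. one bin per server

def binned_row_id(date: str, num_bins: int = NUM_BINS) -> str:
    """Prefix the date with a bin number derived from a stable hash of the date."""
    bin_id = zlib.crc32(date.encode("ascii")) % num_bins
    return f"{bin_id}_{date}"

def scan_ranges(start: str, end: str, num_bins: int = NUM_BINS) -> list:
    """A date range turns into one (start, end) row range per bin."""
    return [(f"{b}_{start}", f"{b}_{end}") for b in range(num_bins)]

def get(table: dict, start: str, end: str) -> dict:
    """Run one scan per bin and merge the results back into date order."""
    hits = {}
    for lo, hi in scan_ranges(start, end):
        for rid in sorted(table):
            if lo <= rid <= hi:
                hits[rid.split("_", 1)[1]] = table[rid]
    return dict(sorted(hits.items()))

counts = {"20140101": 44, "20140102": 22, "20140103": 23, "20140104": 25,
          "20140105": 31, "20140106": 27, "20140107": 25, "20140108": 17}
table = {binned_row_id(d): v for d, v in counts.items()}
print(get(table, "20140101", "20140104"))
```

The trade-off is visible in `scan_ranges`: more bins spread writes over more servers, but every range query costs one scan per bin.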
![Page 61: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/61.jpg)
Keys to Scaling
Writes on all servers Few Scans
Binned Temporal Keys
![Page 62: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/62.jpg)
Keys to Scaling
• Key design is critical
• Group data under common row IDs to reduce scans
• Prepend bins to row IDs to increase write parallelism
![Page 63: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/63.jpg)
Splits
• Pre-split or organic splits
• Going from dev to production, can ingest a representative sample, obtain split points and use them to pre-split a larger system
• Hundreds or thousands of tablets per server is ok
• Want at least one tablet per server
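One way to obtain split points from a representative sample is to take evenly spaced row IDs from the sorted sample. The sketch below assumes that approach; `split_points` is a hypothetical helper, and applying the result to a real table is left to the usual Accumulo tooling.

```python
def split_points(sample_row_ids, num_tablets):
    """Pick up to num_tablets - 1 evenly spaced row IDs from a sample."""
    rows = sorted(set(sample_row_ids))
    if num_tablets < 2 or not rows:
        return []
    step = len(rows) / num_tablets
    points = [rows[int(i * step)] for i in range(1, num_tablets)]
    # Duplicates appear when the sample is smaller than the tablet count.
    return sorted(set(points))

# Stand-in for row IDs sampled from a development system.
sample = [f"{i:08x}" for i in range(0, 1000, 7)]
print(split_points(sample, 4))
```

Choosing `num_tablets` as a multiple of the number of tablet servers keeps at least one tablet on every server from the first write.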
![Page 64: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/64.jpg)
Effect of Compression
• Similar sorted keys compress well
• May need more data than you think to auto-split
![Page 65: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/65.jpg)
Inserts are fast: 10s of thousands per second per machine
![Page 66: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/66.jpg)
Updates *can* be …
![Page 67: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/67.jpg)
Update Types
• Overwrite
• Combine
• Complex
![Page 68: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/68.jpg)
Update - Overwrite
• Performance same as insert
• Ignore (don’t read) existing value
• Accumulo’s Versioning Iterator does the overwrite
![Page 69: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/69.jpg)
Update - Overwrite
RowID Col Value
af362de4 name Annie
af362de4 age 32
af362de4 account $25
c48e2ade name Joe
c48e2ade age 59
e2e4dac4 name Bob
e2e4dac4 age 43
e2e4dac4 account $30
userB:age -> 34
![Page 70: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/70.jpg)
Update - Overwrite
RowID Col Value
af362de4 name Annie
af362de4 age 34
af362de4 account $25
c48e2ade name Joe
c48e2ade age 59
e2e4dac4 name Bob
e2e4dac4 age 43
e2e4dac4 account $30
userB:age -> 34
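The overwrite case can be sketched as an append-only log in which the newest version wins at read time. This is a toy model under stated assumptions: the counter stands in for timestamps, and `put` and `scan` are hypothetical names rather than the Accumulo API or its Versioning Iterator.

```python
import itertools

_clock = itertools.count(1)  # stand-in for server-assigned timestamps
entries = []                 # append-only list of (row, col, timestamp, value)

def put(row, col, value):
    """An overwrite is just another insert; nothing is read first."""
    entries.append((row, col, next(_clock), value))

def scan(row, max_versions=1):
    """Return the newest max_versions values per column."""
    by_col = {}
    for r, col, ts, value in entries:
        if r == row:
            by_col.setdefault(col, []).append((ts, value))
    return {col: [v for _, v in sorted(vs, reverse=True)[:max_versions]]
            for col, vs in by_col.items()}

put("af362de4", "age", "32")
put("af362de4", "age", "34")   # userB:age -> 34
print(scan("af362de4"))
```

Both versions are stored, which is why the write costs the same as an insert; the older one is simply filtered out when reading.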
![Page 71: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/71.jpg)
Update - Combine
• Things like X = X + 1
• Normally one would have to read the old value to do this, but Accumulo Iterators allow multiple inserts to be combined at scan time or at compaction
• Performance is same as inserts
![Page 72: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/72.jpg)
Update - Combine
RowID Col Value
af362de4 name Annie
af362de4 age 34
af362de4 account $25
c48e2ade name Joe
c48e2ade age 59
e2e4dac4 name Bob
e2e4dac4 age 43
e2e4dac4 account $30
userB:account -> +10
![Page 73: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/73.jpg)
Update - Combine
RowID Col Value
af362de4 name Annie
af362de4 age 34
af362de4 account $25
af362de4 account $10
c48e2ade name Joe
c48e2ade age 59
e2e4dac4 name Bob
e2e4dac4 age 43
e2e4dac4 account $30
userB:account -> +10
![Page 74: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/74.jpg)
Update - Combine
RowID Col Value
af362de4 name Annie
af362de4 age 34
af362de4 account $25
af362de4 account $10
c48e2ade name Joe
c48e2ade age 59
e2e4dac4 name Bob
e2e4dac4 age 43
e2e4dac4 account $30
getAccount(userB) $35
![Page 75: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/75.jpg)
Update - Combine
After compaction
RowID Col Value
af362de4 name Annie
af362de4 age 34
af362de4 account $35
c48e2ade name Joe
c48e2ade age 59
e2e4dac4 name Bob
e2e4dac4 age 43
e2e4dac4 account $30
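The combine sequence above can be sketched as follows. It is a simplified model: amounts are plain integers, `add`, `scan`, and `compact` are hypothetical names, and the summing logic only imitates what a combining Iterator would do.

```python
from collections import defaultdict

entries = []  # append-only list of (row, col, amount)

def add(row, col, amount):
    """X = X + n is written as a plain insert of the delta."""
    entries.append((row, col, amount))

def scan(row):
    """Sum all entries for each column at read time."""
    totals = defaultdict(int)
    for r, col, amount in entries:
        if r == row:
            totals[col] += amount
    return dict(totals)

def compact():
    """Rewrite the table with one summed entry per (row, col)."""
    totals = defaultdict(int)
    for r, col, amount in entries:
        totals[(r, col)] += amount
    entries[:] = [(r, col, amount) for (r, col), amount in sorted(totals.items())]

add("af362de4", "account", 25)
add("af362de4", "account", 10)   # userB:account -> +10
print(scan("af362de4"))          # summed at scan time
compact()
print(entries)                   # one entry left after compaction
```

The writer never reads the old balance, so the update stays as cheap as an insert; the cost of summing is paid at scan time until compaction folds the entries together.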
![Page 76: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/76.jpg)
Update - Complex
• Some updates require looking at more data than Iterators have access to - such as multiple rows
• These require reading the data out in order to write the new value
• Performance will be much slower
![Page 77: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/77.jpg)
Update - Complex
userC:account = getBalance(userA) + getBalance(userB)
RowID Col Value
af362de4 name Annie
af362de4 age 34
af362de4 account $35
c48e2ade name Joe
c48e2ade age 59
c48e2ade account $40
e2e4dac4 name Bob
e2e4dac4 age 43
e2e4dac4 account $30
35+30 = 65
![Page 78: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/78.jpg)
Update - Complex
userC:account = getBalance(userA) + getBalance(userB)
RowID Col Value
af362de4 name Annie
af362de4 age 34
af362de4 account $35
c48e2ade name Joe
c48e2ade age 59
c48e2ade account $65
e2e4dac4 name Bob
e2e4dac4 age 43
e2e4dac4 account $30
35+30 = 65
![Page 79: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/79.jpg)
Planning a Larger-Scale Cluster: 10² - 10⁴
![Page 80: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/80.jpg)
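Storage vs Ingest
[Chart: storage (terabytes) and ingest rate (millions of entries per second) against number of machines (10, 100, 1000, 10000), log scale from 1 to 1,000,000. Series: Ingest Rate, 1x1TB, 12x3TB. Data labels: 120; 1,200; 12,000; 120,000 and 10; 100; 1,000; 10,000]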
![Page 81: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/81.jpg)
Model for Ingest Rates
$A = 0.85^{\log_2 N} \cdot N \cdot S$
N - Number of machines
S - Single Server throughput (entries / second)
A - Aggregate Cluster throughput (entries / second)
Expect 85% increase in write rate when doubling the size of the cluster
![Page 82: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/82.jpg)
Estimating Machines Required
$N = 2^{\log_2(A/S) / 0.7655347}$
N - Number of machines
S - Single Server throughput (entries / second)
A - Target Aggregate throughput (entries / second)
Expect 85% increase in write rate when doubling the size of the cluster
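The two formulas are inverses of each other. Writing 0.85 as a power of 2 gives A/S = N^(1 + log2 0.85), and 1 + log2 0.85 is about 0.7655347, the constant in the second formula. Under the formula as written, doubling N multiplies A by 2 × 0.85 = 1.7. A small sketch of both directions, where the single-server rate `s` is an assumed value chosen only for illustration:

```python
import math

SCALING = 0.85  # per-doubling factor from the ingest model

def aggregate_rate(n, s):
    """Aggregate throughput A for n machines at single-server rate s."""
    return SCALING ** math.log2(n) * n * s

def machines_required(a, s):
    """Invert the model: machines needed for target aggregate rate a."""
    exponent = 1 + math.log2(SCALING)  # about 0.7655347
    return 2 ** (math.log2(a / s) / exponent)

s = 100_000  # assumed single-server rate in entries per second
print(aggregate_rate(2, s))
print(math.ceil(machines_required(100e6, s)))
```

Because the exponent is below 1, the machine count grows faster than the target rate, so it is worth measuring S on real hardware before sizing a cluster with this model.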
![Page 83: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/83.jpg)
Predicted Cluster Sizes
[Chart: number of machines (0 to 12000) against millions of entries per second (0 to 600)]
![Page 84: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/84.jpg)
100 Machines: 10²
![Page 85: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/85.jpg)
Multiple racks
![Page 86: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/86.jpg)
10 TB RAM
![Page 87: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/87.jpg)
100 TB - 1PB Disk
![Page 88: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/88.jpg)
Some hardware failures in the first week
(burn in)
![Page 89: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/89.jpg)
Expect 3 failed hard drives in the first 3 months
![Page 90: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/90.jpg)
Another 4 within the first year
http://static.googleusercontent.com/media/research.google.com/en/us/archive/disk_failures.pdf
![Page 92: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/92.jpg)
Can store and index the Common Crawl Corpus
2.8 Billion web pages 541 TB
commoncrawl.org
![Page 93: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/93.jpg)
One year of Twitter: 182 billion tweets
483 TB
http://www.sec.gov/Archives/edgar/data/1418091/000119312513390321/d564001ds1.htm
![Page 94: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/94.jpg)
Deploying an Application
[Diagram: Tablet Servers, Clients, Users]
![Page 95: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/95.jpg)
May not see the effect of writing to disk for a while
![Page 96: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/96.jpg)
1000 machines: 10³
![Page 97: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/97.jpg)
Multiple rows of racks
![Page 98: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/98.jpg)
100 TB RAM
![Page 99: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/99.jpg)
1-10 PB Disk
![Page 100: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/100.jpg)
Hardware failure is a regular occurrence
![Page 101: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/101.jpg)
Hard drive failure about every 5 days (average).
Will be skewed towards beginning of the year
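The five-day figure is consistent with the 100-machine numbers given earlier (3 failed drives in the first three months, another 4 within the first year), assuming failures scale linearly with machine count and every machine carries the same number of drives. That assumption is made here for the arithmetic and is not stated in the slides.

```python
# First-year drive failures per 100 machines: 3 in the first 3 months plus 4 more.
FAILURES_PER_100_MACHINES_YEAR_ONE = 3 + 4

def days_between_failures(machines):
    """Average days between drive failures in year one, scaling linearly."""
    failures_per_year = FAILURES_PER_100_MACHINES_YEAR_ONE * machines / 100
    return 365 / failures_per_year

print(round(days_between_failures(1000), 1))  # 5.2
```

The average hides the burn-in skew: with 3 of every 7 first-year failures arriving in the first quarter, the early months see failures more often than every five days.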
![Page 102: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/102.jpg)
Can traverse the ‘brain graph’ 70 trillion edges, 1 PB
http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf
![Page 103: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/103.jpg)
Facebook Graph: 1s of PB
http://www-conf.slac.stanford.edu/xldb2012/talks/xldb2012_wed_1105_DhrubaBorthakur.pdf
![Page 104: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/104.jpg)
Netflix Video Master Copies: 3.14 PB
http://www.businessweek.com/articles/2013-05-09/netflix-reed-hastings-survive-missteps-to-join-silicon-valleys-elite
![Page 105: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/105.jpg)
World of Warcraft Backend Storage: 1.3 PB
http://www.datacenterknowledge.com/archives/2009/11/25/wows-back-end-10-data-centers-75000-cores/
![Page 106: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/106.jpg)
Webpages, live on the Internet: 14.3 Trillion
http://www.factshunt.com/2014/01/total-number-of-websites-size-of.html
![Page 107: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/107.jpg)
Things like the difference between two compression algorithms start
to make a big difference
![Page 108: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/108.jpg)
Use range compactions to effect changes on portions of a table
![Page 109: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/109.jpg)
Lay off ZooKeeper
![Page 110: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/110.jpg)
Watch Garbage Collector and NameNode ops
![Page 111: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/111.jpg)
Garbage Collection > 5 minutes?
![Page 112: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/112.jpg)
Start thinking about NameNode Federation
![Page 113: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/113.jpg)
Accumulo 1.6
![Page 114: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/114.jpg)
Multiple NameNodes
[Diagram: Accumulo on top of two NameNodes, each with its own set of DataNodes]
Multiple HDFS Clusters
![Page 115: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/115.jpg)
Multiple NameNodes
[Diagram: Accumulo on top of two NameNodes sharing one set of DataNodes]
Multiple NameNodes, shared DataNodes (Federation; requires Hadoop 2.0)
![Page 116: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/116.jpg)
More NameNodes = higher risk of one going down.
Can use HA NameNodes in conjunction with Federation
![Page 117: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/117.jpg)
10,000 machines: 10^4
![Page 118: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/118.jpg)
You, my friend, are here to kick a** and chew bubble gum
![Page 119: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/119.jpg)
1 PB RAM
![Page 120: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/120.jpg)
10-100 PB Disk
![Page 121: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/121.jpg)
1 hardware failure every hour on average
![Page 122: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/122.jpg)
Entire Internet Archive: 15 PB
http://www.motherjones.com/media/2014/05/internet-archive-wayback-machine-brewster-kahle
![Page 123: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/123.jpg)
A year’s worth of data from the Large Hadron Collider
15 PB
http://home.web.cern.ch/about/computing
![Page 124: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/124.jpg)
0.1% of all Internet traffic in 2013: 43.6 PB
http://www.factshunt.com/2014/01/total-number-of-websites-size-of.html
![Page 125: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/125.jpg)
Facebook Messaging Data: 10s of PB
http://www-conf.slac.stanford.edu/xldb2012/talks/xldb2012_wed_1105_DhrubaBorthakur.pdf
![Page 126: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/126.jpg)
Facebook Photos: 240 billion
High 10s of PB
http://www-conf.slac.stanford.edu/xldb2012/talks/xldb2012_wed_1105_DhrubaBorthakur.pdf
![Page 127: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/127.jpg)
Must use multiple NameNodes
![Page 128: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/128.jpg)
Can tune back heartbeats and, in general, the periodicity of central processes
![Page 129: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/129.jpg)
Can combine multiple PB data sets
![Page 130: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/130.jpg)
Up to 10 quadrillion entries in a single table
![Page 131: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/131.jpg)
While maintaining sub-second lookup times
![Page 132: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/132.jpg)
Only with Accumulo 1.6
![Page 133: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/133.jpg)
Dealing with data over time
![Page 134: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/134.jpg)
Data Over Time - Patterns
• Initial Load
• Increasing Velocity
• Focus on Recency
• Historical Summaries
![Page 135: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/135.jpg)
Initial Load
• Get a pile of old data into Accumulo fast
• Latency not important (data is old)
• Throughput critical
![Page 136: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/136.jpg)
Bulk Load RFiles
![Page 137: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/137.jpg)
Bulk Loading
MapReduce
RFiles Accumulo
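The MapReduce step has to hand Accumulo sorted files that line up with the table's tablets. A minimal pure-Python sketch of that partitioning idea follows. It is an illustration only, not Accumulo's API; `partition_for_bulk_load` and the sample data are hypothetical, and it assumes a split point is the inclusive end row of its tablet.

```python
from bisect import bisect_left

def partition_for_bulk_load(records, split_points):
    """Group (row, value) records into one sorted batch per tablet range.

    split_points must be sorted; batch i holds rows up to and including
    split_points[i], and the last batch holds everything after the final split.
    """
    batches = [[] for _ in range(len(split_points) + 1)]
    for row, value in sorted(records):
        batches[bisect_left(split_points, row)].append((row, value))
    return batches

# Hypothetical old data arriving in arbitrary order.
records = [("zebra", 1), ("apple", 2), ("m", 3), ("nut", 4)]
batches = partition_for_bulk_load(records, ["m", "t"])
for index, batch in enumerate(batches):
    print(index, batch)
```

Each batch stands in for one file written by the job; because sorting happens offline, load throughput does not depend on the tablet servers' write path.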
![Page 138: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/138.jpg)
Increasing velocity
![Page 139: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/139.jpg)
If your data isn’t big today, wait a little while
![Page 140: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/140.jpg)
Accumulo scales up dynamically, online. No downtime
![Page 141: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/141.jpg)
The first scale, ‘can change size’
![Page 142: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/142.jpg)
Scaling Up
[Diagram: Clients, Accumulo, HDFS]
3 physical servers, each running a Tablet Server process and a Data Node process
![Page 143: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/143.jpg)
Scaling Up
[Diagram: Clients, Accumulo, HDFS]
Start 3 new Tablet Server processes and 3 new Data Node processes
![Page 144: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/144.jpg)
Scaling Up
[Diagram: Clients, Accumulo, HDFS]
Master immediately assigns tablets
![Page 145: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/145.jpg)
Scaling Up
[Diagram: Clients, Accumulo, HDFS]
Clients immediately begin querying new Tablet Servers
![Page 146: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/146.jpg)
Scaling Up
[Diagram: Clients, Accumulo, HDFS]
New Tablet Servers read data from old Data Nodes
![Page 147: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/147.jpg)
Scaling Up
[Diagram: Clients, Accumulo, HDFS]
New Tablet Servers write data to new Data Nodes
![Page 148: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/148.jpg)
Never really seen anyone do this
![Page 149: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/149.jpg)
Except myself
![Page 150: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/150.jpg)
20 machines in Amazon EC2
![Page 151: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/151.jpg)
to 400 machines
![Page 152: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/152.jpg)
all during the same MapReduce job reading data out of Accumulo, summarizing, and writing back
![Page 153: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/153.jpg)
Scaled back down to 20 machines when done
![Page 154: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/154.jpg)
Just killed Tablet Servers
![Page 155: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/155.jpg)
Decommissioned Data Nodes for safe data consolidation to
remaining 20 nodes
![Page 156: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/156.jpg)
Other ways to go from 10^x to 10^(x+1)
![Page 157: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/157.jpg)
Accumulo Table Export
![Page 158: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/158.jpg)
followed by HDFS DistCP to new cluster
![Page 159: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/159.jpg)
Maybe the new replication feature
![Page 160: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/160.jpg)
Newer Data is Read more Often
![Page 161: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/161.jpg)
Accumulo keeps newly written data in memory
![Page 162: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/162.jpg)
Block Cache can keep recently queried data in memory
![Page 163: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/163.jpg)
Combining Iterators make maintaining summaries of large
amounts of raw events easy
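The idea behind a combining iterator can be sketched in plain Python: every value written under the same key is folded into one summary value, so only the summary needs to be kept. This is an illustration of the concept and not Accumulo's iterator API; `summing_combine` and the sample keys are hypothetical.

```python
from collections import defaultdict

def summing_combine(entries):
    """Collapse all values that share a key into a single summed value."""
    totals = defaultdict(int)
    for key, value in entries:
        totals[key] += value
    return sorted(totals.items())

# Hypothetical raw events: one entry with value 1 per observed event.
raw_events = [
    (("user1", "logins"), 1),
    (("user2", "logins"), 1),
    (("user1", "logins"), 1),
    (("user1", "logins"), 1),
    (("user2", "logins"), 1),
]
summary = summing_combine(raw_events)
print(summary)
```

Five raw events collapse into two summary entries here, one per key.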
![Page 164: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/164.jpg)
Reduces storage burden
![Page 165: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/165.jpg)
Historical Summaries
[Chart: Unique Entities Stored vs. Raw Events Processed, April through July; y-axis 0 to 8000]
![Page 166: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/166.jpg)
Age-off iterator can automatically remove data over a certain age
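Age-off amounts to a timestamp filter. A minimal sketch, assuming each entry carries a millisecond timestamp; `age_off` and the sample entries are hypothetical, and this is not Accumulo's own filter code.

```python
def age_off(entries, now_ms, ttl_ms):
    """Keep only entries whose timestamp is no older than ttl_ms before now_ms."""
    cutoff = now_ms - ttl_ms
    return [(key, ts, value) for key, ts, value in entries if ts >= cutoff]

DAY_MS = 24 * 60 * 60 * 1000

# Hypothetical entries as (key, timestamp in ms, value).
now = 100 * DAY_MS
entries = [
    ("old", now - 40 * DAY_MS, "a"),
    ("edge", now - 30 * DAY_MS, "b"),
    ("fresh", now - 1 * DAY_MS, "c"),
]
kept = age_off(entries, now_ms=now, ttl_ms=30 * DAY_MS)
print([key for key, _, _ in kept])
```

With a 30-day limit, the 40-day-old entry is dropped and the other two survive.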
![Page 167: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/167.jpg)
IBM estimates 2.5 exabytes of data is created every day
http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html
![Page 168: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/168.jpg)
90% of available data created in last 2 years
http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html
![Page 169: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/169.jpg)
25 new 10k node Accumulo clusters per day
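The figure of 25 appears to come from dividing IBM's 2.5 EB/day estimate by the disk capacity of one 10^4-machine cluster. The slides do not spell this out; the sketch below assumes decimal units and the upper end (100 PB) of the 10-100 PB range quoted earlier.

```python
PETABYTE = 10**15
EXABYTE = 10**18

# IBM estimate quoted on the earlier slide: 2.5 EB created per day.
daily_data_bytes = 25 * EXABYTE // 10

# Assumption: one 10^4-machine cluster holds 100 PB (upper end of 10-100 PB).
cluster_capacity_bytes = 100 * PETABYTE

clusters_per_day = daily_data_bytes // cluster_capacity_bytes
print(clusters_per_day)

# Taking the lower end (10 PB) instead gives ten times as many clusters.
clusters_per_day_low = daily_data_bytes // (10 * PETABYTE)
print(clusters_per_day_low)
```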
![Page 170: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/170.jpg)
Accumulo is doing its part to get in front of the big data trend
![Page 171: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/171.jpg)
Questions?
![Page 172: Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo Clusters](https://reader033.fdocuments.in/reader033/viewer/2022051207/54465fc5afaf9f51178b4659/html5/thumbnails/172.jpg)
@aaroncordova