Large-scale OLAP with Kobayashi
Boundary Tech Talks, Fri, May 18, 2012
Dietrich Featherston, Boundary, @d2fn
Friday, May 18, 12
Monitoring is an analytics problem
Historical Perspective
1 minute collection intervals
Arbitrary OLAP
Cassandra
bitset indexes per dimension
query-time sampling
riak_core + fastbit
apply intelligence to the problem
Arbitrary OLAP requires 2^n data cubes
where n is dimensionality
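Each subset of the n dimensions is a distinct group-by, which is where the exponential count comes from. A minimal sketch of that enumeration; the function name `cuboids` and the three placeholder dimension names are assumptions for illustration, not part of the talk.

```python
from itertools import combinations

def cuboids(dimensions):
    """Yield every subset of the dimensions; each subset is one possible group-by."""
    for size in range(len(dimensions) + 1):
        yield from combinations(dimensions, size)

# Placeholder dimension names, chosen only to keep the output small.
dims = ["a", "b", "c"]
all_cuboids = list(cuboids(dims))

# Number of cubes needed to answer any group-by over 11 dimensions.
full_lattice_size = sum(1 for _ in cuboids(range(11)))
```

Precomputing only the cubes that queries actually need keeps the count far below the full lattice.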
dimensions (11)
epoch seconds
epoch minutes
epoch hours
meter id
source ip
source port
dest ip
dest port
interface
country
network
measurements (4)
egress packets
egress octets
ingress packets
ingress octets
Total Volume
by Host, Port/Protocol, Country, Network
+ meter
For each aggregation period
15 < 2^11
24 hours, 2 months, ~10 years
86,400 observations
(per monitored host per query)
86,400 * 15 ≈ 1.3M observations
(per monitored host)
Total Observations (for half a million meters)
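The arithmetic behind these counts can be checked directly. This sketch rests on two assumptions the slides do not state: that the three retention windows pair with 1-second, 1-minute, and 1-hour resolution respectively, and that one meter corresponds to one monitored host. The variable names are illustrative.

```python
# Assumed pairing of retention window to resolution (not stated on the slides).
per_second = 24 * 60 * 60   # 24 hours of 1-second points
per_minute = 60 * 24 * 60   # 60 days (about 2 months) of 1-minute points
per_hour = 10 * 365 * 24    # about 10 years of 1-hour points

cubes = 15                  # cubes actually maintained
per_host = 86_400 * cubes   # observations per monitored host

meters = 500_000            # "half a million meters"
total = per_host * meters   # assumes one meter per monitored host
```

Under those assumptions every resolution returns roughly the same number of points per query, so query cost stays flat regardless of how far back the query reaches.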
Riak Key Layout
100 meters x 10 seconds
< 80KB
Total Observations
Bitcask would have been nice
LevelDB backend
Use leveldb cache to bound memory
Compute your keys
Use secondary indexes sparingly
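Computing keys means the client can derive the key of any block from the query itself, with no index lookup. A minimal sketch, assuming blocks of 100 meters by 10 seconds as in the key layout; the key string format, the cube name, and the function name `block_key` are hypothetical and not Kobayashi's actual scheme.

```python
METERS_PER_BLOCK = 100
SECONDS_PER_BLOCK = 10

def block_key(cube, meter_id, epoch_seconds):
    """Derive the key of the block holding one observation (hypothetical key format)."""
    # Round the meter id and timestamp down to the start of their block.
    meter_bucket = (meter_id // METERS_PER_BLOCK) * METERS_PER_BLOCK
    time_bucket = (epoch_seconds // SECONDS_PER_BLOCK) * SECONDS_PER_BLOCK
    return f"{cube}:{meter_bucket}:{time_bucket}"

key = block_key("volume_1s_meter", 226, 1337230147)
```

Because the key is a pure function of cube, meter, and time, reads become plain key fetches and secondary indexes are rarely needed.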
How do I query the database?
Find 45 minutes of total traffic seen on meters 1, 2, 226, and 301, starting 18 hours ago, broken down by traffic type
Atomic Unit of Storage
100 meters x 10 seconds
< 80KB
Step 1: fetch appropriate blocks (riak)
Meter Id blocks: 0 (0,99), 100 (100,199), 200 (200,299), 300 (300,399), 400 (400,499)
Time: t0 through t19
Meters 1, 2, 226, 301; 45 min
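Step 1 amounts to listing every (meter block, time block) pair that overlaps the query. A sketch under the assumption of 100-meter by 10-second blocks; `blocks_for_query` and its parameters are illustrative names, and the actual Riak fetch is left out.

```python
def blocks_for_query(meter_ids, start, duration,
                     meters_per_block=100, seconds_per_block=10):
    """List the (meter_bucket, time_bucket) pairs covering a query window.

    start and duration are in seconds; the window is [start, start + duration).
    """
    # Several requested meters may share one block, so deduplicate the buckets.
    meter_buckets = sorted({(m // meters_per_block) * meters_per_block
                            for m in meter_ids})
    # Round the start down so a partially covered first block is still fetched.
    first = (start // seconds_per_block) * seconds_per_block
    time_buckets = range(first, start + duration, seconds_per_block)
    return [(mb, tb) for mb in meter_buckets for tb in time_buckets]

blocks = blocks_for_query([1, 2, 226, 301], start=1000, duration=45 * 60)
```

Meters 1 and 2 fall in the same block row, so four requested meters touch only three rows of blocks.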
Step 2: filter
Meter Id blocks: 0 (0,99), 100 (100,199), 200 (200,299), 300 (300,399), 400 (400,499)
Time: t0 through t19
Meters 1, 2, 226, 301; 45 min
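A fetched block holds rows for up to 100 meters, so rows for meters outside the query have to be dropped. A minimal sketch of that filter; the row shape is an assumption, and the `meterId` field name in particular is hypothetical.

```python
def filter_rows(rows, meter_ids, start_ms, end_ms):
    """Keep rows for the requested meters inside [start_ms, end_ms)."""
    wanted = set(meter_ids)  # set membership keeps the check O(1) per row
    return [
        row for row in rows
        if row["meterId"] in wanted and start_ms <= row["epochMillis"] < end_ms
    ]
```

The filter runs after the fetch because the block, not the individual meter, is the unit of storage.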
Step 3: aggregate and perform top-k
topk(1 + 2 + 226 + 301, 10)
45 min
{
epochMillis: 1337230140000
portProtocol: "4740:6"
ingressPackets: 370482
ingressOctets: 3113782199
egressPackets: 343780
egressOctets: 37126033
},{
epochMillis: 1337230140000
portProtocol: "9092:6"
ingressPackets: 440915
ingressOctets: 1816615857
egressPackets: 481237
egressOctets: 1312198133
},...
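Step 3 can be sketched as a group-by that sums the four measurements, followed by a top-k selection. The field names come from the sample output; the function names and the ranking by total octets are assumptions, since the slides do not say which measure the top-k uses.

```python
import heapq
from collections import defaultdict

MEASURES = ("ingressPackets", "ingressOctets", "egressPackets", "egressOctets")

def aggregate(rows, group_by):
    """Sum the measurements of all rows sharing the same group_by values."""
    totals = defaultdict(lambda: dict.fromkeys(MEASURES, 0))
    for row in rows:
        key = tuple(row[dim] for dim in group_by)
        for measure in MEASURES:
            totals[key][measure] += row[measure]
    # Rebuild flat rows: group-by fields first, then the summed measurements.
    return [dict(zip(group_by, key), **sums) for key, sums in totals.items()]

def topk(rows, k, rank_by=lambda r: r["ingressOctets"] + r["egressOctets"]):
    """Return the k largest rows; ranking by total octets is an assumed default."""
    return heapq.nlargest(k, rows, key=rank_by)
```

Summing across meters before ranking is what collapses rows from meters 1, 2, 226, and 301 into one row per time and traffic type.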
http://computers-r-terrible/volume_1m_meter_port_protocol/data?from=-18h&duration=45m&parts=1,2,226,301&aggregations=observationDomainId
In URL Form
http://computers-r-terrible/volume_1m_meter_port_protocol/data?from=-18h&duration=45m&parts=1,2,226,301&aggregations=observationDomainId,epochMillis
Arbitrary Aggregations
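A query URL of this shape can be assembled from its parts. The sketch below only reproduces the URL pattern shown on the slides; the helper name `query_url` and its parameter names are hypothetical, and nothing here is a documented Kobayashi client API.

```python
from urllib.parse import urlencode

def query_url(base, cube, from_, duration, parts, aggregations):
    """Build a query URL in the form shown on the slides (hypothetical helper)."""
    params = {
        "from": from_,
        "duration": duration,
        "parts": ",".join(str(p) for p in parts),
        "aggregations": ",".join(aggregations),
    }
    # safe="," keeps the comma-separated lists readable instead of %2C-encoded.
    return f"{base}/{cube}/data?{urlencode(params, safe=',')}"

url = query_url(
    "http://computers-r-terrible",
    "volume_1m_meter_port_protocol",
    from_="-18h",
    duration="45m",
    parts=[1, 2, 226, 301],
    aggregations=["observationDomainId", "epochMillis"],
)
```

Adding or removing names in `aggregations` changes the group-by without touching the rest of the query.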
“unfortunately the project has been blocked for weeks choosing a name”
V
V′
V ≃ V′
Future -->
Send expired data to cold storage
output in arbitrary time resolution
Open source the data cubing and predicate matching code
Query grammar for kobayashi
questions?