Large-scale OLAP with Kobayashi

Large-scale OLAP with Kobayashi. Boundary Tech Talks, Fri, May 18, 2012. Dietrich Featherston, Boundary, @d2fn

Transcript of Large-scale OLAP with Kobayashi

Page 1: Large scale-olap-with-kobayashi

Large-scale OLAP with Kobayashi

Boundary Tech Talks, Fri, May 18, 2012

Dietrich Featherston, Boundary, @d2fn

Friday, May 18, 12

Page 2

Monitoring is an analytics problem

Page 3

Historical Perspective

Page 4

1 minute collection intervals

Arbitrary OLAP

Page 5

Cassandra

bitset indexes per dimension

query-time sampling
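The idea of a bitset index per dimension can be sketched in a few lines of Python. This is a generic illustration of the technique, not the Cassandra-based implementation the slide refers to: each distinct value of a dimension maps to a bitset (here a plain Python `int`) whose bit i is set when row i has that value, so a conjunctive filter becomes a bitwise AND. The row fields and helper names are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(rows, dimension):
    """Map each value of `dimension` to a bitset (a Python int) of row positions."""
    index = defaultdict(int)
    for position, row in enumerate(rows):
        index[row[dimension]] |= 1 << position
    return index


def positions(bitset):
    """Decode a bitset into the sorted list of row positions it contains."""
    result, i = [], 0
    while bitset:
        if bitset & 1:
            result.append(i)
        bitset >>= 1
        i += 1
    return result


# Hypothetical flow rows.
rows = [
    {"country": "US", "dest_port": 80},
    {"country": "DE", "dest_port": 443},
    {"country": "US", "dest_port": 443},
]
by_country = build_bitmap_index(rows, "country")
by_port = build_bitmap_index(rows, "dest_port")

# Predicate: country = US AND dest_port = 443
print(positions(by_country["US"] & by_port[443]))  # [2]
```

Production bitmap indexes such as FastBit compress these bitsets; the sketch keeps them uncompressed for clarity.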

Page 6

Page 7

riak_core + fastbit

Page 8

apply intelligence to the problem

Page 9

Arbitrary OLAP requires $2^n$ data cubes, where $n$ is dimensionality

Page 10

dimensions (11): epoch seconds, epoch minutes, epoch hours, meter id, source ip, source port, dest ip, dest port, interface, country, network

measurements (4): egress packets, egress octets, ingress packets, ingress octets
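Why "arbitrary OLAP" costs $2^n$ cubes: every group-by (cuboid) corresponds to one subset of the dimensions, from the empty set (the grand total) up to all of them. A minimal Python sketch that enumerates those subsets for the 11 dimensions listed above; the helper name `all_cuboids` is illustrative, not part of Kobayashi.

```python
from itertools import combinations

# The 11 dimensions listed above.
DIMENSIONS = [
    "epoch seconds", "epoch minutes", "epoch hours", "meter id",
    "source ip", "source port", "dest ip", "dest port",
    "interface", "country", "network",
]


def all_cuboids(dimensions):
    """Return every group-by of a cube: one per subset of the dimensions,
    from () (the grand total) up to all dimensions at once."""
    cuboids = []
    for size in range(len(dimensions) + 1):
        cuboids.extend(combinations(dimensions, size))
    return cuboids


cuboids = all_cuboids(DIMENSIONS)
print(len(cuboids))  # 2048, i.e. 2**11
```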

Page 11

Total volume, and volume by host, port/protocol, country, and network, each + meter, for each aggregation period

Page 12

Page 13

$15 < 2^{11}$
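One reading of where the 15 materialized cubes come from, assuming five breakdowns (total volume, host, port/protocol, country, network), each keyed by meter, at three aggregation periods (seconds, minutes, hours, matching the three epoch dimensions): 5 × 3 = 15, far fewer than the 2,048 cuboids of the full cube. The `volume_<period>_meter_<breakdown>` naming is modeled on the cube name in the query URL later in the deck; the other names are hypothetical.

```python
from itertools import product

PERIODS = ["1s", "1m", "1h"]  # assumed: seconds, minutes, hours
BREAKDOWNS = ["", "host", "port_protocol", "country", "network"]  # "" = total volume


def cube_name(period, breakdown):
    """Hypothetical cube naming modeled on volume_1m_meter_port_protocol."""
    name = f"volume_{period}_meter"
    return f"{name}_{breakdown}" if breakdown else name


CUBES = [cube_name(p, b) for p, b in product(PERIODS, BREAKDOWNS)]
print(len(CUBES), len(CUBES) < 2 ** 11)  # 15 True
```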

Page 14

24 hours, 2 months, ~10 years

86,400 observations (per monitored host per query)
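Reading the three spans as retention tiers at second, minute, and hour resolution (an assumption based on the epoch seconds/minutes/hours dimensions, taking "2 months" as 60 days), each tier holds the same 86,400 points:

```python
OBSERVATIONS = 86_400

seconds_tier = 24 * 60 * 60  # 24 hours of 1-second points
minutes_tier = 60 * 24 * 60  # 60 days (~2 months) of 1-minute points
hours_tier_years = OBSERVATIONS / 24 / 365.25  # years spanned by 86,400 hourly points

print(seconds_tier, minutes_tier, round(hours_tier_years, 2))  # 86400 86400 9.86
```

So "~10 years" is 86,400 hours, roughly 9.86 years.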

Page 15

86,400 × 15 ≈ 1.3M observations (per monitored host)

Page 16

Total observations (for half a million meters)
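The slide's own total did not survive extraction. Simply scaling the per-host figure of 86,400 × 15 to half a million meters, assuming every meter populates all 15 cubes, gives the order of magnitude involved:

```python
OBSERVATIONS_PER_CUBE = 86_400
CUBES = 15
METERS = 500_000

per_meter = OBSERVATIONS_PER_CUBE * CUBES  # 1,296,000, the "≈ 1.3M" per host
total = per_meter * METERS                 # upper bound if every meter fills every cube

print(f"{per_meter:,}  {total:,}  {total:.2e}")
# 1,296,000  648,000,000,000  6.48e+11
```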

Page 17

Riak Key Layout

100 meters × 10 seconds, < 80KB

Page 18

Total Observations

Page 19

Page 20

Page 21

Page 22

Page 23

Page 24

Bitcask would have been nice

LevelDB backend

Use LevelDB cache to bound memory

Page 25

Compute your keys

Use secondary indexes sparingly
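A minimal sketch of "compute your keys", assuming the block layout from the Riak Key Layout slide (100 meters × 10 seconds per block): the key for any observation is derived arithmetically from its meter id and timestamp, so finding a block needs no index lookup. The `<cube>/<meter block>/<time block>` key format and the cube name are hypothetical.

```python
METERS_PER_BLOCK = 100   # from the Riak Key Layout slide
SECONDS_PER_BLOCK = 10   # from the Riak Key Layout slide


def block_key(cube, meter_id, epoch_seconds):
    """Derive the key of the block holding (meter_id, epoch_seconds).

    Both coordinates are rounded down to the start of their block, so every
    observation in the same 100-meter x 10-second block shares one key.
    """
    meter_block = meter_id // METERS_PER_BLOCK * METERS_PER_BLOCK
    time_block = epoch_seconds // SECONDS_PER_BLOCK * SECONDS_PER_BLOCK
    return f"{cube}/{meter_block}/{time_block}"


print(block_key("volume_1s_meter", 226, 1337230147))
# volume_1s_meter/200/1337230140
```

In this sketch the read path is pure arithmetic, which is what makes it possible to keep secondary indexes off the hot path.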

Page 26

Page 27

How do I query the database?

Page 28

Find 45 minutes of total traffic seen on meters 1, 2, 226, and 301, starting 18 hours ago, broken down by traffic type

Page 29

Atomic Unit of Storage

100 meters × 10 seconds, < 80KB

Page 30

Step 1: fetch appropriate blocks (riak)

Figure: grid of storage blocks with Meter Id on one axis (blocks 0 (0,99), 100 (100,199), 200 (200,299), 300 (300,399), 400 (400,499)) and Time on the other (t0 through t19); meters 1, 2, 226, and 301 and a 45 min window are marked.
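Step 1 can be sketched as key enumeration, assuming 100-meter × 10-second blocks and a hypothetical `<cube>/<meter block>/<time block>` key format. Meters 1 and 2 fall in the same block, so the four requested meters touch only three meter blocks. The fetch from Riak itself is left out; the cube name and query time are illustrative.

```python
METERS_PER_BLOCK = 100   # block layout from the Riak Key Layout slide
SECONDS_PER_BLOCK = 10


def keys_for_query(cube, meter_ids, start, duration):
    """List every block key covering the requested meters and the half-open
    time range [start, start + duration), both in epoch seconds."""
    meter_blocks = sorted({m // METERS_PER_BLOCK * METERS_PER_BLOCK for m in meter_ids})
    first_block = start // SECONDS_PER_BLOCK * SECONDS_PER_BLOCK
    time_blocks = range(first_block, start + duration, SECONDS_PER_BLOCK)
    return [f"{cube}/{mb}/{tb}" for mb in meter_blocks for tb in time_blocks]


now = 1337294940  # hypothetical query time, in epoch seconds
keys = keys_for_query("volume_1s_meter", [1, 2, 226, 301], now - 18 * 3600, 45 * 60)
print(len(keys))  # 3 meter blocks (0, 200, 300) x 270 ten-second blocks = 810
```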

Page 31

Step 2: filter

Figure: the same Meter Id × Time block grid, with meters 1, 2, 226, and 301 and the 45 min window marked.
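Filtering is needed because a fetched block covers 100 meters and whole 10-second intervals, so it can carry neighbouring meters and timestamps just outside the window. A minimal sketch, assuming blocks have already been decoded into rows; the `Row` type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Hypothetical decoded observation from a fetched block."""
    meter_id: int
    epoch_seconds: int
    port_protocol: str
    ingress_octets: int


def filter_rows(rows, meter_ids, start, duration):
    """Keep rows for the requested meters inside [start, start + duration)."""
    wanted = set(meter_ids)
    end = start + duration
    return [r for r in rows if r.meter_id in wanted and start <= r.epoch_seconds < end]


rows = [
    Row(1, 100, "80:6", 10),     # kept
    Row(3, 100, "80:6", 99),     # same block as meter 1, but not requested
    Row(226, 5000, "443:6", 7),  # requested meter, outside the window
]
print(filter_rows(rows, [1, 2, 226, 301], start=0, duration=45 * 60))
```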

Page 32

Step 3: aggregate and perform top-k

Figure: the filtered blocks for meters 1, 2, 226, and 301 over the 45 min window are summed (+) and passed to topk( , 10), producing rows such as:

{
  epochMillis: 1337230140000,
  portProtocol: "4740:6",
  ingressPackets: 370482,
  ingressOctets: 3113782199,
  egressPackets: 343780,
  egressOctets: 37126033
},{
  epochMillis: 1337230140000,
  portProtocol: "9092:6",
  ingressPackets: 440915,
  ingressOctets: 1816615857,
  egressPackets: 481237,
  egressOctets: 1312198133
},...
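Step 3 amounts to a group-by sum followed by a top-k selection. The deck does not say which measurement the top-k ranks on, so this sketch takes it as a parameter; `heapq.nlargest` avoids sorting every group when k is small. Field names mirror the output rows above, but the input rows and the function name are hypothetical.

```python
import heapq
from collections import defaultdict


def aggregate_top_k(rows, group_by, measure, k=10):
    """Sum `measure` for each distinct combination of the `group_by` fields,
    then return the k largest (group, total) pairs, largest first."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[field] for field in group_by)] += row[measure]
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Hypothetical filtered rows from several meters.
rows = [
    {"meter": 1, "portProtocol": "4740:6", "ingressOctets": 100},
    {"meter": 226, "portProtocol": "4740:6", "ingressOctets": 50},
    {"meter": 301, "portProtocol": "9092:6", "ingressOctets": 120},
    {"meter": 2, "portProtocol": "53:17", "ingressOctets": 5},
]
print(aggregate_top_k(rows, ["portProtocol"], "ingressOctets", k=2))
# [(('4740:6',), 150), (('9092:6',), 120)]
```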

Page 33

http://computers-r-terrible/volume_1m_meter_port_protocol/data?from=-18h&duration=45m&parts=1,2,226,301&aggregations=observationDomainId

In URL Form

Page 34

http://computers-r-terrible/volume_1m_meter_port_protocol/data?from=-18h&duration=45m&parts=1,2,226,301&aggregations=observationDomainId,epochMillis

Arbitrary Aggregations
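A sketch of how a server might decode the query URL above, assuming `from` is an offset relative to the current time, `parts` carries the meter ids (matching meters 1, 2, 226, and 301 in the example query), and `aggregations` is a comma-separated list of group-by fields. The duration unit set and helper names are hypothetical; the deck does not describe Kobayashi's actual parser.

```python
import re
from urllib.parse import parse_qs, urlsplit

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}  # assumed unit set


def parse_duration(text):
    """Convert '45m' -> 2700 or '-18h' -> -64800 (seconds)."""
    match = re.fullmatch(r"(-?\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def parse_query(url, now):
    """Split a query URL into cube name, time window, meters, and group-by fields."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    return {
        "cube": parts.path.strip("/").split("/")[0],
        "start": now + parse_duration(params["from"][0]),
        "duration": parse_duration(params["duration"][0]),
        "meters": [int(m) for m in params["parts"][0].split(",")],
        "group_by": params["aggregations"][0].split(","),
    }


url = ("http://computers-r-terrible/volume_1m_meter_port_protocol/data"
       "?from=-18h&duration=45m&parts=1,2,226,301"
       "&aggregations=observationDomainId,epochMillis")
print(parse_query(url, now=1337294940))
```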

Page 35

“unfortunately the project has been blocked for weeks choosing a name”

Page 36

$V$

$V'$

$V \simeq V'$

Page 37

Page 38

Page 39

Future -->

Page 40

Send expired data to cold storage

output in arbitrary time resolution

Page 41

Open source the data cubing and predicate matching code

Query grammar for Kobayashi

Page 42

questions?
