Surfing the event stream

@samnewman #geecon Surfing The Event Stream Sam Newman ThoughtWorks Sunday, 21 July 13

description

As presented at GeeCon 2013. We have lots of information available about our systems: CPU, disk IO, orders placed, error rates, users logged in. But typically each of these pieces of information is collected, aggregated and stored in a very different way, making correlation difficult and increasing the operational overhead of our systems. What if we could treat all of this information as events? What if we could aggregate, store, and report on all of it as a uniform event stream? This talk looks at emerging trends in log aggregation, monitoring and event streaming to paint a picture of how you too can start to make real use of the information already available to you, using nothing more complex than some free, off-the-shelf open source software.

Transcript of Surfing the event stream

Operational Data
• CPU
• Disk IO
• Memory Use
• Threads

Collection & Display
• sar
• syslog
• collectd
• syslog-ng
• Nagios
• Ganglia

[Diagram: four Servers]

Business Data
• Orders Placed
• Bounce Rate
• Revenue
• Fraud Cases

How did we handle them?
• Google Analytics
• Data Warehouse Systems
• Log files!

Something Happened!
What Should We Do?

http://blog.jgc.org/2006/05/what-slashdot-effect-looks-like.html

Fast
And Easy...
At Scale

Aggregation Is Key

Mark McGranaghan: "Logs as Data"
http://blip.tv/clojure/mark-mcgranaghan-logs-as-data-5953857

Paul Ingles: "Users as Data"
http://vimeo.com/45136211

Logstash + Graylog2

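Graylog2 ingests GELF (Graylog Extended Log Format): one JSON document per log event, commonly sent over UDP and optionally zlib- or gzip-compressed. Shipping structured events like this, rather than free-text lines, is what lets logs be queried as data. The Python sketch below builds and sends such a message by hand, roughly what a shipper such as Logstash does for you. It assumes GELF 1.1 field names and Graylog's conventional UDP port 12201, uses a placeholder host, and does not implement GELF chunking, so messages too large for one datagram would need more work.

```python
import json
import socket
import time
import zlib


def gelf_message(host: str, short_message: str, level: int = 6, **fields) -> dict:
    """Build a GELF 1.1-style message; extra fields get the required '_' prefix."""
    message = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),  # seconds since the epoch, fractions allowed
        "level": level,            # syslog severity; 6 = informational
    }
    for key, value in fields.items():
        if key == "id":
            raise ValueError('GELF reserves "_id"; choose another field name')
        message["_" + key] = value
    return message


def encode_gelf(message: dict) -> bytes:
    """Serialise to JSON and zlib-compress, one of the encodings GELF accepts."""
    return zlib.compress(json.dumps(message).encode("utf-8"))


def send_gelf(message: dict, host: str = "graylog.example.com", port: int = 12201) -> None:
    # One datagram per message and no chunking, so keep messages small.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_gelf(message), (host, port))


event = gelf_message("www01", "order placed", order_id=1234, total=99.5)
```

The custom fields (order_id, total) are hypothetical; the point is that each becomes a searchable field in Graylog2 instead of text buried in a log line.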
Graphite

www01.cpuUsage 42 1286269200

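The line above is Graphite's plaintext protocol: a metric path, a value and a Unix timestamp, separated by spaces and terminated by a newline. A minimal Python sketch of producing and sending such lines follows. The host name is a placeholder, and port 2003 is assumed because it is Carbon's usual plaintext listener, so check your own configuration.

```python
import socket
import time
from typing import Iterable, Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one data point as '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())  # seconds since the Unix epoch
    return f"{path} {value} {timestamp}\n"


def send_to_graphite(lines: Iterable[str], host: str = "graphite.example.com",
                     port: int = 2003) -> None:
    """Push a batch of plaintext lines to Carbon over TCP (host is a placeholder)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


if __name__ == "__main__":
    print(graphite_line("www01.cpuUsage", 42, 1286269200), end="")
    # prints: www01.cpuUsage 42 1286269200
```

Because the format is this simple, almost anything (a shell script, collectd, an application library) can feed Graphite.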
???

[Diagram: collectd on a Server and Yammer Metrics in an App Server, both feeding Graphite]

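The Yammer Metrics side of this picture is in-process instrumentation: application code bumps counters as things happen, and a reporter periodically writes a snapshot to Graphite. The Python class below is a hypothetical stand-in for that pattern, not Yammer Metrics' API. Reporting the running total, rather than "one more order", means each report legitimately replaces the previous value instead of two single events colliding on the same timestamp; a per-interval rate can then be derived in Graphite with a function such as nonNegativeDerivative().

```python
import threading
import time
from collections import Counter
from typing import List, Optional


class MetricsRegistry:
    """Hypothetical in-process registry: counters are incremented by application
    code and reported as cumulative totals in Graphite's plaintext format."""

    def __init__(self, prefix: str):
        self.prefix = prefix
        self._counts = Counter()
        self._lock = threading.Lock()  # request threads may increment concurrently

    def inc(self, name: str, n: int = 1) -> None:
        with self._lock:
            self._counts[name] += n

    def report(self, timestamp: Optional[int] = None) -> List[str]:
        ts = int(time.time()) if timestamp is None else timestamp
        with self._lock:
            snapshot = dict(self._counts)  # totals are never reset
        return [f"{self.prefix}.{name} {value} {ts}"
                for name, value in sorted(snapshot.items())]


registry = MetricsRegistry("app01")
registry.inc("orders")
registry.inc("orders")
registry.inc("logins")
print(registry.report(1286269200))
# ['app01.logins 1 1286269200', 'app01.orders 2 1286269200']
```

In a real service, report() would run on a timer thread and hand its lines to a Graphite sender.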
Volume!

Aggregation!

orderplaced 1 1286269200
orderplaced 1 1286269200
orderplaced = 1

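Why does the second order vanish? Graphite's Whisper storage keeps a single value per metric per time slot, and a later write to the same slot replaces the earlier one. So two "orderplaced 1" points with the same timestamp leave 1, not 2; with a coarser retention such as one point per minute, points that merely fall in the same minute collide too. The toy Python model below (not Graphite code) contrasts that last-write-wins behaviour with the summing a counter actually needs.

```python
from collections import defaultdict


def parse(line):
    path, value, ts = line.split()
    return path, float(value), int(ts)


def last_write_wins(lines, seconds_per_point=1):
    """Model a store holding one value per (path, slot): later writes overwrite."""
    slots = {}
    for path, value, ts in map(parse, lines):
        slot = ts - ts % seconds_per_point  # align to the archive's precision
        slots[(path, slot)] = value
    return slots


def summed(lines, seconds_per_point=1):
    """What a counter needs: add up every value that lands in the same slot."""
    slots = defaultdict(float)
    for path, value, ts in map(parse, lines):
        slots[(path, ts - ts % seconds_per_point)] += value
    return dict(slots)


events = ["orderplaced 1 1286269200", "orderplaced 1 1286269200"]
print(last_write_wins(events))  # {('orderplaced', 1286269200): 1.0}
print(summed(events))           # {('orderplaced', 1286269200): 2.0}
```

This is the gap that an aggregating layer in front of Graphite fills.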
StatsD

Counters
ordersplaced:1|c

Timings
orderduration:140|ms

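StatsD's wire format is one text line per datagram, name:value|type, where c marks a counter and ms a timing. A minimal Python client sketch follows. It assumes StatsD's default UDP port 8125; the optional |@rate suffix is StatsD's sample-rate notation. UDP is what makes this cheap to sprinkle through application code: a send never waits on the server, and if StatsD is down the metric is simply dropped.

```python
import socket
from typing import Optional, Union

Number = Union[int, float]


def statsd_packet(name: str, value: Number, metric_type: str,
                  sample_rate: Optional[float] = None) -> bytes:
    """Build one StatsD datagram, e.g. b'ordersplaced:1|c'."""
    packet = f"{name}:{value}|{metric_type}"
    if sample_rate is not None and sample_rate < 1:
        # The caller is expected to send only that fraction of the time;
        # for counters StatsD scales the value back up by 1/rate.
        packet += f"|@{sample_rate}"
    return packet.encode("ascii")


class StatsdClient:
    def __init__(self, host: str = "localhost", port: int = 8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def incr(self, name: str, count: int = 1) -> None:
        # Fire and forget: no acknowledgement is awaited.
        self.sock.sendto(statsd_packet(name, count, "c"), self.addr)

    def timing(self, name: str, millis: Number) -> None:
        self.sock.sendto(statsd_packet(name, millis, "ms"), self.addr)
```

For example, StatsdClient().incr("ordersplaced") sends the counter shown on the slide, and timing("orderduration", 140) sends the timing.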
[Diagram: Clients sending to StatsD, which feeds Graphite]

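The point of putting StatsD between clients and Graphite is that it aggregates: it accumulates everything received during a flush interval (ten seconds by default) and writes one value per metric to Graphite, so two orders in the same second become a count of 2 rather than colliding. The toy Python aggregator below sketches that idea. It ignores sample rates, and its output names only loosely follow StatsD's defaults (stats_counts.*, stats.timers.*), which vary with version and configuration.

```python
from collections import defaultdict


class MiniStatsd:
    """Toy aggregator: accumulate per flush interval, then emit Graphite lines."""

    def __init__(self):
        self.counters = defaultdict(float)
        self.timers = defaultdict(list)

    def handle(self, packet: str) -> None:
        name, rest = packet.split(":", 1)
        value, metric_type = rest.split("|")[:2]  # any '|@rate' suffix is ignored
        if metric_type == "c":
            self.counters[name] += float(value)
        elif metric_type == "ms":
            self.timers[name].append(float(value))

    def flush(self, timestamp: int) -> list:
        lines = [f"stats_counts.{name} {total:g} {timestamp}"
                 for name, total in sorted(self.counters.items())]
        for name, values in sorted(self.timers.items()):
            mean = sum(values) / len(values)
            lines.append(f"stats.timers.{name}.mean {mean:g} {timestamp}")
            lines.append(f"stats.timers.{name}.upper {max(values):g} {timestamp}")
        # Start the next interval from zero.
        self.counters.clear()
        self.timers.clear()
        return lines


agg = MiniStatsd()
for packet in ["ordersplaced:1|c", "ordersplaced:1|c",
               "orderduration:140|ms", "orderduration:60|ms"]:
    agg.handle(packet)
for line in agg.flush(1286269200):
    print(line)
# stats_counts.ordersplaced 2 1286269200
# stats.timers.orderduration.mean 100 1286269200
# stats.timers.orderduration.upper 140 1286269200
```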
Riemann

[Diagram: Clients sending to Riemann, which feeds Graphite]

(where (service "api req")
  (percentiles 5 [0.5 0.95 0.99] index))

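The Riemann stream above selects events whose service is "api req" and, every 5 seconds, emits the 50th, 95th and 99th percentiles of their metrics to the index. The Python sketch below shows the same windowed-percentile idea in batch form over (time, latency) pairs. It is not Riemann's implementation, and it uses a simple nearest-rank pick, so its numbers may differ slightly from Riemann's estimator.

```python
import math
from collections import defaultdict


def window_percentiles(events, interval, quantiles):
    """Bucket (time, metric) pairs into fixed windows of `interval` seconds and
    return {window_start: {quantile: value}} using a nearest-rank pick."""
    windows = defaultdict(list)
    for t, metric in events:
        windows[int(t // interval) * interval].append(metric)
    result = {}
    for start, values in sorted(windows.items()):
        values.sort()
        n = len(values)
        result[start] = {q: values[min(n - 1, math.floor(q * n))] for q in quantiles}
    return result


# Ten "api req" latencies in the first window, one slow request in the next.
latencies = [(0.4 * i, float(10 * i)) for i in range(1, 11)] + [(7.0, 900.0)]
report = window_percentiles(latencies, 5, [0.5, 0.95, 0.99])
# window 0: median 60.0, p95 and p99 100.0; window 5: 900.0 throughout
```

Percentiles rather than averages are the useful signal here: a single 900 ms request disappears into a mean but stands out at the tail.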
(def tell-ops (rollup 5 3600 (email "[email protected]")))
(streams (where (state "critical") tell-ops))

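The rollup above protects whoever is on call from an email storm. As Riemann describes rollup, (rollup 5 3600 ...) lets events through at most 5 times per hour and accumulates anything beyond that, delivering it as one batch when the hour is up. A rough Python model of that behaviour follows. It is driven by event timestamps rather than a timer, so held events are only released when the next event arrives after the window closes (or when flush() is called); a real implementation would use a scheduler.

```python
class Rollup:
    """Forward at most `limit` single-event batches per `window` seconds; hold
    the rest and release them as one batch when the window has passed."""

    def __init__(self, limit, window, sink):
        self.limit, self.window, self.sink = limit, window, sink
        self.window_start = None
        self.sent = 0
        self.held = []

    def push(self, t, event):
        if self.window_start is None or t >= self.window_start + self.window:
            self.flush()  # release anything held from the previous window
            self.window_start, self.sent = t, 0
        if self.sent < self.limit:
            self.sent += 1
            self.sink([event])
        else:
            self.held.append(event)

    def flush(self):
        if self.held:
            self.sink(self.held)
            self.held = []


# Models (streams (where (state "critical") tell-ops)) with a list as the mailbox.
emails = []
tell_ops = Rollup(5, 3600, emails.append)
events = [{"time": t, "state": "ok" if t == 3 else "critical"} for t in range(8)]
events.append({"time": 3700, "state": "critical"})
for event in events:
    if event["state"] == "critical":  # the (where (state "critical") ...) filter
        tell_ops.push(event["time"], event)
# emails: five single-event batches, one batch of two, then a fresh window
```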
(let [client (tcp-client :host "aggregator")]
  (by [:host :service]
    (changed :state
      (forward client))))

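Forwarding only on change is what keeps a tier of Riemann servers cheap: (by [:host :service] ...) gives each host/service pair its own copy of the downstream stream, and (changed :state ...) lets an event through only when its state differs from the previous one for that pair, so the aggregator hears about transitions rather than every heartbeat. The Python sketch below models that filtering. The forward callback stands in for the TCP client, and it assumes the first event seen for a pair counts as a change.

```python
from typing import Callable, Dict, Hashable, Tuple


class ChangedByKey:
    """Pass an event on only when its state differs from the last state seen
    for the same (host, service) pair."""

    def __init__(self, forward: Callable[[dict], None]):
        self.forward = forward
        self.last_state: Dict[Tuple[Hashable, Hashable], str] = {}

    def push(self, event: dict) -> None:
        key = (event["host"], event["service"])
        previous = self.last_state.get(key)  # None for a pair never seen before
        self.last_state[key] = event["state"]
        if previous != event["state"]:
            self.forward(event)


forwarded = []
stream = ChangedByKey(forwarded.append)
for host, state in [("www01", "ok"), ("www01", "ok"), ("www01", "critical"),
                    ("www02", "ok"), ("www01", "critical"), ("www01", "ok")]:
    stream.push({"host": host, "service": "cpu", "state": state})
print([(e["host"], e["state"]) for e in forwarded])
# [('www01', 'ok'), ('www01', 'critical'), ('www02', 'ok'), ('www01', 'ok')]
```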
[Diagram: two Riemann Servers, each with its own Clients, forwarding to a third Riemann Server]

So What Do We Have?

[Diagram: Servers feeding StatsD/Riemann, Graphite and Graylog 2, which back Dashboards A, B and C]

http://shopify.github.io/dashing/

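Dashing widgets can be fed by pushing JSON to the dashboard over HTTP: per Dashing's documentation, a POST to /widgets/<widget id> whose body carries the auth_token from the dashboard's configuration plus the widget's data. The Python sketch below only builds that request with the standard library. The widget id "orders", the token and the localhost:3030 address (Dashing's usual development default) are assumptions to adjust.

```python
import json
import urllib.request


def dashing_update(widget_id: str, data: dict, auth_token: str,
                   base_url: str = "http://localhost:3030") -> urllib.request.Request:
    """Build (but do not send) a POST that pushes `data` to one Dashing widget."""
    body = dict(data, auth_token=auth_token)
    return urllib.request.Request(
        f"{base_url}/widgets/{widget_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = dashing_update("orders", {"current": 42}, "YOUR_AUTH_TOKEN")
# urllib.request.urlopen(request) would send it to a running dashboard.
```

Pushing rather than polling fits the event-stream theme: whatever computes the number (a Riemann stream, a batch job) decides when the dashboard changes.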
Realtime Aggregator
Data is lost!

Real-time metrics require upfront knowledge

Realtime Aggregator
Lossless Event Store (Hadoop, HBase, Cassandra)

[Diagram: Clients feeding a Riemann Server, with a Lossless Event Store added]

Event Sourcing

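Keeping every event, append-only, is the Event Sourcing idea applied to operational data: views are derived by replaying the log, so a question nobody anticipated when the realtime aggregator was configured can still be answered later. A minimal in-memory Python sketch follows; in practice the log would live in a store such as Hadoop, HBase or Cassandra, and the event fields here are hypothetical.

```python
import json
from collections import Counter
from typing import Iterator


class EventLog:
    """Append-only event store: events are serialised on write and never changed."""

    def __init__(self):
        self._entries = []

    def append(self, event: dict) -> None:
        self._entries.append(json.dumps(event, sort_keys=True))

    def replay(self) -> Iterator[dict]:
        for entry in self._entries:
            yield json.loads(entry)  # a fresh copy each time; the log stays intact


def orders_by(log: EventLog, field: str) -> Counter:
    """A view built after the fact by replaying every 'order_placed' event."""
    return Counter(e[field] for e in log.replay() if e["type"] == "order_placed")


log = EventLog()
log.append({"type": "order_placed", "country": "PL", "total": 120})
log.append({"type": "user_logged_in", "user": "alice"})
log.append({"type": "order_placed", "country": "PL", "total": 80})
log.append({"type": "order_placed", "country": "UK", "total": 45})
print(orders_by(log, "country"))  # Counter({'PL': 2, 'UK': 1})
```

An aggregator set up only to count orders could never answer "orders by country" retroactively; the replayed log can.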
But...

Can I have one view?
Lossless Event Store
Realtime Aggregator

http://nathanmarz.com/

Lambda Architecture
Lossless Event Store: consistent, but out of date
Realtime Aggregator: up to date, but only for a small window
Unified Query across both

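In the Lambda Architecture (Nathan Marz's term), a batch layer periodically recomputes views from the lossless store, a realtime layer covers only the events since the last batch cutoff, and queries merge the two. The Python sketch below shows that merge for a simple count. The cutoff handling is deliberately simplified; a real system would discard the realtime state once a newer batch view covering it is published.

```python
from collections import Counter
from typing import Iterable


def batch_view(events: Iterable[dict], cutoff: float) -> Counter:
    """Recomputed from the full event log: consistent, but only up to `cutoff`."""
    return Counter(e["name"] for e in events if e["time"] < cutoff)


class RealtimeView:
    """Incrementally updated, and only for events at or after the batch cutoff."""

    def __init__(self, cutoff: float):
        self.cutoff = cutoff
        self.counts = Counter()

    def update(self, event: dict) -> None:
        if event["time"] >= self.cutoff:
            self.counts[event["name"]] += 1


def unified_query(name: str, batch: Counter, realtime: RealtimeView) -> int:
    # Batch covers [start, cutoff); realtime covers [cutoff, now): no double counting.
    return batch[name] + realtime.counts[name]


events = [{"name": "orderplaced", "time": t} for t in (10, 20, 30, 40)]
cutoff = 25
batch = batch_view(events, cutoff)
realtime = RealtimeView(cutoff)
for e in events:
    realtime.update(e)
print(unified_query("orderplaced", batch, realtime))  # 4
```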
The Future?

[Diagram: Servers feeding an Aggregating Relay, which fans out to Graphite, Graylog 2 and Hadoop, with a Unified Query on top]

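One way to read the aggregating relay in that picture: every event enters once, raw copies go to the stores that want every event (log search, the lossless store), and pre-aggregated counts go to the time-series store, so Graphite never sees colliding points. The Python class below is a hypothetical sketch of such a relay, with callables standing in for the backends; it is not a specific product.

```python
from collections import Counter
from typing import Callable, List


class AggregatingRelay:
    """Hypothetical relay: fan raw events out to some sinks and periodically
    send per-name counts, as Graphite plaintext lines, to a metrics sink."""

    def __init__(self, raw_sinks: List[Callable[[dict], None]],
                 metric_sink: Callable[[str], None]):
        self.raw_sinks = raw_sinks
        self.metric_sink = metric_sink
        self.pending = Counter()

    def publish(self, event: dict) -> None:
        for sink in self.raw_sinks:
            sink(event)                    # e.g. Graylog 2, Hadoop: keep everything
        self.pending[event["name"]] += 1   # e.g. Graphite: only the aggregate

    def flush(self, timestamp: int) -> None:
        for name, count in sorted(self.pending.items()):
            self.metric_sink(f"{name} {count} {timestamp}")
        self.pending.clear()


graylog, hadoop, graphite = [], [], []
relay = AggregatingRelay([graylog.append, hadoop.append], graphite.append)
relay.publish({"name": "orderplaced", "order_id": 1})
relay.publish({"name": "orderplaced", "order_id": 2})
relay.flush(1286269200)
print(graphite)  # ['orderplaced 2 1286269200']
```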
All Your Data
In Realtime

• Find and free your data
• Start simple
• Create different views for different stakeholders
• Don’t be scared of real-time!

[email protected]
@samnewman