Surfing the event stream

Post on 10-May-2015

description

As presented at GeeCon 2013. We have lots of information available about our systems: CPU, disk IO, orders placed, error rates, users logged in. But typically all these pieces of information are collected, aggregated and stored in very different ways, making correlation difficult and increasing the operational overhead of our systems. What if we could treat all of this information as events? What if we could aggregate, store, and report on all of this information as a uniform event stream? This talk will look at emerging trends in the space of log aggregation, monitoring and event streaming to paint a picture of how you too can start to make real use of the information already available to you, using nothing more complex than some free, off-the-shelf open source software.

Transcript of Surfing the event stream

@samnewman #geecon

Surfing The Event Stream

Sam Newman

ThoughtWorks

Sunday, 21 July 13

Operational Data

CPU

Disk IO

Memory Use

Threads

Collection & Display

• sar

• syslog

• collectd

• syslog-ng

• nagios

• ganglia

[Diagram: four separate servers]

Business Data

Orders Placed

Bounce Rate

Revenue

Fraud Cases

How did we handle them?

• Google Analytics

• Data Warehouse Systems

• Log files!

Something Happened!

What Should We Do?

http://blog.jgc.org/2006/05/what-slashdot-effect-looks-like.html

Fast

And Easy...

At Scale

Aggregation Is Key

Mark McGranaghan: "Logs as Data"

http://blip.tv/clojure/mark-mcgranaghan-logs-as-data-5953857

Paul Ingles: "Users as Data"

http://vimeo.com/45136211

Logstash + Graylog2

Graphite

www01.cpuUsage 42 1286269200
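
The line above has the shape of Graphite's plaintext protocol: a metric path, a value, and a Unix timestamp, separated by spaces and terminated by a newline. A minimal Python sketch of building and reading such lines follows. The function names and the host `graphite.example.com` are illustrative assumptions, not part of the talk; port 2003 is the usual default for Graphite's plaintext listener, but check your own setup.

```python
import socket


def format_metric(path, value, timestamp):
    """Build one plaintext line: '<path> <value> <timestamp>' plus a newline."""
    return f"{path} {value} {timestamp}\n"


def parse_metric(line):
    """Split a plaintext line back into (path, value, timestamp)."""
    path, value, timestamp = line.split()
    return path, float(value), int(timestamp)


def send_metric(line, host="graphite.example.com", port=2003):
    """Send one line over TCP. The host is a hypothetical placeholder."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("ascii"))


# Rebuild the line shown on the slide (nothing is sent over the network here).
line = format_metric("www01.cpuUsage", 42, 1286269200)
```

Because the format is just text, anything that can open a socket can emit metrics, which is part of why so many tools can feed Graphite.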

???

[Diagram: an app and two servers sending metrics to Graphite via collectd and Yammer Metrics]

Volume!

Aggregation!

www01.cpuUsage 42 1286269200

orderplaced 1 1286269200

orderplaced 1 1286269200

orderplaced = 1
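
The slide's point is that two events sent with the same name and timestamp end up stored as a single value of 1, not 2. A small Python sketch of that effect follows, assuming a store that keeps one value per metric per timestamp (last write wins), next to the aggregation that would be needed to get the real count. The function names are illustrative assumptions.

```python
from collections import defaultdict

# Two orders placed within the same second, as on the slide.
events = [
    ("orderplaced", 1, 1286269200),
    ("orderplaced", 1, 1286269200),
]


def store_last_write_wins(events):
    """Keep one value per (metric, timestamp); later writes replace earlier ones."""
    series = {}
    for path, value, timestamp in events:
        series[(path, timestamp)] = value
    return series


def aggregate_by_sum(events):
    """Sum the values that share a (metric, timestamp) before storing them."""
    totals = defaultdict(int)
    for path, value, timestamp in events:
        totals[(path, timestamp)] += value
    return dict(totals)


stored = store_last_write_wins(events)
summed = aggregate_by_sum(events)
```

Here `stored` records a single order while `summed` records two, which is why something has to aggregate events before they reach the time-series store.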

StatsD

Counters

ordersplaced:1|c

Timings

orderduration:140|ms

StatsD

Client Client

Graphite
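
To make the arrangement above concrete, here is a rough Python sketch of what an aggregator sitting between clients and Graphite does: it accepts counter (`|c`) and timing (`|ms`) lines, accumulates them, and on each flush emits one Graphite-style line per metric. `TinyAggregator` is a hypothetical name, the output metric names are simplified, sample rates and other metric types are ignored, and real StatsD does considerably more.

```python
from collections import defaultdict


class TinyAggregator:
    """Toy stand-in for a StatsD-style aggregator (an illustration only)."""

    def __init__(self):
        self.counters = defaultdict(float)
        self.timers = defaultdict(list)

    def receive(self, line):
        """Accept one '<name>:<value>|<type>' line from a client."""
        name, rest = line.split(":", 1)
        value, metric_type = rest.split("|")[:2]
        if metric_type == "c":
            self.counters[name] += float(value)
        elif metric_type == "ms":
            self.timers[name].append(float(value))

    def flush(self, timestamp):
        """Emit one aggregated line per metric, then reset for the next interval."""
        lines = [
            f"{name} {total:g} {timestamp}"
            for name, total in sorted(self.counters.items())
        ]
        for name, samples in sorted(self.timers.items()):
            mean = sum(samples) / len(samples)
            lines.append(f"{name}.mean {mean:g} {timestamp}")
        self.counters.clear()
        self.timers.clear()
        return lines


aggregator = TinyAggregator()
for packet in ["ordersplaced:1|c", "ordersplaced:1|c",
               "orderduration:140|ms", "orderduration:160|ms"]:
    aggregator.receive(packet)
flushed = aggregator.flush(1286269200)
```

The two `ordersplaced` events now arrive at Graphite as a single value of 2 for the flush interval, rather than overwriting each other.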

Riemann

Riemann

Client Client

Graphite

(where (service "api req") (percentiles 5 [0.5 0.95 0.99] index))
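
The Riemann snippet above asks for the 50th, 95th and 99th percentiles of the "api req" service over 5-second windows. As a rough illustration of the idea only (not Riemann's implementation), the Python sketch below groups `(timestamp, value)` samples into fixed windows and picks a sample at each requested point. The function names and the simple rank rule are assumptions made for this example.

```python
from collections import defaultdict


def percentile_points(values, points):
    """Pick the sample found at each requested point of the sorted values."""
    ordered = sorted(values)
    if not ordered:
        return {}
    result = {}
    for point in points:
        # Clamp so that point 1.0 maps to the largest sample.
        index = min(len(ordered) - 1, int(point * len(ordered)))
        result[point] = ordered[index]
    return result


def percentiles_by_window(samples, interval, points):
    """Group (timestamp, value) samples into fixed windows and summarise each."""
    windows = defaultdict(list)
    for timestamp, value in samples:
        windows[int(timestamp // interval)].append(value)
    return {
        window: percentile_points(values, points)
        for window, values in sorted(windows.items())
    }


# Hypothetical request latencies: four in the first 5 seconds, one in the next.
samples = [(0, 10), (1, 20), (2, 30), (4, 40), (5, 100)]
summary = percentiles_by_window(samples, interval=5, points=[0.5])
```

Only the summary for each window is kept, which keeps volume down but also means the individual samples are gone once the window closes.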

(def tell-ops (rollup 5 3600 (email "ops@vonbraun.mil")))

(streams (where (state "critical") tell-ops))
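
The `rollup 5 3600` above limits how often ops get emailed: roughly, a handful of critical events per hour go out straight away and the rest are bundled into a later summary. The Python sketch below is a simplified reading of that behaviour, not Riemann's implementation. It uses explicit timestamps instead of a scheduler, so held events are only flushed when the next window starts or when `flush` is called. `Rollup` and `sink` are hypothetical names.

```python
class Rollup:
    """Pass at most `limit` events per `interval` seconds; batch the overflow."""

    def __init__(self, limit, interval, sink):
        self.limit = limit
        self.interval = interval
        self.sink = sink          # called with a list of events
        self.window_start = None
        self.sent = 0
        self.held = []

    def event(self, now, event):
        # Start a new window when the previous one has expired.
        if self.window_start is None or now - self.window_start >= self.interval:
            self.flush()
            self.window_start = now
            self.sent = 0
        if self.sent < self.limit:
            self.sent += 1
            self.sink([event])
        else:
            self.held.append(event)

    def flush(self):
        """Deliver everything held back during the window as one batch."""
        if self.held:
            self.sink(self.held)
            self.held = []


batches = []
tell_ops = Rollup(limit=2, interval=3600, sink=batches.append)
for now, name in [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (3600, "e")]:
    tell_ops.event(now, name)
```

With a limit of 2, events "a" and "b" are delivered immediately, "c" and "d" arrive together as one batch when the next hour starts, and "e" opens the new window.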

(let [client (tcp-client :host "aggregator")] (by [:host :service] (changed :state (forward client))))
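
The snippet above splits the stream by host and service and forwards an event to the "aggregator" server only when its state changes. A rough Python equivalent of that filtering logic is sketched below. The event dictionaries and the `forward` callback are assumptions for illustration, and the first event seen for each host and service pair is treated as a change.

```python
def forward_state_changes(events, forward):
    """Call `forward` only when a (host, service) pair reports a new state."""
    last_state = {}
    for event in events:
        key = (event["host"], event["service"])
        if last_state.get(key) != event["state"]:
            last_state[key] = event["state"]
            forward(event)


# Hypothetical events from two hosts running the same service.
events = [
    {"host": "www01", "service": "api", "state": "ok"},
    {"host": "www01", "service": "api", "state": "ok"},
    {"host": "www02", "service": "api", "state": "ok"},
    {"host": "www01", "service": "api", "state": "critical"},
]

forwarded = []
forward_state_changes(events, forwarded.append)
changes = [(e["host"], e["state"]) for e in forwarded]
```

Forwarding only transitions keeps the upstream server's load proportional to how often things change, not to how often they are measured.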

[Diagram: two Riemann servers, each with its own clients, and a third Riemann server]

So What Do We Have?

Server Server

StatsD/Riemann

Graylog 2

Graphite

Dashboard A

Dashboard B

Dashboard C

http://shopify.github.io/dashing/

Realtime Aggregator

Data is lost!

Real-time metrics require upfront knowledge

Realtime Aggregator

Lossless Event Store

Hadoop

HBase

Cassandra

Riemann Server

Client Client

Lossless Event Store

Event Sourcing

But...

Can I have one view?

Lossless Event Store

Realtime Aggregator

http://nathanmarz.com/

Lossless Event Store

Realtime Aggregator

Lambda Architecture

Unified Query

Consistent, but out of date

Up to date, but only for a small window
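
The unified query in this picture combines the two halves: a batch view computed from the lossless event store (consistent, but only up to the last batch run) and a realtime view covering the short window since then. A minimal Python sketch of that merge for a simple count follows. The data shapes, names, and numbers are invented for illustration and are not taken from the talk.

```python
def unified_count(batch_view, realtime_view, batch_horizon, metric):
    """Combine a precomputed batch total with realtime events newer than it.

    batch_view:     {metric: total count up to and including batch_horizon}
    realtime_view:  {metric: [(timestamp, count), ...]} for the recent window
    batch_horizon:  timestamp of the last event covered by the batch view
    """
    historical = batch_view.get(metric, 0)
    recent = sum(
        count
        for timestamp, count in realtime_view.get(metric, [])
        if timestamp > batch_horizon  # skip events the batch already covers
    )
    return historical + recent


# Hypothetical views: the batch job last ran at t=100.
batch_view = {"orderplaced": 1000}
realtime_view = {"orderplaced": [(90, 5), (101, 2), (105, 3)]}
total = unified_count(batch_view, realtime_view, batch_horizon=100,
                      metric="orderplaced")
```

The event at t=90 is ignored because the batch view already includes it, so the answer is the batch total plus only the five orders seen since the last run.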

The Future?

Server Server

Aggregating Relay

Graphite

Graylog 2

Hadoop

Unified Query

All Your Data

In Realtime

Find and free your data

Start simple

Create different views for different stakeholders

Don’t be scared of real-time!

Thanks!

snewman@thoughtworks.com

@samnewman
