Monitoring at LinkedIn - USENIX · LinkedIn’s graphing system which lets you visualize the...

Site Reliability Engineer Mahak Lamba Monitoring at LinkedIn

Page 1

Site Reliability Engineer

Mahak Lamba

Monitoring at LinkedIn

Page 2

What gets measured, gets fixed.

Page 3

Timeline: 2010, 2011, 2012, 2015, 2018

Categories: Visualization, Alerting, Synthetic Monitoring, Notification, Storage

Page 4

Site situation: Before 2010

● Peak traffic periods: Mon-Wed, ~8am

● Regular capacity-related outages: Mon-Wed, ~8am

● Bi-weekly downtime for maintenance

● Zero tolerance for failure in the application stack

Page 5

Timeline: 2010, 2011, 2012, 2015, 2018

Categories: Visualization, Alerting, Synthetic Monitoring, Notification, Storage

Tools: Open Source Tool

Page 6

Before 2010

Metrics:

● Health checks

● CPU

● SNMP

● MBean

Open Source Tool

Used for data storage, visualization and alerting

Page 7

Metrics were not being properly used

Page 8

Timeline: 2010, 2011, 2012, 2015, 2018

Categories: Visualization, Alerting, Notification, Storage, Synthetic Monitoring

Tools: Open Source Tool, inGraphs

Page 9

inGraphs 2010

LinkedIn’s graphing system, which lets you visualize metrics and data.

Uses RRDs to plot the metrics.

Page 10

inGraphs

Features

● Granularity selection

● Regex matching

● Dashboards

● Test graphs and dashboards
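Two of the features listed above, regex matching and granularity selection, can be illustrated with a minimal sketch. The slides do not show how inGraphs implements them, so the metric names, the sample data, and the functions `match_metrics` and `downsample` are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample data: metric name -> list of (unix_timestamp, value) points.
SAMPLES = {
    "app1.host1.cpu": [(0, 10.0), (60, 20.0), (120, 30.0), (180, 50.0)],
    "app1.host2.cpu": [(0, 5.0), (60, 15.0)],
    "app1.host1.mem": [(0, 70.0)],
}


def match_metrics(pattern, samples):
    """Return the metric names that fully match the regular expression."""
    compiled = re.compile(pattern)
    return sorted(name for name in samples if compiled.fullmatch(name))


def downsample(points, granularity_s):
    """Average points into buckets that are `granularity_s` seconds wide."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % granularity_s].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


if __name__ == "__main__":
    for name in match_metrics(r"app1\.host\d+\.cpu", SAMPLES):
        print(name, downsample(SAMPLES[name], 120))
```

One regular expression selects a whole family of series for a single graph, and the granularity argument controls how coarse the plotted points are.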

Page 11

Too late to act!

Page 12

Timeline: 2010, 2011, 2012, 2015, 2018

Categories: Visualization, Alerting, Notification, Storage, Synthetic Monitoring

Tools: Open Source Tool, inGraphs, Autoalerts

Page 13

Autoalerts 2011

It is LinkedIn’s automated alerting system.

Alerts on the metrics fetched from RRDs.

Page 14

Autoalerts

It is LinkedIn’s automated alerting system.

Features

● YAML format

● State checks

● Alert history

● Suppression

● Plugins
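A rough sketch of how a declarative alert definition, state checks, alert history, and suppression could fit together. The Autoalerts schema is not shown in the slides, so the field names (`name`, `metric`, `threshold`, `suppressed`), the `OK`/`CRITICAL` states, and the reading of "state checks" as state transitions are assumptions. The definition is a Python dict standing in for a parsed YAML file.

```python
from dataclasses import dataclass, field

# Hypothetical alert definition. In a YAML file this might read:
#   name: high_cpu
#   metric: app1.host1.cpu
#   threshold: 90
#   suppressed: false
ALERT = {"name": "high_cpu", "metric": "app1.host1.cpu", "threshold": 90, "suppressed": False}


@dataclass
class AlertState:
    """Tracks the current state of one alert and its history of transitions."""
    state: str = "OK"
    history: list = field(default_factory=list)


def evaluate(alert, value, timestamp, tracker):
    """Update the alert state for one sample; return True if a notification should fire."""
    new_state = "CRITICAL" if value > alert["threshold"] else "OK"
    changed = new_state != tracker.state
    if changed:
        # Record every transition so the alert history can be inspected later.
        tracker.history.append((timestamp, tracker.state, new_state))
        tracker.state = new_state
    # Notify only on a transition into CRITICAL, and never while suppressed.
    return changed and new_state == "CRITICAL" and not alert["suppressed"]
```

Notifying only on transitions keeps a sustained breach from paging on every sample, and the suppression flag silences notifications while the history is still recorded.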

Page 15

Timeline: 2010, 2011, 2012, 2015, 2018

Categories: Visualization, Alerting, Notification, Storage, Synthetic Monitoring

Tools: Open Source Tool, inGraphs, Autoalerts, Autometrics

Page 16

Autometrics 2011

Self-service model to add metrics

● Metrics pushed into Kafka

● Read by Kafka consumers

● Stored as RRDs

Page 17

[Diagram: Applications -> Kafka -> Autometrics (Kafka Reader, RRD Writer) -> RRD on SSD]
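The Autometrics flow (applications push metrics, a reader consumes them, a writer stores them) can be sketched without a Kafka cluster. This is only an illustration: `queue.Queue` stands in for a Kafka topic, a dict of lists stands in for RRD files, and the event fields and function names are hypothetical.

```python
import queue
from collections import defaultdict

# Stand-in for a Kafka topic; a real deployment would use a Kafka client.
topic = queue.Queue()


def emit(metric, timestamp, value):
    """Application side: push one metric event onto the topic."""
    topic.put({"metric": metric, "ts": timestamp, "value": value})


def consume_all(store):
    """Reader/writer side: drain the topic and append each point to its series."""
    written = 0
    while True:
        try:
            event = topic.get_nowait()
        except queue.Empty:
            return written
        store[event["metric"]].append((event["ts"], event["value"]))
        written += 1


if __name__ == "__main__":
    store = defaultdict(list)  # stand-in for per-metric RRD files
    emit("app1.requests", 0, 12)
    emit("app1.requests", 60, 18)
    emit("app1.errors", 0, 1)
    print(consume_all(store), dict(store))
```

Because producers only need to publish events, adding a new metric requires no change on the storage side, which is one way to read the "self-service" claim.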

Page 18

Timeline: 2010, 2011, 2012, 2015, 2018

Categories: Visualization, Alerting, Notification, Storage, Synthetic Monitoring

Tools: Open Source Tool, inGraphs, Autoalerts, Autometrics, InMon

Page 19

InMon 2012

Internal synthetic monitoring tool

● Inside LinkedIn datacenters

● Closer to servers

● No licensing cost involved
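A synthetic monitor probes an endpoint the way a user would and records whether the response was correct and fast enough. The sketch below shows one such probe. It is not InMon's implementation: the function names, the result fields, and the one-second latency budget are assumptions.

```python
import time
import urllib.request


def http_fetch(url, timeout_s):
    """Fetch a URL and return its HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.status


def run_check(url, fetch=http_fetch, timeout_s=5.0, max_latency_s=1.0):
    """Run one synthetic probe and report whether it passed."""
    started = time.monotonic()
    try:
        status = fetch(url, timeout_s)
        error = None
    except Exception as exc:  # any failure counts as a failed probe
        status, error = None, str(exc)
    latency = time.monotonic() - started
    ok = status == 200 and latency <= max_latency_s
    return {"url": url, "status": status, "latency_s": latency, "ok": ok, "error": error}
```

The `fetch` parameter is injectable so the probe logic can be exercised without network access, for example `run_check("http://example.invalid/health", fetch=lambda url, timeout: 200)`.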

Page 20

Timeline: 2010, 2011, 2012, 2015, 2018

Categories: Visualization, Alerting, Notification, Storage, Synthetic Monitoring

Tools: Open Source Tool, inGraphs, Autoalerts, Autometrics, InMon, Iris

Page 21

Iris 2015

An alert notification and escalation platform.

https://github.com/linkedin/iris

https://github.com/linkedin/iris-mobile

Page 22

Iris

[Diagram: Iris architecture. Labels: Trigger, POST /incidents, Incident, Iris-frontend, Iris-api, Iris-sender, Iris-relay, MySQL, Vendor]
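The diagram labels an incident being created with `POST /incidents`. As a hedged illustration, the sketch below builds such a request without sending it. Only the HTTP method and the `/incidents` path come from the slide; the host name and the payload fields `plan` and `context` are assumptions. The API in the linked repository may use a different path prefix and payload, and may require authentication.

```python
import json
import urllib.request

# Hypothetical values: neither the host nor the payload fields come from the slides.
IRIS_API = "http://iris.example.invalid"


def build_incident_request(plan, context):
    """Build (but do not send) a POST /incidents request."""
    body = json.dumps({"plan": plan, "context": context}).encode("utf-8")
    return urllib.request.Request(
        IRIS_API + "/incidents",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A caller would pass the returned object to `urllib.request.urlopen` once the host, payload, and credentials match a real deployment.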

Page 23

Plans

Page 24

Plans

Page 25

Oncall Calendar

Page 26

Why do the same task twice manually?

Page 27

Timeline: 2010, 2011, 2012, 2015, 2018

Categories: Visualization, Alerting, Notification, Storage, Synthetic Monitoring

Tools: Open Source Tool, inGraphs, Autoalerts, InMon, Iris, Nurse

Page 28

Nurse 2015

Nurse is a platform for codifying operations workflows into plans.

Features

● Triggers deployments, runs commands, etc.

● Integrated with our existing tooling (JIRA, Iris, Autoalerts, etc.)

Concepts

● Plans

● Jobs
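The slide names two concepts, plans and jobs, without defining them. The sketch below assumes that a plan is a reusable sequence of steps and that a job is one execution of a plan; that reading, along with every class and function name, is hypothetical and not taken from Nurse itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Plan:
    """A named, reusable sequence of steps (each step is a callable)."""
    name: str
    steps: List[Callable[[dict], str]]


@dataclass
class Job:
    """One execution of a plan, with the output of each completed step."""
    plan: Plan
    status: str = "PENDING"
    log: List[str] = field(default_factory=list)

    def run(self, context):
        self.status = "RUNNING"
        for step in self.plan.steps:
            try:
                self.log.append(step(context))
            except Exception as exc:
                # Stop at the first failing step and keep the error in the log.
                self.log.append(f"failed: {exc}")
                self.status = "FAILED"
                return self
        self.status = "SUCCEEDED"
        return self


# Hypothetical steps; real ones would call deployment or command tooling.
def restart_service(context):
    return f"restarted {context['service']} on {context['host']}"


def file_ticket(context):
    return f"ticket filed for {context['service']}"
```

For example, `Job(Plan("restart-and-report", [restart_service, file_ticket])).run({"service": "app1", "host": "host1"})` runs both steps in order and records their output in `log`.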

Page 29

Timeline: 2010, 2011, 2012, 2015, 2018

Categories: Visualization, Alerting, Storage, Notification

Tools: Open Source Tool, inGraphs, Autoalerts, Autometrics, Iris, Nurse, InMon

Page 30

RRDs

● Random access

● Preallocated

● Bucketed or window-fitted
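The three properties above can be seen in a toy round-robin archive: storage is allocated once, each timestamp maps to a fixed-width bucket, and any bucket can be located by arithmetic alone. This is a simplified sketch, not RRDtool's file format. It keeps only the last value written to a bucket, whereas real RRDs apply consolidation functions, and the class and method names are hypothetical.

```python
class RoundRobinArchive:
    """Fixed-size ring of time buckets; old data is overwritten in place."""

    def __init__(self, step_s, slots):
        self.step_s = step_s
        self.slots = slots
        # Preallocated: storage size never grows after creation.
        self.values = [None] * slots
        self.bucket_starts = [None] * slots

    def _locate(self, timestamp):
        """Map a timestamp to its bucket start time and ring index."""
        bucket_start = timestamp - timestamp % self.step_s
        index = (bucket_start // self.step_s) % self.slots
        return bucket_start, index

    def update(self, timestamp, value):
        """Store a value in the bucket that covers `timestamp`."""
        bucket_start, index = self._locate(timestamp)
        self.values[index] = value
        self.bucket_starts[index] = bucket_start

    def fetch(self, timestamp):
        """Random access: compute the slot directly; None if missing or overwritten."""
        bucket_start, index = self._locate(timestamp)
        if self.bucket_starts[index] != bucket_start:
            return None
        return self.values[index]
```

With `step_s=60` and `slots=3`, the archive always holds the most recent three minutes: a write at second 185 lands in the slot that previously held second 0.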

Page 31

Requirements

● Write-heavy system

● Frequent data compaction

● Faster replication

● Easy to maintain

Page 32

Options

Page 33

Create Distributed Data Store

Page 34

Timeline: 2010, 2011, 2012, 2015, 2018

Categories: Visualization, Alerting, Notification, Storage, Synthetic Monitoring

Tools: Open Source Tool, inGraphs, Autoalerts, Autometrics, Iris, Nurse, InMon, TSDS

Page 35

TSDS 2018

Responsible for collecting, storing and serving application metrics

Components

● Ingestor/Router

● Index

● Storage Nodes

Page 36

TSDS

[Diagram: TSDS architecture. Data loading and indexing: Ingestor/Router, Index Writer, Storage Writer, Index (Postgres), Storage Nodes. Querying: Metric-server, inGraphs, Autoalerts, etc.]
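The diagram separates a write path (ingest, index, store) from a read path (query through the metric server). A minimal in-memory sketch of that split follows. It is not the TSDS design: routing by a hash of the metric name, the dict-based index, and all names are assumptions made for illustration.

```python
import hashlib


class Tsds:
    """Toy time-series store with an index and several storage nodes."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]  # stand-ins for storage nodes
        self.index = {}  # stand-in for the index: metric name -> node id

    def _route(self, metric):
        """Pick a node from a stable hash of the metric name."""
        digest = hashlib.sha256(metric.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def ingest(self, metric, timestamp, value):
        """Write path: index the metric on first sight, then store the point."""
        node_id = self.index.get(metric)
        if node_id is None:
            node_id = self._route(metric)
            self.index[metric] = node_id  # index writer
        self.nodes[node_id].setdefault(metric, []).append((timestamp, value))  # storage writer

    def query(self, metric, start, end):
        """Read path: consult the index, then read from the owning node only."""
        node_id = self.index.get(metric)
        if node_id is None:
            return []
        return [(ts, v) for ts, v in self.nodes[node_id].get(metric, []) if start <= ts <= end]
```

Keeping the metric-to-node mapping in an index means a query touches a single storage node instead of fanning out to all of them.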

Page 37

Pillars of Monitoring at LinkedIn

1. TSDS: Storage

2. inGraphs: Visualization

3. Autoalerts: Alerting

4. Iris: Notification and Escalation

5. Nurse: Auto Remediation

6. InMon: Synthetic Monitoring

Page 38

Monitoring Infrastructure

[Diagram: Applications, Metrics collectors, TSDS (Storage Nodes, Metric-server), inGraphs, Autoalerts, InMon, Iris, Nurse]

● 100K graph dashboards

● 30M metrics ingested/sec

● 460K alerts processed/min

● ~3.2B total metrics

Page 39

Future Plans

● Automatic dashboard generation

● Alert correlation

● Cost to Serve

Page 40

Thank you!!

Page 41

Questions?