Deep dive into Nagios analytics
Transcript of Deep dive into Nagios analytics
@alq
Dev & Ops
Nagios user since 2008
Datadog co-founder
A little survey
Top 3 failed checks
That I responded to last week
That woke me up
That most of my team responded to at least once
That impact our business the most?
That I responded to 5 weeks ago
Using memory to prioritize remediation...
At best, finding local optima
At worst, Brownian motion
Analytics
Performance Metrics, Nagios Traffic, Other Sources
In the “Cloud”
Real-time graphs + analytics
Aggregation
Real-time Analytics (Nagios et al.)
Real-time Analytics
Nagios Traffic
In the “Cloud”
Real-time graphs + analytics
Nagios: a “chatty” source among the 40+ that Datadog supports
One example
Almost 13,000 Nagios “events” over the past week
Constant stream
86 notifications!
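A rough sketch of how the events-versus-notifications comparison could be reproduced from a Nagios log. It assumes the usual Nagios Core log layout (`[epoch] TYPE: field;field;...`). The function name and the sample lines are made up for illustration and are not from the talk.

```python
import re
from collections import Counter

# Assumed log layout: "[<epoch>] <TYPE>: <semicolon-separated fields>"
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")


def count_event_types(lines):
    """Count log entries per type (e.g. SERVICE ALERT, SERVICE NOTIFICATION)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts


# Hypothetical sample lines, for illustration only.
SAMPLE_LOG = [
    "[1400000000] SERVICE ALERT: web01;HTTP;CRITICAL;SOFT;1;Connection refused",
    "[1400000060] SERVICE ALERT: web01;HTTP;CRITICAL;HARD;3;Connection refused",
    "[1400000061] SERVICE NOTIFICATION: oncall;web01;HTTP;CRITICAL;notify-by-email;Connection refused",
    "[1400000300] SERVICE ALERT: web01;HTTP;OK;HARD;1;HTTP OK",
    "[1400000400] HOST ALERT: db01;DOWN;SOFT;1;PING CRITICAL",
]

counts = count_event_types(SAMPLE_LOG)
alerts = counts["SERVICE ALERT"] + counts["HOST ALERT"]
notifications = counts["SERVICE NOTIFICATION"] + counts["HOST NOTIFICATION"]
print(f"{alerts} alerts, {notifications} notifications")
```

On a real log, the gap between the two counts is the stream of events nobody was paged for.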
Pattern
More data? More questions.
A dialog with data
Not a scientific study
Nagios samples
[Figure: histogram of the sampled population by host count (x-axis: Host count, 0 to 750; y-axis: Population), colored by quartile 1 to 4]
Population (host count): 25%: 20, 50%: 93, 75%: 322, 100%: 904
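One way to split a population into quartiles by host count, sketched with Python's standard library. The host counts below are invented, and the choice of the "inclusive" interpolation method is an assumption; the talk does not say how its quartiles were computed.

```python
import bisect
import statistics


def quartile_of(value, cuts):
    """Return the quartile (1 to 4) of value, given the three cut points."""
    return bisect.bisect_left(cuts, value) + 1


# Hypothetical host counts, one per Nagios installation.
host_counts = [3, 10, 40, 80, 120, 300, 450, 900]

# Three cut points separating the four quartiles.
cuts = statistics.quantiles(host_counts, n=4, method="inclusive")

for count in host_counts:
    print(count, "-> quartile", quartile_of(count, cuts))
```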
Does size matter?
[Figure: one panel per quartile (1 to 4); x-axis: Nagios alert per host, 0 to 1000; y-axis: count per week, 0 to 40]
Weekly count per host split by quartile
Outliers: sick hosts, silenced checks
Notifications
1-3% of alerts notify
Little difference per quartile
Does time of day matter?
[Figure: one panel per quartile (1 to 4); x-axis: Hour of Day (UTC), 0 to 20; y-axis: Alerts per hour, 4 to 12]
Mean about the same across quartiles
Time-based deviation?
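A minimal sketch of the hour-of-day question: bucket alert timestamps by UTC hour, then look at the mean and the spread across the 24 buckets. The timestamps are invented and the function name is hypothetical.

```python
import statistics
from collections import Counter
from datetime import datetime, timezone


def alerts_by_hour_utc(epochs):
    """Return a 24-item list: number of alerts seen in each UTC hour of day."""
    counts = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in epochs
    )
    return [counts.get(hour, 0) for hour in range(24)]


# Hypothetical alert timestamps (epoch seconds).
sample_epochs = [0, 3600, 3700, 86400 + 23 * 3600]

by_hour = alerts_by_hour_utc(sample_epochs)
mean_per_hour = statistics.mean(by_hour)
spread = statistics.pstdev(by_hour)  # how much the hours deviate from the mean
print(by_hour, mean_per_hour, spread)
```

A spread that is large compared to the mean would point to a time-of-day effect; a small one would not.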
Does the day of week matter?
[Figure: one panel per quartile (1 to 4); x-axis: Day of week, Sun to Sat; y-axis: Alerts per hour, 0 to 40]
Notifying Alerts per Day
Not really
Squeaky wheels? (checks)
[Figure: one panel per quartile (1 to 4); x-axis: Checks ranked by noise, 0 to 250; y-axis: Alerts per hour, 0 to 30]
Noisiest checks (overall)
Outlier
[Figure: single panel; x-axis: Checks ranked by noise, 0 to 40; y-axis: Alerts per hour, 0 to 30]
Noisiest checks (outlier)
Outlier in more detail
[Figure: one panel per quartile (1 to 4); x-axis: Checks ranked by noise, 0 to 200; y-axis: Alerts per hour, 0 to 8]
Noisiest checks (without outlier)
Long Tail
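Ranking checks (or hosts) by noise can be sketched as a simple count-and-sort. The alert tuples below are invented, and `rank_by_noise` is a hypothetical helper, not something from the talk.

```python
from collections import Counter


def rank_by_noise(alerts, field):
    """Rank values of one field ("host" or "check") by number of alerts.

    alerts is an iterable of (host, check) tuples.
    """
    index = {"host": 0, "check": 1}[field]
    return Counter(alert[index] for alert in alerts).most_common()


# Hypothetical (host, check) pairs, one per alert.
sample_alerts = [
    ("web01", "disk"),
    ("web02", "disk"),
    ("web01", "disk"),
    ("web01", "http"),
    ("db01", "load"),
    ("web01", "http"),
]

noisiest_checks = rank_by_noise(sample_alerts, "check")
noisiest_hosts = rank_by_noise(sample_alerts, "host")
print(noisiest_checks)
print(noisiest_hosts)
```

Plotting the counts in rank order is what exposes the shape: a handful of squeaky wheels at the head, then a long tail.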
Squeaky wheel? (hosts)
[Figure: one panel per quartile (1 to 4); x-axis: Hosts ranked by noise, 0 to 200; y-axis: Alerts per hour, 0 to 30]
Noisiest hosts (overall)
Same outlier
[Figure: single panel (3); x-axis: Hosts ranked by noise, 0 to 60; y-axis: Alerts per hour, 0 to 30]
Noisiest hosts (outlier)
Similar pattern as checks
[Figure: one panel per quartile (1 to 4); x-axis: Checks ranked by noise, 0 to 200; y-axis: Alerts per hour, 0 to 8]
Noisiest checks (without outlier)
Long Tail
Recurring alerts
[Figure: scatter plot colored by quartile (1 to 4); x-axis: Age between earliest and latest occurrence, 0 to 300; y-axis: Number of days occurring, 0 to 150]
Alert age & frequency of occurrence
Young Old
Seldom happens
Happens often
Happen once in a while
Occur often, for a long time: tolerated
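The two axes of the age-and-frequency plot can be derived per alert roughly as follows. This is a sketch under assumptions: each alert is identified by a "host/check" key (an arbitrary choice), age is measured in whole days between the first and last day seen, and the events are invented.

```python
from collections import defaultdict
from datetime import datetime, timezone

DAY = 86400  # seconds


def age_and_frequency(events):
    """Map each alert key to (age in days, number of distinct days it occurred).

    events is an iterable of (alert_key, epoch_seconds) pairs.
    """
    days_seen = defaultdict(set)
    for key, ts in events:
        days_seen[key].add(datetime.fromtimestamp(ts, tz=timezone.utc).date())
    return {
        key: ((max(days) - min(days)).days, len(days))
        for key, days in days_seen.items()
    }


# Hypothetical events, for illustration only.
sample_events = [
    ("web01/HTTP", 0),
    ("web01/HTTP", 10 * DAY),
    ("web01/HTTP", 10 * DAY + 60),
    ("db01/disk", 5 * DAY),
]

stats = age_and_frequency(sample_events)
print(stats)
```

Alerts with a high age and a high day count are the ones that occur often, for a long time.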
More data? More questions.
HOWTO?
Find out tomorrow!
Awk
Postgres
R
d3
ggplot2
Presentation matters
Take-away?
Take-aways
• Don’t rely on your memory to prioritize
• Your Nagios logs are a treasure trove
• Have a dialog with your data
• Presentation matters