Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters
-
Upload
nagios -
Category
Technology
-
view
754 -
download
0
description
Transcript of Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters
![Page 1: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/1.jpg)
Why
dynamic & adaptivethresholdsmatters
![Page 4: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/4.jpg)
Threshold
![Page 5: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/5.jpg)
What is the limitation with static threshold?
✗ Not static✗ Load varies throughout the day, week✗ To many or to few alarms✗ Collecting and thresholding in the same context
✗ Based on the current measurement✗ Do not consider dependency to other services
![Page 6: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/6.jpg)
How to make thresholdsdynamic & adaptive?
![Page 7: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/7.jpg)
“Database table size should not be bigger then 5 % of yesterdays max size “
{example 1}
![Page 8: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/8.jpg)
“Database table size should not be bigger then 5 % of max size yesterday“
7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0
20000
40000
60000
80000
100000
120000
140000
160000
180000
200000
Table size
Yesterday
table size < max(yesterday)*1.05
Today
{example 1}
![Page 9: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/9.jpg)
“Number of online users should not be more then 10 % higher then the average number of online users for the last 10 data points”
{example 2}
![Page 10: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/10.jpg)
users < avg(X0+X1+.....+X9)*1.1
Where X is the historical on-line users data points
{example 2}
“Number of online users should not be more then 10 % higher then the average number of online users for the last 10 data points”
![Page 11: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/11.jpg)
{example 3}
“The number of orders with errors should be lower then 5% of the total number of registered orders”
![Page 12: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/12.jpg)
“The number of orders with errors should be lower then 5% of the total number of registered orders”
{example 3}
Total #orders
#orders with errors < 0.05 * (Total #orders)
![Page 13: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/13.jpg)
“Message queue size should be above the defined Friday threshold profile”
{example 4}
count
time of the day
![Page 14: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/14.jpg)
“Message queue size should be above the defined Friday threshold profile”
{example 4}
count
time of the day
count
time of the day
![Page 15: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/15.jpg)
How to make thresholdsdynamic & adaptive?
✗ Time profiles✗ Historical data points✗ Math and statistical operations
![Page 16: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/16.jpg)
We did not want a check_XYZ hack
We wanted a tool
![Page 17: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/17.jpg)
Collecting
![Page 18: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/18.jpg)
Thresholding
Collecting
Separation
![Page 19: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/19.jpg)
Historical data
Collecting
![Page 20: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/20.jpg)
Historical data
Logic
Collecting
![Page 21: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/21.jpg)
Historical data
Logic
Collecting
Day profile
![Page 22: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/22.jpg)
Historical data
Calender Logic
Collecting
Day profile
![Page 23: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/23.jpg)
Historical data
Calender Logic
Collecting
Day profile
Nagios
![Page 24: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/24.jpg)
Historical data
Calender Logic
Collecting
Day profile
Scheduling
ServerInterface
NagiosOpenTSDB XYZ
![Page 25: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/25.jpg)
![Page 26: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/26.jpg)
![Page 27: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/27.jpg)
bischeck basics
● Configuration like Nagios – host, service but also service item● Host is just a container of the rest● Service specify the connection and scheduling● Service item specify the “query” and the threshold
class to use
● Host and service name must be the same as in the Nagios configuration
![Page 28: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/28.jpg)
Threshold – 24 hour day profile
● Divide the day in 24 hour points, where every point can be:● Static value● Dynamic value
– Math expression on single value or range of data from the cache
– Based on cached data points retrieved by● Index – single value or index range● Time – single value (closest) or time range (between)
![Page 29: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/29.jpg)
....
<! 12:00 Static >
<hour>7000</hour>
....
![Page 30: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/30.jpg)
....
<! 12:00 Static >
<hour>7000</hour>
<! 13:00 Adaptive >
<hour>erpserverordersediOrders[0] / 3</hour>
....
![Page 31: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/31.jpg)
....
<! 12:00 Static >
<hour>7000</hour>
<! 13:00 Adaptive >
<hour>erpserverordersediOrders[0] / 3</hour>
<! 14:00 Adaptive with math function >
<hour>avg(erpserverordersediOrders[30M:60M]) / 2</hour>
....
![Page 32: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/32.jpg)
Threshold – 24 hour day profile
Between every “full” hour a linear equation is calculated
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0
10000
20000
30000
40000
50000
60000
Day profile
Hour
![Page 33: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/33.jpg)
Threshold – 24 hour day profile
● Connect calender to the day profile and evaluate according to the following order:
1. Month and day of month
2.Week and day of week
3.Day in month
4.Day in the week
5.Month
6.Week
● Holiday – exception days
![Page 34: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/34.jpg)
And more....● Multithreaded and multischeduling schema per service
● interval● cron
● Data collection – jdbc, livestatus, internal cache● Virtual services● Date macros in execution statements ● Customize
● connection (service classes)● execution (service item classes)● thresholds (threshold classes) ● server integration (server classes)
● XML configuration supported with WEBui (beta)● GPL 2 license
![Page 35: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/35.jpg)
Future● Improved time series database● Patterns/baselines● More statistic functions● “Sensors” alarms on multiple/aggregated data points● Any ideas?
![Page 36: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/36.jpg)
Infrastructure monitoring Application performance monitoring [APM]
Business activity monitoring [BAM]
Operational Business intelligence [OBI]
![Page 37: Nagios Conference 2012 - Anders Haal - Why dynamic and adaptive thresholds matters](https://reader035.fdocuments.in/reader035/viewer/2022081401/557819e7d8b42ab40c8b4c59/html5/thumbnails/37.jpg)
Questions & Feedback
Pictures – Creative Commonswww.flickr.com/photos/loneprimate/4017405677www.flickr.com/photos/catatronic/2397319483www.flickr.com/photos/dtrimarchi/6815004766www.flickr.com/photos/bikeracer/6740232