Analyze Prometheus Metrics Like a Data Scientist
Georg Öttl
PromCon 2017, Munich
About me / experiences
● Enterprise Software Dev.
● Data Science Services
● Dev / DevOps / Ops
● Developer who likes Math
Twitter: @goettl
Objective of this talk
Pushing the limits of Prometheus: can I have a more reliable alert model with insights from data science?
● Journey on how to improve alerts / dashboards with insights from data science
● Integration points to open source data science tools
● Bring light into the dark (like Prometheus did)
... should I?
Don't use deep learning and data science when a straightforward 15-minute rule-based system does well.
Data science can help you detect patterns and facts in your metrics that you can't see otherwise.
What is already available. When do I start?
● Great architecture to get high quality data
● Numerical data
● Apply mathematical functions on it
● Easy and fast to navigate (PromQL)
● Alert / rule model
● Chart / histogram visualization with Grafana
Next step: get data out of Prometheus
... to be used in open source data science tools
What data to export?
● Raw metrics data, no functions applied on it
● As much as possible
● Without putting too much load on Prometheus / running into a timeout
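One possible way to export a lot of data without overloading the server is to split a long time range into smaller windows and send one query_range request per window. The following is only a sketch: the helper name `split_range` and the `max_points` budget are assumptions made for illustration and do not come from the talk.

```python
def split_range(start, end, step_s, max_points=1000):
    """Split [start, end] (Unix seconds) into windows of at most max_points samples.

    max_points is a hypothetical per-request budget chosen by the caller.
    """
    if step_s <= 0 or max_points <= 0:
        raise ValueError("step_s and max_points must be positive")
    window = step_s * (max_points - 1)  # time span covered by max_points samples
    windows = []
    cursor = start
    while cursor <= end:
        window_end = min(cursor + window, end)
        windows.append((cursor, window_end))
        cursor = window_end + step_s  # the next window starts one step later
    return windows


# Five minutes of data at a 60 s step, at most three samples per request.
windows = split_range(0, 300, 60, max_points=3)
```

Each `(start, end)` pair can then be used as the `start` and `end` parameters of one request, with a pause between requests if the server is busy.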
Two ways to get data out of Prometheus
● HTTP API (poll), for exploratory data analysis
● Remote API (push), for streaming analysis
HTTP API - /api/v1/query_range
import requests

requests.get(
    url='http://127.0.0.1:9090/api/v1/query_range',
    params={
        'query': 'sum({__name__=~".+"}) by (__name__,instance)',
        'start': '1502809554',
        'end': '1502839554',
        'step': '1m'
    })
Response (abbreviated):
{"data": {..., "resultType": "matrix", "result": [{"metric": {"method": "GET", ...}, "values": [[1500008340, "3"], ...]}, ...]}}
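The matrix result nests the samples under each series, so a first processing step is usually to flatten it into rows. A minimal sketch, assuming the response body has already been decoded into a Python dict; the helper name `matrix_to_rows` and the way the series id is built from the labels are illustrative choices, not part of the Prometheus API.

```python
def matrix_to_rows(response):
    """Flatten a decoded query_range body into (id, time, value) rows."""
    rows = []
    for series in response["data"]["result"]:
        labels = series["metric"]
        # Build a stable series id from the sorted label pairs.
        series_id = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
        for timestamp, value in series["values"]:
            # Sample values arrive as strings, so convert them to floats.
            rows.append((series_id, timestamp, float(value)))
    return rows


# Hand-written sample in the shape of the response shown above.
sample = {
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"method": "GET"},
             "values": [[1500008340, "3"], [1500008400, "4"]]},
        ],
    }
}
rows = matrix_to_rows(sample)
```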
Target format for data science tools (tabular, CSV)
X
id   time  value  req_dur  ...
A    1     1      4        ...
A    2     2      5        ...
B    1     2      3        ...
B    2     3      2        ...
y
id   time  value
A    1     1
A    2     1
B    1     0
B    2     0
...  ...   ...
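The X table has one row per (id, time) and one column per metric. A sketch of how long-format samples could be pivoted into that shape and written as CSV, using only the standard library; the function name `to_wide_csv` and the four-field row layout are assumptions for this example.

```python
import csv
import io


def to_wide_csv(rows):
    """Pivot (id, time, metric, value) rows into one CSV line per (id, time)."""
    metrics = sorted({metric for _, _, metric, _ in rows})
    table = {}
    for series_id, time, metric, value in rows:
        table.setdefault((series_id, time), {})[metric] = value
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "time"] + metrics)
    for (series_id, time), values in sorted(table.items()):
        # Missing samples stay empty rather than being guessed.
        writer.writerow([series_id, time] + [values.get(m, "") for m in metrics])
    return out.getvalue()


long_rows = [
    ("A", 1, "value", 1), ("A", 1, "req_dur", 4),
    ("A", 2, "value", 2), ("A", 2, "req_dur", 5),
    ("B", 1, "value", 2), ("B", 1, "req_dur", 3),
]
csv_text = to_wide_csv(long_rows)
```

Metric columns are sorted by name, so the column order differs from the slide while the content is the same.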
Easiest way to export
● Grafana
● Python (Robust Perception blog entry)
Reduce data: use domain knowledge to select a relevant data subset
{__name__=~".+"}
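The selector `{__name__=~".+"}` matches every metric. Domain knowledge can be expressed by replacing the `.+` with an alternation of the metric names that matter. A small sketch of building such a selector; the helper `name_selector` and the metric names in the usage line are hypothetical.

```python
import re

# Character set allowed in Prometheus metric names.
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def name_selector(metric_names):
    """Build a {__name__=~"..."} selector that matches only the given metrics."""
    if not metric_names:
        raise ValueError("at least one metric name is required")
    for name in metric_names:
        if not METRIC_NAME.fullmatch(name):
            raise ValueError(f"not a valid metric name: {name!r}")
    return '{__name__=~"' + "|".join(metric_names) + '"}'


# Hypothetical metric names, chosen only for illustration.
selector = name_selector(["http_requests_total", "http_request_duration_seconds_sum"])
```

The resulting string can be used in place of the match-everything selector in a query.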
Tip: Use alerts as initial set of training labels
y = ALERTS{alertname="high_latency"}
tidy up, verify true positives, annotate manually, ...
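A sketch of turning alert samples into a 0/1 label vector, assuming the ALERTS series and the metrics were exported with the same start and step so that their timestamps line up. The helper name `alert_labels` and the sample timestamps are made up for illustration.

```python
def alert_labels(timestamps, firing_timestamps):
    """Return 1 for every timestamp at which the alert was firing, else 0."""
    firing = set(firing_timestamps)
    return [1 if t in firing else 0 for t in timestamps]


# Timestamps of the exported metrics and of the exported alert samples.
grid = [60, 120, 180, 240]
y = alert_labels(grid, firing_timestamps=[120, 180])
```

The resulting list is only a starting point; the manual verification mentioned above still applies.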
Normalize Prometheus data types
● Gauges, histograms are ok
● Counters have to be processed
● Counters only ever increase, so their raw values never repeat and carry no statistical value
● Use e.g. a derivative function to convert a counter to a gauge equivalent
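One way to read the counter conversion is as a first difference divided by the elapsed time, which gives a per-second rate that behaves like a gauge. The sketch below assumes strictly increasing timestamps and treats any drop in the counter as a reset to zero; the function name `counter_to_rate` is an illustrative choice, and this is not Prometheus's own rate implementation.

```python
def counter_to_rate(samples):
    """Convert (timestamp, counter_value) samples into per-second rates.

    A drop in the counter is treated as a reset to zero, so the new value
    itself is taken as the increase. Timestamps must be strictly increasing.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        increase = v1 - v0 if v1 >= v0 else v1
        rates.append((t1, increase / (t1 - t0)))
    return rates


# The counter resets between t=60 and t=120.
samples = [(0, 100), (60, 160), (120, 10), (180, 70)]
rates = counter_to_rate(samples)
```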
Examples
Applied data science on Prometheus metrics
Example 1
Hypothesis: I can predict the latency of HTTP requests
● Can I use the Prometheus function predict_linear?
● Are other predictions possible?
↡↡ R Notebook predict_linear ↡↡
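predict_linear is based on simple linear regression, so its behaviour can be explored offline on exported samples. The following Python sketch fits a least-squares line and extrapolates it. It is not the R notebook from the talk; as a simplification it predicts relative to the last sample rather than the query evaluation time, and the name `linear_forecast` and the sample values are assumptions.

```python
def linear_forecast(samples, seconds_ahead):
    """Fit value = intercept + slope * t by least squares and extrapolate.

    samples is a list of (timestamp, value) with at least two distinct
    timestamps; the prediction is made seconds_ahead after the last sample.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    cov_tv = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov_tv / var_t
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + seconds_ahead)


# Hypothetical latency samples (seconds) rising by 0.02 per minute.
latency = [(0, 0.10), (60, 0.12), (120, 0.14), (180, 0.16)]
forecast = linear_forecast(latency, seconds_ahead=120)
```

Comparing such forecasts against the values that were actually observed later is one way to judge whether a linear model suits a given metric.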
Example 2
Hypothesis: there are better-suited metrics to predict HTTP 5xx failures than the one I use
Choose method
Get metrics into the right format for the method
● Training data with labels needed (X, y)
● Seasonally adjust
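The slides do not say which seasonal adjustment was used. One simple option is seasonal differencing, which subtracts the value observed one season earlier. A sketch under that assumption, with a made-up series and a hypothetical helper name:

```python
def seasonal_difference(values, period):
    """Return values[i] - values[i - period] for every i >= period."""
    if period <= 0 or period >= len(values):
        raise ValueError("period must be between 1 and len(values) - 1")
    return [values[i] - values[i - period] for i in range(period, len(values))]


# Hypothetical series: a repeating pattern of length 3 plus a slow upward drift.
series = [1, 5, 2, 2, 6, 3, 3, 7, 4]
adjusted = seasonal_difference(series, period=3)
```

After differencing, the repeating pattern is gone and only the drift of one unit per season remains.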
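Apply feature selection algorithm
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestRegressor
...
# perform feature selection: recursively drop features until one is left
rfe = RFE(
    RandomForestRegressor(
        n_estimators=500,
        random_state=1,
        min_samples_split=5
    ),
    # keyword form; recent scikit-learn versions no longer accept this positionally
    n_features_to_select=1)
fit = rfe.fit(X, y)
...
Selected Feature: POST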
Feedback cycle
Rewrite your alerts and dashboards to use the label POST to better predict HTTP 5xx errors
Example 3 - metrics / feature selection with library tsfresh
● Metrics selection / ranking similar to example 2
● Metrics extension by applying functions to metrics
https://github.com/blue-yonder/tsfresh
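To illustrate what "applying functions to metrics" means, the sketch below derives a few summary features per series by hand with the standard library. It does not use tsfresh or mirror its API; it only shows, on made-up rows, the kind of derived columns such a library generates in much larger numbers before ranking them.

```python
from statistics import mean, pstdev


def summary_features(rows):
    """Compute a few summary features per id from (id, time, value) rows."""
    by_id = {}
    for series_id, _, value in rows:
        by_id.setdefault(series_id, []).append(value)
    return {
        series_id: {
            "mean": mean(values),
            "max": max(values),
            "std": pstdev(values),  # population standard deviation
        }
        for series_id, values in by_id.items()
    }


rows = [("A", 1, 1), ("A", 2, 3), ("B", 1, 2), ("B", 2, 2)]
features = summary_features(rows)
```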
Prometheus data science mantra
● Create a hypothesis about your system and metrics
● Get metrics (DevOps) and convert them into the right format
● Use statistical methods to verify the hypothesis
● Feed results back into the system, the dashboards and the alerts
Lessons learned
● Alert model improves with insights from descriptive statistics and ML!
● Depending on the result, correct, discard or handle data differently
● Day-to-day use case: e.g. less trial-and-error configuration of the predict_linear function
● No need yet to process metrics as a stream with ML/AI
Thx for having me here at promcon.io 2017! Questions?
Georg Öttl Twitter Handle: @goettl