Monitoring MySQL with Prometheus and Grafana
-
Upload
julien-pivotto -
Category
Technology
-
view
578 -
download
19
Transcript of Monitoring MySQL with Prometheus and Grafana
Monitoring MySQL withPrometheus & Grafana
Julien Pivotto (@roidelapluie)
Open Source Monitoring Conference
November 22nd, 2017
SELECT USER();Julien "roidelapluie" Pivotto
@roidelapluie
Sysadmin at inuits
Automation, monitoring, HA
MySQL/MariaDB user/admin/contributor
Grafana and Prometheus user/contributor
inuits
Monitoring /Observability
Monitoring or Observability?Collecting lots of data
Not only for alerting, also for understanding
You don't know what you will need
Who decides what to monitor?Pre-Define checks
Check only what os needed
OR
Let the system you monitor decide what toexpose
Alert based on that data
Push or pull ?Push: need to know and configure where yourserver is
What if 2 3 10 servers?
What in dev?
Pull: Let any server take the metricsUse service discovery to know what tofetch
High availabilityMonitoring is very important
Metamonitoring
What is the price to get a highly availablemonitoring system?
Prometheushttps://prometheus.io/
PrometheusPrometheus is a Cloud-Native Data-Centric Open-Source Performant Simple metrics collection,analysis and alerting tool.
Nothing more.
Cloud NativeVery easy to configure, deploy, maintain
Designed in multiple services
Container ready
Orchestration ready (dynamic config)
Open SourceApache 2.0
CNCF
Go
Support for multiple OS
Many "exporters":https://github.com/prometheus/prometheus/wiki/Defau
lt-port-allocations
PerformancePrometheus is designed to fetch data in aninterval measured in SECONDS
Designed to handle lots of metrics
New storage engine in 2.0 to scale even betterand support usecases like kubernetes
Data CentricA Metric in Prometheus has metadata:
myql_global_status_handlers_total{handler="tmp_write"} 1122
And lots of function to filter, change, remove...those metadata while fetching them.
Metrics typeCounters (always go up)
Gauge (go up and down)
Histograms (aggregate by buckets)
Summary (percentiles) (most of the timeuseless)
A word aboutPrometheus vs Graphite
Prometheus does not see a metric as an "event".Metrics are current value until they are replaced.You can not see when a metric has been includedin Prometheus.For Events, Prometheus refers to Elasticsearch.
How?Creative Commons Attribution 2.0 https://www.flickr.com/photos/75001512@N00/6824648824
Warning : this is complicated.
Prometheus uses HTTP andPLAIN TEXT.
http(s) is supported
basic auth / token also
(exposing https is not Prometheus job)
$ curl http://127.0.0.1:9090/metrics# HELP prometheus_notifications_queue_length# The number of alert notifications in the queue.# TYPE prometheus_notifications_queue_length gaugeprometheus_notifications_queue_length 0# HELP prometheus_notifications_sent_total# Total number of alerts successfully sent.# TYPE prometheus_notifications_sent_total counterprometheus_notifications_sent_total{alertmanager="127.0.0.1:9093"} 4.796464e+06
$ curl http://127.0.0.1:9090/metricsprometheus_notifications_queue_length 0prometheus_notifications_sent_total{alertmanager="127.0.0.1:9093"} 4.796464e+06
Anything that can embed a httpserver can serve prometheus
metrics.
Native Prometheus integrationsKubernetes
Ceph
etcd
telegraph
Grafana
mgmt
...
What when you can't?Linux (not counting TUX web server)
MySQL
Third party software
Third party services
ExportersCreative Commons Attribution 2.0 https://www.flickr.com/photos/fuzzy/563198182
ExportersExporters expose metrics with an HTTP API
They connect to the real target
Bindings available for many languages
Exporters do not save data ; they are not"proxies" and don't "cache" anything
Official exportersConsul, Memcached, MySQL, Node/System,HAProxy, AWS Cloudwatch, Collectd, Graphite,Statsd, JMX, influxdb, SNMP, Blackbox
Node exporterLinux Metrics
Including: CPU, Memory, Disk space,networking, load, time...
TextfileAs part of the node_exporter, you can writemetrics in file
Scripts output, etc ...
Blackbox exporterHTTP, DNS, TCP sockets, ICMP...
http://127.0.0.1:9115/probe?target=https://inuits.eu&module=http_2xxprobe_ssl_earliest_cert_expiry 1.52697083e+09
MySQLCreative Commons Attribution 2.0 https://www.flickr.com/photos/lurkerm/262541595
FrequencySome queries are expensive
You can decide to fetch some data every 1m,others every 10s...
Wait .. Exporters don't cache??How do I fetch some data every
10s and others every 1m?
scrape_configs: job_name: 'mysql global status' scrape_interval: 10s static_configs: targets: '172.31.14.3:9104' params: collect[]: global_status
job_name: 'mysql performance' scrape_interval: 1m static_configs: targets: '172.31.14.3:9104' params: collect[]: perf_schema.tableiowaits perf_schema.indexiowaits perf_schema.tablelocks
MySQL ReplicationMySQL Master <-> MySQL Master
MySQL Master -> MySQL Slave
MySQL Master -> MySQL Slave -> MySQLSlave
MySQL Masters -> MySQL Slaves -> MySQLSlaves -> MySQL Slaves
MySQL Master -> MySQL Slaves
pt-heartbeatpt-heartbeart is a daemon that updates an entrywith current timestamp on a mysql server everysecond.
On the replica, you can check the timestamp anddo NOW timestamp to get the real lag.
+++| ts | server_id |+++| 20170817T16:55:01.001030 | 1 |+++
pt-heartbeatGPL
Perl
Part of percona toolkit
wait, mysql has that nativelymysql> SHOW SLAVE STATUS\G...Seconds_Behind_Master: 0...
aka mysqld_exporter metric:
mysql_slave_lag_seconds
BugsFixes for Seconds_Behind_Master in: 5.7.18,5.6.36, 5.6.23, 5.6.16.
How it worksChecks the heartbeat table (SQL query). It's notcalling the ptheartbeat cli. So it is independantfrom it.
CLI flagscollect.heartbeat
collect.heartbeat.database
collect.heartbeat.table
Metricsmysql_heartbeat_stored_timestamp_seconds{server_id="1"}mysql_heartbeat_now_timestamp_seconds{server_id="1"}
Back to PrometheusCreative Commons Attribution 2.0 https://www.flickr.com/photos/andrepierre/14843110667
Exploring Metrics
Exploring Metrics
Exploring Metrics
Exploring Metrics
PromQLmysql_global_status_commands_total
PromQLmysql_global_status_commands_total{command="select"}
PromQLmysql_global_status_commands_total
{command=~"select|set_options"}
PromQLmysql_global_status_commands_total{command=~"select|set
_options"}
PromQLderiv(mysql_global_status_connections[5m])
PromQLmysql_up == 0
PromQLsum( avg_over_time(
mysql_info_schema_threads[10m])) by
(instance) sum(
avg_over_time(mysql_info_schema_thread
s[10m] offset 1d)) by (instance)
PromQL{__name__=~".+innodb.+cache.*"}
predict_linear(mysql_heartbeat_lag_seconds[5m], 60*2)
sum(rate(mysql_global_status_commands_total{command=~"
(commit|rollback)"}[5m]))
Recording and alerting rules
RulesPrometheus can record extra metrics
Make expensive calculations
Alert if value is present
groups: name: mysql rules: record: mysql_heartbeat_lag_second expr: | mysql_heartbeat_now_timestamp_seconds mysql_heartbeat_stored_timestamp_seconds alert: MysqlReplicationNotRunning expr: | mysql_slave_status_slave_io_running == 0 OR mysql_slave_status_slave_sql_running == 0 for: 2m annotations: summary: "Slave replication is not running" alert: MySQLReplicationLag expr: | (mysql_slave_lag_seconds > 30) AND on (instance) (predict_linear( mysql_slave_lag_seconds[5m], 60*2) > 0) for: 5m labels: priority: immediate
AlertmanagerWhen prometheus has an alert, it sends it.
Every minute by default, as long as the alert isongoing
Alertmanager, a separated daemon, do therest of the work
What is alertmanager doing?Grouping
Inhibition
Silence
Notify
grouping alerts5 nodes are down. Do you want 5 email?
group_by: ['alertname', 'cluster', 'service']
inhibitionDatacenter is on fire. Do you want to know thatswitches, hosts, services are down?
source_match: severity: 'critical' target_match: severity: 'warning'
SilenceI upgrade kernel and reboot the service. Do I wantmails during that time?
NotificationsAlertmanager can notify you with:
Mails
Webhooks
3rd party services
SMS (sachet: 3rd party integration usingwebhook)
Alerting routesYou can send alerts to defined people basedon routes
Everything to logs mailboxCritical alerts
Network to net team SMS
Svc to app team SMS
Warning alertsNetwork to net team mail
Svc to app team mail
High Availability: PrometheusHave different prometheis servers with thesame config
They do not talk to each other
All of them fetch the same data
They monitor each other
High availability: AlertmanagerHave multiple alertmanagers with the sameconfig
Prometheis send alerts to all alertmanagers
Alert manager talk to each other not to sendthe same notification
One tool does one job...Prometheus will collect data
Alertmanager will send notifications
Exporters will expose data
Grafana will graph data
GrafanaOpen Source (Apache 2.0)
Web app
Specialized in visualization
Pluggable
Multiple datasources: prometheus, graphite,influxdb...
Has an API!
History of GrafanaGrafana is a fork of Kibana 3 ; used to be JS-Driven.
Now fully featured, requires a database, multi-projects/users support, etc...
Grafana and PrometheusPrometheus shipped its own consoles
Now it recommends Grafana and deprecatedits own consoles
Grafana Dashboards
Grafana Dashboards
Time Picker
Configure Prometheus inGrafana
Configure Prometheus inGrafana
Prometheus Dashboard
Multiple Prometheus instancesYou can add multiple prometheus instanceson grafana
You can add dropdown on the top to selectwhich one you want to use
Use case: prometheus HA, local prometheus(with access mode=direct)
Creating Grafana DashboardsTakes time
Requires deep knowledge of the tools
Improved over time
Easy to share (json + online library)
Percona Grafana DashboardPercona Open Sourced Grafana Dashboards
Covering MySQL, Mongo and Linux monitoring
Part of a bigger picture, PMM, but usablestandalone
Open Source (AGPL!)
https://github.com/percona/grafana-dashboards
Installing Percona Graphes
Method 1
Enable File dashboards in Grafana
Clone grafana-dashboards to the configuredlocation (or make a package)
Method 2
Use the Grafana API to upload the JSON's.
MySQL SetupYou'll need mysqld_exporter, with a user
MySQL 5.1+
Performance Schema for full set of metrics
mysqld_exporter-collect.binlog_size=true-collect.info_schema.processlist=true`
node_exporter setupnode_exporter-collectors.enabled="diskstats,filefd,filesystem,loadavg,meminfo,netdev,stat,time,uname,vmstat"
Prometheus (static file)scrape_configs: job_name: prometheus static_configs: targets: ['localhost:9090'] labels: instance: prometheus
job_name: linux static_configs: targets: ['10.0.98.43:9100'] labels: instance: db1
job_name: mysql static_configs: targets: ['10.0.98.43:9104'] labels: instance: db1
Dashboards
Dashboards
Dashboards
We don't need all of them?Because Grafana is just viz, you can importonly the one you want (e.g. exclude Mongo)
You can import later any extra dashboard youneed
MySQL Overview
MySQL Overview
MySQL Overview
InnoDB
InnoDB
InnoDB
Replication
Other grafana featuresMulti tenant
Deployments
Folders for dashboards (comming soon)
Jsonnet library to create dashboard
Brand new, still WIP
https://github.com/grafana/grafonnet-lib
ConclusionsPrometheus and Grafana are first-classmonitoring tools
Totally different approach than other tools
Embeddable into your apps
Percona Dashboards gets your graphes readyin no-time with minimal efforts