DevConf 17 - Metrics collection using open-source tools is easy


Metrics collection using open-source tools

Yaniv Bronhaim.

Maintainer @ VDSM

Senior Software Engineer, Red Hat Israel


Operations Flows

Disk Operations

Failures

Traffic

Data

We will discuss data, types of data, ways of presenting data, and what we generally do with it.

Data Types

Logs

Metrics

http://metrics20.org/spec/

Sources - data roots

counters: how many likes/fails/views/sign-ins/accesses in a certain time - always rising
count: what is the average rate of sign-ins over 5 minutes
gauge: a specific value with a timestamp - read/write speed over time, new TCP connections over time
rates: a specific value per second
flows in logs - migrations, restarts... can also be a gauge, rate or count, but they need to be calculated manually
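Not from the original slides - a minimal Python sketch to make the counter/gauge distinction concrete, assuming a local statsd-compatible listener on 127.0.0.1:8125 (the metric names are made up for illustration):

    import socket

    # Assumed local statsd/collectd listener speaking the statsd line protocol over UDP
    STATSD_ADDR = ("127.0.0.1", 8125)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def count(name, value=1):
        # counter: "how many sign-ins happened" - always rising, aggregated per interval
        sock.sendto(f"{name}:{value}|c".encode(), STATSD_ADDR)

    def gauge(name, value):
        # gauge: a specific value at a point in time - the last reported value wins
        sock.sendto(f"{name}:{value}|g".encode(), STATSD_ADDR)

    count("game.signins")                  # one more sign-in
    gauge("host1.disk.read_kbps", 1250)    # current read speed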

Example for data processing architecture goalsData analysis

(Billing, auto scaling, alerts)

Correlate between distributed logs and metrics

Scale easily

Historical DWH - data aggregation

Setting alarms


Data Processing Pipeline

Client → Shipper → Store → Visualization

Simple Case Study #1 - Multiplayer game service


Save historical info about the hardware and set alarms and events in the system based on traffic and usage.

Parse multiple logs constantly and visualize statistics based on log data.

Simple Case Study #1 - Goals

better analysis and management for manual scaling

want to reward players based on events - taken from the logs

Case Study #1 - CollectD + Graphite

Metrics analysis solution for monitoring and aggregation.

CollectD → Graphite (Carbon)

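For illustration only (not part of the deck): CollectD normally ships the values, but Carbon's plaintext protocol is simple enough to feed by hand, which helps when testing a new Graphite setup. The metric path and the localhost:2003 address below are assumptions.

    import socket
    import time

    # Carbon's plaintext listener accepts "metric.path value timestamp\n" lines (TCP 2003 by default)
    CARBON_ADDR = ("127.0.0.1", 2003)

    def send_metric(path, value, timestamp=None):
        line = f"{path} {value} {int(timestamp or time.time())}\n"
        with socket.create_connection(CARBON_ADDR) as conn:
            conn.sendall(line.encode())

    # Example: report the current number of players on one game server
    send_metric("game.server1.players.online", 42)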

Case Study #1 - ELK Stack

Metrics analysis solution for monitoring and aggregation

better analysis and management for manual scaling

want to reward players based on events - taken from the logs

Case Study #1 - ELK Stack

2. Log analysis solution for dashboards.

FileBeat → LogStash → Elasticsearch → Kibana

better analysis and management for manual scaling

want to reward players based on events - taken from the logs
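A rough sketch of what the LogStash stage does here (the log format, index name, and field names are invented for the example, and the exact REST path depends on your Elasticsearch version): parse an event out of a log line and index it as JSON, so Kibana can chart it and the reward logic can query it.

    import json
    import re
    import urllib.request

    # Hypothetical game-server log line and a grok-like regex to structure it
    line = "2017-01-27 10:15:02 INFO player=alice event=boss_defeated points=300"
    pattern = re.compile(
        r"(?P<timestamp>\S+ \S+) \S+ player=(?P<player>\S+) "
        r"event=(?P<event>\S+) points=(?P<points>\d+)"
    )

    match = pattern.match(line)
    if match:
        doc = match.groupdict()
        doc["points"] = int(doc["points"])
        # Index the structured event into Elasticsearch (URL and index name are illustrative)
        req = urllib.request.Request(
            "http://localhost:9200/game-events/_doc",
            data=json.dumps(doc).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        urllib.request.urlopen(req)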

Case Study #1

Case Study #2

Large scale, centralized management for KVM-based virtualization

Focus on ease of use/deployment

Case Study #2 - oVirt

Historical DWH

polling

Case Study #2 - Goals

Collect basic hardware info and remove such logic from VDSM.

VDSM → oVirt-Engine

Get oVirt-Engine info the same way as physical and virtual entities info.

Historical DWH

polling

Case Study #2 - Goals

Allow building a monitoring dashboard based on historical data for the last XX years, with aggregation configs.

VDSM → oVirt-Engine

Correlate between virtual and physical data.

Case Study #2 - Using metrics and logs

Getting data (metrics and logs)

Parse data into the store format (JSON)

CollectD → FluentD
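A minimal sketch of feeding FluentD from Python, assuming the fluent-logger package and a local FluentD forward input on port 24224 (the tag and record fields are illustrative); in the real setup CollectD and the oVirt hosts do the sending.

    from fluent import sender

    # Assumes a fluentd <source> of type "forward" listening on localhost:24224
    logger = sender.FluentSender("ovirt", host="localhost", port=24224)

    # Emit one structured record; fluentd routes it by tag ("ovirt.vdsm") to its outputs
    logger.emit("vdsm", {
        "hostname": "host1.example.com",
        "vm_id": "vm-42",
        "event": "migration_started",
    })
    logger.close()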

Case Study #2 - Correlate data and store

Correlate between different data sources before storing.

Scale up abilities

FluentD → Elasticsearch
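For example (a sketch only; the index pattern and field names are assumptions): once host metrics and VM logs land in the same Elasticsearch cluster, correlating them can be as simple as filtering both by hostname and time window.

    import json
    import urllib.request

    # Pull everything reported for one hypervisor in a 5-minute window,
    # whether it came from CollectD (metrics) or FluentD (logs)
    query = {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"hostname": "host1.example.com"}},
                    {"range": {"@timestamp": {
                        "gte": "2017-01-27T10:00:00Z",
                        "lte": "2017-01-27T10:05:00Z",
                    }}},
                ]
            }
        }
    }

    req = urllib.request.Request(
        "http://localhost:9200/ovirt-*/_search",   # index pattern is illustrative
        data=json.dumps(query).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    response = urllib.request.urlopen(req)
    for hit in json.loads(response.read().decode())["hits"]["hits"]:
        print(hit["_source"])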

Case Study #2 - Centralization and store

Building dashboards and monitors - analyze, visualize, alerts and alarm definitions

Grafana / Kibana

Case Study #2 - Output

Our goal is to lead in scale, management and user-friendliness - open-source alternative == pros and cons - release cycle every 6 months, 3 stable branches that are fully supported - feature rich, everyone can request features - based on KVM


prometheus.io

hawkular.org

Bottom line: Investigate your architecture

Clients, Shippers, Stores, Visualization

Prometheus - monitoring and alerting system, built at SoundCloud

http://rancher.com/converting-prometheus-template-cattle-kubernetes/

http://snowplowanalytics.com/product/

Some links for more info

https://bronhaim.wordpress.com/2016/07/24/setup-toturial-for-collecting-metrics-with-statsd-and-grafana-containers/

oVirt metrics - https://www.ovirt.org/

And many more how-to tutorials:

https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-elk-stack-on-ubuntu-14-04

https://www.digitalocean.com/community/tutorials/how-to-configure-collectd-to-gather-system-metrics-for-graphite-on-ubuntu-14-04

https://www.infoq.com/articles/graphite-intro

Or reach me - [email protected]

[email protected]