Metrics stack 2.0

Posted on 27-Jan-2015


Description

Most metrics systems link timeseries to a string key; some add a few tags. They often lack information, use inconsistent formats and terminology, and are poorly organized. As the number of people and programs generating, processing, storing and visualizing metrics grows, this approach becomes very cumbersome, and there is a lot to be gained from taking a step back and re-thinking metric identifiers and metadata.

Metrics 2.0 is a set of conventions around metrics: with barely any extra work, metrics become self-describing and standardized. Compatibility between tools increases dramatically, dashboards can automatically convert information needs into graphs, graph renderers can present data more usefully, and anomaly detectors and aggregators can work more autonomously and avoid common mistakes. The result: less micromanaging of software and configuration, quicker results, more clarity, less frustration and less room for error.

This talk also covers the tools that turn this concept into production-ready reality. Graph-Explorer is an application that integrates with Graphite: enter an expression that represents an information need and it generates the corresponding graphs or alerting rules, automatically applying unit conversion, aggregation, processing, etc. Statsdaemon is an aggregation daemon, like Etsy's statsd, that expresses performed aggregations and statistical operations by updating the metrics' tags, making sure that the metric metadata always corresponds to the data.

Dieter Plaetinck is a systems-gone-backend engineer at Vimeo.

Transcript of Metrics stack 2.0

   

   Credit: user niteroi @ panoramio.com

   

vimeo.com/43800150


1  Metrics 2.0 concepts

2  Implementation

3  Advanced stuff

   

“Dieter” ?

   

Peter? Deter?

   

Terminology sync

   

(1234567890, 82)

(1234567900, 123)

(1234567910, 109)

(1234567920, 77)

db15.mysql.queries_running

host=db15 mysql.queries_running
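The two identifiers above illustrate the core shift: from a dotted string whose positions carry implicit meaning, to named tags you can filter on. A small hypothetical sketch (my own names, not any tool's data model):

```python
# Hypothetical sketch: the same series identified two ways.
legacy_key = "db15.mysql.queries_running"  # position encodes meaning implicitly

# Tag-based: every dimension is named explicitly.
tagged = {"host": "db15", "service": "mysql", "what": "queries_running"}

def matches(metric, **filters):
    """Return True if the metric carries every requested tag value."""
    return all(metric.get(k) == v for k, v in filters.items())

# "All metrics on host db15" becomes a filter instead of string parsing.
print(matches(tagged, host="db15"))   # True
print(matches(tagged, host="db16"))   # False
```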

   

   

How many page requests/s is vimeo.com doing?

   

● stats.hits.vimeo_com

● stats_counts.hits.vimeo_com

   

   

stats.<host>.requesthostport.vimeo_com_443

   

stats.timers.dfs5.proxy-server.object.GET.200.timing.upper_90

   

O(X * Y * Z)

X = # apps

Y = # people

Z = # aggregators

   

How long does it take to retrieve an object from swift?

   

stats.timers.<host>.proxy-server.<swift_type>.<http_method>.<http_code>.timing.<stat>

stats.timers.<host>.object-server.<http_method>.timing.<stat>

target=stats.timers.dfs*.object*GET*timing.mean ?

target=groupByNode(stats.timers.dfs*.proxy-server.object.GET.*.timing.mean,2,"avg")

target=stats.timers.dfs*.object-server.GET.timing.mean

   

swift_type=object stat=mean timing GET avg by http_code


O((D * V)^2)

D = # dimensions

V = # values per dim

   

collectd.db.disk.sda1.disk_time.write


What should I name my metric?

   

101001000

100001000001000000

   

   

Metrics 2.0

   

Old:
● information lacking
● fields unclear & inconsistent
● cumbersome strings / trees
● forbidden characters

New:
● Self-describing
● Standardized
● all dimensions in orthogonal tag-space
● Allow some useful characters

   

stats.timers.dfs5.proxy-server.object.GET.200.timing.upper_90

{
    “server”: “dfvimeodfsproxy5”,
    “http_method”: “GET”,
    “http_code”: “200”,
    “unit”: “ms”,
    “target_type”: “gauge”,
    “stat”: “upper_90”,
    “swift_type”: “object”,
    “plugin”: “swift_proxy_server”
}

   

Main advantages:
● Immediate understanding of metric meaning (ideally)

● Minimize time to graphs, dashboards, alerting rules 

   

github.com/vimeo/graph-explorer/wiki

   

SI + IEC

B   Err   Warn   Conn   Job   File   Req    ...

MB/s   Err/d   Req/h   ...

   

{

    “site”: “vimeo.com”,

    “port”: 80,

    “unit”: “Req/s”,

    “direction”: “in”,

    “service”: “webapp_php”,

    “server”:  “webxx”

}

   

   

Carbon-tagger:

... service=foo.instance=host.target_type=gauge.type=calculation.unit=B 123 1234567890

Statsdaemon:

..unit=B..unit=B... → unit=B/s

..unit=ms..unit=ms.. → unit=ms stat=mean
                     → unit=ms stat=upper_90
                     → ...
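The carbon-tagger line embeds tags in the metric name as dot-separated key=value pairs. A minimal parser sketch (a hypothetical helper, not carbon-tagger's actual code; untagged nodes get positional n1, n2, ... keys, as Graph-Explorer does for legacy metrics):

```python
def parse_tagged_line(line):
    """Parse a carbon protocol line whose metric name embeds key=value tags.
    Sketch only; real carbon-tagger does validation and enforcement too."""
    name, value, timestamp = line.rsplit(" ", 2)
    tags, n = {}, 0
    for part in name.split("."):
        if "=" in part:
            key, val = part.split("=", 1)
            tags[key] = val
        else:
            n += 1                        # untagged node: positional key
            tags["n%d" % n] = part
    return tags, float(value), int(timestamp)

line = "service=foo.instance=host.target_type=gauge.type=calculation.unit=B 123 1234567890"
tags, value, ts = parse_tagged_line(line)
print(tags["unit"], value, ts)  # B 123.0 1234567890
```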

   

   

   

Graph-Explorer queries 101

site:api.vimeo.com unit=Req/s

requesthostport api_vimeo_com

   

   

Smoothing

avg over 10M

avg over ...

   

   

Aggregation, compare port 80 vs 443

avg by <dimension>

sum by <dimension>

sum by server

   

   

Compare port 80 traffic among servers

site:api.vimeo.com unit=Req/s port=80 group by none avg over 10M

   

   

Graph-Explorer queries 201

proxy-server swift server:regex upper_90 unit=ms from <datetime> to <datetime> avg over <timespec>


Compare object put/get

Stack .. http_method:(PUT|GET) swift_type=object avg by http_code,server

   

   

Comparing servers

http_method:(PUT|GET) avg by http_code,swift_type,http_method group by none

   

   

Compare http codes for GET, per swift type

http_method=GET avg by server group by swift_type

   

   

transcode unit=Job/s avg over <time> from <datetime> to <datetime>

    Note: data is obfuscated

   

Bucketing

!queue sum by zone:ap-southeast|eu-west|us-east|us-west|sa-east|vimeo-df|vimeo-lv group by state

    Note: data is obfuscated

   

Compare job states per region (zones bucket)

group by zone

    Note: data is obfuscated

   

Unit conversion

unit=Mb/s network dfvimeorpc sum by server

   

   

   

unit=MB
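Because units are standardized (SI prefixes on a base unit), conversions like the Mb/s and MB examples above reduce to prefix arithmetic. A sketch of such a helper (my own, not Graph-Explorer's implementation; SI prefixes only):

```python
# SI prefix multipliers; a sketch, not Graph-Explorer's code.
PREFIX = {"": 1, "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}

def convert(value, src, dst):
    """Convert between units that differ only by SI prefix, e.g. B -> GB."""
    def split(unit):
        if len(unit) > 1 and unit[0] in PREFIX:
            return unit[0], unit[1:]
        return "", unit
    src_prefix, src_base = split(src)
    dst_prefix, dst_base = split(dst)
    assert src_base == dst_base, "base units must match"
    return value * PREFIX[src_prefix] / PREFIX[dst_prefix]

print(convert(2_500_000_000, "B", "GB"))  # 2.5
```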

   

   

   

{

    server=dfvimeodfs1

    plugin=diskspace

    mountpoint=_srv_node_dfs5

    unit=B

    type=used

    target_type=gauge

}

   

server:dfvimeodfs unit=GB type=free srv node

   

   

unit=GB/d group by mountpoint


Dashboard definition

 queries = [

   'cpu usage sum by core',

   'mem unit=B !total group by type:swap',

   'stack network unit=b/s',

   'unit=B (free|used) group by =mountpoint'

 ]

   

   

stats.dfvimeocliapp2.twitter.error

{

    “n1”: “dfvimeocliapp2”,

    “n2”: “twitter”,

    “n3”: “error”,

    “plugin”: “catchall_statsd”,

    “source”: “statsd”,

    “target_type”: “rate”,

    “unit”: “unknown/s”

}

   

Two hard things in computer science

   

stats.gauges.files.id_boundary_7day

stats.gauges.files.id_boundary_ceil

   

unit=File id_boundary_7d 

{

   “unit”: “File”,

   “n1”: “id_boundary_7d”,

}

   

{

    “intrinsic”: {

        “site”: “vimeo.com”,

        “unit”: “Req/s”

    },

    “extrinsic”: {

        “agent”: “diamond”,

        “processed_by”: “statsd1”,

        “src”: “index.php:135”,

        “replaces”: “vimeo_com_reqps”

    }

}
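One way to read this split: only the intrinsic tags define the metric's identity, so extrinsic metadata can change without creating a new series. A hypothetical helper illustrating that:

```python
# Sketch: intrinsic tags define identity; extrinsic tags are just metadata.
metric = {
    "intrinsic": {"site": "vimeo.com", "unit": "Req/s"},
    "extrinsic": {"agent": "diamond", "processed_by": "statsd1"},
}

def metric_id(m):
    """Canonical, order-independent key built from intrinsic tags only."""
    return " ".join("%s=%s" % kv for kv in sorted(m["intrinsic"].items()))

print(metric_id(metric))  # site=vimeo.com unit=Req/s
```

Changing an extrinsic tag (say, processed_by) leaves the identity untouched, so history stays attached to the same series.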

   

site=vimeo.com unit=Req/s \
  processed_by=statsd1 \
  src=index.php:135 added_by=dieter \
123 1234567890

   

   

Equivalence

servers.host.cpu.total.iowait → “core”: “_sum_”

servers.host.cpu.<core-number>.iowait

servers.host.loadavg.15

   

Rollups & aggregation

   

/etc/carbon/storage-aggregation.conf

[min]

pattern = \.min$

aggregationMethod = min

[max]

pattern = \.max$

aggregationMethod = max

[sum]

pattern = \.count$

aggregationMethod = sum

[default_average]

pattern = .*

aggregationMethod = average
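This config is one reason the target_type tag matters: a count must roll up with sum, not average. A tiny sketch with made-up numbers:

```python
# Sketch with made-up numbers: rolling up six 10-second count buckets
# into one 60-second bucket.
counts = [5, 0, 12, 3, 7, 9]   # hits per 10s interval

avg_rollup = sum(counts) / len(counts)   # loses the total; wrong for a count
sum_rollup = sum(counts)                 # total hits over the minute; correct

# Graphite picks the method per metric by pattern-matching
# storage-aggregation.conf, hence .count -> aggregationMethod = sum.
print(avg_rollup, sum_rollup)
```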

   

   

2 kinds of graphite users

   

Self-describing metrics

stat=upper/lower/mean/...
target_type=counter
...

   

● stats.timers.render_time.histogram.bin_0.01
● stats.timers.render_time.histogram.bin_0.1
● stats.timers.render_time.histogram.bin_1 → unit=Freq_abs bin_upper=1
● stats.timers.render_time.histogram.bin_10
● stats.timers.render_time.histogram.bin_50
● stats.timers.render_time.histogram.bin_inf
● stats.timers.render_time.lower → unit=ms stat=lower
● stats.timers.render_time.mean → unit=ms stat=mean
● stats.timers.render_time.mean_90 → ...
● stats.timers.render_time.median
● stats.timers.render_time.std
● stats.timers.render_time.upper
● stats.timers.render_time.upper_90

   

Also..

● graphite API functions such as "cumulative", "summarize" and "smartSummarize"

● Graph renderers

   

   From: dygraphs.com

   


Facet based suggestions

   

   

Metric types

● gauge
● count & rate
● counter
● timer


gauge

● Multiple values in same interval
● “sticky”

   

   

Count & Rate

   

Counter
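A counter only ever increases; consumers derive a rate by differencing consecutive samples. A minimal sketch (hypothetical helper; it ignores counter resets and wraps, which real tools must handle):

```python
def counter_to_rate(samples):
    """Derive per-second rates from an ever-increasing counter.

    samples: list of (timestamp, value) pairs, sorted by time.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates

samples = [(1234567890, 1000), (1234567900, 1500), (1234567910, 1800)]
print(counter_to_rate(samples))  # [(1234567900, 50.0), (1234567910, 30.0)]
```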

   

Timer..

   

http://janabeck.com/blog/2012/10/12/lessons-learned-from-100/

   

Timer..

   

● What should a metric be?
● Stickiness?
● Behavior when no packets received
● Behavior when multiple packets received

   

My personal takeaways

   

Conclusion
● Building graphs, setting up alerting: cumbersome
● Esp. changing information needs (troubleshooting, exploring, ..)
● Esp. complicated information needs
  → PAIN

● Structuring metrics
● Self-describing metrics
● Standardized metrics
● Native metrics 2.0
  → BREEZE

   

Conclusion

● Metrics can be so much more usable and useful. Let's talk about tagging, standardisation, retaining information throughout the pipeline.

● Converting information needs into graph defs, alerting rules
● Graph-Explorer, carbon-tagger, statsdaemon, …
● Graphite-ng (native metrics 2.0)
● Metrics 2.0 in your apps, agents, aggregators?
● Build out structured metrics library

   

github.com/vimeo

github.com/Dieterbe

twitter.com/Dieter_be

dieter.plaetinck.be