OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

28
From Ceilometer to Telemetry Not so alarming! A Julien Danjou & Nick Barcet presentation for OpenStack in action! 4 on the 5th December 2013

description

Paris, 5th December 2013 : OpenStack in Action 4! organized by eNovance, brings together members of the OpenStack community.

Transcript of OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Page 1: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

From Ceilometer to Telemetry Not so alarming! A Julien Danjou & Nick Barcet presentation

for

OpenStack in action! 4 on the 5th December 2013

Page 2: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Speakers

Nick Barcet VP Products @ eNovance Co-founded the Ceilometer project at the Folsom summit and led the project through incubation

Julien Danjou Ceilometer Lead Dev @ eNovance

Has been a core Ceilometer contributor from the outset, taking over the PTL reins for Havana

Page 3: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

State of the project •  Officially named OpenStack Telemetry •  Havana is the first integrated release •  Community growth

o  Grizzly: 30 contributors, 267 commits o  Havana: 57 contributors, 434 commits

Page 4: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

What was done during the Havana cycle?

Page 5: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

UDP transport •  Faster, stateless •  Lighter (msgpack encoding)

but…

•  No delivery guaranteed •  Not signed

▶ Use case: gathering metrics for alarms

Page 6: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Improved API •  Group samples by fields when requesting

statistics (?groupby[]=user_id) •  Limit the number of items returned (?limit=42) •  Provides links to other resources in the API

Page 7: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Send your own samples Users or operators can send samples

➔ Leverage the

statistics ➔ Usable for alarming

POST  /v2/meters/mymeter    [{      "counter_type":  "gauge",      "counter_unit":  "megabyte",      "counter_volume":  142.0,      "user_id":  "efd87807-­‐12d2-­‐4b38-­‐9c70-­‐5f5c2ac427ff",      "project_id":  "35b17138-­‐b364-­‐4e6a-­‐a131-­‐8f3099c5be68",      "resource_id":  "bd9431c1-­‐8d69-­‐4ad3-­‐803a-­‐8d4a6b89fd36",      "resource_metadata":  {              "name1":  "value1",              "name2":  "value2"      },      "source":  "mypaasplatform",      "timestamp":  "2013-­‐09-­‐10T20:34:13.711330"  }]  

Page 8: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

New storage backends

Page 9: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Database TTL Previously:

No way to purge data. Ceilometer produces a lot of data

(gigabytes per day)

Now: ceilometer-expirer will drop data older than the configured time-to-live delay

Page 10: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Hyper-V

➔  Disk, network and CPU usage

Page 11: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

New meters •  API endpoints

o  Meters the requests made to API server (Neutron, Glance, Nova, Swift, etc)

•  Neutron bandwidth o  Meter the bandwidth consumed by each project o  Traffic labeled as configured by operator

(based on source/destination)

Page 12: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Neutron Traffic Labels

Internet

label: Ext label: Object label: Compute

Swift Swift Swift VM VM VM

Page 13: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Alarms

Regularly watch for meters statistics values and triggers actions based on threshold crossings.

Page 14: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Alarms architecture

Ceilometer API

Ceilometer alarm evaluator

Ceilometer alarm notifier

H T T P

RPC Bu s

Trigger Trigger Ceilometer

alarm notifier Ceilometer alarm notifier

Webhook, SMS, e-mail…

Page 15: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Alarm types •  Threshold alarmsTriggered once a value crosses a

threshold“Call a Webhook as soon as CPU usage goes above 80%”

•  Combination alarmsTriggered once all alarms in that alarm are triggered“Call a Webhook as soon as alarm “foo” and alarm “bar” are triggered”

Page 16: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Alarms API POST /v2/alarms { "alarm_actions": [ "http://site:8000/alarm"], "insufficient_data_actions": ["http://site:8000/nodata"], "ok_actions": ["http://site:8000/ok"], "comparison_operator": "gt", "description": "An alarm", "evaluation_periods": 2, "matching_metadata": {"key_name": "key_value"}, "meter_name": "storage.objects", "name": "SwiftObjectAlarm", "period": 240, "statistic": "avg", "threshold": 200.0 }

GET /v2/alarms/foobar PUT /v2/alarms/foobar DELETE /v2/alarms/foobar

Page 17: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Heat & auto-scaling

Heat Engine

injects user metadata

Instance

my_stack

API service

Compute Agent

creates alarms

Alarm evaluator

monitors instances

triggers alarm

Ceilom

eter

Page 18: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Heat & auto-scaling

Heat Engine

injects user metadata

my_stack

API

Compute

Alarms

alarming

scales out stack

Instance Instance Instance

Ceilom

eter

Page 19: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Heat & auto-scaling

Heat Engine

injects user metadata

my_stack

API

Compute

Alarms

alarming

scales out stack

Instance Instance Instance Instance Instance

Ceilom

eter

Page 20: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Events storage (Almost) all OpenStack components send notifications on events: let’s store them.

➔  Useful to be able to re-generate samples ➔  Useful to generate new sample we did not think about ➔  Allow to have a double-entry accounting ➔  Audit ability Not yet complete, to be continued in Icehouse

Page 21: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Exciting ideas for Icehouse we’re going to

hack on.

Page 22: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

General improvements •  Split the collector in two logical pieces •  Rely on notification for samples rather than

RPC •  Bring SQLAlchemy and MongoDB driver

almost on parity •  Support for hardware polling •  Support Ironic

Page 23: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

API improvements •  Complex filtering and query DSLx  OR  y  AND  z  

•  /v2/samples(a.k.a. /v2/meter without the meter)

•  Return rate rather than absolute value •  More statistics functions (rate of change,

moving-window averages…) •  Bulk requests

Page 24: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Alarming •  Exclude low sample counts •  Allow time constrained alarms

Page 25: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Distributed polling Leveraging Tooz and Taskflow to distribute tasks among workers (agents).

★ Ability to distribute the polling ★ Replace alarm evaluator custom distributor

Page 26: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

The end.

OpenStack Telemetry

#openstack-ceilometer @ Freenode

Ceilometer

Page 27: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Backup slides

Page 28: OpenStack in Action 4! Nick Barcet & Julien Danjou - From ceilometer to telemetry not so alarming!

Heat & auto-scaling

Heat Engine

my_stack

Instance

API service

Compute Agent

Alarm evaluator reports

samples

provides alarm rules

queries stats Meter store

Ceilom

eter