OpenStack Log Mining

Presentation from the OpenStack Summit 2014 in Atlanta.

Page 1: OpenStack Log Mining

Accelerating adoption of Open Infrastructure

May 2014

Log Management and Mining

Page 2: OpenStack Log Mining

Copyright 2014 Solinea, Inc.

Logging has a Long History…

Photo credit: The Forest History Society (http://www.flickr.com/photos/foresthistory/3662397221/) via photopin (http://photopin.com), CC BY-NC 2.0 (http://creativecommons.org/licenses/by-nc/2.0/)

Page 3: OpenStack Log Mining

In Multiple Domains

Page 4: OpenStack Log Mining

Like Many Things, It Has Evolved…

Photo credit: Richard Hurd (http://www.flickr.com/photos/rahimageworks/9196119199/) via photopin (http://photopin.com), CC BY 2.0 (http://creativecommons.org/licenses/by/2.0/)

Photo credit: Richard Hurd (http://www.flickr.com/photos/rahimageworks/9179873919/) via photopin (http://photopin.com), CC BY 2.0 (http://creativecommons.org/licenses/by/2.0/)

Page 5: OpenStack Log Mining

Here Too…

Page 6: OpenStack Log Mining

Complexity Reigns in Cloud

Page 7: OpenStack Log Mining

BEEF

Nova

Cinder

Etc.

rsyslog -> tcp:5514 -> logstash -> tcp:9200 -> elasticsearch

verbose = True
use_syslog = True
syslog_log_facility = LOG_LOCAL{n}

local{n}.* @@logstash:5514

Page 8: OpenStack Log Mining

Standards are Elusive

- We have a couple standards that might apply:
  - RFC5424 (The Syslog Protocol)
  - NCSA/Apache CLF (Web servers)
- Project adoption varies, but right trajectory
- Some duplication of fields with rsyslog
  - When shipping remotely
- Don’t get me started on timestamps!

Page 9: OpenStack Log Mining

Anatomy of an OpenStack Message

- Most projects use a similar format
  - Date: 2014-05-02 14:10:57.278
  - PID: 3609
  - Level: INFO
  - Prog: oslo.messaging._drivers.impl_qpid
  - ID: [-]
  - Msg: Connected to AMQP …
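
The field layout above can be sketched as a regular expression. This is a rough Python approximation for illustration, not the grok pattern used later in the deck; the name `parse_openstack_line` is hypothetical, and the sample message text stops where the slide truncates it.

```python
import re

# Approximates the layout on the slide:
# date, PID, level, program, bracketed request ID list, message.
OPENSTACK_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) "
    r"(?P<loglevel>[A-Z]+) "
    r"(?P<program>[\w.\-]+) "
    r"\[(?P<request_id>[^\]]*)\] "
    r"(?P<msg>.*)$"
)


def parse_openstack_line(line):
    """Return the fields as a dict, or None if the line does not match."""
    match = OPENSTACK_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields


# Sample assembled from the field values shown on the slide.
sample = ("2014-05-02 14:10:57.278 3609 INFO "
          "oslo.messaging._drivers.impl_qpid [-] Connected to AMQP")
fields = parse_openstack_line(sample)
print(fields)
```

Lines that do not follow this layout (stack traces, for instance) return None here and need separate handling.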

Page 10: OpenStack Log Mining

use_syslog = True

- Existing syslog format is DEPRECATED during I (Icehouse), and then will be changed in J (Juno) to honor RFC5424
  - <132>
  - May 15 12:28:57
  - compute-01
  - 2014-05-15 12:28:57.767
  - 20739 WARNING nova.openstack.common.loopingcall
  - [-]
  - task run outlasted interval by 110.003069 sec

Note 1: standard rsyslog config on CentOS 6.5 with remote shipping to central server
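
The leading <132> is the syslog PRI value, which encodes facility * 8 + severity. A minimal Python sketch of the decoding follows; the function name `decode_pri` is hypothetical, and the severity names are the standard syslog ones written in upper case.

```python
# Syslog severity names, indexed by severity code 0..7.
SEVERITIES = ["EMERGENCY", "ALERT", "CRITICAL", "ERROR",
              "WARNING", "NOTICE", "INFO", "DEBUG"]


def decode_pri(pri):
    """Split a syslog PRI value into (facility, severity) labels."""
    if not 0 <= pri <= 191:
        raise ValueError("PRI must be between 0 and 191")
    facility_code, severity_code = divmod(pri, 8)
    if 16 <= facility_code <= 23:
        # Facility codes 16..23 are local0..local7.
        facility = "local%d" % (facility_code - 16)
    else:
        facility = "facility%d" % facility_code
    return facility, SEVERITIES[severity_code]


print(decode_pri(132))
```

For 132 this gives facility local0 and severity WARNING, which agrees with the WARNING level inside the OpenStack message itself.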

Page 11: OpenStack Log Mining

use_syslog_rfc_format = True

- Adds APP-NAME before message
- Nice idea, but…
- Appears incompatible with use_syslog = True
  - Nova-compute fails to launch when both set
- With use_syslog = False
  - Messages in /var/log/nova/compute.log look the same
- Could be environmental, needs more exploration

Page 12: OpenStack Log Mining

Shipping via rsyslog

- rsyslog.conf global settings change:
  - $ActionFileDefaultTemplate RSYSLOG_FileFormat
  - $ActionForwardDefaultTemplate RSYSLOG_ForwardFormat
- Effect:
  - <134>
  - 2014-05-15T13:37:11.138121+00:00
  - controller-01
  - 2014-05-15 13:37:11.137 3412 INFO nova.openstack.common.service [-] Caught SIGTERM, stopping children
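
With this template the syslog header carries a timestamp that includes a UTC offset. A rough Python sketch of splitting such a line follows. It assumes the fields arrive joined by single spaces with no space after the PRI, as the slide suggests; `split_forwarded` is a hypothetical helper name.

```python
import re
from datetime import datetime

# <PRI>, timestamp with offset, host, then the rest of the line.
FORWARDED = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<ts>\S+) (?P<host>\S+) (?P<rest>.*)$"
)


def split_forwarded(line):
    """Split a forwarded line into header fields and the remainder."""
    match = FORWARDED.match(line)
    if match is None:
        return None
    return {
        "pri": int(match.group("pri")),
        # fromisoformat keeps the +00:00 offset (Python 3.7+).
        "received": datetime.fromisoformat(match.group("ts")),
        "host": match.group("host"),
        "rest": match.group("rest"),
    }


line = ("<134>2014-05-15T13:37:11.138121+00:00 controller-01 "
        "2014-05-15 13:37:11.137 3412 INFO nova.openstack.common.service "
        "[-] Caught SIGTERM, stopping children")
result = split_forwarded(line)
print(result["received"].isoformat(), result["host"])
```

The timestamp embedded in the OpenStack message (the start of `rest`) still has no offset, which is the timestamp problem the deck complains about.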

Page 13: OpenStack Log Mining

Shipping via rsyslog (conf.d)

- rsyslog.d/10-goldstone.conf file:

    $WorkDirectory /var/lib/rsyslog      # where to place spool files
    $ActionQueueFileName fwdGoldstone    # unique name prefix for spool files
    $ActionQueueMaxDiskSpace 1g          # 1gb space limit (use as much as possible)
    $ActionQueueSaveOnShutdown on        # save messages to disk on shutdown
    $ActionQueueType LinkedList          # run asynchronously
    $ActionResumeRetryCount -1           # infinite retries if host is down
    local0.* @@10.10.11.122:5514         # nova
    local1.* @@10.10.11.122:5514         # glance
    local2.* @@10.10.11.122:5514         # neutron
    local3.* @@10.10.11.122:5514         # ceilometer
    local4.* @@10.10.11.122:5514         # swift
    local5.* @@10.10.11.122:5514         # cinder
    local6.* @@10.10.11.122:5514         # keystone

Page 14: OpenStack Log Mining

Receiving via Logstash (Input)

    input {
      tcp {
        port => 5514    # matches port that rsyslog ships to
        type => syslog  # insert a type field to identify this as an incoming message from syslog
      }
    }

Page 15: OpenStack Log Mining

Receiving via Logstash (Output)

    output {
      elasticsearch {
        host => localhost
        port => 9200
        protocol => http
      }
    }

Page 16: OpenStack Log Mining

Receiving via Logstash (Patterns)

OPENSTACK_PROG (?:[ a-zA-Z0-9_\-]+\.)+[ A-Za-z0-9_\-$]+
OPENSTACK_PROG_SINGLE [A-Za-z0-9_\-$]+
OPENSTACK_SOURCE %{OPENSTACK_PROG}|%{OPENSTACK_PROG_SINGLE}
OPENSTACK_REQ_LIST (\[(?:(req-%{UUID}|%{UUID}|%{BASE16NUM}|None|-|%{SPACE}))+\])?
OPENSTACK_PID ( %{POSINT:pid:int})?
OPENSTACK_LOGLEVEL ([D|d]ebug|DEBUG|[N|n]otice|NOTICE|[I|i]nfo|INFO|[W|w]arn?(?:ing)?|WARN?(?:ING)?|[E|e]rr?(?:or)?|ERR?(?:OR)?|[C|c]rit?(?:ical)?|CRIT?(?:ICAL)?|[F|f]atal|FATAL|[S|s]evere|SEVERE|[A|a]udit|AUDIT)
OPENSTACK_NORMAL %{TIMESTAMP_ISO8601:timestamp}%{OPENSTACK_PID} %{OPENSTACK_LOGLEVEL:loglevel} %{OPENSTACK_SOURCE:program} %{OPENSTACK_REQ_LIST:request_id_list} %{GREEDYDATA:msg}
RAW_TRACE (?:^[^0-9].*$|^$)
OPENSTACK_TRACE %{TIMESTAMP_ISO8601:timestamp} %{POSINT:pid:int} ([T|t]race|TRACE) %{OPENSTACK_SOURCE:program} %{GREEDYDATA:msg}|%{RAW_TRACE:msg}
OPENSTACK_MESSAGE %{OPENSTACK_NORMAL}|%{OPENSTACK_TRACE}
OPENSTACK_SYSLOGLINE %{SYSLOG5424PRINUM}%{CISCOTIMESTAMP:syslog_ts} %{HOSTNAME:syslog5424_host} %{OPENSTACK_MESSAGE:os_message}

Page 17: OpenStack Log Mining

Receiving via Logstash (Filter Fun)

    filter {
      if ([type] == "syslog") {
        grok {
          patterns_dir => "/opt/logstash/patterns"
          match => { "message" => "%{OPENSTACK_SYSLOGLINE}" }
          add_field => { "received_at" => "%{@timestamp}" }
          add_field => { "_message" => "%{syslog5424_host} %{message}" }
        }
        if ("_grokparsefailure" not in [tags]) {
          … see following slides …
        }
      }
    }

Page 18: OpenStack Log Mining

Receiving via Logstash (Filter Fun)

    syslog_pri {
      severity_labels => ["EMERGENCY", "ALERT", "CRITICAL", "ERROR", "WARNING", "NOTICE", "INFO", "DEBUG"]
      syslog_pri_field_name => "syslog5424_pri"
    }
    date {
      match => [ "timestamp", "yyyy-MM-dd HH:mm:ss.SSS" ]
      remove_field => "timestamp"
      timezone => "Etc/UTC"
    }
    …

Note 1: syslog_pri parses that ugly number at the front of the incoming message (e.g. <132>).
Note 2: This date processing is based on the timestamp in the OpenStack-generated message, not the rsyslog message. With an enhanced rsyslog template, or a better OpenStack message format, we can avoid inferring the timezone.
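
To make the timezone inference in Note 2 concrete, here is a small Python sketch of the same idea: parse the offset-less OpenStack timestamp and label it as UTC. That the host clock really logs in UTC is an assumption, just as it is in the date filter above; the function name is hypothetical.

```python
from datetime import datetime, timezone


def openstack_timestamp_to_utc(text):
    """Parse 'yyyy-MM-dd HH:mm:ss.SSS' and label it as UTC.

    The source string carries no offset, so UTC is assumed here,
    not detected.
    """
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=timezone.utc)


event_time = openstack_timestamp_to_utc("2014-05-15 12:28:57.767")
print(event_time.isoformat())
```

If a node logs in local time instead, every event from it is shifted by its offset, which is why a timestamp that states its own offset is preferable.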

Page 19: OpenStack Log Mining

Receiving via Logstash (Filter Fun)

    translate {
      field => "syslog_facility"
      dictionary => [
        "local0", "nova",
        "local1", "glance",
        "local2", "neutron",
        "local3", "ceilometer",
        "local4", "swift",
        "local5", "cinder",
        "local6", "keystone"
      ]
      fallback => "unknown"
      destination => "component"
    }
    …

Note 1: syslog_facility is generated by syslog_pri earlier. This adds a new component field so we can figure out who generated these messages.
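
The translate step is a dictionary lookup with a default. A Python equivalent, using the same facility assignments as the rsyslog conf.d file, might look like this (the function name `component_for` is hypothetical):

```python
# Facility-to-component mapping, as assigned in the rsyslog config.
FACILITY_TO_COMPONENT = {
    "local0": "nova",
    "local1": "glance",
    "local2": "neutron",
    "local3": "ceilometer",
    "local4": "swift",
    "local5": "cinder",
    "local6": "keystone",
}


def component_for(facility):
    """Return the OpenStack component for a facility, or 'unknown'."""
    return FACILITY_TO_COMPONENT.get(facility, "unknown")


print(component_for("local5"), component_for("local7"))
```

The mapping only holds if every service is configured with the matching syslog_log_facility value, so the two files have to be kept in step.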

Page 20: OpenStack Log Mining

Receiving via Logstash (Filter Fun)

    mutate {
      rename => [ "msg", "message" ]
      rename => [ "syslog5424_host", "host" ]
      remove_field => "syslog_ts"
      remove_field => "syslog5424_pri"
      remove_field => "os_message"
      add_tag => ["processed", "openstack_syslog", "filter_34"]
    }

Note 1: We made it to the end of the filter successfully, so let’s clean up a little and add some tags to indicate how we navigated the filter space.

Page 21: OpenStack Log Mining

Result in ES:

Photo credit: Robbert van der Steeg (http://www.flickr.com/photos/robbie73/4346732208/) via photopin (http://photopin.com), CC BY-SA 2.0 (http://creativecommons.org/licenses/by-sa/2.0/)

Page 22: OpenStack Log Mining

Interpreting Specific Messages (Patterns)

NOVA_API_CALL %{IP:ip} "(?:GET|PUT|POST|DELETE) %{URIPATH:uri} %{NOTSPACE:protocol}" status: %{NUMBER:response_status:int} len: %{NUMBER:response_length:int} time: %{NUMBER:response_time:float}
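
For readers less familiar with grok, the pattern above can be approximated with a plain regular expression. The Python sketch below is looser than the grok primitives (the IP and URI checks are simplified), and the sample message is hypothetical, made up only to exercise the pattern.

```python
import re

# Rough stand-in for NOVA_API_CALL; simpler than grok's IP/URIPATH.
NOVA_API_CALL = re.compile(
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) '
    r'"(?:GET|PUT|POST|DELETE) (?P<uri>\S+) (?P<protocol>\S+)" '
    r'status: (?P<response_status>\d+) '
    r'len: (?P<response_length>\d+) '
    r'time: (?P<response_time>\d+(?:\.\d+)?)'
)


def parse_api_call(msg):
    """Extract API call metrics from a message, or return None."""
    match = NOVA_API_CALL.search(msg)
    if match is None:
        return None
    fields = match.groupdict()
    # Same type conversions as the :int and :float grok suffixes.
    fields["response_status"] = int(fields["response_status"])
    fields["response_length"] = int(fields["response_length"])
    fields["response_time"] = float(fields["response_time"])
    return fields


# Hypothetical message, not taken from a real log.
sample = ('10.10.11.50 "GET /v2/servers/detail HTTP/1.1" '
          'status: 200 len: 1893 time: 0.1234567')
call = parse_api_call(sample)
print(call)
```

Typed fields matter here because the later queries run numeric range and statistics aggregations over response_status and response_time.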

Page 23: OpenStack Log Mining

Interpreting Specific Messages

    if ("_grokparsefailure" not in [tags]) {
      # clean up extra fields and tag us
      mutate {
        replace => [ "type", "openstack_api_stats" ]
        remove_field => "pid"
        remove_field => "hostname"
        remove_field => "message"
        remove_field => "_message"
        remove_field => "loglevel"
        remove_field => "syslog_severity_code"
        remove_field => "syslog_facility_code"
        remove_field => "syslog_facility"
        remove_field => "syslog_severity"
        add_tag => ["metric", "filter_37"]
      }
    }

Note 1: Processed after successful OpenStack message filtering. We know the lineage, so we don’t need to keep a bunch of redundant information.

Page 24: OpenStack Log Mining

Result in ES:

Photo credit: Www.CourtneyCarmody.com/ (http://www.flickr.com/photos/calamity_photography/4778766879/) via photopin (http://photopin.com), CC BY 2.0 (http://creativecommons.org/licenses/by/2.0/)

Page 25: OpenStack Log Mining

Querying ES for Logs

    {
      "query": {
        "bool": {
          "must": [
            {"range": {"@timestamp": {"gte": "2014-05-08T16:31:07+00:00", "lte": "2014-05-15T16:31:07+00:00"}}},
            {"terms": {"type": ["openstack_log"]}}
          ]
        }
      },
      "aggs": {
        "events_by_time": {
          "date_histogram": {"field": "@timestamp", "interval": "5448.648648648648s", "min_doc_count": 0},
          "aggs": {
            "events_by_loglevel": {"terms": {"field": "loglevel"}}
          }
        }
      }
    }

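A query body like this is usually generated rather than typed by hand. The Python sketch below builds the same structure from a time window and a bucket count. The interval on the slide is numerically the seven-day window divided by 111; whether the original tool computed it that way is an assumption, and both the `buckets` parameter and the function name are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def build_log_histogram_query(start, end, buckets, doc_type="openstack_log"):
    """Build a log-count histogram query split by log level."""
    # Spread the window evenly over the requested number of buckets.
    interval_seconds = (end - start).total_seconds() / buckets
    return {
        "query": {"bool": {"must": [
            {"range": {"@timestamp": {"gte": start.isoformat(),
                                      "lte": end.isoformat()}}},
            {"terms": {"type": [doc_type]}},
        ]}},
        "aggs": {"events_by_time": {
            "date_histogram": {"field": "@timestamp",
                               "interval": "%ss" % interval_seconds,
                               "min_doc_count": 0},
            "aggs": {"events_by_loglevel": {"terms": {"field": "loglevel"}}},
        }},
    }


end = datetime(2014, 5, 15, 16, 31, 7, tzinfo=timezone.utc)
start = end - timedelta(days=7)
body = build_log_histogram_query(start, end, buckets=111)
print(json.dumps(body, indent=2))
```

Setting min_doc_count to 0 keeps empty buckets in the response, so a chart drawn from it has no gaps on the time axis.
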
Page 26: OpenStack Log Mining

Querying Nova API Stats

    {
      "query": {
        "filtered": {
          "filter": {"match_all": {}},
          "query": {"bool": {"must": [
            {"range": {"@timestamp": {"gte": "2014-04-15T16:45:53+00:00", "lte": "2014-05-15T16:45:53+00:00"}}},
            {"term": {"component": "nova"}}
          ]}}
        }
      },
      "aggs": {
        "events_by_date": {
          "date_histogram": {"field": "@timestamp", "interval": "32400s", "min_doc_count": 0},
          "aggs": {
            "range": {"range": {
              "ranges": [{"to": 299, "from": 200}, {"to": 399, "from": 300}, {"to": 499, "from": 400}, {"to": 599, "from": 500}],
              "field": "response_status",
              "keyed": true
            }},
            "stats": {"extended_stats": {"field": "response_time"}}
          }
        }
      }
    }
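
One detail worth knowing when reading the results: the Elasticsearch range aggregation includes the `from` value and excludes the `to` value. The Python sketch below mirrors that rule locally to show where a status code lands. The label format and the list of status codes are made up for illustration and are not what Elasticsearch returns.

```python
from collections import Counter

# Bounds copied from the query as (from, to) pairs.
SLIDE_RANGES = [(200, 299), (300, 399), (400, 499), (500, 599)]


def status_bucket(status, ranges=SLIDE_RANGES):
    """Return a 'from-to' label for the first range containing status.

    'from' is inclusive and 'to' is exclusive, as in the range
    aggregation. Returns None when no range matches.
    """
    for low, high in ranges:
        if low <= status < high:
            return "%d-%d" % (low, high)
    return None


# Hypothetical response codes, for illustration only.
statuses = [200, 202, 299, 404, 500]
counts = Counter(status_bucket(s) for s in statuses)
print(counts)
```

With these bounds a status of exactly 299, 399, 499 or 599 falls into no bucket. Such codes are rare in practice, but upper bounds of 300, 400, 500 and 600 would close the gap.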

Page 27: OpenStack Log Mining

Manage and Monitor OpenStack

http://gssr.jpl.nasa.gov/index.html

The Goldstone Deep Space Communications Complex (GDSCC), commonly called the Goldstone Observatory, is located in the U.S. state of California's Mojave Desert. Operated for the Jet Propulsion Laboratory, its main purpose is to track and communicate with space missions. It is named after Goldstone, California, a nearby gold-mining ghost town.

(Because everyone asks…)

Page 28: OpenStack Log Mining

Log Viewing, Filtering, and Searching

Page 29: OpenStack Log Mining

Key Metric Reporting

Page 30: OpenStack Log Mining

John Stanford, VP Development; [email protected]

Thank You

Page 31: OpenStack Log Mining
Page 32: OpenStack Log Mining

Solinea at a Glance

OVERVIEW

Industry Focus: Open Infrastructure, OpenStack
Headquarters: San Francisco, CA
Founded: January 2013
Geographies: Asia, USA

CUSTOMERS

TEAM

PAST DEPLOYMENTS

Major US Telco

US Infrastructure Service Provider

North Asian Telco

Global Electronics Manufacturer

Global Gaming Company

Top 5 Global Automotive Firm

Tier 1 Network Provider

Risk Management Analytics

Eastern Europe CSP

OpenStack Distro Provider

Security Analytics Firm

Leading SDN Provider

Page 33: OpenStack Log Mining

Solinea Services

Conceive, Architect, Integrate, Adopt

Conceive the cloud strategy for existing and new cloud services to drive customer adoption

Architect the cloud platforms based on market demand and internal capabilities

Implement the cloud offerings and integrate them into the existing infrastructure & processes

Operate the cloud, transfer knowledge, train the team and enable rapid adoption