Fluentd vs. Logstash for OpenStack Log Management

47
Fluentd vs. Logstash Masaki Matsushita NTT Communications

Transcript of Fluentd vs. Logstash for OpenStack Log Management

Page 1: Fluentd vs. Logstash for OpenStack Log Management

Fluentd vs. LogstashMasaki Matsushita

NTT Communications

Page 2: Fluentd vs. Logstash for OpenStack Log Management

About Me● Masaki MATSUSHITA● Software Engineer at

○ We are providing Internet access here!● Github: mmasaki Twitter: @_mmasaki● 16 Commits in Liberty

○ Trove, oslo_log, oslo_config● CRuby Commiter

○ 100+ commits for performance improvement

2

Page 3: Fluentd vs. Logstash for OpenStack Log Management

What are Log Collectors?● Provide pluggable and unified logging layer

Without Log Collectors With Log CollectorsImages from http://fluentd.org/ 3

Page 4: Fluentd vs. Logstash for OpenStack Log Management

Input, Filter and Output

4

Input Plugins

tail

syslog

Filter Plugins

grep

hostname

Output Plugins

InfluxDB

Elasticsearch

● They are implemented as plugins● Can be replaced easily

Log FIles

Components

Page 5: Fluentd vs. Logstash for OpenStack Log Management

Two Popular Log Collectors● Fluentd

○ Written in CRuby○ Used in Kubernetes○ Maintained by Treasure Data Inc.

● Logstash○ Written in JRuby○ Maintained by elastic.co

● They have similar features● Which one is better for you?

5

Page 6: Fluentd vs. Logstash for OpenStack Log Management

Agenda● Comparisons

○ Configuration○ Supported Plugins○ Performance○ Transport Protocol

● Integrate OpenStack with Fluentd/Logstash○ Considering High Availability

6

Page 7: Fluentd vs. Logstash for OpenStack Log Management

Configuration: Fluentd● Every inputs are tagged● Logs will be routed by tag

nova-api.log(tag: openstack.nova)

cinder-api.log(tag: openstack.cinder)

<match openstack.nova>

<match openstack.cinder>Filter/Route

7

Page 8: Fluentd vs. Logstash for OpenStack Log Management

Fluentd Configuration: Input

<source>

@type tail

path /var/log/nova/nova-api.log

tag openstack.nova

</source>

Example of tailing nova-api log

● Every inputs will be tagged

8

Page 9: Fluentd vs. Logstash for OpenStack Log Management

Fluentd Configuration: Output<match openstack.nova> # nova related logs

@type elasticsearch

host example.com

</match>

<match openstack.*> # all other OpenStack related logs

@type influxdb

# …</match>

Routed by tag(First match is priority)

Wildcards can be used9

Page 10: Fluentd vs. Logstash for OpenStack Log Management

Fluentd Configuration: Copy<match openstack.*>

@type copy

<store>

@type influxdb

</store>

<store>

@type elasticsearch

</store>

</match>

Copy plugin enables multipleoutputs for a tag

Copied Output

tag: openstack.*

10

Page 11: Fluentd vs. Logstash for OpenStack Log Management

Logstash Configuration● No tags● All inputs will be aggregated● Logs will be scattered to outputs

nova-api.log

cinder-api.logFilter/Aggregate

aggregated logs

11

Page 12: Fluentd vs. Logstash for OpenStack Log Management

Logstash Configurationinput {

file { path => “/var/log/nova/*.log” }

file { path => “/var/log/cinder/*.log” }

}

output {

elasticsearch { hosts => [“example.com”] }

influxdb { host => “example.com”... }

}12

Page 13: Fluentd vs. Logstash for OpenStack Log Management

Case 1: Separated Streams

Input1

Input2

Input3

Output2

Output3

Output1

● Handle multiple streams separately

13

Page 14: Fluentd vs. Logstash for OpenStack Log Management

Case 1: Separated StreamsFluentd: Simple matching by tag

<match input.input1> @type output1</match>

<match input.input2> @type output2</match>

<match input.input3> @type output3</match>

Logstash: Conditional Outputs

output { if [type] == “input1” { output1 {} } else if [type] == “input2” { output2 {} } else if [type] == “input3” { output3 {} }}

Need to split aggregated logs

14

Page 15: Fluentd vs. Logstash for OpenStack Log Management

Case 2: Aggregated Streams

Input1

Input2

Input3

Output2

Output3

Output1

● Streams will be aggregated and scattered

15

Page 16: Fluentd vs. Logstash for OpenStack Log Management

Case 2: Aggregated StreamsFluentd: Copy plugins is needed

<match input.*> @type copy <store> @type output1 </store> <store> @type output2 </store> <store> @type output3 </store></match>

Logstash: Quite simple

output {

output1 {}

output2 {}

output3 {}

}

16

Page 17: Fluentd vs. Logstash for OpenStack Log Management

Configuration● Fluentd

○ Routed by simple tag matching○ Suited to handle log streams separately

● Logstash○ Logs are aggregated○ Suited to handle logs in gather-scatter style

17

Page 18: Fluentd vs. Logstash for OpenStack Log Management

Plugins

● Both provide many plugins○ Fluentd: 300+, Logstash: 200+

● Popular plugins are bundled with Logstash○ They are maintained by the Logstash project

● Fluentd contains only minimal plugins○ Most plugins are maintained by individuals

● Plugins can be installed easily by one command18

Page 19: Fluentd vs. Logstash for OpenStack Log Management

Performance● Depends on circumstances● More than enough for OpenStack logs

○ Both can handle 10000+ logs/s● Applying heavy filters is not a good idea● CRuby is slow because of GVL?

○ GVL: Global VM (Interpreter) Lock○ It’s not true for IO bound loads

19

Page 20: Fluentd vs. Logstash for OpenStack Log Management

GVL on IO bound loads● IO operation can be performed in parallel

20

Thread 1 Thread 2

Idle : User Space:Kernel Space:

Actual Read/Write

Ruby Code Execution GVL Released/ Acquired

IO operationsin parallel

Page 21: Fluentd vs. Logstash for OpenStack Log Management

Transport Protocol

● Both collectors have their own transport protocol.○ Failure Detection and Fallback

● Logstash: Lumberjack protocol○ Active-Standby only

● Fluentd: forward protocol○ Active-Active (Load Balancing), Active-Standby○ Some additional features

21

Page 22: Fluentd vs. Logstash for OpenStack Log Management

Logstash Transport: lumberjack● Active-Standby lumberjack { #config@source

hosts => [

“primary”,

“secondary”

]

port => 1234

ssl_certificate => …}

primary

secondary

source

secondary is used when primary fails

Fail

Fallback

22

Page 23: Fluentd vs. Logstash for OpenStack Log Management

Fluentd Transport: forward● Active-Active

(Load Balancing)<match openstack.*>

type forward

<server>

host dest1

</server>

<server>

host dest2

</server>

</match>

source dest1

dest2

Equally balancedoutputs

23

Page 24: Fluentd vs. Logstash for OpenStack Log Management

Fluentd Transport: forward● Active-Standby <match openstack.*>

type forward

<server>

host primary

</server>

<server>

host secondary

standby

</server>

</match>

primary

secondary

source

Fail

Fallback

24

Page 25: Fluentd vs. Logstash for OpenStack Log Management

Fluentd Transport: forward● Weighted Load Balancing <match openstack.*>

type forward

<server>

host dest1

weight 60

</server>

<server>

host dest2

weight 40

</server>

</match>

source dest1

dest2

60%

40%

25

Page 26: Fluentd vs. Logstash for OpenStack Log Management

Fluentd Transport: forward● At-least-one Semantics

(may affect performance)<match openstack.*>

type forward

require_ack_response

<server>

host dest

</server>

</match>destsource

send logs

ACK

Logs are re-transmitteduntil ACK is received

26

Page 27: Fluentd vs. Logstash for OpenStack Log Management

Transport Protocol

● Both can be configured as Active-Standby mode.● Fluentd has great features:

○ Active-Active Mode (Load Balancing)○ At-least-one Semantics○ Weighted Load Balancing

27

Page 28: Fluentd vs. Logstash for OpenStack Log Management

Forwarders● Fluentd/Logstash have their own “forwarders”

○ Lightweight implementation written in Golang○ Low memory consumption○ One binary: Less dependent and easy to install

28

Node

Tail log files

Forwarder

Log AggregatorForward/LumberjackProtocol

Page 29: Fluentd vs. Logstash for OpenStack Log Management

Forwarders: Config Examplefluentd-forwarder:[fluentd-forwarder]

to = fluent://fluentd1:24224

to = fluent://fluentd2:24224

logstash-forwarder:"network": { "servers": [ "logstash1:5043", "logstash2:5043" ]}Always send logs to both servers.

Pick one active server and send logs only to it.Fallback to another server on failure.

29

Page 30: Fluentd vs. Logstash for OpenStack Log Management

Integration with OpenStack

● Tail log files by local Fluentd/Logstash○ must parse many form of log files

● Rsyslog○ installed by default in most distribution○ can receive logs in JSON format

● Direct output from oslo_log○ oslo_log: logging library used by components○ Logging without any parsing

30

Page 31: Fluentd vs. Logstash for OpenStack Log Management

Log Aggregators

OpenStack nodes

Tail Log Files

31

Tail log files

Forward Protocol dest1

dest2

Page 32: Fluentd vs. Logstash for OpenStack Log Management

Tail Log Files• Must handle many log files…

syslog

kern.log

apache2/access.log

apache2/error.log

keystone/keystone-all.log

keystone/keystone-manage.log

keystone/keystone.log

cinder/cinder-api.log

cinder/cinder-scheduler.log

neutron/neutron-server.log

neutron/neutron-server.log

nova/nova-api.log

nova/nova-conductor.log

nova/nova-consoleauth.log

nova/nova-manage.log

nova/nova-novncproxy.log

nova/nova-scheduler.log

mysql/error.log

mysql/mysql-slow.log

mysql.log

mysql.err

nova/nova-compute.log

nova/nova-manage.log...32

Page 33: Fluentd vs. Logstash for OpenStack Log Management

Tail Log Files• But you can use wildcard

Fluentd:

<source>

type tail

path /var/log/nova/*.log

tag openstack.nova

</source>

Logstash:

input {

file {

path => [“/var/log/nova/*.log”]

}

}

33

Page 34: Fluentd vs. Logstash for OpenStack Log Management

Parse Text Log

● Welcome to regular expression hell!<source>

type tail # or syslog

path /var/log/nova/nova-api.log

format /^(?<asctime>.+) (?<process>\d+) (?<loglevel>\w+) (?

<objname>\S+)( \[(-|(?<request_id>.+?) (?<user_identity>.+))\])?

((?<remote>\S*) "(?<method>\S+) (?<path>[^\"]*) \S*?" status: (?

<code>\d*) len: (?<size>\d*) time: (?<res_time>\S)|(?<message>.

*))/

</source>34

Page 35: Fluentd vs. Logstash for OpenStack Log Management

Log Aggregators

OpenStack nodes

Rsyslog

35

via /dev/log

Syslog Protocol(TCP or UDP)

rsyslog

Page 36: Fluentd vs. Logstash for OpenStack Log Management

Rsyslog: Logging.conf● Logging Configuration in detail● Handler: Syslog, Formatter: JSON

# /etc/{nova,cinder…}/logging.conf

[handler_syslog]

class = handlers.SysLogHandler

args = ('/dev/log', handlers.SysLogHandler.LOG_LOCAL1)

formatter = json

[formatter_json]

class = oslo_log.formatters.JSONFormatter36

Page 37: Fluentd vs. Logstash for OpenStack Log Management

Example Output: JSONFormatter{

"levelname": "INFO",

"funcname": "start",

"message": "Starting conductor node (version 13.0.0)",

"msg": "Starting %(topic)s node (version %(version)s)",

"asctime": "2015-09-29 18:29:57,690",

"relative_created": 2454.8499584198,

"process": 25204,

"created": 1443518997.690932,

"thread": 140119466896752,

"name": "nova.service",

"process_name": "MainProcess",

"thread_name": "GreenThread-1",

...37

Page 38: Fluentd vs. Logstash for OpenStack Log Management

Syslog Facilities

● Assignment of local0..7 Facilities for components● Logs are tagged as like “syslog.local0” in Fluentd● Example:

○ local0: Keystone○ local1: Nova○ local2: Cinder○ local3: Neutron○ local4: Glance

38

Page 39: Fluentd vs. Logstash for OpenStack Log Management

Rsyslog: Config@OpenStack nodes

● Active-Standby Configuration

# /etc/rsyslog.d/rsyslog.conf

user.* @@primary:5140

$ActionExecOnlyWhenPreviousIsSuspended on

&@@secondary:5140

39

Page 40: Fluentd vs. Logstash for OpenStack Log Management

Rsyslog: Config@Aggregator

Fluentd:<source> type syslog port 5140 protocol_type tcp format json tag syslog</source>

Logstash:input { syslog {

codec => json port => 5140 }} Listen on both TCP and UDP

Specify TCP or UDP 40

Page 41: Fluentd vs. Logstash for OpenStack Log Management

Rsyslog: Config@Aggregator

Fluentd:<source> type syslog port 5140 protocol_type tcp format json tag syslog</source>

Logstash:input { syslog {

codec => json port => 5140 }}

41

Page 42: Fluentd vs. Logstash for OpenStack Log Management

Log AggregatorsOpenStack nodes

42

via FluentHandler

Forward Protocol

Direct output from oslo_log

Local Fluentd for buffering/load balancing(Logstash also can be used)

Page 43: Fluentd vs. Logstash for OpenStack Log Management

Direct output from oslo_log

# logging.conf:

[handler_fluent]

class = fluent.handler.FluentHandler # fluent-logger

formatter = fluent

args = (’openstack.nova', 'localhost', 24224)

[formatter_fluent]

class = fluent.handler.FluentFormatter # our Blueprint

43

Format logs as Dictionary

Page 44: Fluentd vs. Logstash for OpenStack Log Management

Our BP in oslo_log: FluentFormatter{

"hostname":"allinone-vivid",

"extra":{"project":"unknown","version":"unknown"},

"process_name":"MainProcess",

"module":"wsgi",

"message":"(4132) wsgi starting up on http://0.0.0.0:8774/",

"filename":"wsgi.py",

"name":"nova.osapi_compute.wsgi.server",

"level":"INFO",

"traceback":null,

"funcname":"server",

"time":"2015-10-15 10:09:12,255"

}

Don’t need to parse!

44

Page 45: Fluentd vs. Logstash for OpenStack Log Management

Conclusion● Log Handling

○ Fluentd: Logs are distinguished by tag○ Logstash: No tags. Logs are aggregated

● Transport Protocol○ Both supports active-standby mode○ Fluentd supports some additional features

■ Client-side load balancing (Active-Active)■ At-least-one semantics■ Weighted load balancing

45

Page 46: Fluentd vs. Logstash for OpenStack Log Management

Conclusion● Integration with OpenStack

○ Tail log files: regular expression hell○ Rsyslog: No agents are needed ○ Direct output from oslo_log w/o any parsing○ Review is welcome for our Blueprint

(oslo_log: fluent-formatter)

46

Page 47: Fluentd vs. Logstash for OpenStack Log Management

Thank you!

Please visit our booth!

Robot Racing over WebRTC! →