the missing log collector - QCon Tokyo

Muga Nishizawa Treasure Data, Inc. the missing log collector

Transcript of the missing log collector - QCon Tokyo

Page 1:

Muga Nishizawa
Treasure Data, Inc.

the missing log collector

Page 2:

Muga Nishizawa (@muga_nishizawa)
Chief Software Architect, Treasure Data

Page 3:

Treasure Data Overview

• Founded to deliver big data analytics in days, not months, without specialist IT resources, for one-tenth the cost of other alternatives
• Service-based subscription business model
• World-class open source team
  • Founded world’s largest Hadoop User Group
  • Developed Fluentd and MessagePack
  • Contributed to Memcached, Hibernate, etc.
• Treasure Data is in production
  • 60+ customers incl. Fortune 500 companies
  • 400+ billion records stored
  • Processing 40,000 messages per second

Page 4:

Fluentd = syslogd + many

Page 5:

Fluentd = syslogd + many
✓ Plugins
✓ JSON

Page 6:

In short

> Open sourced log collector written in Ruby
> Using rubygems ecosystem for plugins

It’s like syslogd, but uses JSON for log messages

Page 7:

Make log collection easy using Fluentd

Page 8:

Reporting & Monitoring

Page 9:

Reporting & Monitoring

Collect Store Process Visualize

Page 10:

Reporting & Monitoring

Collect Store Process Visualize

easier & shorter time

Hadoop / Hive
MongoDB
Treasure Data
Tableau
Excel
R

Page 11:

Collect Store Process Visualize

easier & shorter time
How to shorten here?

Hadoop / Hive
MongoDB
Treasure Data
Tableau
Excel
R

Page 13:

Before Fluentd

[Diagram labels: Application on Server1, Server2, Server3, ...; Fluent; Log Server]

High latency! Must wait for a day...

Page 14:

After Fluentd

[Diagram labels: Application on Server1, Server2, Server3, ...; Fluentd ×3; Fluentd ×2]

In streaming!

Page 15:

Many Users

Page 16:

Many Meetups

Page 17:

Growth by Community

Page 18:

Why did we develop Fluentd?

Page 19:

Treasure Data Service Architecture

[Diagram labels: Apache; App; App; Other data sources; td-agent; RDBMS; Treasure Data columnar data warehouse; Query Processing Cluster; Query API; HIVE, PIG (to be supported); JDBC, REST; MAPREDUCE JOBS; User; td-command; BI apps]

Page 20:

Treasure Data Service Architecture

[Same diagram labels as Page 19, with an added callout: Open Sourced]

Page 21:

Example Use Case – MySQL to TD

[Diagram labels: hundreds of app servers; Rails app ×3, each "writes logs to text files"; Nightly INSERT; MySQL ×4; Daily/Hourly Batch; Google Spreadsheet; KPI visualization; Feedback rankings]

- Limited scalability
- Fixed schema
- Not realtime
- Unexpected INSERT latency

Page 22:

Example Use Case – MySQL to TD

[Diagram labels: hundreds of app servers; Rails app ×3, each "sends event logs" to td-agent; Treasure Data; "Logs are available after several mins."; Daily/Hourly Batch; MySQL; Google Spreadsheet; KPI visualization; Feedback rankings]

✓ Unlimited scalability
✓ Flexible schema
✓ Realtime
✓ Less performance impact

Page 23:

td-agent

> Open sourced distribution package of fluentd

> ETL part of Treasure Data

> Including useful components
  > ruby, jemalloc, fluentd
  > 3rd party gems: td, mongo, webhdfs, etc... (td plugin is for TD)

> http://packages.treasure-data.com/

Page 24:

How does Fluentd work?

Page 25:

Fluentd = syslogd + many
✓ Plugins
✓ JSON

Page 26:

[Diagram: log sources -> filter / buffer / routing -> destinations]

Sources:
> Access logs (Apache)
> App logs (Frontend, Backend)
> System logs (syslogd)
> Databases

Destinations:
> Alerting (Nagios)
> Analysis (MongoDB, MySQL, Hadoop)
> Archiving (Amazon S3)

Page 29:

Input Plugins / Buffer Plugins (Filter Plugins) / Output Plugins

[Same diagram labels as Page 26 (log sources, filter / buffer / routing, destinations), annotated with the plugin types above]

Page 30:

Architecture

Input (pluggable)
> Forward
> HTTP
> File tail
> dstat
> ...

Buffer (pluggable)
> Memory
> File

Output (pluggable)
> Forward
> File
> Amazon S3
> MongoDB
> ...

Page 31:

Architecture

Input (pluggable)
> Forward
> HTTP
> File tail
> dstat
> ...

Buffer (pluggable)
> Memory
> File

Output (pluggable)
> Forward
> File
> Amazon S3
> MongoDB
> ...

117 plugins!
Contributions by Community

Page 32:

Input Plugins / Output Plugins

Log:
  time    2012-02-04 01:33:51
  tag     myapp.buylog
  record  { "user": "me", "path": "/buyItem", "price": 150, "referer": "/landing" }  (JSON)

Page 33:

Event structure (log message)

✓ Time
  > second unit
  > from data source or adding parsed time

✓ Tag
  > for message routing

✓ Record
  > JSON format
  > MessagePack internally
  > non-unstructured (i.e. structured)
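
The time / tag / record triple above can be sketched in a few lines of Ruby. This is only an illustration: the `Event` struct and its `to_line` method are hypothetical names, not Fluentd's internal classes, and the record is rendered as JSON here although the slide says MessagePack is used internally.

```ruby
require "json"

# Hypothetical illustration of the (time, tag, record) triple.
# This is NOT Fluentd's internal event class.
Event = Struct.new(:time, :tag, :record) do
  # Render the event the way the slides print it: "time tag json".
  def to_line
    stamp = Time.at(time).utc.strftime("%Y-%m-%d %H:%M:%S")
    "#{stamp} #{tag} #{JSON.generate(record)}"
  end
end

event = Event.new(
  Time.utc(2012, 2, 4, 1, 33, 51).to_i, # time: integer seconds
  "myapp.buylog",                       # tag: used for message routing
  { "user" => "me", "path" => "/buyItem", "price" => 150, "referer" => "/landing" }
)

puts event.to_line
```

Keeping the time as integer seconds and the record as a plain hash mirrors the "second unit" and "JSON format" bullets; everything else is an assumption made for the example.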

Page 34:

in_tail: reads file and parses lines

[Diagram labels: apache; access.log; in_tail; fluentd]

✓ read a log file
✓ custom regexp
✓ custom parser in Ruby
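
As a rough sketch of the "custom regexp" idea, the snippet below turns one access-log line into a time and a record. The pattern `APACHE_LINE` and the helper `parse_line` are hypothetical and deliberately simplified; they are not the regexp behind Fluentd's `apache2` format.

```ruby
require "time"

# Simplified pattern for an Apache access-log line. An assumption for
# illustration only, not the regexp Fluentd's apache2 format uses.
APACHE_LINE = /^(?<host>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) [^"]*" (?<code>\d{3}) (?<size>\d+|-)/

# Returns [epoch_seconds, record_hash], or nil if the line does not match.
def parse_line(line)
  m = APACHE_LINE.match(line)
  return nil unless m

  time = Time.strptime(m[:time], "%d/%b/%Y:%H:%M:%S %z").to_i
  record = {
    "host"   => m[:host],
    "user"   => m[:user],
    "method" => m[:method],
    "path"   => m[:path],
    "code"   => m[:code].to_i,
    "size"   => (m[:size] == "-" ? nil : m[:size].to_i)
  }
  [time, record]
end

line = '192.168.0.1 - - [04/Feb/2012:01:33:51 +0000] "GET /buyItem HTTP/1.1" 200 777'
time, record = parse_line(line)
puts "#{time} #{record}"
```

Lines that do not match return `nil`, so a caller can decide whether to skip or report them.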

Page 35:

out_mongo: writes buffered chunks

[Diagram labels: apache; access.log; in_tail; fluentd; buffer]

Page 36:

failure handling & retrying

[Diagram labels: apache; access.log; in_tail; fluentd; buffer]

✓ retry automatically
✓ exponential retry wait
✓ persistent on a file
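
The "exponential retry wait" bullet can be illustrated with a small Ruby helper. This is a minimal sketch, not Fluentd's buffer code: `with_retries`, its parameters, and the injectable `sleeper` are hypothetical names, and the file-backed persistence from the last bullet is not modeled.

```ruby
# Hypothetical helper: retries the block, doubling the wait after each failure.
# The sleeper is injectable so the example can run without actually sleeping.
def with_retries(max_attempts:, initial_wait:, sleeper: ->(seconds) { sleep(seconds) })
  wait = initial_wait
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError
    raise if attempt >= max_attempts # give up and surface the last error
    sleeper.call(wait)
    wait *= 2                        # exponential retry wait
    retry
  end
end

waits = []
result = with_retries(max_attempts: 5, initial_wait: 1.0, sleeper: ->(s) { waits << s }) do |attempt|
  # Simulate a destination that is unavailable for the first three attempts.
  raise IOError, "destination unavailable" if attempt < 4
  "flushed on attempt #{attempt}"
end

puts result
puts waits.inspect
```

Doubling the wait keeps pressure off a struggling destination while still retrying automatically.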

Page 37:

out_s3

[Diagram labels: apache; access.log; in_tail; fluentd; buffer; Amazon S3]

✓ retry automatically
✓ exponential retry wait
✓ persistent on a file
✓ slice files based on time

2013-01-01/01/access.log.gz
2013-01-01/02/access.log.gz
2013-01-01/03/access.log.gz
...
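
To make "slice files based on time" concrete, here is a hedged Ruby sketch that groups events into hourly keys shaped like the file names above. The helper `hourly_key`, the use of UTC, and the fixed `access.log.gz` basename are assumptions for illustration; the real plugin's naming options are not shown in the slides.

```ruby
# Hypothetical helper: maps an event time (epoch seconds) to an hourly key.
# UTC and the "access.log.gz" basename are assumptions for this example.
def hourly_key(epoch_seconds, basename: "access.log.gz")
  Time.at(epoch_seconds).utc.strftime("%Y-%m-%d/%H/") + basename
end

events = [
  [Time.utc(2013, 1, 1, 1, 15, 0).to_i, { "path" => "/a" }],
  [Time.utc(2013, 1, 1, 1, 45, 0).to_i, { "path" => "/b" }],
  [Time.utc(2013, 1, 1, 2, 5, 0).to_i,  { "path" => "/c" }]
]

# Each group corresponds to one time slice that would be written as one file.
chunks = events.group_by { |time, _record| hourly_key(time) }
chunks.each { |key, batch| puts "#{key}: #{batch.size} event(s)" }
```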

Page 38:

out_hdfs

[Diagram labels: apache; access.log; in_tail; fluentd; buffer; HDFS]

✓ retry automatically
✓ exponential retry wait
✓ persistent on a file
✓ slice files based on time
✓ custom text formatter

2013-01-01/01/access.log.gz
2013-01-01/02/access.log.gz
2013-01-01/03/access.log.gz
...

Page 39:

routing / copying

[Diagram labels: apache; access.log; in_tail; fluentd; buffer; Amazon S3; Hadoop]

✓ routing based on tags
✓ copy to multiple storages
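
Tag-based routing with copying can be sketched as a small pattern matcher. The code below is a simplified illustration, not Fluentd's router: `tag_match?`, `match_parts?`, `ROUTES`, and `route` are hypothetical names, and the wildcard semantics (`*` for exactly one tag part, `**` for zero or more parts) are stated here as an assumption.

```ruby
# Simplified tag matcher. Assumed semantics: "*" matches exactly one
# dot-separated tag part, "**" matches zero or more parts.
def tag_match?(pattern, tag)
  match_parts?(pattern.split("."), tag.split("."))
end

def match_parts?(pattern_parts, tag_parts)
  return tag_parts.empty? if pattern_parts.empty?

  head, *rest = pattern_parts
  case head
  when "**"
    # "**" may absorb zero or more tag parts.
    (0..tag_parts.length).any? { |n| match_parts?(rest, tag_parts.drop(n)) }
  when "*"
    !tag_parts.empty? && match_parts?(rest, tag_parts.drop(1))
  else
    tag_parts.first == head && match_parts?(rest, tag_parts.drop(1))
  end
end

# Hypothetical routing table: the first matching pattern wins, and a route
# with several outputs copies the event to each of them.
ROUTES = [
  ["web.*", [:s3, :hadoop]],
  ["**",    [:file]]
].freeze

def route(tag)
  _pattern, outputs = ROUTES.find { |pattern, _outputs| tag_match?(pattern, tag) }
  outputs || []
end

puts route("web.access").inspect
puts route("myapp.login").inspect
```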

Page 40:

Client libraries

[Diagram labels: Application; Time:Tag:Record; Fluentd]

# Ruby
Fluent.open("myapp")
Fluent.event("login", {"user" => 38})
#=> 2012-12-11 07:56:01 myapp.login {"user":38}

> Ruby
> Java
> Perl
> PHP
> Python
> D
> Scala
> ...

Page 41:

Fluentd

# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB and S3
<match **>
  type copy

  <store>
    type mongo
    host mongo.example.com
    capped
    capped_size 200m
  </store>

  <store>
    type s3
    path archive/
  </store>
</match>

Page 42:

out_forward

[Diagram labels: apache; access.log; in_tail; fluentd; buffer; fluentd ×3]

✓ retry automatically
✓ exponential retry wait
✓ persistent on a file
✓ slice files based on time
✓ automatic fail-over
✓ load balancing

2013-01-01/01/access.log.gz
2013-01-01/02/access.log.gz
2013-01-01/03/access.log.gz
...
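
One way to picture "automatic fail-over" together with "load balancing" is round-robin selection that skips servers marked as down. This is a sketch under stated assumptions, not how out_forward is implemented: the `ForwardTargets` class, its methods, and the host names are all hypothetical.

```ruby
# Hypothetical round-robin selector with fail-over.
# Not out_forward's implementation; names and behavior are assumptions.
class ForwardTargets
  def initialize(servers)
    @servers = servers
    @down = {}
    @cursor = 0
  end

  def mark_down(server)
    @down[server] = true
  end

  def mark_up(server)
    @down.delete(server)
  end

  # Returns the next available server, or nil when every server is down.
  def pick
    @servers.length.times do
      server = @servers[@cursor % @servers.length]
      @cursor += 1
      return server unless @down[server]
    end
    nil
  end
end

targets = ForwardTargets.new(%w[fluentd-a fluentd-b fluentd-c])
first = targets.pick
second = targets.pick
targets.mark_down("fluentd-c")
third = targets.pick # fluentd-c is skipped, so selection wraps around
puts [first, second, third].inspect
```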

Page 43:

forwarding

[Diagram labels: fluentd ×7; send / ack; Fluentd]

Page 44:

Fluentd - plugin distribution platform

$ fluent-gem search -rd fluent-plugin

$ fluent-gem install fluent-plugin-mongo

Page 45:

Use cases

Page 46:

Cookpad

✓ Over 100 RoR servers (2012/2/4)

[Diagram labels: hundreds of app servers; Rails app ×3, each "sends event logs" to td-agent; Treasure Data; "Logs are available after several mins."; Daily/Hourly Batch; MySQL; Google Spreadsheet; KPI visualization; Feedback rankings]

✓ Unlimited scalability
✓ Flexible schema
✓ Realtime
✓ Less performance impact

Page 47:

NHN Japan
by @tagomoris
http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013

✓ 16 nodes
✓ 120,000+ lines/sec
✓ 400Mbps at peak
✓ 1.5+ TB/day (raw)

[Diagram labels: Web Servers; Fluentd Cluster; Archive Storage (scribed); Fluentd Watchers; Graph Tools; Notifications (IRC); Hadoop Cluster CDH4 (HDFS, YARN); webhdfs; Huahin Manager; hiveserver; Shib; ShibUI; STREAM; BATCH; SCHEDULED BATCH]

Page 48:

Treasure Data

[Diagram labels: Frontend; Job Queue; Worker; Hadoop; Hadoop; Fluentd]

Applications push metrics to Fluentd (via local Fluentd)
Fluentd sums up data by minute (partial aggregation)
Librato Metrics for realtime analysis
Treasure Data for historical analysis
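
The partial aggregation step can be sketched as summing metric values into per-minute buckets before they are sent on. This is an assumption-laden illustration: `sum_by_minute`, the `[time, name, value]` event layout, and the metric name are hypothetical, and the slide does not say how the real aggregation is configured.

```ruby
# Hypothetical partial aggregation: sums values per (minute, metric name).
# Events are [epoch_seconds, metric_name, value] triples, an assumed layout.
def sum_by_minute(events)
  totals = Hash.new(0)
  events.each do |time, name, value|
    minute = time - (time % 60) # truncate to the start of the minute
    totals[[minute, name]] += value
  end
  totals
end

base = Time.utc(2013, 1, 1, 0, 0, 0).to_i
events = [
  [base + 5,  "jobs.finished", 2],
  [base + 40, "jobs.finished", 3],
  [base + 65, "jobs.finished", 1]
]

totals = sum_by_minute(events)
totals.each { |(minute, name), sum| puts "#{Time.at(minute).utc} #{name} #{sum}" }
```

Aggregating locally means far fewer data points leave each node, which is the usual motivation for summing before forwarding.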

Page 49:

Key to Fluentd’s growth is...

Page 50:

Fluentd = syslogd + many + Community
✓ Plugins
✓ JSON

Page 51:

Muga Nishizawa
Treasure Data, Inc.

the missing log collector