An Introduction to Fluent & MongoDB Plugins
An Introduction to Fluent & MongoDB Plugins
@doryokujin
MongoDB Meet-Up #7 in Japan
About Me
・Takahiro Inoue (age 26)
・Twitter: @doryokujin
・Majored in Math (Statistics & Graph Algorithms)
・Data Scientist
・Leader of MongoDB JP
・Interests: Data Processing, GraphDB
1. What is Fluent?
2. Introduction to MongoDB Plugins & Use Cases
3. Demo
Agenda
What is Fluent?
Fluent: The Event Collector Service
Sadayuki Furuhashi (@frsyuki), Treasure Data, Inc.
・Structured logging
・Pluggable architecture
・Reliable forwarding
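Conceptually, each Fluent event is a (tag, time, record) triple, where the record is structured data rather than a raw text line. A minimal sketch of that model (this is an illustration of the event shape, not fluentd's actual wire format):

```python
import json

# A Fluent event is conceptually (tag, time, record); the record is
# structured key-value data instead of an unparsed log line.
tag = "app.access"
ts = 1321281382  # Unix time, fixed here for reproducibility
record = {"method": "GET", "path": "/", "code": 200}

# Serialize the triple; fluentd itself uses MessagePack, JSON shown here
# only to keep the sketch dependency-free.
event = json.dumps([tag, ts, record])
```

The tag ("app.access" here) is what `<match>` sections route on, which is what makes the pluggable output architecture work.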
Log Flow Example
・Web Server (nginx) → Access Log
・App Server (Tomcat) → Action Log (Payment, Registration)
・Save Data → MySQL / Cassandra
There are many formats (MySQL, Cassandra, text...).
Text Logs Grow Very Fast and Get Very Big!!
How to Generate Logs? -Traditional Approach-
App Servers write local log files, then:
1. Upload logs to Amazon S3 per day
2. Download logs from S3 to the local Analyze Servers
What is the Problem? -Traditional Approach-
・Interval: logs are downloaded daily, not hourly
・Size: lots of stress on the network
How to Generate Logs? -Streaming Approach-
App Servers → Relay Servers → Analyze Servers
・Stream data per hour, minute, even second!!
・Greatly reduces network stress!!
Event Collector Services
・Not only streaming: realtime stats & ML
・For large data streaming
・Part of the Hadoop ecosystem
Fluent: handles structured data
More on Fluent: http://www.scribd.com/doc/70897187/Fluent-event-collector-update
Introduction to MongoDB Plugins & Use Cases
https://github.com/fluent/fluent-plugin-mongo
1. Out Mongo

Out Mongo: For Local Backup
App Servers (with local logs) → Relay Servers → Analyze Servers
・Network partition — fluentd can buffer data and retry sending it later
・Fluentd down — we want another way to access the event data
MongoDB's capped collection is suitable here: fast writes and fixed-size storage.
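A capped collection keeps insertion order and silently evicts the oldest documents once the size limit is hit, which is why it makes a cheap local backup buffer. A toy model of those semantics using a fixed-size deque (MongoDB caps by bytes, this sketch caps by document count):

```python
from collections import deque

# Toy model of a capped collection: fixed capacity, insertion order
# preserved, oldest entries evicted once the cap is reached.
capped = deque(maxlen=3)
for i in range(5):
    capped.append({"seq": i})

# Only the newest 3 "documents" remain, in insertion order.
latest = [doc["seq"] for doc in capped]
```

In real MongoDB the equivalent would be `db.create_collection(..., capped=True, size=...)`, matching the `capped_size 100m` option in the plugin configuration below.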
Out Mongo: For Local Backup
Each App Server also backs up its log to a local capped collection:
・Enables quick data access when there is a network partition or Fluent is down
・Decreases the possibility of data loss
Out Mongo: For Local Backup -Configuration-
・To back up, we only add this configuration: the parent output backs up to a capped collection while the child <store> forwards over TCP.

<match ...>
  type mongo_backup
  capped_size 100m
  <store>
    type tcp
    host 192.168.0.13
    ...
  </store>
</match>
Out Mongo: For Result Output
The Analyze Servers output their results to a Mongo collection.
・Result Output: JSON structured data is suitable for Mongo!!!

Out Mongo: For Result Output -Configuration-

<match mongo.**>
  type mongo
  database fluent
  collection test
  # Following attributes are optional
  host fluenter
  port 10000
  # Other buffer configurations here
</match>
Out Mongo: For Result Output -Input & Output-

Input:
Mon Nov 14 23:36:22 [conn13] run command admin.$cmd { replSetGetStatus: 1 }
Mon Nov 14 23:36:22 [conn13] command admin.$cmd command: { replSetGetStatus: 1 } ntoreturn:1 reslen:571 0ms
Mon Nov 14 23:36:22 [conn13] run command admin.$cmd { ismaster: 1 }
Mon Nov 14 23:36:22 [conn13] command admin.$cmd command: { ismaster: 1 } ntoreturn:1 reslen:234 0ms
Mon Nov 14 23:36:22 [conn13] run command admin.$cmd { replSetGetStatus: 1 }
Output:
{
  _id : ...,
  time : "Mon Nov 14 23:36:22",
  key1 : "[conn13]",
  key2 : "command",
  key3 : "admin.$cmd",
  key4 : { "ismaster" : 1 },
  value : "0ms"
}
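The step from the raw mongod log line to the structured document is just a named-group regex, like fluentd's `in_tail` format option. A hedged sketch in Python (the field names key1–key3/value are chosen to mirror the output example above, not taken from the plugin):

```python
import re

# One raw mongod log line (copied from the Input example above).
LINE = ("Mon Nov 14 23:36:22 [conn13] command admin.$cmd "
        "command: { ismaster: 1 } ntoreturn:1 reslen:234 0ms")

# Named groups split the line into structured fields, the way fluentd's
# in_tail `format` regex does. Field names are illustrative.
PATTERN = re.compile(
    r"^(?P<time>\w{3} \w{3} \d+ [\d:]+) "
    r"(?P<key1>\S+) (?P<key2>\S+) (?P<key3>\S+) .*?(?P<value>\d+ms)$"
)

record = PATTERN.match(LINE).groupdict()
```

Each match becomes one JSON-like record, which is exactly the shape the mongo output plugin can insert without further transformation.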
2. Aggregation Mongo
https://github.com/doryokujin/fluent-plugin-aggregation
Overview
Each App Server aggregates its own log locally; key-value pairs (key1, key2, key3) are shuffled to the Relay Servers, which aggregate again per day, hour, minute, or second before the results reach the Analyze Server.
Aggregation Mongo: -Configuration-

<source>
  type tail
  format /^(?<time>[^ ]* [^ ]* [^ ]* [^ ]*) (?<key1>[^ ]*) (?<key2>[^ ]*) (?<key3>[^ ]*) (?<value1>[^ ]*)$/
  time_format %a %b %e %H:%M:%S
  path /var/log/something.log
  tag aggr_hostname
</source>

# aggregate per minute: count(*) group by key1
<metrics>
  name one_key
  partition_by m
  each_key key1
</metrics>

# aggregate per minute: sum(value1), count(*) group by key2, key3
<metrics>
  name two_keys
  partition_by m
  each_key key2,key3
  value_key value1
  type float
</metrics>
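The two <metrics> sections correspond to two grouped aggregations. A toy sketch of what gets computed per minute, with made-up event values chosen only to exercise both groupings:

```python
from collections import defaultdict

# Illustrative events, already bucketed into a minute partition.
events = [
    {"minute": "2011-11-14 19:17", "key1": "a", "key2": "x", "key3": "p", "value1": 1.0},
    {"minute": "2011-11-14 19:17", "key1": "a", "key2": "x", "key3": "p", "value1": 2.5},
    {"minute": "2011-11-14 19:17", "key1": "b", "key2": "y", "key3": "q", "value1": 4.0},
]

# one_key metric: count(*) group by key1, per minute.
one_key = defaultdict(int)
# two_keys metric: sum(value1) and count(*) group by (key2, key3), per minute.
two_keys = defaultdict(lambda: {"count": 0, "sum": 0.0})

for e in events:
    one_key[(e["minute"], e["key1"])] += 1
    g = two_keys[(e["minute"], e["key2"], e["key3"])]
    g["count"] += 1
    g["sum"] += e["value1"]
```

Partitioning by minute (`partition_by m`) means each dictionary key carries the minute string, so counters reset naturally with each new interval.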
<server>
  name host1
  host host1
  port 24224
</server>
<server>
  name host2
  host host2
  port 24224
</server>
...
Key-value pairs are shuffled across the servers (like Hadoop's shuffle).
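"Shuffled" here means each key is routed to one fixed server, so all records for that key are aggregated on the same node. A hypothetical sketch (the host names mirror the <server> entries above; a real implementation would use a proper hash function):

```python
# Route each key to a fixed server, Hadoop-shuffle style: the same key
# always lands on the same node, so partial aggregates can be merged there.
servers = ["host1", "host2"]

def route(key: str) -> str:
    # Simplistic stable "hash": sum of the key's bytes. Illustrative only.
    return servers[sum(key.encode()) % len(servers)]
```

Stability is the important property: `route("key1")` returns the same host every time, so per-key counters never split across servers.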
Aggregation Mongo: -Input & Output-

Input:
Mon Nov 14 23:36:22 [conn13] run command admin.$cmd { replSetGetStatus: 1 }
Mon Nov 14 23:36:22 [conn13] command admin.$cmd command: { replSetGetStatus: 1 } ntoreturn:1 reslen:571 0ms
Mon Nov 14 23:36:22 [conn13] run command admin.$cmd { ismaster: 1 }
Mon Nov 14 23:36:22 [conn13] command admin.$cmd command: { ismaster: 1 } ntoreturn:1 reslen:234 0ms
Mon Nov 14 23:36:22 [conn13] run command admin.$cmd { replSetGetStatus: 1 }
Output:
{
  _id : "399e94941cacf13eeb3f808e8ac00981",
  name : "one_key",
  partition : "2011-11-14 19:17",
  key : { key1 : "PeriodicTask::Runner" },
  count : 30,
  value : { response : 1024 }
}
Scale Up
Per-interval deltas (day, hour, minute, second) are aggregated and shuffled by key (key1, key2, key3); with Mongo sharding, the Analyze Servers write through mongos and the output is sharded by key. We can scale!!
Demo