An Introduction to Fluent & MongoDB Plugins
An Introduction to Fluent & MongoDB Plugins
@doryokujin
MongoDB Meet-Up #7 in Japan
About Me
・Takahiro Inoue (age 26)
・Twitter: @doryokujin
・Majored in Math (Statistics & Graph Algorithms)
・Data Scientist
・Leader of MongoDB JP
・Interests: Data Processing, GraphDB
1. What is Fluent?
2. Introduction to MongoDB Plugins & Use Cases
3. Demo
Agenda
What is Fluent?
Fluent: The Event Collector Service
Sadayuki Furuhashi (@frsyuki), Treasure Data, Inc.
・Structured logging
・Pluggable architecture
・Reliable forwarding
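Conceptually, each Fluent event is a (tag, time, record) triple, where the record is structured data rather than a raw text line. A minimal sketch of that model (this is an illustration of the event shape, not fluentd's actual wire format):

```python
import json

# A Fluent event is conceptually (tag, time, record); the record is
# structured key-value data instead of an unparsed log line.
tag = "app.access"
ts = 1321281382  # Unix time, fixed here for reproducibility
record = {"method": "GET", "path": "/", "code": 200}

# Serialize the triple; fluentd itself uses MessagePack, JSON shown here
# only to keep the sketch dependency-free.
event = json.dumps([tag, ts, record])
```

The tag ("app.access" here) is what `<match>` sections route on, which is what makes the pluggable output architecture work.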
Log Flow Example
・Web Server (nginx) → Access Log
・App Server (Tomcat) → Action Log (Payment, Registration)
・Save Data → MySQL / Cassandra
There are many formats (MySQL, Cassandra, text...).
Text Logs Grow Very Fast and Get Very Big!!
How to Generate Logs? -Traditional Approach-
App Servers write local log files, then:
1. Upload logs to Amazon S3 per day
2. Download logs from S3 to the local Analyze Servers
What is the Problem? -Traditional Approach-
・Interval: logs are downloaded daily, not hourly
・Size: lots of stress on the network
How to Generate Logs? -Streaming Approach-
App Servers → Relay Servers → Analyze Servers
・Stream data per hour, minute, even second!!
・Greatly reduces network stress!!
Event Collector Services
・Not only streaming: realtime stats & ML
・For large data streaming
・Part of the Hadoop ecosystem
Fluent: handles structured data
More on Fluent: http://www.scribd.com/doc/70897187/Fluent-event-collector-update
Introduction to MongoDB Plugins & Use Cases
https://github.com/fluent/fluent-plugin-mongo
1. Out Mongo

Out Mongo: For Local Backup
App Servers (with local logs) → Relay Servers → Analyze Servers
・Network partition — fluentd can buffer data and retry sending it later
・Fluentd down — we want another way to access the event data
MongoDB's capped collection is suitable here: fast writes and fixed-size storage.
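A capped collection keeps insertion order and silently evicts the oldest documents once the size limit is hit, which is why it makes a cheap local backup buffer. A toy model of those semantics using a fixed-size deque (MongoDB caps by bytes, this sketch caps by document count):

```python
from collections import deque

# Toy model of a capped collection: fixed capacity, insertion order
# preserved, oldest entries evicted once the cap is reached.
capped = deque(maxlen=3)
for i in range(5):
    capped.append({"seq": i})

# Only the newest 3 "documents" remain, in insertion order.
latest = [doc["seq"] for doc in capped]
```

In real MongoDB the equivalent would be `db.create_collection(..., capped=True, size=...)`, matching the `capped_size 100m` option in the plugin configuration below.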
Out Mongo: For Local Backup
Each App Server also backs up its log to a local capped collection:
・Enables quick data access when there is a network partition or Fluent is down
・Decreases the possibility of data loss
Out Mongo: For Local Backup -Configuration-
・To back up, we only add this configuration: the parent output backs up to a capped collection while the child <store> forwards over TCP.

<match ...>
  type mongo_backup
  capped_size 100m
  <store>
    type tcp
    host 192.168.0.13
    ...
  </store>
</match>
Out Mongo: For Result Output
The Analyze Servers output their results to a Mongo collection.
・Result Output: JSON structured data is suitable for Mongo!!!

Out Mongo: For Result Output -Configuration-

<match mongo.**>
  type mongo
  database fluent
  collection test
  # Following attributes are optional
  host fluenter
  port 10000
  # Other buffer configurations here
</match>
Out Mongo: For Result Output -Input & Output-

Input:
Mon Nov 14 23:36:22 [conn13] run command admin.$cmd { replSetGetStatus: 1 }
Mon Nov 14 23:36:22 [conn13] command admin.$cmd command: { replSetGetStatus: 1 } ntoreturn:1 reslen:571 0ms
Mon Nov 14 23:36:22 [conn13] run command admin.$cmd { ismaster: 1 }
Mon Nov 14 23:36:22 [conn13] command admin.$cmd command: { ismaster: 1 } ntoreturn:1 reslen:234 0ms
Mon Nov 14 23:36:22 [conn13] run command admin.$cmd { replSetGetStatus: 1 }
Output:
{
  _id : ...,
  time : "Mon Nov 14 23:36:22",
  key1 : "[conn13]",
  key2 : "command",
  key3 : "admin.$cmd",
  key4 : { "ismaster" : 1 },
  value : "0ms"
}
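The step from the raw mongod log line to the structured document is just a named-group regex, like fluentd's `in_tail` format option. A hedged sketch in Python (the field names key1–key3/value are chosen to mirror the output example above, not taken from the plugin):

```python
import re

# One raw mongod log line (copied from the Input example above).
LINE = ("Mon Nov 14 23:36:22 [conn13] command admin.$cmd "
        "command: { ismaster: 1 } ntoreturn:1 reslen:234 0ms")

# Named groups split the line into structured fields, the way fluentd's
# in_tail `format` regex does. Field names are illustrative.
PATTERN = re.compile(
    r"^(?P<time>\w{3} \w{3} \d+ [\d:]+) "
    r"(?P<key1>\S+) (?P<key2>\S+) (?P<key3>\S+) .*?(?P<value>\d+ms)$"
)

record = PATTERN.match(LINE).groupdict()
```

Each match becomes one JSON-like record, which is exactly the shape the mongo output plugin can insert without further transformation.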
2. Aggregation Mongo
https://github.com/doryokujin/fluent-plugin-aggregation
Overview
Each App Server aggregates its own log locally; key-value pairs (key1, key2, key3) are shuffled to the Relay Servers, which aggregate again per day, hour, minute, or second before the results reach the Analyze Server.
Aggregation Mongo: -Configuration-

<source>
  type tail
  format /^(?<time>[^ ]* [^ ]* [^ ]* [^ ]*) (?<key1>[^ ]*) (?<key2>[^ ]*) (?<key3>[^ ]*) (?<value1>[^ ]*)$/
  time_format %a %b %e %H:%M:%S
  path /var/log/something.log
  tag aggr_hostname
</source>

# aggregate per minute: count(*) group by key1
<metrics>
  name one_key
  partition_by m
  each_key key1
</metrics>

# aggregate per minute: sum(value1), count(*) group by key2, key3
<metrics>
  name two_keys
  partition_by m
  each_key key2,key3
  value_key value1
  type float
</metrics>
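The two <metrics> sections correspond to two grouped aggregations. A toy sketch of what gets computed per minute, with made-up event values chosen only to exercise both groupings:

```python
from collections import defaultdict

# Illustrative events, already bucketed into a minute partition.
events = [
    {"minute": "2011-11-14 19:17", "key1": "a", "key2": "x", "key3": "p", "value1": 1.0},
    {"minute": "2011-11-14 19:17", "key1": "a", "key2": "x", "key3": "p", "value1": 2.5},
    {"minute": "2011-11-14 19:17", "key1": "b", "key2": "y", "key3": "q", "value1": 4.0},
]

# one_key metric: count(*) group by key1, per minute.
one_key = defaultdict(int)
# two_keys metric: sum(value1) and count(*) group by (key2, key3), per minute.
two_keys = defaultdict(lambda: {"count": 0, "sum": 0.0})

for e in events:
    one_key[(e["minute"], e["key1"])] += 1
    g = two_keys[(e["minute"], e["key2"], e["key3"])]
    g["count"] += 1
    g["sum"] += e["value1"]
```

Partitioning by minute (`partition_by m`) means each dictionary key carries the minute string, so counters reset naturally with each new interval.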
<server>
  name host1
  host host1
  port 24224
</server>
<server>
  name host2
  host host2
  port 24224
</server>
...
Key-value pairs are shuffled across the servers (like Hadoop's shuffle).
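"Shuffled" here means each key is routed to one fixed server, so all records for that key are aggregated on the same node. A hypothetical sketch (the host names mirror the <server> entries above; a real implementation would use a proper hash function):

```python
# Route each key to a fixed server, Hadoop-shuffle style: the same key
# always lands on the same node, so partial aggregates can be merged there.
servers = ["host1", "host2"]

def route(key: str) -> str:
    # Simplistic stable "hash": sum of the key's bytes. Illustrative only.
    return servers[sum(key.encode()) % len(servers)]
```

Stability is the important property: `route("key1")` returns the same host every time, so per-key counters never split across servers.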
Aggregation Mongo: -Input & Output-

Input:
Mon Nov 14 23:36:22 [conn13] run command admin.$cmd { replSetGetStatus: 1 }
Mon Nov 14 23:36:22 [conn13] command admin.$cmd command: { replSetGetStatus: 1 } ntoreturn:1 reslen:571 0ms
Mon Nov 14 23:36:22 [conn13] run command admin.$cmd { ismaster: 1 }
Mon Nov 14 23:36:22 [conn13] command admin.$cmd command: { ismaster: 1 } ntoreturn:1 reslen:234 0ms
Mon Nov 14 23:36:22 [conn13] run command admin.$cmd { replSetGetStatus: 1 }
Output:
{
  _id : "399e94941cacf13eeb3f808e8ac00981",
  name : "one_key",
  partition : "2011-11-14 19:17",
  key : { key1 : "PeriodicTask::Runner" },
  count : 30,
  value : { response : 1024 }
}
Scale Up
Per-interval deltas (day, hour, minute, second) are aggregated and shuffled by key (key1, key2, key3); with Mongo sharding, the Analyze Servers write through mongos and the output is sharded by key. We can scale!!
Demo