Like loggly using open source

Stream your Cloud Thomas Alrin [email protected]



Streaming logs in cloud

Transcript of Like loggly using open source

  • Stream your Cloud Thomas Alrin [email protected]
  • We'll cover: What to stream, Choices for streaming, Setting up streaming from a VM, Chef recipes
  • What to stream You can stream the following from the cloud: Traces (logs), Metrics, Monitoring status
  • Scenario Your app/service runs in the cloud. We need the log files of your app (web server logs, container logs, app logs) and the log files of your service (service logs).
  • SaaS Vendors You can get this as a SaaS offering from vendors (loggly, papertrail..)
  • We plan to build a streamee...
  • Choices for streaming: Logstash, Fluentd, Beaver, Logstash-Forwarder, Woodchuck, RSYSLOG, Heka
  • Comparison of the choices:

    | Name               | Language    | Collector | Shipper | Footprint | Ease of setting up    |
    |--------------------|-------------|-----------|---------|-----------|-----------------------|
    | Logstash           | JRuby (JVM) | Yes       | No      | High      | Easy                  |
    | Fluentd            | Ruby        | Yes       | No      | High      | Easy                  |
    | Beaver             | Python      | No        | Yes     | Low       | Easy                  |
    | Logstash-Forwarder | Go          | No        | Yes     | Low       | Difficult (uses SSL)  |
    | Woodchuck          | Ruby        | No        | Yes     | High      | Easy                  |
    | RSYSLOG            | C           | Yes       | Yes     | Low       | Difficult             |
    | Heka               | Go          | Yes       | Yes     | Low       | Easy                  |
  • Our requirements Two sets of logs to collect: (1) all the trace produced when a VM is spun up, and (2) all the trace inside the VM from the application or service. Publish them to an in-memory store (queue) that can be accessed by a key.
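The "in-memory store (queue) accessed by a key" requirement can be sketched in Python as a toy stand-in; in the real setup that role is played by AMQP (RabbitMQ) or Redis, and the class and key names here are illustrative only:

```python
from collections import defaultdict, deque

class KeyedQueueStore:
    """Toy keyed queue store: each key (e.g. a VM or app name) gets its own FIFO queue."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def publish(self, key, message):
        self._queues[key].append(message)

    def consume(self, key):
        # Returns None when the queue for this key is empty.
        queue = self._queues[key]
        return queue.popleft() if queue else None

store = KeyedQueueStore()
store.publish("tom.com_log", '{"Payload": "TEST"}')
print(store.consume("tom.com_log"))  # the message we just published
print(store.consume("tom.com_log"))  # None: queue for this key is drained
```

The point of the key is isolation: each VM's trace lands in its own queue, so a consumer can follow one VM without filtering a shared stream.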
  • We tried: Logstash, Beaver, Logstash-forwarder, Woodchuck, Heka, RSYSLOG. We use: Heka, Beaver, RSYSLOG.
  • [Diagram] Log files under /usr/share/megam/megamd/logs (howdy.log, howdy_err.log per VM) → Shipper Agent in megamd → AMQP → Queue#1, Queue#2, Queue#3, Queue#4
  • How does it work? Heka resides inside our Megam Engine (megamd). Its job is to collect the trace information when a VM is run: 1. Read the dynamically created VM execution log files. 2. Format the log contents as JSON for every VM execution. 3. Publish the log contents to a queue. Beaver resides in each of the VMs and does the following: 1. Read the log files inside the VM. 2. Format the log contents as JSON. 3. Publish the log contents to a queue.
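Both shippers follow the same three steps: read lines, wrap each in a JSON envelope, publish to a queue. A minimal sketch of that loop, not Heka or Beaver internals; `format_record`, `ship`, and the field set are illustrative (the fields mirror the JsonEncoder output shown later in this deck):

```python
import json
import socket
from datetime import datetime, timezone

def format_record(logger_name, line):
    """Wrap one raw log line in a JSON envelope like the shippers emit."""
    return json.dumps({
        "Timestamp": datetime.now(timezone.utc).isoformat(),
        "Type": "logfile",
        "Logger": logger_name,
        "Payload": line,
        "Hostname": socket.gethostname(),
    })

def ship(lines, logger_name, publish):
    """Step 1-3: read lines, format each as JSON, hand each to a publish callback."""
    for line in lines:
        publish(format_record(logger_name, line))

# In the real system `publish` would send to AMQP; here we just collect.
shipped = []
ship(["TEST\n"], "tom.com_log", shipped.append)
record = json.loads(shipped[0])
print(record["Logger"])  # tom.com_log
```

Passing `publish` as a callback is what lets the same loop target Redis, AMQP, or stdout.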
  • Logstash A centralized logging framework that can transfer logs from multiple hosts to a central location. Written in JRuby, so it needs a JVM, and the JVM consumes a lot of memory. Logstash is ideal as a centralized collector, not as a shipper.
  • Logstash Shipper Scenario Let us ship logs from a VM, /usr/share/megam/megamd/logs/*/*, to Redis or AMQP. e.g.: ../megamd/logs/ Queue named in AMQP.
  • Logstash Shipper - Sample conf

    /opt/logstash/agent/etc$ sudo cat shipper.conf
    input {
      file {
        type => "access-log"
        path => [ "/usr/local/share/megam/megamd/logs/*/*" ]
      }
    }
    filter {
      grok {
        type  => "access-log"
        match => [ "@source_path", "(\/usr\/local\/share\/megam\/megamd\/logs\/)(?<source_key>[^\/]+)(\/.*)" ]
      }
    }
    output {
      stdout { debug => true debug_format => "json" }
      redis {
        key       => '%{source_key}'
        type      => "access-log"
        data_type => "channel"
        host      => ""
      }
    }

    Logs inside a directory are shipped to the Redis key named %{source_key} (the directory name).
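The grok pattern's only job here is to derive `source_key` from the file path. That extraction can be checked with an equivalent Python regex; this mirrors the conf's pattern but is not Logstash itself, and the sample path is made up:

```python
import re

# Equivalent of the grok match: capture the directory under .../logs/ as source_key.
PATTERN = re.compile(r"/usr/local/share/megam/megamd/logs/(?P<source_key>[^/]+)/.*")

path = "/usr/local/share/megam/megamd/logs/tom.com/access.log"
match = PATTERN.match(path)
print(match.group("source_key"))  # tom.com
```

So every log file under `logs/tom.com/` ends up on the Redis channel `tom.com`, which is exactly the per-VM keying the requirements called for.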
  • Logstash : Start the agent

    java -jar /opt/logstash/agent/lib/logstash-1.4.2.jar agent -f /opt/logstash/agent/etc/shipper.conf

    If you don't have a JRE: sudo apt-get install openjdk-7-jre-headless
  • Heka Mozilla uses it internally. Written in Go, so it runs natively. Ideal as both a centralized collector and a shipper. We picked Heka (our modified version).
  • Installation Download the deb (or) build from source:

    git clone
    cd heka
    source build.sh
    cd build
    make deb
    dpkg -i heka_0.6.0_amd64.deb
  • Our Heka usage [Diagram] megamd (Megam Engine, with embedded Heka) reads logs → RabbitMQ queue → Realtime Streamer
  • Heka configuration

    nano /etc/hekad.toml

    [TestWebserver]
    type = "LogstreamerInput"
    log_directory = "/usr/share/megam/heka/logs/"
    file_match = '(?P<DomainName>[^/]+)/(?P<FileName>[^/]+)'
    differentiator = ["DomainName", "_log"]

    [AMQPOutput]
    url = "amqp://guest:[email protected]/"
    exchange = "test_tom"
    queue = true
    exchangeType = "fanout"
    message_matcher = 'TRUE'
    encoder = "JsonEncoder"

    [JsonEncoder]
    fields = ["Timestamp", "Type", "Logger", "Payload", "Hostname"]
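Heka's `file_match` is a regex with named capture groups, and the `differentiator` joins the captured `DomainName` with the literal `"_log"` to build the stream's Logger name. The same pattern can be exercised in Python to see what name a given file would get (the path here is illustrative):

```python
import re

# Same shape as the file_match in hekad.toml: directory name, then file name.
FILE_MATCH = re.compile(r"(?P<DomainName>[^/]+)/(?P<FileName>[^/]+)")

m = FILE_MATCH.match("tom.com/howdy.log")
# differentiator = ["DomainName", "_log"] joins capture + literal into the Logger:
logger = m.group("DomainName") + "_log"
print(logger)  # tom.com_log
```

That `tom.com_log` value is exactly the `Logger` field visible in the queue output on the next slide.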
  • Run Heka

    sudo hekad -config="/etc/hekad.toml"

    We can see output like the following in the queue:

    {"Timestamp":"2014-07-08T12:53:44.004Z","Type":"logfile","Logger":"tom.com_log","Payload":"TEST\u000a","Hostname":"alrin"}
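On the consuming side of the queue, a subscriber only has to decode this JSON envelope. A minimal sketch using the sample message above (the consumer itself, i.e. the AMQP subscription, is omitted):

```python
import json

# Sample message as it arrives from the queue (same shape as the Heka output).
raw = ('{"Timestamp":"2014-07-08T12:53:44.004Z","Type":"logfile",'
       '"Logger":"tom.com_log","Payload":"TEST\\u000a","Hostname":"alrin"}')

message = json.loads(raw)
print(message["Logger"])         # tom.com_log
print(repr(message["Payload"]))  # 'TEST\n'  (the \u000a escape decodes to a newline)
```

Note that the raw log line keeps its trailing newline inside `Payload`, which is why the queue message shows the `\u000a` escape.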
  • Beaver Beaver is a lightweight Python log file shipper used to send logs to an intermediate broker for further processing. Beaver is ideal when the VM does not have enough memory for a large JVM application to run as a shipper.
  • Our Beaver usage [Diagram] Beaver runs in each VM (VM#1, VM#2, ... VM#n) and ships that VM's logs → RabbitMQ queue → Realtime Streamer; megamd (Megam Engine) ships its own logs to the same queue via Heka.
  • Chef Recipe : Beaver When a VM is run, the recipe (megam_logstash::beaver) is included. Attributes such as the node name and log files are set dynamically:

    node.set['logstash']['key'] = "#{}"
    node.set['logstash']['amqp'] = "#{}_log"
    node.set['logstash']['beaver']['inputs'] = [
      "/var/log/upstart/nodejs.log",
      "/var/log/upstart/gulpd.log"
    ]
    include_recipe "megam_logstash::beaver"
  • RSYSLOG RSYSLOG is the rocket-fast system for log processing. It offers high performance, great security features and a modular design. Megam uses RSYSLOG to ship logs from VMs to Elasticsearch.
  • Chef Recipe : Rsyslog When a VM is run, the recipe (megam_logstash::rsyslog) is included. Attributes such as the node name and log files are set dynamically:

    node.set['rsyslog']['index'] = "#{}"
    node.set['rsyslog']['elastic_ip'] = ""
    node.set['rsyslog']['input']['files'] = [
      "/var/log/upstart/nodejs.log",
      "/var/log/upstart/gulpd.log"
    ]
    include_recipe "megam_logstash::rsyslog"
  • For more details email : [email protected] twitter: @megamsystems