Technology behind Real-Time Log Analytics
ELK - Elasticsearch, Logstash and Kibana
By Supaket Wongkampoo @ Predictive Analytics and Data Science Conference, 28 May 2016
SUPAKET WONGKAMPOO
Software Engineer @ Agoda
*Passionate about DevOps*
- Full Stack Developer
- Virtualisation and Infrastructure as Code (Puppet/Ansible)
- Release Management and continuous development
- Real-time Log Analytics
State of the Art: Logging Terminology in Large-Scale Data Processing
Common use cases
• *Issue debugging
• *Performance analysis
• Security analysis
• *Predictive analysis
• Internet of Things (IoT) and logging
Challenges in log analysis
• *Non-consistent log format
• *Decentralized logs
• Expert knowledge requirement
Non-consistent log format
TOMCAT LOGS
A typical Tomcat server startup log entry looks like this:
May 24, 2015 3:56:26 PM org.apache.catalina.startup.HostConfig deployWAR INFO: Deployment of web application archive \soft\apache-tomcat-7.0.62\webapps\sample.war has finished in 253 ms
APACHE ACCESS LOGS - COMMON LOG FORMAT
A typical Apache access log entry looks like this (the combined log format appends the quoted referrer and user agent to these fields):
127.0.0.1 - - [24/May/2015:15:54:59 +0530] "GET /favicon.ico HTTP/1.1" 200 21630
IIS LOGS
A typical IIS log entry looks like this:
2012-05-02 17:42:15 172.24.255.255 - 172.20.255.255 80 GET /images/favicon.ico - 200 Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)
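The three entries share little beyond a timestamp, so each format needs its own parser before the events can be compared side by side. Below is a minimal sketch using only Python's standard library that normalizes the three sample lines into dictionaries with common `source` and `timestamp` keys. The regular expressions are written against these samples only (an assumption); custom Tomcat formatters or a different IIS field list would need different patterns.

```python
import re
from datetime import datetime

# (source name, regex, strptime format for the captured timestamp)
# Patterns are tailored to the sample lines above, not to every possible config.
PATTERNS = [
    ("tomcat",
     re.compile(r"(?P<ts>\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) "
                r"(?P<logger>\S+) \S+ (?P<level>[A-Z]+): (?P<message>.*)"),
     "%b %d, %Y %I:%M:%S %p"),
    ("apache",  # Common Log Format fields only
     re.compile(r'(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
                r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'),
     "%d/%b/%Y:%H:%M:%S %z"),
    ("iis",     # skips the IP/user/port fields between the time and the method
     re.compile(r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*? "
                r"(?P<method>[A-Z]+) (?P<path>/\S*) \S+ (?P<status>\d{3})"),
     "%Y-%m-%d %H:%M:%S"),
]

def normalize(line):
    """Return a dict with common keys for the first matching format, else None."""
    for source, regex, ts_format in PATTERNS:
        m = regex.match(line)
        if m:
            event = m.groupdict()
            event["source"] = source
            # Same key and ISO-8601 representation for every format.
            event["timestamp"] = datetime.strptime(event.pop("ts"), ts_format).isoformat()
            return event
    return None
```

The output also exposes a second inconsistency: only the Apache entry carries a UTC offset, so the Tomcat and IIS timestamps remain ambiguous until a timezone is assumed for them.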
DECENTRALIZED LOGS
In a setup with one or two servers, finding information in the logs means running cat or tail on each machine, often piping the output to grep. Once logs are spread across many servers, this per-machine approach no longer scales.
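The cat/tail/grep routine works host by host, so correlating results across servers means merging them by hand. The sketch below imitates that routine in Python, assuming each host's log is already available as a list of lines (in practice they would have to be fetched from every server). The merge step at the end is the work a central log store does for you.

```python
import heapq
import re

def grep(lines, pattern):
    """Per-host equivalent of `cat app.log | grep PATTERN` (regex search per line)."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]

def merged_grep(logs_by_host, pattern):
    """Run the same search on every host and merge the hits into one time-ordered stream.

    Assumes each line starts with an ISO-8601 timestamp in a single timezone
    (so string order equals time order) and each host's log is already sorted,
    as append-only log files normally are.
    """
    per_host = [
        [(line.split(" ", 1)[0], host, line) for line in grep(lines, pattern)]
        for host, lines in logs_by_host.items()
    ]
    return [(host, line) for _, host, line in heapq.merge(*per_host)]

logs = {  # hypothetical sample data
    "web1": ["2016-05-28T10:00:01 INFO started", "2016-05-28T10:00:05 ERROR db timeout"],
    "web2": ["2016-05-28T10:00:03 ERROR disk full", "2016-05-28T10:00:04 INFO ok"],
}
hits = merged_grep(logs, "ERROR")
```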
Elasticsearch
Elasticsearch - Key features
• Schema-free, REST and JSON-based document store
• Distributed and horizontally scalable
• Open Source: Apache License 2.0
• Zero configuration
• Written in Java, extensible
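Because the store is REST and JSON based, indexing a document is just an HTTP request with a JSON body. A minimal sketch with Python's standard library, assuming a node at the hypothetical address `http://localhost:9200` and a hypothetical index named `logs`; it only builds the requests and does not send them. The `/<index>/_doc/<id>` path is the Elasticsearch 7+ layout; versions from the time of this talk used `/<index>/<type>/<id>`.

```python
import json
import urllib.request

def index_request(base_url, index, doc_id, document):
    """Build (but do not send) a PUT request that stores `document` under `doc_id`."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_doc/{doc_id}",    # Elasticsearch 7+ path layout
        data=json.dumps(document).encode("utf-8"),  # the document itself is plain JSON
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# "Schema-free" in practice: documents with different fields can share an index;
# Elasticsearch infers field types (dynamic mapping) the first time it sees them.
first = index_request("http://localhost:9200", "logs", "1",
                      {"message": "started", "level": "INFO"})
second = index_request("http://localhost:9200", "logs", "2",
                       {"message": "timeout", "latency_ms": 5012})
```

Passing either request to `urllib.request.urlopen` would send it to a running node.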
Elasticsearch - Terms
• Index - Logical collection of data, possibly time-based; analogous to a database
• Replication - Read scalability; removes the single point of failure (SPOF)
• Sharding - Splits logical data across several machines; write scalability, control over data flow
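The three terms can be made concrete with a few lines of Python. This is a sketch, not Elasticsearch code: `daily_index` mirrors Logstash's default `logstash-YYYY.MM.dd` naming for time-based indices, `index_settings` builds the body for creating an index with explicit `number_of_shards` and `number_of_replicas`, and `pick_shard` shows the basic routing rule `shard = hash(routing) % number_of_primary_shards`. Elasticsearch hashes the routing value (the document id by default) with Murmur3; `zlib.crc32` is only a deterministic stand-in.

```python
import zlib
from datetime import date

def daily_index(prefix, day):
    """Time-based index name: one index per day."""
    return f"{prefix}-{day:%Y.%m.%d}"

def index_settings(primary_shards, replicas):
    """Index creation body: primaries split the data, replicas copy each primary."""
    return {"settings": {"number_of_shards": primary_shards,
                         "number_of_replicas": replicas}}

def total_shard_copies(primary_shards, replicas):
    """Each primary has `replicas` extra copies, placed on other nodes."""
    return primary_shards * (1 + replicas)

def pick_shard(routing_key, primary_shards):
    """Basic routing rule; crc32 stands in for Elasticsearch's Murmur3 hash."""
    return zlib.crc32(routing_key.encode("utf-8")) % primary_shards

name = daily_index("logstash", date(2016, 5, 28))  # "logstash-2016.05.28"
body = index_settings(primary_shards=5, replicas=1)
```

Because the target shard depends on the number of primaries, that number is set when an index is created and cannot simply be changed in place, while the replica count can be adjusted later. Time-based indices soften this: each new day's index can start with different settings.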
Elasticsearch - Distributed and scalable
Elasticsearch - use cases
• Product search engine: products grouped, with filtering
• Scoring
✴ Possible influencing factors: age of the product, ordered in the last 24h, in stock, no shipping costs, special offer, rating
• Analytics
✴ Multidimensional aggregation (e.g. average revenue per category ID per day)
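The analytics example maps onto nested bucket aggregations: a `terms` bucket per category, a `date_histogram` bucket per day inside it, and an `avg` metric inside that. The sketch below shows the query body as a Python dict, plus an in-memory function that computes the same numbers so the nesting is easy to follow. The field names `category_id`, `timestamp` and `revenue` are hypothetical, and `calendar_interval` is the Elasticsearch 7+ spelling (older versions used `interval`).

```python
from collections import defaultdict

# Query DSL body: average revenue per category id per day (hypothetical field names).
AGG_QUERY = {
    "size": 0,  # only the aggregation results are needed, not the hits
    "aggs": {
        "per_category": {
            "terms": {"field": "category_id"},
            "aggs": {
                "per_day": {
                    "date_histogram": {"field": "timestamp", "calendar_interval": "day"},
                    "aggs": {"avg_revenue": {"avg": {"field": "revenue"}}},
                }
            },
        }
    },
}

def avg_revenue_per_category_per_day(docs):
    """In-memory equivalent of AGG_QUERY.

    Assumes ISO-8601 timestamps in a single timezone, so the first 10
    characters (YYYY-MM-DD) identify the day bucket.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (category, day) -> [total, count]
    for doc in docs:
        key = (doc["category_id"], doc["timestamp"][:10])
        sums[key][0] += doc["revenue"]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

In the real query, the `terms` aggregation returns only the top 10 categories by document count unless its `size` is raised.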
Logstash
• Managing events and logs
• Collect, parse, enrich, store data
• Modular: many, many inputs and outputs
• Open Source: Apache License 2.0
• Ruby app (runs on JRuby, on the JVM)
• Part of the Elasticsearch family
Why collect & centralize logs?
• Access log files without system access
• Shell scripting: too limited or slow
• Use unique IDs for errors and aggregate them across your stack
• Reporting (everyone can create their own report)
• Bonus points: unify your data to make it easily searchable
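Once logs from every service sit in one place, a shared error ID is enough to reconstruct how a failure travelled through the stack. A minimal Python sketch, assuming each event is a dict with hypothetical `timestamp`, `service` and `error_id` fields and that every service logs the same ID when it handles the same failure:

```python
from collections import defaultdict

def trace_errors(events, id_field="error_id"):
    """Map each error id to the services that logged it, in time order.

    Field names are hypothetical; timestamps are assumed to be ISO-8601
    strings in one timezone so that sorting them sorts by time.
    """
    trail = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if id_field in event:  # events without an id are ignored
            trail[event[id_field]].append(event["service"])
    return dict(trail)

events = [  # hypothetical events collected from three services
    {"timestamp": "2016-05-28T10:00:02", "service": "api", "error_id": "e42"},
    {"timestamp": "2016-05-28T10:00:01", "service": "web", "error_id": "e42"},
    {"timestamp": "2016-05-28T10:00:03", "service": "web"},
    {"timestamp": "2016-05-28T10:00:04", "service": "db", "error_id": "e7"},
]
trails = trace_errors(events)
```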
Logstash - Architecture
Input → Filter → Output
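The Input → Filter → Output flow can be modelled as a simple loop: every input line becomes an event, each filter may modify or drop it, and whatever survives is handed to the output. This is a toy model in plain Python, not Logstash's configuration language or internals; the filter functions are hypothetical examples.

```python
def run_pipeline(inputs, filters, output):
    """Toy model of Input -> Filter -> Output.

    inputs:  iterable of raw lines (stands in for input plugins)
    filters: functions taking an event and returning it, or None to drop it
    output:  function called once per surviving event (stands in for output plugins)
    """
    for line in inputs:
        event = {"message": line}  # real Logstash events also carry @timestamp and @version
        for apply_filter in filters:
            event = apply_filter(event)
            if event is None:  # dropped: skip the remaining filters
                break
        if event is not None:
            output(event)

def drop_debug(event):
    # Hypothetical filter, in the spirit of a conditional `drop`.
    return None if event["message"].startswith("DEBUG") else event

def add_level(event):
    # Hypothetical filter: copy the first word into a structured field.
    event["level"] = event["message"].split(" ", 1)[0]
    return event

collected = []
run_pipeline(["INFO started", "DEBUG cache miss", "ERROR db timeout"],
             [drop_debug, add_level], collected.append)
```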
Logstash - Inputs
• Monitoring: collectd, graphite, ganglia, snmptrap, zenoss
• Datastores: elasticsearch, redis, sqlite, s3
• Queues: rabbitmq, zeromq, kafka
• Logging: eventlog, lumberjack, gelf, log4j, relp, syslog, varnishlog
Logstash - Filters
• alter, anonymize, checksum, csv, drop, multiline
• dns, date, extractnumbers, geoip, i18n, kv, noop, ruby, range
• json, urldecode, useragent
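Two filters from the list, `kv` and `date`, show what "parse and enrich" means in practice. The sketch below approximates their default behaviour in Python; it is not the plugins' code. `kv_filter` splits on spaces and `=` (the kv filter's default separators) and ignores quoting. `date_filter` uses `strptime` formats, whereas the real date filter takes Joda-style patterns, and it assumes UTC for timestamps without an offset, whereas the real filter has a `timezone` option for that case.

```python
from datetime import datetime, timezone

def kv_filter(event, source="message"):
    """Rough kv equivalent: 'a=1 b=2' in `source` becomes fields a and b."""
    for token in event[source].split():
        key, sep, value = token.partition("=")
        if sep and key:  # skip tokens that are not key=value pairs
            event[key] = value
    return event

def date_filter(event, field, fmt):
    """Rough date equivalent: parse `field` and store it as @timestamp in UTC."""
    parsed = datetime.strptime(event[field], fmt)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC
    event["@timestamp"] = parsed.astimezone(timezone.utc).isoformat()
    return event

event = kv_filter({"message": "user=alice action=login status=ok ts=2016-05-28T10:00:00+0700"})
event = date_filter(event, "ts", "%Y-%m-%dT%H:%M:%S%z")
```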
Logstash - Outputs
• Store: elasticsearch, gemfire, mongodb, redis, riak, rabbitmq
• Monitoring: ganglia, graphite, graphtastic, nagios, opentsdb, statsd, zabbix
• Notification: email, hipchat, irc, pagerduty, sns
• Protocol: http, lumberjack, metriccatcher, stomp,
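The elasticsearch output batches events and sends them through Elasticsearch's `_bulk` API, whose body is newline-delimited JSON: an action line, then the document, for each event, with a final newline. A sketch of building such a body in Python (the index name is a hypothetical example; Elasticsearch versions from the time of this talk also expected a `_type` in the action line):

```python
import json

def bulk_body(events, index):
    """Build an NDJSON body for POST /_bulk: action line + source line per event."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))  # what to do with the next line
        lines.append(json.dumps(event))                         # the document itself
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline

body = bulk_body([{"message": "started"}, {"message": "db timeout"}], "logstash-2016.05.28")
```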
Kibana
• Flexible analytics and data visualization platform
Combined - ELK
Hands-on - ELK
[Diagram: multiple Web servers connected to Kafka]
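The diagram places Kafka next to the web servers. Assuming it plays the common role of a buffer between the servers and Logstash, the property that matters is that Kafka only guarantees ordering within a partition, so keying messages by server keeps each server's lines in order. The in-memory sketch below illustrates that; `ToyTopic` and the producer names are hypothetical, and `zlib.crc32` stands in for the murmur2 hash used by Kafka's default partitioner.

```python
import zlib

class ToyTopic:
    """In-memory stand-in for a Kafka topic with a fixed number of partitions."""

    def __init__(self, partitions):
        self.partitions = [[] for _ in range(partitions)]

    def produce(self, key, value):
        # Same key -> same partition, so each web server's lines stay in order.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))

    def consume_all(self):
        # A consumer (assumed to be Logstash in this setup) reads each partition in order.
        for partition in self.partitions:
            yield from partition

topic = ToyTopic(partitions=3)
for i in range(3):
    for host in ("web1", "web2", "web3"):  # hypothetical producers
        topic.produce(host, f"{host} request {i}")
events = list(topic.consume_all())
```

Ordering across different servers is not preserved, which is one reason the events need their own timestamps once they reach Elasticsearch.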
Q&A