Powering Monitoring Analytics with ELK stack

28
HAL Id: hal-01212015 https://hal.inria.fr/hal-01212015 Submitted on 5 Oct 2015 HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés. Powering Monitoring Analytics with ELK stack Abdelkader Lahmadi, Frédéric Beck To cite this version: Abdelkader Lahmadi, Frédéric Beck. Powering Monitoring Analytics with ELK stack. 9th Interna- tional Conference on Autonomous Infrastructure, Management and Security (AIMS 2015), Jun 2015, Ghent, Belgium. 9th International Conference on Autonomous Infrastructure, Management and Se- curity (AIMS 2015), 2015. <hal-01212015>

Transcript of Powering Monitoring Analytics with ELK stack

Page 1: Powering Monitoring Analytics with ELK stack

HAL Id: hal-01212015https://hal.inria.fr/hal-01212015

Submitted on 5 Oct 2015

HAL is a multi-disciplinary open accessarchive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come fromteaching and research institutions in France orabroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, estdestinée au dépôt et à la diffusion de documentsscientifiques de niveau recherche, publiés ou non,émanant des établissements d’enseignement et derecherche français ou étrangers, des laboratoirespublics ou privés.

Powering Monitoring Analytics with ELK stackAbdelkader Lahmadi, Frédéric Beck

To cite this version:Abdelkader Lahmadi, Frédéric Beck. Powering Monitoring Analytics with ELK stack. 9th Interna-tional Conference on Autonomous Infrastructure, Management and Security (AIMS 2015), Jun 2015,Ghent, Belgium. 9th International Conference on Autonomous Infrastructure, Management and Se-curity (AIMS 2015), 2015. <hal-01212015>

Page 2: Powering Monitoring Analytics with ELK stack

Powering Monitoring Analytics with ELK stack

Abdelkader Lahmadi, Frederic Beck

INRIA Nancy Grand Est,University of Lorraine, France

2015(compiled on: June 23, 2015)

Page 3: Powering Monitoring Analytics with ELK stack

References

online Tutorials

I Elasticsearch Reference:https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

I Elasticsearch, The Definitive Guide:https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html

I Logstash Reference:https://www.elastic.co/guide/en/logstash/current/index.html

I Kibana User Guide:https://www.elastic.co/guide/en/kibana/current/index.html

Books

I Rafal Kuc, Marek Rogozinski: Mastering Elasticsearch, Second Edition,February 27, 2015

(2/27)

Page 4: Powering Monitoring Analytics with ELK stack

Monitoring data

Machine-generated data

I Nagios logs

I Snort and suricata logs

I Web server and ssh logs

I Honeypots logs

I Network event logs: NetFlow, IPFIX, pcap traces

Snort log

[**] [1:2000537:6] ET SCAN NMAP -sS [**] [Classification: Attempted Information Leak][Priority: 2] 11/08-11:25:41.773271 10.2.199.239:61562 -> 180.242.137.181:5222 TCP TTL:38TOS:0x0 ID:9963 IpLen:20 DgmLen:44 ******S* Seq: 0xF7D028A7 Ack: 0x0 Win: 0x800 TcpLen: 24TCP Options (1) => MSS: 1160

Web server log

128.1.0.0 - - [30/Apr/1998:22:38:48] "GET /english/venues/Paris/St-Denis/ HTTP/1.1" 404 333

(3/27)

Page 5: Powering Monitoring Analytics with ELK stack

Monitoring data: properties

I Variety of data sources: flows, logs, pcap data

I Structure: structured, non structured

I Massive volume: grow at Moore’s Law kinds of speeds

Who is looking into them ?

I IT administrators

I Researchers and scientists

Why ?

I Continous monitoring tasks

I Extract useful information and insights: finding meaningful events leading tonew discoveries

I Looking for a needle in a haystack: forensics and Incident Handling

I Data search, aggregation, correlation

I Predictive modelling, statistical analysis, anomaly detection

(4/27)

Page 6: Powering Monitoring Analytics with ELK stack

Traditional tools and techniques

Storage

I Text files, binary files

I Relational databases: MySQL, PostgreSQL, MS Access

I New trends: NoSQL databases (Redis, MongoDB, HBase, ...)

May 31st 2015, 15:44:06.039 com.whatsapp 10.29.12.166 50048 184.173.179.38 443 10 668May 31st 2015, 15:55:40.268 com.facebook.orca 10.29.12.166 47236 31.13.64.3 443 10 1384

Tools and commands

I SQL, Grep, sed, awk, cut, find, gnuplot

I Perl, Python, Shell

perl -p -e ’s/\t/ /g’ raw | cut -d ’ ’ -f5-

(5/27)

Page 7: Powering Monitoring Analytics with ELK stack

ELK Components

Elasticsearch Logstash Kibana

I end to end solution for logging, analytics, search and visualisation

(6/27)

Page 8: Powering Monitoring Analytics with ELK stack

Logstash: architecture

Event processing engine: data pipeline

I Inputs: collect data from variety of sources

I Filters: parse, process and enrich data

I Outputs: push data to a variety of destinations

(7/27)

Page 9: Powering Monitoring Analytics with ELK stack

Logstash: Inputs and codecs

Receive data from files, network, etc

I Network (TCP/UDP), File, syslog, stdin

I Redis, RabbitMQG, irc, Twitter, IMPA, xmpp

I syslog,ganglia, snmptrap, jmx, log4j

1 input {

2 file {

3 path => "/var/log/messages"

4 type => "syslog"

5 }

67 file {

8 path => "/var/log/apache/access.log"

9 type => "apache"

10 }

11 }

(8/27)

Page 10: Powering Monitoring Analytics with ELK stack

Logstash: Filters

Enrich and process the event data

I grok, date, mutate, elapsed, ruby

I dns, geoip,useragent

1 filter {

2 grok {

3 match => { "message" => "%{ COMBINEDAPACHELOG }" }

4 }

5 date {

6 match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]

7 }

8 }

(9/27)

Page 11: Powering Monitoring Analytics with ELK stack

Logstash: Outputs

Send event data to other systems

I file, stdout, tcp

I elasticsearch, MongoDB, email

I syslog, Ganglia

1 output {

2 stdout { codec => rubydebug }

3 elasticsearch { host => localhost }

4 }

(10/27)

Page 12: Powering Monitoring Analytics with ELK stack

Elasticsearch: better than Grep!

Search engine

I Built on top of Apache Lucene: full-text-search engine library

I Schema-free

I JSON document oriented: store and REST API

I index, search, sort and filter documents

Distributed search and store

I scaling up (bigger servers) or scaling out (more servers)

I coping with failure: replication

RESTfull API

I store, fetch, alter, and delete data

I Easy backup with snapshot and restore

I No transactions nor rollback

(11/27)

Page 13: Powering Monitoring Analytics with ELK stack

Elasticsearch: glossary

I Document: main data unit used during indexing and searching, contains oneor more fields

I Field: section of the document, a couple of name and value

I Term: unit of search representing a word from the text

I Index: named collection of documents (Like a database in a relational database)

I type: a class of similar documents (like a table in in a relational data base)

I Inverted index: data structure that maps the terms in the index to thedocuments

I Shard: a separate Apache Lucene index. Low level ”worker” unit.

I Replica: an exact copy of a shard

Relational DB => Databases => Tables => Rows => Columns

Elasticsearch => Indices => Types => Documents => Fields

(12/27)

Page 14: Powering Monitoring Analytics with ELK stack

Elasticsearch: indexes and documents

Create Index

1 curl -XPUT ’localhost :9200/ logs ’ -d {

2 "settings ": {

3 "number_of_shards" : 1,

4 "number_of_replicas" : 0

5 }

6 }

Indexing a document

1 curl -XPUT ’localhost :9200/ logs/apachelogs /1’ -d {

2 "clientip" => "8.0.0.0" ,

3 "verb" => "GET",

4 "request" => "/ images/comp_stage2_brc_topr.gif",

5 "httpversion" => "1.0" ,

6 "response" => "200",

7 "bytes" => "163"

8 }

Retrieving a document

1 curl -XGET ’localhost :9200/ logs/apachelogs /1’

(13/27)

Page 15: Powering Monitoring Analytics with ELK stack

Elasticsearch: mapping

Define the schema of your documentsI elasticsearch is able to dynamically generate a mapping of fields typeI It is better to provide your own mapping: date fields as dates, numeric fields

as numbers string fields as full-text or exact- value stringsCreate a mapping

1 curl -XPUT ’localhost :9200/ logs ’ {

2 "mappings ": {

3 "apachelogs" : {

4 "properties" : {

5 "clientip" : {

6 "type" : "ip"

7 },

8 "verb" : {

9 "type" : "string"

10 },

11 "request" : {

12 "type" : "string"

13 },

14 "httpversion" : {

15 "type" : "string"

16 }

17 "response" : {

18 "type" : "string"

19 }

20 "bytes" : {

21 "type" : "integer"

22 }

23 }

24 }

25 }

26 }

(14/27)

Page 16: Powering Monitoring Analytics with ELK stack

Elasticsearch: Quering

Search Lite

1 curl -XPUT ’localhost :9200/ logs/apachelogs /_search ’

2 curl -XPUT ’localhost :9200/ logs/apachelogs /_ search?q=response :200’

Search with Query DSL: match, filter, range

1 curl -XPUT ’localhost :9200/ logs/apachelogs /_search ’ -d {

2 "query" : {

3 "match" : {

4 "response" : "200"

5 }

6 }

7 }

8 curl -XPUT ’localhost :9200/ logs/apachelogs /_search ’ -d {

9 "query" : {

10 "filtered" : {

11 "filter" : {

12 "range" : {

13 "bytes" : { "gt" : 1024}

14 }

15 },

16 "query" : {

17 "match" : {

18 "response" : "200"

19 }

20 }

21 }

22 }

23 }

(15/27)

Page 17: Powering Monitoring Analytics with ELK stack

Elasticsearch: Aggregation

Analyze and summarize data

I How many needles are in the haystack?

I What is the average length of the needles?

I What is the median length of the needles, broken down by manufacturer?

I How many needles were added to the haystack each month?

Buckets

I Collection of documents that meet a criterion

Metrics

I Statistics calculated on the documents in a bucket

SQL equivalent

SELECT COUNT(response) FROM table GROUP BY response

I COUNT(response) is equivalent to a metric

I GROUP BY response is equivalent to a bucket

(16/27)

Page 18: Powering Monitoring Analytics with ELK stack

Aggregation: buckets

I partioning document based on criteria: by hour, by most-popular terms, byage ranges, by IP ranges

1 curl -XPUT ’localhost :9200/ logs/apachelogs /_ search?search_type=count

’ -d {

2 "aggs" : {

3 "responses" : {

4 "terms" : {

5 "field" : "response"

6 }

7 }

8 }

9 }

I Aggregations are placed under the top-level aggs parameter (the longeraggregations will also work if you prefer that)

I We then name the aggregation whatever we want: responses, in this example

I Finally, we define a single bucket of type terms

I Setting search type to count avoids executing the fetch phase of the searchmaking the request more efficient

(17/27)

Page 19: Powering Monitoring Analytics with ELK stack

Aggregation: metrics

I metric calculated on those documents in each bucket: min, mean, max,quantiles

1 curl -XPUT ’localhost :9200/ logs/apachelogs /_ search?search_type=count

’ -d {

2 "aggs": {

3 "responses ": {

4 "terms": {

5 "field": "response"

6 },

7 "aggs": {

8 "avg_bytes": {

9 "avg": {

10 "field": "bytes"

11 }

12 }

13 }

14 }

15 }

16 }

(18/27)

Page 20: Powering Monitoring Analytics with ELK stack

Aggregation: combining them

An aggregation is a combination of buckets and metrics

I Partition document by client IP (bucket)

I Then partition each client bucket by verbe (bucket)

I Then partition each verbe bucket by response (bucket)

I Finally, calculate the average bytes for each response (metric)

I Average bytes per <client IP, verbe, response>

Many types are available

I terms

I range, date range, ip range

I histogram, date histogram

I stats/avg, min, max, sum, percentiles

(19/27)

Page 21: Powering Monitoring Analytics with ELK stack

Kibana: visualisationData visualisation

I browser-based interfaceI search, view, and interact with data stored in Elasticsearch indiceI charts, tables, and mapsI dashboards to display changes to Elasticsearch queries in real time

(20/27)

Page 22: Powering Monitoring Analytics with ELK stack

Kibana: setting your index

1 curl ’localhost :9200/_ cat/indices?v’

2 index pri rep docs.count docs.deleted store.size pri.store.size

3 logstash -1998.04.30 5 1 98434 0 20.6mb 20.6mb

4 logstash -1998.05.01 5 1 1094910 0 205.8mb 205.8mb

(21/27)

Page 23: Powering Monitoring Analytics with ELK stack

Kibana: discoverI Explore you data with access to every documentI Submit search query, filter the searchI Save and Load searches

(22/27)

Page 24: Powering Monitoring Analytics with ELK stack

Kibana: search

Query languagelang:en to just search inside a field named ”lang”lang:e? wildcard expressionuser.listed count:[0 to 10] range search on numeric fieldslang:ens regular expression search (very slow)

(23/27)

Page 25: Powering Monitoring Analytics with ELK stack

Kibana: visualize

I Area chart: Displays a line chart with filled areas below the lines. The areascan be displayed stacked, overlapped, or some other variations.

I Data table: Displays a table of aggregated data.

I Line chart: Displays aggregated data as lines.

I Markdown widget: Use the Markdown widget to display free-form informationor instructions about your dashboard.

I Metric: Displays one the result of a metric aggregation without buckets as asingle large number.

I Pie chart: Displays data as a pie with different slices for each bucket or as adonut (depending on your taste).

I Tile map: Displays a map for results of a geohash aggregation.

I Vertical bar chart: A chart with vertical bars for each bucket.

(24/27)

Page 26: Powering Monitoring Analytics with ELK stack

Kibana: dashboard

Displays a set of saved visualisations in groups

I Save, load and share dashboards

(25/27)

Page 27: Powering Monitoring Analytics with ELK stack

What else ?

Elasticsearch is very active and evolving project

I Watchout when using the wildcard !

I Deleting a mapping deletes the data

I Respect procedures to update mappings

I Backup is easy 1, do it!

1https://github.com/taskrabbit/elasticsearch-dump(26/27)

Page 28: Powering Monitoring Analytics with ELK stack

Hands-on Lab

(27/27)