Introduction to Elasticsearch with basics of Lucene
![Page 1: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/1.jpg)
Introduction to Elasticsearch with basics of Lucene
May 2014 Meetup
Rahul Jain
@rahuldausa
@ http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/
![Page 2: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/2.jpg)
Who am I
• Software Engineer
• 7 years of software development experience
• Built a platform to search logs in near real time at a volume of 1 TB/day #
• Worked on a Solr-based SEO/SEM search application handling 40 billion records/month (topic of the next talk?)
• Areas of expertise/interest
– High-traffic web applications
– Java/J2EE
– Big data, NoSQL
– Information retrieval, machine learning
# http://www.slideshare.net/lucenerevolution/building-a-near-real-time-search-engine-analytics-for-logs-using-solr
![Page 3: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/3.jpg)
Agenda
• IR Overview
• Basic Concepts
• Lucene
• Elasticsearch
• Logstash & Kibana - Short Introduction
• Q&A
![Page 4: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/4.jpg)
Information Retrieval (IR)
“Information retrieval is the activity of obtaining information resources (in the form of documents) relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text (or other content-based) indexing.”
- Wikipedia
![Page 5: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/5.jpg)
Basic Concepts
• Term t : a noun or compound word used in a specific context
• tf (t in d) : term frequency in a document
– a measure of how often a term appears in the document
– the number of times term t appears in the currently scored document d
• idf (t) : inverse document frequency
– a measure of whether the term is common or rare across all documents, i.e. how often the term appears across the index
– obtained by dividing the total number of documents by the number of documents containing the term, and then taking the logarithm of that quotient
• boost (index) : boost of the field at index-time
• boost (query) : boost of the field at query-time
![Page 6: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/6.jpg)
Basic Concepts: TF-IDF
TF-IDF = Term Frequency × Inverse Document Frequency
Credit: http://whatisgraphsearch.com/
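The tf and idf definitions above can be turned into a few lines of Python. This is a minimal sketch that follows the wording of the slides (raw count for tf, logarithm of the document ratio for idf); it is not Lucene's actual scoring formula, which applies further adjustments. The function names and the three-document corpus are made up for illustration.

```python
import math
from collections import Counter

def tf(term, doc_tokens):
    """Term frequency: number of times `term` appears in the document."""
    return Counter(doc_tokens)[term]

def idf(term, corpus):
    """Inverse document frequency: log(total docs / docs containing term).

    Returns 0.0 when no document contains the term (an assumption made here
    to avoid dividing by zero).
    """
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0
    return math.log(len(corpus) / containing)

def tf_idf(term, doc_tokens, corpus):
    """TF-IDF = term frequency x inverse document frequency."""
    return tf(term, doc_tokens) * idf(term, corpus)

# Hypothetical toy corpus, already tokenized.
corpus = [
    "the quick brown fox".split(),
    "the lazy dog".split(),
    "the fox jumps over the dog".split(),
]

for term in ("the", "fox", "quick"):
    print(term, round(tf_idf(term, corpus[0], corpus), 4))
```

A term such as "the" that occurs in every document gets an idf of zero, so it contributes nothing to the score no matter how often it appears, while a rare term such as "quick" is weighted up.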
![Page 7: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/7.jpg)
Apache Lucene
![Page 8: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/8.jpg)
Apache Lucene
• Fast, high-performance, scalable search/IR library
• Open source
• Initially developed by Doug Cutting (also co-creator of Hadoop)
• Indexing and searching
• Inverted index of documents
• Provides advanced search options such as synonyms, stopwords, and similarity- and proximity-based matching
• http://lucene.apache.org/
![Page 9: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/9.jpg)
Lucene Internals - Inverted Index
Credit: https://developer.apple.com/library/mac/documentation/userexperience/conceptual/SearchKitConcepts/searchKit_basics/searchKit_basics.html
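The idea of an inverted index can be sketched in plain Python: instead of storing "document → words", store "word → documents that contain it". This is only an illustration of the data structure, not how Lucene stores its index on disk; the function names and sample documents are assumptions.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, *terms):
    """Return ids of documents that contain all of the given terms (AND query)."""
    postings = [set(index.get(term.lower(), [])) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Hypothetical sample documents keyed by document id.
docs = {
    1: "The quick brown fox",
    2: "The lazy dog",
    3: "The fox jumps over the lazy dog",
}
index = build_inverted_index(docs)

print(index["fox"])
print(search(index, "lazy", "dog"))
```

Looking up a term is a single dictionary access, and a multi-term query becomes an intersection of posting lists, which is why search stays fast as the number of documents grows.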
![Page 10: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/10.jpg)
Lucene Internals (Contd.)
• Defines a document model
• An index contains documents.
• Each document consists of fields.
• Each field has attributes:
– What is the data type (FieldType)
– How to handle the content (Analyzers, Filters)
– Is it a stored field (stored="true") or an indexed field (indexed="true")
![Page 11: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/11.jpg)
Indexing Pipeline
• Analyzer : creates tokens using a Tokenizer and/or by applying Token Filters
• Each field can define an Analyzer for index time, for query time, or for both.
Credit : http://www.slideshare.net/otisg/lucene-introduction
![Page 12: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/12.jpg)
Analysis Process - Tokenizer
WhitespaceAnalyzer: simplest built-in analyzer
Input: The quick brown fox jumps over the lazy dog.
Tokens: [The] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dog.]
![Page 13: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/13.jpg)
Analysis Process - Tokenizer
SimpleAnalyzer: lowercases, splits at non-letter boundaries
Input: The quick brown fox jumps over the lazy dog.
Tokens: [the] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dog]
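The behaviour of the two analyzers can be approximated in a few lines of Python. These are plain-Python imitations written for illustration, not calls into Lucene; the function names are made up.

```python
import re

def whitespace_analyze(text):
    """Imitates WhitespaceAnalyzer: split on whitespace, keep case and punctuation."""
    return text.split()

def simple_analyze(text):
    """Imitates SimpleAnalyzer: split at non-letter characters, then lowercase."""
    # [^\W\d_]+ matches runs of letters only (no digits, no underscore).
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]

sentence = "The quick brown fox jumps over the lazy dog."

print(whitespace_analyze(sentence))
print(simple_analyze(sentence))
```

The difference matters at query time: with the whitespace version a search for "dog" would not match the token "dog.", and "the" would not match "The".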
![Page 14: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/14.jpg)
Elasticsearch
![Page 15: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/15.jpg)
Introduction
• Enterprise search platform built on Apache Lucene
• Open source
• Highly reliable, scalable, fault tolerant
• Supports distributed indexing, replication, and load-balanced querying
• http://www.elasticsearch.org/
![Page 16: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/16.jpg)
Elasticsearch - Features
• Distributed RESTful search server
• Document oriented
• Domain Driven
• Schema-less
• RESTful
• Easy to scale horizontally
![Page 17: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/17.jpg)
Elasticsearch - Features
• Highlighting
• Spelling suggestions
• Facets (group by)
• Query DSL
– based on JSON to define queries
• Automatic shard replication, routing
• Zen discovery
– Unicast
– Multicast
• Master election
– Re-election if the master node fails
![Page 18: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/18.jpg)
APIs
• HTTP RESTful API
• Java API
• Clients
– Perl, Python, PHP, Ruby, .NET, etc.
• All APIs perform automatic node operation rerouting.
![Page 19: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/19.jpg)
How to start
It’s this easy.
![Page 20: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/20.jpg)
Operations
![Page 21: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/21.jpg)
INDEX CREATION
curl -XPUT "http://localhost:9200/movies/movie/1" -d '{
  "title": "The Godfather",
  "director": "Francis Ford Coppola",
  "year": 1972
}'
http://localhost:9200/<index>/<type>/[<id>]
Credit: http://joelabrahamsson.com/elasticsearch-101/
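The same index request can be built from Python with only the standard library. This is a sketch under assumptions: it targets the local node and the /<index>/<type>/[<id>] URL layout shown on the slide, the helper names are made up, and the Content-Type header is an addition not shown on the slide. The request is only constructed here; sending it needs a running node.

```python
import json
import urllib.request

# Assumption: a single local node listening on the default port, as on the slide.
BASE_URL = "http://localhost:9200"

def document_url(index, doc_type, doc_id=None):
    """Build http://localhost:9200/<index>/<type>/[<id>]."""
    parts = [BASE_URL, index, doc_type]
    if doc_id is not None:
        parts.append(str(doc_id))
    return "/".join(parts)

def index_request(index, doc_type, doc_id, document):
    """Prepare (but do not send) a PUT request that indexes `document`."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        document_url(index, doc_type, doc_id),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},  # not on the slide
    )

request = index_request("movies", "movie", 1, {
    "title": "The Godfather",
    "director": "Francis Ford Coppola",
    "year": 1972,
})

print(request.get_method(), request.full_url)
# To send it to a running node: urllib.request.urlopen(request)
```

Sending the same request again with a changed body replaces the stored document, which is what the UPDATE slide relies on.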
![Page 22: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/22.jpg)
INDEX CREATION RESPONSE
Credit: http://joelabrahamsson.com/elasticsearch-101/
![Page 23: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/23.jpg)
UPDATE
curl -XPUT "http://localhost:9200/movies/movie/1" -d '{
  "title": "The Godfather",
  "director": "Francis Ford Coppola",
  "year": 1972,
  "genres": ["Crime", "Drama"]
}'
The request adds a new field (genres); the response reports the updated version.
Credit: http://joelabrahamsson.com/elasticsearch-101/
![Page 24: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/24.jpg)
GET
curl -XGET "http://localhost:9200/movies/movie/1" -d''
Credit: http://joelabrahamsson.com/elasticsearch-101/
![Page 25: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/25.jpg)
DELETE
curl -XDELETE "http://localhost:9200/movies/movie/1" -d''
Credit: http://joelabrahamsson.com/elasticsearch-101/
![Page 26: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/26.jpg)
SEARCH
• Search across all indexes and all types: http://localhost:9200/_search
• Search across all types in the movies index: http://localhost:9200/movies/_search
• Search explicitly for documents of type movie within the movies index: http://localhost:9200/movies/movie/_search
curl -XPOST "http://localhost:9200/_search" -d '{
  "query": {
    "query_string": { "query": "kill" }
  }
}'
Credit: http://joelabrahamsson.com/elasticsearch-101/
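The three search scopes differ only in the URL path, and the query itself is a JSON body. A small Python sketch of both pieces, assuming the local node from the slides; the helper names are made up and nothing is sent over the network.

```python
import json

# Assumption: a single local node listening on the default port, as on the slide.
BASE_URL = "http://localhost:9200"

def search_url(index=None, doc_type=None):
    """Build the _search URL for all indexes, one index, or one type in an index."""
    parts = [BASE_URL]
    if index:
        parts.append(index)
        if doc_type:
            parts.append(doc_type)
    parts.append("_search")
    return "/".join(parts)

def query_string_body(text):
    """Build the JSON body of a query_string search, as in the curl example."""
    return json.dumps({"query": {"query_string": {"query": text}}})

print(search_url())
print(search_url("movies"))
print(search_url("movies", "movie"))
print(query_string_body("kill"))
```

The body returned by query_string_body can be POSTed to any of the three URLs; only the scope of the search changes.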
![Page 27: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/27.jpg)
SEARCH RESPONSE
Credit: http://joelabrahamsson.com/elasticsearch-101/
![Page 28: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/28.jpg)
Updating existing Mapping
curl -XPUT "http://localhost:9200/movies/movie/_mapping" -d '{
  "movie": {
    "properties": {
      "director": {
        "type": "multi_field",
        "fields": {
          "director": {"type": "string"},
          "original": {"type": "string", "index": "not_analyzed"}
        }
      }
    }
  }
}'
Credit: http://joelabrahamsson.com/elasticsearch-101/
![Page 29: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/29.jpg)
Cluster Architecture
Source: http://www.slideshare.net/DmitriBabaev1/elastic-search-moscow-bigdata-cassandra-sept-2013-meetup
![Page 30: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/30.jpg)
Index Request
Source: http://www.slideshare.net/DmitriBabaev1/elastic-search-moscow-bigdata-cassandra-sept-2013-meetup
![Page 31: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/31.jpg)
Search Request
Source: http://www.slideshare.net/DmitriBabaev1/elastic-search-moscow-bigdata-cassandra-sept-2013-meetup
![Page 32: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/32.jpg)
Who is using it
• GitHub
• StumbleUpon
• SoundCloud
• Datadog
• Stack Overflow
• Many more…
– http://www.elasticsearch.com/case-studies/
![Page 33: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/33.jpg)
Logstash
![Page 34: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/34.jpg)
Logstash
• Open source, Apache licensed
• Written in JRuby
• Part of the Elasticsearch family
• http://logstash.net/
• Current version: 1.4.0
• This talk uses 1.3.3
![Page 35: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/35.jpg)
Logstash
• Multiple inputs / multiple outputs
• Centralize logs
– Collect
– Parse
– Forward/Store
![Page 36: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/36.jpg)
Architecture
Source: http://www.infoq.com/articles/review-the-logstash-book
![Page 37: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/37.jpg)
Logstash – life of an event
• Input → Filters → Output
• Filters are processed in order of config file
• Outputs are processed in order of config file
• Input: Input stream
– File input (tail)
– Log4j
– Redis
– Syslog
– and many more…
• http://logstash.net/docs/1.3.3/
![Page 38: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/38.jpg)
Logstash – life of an event
• Codecs : decoding log messages
– JSON
– Multiline
– NetFlow
– and many more…
• Filters : processing messages
– Date – Date format
– Grok – Regular expression based extraction
– Mutate – Change data type
– and many more…
• Output : storing the structured message
– Elasticsearch
– MongoDB
– Nagios
– and many more…
http://logstash.net/docs/1.3.3/
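What the filter stage does to an event can be imitated in plain Python. This is a rough sketch of the idea behind Grok (regular-expression extraction), Date (timestamp parsing) and Mutate (type conversion), not Logstash code or its configuration syntax. The pattern, field names, and the sample log line are all made up for illustration.

```python
import re
from datetime import datetime

# Hypothetical pattern for an Apache-style access log line (grok-like named groups).
LINE_PATTERN = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) [^"]*" '
    r'(?P<response>\d{3}) (?P<bytes>\d+)'
)

def grok(line):
    """Grok-like step: extract named fields from the raw line, or None if no match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None

def date_filter(event):
    """Date-like step: parse the timestamp string into a datetime."""
    event["timestamp"] = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return event

def mutate(event):
    """Mutate-like step: change the data type of numeric fields."""
    event["response"] = int(event["response"])
    event["bytes"] = int(event["bytes"])
    return event

def process(line):
    """Run the filters in order, as they would be listed in a config file."""
    event = grok(line)
    if event is None:
        return None
    for apply_filter in (date_filter, mutate):
        event = apply_filter(event)
    return event

# Made-up sample input line.
sample = '127.0.0.1 - - [10/May/2014:13:55:36 +0530] "GET /index.html HTTP/1.1" 200 2326'
event = process(sample)
print(event)
```

After the filters run, the event is a structured record with typed fields, which is the form an output such as Elasticsearch would store.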
![Page 39: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/39.jpg)
Quick Start
Up to version 1.3.3:
java -jar logstash-1.3.3-flatjar.jar agent -f agent.conf -- web
Version 1.4:
bin/logstash agent -f agent.conf
bin/logstash –web
basic-agent.conf:
input {
  tcp {
    type => "apache"
    port => 3333
  }
}
output {
  stdout { debug => true }
  elasticsearch { embedded => true }
}
![Page 40: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/40.jpg)
Kibana
![Page 41: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/41.jpg)
Source: http://www.slideshare.net/AmazeeAG/2014-0422-loggingwithlogstashbastianwidmercampusbern
![Page 42: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/42.jpg)
Source: http://www.slideshare.net/AmazeeAG/2014-0422-loggingwithlogstashbastianwidmercampusbern
![Page 43: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/43.jpg)
Analytics
Analytics source: Kibana.org, based on Elasticsearch and Logstash
Image source: http://semicomplete.com/presentations/logstash-monitorama-2013/#/8
![Page 44: Introduction to Elasticsearch with basics of Lucene](https://reader033.fdocuments.in/reader033/viewer/2022061223/54c646b84a79594b448b458e/html5/thumbnails/44.jpg)
Thanks!
@rahuldausa on Twitter and SlideShare
http://www.linkedin.com/in/rahuldausa
Found this interesting?
Join us @ http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/