Elasticsearch Distributed search & analytics on BigData made easy
-
Upload
itamar -
Category
Data & Analytics
-
view
719 -
download
6
Transcript of Elasticsearch Distributed search & analytics on BigData made easy
Itamar Syn-Hershko
http://code972.com
@synhershko
ElasticsearchDistributed search & analytics on
BigData made easy
Me?
• Itamar Syn-Hershko / @synhershko
• Lucene.NET PMC and lead committer
• Freelance consultant and developer
• Elasticsearch consulting partner
• Microsoft MVP
• RavenDB
– X-Core developer
– “RavenDB in Action” author
Consulting Partner
Elasticsearch
• Powered by Apache Lucene
• Open-source
• Rapid growth
• High profile users world-wide
REST API
• Indexes• Types• IDs
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{"user" : "synhershko","post_date" : "2013-05-30T14:12:12","message" : "trying out Elastic Search","followers": 3,"registered": true
}'
DocumentsTerm
<6>and
<2> <3>big
<6>dark
<4>did
<2>gown
<3>had
<2> <3>house
<1> <2> <3> <5> <6>in
<1> <3> <5>keep
<1> <4> <5>keeper
<1> <5> <6>keeps
<6>light
<4>never
<1> <4> <5>night
<1> <2> <3> <4>old
<4>sleep
<6>sleeps
<1> <2> <3> <4> <5> <6>the
<1> <3>town
<4>where
The index:
Dictionary and
posting lists
6 documents to index
Example from:
Justin Zobel , Alistair Moffat,
Inverted files for text search engines,
ACM Computing Surveys (CSUR)
v.38 n.2, p.6-es, 2006
The old night keeper keeps the keep in the town1
In the big old house in the big old gown.2
The house in the town had the big old keep3
Where the old night keeper never did sleep.4
The night keeper keeps the keep in the night5
And keeps in the dark and sleeps in the light.6
Full-text Search 101:The inverted index
Full-text Search 101:The inverted index
DocumentsTerm
<6>and
<2> <3>big
<6>dark
<4>did
<2>gown
<3>had
<2> <3>house
<1> <2> <3> <5> <6>in
<1> <3> <5>keep
<1> <4> <5>keeper
<1> <5> <6>keeps
<6>light
<4>never
<1> <4> <5>night
<1> <2> <3> <4>old
<4>sleep
<6>sleeps
<1> <2> <3> <4> <5> <6>the
<1> <3>town
<4>where
The index:
Dictionary and
posting lists
6 documents to index
The old night keeper keeps the keep in the town1
In the big old house in the big old gown.2
The house in the town had the big old keep3
Where the old night keeper never did sleep.4
The night keeper keeps the keep in the night5
And keeps in the dark and sleeps in the light.6
User queries for “keeper”
Term NormalizationDocumentsTerm
<6>and
<2> <3>big
<6>dark
<4>did
<2>gown
<3>had
<2> <3>house
<1> <2> <3> <5> <6>in
<1> <3> <5>keep
<1> <4> <5>keeper
<1> <5> <6>keeps
<6>light
<4>never
<1> <4> <5>night
<1> <2> <3> <4>old
<4>sleep
<6>sleeps
<1> <2> <3> <4> <5> <6>the
<1> <3>town
<4>where
• Lowercasing
• Stop words (grey)
• Not best practice anymore
• Stemming
• Porter stemmer
• s-stemmer
• Relevance++
• SizeOnDisk--
Relevance
• PrecisionThe fraction of the retrieved documents that are relevant
• RecallThe fraction of the relevant documents that are retrieved
• Order of results
Challenges with search
• Relevance
• Getting the tokens right
– Tokenization
– Stemming
• Multi-lingual content
– Or other cross-cutting search concerns
• Tolerance
Matching inexact queries
• Phrase slop
– “Bridge of London” -> “London Bridge”
• Word-level edit distance with fuzzy queries
– ditsance -> distance
– color -> colour
Structuring the unstructured
• Record linkage
– Bag of words model
– “More Like This” functionality
• NLP
• Entity extraction
http://cs.stanford.edu/people/karpathy/deepimagesent
Deep Visual-Semantic Alignments for Generating Image Descriptions
Uncommonly common
Mark Harwood’s talk at
http://www.infoq.com/presentations/elasticsearch-revealing-uncommonly-common
#6: Debugging a distributed system
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gifHTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
System.NullReferenceException: Object reference not set to an instance of an object. at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add) at AjaxControlToolkit.ToolkitScriptManager.GetScriptCombineAttributes(Assembly assembly) at AjaxControlToolkit.ToolkitScriptManager.IsScriptCombinable(ScriptEntry scriptEntry) at AjaxControlToolkit.ToolkitScriptManager.OnResolveScriptReference(ScriptReferenceEventArgs e) at System.Web.UI.ScriptManager.RegisterScripts() at System.Web.UI.ScriptManager.OnPagePreRenderComplete(Object sender, EventArgs e) at System.Web.UI.Page.OnPreRenderComplete(EventArgs e) at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)
#7: Distributed git storage
• PoC in C# using libgit2sharp
• https://github.com/synhershko/libgit2sharp.Elasticsearch
• Kudos @nulltoken