Open Source Search Evolution

Post on 27-Jan-2015

description

From Gopher, WAIS, and Harvest to Lucene, Solr, SolrCloud, and Elasticsearch.

Transcript of Open Source Search Evolution

[Open Source] Search Evolution

Otis Gospodnetić @otisg

Today

The Early Days

Even Earlier Days

Foci

1974 1995 now()

SEARCH

Otis Who?

SEARCH

Then & Now

1990s: WebGlimpse, Swish, Harvest, Ht://Dig, freeWAIS

2014: elasticsearch.

Still New?

elasticsearch.

…………………... 2000

…………………... 2004

…………………... 2010

Dominance

[Open Source] Search Evolution

Big Cake

Big Data

Beyond Text

Memory Footprint

Distributed Model

Language Support

Indexing Speed, NRT

Relevance Algorithms

Language Support: Stemming

Language Support: Lemmatization

Language Support: Morphology

Language Support

Lucene 2004: ~20 languages

Lucene 2014: ~40 languages

most are stemmers

Relevance Models: VSM

TF-IDF: for term i in document j

$w_{i,j} = tf_{i,j} \times \log(N / df_i)$

$tf_{i,j}$ = number of occurrences of i in j

$df_i$ = number of documents containing i

$N$ = total number of documents
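
The weight above can be sketched in a few lines of Python. This is an illustration only, not Lucene's actual scoring code: the function name `tf_idf_weights`, the whitespace tokenizer, and the use of the natural logarithm are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {term: weight} dict per document, with w = tf * log(N / df)."""
    n_docs = len(docs)  # N: total number of documents
    # Assumption: lowercase + whitespace split stands in for a real analyzer.
    tokenized = [doc.lower().split() for doc in docs]

    # df[i]: number of documents containing term i (each doc counted once)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # tf[i]: occurrences of term i in this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


docs = ["apple phone apple", "sony phone", "sony camera"]
weights = tf_idf_weights(docs)
```

A term that appears in every document gets weight 0, since log(N/N) = 0; rare terms that occur often in one document get the highest weights.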

Relevance Models: Pluggable

Lucene until 2011: 1 relevance model

Lucene 2014: 6 relevance models

got more?

Distributed Architecture

1 Master - N Slaves: good for scaling queries, not good for scaling data

Sharded index with replication: good for scaling queries, good for scaling data
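
A toy in-memory sketch of why a sharded index with replication scales both ways: each document is written to exactly one shard (so data is split), and every shard has several replicas that can take turns answering queries. The class `ShardedIndex`, the CRC32-modulo routing, and the round-robin replica choice are assumptions for illustration; they are not how Solr or Elasticsearch implement it.

```python
import zlib


class ShardedIndex:
    def __init__(self, num_shards, num_replicas):
        # shards[s][r] is replica r of shard s: a dict of doc_id -> text
        self.shards = [
            [{} for _ in range(num_replicas)] for _ in range(num_shards)
        ]
        self._query_count = 0

    def _shard_for(self, doc_id):
        # Hypothetical routing rule: hash of the id, modulo the shard count.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def index(self, doc_id, text):
        # Scaling data: the document lives in one shard only,
        # but is copied to every replica of that shard.
        for replica in self.shards[self._shard_for(doc_id)]:
            replica[doc_id] = text

    def search(self, term):
        # Scaling queries: ask one replica per shard, rotating replicas
        # between queries, then merge the per-shard hits.
        hits = []
        for replicas in self.shards:
            replica = replicas[self._query_count % len(replicas)]
            hits.extend(
                doc_id for doc_id, text in replica.items()
                if term in text.split()
            )
        self._query_count += 1
        return sorted(hits)


idx = ShardedIndex(num_shards=3, num_replicas=2)
idx.index("id1", "apple phone")
idx.index("id2", "sony phone")
idx.index("id3", "sony camera")
```

In the 1 Master - N Slaves model, by contrast, every slave holds the whole index, which is why it helps with query load but not with data volume.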

Indexing Speed & NRT Search

Memory Footprint

Beyond Text

Geospatial Search

Classifier

Recommendation Engine

Key Value Store

NoSQL DB

Analytical DB

Geospatial Search

Classifier

Recommender

Content Similarity

Collaborative Filtering

Key Value Store

id123 ⇒ manu:Apple desc:foo bar price:$111

id234 ⇒ manu:Sony desc:baz bam price:$222
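
A minimal sketch of the idea, assuming a hypothetical `SearchableKV` class: documents are fetched by id like in any key value store, and a small inverted index over (field, token) pairs makes the same data searchable. The field names come from the example above; everything else is made up for illustration.

```python
from collections import defaultdict


class SearchableKV:
    def __init__(self):
        self.docs = {}                    # doc_id -> {field: value}
        self.inverted = defaultdict(set)  # (field, token) -> set of doc_ids

    @staticmethod
    def _tokens(value):
        # Assumption: lowercase + whitespace split stands in for analysis.
        return str(value).lower().split()

    def put(self, doc_id, fields):
        # On overwrite, drop the postings of the previous version first.
        old = self.docs.get(doc_id)
        if old is not None:
            for field, value in old.items():
                for token in self._tokens(value):
                    self.inverted[(field, token)].discard(doc_id)
        self.docs[doc_id] = dict(fields)
        for field, value in fields.items():
            for token in self._tokens(value):
                self.inverted[(field, token)].add(doc_id)

    def get(self, doc_id):
        # Plain key value lookup by id.
        return self.docs.get(doc_id)

    def search(self, field, token):
        # Lookup by content instead of by key.
        return sorted(self.inverted.get((field, token.lower()), set()))


store = SearchableKV()
store.put("id123", {"manu": "Apple", "desc": "foo bar", "price": "$111"})
store.put("id234", {"manu": "Sony", "desc": "baz bam", "price": "$222"})
```

Here `store.get("id123")` behaves like a key value read, while `store.search("manu", "sony")` finds the document by one of its field values.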

NoSQL DB

Distributed

Replicated

Horizontally Scalable

Fast Retrieval

Searchable?

Slicing & Dicing

Analytical Queries

Gobble Gobble

If software is eating the world, then [open source] search is gobbling it.

And has been for years.