(Elastic)search in big data

Posted on 11-Aug-2014


The place of Elasticsearch in Big Data landscape.


Agenda

What is “search in Big Data”? Challenges?

Some solutions?

How does Elasticsearch do it?

Search Expectations

Relevancy...

Query: iphone

headphones for iPhone 4, iPhone 5, iPhone 6 and iPhone 7
iPhone 5
iPhone 4

...and autocomplete...

Query: iph

iphone
iphone 5
Institute of Public Health
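A minimal sketch of prefix-based autocomplete, assuming a small in-memory catalog (the titles and the helper names `build_index` and `suggest` are made up for illustration). It only covers plain prefix matching, so it would return the two iPhone suggestions for "iph" but not something like "Institute of Public Health", which needs a different matching strategy.

```python
from bisect import bisect_left


def build_index(titles):
    """Return (lowercased_key, original_title) pairs sorted by key."""
    return sorted((t.lower(), t) for t in titles)


def suggest(index, prefix, limit=5):
    """Return up to `limit` titles whose lowercased form starts with `prefix`."""
    prefix = prefix.lower()
    # Binary search for the first key that is >= the prefix.
    start = bisect_left(index, (prefix,))
    out = []
    for key, title in index[start:]:
        if not key.startswith(prefix) or len(out) == limit:
            break
        out.append(title)
    return out


# Hypothetical catalog, not taken from the slides.
catalog = ["iPhone 5", "iPhone 4", "Galaxy S4", "iPad"]
index = build_index(catalog)
suggestions = suggest(index, "iph")
```

Each keystroke would call `suggest` again with a longer prefix, which is why autocomplete multiplies the number of requests per search.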

...and fuzziness...

Query: iphnoe

No results found for “iphnoe”
iPhone 5
iPhone 4
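Fuzzy matching is usually defined in terms of edit distance. The sketch below computes the optimal string alignment distance (insert, delete, substitute, or swap two adjacent characters, each costing 1) and uses it to match "iphnoe" against a few terms. It is a textbook dynamic-programming version for illustration, not how Lucene implements fuzzy queries; the function names and the term list are assumptions.

```python
def osa_distance(a, b):
    """Edit distance where an adjacent transposition counts as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "no" <-> "on".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def fuzzy_match(query, terms, max_edits=1):
    """Return the terms within `max_edits` edits of the query (case-insensitive)."""
    return [t for t in terms if osa_distance(query.lower(), t.lower()) <= max_edits]


# Hypothetical term dictionary.
matches = fuzzy_match("iphnoe", ["iphone", "ipad", "galaxy"])
```

Here "iphnoe" is one transposition away from "iphone", so a budget of a single edit is enough to recover the intended term.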

...and corrections...

Query: iphnoe

Did you mean “iPhone”?

shows results anyway:
iPhone 5
iPhone 4

...and similar terms...

Query: iphone

iPhone 5
iPhone 4
iPhone 3
Galaxy S4

...and don’t forget the statistics!

Query: iphone

iPhone 5
iPhone 4

☑ iOS ☐ other

☑ <100 RON ☐ 100-200 RON ☐ >200 RON

Wait. Fancy search == Big Data?

Fancy stuff isn’t free

N requests for autocomplete

1 request for corrections (“Did you mean...”)

1 request for synonyms, 1 for exact matches, etc. (iPhone 5, iPhone 4, iPhone 3, Galaxy S4)

1 request for each of the stats (☑ iOS ☐ other; ☑ <100 RON ☐ 100-200 RON ☐ >200 RON)

Distributed search. When one server doesn’t cut it
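Distributed search typically follows a scatter-gather pattern: the query goes to every shard, each shard returns its own top hits, and a coordinator merges them. The sketch below simulates that in a single process. The shard contents, the function names, and the toy scoring (raw term count) are all assumptions for illustration, not what any particular engine does.

```python
import heapq


def search_shard(shard, term, k):
    """Score one shard's docs by term count; return its top k as (score, doc_id)."""
    hits = []
    for doc_id, text in shard.items():
        score = text.lower().split().count(term)
        if score:
            hits.append((score, doc_id))
    return heapq.nlargest(k, hits)


def distributed_search(shards, term, k=3):
    """Scatter the query to all shards, then gather and merge the partial top-k lists."""
    partial = [search_shard(shard, term.lower(), k) for shard in shards]
    merged = heapq.nlargest(k, (hit for hits in partial for hit in hits))
    return [doc_id for _, doc_id in merged]


# Two hypothetical shards, each mapping doc id -> text.
shards = [
    {"a1": "iphone 5 case", "a2": "galaxy s4"},
    {"b1": "iphone iphone charger", "b2": "headphones"},
]
top = distributed_search(shards, "iphone", k=2)
```

Because each shard only returns k hits, the coordinator merges at most k times the number of shards entries, regardless of how many documents each shard holds.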

Log Search

web_server01

database01

backend01

search engine

Query: error

10:01 - webapp - DB connect error
10:00 - DB - I/O error

Log Analytics

unique IPs: 7584

best sellers: iPhone 5, iPhone 4, Galaxy S4

users per country: Romania: 200, France: 150, Hungary: 120

revenue per day

Distributed search solutions

Elasticsearch

Solr

Others: SenseiDB, Sphinx…

SaaS: CloudSearch, Logsene...

built on top of Lucene

Elasticsearch

Document-oriented

Lucene awesome: index & store data, relevancy, fuzzy, suggesters...

...all wrapped up in JSON over HTTP
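To make "JSON over HTTP" concrete, here is a sketch that builds (but does not send) a search request. The `_search` endpoint and the `match` query with a `fuzziness` option are part of Elasticsearch's query DSL; the index name `products`, the field `name`, and the helper `build_search_request` are hypothetical.

```python
import json


def build_search_request(index, field, text):
    """Return (method, path, JSON body) for a fuzzy match query."""
    path = f"/{index}/_search"
    body = {
        "query": {
            "match": {
                # "AUTO" lets the allowed edit distance depend on term length.
                field: {"query": text, "fuzziness": "AUTO"}
            }
        }
    }
    return "GET", path, json.dumps(body)


# Index and field names are assumptions, not taken from the slides.
method, path, payload = build_search_request("products", "name", "iphnoe")
```

The resulting method, path, and payload could be sent with any HTTP client to a node's HTTP port.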

Aggregations

revenue per day

unique IPs per country: Romania: 200, France: 150, Hungary: 120

unique IPs per country per day

Romania

unique IPs: 7584
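The nesting of "unique IPs per country per day" can be sketched in plain Python as buckets inside buckets with a metric at the bottom. In Elasticsearch terms this roughly corresponds to a `terms` aggregation containing a `date_histogram` containing a `cardinality` aggregation, with the caveat that `cardinality` is approximate while this sketch counts exactly. The event records and field names below are made up.

```python
from collections import defaultdict


def unique_ips_per_country_per_day(events):
    """Bucket events by country, then by day, and count distinct IPs per bucket."""
    buckets = defaultdict(lambda: defaultdict(set))
    for event in events:
        day = event["timestamp"][:10]  # "YYYY-MM-DD" prefix of an ISO timestamp
        buckets[event["country"]][day].add(event["ip"])
    return {
        country: {day: len(ips) for day, ips in days.items()}
        for country, days in buckets.items()
    }


# Made-up log events for illustration only.
events = [
    {"timestamp": "2014-08-11T10:00:00", "country": "Romania", "ip": "10.0.0.1"},
    {"timestamp": "2014-08-11T10:01:00", "country": "Romania", "ip": "10.0.0.2"},
    {"timestamp": "2014-08-11T10:05:00", "country": "Romania", "ip": "10.0.0.1"},
    {"timestamp": "2014-08-12T09:00:00", "country": "France", "ip": "10.0.0.3"},
]
result = unique_ips_per_country_per_day(events)
```

Swapping the order of the two bucket levels would give "unique IPs per day per country" from the same data, which is the flexibility nested aggregations are meant to provide.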

[Diagram sequence: the cluster grows from Node 1 to Node 1, Node 2 and Node 3, then shrinks back to Node 1 and Node 2]

Big Data distributed search

search and real-time analytics

more search features

clients

usage (logs)

Thank you!

radu.gheorghe@sematext.com

@radu0gheorghe @sematext
