Elasticsearch - DevNexus 2015

Introduction To ElasticSearch Real-Time Search and Analytics Roy Russo, DevNexus, 2015


Who Am I

• Roy Russo
• VP Engineering, Predikto
• Co-Author - Elasticsearch in Action
  - Due ~April 2015
• ElasticHQ.org
• Other: Silverpop, JBoss, AltiSource Labs

Why Am I Here

• What is Search
• What is Elasticsearch
• Real-World Use
• Scale Out
• Interacting with Elasticsearch

Search is about filtering information and determining relevance.

How does a Search Engine Work?

    SELECT * FROM make WHERE name LIKE 'Tesla'

Search Engines use Magic

Where Magic == Inverted Index

It's FM

Inverted Index

• Take some documents
• Tokenize them
• Find unique tokens
• Map tokens to documents

[Figure: inverted index mapping the tokens "apple", "oranges", and "peach" to Documents 1-6]

Inverted Index

[Figure: the same index, showing which of Documents 1-6 match a search for "apple peach"]
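The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not how Lucene stores its index: the sample documents, the `tokenize` helper, and the choice to require every query token (AND semantics) are assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each unique token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token of the query."""
    postings = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*postings) if postings else set()


# Hypothetical contents for six documents (the slide figure did not survive).
documents = {
    1: "apple oranges",
    2: "peach",
    3: "apple peach",
    4: "oranges peach apple",
    5: "oranges",
    6: "apple",
}

index = build_inverted_index(documents)
matches = sorted(search(index, "apple peach"))
print(matches)  # [3, 4]
```

The lookup touches only the posting sets for "apple" and "peach"; no document text is scanned at search time, which is the point of inverting the index.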

Relevance

• How many tokens per document
• How many tokens relative to the total number of tokens in the document
• What is the frequency of the token across all documents

Relevance in Elasticsearch

• At Search Time
• At Index Time
• Term Frequency
  - How often the term appears in the document
• Inverse Document Frequency (IDF)
  - How often the term appears across all documents in the collection
• Field-Length Norm
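A simplified sketch of how the three factors combine. The formulas loosely follow the classic Lucene TF/IDF definitions (square root of the term count, a logarithmic IDF, and one over the square root of the field length); the real scoring function has additional factors, and the sample documents and function names here are assumptions for illustration only.

```python
import math


def term_frequency(term, tokens):
    """More occurrences of the term in the field means a higher weight."""
    return math.sqrt(tokens.count(term))


def inverse_document_frequency(term, all_docs):
    """Terms that appear in many documents carry less weight."""
    doc_freq = sum(1 for tokens in all_docs if term in tokens)
    return 1.0 + math.log(len(all_docs) / (doc_freq + 1))


def field_length_norm(tokens):
    """A match in a short field counts for more than one in a long field."""
    return 1.0 / math.sqrt(len(tokens)) if tokens else 0.0


def score(query_terms, tokens, all_docs):
    """Sum tf * idf * norm over the query terms (a simplification)."""
    norm = field_length_norm(tokens)
    return sum(
        term_frequency(term, tokens)
        * inverse_document_frequency(term, all_docs)
        * norm
        for term in query_terms
    )


docs = {
    "short": ["apple", "peach"],
    "long": ["apple", "peach", "oranges", "oranges", "oranges", "oranges"],
    "other": ["oranges"],
}
all_docs = list(docs.values())
scores = {name: score(["apple", "peach"], tokens, all_docs)
          for name, tokens in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['short', 'long', 'other']
```

Both "short" and "long" contain the query terms once each, so the field-length norm alone decides their order.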

What is Elasticsearch?

Elasticsearch is...

• Search and Analytics engine
• Document Store
  - Every field is indexed/searchable
• Distributed

What Elasticsearch is not

• Key-Value Store
  - Redis, Riak
• Column Family Store
  - C* (Cassandra), HBase
• Graph Database
  - Neo4J

ElasticSearch in a Nutshell

• Based on Apache Lucene
• Distributed
• Document-Oriented
• Schema free
• HTTP + JSON
• (Near) Real-time search
• Ecosystem
  - Hosting, Monitoring, Apps, Clients (SDK)

Where can I get it?

• Free and Open Source
• https://www.elastic.co
• https://github.com/elastic/elasticsearch
• Backed by a Company: Elastic
  - Training
  - Support
  - Auth/AuthZ
  - Marvel for Monitoring

How do I run it?

• Download it
  - https://www.elastic.co/downloads
• bin/elasticsearch
• http://localhost:9200

    {
      "status" : 200,
      "name" : "Tesla",
      "cluster_name" : "elasticsearch_royrusso",
      "version" : {
        "number" : "1.4.2",
        "build_hash" : "927caff6f05403e936c20bf4529f144f0c89fd8c",
        "build_timestamp" : "2014-12-16T14:11:12Z",
        "build_snapshot" : false,
        "lucene_version" : "4.10.2"
      },
      "tagline" : "You Know, for Search"
    }

Elasticsearch requires Java

You have 5 seconds to whine about it and then shut up.

Some Use-Cases

ElasticSearch for Centralized Logs

• Logstash + ElasticSearch + Kibana (ELK)
• Well... and then there's Loggly

"Netflix is a Log generating company that happens to stream movies"

Elasticsearch at Predikto

• Write from Spark to Elasticsearch
• Query from Spark to Elasticsearch
• Visualize

Widely Used

Based on Apache Lucene

• Free and Open Source
• Started in 1999
• Created by Doug Cutting
• What's it do?
  - Tokenizing
  - Locations
  - Relevance scoring
  - Filtering
  - Text search
  - Date Parsing

Elasticsearch is a Document Store

Document Store

• Like MongoDB and CouchDB
• Document DBs
  - JSON documents
  - Collections of key-value collections
  - Nesting
  - Versioned

What is a document?

    {
      "genre" : "Crime",
      "language" : "English",
      "country" : "USA",
      "runtime" : 170,
      "title" : "Scarface",
      "year" : 1983
    }

Modeled in JSON

    {
      "genre" : "Crime",
      "language" : "English",
      "country" : "USA",
      "runtime" : 170,
      "title" : "Scarface",
      "year" : 1983
    }

    {
      "_index" : "imdb",
      "_type" : "movie",
      "_id" : "u17o8zy9RcKg6SjQZqQ4Ow",
      "_version" : 1,
      "_source" : {
        "genre" : "Crime",
        "language" : "English",
        "country" : "USA",
        "runtime" : 170,
        "title" : "Scarface",
        "year" : 1983
      }
    }

Schema-Free

• Dynamic Mapping
  - Elasticsearch guesses the data-types (string, int, float...)

    {
      "imdb" : {
        "movie" : {
          "properties" : {
            "country" : {
              "type" : "string",
              "store" : true,
              "index" : false
            },
            "genre" : {
              "type" : "string",
              "null_value" : "na",
              "store" : false,
              "index" : true
            },
            "year" : {
              "type" : "long"
            }
          }
        }
      }
    }
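The type guessing behind dynamic mapping can be sketched roughly as follows. This is a simplified illustration that covers scalar JSON values only; it ignores date detection, arrays, and nested objects, and the function names are made up for the example.

```python
def guess_field_type(value):
    """Guess a mapping type from a parsed JSON value (scalars only)."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError("unsupported value in this sketch: %r" % (value,))


def guess_mapping(document):
    """Build a 'properties' mapping from the fields of one document."""
    return {
        "properties": {
            field: {"type": guess_field_type(value)}
            for field, value in document.items()
        }
    }


movie = {
    "genre": "Crime",
    "language": "English",
    "country": "USA",
    "runtime": 170,
    "title": "Scarface",
    "year": 1983,
}
mapping = guess_mapping(movie)
print(mapping["properties"]["year"])   # {'type': 'long'}
print(mapping["properties"]["title"])  # {'type': 'string'}
```

Because the guess is made from the first document seen, an explicit mapping like the one above is still the safer choice for fields whose type matters.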

Elasticsearch is Distributed

Terminology

    MySQL      Elasticsearch
    Database   Index
    Table      Type
    Row        Document
    Column     Field
    Schema     Mapping
    Index      (Everything is indexed)
    SQL        Query DSL

• Cluster: 1..N Nodes w/ same Cluster Name
• Node: One ElasticSearch instance (1 java proc)
• Shard = One Lucene instance
  - 0 or more replicas

High Availability

• No need for load balancer
• Different Node Types
• Indices are Sharded
• Replica shards on different Nodes
• Automatic Master election & failover

About Indices / Shards

    $ curl -XPUT 'http://localhost:9200/twitter/' -d '{
      "settings" : {
        "index" : {
          "number_of_shards" : 3,
          "number_of_replicas" : 2
        }
      }
    }'

Cluster Topology

[Diagram: a 4-node cluster holding shards A1, A2, B1, B2, and B3, each present twice (a primary plus one replica)]

4 Node Cluster
Index A: 2 Shards & 1 Replica
Index B: 3 Shards & 1 Replica
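Since `number_of_replicas` counts copies per primary shard, the number of shard copies a cluster has to place is a quick calculation. A small sketch (the function name is an assumption for illustration):

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Each primary shard exists once, plus once per configured replica."""
    return number_of_shards * (1 + number_of_replicas)


twitter = total_shard_copies(3, 2)  # the "twitter" index created above
index_a = total_shard_copies(2, 1)  # Index A: 2 shards & 1 replica
index_b = total_shard_copies(3, 1)  # Index B: 3 shards & 1 replica

print(twitter)            # 9
print(index_a + index_b)  # 10
```

The 10 copies for Index A and Index B match the 10 shard labels spread over the 4-node topology diagram.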

Discovery

• Nodes discover each other using multicast
  - Unicast is an option
• Each cluster has an elected master node
  - Beware of split-brain

    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3"]
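The usual guard against split-brain in Elasticsearch 1.x is to require a quorum of master-eligible nodes before a master can be elected, via the `discovery.zen.minimum_master_nodes` setting. That setting is not shown on the slides; the sketch below only computes the commonly recommended quorum value, and the function name is an assumption.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum: a strict majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


for nodes in (1, 2, 3, 4, 5):
    print(nodes, "master-eligible ->", minimum_master_nodes(nodes))
```

With three master-eligible nodes the quorum is two, so a node that is cut off from the other two cannot elect itself master.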

Nodes

• Master node handles cluster-wide (Meta-API) events
  - Node participation
  - New indices create/delete
  - Re-Allocation of shards
• Data Nodes
  - Indexing / Searching operations
• Client Nodes
  - REST calls
  - Light-weight load balancers

node.data | node.master

The Basics - Shards

• Primary Shard
  - First time Indexing
  - Index has 1..N primary shards (default 5)
  - Not changeable once index created
• Replica Shard
  - Copy of the primary shard
  - Can be changed later
  - Each primary has 0..N replicas
  - HA
    • Promoted to primary if primary fails
    • Get/Search handled by primary||replica
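Why the primary shard count is fixed: a document is routed to a shard by hashing its routing value (by default its id) modulo the number of primary shards. The sketch below illustrates the idea; `zlib.crc32` is a stand-in hash chosen for the example (Elasticsearch uses its own hash function), and the document ids are made up.

```python
import zlib


def route_to_shard(doc_id, number_of_primary_shards):
    """Pick a primary shard: hash(routing value) % number of primaries."""
    # crc32 is a stand-in for illustration, not the hash Elasticsearch uses.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


doc_ids = ["doc-%d" % n for n in range(100)]

# Count documents that would land on a different shard if the index
# went from 5 primaries (the default) to 6.
moved = sum(
    1 for doc_id in doc_ids
    if route_to_shard(doc_id, 5) != route_to_shard(doc_id, 6)
)
print("documents that would be looked up on the wrong shard:", moved)
```

Changing the divisor changes where most documents are expected to live, so existing documents could no longer be found without reindexing. Replicas are plain copies of a primary, which is why their count can change at any time.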

Shard Auto-Allocation

• Shard Phases
  - Unassigned
  - Initializing
  - Started
  - Relocating

[Diagram: Node 1 holds 0P and 1R; Node 2 holds 1P and 0R. Add a node and shards relocate: 0R moves to the new node.]

Allocation Awareness

• Shard Allocation Awareness
  - cluster.routing.allocation.awareness.attributes: rack
  - Shards RELOCATE to even distribution
  - Primary & Replica will NOT be on the same rack value

Cluster State

• Cluster State
  - Node Membership
  - Indices Settings
  - Shard Allocation Table
  - Shard State

    curl -XGET 'http://localhost:9200/_cluster/state?pretty=1'

    {
      "cluster_name" : "elasticsearch_royrusso",
      "version" : 27,
      "master_node" : "s3fpXfPKSFeUqo1MYZxSng",
      "blocks" : { },
      "nodes" : {
        "s3fpXfPKSFeUqo1MYZxSng" : {
          "name" : "Bulldozer",
          "transport_address" : "inet[localhost/127.0.0.1:9300]",
          "attributes" : { }
        }
      },
      "metadata" : {
        "templates" : {
          "logging_index_all" : {
            "template" : "logstash-09-*",
            "order" : 1,
            "settings" : {
              "index" : {
                "number_of_shards" : 2,
                "number_of_replicas" : 1
              }
            },
            "mappings" : {
              "date" : {
                "store" : false
              }
            }
          },
          "logging_index" : {
            "template" : "logstash-*",
            "order" : 0,
            "settings" : {
              "index" : {
                "number_of_shards" : 2,
                "number_of_replicas" : 1
              }
            },
            "mappings" : {
              "date" : {
                "store" : true
              }
            }
          }
        }
      }
    }

Talking to Elasticsearch

REST

• HTTP Verbs: GET, POST, PUT, DELETE
• JSON
• _cat API

    curl '192.168.56.10:9200/_cat/health?v&ts=0'
    cluster status nodeTotal nodeData shards pri relo init unassign
    foo     green          3        3      3   3    0    0        0

The API

• Document
• Cluster
  - Node
• Index
• Search

Create a Document

    curl -XPOST 'http://127.0.0.1:9200/imdb/movie/' -d '{
      "genre" : "Crime",
      "language" : "English",
      "country" : "USA",
      "runtime" : 170,
      "title" : "Scarface",
      "year" : 1983
    }'

    {
      "_index" : "imdb",
      "_type" : "movie",
      "_id" : "AUwGeWib1u4mCngDYT7y",
      "_version" : 1,
      "created" : true
    }

Of note, for the create request above:

• Auto-creates Index & Type
• Auto-Gen ID
• Auto-Version

Get a Document

    curl -XGET 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y'

    {
      "_index" : "imdb",
      "_type" : "movie",
      "_id" : "AUwGeWib1u4mCngDYT7y",
      "_version" : 1,
      "found" : true,
      "_source" : {
        "genre" : "Crime",
        "language" : "English",
        "country" : "USA",
        "runtime" : 170,
        "title" : "Scarface",
        "year" : 1983
      }
    }

Update a Document

    curl -XPUT 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y' -d '{
      "genre" : "Crime",
      "language" : "English",
      "country" : "USA",
      "runtime" : 180,
      "title" : "Scarface",
      "year" : 1983
    }'

    {
      "_index" : "imdb",
      "_type" : "movie",
      "_id" : "AUwGeWib1u4mCngDYT7y",
      "_version" : 2,
      "created" : false
    }

More like an Upsert

Delete a Document

    curl -XDELETE 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y'

    {
      "found" : true,
      "_index" : "imdb",
      "_type" : "movie",
      "_id" : "AUwGeWib1u4mCngDYT7y",
      "_version" : 2
    }

You can also...

• Partial document updating
• Specify Version
• Specify ID
• Multi-Get API
• Exists API
• Bulk API
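The same create, get, update, and delete calls can be issued from code with nothing but an HTTP library. The sketch below uses Python's standard `urllib`; the helper names `build_request` and `send` are assumptions for illustration, and `send` needs a running node at the address from the slides, so the example only builds the requests.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:9200"  # node address used on the slides


def build_request(method, path, body=None):
    """Build an HTTP request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request):
    """Send a request and decode the JSON reply (requires a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


movie = {
    "genre": "Crime",
    "language": "English",
    "country": "USA",
    "runtime": 170,
    "title": "Scarface",
    "year": 1983,
}
doc_id = "AUwGeWib1u4mCngDYT7y"  # the auto-generated id from the slides

create = build_request("POST", "/imdb/movie/", movie)
get = build_request("GET", "/imdb/movie/" + doc_id)
update = build_request("PUT", "/imdb/movie/" + doc_id, dict(movie, runtime=180))
delete = build_request("DELETE", "/imdb/movie/" + doc_id)

for request in (create, get, update, delete):
    print(request.get_method(), request.full_url)
```

Calling `send(create)` against a live node would return the JSON shown on the Create a Document slide, with a freshly generated `_id`.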

How Searching Works

• How it works
  - Search request hits a node
  - Node broadcasts to every shard in the index
  - Each shard performs query
  - Each shard returns metadata about results
  - Node merges results and scores them
  - Node requests documents from shards
  - Results merged, sorted, and returned to client
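The steps above amount to a two-phase scatter/gather. Here is a toy in-memory simulation, with shards modelled as plain dictionaries and the score reduced to a raw term count; both are assumptions made to keep the sketch short, not how Elasticsearch scores or stores data.

```python
import heapq


def query_shard(shard, term, size):
    """Query phase: a shard returns only (score, doc_id) for its top hits."""
    hits = [(text.split().count(term), doc_id) for doc_id, text in shard.items()]
    return heapq.nlargest(size, [hit for hit in hits if hit[0] > 0])


def search(shards, term, size):
    """Coordinating node: broadcast, merge the metadata, then fetch documents."""
    # Scatter: every shard of the index runs the query locally.
    per_shard = [query_shard(shard, term, size) for shard in shards]
    # Gather: merge the lightweight (score, id) pairs into a global top N.
    top = heapq.nlargest(size, (hit for hits in per_shard for hit in hits))
    # Fetch phase: only now are the winning documents requested from shards.
    results = []
    for score, doc_id in top:
        for shard in shards:
            if doc_id in shard:
                results.append(
                    {"_id": doc_id, "_score": score, "_source": shard[doc_id]}
                )
                break
    return results


shards = [
    {"a": "scarface scarface review", "b": "star wars"},
    {"c": "scarface", "d": "scarface scarface scarface trivia"},
]
results = search(shards, "scarface", size=2)
print([(hit["_id"], hit["_score"]) for hit in results])  # [('d', 3), ('a', 2)]
```

Only ids and scores travel in the first phase; full documents are fetched just for the final top hits, which keeps the broadcast cheap.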

REST API - Search

• Free Text Search
  - URL Request
• Complex Query

    http://localhost:9200/imdb/movie/_search?q=scar
    http://localhost:9200/imdb/movie/_search?q=scarface+OR+star
    http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]

REST API - Query DSL

    curl -XPOST 'localhost:9200/_search?pretty' -d '{
      "query" : {
        "bool" : {
          "must" : [
            { "query_string" : { "query" : "scarface or star" } },
            { "range" : { "year" : { "gte" : 1931 } } }
          ]
        }
      }
    }'

REST API - Query DSL

• Boolean Query

    "bool" : {
      "must" : [
        { "match" : { "color" : "blue" } },
        { "match" : { "title" : "shirt" } }
      ],
      "must_not" : [
        { "match" : { "size" : "xxl" } }
      ],
      "should" : [
        { "match" : { "textile" : "cotton" } }
      ]
    }
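To make the roles of the three clauses concrete, here is a toy evaluator for the boolean query above, run over in-memory documents. It is a sketch of the semantics only: `match` is reduced to a case-insensitive whole-word test, the score is simply the number of matching `should` clauses, and the sample products are invented.

```python
def matches(clause, doc):
    """Evaluate one {"match": {field: word}} clause against a document."""
    field, word = next(iter(clause["match"].items()))
    return word.lower() in str(doc.get(field, "")).lower().split()


def bool_query(query, docs):
    """must: required; must_not: excluded; should: optional, raises the score."""
    hits = []
    for doc in docs:
        if not all(matches(clause, doc) for clause in query.get("must", [])):
            continue
        if any(matches(clause, doc) for clause in query.get("must_not", [])):
            continue
        score = sum(matches(clause, doc) for clause in query.get("should", []))
        hits.append((score, doc))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [doc for _, doc in hits]


query = {
    "must": [{"match": {"color": "blue"}}, {"match": {"title": "shirt"}}],
    "must_not": [{"match": {"size": "xxl"}}],
    "should": [{"match": {"textile": "cotton"}}],
}
products = [
    {"id": 1, "title": "Blue shirt", "color": "blue", "size": "l", "textile": "polyester"},
    {"id": 2, "title": "Blue shirt", "color": "blue", "size": "m", "textile": "cotton"},
    {"id": 3, "title": "Blue shirt", "color": "blue", "size": "xxl", "textile": "cotton"},
    {"id": 4, "title": "Red shirt", "color": "red", "size": "m", "textile": "cotton"},
]
ranked_ids = [doc["id"] for doc in bool_query(query, products)]
print(ranked_ids)  # [2, 1]
```

Product 3 is dropped by `must_not`, product 4 fails `must`, and the cotton shirt ranks first because it also satisfies the optional `should` clause.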

REST API - Query DSL

• Range Query
  - Numeric / Date Types
• Prefix/Wildcard Query
  - Match on partial terms
• RegExp Query
• Geo_bbox
  - Bounding box filter
• Geo_distance
  - Geo_distance_range

    "range" : {
      "founded_year" : {
        "gte" : 1990,
        "lt" : 2000
      }
    }

Filters

• Filters recommended over Queries
  - Better cache support

    curl -XGET 'http://localhost:9200/my_index/events/_search?pretty=1' -d '{
      "from" : 0,
      "size" : 0,
      "query" : {
        "terms" : {
          "message" : [ "apples" ],
          "minimum_should_match" : 1
        }
      },
      "post_filter" : {
        "terms" : {
          "userId" : [ "25476c6788ce", "g20d5470d7b4" ],
          "execution" : "or"
        }
      },
      "sort" : { "eventDate" : { "order" : "desc" } },
      "explain" : false
    }'

Analyzers / Tokenizers

    curl -XPUT 'http://localhost:9200/my_index' -d '{
      "settings" : {
        "analysis" : {
          "analyzer" : {
            "str_search_analyzer" : {
              "tokenizer" : "keyword",
              "filter" : [ "lowercase" ]
            },
            "str_index_analyzer" : {
              "tokenizer" : "substring",
              "filter" : [ "lowercase", "stop" ]
            }
          },
          "tokenizer" : {
            "substring" : {
              "type" : "edgeNGram",
              "min_gram" : 3,
              "max_gram" : 42,
              "token_chars" : [ "letter", "digit" ]
            }
          }
        }
      }
    }'

    curl -XPUT 'http://localhost:9200/my_index/events/_mapping' -d '{
      "events" : {
        "properties" : {
          "eventId" : { "type" : "string", "store" : true, "index" : "not_analyzed" },
          "userId" : { "type" : "string", "store" : false, "index" : "not_analyzed" },
          "message" : {
            "type" : "string",
            "store" : false,
            "search_analyzer" : "str_search_analyzer",
            "index_analyzer" : "str_index_analyzer"
          }
        }
      }
    }'

Tokenizers

• Whitespace
• NGram
• Edge NGram
• Letter
  - Splits on non-letters
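What each of these tokenizers emits is easiest to see on a small input. The functions below are rough Python approximations written for illustration, not the Lucene implementations; the edge n-gram version assumes `token_chars` of letter and digit, as in the `substring` tokenizer configured earlier.

```python
import re


def whitespace_tokenizer(text):
    """Split on whitespace only; punctuation stays attached to tokens."""
    return text.split()


def letter_tokenizer(text):
    """Split on any non-letter character."""
    return re.findall(r"[^\W\d_]+", text)


def ngram_tokenizer(text, min_gram, max_gram):
    """Emit every substring whose length is between min_gram and max_gram."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


def edge_ngram_tokenizer(text, min_gram, max_gram):
    """Emit prefixes of each letter/digit run, from min_gram up to max_gram."""
    grams = []
    for word in re.findall(r"[^\W_]+", text):
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(word[:size])
    return grams


print(whitespace_tokenizer("You Know, for Search"))
# ['You', 'Know,', 'for', 'Search']
print(letter_tokenizer("You Know, for Search"))
# ['You', 'Know', 'for', 'Search']
print(ngram_tokenizer("peach", 2, 3))
# ['pe', 'pea', 'ea', 'eac', 'ac', 'ach', 'ch']

# Edge n-grams followed by a lowercase filter, as in str_index_analyzer.
indexed = [gram.lower() for gram in edge_ngram_tokenizer("Scarface 1983", 3, 42)]
print(indexed)
# ['sca', 'scar', 'scarf', 'scarfa', 'scarfac', 'scarface', '198', '1983']
```

Indexing the prefixes while searching with the `keyword` tokenizer is what lets a partial input such as "scar" match the full title without a wildcard query.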

Clients

• Client list: http://www.elasticsearch.org/guide/clients
  - Java (Node) Client, JS, PHP, Perl, Python, Ruby
• Spring Data
  - Uses TransportClient
  - Implementation of ElasticsearchRepository aligns with generic Repository interfaces

Monitoring

• BigDesk
• Kopf
• Head
• ElasticHQ
• Marvel
• Sematext SPM

Questions?


Page 3: Elasticsearch - DevNexus 2015

Why Am I Here

bullWhat is Search

bullWhat is Elasticsearch

bullReal-World Use

bullScale Out

bullInteracting with Elasticsearch

3

Search is about

filtering information

and determining

relevance

4

How does a Search Engine

Work

5

Select FROM make WHERE name LIKE lsquoTeslarsquo

Search Engines use Magic

6

Where Magic == Inverted Index

Itrsquos FM

Inverted Index

bullTake some documents

bullTokenize them

bullFind unique tokens

bullMap tokens to documents

7

apple oranges peach

Document 1

Document 2

Document 3

Document 4

Document 5

Document 6

Inverted Index

8

apples oranges peach

Document 1

Document 2

Document 3

Document 4

Document 5

Document 6

Search for ldquoapple peachrdquo

Relevance

bullHow many tokens per document

bullHow many tokens relative to the number of total

tokens in the document

bullWhat is the frequency of token across all

documents

9

Relevance in Elasticsearch

bullAt Search Time

bullAt Index Time

bullTerm Frequency

-Term Document

bullInverse Document Frequency (IDF)

-Term All Documents in the collection

bullField-Length Norm

10

What is Elasticsearch

11

Elasticsearch ishellip

bullSearch and Analytics engine

bullDocument Store

-Every field is indexedsearchable

bullDistributed

12

What Elasticsearch is not

bullKey-Value Store

-Redis Riak

bullColumn Family Store

-C HBase

bullGraph Database

-Neo4J

13

ElasticSearch in a Nutshell

bullBased on Apache Lucene

bullDistributed

bullDocument-Oriented

bullSchema free

bullHTTP + JSON

bull(Near) Real-time search

bullEcosystem

-Hosting Monitoring Apps Clients (SDK)

14

Where can I get it

bullFree and Open Source

bullhttpswwwelasticco

bullhttpsgithubcomelasticelasticsearch

bullBacked by a Company Elastic

-Training

-Support

-AuthAuthZ

-Marvel for Monitoring

15

How do I run it

bullDownload it

- httpswwwelasticcodownloads

bullbinelasticsearch

bullhttplocalhost9200

16

status 200

name Tesla

cluster_name elasticsearch_royrusso

version

number 142

build_hash 927caff6f05403e936c20bf4529f144f0c89fd8c

build_timestamp 2014-12-16T141112Z

build_snapshot false

lucene_version 4102

tagline You Know for Search

Elasticsearch requires Java

17

You have 5 seconds to whine about it and then shutup

Some Use-Cases

18

ElasticSearch for Centralized

LogsbullLogstash + ElasticSearch + Kibana (ELK)

bullWellhellip and then therersquos Loggly

19

ldquoNetflix is a Log generating company that happens to stream moviesrdquo

Elasticsearch at Predikto

20

Elasticsearch at Predikto

21

bullWrite From Spark to Elasticsearch

bullQuery from Spark to Elasticsearch

bullVisualize

Widely Used

22

Based on Apache Lucene

bullFree and Open Source

bullStarted in 1999

bullCreated by Doug Cutting

bullWhatrsquos it do

-Tokenizing

- Locations

-Relevance scoring

-Filtering

-Text search

-Date Parsing

23

Elasticsearch is a Document

Store

24

Document Store

bullLike MongoDB and CouchDB

bullDocument DBs

- JSON documents

-Collections of key-value collections

-Nesting

-Versioned

25

What is a document

26

genre Crime

ldquolanguage English

country USA

runtime 170

title Scarface

year 1983

Modeled in JSON

27

genre Crime

language English

country USA

runtime 170

title Scarface

year 1983

_index imdb

_type movie

_id u17o8zy9RcKg6SjQZqQ4Ow

_version 1

_source

genre Crime

language English

country USA

runtime 170

title Scarface

year 1983

Schema-Free

bullDynamic Mapping

-Elasticsearch guesses the data-types (string int

floathellip)

28

imdb

movie

properties

country

type stringldquo

ldquostorerdquotrue

ldquoindexrdquofalse

genre

type stringldquo

null_value naldquo

ldquostorerdquofalse

ldquoindextrue

year

type long

Elasticsearch is Distributed

29

Terminology

30

MySQL Elasticsearch

Database Index

Table Type

Row Document

Column Field

Schema Mapping

Index (Everything is indexed)

SQL Query DSL

bullCluster 1N Nodes w same Cluster Name

bullNode One ElasticSearch instance (1 java proc)

bullShard = One Lucene instance

- 0 or more replicas

High Availability

bullNo need for load balancer

bullDifferent Node Types

bullIndices are Sharded

bullReplica shards on different Nodes

bullAutomatic Master election amp failover

31

About Indices Shards

32

$ curl -XPUT httplocalhost9200twitter -d

settings

index

number_of_shards 3

number_of_replicas 2

Cluster Topology

33

A1 A2B2 B2 B1

B3

B1 A1 A2

B3

4 Node Cluster

Index A 2 Shards amp 1 Replica

Index B 3 Shards amp 1 Replica

Discovery

bullNodes discover each other using multicast

-Unicast is an option

bullEach cluster has an elected master node

-Beware of split-brain

discoveryzenpingmulticastenabled false

discoveryzenpingunicasthosts [host1 host2port host3]

34

Nodes

bullMaster node handles cluster-wide (Meta-API)

events

-Node participation

-New indices createdelete

-Re-Allocation of shards

bullData Nodes

- Indexing Searching operations

bullClient Nodes

-REST calls

- Light-weight load balancers

35

nodedata | nodemaster

The Basics - Shards

bullPrimary Shard

-First time Indexing

- Index has 1N primary shards (default 5)

- Not changeable once index created

bullReplica Shard

-Copy of the primary shard

- Can be changed later

-Each primary has 0N replicas

- HA

bull Promoted to primary if primary fails

bull GetSearch handled by primary||replica36

Shard Auto-Allocation

bullShard Phases

-Unassigned

- Initializing

-Started

-Relocating37

Node 1

0P

1R

Node 2

1P

0R

Node 2

0R

Add a Node Shards Relocate

Allocation Awareness

bullShard Allocation Awareness

- clusterroutingallocationawarenessattributes rack

-Shards RELOCATE to even distribution

-Primary amp Replica will NOT be on the same rack

value

38

Cluster State

bullCluster State

-Node Membership

- Indices Settings

-Shard Allocation

Table

-Shard State

39

cURL -XGET httplocalhost9200_clusterstatepretty=1

cluster_name elasticsearch_royrusso

version 27

master_node s3fpXfPKSFeUqo1MYZxSng

blocks

nodes

s3fpXfPKSFeUqo1MYZxSng

name Bulldozer

transport_address inet[localhost1270019300]

attributes

metadata

templates

logging_index_all

template logstash-09-

order 1

settings

index

number_of_shards 2

number_of_replicas 1

mappings

date

store false

logging_index

template logstash-

order 0

settings

index

number_of_shards 2

ldquonumber_of_replicasrdquo 1

mappings

ldquodaterdquo

ldquostorerdquo true

Talking to Elasticsearch

40

REST

bullHTTP Verbs GET POST PUT DELETE

bullJSON

bull_cat API

41

curl 19216856109200_cathealthvampts=0

cluster status nodeTotal nodeData shards pri relo init unassign

foo green 3 3 3 3 0 0 0

The API

bullDocument

bullCluster

-Node

bullIndex

bullSearch

42

Create a Document

43

curl -XPOST http1270019200imdbmovie -d

genre Crime

language English

country USA

runtime 170

title Scarface

year 1983

_indeximdb

_typemovie

_idAUwGeWib1u4mCngDYT7y

_version1

createdtrue

Of notehellip

44

curl -XPOST http1270019200imdbmovie -d

genre Crime

language English

country USA

runtime 170

title Scarface

year 1983

_indeximdb

_typemovie

_idAUwGeWib1u4mCngDYT7y

_version1

createdtrue

Auto-creates Index amp Type

Auto-Gen ID

Auto-Version

Get a Document

45

curl -XGET http1270019200imdbmovieAUwGeWib1u4mCngDYT7y

_indeximdb

_typemovie

_idAUwGeWib1u4mCngDYT7y

_version1

foundtrue

_source

genre Crime

language English

country USA

runtime 170

title Scarface

year 1983

Update a Document

46

curl -XPUT http1270019200imdbmovieAUwGeWib1u4mCngDYT7y -d

genre Crime

language English

country USA

runtime 180

title Scarface

year 1983

_indeximdb

_typemovie

_idAUwGeWib1u4mCngDYT7y

_version2

createdfalse

More like an Upsert

Delete a Document

47

curl -XDELETE http1270019200imdbmovieAUwGeWib1u4mCngDYT7y

ldquofoundtrue

ldquo_indeximdb

ldquo_typemovie

ldquo_idAUwGeWib1u4mCngDYT7y

ldquo_version2

You can alsohellip

bullPartial document updating

bullSpecify Version

bullSpecify ID

bullMulti-Get API

bullExists API

bullBulk API

48

How Searching Works

bullHow it works

-Search request hits a node

-Node broadcasts to every shard in the index

-Each shard performs query

-Each shard returns metadata about results

-Node merges results and scores them

-Node requests documents from shards

-Results merged sorted and returned to client

49

REST API - Search

bullFree Text Search

-URL Request

bullComplex Query

50

httplocalhost9200imdbmovie_searchq=scar

httplocalhost9200imdbmovie_searchq=scarface+OR+star

httplocalhost9200imdbmovie_searchq=(scarface+OR+star)+AND+year[1981+TO+1984]

REST API ndash Query DSL

curl -XPOST localhost9200_searchpretty -d

query

bool

must [

query_string

query scarface or star

range

year gte 1931

]

51

REST API ndash Query DSL

bullBoolean Querybool

must[

match

colorblue

match

titleshirt

]

must_not[

match

sizexxl

]

should[

match

textilecotton

52

REST API ndash Query DSL

bullRange Query

-Numeric Date Types

bullPrefixWildcard Query

-Match on partial terms

bullRegExp Query

bullGeo_bbox

-Bounding box filter

bullGeo_distance

-Geo_distance_range

range

founded_year

gte1990

lt2000

53

Filters

bullFilters recommended over Queries

-Better cache support

54

curl -XGET httplocalhost9200my_indexevents_searchpretty=1 -d

from 0

size 0

query

terms

message [ apples]

minimum_should_match 1

post_filter

terms

userId [ 25476c6788ce g20d5470d7b4 ]

execution or

sort eventDate order desc

explain false

Analyzers Tokenizers

55

curl -XPUT lsquohttplocalhost9200my_index -d

settings

analysis

analyzer

str_search_analyzer

tokenizer keyword

filter [lowercase]

str_index_analyzer

tokenizer substring

filter [lowercase stop]

tokenizer

substring

type edgeNgram

min_gram 3

max_gram 42

token_chars [letter digit]

curl -XPUT httplocalhost9200my_indexevents_mapping -d

events

properties

eventId type string store true index not_analyzed

userId type string store false index not_analyzed

message

type string store false

search_analyzer str_search_analyzer

index_analyzer str_index_analyzer

Tokenizers

• Whitespace
• NGram
• Edge NGram
• Letter
- Splits on non-letters
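A rough feel for how these four tokenizers differ can be had from simplified stand-ins. These are sketches only: the function names are hypothetical, the gram sizes are illustrative parameters, and the order in which grams are emitted is a choice made here rather than a statement about Elasticsearch's output order.

```python
import re

def whitespace(text):
    """Split on runs of whitespace only; punctuation stays attached."""
    return text.split()

def letter(text):
    """Keep runs of letters; any non-letter character ends a token."""
    return re.findall(r"[^\W\d_]+", text)

def ngram(text, min_gram=1, max_gram=2):
    """All substrings whose length lies between min_gram and max_gram."""
    return [
        text[i:i + n]
        for i in range(len(text))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(text)
    ]

def edge_ngram(text, min_gram=1, max_gram=2):
    """Only the substrings anchored at the start of the text."""
    return [text[:n] for n in range(min_gram, min(len(text), max_gram) + 1)]

sample = "Quick-fox 42"
```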

Clients

• Client list: http://www.elasticsearch.org/guide/clients
- Java (Node) Client, JS, PHP, Perl, Python, Ruby
• Spring Data
- Uses TransportClient
- Implementation of ElasticsearchRepository aligns with generic Repository interfaces

Monitoring

• BigDesk
• Kopf
• Head
• ElasticHQ
• Marvel
• Sematext SPM

Questions?

Page 7: Elasticsearch - DevNexus 2015

Inverted Index

bullTake some documents

bullTokenize them

bullFind unique tokens

bullMap tokens to documents

7

apple oranges peach

Document 1

Document 2

Document 3

Document 4

Document 5

Document 6

Inverted Index

8

apples oranges peach

Document 1

Document 2

Document 3

Document 4

Document 5

Document 6

Search for ldquoapple peachrdquo

Relevance

• How many tokens per document?
• How many tokens relative to the total number of tokens in the document?
• What is the frequency of the token across all documents?

Relevance in Elasticsearch

• At Search Time
• At Index Time
• Term Frequency
  - How often the term appears in a document
• Inverse Document Frequency (IDF)
  - How often the term appears across all documents in the collection
• Field-Length Norm
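The three factors named above can be sketched as follows. The formulas mirror the classic Lucene TF/IDF definitions (square root of the term count, a logarithmic IDF, and an inverse square root of the field length), but the way they are multiplied here is a simplification: real scoring also involves boosts and query normalization. The corpus and function names are hypothetical.

```python
import math


def term_frequency(term, tokens):
    """More occurrences in the field raise the score, with diminishing returns."""
    return math.sqrt(tokens.count(term))


def inverse_document_frequency(term, corpus):
    """Terms that appear in fewer documents of the collection weigh more."""
    doc_freq = sum(1 for tokens in corpus if term in tokens)
    return 1.0 + math.log(len(corpus) / (doc_freq + 1))


def field_length_norm(tokens):
    """A match in a short field counts more than a match in a long one."""
    if not tokens:
        return 0.0
    return 1.0 / math.sqrt(len(tokens))


def score(term, tokens, corpus):
    """Simplified relevance of one term for one document (illustration only)."""
    return (term_frequency(term, tokens)
            * inverse_document_frequency(term, corpus)
            * field_length_norm(tokens))


# Hypothetical tokenized documents.
corpus = [
    ["apple", "peach"],
    ["apple", "apple", "oranges", "oranges"],
    ["oranges"],
]
for tokens in corpus:
    print(tokens, round(score("apple", tokens, corpus), 3))
```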

What is Elasticsearch?

Elasticsearch is…

• Search and Analytics engine
• Document Store
  - Every field is indexed/searchable
• Distributed

What Elasticsearch is not

• Key-Value Store
  - Redis, Riak
• Column Family Store
  - Cassandra (C*), HBase
• Graph Database
  - Neo4J

ElasticSearch in a Nutshell

• Based on Apache Lucene
• Distributed
• Document-Oriented
• Schema free
• HTTP + JSON
• (Near) Real-time search
• Ecosystem
  - Hosting, Monitoring, Apps, Clients (SDK)

Where can I get it?

• Free and Open Source
• https://www.elastic.co
• https://github.com/elastic/elasticsearch
• Backed by a Company: Elastic
  - Training
  - Support
  - Auth/AuthZ
  - Marvel for Monitoring

How do I run it?

• Download it
  - https://www.elastic.co/downloads
• bin/elasticsearch
• http://localhost:9200

{
  "status": 200,
  "name": "Tesla",
  "cluster_name": "elasticsearch_royrusso",
  "version": {
    "number": "1.4.2",
    "build_hash": "927caff6f05403e936c20bf4529f144f0c89fd8c",
    "build_timestamp": "2014-12-16T14:11:12Z",
    "build_snapshot": false,
    "lucene_version": "4.10.2"
  },
  "tagline": "You Know, for Search"
}

Elasticsearch requires Java

You have 5 seconds to whine about it, and then shut up.

Some Use-Cases

ElasticSearch for Centralized Logs

• Logstash + ElasticSearch + Kibana (ELK)
• Well… and then there’s Loggly

“Netflix is a log-generating company that happens to stream movies.”

Elasticsearch at Predikto

• Write from Spark to Elasticsearch
• Query from Spark to Elasticsearch
• Visualize

Widely Used

Based on Apache Lucene

• Free and Open Source
• Started in 1999
• Created by Doug Cutting
• What’s it do?
  - Tokenizing
  - Locations
  - Relevance scoring
  - Filtering
  - Text search
  - Date Parsing

Elasticsearch is a Document Store

Document Store

• Like MongoDB and CouchDB
• Document DBs
  - JSON documents
  - Collections of key-value collections
  - Nesting
  - Versioned

What is a document?

{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 170,
  "title": "Scarface",
  "year": 1983
}

Modeled in JSON

{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 170,
  "title": "Scarface",
  "year": 1983
}

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "u17o8zy9RcKg6SjQZqQ4Ow",
  "_version": 1,
  "_source": {
    "genre": "Crime",
    "language": "English",
    "country": "USA",
    "runtime": 170,
    "title": "Scarface",
    "year": 1983
  }
}

Schema-Free

• Dynamic Mapping
  - Elasticsearch guesses the data-types (string, int, float…)

{
  "imdb": {
    "movie": {
      "properties": {
        "country": {
          "type": "string",
          "store": true,
          "index": false
        },
        "genre": {
          "type": "string",
          "null_value": "na",
          "store": false,
          "index": true
        },
        "year": {
          "type": "long"
        }
      }
    }
  }
}

Elasticsearch is Distributed

Terminology

MySQL       Elasticsearch
Database    Index
Table       Type
Row         Document
Column      Field
Schema      Mapping
Index       (Everything is indexed)
SQL         Query DSL

• Cluster: 1..N Nodes w/ same Cluster Name
• Node: One ElasticSearch instance (1 java proc)
• Shard = One Lucene instance
  - 0 or more replicas

High Availability

• No need for load balancer
• Different Node Types
• Indices are Sharded
• Replica shards on different Nodes
• Automatic Master election & failover

About Indices / Shards

$ curl -XPUT 'http://localhost:9200/twitter/' -d '
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    }
  }
}'
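As a rough check on what these two settings imply, the sketch below counts the shard copies the cluster has to place. The function names are hypothetical; the second function relies on the fact that Elasticsearch does not put two copies of the same shard on one node.

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Primaries plus every replica copy for a single index."""
    return number_of_shards * (1 + number_of_replicas)


def min_nodes_for_full_allocation(number_of_replicas):
    """Each copy of a shard needs its own node, so one node per copy."""
    return 1 + number_of_replicas


# Values from the twitter index settings shown on the slide.
print(total_shard_copies(3, 2))             # 3 primaries + 6 replicas
print(min_nodes_for_full_allocation(2))     # nodes needed before all replicas are assigned
```

With fewer nodes than that minimum, some replicas stay unassigned until more nodes join the cluster.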

Cluster Topology

[Diagram: primary and replica copies of shards A1, A2, B1, B2, and B3 spread across the nodes]

4 Node Cluster
Index A: 2 Shards & 1 Replica
Index B: 3 Shards & 1 Replica

Discovery

• Nodes discover each other using multicast
  - Unicast is an option
• Each cluster has an elected master node
  - Beware of split-brain

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3"]
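The split-brain warning is usually addressed with a quorum of master-eligible nodes, configured in this generation of Elasticsearch through discovery.zen.minimum_master_nodes (a setting the slide does not mention). A minimal sketch of the arithmetic, with hypothetical function names:

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size: a strict majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_nodes, master_eligible_nodes):
    """A network partition may elect a master only if it sees a quorum."""
    return visible_nodes >= minimum_master_nodes(master_eligible_nodes)


# A 3-node cluster split into partitions of 2 nodes and 1 node.
print(minimum_master_nodes(3))
print(can_elect_master(2, 3), can_elect_master(1, 3))
```

Because two disjoint partitions cannot both hold a strict majority, at most one side of a split can elect a master.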

Nodes

• Master node handles cluster-wide (Meta-API) events
  - Node participation
  - New indices create/delete
  - Re-Allocation of shards
• Data Nodes
  - Indexing / Searching operations
• Client Nodes
  - REST calls
  - Light-weight load balancers

node.data | node.master

The Basics - Shards

• Primary Shard
  - First time Indexing
  - Index has 1..N primary shards (default 5)
  - Not changeable once index created
• Replica Shard
  - Copy of the primary shard
  - Can be changed later
  - Each primary has 0..N replicas
  - HA
    • Promoted to primary if primary fails
    • Get/Search handled by primary || replica
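One reason the primary shard count is fixed at index creation is document routing: the target shard is derived from a hash of the routing value (the document id by default) modulo the number of primary shards. The sketch below uses CRC32 as a stand-in hash, which is an assumption; Elasticsearch uses its own hash function, and the document ids here are made up.

```python
import zlib


def shard_for(doc_id, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards (CRC32 as a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


# Hypothetical document ids.
doc_ids = ["movie-%d" % n for n in range(20)]

# Count how many documents would be looked up on a different shard
# if the primary count changed from 5 to 6 without reindexing.
moved = [d for d in doc_ids if shard_for(d, 5) != shard_for(d, 6)]
print(len(moved), "of", len(doc_ids), "documents would route to a different shard")
```

Replicas do not take part in this calculation, which is why their number can be changed later.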

Shard Auto-Allocation

• Shard Phases
  - Unassigned
  - Initializing
  - Started
  - Relocating

[Diagram: Node 1 holds 0P and 1R; Node 2 holds 1P and 0R; after a node is added, 0R is shown on the new node]

Add a Node: Shards Relocate

Allocation Awareness

• Shard Allocation Awareness
  - cluster.routing.allocation.awareness.attributes: rack
  - Shards RELOCATE to even distribution
  - Primary & Replica will NOT be on the same rack value

Cluster State

• Cluster State
  - Node Membership
  - Indices Settings
  - Shard Allocation Table
  - Shard State

curl -XGET 'http://localhost:9200/_cluster/state?pretty=1'

{
  "cluster_name": "elasticsearch_royrusso",
  "version": 27,
  "master_node": "s3fpXfPKSFeUqo1MYZxSng",
  "blocks": {},
  "nodes": {
    "s3fpXfPKSFeUqo1MYZxSng": {
      "name": "Bulldozer",
      "transport_address": "inet[localhost/127.0.0.1:9300]",
      "attributes": {}
    }
  },
  "metadata": {
    "templates": {
      "logging_index_all": {
        "template": "logstash-09-*",
        "order": 1,
        "settings": {
          "index": {
            "number_of_shards": 2,
            "number_of_replicas": 1
          }
        },
        "mappings": {
          "date": {
            "store": false
          }
        }
      },
      "logging_index": {
        "template": "logstash-*",
        "order": 0,
        "settings": {
          "index": {
            "number_of_shards": 2,
            "number_of_replicas": 1
          }
        },
        "mappings": {
          "date": {
            "store": true
          }
        }
      }
    }
  }
}

Talking to Elasticsearch

REST

• HTTP Verbs: GET, POST, PUT, DELETE
• JSON
• _cat API

curl '192.168.56.10:9200/_cat/health?v&ts=0'

cluster status nodeTotal nodeData shards pri relo init unassign
foo     green          3        3      3   3    0    0        0

The API

• Document
• Cluster
  - Node
• Index
• Search

Create a Document

curl -XPOST 'http://127.0.0.1:9200/imdb/movie' -d '
{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 170,
  "title": "Scarface",
  "year": 1983
}'

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 1,
  "created": true
}

Of note…

In the response to the POST request above:

• Auto-creates Index & Type (_index, _type)
• Auto-Gen ID (_id)
• Auto-Version (_version)

Get a Document

curl -XGET 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y'

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 1,
  "found": true,
  "_source": {
    "genre": "Crime",
    "language": "English",
    "country": "USA",
    "runtime": 170,
    "title": "Scarface",
    "year": 1983
  }
}

Update a Document

curl -XPUT 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y' -d '
{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 180,
  "title": "Scarface",
  "year": 1983
}'

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 2,
  "created": false
}

More like an Upsert
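The upsert behaviour and the _version and created fields in the responses can be mimicked with a small in-memory model. This is a toy sketch under stated assumptions: the TinyIndex class and its methods are hypothetical and do not talk to Elasticsearch; they only reproduce the bookkeeping visible in the slide responses.

```python
import uuid


class TinyIndex:
    """In-memory stand-in for a single index/type (illustration only)."""

    def __init__(self):
        self._docs = {}

    def index(self, source, doc_id=None):
        """Store a whole document; create it if the id is new, else replace it."""
        if doc_id is None:
            doc_id = uuid.uuid4().hex  # auto-generated id, like POST without an id
        previous = self._docs.get(doc_id)
        version = 1 if previous is None else previous["_version"] + 1
        self._docs[doc_id] = {"_version": version, "_source": dict(source)}
        return {"_id": doc_id, "_version": version, "created": previous is None}

    def get(self, doc_id):
        """Return the stored document, or found=False if the id is unknown."""
        doc = self._docs.get(doc_id)
        if doc is None:
            return {"_id": doc_id, "found": False}
        return {"_id": doc_id, "found": True, **doc}


movies = TinyIndex()
first = movies.index({"title": "Scarface", "runtime": 170})
second = movies.index({"title": "Scarface", "runtime": 180}, doc_id=first["_id"])
print(first["_version"], first["created"])
print(second["_version"], second["created"])
```

The second call replaces the whole document rather than patching one field, which matches the PUT request on the slide.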

Delete a Document

curl -XDELETE 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y'

{
  "found": true,
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 2
}

You can also…

• Partial document updating
• Specify Version
• Specify ID
• Multi-Get API
• Exists API
• Bulk API

How Searching Works

• How it works
  - Search request hits a node
  - Node broadcasts to every shard in the index
  - Each shard performs query
  - Each shard returns metadata about results
  - Node merges results and scores them
  - Node requests documents from shards
  - Results merged, sorted, and returned to client
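The steps above amount to a scatter-gather in two phases, which can be sketched as follows. Shards are modelled as plain dictionaries and the score is a naive term count; both are assumptions made for illustration, as are the function names and sample documents.

```python
import heapq


def query_phase(shard, term, size):
    """A shard scores its own documents and returns only (score, doc_id) metadata."""
    hits = []
    for doc_id, text in shard.items():
        score = text.lower().split().count(term)
        if score > 0:
            hits.append((score, doc_id))
    return heapq.nlargest(size, hits)


def search(shards, term, size):
    """Coordinating node: broadcast, merge the metadata, then fetch the documents."""
    candidates = []
    for shard_no, shard in enumerate(shards):          # broadcast to every shard
        for score, doc_id in query_phase(shard, term, size):
            candidates.append((score, doc_id, shard_no))
    top = heapq.nlargest(size, candidates)             # merge and sort
    return [                                           # fetch only the winners
        {"_id": doc_id, "_score": score, "_source": shards[shard_no][doc_id]}
        for score, doc_id, shard_no in top
    ]


# Two hypothetical shards of one index.
shards = [
    {"a": "scarface", "b": "star wars star"},
    {"c": "star trek", "d": "heat"},
]
for hit in search(shards, "star", size=2):
    print(hit)
```

Shards return only ids and scores in the first phase, so full documents cross the network only for the hits that survive the merge.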

REST API - Search

• Free Text Search
  - URL Request
• Complex Query

http://localhost:9200/imdb/movie/_search?q=scar
http://localhost:9200/imdb/movie/_search?q=scarface+OR+star
http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]
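URL searches like these are easy to get wrong by hand because spaces and special characters must be encoded. A small sketch using only the Python standard library; the host, index, and type are the ones on the slide, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Host, index, and type as shown on the slide.
BASE_URL = "http://localhost:9200/imdb/movie/_search"


def search_url(query):
    """Build a URL search request, encoding the query string for the q parameter."""
    return BASE_URL + "?" + urlencode({"q": query})


print(search_url("scarface OR star"))
print(search_url("(scarface OR star) AND year:[1981 TO 1984]"))
```

The second URL comes out percent-encoded (parentheses, colon, and brackets are escaped), which is the safer form to send even though the slide shows the characters unescaped.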

REST API – Query DSL

curl -XPOST 'localhost:9200/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "scarface or star"
          }
        },
        {
          "range": {
            "year": { "gte": 1931 }
          }
        }
      ]
    }
  }
}'

REST API – Query DSL

• Boolean Query

"bool": {
  "must": [
    { "match": { "color": "blue" } },
    { "match": { "title": "shirt" } }
  ],
  "must_not": [
    { "match": { "size": "xxl" } }
  ],
  "should": [
    { "match": { "textile": "cotton" } }
  ]
}
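Nested JSON like the bool query above is often assembled programmatically rather than typed by hand. The sketch below builds the same request body with plain dictionaries; the helper names match and bool_query are hypothetical and are not part of any Elasticsearch client.

```python
import json


def match(field, value):
    """One match clause for a single field."""
    return {"match": {field: value}}


def bool_query(must=(), must_not=(), should=()):
    """Assemble a bool query body, leaving out clause types that are empty."""
    clauses = {"must": list(must), "must_not": list(must_not), "should": list(should)}
    return {"query": {"bool": {name: items for name, items in clauses.items() if items}}}


body = bool_query(
    must=[match("color", "blue"), match("title", "shirt")],
    must_not=[match("size", "xxl")],
    should=[match("textile", "cotton")],
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the request body of a _search call, for example with curl -d.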

REST API – Query DSL

• Range Query
  - Numeric / Date Types
• Prefix/Wildcard Query
  - Match on partial terms
• RegExp Query
• Geo_bbox
  - Bounding box filter
• Geo_distance
  - Geo_distance_range

"range": {
  "founded_year": {
    "gte": 1990,
    "lt": 2000
  }
}

Filters

• Filters recommended over Queries
  - Better cache support

curl -XGET 'http://localhost:9200/my_index/events/_search?pretty=1' -d '
{
  "from": 0,
  "size": 0,
  "query": {
    "terms": {
      "message": ["apples"],
      "minimum_should_match": 1
    }
  },
  "post_filter": {
    "terms": {
      "userId": ["25476c6788ce", "g20d5470d7b4"],
      "execution": "or"
    }
  },
  "sort": { "eventDate": { "order": "desc" } },
  "explain": false
}'

Analyzers / Tokenizers

curl -XPUT 'http://localhost:9200/my_index' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "str_search_analyzer": {
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        },
        "str_index_analyzer": {
          "tokenizer": "substring",
          "filter": ["lowercase", "stop"]
        }
      },
      "tokenizer": {
        "substring": {
          "type": "edgeNGram",
          "min_gram": 3,
          "max_gram": 42,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  }
}'

curl -XPUT 'http://localhost:9200/my_index/events/_mapping' -d '
{
  "events": {
    "properties": {
      "eventId": { "type": "string", "store": true, "index": "not_analyzed" },
      "userId": { "type": "string", "store": false, "index": "not_analyzed" },
      "message": {
        "type": "string",
        "store": false,
        "search_analyzer": "str_search_analyzer",
        "index_analyzer": "str_index_analyzer"
      }
    }
  }
}'

Tokenizers

• Whitespace
• NGram
• Edge NGram
• Letter
  - Splits on non-letters
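To make the Edge NGram tokenizer concrete, here is a rough Python approximation of the index-time analysis configured earlier (min_gram 3, max_gram 42, letters and digits only, lowercased). It is a sketch, not Lucene's implementation: the stop filter is left out, and the function name is hypothetical.

```python
import re


def edge_ngrams(text, min_gram=3, max_gram=42):
    """Emit lowercased prefixes of every run of letters/digits in the text."""
    grams = []
    for word in re.findall(r"[^\W_]+", text):   # runs of letters and digits
        longest = min(max_gram, len(word))
        for n in range(min_gram, longest + 1):
            grams.append(word[:n].lower())
    return grams


print(edge_ngrams("Scarface 1983"))

# At search time the whole query is kept as one lowercased token (keyword tokenizer),
# so a partial term matches if it equals one of the indexed prefixes.
print("scar" in edge_ngrams("Scarface 1983"))
```

Indexing prefixes this way trades a larger index for cheap prefix matching, since a query such as "scar" becomes an exact term lookup.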

Clients

• Client list: http://www.elasticsearch.org/guide/clients
  - Java (Node) Client, JS, PHP, Perl, Python, Ruby
• Spring Data
  - Uses TransportClient
  - Implementation of ElasticsearchRepository aligns with generic Repository interfaces

Monitoring

• BigDesk
• Kopf
• Head
• ElasticHQ
• Marvel
• Sematext SPM

Questions?



Page 10: Elasticsearch - DevNexus 2015

Relevance in Elasticsearch

bullAt Search Time

bullAt Index Time

bullTerm Frequency

-Term Document

bullInverse Document Frequency (IDF)

-Term All Documents in the collection

bullField-Length Norm

10

What is Elasticsearch

11

Elasticsearch ishellip

bullSearch and Analytics engine

bullDocument Store

-Every field is indexedsearchable

bullDistributed

12

What Elasticsearch is not

bullKey-Value Store

-Redis Riak

bullColumn Family Store

-C HBase

bullGraph Database

-Neo4J

13

ElasticSearch in a Nutshell

bullBased on Apache Lucene

bullDistributed

bullDocument-Oriented

bullSchema free

bullHTTP + JSON

bull(Near) Real-time search

bullEcosystem

-Hosting Monitoring Apps Clients (SDK)

14

Where can I get it

bullFree and Open Source

bullhttpswwwelasticco

bullhttpsgithubcomelasticelasticsearch

bullBacked by a Company Elastic

-Training

-Support

-AuthAuthZ

-Marvel for Monitoring

15

How do I run it

bullDownload it

- httpswwwelasticcodownloads

bullbinelasticsearch

bullhttplocalhost9200

16

status 200

name Tesla

cluster_name elasticsearch_royrusso

version

number 142

build_hash 927caff6f05403e936c20bf4529f144f0c89fd8c

build_timestamp 2014-12-16T141112Z

build_snapshot false

lucene_version 4102

tagline You Know for Search

Elasticsearch requires Java

17

You have 5 seconds to whine about it and then shutup

Some Use-Cases

18

ElasticSearch for Centralized

LogsbullLogstash + ElasticSearch + Kibana (ELK)

bullWellhellip and then therersquos Loggly

19

ldquoNetflix is a Log generating company that happens to stream moviesrdquo

Elasticsearch at Predikto

20

Elasticsearch at Predikto

21

bullWrite From Spark to Elasticsearch

bullQuery from Spark to Elasticsearch

bullVisualize

Widely Used

22

Based on Apache Lucene

bullFree and Open Source

bullStarted in 1999

bullCreated by Doug Cutting

bullWhatrsquos it do

-Tokenizing

- Locations

-Relevance scoring

-Filtering

-Text search

-Date Parsing

23

Elasticsearch is a Document

Store

24

Document Store

bullLike MongoDB and CouchDB

bullDocument DBs

- JSON documents

-Collections of key-value collections

-Nesting

-Versioned

25

What is a document

26

genre Crime

ldquolanguage English

country USA

runtime 170

title Scarface

year 1983

Modeled in JSON

    {
        "genre": "Crime",
        "language": "English",
        "country": "USA",
        "runtime": 170,
        "title": "Scarface",
        "year": 1983
    }

    {
        "_index": "imdb",
        "_type": "movie",
        "_id": "u17o8zy9RcKg6SjQZqQ4Ow",
        "_version": 1,
        "_source": {
            "genre": "Crime",
            "language": "English",
            "country": "USA",
            "runtime": 170,
            "title": "Scarface",
            "year": 1983
        }
    }

Schema-Free

• Dynamic Mapping
  - Elasticsearch guesses the data types (string, int, float…)

    {
        "imdb": {
            "movie": {
                "properties": {
                    "country": {
                        "type": "string",
                        "store": true,
                        "index": false
                    },
                    "genre": {
                        "type": "string",
                        "null_value": "na",
                        "store": false,
                        "index": true
                    },
                    "year": {
                        "type": "long"
                    }
                }
            }
        }
    }
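To make "Elasticsearch guesses the data types" concrete, here is a minimal Python sketch of type guessing. It is not Elasticsearch's implementation: the function names `guess_type` and `guess_mapping` are made up for illustration, and the rules are deliberately simplified (for example, the real dynamic mapping can also detect date strings).

```python
def guess_type(value):
    """Guess a mapping type for one JSON value (simplified, illustrative rules)."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"no rule for {type(value).__name__}")


def guess_mapping(doc):
    """Build a 'properties' dictionary by guessing each field's type."""
    return {field: {"type": guess_type(value)} for field, value in doc.items()}


movie = {"genre": "Crime", "runtime": 170, "title": "Scarface", "year": 1983}
print(guess_mapping(movie))
```

The first document that introduces a field decides its type, which is why an explicit mapping is usually worth writing for fields that matter.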

Elasticsearch is Distributed

Terminology

    MySQL       Elasticsearch
    Database    Index
    Table       Type
    Row         Document
    Column      Field
    Schema      Mapping
    Index       (Everything is indexed)
    SQL         Query DSL

• Cluster: 1..N Nodes w/ same Cluster Name
• Node: One Elasticsearch instance (1 java proc)
• Shard = One Lucene instance
  - 0 or more replicas

High Availability

• No need for load balancer
• Different Node Types
• Indices are Sharded
• Replica shards on different Nodes
• Automatic Master election & failover

About Indices and Shards

    $ curl -XPUT 'http://localhost:9200/twitter/' -d '{
        "settings": {
            "index": {
                "number_of_shards": 3,
                "number_of_replicas": 2
            }
        }
    }'

Cluster Topology

[Diagram: shards A1, A2, B1, B2, B3 and their replicas spread across four nodes]

4 Node Cluster
Index A: 2 Shards & 1 Replica
Index B: 3 Shards & 1 Replica
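The topology above can be imitated with a toy allocator. This Python sketch is an assumption-laden simplification, not Elasticsearch's allocation algorithm: `allocate` and the node names are hypothetical, and the only rule it enforces is the one the slide relies on, namely that no node holds two copies of the same shard.

```python
def allocate(indices, nodes):
    """Spread shard copies round-robin; copies of one shard land on distinct nodes.

    indices maps an index name to (primaries, replicas_per_primary).
    """
    layout = {node: [] for node in nodes}
    slot = 0
    for name, (primaries, replicas) in indices.items():
        if replicas + 1 > len(nodes):
            raise ValueError("not enough nodes to separate all copies")
        for shard in range(1, primaries + 1):
            for copy in range(replicas + 1):
                node = nodes[(slot + copy) % len(nodes)]
                role = "P" if copy == 0 else "R"
                layout[node].append(f"{name}{shard}{role}")
            slot += 1
    return layout


layout = allocate({"A": (2, 1), "B": (3, 1)}, ["node1", "node2", "node3", "node4"])
for node, shards in layout.items():
    print(node, shards)
```

With 2 + 3 primaries and one replica each, ten shard copies are placed, and losing any single node still leaves one copy of every shard.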

Discovery

• Nodes discover each other using multicast
  - Unicast is an option
• Each cluster has an elected master node
  - Beware of split-brain

    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3"]
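The usual guard against split-brain in Elasticsearch 1.x is to require a quorum of master-eligible nodes before a master can be elected (the `discovery.zen.minimum_master_nodes` setting, which is not shown on the slide). A small Python sketch of the arithmetic, with the function name chosen here for illustration:

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size: more than half of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def can_split(master_eligible_nodes):
    """True if some network partition lets both sides reach quorum."""
    quorum = minimum_master_nodes(master_eligible_nodes)
    return any(
        side >= quorum and master_eligible_nodes - side >= quorum
        for side in range(master_eligible_nodes + 1)
    )


for n in (1, 2, 3, 5):
    print(n, "nodes -> quorum", minimum_master_nodes(n))
```

Because two disjoint groups cannot both contain more than half of the nodes, at most one side of a partition can elect a master.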

Nodes

• Master node handles cluster-wide (Meta-API) events
  - Node participation
  - New indices create/delete
  - Re-Allocation of shards
• Data Nodes
  - Indexing / Searching operations
• Client Nodes
  - REST calls
  - Light-weight load balancers

    node.data | node.master

The Basics - Shards

• Primary Shard
  - First time Indexing
  - Index has 1..N primary shards (default: 5)
  - Not changeable once index created
• Replica Shard
  - Copy of the primary shard
  - Can be changed later
  - Each primary has 0..N replicas
  - HA
    • Promoted to primary if primary fails
    • Get/Search handled by primary || replica
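Why the primary shard count cannot change after index creation follows from how documents are routed: the target shard is computed as hash(routing) % number_of_primary_shards, with the document ID as the default routing value. The Python sketch below uses `zlib.crc32` as a stand-in hash (an assumption for illustration; Elasticsearch uses its own hash function) to show how many documents would land on a different shard if the count went from 5 to 6.

```python
import zlib


def route(doc_id, number_of_primary_shards):
    """Pick a shard: hash(routing) % number_of_primary_shards (crc32 as stand-in)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]
moved = sum(1 for doc_id in doc_ids if route(doc_id, 5) != route(doc_id, 6))
print(f"{moved} of {len(doc_ids)} documents would route to a different shard")
```

Replicas are only copies of a primary, so their number can be changed at any time without affecting routing.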

Shard Auto-Allocation

• Shard Phases
  - Unassigned
  - Initializing
  - Started
  - Relocating

[Diagram: two nodes holding shards 0P/1R and 1P/0R; a newly added node receives 0R]

Add a Node: Shards Relocate

Allocation Awareness

• Shard Allocation Awareness
  - cluster.routing.allocation.awareness.attributes: rack
  - Shards RELOCATE to even distribution
  - Primary & Replica will NOT be on the same rack value

Cluster State

• Cluster State
  - Node Membership
  - Indices Settings
  - Shard Allocation Table
  - Shard State

    curl -XGET 'http://localhost:9200/_cluster/state?pretty=1'

    {
      "cluster_name": "elasticsearch_royrusso",
      "version": 27,
      "master_node": "s3fpXfPKSFeUqo1MYZxSng",
      "blocks": {},
      "nodes": {
        "s3fpXfPKSFeUqo1MYZxSng": {
          "name": "Bulldozer",
          "transport_address": "inet[localhost/127.0.0.1:9300]",
          "attributes": {}
        }
      },
      "metadata": {
        "templates": {
          "logging_index_all": {
            "template": "logstash-09-*",
            "order": 1,
            "settings": {
              "index": {
                "number_of_shards": 2,
                "number_of_replicas": 1
              }
            },
            "mappings": {
              "date": {
                "store": false
              }
            }
          },
          "logging_index": {
            "template": "logstash-*",
            "order": 0,
            "settings": {
              "index": {
                "number_of_shards": 2,
                "number_of_replicas": 1
              }
            },
            "mappings": {
              "date": {
                "store": true
              }
            }
          }
        }
      }
    }

Talking to Elasticsearch

REST

• HTTP Verbs: GET, POST, PUT, DELETE
• JSON
• _cat API

    curl '192.168.56.10:9200/_cat/health?v&ts=0'
    cluster status nodeTotal nodeData shards pri relo init unassign
    foo     green          3        3      3   3    0    0        0
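The _cat output is meant for humans, but because it is a plain whitespace-separated table it is also easy to script against. A minimal Python sketch (the helper name `parse_cat` is hypothetical) that turns the sample above into dictionaries; note that every value comes back as a string:

```python
def parse_cat(table):
    """Turn whitespace-separated _cat output (with ?v headers) into dicts."""
    rows = [line.split() for line in table.strip().splitlines()]
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


sample = """\
cluster status nodeTotal nodeData shards pri relo init unassign
foo     green          3        3      3   3    0    0        0
"""
health = parse_cat(sample)[0]
print(health["status"], health["unassign"])
```

This only works for columns whose values contain no spaces, which holds for the health table shown here.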

The API

• Document
• Cluster
  - Node
• Index
• Search

Create a Document

    curl -XPOST 'http://127.0.0.1:9200/imdb/movie/' -d '{
        "genre": "Crime",
        "language": "English",
        "country": "USA",
        "runtime": 170,
        "title": "Scarface",
        "year": 1983
    }'

    {
        "_index": "imdb",
        "_type": "movie",
        "_id": "AUwGeWib1u4mCngDYT7y",
        "_version": 1,
        "created": true
    }

Of note…

• Auto-creates Index & Type
• Auto-Gen ID
• Auto-Version

Get a Document

    curl -XGET 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y'

    {
        "_index": "imdb",
        "_type": "movie",
        "_id": "AUwGeWib1u4mCngDYT7y",
        "_version": 1,
        "found": true,
        "_source": {
            "genre": "Crime",
            "language": "English",
            "country": "USA",
            "runtime": 170,
            "title": "Scarface",
            "year": 1983
        }
    }

Update a Document

    curl -XPUT 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y' -d '{
        "genre": "Crime",
        "language": "English",
        "country": "USA",
        "runtime": 180,
        "title": "Scarface",
        "year": 1983
    }'

    {
        "_index": "imdb",
        "_type": "movie",
        "_id": "AUwGeWib1u4mCngDYT7y",
        "_version": 2,
        "created": false
    }

More like an Upsert
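The "more like an upsert" behaviour can be modelled in a few lines. This Python sketch is an in-memory toy, not a client for Elasticsearch: the class `TinyIndex` and its methods are invented here purely to mirror the `_version` and `created` fields seen in the responses above.

```python
import uuid


class TinyIndex:
    """In-memory stand-in that mimics index/get responses for one type."""

    def __init__(self):
        self._docs = {}  # _id -> (version, source)

    def index(self, source, doc_id=None):
        """Store a whole document: creates it, or replaces it and bumps the version."""
        if doc_id is None:
            doc_id = uuid.uuid4().hex  # auto-generated ID
        created = doc_id not in self._docs
        version = 1 if created else self._docs[doc_id][0] + 1
        self._docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version, "created": created}

    def get(self, doc_id):
        """Return the stored document, or found=False if the ID is unknown."""
        if doc_id not in self._docs:
            return {"_id": doc_id, "found": False}
        version, source = self._docs[doc_id]
        return {"_id": doc_id, "_version": version, "found": True, "_source": dict(source)}


movies = TinyIndex()
first = movies.index({"title": "Scarface", "runtime": 170})
second = movies.index({"title": "Scarface", "runtime": 180}, doc_id=first["_id"])
print(first["_version"], first["created"], second["_version"], second["created"])
```

The second call sends the full document again under the same ID, so the stored copy is replaced rather than merged; that is the difference from a partial update.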

Delete a Document

    curl -XDELETE 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y'

    {
        "found": true,
        "_index": "imdb",
        "_type": "movie",
        "_id": "AUwGeWib1u4mCngDYT7y",
        "_version": 2
    }

You can also…

• Partial document updating
• Specify Version
• Specify ID
• Multi-Get API
• Exists API
• Bulk API
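Of the features listed, the Bulk API has the most unusual request format: newline-delimited JSON, where each action line is followed by the document source and the body ends with a newline. A minimal Python sketch that builds such a body for index actions (the helper name `bulk_body` and the second movie are illustrative assumptions; nothing is sent over the network):

```python
import json


def bulk_body(index, doc_type, docs):
    """Build a newline-delimited bulk body of 'index' actions."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


body = bulk_body("imdb", "movie", [
    ("1", {"title": "Scarface", "year": 1983}),
    ("2", {"title": "Heat", "year": 1995}),
])
print(body)
```

The resulting string would be POSTed to the `_bulk` endpoint; each pair of lines is processed independently, so one failing document does not reject the whole batch.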

How Searching Works

• How it works
  - Search request hits a node
  - Node broadcasts to every shard in the index
  - Each shard performs query
  - Each shard returns metadata about results
  - Node merges results and scores them
  - Node requests documents from shards
  - Results merged, sorted, and returned to client
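The steps above are a scatter-gather in two phases: a query phase that collects only IDs and scores, and a fetch phase that retrieves the winning documents. The Python sketch below imitates that flow over in-memory dictionaries; the function names and the toy scoring function are assumptions for illustration, not Elasticsearch code.

```python
import heapq


def search(shards, score, size):
    """Scatter-gather over in-memory shards (each shard maps _id -> document)."""
    # Query phase: every shard scores its own documents and reports only
    # lightweight metadata (score and _id) for its local top hits.
    candidates = []
    for shard_no, shard in enumerate(shards):
        local = [(score(doc), doc_id) for doc_id, doc in shard.items()]
        local = [hit for hit in local if hit[0] > 0]
        for hit_score, doc_id in heapq.nlargest(size, local):
            candidates.append((hit_score, shard_no, doc_id))

    # The coordinating node merges the per-shard lists into a global top N.
    winners = heapq.nlargest(size, candidates)

    # Fetch phase: only the winning documents are requested from their shards.
    return [
        {"_id": doc_id, "_score": hit_score, "_source": shards[shard_no][doc_id]}
        for hit_score, shard_no, doc_id in winners
    ]


def count_scarface(doc):
    """Toy relevance: how often 'scarface' occurs in the title."""
    return doc["title"].lower().split().count("scarface")


shards = [
    {"1": {"title": "Scarface"}, "2": {"title": "Star Wars"}},
    {"3": {"title": "Scarface Scarface"}},
]
hits = search(shards, count_scarface, size=2)
print([hit["_id"] for hit in hits])
```

Each shard returns at most `size` candidates, so the coordinating node handles a small list of IDs and scores rather than full documents until the final fetch.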

REST API - Search

• Free Text Search
  - URL Request
• Complex Query

    http://localhost:9200/imdb/movie/_search?q=scar
    http://localhost:9200/imdb/movie/_search?q=scarface+OR+star
    http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]
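When such URLs are built in code rather than typed by hand, the query string should be URL-encoded so that spaces, colons, and brackets survive the trip. A short Python sketch using only the standard library (the helper name `search_url` is hypothetical):

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(base, index, doc_type, query):
    """Build a URI-search URL; urlencode escapes spaces, colons and brackets."""
    return f"{base}/{index}/{doc_type}/_search?" + urlencode({"q": query})


query = "(scarface OR star) AND year:[1981 TO 1984]"
url = search_url("http://localhost:9200", "imdb", "movie", query)
print(url)
```

Decoding the `q` parameter on the server side yields the original query text, so the Lucene syntax is unchanged by the encoding.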

REST API - Query DSL

    curl -XPOST 'localhost:9200/_search?pretty' -d '{
        "query": {
            "bool": {
                "must": [
                    { "query_string": { "query": "scarface or star" } },
                    { "range": { "year": { "gte": 1931 } } }
                ]
            }
        }
    }'

REST API - Query DSL

• Boolean Query

    {
        "bool": {
            "must": [
                { "match": { "color": "blue" } },
                { "match": { "title": "shirt" } }
            ],
            "must_not": [
                { "match": { "size": "xxl" } }
            ],
            "should": [
                { "match": { "textile": "cotton" } }
            ]
        }
    }
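The semantics of the three clause types can be shown without a cluster. In this Python sketch, `match` and `bool_score` are illustrative helpers, and the scoring (one point per matching clause) is a toy assumption rather than Lucene's relevance formula: `must` clauses are required, `must_not` clauses exclude, and `should` clauses only raise the score.

```python
def match(field, term):
    """Return a clause that is true when `term` is one of the field's words."""
    return lambda doc: term in str(doc.get(field, "")).lower().split()


def bool_score(doc, must=(), must_not=(), should=()):
    """Return None if the document is excluded, else a toy score."""
    if not all(clause(doc) for clause in must):
        return None
    if any(clause(doc) for clause in must_not):
        return None
    # Toy scoring: one point per matching must or should clause.
    return len(must) + sum(1 for clause in should if clause(doc))


query = {
    "must": [match("color", "blue"), match("title", "shirt")],
    "must_not": [match("size", "xxl")],
    "should": [match("textile", "cotton")],
}
docs = [
    {"title": "Blue shirt", "color": "blue", "size": "m", "textile": "cotton"},
    {"title": "Blue shirt", "color": "blue", "size": "m", "textile": "linen"},
    {"title": "Blue shirt", "color": "blue", "size": "xxl", "textile": "cotton"},
]
scores = [bool_score(doc, **query) for doc in docs]
print(scores)
```

The cotton shirt ranks above the linen one, while the XXL shirt is dropped entirely even though it matches every other clause.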

REST API - Query DSL

• Range Query
  - Numeric / Date Types
• Prefix/Wildcard Query
  - Match on partial terms
• RegExp Query
• Geo_bbox
  - Bounding box filter
• Geo_distance
  - Geo_distance_range

    {
        "range": {
            "founded_year": {
                "gte": 1990,
                "lt": 2000
            }
        }
    }

Filters

• Filters recommended over Queries
  - Better cache support

    curl -XGET 'http://localhost:9200/my_index/events/_search?pretty=1' -d '{
        "from": 0,
        "size": 0,
        "query": {
            "terms": {
                "message": ["apples"],
                "minimum_should_match": 1
            }
        },
        "post_filter": {
            "terms": {
                "userId": ["25476c6788ce", "g20d5470d7b4"],
                "execution": "or"
            }
        },
        "sort": { "eventDate": { "order": "desc" } },
        "explain": false
    }'

Analyzers and Tokenizers

    curl -XPUT 'http://localhost:9200/my_index/' -d '{
        "settings": {
            "analysis": {
                "analyzer": {
                    "str_search_analyzer": {
                        "tokenizer": "keyword",
                        "filter": ["lowercase"]
                    },
                    "str_index_analyzer": {
                        "tokenizer": "substring",
                        "filter": ["lowercase", "stop"]
                    }
                },
                "tokenizer": {
                    "substring": {
                        "type": "edgeNGram",
                        "min_gram": 3,
                        "max_gram": 42,
                        "token_chars": ["letter", "digit"]
                    }
                }
            }
        }
    }'

    curl -XPUT 'http://localhost:9200/my_index/events/_mapping' -d '{
        "events": {
            "properties": {
                "eventId": { "type": "string", "store": true, "index": "not_analyzed" },
                "userId": { "type": "string", "store": false, "index": "not_analyzed" },
                "message": {
                    "type": "string",
                    "store": false,
                    "search_analyzer": "str_search_analyzer",
                    "index_analyzer": "str_index_analyzer"
                }
            }
        }
    }'
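What the two analyzers above do to text can be approximated in plain Python. This is a sketch under assumptions, not the Lucene implementation: the function names are invented, the `stop` filter is left out for brevity, and only the edge n-gram and lowercase steps are modelled.

```python
import re


def index_tokens(text, min_gram=3, max_gram=42):
    """Index side: edge n-grams of each letter/digit run, then lowercase."""
    tokens = []
    for word in re.findall(r"[^\W_]+", text):
        for length in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:length].lower())
    return tokens


def search_tokens(text):
    """Search side: keyword tokenizer (whole input as one token), lowercased."""
    return [text.lower()]


indexed = index_tokens("Scarface 1983")
print(indexed)
print(search_tokens("SCAR")[0] in indexed)
```

Indexing every prefix of three or more characters is what lets a search for "scar" find "Scarface" with a cheap exact-term lookup; the cost is a larger index, which is why the search side does not repeat the n-gram step.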

Tokenizers

• Whitespace
• NGram
• Edge NGram
• Letter
  - Splits at non-letters
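The difference between the whitespace and letter tokenizers shows up as soon as the text contains punctuation or digits. A minimal Python approximation (the function names are illustrative, and the regular expression is a stand-in for Lucene's character classes):

```python
import re


def whitespace_tokenizer(text):
    """Split on runs of whitespace; punctuation stays attached to tokens."""
    return text.split()


def letter_tokenizer(text):
    """Split at every non-letter, so digits and punctuation are dropped."""
    return re.findall(r"[^\W\d_]+", text)


text = "You Know, for Search 1.4.2"
print(whitespace_tokenizer(text))
print(letter_tokenizer(text))
```

The whitespace version keeps "Know," and "1.4.2" as tokens, while the letter version yields only the four words.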

Clients

• Client list: http://www.elasticsearch.org/guide/clients
  - Java (Node) Client, JS, PHP, Perl, Python, Ruby
• Spring Data
  - Uses TransportClient
  - Implementation of ElasticsearchRepository aligns with generic Repository interfaces

Monitoring

• BigDesk
• Kopf
• Head
• ElasticHQ
• Marvel
• Sematext SPM

Questions?



Page 13: Elasticsearch - DevNexus 2015

What Elasticsearch is not

bullKey-Value Store

-Redis Riak

bullColumn Family Store

-C HBase

bullGraph Database

-Neo4J

13

ElasticSearch in a Nutshell

bullBased on Apache Lucene

bullDistributed

bullDocument-Oriented

bullSchema free

bullHTTP + JSON

bull(Near) Real-time search

bullEcosystem

-Hosting Monitoring Apps Clients (SDK)

14

Where can I get it

bullFree and Open Source

bullhttpswwwelasticco

bullhttpsgithubcomelasticelasticsearch

bullBacked by a Company Elastic

-Training

-Support

-AuthAuthZ

-Marvel for Monitoring

15

How do I run it

bullDownload it

- httpswwwelasticcodownloads

bullbinelasticsearch

bullhttplocalhost9200

16

status 200

name Tesla

cluster_name elasticsearch_royrusso

version

number 142

build_hash 927caff6f05403e936c20bf4529f144f0c89fd8c

build_timestamp 2014-12-16T141112Z

build_snapshot false

lucene_version 4102

tagline You Know for Search

Elasticsearch requires Java

17

You have 5 seconds to whine about it and then shutup

Some Use-Cases

18

ElasticSearch for Centralized

LogsbullLogstash + ElasticSearch + Kibana (ELK)

bullWellhellip and then therersquos Loggly

19

ldquoNetflix is a Log generating company that happens to stream moviesrdquo

Elasticsearch at Predikto

20

Elasticsearch at Predikto

21

bullWrite From Spark to Elasticsearch

bullQuery from Spark to Elasticsearch

bullVisualize

Widely Used

22

Based on Apache Lucene

bullFree and Open Source

bullStarted in 1999

bullCreated by Doug Cutting

bullWhatrsquos it do

-Tokenizing

- Locations

-Relevance scoring

-Filtering

-Text search

-Date Parsing

23

Elasticsearch is a Document

Store

24

Document Store

bullLike MongoDB and CouchDB

bullDocument DBs

- JSON documents

-Collections of key-value collections

-Nesting

-Versioned

25

What is a document

26

genre Crime

ldquolanguage English

country USA

runtime 170

title Scarface

year 1983

Modeled in JSON

27

genre Crime

language English

country USA

runtime 170

title Scarface

year 1983

_index imdb

_type movie

_id u17o8zy9RcKg6SjQZqQ4Ow

_version 1

_source

genre Crime

language English

country USA

runtime 170

title Scarface

year 1983

Schema-Free

bullDynamic Mapping

-Elasticsearch guesses the data-types (string int

floathellip)

28

imdb

movie

properties

country

type stringldquo

ldquostorerdquotrue

ldquoindexrdquofalse

genre

type stringldquo

null_value naldquo

ldquostorerdquofalse

ldquoindextrue

year

type long

Elasticsearch is Distributed

29

Terminology

30

MySQL Elasticsearch

Database Index

Table Type

Row Document

Column Field

Schema Mapping

Index (Everything is indexed)

SQL Query DSL

bullCluster 1N Nodes w same Cluster Name

bullNode One ElasticSearch instance (1 java proc)

bullShard = One Lucene instance

- 0 or more replicas

High Availability

bullNo need for load balancer

bullDifferent Node Types

bullIndices are Sharded

bullReplica shards on different Nodes

bullAutomatic Master election amp failover

31

About Indices Shards

32

$ curl -XPUT httplocalhost9200twitter -d

settings

index

number_of_shards 3

number_of_replicas 2

Cluster Topology

33

A1 A2B2 B2 B1

B3

B1 A1 A2

B3

4 Node Cluster

Index A 2 Shards amp 1 Replica

Index B 3 Shards amp 1 Replica

Discovery

bullNodes discover each other using multicast

-Unicast is an option

bullEach cluster has an elected master node

-Beware of split-brain

discoveryzenpingmulticastenabled false

discoveryzenpingunicasthosts [host1 host2port host3]

34

Nodes

bullMaster node handles cluster-wide (Meta-API)

events

-Node participation

-New indices createdelete

-Re-Allocation of shards

bullData Nodes

- Indexing Searching operations

bullClient Nodes

-REST calls

- Light-weight load balancers

35

nodedata | nodemaster

The Basics - Shards

bullPrimary Shard

-First time Indexing

- Index has 1N primary shards (default 5)

- Not changeable once index created

bullReplica Shard

-Copy of the primary shard

- Can be changed later

-Each primary has 0N replicas

- HA

bull Promoted to primary if primary fails

bull GetSearch handled by primary||replica36

Shard Auto-Allocation

bullShard Phases

-Unassigned

- Initializing

-Started

-Relocating37

Node 1

0P

1R

Node 2

1P

0R

Node 2

0R

Add a Node Shards Relocate

Allocation Awareness

bullShard Allocation Awareness

- clusterroutingallocationawarenessattributes rack

-Shards RELOCATE to even distribution

-Primary amp Replica will NOT be on the same rack

value

38

Cluster State

bullCluster State

-Node Membership

- Indices Settings

-Shard Allocation

Table

-Shard State

39

cURL -XGET httplocalhost9200_clusterstatepretty=1

cluster_name elasticsearch_royrusso

version 27

master_node s3fpXfPKSFeUqo1MYZxSng

blocks

nodes

s3fpXfPKSFeUqo1MYZxSng

name Bulldozer

transport_address inet[localhost1270019300]

attributes

metadata

templates

logging_index_all

template logstash-09-

order 1

settings

index

number_of_shards 2

number_of_replicas 1

mappings

date

store false

logging_index

template logstash-

order 0

settings

index

number_of_shards 2

ldquonumber_of_replicasrdquo 1

mappings

ldquodaterdquo

ldquostorerdquo true

Talking to Elasticsearch

40

REST

bullHTTP Verbs GET POST PUT DELETE

bullJSON

bull_cat API

41

curl 19216856109200_cathealthvampts=0

cluster status nodeTotal nodeData shards pri relo init unassign

foo green 3 3 3 3 0 0 0

The API

bullDocument

bullCluster

-Node

bullIndex

bullSearch

42

Create a Document

43

curl -XPOST http1270019200imdbmovie -d

genre Crime

language English

country USA

runtime 170

title Scarface

year 1983

_indeximdb

_typemovie

_idAUwGeWib1u4mCngDYT7y

_version1

createdtrue

Of note… (same request and response as in "Create a Document")

• Auto-creates Index & Type
• Auto-Gen ID
• Auto-Version

Get a Document

curl -XGET 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y'

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 1,
  "found": true,
  "_source": {
    "genre": "Crime",
    "language": "English",
    "country": "USA",
    "runtime": 170,
    "title": "Scarface",
    "year": 1983
  }
}

Update a Document

curl -XPUT 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y' -d '{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 180,
  "title": "Scarface",
  "year": 1983
}'

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 2,
  "created": false
}

More like an Upsert
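The "upsert" behaviour seen in the responses (auto-generated ID, version starting at 1, created flipping to false on the second write) can be modelled in a few lines. This is an in-memory stand-in for illustration, not an Elasticsearch client; the class name and ID format are assumptions.

```python
import uuid
from typing import Optional


class TinyIndex:
    """In-memory stand-in for one index/type, mimicking the response fields above."""

    def __init__(self):
        self._docs = {}  # _id -> (version, source)

    def index(self, source: dict, doc_id: Optional[str] = None) -> dict:
        if doc_id is None:
            doc_id = uuid.uuid4().hex  # auto-generated ID (format is illustrative)
        created = doc_id not in self._docs
        # A write to an existing ID replaces the whole document and bumps the version.
        version = 1 if created else self._docs[doc_id][0] + 1
        self._docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version, "created": created}

    def get(self, doc_id: str) -> dict:
        if doc_id not in self._docs:
            return {"_id": doc_id, "found": False}
        version, source = self._docs[doc_id]
        return {"_id": doc_id, "_version": version, "found": True, "_source": source}


idx = TinyIndex()
first = idx.index({"title": "Scarface", "runtime": 170})
second = idx.index({"title": "Scarface", "runtime": 180}, doc_id=first["_id"])
print(first["created"], first["_version"])    # True 1
print(second["created"], second["_version"])  # False 2
```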

Delete a Document

curl -XDELETE 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y'

{
  "found": true,
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 2
}

You can also…

• Partial document updating
• Specify Version
• Specify ID
• Multi-Get API
• Exists API
• Bulk API
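As one example from the list, the Bulk API takes a newline-delimited body: an action line followed by the document source, one pair per document, with a trailing newline. A sketch of building such a body for index actions; the helper name and the second sample document are illustrative assumptions, and the body would be sent to the _bulk endpoint.

```python
import json


def bulk_index_body(index: str, doc_type: str, docs: dict) -> str:
    """Build a newline-delimited bulk body of index actions for the given docs."""
    lines = []
    for doc_id, source in docs.items():
        # Action/metadata line, then the document source on its own line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_index_body("imdb", "movie", {
    "1": {"title": "Scarface", "year": 1983},
    "2": {"title": "Heat", "year": 1995},
})
print(body)
```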

How Searching Works

• How it works
  - Search request hits a node
  - Node broadcasts to every shard in the index
  - Each shard performs query
  - Each shard returns metadata about results
  - Node merges results and scores them
  - Node requests documents from shards
  - Results merged, sorted, and returned to client
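The steps above can be sketched as a small scatter-gather routine. Everything here is a toy: shards are plain dicts, the score is a simple term count in the title, and the function name is hypothetical.

```python
import heapq


def search(shards, term, size=2):
    """Query each shard for its local top hits, merge them, then fetch the documents."""
    candidates = []
    # Query phase: every shard returns only (score, id) metadata for its best hits.
    for shard_no, shard in enumerate(shards):
        hits = [(doc["title"].lower().count(term), doc_id) for doc_id, doc in shard.items()]
        top = heapq.nlargest(size, [hit for hit in hits if hit[0] > 0])
        candidates.extend((score, shard_no, doc_id) for score, doc_id in top)
    # Merge on the coordinating node: keep the global top hits.
    best = heapq.nlargest(size, candidates)
    # Fetch phase: only the winning documents are requested from their shards.
    return [
        {"_id": doc_id, "_score": score, "_source": shards[shard_no][doc_id]}
        for score, shard_no, doc_id in best
    ]


shards = [
    {"1": {"title": "Scarface"}, "2": {"title": "Star Wars"}},
    {"3": {"title": "Scar scar"}, "4": {"title": "Heat"}},
]
print([hit["_id"] for hit in search(shards, "scar")])  # ['3', '1']
```

Only IDs and scores cross the wire in the first phase, which keeps the merge cheap even when many shards are involved.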

REST API - Search

• Free Text Search
  - URL Request
• Complex Query

http://localhost:9200/imdb/movie/_search?q=scar
http://localhost:9200/imdb/movie/_search?q=scarface+OR+star
http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]
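When these URLs are built in code, the query string needs escaping (spaces, colons, brackets). A sketch using only the Python standard library; the helper names are hypothetical and the host is the localhost address from the slides.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(base, index, doc_type, query):
    """Build a URI search request; urlencode escapes spaces, colons and brackets."""
    params = urlencode({"q": query})
    return f"{base}/{index}/{doc_type}/_search?{params}"


def decode_q(url):
    """Recover the original q parameter, to show the escaping is reversible."""
    return parse_qs(urlsplit(url).query)["q"][0]


query = "(scarface OR star) AND year:[1981 TO 1984]"
url = search_url("http://localhost:9200", "imdb", "movie", query)
print(url)
print(decode_q(url) == query)  # True
```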

REST API - Query DSL

curl -XPOST 'localhost:9200/_search?pretty' -d '{
  "query": {
    "bool": {
      "must": [
        { "query_string": { "query": "scarface or star" } },
        { "range": { "year": { "gte": 1931 } } }
      ]
    }
  }
}'

REST API - Query DSL

• Boolean Query

"bool": {
  "must": [
    { "match": { "color": "blue" } },
    { "match": { "title": "shirt" } }
  ],
  "must_not": [
    { "match": { "size": "xxl" } }
  ],
  "should": [
    { "match": { "textile": "cotton" } }
  ]
}
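How the three clause types combine can be shown with a toy evaluator: every must clause has to match, no must_not clause may match, and should clauses only raise the score. This is a sketch, not Elasticsearch's scoring; "match" is reduced to case-insensitive equality, and the sketch assumes at least one must clause is present.

```python
def matches_clause(doc, clause):
    """Stand-in for a match clause: case-insensitive equality on a single field."""
    field, value = next(iter(clause["match"].items()))
    return str(doc.get(field, "")).lower() == str(value).lower()


def evaluate_bool(doc, bool_query):
    """Return (matched, score) for a document against a bool query."""
    must = bool_query.get("must", [])
    must_not = bool_query.get("must_not", [])
    should = bool_query.get("should", [])
    if not all(matches_clause(doc, c) for c in must):
        return False, 0
    if any(matches_clause(doc, c) for c in must_not):
        return False, 0
    # Optional clauses do not filter; they only add to the toy score.
    should_hits = sum(matches_clause(doc, c) for c in should)
    return True, len(must) + should_hits


bool_query = {
    "must": [{"match": {"color": "blue"}}, {"match": {"title": "shirt"}}],
    "must_not": [{"match": {"size": "xxl"}}],
    "should": [{"match": {"textile": "cotton"}}],
}
print(evaluate_bool({"color": "blue", "title": "shirt", "size": "m", "textile": "cotton"}, bool_query))  # (True, 3)
print(evaluate_bool({"color": "blue", "title": "shirt", "size": "xxl"}, bool_query))  # (False, 0)
```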

REST API - Query DSL

• Range Query
  - Numeric, Date Types
• Prefix/Wildcard Query
  - Match on partial terms
• RegExp Query
• Geo_bbox
  - Bounding box filter
• Geo_distance
  - Geo_distance_range

"range": {
  "founded_year": {
    "gte": 1990,
    "lt": 2000
  }
}

Filters

• Filters recommended over Queries
  - Better cache support

curl -XGET 'http://localhost:9200/my_index/events/_search?pretty=1' -d '{
  "from": 0,
  "size": 0,
  "query": {
    "terms": {
      "message": ["apples"],
      "minimum_should_match": 1
    }
  },
  "post_filter": {
    "terms": {
      "userId": ["25476c6788ce", "g20d5470d7b4"],
      "execution": "or"
    }
  },
  "sort": { "eventDate": { "order": "desc" } },
  "explain": false
}'

Analyzers / Tokenizers

curl -XPUT 'http://localhost:9200/my_index' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "str_search_analyzer": {
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        },
        "str_index_analyzer": {
          "tokenizer": "substring",
          "filter": ["lowercase", "stop"]
        }
      },
      "tokenizer": {
        "substring": {
          "type": "edgeNgram",
          "min_gram": 3,
          "max_gram": 42,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  }
}'

curl -XPUT 'http://localhost:9200/my_index/events/_mapping' -d '{
  "events": {
    "properties": {
      "eventId": { "type": "string", "store": true, "index": "not_analyzed" },
      "userId": { "type": "string", "store": false, "index": "not_analyzed" },
      "message": {
        "type": "string",
        "store": false,
        "search_analyzer": "str_search_analyzer",
        "index_analyzer": "str_index_analyzer"
      }
    }
  }
}'
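What the index-time analyzer above produces can be approximated in a few lines: split on anything that is not a letter or digit, lowercase, then emit every prefix between min_gram and max_gram characters. A rough sketch only: it handles ASCII letters and digits, skips the stop filter, and the function name is hypothetical.

```python
import re


def edge_ngrams(text, min_gram=3, max_gram=42):
    """Approximate an edge n-gram tokenizer (letters and digits) plus lowercasing."""
    tokens = []
    # ASCII-only split for brevity (assumption; real token_chars are Unicode-aware).
    for word in re.findall(r"[A-Za-z0-9]+", text):
        word = word.lower()
        # Words shorter than min_gram produce no tokens at all.
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:size])
    return tokens


print(edge_ngrams("Scarface 1983", max_gram=5))
# ['sca', 'scar', 'scarf', '198', '1983']
```

Indexing these prefixes while analyzing the search text with the keyword tokenizer is what lets a partial input such as "scar" match the full term without a wildcard query.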

Tokenizers

• Whitespace
• NGram
• Edge NGram
• Letter
  - Splits on non-letters

Clients

• Client list: http://www.elasticsearch.org/guide/clients
  - Java (Node) Client, JS, PHP, Perl, Python, Ruby
• Spring Data
  - Uses TransportClient
  - Implementation of ElasticsearchRepository aligns with generic Repository interfaces

Monitoring

• BigDesk
• Kopf
• Head
• ElasticHQ
• Marvel
• Sematext SPM

Questions?

Page 14: Elasticsearch - DevNexus 2015

ElasticSearch in a Nutshell

• Based on Apache Lucene
• Distributed
• Document-Oriented
• Schema free
• HTTP + JSON
• (Near) Real-time search
• Ecosystem
  - Hosting, Monitoring, Apps, Clients (SDK)

Where can I get it?

• Free and Open Source
• https://www.elastic.co
• https://github.com/elastic/elasticsearch
• Backed by a Company: Elastic
  - Training
  - Support
  - Auth/AuthZ
  - Marvel for Monitoring

How do I run it?

• Download it
  - https://www.elastic.co/downloads
• bin/elasticsearch
• http://localhost:9200

{
  "status": 200,
  "name": "Tesla",
  "cluster_name": "elasticsearch_royrusso",
  "version": {
    "number": "1.4.2",
    "build_hash": "927caff6f05403e936c20bf4529f144f0c89fd8c",
    "build_timestamp": "2014-12-16T14:11:12Z",
    "build_snapshot": false,
    "lucene_version": "4.10.2"
  },
  "tagline": "You Know, for Search"
}

Elasticsearch requires Java

You have 5 seconds to whine about it, and then shut up.

Some Use-Cases

ElasticSearch for Centralized Logs

• Logstash + ElasticSearch + Kibana (ELK)
• Well… and then there's Loggly

"Netflix is a Log generating company that happens to stream movies"

Elasticsearch at Predikto

• Write from Spark to Elasticsearch
• Query from Spark to Elasticsearch
• Visualize

Widely Used

Based on Apache Lucene

• Free and Open Source
• Started in 1999
• Created by Doug Cutting
• What's it do?
  - Tokenizing
  - Locations
  - Relevance scoring
  - Filtering
  - Text search
  - Date Parsing

Elasticsearch is a Document Store

Document Store

• Like MongoDB and CouchDB
• Document DBs
  - JSON documents
  - Collections of key-value collections
  - Nesting
  - Versioned

What is a document?

{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 170,
  "title": "Scarface",
  "year": 1983
}

Modeled in JSON

{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 170,
  "title": "Scarface",
  "year": 1983
}

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "u17o8zy9RcKg6SjQZqQ4Ow",
  "_version": 1,
  "_source": {
    "genre": "Crime",
    "language": "English",
    "country": "USA",
    "runtime": 170,
    "title": "Scarface",
    "year": 1983
  }
}

Schema-Free

• Dynamic Mapping
  - Elasticsearch guesses the data-types (string, int, float…)

{
  "imdb": {
    "movie": {
      "properties": {
        "country": {
          "type": "string",
          "store": true,
          "index": false
        },
        "genre": {
          "type": "string",
          "null_value": "na",
          "store": false,
          "index": true
        },
        "year": {
          "type": "long"
        }
      }
    }
  }
}
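The "guessing" can be pictured as a simple mapping from JSON value types to field types. A rough sketch, assuming the defaults of that era (whole numbers become long, decimals double, text string); the function names are hypothetical and date detection is left out.

```python
def guess_type(value):
    """Guess a field type from a JSON value, in the spirit of dynamic mapping."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value in this sketch: {value!r}")


def guess_mapping(doc):
    """Build a properties block for a flat document."""
    return {"properties": {field: {"type": guess_type(value)} for field, value in doc.items()}}


movie = {"genre": "Crime", "runtime": 170, "title": "Scarface", "year": 1983}
print(guess_mapping(movie)["properties"]["year"])  # {'type': 'long'}
```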

Elasticsearch is Distributed

Terminology

MySQL      Elasticsearch
Database   Index
Table      Type
Row        Document
Column     Field
Schema     Mapping
Index      (Everything is indexed)
SQL        Query DSL

• Cluster: 1..N Nodes w/ same Cluster Name
• Node: One ElasticSearch instance (1 java proc)
• Shard = One Lucene instance
  - 0 or more replicas

High Availability

• No need for load balancer
• Different Node Types
• Indices are Sharded
• Replica shards on different Nodes
• Automatic Master election & failover

About Indices / Shards

$ curl -XPUT 'http://localhost:9200/twitter/' -d '{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    }
  }
}'
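A quick bit of arithmetic shows what these two settings imply. The sketch assumes the usual rule that a replica is never placed on the same node as another copy of the same shard; the function names are hypothetical.

```python
def shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Total shard copies in the index: each primary plus its replicas."""
    return number_of_shards * (1 + number_of_replicas)


def min_nodes_for_all_copies(number_of_replicas: int) -> int:
    """Nodes needed so every copy of a shard can sit on a different node."""
    return number_of_replicas + 1


print(shard_copies(3, 2))           # 9
print(min_nodes_for_all_copies(2))  # 3
```

With 3 primaries and 2 replicas, the twitter index therefore holds 9 shard copies, and fewer than 3 nodes would leave some replicas unassigned.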

Cluster Topology

[Diagram: 4 Node Cluster. Index A: 2 Shards & 1 Replica. Index B: 3 Shards & 1 Replica. Copies of shards A1, A2, B1, B2, and B3 are spread across the four nodes.]

Discovery

• Nodes discover each other using multicast
  - Unicast is an option
• Each cluster has an elected master node
  - Beware of split-brain

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3"]
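On the split-brain warning: the usual guard is to require a majority (quorum) of master-eligible nodes before a master can be elected, which in Elasticsearch 1.x is set through discovery.zen.minimum_master_nodes. The slides do not show this setting, so treat the sketch below as background; the function name is hypothetical.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum size: more than half of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


for n in (1, 2, 3, 4, 5):
    print(n, "master-eligible nodes -> quorum of", minimum_master_nodes(n))
```

With 3 master-eligible nodes the quorum is 2, so a node that is cut off on its own cannot elect itself master.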

Nodes

• Master node handles cluster-wide (Meta-API) events
  - Node participation
  - New indices create/delete
  - Re-Allocation of shards
• Data Nodes
  - Indexing / Searching operations
• Client Nodes
  - REST calls
  - Light-weight load balancers

node.data | node.master




Page 17: Elasticsearch - DevNexus 2015

Elasticsearch requires Java

17

You have 5 seconds to whine about it and then shutup

Some Use-Cases

18

ElasticSearch for Centralized

LogsbullLogstash + ElasticSearch + Kibana (ELK)

bullWellhellip and then therersquos Loggly

19

ldquoNetflix is a Log generating company that happens to stream moviesrdquo

Elasticsearch at Predikto

20

Elasticsearch at Predikto

21

bullWrite From Spark to Elasticsearch

bullQuery from Spark to Elasticsearch

bullVisualize

Widely Used

22

Based on Apache Lucene

bullFree and Open Source

bullStarted in 1999

bullCreated by Doug Cutting

bullWhatrsquos it do

-Tokenizing

- Locations

-Relevance scoring

-Filtering

-Text search

-Date Parsing

23

Elasticsearch is a Document

Store

24

Document Store

bullLike MongoDB and CouchDB

bullDocument DBs

- JSON documents

-Collections of key-value collections

-Nesting

-Versioned

25

What is a document

26

genre Crime

ldquolanguage English

country USA

runtime 170

title Scarface

year 1983

Modeled in JSON

27

genre Crime

language English

country USA

runtime 170

title Scarface

year 1983

_index imdb

_type movie

_id u17o8zy9RcKg6SjQZqQ4Ow

_version 1

_source

genre Crime

language English

country USA

runtime 170

title Scarface

year 1983

Schema-Free

• Dynamic Mapping
  - Elasticsearch guesses the data types (string, int, float…)

    {
      "imdb": {
        "movie": {
          "properties": {
            "country": {
              "type": "string",
              "store": true,
              "index": false
            },
            "genre": {
              "type": "string",
              "null_value": "na",
              "store": false,
              "index": true
            },
            "year": {
              "type": "long"
            }
          }
        }
      }
    }
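As a rough illustration of what "guessing the data types" means, here is a minimal sketch in Python. It is not Elasticsearch's implementation; the function names `guess_type` and `guess_mapping` are made up for this example, and the type names simply mirror the ones used in the mapping above.

```python
def guess_type(value):
    """Toy stand-in for dynamic mapping: map a JSON value to a field type name."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


def guess_mapping(document):
    """Build a properties-style mapping from a single sample document."""
    return {field: {"type": guess_type(value)} for field, value in document.items()}


movie = {"genre": "Crime", "runtime": 170, "title": "Scarface", "year": 1983}
mapping = guess_mapping(movie)
print(mapping)
```

The catch with guessing from the first document is that a later document with a different shape for the same field can conflict with the guess, which is one reason to declare a mapping explicitly.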

Elasticsearch is Distributed

Terminology

    MySQL       Elasticsearch
    Database    Index
    Table       Type
    Row         Document
    Column      Field
    Schema      Mapping
    Index       (Everything is indexed)
    SQL         Query DSL

• Cluster: 1..N nodes with the same cluster name
• Node: one Elasticsearch instance (1 Java process)
• Shard = one Lucene instance
  - 0 or more replicas

High Availability

• No need for a load balancer
• Different node types
• Indices are sharded
• Replica shards on different nodes
• Automatic master election & failover

About Indices / Shards

    $ curl -XPUT 'http://localhost:9200/twitter' -d '{
      "settings": {
        "index": {
          "number_of_shards": 3,
          "number_of_replicas": 2
        }
      }
    }'
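To make the two settings concrete: `number_of_replicas` counts copies per primary, not in total. A small sketch of the arithmetic (the helper name `shard_copies` is only for illustration):

```python
def shard_copies(number_of_shards, number_of_replicas):
    """Count the shard copies an index asks the cluster to host."""
    primaries = number_of_shards
    # Every primary gets `number_of_replicas` copies of its own.
    replicas = number_of_shards * number_of_replicas
    return {"primaries": primaries, "replicas": replicas, "total": primaries + replicas}


counts = shard_copies(number_of_shards=3, number_of_replicas=2)
print(counts)
```

With the settings from the request above, the cluster has to place 9 shard copies, and a replica is never placed on the same node as its primary, so a single-node cluster would leave the replicas unassigned.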

Cluster Topology

[Figure: 4-node cluster. Index A: 2 shards & 1 replica. Index B: 3 shards & 1 replica.]

Discovery

• Nodes discover each other using multicast
  - Unicast is an option
• Each cluster has an elected master node
  - Beware of split-brain

    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3"]
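The usual guard against split-brain is to require a majority of master-eligible nodes before a master can be elected. In the Elasticsearch 1.x line this is the `discovery.zen.minimum_master_nodes` setting, which the slide does not show, so treat the following as a side note rather than part of the talk. The helper name is hypothetical:

```python
def minimum_master_nodes(master_eligible_nodes):
    """Smallest strict majority of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for nodes in (1, 2, 3, 4, 5):
    print(nodes, "master-eligible ->", minimum_master_nodes(nodes))
```

Because two disjoint groups of nodes cannot both hold a strict majority, a network partition can leave at most one side able to elect a master.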

Nodes

• Master node handles cluster-wide (Meta-API) events
  - Node participation
  - New indices create/delete
  - Re-allocation of shards
• Data Nodes
  - Indexing / Searching operations
• Client Nodes
  - REST calls
  - Light-weight load balancers

    node.data | node.master

The Basics - Shards

• Primary Shard
  - First time indexing
  - Index has 1..N primary shards (default 5)
  - Not changeable once the index is created
• Replica Shard
  - Copy of the primary shard
  - Can be changed later
  - Each primary has 0..N replicas
  - HA
    • Promoted to primary if the primary fails
    • Get/Search handled by primary || replica
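Why the primary shard count is fixed while the replica count is not comes down to routing: a document's shard is chosen as a hash of its routing value (the `_id` by default) modulo the number of primary shards. The sketch below uses `zlib.crc32` purely as a deterministic stand-in for the real hash function, and the document ids are invented:

```python
import zlib


def route(doc_id, number_of_primary_shards):
    """Pick a primary shard for a document id."""
    # zlib.crc32 is only a stand-in; Elasticsearch uses its own hash function.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


ids = [f"doc-{i}" for i in range(1000)]  # hypothetical document ids
moved = sum(1 for doc_id in ids if route(doc_id, 5) != route(doc_id, 6))
print(f"{moved} of {len(ids)} documents would land on a different shard")
```

Changing the modulus sends most ids to a different shard, so a Get by id would look in the wrong place unless everything were reindexed. Replicas are just copies of a primary, so adding or removing them does not affect routing.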

Shard Auto-Allocation

• Shard Phases
  - Unassigned
  - Initializing
  - Started
  - Relocating

[Figure: two nodes holding 0P/1R and 1P/0R. Add a node and shards relocate.]

Allocation Awareness

• Shard Allocation Awareness
  - cluster.routing.allocation.awareness.attributes: rack
  - Shards RELOCATE to even distribution
  - Primary & Replica will NOT be on the same rack value

Cluster State

• Cluster State
  - Node Membership
  - Indices Settings
  - Shard Allocation Table
  - Shard State

    curl -XGET 'http://localhost:9200/_cluster/state?pretty=1'

    {
      "cluster_name": "elasticsearch_royrusso",
      "version": 27,
      "master_node": "s3fpXfPKSFeUqo1MYZxSng",
      "blocks": {},
      "nodes": {
        "s3fpXfPKSFeUqo1MYZxSng": {
          "name": "Bulldozer",
          "transport_address": "inet[localhost/127.0.0.1:9300]",
          "attributes": {}
        }
      },
      "metadata": {
        "templates": {
          "logging_index_all": {
            "template": "logstash-09-*",
            "order": 1,
            "settings": {
              "index": {
                "number_of_shards": 2,
                "number_of_replicas": 1
              }
            },
            "mappings": {
              "date": {
                "store": false
              }
            }
          },
          "logging_index": {
            "template": "logstash-*",
            "order": 0,
            "settings": {
              "index": {
                "number_of_shards": 2,
                "number_of_replicas": 1
              }
            },
            "mappings": {
              "date": {
                "store": true
              }
            }
          }
        }
      }
    }

Talking to Elasticsearch

REST

• HTTP Verbs: GET, POST, PUT, DELETE
• JSON
• _cat API

    curl '192.168.56.10:9200/_cat/health?v&ts=0'

    cluster status nodeTotal nodeData shards pri relo init unassign
    foo     green          3        3      3   3    0    0        0
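The `_cat` output is plain text meant for people, but with `v` (header row) turned on it is easy to pick apart in a script. A minimal sketch, assuming no value contains spaces; `parse_cat` is a made-up helper, and the sample text is the output shown above:

```python
def parse_cat(text):
    """Turn whitespace-aligned _cat output (with a header row) into dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """\
cluster status nodeTotal nodeData shards pri relo init unassign
foo     green          3        3      3   3    0    0        0
"""

rows = parse_cat(sample)
print(rows[0]["cluster"], rows[0]["status"], rows[0]["unassign"])
```

All values come back as strings; for anything beyond a quick check, the JSON cluster APIs are the sturdier choice.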

The API

• Document
• Cluster
  - Node
• Index
• Search

Create a Document

    curl -XPOST 'http://127.0.0.1:9200/imdb/movie' -d '{
      "genre": "Crime",
      "language": "English",
      "country": "USA",
      "runtime": 170,
      "title": "Scarface",
      "year": 1983
    }'

    {
      "_index": "imdb",
      "_type": "movie",
      "_id": "AUwGeWib1u4mCngDYT7y",
      "_version": 1,
      "created": true
    }

Of note…

• Auto-creates Index & Type
• Auto-Gen ID
• Auto-Version

Get a Document

    curl -XGET 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y'

    {
      "_index": "imdb",
      "_type": "movie",
      "_id": "AUwGeWib1u4mCngDYT7y",
      "_version": 1,
      "found": true,
      "_source": {
        "genre": "Crime",
        "language": "English",
        "country": "USA",
        "runtime": 170,
        "title": "Scarface",
        "year": 1983
      }
    }

Update a Document

    curl -XPUT 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y' -d '{
      "genre": "Crime",
      "language": "English",
      "country": "USA",
      "runtime": 180,
      "title": "Scarface",
      "year": 1983
    }'

    {
      "_index": "imdb",
      "_type": "movie",
      "_id": "AUwGeWib1u4mCngDYT7y",
      "_version": 2,
      "created": false
    }

More like an Upsert
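The "upsert" behaviour and the `_version` / `created` fields can be mimicked with a toy in-memory store. This is only a sketch of the observable behaviour in the responses above, not of how Elasticsearch stores anything; `ToyIndex` and its id format are invented for the example.

```python
import uuid


class ToyIndex:
    """In-memory imitation of index-by-id with versioning."""

    def __init__(self):
        self._docs = {}  # _id -> (version, source)

    def index(self, source, doc_id=None):
        if doc_id is None:
            # Auto-generated id; the format here is arbitrary.
            doc_id = uuid.uuid4().hex
        created = doc_id not in self._docs
        version = 1 if created else self._docs[doc_id][0] + 1
        # The whole document is replaced, not merged.
        self._docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version, "created": created}

    def get(self, doc_id):
        if doc_id not in self._docs:
            return {"_id": doc_id, "found": False}
        version, source = self._docs[doc_id]
        return {"_id": doc_id, "_version": version, "found": True, "_source": source}


idx = ToyIndex()
first = idx.index({"title": "Scarface", "runtime": 170})
second = idx.index({"title": "Scarface", "runtime": 180}, doc_id=first["_id"])
print(first["_version"], first["created"], second["_version"], second["created"])
```

The same call creates the document when the id is new and replaces it (bumping the version) when the id already exists, which is why the slide calls it an upsert.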

Delete a Document

    curl -XDELETE 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y'

    {
      "found": true,
      "_index": "imdb",
      "_type": "movie",
      "_id": "AUwGeWib1u4mCngDYT7y",
      "_version": 2
    }

You can also…

• Partial document updating
• Specify Version
• Specify ID
• Multi-Get API
• Exists API
• Bulk API

How Searching Works

• How it works
  - Search request hits a node
  - Node broadcasts to every shard in the index
  - Each shard performs the query
  - Each shard returns metadata about results
  - Node merges results and scores them
  - Node requests documents from shards
  - Results merged, sorted, and returned to client
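The steps above can be sketched as a scatter, gather, and fetch over in-memory "shards". Everything here is a toy: the shards are plain dicts, the score is a raw term count rather than real relevance scoring, and the function names are invented.

```python
import heapq


def query_shard(shard_docs, term):
    """Query phase on one shard: return (score, doc_id) pairs, no document bodies."""
    hits = []
    for doc_id, text in shard_docs.items():
        score = text.lower().split().count(term)  # toy score: raw term count
        if score:
            hits.append((score, doc_id))
    return hits


def search(shards, term, size=2):
    # Scatter: every shard runs the query and reports only ids and scores.
    per_shard = [(n, query_shard(docs, term)) for n, docs in enumerate(shards)]
    # Gather: merge the per-shard hits and keep the global top `size`.
    merged = [(score, doc_id, n) for n, hits in per_shard for score, doc_id in hits]
    top = heapq.nlargest(size, merged)
    # Fetch: ask only the owning shard for each winning document.
    return [
        {"_id": doc_id, "_score": score, "_source": shards[n][doc_id]}
        for score, doc_id, n in top
    ]


shards = [
    {"1": "apple peach", "2": "oranges"},
    {"3": "apple apple pie", "4": "peach"},
]
results = search(shards, "apple")
print([hit["_id"] for hit in results])
```

Splitting the work into a light query phase and a separate fetch phase means full documents are only pulled for the hits that actually make the final page.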

REST API - Search

• Free Text Search
  - URL Request
• Complex Query

    http://localhost:9200/imdb/movie/_search?q=scar
    http://localhost:9200/imdb/movie/_search?q=scarface+OR+star
    http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]

REST API – Query DSL

    curl -XPOST 'localhost:9200/_search?pretty' -d '{
      "query": {
        "bool": {
          "must": [
            { "query_string": { "query": "scarface OR star" } },
            { "range": { "year": { "gte": 1931 } } }
          ]
        }
      }
    }'

REST API – Query DSL

• Boolean Query

    "bool": {
      "must": [
        { "match": { "color": "blue" } },
        { "match": { "title": "shirt" } }
      ],
      "must_not": [
        { "match": { "size": "xxl" } }
      ],
      "should": [
        { "match": { "textile": "cotton" } }
      ]
    }
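How the three clause types interact is easiest to see on a handful of documents. The sketch below is a toy evaluator, not the real scoring: `match` is reduced to case-insensitive equality, each satisfied clause adds one point, and the sample documents are invented.

```python
def matches(doc, clause):
    """Toy `match`: exact, case-insensitive equality on a single field."""
    field, wanted = next(iter(clause["match"].items()))
    return str(doc.get(field, "")).lower() == str(wanted).lower()


def bool_score(doc, query):
    """Return None if the document is excluded, else a toy score."""
    if not all(matches(doc, c) for c in query.get("must", [])):
        return None  # every must clause is required
    if any(matches(doc, c) for c in query.get("must_not", [])):
        return None  # any must_not hit excludes the document
    # should clauses are optional here and only raise the score.
    should_hits = sum(matches(doc, c) for c in query.get("should", []))
    return len(query.get("must", [])) + should_hits


query = {
    "must": [{"match": {"color": "blue"}}, {"match": {"title": "shirt"}}],
    "must_not": [{"match": {"size": "xxl"}}],
    "should": [{"match": {"textile": "cotton"}}],
}
docs = [
    {"id": 1, "color": "blue", "title": "shirt", "size": "m", "textile": "cotton"},
    {"id": 2, "color": "blue", "title": "shirt", "size": "m", "textile": "wool"},
    {"id": 3, "color": "blue", "title": "shirt", "size": "xxl", "textile": "cotton"},
]
scored = [(bool_score(d, query), d["id"]) for d in docs]
ranked = [doc_id for score, doc_id in sorted((s for s in scored if s[0] is not None), reverse=True)]
print(ranked)
```

Document 3 is dropped by `must_not`, and the cotton shirt outranks the wool one because it also satisfies the `should` clause.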

REST API – Query DSL

• Range Query
  - Numeric / Date Types
• Prefix/Wildcard Query
  - Match on partial terms
• RegExp Query
• Geo_bbox
  - Bounding box filter
• Geo_distance
  - Geo_distance_range

    "range": {
      "founded_year": {
        "gte": 1990,
        "lt": 2000
      }
    }

Filters

• Filters recommended over Queries
  - Better cache support

    curl -XGET 'http://localhost:9200/my_index/events/_search?pretty=1' -d '{
      "from": 0,
      "size": 0,
      "query": {
        "terms": {
          "message": ["apples"],
          "minimum_should_match": 1
        }
      },
      "post_filter": {
        "terms": {
          "userId": ["25476c6788ce", "g20d5470d7b4"],
          "execution": "or"
        }
      },
      "sort": { "eventDate": { "order": "desc" } },
      "explain": false
    }'

Analyzers / Tokenizers

    curl -XPUT 'http://localhost:9200/my_index' -d '{
      "settings": {
        "analysis": {
          "analyzer": {
            "str_search_analyzer": {
              "tokenizer": "keyword",
              "filter": ["lowercase"]
            },
            "str_index_analyzer": {
              "tokenizer": "substring",
              "filter": ["lowercase", "stop"]
            }
          },
          "tokenizer": {
            "substring": {
              "type": "edgeNGram",
              "min_gram": 3,
              "max_gram": 42,
              "token_chars": ["letter", "digit"]
            }
          }
        }
      }
    }'

    curl -XPUT 'http://localhost:9200/my_index/events/_mapping' -d '{
      "events": {
        "properties": {
          "eventId": { "type": "string", "store": true, "index": "not_analyzed" },
          "userId": { "type": "string", "store": false, "index": "not_analyzed" },
          "message": {
            "type": "string",
            "store": false,
            "search_analyzer": "str_search_analyzer",
            "index_analyzer": "str_index_analyzer"
          }
        }
      }
    }'
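What the two analyzers do to a `message` value can be approximated in a few lines. This is a sketch of the idea only: the `stop` filter is left out, and the regular expression is an approximation of the `letter` and `digit` token classes. The function names are made up.

```python
import re


def edge_ngrams(token, min_gram=3, max_gram=42):
    """Prefixes of `token` from min_gram up to max_gram characters long."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_terms(text):
    """Rough equivalent of str_index_analyzer (without the stop filter)."""
    # [^\W_]+ matches runs of letters and digits; everything else splits tokens.
    tokens = re.findall(r"[^\W_]+", text)
    return [gram.lower() for token in tokens for gram in edge_ngrams(token)]


def search_term(text):
    """Rough equivalent of str_search_analyzer: the whole input, lowercased."""
    return text.lower()


terms = index_terms("Scarface 1983")
print(terms)
print(search_term("SCAR") in terms)
```

Indexing every prefix while searching with the untouched, lowercased input is what gives search-as-you-type behaviour: a partial word typed by the user matches one of the stored prefixes directly.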

Tokenizers

• Whitespace
• NGram
• Edge NGram
• Letter
  - Splits on non-letters
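The difference between the Whitespace and Letter tokenizers shows up as soon as the text contains punctuation or digits. A minimal imitation in Python (the real tokenizers live in Lucene; these regex-based versions and the sample string are only illustrative):

```python
import re


def whitespace_tokenize(text):
    """Split on runs of whitespace and keep everything else."""
    return text.split()


def letter_tokenize(text):
    """Keep runs of letters; any non-letter character ends a token."""
    # [^\W\d_] is "a word character that is neither a digit nor an underscore".
    return re.findall(r"[^\W\d_]+", text)


sample = "real-time search, 2015"
print(whitespace_tokenize(sample))
print(letter_tokenize(sample))
```

The whitespace version keeps `real-time` and `search,` intact, punctuation included, while the letter version breaks at the hyphen and drops the number entirely.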

Clients

• Client list: http://www.elasticsearch.org/guide/clients
  - Java (Node) Client, JS, PHP, Perl, Python, Ruby
• Spring Data
  - Uses TransportClient
  - Implementation of ElasticsearchRepository aligns with generic Repository interfaces

Monitoring

• BigDesk
• Kopf
• Head
• ElasticHQ
• Marvel
• Sematext SPM

Questions




Page 21: Elasticsearch - DevNexus 2015

Elasticsearch at Predikto

21

bullWrite From Spark to Elasticsearch

bullQuery from Spark to Elasticsearch

bullVisualize

Widely Used

22

Based on Apache Lucene

bullFree and Open Source

bullStarted in 1999

bullCreated by Doug Cutting

bullWhatrsquos it do

-Tokenizing

- Locations

-Relevance scoring

-Filtering

-Text search

-Date Parsing

Elasticsearch is a Document Store

Document Store

• Like MongoDB and CouchDB
• Document DBs
  - JSON documents
  - Collections of key-value collections
  - Nesting
  - Versioned

What is a document?

{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 170,
  "title": "Scarface",
  "year": 1983
}

Modeled in JSON

{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 170,
  "title": "Scarface",
  "year": 1983
}

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "u17o8zy9RcKg6SjQZqQ4Ow",
  "_version": 1,
  "_source": {
    "genre": "Crime",
    "language": "English",
    "country": "USA",
    "runtime": 170,
    "title": "Scarface",
    "year": 1983
  }
}

Schema-Free

• Dynamic Mapping
  - Elasticsearch guesses the data-types (string, int, float…)

{
  "imdb": {
    "movie": {
      "properties": {
        "country": {
          "type": "string",
          "store": true,
          "index": false
        },
        "genre": {
          "type": "string",
          "null_value": "na",
          "store": false,
          "index": true
        },
        "year": {
          "type": "long"
        }
      }
    }
  }
}
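
To make the "guessing" step concrete, here is a minimal Python sketch of how a dynamic mapping could be derived from a document's JSON values. It is an illustration only, not Elasticsearch's implementation: the function names are hypothetical, and the type names (`long`, `double`, `string`, `boolean`) are assumed to follow the convention visible in the `year` field of the mapping above.

```python
def guess_type(value):
    """Return a mapping type name for a JSON value (simplified guess)."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    return "object"


def guess_mapping(doc):
    """Build a {"properties": {field: {"type": ...}}} mapping for a flat document."""
    return {"properties": {field: {"type": guess_type(v)} for field, v in doc.items()}}


movie = {"title": "Scarface", "year": 1983, "runtime": 170, "country": "USA"}
mapping = guess_mapping(movie)
print(mapping["properties"]["year"])   # {'type': 'long'}
print(mapping["properties"]["title"])  # {'type': 'string'}
```

The first document to introduce a field fixes its guessed type, which is why an explicit mapping like the one above is often preferable for fields that matter.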

Elasticsearch is Distributed

Terminology

MySQL      Elasticsearch
Database   Index
Table      Type
Row        Document
Column     Field
Schema     Mapping
Index      (Everything is indexed)
SQL        Query DSL

• Cluster: 1..N Nodes with the same Cluster Name
• Node: One ElasticSearch instance (1 java proc)
• Shard = One Lucene instance
  - 0 or more replicas

High Availability

• No need for load balancer
• Different Node Types
• Indices are Sharded
• Replica shards on different Nodes
• Automatic Master election & failover

About Indices Shards

$ curl -XPUT http://localhost:9200/twitter -d '
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    }
  }
}'
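
As a quick sanity check on what these two settings imply, the following Python sketch counts the shard copies the cluster has to place. The function names are hypothetical. It relies on the fact that a replica is never allocated on the same node as its primary or another copy of the same shard.

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Primaries plus all replica copies for one index."""
    return number_of_shards * (1 + number_of_replicas)


def min_nodes_for_all_copies(number_of_replicas):
    """Each copy of a shard needs its own node, so 1 primary + N replicas need N + 1 nodes."""
    return 1 + number_of_replicas


settings = {"number_of_shards": 3, "number_of_replicas": 2}
print(total_shard_copies(**settings))                              # 9
print(min_nodes_for_all_copies(settings["number_of_replicas"]))    # 3
```

With fewer than three nodes, some replicas of the `twitter` index above would stay unassigned.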

Cluster Topology

[Diagram: shards A1, A2, B1, B2, B3 and their replicas distributed across four nodes]

4 Node Cluster
Index A: 2 Shards & 1 Replica
Index B: 3 Shards & 1 Replica

Discovery

• Nodes discover each other using multicast
  - Unicast is an option
• Each cluster has an elected master node
  - Beware of split-brain

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3"]
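
The usual guard against split-brain is to require a majority (quorum) of master-eligible nodes before a master can be elected. The slide does not name the setting. In Elasticsearch 1.x it is `discovery.zen.minimum_master_nodes`, and the sketch below, with a hypothetical helper name, only computes the value one would typically put there.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum: strictly more than half of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


for n in (1, 2, 3, 5):
    print(n, "master-eligible ->", minimum_master_nodes(n))
# 1 master-eligible -> 1
# 2 master-eligible -> 2
# 3 master-eligible -> 2
# 5 master-eligible -> 3
```

With a quorum in place, two halves of a partitioned cluster cannot both hold a majority, so at most one side can elect a master.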

Nodes

• Master node handles cluster-wide (Meta-API) events
  - Node participation
  - New indices create/delete
  - Re-Allocation of shards
• Data Nodes
  - Indexing / Searching operations
• Client Nodes
  - REST calls
  - Light-weight load balancers

node.data | node.master

The Basics - Shards

• Primary Shard
  - First time Indexing
  - Index has 1..N primary shards (default 5)
  - Not changeable once index created
• Replica Shard
  - Copy of the primary shard
  - Can be changed later
  - Each primary has 0..N replicas
  - HA
    • Promoted to primary if primary fails
    • Get/Search handled by primary||replica
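
Why the primary shard count cannot change after index creation follows from how a document is routed: the shard is chosen as a hash of the routing value (the document ID by default) modulo the number of primary shards. The Python sketch below illustrates the idea only. `zlib.crc32` is a stand-in, since Elasticsearch uses its own hash function, and `pick_shard` is a hypothetical name.

```python
import zlib


def pick_shard(routing_key, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards (crc32 used as a stand-in hash)."""
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


doc_id = "AUwGeWib1u4mCngDYT7y"
print(pick_shard(doc_id, 5))  # the shard this ID maps to with the default 5 primaries
print(pick_shard(doc_id, 6))  # with 6 primaries the same ID may map elsewhere
```

If the divisor changed, previously indexed documents would no longer be found on the shard that the formula points to. Replica counts do not enter the formula, which is why they can be changed later.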

Shard Auto-Allocation

• Shard Phases
  - Unassigned
  - Initializing
  - Started
  - Relocating

[Diagram: Node 1 (0P, 1R); Node 2 (1P, 0R); Node 2 (0R)]

Add a Node: Shards Relocate

Allocation Awareness

• Shard Allocation Awareness
  - cluster.routing.allocation.awareness.attributes: rack
  - Shards RELOCATE to even distribution
  - Primary & Replica will NOT be on the same rack value

Cluster State

• Cluster State
  - Node Membership
  - Indices Settings
  - Shard Allocation Table
  - Shard State

curl -XGET 'http://localhost:9200/_cluster/state?pretty=1'

{
  "cluster_name": "elasticsearch_royrusso",
  "version": 27,
  "master_node": "s3fpXfPKSFeUqo1MYZxSng",
  "blocks": {},
  "nodes": {
    "s3fpXfPKSFeUqo1MYZxSng": {
      "name": "Bulldozer",
      "transport_address": "inet[localhost/127.0.0.1:9300]",
      "attributes": {}
    }
  },
  "metadata": {
    "templates": {
      "logging_index_all": {
        "template": "logstash-09-*",
        "order": 1,
        "settings": {
          "index": {
            "number_of_shards": 2,
            "number_of_replicas": 1
          }
        },
        "mappings": {
          "date": {
            "store": false
          }
        }
      },
      "logging_index": {
        "template": "logstash-*",
        "order": 0,
        "settings": {
          "index": {
            "number_of_shards": 2,
            "number_of_replicas": 1
          }
        },
        "mappings": {
          "date": {
            "store": true
          }
        }
      }
    }
  }
}

Talking to Elasticsearch

REST

• HTTP Verbs: GET, POST, PUT, DELETE
• JSON
• _cat API

curl '192.168.56.10:9200/_cat/health?v&ts=0'

cluster status nodeTotal nodeData shards pri relo init unassign
foo     green          3        3      3   3    0    0        0
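
Because `_cat` output is whitespace-separated text with a header row (when `v` is set), it is easy to consume from scripts. The Python sketch below parses the sample output shown above. The sample is pasted in as a string, so no cluster is needed, and `parse_cat` is a hypothetical helper name.

```python
CAT_HEALTH = """\
cluster status nodeTotal nodeData shards pri relo init unassign
foo green 3 3 3 3 0 0 0
"""


def parse_cat(text):
    """Turn header + rows of whitespace-separated columns into a list of dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


health = parse_cat(CAT_HEALTH)[0]
print(health["status"])              # green
print(int(health["unassign"]) == 0)  # True
```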

The API

• Document
• Cluster
  - Node
• Index
• Search

Create a Document

curl -XPOST http://127.0.0.1:9200/imdb/movie -d '
{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 170,
  "title": "Scarface",
  "year": 1983
}'

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 1,
  "created": true
}

Of note…

• Auto-creates Index & Type
• Auto-Gen ID
• Auto-Version

Get a Document

curl -XGET http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 1,
  "found": true,
  "_source": {
    "genre": "Crime",
    "language": "English",
    "country": "USA",
    "runtime": 170,
    "title": "Scarface",
    "year": 1983
  }
}

Update a Document

curl -XPUT http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y -d '
{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 180,
  "title": "Scarface",
  "year": 1983
}'

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 2,
  "created": false
}

More like an Upsert
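
The create and update responses above differ only in `_version` and `created`. A tiny in-memory model makes the "more like an upsert" behaviour explicit. This Python sketch is not an Elasticsearch client: `TinyIndex` is a hypothetical stand-in, and its generated IDs do not use Elasticsearch's ID format.

```python
import uuid


class TinyIndex:
    """In-memory stand-in for one index/type, used only to illustrate versioning."""

    def __init__(self):
        self.docs = {}  # _id -> (version, source)

    def index(self, source, doc_id=None):
        # Without an ID (the POST case) one is generated; with an ID (the PUT case)
        # the document is created if missing, or replaced as a whole if present.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        created = doc_id not in self.docs
        version = 1 if created else self.docs[doc_id][0] + 1
        self.docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version, "created": created}


idx = TinyIndex()
first = idx.index({"title": "Scarface", "runtime": 170})
second = idx.index({"title": "Scarface", "runtime": 180}, doc_id=first["_id"])
print(first["_version"], first["created"])    # 1 True
print(second["_version"], second["created"])  # 2 False
```

Note that the second call replaces the whole stored document rather than merging fields, which mirrors the full-document PUT shown above.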

Delete a Document

curl -XDELETE http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y

{
  "found": true,
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 2
}

You can also…

• Partial document updating
• Specify Version
• Specify ID
• Multi-Get API
• Exists API
• Bulk API

How Searching Works

• How it works
  - Search request hits a node
  - Node broadcasts to every shard in the index
  - Each shard performs query
  - Each shard returns metadata about results
  - Node merges results and scores them
  - Node requests documents from shards
  - Results merged, sorted, and returned to client
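
The steps above amount to a scatter-gather in two phases: a query phase that collects only IDs and scores from each shard, and a fetch phase that loads full documents for the winners. The Python sketch below simulates this with made-up shard contents. All names, IDs, and scores are hypothetical.

```python
import heapq

# Per-shard (doc_id, score) pairs for some hypothetical query.
shards = {
    0: [("a", 2.1), ("b", 0.4)],
    1: [("c", 3.0), ("d", 1.2)],
    2: [("e", 0.9)],
}
# Full documents, fetched only for the final hits.
sources = {doc_id: {"title": "doc " + doc_id} for doc_id in "abcde"}


def query_phase(shard_hits, size):
    """Each shard returns just IDs and scores for its local top hits."""
    return heapq.nlargest(size, shard_hits, key=lambda hit: hit[1])


def search(size):
    # Scatter the query to every shard, then gather and merge the candidates.
    candidates = []
    for shard_hits in shards.values():
        candidates.extend(query_phase(shard_hits, size))
    top = heapq.nlargest(size, candidates, key=lambda hit: hit[1])
    # Fetch phase: load full documents only for the global top hits.
    return [{"_id": d, "_score": s, "_source": sources[d]} for d, s in top]


print([hit["_id"] for hit in search(size=2)])  # ['c', 'a']
```

Each shard must return its own top `size` hits because the coordinating node cannot know in advance which shard holds the best matches.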

REST API - Search

• Free Text Search
  - URL Request
• Complex Query

http://localhost:9200/imdb/movie/_search?q=scar
http://localhost:9200/imdb/movie/_search?q=scarface+OR+star
http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]
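
When building such URLs in code, the query string should be URL-encoded rather than concatenated by hand. A minimal Python sketch using only the standard library follows. `search_url` is a hypothetical helper, and the host, index, and type are the ones from the examples above.

```python
from urllib.parse import urlencode


def search_url(base, index, doc_type, query):
    """Build a URI search request, URL-encoding the q parameter."""
    return f"{base}/{index}/{doc_type}/_search?" + urlencode({"q": query})


url = search_url(
    "http://localhost:9200",
    "imdb",
    "movie",
    "(scarface OR star) AND year:[1981 TO 1984]",
)
print(url)
```

Spaces become `+`, and characters such as the brackets and the colon are percent-encoded, so the request survives shells and HTTP clients that would otherwise mangle them.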

REST API – Query DSL

curl -XPOST 'localhost:9200/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        { "query_string": { "query": "scarface or star" } },
        { "range": { "year": { "gte": 1931 } } }
      ]
    }
  }
}'

REST API – Query DSL

• Boolean Query

"bool": {
  "must": [
    { "match": { "color": "blue" } },
    { "match": { "title": "shirt" } }
  ],
  "must_not": [
    { "match": { "size": "xxl" } }
  ],
  "should": [
    { "match": { "textile": "cotton" } }
  ]
}
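
The semantics of the three clause types can be sketched in a few lines of Python: `must` clauses are required, `must_not` clauses exclude, and `should` clauses only raise the score. This is a simplification under stated assumptions: a real `match` query analyzes text and scores by relevance, while here `matches` is a hypothetical exact-equality check and the "score" is just a count of satisfied `should` clauses.

```python
def matches(doc, clause):
    """True if every field in a simplified match clause equals the doc's value."""
    return all(doc.get(field) == value for field, value in clause["match"].items())


def bool_score(doc, bool_query):
    """None if the doc is excluded, else the number of should clauses it satisfies."""
    if not all(matches(doc, c) for c in bool_query.get("must", [])):
        return None
    if any(matches(doc, c) for c in bool_query.get("must_not", [])):
        return None
    return sum(matches(doc, c) for c in bool_query.get("should", []))


query = {
    "must": [{"match": {"color": "blue"}}, {"match": {"title": "shirt"}}],
    "must_not": [{"match": {"size": "xxl"}}],
    "should": [{"match": {"textile": "cotton"}}],
}
docs = [
    {"color": "blue", "title": "shirt", "size": "m", "textile": "cotton"},
    {"color": "blue", "title": "shirt", "size": "m", "textile": "wool"},
    {"color": "blue", "title": "shirt", "size": "xxl", "textile": "cotton"},
]
print([bool_score(d, query) for d in docs])  # [1, 0, None]
```

The wool shirt still matches (score 0), while the XXL shirt is dropped even though it satisfies every other clause.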

REST API – Query DSL

• Range Query
  - Numeric / Date Types
• Prefix/Wildcard Query
  - Match on partial terms
• RegExp Query
• Geo_bbox
  - Bounding box filter
• Geo_distance
  - Geo_distance_range

"range": {
  "founded_year": {
    "gte": 1990,
    "lt": 2000
  }
}
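
The bounds in a range clause are inclusive for `gte`/`lte` and exclusive for `gt`/`lt`. A small Python sketch of that check, using the `founded_year` bounds from above, is shown below. `in_range` and the sample years are hypothetical.

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Apply whichever of the four range bounds are given."""
    if gte is not None and not value >= gte:
        return False
    if gt is not None and not value > gt:
        return False
    if lte is not None and not value <= lte:
        return False
    if lt is not None and not value < lt:
        return False
    return True


bounds = {"gte": 1990, "lt": 2000}
years = [1989, 1990, 1999, 2000]
print([y for y in years if in_range(y, **bounds)])  # [1990, 1999]
```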

Filters

• Filters recommended over Queries
  - Better cache support

curl -XGET 'http://localhost:9200/my_index/events/_search?pretty=1' -d '
{
  "from": 0,
  "size": 0,
  "query": {
    "terms": {
      "message": ["apples"],
      "minimum_should_match": 1
    }
  },
  "post_filter": {
    "terms": {
      "userId": ["25476c6788ce", "g20d5470d7b4"],
      "execution": "or"
    }
  },
  "sort": { "eventDate": { "order": "desc" } },
  "explain": false
}'

Analyzers Tokenizers

curl -XPUT 'http://localhost:9200/my_index' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "str_search_analyzer": {
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        },
        "str_index_analyzer": {
          "tokenizer": "substring",
          "filter": ["lowercase", "stop"]
        }
      },
      "tokenizer": {
        "substring": {
          "type": "edgeNgram",
          "min_gram": 3,
          "max_gram": 42,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  }
}'

curl -XPUT 'http://localhost:9200/my_index/events/_mapping' -d '
{
  "events": {
    "properties": {
      "eventId": { "type": "string", "store": true, "index": "not_analyzed" },
      "userId": { "type": "string", "store": false, "index": "not_analyzed" },
      "message": {
        "type": "string",
        "store": false,
        "search_analyzer": "str_search_analyzer",
        "index_analyzer": "str_index_analyzer"
      }
    }
  }
}'
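
To see what the index-time analyzer above produces, here is a rough Python approximation of an edge n-gram tokenizer with `min_gram` 3, `max_gram` 42, and letter/digit token characters, followed by lowercasing. It is a sketch under assumptions: the `stop` filter is omitted, and `edge_ngram_tokenize` is a hypothetical name rather than anything from Elasticsearch.

```python
import re


def edge_ngram_tokenize(text, min_gram=3, max_gram=42):
    """Split on anything that is not a letter or digit, then emit lowercase prefixes."""
    terms = []
    for word in re.findall(r"[^\W_]+", text):
        longest = min(len(word), max_gram)
        for n in range(min_gram, longest + 1):
            terms.append(word[:n].lower())
    return terms


print(edge_ngram_tokenize("Apples 42"))
# ['app', 'appl', 'apple', 'apples']
```

Because the search analyzer uses the `keyword` tokenizer plus `lowercase`, a search for "App" becomes the single term `app`, which matches the prefix indexed above. Words shorter than `min_gram`, such as "42", yield no terms at all.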

Tokenizers

• Whitespace
• NGram
• Edge NGram
• Letter
  - Splits text at non-letters
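
The difference between the Whitespace and Letter tokenizers shows up as soon as the text contains punctuation or digits. The Python sketch below approximates both with the standard library. The function names and the sample string are hypothetical.

```python
import re


def whitespace_tokenize(text):
    """Split on runs of whitespace only; punctuation stays inside tokens."""
    return text.split()


def letter_tokenize(text):
    """Keep runs of letters; every non-letter character is a token boundary."""
    return re.findall(r"[^\W\d_]+", text)


sample = "ES-1.4 rocks"
print(whitespace_tokenize(sample))  # ['ES-1.4', 'rocks']
print(letter_tokenize(sample))      # ['ES', 'rocks']
```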

Clients

• Client list: http://www.elasticsearch.org/guide/clients
  - Java (Node) Client, JS, PHP, Perl, Python, Ruby
• Spring Data
  - Uses TransportClient
  - Implementation of ElasticsearchRepository aligns with generic Repository interfaces

Monitoring

• BigDesk
• Kopf
• Head
• ElasticHQ
• Marvel
• Sematext SPM

Questions

Page 25: Elasticsearch - DevNexus 2015

Document Store

bullLike MongoDB and CouchDB

bullDocument DBs

- JSON documents

-Collections of key-value collections

-Nesting

-Versioned

25

What is a document

26

genre Crime

ldquolanguage English

country USA

runtime 170

title Scarface

year 1983

Modeled in JSON

27

genre Crime

language English

country USA

runtime 170

title Scarface

year 1983

_index imdb

_type movie

_id u17o8zy9RcKg6SjQZqQ4Ow

_version 1

_source

genre Crime

language English

country USA

runtime 170

title Scarface

year 1983

Schema-Free
• Dynamic Mapping
  - Elasticsearch guesses the data-types (string, int, float…)

{
  "imdb": {
    "movie": {
      "properties": {
        "country": {
          "type": "string",
          "store": true,
          "index": false
        },
        "genre": {
          "type": "string",
          "null_value": "na",
          "store": false,
          "index": true
        },
        "year": {
          "type": "long"
        }
      }
    }
  }
}
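The "guessing" in dynamic mapping can be pictured as a lookup from the JSON value's type to a field type. The sketch below is a rough Python imitation with hypothetical function names; it ignores date detection, arrays and the other rules the real dynamic mapper applies.

```python
def guess_type(value):
    """Map a decoded JSON value to the field type dynamic mapping would pick."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    raise TypeError("no guess for value of type " + type(value).__name__)


def dynamic_mapping(document):
    """Build a properties block from the first document seen."""
    return {
        "properties": {
            field: {"type": guess_type(value)} for field, value in document.items()
        }
    }


movie = {"title": "Scarface", "year": 1983, "runtime": 170, "rating": 8.3}
print(dynamic_mapping(movie))
```

The `rating` field is an invented example value, added only to show a floating-point guess.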

Elasticsearch is Distributed

Terminology

MySQL       Elasticsearch
Database    Index
Table       Type
Row         Document
Column      Field
Schema      Mapping
Index       (Everything is indexed)
SQL         Query DSL

• Cluster: 1..N Nodes w/ same Cluster Name
• Node: One Elasticsearch instance (1 java proc)
• Shard = One Lucene instance
  - 0 or more replicas

High Availability
• No need for load balancer
• Different Node Types
• Indices are Sharded
• Replica shards on different Nodes
• Automatic Master election & failover

About Indices / Shards

$ curl -XPUT 'http://localhost:9200/twitter' -d '
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    }
  }
}'
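It helps to work out what these two settings mean in shard copies. `number_of_replicas` counts copies per primary, so the total grows multiplicatively. The helper below is a hypothetical name used only for this arithmetic.

```python
def shard_copies(number_of_shards, number_of_replicas):
    """Count the Lucene instances an index needs across the cluster."""
    primaries = number_of_shards
    replicas = number_of_shards * number_of_replicas
    return {
        "primaries": primaries,
        "replicas": replicas,
        "total": primaries + replicas,
        # A replica is never allocated on the node holding its primary (or
        # another copy of the same shard), so this many nodes are needed
        # before every copy can be assigned.
        "nodes_for_full_allocation": number_of_replicas + 1,
    }


print(shard_copies(3, 2))
```

For the `twitter` index above this gives 3 primaries plus 6 replicas, 9 shard copies in total, and at least 3 nodes before every copy can be assigned.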

Cluster Topology

[Diagram: 4 Node Cluster, with the primary and replica copies of shards A1, A2, B1, B2 and B3 spread across the nodes]

Index A: 2 Shards & 1 Replica
Index B: 3 Shards & 1 Replica

Discovery
• Nodes discover each other using multicast
  - Unicast is an option
• Each cluster has an elected master node
  - Beware of split-brain

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3"]
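The usual guard against split-brain in zen discovery is to require a quorum of master-eligible nodes before a master can be elected, via the `discovery.zen.minimum_master_nodes` setting (not shown on the slide). The quorum arithmetic is sketched below; the function names are hypothetical.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size: a strict majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(partition_size, master_eligible_nodes):
    """A network partition may elect a master only if it holds a quorum."""
    return partition_size >= minimum_master_nodes(master_eligible_nodes)


for total in (2, 3, 5):
    print(total, "master-eligible nodes -> quorum of", minimum_master_nodes(total))

# A 3-node cluster split by a network fault into groups of 2 and 1 nodes:
print(can_elect_master(2, 3), can_elect_master(1, 3))
```

With a quorum of 2 out of 3, only the larger side of the split can elect a master, so two masters cannot coexist.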

Nodes
• Master node handles cluster-wide (Meta-API) events
  - Node participation
  - New indices create/delete
  - Re-Allocation of shards
• Data Nodes
  - Indexing / Searching operations
• Client Nodes
  - REST calls
  - Light-weight load balancers

node.data | node.master
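The node type follows from the combination of the two flags. The mapping below is a small sketch of that combination; the role descriptions are informal wording for illustration, not official names.

```python
def node_role(master=True, data=True):
    """Describe a node from its node.master and node.data settings.

    Both flags default to True, which is also the Elasticsearch default.
    """
    if master and data:
        return "master-eligible data node"
    if master:
        return "dedicated master-eligible node"
    if data:
        return "data-only node"
    return "client node"


for master in (True, False):
    for data in (True, False):
        print("node.master:", master, "node.data:", data, "->", node_role(master, data))
```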

The Basics - Shards
• Primary Shard
  - First time Indexing
  - Index has 1..N primary shards (default 5)
  - Not changeable once index created
• Replica Shard
  - Copy of the primary shard
  - Can be changed later
  - Each primary has 0..N replicas
  - HA
    • Promoted to primary if primary fails
    • Get/Search handled by primary||replica
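The primary count is fixed because a document's shard is derived from a hash of its routing value (by default the document ID) modulo the number of primary shards. The sketch below illustrates the idea with `zlib.crc32` as a stand-in hash, which is an assumption made for the example and not the hash function Elasticsearch uses.

```python
import zlib


def shard_for(doc_id, number_of_primary_shards):
    """Pick a primary shard: hash(routing value) modulo the primary count."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


doc_ids = ["doc-%d" % n for n in range(1000)]

# The same ID always lands on the same shard while the count stays fixed.
print(shard_for("doc-42", 5) == shard_for("doc-42", 5))

# Changing the count from 5 to 6 would send many documents to a different
# shard, so existing documents could no longer be found where they were stored.
moved = sum(1 for doc_id in doc_ids if shard_for(doc_id, 5) != shard_for(doc_id, 6))
print(moved, "of", len(doc_ids), "documents would change shard")
```

Replicas are not part of this formula, which is why their number can be changed later.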

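Shard Auto-Allocation
• Shard Phases
  - Unassigned
  - Initializing
  - Started
  - Relocating

[Diagram: Add a Node, Shards Relocate. Node 1 holds 0P and 1R, Node 2 holds 1P and 0R; after a node is added, 0R is shown on the new node (labelled "Node 2" on the slide)]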

Allocation Awareness
• Shard Allocation Awareness
  - cluster.routing.allocation.awareness.attributes: rack
  - Shards RELOCATE to even distribution
  - Primary & Replica will NOT be on the same rack value
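The effect of the awareness attribute can be pictured as a placement constraint: a replica only goes to a node whose rack value differs from its primary's. The toy allocator below is a hypothetical sketch with made-up node names and rack values, assuming one replica per shard and at least two racks. The real allocator weighs many more factors.

```python
# Hypothetical cluster: node name -> value of its "rack" attribute.
NODES = {
    "node1": "rack_one",
    "node2": "rack_one",
    "node3": "rack_two",
    "node4": "rack_two",
}


def place_copies(shard_ids, nodes):
    """Assign one primary and one replica per shard on different racks,
    always preferring the least-loaded node."""
    load = {name: 0 for name in nodes}
    placement = {}
    for shard in shard_ids:
        primary = min(nodes, key=lambda name: (load[name], name))
        load[primary] += 1
        other_racks = [name for name in nodes if nodes[name] != nodes[primary]]
        if not other_racks:
            raise ValueError("this sketch needs at least two rack values")
        replica = min(other_racks, key=lambda name: (load[name], name))
        load[replica] += 1
        placement[shard] = {"primary": primary, "replica": replica}
    return placement


for shard, copies in place_copies([0, 1, 2], NODES).items():
    print(shard, copies)
```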

Cluster State
• Cluster State
  - Node Membership
  - Indices Settings
  - Shard Allocation Table
  - Shard State

curl -XGET 'http://localhost:9200/_cluster/state?pretty=1'

{
  "cluster_name": "elasticsearch_royrusso",
  "version": 27,
  "master_node": "s3fpXfPKSFeUqo1MYZxSng",
  "blocks": {},
  "nodes": {
    "s3fpXfPKSFeUqo1MYZxSng": {
      "name": "Bulldozer",
      "transport_address": "inet[localhost/127.0.0.1:9300]",
      "attributes": {}
    }
  },
  "metadata": {
    "templates": {
      "logging_index_all": {
        "template": "logstash-09-*",
        "order": 1,
        "settings": {
          "index": {
            "number_of_shards": 2,
            "number_of_replicas": 1
          }
        },
        "mappings": {
          "date": {
            "store": false
          }
        }
      },
      "logging_index": {
        "template": "logstash-*",
        "order": 0,
        "settings": {
          "index": {
            "number_of_shards": 2,
            "number_of_replicas": 1
          }
        },
        "mappings": {
          "date": {
            "store": true
          }
        }
      }
    }
  }
}
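The two templates in this cluster state overlap, which is what the `order` field is for: every matching template is applied from the lowest order to the highest, with higher orders overriding. The sketch below imitates that merge in Python. It assumes the template patterns end in a `*` wildcard (the transcript shows only `logstash-09-` and `logstash-`), and the function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Trimmed copies of the two templates; the trailing "*" is an assumption.
TEMPLATES = {
    "logging_index_all": {
        "template": "logstash-09-*",
        "order": 1,
        "mappings": {"date": {"store": False}},
    },
    "logging_index": {
        "template": "logstash-*",
        "order": 0,
        "mappings": {"date": {"store": True}},
    },
}


def deep_merge(base, override):
    """Return base updated with override, merging nested dicts recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_templates(index_name):
    """Merge every matching template, lowest order first."""
    matching = [t for t in TEMPLATES.values() if fnmatchcase(index_name, t["template"])]
    merged = {}
    for template in sorted(matching, key=lambda t: t["order"]):
        body = {k: v for k, v in template.items() if k in ("settings", "mappings")}
        merged = deep_merge(merged, body)
    return merged


print(resolve_templates("logstash-09-15"))  # both match, order 1 wins
print(resolve_templates("logstash-10-01"))  # only the order 0 template matches
```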

Talking to Elasticsearch

REST
• HTTP Verbs: GET, POST, PUT, DELETE
• JSON
• _cat API

curl '192.168.56.10:9200/_cat/health?v&ts=0'

cluster status nodeTotal nodeData shards pri relo init unassign
foo     green          3        3      3   3    0    0        0
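Because `_cat` output is a whitespace-aligned table rather than JSON, it is easy to consume from scripts. A minimal parsing sketch follows, assuming the `v` flag was used so a header row is present and that no column value contains spaces; the function name is hypothetical.

```python
CAT_HEALTH = """\
cluster status nodeTotal nodeData shards pri relo init unassign
foo     green          3        3      3   3    0    0        0
"""


def parse_cat(text):
    """Turn _cat output with a header row into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = parse_cat(CAT_HEALTH)
print(rows[0]["cluster"], rows[0]["status"], "unassigned shards:", rows[0]["unassign"])
```

All values come back as strings, so numeric columns need an explicit `int()` before comparing.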

The API
• Document
• Cluster
  - Node
• Index
• Search

Create a Document

curl -XPOST 'http://127.0.0.1:9200/imdb/movie' -d '
{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 170,
  "title": "Scarface",
  "year": 1983
}'

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 1,
  "created": true
}

Of note…
• Auto-creates Index & Type (_index: imdb, _type: movie)
• Auto-Gen ID (_id: AUwGeWib1u4mCngDYT7y)
• Auto-Version (_version: 1)

Get a Document

curl -XGET 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y'

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 1,
  "found": true,
  "_source": {
    "genre": "Crime",
    "language": "English",
    "country": "USA",
    "runtime": 170,
    "title": "Scarface",
    "year": 1983
  }
}

Update a Document

curl -XPUT 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y' -d '
{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 180,
  "title": "Scarface",
  "year": 1983
}'

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 2,
  "created": false
}

More like an Upsert
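The "upsert" remark can be made concrete with a toy in-memory store that mimics the `_version` and `created` fields of the responses above. This is an illustrative sketch only: the class name is hypothetical, and `uuid4` stands in for the server's own ID generator.

```python
import uuid


class TinyIndex:
    """In-memory imitation of indexing by ID with versioning."""

    def __init__(self):
        self._docs = {}

    def index(self, source, doc_id=None):
        """Store a whole document; create it if the ID is new, else replace it."""
        if doc_id is None:
            doc_id = uuid.uuid4().hex  # stand-in for the auto-generated ID
        previous = self._docs.get(doc_id)
        version = previous["_version"] + 1 if previous else 1
        self._docs[doc_id] = {"_version": version, "_source": dict(source)}
        return {"_id": doc_id, "_version": version, "created": previous is None}

    def get(self, doc_id):
        stored = self._docs.get(doc_id)
        if stored is None:
            return {"_id": doc_id, "found": False}
        return {"_id": doc_id, "found": True, **stored}


store = TinyIndex()
movie = {"title": "Scarface", "year": 1983, "runtime": 170}

first = store.index(movie)                                        # like POST without an ID
second = store.index({**movie, "runtime": 180}, first["_id"])     # like PUT to the same ID
print(first["_version"], first["created"])
print(second["_version"], second["created"])
print(store.get(first["_id"])["_source"]["runtime"])
```

The same call either creates or replaces, depending only on whether the ID already exists; the whole document is swapped, not patched.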

Delete a Document

curl -XDELETE 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y'

{
  "found": true,
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 2
}

You can also…
• Partial document updating
• Specify Version
• Specify ID
• Multi-Get API
• Exists API
• Bulk API

How Searching Works
• How it works
  - Search request hits a node
  - Node broadcasts to every shard in the index
  - Each shard performs query
  - Each shard returns metadata about results
  - Node merges results and scores them
  - Node requests documents from shards
  - Results merged, sorted, and returned to client
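The steps above amount to a two-phase scatter-gather. The sketch below walks through both phases over three in-memory "shards". The data, the term-count scoring and the function names are all invented for illustration and are far simpler than real relevance scoring.

```python
import heapq

# Three hypothetical shards, each mapping document ID -> title.
SHARDS = [
    {"1": "Scarface", "2": "Star Wars"},
    {"3": "Scarface Scarface remix", "4": "Heat"},
    {"5": "A Star Is Born", "6": "Scarface returns"},
]


def score(title, term):
    """Toy relevance: how often the term occurs in the title."""
    return title.lower().split().count(term.lower())


def query_phase(shard_no, term, size):
    """Each shard returns only metadata: (score, shard number, document ID)."""
    hits = [
        (score(title, term), shard_no, doc_id)
        for doc_id, title in SHARDS[shard_no].items()
    ]
    return heapq.nlargest(size, [hit for hit in hits if hit[0] > 0])


def fetch_phase(shard_no, doc_id):
    """The coordinating node asks the owning shard for the document itself."""
    return {"_id": doc_id, "title": SHARDS[shard_no][doc_id]}


def search(term, size=2):
    candidates = []
    for shard_no in range(len(SHARDS)):      # broadcast to every shard
        candidates.extend(query_phase(shard_no, term, size))
    top = heapq.nlargest(size, candidates)   # merge and sort on the node
    return [dict(fetch_phase(shard_no, doc_id), _score=s) for s, shard_no, doc_id in top]


for hit in search("scarface"):
    print(hit)
```

Only the IDs and scores travel in the first phase; full documents are fetched just for the hits that survive the merge.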

REST API - Search
• Free Text Search
  - URL Request
• Complex Query

http://localhost:9200/imdb/movie/_search?q=scar
http://localhost:9200/imdb/movie/_search?q=scarface+OR+star
http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]
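When these URLs are built in code rather than typed into a browser, the query string should be URL-encoded. A small sketch using only Python's standard library follows; the helper name is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://localhost:9200/imdb/movie/_search"


def search_url(query_string):
    """Build a URI search request with the q parameter safely encoded."""
    return BASE + "?" + urlencode({"q": query_string})


url = search_url("(scarface OR star) AND year:[1981 TO 1984]")
print(url)

# Decoding the URL again recovers the original query string.
print(parse_qs(urlsplit(url).query)["q"][0])
```

Spaces become `+`, matching the hand-written URLs above, and characters such as `:` and `[` are percent-encoded.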

REST API – Query DSL

curl -XPOST 'localhost:9200/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        { "query_string": { "query": "scarface or star" } },
        { "range": { "year": { "gte": 1931 } } }
      ]
    }
  }
}'

REST API – Query DSL
• Boolean Query

"bool": {
  "must": [
    { "match": { "color": "blue" } },
    { "match": { "title": "shirt" } }
  ],
  "must_not": [
    { "match": { "size": "xxl" } }
  ],
  "should": [
    { "match": { "textile": "cotton" } }
  ]
}
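The roles of the three clause types can be imitated over a handful of in-memory documents. In this sketch `must` and `must_not` decide whether a document matches at all, while `should` only raises the score when `must` clauses are present. The scoring (a count of matching clauses), the sample products and the function names are simplifications invented for illustration.

```python
def match(field, text):
    """Rough stand-in for a match query: case-insensitive whole-word match."""
    def clause(doc):
        return text.lower() in str(doc.get(field, "")).lower().split()
    return clause


def bool_query(must=(), must_not=(), should=()):
    """Return a function giving a toy score, or None when the doc is excluded."""
    def evaluate(doc):
        if not all(clause(doc) for clause in must):
            return None
        if any(clause(doc) for clause in must_not):
            return None
        return sum(1 for clause in list(must) + list(should) if clause(doc))
    return evaluate


query = bool_query(
    must=[match("color", "blue"), match("title", "shirt")],
    must_not=[match("size", "xxl")],
    should=[match("textile", "cotton")],
)

products = [
    {"title": "Oxford shirt", "color": "blue", "size": "m", "textile": "cotton"},
    {"title": "Polo shirt", "color": "blue", "size": "l", "textile": "polyester"},
    {"title": "Oxford shirt", "color": "blue", "size": "xxl", "textile": "cotton"},
    {"title": "Oxford shirt", "color": "red", "size": "m", "textile": "cotton"},
]

for product in products:
    print(product["title"], product["size"], "->", query(product))
```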

REST API – Query DSL
• Range Query
  - Numeric / Date Types
• Prefix/Wildcard Query
  - Match on partial terms
• RegExp Query
• Geo_bbox
  - Bounding box filter
• Geo_distance
  - Geo_distance_range

"range": {
  "founded_year": {
    "gte": 1990,
    "lt": 2000
  }
}

Filters
• Filters recommended over Queries
  - Better cache support





Page 29: Elasticsearch - DevNexus 2015

Elasticsearch is Distributed

29

Terminology

30

MySQL Elasticsearch

Database Index

Table Type

Row Document

Column Field

Schema Mapping

Index (Everything is indexed)

SQL Query DSL

bullCluster 1N Nodes w same Cluster Name

bullNode One ElasticSearch instance (1 java proc)

bullShard = One Lucene instance

- 0 or more replicas

High Availability

bullNo need for load balancer

bullDifferent Node Types

bullIndices are Sharded

bullReplica shards on different Nodes

bullAutomatic Master election amp failover

31

About Indices Shards

32

$ curl -XPUT httplocalhost9200twitter -d

settings

index

number_of_shards 3

number_of_replicas 2

Cluster Topology

33

A1 A2B2 B2 B1

B3

B1 A1 A2

B3

4 Node Cluster

Index A 2 Shards amp 1 Replica

Index B 3 Shards amp 1 Replica

Discovery

bullNodes discover each other using multicast

-Unicast is an option

bullEach cluster has an elected master node

-Beware of split-brain

discoveryzenpingmulticastenabled false

discoveryzenpingunicasthosts [host1 host2port host3]

34

Nodes

bullMaster node handles cluster-wide (Meta-API)

events

-Node participation

-New indices createdelete

-Re-Allocation of shards

bullData Nodes

- Indexing Searching operations

bullClient Nodes

-REST calls

- Light-weight load balancers

35

nodedata | nodemaster

The Basics - Shards

bullPrimary Shard

-First time Indexing

- Index has 1N primary shards (default 5)

- Not changeable once index created

bullReplica Shard

-Copy of the primary shard

- Can be changed later

-Each primary has 0N replicas

- HA

bull Promoted to primary if primary fails

bull GetSearch handled by primary||replica36

Shard Auto-Allocation

bullShard Phases

-Unassigned

- Initializing

-Started

-Relocating37

Node 1

0P

1R

Node 2

1P

0R

Node 2

0R

Add a Node Shards Relocate

Allocation Awareness

bullShard Allocation Awareness

- clusterroutingallocationawarenessattributes rack

-Shards RELOCATE to even distribution

-Primary amp Replica will NOT be on the same rack

value

38

Cluster State

bullCluster State

-Node Membership

- Indices Settings

-Shard Allocation

Table

-Shard State

39

cURL -XGET httplocalhost9200_clusterstatepretty=1

cluster_name elasticsearch_royrusso

version 27

master_node s3fpXfPKSFeUqo1MYZxSng

blocks

nodes

s3fpXfPKSFeUqo1MYZxSng

name Bulldozer

transport_address inet[localhost1270019300]

attributes

metadata

templates

logging_index_all

template logstash-09-

order 1

settings

index

number_of_shards 2

number_of_replicas 1

mappings

date

store false

logging_index

template logstash-

order 0

settings

index

number_of_shards 2

ldquonumber_of_replicasrdquo 1

mappings

ldquodaterdquo

ldquostorerdquo true

Talking to Elasticsearch

40

REST

bullHTTP Verbs GET POST PUT DELETE

bullJSON

bull_cat API

41

curl 19216856109200_cathealthvampts=0

cluster status nodeTotal nodeData shards pri relo init unassign

foo green 3 3 3 3 0 0 0

The API

• Document

• Cluster

- Node

• Index

• Search

42

Create a Document

43

curl -XPOST 'http://127.0.0.1:9200/imdb/movie' -d '
{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 170,
  "title": "Scarface",
  "year": 1983
}'

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 1,
  "created": true
}
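The same request can be prepared from a program. The sketch below only builds the HTTP request with the Python standard library and does not send it, so it runs without a cluster. `build_index_request` is a hypothetical helper, and the host and port are the ones used on the slide.

```python
import json
import urllib.request

movie = {
    "genre": "Crime",
    "language": "English",
    "country": "USA",
    "runtime": 170,
    "title": "Scarface",
    "year": 1983,
}

def build_index_request(base_url, index, doc_type, doc):
    """Prepare (but do not send) a POST that indexes doc with an auto-generated ID."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        "%s/%s/%s" % (base_url, index, doc_type),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_index_request("http://127.0.0.1:9200", "imdb", "movie", movie)
print(request.get_method(), request.full_url)
# Against a running node, urllib.request.urlopen(request) would send it.
```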

Of note…

44

(Same request and response as on the previous slide, with these callouts:)

- Auto-creates Index & Type

- Auto-Gen ID

- Auto-Version

Get a Document

45

curl -XGET 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y'

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 1,
  "found": true,
  "_source": {
    "genre": "Crime",
    "language": "English",
    "country": "USA",
    "runtime": 170,
    "title": "Scarface",
    "year": 1983
  }
}

Update a Document

46

curl -XPUT 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y' -d '
{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 180,
  "title": "Scarface",
  "year": 1983
}'

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 2,
  "created": false
}

More like an Upsert
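The "upsert" behaviour and the version counter can be mimicked with a few lines of in-memory code. This is a toy stand-in for illustration only: `TinyStore` is hypothetical and models nothing beyond the `_version` and `created` fields seen in the responses.

```python
class TinyStore:
    """In-memory stand-in that mimics the _version / created response fields."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, source):
        # A PUT replaces the whole document; it creates it if the ID is new.
        created = doc_id not in self._docs
        version = 1 if created else self._docs[doc_id]["_version"] + 1
        self._docs[doc_id] = {"_version": version, "_source": source}
        return {"_id": doc_id, "_version": version, "created": created}

store = TinyStore()
first = store.put("AUwGeWib1u4mCngDYT7y", {"title": "Scarface", "runtime": 170})
second = store.put("AUwGeWib1u4mCngDYT7y", {"title": "Scarface", "runtime": 180})
print(first)
print(second)
```

The first call reports version 1 with `created` true; the second call on the same ID reports version 2 with `created` false, matching the two responses on the slides.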

Delete a Document

47

curl -XDELETE 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y'

{
  "found": true,
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 2
}

You can also…

• Partial document updating

• Specify Version

• Specify ID

• Multi-Get API

• Exists API

• Bulk API
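Of the items above, the Bulk API has the least obvious request format: the body is newline-delimited JSON, with an action line followed by a source line for each document. A minimal sketch of building such a body, where `bulk_body` is a hypothetical helper and the second movie is made-up sample data:

```python
import json

def bulk_body(index, doc_type, docs):
    """Build a newline-delimited bulk body: one action line, then one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

body = bulk_body(
    "imdb",
    "movie",
    [{"title": "Scarface", "year": 1983}, {"title": "Heat", "year": 1995}],
)
print(body)
```

The resulting string would be sent as the request body of a POST to the `_bulk` endpoint.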

48

How Searching Works

• How it works

- Search request hits a node

- Node broadcasts to every shard in the index

- Each shard performs query

- Each shard returns metadata about results

- Node merges results and scores them

- Node requests documents from shards

- Results merged, sorted, and returned to client
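The steps above can be sketched as a two-phase scatter/gather over in-memory "shards". Everything here is an assumption for illustration: the shard contents, the scores, and the function names are invented, and real scoring is far more involved.

```python
import heapq

# Hypothetical shard contents: doc ID -> (score for the query, stored document).
SHARDS = [
    {"a": (0.9, {"title": "Scarface"}), "b": (0.2, {"title": "Star Trek"})},
    {"c": (0.7, {"title": "Star Wars"})},
    {"d": (0.4, {"title": "Stardust"}), "e": (0.8, {"title": "Scarface (1932)"})},
]

def query_phase(shard_no, size):
    """Each shard returns only lightweight metadata: (score, shard number, doc ID)."""
    hits = [(score, shard_no, doc_id) for doc_id, (score, _) in SHARDS[shard_no].items()]
    return heapq.nlargest(size, hits)

def search(size):
    # Scatter: ask every shard for its local top hits.
    candidates = []
    for shard_no in range(len(SHARDS)):
        candidates.extend(query_phase(shard_no, size))
    # Gather: merge the per-shard lists and keep the global top hits.
    top = heapq.nlargest(size, candidates)
    # Fetch: request full documents only for the winners.
    return [(score, SHARDS[shard_no][doc_id][1]) for score, shard_no, doc_id in top]

results = search(size=2)
print(results)
```

Only IDs and scores cross the network in the first phase, which is why the coordinating node can merge results from many shards cheaply before fetching the documents themselves.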

49

REST API - Search

• Free Text Search

- URL Request

• Complex Query

50

http://localhost:9200/imdb/movie/_search?q=scar

http://localhost:9200/imdb/movie/_search?q=scarface+OR+star

http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]
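When a URL search is built in code rather than typed into a browser, the query string should be percent-encoded. A small sketch using the Python standard library, with `search_url` as a hypothetical helper:

```python
from urllib.parse import urlencode

def search_url(base, index, doc_type, query):
    """Build a URI search URL, percent-encoding the q parameter."""
    return "%s/%s/%s/_search?%s" % (base, index, doc_type, urlencode({"q": query}))

url = search_url(
    "http://localhost:9200",
    "imdb",
    "movie",
    "(scarface OR star) AND year:[1981 TO 1984]",
)
print(url)
```

Spaces become `+` and characters such as `(`, `:` and `[` become percent escapes, so the query survives HTTP clients that would otherwise reject or mangle them.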

REST API – Query DSL

curl -XPOST 'localhost:9200/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        { "query_string": { "query": "scarface or star" } },
        { "range": { "year": { "gte": 1931 } } }
      ]
    }
  }
}'

51

REST API – Query DSL

• Boolean Query

"bool": {
  "must": [
    { "match": { "color": "blue" } },
    { "match": { "title": "shirt" } }
  ],
  "must_not": [
    { "match": { "size": "xxl" } }
  ],
  "should": [
    { "match": { "textile": "cotton" } }
  ]
}
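How the three clause types combine can be shown with a toy evaluator over plain dictionaries. This is a deliberately simplified sketch: `matches` treats a match as "the word appears in the field", the sample shirts are made up, and real relevance scoring is not modelled.

```python
bool_query = {
    "must": [{"match": {"color": "blue"}}, {"match": {"title": "shirt"}}],
    "must_not": [{"match": {"size": "xxl"}}],
    "should": [{"match": {"textile": "cotton"}}],
}

def matches(clause, doc):
    """Simplified match: the query word appears in the field's lowercased text."""
    field, word = next(iter(clause["match"].items()))
    return word.lower() in str(doc.get(field, "")).lower().split()

def evaluate(query, doc):
    """Return (is_hit, should_count): must and must_not decide the hit, should only boosts it."""
    if not all(matches(c, doc) for c in query.get("must", [])):
        return False, 0
    if any(matches(c, doc) for c in query.get("must_not", [])):
        return False, 0
    return True, sum(matches(c, doc) for c in query.get("should", []))

cotton_m = {"color": "blue", "title": "Oxford shirt", "size": "m", "textile": "cotton"}
cotton_xxl = {"color": "blue", "title": "Polo shirt", "size": "xxl", "textile": "cotton"}
linen_l = {"color": "blue", "title": "Dress shirt", "size": "l", "textile": "linen"}

for doc in (cotton_m, cotton_xxl, linen_l):
    print(doc["title"], evaluate(bool_query, doc))
```

The linen shirt is still a hit because `should` is optional when `must` clauses are present; it simply ranks below the cotton one.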

52

REST API – Query DSL

• Range Query

- Numeric / Date Types

• Prefix/Wildcard Query

- Match on partial terms

• RegExp Query

• Geo_bbox

- Bounding box filter

• Geo_distance

- Geo_distance_range

"range": {
  "founded_year": {
    "gte": 1990,
    "lt": 2000
  }
}
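The bounds in the range example are inclusive for `gte` and exclusive for `lt`. A tiny sketch of the same semantics in plain Python, where `in_range` is a hypothetical helper and the company names and years are made-up sample data:

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Apply the bounds a range query accepts; unset bounds are ignored."""
    checks = [
        gte is None or value >= gte,
        gt is None or value > gt,
        lte is None or value <= lte,
        lt is None or value < lt,
    ]
    return all(checks)

companies = {"Acme": 1989, "Globex": 1990, "Initech": 1999, "Hooli": 2000}
nineties = [name for name, year in companies.items() if in_range(year, gte=1990, lt=2000)]
print(nineties)
```

1990 is included and 2000 is excluded, so the filter keeps exactly the companies founded in the 1990s.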

53

Filters

• Filters recommended over Queries

- Better cache support

54

curl -XGET 'http://localhost:9200/my_index/events/_search?pretty=1' -d '
{
  "from": 0,
  "size": 0,
  "query": {
    "terms": {
      "message": ["apples"],
      "minimum_should_match": 1
    }
  },
  "post_filter": {
    "terms": {
      "userId": ["25476c6788ce", "g20d5470d7b4"],
      "execution": "or"
    }
  },
  "sort": { "eventDate": { "order": "desc" } },
  "explain": false
}'

Analyzers / Tokenizers

55

curl -XPUT 'http://localhost:9200/my_index' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "str_search_analyzer": {
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        },
        "str_index_analyzer": {
          "tokenizer": "substring",
          "filter": ["lowercase", "stop"]
        }
      },
      "tokenizer": {
        "substring": {
          "type": "edgeNgram",
          "min_gram": 3,
          "max_gram": 42,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  }
}'

curl -XPUT 'http://localhost:9200/my_index/events/_mapping' -d '
{
  "events": {
    "properties": {
      "eventId": { "type": "string", "store": true, "index": "not_analyzed" },
      "userId": { "type": "string", "store": false, "index": "not_analyzed" },
      "message": {
        "type": "string",
        "store": false,
        "search_analyzer": "str_search_analyzer",
        "index_analyzer": "str_index_analyzer"
      }
    }
  }
}'
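What this analyzer pair achieves is prefix matching: the index side stores every leading substring of each word, while the search side keeps the user's input as a single lowercased token. A rough pure-Python approximation follows. It ignores the `stop` filter, only handles ASCII letters and digits, and both function names are hypothetical.

```python
import re

def edge_ngrams(text, min_gram=3, max_gram=42):
    """Emit lowercased prefixes of each letter/digit run, like the index-time analyzer."""
    tokens = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        word = word.lower()
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:size])
    return tokens

def search_token(text):
    """Search-time analysis: keyword tokenizer plus lowercase keeps the input as one token."""
    return text.lower()

indexed = edge_ngrams("Apples")
print(indexed)
print(search_token("APP") in indexed)
```

Typing "APP" matches the indexed message because "app" was stored as one of its edge n-grams; a two-letter input would not match, since `min_gram` is 3.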

Tokenizers

• Whitespace

• NGram

• Edge NGram

• Letter

- Splits on non-letters

56

Clients

• Client list: http://www.elasticsearch.org/guide/clients

- Java (Node) Client, JS, PHP, Perl, Python, Ruby

• Spring Data

- Uses TransportClient

- Implementation of ElasticsearchRepository aligns with generic Repository interfaces

57

Monitoring

• BigDesk

• Kopf

• Head

• ElasticHQ

• Marvel

• Sematext SPM

58

Questions

59

Page 30: Elasticsearch - DevNexus 2015

Terminology

30

MySQL       Elasticsearch
Database    Index
Table       Type
Row         Document
Column      Field
Schema      Mapping
Index       (Everything is indexed)
SQL         Query DSL

• Cluster: 1..N Nodes w/ same Cluster Name

• Node: One ElasticSearch instance (1 java proc)

• Shard = One Lucene instance

- 0 or more replicas

High Availability

• No need for load balancer

• Different Node Types

• Indices are Sharded

• Replica shards on different Nodes

• Automatic Master election & failover

31

About Indices Shards

32

$ curl -XPUT 'http://localhost:9200/twitter' -d '
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    }
  }
}'
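It can help to work out how many shard copies such settings produce, since `number_of_replicas` counts copies per primary rather than in total. A one-function sketch, with `total_shards` as a hypothetical helper:

```python
def total_shards(number_of_shards, number_of_replicas):
    """Primaries plus every replica copy of every primary."""
    return number_of_shards * (1 + number_of_replicas)

print(total_shards(3, 2))
```

For the `twitter` index above this gives 3 primaries and 6 replicas, 9 shards in all, each of which is its own Lucene instance.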

Cluster Topology

33

[Diagram: primary and replica copies of shards A1, A2, B1, B2 and B3 spread across four nodes]

4 Node Cluster

Index A: 2 Shards & 1 Replica

Index B: 3 Shards & 1 Replica

Discovery

• Nodes discover each other using multicast

- Unicast is an option

• Each cluster has an elected master node

- Beware of split-brain

discovery.zen.ping.multicast.enabled: false

discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3"]
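The usual guard against split-brain in this generation of Elasticsearch is to require a quorum of master-eligible nodes before a master can be elected, via the `discovery.zen.minimum_master_nodes` setting. The arithmetic can be sketched as follows; both function names are hypothetical.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size: a strict majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1

def can_elect_master(partition_size, master_eligible_nodes):
    """A network partition may elect a master only if it still holds a quorum."""
    return partition_size >= minimum_master_nodes(master_eligible_nodes)

# A 3-node cluster that splits into a 2-node side and a 1-node side.
for side in (2, 1):
    print(side, "node(s):", can_elect_master(side, 3))
```

With three master-eligible nodes the quorum is 2, so only the larger side of the partition can elect a master and the isolated node cannot form a second cluster.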

34

Page 34: Elasticsearch - DevNexus 2015

Discovery

bullNodes discover each other using multicast

-Unicast is an option

bullEach cluster has an elected master node

-Beware of split-brain

discoveryzenpingmulticastenabled false

discoveryzenpingunicasthosts [host1 host2port host3]

34

Nodes

bullMaster node handles cluster-wide (Meta-API)

events

-Node participation

-New indices createdelete

-Re-Allocation of shards

bullData Nodes

- Indexing Searching operations

bullClient Nodes

-REST calls

- Light-weight load balancers

35

nodedata | nodemaster

The Basics - Shards

bullPrimary Shard

-First time Indexing

- Index has 1N primary shards (default 5)

- Not changeable once index created

bullReplica Shard

-Copy of the primary shard

- Can be changed later

-Each primary has 0N replicas

- HA

bull Promoted to primary if primary fails

bull GetSearch handled by primary||replica36

Shard Auto-Allocation

bullShard Phases

-Unassigned

- Initializing

-Started

-Relocating37

Node 1

0P

1R

Node 2

1P

0R

Node 2

0R

Add a Node Shards Relocate

Allocation Awareness

bullShard Allocation Awareness

- clusterroutingallocationawarenessattributes rack

-Shards RELOCATE to even distribution

-Primary amp Replica will NOT be on the same rack

value

38

Cluster State

bullCluster State

-Node Membership

- Indices Settings

-Shard Allocation

Table

-Shard State

39

cURL -XGET httplocalhost9200_clusterstatepretty=1

cluster_name elasticsearch_royrusso

version 27

master_node s3fpXfPKSFeUqo1MYZxSng

blocks

nodes

s3fpXfPKSFeUqo1MYZxSng

name Bulldozer

transport_address inet[localhost1270019300]

attributes

metadata

templates

logging_index_all

template logstash-09-

order 1

settings

index

number_of_shards 2

number_of_replicas 1

mappings

date

store false

logging_index

template logstash-

order 0

settings

index

number_of_shards 2

ldquonumber_of_replicasrdquo 1

mappings

ldquodaterdquo

ldquostorerdquo true

Talking to Elasticsearch

40

REST

bullHTTP Verbs GET POST PUT DELETE

bullJSON

bull_cat API

41

curl 19216856109200_cathealthvampts=0

cluster status nodeTotal nodeData shards pri relo init unassign

foo green 3 3 3 3 0 0 0

The API

bullDocument

bullCluster

-Node

bullIndex

bullSearch

42

Create a Document

curl -XPOST 'http://127.0.0.1:9200/imdb/movie' -d '
{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 170,
  "title": "Scarface",
  "year": 1983
}'

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 1,
  "created": true
}

Of note…

In the create request and response above:
• Auto-creates Index & Type (imdb, movie)
• Auto-Gen ID (AUwGeWib1u4mCngDYT7y)
• Auto-Version (_version 1)
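The same request can be assembled from a script. The sketch below only builds the request object and never sends it, so it runs without a cluster; `build_index_request` is a hypothetical helper, and the `Content-Type` header is an addition that the original curl call does not set.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, document, doc_id=None):
    """Build (but do not send) an HTTP request that indexes one document."""
    if doc_id is None:
        # POST without an ID: Elasticsearch generates the _id.
        url, method = f"{base_url}/{index}/{doc_type}", "POST"
    else:
        # PUT with an explicit ID: creates the document or replaces it.
        url, method = f"{base_url}/{index}/{doc_type}/{doc_id}", "PUT"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method=method,
        headers={"Content-Type": "application/json"},
    )


movie = {"genre": "Crime", "language": "English", "country": "USA",
         "runtime": 170, "title": "Scarface", "year": 1983}
req = build_index_request("http://127.0.0.1:9200", "imdb", "movie", movie)
print(req.get_method(), req.full_url)
# To send it against a running node: urllib.request.urlopen(req)
```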

Get a Document

curl -XGET 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y'

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 1,
  "found": true,
  "_source": {
    "genre": "Crime",
    "language": "English",
    "country": "USA",
    "runtime": 170,
    "title": "Scarface",
    "year": 1983
  }
}

Update a Document

curl -XPUT 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y' -d '
{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 180,
  "title": "Scarface",
  "year": 1983
}'

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 2,
  "created": false
}

More like an Upsert

Delete a Document

curl -XDELETE 'http://127.0.0.1:9200/imdb/movie/AUwGeWib1u4mCngDYT7y'

{
  "found": true,
  "_index": "imdb",
  "_type": "movie",
  "_id": "AUwGeWib1u4mCngDYT7y",
  "_version": 2
}

You can also…
• Partial document updating
• Specify Version
• Specify ID
• Multi-Get API
• Exists API
• Bulk API
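Of the options listed, the Bulk API has the least obvious request format: a newline-delimited body in which each action line is followed by the document source, and the body ends with a newline. A short sketch of building such a body for `index` actions; `bulk_body` is a hypothetical helper and the second document is invented sample data.

```python
import json


def bulk_body(index: str, doc_type: str, docs: dict) -> str:
    """Build a newline-delimited _bulk body with one index action per document.

    docs maps document ID -> document source.
    """
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body("imdb", "movie", {
    "1": {"title": "Scarface", "year": 1983},
    "2": {"title": "Example Movie", "year": 2000},  # invented sample document
})
print(body, end="")
```

The resulting string would be sent as the body of a POST to the `_bulk` endpoint.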

How Searching Works

• How it works
  - Search request hits a node
  - Node broadcasts to every shard in the index
  - Each shard performs query
  - Each shard returns metadata about results
  - Node merges results and scores them
  - Node requests documents from shards
  - Results merged, sorted, and returned to client
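The steps above can be simulated in a few lines. This is a toy model of the two phases, with in-memory dicts standing in for shards and a term count standing in for relevance scoring; the `search` function, the sample titles, and the scoring function are all illustrative assumptions.

```python
import heapq


def search(shards, score_fn, size=3):
    """Simulate a two-phase search across shards.

    shards:   list of dicts, one per shard, each mapping doc_id -> document
    score_fn: document -> numeric score (0 means no match)
    """
    # Query phase: every shard scores its own documents and returns only
    # lightweight (score, shard number, doc_id) entries for its local top hits.
    candidates = []
    for shard_no, docs in enumerate(shards):
        hits = [(score_fn(doc), shard_no, doc_id) for doc_id, doc in docs.items()]
        hits = [h for h in hits if h[0] > 0]
        candidates.extend(heapq.nlargest(size, hits))
    # Merge: the receiving node keeps the global top `size` entries.
    top = heapq.nlargest(size, candidates)
    # Fetch phase: only now are the full documents requested from their shards.
    return [(score, shards[shard_no][doc_id]) for score, shard_no, doc_id in top]


shards = [
    {"a": {"title": "Scarface"}, "b": {"title": "Star Wars"}},
    {"c": {"title": "Scarface Scarface"}, "d": {"title": "Casablanca"}},
]


def count_scarface(doc):
    """Toy score: how often the term 'scarface' occurs in the title."""
    return doc["title"].lower().split().count("scarface")


results = search(shards, count_scarface)
for score, doc in results:
    print(score, doc["title"])
```

Returning only IDs and scores in the first phase keeps the broadcast cheap; full documents travel over the network only for the final hits.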

REST API - Search

• Free Text Search
  - URL Request
• Complex Query

http://localhost:9200/imdb/movie/_search?q=scar
http://localhost:9200/imdb/movie/_search?q=scarface+OR+star
http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]
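When such URLs are built in code rather than typed by hand, the query text should be URL-encoded. A small sketch using only the standard library; `uri_search_url` is a hypothetical helper, and the `year:[1981 TO 1984]` range syntax assumes the colon that the transcript dropped.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def uri_search_url(base_url: str, index: str, doc_type: str, query: str) -> str:
    """Build a _search URL with the query text passed in the q parameter."""
    return f"{base_url}/{index}/{doc_type}/_search?" + urlencode({"q": query})


query_text = "(scarface OR star) AND year:[1981 TO 1984]"
url = uri_search_url("http://localhost:9200", "imdb", "movie", query_text)
print(url)
# Decoding the URL again yields the original query text.
print(parse_qs(urlsplit(url).query)["q"][0])
```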

REST API - Query DSL

curl -XPOST 'localhost:9200/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        { "query_string": { "query": "scarface or star" } },
        { "range": { "year": { "gte": 1931 } } }
      ]
    }
  }
}'

REST API - Query DSL
• Boolean Query

"bool": {
  "must": [
    { "match": { "color": "blue" } },
    { "match": { "title": "shirt" } }
  ],
  "must_not": [
    { "match": { "size": "xxl" } }
  ],
  "should": [
    { "match": { "textile": "cotton" } }
  ]
}
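The three clause types behave differently: every `must` clause has to match, no `must_not` clause may match, and `should` clauses are optional but raise the score. A simplified sketch over in-memory documents, where the score is just the number of matching `should` clauses rather than a real relevance score; `matches`, `bool_score`, and the sample documents are illustrative.

```python
def matches(doc: dict, clause: dict) -> bool:
    """Evaluate a simplified match clause of the form {"match": {field: value}}."""
    (field, value), = clause["match"].items()
    return value.lower() in str(doc.get(field, "")).lower().split()


def bool_score(doc: dict, query: dict):
    """Return None if the doc is excluded, else the number of should clauses matched."""
    if not all(matches(doc, c) for c in query.get("must", [])):
        return None
    if any(matches(doc, c) for c in query.get("must_not", [])):
        return None
    return sum(matches(doc, c) for c in query.get("should", []))


query = {
    "must": [{"match": {"color": "blue"}}, {"match": {"title": "shirt"}}],
    "must_not": [{"match": {"size": "xxl"}}],
    "should": [{"match": {"textile": "cotton"}}],
}
# Invented sample documents.
docs = [
    {"title": "Blue cotton shirt", "color": "blue", "size": "m", "textile": "cotton"},
    {"title": "Blue linen shirt", "color": "blue", "size": "l", "textile": "linen"},
    {"title": "Blue cotton shirt", "color": "blue", "size": "xxl", "textile": "cotton"},
]
for doc in docs:
    print(doc["size"], bool_score(doc, query))
```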

REST API - Query DSL
• Range Query
  - Numeric / Date Types
• Prefix/Wildcard Query
  - Match on partial terms
• RegExp Query
• Geo_bbox
  - Bounding box filter
• Geo_distance
  - Geo_distance_range

"range": {
  "founded_year": {
    "gte": 1990,
    "lt": 2000
  }
}

Filters

• Filters recommended over Queries
  - Better cache support

curl -XGET 'http://localhost:9200/my_index/events/_search?pretty=1' -d '
{
  "from": 0,
  "size": 0,
  "query": {
    "terms": {
      "message": [ "apples" ],
      "minimum_should_match": 1
    }
  },
  "post_filter": {
    "terms": {
      "userId": [ "25476c6788ce", "g20d5470d7b4" ],
      "execution": "or"
    }
  },
  "sort": { "eventDate": { "order": "desc" } },
  "explain": false
}'

Analyzers Tokenizers

curl -XPUT 'http://localhost:9200/my_index' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "str_search_analyzer": {
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        },
        "str_index_analyzer": {
          "tokenizer": "substring",
          "filter": ["lowercase", "stop"]
        }
      },
      "tokenizer": {
        "substring": {
          "type": "edgeNgram",
          "min_gram": 3,
          "max_gram": 42,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  }
}'

curl -XPUT 'http://localhost:9200/my_index/events/_mapping' -d '
{
  "events": {
    "properties": {
      "eventId": { "type": "string", "store": true, "index": "not_analyzed" },
      "userId": { "type": "string", "store": false, "index": "not_analyzed" },
      "message": {
        "type": "string", "store": false,
        "search_analyzer": "str_search_analyzer",
        "index_analyzer": "str_index_analyzer"
      }
    }
  }
}'
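The index-time analyzer above stores prefixes of each word, while the search-time analyzer keeps the query as a single lowercased token, so a typed prefix can match directly. A rough Python approximation of what the two analyzers emit; it ignores the `stop` filter, and `edge_ngram_tokens` and `search_token` are hypothetical names.

```python
import re


def edge_ngram_tokens(text: str, min_gram: int = 3, max_gram: int = 42) -> list:
    """Emit lowercase prefixes of each letter/digit run, min_gram to max_gram chars long."""
    tokens = []
    # token_chars [letter, digit]: any other character acts as a token boundary.
    for word in re.findall(r"[^\W_]+", text):
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:size].lower())
    return tokens


def search_token(text: str) -> str:
    """Keyword tokenizer plus lowercase filter: the whole input is one token."""
    return text.lower()


indexed = edge_ngram_tokens("Scarface 1983")
print(indexed)
print(search_token("SCAR") in indexed)
```

Words shorter than `min_gram` produce no tokens at all, which is worth keeping in mind when choosing that value.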

Tokenizers

• Whitespace
• NGram
• Edge NGram
• Letter
  - Splits on non-letters
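The difference between the whitespace and letter tokenizers shows up as soon as the text contains punctuation or digits. A minimal approximation with regular expressions, not the Lucene implementations; both function names and the sample string are made up.

```python
import re


def whitespace_tokenize(text: str) -> list:
    """Split on runs of whitespace only; punctuation stays inside tokens."""
    return text.split()


def letter_tokenize(text: str) -> list:
    """Split on every non-letter character, so digits and punctuation end a token."""
    return re.findall(r"[^\W\d_]+", text)


sample_text = "O'Neil's 2nd-hand"
print(whitespace_tokenize(sample_text))
print(letter_tokenize(sample_text))
```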

Clients

• Client list: http://www.elasticsearch.org/guide/clients
  - Java (Node) Client, JS, PHP, Perl, Python, Ruby
• Spring Data
  - Uses TransportClient
  - Implementation of ElasticsearchRepository aligns with generic Repository interfaces

Monitoring

• BigDesk
• Kopf
• Head
• ElasticHQ
• Marvel
• Sematext SPM

Questions






Page 40: Elasticsearch - DevNexus 2015

Talking to Elasticsearch

40

REST

bullHTTP Verbs GET POST PUT DELETE

bullJSON

bull_cat API

41

curl 19216856109200_cathealthvampts=0

cluster status nodeTotal nodeData shards pri relo init unassign

foo green 3 3 3 3 0 0 0

The API

bullDocument

bullCluster

-Node

bullIndex

bullSearch

42

Create a Document

43

curl -XPOST http1270019200imdbmovie -d

genre Crime

language English

country USA

runtime 170

title Scarface

year 1983

_indeximdb

_typemovie

_idAUwGeWib1u4mCngDYT7y

_version1

createdtrue

Of notehellip

44

curl -XPOST http1270019200imdbmovie -d

genre Crime

language English

country USA

runtime 170

title Scarface

year 1983

_indeximdb

_typemovie

_idAUwGeWib1u4mCngDYT7y

_version1

createdtrue

Auto-creates Index amp Type

Auto-Gen ID

Auto-Version

Get a Document

45

curl -XGET http1270019200imdbmovieAUwGeWib1u4mCngDYT7y

_indeximdb

_typemovie

_idAUwGeWib1u4mCngDYT7y

_version1

foundtrue

_source

genre Crime

language English

country USA

runtime 170

title Scarface

year 1983

Update a Document

46

curl -XPUT http1270019200imdbmovieAUwGeWib1u4mCngDYT7y -d

genre Crime

language English

country USA

runtime 180

title Scarface

year 1983

_indeximdb

_typemovie

_idAUwGeWib1u4mCngDYT7y

_version2

createdfalse

More like an Upsert

Delete a Document

47

curl -XDELETE http1270019200imdbmovieAUwGeWib1u4mCngDYT7y

ldquofoundtrue

ldquo_indeximdb

ldquo_typemovie

ldquo_idAUwGeWib1u4mCngDYT7y

ldquo_version2

You can alsohellip

bullPartial document updating

bullSpecify Version

bullSpecify ID

bullMulti-Get API

bullExists API

bullBulk API

How Searching Works

• How it works

  - Search request hits a node

  - Node broadcasts to every shard in the index

  - Each shard performs the query

  - Each shard returns metadata about its results

  - Node merges results and scores them

  - Node requests documents from shards

  - Results are merged, sorted, and returned to the client
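
The steps above amount to a scatter-gather in two phases: a query phase that collects lightweight hit metadata from every shard, then a fetch phase that pulls full documents only for the winners. The following is a simplified sketch with hypothetical in-memory shards and precomputed scores; it is not the actual Elasticsearch implementation.

```python
import heapq

# Hypothetical data: each shard maps doc ID -> (score, stored document).
SHARDS = [
    {"a1": (2.4, {"title": "Scarface"}), "a2": (0.7, {"title": "Star Trek"})},
    {"b1": (1.9, {"title": "Star Wars"})},
    {"c1": (0.3, {"title": "Stardust"}), "c2": (3.1, {"title": "Scarface (1932)"})},
]


def query_phase(shard_no, size):
    """Each shard returns only lightweight metadata: (score, shard, doc ID)."""
    hits = [(score, shard_no, doc_id) for doc_id, (score, _) in SHARDS[shard_no].items()]
    return heapq.nlargest(size, hits)


def search(size):
    # 1. Scatter: the coordinating node asks every shard for its local top hits.
    candidates = []
    for shard_no in range(len(SHARDS)):
        candidates.extend(query_phase(shard_no, size))
    # 2. Merge: keep the global top `size` hits by score.
    top = heapq.nlargest(size, candidates)
    # 3. Fetch: request full documents only for the winners.
    return [
        {"_id": doc_id, "_score": score, "_source": SHARDS[shard_no][doc_id][1]}
        for score, shard_no, doc_id in top
    ]


results = search(size=2)
print([hit["_id"] for hit in results])
```

Each shard only needs to return its own top `size` hits, because no document outside a shard's local top `size` can appear in the global top `size`.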

REST API - Search

• Free Text Search

  - URL Request

• Complex Query

    http://localhost:9200/imdb/movie/_search?q=scar

    http://localhost:9200/imdb/movie/_search?q=scarface+OR+star

    http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]
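
When these URLs are built in code rather than typed into a browser, the query text should be percent-encoded, since parentheses, colons, and brackets are not safe in every client. A small sketch using only the standard library (the helper name `search_url` is hypothetical; host and path are taken from the examples above):

```python
from urllib.parse import urlencode

BASE = "http://localhost:9200/imdb/movie/_search"  # host and path from the examples above


def search_url(query):
    """Return a URI-search URL with the query text percent-encoded."""
    return BASE + "?" + urlencode({"q": query})


print(search_url("scar"))
print(search_url("(scarface OR star) AND year:[1981 TO 1984]"))
```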

REST API - Query DSL

    curl -XPOST 'localhost:9200/_search?pretty' -d '{
      "query": {
        "bool": {
          "must": [
            { "query_string": { "query": "scarface or star" } },
            { "range": { "year": { "gte": 1931 } } }
          ]
        }
      }
    }'

REST API - Query DSL

• Boolean Query

    "bool": {
      "must": [
        { "match": { "color": "blue" } },
        { "match": { "title": "shirt" } }
      ],
      "must_not": [
        { "match": { "size": "xxl" } }
      ],
      "should": [
        { "match": { "textile": "cotton" } }
      ]
    }
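
The way the three clause types combine can be shown with a toy evaluator. This is a sketch of the boolean logic only: real `match` clauses analyze text and contribute relevance scores, whereas the hypothetical `term_match` below is plain case-insensitive equality and the "score" is just a count of matched clauses.

```python
def term_match(doc, field, value):
    """Very rough stand-in for a match clause: exact, case-insensitive equality."""
    return str(doc.get(field, "")).lower() == value.lower()


def bool_score(doc, must=(), must_not=(), should=()):
    """Return None when the document is excluded, else a toy score.

    Every `must` clause has to match, no `must_not` clause may match,
    and each matching `should` clause only raises the score.
    """
    if not all(term_match(doc, f, v) for f, v in must):
        return None
    if any(term_match(doc, f, v) for f, v in must_not):
        return None
    matched = len(must) + sum(term_match(doc, f, v) for f, v in should)
    return float(matched)


query = {
    "must": [("color", "blue"), ("title", "shirt")],
    "must_not": [("size", "xxl")],
    "should": [("textile", "cotton")],
}

# Hypothetical documents for illustration.
cotton = {"color": "blue", "title": "shirt", "size": "m", "textile": "cotton"}
linen = {"color": "blue", "title": "shirt", "size": "m", "textile": "linen"}
too_big = {"color": "blue", "title": "shirt", "size": "xxl", "textile": "cotton"}

print(bool_score(cotton, **query), bool_score(linen, **query), bool_score(too_big, **query))
```

The linen shirt still matches because `should` is optional when `must` clauses are present; it simply ranks below the cotton one.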

REST API - Query DSL

• Range Query

  - Numeric, Date Types

• Prefix/Wildcard Query

  - Match on partial terms

• RegExp Query

• Geo_bbox

  - Bounding box filter

• Geo_distance

  - Geo_distance_range

    "range": {
      "founded_year": {
        "gte": 1990,
        "lt": 2000
      }
    }
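
The `gte`/`lt` pair in the range example describes a half-open interval: 1990 is included, 2000 is not. A minimal sketch of that bound logic (the function `in_range` and the company data are hypothetical):

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Check a value against the bounds a range query accepts."""
    if gte is not None and not value >= gte:
        return False
    if gt is not None and not value > gt:
        return False
    if lte is not None and not value <= lte:
        return False
    if lt is not None and not value < lt:
        return False
    return True


# Hypothetical companies keyed by founding year.
companies = {"Acme": 1989, "Globex": 1990, "Initech": 1999, "Hooli": 2000}
nineties = [name for name, year in companies.items() if in_range(year, gte=1990, lt=2000)]
print(nineties)
```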

Filters

• Filters recommended over Queries

  - Better cache support

    curl -XGET 'http://localhost:9200/my_index/events/_search?pretty=1' -d '{
      "from": 0,
      "size": 0,
      "query": {
        "terms": {
          "message": ["apples"],
          "minimum_should_match": 1
        }
      },
      "post_filter": {
        "terms": {
          "userId": ["25476c6788ce", "g20d5470d7b4"],
          "execution": "or"
        }
      },
      "sort": [ { "eventDate": { "order": "desc" } } ],
      "explain": false
    }'

Analyzers / Tokenizers

    curl -XPUT 'http://localhost:9200/my_index' -d '{
      "settings": {
        "analysis": {
          "analyzer": {
            "str_search_analyzer": {
              "tokenizer": "keyword",
              "filter": ["lowercase"]
            },
            "str_index_analyzer": {
              "tokenizer": "substring",
              "filter": ["lowercase", "stop"]
            }
          },
          "tokenizer": {
            "substring": {
              "type": "edgeNgram",
              "min_gram": 3,
              "max_gram": 42,
              "token_chars": ["letter", "digit"]
            }
          }
        }
      }
    }'

    curl -XPUT 'http://localhost:9200/my_index/events/_mapping' -d '{
      "events": {
        "properties": {
          "eventId": { "type": "string", "store": true, "index": "not_analyzed" },
          "userId": { "type": "string", "store": false, "index": "not_analyzed" },
          "message": {
            "type": "string",
            "store": false,
            "search_analyzer": "str_search_analyzer",
            "index_analyzer": "str_index_analyzer"
          }
        }
      }
    }'

Tokenizers

• Whitespace

• NGram

• Edge NGram

• Letter

  - Splits on non-letters
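
To make the Edge NGram tokenizer concrete, here is a rough Python analogue of the `str_index_analyzer` and `str_search_analyzer` pair configured earlier (min_gram 3, max_gram 42, letters and digits only). It is a sketch of the idea, not the Lucene implementation, and it omits the `stop` filter; the function names are hypothetical.

```python
import re


def edge_ngrams(text, min_gram=3, max_gram=42):
    """Emit prefixes of each letter/digit run, from min_gram to max_gram characters."""
    tokens = []
    # token_chars [letter, digit]: any other character acts as a separator.
    for word in re.findall(r"[^\W_]+", text):
        for size in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:size])
    return tokens


def index_analyzer(text):
    """Rough analogue of str_index_analyzer: edge n-grams, then lowercase."""
    return [token.lower() for token in edge_ngrams(text)]


def search_analyzer(text):
    """Rough analogue of str_search_analyzer: keyword tokenizer, then lowercase."""
    return [text.lower()]


indexed = index_analyzer("Scarface")
print(indexed)
print(search_analyzer("Scar")[0] in indexed)
```

Because prefixes are generated at index time while the search text is kept whole, a partial term such as "Scar" can match "Scarface" without a wildcard query.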

Clients

• Client list: http://www.elasticsearch.org/guide/clients

  - Java (Node) Client, JS, PHP, Perl, Python, Ruby

• Spring Data

  - Uses TransportClient

  - Implementation of ElasticsearchRepository aligns with generic Repository interfaces

Monitoring

• BigDesk

• Kopf

• Head

• ElasticHQ

• Marvel

• Sematext SPM

Questions

Page 49: Elasticsearch - DevNexus 2015

How Searching Works

bullHow it works

-Search request hits a node

-Node broadcasts to every shard in the index

-Each shard performs query

-Each shard returns metadata about results

-Node merges results and scores them

-Node requests documents from shards

-Results merged sorted and returned to client

49

REST API - Search

bullFree Text Search

-URL Request

bullComplex Query

50

httplocalhost9200imdbmovie_searchq=scar

httplocalhost9200imdbmovie_searchq=scarface+OR+star

httplocalhost9200imdbmovie_searchq=(scarface+OR+star)+AND+year[1981+TO+1984]

REST API ndash Query DSL

curl -XPOST localhost9200_searchpretty -d

query

bool

must [

query_string

query scarface or star

range

year gte 1931

]

51

REST API ndash Query DSL

bullBoolean Querybool

must[

match

colorblue

match

titleshirt

]

must_not[

match

sizexxl

]

should[

match

textilecotton

52

REST API ndash Query DSL

bullRange Query

-Numeric Date Types

bullPrefixWildcard Query

-Match on partial terms

bullRegExp Query

bullGeo_bbox

-Bounding box filter

bullGeo_distance

-Geo_distance_range

range

founded_year

gte1990

lt2000

53

Filters

bullFilters recommended over Queries

-Better cache support

54

curl -XGET httplocalhost9200my_indexevents_searchpretty=1 -d

from 0

size 0

query

terms

message [ apples]

minimum_should_match 1

post_filter

terms

userId [ 25476c6788ce g20d5470d7b4 ]

execution or

sort eventDate order desc

explain false

Analyzers Tokenizers

55

curl -XPUT lsquohttplocalhost9200my_index -d

settings

analysis

analyzer

str_search_analyzer

tokenizer keyword

filter [lowercase]

str_index_analyzer

tokenizer substring

filter [lowercase stop]

tokenizer

substring

type edgeNgram

min_gram 3

max_gram 42

token_chars [letter digit]

curl -XPUT httplocalhost9200my_indexevents_mapping -d

events

properties

eventId type string store true index not_analyzed

userId type string store false index not_analyzed

message

type string store false

search_analyzer str_search_analyzer

index_analyzer str_index_analyzer

Tokenizers

bullWhitespace

bullNGram

bullEdge NGram

bullLetter

- non-letters

56

Clients

• Client list
- http://www.elasticsearch.org/guide/clients
- Java (Node) Client, JS, PHP, Perl, Python, Ruby
• Spring Data
- Uses TransportClient
- Implementation of ElasticsearchRepository aligns with generic Repository interfaces

Monitoring

• BigDesk
• Kopf
• Head
• ElasticHQ
• Marvel
• Sematext SPM

Questions
