Finding the right stuff, an intro to Elasticsearch with Ruby/Rails

Post on 14-Feb-2017

Elasticsearch?

Clustered and sharded document storage with powerful language-analysis features and a query language, all wrapped by a REST API.

Getting Started

• install elasticsearch

• requires a JDK

• start it

Getting Started

• https://github.com/elastic/elasticsearch-rails

• gems for Rails:

• elasticsearch-model & elasticsearch-rails

• without Rails / AR:

• elasticsearch-persistence

class Event < ActiveRecord::Base
  include Elasticsearch::Model

  def as_indexed_json(options={})
    {
      title: title,
      description: description,
      starts_at: starts_at.iso8601
    }
  end
end
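To see what as_indexed_json hands to Elasticsearch without a Rails app, here is a minimal sketch that uses a plain Struct as a stand-in for the ActiveRecord model. SketchEvent is an assumption for illustration, and its sample values are borrowed from the PUT example in the slides; only the shape of the returned hash follows the slide.

```ruby
require 'json'
require 'time'

# Hypothetical stand-in for the ActiveRecord model; not part of the talk.
SketchEvent = Struct.new(:title, :description, :starts_at) do
  # Same shape as on the slide: only these fields end up in the document.
  def as_indexed_json(options = {})
    {
      title: title,
      description: description,
      starts_at: starts_at.iso8601
    }
  end
end

event = SketchEvent.new(
  'Finding the right stuff, ...',
  'Searching in data sets with ...',
  Time.new(2015, 10, 8, 19, 0, 0, '+09:00')
)

document = event.as_indexed_json
puts JSON.pretty_generate(document)
```

Anything not returned from as_indexed_json is simply not sent to the index, so this method is the place to decide what should be searchable.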

Event.import

PUT /events/event/31710
{
  "title": "Finding the right stuff, ...",
  "description": "Searching in data sets with ...",
  "starts_at": "2015-10-08T19:00:00+09:00"
}

In the request path, events is the index, event is the type and 31710 is the ID.
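The mapping from index, type and ID to the request path can be sketched with Ruby's standard library alone. The helper name build_index_request is made up for this example, and the request is only built, not sent; the gems do the equivalent work through their own client.

```ruby
require 'json'
require 'net/http'

# Builds (but does not send) the request for indexing one document.
# index, type and id map onto the three path segments.
def build_index_request(index:, type:, id:, document:)
  request = Net::HTTP::Put.new("/#{index}/#{type}/#{id}")
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(document)
  request
end

request = build_index_request(
  index: 'events',
  type: 'event',
  id: 31710,
  document: { title: 'Finding the right stuff, ...' }
)

puts "#{request.method} #{request.path}"
```

Sending it would be a matter of passing the request to Net::HTTP.start for the host and port of a running node (assumed here to be localhost and the default port 9200).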

response = Event.search 'tokyo rubyist'
response.took                         # => 28
response.results.total                # => 2075
response.results.first._score         # => 0.921177
response.results.first._source.title  # => "Drop in Ruby"

GET /events/event/_search?q=tokyo%20rubyist
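The response object is a thin wrapper around the JSON that the _search endpoint returns. As a rough sketch, the accessors above correspond to the keys below. The body is a hand-written, trimmed sample that reuses the numbers from the slide; a real response contains more fields, and in Elasticsearch 7 and later hits.total is an object rather than a plain number.

```ruby
require 'json'

# Trimmed, hand-written sample of a 1.x style search response.
raw_body = <<~BODY
  {
    "took": 28,
    "hits": {
      "total": 2075,
      "hits": [
        { "_score": 0.921177, "_source": { "title": "Drop in Ruby" } }
      ]
    }
  }
BODY

body = JSON.parse(raw_body)
first_hit = body['hits']['hits'].first

took  = body['took']                   # response.took
total = body['hits']['total']          # response.results.total
score = first_hit['_score']            # response.results.first._score
title = first_hit['_source']['title']  # response.results.first._source.title

puts "#{total} hits in #{took} ms, best match: #{title} (#{score})"
```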

response = Event.search 'tokyo rubyist'
response.records.to_a
# => [#<Event id: 12409, ...>, ...]

response.page(2).results
response.page(2).records

Pagination supports kaminari / will_paginate.
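Under the hood, pagination in Elasticsearch comes down to the from and size parameters of the search request. A minimal sketch of that arithmetic, assuming 1-based page numbers and a page size of 25 (the helper name and the default are illustrative, not the gems' API):

```ruby
# Hypothetical helper: translates a page number into from/size.
def pagination_params(page, per_page: 25)
  page = [page.to_i, 1].max  # treat anything below 1 as the first page
  { from: (page - 1) * per_page, size: per_page }
end

p pagination_params(1)  # first page starts at offset 0
p pagination_params(2)  # second page skips one page worth of hits
```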

response = Event.search 'tokyo rubyist'
response.records.each_with_hit do |rec, hit|
  puts "* #{rec.title}: #{hit._score}"
end
# * Drop in Ruby: 0.9205564
# * Javascript meets Ruby in Kamakura: 0.8947
# * Meetup at EC Navi: 0.8766844
# * Pair Programming Session #3: 0.8603562
# * Kickoff Party: 0.8265461
# * Tales of a Ruby Committer: 0.74487066
# * One Year Anniversary Party: 0.7298212

Event.search 'tokyo rubyist'

only upcoming events?

sorted by start date?

Event.search query: {
  filtered: {
    # basically same as before
    query: {
      simple_query_string: {
        query: "tokyo rubyist",
        default_operator: "and"
      }
    },
    # filtered by conditions
    filter: {
      and: [
        { range: { starts_at: { gte: Time.now } } },
        { term: { featured: true } }
      ]
    }
  }
},
# sorted by start time
sort: { starts_at: { order: "asc" } }
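The filtered query and the and filter shown here belong to Elasticsearch 1.x; both were deprecated in 2.0 and removed in 5.0. On newer versions the same intent is usually expressed with a bool query whose filter clause holds the conditions. The following is a sketch of that rewrite, not something from the talk; the method name is made up, and the resulting hash would be passed to Event.search in the same way.

```ruby
require 'time'

# Hypothetical helper returning the search body as a plain Hash.
def upcoming_featured_search(text, now: Time.now)
  {
    query: {
      bool: {
        # scored part: same simple_query_string as before
        must: {
          simple_query_string: { query: text, default_operator: 'and' }
        },
        # non-scoring conditions: every entry has to match
        filter: [
          { range: { starts_at: { gte: now.iso8601 } } },
          { term: { featured: true } }
        ]
      }
    },
    sort: { starts_at: { order: 'asc' } }
  }
end

body = upcoming_featured_search('tokyo rubyist')
puts body[:query][:bool][:filter].length
```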

Query DSL

• query: { <query_type>: <arguments> }

• valid arguments depend on query type

• "Filtered Query" takes a query and a filter

• "Simple Query String Query" does not allow nested queries

Query DSL

• filter: { <filter_type>: <arguments> }

• valid arguments depend on filter type

• "And filter" takes an array of filters

• "Range filter" takes a property and lt(e), gt(e)

• "Term filter" takes a property and a value
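Because every filter is just a hash of the form { <filter_type>: <arguments> }, filters compose well with small helper methods. A sketch with hypothetical helper names (none of these come from the elasticsearch gems) that rebuilds the filter part of the query shown earlier, using a fixed timestamp instead of Time.now:

```ruby
# Each helper returns { <filter_type> => <arguments> } for one filter type.
def range_filter(field, bounds)
  { range: { field => bounds } }  # bounds may use lt, lte, gt, gte
end

def term_filter(field, value)
  { term: { field => value } }
end

def and_filter(*filters)
  { and: filters }                # 1.x "and" filter: an array of filters
end

filter = and_filter(
  range_filter(:starts_at, gte: '2015-10-08T19:00:00+09:00'),
  term_filter(:featured, true)
)

p filter
```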

Match Query, Multi Match Query, Bool Query, Boosting Query, Common Terms Query, Constant Score Query, Dis Max Query, Filtered Query, Fuzzy Like This Query, Fuzzy Like This Field Query, Function Score Query, Fuzzy Query, GeoShape Query, Has Child Query, Has Parent Query, Ids Query, Indices Query, Match All Query, More Like This Query, Nested Query, Prefix Query, Query String Query, Simple Query String Query, Range Query, Regexp Query, Span First Query, Span Multi Term Query, Span Near Query, Span Not Query, Span Or Query, Span Term Query, Term Query, Terms Query, Top Children Query, Wildcard Query, Minimum Should Match, Multi Term Query Rewrite, Template Query

And Filter, Bool Filter, Exists Filter, Geo Bounding Box Filter, Geo Distance Filter, Geo Distance Range Filter, Geo Polygon Filter, GeoShape Filter, Geohash Cell Filter, Has Child Filter, Has Parent Filter, Ids Filter, Indices Filter, Limit Filter, Match All Filter, Missing Filter, Nested Filter, Not Filter, Or Filter, Prefix Filter, Query Filter, Range Filter, Regexp Filter, Script Filter, Term Filter, Terms Filter, Type Filter

class Event < ActiveRecord::Base
  include Elasticsearch::Model

  def as_indexed_json(options={})
    {
      title: title,
      description: description,
      starts_at: starts_at.iso8601,
      featured: group.featured?
    }
  end

  settings do
    mapping dynamic: 'false' do
      indexes :title, type: 'string'
      indexes :description, type: 'string'
      indexes :starts_at, type: 'date'
      indexes :featured, type: 'boolean'
    end
  end
end

Event.import force: true

Deletes the existing index, creates a new index with the settings, then imports the documents.

Event.search query: {
  bool: {
    should: [
      {
        simple_query_string: {
          query: "tokyo rubyist",
          default_operator: "and"
        }
      },
      {
        function_score: {
          filter: {
            and: [
              { range: { starts_at: { lte: 'now' } } },
              { term: { featured: true } }
            ]
          },
          gauss: {
            starts_at: { origin: 'now', scale: '10d', decay: 0.5 }
          },
          boost_mode: "sum"
        }
      }
    ],
    minimum_should_match: 2
  }
}
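To get a feel for the gauss parameters: with scale: '10d' and decay: 0.5, an event whose start time is exactly 10 days away from the origin gets a decay score of 0.5, and the score falls off along a bell curve from there. A small sketch of the documented formula, assuming no offset and measuring the distance in days (the method name is made up):

```ruby
# Gaussian decay as documented for function_score:
#   score   = exp(-distance^2 / (2 * sigma^2))
#   sigma^2 = -scale^2 / (2 * ln(decay))
def gauss_decay(distance, scale: 10.0, decay: 0.5)
  sigma_squared = -(scale**2) / (2.0 * Math.log(decay))
  Math.exp(-(distance**2) / (2.0 * sigma_squared))
end

[0, 5, 10, 20].each do |days|
  puts format('%2d days from origin: %.3f', days, gauss_decay(days))
end
```

A smaller scale makes the curve steeper, so events further from the origin lose their boost more quickly.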

Event.search '東京rubyist'

Dealing with different languages

Built-in analysers for arabic, armenian, basque, brazilian, bulgarian, catalan, cjk, czech, danish, dutch, english, finnish, french, galician, german, greek, hindi, hungarian, indonesian, irish, italian, latvian, norwegian, persian, portuguese, romanian, russian, sorani, spanish, swedish, turkish and thai.

Japanese?

• install kuromoji plugin

• https://github.com/elastic/elasticsearch-analysis-kuromoji

• plugin install elasticsearch/elasticsearch-analysis-kuromoji/2.7.0

class Event < ActiveRecord::Base
  include Elasticsearch::Model

  def as_indexed_json(options={})
    {
      title: { en: title_en, ja: title_ja },
      description: { en: description_en, ja: description_ja },
      starts_at: starts_at.iso8601,
      featured: group.featured?
    }
  end

  settings do
    mapping dynamic: 'false' do
      indexes :title do
        indexes :en, type: 'string', analyzer: 'english'
        indexes :ja, type: 'string', analyzer: 'kuromoji'
      end
      indexes :description do
        indexes :en, type: 'string', analyzer: 'english'
        indexes :ja, type: 'string', analyzer: 'kuromoji'
      end
      indexes :starts_at, type: 'date'
      indexes :featured, type: 'boolean'
    end
  end
end
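With the text split into title.en / title.ja and description.en / description.ja, a search has to name the sub-fields it should match against. One way to do that is the fields option of simple_query_string. This sketch assumes the field names from the mapping above; the helper name is made up.

```ruby
# Hypothetical helper: searches the English and Japanese variants together.
def multilingual_query(text, attributes: %w[title description], languages: %w[en ja])
  # Build "title.en", "title.ja", "description.en", ...
  fields = attributes.product(languages).map { |attr, lang| "#{attr}.#{lang}" }
  {
    query: {
      simple_query_string: {
        query: text,
        fields: fields,
        default_operator: 'and'
      }
    }
  }
end

body = multilingual_query('東京rubyist')
puts body[:query][:simple_query_string][:fields].inspect
```

Each sub-field is analysed with its own analyser, so the Japanese text is tokenised by kuromoji while the English text goes through the english analyser.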

Event.search 'tokyo rubyist'

with data from other models?

class Event < ActiveRecord::Base
  include Elasticsearch::Model

  def as_indexed_json(options={})
    {
      title: { en: title_en, ja: title_ja },
      description: { en: description_en, ja: description_ja },
      group_name: { en: group.name_en, ja: group.name_ja },
      starts_at: starts_at.iso8601,
      featured: group.featured?
    }
  end

  settings do
    mapping dynamic: 'false' do
      indexes :title do
        indexes :en, type: 'string', analyzer: 'english'
        indexes :ja, type: 'string', analyzer: 'kuromoji'
      end
      indexes :description do
        indexes :en, type: 'string', analyzer: 'english'
        indexes :ja, type: 'string', analyzer: 'kuromoji'
      end
      indexes :group_name do
        indexes :en, type: 'string', analyzer: 'english'
        indexes :ja, type: 'string', analyzer: 'kuromoji'
      end
      indexes :starts_at, type: 'date'
      indexes :featured, type: 'boolean'
    end
  end
end

Automated Tests

class Event < ActiveRecord::Base
  include Elasticsearch::Model

  index_name "drkpr_#{Rails.env}_events"
end

Index names with environment

Test Helpers

• https://gist.github.com/mreinsch/094dc9cf63362314cef4

• Helpers: wait_for_elasticsearch, wait_for_elasticsearch_removal, clear_elasticsearch!

• specs: Tag tests which require elasticsearch
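Elasticsearch makes newly indexed documents searchable only after a refresh (by default about once per second), so specs usually have to wait until a document shows up, or disappears, before asserting on search results. The helpers in the gist are not reproduced here; the following is a generic polling sketch with made-up names that shows the idea.

```ruby
# Hypothetical helper: polls the block until it returns a truthy value
# or the timeout expires. Returns true on success, false on timeout.
def wait_until(timeout: 2.0, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Stand-in for "is the document searchable yet?": succeeds on the third check.
attempts = 0
found = wait_until { (attempts += 1) >= 3 }
puts "found=#{found} after #{attempts} checks"
```

In a spec, the block would run a real search and check whether the expected record is among the results.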

Production Ready?

• use elastic.co/found or AWS ES

• use two instances for redundancy

• elasticsearch could go away

• usually only impacts search

• keep impact at a minimum
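One way to keep the impact small on the read side is to treat search as optional: if the cluster cannot be reached, fall back to an empty result instead of failing the whole request. A sketch with a made-up helper; which exception classes to rescue depends on the client and transport in use, so the list here is an assumption.

```ruby
require 'timeout'

# Hypothetical wrapper: runs the search block and returns `fallback`
# when the search backend is unreachable or too slow.
def search_or_fallback(fallback: [], rescue_from: [Errno::ECONNREFUSED, Timeout::Error])
  yield
rescue *rescue_from => e
  warn "search unavailable: #{e.class}"
  fallback
end

# Simulates a node that is down.
results = search_or_fallback { raise Errno::ECONNREFUSED }
puts results.inspect
```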

class Event < ActiveRecord::Base
  include Elasticsearch::Model

  after_save do
    IndexerJob.perform_later('update', self.class.name, self.id)
  end

  after_destroy do
    IndexerJob.perform_later('delete', self.class.name, self.id)
  end

  ...
end

class IndexerJob < ActiveJob::Base
  queue_as :default

  def perform(action, record_type, record_id)
    record_class = record_type.constantize
    record_data = {
      index: record_class.index_name,
      type: record_class.document_type,
      id: record_id
    }
    client = record_class.__elasticsearch__.client

    case action.to_s
    when 'update'
      record = record_class.find(record_id)
      client.index record_data.merge(body: record.as_indexed_json)
    when 'delete'
      client.delete record_data.merge(ignore: 404)
    end
  end
end

https://gist.github.com/mreinsch/acb2f6c58891e5cd4f13

Resources

• Elastic Docs: https://www.elastic.co/guide/index.html

• Ruby Gem Docs: https://github.com/elastic/elasticsearch-rails

Questions?

or ask me later: michael@doorkeeper.jp @mreinsch