Finding the right stuff, an intro to Elasticsearch with Ruby/Rails
by michael-reinsch
Elasticsearch?
clustered and sharded document storage with powerful
language analysing features and a query language,
all wrapped by a REST API
Getting Started
• install Elasticsearch
• requires Java (a JDK)
• start it
Getting Started
• https://github.com/elastic/elasticsearch-rails
• gems for Rails:
• elasticsearch-model & elasticsearch-rails
• without Rails / AR:
• elasticsearch-persistence
class Event < ActiveRecord::Base
  include Elasticsearch::Model

  def as_indexed_json(options = {})
    {
      title: title,
      description: description,
      starts_at: starts_at.iso8601
    }
  end
end
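as_indexed_json decides exactly which attributes become the Elasticsearch document. To see that shape without Rails or a database, here is a minimal sketch that uses a plain Struct (the hypothetical EventDoc) in place of the ActiveRecord model:

```ruby
require 'time'
require 'json'

# Hypothetical stand-in for the Event model: a plain Struct instead of
# ActiveRecord, so the serialization runs without Rails.
EventDoc = Struct.new(:title, :description, :starts_at) do
  # Mirrors the slide's as_indexed_json: only these three attributes
  # end up in the indexed document.
  def as_indexed_json(_options = {})
    {
      title: title,
      description: description,
      starts_at: starts_at.iso8601 # e.g. "2015-10-08T19:00:00+09:00"
    }
  end
end

event = EventDoc.new('Finding the right stuff',
                     'Searching in data sets',
                     Time.new(2015, 10, 8, 19, 0, 0, '+09:00'))
puts JSON.generate(event.as_indexed_json)
```

Sending dates as ISO 8601 strings lets Elasticsearch detect or map them as dates and keeps the time zone offset intact.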
Event.import

PUT /events/event/31710
{
  "title": "Finding the right stuff, ...",
  "description": "Searching in data sets with ...",
  "starts_at": "2015-10-08T19:00:00+09:00"
}

index: events, type: event, ID: 31710
(Event.import sends the documents in batches through the bulk API; each record is stored as if by one such PUT.)
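The PUT above follows the pattern /<index>/<type>/<id>. As a minimal sketch of what that REST call looks like from plain Ruby, the following builds (but does not send) the request with the standard library's Net::HTTP. build_index_request is a hypothetical helper, and localhost:9200 is an assumed address for a local node.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Hypothetical helper: builds the PUT /<index>/<type>/<id> request from the
# slide without sending it. localhost:9200 is an assumed local node address.
def build_index_request(index:, type:, id:, document:, host: 'http://localhost:9200')
  uri = URI("#{host}/#{index}/#{type}/#{id}")
  request = Net::HTTP::Put.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(document)
  request
end

req = build_index_request(
  index: 'events', type: 'event', id: 31710,
  document: { title: 'Finding the right stuff, ...', starts_at: '2015-10-08T19:00:00+09:00' }
)
puts "#{req.method} #{req.path}" # prints "PUT /events/event/31710"

# Sending it would need a running node, e.g.:
#   Net::HTTP.start('localhost', 9200) { |http| http.request(req) }
```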
response = Event.search 'tokyo rubyist'
response.took                         # => 28
response.results.total                # => 2075
response.results.first._score         # => 0.921177
response.results.first._source.title  # => "Drop in Ruby"

GET /events/event/_search?q=tokyo%20rubyist
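The gem's response object wraps the raw JSON that the _search endpoint returns. A minimal sketch of that mapping, using a trimmed response body built from the numbers on the slide (a real response carries more fields; in Elasticsearch 7 and later, hits.total also becomes an object with a value field):

```ruby
require 'erb'
require 'json'

# URL-search equivalent of Event.search 'tokyo rubyist'; url_encode turns
# the space into %20.
query = 'tokyo rubyist'
path = "/events/event/_search?q=#{ERB::Util.url_encode(query)}"

# Trimmed response body using the numbers from the slide; a real response
# also has timed_out, _shards, max_score, _index, _id and so on.
body = <<~JSON
  {
    "took": 28,
    "hits": {
      "total": 2075,
      "hits": [
        { "_score": 0.921177, "_source": { "title": "Drop in Ruby" } }
      ]
    }
  }
JSON

raw   = JSON.parse(body)
took  = raw['took']                  # response.took
total = raw['hits']['total']         # response.results.total
first = raw['hits']['hits'].first    # response.results.first
puts path
puts "#{took} #{total} #{first['_score']} #{first['_source']['title']}"
```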
response = Event.search 'tokyo rubyist'
response.records.to_a # => [#<Event id: 12409, ...>, ...]

response.page(2).results
response.page(2).records

supports kaminari / will_paginate
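A page of results is a window into the ranked hit list: Elasticsearch takes a from (offset) and a size (count) parameter. A minimal sketch of that arithmetic, assuming Kaminari's default of 25 results per page; page_to_from_size is a hypothetical name, not part of the gem.

```ruby
# Assumed mapping from a page number to Elasticsearch's from/size
# parameters; 25 is Kaminari's default page size.
def page_to_from_size(page, per_page: 25)
  page = [page.to_i, 1].max # treat 0, nil or negative pages as page 1
  { from: (page - 1) * per_page, size: per_page }
end

p page_to_from_size(2)
p page_to_from_size(3, per_page: 10)
```

Deep pages get progressively more expensive, since every shard has to collect from + size hits before they are merged.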
response = Event.search 'tokyo rubyist'
response.records.each_with_hit do |rec, hit|
  puts "* #{rec.title}: #{hit._score}"
end
# * Drop in Ruby: 0.9205564
# * Javascript meets Ruby in Kamakura: 0.8947
# * Meetup at EC Navi: 0.8766844
# * Pair Programming Session #3: 0.8603562
# * Kickoff Party: 0.8265461
# * Tales of a Ruby Committer: 0.74487066
# * One Year Anniversary Party: 0.7298212
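Records loaded from the database by ID can come back in any order, while the hits are ranked by score. The following is a sketch (not the gem's actual code) of how records can be paired with their hits in ranking order; Hit, Rec, the IDs and the pairing helper are hypothetical, and the titles and scores come from the slide.

```ruby
# Hypothetical minimal shapes for a search hit and a database record.
Hit = Struct.new(:_id, :_score)
Rec = Struct.new(:id, :title)

# Yields [record, hit] pairs in the order Elasticsearch ranked the hits.
def each_with_hit(records, hits)
  by_id = records.each_with_object({}) { |r, h| h[r.id.to_s] = r }
  hits.each do |hit|
    rec = by_id[hit._id.to_s]
    yield rec, hit if rec # skip hits whose record no longer exists
  end
end

records = [Rec.new(2, 'Javascript meets Ruby in Kamakura'), Rec.new(1, 'Drop in Ruby')]
hits    = [Hit.new('1', 0.9205564), Hit.new('2', 0.8947)]

lines = []
each_with_hit(records, hits) { |rec, hit| lines << "* #{rec.title}: #{hit._score}" }
puts lines
```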
Event.search 'tokyo rubyist'
only upcoming events?
sorted by start date?
Event.search query: {
  filtered: {
    # basically same as before
    query: {
      simple_query_string: {
        query: "tokyo rubyist",
        default_operator: "and"
      }
    },
    # filtered by conditions
    filter: {
      and: [
        { range: { starts_at: { gte: Time.now } } },
        { term: { featured: true } }
      ]
    }
  }
},
# sorted by start time
sort: { starts_at: { order: "asc" } }
Query DSL
• query: { <query_type>: <arguments> }
• valid arguments depend on query type
• "Filtered Query" takes a query and a filter
• "Simple Query String Query" does not allow nested queries
Query DSL
• filter: { <filter_type>: <arguments> }
• valid arguments depend on filter type
• "And filter" takes an array of filters
• "Range filter" takes a property and lt(e), gt(e)
• "Term filter" takes a property and a value
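The filtered query and the and filter shown above belong to the Elasticsearch 1.x DSL; filtered was deprecated in 2.0 and removed in 5.0. As a sketch under the assumption that you target a newer version, the same search can be written as a bool query, where must scores documents and the filter clauses only include or exclude them. upcoming_featured_query is a hypothetical helper name.

```ruby
require 'json'

# Sketch: the upcoming-featured-events search as a bool query
# (Elasticsearch 2.x and later). 'now' is date math evaluated by
# Elasticsearch, so the result does not depend on how Time serializes.
def upcoming_featured_query(text)
  {
    query: {
      bool: {
        must: {
          simple_query_string: { query: text, default_operator: 'and' }
        },
        filter: [
          { range: { starts_at: { gte: 'now' } } },
          { term: { featured: true } }
        ]
      }
    },
    sort: { starts_at: { order: 'asc' } }
  }
end

puts JSON.pretty_generate(upcoming_featured_query('tokyo rubyist'))
# With the gem: Event.search(upcoming_featured_query('tokyo rubyist'))
```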
Query types: Match Query, Multi Match Query, Bool Query, Boosting Query, Common Terms Query, Constant Score Query, Dis Max Query, Filtered Query, Fuzzy Like This Query, Fuzzy Like This Field Query, Function Score Query, Fuzzy Query, GeoShape Query, Has Child Query, Has Parent Query, Ids Query, Indices Query, Match All Query, More Like This Query, Nested Query, Prefix Query, Query String Query, Simple Query String Query, Range Query, Regexp Query, Span First Query, Span Multi Term Query, Span Near Query, Span Not Query, Span Or Query, Span Term Query, Term Query, Terms Query, Top Children Query, Wildcard Query, Minimum Should Match, Multi Term Query Rewrite, Template Query

Filter types: And Filter, Bool Filter, Exists Filter, Geo Bounding Box Filter, Geo Distance Filter, Geo Distance Range Filter, Geo Polygon Filter, GeoShape Filter, Geohash Cell Filter, Has Child Filter, Has Parent Filter, Ids Filter, Indices Filter, Limit Filter, Match All Filter, Missing Filter, Nested Filter, Not Filter, Or Filter, Prefix Filter, Query Filter, Range Filter, Regexp Filter, Script Filter, Term Filter, Terms Filter, Type Filter
class Event < ActiveRecord::Base
  include Elasticsearch::Model

  def as_indexed_json(options = {})
    {
      title: title,
      description: description,
      starts_at: starts_at.iso8601,
      featured: group.featured?
    }
  end

  settings do
    mapping dynamic: 'false' do
      indexes :title, type: 'string'
      indexes :description, type: 'string'
      indexes :starts_at, type: 'date'
      indexes :featured, type: 'boolean'
    end
  end
end
Event.import force: true
deletes the existing index, creates a new index with the settings and mappings above,
then imports all documents
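The settings/mapping block is turned into the body used when the index is created. A rough sketch of that body in the Elasticsearch 1.x/2.x style (one mapping per type) follows; index_body and FIELD_TYPES are hypothetical names, and the gem's own version should be visible via Event.mappings.to_hash. With dynamic: 'false', fields not listed under properties are kept in _source but not indexed, so they cannot be searched.

```ruby
require 'json'

# Field types from the mapping block on the slide.
FIELD_TYPES = {
  title: 'string',
  description: 'string',
  starts_at: 'date',
  featured: 'boolean'
}.freeze

# Hypothetical helper: builds the mappings part of an index-creation body.
def index_body(type_name, field_types)
  properties = field_types.transform_values { |t| { type: t } }
  { mappings: { type_name => { dynamic: 'false', properties: properties } } }
end

puts JSON.pretty_generate(index_body(:event, FIELD_TYPES))
```

In Elasticsearch 5 and later, the string type is replaced by text (analysed) and keyword (exact values).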
Event.search query: {
  bool: {
    should: [
      {
        simple_query_string: {
          query: "tokyo rubyist",
          default_operator: "and"
        }
      },
      {
        function_score: {
          filter: {
            and: [
              { range: { starts_at: { lte: 'now' } } },
              { term: { featured: true } }
            ]
          },
          gauss: {
            starts_at: { origin: 'now', scale: '10d', decay: 0.5 }
          },
          boost_mode: "sum"
        }
      }
    ],
    minimum_should_match: 2
  }
}
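The gauss function replaces the hard sort by start time with a boost that fades with distance from now. Elasticsearch documents the gauss decay as score = exp(-max(0, |value - origin| - offset)^2 / (2 sigma^2)) with sigma^2 = -scale^2 / (2 ln(decay)), so a document exactly scale away from origin scores decay. A minimal sketch of that formula, measuring distance in days to match scale: '10d' (gauss_decay is a hypothetical name):

```ruby
# Gauss decay as documented for function_score:
#   score   = exp(-max(0, |value - origin| - offset)^2 / (2 * sigma^2))
#   sigma^2 = -scale^2 / (2 * ln(decay))
# Distances are in days here; scale 10 and decay 0.5 match the slide.
def gauss_decay(distance_days, scale: 10.0, decay: 0.5, offset: 0.0)
  sigma_sq = -(scale**2) / (2 * Math.log(decay))
  d = [distance_days.abs - offset, 0.0].max
  Math.exp(-(d**2) / (2 * sigma_sq))
end

[0, 5, 10, 20].each { |days| puts format('%2d days: %.4f', days, gauss_decay(days)) }
```

So an event 10 days away gets half the boost of one starting now, and one 20 days away gets 0.5^4 = 0.0625 of it; with boost_mode "sum" that value is added to the score rather than multiplied.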
Event.search '東京rubyist'   # 東京 = "Tokyo"
Dealing with different languages
built-in analysers for arabic, armenian, basque, brazilian, bulgarian, catalan, cjk, czech, danish, dutch, english, finnish, french, galician, german, greek, hindi, hungarian, indonesian, irish, italian, latvian, norwegian, persian, portuguese, romanian, russian, sorani, spanish, swedish, turkish, thai.
Japanese?
• install kuromoji plugin
• https://github.com/elastic/elasticsearch-analysis-kuromoji
• plugin install elasticsearch/elasticsearch-analysis-kuromoji/2.7.0
class Event < ActiveRecord::Base
  include Elasticsearch::Model

  def as_indexed_json(options = {})
    {
      title: { en: title_en, ja: title_ja },
      description: { en: description_en, ja: description_ja },
      starts_at: starts_at.iso8601,
      featured: group.featured?
    }
  end

  settings do
    mapping dynamic: 'false' do
      indexes :title do
        indexes :en, type: 'string', analyzer: 'english'
        indexes :ja, type: 'string', analyzer: 'kuromoji'
      end
      indexes :description do
        indexes :en, type: 'string', analyzer: 'english'
        indexes :ja, type: 'string', analyzer: 'kuromoji'
      end
      indexes :starts_at, type: 'date'
      indexes :featured, type: 'boolean'
    end
  end
end
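Once title and description are split into per-language subfields, a query has to name them. One option, sketched here as an assumption rather than the approach used in the talk, is a multi_match query that runs the same text against every subfield, each analysed with its own analyzer (english or kuromoji). multilingual_query and LANGUAGE_FIELDS are hypothetical names.

```ruby
require 'json'

# Subfields created by the mapping above.
LANGUAGE_FIELDS = %w[title.en title.ja description.en description.ja].freeze

# Hypothetical helper: search one text across all language subfields.
def multilingual_query(text, fields: LANGUAGE_FIELDS)
  {
    query: {
      multi_match: {
        query: text,
        fields: fields
      }
    }
  }
end

q = multilingual_query('東京rubyist')
puts JSON.generate(q)
# With the gem: Event.search(q)
```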
Event.search 'tokyo rubyist'
with data from other models?
class Event < ActiveRecord::Base
  include Elasticsearch::Model

  def as_indexed_json(options = {})
    {
      title: { en: title_en, ja: title_ja },
      description: { en: description_en, ja: description_ja },
      group_name: { en: group.name_en, ja: group.name_ja },
      starts_at: starts_at.iso8601,
      featured: group.featured?
    }
  end

  settings do
    mapping dynamic: 'false' do
      indexes :title do
        indexes :en, type: 'string', analyzer: 'english'
        indexes :ja, type: 'string', analyzer: 'kuromoji'
      end
      indexes :description do
        indexes :en, type: 'string', analyzer: 'english'
        indexes :ja, type: 'string', analyzer: 'kuromoji'
      end
      indexes :group_name do
        indexes :en, type: 'string', analyzer: 'english'
        indexes :ja, type: 'string', analyzer: 'kuromoji'
      end
      indexes :starts_at, type: 'date'
      indexes :featured, type: 'boolean'
    end
  end
end
Automated Tests
class Event < ActiveRecord::Base
  include Elasticsearch::Model
  index_name "drkpr_#{Rails.env}_events"
end
Index names include the environment, so test runs don't touch development data
Test Helpers
• https://gist.github.com/mreinsch/094dc9cf63362314cef4
• Helpers: wait_for_elasticsearch, wait_for_elasticsearch_removal, clear_elasticsearch!
• specs: tag tests that require Elasticsearch
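Newly indexed documents only become searchable after Elasticsearch refreshes the index (by default about once per second), which is why tests need helpers that wait. The helper below is hypothetical and only in the spirit of wait_for_elasticsearch from the gist; its real implementation may differ. It polls a condition until it holds or a timeout expires.

```ruby
# Hypothetical polling helper: retries the block until it returns a truthy
# value, raising if the timeout passes first.
def wait_until(timeout: 5.0, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout}s"
    end
    sleep interval
  end
end

# In a spec this might read:
#   wait_until { Event.search('tokyo rubyist').results.total > 0 }
```

An alternative is to force a refresh explicitly with Event.__elasticsearch__.refresh_index! before searching.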
Production Ready?
• use a hosted service: elastic.co/found or AWS ES
• use two instances for redundancy
• Elasticsearch may become unavailable
• usually only impacts search
• keep impact at a minimum
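One way to keep an outage's impact limited to search is to catch the client's errors around search calls and fall back to an empty result. This is a minimal sketch; SearchUnavailable is a hypothetical stand-in, since the exception classes to rescue depend on the client and HTTP adapter in use.

```ruby
# Hypothetical stand-in for whatever connection/transport errors the
# Elasticsearch client raises in your setup.
class SearchUnavailable < StandardError; end

# Runs the search block; on failure logs a warning and returns the fallback.
def safe_search(fallback: [])
  yield
rescue SearchUnavailable => e
  warn "search unavailable: #{e.message}"
  fallback
end

# Usage in a controller might look like:
#   @events = safe_search { Event.search(params[:q]).records.to_a }
```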
class Event < ActiveRecord::Base
  include Elasticsearch::Model

  after_save do
    IndexerJob.perform_later('update', self.class.name, self.id)
  end

  after_destroy do
    IndexerJob.perform_later('delete', self.class.name, self.id)
  end

  ...
end

class IndexerJob < ActiveJob::Base
  queue_as :default

  def perform(action, record_type, record_id)
    record_class = record_type.constantize
    record_data = {
      index: record_class.index_name,
      type: record_class.document_type,
      id: record_id
    }
    client = record_class.__elasticsearch__.client

    case action.to_s
    when 'update'
      record = record_class.find(record_id)
      client.index record_data.merge(body: record.as_indexed_json)
    when 'delete'
      client.delete record_data.merge(ignore: 404)
    end
  end
end
https://gist.github.com/mreinsch/acb2f6c58891e5cd4f13
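The dispatch inside IndexerJob#perform can be exercised without Rails or a cluster. In this sketch, FakeClient, perform_index_action and the finder lambda are hypothetical stand-ins for the elasticsearch-ruby client, the job, and Model.find; the fake client simply records the calls the job would make.

```ruby
# Hypothetical fake client that records calls instead of talking to a node.
class FakeClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def index(args)
    @calls << [:index, args]
  end

  def delete(args)
    @calls << [:delete, args]
  end
end

# Same branching as IndexerJob#perform; finder replaces record_class.find.
def perform_index_action(client, action, meta, finder)
  case action.to_s
  when 'update'
    client.index(meta.merge(body: finder.call(meta[:id])))
  when 'delete'
    client.delete(meta.merge(ignore: 404)) # already-missing docs are fine
  else
    raise ArgumentError, "unknown action: #{action}"
  end
end

client = FakeClient.new
meta = { index: 'events', type: 'event', id: 31710 }
perform_index_action(client, 'update', meta, ->(id) { { title: "Event #{id}" } })
perform_index_action(client, :delete, meta, ->(_id) { nil })
p client.calls
```

Running the indexing in a background job means a slow or unavailable Elasticsearch does not block saving the record, and failed jobs can be retried.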
Questions?
Resources
Elastic Docs: https://www.elastic.co/guide/index.html
Ruby Gem Docs: https://github.com/elastic/elasticsearch-rails
or ask me later: @mreinsch