Elasticsearch
Sorting and Relevance
http://elastic.openthinklabs.com/
Sorting
GET /_search
{
  "query": {
    "bool": {
      "filter": { "term": { "user_id": 1 } }
    }
  }
}

GET /_search
{
  "query": {
    "constant_score": {
      "filter": { "term": { "user_id": 1 } }
    }
  }
}
Sorting by Field Values
GET /_search
{
  "query": {
    "bool": {
      "filter": { "term": { "user_id": 1 } }
    }
  },
  "sort": { "date": { "order": "desc" } }
}

"hits": {
  "total": 6,
  "max_score": null,
  "hits": [
    {
      "_index": "us",
      "_type": "tweet",
      "_id": "14",
      "_score": null,
      "_source": {
        "date": "2014-09-24",
        ...
      },
      "sort": [ 1411516800000 ]
    },
    ...
}
Multilevel Sorting
GET /_search
{
  "query": {
    "bool": {
      "must":   { "match": { "tweet": "manage text search" } },
      "filter": { "term":  { "user_id": 2 } }
    }
  },
  "sort": [
    { "date":   { "order": "desc" } },
    { "_score": { "order": "desc" } }
  ]
}
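The ordering requested by this `sort` array can be imitated client-side: results are ordered by the first criterion, and the second is consulted only to break ties. A small Python sketch with made-up hits (the IDs, dates, and scores are hypothetical, not real query output):

```python
# Hypothetical hits; only the two sort keys matter here.
hits = [
    {"_id": "a", "date": "2014-09-22", "_score": 0.9},
    {"_id": "b", "date": "2014-09-24", "_score": 0.3},
    {"_id": "c", "date": "2014-09-24", "_score": 0.7},
]


def multilevel_sort(hits):
    """Order by date descending, then by _score descending to break ties."""
    # ISO-8601 date strings compare correctly as plain strings, and
    # reverse=True applies descending order to both elements of the key.
    return sorted(hits, key=lambda h: (h["date"], h["_score"]), reverse=True)


print([h["_id"] for h in multilevel_sort(hits)])  # ['c', 'b', 'a']
```

Hits "b" and "c" share the same date, so their relative order is decided by `_score`.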
Sorting on Multivalue Fields
"sort": { "dates": { "order": "asc", "mode": "min" }}
String Sorting and Multifields
"tweet": {
  "type":     "string",
  "analyzer": "english"
}

"tweet": {
  "type":     "string",
  "analyzer": "english",
  "fields": {
    "raw": {
      "type":  "string",
      "index": "not_analyzed"
    }
  }
}
GET /_search
{
  "query": {
    "match": { "tweet": "elasticsearch" }
  },
  "sort": "tweet.raw"
}
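The reason for sorting on a `not_analyzed` subfield is that an analyzed string is broken into several terms, so a sort can only pick one of them (for example the smallest) rather than the whole string. A rough Python illustration follows; `crude_terms` is a deliberately simplified stand-in for analysis, not the `english` analyzer, and the sample strings are made up:

```python
def crude_terms(text):
    """Simplified stand-in for analysis: lowercase and split on whitespace."""
    return text.lower().split()


titles = ["fine old art", "blue zebra"]

# Sorting on the analyzed field: each value is reduced to its smallest term.
by_analyzed = sorted(titles, key=lambda t: min(crude_terms(t)))

# Sorting on the raw (not analyzed) value: the whole string is compared.
by_raw = sorted(titles)

print(by_analyzed)  # ['fine old art', 'blue zebra']  ("art" sorts before "blue")
print(by_raw)       # ['blue zebra', 'fine old art']
```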
What Is Relevance?
● The standard similarity algorithm used in Elasticsearch is term frequency/inverse document frequency (TF/IDF), which takes the following factors into account:
  ● Term frequency: How often does the term appear in the field? The more often, the more relevant. A field containing five mentions of the same term is more likely to be relevant than a field containing just one mention.
  ● Inverse document frequency: How often does each term appear in the index? The more often, the less relevant. Terms that appear in many documents have a lower weight than more-uncommon terms.
  ● Field-length norm: How long is the field? The longer it is, the less likely it is that words in the field will be relevant. A term appearing in a short title field carries more weight than the same term appearing in a long content field.
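The three factors above can be sketched numerically. The formulas below are the ones used by Lucene's classic TF/IDF similarity; treat this as an illustration under that assumption, since later Elasticsearch versions default to BM25 and the stored norm is a lossy approximation. Function names and sample numbers are made up:

```python
import math


def term_frequency(freq: int) -> float:
    """tf = sqrt(number of times the term appears in the field)."""
    return math.sqrt(freq)


def inverse_document_frequency(num_docs: int, doc_freq: int) -> float:
    """idf = 1 + ln(num_docs / (doc_freq + 1)); rarer terms weigh more."""
    return 1 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms: int) -> float:
    """norm = 1 / sqrt(number of terms in the field); shorter fields weigh more."""
    return 1 / math.sqrt(num_terms)


def field_weight(freq: int, num_docs: int, doc_freq: int, num_terms: int) -> float:
    """Combine the three factors for one term in one field."""
    return (
        term_frequency(freq)
        * inverse_document_frequency(num_docs, doc_freq)
        * field_length_norm(num_terms)
    )


# Same term, same index statistics, different field lengths (hypothetical numbers).
short_title = field_weight(freq=1, num_docs=1000, doc_freq=9, num_terms=4)
long_body = field_weight(freq=1, num_docs=1000, doc_freq=9, num_terms=400)
print(short_title > long_body)  # True
```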
Understanding the Score
GET /_search?explain
{
  "query": {
    "match": { "tweet": "honeymoon" }
  }
}
Understanding Why a Document Matched
GET /us/tweet/12/_explain
{
  "query": {
    "bool": {
      "filter": { "term":  { "user_id": 2 } },
      "must":   { "match": { "tweet": "honeymoon" } }
    }
  }
}
"failure to match filter: cache(user_id:[2 TO 2])"
Doc Values Intro
● Doc values are used in several places in Elasticsearch:
  ● Sorting on a field
  ● Aggregations on a field
  ● Certain filters (for example, geolocation filters)
  ● Scripts that refer to fields
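All of these uses need to look up the values of a field for a given document, which is the opposite direction from the inverted index used for search (term to documents). The Python sketch below illustrates that "uninverted" mapping only conceptually; real doc values are a column-oriented on-disk structure, and the documents and function names here are made up:

```python
from collections import defaultdict

# Hypothetical documents: document ID -> terms in one field.
docs = {1: ["brown", "dog"], 2: ["brown", "fox"], 3: ["dog"]}


def build_inverted_index(docs):
    """Search structure: term -> set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def build_doc_values(inverted):
    """Uninverted structure: document ID -> sorted list of its terms."""
    values = defaultdict(list)
    for term in sorted(inverted):
        for doc_id in inverted[term]:
            values[doc_id].append(term)
    return values


inverted = build_inverted_index(docs)
doc_values = build_doc_values(inverted)

print(sorted(inverted["brown"]))  # [1, 2]  (search: which docs contain "brown"?)
print(doc_values[2])              # ['brown', 'fox']  (sort/aggregate: what does doc 2 hold?)
```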
References
● Elasticsearch: The Definitive Guide: A Distributed Real-Time Search and Analytics Engine, Clinton Gormley & Zachary Tong, O’Reilly