Lucene Search Essentials: Scorers, Collectors and Custom Queries

description

Presented by Mikhail Khludnev, Principal Engineer, Grid Dynamics. My team is building a next-generation eCommerce search platform for a major online retailer with quite challenging business requirements. It turns out that the default Lucene toolbox is not an ideal fit for those challenges, so the team had to hack deep into the Lucene core to achieve our goals. Along the way we accumulated a fairly deep understanding of Lucene search internals and want to share our experience. We will start with an API overview, then look at the essential search algorithms and their implementations in Lucene. Finally, we will review a few cases of query customization, pitfalls, and common performance problems.

Transcript of Lucene Search Essentials: Scorers, Collectors and Custom Queries

Page 1: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 2: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Scorers, Collectors and Custom Queries

Mikhail Khludnev

Page 3: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Custom Queries

Page 4: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Custom Queries

Page 5: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Custom Queries

http://nlp.stanford.edu/IR-book/

Page 6: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Custom Queries

http://nlp.stanford.edu/IR-book/

Page 7: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Custom Queries

Match Spotting

http://nlp.stanford.edu/IR-book/

Page 8: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Custom Queries... hm, what for?

Page 9: Lucene Search Essentials: Scorers, Collectors and Custom Queries

qf=STYLE TYPE

denim dress

Page 10: Lucene Search Essentials: Scorers, Collectors and Custom Queries

qf=STYLE TYPE

denim dress

DisjunctionMaxQuery((

(STYLE:denim OR TYPE:denim) |

(STYLE:dress OR TYPE:dress)

))

Page 11: Lucene Search Essentials: Scorers, Collectors and Custom Queries

qf=STYLE TYPE

denim dress

(DisjunctionMaxQuery((STYLE:denim | TYPE:denim)))
OR
(DisjunctionMaxQuery((STYLE:dress | TYPE:dress)))

Page 12: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Custom Queries

Page 13: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Inverted Index

Page 14: Lucene Search Essentials: Scorers, Collectors and Custom Queries

T[0] = "it is what it is"
T[1] = "what is it"
T[2] = "it is a banana"

Page 15: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

T[0] = "it is what it is"
T[1] = "what is it"
T[2] = "it is a banana"

Page 16: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

term dictionary (the terms on the left), postings list (the docID sets on the right)

Page 17: Lucene Search Essentials: Scorers, Collectors and Custom Queries

index/_1.tis (term dictionary): "a" "banana" "is"→"t" "what"

index/_1.frq (postings): {2} {2} {0, 1, 2} {0, 1, 2} {0, 1}
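The toy index on these slides can be rebuilt in a few lines. The following is a minimal sketch in plain Java, not Lucene code: the class name `InvertedIndexSketch` and the whitespace tokenizer are assumptions made for illustration. A sorted map stands in for the term dictionary and each value is a postings list of docIDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class, not part of Lucene.
final class InvertedIndexSketch {

    /** Maps each term to the sorted list of ids of the documents containing it. */
    static Map<String, List<Integer>> build(String[] docs) {
        // TreeMap keeps terms sorted, like a term dictionary.
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            for (String term : docs[docId].split(" ")) {
                List<Integer> postings = index.computeIfAbsent(term, t -> new ArrayList<>());
                // Documents are visited in increasing order, so checking the
                // tail of the list is enough to avoid duplicate docIDs.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                    postings.add(docId);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] docs = {"it is what it is", "what is it", "it is a banana"};
        build(docs).forEach((term, postings) -> System.out.println(term + ": " + postings));
    }
}
```

Real Lucene segments also store term frequencies and compress both structures; the sketch keeps only the term-to-docIDs mapping that the following slides rely on.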

Page 19: Lucene Search Essentials: Scorers, Collectors and Custom Queries

What is a Scorer?

Page 20: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

Page 21: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

Page 22: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

Page 23: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 24: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 25: Lucene Search Essentials: Scorers, Collectors and Custom Queries

while ((doc = nextDoc()) != NO_MORE_DOCS) {
  println("found " + doc + " with score " + score());
}
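To make the iteration contract concrete, here is a minimal sketch of a scorer over a single postings list. `ToyTermScorer` and its `drain` helper are hypothetical names, the score is a constant 1, and only `nextDoc()`, `docID()`, `score()` and the `NO_MORE_DOCS` sentinel mirror the loop shown on the slide.

```java
// Hypothetical stand-in for a term scorer; not the Lucene class.
final class ToyTermScorer {
    /** Same sentinel value Lucene uses for an exhausted iterator. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] postings; // sorted docIDs of one term
    private int pos = -1;         // -1 means iteration has not started

    ToyTermScorer(int... postings) {
        this.postings = postings;
    }

    /** Moves to the next matching document and returns its id, or NO_MORE_DOCS. */
    int nextDoc() {
        if (pos < postings.length) {
            pos++;
        }
        return docID();
    }

    int docID() {
        if (pos < 0) {
            return -1;
        }
        return pos < postings.length ? postings[pos] : NO_MORE_DOCS;
    }

    /** Constant score: this toy only records that the term matched. */
    float score() {
        return 1f;
    }

    /** The loop from the slide, with its output gathered into a string. */
    static String drain(ToyTermScorer scorer) {
        StringBuilder out = new StringBuilder();
        int doc;
        while ((doc = scorer.nextDoc()) != NO_MORE_DOCS) {
            out.append("found ").append(doc)
               .append(" with score ").append(scorer.score()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(drain(new ToyTermScorer(0, 1))); // postings of "what"
    }
}
```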

Page 26: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 27: Lucene Search Essentials: Scorers, Collectors and Custom Queries

2783 issues

Page 28: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Note: Weight is omitted for the sake of compactness

Page 29: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 30: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Custom Queries

http://nlp.stanford.edu/IR-book/

Page 31: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Doc-at-time search

Page 32: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

what OR is OR a OR banana

Page 33: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

what OR is OR a OR banana

Page 34: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"is": {0, 1, 2}

"what": {0, 1}

"a": {2}

"banana": {2}

"it": {0, 1, 2}

Page 35: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"is": {0, 1, 2}

"what": {0, 1}

"a": {2}

"banana": {2}

collect(0), score(): 2

Collector

Page 36: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"is": {0, 1, 2}

"what": {0, 1}

"a": {2}

"banana": {2}

docID×score: 0×2

Page 37: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"is": {0, 1, 2}

"what": {0, 1}

"a": {2}

"banana": {2}

collect(1), score(): 2

Collector: 0×2

Page 38: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"is": {0, 1, 2}

"what": {0, 1}

"a": {2}

"banana": {2}

Collector: 0×2, 1×2

Page 39: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"is": {0, 1, 2}

"a": {2}

"banana": {2}

"what": {0, 1}

collect(2), score(): 3

Collector: 0×2, 1×2
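The doc-at-a-time walk over "what OR is OR a OR banana" can be sketched as follows. This is an illustration under stated assumptions, not Lucene's implementation: `DocAtATimeSketch` and `Cursor` are hypothetical names, and the score is simply the number of matching terms, as on the slides. A priority queue keeps the postings cursors ordered by their current docID, so each document is scored completely before the next one is touched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration class, not part of Lucene.
final class DocAtATimeSketch {

    /** Cursor over one sorted postings list. */
    static final class Cursor {
        final int[] docs;
        int pos = 0;

        Cursor(int... docs) {
            this.docs = docs;
        }

        int doc() {
            return docs[pos];
        }

        /** Steps forward; returns false once the list is exhausted. */
        boolean advance() {
            return ++pos < docs.length;
        }
    }

    /** Returns {docID, score} pairs in increasing docID order. */
    static List<int[]> search(int[][] postingsLists) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (int[] postings : postingsLists) {
            if (postings.length > 0) {
                heap.add(new Cursor(postings));
            }
        }
        List<int[]> collected = new ArrayList<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek().doc();
            int score = 0;
            // Pop every cursor positioned on the current doc, then re-insert
            // it at its next position (O(log q) per step for q terms).
            while (!heap.isEmpty() && heap.peek().doc() == doc) {
                Cursor c = heap.poll();
                score++;
                if (c.advance()) {
                    heap.add(c);
                }
            }
            collected.add(new int[] {doc, score}); // plays the role of collect(doc)
        }
        return collected;
    }

    public static void main(String[] args) {
        // what, is, a, banana
        int[][] query = {{0, 1}, {0, 1, 2}, {2}, {2}};
        for (int[] hit : search(query)) {
            System.out.println("collect(" + hit[0] + ") score(): " + hit[1]);
        }
    }
}
```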

Page 40: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Term-at-time search

"lorem" "ipsum" "dolor" "sit" "amet" "consectetur"

Page 41: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

what OR is OR a OR banana

Page 42: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Accumulator... 0×1 ... 1×1 ...

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Page 43: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Accumulator... 0×2 ... 1×2 ... 2×1 ...

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Page 44: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Accumulator... 0×2 ... 1×2 ... 2×2 ...

Page 45: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Accumulator... 0×2 ... 1×2 ... 2×3 ...

Page 46: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Accumulator... 0×2 ... 1×2 ... 2×3 ...

Collector: 2×3, 0×2, 1×2

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}
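The accumulator walk on these slides can be sketched in plain Java. The names `TermAtATimeSketch` and `maxDoc` are assumptions for illustration, and the score is again the count of matching terms. Each postings list is consumed entirely before the next one starts, which is why a per-document accumulator is needed.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of Lucene.
final class TermAtATimeSketch {

    /** Returns an array where slot d holds the accumulated score of document d. */
    static int[] accumulate(int maxDoc, int[][] postingsLists) {
        int[] accumulator = new int[maxDoc]; // memory grows with the number of documents
        for (int[] postings : postingsLists) {
            // One term at a time: the whole list is processed before moving on.
            for (int doc : postings) {
                accumulator[doc]++;
            }
        }
        return accumulator;
    }

    public static void main(String[] args) {
        // what, is, a, banana
        int[][] query = {{0, 1}, {0, 1, 2}, {2}, {2}};
        System.out.println(Arrays.toString(accumulate(3, query)));
    }
}
```

Only after the last term has been processed are the scores final, so documents can be handed to the collector only at the end, and not necessarily in docID order.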

Page 47: Lucene Search Essentials: Scorers, Collectors and Custom Queries

http://nlp.stanford.edu/IR-book/

"lorem" "ipsum" "dolor" "sit" "amet" "consectetur"

O(n)

Page 48: Lucene Search Essentials: Scorers, Collectors and Custom Queries

1×9 7×9 2×7 2×5 9×5 6×4 ... ≤4 ...

k

n

Page 50: Lucene Search Essentials: Scorers, Collectors and Custom Queries

6×4

log k 9×5 2×4

2×7 7×9 1×9

...

...≤4......

n
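The "n log k" term comes from keeping only the best k hits in a heap while n candidates stream past. Below is a minimal sketch of that idea; `TopKCollectorSketch` is a hypothetical name, scores are plain integers for brevity, ties are broken in favour of the smaller docID, and k is assumed to be at least 1.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration class, not part of Lucene. Hits are {docID, score}.
final class TopKCollectorSketch {
    // "Worst first": lowest score on top; among equal scores the larger docID is worse.
    private static final Comparator<int[]> WORST_FIRST =
        (a, b) -> a[1] != b[1] ? Integer.compare(a[1], b[1]) : Integer.compare(b[0], a[0]);

    private final int k; // assumed >= 1
    private final PriorityQueue<int[]> heap = new PriorityQueue<>(WORST_FIRST);

    TopKCollectorSketch(int k) {
        this.k = k;
    }

    void collect(int doc, int score) {
        int[] hit = {doc, score};
        if (heap.size() < k) {
            heap.add(hit);                        // O(log k)
        } else if (WORST_FIRST.compare(hit, heap.peek()) > 0) {
            heap.poll();                          // evict the current worst hit
            heap.add(hit);
        }
        // Otherwise the hit cannot enter the top k and is dropped in O(1).
    }

    /** Returns the retained hits, best first. */
    List<int[]> topDocs() {
        List<int[]> result = new ArrayList<>(heap);
        result.sort(WORST_FIRST.reversed());
        return result;
    }

    public static void main(String[] args) {
        TopKCollectorSketch collector = new TopKCollectorSketch(2);
        collector.collect(0, 2);
        collector.collect(1, 2);
        collector.collect(2, 3);
        for (int[] hit : collector.topDocs()) {
            System.out.println(hit[0] + "×" + hit[1]);
        }
    }
}
```

Because the tie-break looks at the docID explicitly, the sketch gives the same top k whether hits arrive in docID order or not.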

Page 51: Lucene Search Essentials: Scorers, Collectors and Custom Queries

q

p

what OR is OR a OR banana

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Page 52: Lucene Search Essentials: Scorers, Collectors and Custom Queries

              doc at time             term at time
complexity
memory

Page 53: Lucene Search Essentials: Scorers, Collectors and Custom Queries

              doc at time             term at time
complexity                            O(p + n log k)
memory

Page 54: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"what": {0, 1}

q1

1 2

2

Page 55: Lucene Search Essentials: Scorers, Collectors and Custom Queries

              doc at time             term at time
complexity    O(p log q + n log k)    O(p + n log k)
memory

Page 56: Lucene Search Essentials: Scorers, Collectors and Custom Queries

              doc at time             term at time
complexity    O(p log q + n log k)    O(p + n log k)
memory        q + k

Page 57: Lucene Search Essentials: Scorers, Collectors and Custom Queries

              doc at time             term at time
complexity    O(p log q + n log k)    O(p + n log k)
memory        q + k                   n

Page 58: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 59: Lucene Search Essentials: Scorers, Collectors and Custom Queries

BooleanScorer

Page 60: Lucene Search Essentials: Scorers, Collectors and Custom Queries

×1

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Hashtable[2]

org.apache.lucene.search.BooleanScorer

×1 0 1

chunk

Page 61: Lucene Search Essentials: Scorers, Collectors and Custom Queries

×2

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

org.apache.lucene.search.BooleanScorer

×2 0 1

chunk

Page 62: Lucene Search Essentials: Scorers, Collectors and Custom Queries

org.apache.lucene.search

Collector: 0×2, 1×2

×2 ×2

0 1

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Page 63: Lucene Search Essentials: Scorers, Collectors and Custom Queries

org.apache.lucene.search

Collector: 0×2, 1×2

×1

0 1

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Page 64: Lucene Search Essentials: Scorers, Collectors and Custom Queries

org.apache.lucene.search

Collector: 0×2, 1×2

×2

0 1

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Page 65: Lucene Search Essentials: Scorers, Collectors and Custom Queries

org.apache.lucene.search

Collector: 0×2, 1×2

×3

0 1

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Page 66: Lucene Search Essentials: Scorers, Collectors and Custom Queries

org.apache.lucene.search

Collector: 2×3, 0×2, 1×2

×3

0 1

"a": {2}

"banana": {2}

"is": {0, 1, 2}

"it": {0, 1, 2}

"what": {0, 1}

Page 67: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Linked Open Hash [2K]

×1 ×1 ×5 ×2 ×2

0 1 2 3 4 5 6 7

×3

Page 68: Lucene Search Essentials: Scorers, Collectors and Custom Queries

if (collector.acceptsDocsOutOfOrder() && topScorer &&
    required.size() == 0 && minNrShouldMatch == 1) {
  new BooleanScorer   // term-at-time
} else {
  new BooleanScorer2  // doc-at-time
}

Page 69: Lucene Search Essentials: Scorers, Collectors and Custom Queries

q=village operations years disaster visit

Page 70: Lucene Search Essentials: Scorers, Collectors and Custom Queries

q=village operations years disaster visit etc map seventieth peneplains tussock sir memory character campaign author public wonder forker middy vocalize enable race object signal symptom deputy where typhous rectifiable polygamous originally look generation ultimately reasonably ratio numb apposing enroll manhood problem suddenly definitely corp event material affair diploma would dimout speech notion engine artist hotel text field hashed rottener impeding i cricket virtually valley sunday rock come observes gallnuts vibrantly prize involve

Page 71: Lucene Search Essentials: Scorers, Collectors and Custom Queries

q=+village +operations +years +disaster +visit

Page 72: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Conjunction(+, MUST)

Page 73: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2,3}

"banana": {2,3}

"is": {0, 1, 2, 3}

"it": {0, 1, 3}

"what": {0, 1, 3}

what AND is AND a AND it

Page 74: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2,3}

"banana": {2,3}

"is": {0, 1, 2, 3}

"it": {0, 1, 3}

"what": {0, 1, 3}

Page 75: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2,3}

"banana": {2,3}

"is": {0, 1, 2, 3}

"it": {0, 1, 3}

"what": {0, 1, 3}

Page 76: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2,3}

"banana": {2,3}

"is": {0, 1, 2, 3}

"it": {0, 1, 3}

"what": {0, 1, 3}

Page 77: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2,3}

"banana": {2,3}

"is": {0, 1, 2, 3}

"it": {0, 1, 3}

"what": {0, 1, 3}

Page 78: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"a": {2,3}

"banana": {2,3}

"is": {0, 1, 2, 3}

"it": {0, 1, 3}

"what": {0, 1, 3}

Collector: 3×4
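The conjunction walk ("leapfrog") can be sketched as below. `LeapfrogSketch` and `Cursor` are hypothetical names, and `advance` scans linearly here, whereas real postings lists can skip ahead. The cursors take turns jumping to the current candidate docID; a document matches once every cursor has landed on the same candidate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of Lucene.
final class LeapfrogSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Cursor over one sorted postings list. */
    static final class Cursor {
        final int[] docs;
        int pos = 0;

        Cursor(int... docs) {
            this.docs = docs;
        }

        int doc() {
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Moves to the first doc >= target and returns it. */
        int advance(int target) {
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return doc();
        }
    }

    /** Returns the docIDs present in every postings list. */
    static List<Integer> intersect(int[][] postingsLists) {
        List<Integer> result = new ArrayList<>();
        int n = postingsLists.length;
        if (n == 0) {
            return result;
        }
        Cursor[] cursors = new Cursor[n];
        for (int j = 0; j < n; j++) {
            cursors[j] = new Cursor(postingsLists[j]);
        }
        int candidate = cursors[0].doc();
        int agreeing = 1;   // how many cursors in a row sit on the candidate
        int i = 1 % n;      // next cursor to leap
        while (candidate != NO_MORE_DOCS) {
            if (agreeing == n) {
                result.add(candidate);              // all cursors agree: a match
                candidate = cursors[i].advance(candidate + 1);
                agreeing = 1;
            } else {
                int doc = cursors[i].advance(candidate);
                if (doc == candidate) {
                    agreeing++;
                } else {
                    candidate = doc;                // overshoot: new, larger candidate
                    agreeing = 1;
                }
            }
            i = (i + 1) % n;
        }
        return result;
    }

    public static void main(String[] args) {
        // what, is, a, it
        int[][] query = {{0, 1, 3}, {0, 1, 2, 3}, {2, 3}, {0, 1, 3}};
        System.out.println(intersect(query));
    }
}
```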

Page 79: Lucene Search Essentials: Scorers, Collectors and Custom Queries

http://www.flickr.com/photos/fatniu/184615348/

Page 80: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Ω(n q + n log k)

Page 81: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Wrap-up

● doc-at-time vs term-at-time

● conjunction and leapfrog

Page 82: Lucene Search Essentials: Scorers, Collectors and Custom Queries

complexity O(n)

memory O(const)

Page 83: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Custom Queries

http://nlp.stanford.edu/IR-book/

Page 84: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Custom Queries

● Sample Coverage Query

● Deeply Branched vs Flat

● minShouldMatch

● Filtering

● Performance Problem

Page 85: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"silver" "jeans" "dress"

silver jeans dress

Note: "foo bar" is not a phrase query, just a string

Page 86: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"silver" "jeans" "dress"
"silver jeans dress"

silver jeans dress

Page 87: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"silver" "jeans" "dress"
"silver jeans dress"
"silver jeans" "dress"
"silver" "jeans dress"

silver jeans dress

Page 88: Lucene Search Essentials: Scorers, Collectors and Custom Queries

"silver" "jeans" "dress"
"silver jeans dress"
"silver jeans" "dress"
"silver" "jeans dress"

"silver" "dress"
"silver jeans" "jeans"
"silver jeans"
"jeans" "dress"

silver jeans dress

Note: "foo bar" is not a phrase query, just a string

Page 89: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 90: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 91: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 92: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 93: Lucene Search Essentials: Scorers, Collectors and Custom Queries

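boolean verifyMatch() {
  int sumLength = 0;
  // sum the lengths of all terms whose scorers sit on the current doc
  for (Scorer child : getChildren()) {
    if (child.docID() == docID()) {
      TermQuery tq = child.weight.query;
      sumLength += tq.term.text.length;
    }
  }
  // the doc matches only if the matched terms cover the whole query
  return sumLength >= expectedLength;
}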

Page 94: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Deeply Branched vs Flat

Page 95: Lucene Search Essentials: Scorers, Collectors and Custom Queries

(+"silver jeans" +"dress") ORmax
(+"silver jeans dress") ORmax
(+"silver" +((+"jeans" +"dress") ORmax +"jeans dress"))

ORmax is DisjunctionMaxQuery

Page 96: Lucene Search Essentials: Scorers, Collectors and Custom Queries

(+"silver jeans" +"dress") ORmax
(+"silver jeans dress") ORmax
(+"silver" +((+"jeans" +"dress") ORmax +"jeans dress"))

ORmax is DisjunctionMaxQuery

Page 97: Lucene Search Essentials: Scorers, Collectors and Custom Queries

(+"silver jeans" +"dress") ORmax
(+"silver jeans dress") ORmax
(+"silver" +((+"jeans" +"dress") ORmax +"jeans dress"))

ORmax is DisjunctionMaxQuery

Page 98: Lucene Search Essentials: Scorers, Collectors and Custom Queries

("silver jeans" "dress") ORmax
("silver jeans dress") ORmax
("silver" (("jeans" "dress") ORmax "jeans dress"))

ORmax is DisjunctionMaxQuery

Page 99: Lucene Search Essentials: Scorers, Collectors and Custom Queries

B:"silver jeans dress" ORmax T:"silver jeans dress" ORmax S:"silver jeans dress"

B:"silver" ORmax T:"silver" ORmax S:"silver"

+B:"jeans dress" ORmax T:"jeans dress" ORmax S:"jeans dress"

+

ORmax

ORmax

ORmax

B:"silver jeans" ORmax T:"silver jeans" ORmax S:"silver jeans"

+B:"dress" ORmax T:"dress" ORmax S:"dress"

+

B:"jeans" ORmax T:"jeans" ORmax S:"jeans"

+B:"dress" ORmax T:"dress" ORmax S:"dress"

+

B - BRAND, T - TYPE, S - STYLE

Page 100: Lucene Search Essentials: Scorers, Collectors and Custom Queries

B:"silver" T:"silver" S:"silver"

B:"jeans" T:"jeans" S:"jeans"

B:"dress" T:"dress" S:"dress"

B:"silver jeans" T:"silver jeans" S:"silver jeans"

B:"silver jeans dress" T:"silver jeans dress"

S:"silver jeans dress"

B:"jeans dress" T:"jeans dress" S:"jeans dress"

Page 101: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 102: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Steadiness problem

AFAIK 3.x only.

Page 103: Lucene Search Essentials: Scorers, Collectors and Custom Queries

{1, 3, 7, 10, 27,30,..}

{3, 5, 10, 27,32,..}

{2,3, 27,31,..}

{..., 30,37,..}

3

3 20

3 30 30

{..., 30, 31,32,..}

{..., 20, 27,32,..}

Page 104: Lucene Search Essentials: Scorers, Collectors and Custom Queries

{1, 3, 7, 10, 27,30,..}

{3, 5, 10, 27,32,..}

{2,3, 27,31,..}

{..., 30,37,..}

5

7 20

27 30 30

{..., 30, 31,32,..}

{..., 20, 27,32,..}

docID=3

3.x

Page 105: Lucene Search Essentials: Scorers, Collectors and Custom Queries

minShouldMatch

Page 106: Lucene Search Essentials: Scorers, Collectors and Custom Queries

straight jeans

silver jeans

silver jeans straight

jeans

silver

minShouldMatch=2

straight silver jeans

Page 107: Lucene Search Essentials: Scorers, Collectors and Custom Queries

int nextDoc() {
  while (true) {
    while (subScorers[0].docID() == doc) {
      if (subScorers[0].nextDoc() != NO_MORE_DOCS) {
        heapAdjust(0);
      } else {
        ....
      }
    }
    ...
    if (nrMatchers >= minimumNrMatchers) {
      break;
    }
  }
  return doc;
}

org.apache.lucene.search.DisjunctionSumScorer

Page 108: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Let's filter!

BTW, what is it?

Page 109: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 110: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 111: Lucene Search Essentials: Scorers, Collectors and Custom Queries

RANDOM_ACCESS_FILTER_STRATEGY

LEAP_FROG_FILTER_FIRST_STRATEGY

LEAP_FROG_QUERY_FIRST_STRATEGY

QUERY_FIRST_FILTER_STRATEGY

Page 112: Lucene Search Essentials: Scorers, Collectors and Custom Queries

http://localhost:8983/solr/collection1/select

?q=village operations years disaster visit etc map

seventieth peneplains tussock sir memory character

campaign author public wonder forker middy vocalize

enable race object signal symptom deputy where

generation ultimately reasonably ratio numb apposing

enroll manhood problem suddenly definitely corp event

gallnuts vibrantly prize involve explanation module&

qf=text_all&defType=edismax&

Page 113: Lucene Search Essentials: Scorers, Collectors and Custom Queries

http://localhost:8983/solr/collection1/select

?q=village operations years disaster visit etc map

seventieth peneplains tussock sir memory character

campaign author public wonder forker middy vocalize

enable race object signal symptom deputy where

generation ultimately reasonably ratio numb apposing

enroll manhood problem suddenly definitely corp event

gallnuts vibrantly prize involve explanation module&

qf=text_all&defType=edismax&

fq= id:yes_49912894 id:nurse_30134968&

Page 114: Lucene Search Essentials: Scorers, Collectors and Custom Queries

http://localhost:8983/solr/collection1/select

?q=village operations years disaster visit etc map

seventieth peneplains tussock sir memory character

campaign author public wonder forker middy vocalize

enable race object signal symptom deputy where

generation ultimately reasonably ratio numb apposing

enroll manhood problem suddenly definitely corp event

gallnuts vibrantly prize involve explanation module&

qf=text_all&defType=edismax&

fq= id:yes_49912894 id:nurse_30134968&

mm=32&

Page 115: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 116: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 117: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 118: Lucene Search Essentials: Scorers, Collectors and Custom Queries

{1, 3, 7, 10, 27,30,..}

{3, 5, 10, 27,32,..}

{ 20,27,31,..}

mm=3 { 30,37,..}

Page 119: Lucene Search Essentials: Scorers, Collectors and Custom Queries

{1, 3, 7, 10, 27,30,..}

{3, 5, 10, 27,32,..}

{ 20,27,31,..}

mm=3 { 30,37,..}

Page 120: Lucene Search Essentials: Scorers, Collectors and Custom Queries

{1, 3, 7, 10, 27,30,..}

{3, 5, 10, 27,32,..}

{ 20,27,31,..}

mm=3 { 30,37,..}

Page 121: Lucene Search Essentials: Scorers, Collectors and Custom Queries

{1, 3, 7, 10, 27,30,..}

{3, 5, 10, 27,32,..}

{ 20,27,31,..}

mm=3 { 30,37,..}

Page 122: Lucene Search Essentials: Scorers, Collectors and Custom Queries

{1, 3, 7, 10, 27,30,..}

{3, 5, 10, 27,32,..}

{ 20,27,31,..}

mm=3 { 30,37,..}
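The mm=3 example on these slides can be replayed with a small heap-based disjunction that counts matchers per document. This is a sketch under assumptions: `MinShouldMatchSketch` and `Cursor` are hypothetical names, and only the docIDs visible on the slide are used, since the lists there are elided with "..".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration class, not part of Lucene.
final class MinShouldMatchSketch {

    /** Cursor over one sorted postings list. */
    static final class Cursor {
        final int[] docs;
        int pos = 0;

        Cursor(int... docs) {
            this.docs = docs;
        }

        int doc() {
            return docs[pos];
        }

        boolean advance() {
            return ++pos < docs.length;
        }
    }

    /** Returns docIDs matched by at least minShouldMatch of the postings lists. */
    static List<Integer> search(int[][] postingsLists, int minShouldMatch) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (int[] postings : postingsLists) {
            if (postings.length > 0) {
                heap.add(new Cursor(postings));
            }
        }
        List<Integer> matches = new ArrayList<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek().doc();
            int nrMatchers = 0;
            while (!heap.isEmpty() && heap.peek().doc() == doc) {
                Cursor c = heap.poll();
                nrMatchers++;
                if (c.advance()) {
                    heap.add(c);
                }
            }
            // Every doc in the union is visited; the threshold only decides
            // whether it is reported.
            if (nrMatchers >= minShouldMatch) {
                matches.add(doc);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int[][] lists = {
            {1, 3, 7, 10, 27, 30},
            {3, 5, 10, 27, 32},
            {20, 27, 31},
            {30, 37}
        };
        System.out.println(search(lists, 3));
    }
}
```

Note that this sketch still steps through every document of the union, however high the threshold is; it does not use the threshold to skip ahead.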

Page 123: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 124: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 125: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Custom Queries

Match Spotting

http://nlp.stanford.edu/IR-book/

Page 126: Lucene Search Essentials: Scorers, Collectors and Custom Queries

BRAND:"silver jeans" TYPE:"dress" STYLE:"white"

BRAND:"alfani" TYPE:"dress" STYLE:"silver","jeans"

BRAND:"chaloree" TYPE:"dress" STYLE:"silver"

BRAND:"style&co" TYPE:"jeans dress" STYLE:"silver"

BRAND:"silver jeans" TYPE:"dress" STYLE:"black"

BRAND:"silver jeans" TYPE:"dress" STYLE:"white"

BRAND:"silver jeans" TYPE:"jacket" STYLE: "black"

BRAND:"angie" TYPE:"dress" STYLE:"silver","jeans"

BRAND:"chaloree" TYPE:"jeans dress" STYLE:"silver"

BRAND:"silver jeans" TYPE:"dress" STYLE:"blue"

BRAND:"dotty" TYPE:"dress" STYLE:"silver","jeans"

BRAND:"chaloree" STYLE:"jeans" "dress"

Page 127: Lucene Search Essentials: Scorers, Collectors and Custom Queries

BRAND:"silver jeans" TYPE:"dress" STYLE:"white"

BRAND:"alfani" TYPE:"dress" STYLE:"silver","jeans"

BRAND:"chaloree" TYPE:"dress" STYLE:"silver"

BRAND:"style&co" TYPE:"jeans dress" STYLE:"silver"

BRAND:"silver jeans" TYPE:"dress" STYLE:"black"

BRAND:"silver jeans" TYPE:"dress" STYLE:"white"

BRAND:"silver jeans" TYPE:"jacket" STYLE: "black"

BRAND:"angie" TYPE:"dress" STYLE:"silver","jeans"

BRAND:"chaloree" TYPE:"jeans dress" STYLE:"silver"

BRAND:"silver jeans" TYPE:"dress" STYLE:"blue"

BRAND:"dotty" TYPE:"dress" STYLE:"silver","jeans"

BRAND:"chaloree" STYLE:"jeans" "dress"

silver jeans dress

Page 128: Lucene Search Essentials: Scorers, Collectors and Custom Queries

BRAND:"silver jeans" TYPE:"dress" STYLE:"white"

BRAND:"alfani" TYPE:"dress" STYLE:"silver","jeans"

BRAND:"chaloree" TYPE:"dress" STYLE:"silver"

BRAND:"style&co" TYPE:"jeans dress" STYLE:"silver"

BRAND:"silver jeans" TYPE:"dress" STYLE:"black"

BRAND:"silver jeans" TYPE:"dress" STYLE:"white"

BRAND:"silver jeans" TYPE:"jacket" STYLE: "black"

BRAND:"angie" TYPE:"dress" STYLE:"silver","jeans"

BRAND:"chaloree" TYPE:"jeans dress" STYLE:"silver"

BRAND:"silver jeans" TYPE:"dress" STYLE:"blue"

BRAND:"dotty" TYPE:"dress" STYLE:"silver","jeans"

BRAND:"chaloree" STYLE:"jeans" "dress"

Page 129: Lucene Search Essentials: Scorers, Collectors and Custom Queries

BRAND:"silver jeans" TYPE:"dress"

TYPE:"dress" STYLE:"silver","jeans"

TYPE:"jeans dress" STYLE:"silver"

BRAND:"silver jeans" TYPE:"dress"

BRAND:"silver jeans" TYPE:"dress"

TYPE:"dress" STYLE:"silver","jeans"

TYPE:"jeans dress" STYLE:"silver"

BRAND:"silver jeans" TYPE:"dress"

TYPE:"dress" STYLE:"silver","jeans"

Page 130: Lucene Search Essentials: Scorers, Collectors and Custom Queries

BRAND:"silver jeans" TYPE:"dress"

TYPE:"dress" STYLE:"silver","jeans"

TYPE:"jeans dress" STYLE:"silver"

BRAND:"silver jeans" TYPE:"dress"

BRAND:"silver jeans" TYPE:"dress"

TYPE:"dress" STYLE:"silver","jeans"

TYPE:"jeans dress" STYLE:"silver"

BRAND:"silver jeans" TYPE:"dress"

TYPE:"dress" STYLE:"silver","jeans"

Page 131: Lucene Search Essentials: Scorers, Collectors and Custom Queries

BRAND:"silver jeans" TYPE:"dress" (4)

TYPE:"dress" STYLE:"silver","jeans"

TYPE:"jeans dress" STYLE:"silver"

TYPE:"dress" STYLE:"silver","jeans"

TYPE:"jeans dress" STYLE:"silver"

TYPE:"dress" STYLE:"silver","jeans"

Page 132: Lucene Search Essentials: Scorers, Collectors and Custom Queries

BRAND:"silver jeans" TYPE:"dress" (4)

TYPE:"dress" STYLE:"silver","jeans"

TYPE:"jeans dress" STYLE:"silver"

TYPE:"dress" STYLE:"silver","jeans"

TYPE:"jeans dress" STYLE:"silver"

TYPE:"dress" STYLE:"silver","jeans"

Page 133: Lucene Search Essentials: Scorers, Collectors and Custom Queries

BRAND:"silver jeans" TYPE:"dress" (4)

TYPE:"dress" STYLE:"silver","jeans" (3)

TYPE:"jeans dress" STYLE:"silver"

TYPE:"jeans dress" STYLE:"silver"

Page 134: Lucene Search Essentials: Scorers, Collectors and Custom Queries

BRAND:"silver jeans" TYPE:"dress" (4)

TYPE:"dress" STYLE:"silver","jeans" (3)

TYPE:"jeans dress" STYLE:"silver" (2)

Page 135: Lucene Search Essentials: Scorers, Collectors and Custom Queries

BRAND:"silver jeans" TYPE:"dress" (4)

TYPE:"dress" STYLE:"silver","jeans" (3)

TYPE:"jeans dress" STYLE:"silver" (2)

silver jeans dress

Page 136: Lucene Search Essentials: Scorers, Collectors and Custom Queries

BRAND:"silver jeans" TYPE:"dress" (4)

TYPE:"dress" STYLE:"silver","jeans" (3)

TYPE:"jeans dress" STYLE:"silver" (2)

silver jeans dress

Page 137: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Scorers, Collectors and Custom Queries

http://google.com/+MikhailKhludnev

http://goo.gl/7LJFi

Page 138: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Appendixes

● Drill Sideways Facets

● Collectors

Page 139: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Appendix D

Drill Sideways Facets

Page 140: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 141: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 142: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 143: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 144: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 145: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 146: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CATEGORY: Denim +FIT: Straight +WASH: Dark&B

Page 147: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CATEGORY: Denim +FIT: Straight +WASH: Dark&B

+CATEGORY: Denim +WASH: Dark&B

Page 148: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CATEGORY: Denim +FIT: Straight +WASH: Dark&B

+CATEGORY: Denim +WASH: Dark&B

+CATEGORY: Denim +FIT: Straight

Page 149: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CATEGORY: Denim FIT: Straight WASH: Dark&Black ... /minShouldMatch=Ndrilldowns-1

Page 150: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CAT: Denim

FIT: Straight

WASH: Dark

Page 151: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CAT: Denim

FIT: Straight

WASH: Dark

totalHits 3

near miss 2

near miss 2

Page 152: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CAT: Denim

FIT: Straight

WASH: Dark

totalHits 3

near miss 2

near miss 2

Page 153: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CAT: Denim

FIT: Straight

WASH: Dark

totalHits 3

near miss 2

near miss 2

Page 154: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Doc at time

base query is highly selective

Page 155: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CAT:D.. {1, 7, 9, 15}
FIT:S..  {2, 7, 8, 9, 10, 12}
WASH:D.. {2, 7, 11, 13, 15}
...

Page 156: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CAT:D.. {1, 7, 9, 15}
FIT:S..  {2, 7, 8, 9, 10, 12}
WASH:D.. {2, 7, 11, 13, 15}
...

Page 157: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CAT:D.. {1, 7, 9, 15}
FIT:S..  {2, 7, 8, 9, 10, 12}
WASH:D.. {2, 7, 11, 13, 15}
...

Page 158: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CAT:D.. {1, 7, 9, 15}
FIT:S..  {2, 7, 8, 9, 10, 12}
WASH:D.. {2, 7, 11, 13, 15}
...

TopDocsCollector

Page 159: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CAT:D.. {1, 7, 9, 15}
FIT:S..  {2, 7, 8, 9, 10, 12}
WASH:D.. {2, 7, 11, 13, 15}
...

TopDocsCollector

Page 160: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CAT:D.. {1, 7, 9, 15}
FIT:S..  {2, 7, 8, 9, 10, 12}
WASH:D.. {2, 7, 11, 13, 15}
...

TopDocsCollector

Page 161: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CAT:D.. {1, 7, 9, 15}
FIT:S..  {2, 7, 8, 9, 10, 12}
WASH:D.. {2, 7, 11, 13, 15}
...

TopDocsCollector

Page 162: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CAT:D.. {1, 7, 9, 15}
FIT:S..  {2, 7, 8, 9, 10, 12}
WASH:D.. {2, 7, 11, 13, 15}
...

TopDocsCollector

Page 163: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CAT:D.. {1, 7, 9, 15}
FIT:S..  {2, 7, 8, 9, 10, 12}
WASH:D.. {2, 7, 11, 13, 15}
...

TopDocsCollector
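The doc-at-a-time variant of drill sideways can be sketched as follows, assuming the base query (+CAT) drives iteration and each drill-down is only checked per candidate. `DrillSidewaysSketch` is a hypothetical name and only the docIDs visible on the slide are used. A document that satisfies every drill-down is a hit; a document that fails exactly one is a near miss for that dimension; anything failing more is ignored, which corresponds to minShouldMatch = Ndrilldowns-1.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of Lucene.
final class DrillSidewaysSketch {

    /**
     * Returns counts where counts[0] is the number of hits and counts[1 + d]
     * is the number of near misses that fail only drill-down d.
     * All postings arrays must be sorted.
     */
    static int[] count(int[] base, int[][] drillDowns) {
        int[] counts = new int[1 + drillDowns.length];
        for (int doc : base) {
            int failed = 0;
            int lastFailed = -1;
            for (int d = 0; d < drillDowns.length; d++) {
                if (Arrays.binarySearch(drillDowns[d], doc) < 0) {
                    failed++;
                    lastFailed = d;
                }
            }
            if (failed == 0) {
                counts[0]++;               // matches base and all drill-downs
            } else if (failed == 1) {
                counts[1 + lastFailed]++;  // near miss on exactly one dimension
            }
            // failed >= 2: below minShouldMatch, not counted anywhere
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] cat = {1, 7, 9, 15};
        int[][] drillDowns = {
            {2, 7, 8, 9, 10, 12},   // FIT
            {2, 7, 11, 13, 15}      // WASH
        };
        System.out.println(Arrays.toString(count(cat, drillDowns)));
    }
}
```

With these postings, doc 7 is the only hit, doc 15 is a near miss on FIT, doc 9 is a near miss on WASH, and doc 1 fails both drill-downs and is dropped.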

Page 164: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Term at time

drilldown queries are highly selective

Page 165: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CAT:D.. {1, 7, 9, 15}
FIT:S..  {2, 7, 8, 9, 10, 12}
WASH:D.. {2, 7, 11, 13, 15}
...

hits 1

miss Fit

hits 1

miss Fit

hits 1

miss Fit

hits 1

miss Fit

hits 1

miss Fit

1 2 7 11 12 13 1510

8 9...

Page 166: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CAT:D.. {1, 7, 9, 15}
FIT:S..  {2, 7, 8, 9, 10, 12}
WASH:D.. {2, 7, 11, 13, 15}
...

hits 1

miss Fit

hits 1

miss Fit

hits 1

miss Fit

hits 2

miss no

1 2 7 11 12 13 1510

hits 1

miss Wash

hits 1

miss Wash

8 9...

hits 1

miss Wash

hits 2

miss no

hits 1

miss Wash

Page 167: Lucene Search Essentials: Scorers, Collectors and Custom Queries

+CAT:D.. {1, 7, 9, 15}
FIT:S..  {2, 7, 8, 9, 10, 12}
WASH:D.. {2, 7, 11, 13, 15}
...

hits 1

miss Wash Cat

hits 1

miss Fit Cat

hits 1

miss Wash Cat

hits 1

miss Fit Cat

hits 2

miss Fit

hits 2

miss Cat

1 2 7 11 12 13 1510

hits 1

miss Wash Cat

8 9...

hits 3

miss

hits 2

miss Wash

Page 168: Lucene Search Essentials: Scorers, Collectors and Custom Queries

hits 1

miss Wash Cat

hits 1

miss Fit Cat

hits 1

miss Wash Cat

hits 1

miss Fit Cat

hits 2

miss Fit

hits 2

miss Cat

1 2 7 11 12 13 1510

hits 1

miss Wash Cat

8 9...

hits 3

miss no

hits 2

miss Wash

Page 169: Lucene Search Essentials: Scorers, Collectors and Custom Queries

hits 2

miss Fit

1 2 7 11 12 13 15108 9...

hits 3

miss no

hits 2

miss Wash

TopDocsCollector

Page 170: Lucene Search Essentials: Scorers, Collectors and Custom Queries

TopDocsCollector

hits 2

miss Fit

1 2 7 11 12 13 15108 9...

hits 3

miss no

hits 2

miss Wash

Page 171: Lucene Search Essentials: Scorers, Collectors and Custom Queries

TopDocsCollector

hits 2

miss Fit

1 2 7 11 12 13 15108 9...

hits 3

miss no

hits 2

miss Wash

Page 172: Lucene Search Essentials: Scorers, Collectors and Custom Queries

Collector

DocSetCollector TopDocsCollector

TopFieldCollector

TopScoreDocCollector

Page 173: Lucene Search Essentials: Scorers, Collectors and Custom Queries

long [952045] = { 0, 0, 0, 0, 2050, 0, 0, 8, 0, 0, 0,... }

int [2079] = {4, 12, 45, 67, 103, 673, 5890, 34103,...}

int [100] = {8947, 7498,1, 230, 2356, 9812, 167,....}

DocSet or DocList?

Page 174: Lucene Search Essentials: Scorers, Collectors and Custom Queries

                           DocList/TopDocs          DocSet
Size                       k (numHits or rows)      N (maxDocs)
Ordered by                 score or field           docID
Out-of-order collecting    allows*                  almost could allow (No)

Page 175: Lucene Search Essentials: Scorers, Collectors and Custom Queries

?×4 6×4

9×5 2×4

2×7 7×9 1×9

Page 176: Lucene Search Essentials: Scorers, Collectors and Custom Queries

http://www.flickr.com/photos/jbagley/4303976811/sizes/o/

Page 177: Lucene Search Essentials: Scorers, Collectors and Custom Queries

class OutOfOrderTopScoreDocCollector {
  boolean acceptsDocsOutOfOrder() { return true; }
  ..
  void collect(int doc) {
    float score = scorer.score();
    ...
    // docs may arrive out of order, so a tie on score is checked against docID
    if (score == pqTop.score && doc > pqTop.doc) {
      ...
    }
  }
}

Page 178: Lucene Search Essentials: Scorers, Collectors and Custom Queries
Page 179: Lucene Search Essentials: Scorers, Collectors and Custom Queries

UML

http://www.flickr.com/photos/kristykay/2922670979/lightbox/