Lucene rev preso busch realtime search lr1010
-
Upload
lucidworks-archived -
Category
Documents
-
view
338 -
download
0
Transcript of Lucene rev preso busch realtime search lr1010
![Page 2: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/2.jpg)
A Billion Queries Per Day
Agenda
‣ Introduction
- Posting list format and early query termination
- Index memory model
- Lock-free algorithms and data structures
- Roadmap
2
![Page 3: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/3.jpg)
Introduction
3
![Page 4: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/4.jpg)
Introduction
• Search @ Twitter today:
• >1,000 TPS (tweets/sec) = >80,000,000 tweets/day
• >12,000 QPS (queries/sec) = >1,000,000,000 queries/day
• <10s latency (entire pipeline)
• <1s latency (indexer)
• Performance goal: Ability to scale to support Twitter’s continued growth
4
![Page 5: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/5.jpg)
Introduction
• 1st gen search engine based on MySQL
• Next-gen engine based on (partially rewritten) Lucene Java
• Significantly more performant (orders of magnitude)
• Much easier to develop new features on the new platform - stay tuned!
5
![Page 6: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/6.jpg)
A Billion Queries Per Day
Agenda
- Introduction
‣ Posting list format and early query termination
- Index memory model
- Lock-free algorithms and data structures
- Roadmap
6
![Page 7: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/7.jpg)
Posting list format and early query termination
7
![Page 8: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/8.jpg)
Objectives
• Only evaluate as few documents as possible before terminating the query
• Rank documents in reverse time order (newest documents first)
8
![Page 9: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/9.jpg)
Inverted Index 101
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
Table with 6 documents
Example from:Justin Zobel , Alistair Moffat, Inverted files for text search engines, ACM Computing Surveys (CSUR)v.38 n.2, p.6-es, 2006
9
![Page 10: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/10.jpg)
Inverted Index 101
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
term freq
and 1 <6>big 2 <2> <3>
dark 1 <6>did 1 <4>
gown 1 <2>had 1 <3>
house 2 <2> <3>in 5 <1> <2> <3> <5> <6>
keep 3 <1> <3> <5>keeper 3 <1> <4> <5>keeps 3 <1> <5> <6>light 1 <6>
never 1 <4>night 3 <1> <4> <5>old 4 <1> <2> <3> <4>
sleep 1 <4>sleeps 1 <6>
the 6 <1> <2> <3> <4> <5> <6>town 2 <1> <3>where 1 <4>
Table with 6 documents
Dictionary and posting lists10
![Page 11: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/11.jpg)
Inverted Index 101
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
term freq
and 1 <6>big 2 <2> <3>
dark 1 <6>did 1 <4>
gown 1 <2>had 1 <3>
house 2 <2> <3>in 5 <1> <2> <3> <5> <6>
keep 3 <1> <3> <5>keeper 3 <1> <4> <5>keeps 3 <1> <5> <6>light 1 <6>
never 1 <4>night 3 <1> <4> <5>old 4 <1> <2> <3> <4>
sleep 1 <4>sleeps 1 <6>
the 6 <1> <2> <3> <4> <5> <6>town 2 <1> <3>where 1 <4>
Table with 6 documents
Dictionary and posting lists
Query: keeper
11
![Page 12: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/12.jpg)
Inverted Index 101
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
term freq
and 1 <6>big 2 <2> <3>
dark 1 <6>did 1 <4>
gown 1 <2>had 1 <3>
house 2 <2> <3>in 5 <1> <2> <3> <5> <6>
keep 3 <1> <3> <5>keeper 3 <1> <4> <5>
keeps 3 <1> <5> <6>light 1 <6>
never 1 <4>night 3 <1> <4> <5>old 4 <1> <2> <3> <4>
sleep 1 <4>sleeps 1 <6>
the 6 <1> <2> <3> <4> <5> <6>town 2 <1> <3>where 1 <4>
Table with 6 documents
Dictionary and posting lists
Query: keeper
12
![Page 13: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/13.jpg)
Early query termination
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
term freq
and 1 <6>big 2 <2> <3>
dark 1 <6>did 1 <4>
gown 1 <2>had 1 <3>
house 2 <2> <3>in 5 <1> <2> <3> <5> <6>
keep 3 <1> <3> <5>keeper 3 <1> <4> <5>keeps 3 <1> <5> <6>light 1 <6>
never 1 <4>night 3 <1> <4> <5>old 4 <1> <2> <3> <4>
sleep 1 <4>sleeps 1 <6>
the 6 <1> <2> <3> <4> <5> <6>town 2 <1> <3>where 1 <4>
Table with 6 documents
Dictionary and posting lists
Query: old
13
![Page 14: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/14.jpg)
Early query termination
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
term freq
and 1 <6>big 2 <2> <3>
dark 1 <6>did 1 <4>
gown 1 <2>had 1 <3>
house 2 <2> <3>in 5 <1> <2> <3> <5> <6>
keep 3 <1> <3> <5>keeper 3 <1> <4> <5>keeps 3 <1> <5> <6>light 1 <6>
never 1 <4>night 3 <1> <4> <5>old 4 <1> <2> <3> <4>
sleep 1 <4>sleeps 1 <6>
the 6 <1> <2> <3> <4> <5> <6>town 2 <1> <3>where 1 <4>
Table with 6 documents
Dictionary and posting lists
Query: old
Walk posting list in reverse order
14
![Page 15: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/15.jpg)
Early query termination
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
term freq
and 1 <6>big 2 <2> <3>
dark 1 <6>did 1 <4>
gown 1 <2>had 1 <3>
house 2 <2> <3>in 5 <1> <2> <3> <5> <6>
keep 3 <1> <3> <5>keeper 3 <1> <4> <5>keeps 3 <1> <5> <6>light 1 <6>
never 1 <4>night 3 <1> <4> <5>old 4 <1> <2> <3> <4>
sleep 1 <4>sleeps 1 <6>
the 6 <1> <2> <3> <4> <5> <6>town 2 <1> <3>where 1 <4>
Table with 6 documents
Dictionary and posting lists
Query: old
E.g. 2 results are requested: Optimal termination after exactly
two postings were read
15
![Page 16: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/16.jpg)
Early query termination
• Possible if time-ordered results are desired (which is often the case in realtime search applications)
• Posting list requirements:
• Pointer from dictionary to last posting in a list (must always be up to date)
• Possibility to iterate lists in reverse order
• Allow efficient skipping
16
![Page 17: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/17.jpg)
Posting list encodings
Doc IDs to encode: 5, 15, 9000, 9002
Delta encoding: 5, 10, 8985, 2
Safes space, if appropriate compression techniques are used, e.g.
variable-length integers (VInt)
• Lucene uses a combination of delta encoding and VInt compression
• VInts are expensive to decode
• Problem 1: How to traverse posting lists backwards?
• Problem 2: How to write a posting atomically, if not a primitive java type
17
![Page 18: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/18.jpg)
Postinglist format
int (32 bits)
docID24 bits
max. 16.7M
textPosition8 bits
max. 255
• Tweet text can only have 140 chars
• Decoding speed significantly improved compared to delta and VInt decoding (early experiments suggest 5x improvement compared to vanilla Lucene with FSDirectory)
18
![Page 19: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/19.jpg)
Postinglist format
• ints can be written atomically in Java
• Backwards traversal easy on absolute docIDs (not deltas)
• Every posting is a possible entry point for a searcher
• Skipping can be done without additional data structures as binary search, even though there are better approaches which should be explored
• On tweet indexes we need about 30% more storage for docIDs compared to delta+Vints, but that’s compensated by savings on position encoding!
• Max. segment size: 2^24 = 16.7M tweets
19
![Page 20: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/20.jpg)
Objectives
• Only evaluate as few documents as possible before terminating the query
• Rank documents in reverse time order (newest documents first)
20
![Page 21: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/21.jpg)
A Billion Queries Per Day
Agenda
- Introduction
- Posting list format and early query termination
‣ Index memory model
- Lock-free algorithms and data structures
- Roadmap
21
![Page 22: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/22.jpg)
Index memory model
22
![Page 23: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/23.jpg)
Objectives
• Store large amount of linked lists of a priori unknown lengths efficiently in memory
• Allow traversal of lists in reverse order (backwards linked)
• Allow efficient garbage collection
23
![Page 24: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/24.jpg)
Inverted Index
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
term freq
and 1 <6>big 2 <2> <3>
dark 1 <6>did 1 <4>
gown 1 <2>had 1 <3>
house 2 <2> <3>in 5 <1> <2> <3> <5> <6>
keep 3 <1> <3> <5>keeper 3 <1> <4> <5>keeps 3 <1> <5> <6>light 1 <6>
never 1 <4>night 3 <1> <4> <5>old 4 <1> <2> <3> <4>
sleep 1 <4>sleeps 1 <6>
the 6 <1> <2> <3> <4> <5> <6>town 2 <1> <3>where 1 <4>
Table with 6 documents
Dictionary and posting lists
Per term we store different kinds of metadata: text pointer, frequency, postings pointer, etc.
24
![Page 25: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/25.jpg)
LUCENE-2329: Parallel posting arrays
class PostingList
int textPointer;int postingsPointer;int frequency;...
• Term hashtable is an array of these objects: PostingList[] termsHash
• For each unique term in a segment we need an instance; this results in a very large number of objects that are long-living, i.e. the garbage collecter can’t remove them quickly (they need to stay in memory until the segment is flushed)
• With a searchable RAM buffer we want to flush much less often and allow DocumentsWriter to fill up the available memory
25
![Page 26: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/26.jpg)
LUCENE-2329: Parallel posting arrays
class PostingList
int textPointer;int postingsPointer;int frequency;...
• Having a large number of long-living objects is very expensive in Java, especially when the default mark-and-sweep garbage collector is used
• The mark phase of GC becomes very expensive, because all long-living objects in memory have to be checked
• We need to reduce the number of objects to improve GC performance!-> Parallel posting arrays
26
![Page 27: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/27.jpg)
LUCENE-2329: Parallel posting arrays
PostingList[]
textPointer;
frequency;
intintint
postingsPointer;
27
![Page 28: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/28.jpg)
LUCENE-2329: Parallel posting arrays
termIDint[] textPointer; frequency;postingsPointer;
int[] int[] int[]
0
1
2
3
4
5
6
28
![Page 29: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/29.jpg)
LUCENE-2329: Parallel posting arrays
0
termIDint[] textPointer; frequency;postingsPointer;
int[] int[] int[]
t0 p0 f00
1
2
3
4
5
6
29
![Page 30: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/30.jpg)
LUCENE-2329: Parallel posting arrays
1
0
termIDint[] textPointer; frequency;postingsPointer;
int[] int[] int[]
t0
t1
p0
p1
f0
f1
0
1
2
3
4
5
6
30
![Page 31: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/31.jpg)
LUCENE-2329: Parallel posting arrays
1
0
2
termIDint[] textPointer; frequency;postingsPointer;
int[] int[] int[]
t0
t1
t2
p0
p1
p2
f0
f1
f2
0
1
2
3
4
5
6
• Total number of objects is now greatly reduced and is constant and independent of number of unique terms
• With parallel arrays we safe 28 bytes per unique term -> 41% savings compared to PostingList object
31
![Page 32: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/32.jpg)
LUCENE-2329: Parallel posting arrays - Performance
• Performance experiments: Index 1M wikipedia docs
1) -Xmx2048M, indexWriter.setMaxBufferSizeMB(200)
4.3% improvement
2) -Xmx256M, indexWriter.setMaxBufferSizeMB(200)
86.5% improvement
32
![Page 33: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/33.jpg)
LUCENE-2329: Parallel posting arrays - Performance
• With large heap there is a small improvement due to per-term memory savings
• With small heap the garbage collector is invoked much more often - huge improvement due to smaller number of objects (depending on doc sizes we have seen improvements of up to 400%!)
• With searchable RAM buffers we want to utilize all the RAM we have; with parallel arrays we can maintain high indexing performance even if we get close to the max heap size
33
![Page 34: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/34.jpg)
Inverted index components
Parallel arraysDictionary
pointer to the most recently indexed posting for a term
Posting list storage
?
34
![Page 35: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/35.jpg)
• Store many single-linked lists of different lengths space-efficiently
• The number of java objects should be independent of the number of lists or number of items in the lists
• Every item should be a possible entry point into the lists for iterators, i.e. items should not be dependent on other items (e.g. no delta encoding)
• Append and read possible by multiple threads in a lock-free fashion (single append thread, multiple reader threads)
• Traversal in backwards order
Posting lists storage - Objectives
35
![Page 36: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/36.jpg)
Memory management
= 32K int[]
4 int[]pools
36
![Page 37: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/37.jpg)
Memory management
= 32K int[]
4 int[]pools
Each pool can be grown
individually by adding 32K
blocks
37
![Page 38: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/38.jpg)
Memory management
• For simplicity we can forget about the blocks for now and think of the pools as continuous, unbounded int[] arrays
• Small total number of Java objects (each 32K block is one object)
4 int[]pools
38
![Page 39: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/39.jpg)
Memory management
• Slices can be allocated in each pool
• Each pool has a different, but fixed slice size
21
24
27
211
slice size
39
![Page 40: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/40.jpg)
Adding and appending to a list
21
24
27
211
slice size
available
allocated
current list
40
![Page 41: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/41.jpg)
Adding and appending to a list
21
24
27
211
slice size
Store first twopostings in this slice
available
allocated
current list
41
![Page 42: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/42.jpg)
Adding and appending to a list
21
24
27
211
slice size
When first slice is full, allocate another one in second pool
available
allocated
current list
42
![Page 43: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/43.jpg)
Adding and appending to a list
21
24
27
211
slice size
available
allocated
current list
Allocate a slice on each level as list grows
43
![Page 44: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/44.jpg)
Adding and appending to a list
21
24
27
211
slice size
available
allocated
current list
On upper most level one list can own multiple slices
44
![Page 45: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/45.jpg)
Posting list format
int (32 bits)
docID24 bits
max. 16.7M
textPosition8 bits
max. 255
• Tweet text can only have 140 chars
• Decoding speed significantly improved compared to delta and VInt decoding (early experiments suggest 5x improvement compared to vanilla Lucene with FSDirectory)
45
![Page 46: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/46.jpg)
Addressing items
• Use 32 bit (int) pointers to address any item in any list unambiguously:
int (32 bits)
poolIndex2 bits0-3
offset in slice1-11 bits
depends on pool
sliceIndex19-29 bits
depends on pool
• Nice symmetry: Postings and address pointers both fit into a 32 bit int
46
![Page 47: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/47.jpg)
Linking the slices
21
24
27
211
slice size
available
allocated
current list
47
![Page 48: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/48.jpg)
Linking the slices
21
24
27
211
slice size
available
allocated
current list
Parallel arraysDictionary
pointer to the last posting indexed for a term
48
![Page 49: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/49.jpg)
Objectives
• Store large amount of linked lists of a priori unknown lengths efficiently in memory
• Allow traversal of lists in reverse order (backwards linked)
• Allow efficient garbage collection
49
![Page 50: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/50.jpg)
A Billion Queries Per Day
Agenda
- Introduction
- Posting list format and early query termination
- Index memory model
‣ Lock-free algorithms and data structures
- Roadmap
50
![Page 51: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/51.jpg)
Lock-free algorithms and data structures
51
![Page 52: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/52.jpg)
Objectives
• Support continuously high indexing rate and concurrent search rate
• Both should be independent, i.e. indexing rate should not decrease when search rate increases and vice versa.
52
![Page 53: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/53.jpg)
Definitions
• Pessimistic locking
• A thread holds an exclusive lock on a resource, while an action is performed [mutual exclusion]
• Usually used when conflicts are expected to be likely
• Optimistic locking
• Operations are tried to be performed atomically without holding a lock; conflicts can be detected; retry logic is often used in case of conflicts
• Usually used when conflicts are expected to be the exception
53
![Page 54: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/54.jpg)
Definitions
• Non-blocking algorithm
Ensures, that threads competing for shared resources do not have their execution indefinitely postponed by mutual exclusion.
• Lock-free algorithm
A non-blocking algorithm is lock-free if there is guaranteed system-wide progress.
• Wait-free algorithm
A non-blocking algorithm is wait-free, if there is guaranteed per-thread progress.
* Source: Wikipedia54
![Page 55: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/55.jpg)
Concurrency
• Having a single writer thread simplifies our problem: no locks have to be used to protect data structures from corruption (only one thread modifies data)
• But: we have to make sure that all readers always see a consistent state of all data structures -> this is much harder than it sounds!
• In Java, it is not guaranteed that one thread will see changes that another thread makes in program execution order, unless the same memory barrier is crossed by both threads -> safe publication
• Safe publication can be achieved in different, subtle ways. Read the great book “Java concurrency in practice” by Brian Goetz for more information!
55
![Page 56: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/56.jpg)
Java Memory Model
• Program order rule
Each action in a thread happens-before every action in that thread that comes later in the program order.
• Volatile variable rule
A write to a volatile field happens-before every subsequent read of that same field.
• Transitivity
If A happens-before B, and B happens-before C, then A happens-before C.
* Source: Brian Goetz: Java Concurrency in Practice56
![Page 57: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/57.jpg)
Concurrency
0RAM
int x;
Cache
Thread A Thread B
time
57
![Page 58: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/58.jpg)
Concurrency
0RAM
int x;
Cache 5
Thread A Thread B
x = 5;
Thread A writes x=5 to cache
time
58
![Page 59: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/59.jpg)
Concurrency
0RAM
int x;
Cache 5
Thread A Thread B
x = 5;
while(x != 5);time
This condition will likely never become false!
59
![Page 60: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/60.jpg)
Concurrency
0RAM
int x;
Cache
Thread A Thread B
time
60
![Page 61: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/61.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread A Thread B
time
volatile int b;
x = 5;5
Thread A writes b=1 to RAM, because b is volatile
b = 1;
61
![Page 62: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/62.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread A Thread B
time
volatile int b;
x = 5;5
Read volatile b
b = 1;
int dummy = b;
while(x != 5);
62
![Page 63: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/63.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread A Thread B
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
• Program order rule: Each action in a thread happens-before every action in that thread that comes later in the program order.
happens-before
63
![Page 64: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/64.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread A Thread B
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
happens-before
• Volatile variable rule: A write to a volatile field happens-before every subsequent read of that same field.
64
![Page 65: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/65.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread A Thread B
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
happens-before
• Transitivity: If A happens-before B, and B happens-before C, then A happens-before C.
65
![Page 66: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/66.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread A Thread B
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
This condition will be false, i.e. x==5
• Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
66
![Page 67: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/67.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread A Thread B
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
Memory barrier
• Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
67
![Page 68: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/68.jpg)
Demo
68
![Page 69: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/69.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread A Thread B
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
Memory barrier
• Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
69
![Page 70: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/70.jpg)
Concurrency
IndexWriter IndexReader
time
write 100 docs
maxDoc = 100
in IR.open(): read maxDoc
search upto maxDoc
maxDoc is volatile
write more docs
70
![Page 71: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/71.jpg)
Concurrency
IndexWriter IndexReader
time
write 100 docs
maxDoc = 100
in IR.open(): read maxDoc
search upto maxDoc
maxDoc is volatile
write more docs
happens-before
• Only maxDoc is volatile. All other fields that IW writes to and IR reads from don’t need to be!
71
![Page 72: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/72.jpg)
• Single writer/multi reader approach
• High performance can be achieved by avoiding many volatile/atomic fields
• Safe publication can be achieved by ensuring that memory barriers are crossed
• Opening an in-memory reader is extremely cheap - can be done for every query (at Twitter we open a billion IndexReaders per day!)
• Problem still unsolved: how to update term->postinglist pointers to ensure correct search results?
So far...
72
![Page 73: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/73.jpg)
Updating posting pointers
21
24
27
211
slice size
written after memory barrier was crossed
postingpointer
73
![Page 74: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/74.jpg)
Updating posting pointers
21
24
27
211
slice size
written after memory barrier was crossed
• Safe publication guarantees that everything after a memory barrier is crossed is visible to a thread
• But: a thread might already see a newer state even if the next memory barrier wasn’t crossed yet
updatedpointer
previouspointer
74
![Page 75: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/75.jpg)
Updating posting pointers
21
24
27
211
slice size
written after memory barrier was crossed
• This means: a reader might get a postings pointer addressing an int[] block that is not allocated yet!
• We have to handle this case carefully
updatedpointer
previouspointer
75
![Page 76: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/76.jpg)
Updating posting pointers
• Basic idea: keep two most recent pointers
• Writer toggles active pointer whenever memory barrier is crossed
• Reader can detect if a pointer is safe to follow
• Optimistic approach: Reader always considers latest pointer first - only if that one isn’t safe it falls back to previous pointer
76
![Page 77: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/77.jpg)
• Not a single exclusive lock
• Writer thread can always make progress
• Optimistic locking (retry-logic) in a few places for searcher thread
• Retry logic very simple and guaranteed to always make progress - one of the two available posting pointers is always safe to use
Wait-free
77
![Page 78: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/78.jpg)
Objectives
• Support continuously high indexing rate and concurrent search rate
• Both should be independent, i.e. indexing rate should not decrease when search rate increases and vice versa.
78
![Page 79: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/79.jpg)
A Billion Queries Per Day
Agenda
- Introduction
- Posting list format and early query termination
- Index memory model
- Lock-free algorithms and data structures
‣ Roadmap
79
![Page 80: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/80.jpg)
Roadmap
80
![Page 81: Lucene rev preso busch realtime search lr1010](https://reader033.fdocuments.in/reader033/viewer/2022060115/55795592d8b42ab6648b4a08/html5/thumbnails/81.jpg)
• LUCENE-2329: Parallel posting arrays
• LUCENE-2324: Per-thread DocumentsWriter and sequence IDs
• LUCENE-2346: Change in-memory postinglist format
• LUCENE-2312: Search on DocumentsWriters RAM buffer
• IndexReader, that can switch from RAM buffer to flushed segment on-the-fly
• Sorted term dictionary (wildcards, numeric queries)
• Stored fields, TermVectors, Payloads (Attributes)
Roadmap
81