Indexing delight --thinking cap of fractal-tree indexes

Indexing Delight: Thinking Cap of Fractal-tree Indexes. BohuTANG @ 2012/12, [email protected]


 

Transcript of Indexing delight --thinking cap of fractal-tree indexes

Page 1: Indexing delight --thinking cap of fractal-tree indexes

Indexing Delight: Thinking Cap of Fractal-tree Indexes

BohuTANG @ 2012/12, [email protected]

Page 2: Indexing delight --thinking cap of fractal-tree indexes

B-tree: invented in 1972, 40 years ago!

Page 3: Indexing delight --thinking cap of fractal-tree indexes

B-tree

[Figure: a B-tree. Block0 is the root, Block1-Block3 are its children, Block4 and Block5 sit below them. In the file on disk the blocks are scattered: ... Block0 ... Block3 ... Block5 ...]

Page 4: Indexing delight --thinking cap of fractal-tree indexes

B-tree Insert

[Figure: the same B-tree during "Insert x". Finding the target block requires a seek to Block0, then further seeks down the tree, because the blocks lie far apart in the file on disk.]

Page 5: Indexing delight --thinking cap of fractal-tree indexes

B-tree Insert

[Figure: the insert continues one level down; every level visited costs another random seek.]

Page 6: Indexing delight --thinking cap of fractal-tree indexes

B-tree Insert

[Figure: the insert finally reaches its leaf block, after one random seek per level.]

Insert one item causes many random seeks!

Page 7: Indexing delight --thinking cap of fractal-tree indexes

B-tree Search

[Figure: "Search x" walks from Block0 down to a leaf, one seek per level.]

Queries are fast: the I/O cost is O(log_B N).

Page 8: Indexing delight --thinking cap of fractal-tree indexes

B-tree Conclusions

● Search: O(log_B N) block transfers.
● Insert: O(log_B N) block transfers (slow).
● B-tree range queries are slow.
● IMPORTANT: parent and child blocks are scattered sparsely on disk.
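To put the O(log_B N) bound in perspective (my own illustrative numbers, not from the slides): with blocks that hold B = 1024 keys and N = 10^9 keys,

\[
\log_B N = \frac{\log_2 N}{\log_2 B} = \frac{\log_2 10^9}{\log_2 1024} \approx \frac{30}{10} = 3 \ \text{block transfers.}
\]

A search or insert touches only about three blocks, but because parent and child blocks are scattered on disk, each touch can cost a random seek of several milliseconds, so an insert-heavy workload quickly becomes seek-bound.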

Page 9: Indexing delight --thinking cap of fractal-tree indexes

A Simplified Fractal-tree: the Cache-Oblivious Lookahead Array (COLA), invented by researchers at MIT

Page 10: Indexing delight --thinking cap of fractal-tree indexes

COLA

[Figure: log_2 N levels of sorted arrays, each level twice the size of the one above.]

Binary search in one level costs O(log_2 N); with log_2 N levels, a full search costs O(log^2 N).

Page 11: Indexing delight --thinking cap of fractal-tree indexes

COLA (Using Fractional Cascading)

[Figure: the same log_2 N levels, now with fractional-cascading pointers between adjacent levels.]

● Search: O(log_2 N) block transfers.
● Insert: O((log_2 N)/B) amortized block transfers.
● Data is stored in log_2 N arrays of sizes 2, 4, 8, 16, ...
● Balanced Binary Search Tree

Page 12: Indexing delight --thinking cap of fractal-tree indexes

COLA Conclusions

● Search: O(log_2 N) block transfers (using Fractional Cascading).
● Insert: O((log_2 N)/B) amortized block transfers.
● Data is stored in log_2 N arrays of sizes 2, 4, 8, 16, ...
● Balanced Binary Search Tree
● Lookahead (prefetch), data-intensive!
● BUT the bottom level grows bigger and bigger, so merging becomes expensive.
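The insert path can be sketched in a few lines of C. This is my own in-memory illustration under simplifying assumptions (integer keys, no deletes, no fractional cascading), not TokuDB or nessDB code: an insert ripples a "carry" of merged arrays downward, like binary addition, which is where the amortized O((log_2 N)/B) insert bound comes from.

```c
/* Minimal in-memory COLA-style insertion (illustrative sketch only).
 * Level k holds exactly 2^k sorted keys when full, or is empty.
 * Inserting merges full levels downward, like a carry in binary addition. */
#include <stdio.h>
#include <stdlib.h>

#define MAX_LEVELS 32

typedef struct {
    int *keys;   /* sorted array of capacity 2^k */
    int  count;  /* 0 (empty) or 2^k (full)      */
} cola_level;

static cola_level levels[MAX_LEVELS];

/* Merge two sorted arrays a (na keys) and b (nb keys) into out. */
static void merge(const int *a, int na, const int *b, int nb, int *out)
{
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb) out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
}

static void cola_insert(int key)
{
    int  carry_n = 1;
    int *carry   = malloc(sizeof(int));
    carry[0] = key;

    for (int k = 0; k < MAX_LEVELS; k++) {
        if (levels[k].count == 0) {          /* empty level: drop the carry here */
            levels[k].keys  = carry;
            levels[k].count = carry_n;
            return;
        }
        /* Full level: merge it with the carry and keep rippling downward.
         * All writes are sequential, so B keys move per block transfer. */
        int *merged = malloc((size_t)(carry_n + levels[k].count) * sizeof(int));
        merge(carry, carry_n, levels[k].keys, levels[k].count, merged);
        free(carry);
        free(levels[k].keys);
        levels[k].keys  = NULL;
        levels[k].count = 0;
        carry    = merged;
        carry_n *= 2;
    }
}

int main(void)
{
    for (int i = 20; i > 0; i--) cola_insert(i);
    printf("level 2 starts with key %d\n", levels[2].keys[0]);
    return 0;
}
```

Each key is rewritten at most log_2 N times on its way down, and every rewrite is part of a long sequential merge, so the amortized cost per insert is O((log_2 N)/B) block transfers rather than one random seek per insert.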

Page 13: Indexing delight --thinking cap of fractal-tree indexes

COLA vs B-tree

● Search: (log_2 N)/(log_B N) = log_2 B times slower than a B-tree (in theory).
● Insert: (log_B N)/((log_2 N)/B) = B/(log_2 B) times faster than a B-tree (in theory).

If B = 4KB:
COLA search is 12 times slower than a B-tree.
COLA insert is 341 times faster than a B-tree.
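The 4KB numbers follow from plugging B = 4096 into the two ratios (treating B as 4096 entries per block, as the slide does):

\[
\log_2 B = \log_2 4096 = 12 \ \text{(search slowdown)},
\qquad
\frac{B}{\log_2 B} = \frac{4096}{12} \approx 341 \ \text{(insert speedup)}.
\]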

Page 14: Indexing delight --thinking cap of fractal-tree indexes

LSM-tree

Page 15: Indexing delight --thinking cap of fractal-tree indexes

LSM-tree

[Figure: an in-memory buffer on top of a hierarchy of on-disk levels, each level feeding the larger level below it.]

● Lazy insertion; data is sorted before it moves down.
● Level i is the buffer of Level i+1.
● Search: O(log_B N) * O(log N).
● Insert: O((log_B N)/B).
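The search cost above comes from probing every level separately. Here is a minimal, self-contained sketch of an LSM point lookup (my own illustration with toy integer keys, not any particular engine's code):

```c
/* Illustrative LSM point lookup (sketch only, not any particular engine).
 * Levels are independent sorted runs, newest first; without fractional
 * cascading every level must be probed, so a search costs
 * O(log N) levels x O(log_B N) per level. */
#include <stdio.h>

#define NUM_LEVELS 3

/* Toy levels: level 0 is newest (the in-memory buffer), level 2 is oldest. */
static const int level_keys[NUM_LEVELS][8] = {
    { 5, 17 },                         /* level 0: 2 keys */
    { 2, 9, 21, 40 },                  /* level 1: 4 keys */
    { 1, 3, 8, 13, 25, 33, 47, 60 },   /* level 2: 8 keys */
};
static const int level_size[NUM_LEVELS] = { 2, 4, 8 };

static int level_search(const int *keys, int n, int key)
{
    int lo = 0, hi = n - 1;            /* ordinary binary search in one run */
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (keys[mid] == key) return 1;
        if (keys[mid] < key)  lo = mid + 1; else hi = mid - 1;
    }
    return 0;
}

static int lsm_get(int key)
{
    /* Newer levels shadow older ones: probe from the top down and stop
     * at the first hit. */
    for (int i = 0; i < NUM_LEVELS; i++)
        if (level_search(level_keys[i], level_size[i], key))
            return 1;
    return 0;
}

int main(void)
{
    printf("21 present? %d\n", lsm_get(21));   /* 1: found in level 1 */
    printf("22 present? %d\n", lsm_get(22));   /* 0: in no level      */
    return 0;
}
```

Without fractional cascading there is no way to skip a level, hence the extra O(log N) factor on reads.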

Page 16: Indexing delight --thinking cap of fractal-tree indexes

LSM-tree (Using Fractional Cascading)

[Figure: the same levels, now linked by fractional-cascading pointers, with the top buffer in memory.]

● Search: O(log_B N) (using Fractional Cascading).
● Insert: O((log_B N)/B).
● A 0.618 Fractal-tree? But NOT cache-oblivious...

Page 17: Indexing delight --thinking cap of fractal-tree indexes

LSM-tree (Merging)

[Figure: the levels being merged downward ("merge, merge, merge") while the in-memory buffer sits idle ("Zzz...").]

A lot of I/O is wasted during merging! Like a headless fly, buzzing around aimlessly...

Page 18: Indexing delight --thinking cap of fractal-tree indexes

Fractal-tree Indexes: just fractal. Patented by Tokutek...

Page 19: Indexing delight --thinking cap of fractal-tree indexes

Fractal-tree Indexes

Search: O(log_B N)
Insert: O((log_B N)/B) (amortized)
Search is the same as a B-tree, but inserts are faster than a B-tree.

Page 20: Indexing delight --thinking cap of fractal-tree indexes

Fractal-tree Indexes (Block size)

B is 4MB...

[Figure: a tree of 4MB blocks.]

Page 21: Indexing delight --thinking cap of fractal-tree indexes

Fractal-tree Indexes (Block size)

B is 4MB...

[Figure: the same tree; one 4MB block has become full.]

Page 22: Indexing delight --thinking cap of fractal-tree indexes

Fractal-tree Indexes (Block size)

B is 4MB...

[Figure: the full 4MB block, shown again before it is broken up.]

Page 23: Indexing delight --thinking cap of fractal-tree indexes

Fractal-tree Indexes (Block size)

[Figure: the full block has been broken into smaller blocks that repeat the same pattern inside themselves, level after level.]

Fractal! 4MB in one seek...
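The fast-insert bound comes from buffering inside those 4MB blocks. Below is a toy model (my own illustration, not Tokutek's implementation) that only counts I/Os: each node buffers up to B messages, and a "seek" is paid only when a buffer overflows and a whole batch moves down one level at once.

```c
/* Toy model of fractal-tree message buffering (illustration only, not
 * Tokutek's code). Each node buffers up to B messages; a block write
 * ("one seek") happens only when a buffer overflows, and that single
 * write carries a whole batch of messages one level down. */
#include <stdio.h>
#include <stdlib.h>

#define B       1024   /* messages per 4MB-style block (illustrative number) */
#define FANOUT  4
#define HEIGHT  3

struct node {
    struct node *child[FANOUT];
    int buffered;                  /* pending messages in this node's buffer */
};

static long block_writes = 0;      /* each one stands for a large, single-seek I/O */

static struct node *make_tree(int depth)
{
    struct node *n = calloc(1, sizeof *n);
    if (depth > 0)
        for (int i = 0; i < FANOUT; i++)
            n->child[i] = make_tree(depth - 1);
    return n;
}

static void flush(struct node *n)
{
    if (!n->child[0]) {            /* leaf: messages are applied and vanish */
        n->buffered = 0;
        return;
    }
    /* One big write moves the whole batch one level down. In this toy all
     * messages go to one randomly chosen child; a real fractal tree
     * partitions them among the children by key. */
    block_writes++;
    struct node *ch = n->child[rand() % FANOUT];
    ch->buffered += n->buffered;
    n->buffered = 0;
    if (ch->buffered >= B)
        flush(ch);                 /* cascade if the child's buffer overflowed */
}

static void insert(struct node *root)
{
    root->buffered++;              /* cheap: no disk I/O for the insert itself */
    if (root->buffered >= B)
        flush(root);
}

int main(void)
{
    struct node *root = make_tree(HEIGHT);
    for (long i = 0; i < 1000000; i++)
        insert(root);
    printf("1000000 inserts -> %ld block writes\n", block_writes);
    return 0;
}
```

One million inserts trigger only a few thousand block writes, because every seek is shared by roughly B buffered messages.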

Page 24: Indexing delight --thinking cap of fractal-tree indexes

Bε-tree: just a constant factor on block fanout...

Page 25: Indexing delight --thinking cap of fractal-tree indexes

Bε-tree

[Figure: the insert/search tradeoff curve, with searches on one axis and inserts on the other, each running from slow to fast. A B-tree sits at the fast-search, slow-insert end; an append-only file (AOF) sits at the fast-insert, slow-search end; ε=1/2 lies between them on the optimal curve.]

Page 26: Indexing delight --thinking cap of fractal-tree indexes

Bε-tree

                 insert              search
B-tree (ε=1)     O(log_B N)          O(log_B N)
ε=1/2            O((log_B N)/√B)     O(log_B N)
ε=0              O((log N)/B)        O(log N)

If we want optimal point queries plus very fast inserts, we should choose ε=1/2.

Page 27: Indexing delight --thinking cap of fractal-tree indexes

Bε-tree

So, if block size is B, the fanout should be √B
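Why √B: with fanout B^ε each node still fits in one block, so the rest of the block holds buffered messages, and a flush moves about B^(1-ε) of them per I/O. The standard Bε-tree accounting, sketched rather than proved:

\[
\text{height} = \log_{B^{\varepsilon}} N = \frac{1}{\varepsilon}\,\log_B N
\;\Rightarrow\;
\text{search} = O\!\left(\frac{1}{\varepsilon}\log_B N\right),
\qquad
\text{insert} = O\!\left(\frac{\log_B N}{\varepsilon\,B^{1-\varepsilon}}\right).
\]
\[
\varepsilon = \tfrac{1}{2}:\quad
\text{search} = O(\log_B N),
\qquad
\text{insert} = O\!\left(\frac{\log_B N}{\sqrt{B}}\right),
\qquad
\text{fanout} = B^{1/2} = \sqrt{B}.
\]

Setting ε = 1 recovers the B-tree row of the table, and ε → 0 the log-structured extreme.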

Page 28: Indexing delight --thinking cap of fractal-tree indexes

Cache-Oblivious Data Structures: all of the above are just cache-oblivious data structures...

Page 29: Indexing delight --thinking cap of fractal-tree indexes

Cache Oblivious Data Structure

Question: reading a sequence of k consecutive blocks at once is not much more expensive than reading a single block. How can we take advantage of this?

Page 30: Indexing delight --thinking cap of fractal-tree indexes

Cache Oblivious Data Structure

My questions (originally in Chinese):

Q1: With only 1MB of memory, how do you merge two 64MB sorted files into one sorted file?

Q2: On most mechanical disks, reading several consecutive blocks costs about the same as reading a single block. How can you exploit that in Q1?
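One possible answer, sketched (my own illustration; it assumes the files hold fixed-size 8-byte keys and are named a.sorted and b.sorted): stream-merge the two files record by record, but give each stream a large buffer so every physical read and write covers a long run of consecutive blocks.

```c
/* Sketch of an answer to Q1/Q2: merge two sorted files with ~1MB of memory.
 * The merge keeps only one record per input in memory; the memory budget is
 * spent on three large stdio buffers, so each disk access is a long
 * sequential run of blocks rather than a single block. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define BUF_SZ (320 * 1024)   /* 3 x 320KB buffers < 1MB of memory */

int main(void)
{
    FILE *a = fopen("a.sorted", "rb");
    FILE *b = fopen("b.sorted", "rb");
    FILE *o = fopen("merged", "wb");
    if (!a || !b || !o) { perror("fopen"); return 1; }

    /* Large buffers: every physical read/write covers many consecutive blocks. */
    setvbuf(a, malloc(BUF_SZ), _IOFBF, BUF_SZ);
    setvbuf(b, malloc(BUF_SZ), _IOFBF, BUF_SZ);
    setvbuf(o, malloc(BUF_SZ), _IOFBF, BUF_SZ);

    uint64_t ka, kb;
    int have_a = fread(&ka, sizeof ka, 1, a) == 1;
    int have_b = fread(&kb, sizeof kb, 1, b) == 1;

    while (have_a || have_b) {
        if (have_a && (!have_b || ka <= kb)) {
            fwrite(&ka, sizeof ka, 1, o);
            have_a = fread(&ka, sizeof ka, 1, a) == 1;
        } else {
            fwrite(&kb, sizeof kb, 1, o);
            have_b = fread(&kb, sizeof kb, 1, b) == 1;
        }
    }
    fclose(a); fclose(b); fclose(o);
    return 0;
}
```

The merge never needs more than one record from each input at a time; the 1MB is spent entirely on the three big buffers, which is exactly what Q2's cheap sequential reads reward.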

Page 31: Indexing delight --thinking cap of fractal-tree indexes

nessDB
https://github.com/shuttler/nessDB
You should agree that the VFS page cache does a better job than your own cache!

Page 32: Indexing delight --thinking cap of fractal-tree indexes

nessDB

[Figure: nessDB's blocks arranged fractally, like the structure above.]

Each Block is a Small-Splittable Tree.

Page 33: Indexing delight --thinking cap of fractal-tree indexes

nessDB, What's going on?

[Figure: the same fractal arrangement of small blocks, spread out level by level.]

From the line to the plane...

Page 34: Indexing delight --thinking cap of fractal-tree indexes

Thanks! Most of the references are from: Tokutek, MIT CSAIL & Stony Brook.

Drafted By BohuTANG using Google Drive, @2012/12/12