Highload2o013 osipv

Popular Algorithms for Storing Data on Disk. Konstantin Osipov, [email protected], October 28th, 2013


Page 1: Highload2o013 osipv

Popular algorithms for storing data on disk

Konstantin Osipov, [email protected], October 28th, 2013

Page 2: Highload2o013 osipv

Incident at Map Grid 36-80

• B-tree – most popular disk-based data structure

• B-tree balances INSERT, UPDATE and SELECT speed

• DELETEs can be slow

Page 3: Highload2o013 osipv

The DBMS is fast; you just need to know how to tune it

Page 4: Highload2o013 osipv

B-tree: internals

Page 5: Highload2o013 osipv

What does cache-oblivious mean?

Page 6: Highload2o013 osipv

What does cache-oblivious mean? (2)

BLOCK-MULT(A, B, C, n):
    for i = 1 to n/s do
        for j = 1 to n/s do
            for k = 1 to n/s do
                ORD-MULT(A_ik, B_kj, C_ij, s)
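The BLOCK-MULT pseudocode above can be sketched in Python as follows. This is a minimal illustration, assuming square n x n matrices stored as lists of lists and a block size s that divides n. The helper name ord_mult and its offset arguments are choices made for this sketch, not part of the slide. Note that s is a tuning parameter that has to be picked to fit the cache; a cache-oblivious variant would avoid such a parameter, typically by recursive subdivision.

```python
def ord_mult(A, B, C, i0, j0, k0, s):
    """Ordinary triple-loop multiply of one s x s block:
    C[i0.., j0..] += A[i0.., k0..] * B[k0.., j0..]."""
    for i in range(i0, i0 + s):
        for j in range(j0, j0 + s):
            acc = C[i][j]
            for k in range(k0, k0 + s):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc


def block_mult(A, B, C, n, s):
    """Blocked multiply C += A * B for n x n matrices (assumes s divides n)."""
    if n % s != 0:
        raise ValueError("block size s must divide n")
    for i0 in range(0, n, s):          # block row of C
        for j0 in range(0, n, s):      # block column of C
            for k0 in range(0, n, s):  # blocks along the shared dimension
                ord_mult(A, B, C, i0, j0, k0, s)
    return C


n = 4
A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
zeros = [[0] * n for _ in range(n)]
product = block_mult(A, identity, zeros, n, 2)  # A times the identity is A
```

Each call to ord_mult touches only three s x s blocks, which is why choosing s so that those blocks fit in cache reduces cache misses.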

Page 7: Highload2o013 osipv

LSM-tree: architecture

Page 8: Highload2o013 osipv

LSM-tree: architecture (2)

Page 9: Highload2o013 osipv

LevelDB: architecture

Page 10: Highload2o013 osipv

LevelDB: insert RPS

Page 11: Highload2o013 osipv

LSM-tree: use cases

● Data with varying degrees of relevance

– Message feeds

– Social network walls

– Chats

– Events

● Data segregation

– Data in the LSM-tree, index in memory

Page 12: Highload2o013 osipv

COLA: architecture

O(log_B(N)) vs. O(log_B(N) / B)
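To get a feel for the gap between the two bounds, here is a small worked calculation. The slide does not label the two expressions or give concrete numbers, so the values of N and B below are assumptions picked only for illustration, and the constants hidden by the O-notation are ignored.

```python
import math


def log_base(n, b):
    """Logarithm of n in base b."""
    return math.log(n) / math.log(b)


N = 10**9   # assumed number of keys
B = 1024    # assumed number of keys per disk block

first_bound = log_base(N, B)        # O(log_B(N))
second_bound = log_base(N, B) / B   # O(log_B(N) / B)

print(f"log_B(N)     ~ {first_bound:.3f}")
print(f"log_B(N) / B ~ {second_bound:.5f}")
```

With these assumed values the first bound is roughly 3 and the second roughly 0.003, so the second is smaller by exactly the factor B.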

Page 13: Highload2o013 osipv

WAL:

Memory

Disk

Self-Balancing Tree

PUT(37), PUT(16)

Page 14: Highload2o013 osipv

16 37

WAL: 37, 16

Memory

Disk

Self-Balancing Tree

Page 15: Highload2o013 osipv

7 41

WAL: 41, 7, 37, 16

Memory

Disk

16 37

Self-Balancing Tree

Sorted String Table

Page 16: Highload2o013 osipv

WAL: 41, 7, 37, 16

Memory

Disk

7 16 37 41

7 37

Page 17: Highload2o013 osipv

10 28

WAL: 10, 28, 41, 7, 37, 16

Memory

Disk

7 16 37 41

7 37

Page 18: Highload2o013 osipv

WAL: 10, 28, 41, 7, 37, 16

Memory

Disk

7 16 37 41

10 28

Page 19: Highload2o013 osipv

2 47

WAL: 47, 2, 10, 28, 41, 7, 37, 16

Memory

Disk

7 16 37 41

10 28

Page 20: Highload2o013 osipv

WAL: 47, 2, 10, 28, 41, 7, 37, 16

Memory

Disk

2 7 10 16 28 37 41 47

2 10 28 41

2 28

Page 21: Highload2o013 osipv

6 49

WAL: 49, 6, 47, 2, 10, 28, 41, 7, 37, 16

Memory

Disk

2 7 10 16 28 37 41 47

2 10 28 41

2 28

Page 22: Highload2o013 osipv

WAL: 49, 6, 47, 2, 10, 28, 41, 7, 37, 16

Memory

Disk

2 7 10 16 28 37 41 47

2 10 28 41

6 49

Page 23: Highload2o013 osipv

23 32

WAL: 32, 23, 49, 6, 47, 2, 10, 28, 41, 7, 37, 16

Memory

Disk

2 7 10 16 28 37 41 47

2 10 28 41

6 49

Page 24: Highload2o013 osipv

WAL: 32, 23, 49, 6, 47, 2, 10, 28, 41, 7, 37, 16

Memory

Disk

2 7 10 16 28 37 41 47

6 23 32 49

6 32

Page 25: Highload2o013 osipv

30 45

WAL: 30, 45, 32, 23, 49, 6, 47, 2, 10, 28, 41, 7, 37, 16

Memory

Disk

2 7 10 16 28 37 41 47

6 23 32 49

6 32

Page 26: Highload2o013 osipv

14 38

WAL: 38, 14, 30, 45, 32, 23, 49, 6, 47, 2, 10, 28, 41, 7, 37, 16

Memory

Disk

2 7 10 16 28 37 41 47

6 23 32 49

30 45

Page 27: Highload2o013 osipv

6 10

WAL: 10, 6, 38, 14, 45, 30, 45, 32, 23, 49, 6, 47, 2, 10, 28, 41, 7, 37, 16

Memory

Disk

2 6 7 10 14 16 23 28 30 32 37 38 41 45 47 49

2 7 14 23 30 37 41 47

2 14 30 41

2 30
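The PUT walkthrough on the preceding slides can be modelled with a toy structure. This is a sketch under stated assumptions, not the actual COLA implementation: the class name TinyCola, the two-key buffer limit, and the use of plain Python lists for the WAL, the in-memory tree, and the on-disk runs are all hypothetical. Lookahead pointers (the sampled rows such as "7 37" on the slides) are left out, and duplicate keys are simply collapsed during a merge.

```python
import heapq

MEMTABLE_LIMIT = 2  # the slides flush after two keys; real systems buffer far more


class TinyCola:
    """Toy model of the slides: a WAL, a small buffer, and sorted runs on 'disk'."""

    def __init__(self):
        self.wal = []       # log of every PUT, oldest first
        self.memtable = []  # stands in for the self-balancing tree
        self.levels = []    # levels[k] is a sorted run or None (empty level)

    def put(self, key):
        self.wal.append(key)  # log first, so the buffer can be rebuilt after a crash
        if key not in self.memtable:
            self.memtable.append(key)
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self._flush()

    def _flush(self):
        run = sorted(self.memtable)
        self.memtable = []
        k = 0
        # Cascade like a binary counter: keep merging while the level is occupied.
        while k < len(self.levels) and self.levels[k] is not None:
            run = self._merge(run, self.levels[k])
            self.levels[k] = None
            k += 1
        if k == len(self.levels):
            self.levels.append(None)
        self.levels[k] = run

    @staticmethod
    def _merge(a, b):
        merged = []
        for key in heapq.merge(a, b):  # one sequential pass over two sorted runs
            if not merged or merged[-1] != key:
                merged.append(key)     # collapse duplicate keys
        return merged


db = TinyCola()
for key in (37, 16, 41, 7, 28, 10, 2, 47):
    db.put(key)
print(db.levels)  # [None, None, [2, 7, 10, 16, 28, 37, 41, 47]]
```

After these eight PUTs the two smaller levels are empty and the third holds all eight keys in sorted order, the same run the slides show after the insertion of 2 and 47. Every merge reads and writes whole runs sequentially, which is where the write efficiency of this family of structures comes from.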

Page 28: Highload2o013 osipv

WAL: 37, 22, 36, 10, 25, 42, 10, 6, 38, 14, 45, 30, 45, 32, 23, 49, 6, 47, 2, 10, 28, 41, 7, 37, 16

Memory

Disk

2 6 7 10 14 16 23 28 30 32 37 38 41 45 47 49

3 8 15 26 35 40 45 48

10 25 36 42

22 37

Page 29: Highload2o013 osipv

WAL: 37, 22, 36, 10, 25, 42, 10, 6, 38, 14, 45, 30, 45, 32, 23, 49, 6, 47, 2, 10, 28, 41, 7, 37, 16

Memory

Disk

2 6 7 10 14 16 23 28 30 32 37 38 41 45 47 49

3 8 15 26 35 40 45 48

10 25 36 42

22 37

GET(16)

Page 30: Highload2o013 osipv

WAL: 37, 22, 36, 10, 25, 42, 10, 6, 38, 14, 45, 30, 45, 32, 23, 49, 6, 47, 2, 10, 28, 41, 7, 37, 16

Memory

Disk

2 6 7 10 14 16 23 28 30 32 37 38 41 45 47 49

3 8 15 26 35 40 45 48

10 25 36 42

22 37

GET(16)
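A lookup such as GET(16) can be sketched as a search through the runs from the smallest to the largest. The sketch below treats each row on the slide as a plain sorted run and does an independent binary search in each one; that is an assumption made for simplicity. A real COLA would follow lookahead pointers from one level to the next instead of restarting the search, and would check the in-memory tree first. The function name get is hypothetical.

```python
from bisect import bisect_left

# Sorted runs as shown on the slide, smallest level first.
levels = [
    [22, 37],
    [10, 25, 36, 42],
    [3, 8, 15, 26, 35, 40, 45, 48],
    [2, 6, 7, 10, 14, 16, 23, 28, 30, 32, 37, 38, 41, 45, 47, 49],
]


def get(levels, key):
    """Return the index of the first level that contains key, or None if absent."""
    for depth, run in enumerate(levels):
        pos = bisect_left(run, key)  # binary search inside one sorted run
        if pos < len(run) and run[pos] == key:
            return depth
    return None


print(get(levels, 16))  # 3: found only in the largest run
print(get(levels, 37))  # 0: the smallest run is checked first
```

Searching the smaller runs first means that, when a key appears in several runs, the copy in the smallest run wins; under the merge scheme shown on the slides that is the most recently written one.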

Page 31: Highload2o013 osipv

BitCask: AOF (append-only file) architecture

Page 32: Highload2o013 osipv

BitCask: keydir architecture
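The two BitCask slides name an append-only file and a keydir. A minimal sketch of how the two fit together, under stated assumptions: the class TinyBitcask, the 8-byte record header, and the in-memory BytesIO buffer standing in for the data file are all hypothetical, and the layout is deliberately simpler than the one in the Bitcask paper, which also stores fields such as a CRC and a timestamp.

```python
import io
import struct

HEADER = struct.Struct(">II")  # key length, value length (hypothetical layout)


class TinyBitcask:
    def __init__(self):
        self.log = io.BytesIO()  # in-memory stand-in for the append-only file
        self.keydir = {}         # key -> (offset of the value, size of the value)

    def put(self, key: bytes, value: bytes) -> None:
        self.log.seek(0, io.SEEK_END)  # writes only ever go to the end
        start = self.log.tell()
        self.log.write(HEADER.pack(len(key), len(value)))
        self.log.write(key)
        self.log.write(value)
        # Point the keydir at the newest copy; older copies become garbage.
        self.keydir[key] = (start + HEADER.size + len(key), len(value))

    def get(self, key: bytes):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, size = entry
        self.log.seek(offset)  # one seek, one read
        return self.log.read(size)


db = TinyBitcask()
db.put(b"a", b"1")
db.put(b"b", b"22")
db.put(b"a", b"333")  # overwrite: appended, not updated in place
print(db.get(b"a"))   # b'333'
```

The design trade-off is visible even in this sketch: writes are purely sequential and a read costs a single seek, but every key has to fit in memory and stale records stay in the file until some compaction step removes them.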

Page 33: Highload2o013 osipv

Key-value index

Disk

10, 25

Memory

15 26 40 84

Page index

Disk

26, 31

10, 15, 16, 25

39, 40, 84, 85

split

26, 31

86, 96

39, 85

86, 96

Sophia: architecture

Page 34: Highload2o013 osipv

Key-value index

Disk

39, 16, 85, 96

Memory

Insert

Page index

WAL

Page 35: Highload2o013 osipv

Key-value index

Disk

Memory

Insert

16 39 85 96

Page index

Disk

Page 36: Highload2o013 osipv

Key-value index

Disk

16, 96

Memory

Page index

Disk

16, 39, 85, 96

31, 25, 10, 86

Insert

WAL

Page 37: Highload2o013 osipv

Key-value index

Disk

16, 96

Memory

10 25 31 86

Page index

Disk

16, 39, 85, 96

merge

Page 38: Highload2o013 osipv

Key-value index

Disk

10, 31

Memory

10 25 31 86

Page index

Disk

split

39, 96

10, 16, 25, 31

39, 85, 86, 96

Page 39: Highload2o013 osipv

Key-value index

Disk

10, 31

Memory

Page index

Disk

39, 96

10, 16, 25, 31

39, 85, 86, 96

15, 26, 40, 84

Insert

WAL

Page 40: Highload2o013 osipv

Key-value index

Disk

Memory

15 26 40 84

Page index

Disk

10, 16, 25, 31

39, 85, 86, 96

10, 31

39, 96

merge

Page 41: Highload2o013 osipv

Key-value index

Disk

10, 25

Memory

15 26 40 84

Page index

Disk

26, 31

10, 15, 16, 25

39, 40, 84, 85

split

26, 31

86, 96

39, 85

86, 96
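The insert, merge, and split steps on the Sophia slides can be replayed with a short sketch. It is an interpretation of the diagrams, not Sophia's actual code: the function names merge_and_split and page_index are hypothetical, pages are plain sorted lists of at most four keys as drawn on the slides, the page index is taken to be the (first key, last key) pair of each page, and the WAL is omitted.

```python
from bisect import bisect_right

PAGE_SIZE = 4  # keys per page, as drawn on the slides


def merge_and_split(pages, memory_keys):
    """Merge in-memory keys into sorted disk pages, then split oversized pages."""
    if not pages:
        pages = [[]]
    mins = [page[0] for page in pages if page]  # first key of each page
    grown = [list(page) for page in pages]
    for key in sorted(memory_keys):
        # Route to the last page whose first key is <= key (page 0 otherwise).
        target = max(bisect_right(mins, key) - 1, 0)
        if key not in grown[target]:
            grown[target].append(key)
    result = []
    for page in grown:
        page.sort()
        for start in range(0, len(page), PAGE_SIZE):
            result.append(page[start:start + PAGE_SIZE])  # split into full pages
    return result


def page_index(pages):
    """In-memory index: the (first key, last key) range of every page."""
    return [(page[0], page[-1]) for page in pages]


pages = merge_and_split([], [39, 16, 85, 96])
pages = merge_and_split(pages, [31, 25, 10, 86])
pages = merge_and_split(pages, [15, 26, 40, 84])
print(pages)              # [[10, 15, 16, 25], [26, 31], [39, 40, 84, 85], [86, 96]]
print(page_index(pages))  # [(10, 25), (26, 31), (39, 85), (86, 96)]
```

The three calls correspond to the three batches of inserts on the slides, and the final pages and index ranges match the last diagram. Because the page index stays in memory, a lookup needs at most one page read from disk.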

Page 42: Highload2o013 osipv

?

Epilogue: choose your db wisely

Page 43: Highload2o013 osipv

Links

● Bitcask: A Log-Structured Hash Table for Fast Key/Value Data, Justin Sheehy, David Smith, with inspiration from Eric Brewer

● The Log-Structured Merge-Tree (LSM-Tree), Patrick O'Neil, Edward Cheng, Dieter Gawlick, Elizabeth O'Neil

● Cache-Oblivious Algorithms, Harald Prokop (Master's thesis)

● Space/Time Trade-offs in Hash Coding with Allowable Errors, Burton H. Bloom

● Data Structures and Algorithms for Big Databases (XLDB tutorial), Michael A. Bender (Stony Brook & Tokutek), Bradley C. Kuszmaul

● http://github.com/pmwkaa/sophia, http://sphia.org

● http://codecapsule.com/2012/12/30/implementing-a-key-value-store-part-3-comparative-analysis-of-the-architectures-of-kyoto-cabinet-and-leveldb/

● http://stackoverflow.com/questions/6079890/cache-oblivious-lookahead-array

● http://www.youtube.com/watch?v=88NaRUdoWZM (Tim Callaghan: Fractal Tree indexes)

● http://code.google.com/p/leveldb/downloads/list