External Memory Data Structures

External Memory Data Structures Srinivasa Rao Satti Workshop on Recent Advances in Data Structures December 20, 2011

Transcript of External Memory Data Structures

Page 1: External Memory Data Structures

External Memory Data Structures

Srinivasa Rao Satti

Workshop on Recent Advances in Data Structures

December 20, 2011

Page 2: External Memory Data Structures

Fundamental Algorithmic Problems

• Searching: Given a list (sequence) L of elements x1, x2, .., xn and query element x, check whether x is present in L.

– When L is not sorted, we use linear search – scan the list to check if x is present in it.

– When L is sorted, we use binary search – divide the remaining list to be searched in half with every comparison.

Also insert and delete elements to/from L.

• Sorting: Given a sequence of elements, sort them in increasing (or decreasing) order.

– Insertion sort, bubble sort, quick sort, merge sort
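
The two search strategies above can be sketched as follows. This is a minimal in-memory illustration; the function names are hypothetical and not from the slides.

```python
from bisect import bisect_left

def linear_search(items, x):
    """Scan the list front to back; works on unsorted input."""
    for item in items:
        if item == x:
            return True
    return False

def binary_search(sorted_items, x):
    """Halve the remaining range with every comparison; input must be sorted."""
    i = bisect_left(sorted_items, x)
    return i < len(sorted_items) and sorted_items[i] == x

print(linear_search([5, 3, 9], 9))     # True
print(binary_search([1, 3, 5, 7], 4))  # False
```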

Page 3: External Memory Data Structures

Random Access Machine (RAM) Model

• Standard theoretical model of computation:

– Infinite memory

– Uniform access cost

• Unit-cost RAM model: All the basic operations (reading/writing a location from/to the memory, standard arithmetic and Boolean operations) take one unit of time.

• Simple model crucial for success of computer industry.

Page 4: External Memory Data Structures

Hierarchical Memory

• Modern machines have complicated memory hierarchy

– Levels get larger and slower further away from CPU

– Data moved between levels using large blocks

[Figure: memory hierarchy with L1 cache, L2 cache, and RAM]

Page 5: External Memory Data Structures

Hard disk drive

Page 6: External Memory Data Structures

Slow I/O

• Disk access is about $10^6$ times slower than main memory access

– Disk systems try to amortize the large access time by transferring large contiguous blocks of data (8-16 KB)

• Important to store/access data to take advantage of blocks (locality)

[Figure: hard disk with labels: track, magnetic surface, read/write arm, read/write head]

“The difference in speed between modern CPU and disk technologies is analogous to the difference in speed in sharpening a pencil using a sharpener on one’s desk or by taking an airplane to the other side of the world and using a sharpener on someone else’s desk.” (D. Comer)

Page 7: External Memory Data Structures

External Memory Model [Aggarwal-Vitter 1988]

N = # of items in the problem instance

B = # of items per disk block

M = # of items that fit in main memory

T = # of items in output

I/O: Move block between memory and disk

Performance measures:

Space: # of disk blocks used by the structure

Time: # of I/Os performed by the algorithm

(CPU time is “free”)

[Figure: processor P, main memory M, and disk D, connected by block I/O]

Page 8: External Memory Data Structures

Scalability Problems: Block Access Matters

• Example: Traversing linked list

– Array size N = 10 elements

– Disk block size B = 2 elements

– Main memory size M = 4 elements (2 blocks)

[Figure: two disk layouts of the 10-element list. Algorithm 1: N = 10 I/Os. Algorithm 2: N/B = 5 I/Os.]

• Large difference between N and N/B since block size is large

– Example: N = $256 \times 10^6$, B = 8000, 1 ms disk access time

N I/Os take $256 \times 10^3$ sec = 4266 min = 71 hr

N/B I/Os take 256/8 sec = 32 sec
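
The gap between N and N/B I/Os can be reproduced with a small simulation. This is a sketch under stated assumptions: the two layouts are hypothetical (they are not the ones drawn on the slide), and main memory is modelled as an LRU cache of blocks, which the slides do not specify.

```python
from collections import OrderedDict

def count_ios(layout, B, mem_blocks):
    """Count block reads when visiting elements 1..N in order.

    layout[i] is the element stored at disk position i; position i lies in
    block i // B. Main memory holds mem_blocks blocks with LRU eviction
    (the eviction policy is an assumption of this sketch).
    """
    position = {elem: i for i, elem in enumerate(layout)}
    cache = OrderedDict()  # block id -> None, least recently used first
    ios = 0
    for elem in range(1, len(layout) + 1):
        block = position[elem] // B
        if block in cache:
            cache.move_to_end(block)
        else:
            ios += 1
            cache[block] = None
            if len(cache) > mem_blocks:
                cache.popitem(last=False)
    return ios

sequential = list(range(1, 11))               # nodes stored in traversal order
scattered = [1, 6, 2, 7, 3, 8, 4, 9, 5, 10]   # hypothetical bad layout

print(count_ios(sequential, B=2, mem_blocks=2))  # 5, i.e. N/B
print(count_ios(scattered, B=2, mem_blocks=2))   # 10, i.e. N

# The large-scale example from the slide, at 1 ms per I/O:
N, B = 256 * 10**6, 8000
print(N / 1000 / 3600)   # hours for N I/Os, roughly 71
print(N // B / 1000)     # seconds for N/B I/Os, 32.0
```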

Page 9: External Memory Data Structures

Queues and Stacks

• Queue:

– Maintain push and pop blocks in main memory

– O(1/B) I/Os per push/pop operation (amortized)

• Stack:

– Maintain push/pop blocks in main memory

– O(1/B) I/Os per push/pop operation (amortized)

[Figure: queue with push and pop ends]
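
A minimal sketch of the stack idea, assuming the disk is modelled as a Python list of blocks and the in-memory buffer holds up to 2B elements. The class name, the 2B buffer size, and the `ios` counter are illustrative choices, not taken from the slides.

```python
class ExternalStack:
    """Stack keeping at most 2*B elements in memory; the rest sits in blocks on 'disk'."""

    def __init__(self, B):
        self.B = B
        self.buffer = []   # in-memory top of the stack
        self.disk = []     # full blocks, each a list of B elements
        self.ios = 0       # number of block transfers so far

    def push(self, x):
        if len(self.buffer) == 2 * self.B:
            # Buffer full: write the B oldest buffered elements as one block.
            self.disk.append(self.buffer[:self.B])
            self.buffer = self.buffer[self.B:]
            self.ios += 1
        self.buffer.append(x)

    def pop(self):
        if not self.buffer:
            if not self.disk:
                raise IndexError("pop from empty stack")
            # Buffer empty: read the most recently written block back.
            self.buffer = self.disk.pop()
            self.ios += 1
        return self.buffer.pop()

s = ExternalStack(B=4)
for i in range(100):
    s.push(i)
popped = [s.pop() for _ in range(100)]
print(popped[:3], s.ios)
```

After every block transfer the buffer holds exactly B elements, so at least B further operations happen before the next transfer. That is where the amortized O(1/B) bound comes from.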

Page 10: External Memory Data Structures

Fundamental Bounds

• Scanning: internal $N$; external $\frac{N}{B}$

• Sorting: internal $N \log N$; external $\frac{N}{B} \log_{M/B} \frac{N}{B}$

• Searching: internal $\log_2 N$; external $\log_B N$

• Note:

– Linear I/O: O(N/B)

– B factor VERY important

– Cannot sort optimally with search tree
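
To get a feel for these bounds, the sketch below evaluates them for the N and B used in the earlier example. The value of M is an assumption made for illustration, and constant factors are ignored.

```python
import math

def scan_ios(N, B):
    """Linear I/O: N/B block transfers."""
    return math.ceil(N / B)

def sort_ios(N, B, M):
    """External sorting bound (N/B) * log_{M/B}(N/B), constants ignored."""
    return (N / B) * math.log(N / B, M / B)

def search_ios(N, B):
    """External searching bound log_B N, constants ignored."""
    return math.log(N, B)

N, B = 256 * 10**6, 8000
M = 64 * 10**6   # assumed main memory size in items, not from the slides

print(scan_ios(N, B))          # 32000
print(round(sort_ios(N, B, M)))
print(round(search_ios(N, B), 2), "vs", round(math.log2(N), 2))
```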

Page 11: External Memory Data Structures

Search trees: API

• Given a set S of keys, support the operations:

– search(x) : return TRUE if x is in S, and FALSE otherwise

– insert(x) : insert x into S (error if x is already in S)

– delete(x) : delete x from S (error if x is not in S)

– rangesearch(x,y) : return all the keys z such that x ≤ z ≤ y
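
As a reference point for the interface above, here is a minimal in-memory sketch backed by a sorted Python list. The class name `SortedSet` is hypothetical, and this is not an external-memory structure; it only pins down the expected behaviour of the four operations.

```python
from bisect import bisect_left, bisect_right, insort

class SortedSet:
    """In-memory reference for the search-tree interface, backed by a sorted list."""

    def __init__(self):
        self._keys = []

    def search(self, x):
        i = bisect_left(self._keys, x)
        return i < len(self._keys) and self._keys[i] == x

    def insert(self, x):
        if self.search(x):
            raise KeyError(f"{x!r} is already in S")
        insort(self._keys, x)

    def delete(self, x):
        if not self.search(x):
            raise KeyError(f"{x!r} is not in S")
        del self._keys[bisect_left(self._keys, x)]

    def rangesearch(self, x, y):
        """Return all keys z with x <= z <= y, in sorted order."""
        return self._keys[bisect_left(self._keys, x):bisect_right(self._keys, y)]

S = SortedSet()
for key in (5, 1, 9, 3):
    S.insert(key)
print(S.rangesearch(2, 9))   # [3, 5, 9]
```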

Page 12: External Memory Data Structures

Binary Search Trees

• Binary search tree:

– Standard method for search among N elements

– We assume elements in leaves

– Search traces a root-to-leaf path

– If nodes are stored arbitrarily on disk:

Search in $O(\log_2 N)$ I/Os

Rangesearch in $O(\log_2 N + T)$ I/Os

[Figure: binary search tree of height $\Theta(\log_2 N)$]

Page 13: External Memory Data Structures

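External Search Trees

• BFS blocking:

– Block height $O(\log_2 N) / O(\log_2 B) = O(\log_B N)$

– Output elements blocked

Rangesearch in $O(\log_B N + \frac{T}{B})$ I/Os

• Optimal: O(N/B) space and $O(\log_B N + \frac{T}{B})$ query

[Figure: tree partitioned into blocks of height $\Theta(\log_2 B)$, each holding $\Theta(B)$ nodes]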

Page 14: External Memory Data Structures

External Search Trees

• Maintaining BFS blocking during updates?

– Balance is normally maintained in search trees using rotations

[Figure: rotation exchanging nodes x and y]

• Seems very difficult to maintain BFS blocking during rotation

– Also need to make sure output (leaves) is blocked!

Page 15: External Memory Data Structures

B-trees

• BFS-blocking naturally corresponds to tree with fan-out $\Theta(B)$

• B-trees balanced by allowing node degree to vary

– Rebalancing performed by splitting and merging nodes

Page 16: External Memory Data Structures

(a,b)-tree

• T is an (a,b)-tree (a ≥ 2 and b ≥ 2a-1)

– All leaves on the same level and contain between a and b elements

– Except for the root, all nodes have degree between a and b

– Root has degree between 2 and b

• (a,b)-tree uses linear space and has height $O(\log_a N)$

Choosing a,b = $\Theta(B)$, each node/leaf is stored in one disk block

O(N/B) space and $O(\log_B N + \frac{T}{B})$ query

[Figure: example tree]

Page 17: External Memory Data Structures

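(a,b)-Tree Insert

• Insert:

Search and insert element in leaf v

DO { if v has b+1 elements/children

Split v:

make nodes v’ and v’’ with $\lfloor \frac{b+1}{2} \rfloor \le b$ and $\lceil \frac{b+1}{2} \rceil \ge a$ elements

insert element (ref) in parent(v)

(make new root if necessary)

v=parent(v) }

• Insert touches $O(\log_a N)$ nodes

[Figure: node v with b+1 elements split into v’ with $\lfloor \frac{b+1}{2} \rfloor$ and v’’ with $\lceil \frac{b+1}{2} \rceil$ elements]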
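
The insert procedure above can be sketched in Python as follows. Assumptions of this sketch: the tree lives in memory (one `Node` stands in for one disk block), elements are stored in the leaves, the bottom-up DO loop is expressed as recursion that reports a split to the parent, and duplicate keys are not checked. All class and method names are illustrative.

```python
from bisect import bisect_right, insort

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []      # leaf: elements; internal: separators
        self.children = children    # None for a leaf

class ABTree:
    def __init__(self, a=2, b=4):
        assert a >= 2 and b >= 2 * a - 1
        self.a, self.b = a, b
        self.root = Node()

    def search(self, x):
        v = self.root
        while v.children is not None:
            v = v.children[bisect_right(v.keys, x)]
        return x in v.keys

    def insert(self, x):
        split = self._insert(self.root, x)
        if split is not None:       # root was split: make a new root
            sep, right = split
            self.root = Node([sep], [self.root, right])

    def _insert(self, v, x):
        if v.children is None:
            insort(v.keys, x)
            size = len(v.keys)
        else:
            i = bisect_right(v.keys, x)
            split = self._insert(v.children[i], x)
            if split is not None:   # child split: insert reference in v
                sep, right = split
                v.keys.insert(i, sep)
                v.children.insert(i + 1, right)
            size = len(v.children)
        return self._split(v) if size > self.b else None

    def _split(self, v):
        """Split a node holding b+1 elements/children into two halves."""
        mid = (self.b + 1) // 2
        if v.children is None:
            right = Node(v.keys[mid:])
            v.keys = v.keys[:mid]
            return right.keys[0], right
        sep = v.keys[mid - 1]       # separator moves up to the parent
        right = Node(v.keys[mid:], v.children[mid:])
        v.keys, v.children = v.keys[:mid - 1], v.children[:mid]
        return sep, right

t = ABTree(a=2, b=4)
for i in range(1, 101):
    t.insert((i * 37) % 101)        # the keys 1..100 in a scrambled order
print(t.search(42), t.search(0))    # True False
```

Each separator is the smallest key of the subtree to its right, so `bisect_right` routes a query equal to a separator into the correct child.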

Page 18: External Memory Data Structures

(2,4)-Tree Insert

Page 19: External Memory Data Structures

(a,b)-Tree Delete

• Delete:

Search and delete element from leaf v

DO { if v has a-1 elements/children

Fuse v with sibling v’:

move children of v’ to v

delete element (ref) from parent(v)

(delete root if necessary)

If v has >b (and ≤ a+b-1 < 2b) children split v

v=parent(v) }

• Delete touches $O(\log_a N)$ nodes

[Figure: node v with a-1 children fused with sibling v’ into a node v with up to 2a-1 children]

Page 20: External Memory Data Structures

(2,4)-Tree Delete

Page 21: External Memory Data Structures

Summary/Conclusion: B-tree

• B-trees: (a,b)-trees with a,b = $\Theta(B)$

– O(N/B) space

– $O(\log_B N + \frac{T}{B})$ I/Os for search and rangesearch

– $O(\log_B N)$ I/Os for insert and delete

• B-trees with elements in the leaves sometimes called B+-tree

• Construction in $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/Os

– Sort elements and construct leaves

– Build tree level-by-level bottom-up
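
The bottom-up construction can be sketched as below. This is an in-memory illustration under stated assumptions: the input is already sorted (in external memory the sort is the dominant cost), each node is a plain Python list, and the last node on a level may end up underfull, which a full implementation would fix by rebalancing it with its neighbour. The function name is hypothetical.

```python
def build_bottom_up(sorted_keys, B):
    """Return the levels of a B-tree-like index, leaves first, root last.

    Each leaf holds up to B keys. Each internal node is a list of
    (smallest key in child, index of child in the level below) pairs,
    with fan-out up to B.
    """
    assert B >= 2, "fan-out must be at least 2 for the levels to shrink"
    if not sorted_keys:
        return [[[]]]
    leaves = [sorted_keys[i:i + B] for i in range(0, len(sorted_keys), B)]
    levels = [leaves]
    mins = [leaf[0] for leaf in leaves]   # smallest key under each node
    while len(levels[-1]) > 1:
        entries = list(zip(mins, range(len(mins))))
        level = [entries[i:i + B] for i in range(0, len(entries), B)]
        levels.append(level)
        mins = [node[0][0] for node in level]
    return levels

levels = build_bottom_up(list(range(100)), B=4)
print([len(level) for level in levels])   # [25, 7, 2, 1]
```

Every level is produced by one scan of the level below, so once the elements are sorted the rest of the construction costs only a linear number of I/Os.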

Page 22: External Memory Data Structures

B-tree Construction

• In internal memory we can sort N elements in O(N log N) time using a balanced search tree:

– Insert all elements one-by-one (construct tree)

– Output in sorted order using in-order traversal

• Same algorithm using B-tree uses $O(N \log_B N)$ I/Os

– A factor of $O\left(B \frac{\log (M/B)}{\log B}\right)$ non-optimal

• As discussed we could build B-tree bottom-up in $O\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$ I/Os

– In general we would like to have a dynamic data structure to use in algorithms: a total of $O\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$ I/Os requires $O\left(\frac{1}{B} \log_{M/B} \frac{N}{B}\right)$ I/Os per operation

Page 23: External Memory Data Structures

Flash memory

Page 24: External Memory Data Structures

Flash memory

Page 25: External Memory Data Structures

Flash memory

• Non-volatile memory which can be erased and programmed

• Characteristics (compared to magnetic disks):

– Lighter

– Provides better shock resistance

– Provides more throughput

– Consumes less power

– Denser (uses less space)

• Commonly used in digital cameras, handheld computers, mobile phones, portable music players etc.

• Also used in embedded systems, sensor networks; and even replacing magnetic disks in PCs.

Page 26: External Memory Data Structures

HDD vs SSD

The disassembled components of a hard disk drive (left) and of the PCB and components of a solid-state drive (right)

Page 27: External Memory Data Structures

Limitations of flash memory

• Memory cells in a flash memory device can be written only a limited number of times

– between 10,000 and 1,000,000, after which they wear out and become unreliable.

• The only way to set bits (change their value from 0 to 1) is to erase an entire region of memory. These regions have a fixed size in a given device, typically ranging from several kilobytes to hundreds of kilobytes, and are called erase units.

• Two different types of flash memory: NOR and NAND

– they have slightly different characteristics

Page 28: External Memory Data Structures

Flash memory

• The memory space of the chip is partitioned into blocks called erase blocks. The only way to change a bit from 0 to 1 is to erase the entire unit containing the bit.

• Each block is further partitioned into pages, which usually store 2048 bytes of data and 64 bytes of meta-data. Erase blocks typically contain 32 or 64 pages.

• Bits are changed from 1 to 0 by programming (writing) data onto a page. An erased page can be programmed only a small number of times (1 to 3) before it must be erased again.
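
The program/erase asymmetry described above can be illustrated with a toy model. Assumptions of this sketch: page and block sizes are tiny for readability (not the 2048-byte pages mentioned above), meta-data is omitted, and the limit on the number of programming passes per page is not modelled. The class name is hypothetical.

```python
class EraseBlock:
    """Toy erase block: programming can only clear bits, erasing sets them all to 1."""

    def __init__(self, pages=4, page_bytes=8):   # tiny sizes, for illustration only
        self.page_bytes = page_bytes
        self.pages = [bytearray(b"\xff" * page_bytes) for _ in range(pages)]
        self.erase_count = 0

    def program(self, page, data):
        """Write data onto a page; a bit may go from 1 to 0 but never back."""
        if len(data) != self.page_bytes:
            raise ValueError("data must fill exactly one page")
        old = self.pages[page]
        for i, byte in enumerate(data):
            if byte & ~old[i] & 0xFF:   # a bit that is 1 in data but 0 on the page
                raise ValueError("cannot change a bit from 0 to 1 without erasing")
        self.pages[page] = bytearray(data)

    def erase(self):
        """Reset every page of the block to all ones; this wears the block."""
        for p in self.pages:
            p[:] = b"\xff" * self.page_bytes
        self.erase_count += 1

blk = EraseBlock()
blk.program(0, b"\x0f" * 8)   # clears the high four bits of every byte
blk.program(0, b"\x0e" * 8)   # clearing one more bit is still allowed
```

Setting any of those bits back to 1 requires `blk.erase()`, which also wipes the other pages in the block. This is why in-place updates are expensive on flash.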

Page 29: External Memory Data Structures

Flash memory

• Reading data takes tens of microseconds for the first access to a page, plus tens of nanoseconds per byte.

• Writing a page takes hundreds of microseconds, plus tens of nanoseconds per byte.

• Erasing a block takes several milliseconds.

• Each block can sustain only a limited number of erasures.

Algorithms/data structures designed for I/O model do not always work well when implemented on flash memory.

Page 30: External Memory Data Structures

Flash memory models (I)

• General flash model:

• The complexity of an algorithm is $x + c \cdot y$, where x and y are the number of read and write I/Os respectively, and c is a penalty factor for writing.

• Typically, we assume that $B_R < B_W \ll M$ and $c \ge 1$.

[Figure: main memory M and flash, with read block size $B_R$, write block size $B_W$, and write penalty c]

Page 31: External Memory Data Structures

Flash memory models (II)

• Unit-cost flash model:

• General flash model augmented with the assumption of an equal access time per element for reading and writing.

• The cost of an algorithm performing x read I/Os and y write I/Os is given by $x \cdot B_R + y \cdot B_W$.

• This simplifies the model considerably, as it becomes easier to adapt external-memory results.

[Figure: main memory M and flash, with read block size $B_R$ and write block size $B_W$]
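
The two cost measures can be written down directly. In this sketch the function names and the sample numbers are hypothetical; only the formulas come from the slides.

```python
def general_flash_cost(x, y, c):
    """General flash model: x read I/Os plus y write I/Os, each write weighted by c >= 1."""
    return x + c * y

def unit_cost_flash_cost(x, y, B_R, B_W):
    """Unit-cost flash model: every element read or written costs one unit."""
    return x * B_R + y * B_W

# Example: 100 read I/Os and 10 write I/Os (made-up numbers).
print(general_flash_cost(100, 10, c=5))             # 150
print(unit_cost_flash_cost(100, 10, B_R=4, B_W=64)) # 1040
```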

Page 32: External Memory Data Structures

B-trees on flash memory

• An insertion in a B-tree updates a single leaf (unless the leaf splits)

• Since we cannot perform an in-place update in flash memory, we need to create a new copy of the leaf, with the new element inserted.

• Since the parent of this leaf has to update its pointer to the leaf, we need to create a new copy of the parent, and so on up to the root.

• Thus the write performance is quite bad for the naïve implementation.
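
The cascade of copies described above can be sketched with immutable tuples standing in for tree nodes. This is an illustration only: the tree shape, the `path` argument, and the function name are assumptions, and each returned count stands for one out-of-place node write.

```python
def update_leaf(node, path, new_leaf):
    """Return (new_root, writes) after replacing the leaf reached via `path`.

    `node` is a tuple of children (internal node) or any other value (leaf).
    `path` lists the child index to follow at each level. Nodes are never
    modified in place: every node on the root-to-leaf path is copied.
    """
    if not path:
        return new_leaf, 1
    i = path[0]
    child, writes = update_leaf(node[i], path[1:], new_leaf)
    return node[:i] + (child,) + node[i + 1:], writes + 1

# A tree with three internal levels; strings stand in for leaf blocks.
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
new_tree, writes = update_leaf(tree, [1, 0, 1], "F")
print(writes)   # 4: the leaf plus the three nodes above it
```

Subtrees off the path are shared between the old and new versions, so the extra cost is exactly one write per level.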

Page 33: External Memory Data Structures

Flash Translation Layer (FTL)

• Software layer on the flash disk which performs logical to physical block mapping.

• Distributes writes uniformly across blocks.

• B-tree with FTL:

– All nodes contain just the logical address of other nodes

– Allows any update to write just the target node

• Achieves one erase per update (amortized)

Page 34: External Memory Data Structures

μ-tree [Kang, Jung, Kang, Kim, 2007]

• Minimally Updated tree

• Achieves performance similar to ‘B-tree with FTL’ on raw flash

• Node sizes decrease exponentially from the leaf to the root

• Each block corresponds to a leaf-to-root path, and stores the nodes on a prefix of this path

• Works only when $\log_2 B \ge \log_B N$

Page 35: External Memory Data Structures

FD-tree

• Flash Disk aware tree index

• Transforms random writes into sequential writes

• Limits random writes to within a small region

[Li, He, Yang, Luo, Yi, 2010]

Page 36: External Memory Data Structures

FD-tree

• Flash Disk aware tree index

• Transforms random writes into sequential writes

• Contains a head tree and a few levels of sorted runs of increasing sizes

• $O(\log_k N)$ levels, where k is the size ratio between levels

Page 37: External Memory Data Structures

Other B-tree indexes for flash memory

• BFTL [Wu, Luo, Chang, 2007]

• Lazy Adaptive tree [Agrawal, Ganesan, Sitaraman, Diao, Singh, 2009]

• Lazy Update tree [On, Hu, Li, Xu, 2009]

• In-page Logging approach [Lee, Moon, 2007]

• …

All these are designed to get better practical performance, and take different aspects of flash characteristics into consideration.

– Not easy to compare with each other

Page 38: External Memory Data Structures

Comparison of tree indexes on flash

N – number of elements

$B_R$ – read block size

$B_W$ – write block size

$B_U$ – size of buffer

h – height of the tree

k – parameter

Page 39: External Memory Data Structures

Directions for further research

• The area is still in its infancy.

• Not much work has been done apart from the development of some file systems and tree indexing structures

• Open problems:

– Efficient tree indexes for flash memory

– Tons of other (practically significant) algorithmic problems

– Better memory model.

Page 40: External Memory Data Structures

Thank You