Transcript of slides
Tree Indexing on Flash Disks
Yinan Li
In collaboration with: Bingsheng He, Qiong Luo, and Ke Yi
Hong Kong University of Science and Technology
Introduction
• Flash-based devices: the mainstream storage in mobile devices and embedded systems.
• Recently, the flash disk, or flash Solid State Disk (SSD), has emerged as a viable alternative to the magnetic hard disk for non-volatile storage.
“Tape is Dead, Disk is Tape, Flash is Disk” – Jim Gray
Flash SSD
• Intel X-25M 80GB SATA SSD
• Mtron 64GB SATA SSD
• Other manufacturers: Samsung, SanDisk, Seagate, Fusion-IO, …
Internal Structure of Flash Disk
[Figure: internal structure of a flash disk.]
Flash Memory
Three basic operations of flash memory:
• Read: page (512B-2KB), 80us
• Write: page (512B-2KB), 200us
– Writes can only change bits from 1 to 0.
• Erase: block (128-512KB), 1.5ms
– Clears all bits to 1.
– Each block can be erased only a finite number of times before wearing out.
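The write and erase semantics can be illustrated with a toy model (block and page sizes here are tiny and arbitrary, not real device parameters):

```python
class FlashBlock:
    def __init__(self, pages=4, page_bits=8):
        self.pages = [[1] * page_bits for _ in range(pages)]  # erased state: all 1s
        self.erase_count = 0

    def program(self, page, data):
        """Write a page; only 1 -> 0 transitions are allowed."""
        current = self.pages[page]
        if any(c == 0 and d == 1 for c, d in zip(current, data)):
            raise ValueError("cannot set 0 -> 1 without erasing the block")
        self.pages[page] = list(data)

    def erase(self):
        """Erase resets every bit of every page in the block to 1."""
        self.pages = [[1] * len(p) for p in self.pages]
        self.erase_count += 1

block = FlashBlock()
block.program(0, [1, 0, 1, 1, 0, 1, 1, 1])      # ok: clears bits only
try:
    block.program(0, [1, 1, 1, 1, 1, 1, 1, 1])  # needs a 0 -> 1 flip
except ValueError:
    block.erase()                                # the whole block must be erased first
    block.program(0, [1, 1, 1, 1, 1, 1, 1, 1])
```

This is why in-place updates are expensive on flash: overwriting a page generally forces a block erase.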
Flash Translation Layer (FTL)
• Flash SSDs employ a firmware layer, called the FTL, to implement an out-of-place update scheme.
• It maintains a mapping table between logical and physical pages:
– Address Translation
– Garbage Collection
– Wear Leveling
• Page-Level Mapping, Block-Level Mapping, Fragmentation
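A minimal sketch of the out-of-place update scheme (the structure is my own simplification; a real FTL also performs garbage collection and wear leveling):

```python
class PageLevelFTL:
    """Toy page-level FTL: every logical write goes to a fresh physical
    page and the mapping table is redirected (out-of-place update)."""
    def __init__(self):
        self.mapping = {}      # logical page number -> physical page number
        self.flash = {}        # physical page number -> data
        self.next_free = 0
        self.stale = set()     # physical pages invalidated by updates

    def write(self, lpn, data):
        if lpn in self.mapping:
            self.stale.add(self.mapping[lpn])   # old copy becomes garbage
        self.flash[self.next_free] = data
        self.mapping[lpn] = self.next_free
        self.next_free += 1

    def read(self, lpn):
        return self.flash[self.mapping[lpn]]

ftl = PageLevelFTL()
ftl.write(7, "v1")
ftl.write(7, "v2")   # the update is written elsewhere, not in place
```

The stale pages are what garbage collection later reclaims by erasing blocks.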
Superiority of Flash Disk
• Pure electrical device (no mechanical moving parts)
– Extremely fast random read speed
– Low power consumption
[Figure: magnetic hard disk vs. flash disk.]
Challenge of Flash Disk
• Due to the physical features of flash memory, the flash disk exhibits relatively poor random write performance.
Bandwidth of Basic Access Patterns
• Random writes are 5.6 - 55X slower than random reads on flash SSDs [Intel, Mtron, Samsung SSDs].
• Random accesses are significantly slower than sequential ones with multi-page optimization.
[Figure: bandwidth at access unit sizes of 2KB and 512KB.]
Tree Indexing on Flash Disk
• Tree indexes are a primary access method in databases.
• Tree indexes on flash disk:
– exploit the fast random read speed.
– suffer from the poor random write performance.
• We study how to adapt them to the flash disk, exploiting the hardware features for efficiency.
B+-Tree
• Search I/O cost: O(log_B N) random reads
• Update I/O cost: O(log_B N) random reads + O(1) random writes
[Figure: a B+-tree with O(log_B N) levels; a search for key 48 follows one root-to-leaf path, and inserting key 40 writes a leaf page.]
LSM-Tree (Log Structure Merge Tree)
• Search I/O cost: O(log_k N · log_B N) random reads
• Update I/O cost: O(log_k N) sequential writes
[Figure: an LSM-tree as O(log_k N) B+-trees whose sizes grow by ratio k; a search probes every tree, while inserts go to the smallest tree and are merged downward.]
[1] P. E. O'Neil, E. Cheng, D. Gawlick, and E. J. O'Neil. The Log-Structured Merge-Tree (LSM-Tree). Acta Informatica, 1996.
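The logarithmic method behind the LSM-tree can be sketched as follows (the capacities and in-memory lists are my own simplification; a real LSM-tree stores B+-trees on disk, not Python lists):

```python
import heapq

class TinyLSM:
    def __init__(self, c0_capacity=2, k=2):
        self.k = k
        self.caps = [c0_capacity]      # capacity of each component
        self.runs = [[]]               # runs[0] is the smallest component

    def insert(self, key):
        self.runs[0] = list(heapq.merge(self.runs[0], [key]))
        i = 0
        while len(self.runs[i]) > self.caps[i]:     # overflow: merge downward
            if i + 1 == len(self.runs):
                self.runs.append([])
                self.caps.append(self.caps[i] * self.k)   # size ratio k
            # one sequential pass merges component i into component i+1
            self.runs[i + 1] = list(heapq.merge(self.runs[i], self.runs[i + 1]))
            self.runs[i] = []
            i += 1

    def search(self, key):
        return any(key in run for run in self.runs)  # probe every component

lsm = TinyLSM()
for x in [5, 1, 9, 3, 7]:
    lsm.insert(x)
```

Each entry is only ever written in sequential merge passes, which is the source of the O(log_k N) sequential-write update cost; the price is that a search must consult every component.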
BFTL
• Search I/O cost: O(c·log_B N) random reads
• Update I/O cost: O(1/c) random writes
[Figure: BFTL maps each logical B-tree node to a linked list of physical pages, with list length bounded by c.]
[2] Chin-Hsien Wu, Tei-Wei Kuo, and Li-Pin Chang. An efficient B-tree layer implementation for flash-memory storage systems. In RTCSA, 2003.
Designing Index for Flash Disk
• Our goals:
– reduce the update cost
– preserve search efficiency
• Two ways to reduce the random write cost:
– Transform random writes into sequential ones.
– Limit them within a small area (512KB-8MB).
Outline
• Introduction
• Structure of FD-Tree
• Cost Analysis
• Experimental Results
• Conclusion
FD-Tree
• Transform random writes into sequential ones by the logarithmic method.
– Inserts are performed on a small tree first.
– Entries are gradually merged into larger levels.
• Improve search efficiency by fractional cascading.
– In each level, a special entry points to the page in the next level that the search will visit next.
Data Structure of FD-Tree
• L levels:
– one head tree (a B+-tree) on the top
– L-1 sorted runs at the bottom
• Logarithmically increasing sizes (capacities) of levels
Data Structure of FD-Tree
• Entry: a pair of key and pointer.
• Fence: a special entry used to improve search efficiency.
– Its key is equal to the FIRST key in the page it points to.
– Its pointer is the ID of a page in the immediate next level that the search will visit next.
Data Structure of FD-Tree
• Each page is pointed to by one or more fences in the immediately higher level.
• The first entry of each page is a fence. (If not, we insert one.)
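Fence-guided search (fractional cascading) can be sketched as follows; the (key, pointer, is_fence) entry layout and the two-level example are my own simplifications, and the head-tree lookup is elided:

```python
from bisect import bisect_right

def fd_search(levels, key):
    page_id, hits = 0, []
    for level in levels:                     # exactly one page read per level
        page = level[page_id]
        i = bisect_right([k for k, _, _ in page], key) - 1
        hits += [k for k, _, is_fence in page if k == key and not is_fence]
        # the nearest preceding fence tells us which next-level page to read
        fences = [(k, ptr) for k, ptr, is_fence in page[:i + 1] if is_fence]
        if fences:
            page_id = fences[-1][1]
    return hits

levels = [
    # L1: one page; fence pointers are page IDs in L2
    [[(10, 0, True), (25, None, False), (40, 1, True)]],
    # L2: two pages of plain entries
    [[(10, None, False), (30, None, False)],
     [(40, None, False), (81, None, False)]],
]
```

Because each level contributes one page read, the search cost is proportional to the number of levels rather than to a full binary search per sorted run.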
Insertion on FD-Tree
• Insert the new entry into the head tree.
• If the head tree is full, merge it into the next level and then empty it.
• The merge process may invoke recursive merges into lower levels.
Merge on FD-Tree
• Scan two sorted runs and generate two new sorted runs.
[Figure: merging the entries of L_i and L_{i+1}; the new L_{i+1} holds the merged data entries, and a new L_i is rebuilt from fences pointing into the new pages.]
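The merge step can be sketched as a single sequential pass (the page size and the fence representation are my own simplifications):

```python
import heapq

def merge_levels(upper, lower, entries_per_page=4):
    """Merge the sorted entries of L_i (upper) into L_{i+1} (lower)."""
    merged = list(heapq.merge(upper, lower))      # one sequential scan
    new_lower = [merged[i:i + entries_per_page]
                 for i in range(0, len(merged), entries_per_page)]
    # one fence (first key of the page, page id) per new page forms the new L_i
    new_upper_fences = [(page[0], pid) for pid, page in enumerate(new_lower)]
    return new_upper_fences, new_lower

fences, pages = merge_levels([2, 3, 19], [1, 5, 6, 7, 9, 10, 11, 12])
# pages  -> [[1, 2, 3, 5], [6, 7, 9, 10], [11, 12, 19]]
# fences -> [(1, 0), (6, 1), (11, 2)]
```

Both runs are read and written strictly in order, so the rewrite costs only sequential I/O.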
Insertion & Merge on FD-Tree
• When the top L levels are full, merge the top L levels and replace them with new ones.
[Figure: inserts go into the head tree; full levels are merged downward and replaced.]
Search on FD-Tree
[Figure: searching for key 81 descends from the head tree L0, through fences in L1, to the page in L2 that contains 81.]
Deletion on FD-Tree
• A deletion is handled in a way similar to an insertion.
• We insert a special entry, called a filter entry, to mark that the original entry, called the phantom entry, has been deleted.
• As merges occur, a filter entry encounters its corresponding phantom entry in a particular level; we then discard both of them.
Deletion on FD-Tree
[Figure: deleting entries 16, 45, and 37 inserts filter entries into L0; merging L0 with L1, and then L0, L1 with L2, cancels each filter entry against its phantom entry.]
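The filter/phantom mechanism can be sketched as follows (the tuple layout and function name are mine; the keys mirror the 16/45/37 example):

```python
def merge_with_tombstones(upper, lower):
    """upper/lower: sorted runs of (key, is_filter) pairs; returns the
    merged run with filter/phantom pairs discarded."""
    # order filters before their phantoms at equal keys
    entries = sorted(upper + lower, key=lambda e: (e[0], not e[1]))
    out, pending = [], set()
    for key, is_filter in entries:
        if is_filter:
            pending.add(key)        # remember: cancel the phantom when seen
        elif key in pending:
            pending.discard(key)    # phantom meets its filter: drop both
        else:
            out.append((key, False))
    return out

merged = merge_with_tombstones([(16, True), (45, True)],
                               [(16, False), (37, False), (45, False)])
# only key 37 survives the merge
```

This keeps deletions as cheap as insertions: no random write is needed to remove the phantom entry eagerly.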
Cost Analysis of FD-Tree
• I/O cost of FD-Tree
– Search: log_k(N/|L0|) random reads
– Insertion: (k/(f−k)) · log_k(N/|L0|) sequential writes (amortized)
– Deletion: Search + Insertion
– Update: Deletion + Insertion

k: size ratio between adjacent levels
f: # entries in a page
N: # entries in the index
|L0|: # entries in the head tree
I/O Cost Comparison
Index    | Search                          | Insertion
---------|---------------------------------|------------------------------------------------------------
FD-tree  | log_k N rand. reads             | (k/(f−k))·log_k N seq. reads + (k/(f−k))·log_k N seq. writes
B+-tree  | log_f N rand. reads             | log_f N rand. reads + 1 rand. write
LSM-tree | log_k N · log_f N rand. reads   | (k/(f−k))·log_k N seq. reads + (k/(f−k))·log_k N seq. writes
BFTL     | c·log_f N rand. reads           | c·log_f N rand. reads + (1/c) rand. writes

You may assume f = 2k for simplicity of comparison, thus k/(f−k) = 1.
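A worked example of the FD-tree and B+-tree costs (the parameter values N, f, and k are my own choices, not from the talk):

```python
import math

N = 10**9    # entries in the index
f = 256      # entries per page
k = 16       # size ratio between adjacent levels

fd_search = math.log(N, k)                  # FD-tree: random reads per search
fd_insert = k / (f - k) * math.log(N, k)    # FD-tree: amortized sequential writes
bt_search = math.log(N, f)                  # B+-tree: random reads per search

print(round(fd_search, 2), round(fd_insert, 3), round(bt_search, 2))
```

With these numbers an FD-tree search reads about twice as many pages as a B+-tree search, but an insertion costs well under one sequential page write on average, instead of a random write per update.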
k
Cost Model
• Tradeoff in the value of k:
– Large k: high insertion cost
– Small k: high search cost
• We develop a cost model to calculate the optimal value of k, given the characteristics of both the flash SSD and the workload.
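The cost model itself is not spelled out on the slides; the following sketch only illustrates the kind of computation it performs, with all device timings, sizes, and the workload mix assumed by me:

```python
import math

def per_op_cost(k, N=10**9, f=128, p_search=0.5,
                rand_read_ms=0.08, seq_write_ms=0.02):
    """Expected time per operation for a mixed workload (assumed costs)."""
    search = math.log(N, k) * rand_read_ms                 # random reads
    insert = k / (f - k) * math.log(N, k) * seq_write_ms   # sequential writes
    return p_search * search + (1 - p_search) * insert

best_k = min(range(2, 128), key=per_op_cost)   # k must stay below f
```

Sweeping k exposes the tradeoff directly: search time falls as k grows while insertion time rises, and the minimizer sits in between.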
Cost Model
• Estimated cost for varying values of k
[Figure: estimated cost as a function of k.]
Implementation Details
• Storage layout
– Fixed-length record page format
– OS disk buffering disabled
• Buffer manager
– LRU replacement policy
[Figure: FD-tree, LSM-tree, BFTL, and B+-tree implemented on top of a common buffer manager and storage layout over the flash SSDs.]
Experimental Setup
• Platform:
– Intel Quad Core CPU
– 2GB memory
– Windows XP
• Three flash SSDs:
– Intel X-25M 80GB, Mtron 64GB, Samsung 32GB
– SATA interface
Experimental Settings
• Index size: 128MB-8GB (8GB by default)
• Entry size: 8 bytes (4-byte key + 4-byte pointer)
• Buffer size: 16MB
• Warm-up period: 10,000 queries
• Workload: 50% search + 50% insertion (by default)
Validation of the Cost Model
• The estimated costs are very close to the measured ones.
• With the cost model, we can estimate a relatively accurate k value that minimizes the overall cost.
[Figure: estimated vs. measured costs on the Mtron and Intel SSDs.]
Overall Performance Comparison
• On Mtron SSD, FD-tree is 24.2X, 5.8X, and 1.8X faster than B+-tree, BFTL, and LSM-tree, respectively.
• On Intel SSD, FD-tree is 3X, 3X, and 1.5X faster than B+-tree, BFTL, and LSM-tree, respectively.
[Figure: overall performance on the Mtron and Intel SSDs.]
Search Performance Comparison
• FD-tree has search performance similar to B+-tree.
• FD-tree and B+-tree outperform the others on both SSDs.
[Figure: search performance on the Mtron and Intel SSDs.]
Insertion Performance Comparison
• FD-tree has insertion performance similar to LSM-tree.
• FD-tree and LSM-tree outperform the others on both SSDs.
[Figure: insertion performance on the Mtron and Intel SSDs.]
Performance Comparison
• W_Search: 80% search + 10% insertion + 5% deletion + 5% update
• W_Update: 20% search + 40% insertion + 20% deletion + 20% update
Conclusion
• We design a new index structure that transforms almost all random writes into sequential ones while preserving search efficiency.
• We empirically and analytically show that FD-tree outperforms the other indexes on various flash SSDs.
Related Publication
• Yinan Li, Bingsheng He, Qiong Luo, Ke Yi. Tree Indexing on Flash Disks. ICDE 2009 (short paper).
• Yinan Li, Bingsheng He, Qiong Luo, Ke Yi. Tree Indexing on Flash-Based Solid State Drives. In preparation for journal submission.
Q&A
• Thank you!
Additional Slides
Block-Level FTL
• Mapping granularity: block
• Update cost: 1 erase + N page writes + N page reads (N pages per block)
[Figure: block-level mapping table from logical block ID to physical block ID.]
Page-Level FTL
• Mapping granularity: page
• Larger mapping table
• Update cost: 1/N erase + 1 page write + 1 page read
[Figure: page-level mapping table from logical page ID to physical page ID.]
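The two update costs can be compared with concrete numbers (N is my own choice; the latencies reuse the figures from the flash-memory slide):

```python
N = 64                                   # pages per block (assumed)
read_us, write_us, erase_us = 80, 200, 1500

# block-level mapping: updating one page copies the whole block
block_level = erase_us + N * write_us + N * read_us
# page-level mapping: out-of-place update; the erase is amortized over N pages
page_level = erase_us / N + write_us + read_us

print(block_level, page_level)
```

With these inputs a block-level update costs 19,420us versus about 303us for a page-level update, i.e. exactly N times more, which is why page-level mapping pays for its larger table.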
Fragmentation
• Cost of recycling ONE block: N^2 reads, N*(N-1) writes, N erases.
• When the flash disk is full, space must be recycled.
Deamortized FD-Tree
• Normal FD-tree:
– High average insertion performance
– Poor worst-case insertion performance
• Deamortized FD-tree:
– Reduces the worst-case insertion cost
– Preserves the average insertion cost
Deamortized FD-Tree
• Maintain two head trees, T0 and T0'.
– Insert into T0'.
– Search both T0 and T0'.
– Merge T0 into lower levels concurrently.
[Figure: searches consult both head trees while new entries are inserted into T0'.]
Deamortized FD-Tree
• The high merge cost is amortized over all entries inserted into the head tree.
• The overall cost is (almost) unchanged.
FD-Tree vs. Deamortized FD-Tree
• Relatively high worst-case performance
• Low overhead