
Tree Indexing on Flash Disks

Yinan Li

In cooperation with:

Bingsheng He, Qiong Luo, and Ke Yi

Hong Kong University of Science and Technology

1

Introduction

• Flash-based devices: the mainstream storage in mobile devices and embedded systems.

• Recently, the flash disk, or flash Solid State Disk (SSD), has emerged as a viable alternative to the magnetic hard disk for non-volatile storage.

“Tape is Dead, Disk is Tape, Flash is Disk” – Jim Gray

2

Flash SSD

• Intel X-25M 80GB SATA SSD
• Mtron 64GB SATA SSD
• Other manufacturers: Samsung, SanDisk, Seagate, Fusion-IO, …

3

Internal Structure of Flash Disk

4

Flash Memory

Three basic operations of flash memory:
• Read: page (512B-2KB), 80µs
• Write: page (512B-2KB), 200µs
  – Writes can only change bits from 1 to 0.
• Erase: block (128-512KB), 1.5ms
  – Clears all bits of the block back to 1.
  – Each block can be erased only a finite number of times before it wears out.
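A minimal model of these program/erase semantics (a sketch for illustration only; the page size and values are arbitrary):

```python
# Minimal model of flash program/erase semantics (illustrative, not from the slides).
PAGE_BITS = 8

def flash_program(old_page: int, new_page: int) -> int:
    """A write (program) can only change bits from 1 to 0."""
    if new_page & ~old_page & ((1 << PAGE_BITS) - 1):
        raise ValueError("cannot flip 0 -> 1 without erasing the whole block")
    return new_page

def flash_erase(block: list) -> list:
    """An erase resets every page in the block to all 1s."""
    return [(1 << PAGE_BITS) - 1] * len(block)

page = flash_program(0b11111111, 0b10110101)   # OK: only 1 -> 0 transitions
# flash_program(page, 0b11111111)              # would raise: requires an erase
```

This is why updating a page in place would require erasing its entire 128-512KB block first, which the FTL (next slide) avoids by writing updates out of place.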

5

Flash Translation Layer (FTL)

• Flash SSDs employ a firmware layer, called the FTL, to implement an out-of-place update scheme.

• It maintains a mapping table between logical and physical pages, supporting:
  – Address translation
  – Garbage collection
  – Wear leveling

• Page-Level Mapping, Block-Level Mapping, Fragmentation
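A hedged sketch of how a page-level FTL might implement out-of-place updates; all names and structures below are illustrative simplifications, not a real firmware interface:

```python
# Simplified page-level FTL: out-of-place updates via a logical-to-physical map.
class PageLevelFTL:
    def __init__(self, num_physical_pages: int):
        self.mapping = {}                            # logical page -> physical page
        self.free = list(range(num_physical_pages))  # clean physical pages
        self.invalid = set()                         # stale pages awaiting erase

    def write(self, logical_page: int, data, flash: dict):
        phys = self.free.pop()                  # always program a clean page
        flash[phys] = data
        old = self.mapping.get(logical_page)
        if old is not None:
            self.invalid.add(old)               # the old copy becomes garbage
        self.mapping[logical_page] = phys       # address translation is updated

    def read(self, logical_page: int, flash: dict):
        return flash[self.mapping[logical_page]]
```

Garbage collection would later erase blocks whose pages are all invalid, and wear leveling would bias the choice of free pages toward less-erased blocks.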

6

Superiority of Flash Disk

• Purely electrical device (no mechanical moving parts)
  – Extremely fast random read speed
  – Low power consumption

7

[Figure: a magnetic hard disk vs. a flash disk.]

Challenge of Flash Disk

• Due to the physical features of flash memory, the flash disk exhibits relatively poor random write performance.

8

Bandwidth of Basic Access Patterns

• Random writes are 5.6-55X slower than random reads on flash SSDs [Intel, Mtron, Samsung SSDs].
• Random accesses are significantly slower than sequential ones, even with multi-page optimization.

[Figure: bandwidth of the basic access patterns at access unit sizes of 2KB and 512KB.]

9

Tree Indexing on Flash Disk

• Tree indexes are a primary access method in databases

• Tree indexes on flash disks:
  – exploit the fast random read speed;
  – suffer from the poor random write performance.

• We study how to adapt tree indexes to the flash disk, exploiting its hardware features for efficiency.

10

B+-Tree

• Search I/O cost: $O(\log_B N)$ random reads
• Update I/O cost: $O(\log_B N)$ random reads + $O(1)$ random writes

11

[Figure: a B+-tree with $O(\log_B N)$ levels, showing a search for key 48 descending to a leaf and an insertion of key 40.]

LSM-Tree (Log-Structured Merge Tree)

• Search I/O cost: $O(\log_k N \cdot \log_B N)$ random reads
• Update I/O cost: $O(\log_k N)$ sequential writes

12

[Figure: an LSM-tree as $O(\log_k N)$ B+-trees of logarithmically increasing sizes, with size ratio k between adjacent trees; a search probes each tree, while inserts go to the smallest tree and are gradually merged into larger ones.]

[1] P. E. O'Neil, E. Cheng, D. Gawlick, and E. J. O'Neil. The Log-Structured Merge-Tree (LSM-Tree). Acta Informatica, 1996.

BFTL

• Search I/O cost: $O(c \cdot \log_B N)$ random reads
• Update I/O cost: $O(1/c)$ random writes

13

[Figure: BFTL's node translation table maps each logical node ID (Pid) to a linked list of up to c physical pages, so reading one tree node may follow up to c page pointers.]

[2] Chin-Hsien Wu, Tei-Wei Kuo, and Li-Pin Chang. An Efficient B-Tree Layer Implementation for Flash Memory Storage Systems. In RTCSA, 2003.

Designing Index for Flash Disk

• Our goal:
  – Reduce the update cost
  – Preserve search efficiency

• Two ways to reduce the random write cost:
  – Transform random writes into sequential ones.
  – Limit random writes to a small area (512KB-8MB).

14

Outline

• Introduction
• Structure of FD-Tree
• Cost Analysis
• Experimental Results
• Conclusion

15

FD-Tree

• Transform random writes into sequential ones by the logarithmic method:
  – Inserts are performed on a small tree first.
  – Entries are gradually merged into larger levels.

• Improve search efficiency by fractional cascading:
  – In each level, a special entry locates the page in the next level that the search should visit next.

16

Data Structure of FD-Tree

• L levels:
  – One head tree (a B+-tree) on the top
  – L-1 sorted runs at the bottom

• Logarithmically increasing sizes (capacities) of levels

17

Data Structure of FD-Tree

• Entry: a pair of a key and a pointer
• Fence: a special entry used to improve search efficiency
  – Its key is equal to the FIRST key in the page it points to.
  – Its pointer is the ID of the page in the immediate next level that the search will visit next.

18

Data Structure of FD-Tree

• Each page is pointed to by one or more fences in the immediately upper level.

• The first entry of each page is a fence (if not, we insert one).
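A minimal sketch of the two kinds of entries described above (field names are illustrative, not the paper's):

```python
from dataclasses import dataclass

# Hedged sketch of the FD-tree entry kinds (names illustrative).
@dataclass(frozen=True)
class Entry:
    key: int
    ptr: int           # record ID of the indexed tuple

@dataclass(frozen=True)
class Fence:
    key: int           # equals the FIRST key of the page it points to
    page_id: int       # page in the immediate next level to visit next
```

Because the first entry of every page is a fence, following the last fence whose key is ≤ the search key always lands on the right page in the next level.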

19

Insertion on FD-Tree

• Insert the new entry into the head tree.

• If the head tree is full, merge it into the next level and then empty it.

• The merge process may invoke recursive merges into lower levels, as sketched below.
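A hedged sketch of this insertion loop; the fd object and its methods are illustrative placeholders, not the paper's API:

```python
def fd_insert(fd, entry):
    """Insert into the head tree; merges cascade down while levels overflow."""
    fd.levels[0].insert(entry)                  # head tree is a small B+-tree
    level = 0
    while fd.size(level) > fd.capacity(level):  # capacities grow by factor k
        fd.merge_into_next(level)               # sequential scan of both levels
        fd.clear(level)                         # the merged level is emptied
        level += 1                              # merge may recurse downward
```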

20


Merge on FD-Tree

• Scan two sorted runs and generate new sorted runs.
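A hedged sketch of this merge step, assuming entries are plain sorted keys and ignoring filter entries (the page size and structures are illustrative):

```python
import heapq

def merge(level_i, level_i1, entries_per_page=4):
    """Scan two sorted runs; build the new L(i+1) and seed the new L(i)
    with one fence (first key, page ID) per newly written page."""
    merged = list(heapq.merge(level_i, level_i1))    # purely sequential I/O
    new_level_i1, new_level_i_fences = [], []
    for p in range(0, len(merged), entries_per_page):
        page = merged[p:p + entries_per_page]
        new_level_i1.append(page)
        new_level_i_fences.append((page[0], p // entries_per_page))
    return new_level_i_fences, new_level_i1
```

In the full structure, fences in $L_{i+1}$ that point into $L_{i+2}$ are preserved during the merge; this sketch omits them.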

[Figure: example merge of $L_i$ and $L_{i+1}$: both sorted runs are scanned sequentially to produce the new $L_{i+1}$, and the new $L_i$ is seeded with one fence per new page, keyed by the first entry of that page.]

21

Insertion & Merge on FD-Tree

• When the top L levels are full, merge the top L levels together and replace them with new ones.

22

[Figure: entries are inserted into the head tree and flow downward through successive merges.]

Search on FD-Tree

[Figure: searching for key 81: the search descends from the head tree $L_0$ through $L_1$ to $L_2$, reading one page per level by following the fence with the largest key ≤ 81 in each page.]
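A hedged sketch of this search: exactly one page is read per level, descending via the last fence whose key is ≤ the search key (the page layout and head-tree simplification are illustrative):

```python
def fd_search(levels, key):
    """levels[0] stands in for the head tree's leaf page in this simplified
    model; each page is a sorted list of (k, ptr, is_fence) tuples."""
    matches, page_id = [], 0
    for level in levels:
        page = level[page_id]            # one random page read per level
        page_id = None
        for k, ptr, is_fence in page:
            if k > key:
                break
            if is_fence:
                page_id = ptr            # last fence with key <= search key
            elif k == key:
                matches.append(ptr)
        if page_id is None:
            break                        # reached the lowest level
    return matches
```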

23

Deletion on FD-Tree

• A deletion is handled in a way similar to an insertion.

• Insert a special entry, called a filter entry, to mark that the original entry, called the phantom entry, has been deleted.

• The filter entry will encounter its corresponding phantom entry at a particular level as merges occur; at that point, both are discarded.
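A hedged sketch of how a merge could cancel filter/phantom pairs, assuming the merge orders a filter entry directly before the phantom it deletes (the entry layout is illustrative):

```python
def drop_phantoms(merged_entries):
    """merged_entries: sorted (key, ptr, kind) tuples, kind in {'entry', 'filter'};
    a filter and its phantom share the same key and pointer."""
    out, i = [], 0
    while i < len(merged_entries):
        cur = merged_entries[i]
        nxt = merged_entries[i + 1] if i + 1 < len(merged_entries) else None
        if cur[2] == 'filter' and nxt and nxt[:2] == cur[:2]:
            i += 2                  # filter meets its phantom: discard both
        else:
            out.append(cur)         # unmatched filters survive to lower levels
            i += 1
    return out
```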

24

Deletion on FD-Tree

[Figure: deleting three entries (16, 37, 45): filter entries are inserted into $L_0$; after merging $L_0$ and $L_1$, and later $L_0$, $L_1$, and $L_2$, each filter meets its phantom and both are discarded.]

25

Outline

• Introduction
• Structure of FD-Tree
• Cost Analysis
• Experimental Results
• Conclusion

26

Cost Analysis of FD-Tree

• I/O cost of FD-Tree

– Search: $O\left(\log_k \frac{N}{|L_0|}\right)$ random reads

– Insertion: $O\left(\frac{k}{f-k} \log_k \frac{N}{|L_0|}\right)$ sequential reads and writes

– Deletion: Search + Insertion
– Update: Deletion + Insertion

where k is the size ratio between adjacent levels, f is the number of entries in a page, N is the number of entries in the index, and $|L_0|$ is the number of entries in the head tree.

27

I/O Cost Comparison

|          | Search: Rand. Read        | Insertion: Rand. Read | Insertion: Seq. Read    | Insertion: Rand. Write | Insertion: Seq. Write   |
|----------|---------------------------|-----------------------|-------------------------|------------------------|-------------------------|
| FD-Tree  | $\log_k N$                |                       | $\frac{k}{f-k}\log_k N$ |                        | $\frac{k}{f-k}\log_k N$ |
| B+-Tree  | $\log_f N$                | $\log_f N$            |                         | $1$                    |                         |
| LSM-Tree | $\log_k N \cdot \log_f N$ |                       | $\frac{k}{f-k}\log_k N$ |                        | $\frac{k}{f-k}\log_k N$ |
| BFTL     | $c \cdot \log_f N$        | $c \cdot \log_f N$    |                         | $1/c$                  |                         |

For simplicity of comparison, one may assume $k = f/2$, thus $\frac{k}{f-k} = 1$.

28

Cost Model

• Trade-off in the value of k:
  – Large k: high insertion cost
  – Small k: high search cost

• We develop a cost model to calculate the optimal value of k, given the characteristics of both the flash SSD and the workload, as sketched below.
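A hedged sketch of such a cost model, using the per-operation formulas from slide 27; the device timings, workload mix, and default parameters below are illustrative assumptions (N, f, and the buffer-sized head tree echo the experimental settings later in the talk):

```python
import math

def estimated_cost(k, N=2**30, f=256, L0=2**21,
                   t_rand_read=0.1, t_seq_io=0.03,
                   p_search=0.5, p_insert=0.5):
    """Estimated time (ms) per operation for size ratio k (assumed timings)."""
    levels = math.log(N / L0, k)                  # number of sorted runs
    search = levels * t_rand_read                 # one random page read per level
    insert = k / (f - k) * levels * 2 * t_seq_io  # amortized seq. read + write
    return p_search * search + p_insert * insert

best_k = min(range(2, 256), key=estimated_cost)   # k must stay below f
```

Sweeping k makes the trade-off visible: the search term shrinks as k grows, while the insertion term blows up as k approaches f, so the minimum lies in between.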

29

Cost Model

• Estimated costs for varying k values

30

Outline

• Introduction
• Structure of FD-Tree
• Cost Analysis
• Experimental Results
• Conclusion

31

Implementation Details

• Storage layout
  – Fixed-length record page format
  – OS disk buffering disabled

• Buffer manager
  – LRU replacement policy

32

[Figure: all four indexes (FD-tree, LSM-tree, BFTL, B+-tree) are implemented on a common buffer manager and storage layout over the flash SSDs.]

Experimental Setup

• Platform
  – Intel quad-core CPU
  – 2GB memory
  – Windows XP

• Three flash SSDs:
  – Intel X-25M 80GB, Mtron 64GB, Samsung 32GB
  – SATA interface

33

Experimental Settings

• Index size: 128MB-8GB (8GB by default)
• Entry size: 8 bytes (4-byte key + 4-byte pointer)
• Buffer size: 16MB
• Warm-up period: 10,000 queries
• Workload: 50% search + 50% insertion (by default)

34

Validation of the Cost Model

• The estimated costs are very close to the measured ones.
• With the cost model, we can estimate a fairly accurate k value that minimizes the overall cost.

[Figure: estimated vs. measured costs on the Mtron SSD and the Intel SSD.]

35

Overall Performance Comparison

• On the Mtron SSD, FD-tree is 24.2X, 5.8X, and 1.8X faster than B+-tree, BFTL, and LSM-tree, respectively.

• On the Intel SSD, FD-tree is 3X, 3X, and 1.5X faster than B+-tree, BFTL, and LSM-tree, respectively.

36

[Figure: overall performance on the Mtron SSD and the Intel SSD.]

Search Performance Comparison

• FD-tree has search performance similar to that of B+-tree.
• FD-tree and B+-tree outperform the others on both SSDs.

37

[Figure: search performance on the Mtron SSD and the Intel SSD.]

Insertion Performance Comparison

• FD-tree has insertion performance similar to that of LSM-tree.
• FD-tree and LSM-tree outperform the others on both SSDs.

38

[Figure: insertion performance on the Mtron SSD and the Intel SSD.]

Performance Comparison

• W_Search: 80% search + 10% insertion + 5% deletion + 5% update

• W_Update: 20% search + 40% insertion + 20% deletion + 20% update

39

Outline

• Introduction
• Structure of FD-Tree
• Cost Analysis
• Experimental Results
• Conclusion

40

Conclusion

• We design a new index structure that transforms almost all random writes into sequential ones while preserving search efficiency.

• We show, both empirically and analytically, that FD-tree outperforms the other indexes on various flash SSDs.

41

Related Publications

• Yinan Li, Bingsheng He, Qiong Luo, and Ke Yi. Tree Indexing on Flash Disks. ICDE 2009 (short paper).

• Yinan Li, Bingsheng He, Qiong Luo, and Ke Yi. Tree Indexing on Flash Based Solid State Drives. In preparation for journal submission.

42

Q&A

• Thank you!
• Q&A

43


Additional Slides

45

Block-Level FTL

• Mapping granularity: block
• Cost of updating one page: 1 erase + N writes + N reads (N = pages per block)

[Figure: block-level mapping table from logical block ID to physical block ID.]

46

Page-Level FTL

• Mapping granularity: page
• Larger mapping table
• Cost of updating one page: 1/N erase (amortized) + 1 write + 1 read

[Figure: page-level mapping table; an update redirects the logical page to a new physical page.]

47

Fragmentation

• Cost of Recycling ONE block: N^2 reads, N*(N-1) writes, N erases.

The flash disk is full now… we have to recycle space.

48

Deamortized FD-Tree

• Normal FD-tree
  – High average insertion performance
  – Poor worst-case insertion performance

• Deamortized FD-tree
  – Reduces the worst-case insertion cost
  – Preserves the average insertion cost

49

Deamortized FD-Tree

• Maintain two head trees, T0 and T0':
  – Insert into T0'
  – Search both T0 and T0'
  – Merge T0 concurrently with new insertions
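A hedged sketch of the two-head-tree scheme (dictionaries stand in for the head B+-trees; all names are illustrative):

```python
class DeamortizedHead:
    def __init__(self):
        self.t0 = {}        # full head tree, merged into L1 in the background
        self.t0_new = {}    # active head tree T0' receiving insertions

    def insert(self, key, ptr):
        self.t0_new[key] = ptr                  # inserts never wait on a merge

    def search(self, key):
        # Until the merge finishes, a key may live in either head tree.
        return self.t0_new.get(key, self.t0.get(key))

    def merge_finished(self):
        self.t0, self.t0_new = self.t0_new, {}  # T0' becomes the next T0
```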

[Figure: searches probe both T0 and T0', while new entries are inserted into T0' and T0 is merged in the background.]

50

Deamortized FD-Tree

• The high merge cost is amortized over all entries inserted into the head tree.

• The overall cost is (almost) unchanged.

51

FD-Tree vs. Deamortized FD-Tree

• Relatively high worst-case performance
• Low overhead

52