Optimizing ForestDB for Flash-Based SSD: Couchbase Connect 2015
OPTIMIZING FORESTDB FOR FLASH-BASED SSD
Sang-Won Lee, Professor, Sungkyunkwan University
Sundar Sridharan, Senior Software Engineer, Couchbase Inc.
©2015 Couchbase Inc.
Contents
▪ Introduction
▪ SHARE Interface in Flash-Based SSD for ForestDB
▪ ForestDB Optimizations at File System Layer
▪ Evaluation Results
▪ Future Work
▪ Summary
Introduction
▪ It is the all-flash storage era!
▪ The hard-disk-era legacy in system software is suboptimal on top of flash storage
▪ ForestDB: the next-generation KV engine of Couchbase
▪ Opportunities
  ▪ Exploit flash storage characteristics (SHARE interface)
  ▪ Leverage modern CoW-based file systems
SHARE Interface in Flash-Based SSD for ForestDB
Characteristics of Flash Storage (vs. Hard Disk)
▪ No-overwrite and the FTL layer
  ▪ Overwrite is not allowed
  ▪ Another layer of address mapping inside flash storage
▪ Limited lifetime
▪ Write time in flash storage ~ amount written
▪ Write time in a hard disk ~ mechanical disk head movement
Copy-on-Write in ForestDB
▪ Document update
  ▪ Copy-on-write, instead of in-place update
Copy-On-Write in ForestDB (2)
▪ Why CoW?
  ▪ 1) Write atomicity and 2) multi-version concurrency control
  ▪ A reasonable solution on HDDs
▪ Problems with CoW on flash storage
  ▪ Tree wandering → write amplification → low performance
  ▪ Shortened flash storage lifetime
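The copy-on-write idea above can be sketched in a few lines. This is a minimal illustrative model, not ForestDB's actual code: updates append a new document version and repoint an in-memory index, so old versions remain on disk for point-in-time snapshot readers.

```python
# Minimal sketch of an append-only, copy-on-write KV store (illustrative,
# not ForestDB's real implementation).

class AppendOnlyKV:
    def __init__(self):
        self.log = []      # the append-only "file": list of (key, value) records
        self.index = {}    # key -> offset of the latest version

    def put(self, key, value):
        # Copy-on-write: never overwrite in place; append a new version.
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def get(self, key):
        return self.log[self.index[key]][1]

    def snapshot(self):
        # A snapshot is just a frozen copy of the index; readers using it
        # see a consistent point-in-time view while the writer keeps appending.
        frozen = dict(self.index)
        return lambda key: self.log[frozen[key]][1]

kv = AppendOnlyKV()
kv.put("doc1", "v1")
snap = kv.snapshot()
kv.put("doc1", "v2")       # appends; does not overwrite "v1"
print(kv.get("doc1"))      # latest version: "v2"
print(snap("doc1"))        # snapshot still sees "v1"
print(len(kv.log))         # both versions live in the file: 2
```

The downside shown by `len(kv.log)` is exactly the slide's point: every update grows the file, which on flash turns into write amplification and wear.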
Opportunities in Flash Storage
▪Address mapping inside flash storage (by FTL)
Opportunities in Flash Storage (2)
▪SHARE interface: explicit address remapping
Opportunities in Flash Storage (3)
▪ ForestDB compaction with SHARE
  ▪ No write of valid documents to the new file
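SHARE is a research firmware extension, so there is no public API to show; the toy model below only illustrates the idea, and every name in it is hypothetical. An FTL maps logical block addresses (LBAs) to physical pages; normal compaction rewrites valid data to new LBAs, while a SHARE-style remap points the new file's LBAs at the existing physical pages, so nothing is rewritten.

```python
# Hypothetical model of the SHARE idea (the real interface is firmware on
# an OpenSSD board; these names are illustrative, not the actual API).

class ToyFTL:
    def __init__(self):
        self.mapping = {}       # LBA -> physical page number
        self.flash_writes = 0   # count of physical page programs

    def write(self, lba, page_data, flash):
        # Out-of-place write: program a fresh physical page, update mapping.
        ppn = len(flash)
        flash.append(page_data)
        self.mapping[lba] = ppn
        self.flash_writes += 1

    def share(self, dst_lba, src_lba):
        # SHARE-style remap: dst LBA points at src's physical page.
        # No flash program happens, so no extra write amplification.
        self.mapping[dst_lba] = self.mapping[src_lba]

    def read(self, lba, flash):
        return flash[self.mapping[lba]]

flash = []                           # the physical pages
ftl = ToyFTL()
ftl.write(0, "valid-doc", flash)     # old file's valid document
ftl.write(1, "stale-doc", flash)     # old file's stale document
writes_before = ftl.flash_writes

# Compaction with SHARE: place the valid doc in the new file (LBA 100)
# by remapping only; the stale page is simply left behind.
ftl.share(100, 0)
print(ftl.read(100, flash))              # -> valid-doc
print(ftl.flash_writes - writes_before)  # -> 0 (nothing was rewritten)
```

This is why the measured compaction below writes ~150 MB instead of ~1.1 GB: valid documents move by remapping, not by copying.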
SHARE Implementation
▪ Firmware extension for SHARE
  ▪ OpenSSD board (http://www.openssd-project.org/)
  ▪ Atomic and recoverable
Performance Evaluation
▪ Normal runtime performance: YCSB Workload F
Performance Evaluation (2)
▪ Compaction performance

                        Elapsed Time (sec)   Written Bytes (MB)
  Original ForestDB           227.5               1126.4
  ForestDB with SHARE          88.4                150.6
ForestDB Optimizations at File System Layer
Overview
▪Motivation – the catch-22
▪Why B-Tree file system (Btrfs)
▪How ForestDB solves the catch-22 using Btrfs
▪Optimizing with Linux Asynchronous library (libaio)
▪Performance Results
Append-Only Key-Value Stores are Great!
▪ Consistency
  ▪ Stable access to multiple point-in-time snapshots of data
▪ Performance with isolation
  ▪ Multi-Version Concurrency Control (MVCC) means readers and writers do not block each other
▪ Recoverability
  ▪ Can easily roll back the entire database to a stable past state
▪ SSD friendly
  ▪ Avoids in-place updates and extra Flash Translation Layer work
Append-Only KV Stores are Great!
MVCC: Readers & Writer Run Unblocked!
But...
▪Disk can fill up with stale data
▪ Need to do garbage collection: compaction
Compactions Do Garbage Collection...
Compactions for Garbage Collection
What if the size of active data exceeds the available free space?
A Fundamental Problem with Disk Space
Writer appends too much data
A Fundamental Problem: Catch-22
“My disk is getting full... I want to free up space but don’t have enough free space to free up space!”
The size of active data must be strictly less than the free space available on disk!
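The constraint behind the catch-22 is simple arithmetic. A minimal sketch with illustrative numbers: classic compaction copies all active data into a fresh file before the old one can be deleted, so it needs at least that much free space; clone-based compaction (introduced below with Btrfs) shares the valid extents and only needs room for new metadata.

```python
# Sketch of the space constraint behind the catch-22 (numbers illustrative).

def classic_compaction_possible(active_bytes, free_bytes):
    # The new file must hold a full copy of the active data while the
    # old file still occupies its space.
    return free_bytes >= active_bytes

def clone_compaction_possible(free_bytes, metadata_bytes):
    # With extent cloning, valid data is shared rather than copied; only
    # new index/metadata blocks need fresh space.
    return free_bytes >= metadata_bytes

disk = 20 * 1024      # 20 GB disk, in MB
active = 12 * 1024    # 12 GB of live documents
stale = 6 * 1024      # 6 GB of garbage awaiting compaction
free = disk - active - stale   # only 2 GB free

print(classic_compaction_possible(active, free))  # False: stuck!
print(clone_compaction_possible(free, 64))        # True
```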
B-Tree File System (Btrfs)
▪Btrfs is a copy-on-write filesystem for Linux
▪ Development began at Oracle in 2007; marked stable since August 2014 (http://goo.gl/upukn4)
▪ Industry support from Facebook, Fujitsu, Fusion-io, Intel, Netgear, Novell/SUSE, Oracle, Red Hat, etc.
▪Available as an option in all major Linux distributions
Btrfs Features (Short List)
▪ Max file size up to 16 exbibytes (1 exbibyte in ext4)
▪ Self-healing due to its copy-on-write nature
▪ Online defragmentation
▪ Online volume growth and shrinking
▪ Online block device addition and removal
▪ Block discards for improved wear levelling on SSDs using TRIM
▪ Transparent compression, configurable per file or volume
▪ Online data scrubbing
▪ Send/receive of diffs
▪ Snapshots and subvolumes
▪File Cloning!
Btrfs Basics - Representation
File P with reference counted extents
Btrfs Feature - Copy File Range
The copy-file-range API lets a new file "Q" share physical disk extents with file "P"
Btrfs Feature - Blocks shared across files
Copy-on-write lets new updates happen on file Q
Btrfs Basics - Deleting File
Deleting file Q
Btrfs Basics - Freeing up space
Freeing up space
ForestDB Compaction Using Btrfs Cloning
Compaction works by using Btrfs copy-on-write cloning to share valid block ranges from the old file into the new file...
ForestDB Compaction Using Btrfs Cloning
Deleting the old file.fdb.0 frees only the space belonging to its stale blocks; the valid blocks shared with file.fdb.1 stay intact!
Performance Results
Ubuntu 14.04, Btrfs v3.12, 4 CPU cores, 20 GB SSD drive, 8 GB DRAM
Performance (1) – ForestDB on Btrfs
~1.25 - 2x faster! ½ the write amplification!
Performance (2) – ForestDB on Btrfs
~1.5 - 4x faster! ½ the write amplification!
Performance (3) – ForestDB on Btrfs
~2x faster! ½ the write amplification!
Speeding up Reads with libaio
▪Modern SSDs have multiple I/O channels
▪Asynchronous I/O maximizes throughput
▪Well suited for ForestDB compaction tasks
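ForestDB's compactor uses Linux libaio (`io_submit`/`io_getevents`) to keep many reads in flight across the SSD's channels. Python has no libaio binding in the stdlib, so the sketch below uses positioned `pread` calls issued from a thread pool as a rough analogue of having multiple reads outstanding at once; it illustrates the access pattern, not the actual libaio code path.

```python
# Analogue of async batched reads: several positioned reads in flight at
# once, as ForestDB's compactor achieves with libaio on Linux.
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    for i in range(8):                 # 8 distinct 4 KiB blocks
        f.write(bytes([i]) * BLOCK)

fd = os.open(path, os.O_RDONLY)

def read_block(i):
    # pread is positioned, so concurrent readers need no shared seek state.
    return os.pread(fd, BLOCK, i * BLOCK)

# Issue the 8 block reads concurrently instead of one at a time.
with ThreadPoolExecutor(max_workers=4) as pool:
    blocks = list(pool.map(read_block, range(8)))

os.close(fd)
os.unlink(path)
print(all(blocks[i] == bytes([i]) * BLOCK for i in range(8)))  # -> True
```

The benefit on a real SSD comes from queue depth: with several requests outstanding, the drive can service them from multiple channels in parallel instead of idling between sequential reads.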
Performance (4) ForestDB on Btrfs with libaio
13x faster!
7x faster!
4x faster!
Advantages of Btrfs with libaio
▪ Efficiently uses disk space, avoiding the catch-22
▪ Halves write amplification
  ▪ Longer SSD lifespan due to reduced wear
▪ Over 13x faster compaction
▪ A generic file-system-layer solution that applies to SSDs as well as spinning disks
Future Work
Future Work
▪ Optimize the Btrfs clone feature for better performance
  ▪ Working with the Linux Btrfs community
▪ Optimize ForestDB to skip reading when cloning during compaction
▪ Adapt the ext4 file system to add a new system call that allows sharing physical blocks among multiple files
Summary
Summary
▪ ForestDB with the SHARE interface in the SSD
  ▪ Speeds up compaction by 3x with 10x lower write amplification
▪ ForestDB with the Btrfs clone feature at the file system layer
  ▪ Speeds up compaction by 2x with 2x lower write amplification
▪ ForestDB with the Btrfs clone feature plus Linux libaio
  ▪ Speeds up compaction by 13x with 2x lower write amplification
Initial Load Performance
3x ~ 6x less time
Initial Load Performance
4x less write overhead
Read-Only Performance
[Chart: throughput (operations per second, 0-30000) vs. number of reader threads (1, 2, 4, 8) for ForestDB, LevelDB, and RocksDB. ForestDB is 2x ~ 5x faster.]
Write-Only Performance
[Chart: throughput (operations per second, 0-12000) vs. write batch size in documents (1, 4, 16, 64, 256) for ForestDB, LevelDB, and RocksDB. ForestDB is 3x ~ 5x faster. Note: a small batch size (e.g., < 10) is uncommon in practice.]
Write-Only Performance
[Chart: write amplification (normalized to a single document size, 0-450) vs. write batch size in documents (1, 4, 16, 64, 256) for ForestDB, LevelDB, and RocksDB. ForestDB shows 4x ~ 20x less write amplification.]
Mixed Workload Performance
[Chart: mixed (unrestricted) workload throughput (operations per second, 0-12000) vs. number of reader threads (1, 2, 4, 8) for ForestDB, LevelDB, and RocksDB. ForestDB is 2x ~ 5x faster.]