Early Experience and Evaluation of File Systems on SSD with Database Applications
Yongkun WANG, Kazuo GODA, Miyuki NAKANO, Masaru KITSUREGAWA
The University of Tokyo
Outline
• Motivation
• Flash SSD
• Basic Performance Study
• Performance Evaluation by TPC‐C Benchmark
• Conclusion and Future Work
Motivation
• Flash SSDs are likely to be used in enterprise storage platforms for achieving high performance in data‐intensive applications
• IO path management techniques should be evaluated carefully
– Existing systems are designed for traditional hard disks
– The IO performance features of flash SSDs are different from those of hard disks
• For better utilization of SSDs in DBMSs
– Evaluate the basic performance of SSDs
– Evaluate the performance of the IO path in a conventional DBMS
• With different file systems and IO schedulers
Flash SSD
• Flash SSD (Solid State Drive)
– A package of multiple flash memory chips
– The FTL (Flash Translation Layer) provides block device emulation
• Performance properties of flash memory (Samsung K9XXG08UXM)
– READ (4KB) takes 25µs
– PROGRAM (4KB) takes 200µs
– ERASE (256KB) takes 1500µs
• The erase‐before‐program design can lead to poor performance in a normal in‐place‐write system (a rough cost sketch follows the diagram below)
[Diagram: internal architecture of a flash SSD. A SATA port and controller chip (running the FTL) with an SDRAM buffer connect over a flash memory bus to multiple flash memory chips, each packaging NAND flash memory.]
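As a rough illustration of why erase-before-program hurts in-place writes, the sketch below estimates the worst-case cost of rewriting a single page without FTL remapping. It assumes a 256KB block holds 64 pages of 4KB and uses the chip timings quoted above; it is illustrative arithmetic, not a device model.

    # Timings quoted above for the Samsung K9XXG08UXM chip (microseconds)
    READ_4KB, PROGRAM_4KB, ERASE_256KB = 25, 200, 1500

    def naive_inplace_write_us(pages_per_block=64):
        # Worst case without remapping: read the 63 surviving pages out,
        # erase the whole 256KB block, then program all 64 pages back.
        read_back = (pages_per_block - 1) * READ_4KB
        reprogram = pages_per_block * PROGRAM_4KB
        return read_back + ERASE_256KB + reprogram

    print(naive_inplace_write_us())  # 15875 us, versus 200 us for a bare PROGRAM

That is roughly an 80x penalty per page, which is what the FTL's out-of-place writing is designed to avoid.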
Outline
• Motivation
• Flash SSD
• Basic Performance Study
• Performance Evaluation by TPC‐C Benchmark
• Conclusion and Future Work
Purpose of Basic Performance Study
• Clarify the performance difference between SSDs and HDDs
• Clarify the performance differences among SSDs
• Clarify the erase problem on SSDs
Experimental System
• Host: Dell Precision™ 390 Workstation (dual‐core Intel Core 2 Duo 1.86GHz, 2GB memory, SATA 3.0Gbps controller, CentOS 5.2 64‐bit, kernel 2.6.18)
• Flash SSD: Mtron PRO 7500 (SLC, 3.5”, 32GB)
• Flash SSD: OCZ VERTEX EX (SLC, 2.5”, 120GB)
• Flash SSD: Intel X25‐E (SLC, 2.5”, 64GB)
• Hard Disk (HDD): Hitachi HDS72107 (3.5”, 7200RPM, 32MB cache, 750GB)
• Inside each device, read‐ahead pre‐fetching and write‐back caching are enabled
Micro Benchmark
• One million requests for each case
• Request size: 512B to 256KB
• Access patterns
– Sequential Read/Write
– Random Read/Write
– Mixed Random (50% read plus 50% write)
• Number of outstanding IOs (a minimal measurement sketch follows)
– One outstanding IO: submit one IO request at a time
– 30 outstanding IOs: submit 30 IO requests at a time
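A minimal random-read sketch of this kind of micro benchmark, assuming a Linux raw block device and Python 3.7+; the device path and request count here are illustrative, not the exact harness used in the study:

    import mmap, os, random, time

    def random_read_iops(path, io_size=4096, n_requests=100_000):
        # O_DIRECT bypasses the page cache so the device itself is measured
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
        dev_size = os.lseek(fd, 0, os.SEEK_END)
        buf = mmap.mmap(-1, io_size)  # page-aligned buffer, required by O_DIRECT
        offsets = [random.randrange(dev_size // io_size) * io_size
                   for _ in range(n_requests)]
        start = time.perf_counter()
        for off in offsets:           # one outstanding IO at a time
            os.preadv(fd, [buf], off)
        elapsed = time.perf_counter() - start
        os.close(fd)
        return n_requests / elapsed

    # e.g. print(random_read_iops("/dev/sdb"))  # hypothetical device path

Measuring 30 outstanding IOs would additionally require asynchronous submission (e.g. Linux AIO), which this sketch omits.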
Basic Performance of Flash SSDs ~ Sequential Access ~
• The read throughput of Intel's SSD and OCZ's SSD is much higher
• The write throughput of Intel's SSD is higher
• The write throughput of Intel's SSD drops sharply once the request size exceeds 32KB
• The gap between the read and write throughput of OCZ's SSD is large
[Figure: sequential IO throughput (MB/s, 0-300) versus IO size (512B-256KB) for HDD, Mtron, Intel, and OCZ; one curve for 100% read, one for 100% write.]
Basic Performance of Flash SSDs ~ Random Access (Single Outstanding IO) ~
• The read IOPS of the SSDs is much higher than that of the HDD.
• Random write performance drops drastically on Mtron's SSD and OCZ's SSD.
• Mixed-access performance also drops drastically on Mtron's SSD and OCZ's SSD. [Bathtub effect, Freitas, FAST 2010 tutorial]
[Figure: random IO throughput (KIOPS, 0-20) versus IO size (512B-256KB) for HDD, Mtron, Intel, and OCZ; curves for 100% read, 100% write, and 50% read / 50% write.]
Basic Performance of Flash SSDs ~ Random Access (30 Outstanding IOs) ~
• The read throughput is improved on Intel's SSD and OCZ's SSD.
[Figure: random read throughput (KIOPS, 0-60) versus IO size (512B-256KB) for HDD, Mtron, Intel, and OCZ; curves for 30 outstanding IOs and one outstanding IO.]
Basic Performance of Flash SSDs ~ Response Time Distribution of 4KB Random Access ~
• Random read (blue line): most random reads complete within a very small range of response times on the SSDs
• Random write (red line): random write behavior differs among the three SSDs
[Figure: cumulative frequency (%) of response times (1µs to 1,000,000µs, log scale) for 4KB random access on HDD, Mtron, Intel, and OCZ; curves for 100% read, 100% write, and 50% read / 50% write.]
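The cumulative-frequency curves of this kind can be built directly from the per-request response times; a minimal sketch (not the tooling used in the study):

    def cumulative_frequency(latencies_us):
        # Sort the measured response times and pair each with its
        # cumulative percentile, as plotted in the figure above.
        xs = sorted(latencies_us)
        n = len(xs)
        return [(x, 100.0 * (i + 1) / n) for i, x in enumerate(xs)]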
Outline
• Motivation
• Flash SSD
• Basic Performance Study
• Performance Evaluation by TPC‐C Benchmark
• Conclusion and Future Work
Purpose of Evaluation by TPC‐C
• Provide an evaluation of the IO behavior of SSDs running an actual database application
– Two file systems, two DBMSs, and four IO schedulers
• Investigate the detailed behavior of the IO path
System Configuration
• TPC‐C benchmark 5.10
• Database settings
– MySQL: InnoDB
– Commercial DBMS
• File system options
– Ext2fs (ext2)
– Nilfs2
• IO schedulers
– No operation (Noop)
– Anticipatory
– Deadline
– Completely Fair Queuing (CFQ)
[Diagram: experimental software stack. The Database Application (TPC-C Benchmark) runs on a DBMS (MySQL, Commercial DBMS); IO flows through the file system (ext2fs, nilfs2), the IO schedulers, and the device driver (SATA) inside the OS kernel, where a kernel tracer is attached. The OS resides on its own disk; the database is placed on the HDD or one of the flash SSDs.]
Configuration of TPC‐C Benchmark
• 30 warehouses, with 30 virtual users
• “Key and Think” time was 0
• DBMS configuration for TPC‐C benchmark:

Setting               Commercial DBMS     MySQL (InnoDB)
Data buffer size      8MB                 4MB
Log buffer size       5MB                 2MB
Data block size       4KB                 16KB
Data file             fixed, 5.5GB (database size is 2.7GB), for both
Synchronous IO        Yes                 Yes
Log flushing method   flushing the log at transaction commit, for both
File Systems
• Ext2fs (ext2)
– In‐place update
– Seek, then read
– Seek, then update
• Nilfs2
– An example of a log‐structured file system (LFS)
– Seek, then read
– Random writes become sequential writes (a toy sketch follows the diagram below)
[Diagram: ext2fs reads and overwrites data pages a-d at their fixed disk locations; nilfs2 writes the updated pages a', b', c', d' sequentially at the log tail, leaving the old copies behind as obsolete data pages.]
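A toy sketch of the two write strategies (illustrative only; page_map stands in for the remapping metadata a real LFS keeps, and no cleaner is modeled):

    def inplace_update(disk, page_no, data):
        # ext2fs-style: overwrite the page at its fixed location
        disk[page_no] = data

    def log_append(disk, tail, page_map, page_no, data):
        # nilfs2-style: write at the log tail and remap the logical page;
        # the old copy becomes an obsolete page, reclaimed later by a cleaner
        disk[tail] = data
        page_map[page_no] = tail
        return tail + 1

    disk, page_map, tail = [None] * 16, {}, 0
    for page_no, data in [(3, "a'"), (7, "b'"), (12, "c'"), (5, "d'")]:
        tail = log_append(disk, tail, page_map, page_no, data)
    # Scattered logical updates landed on consecutive disk pages 0-3.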
Experimental Study
• Transaction Throughput
• IO Throughput
• Buffer Size
• Workload Property
• IO Scheduler
Transaction Throughput
• Intel's SSD is better than the HDD.
• Mtron's SSD is better than the HDD with LFS (nilfs2).
• OCZ's SSD is better than the HDD with ext2fs.
• The performance differences are caused by the combination of SSD and file system.
[Chart: transaction throughput (tpm, 0-14,000) for ext2fs vs. nilfs2 on HDD, Mtron, Intel, and OCZ, under the Commercial DBMS and MySQL.]
IO Path Investigation
• Logical IO is captured at the system call level, where the DBMS calls the service routines of the OS kernel.
• Physical IO is captured at the device driver level, where IO requests have been sorted and merged, ready to be served by the device (a lightweight sketch for observing this level follows the diagram below).
[Diagram: the software stack as before, annotated with the two capture points: Logical IO between the DBMS and the file system, and Physical IO at the device driver, just above the devices.]
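One lightweight way to observe physical IO rates on Linux is /proc/diskstats, where counts are in 512-byte sectors; a sketch (the study itself used a kernel tracer, and the device name here is illustrative):

    import time

    def physical_mb_per_sec(dev="sdb", interval=1.0):
        # Sample sectors read/written from /proc/diskstats twice and diff
        def sectors():
            with open("/proc/diskstats") as f:
                for line in f:
                    parts = line.split()
                    if parts[2] == dev:
                        return int(parts[5]), int(parts[9])
        r0, w0 = sectors()
        time.sleep(interval)
        r1, w1 = sectors()
        to_mb = 512.0 / (interval * 1024 * 1024)
        return (r1 - r0) * to_mb, (w1 - w0) * to_mb  # (read MB/s, write MB/s)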
Logical IO Throughput
• The transaction throughput follows the results of the logical IO throughput.
[Charts: transaction throughput (tpm) alongside the logical read/write rate issued by the DBMS (MB/s, 0-120), ext2fs vs. nilfs2 on HDD, Mtron, Intel, and OCZ, under the Commercial DBMS and MySQL.]
Physical IO Throughput
[Charts: logical IO throughput (read/write rate by DBMS, MB/s, 0-120) beside physical IO throughput (read/write rate to device, MB/s, 0-120), ext2fs vs. nilfs2 on HDD, Mtron, Intel, and OCZ, under the Commercial DBMS and MySQL.]
Physical IO Throughput (Read)
• A large amount of reads is absorbed by the file system buffer cache.
[Charts: logical vs. physical IO throughput, repeated from the previous slide, highlighting the read portions.]
Physical IO Throughput (Write, ext2fs)
[Charts: logical vs. physical IO throughput, repeated from the previous slide, highlighting the ext2fs write portions.]
• A large amount of reads is absorbed by the file system buffer cache.
• For ext2fs, the write throughput is almost the same at the logical and physical levels (synchronous IO).
Physical IO Throughput (Write, nilfs2)
[Charts: logical vs. physical IO throughput, repeated from the previous slide, highlighting the nilfs2 write portions.]
• A large amount of reads is absorbed by the file system buffer cache.
• For ext2fs, the write throughput is almost the same at the logical and physical levels (synchronous IO).
• LFS (nilfs2) produces additional writes at the physical IO layer, which has a serious impact on the overall transaction throughput.
Physical IO Size
• The average request size of physical IO (chart below)
[Chart: average physical read/write size (bytes, 0-60,000 scale) for ext2fs and nilfs2 per device, under the Commercial DBMS and MySQL; four nilfs2 write sizes lie off the scale, at 180,392, 107,423, 181,115, and 186,352 bytes.]
Physical IO Size (HDD, Mtron)
• The average request size of physical IO (chart below)
• The average write size of LFS is much larger than that of ext2fs, which is beneficial for the hard disk and some SSDs such as Mtron's.
[Charts: the average physical IO size chart above, paired with sequential write throughput (MB/s, 0-300) versus IO size (512-262,144 bytes) for HDD and Mtron from the basic performance study.]
Physical IO Size (Intel, OCZ)
• The average request size of physical IO (chart below)
• The average write size of LFS is much larger than that of ext2fs, which is beneficial for the hard disk and some SSDs such as Mtron's.
• A large write size is not beneficial on Intel's and OCZ's SSDs, as shown in the basic performance study. This helps to explain the inferior transaction throughput on nilfs2.
[Charts: the average physical IO size chart above, paired with sequential write throughput (MB/s, 0-300) versus IO size (512-262,144 bytes) for Intel and OCZ from the basic performance study.]
Database Buffer Size (Mtron)
• The throughput improves as the buffer size increases.
[Charts: transaction throughput (tpm) versus database buffer size, ext2fs vs. nilfs2 on Mtron's SSD; Commercial DBMS from 8MB to 1GB, MySQL from 4MB to 1GB.]
Workload Property
• Transaction mix (% of mix) for the three workloads:

Transaction Type   IO Property   read intensive   normal   write intensive
New Order          Read‐Write     4.35            43.48    96.00
Payment            Read‐Write     4.35            43.48     1.00
Delivery           Read‐Write     4.35             4.35     1.00
Stock Level        Read‐Only     43.48             4.35     1.00
Order Status       Read‐Only     43.48             4.35     1.00
• Measured with three types of workloads
• The speedup of nilfs2 over ext2fs increases as the percentage of read‐write transactions increases (see the sketch after the chart below)
[Chart: transaction throughput (tpm, left axis, 0-25,000) for ext2fs and nilfs2, and nilfs2-over-ext2fs speedup (right axis, 0-12), for the read intensive, normal, and write intensive workloads, under the Commercial DBMS and MySQL.]
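A small sketch, with values taken from the table above, showing how the three workloads shift the share of read-write transactions, which is what the speedup tracks:

    READ_WRITE = {"New Order", "Payment", "Delivery"}
    MIX = {
        "read intensive":  {"New Order": 4.35, "Payment": 4.35, "Delivery": 4.35,
                            "Stock Level": 43.48, "Order Status": 43.48},
        "normal":          {"New Order": 43.48, "Payment": 43.48, "Delivery": 4.35,
                            "Stock Level": 4.35, "Order Status": 4.35},
        "write intensive": {"New Order": 96.00, "Payment": 1.00, "Delivery": 1.00,
                            "Stock Level": 1.00, "Order Status": 1.00},
    }
    for name, mix in MIX.items():
        # Sum the share of the read-write transaction types in this workload
        rw = sum(pct for txn, pct in mix.items() if txn in READ_WRITE)
        print(f"{name}: {rw:.0f}% read-write")
    # read intensive: 13%, normal: 91%, write intensive: 98%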
IO Schedulers
• Noop
– No operation
• Anticipatory
– Merges IO requests and re‐orders them in an elevator manner
• Deadline
– Imposes a deadline on each request
• Completely Fair Queuing (CFQ)
– Balances the IO service time among processes
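On the kernel used here (2.6.18), the scheduler can be chosen per block device through sysfs; a minimal sketch (the device name is illustrative, and root privileges are required):

    def set_io_scheduler(device, scheduler):
        # Valid names on this kernel: noop, anticipatory, deadline, cfq
        with open(f"/sys/block/{device}/queue/scheduler", "w") as f:
            f.write(scheduler)

    # e.g. set_io_scheduler("sdb", "deadline")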
Transaction Throughput with IO Schedulers
• IO scheduling does not largely affect the transaction throughput.
[Chart: transaction throughput (tpm, 0-25,000) under Noop, Anticipatory, Deadline, and CFQ for ext2fs and nilfs2 on Mtron, Intel, and OCZ, under the Commercial DBMS and MySQL.]
Conclusion and Future Work
• We studied the basic performance characteristics of flash SSDs
• We measured and analyzed the application performance and IO behavior on three flash SSDs and two file systems with the TPC‐C benchmark
– Transaction throughput
– Logical IO throughput
– Physical IO throughput
• We plan to study IO path management techniques for database applications running on flash SSDs
Q&A
Thank you very much!