Peter Zaitsev, CEO, Percona
November 1, 2014 Highload++ 2014
Moscow, Russia
SSD/Flash for Modern Databases
www.percona.com 2
Percona
We love Open Source Software
• Percona Server
• Percona XtraBackup
• Percona XtraDB Cluster
• Percona Toolkit
We want to help you to succeed with MySQL and Beyond
• Consulting
• Support
• Managed Services
In this Presentation
Flash technology overview
Review some of the available technology
What does this mean for databases?
Specific opportunities for MySQL
Before SSDs
There were HDDs
Good at Sequential Reads/Writes
Response Time (RT) = Seek Time + Rotational Latency
Reads/Writes: Similar Latency
No Specific Write Limits
Retain data for a long time
One IO Request at a Time (No Parallelism)
Low cost per GB
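A rough worked example of the response-time formula on this slide: average rotational latency is half a revolution, so it follows directly from spindle speed. The seek times and function names below are illustrative assumptions, not figures from this talk.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one full revolution, in ms."""
    return 0.5 * 60_000.0 / rpm


def hdd_response_time_ms(seek_ms: float, rpm: float) -> float:
    """RT = seek time + rotational latency (transfer time is ignored)."""
    return seek_ms + avg_rotational_latency_ms(rpm)


# Assumed (hypothetical) average seek times for two drive classes.
for label, seek_ms, rpm in [("7200 RPM", 8.5, 7200), ("15000 RPM", 3.5, 15000)]:
    rt = hdd_response_time_ms(seek_ms, rpm)
    print(f"{label}: ~{rt:.1f} ms per random IO, ~{1000 / rt:.0f} random IO/s")
```

With one request served at a time, random IO/s is simply the inverse of the response time, which is why a single HDD tops out at a few hundred random IOs per second at best.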
RAID and SAN
Using Many HDDs together
Caching Reads
Buffering Writes (Writeback Cache)
Better Sequential Read/Write speed
Better throughput at high concurrency
Higher IO latencies for uncached IO
Flash Revolution
Use Flash chips instead
of platters
No moving parts
No seeks
NAND Flash
Cell
Page/Read Block
Erase Block
Write but no overwrite
Wears with writes (erases)
Writing to the Flash
Erase
• Set all bits to “1111111…”
Write
• Set some of the bits to 0: “0100111…”
Change Zero to One
• Impossible. Do Erase, then Write
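The erase/write rules on this slide can be sketched as a toy model: erase sets every bit to 1, and a write can only clear bits, so it behaves like a bitwise AND. The class and method names are hypothetical, and the model is simplified to a single 8-bit page; real devices erase a whole erase block at once.

```python
class FlashPage:
    """Toy NAND page: erase sets bits to 1, programming can only clear them."""

    def __init__(self, bits: int = 8):
        self.mask = (1 << bits) - 1
        self.value = self.mask      # erased state: all ones
        self.erase_count = 0

    def erase(self) -> None:
        self.value = self.mask
        self.erase_count += 1

    def program(self, data: int) -> None:
        # Programming can only turn 1 into 0, so the result is a bitwise AND.
        self.value &= data & self.mask

    def write(self, data: int) -> None:
        data &= self.mask
        # Any bit that must go from 0 back to 1 forces an erase first.
        if data & ~self.value & self.mask:
            self.erase()
        self.program(data)


page = FlashPage()
page.write(0b01001110)   # fresh page: only clears bits, no erase needed
page.write(0b01000110)   # clears one more bit, still no erase
page.write(0b11000110)   # needs a 0 -> 1 change, so an erase happens first
print(bin(page.value), page.erase_count)
```

Because every erase wears the cells, the number of times this erase path is taken is what limits device lifetime.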
Types of NAND Flash
From AnandTech:
Flash Storage Design
Cache
Battery/Super Capacitor
Controller + Complex Firmware
Built-in Parallelism
Flash Controller Tasks
Write wear leveling
Garbage collection
Error correction
Bad block mapping
Read scrubbing
Read disturb management
Encryption
Flash Properties
Lots of IOs per device! (100K+)
Less random IO penalty
Writes more expensive than reads (but can be faster)
Limited by amount of writes
Limited retention
Concurrent execution on single device
Fast write acknowledgement (safe or not)
Can burst writes
Flash Interface Designs
DIMM
PCI-E
SFF-8639
SATA/SAS
FC and Network
Transitioning
AHCI to NVMe
AHCI vs NVMe
• Source: AnandTech.com
Sandisk ULLtraDIMM
HGST Virident
Sandisk FusionIO
Intel P3700
Intel 730 (SATA)
mSATA
M.2 Interface
Violin Memory
“Consumer” vs “Enterprise”
Performance
Endurance
Durability
Retention
Encryption
Not your HDD
All HDDs are the same; All SSDs are
different
Evaluation
Performance changes over time
Empty Space Matters
Complex internals
Watch stability carefully
How Flash Fails
End of life (EOL) is clearly defined by the amount written
(but devices can often handle a lot more)
One day… it’s gone
“Power Loss Protection”
Internal ECC and redundancy
To RAID or not to RAID?
More valuable for consumer grade
Watch for good Flash support
RAID controller logic may slow things down
Use a redundant array of inexpensive servers instead?
Redundancy
Device internal redundancy
Hardware RAID
Software RAID
Filesystem “RAID”
OS Support
Flash support is actively being improved
TRIM
Sparse Files
Flash And Databases
Database History
Most were designed in the HDD era
Optimize for sequential IO
Count on cheap sequential writes
Rely on RAID and BBU to improve performance
It’s time for Flash
Your OLTP Database should
live on Flash
But What Flash?
Pick a flash type that is right for your
application
IO vs Memory
Warmup
Much faster warmup times
Even if the database fits in memory, SSD might be justified
Tolerate more IO bound load
HDD
• 5ms
• Can do 20 IOs within a 100ms response time (non-parallel)
Flash
• 0.1ms
• Can do 1000 IOs within a 100ms response time (non-parallel)
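The arithmetic behind these numbers is just the response-time budget divided by the per-IO latency, assuming the IOs are issued one after another. A minimal sketch (the function name is hypothetical; latencies are in microseconds to keep the math in integers):

```python
def serial_ios_in_budget(io_latency_us: int, budget_us: int = 100_000) -> int:
    """How many back-to-back (non-parallel) IOs fit in a response-time budget."""
    return budget_us // io_latency_us


# Latencies taken from the slide: 5 ms for HDD, 0.1 ms for Flash.
for device, latency_us in [("HDD", 5_000), ("Flash", 100)]:
    print(f"{device}: {serial_ios_in_budget(latency_us)} serial IOs in 100 ms")
```

A query that needs a few hundred uncached page reads blows through a 100ms budget on HDD but stays well inside it on Flash.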
Endurance
Might be a top consideration
Endurance Math
HGST FlashMax III 2200GB
• 4400GB/day over 5 Years
• 1400MB/sec peak writes
• 66 days at peak write throughput
Crucial M500 960GB
• 72TB total lifetime writes
• 400MB/sec write
• 52 hours at peak write throughput
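The endurance math can be reproduced with a short sketch. It assumes decimal units (1TB = 1,000,000MB) and 365-day years, and the function names are hypothetical; under those assumptions it gives roughly 66 days and 50 hours, close to the slide's figures, with the small gap down to unit conventions.

```python
def lifetime_writes_tb(gb_per_day: float, years: float) -> float:
    """Total rated writes in TB from a GB/day rating (decimal units)."""
    return gb_per_day * 365 * years / 1000


def hours_to_wear_out(total_tb: float, write_mb_per_s: float) -> float:
    """Hours of sustained peak writes needed to consume the rated endurance."""
    seconds = total_tb * 1_000_000 / write_mb_per_s
    return seconds / 3600


hgst_tb = lifetime_writes_tb(4400, 5)                 # rated total writes
hgst_days = hours_to_wear_out(hgst_tb, 1400) / 24     # at 1400MB/sec
m500_hours = hours_to_wear_out(72, 400)               # 72TB at 400MB/sec

print(f"HGST FlashMax III: {hgst_tb:.0f} TB rated, {hgst_days:.1f} days at peak")
print(f"Crucial M500: 72 TB rated, {m500_hours:.1f} hours at peak")
```

The rated write volumes differ by about two orders of magnitude, which is why endurance can be the deciding factor for write-heavy databases.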
Databases and Flash
How do we optimize databases to use Flash best?
“Torn Page” problem
Flash can avoid this with little cost due to internal design
FusionIO NVMFS (Atomic Writes)
Copy-on-Write File Systems
• ZFS
• BTRFS
Filesystem level data journaling less preferred
• data=journal for EXT4
skip-innodb-doublewrite
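A minimal sketch of what a torn page looks like: a database page larger than the device's atomic write unit is only partly written when power is lost, so new data ends up next to stale data. The page layout (a CRC32 trailer), the 16KB and 4KB sizes, and all function names are assumptions for illustration; this is not InnoDB's actual page format.

```python
import zlib

PAGE_SIZE = 16 * 1024    # assumed database page size
SECTOR_SIZE = 4 * 1024   # assumed unit the device writes atomically


def make_page(payload: bytes) -> bytes:
    """Build a page: payload padded to PAGE_SIZE - 4, then a CRC32 trailer."""
    body = payload.ljust(PAGE_SIZE - 4, b"\x00")
    return body + zlib.crc32(body).to_bytes(4, "big")


def is_intact(page: bytes) -> bool:
    """True if the stored CRC32 matches the page body."""
    return zlib.crc32(page[:-4]) == int.from_bytes(page[-4:], "big")


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Simulate power loss after only the first sectors of the new page landed."""
    cut = sectors_written * SECTOR_SIZE
    return new[:cut] + old[cut:]


old_page = make_page(b"balance=100")
new_page = make_page(b"balance=250")
torn = torn_write(old_page, new_page, sectors_written=1)

# The torn page carries new data in its first sector but the old checksum.
print(is_intact(old_page), is_intact(new_page), is_intact(torn))
```

The checksum only detects the damage; recovering needs an intact copy elsewhere, which is the job of the doublewrite buffer. When the storage layer guarantees atomic page writes, that extra write can be skipped.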
Fast IO Path
Bypass Caching (O_DIRECT)
Native Asynchronous IO
Efficient Checksumming
innodb_checksum_algorithm=crc32
innodb_flush_method=O_DIRECT
IO Cost Accounting
Sequential vs Random IO balance
IO vs CPU Balance
Smaller page sizes might make sense
• innodb_page_size=4K
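One way to see why smaller pages can help: a random lookup reads a whole page to get one row, so the bytes fetched per useful byte scale with page size. A small sketch, assuming a hypothetical 200-byte row (16K is the InnoDB default page size; the function name is illustrative):

```python
def read_amplification(row_bytes: int, page_bytes: int) -> float:
    """Bytes read from storage per byte actually needed for one random row read."""
    return page_bytes / row_bytes


ROW_BYTES = 200  # assumed row size, for illustration only
for page_kb in (16, 8, 4):
    amp = read_amplification(ROW_BYTES, page_kb * 1024)
    print(f"{page_kb}K page: {amp:.1f}x bytes read per useful byte")
```

On HDD the seek dominates, so the extra bytes are nearly free; on Flash the transfer is a larger share of the IO cost, so the balance shifts. Smaller pages also mean more pages to manage, so this is worth measuring rather than assuming.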
Less Pre-fetching
Most pre-fetched data must be used
Often best to try it out
Less merging on flushing
Do not assume flushing multiple sequential dirty pages costs the same as flushing one
innodb_flush_neighbors=0
Less Space on Disk
InnoDB Compression (2x typical)
TokuDB Compression (5-10x typical)
Archiving data off OLTP System
Fewer Writes on Flash
Hybrid Flash/HDD System
Transactional Logs, Other logs on the HDD with RAID and BBU
Small Temporary objects on tmpfs
innodb_log_file_size=<LARGE>
Logs on RAID can be fast
Single Intel 730 Sysbench
IOPS
Is Flash Too Fast?
• Multiple instances might scale better
Other Thoughts
Host hardware and OS matter, especially with high end flash
Virtualization has higher relative overhead
Network has higher relative overhead
Peter Zaitsev
@PeterZaitsev
https://www.linkedin.com/in/peterzaitsev
Thank You!