SSD For Databases
Vadim Tkachenko
Peter Zaitsev
Percona
April 2016
In this Presentation
Flash technology overview
Review some of the available hardware
What does this mean for databases?
Specific opportunities for MySQL
There were HDDs…
Good at sequential reads/writes
RT = Seek Time + Rotation Latency
Reads/writes: similar latency
No specific write limits
Retain data for a long time
Low cost per GB
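The response-time formula above can be made concrete with a small calculation. This is only a sketch: the 7200 RPM spindle speed and the 4 ms average seek are illustrative assumptions, not figures from the slides, and average rotational latency is taken as half a revolution.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def hdd_response_time_ms(seek_ms: float, rpm: float) -> float:
    """RT = seek time + rotational latency (transfer time is ignored)."""
    return seek_ms + avg_rotational_latency_ms(rpm)


# Assumed, illustrative drive parameters (not from the slides).
ASSUMED_SEEK_MS = 4.0
ASSUMED_RPM = 7200

rt = hdd_response_time_ms(ASSUMED_SEEK_MS, ASSUMED_RPM)
print(f"Random IO response time: {rt:.2f} ms")
print(f"Serial random IOs per second: {1000.0 / rt:.0f}")
```

Because both terms are mechanical, neither shrinks much with newer drive generations, which is why random IO stays expensive on HDDs.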
RAID and SAN
Using Many HDDs together
Caching reads
Buffering writes (write-back cache)
Better sequential read/write speed
Better throughput at high concurrency
Higher IO latencies for uncached IO
Flash Revolution
Use Flash chips instead of platters
No moving parts, no seeks
NAND Flash
Cell
Page / read block
Erase block
Write but no overwrite
Wears with writes (erases)
Writing to the Flash
• Erase: set all bits to 1 (“1111111…”)
• Write: set some of the bits to 0 (“0100111..”)
• Change zero to one: impossible with a write. Do an erase, then write
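The erase/write rules above can be illustrated with a toy model. This is a sketch for intuition only: the class name, the 8-bit page size, and the 4-page block are hypothetical, and real NAND pages and erase blocks are far larger.

```python
class NandEraseBlock:
    """Toy model: one erase block holding several small pages of bits."""

    def __init__(self, pages: int = 4, bits_per_page: int = 8) -> None:
        self.all_ones = (1 << bits_per_page) - 1
        self.pages = [self.all_ones] * pages
        self.erase_count = 0

    def erase(self) -> None:
        """Erase works on the whole block and sets every bit to 1."""
        self.pages = [self.all_ones] * len(self.pages)
        self.erase_count += 1

    def program(self, page: int, value: int) -> None:
        """Programming can only clear bits (1 -> 0), never set them."""
        current = self.pages[page]
        if value & ~current & self.all_ones:
            raise ValueError("write needs a 0 -> 1 change; erase the block first")
        self.pages[page] = current & value


block = NandEraseBlock()
block.program(0, 0b01001111)       # fresh page: only 1 -> 0 changes, allowed
try:
    block.program(0, 0b11001111)   # would flip the top bit from 0 back to 1
    overwrite_ok = True
except ValueError:
    overwrite_ok = False

block.erase()                      # whole block back to all ones (adds wear)
block.program(0, 0b11001111)       # now the new value can be written
print(overwrite_ok, bin(block.pages[0]), block.erase_count)
```

Note that the erase wipes every page in the block, not just the one being rewritten, which is what makes in-place overwrites costly on flash.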
Types of NAND Flash
From AnandTech:
Flash Storage Design
Cache
Battery / supercapacitor
Controller + complex firmware
Built-in parallelism
Flash Controller and Firmware Tasks
Write wear leveling
Garbage collection
Error correction
Bad block mapping
Read disturb management
Encryption
Flash Properties
Lots of IOs per device! (100K+)
Less random IO penalty
Writes more expensive than reads (but can be faster)
Limited by amount of writes
Limited retention
Concurrent execution on a single device
Fast write acknowledgement (safe or not)
Flash Interface Designs
DIMM
PCI-E
SFF-8639
SATA/SAS
FC and network
Transitioning
AHCI to NVMe
AHCI vs NVMe
Source: AnandTech.com
Some Product Examples
Products and leaders are changing quickly
Sandisk ULLtraDIMM
HGST Virident
Sandisk FusionIO
Intel P3x00
Intel 750
Intel 730 (SATA)
mSATA
M.2 Interface
Violin Memory
“Consumer” vs “Enterprise”
Performance
Endurance
Durability
Retention
Encryption
Not your HDD
All HDDs are the same; all SSDs are different
Evaluation
Performance changes over time
Empty space matters
Complex internals
Watch stability carefully
How Flash Fails
EOL is defined by write amount (but the device can often handle a lot more)
One day… it’s gone
Internal ECC and redundancy
To RAID or not to RAID ?
More important for consumer grade
Check the RAID controller for good Flash support
RAID controller logic may slow things down
Use a redundant array of inexpensive servers instead?
Redundancy
Device internal redundancy
Hardware RAID
Software RAID
Filesystem “RAID”
OS Support
Flash support is actively being improved
TRIM
Sparse files
www.percona.com
Flash And Databases
Database History
Most were designed in the HDD era
Optimize for sequential IO
Count on cheap sequential writes
RAID, BBU to improve performance
It’s time for Flash
Your OLTP database should live on Flash
Intel P3600 – 1.6TB/$2450
Samsung SM863 – 1.9TB/$1500
Samsung 850 PRO – 2TB/$850
But What Flash ?
Pick a flash type that is right for your application
IO vs Memory
Warmup
Much faster warmup times
Even if the database fits in memory, SSD might be justified
Tolerate more IO bound load
HDD
• 5 ms per IO
• Can do 20 IOs within a 100 ms response time (non-parallel)
Flash
• 0.1 ms per IO
• Can do 1000 IOs within a 100 ms response time (non-parallel)
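The HDD and Flash figures above follow from dividing the response-time budget by the per-IO latency. A minimal sketch of that arithmetic, using the 5 ms and 0.1 ms latencies from the slide (the function and variable names are made up for illustration):

```python
def serial_io_budget(response_time_ms: float, io_latency_ms: float) -> int:
    """Number of back-to-back (non-parallel) IOs that fit in the budget,
    rounded to the nearest whole IO."""
    return round(response_time_ms / io_latency_ms)


RESPONSE_BUDGET_MS = 100.0
DEVICE_LATENCY_MS = {"HDD": 5.0, "Flash": 0.1}  # latencies from the slide

budgets = {
    name: serial_io_budget(RESPONSE_BUDGET_MS, latency)
    for name, latency in DEVICE_LATENCY_MS.items()
}
for name, ios in budgets.items():
    print(f"{name}: {ios} serial IOs within {RESPONSE_BUDGET_MS:.0f} ms")
```

The 50x gap in budget is what lets a query stay responsive on Flash even when most of its working set is not in memory.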
Endurance
Might be a top consideration
Endurance Math
HGST FlashMAX III 2200GB
• 4400 GB/day over 5 years
• 1400 MB/sec peak writes
• 66 days at peak write throughput
Crucial M500 960GB
• 72 TB total lifetime writes
• 400 MB/sec write
• 52 hours at peak write throughput
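The endurance figures above can be reproduced by dividing rated lifetime writes by peak write throughput. The sketch below uses only the numbers on the slide; the unit conventions (365-day years, decimal vs binary megabytes) are assumptions. With decimal units the Crucial drive comes out at about 50 hours, and the slide's 52 hours appears to correspond to binary units.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_HOUR = 3_600


def seconds_to_exhaust(lifetime_writes_mb: float, peak_write_mb_s: float) -> float:
    """Time to consume the rated write endurance at sustained peak throughput."""
    return lifetime_writes_mb / peak_write_mb_s


# HGST FlashMAX III 2200GB: 4400 GB/day over 5 years, 1400 MB/s peak writes.
# Assumption: decimal units and 365-day years.
hgst_lifetime_mb = 4400 * 1000 * 365 * 5
hgst_days = seconds_to_exhaust(hgst_lifetime_mb, 1400) / SECONDS_PER_DAY

# Crucial M500 960GB: 72 TB lifetime writes, 400 MB/s write.
crucial_hours_decimal = seconds_to_exhaust(72 * 1000 * 1000, 400) / SECONDS_PER_HOUR
crucial_hours_binary = seconds_to_exhaust(72 * 1024 * 1024, 400) / SECONDS_PER_HOUR

print(f"HGST: {hgst_days:.1f} days at peak write throughput")
print(f"Crucial: {crucial_hours_decimal:.1f} h (decimal), "
      f"{crucial_hours_binary:.1f} h (binary)")
```

Running the same division against your own measured daily write volume, rather than peak throughput, gives a more realistic lifetime estimate.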
Databases and Flash
How do we optimize databases to use Flash best?
Storage Engines
Innodb
TokuDB
“Torn Page” problem
Flash can avoid this at little cost due to its internal design
FusionIO NVMFS (atomic writes)
Copy-on-write file systems
• ZFS
• BTRFS
Filesystem-level data journaling is less preferred
• data=journal for EXT4
skip-innodb-doublewrite
Fast IO Path
Bypass caching: O_DIRECT
Native asynchronous IO
Efficient checksumming
innodb_checksum_algorithm=crc32
innodb_flush_method=O_DIRECT
IO Cost Accounting
Sequential vs random IO balance
IO vs CPU balance
Smaller page sizes might make sense
• innodb_page_size=4K
Less Pre-fetching
Most pre-fetched data must be used
Often best to try it out
Less merging on flushing
Do not assume flushing multiple sequential dirty pages has the same cost
innodb_flush_neighbors=0
Less Space on Flash
InnoDB compression (2x typical)
TokuDB compression (5-10x typical)
Archiving data off the OLTP system
Less Writes on Flash
Hybrid Flash/HDD system
Transactional logs, other logs on the HDD with RAID and BBU
Small temporary objects on tmpfs
innodb_log_file_size=<LARGE>
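The InnoDB settings named across these slides can be collected into one `[mysqld]` fragment. The sketch below only renders the text; the helper name is hypothetical, option names are written in the lowercase form MySQL uses, and `<LARGE>` is kept as the slide's placeholder because the deck gives no concrete value. Whether each setting is appropriate depends on the hardware and should be tested rather than assumed.

```python
# Settings named on the slides. None marks a flag that takes no value.
FLASH_SETTINGS = {
    "innodb_flush_method": "O_DIRECT",      # bypass the OS page cache
    "innodb_checksum_algorithm": "crc32",   # cheaper checksums
    "innodb_flush_neighbors": "0",          # no neighbor-page merging on flush
    "innodb_page_size": "4K",               # must be chosen when the data directory is initialized
    "skip-innodb-doublewrite": None,        # only if storage guarantees atomic page writes
    "innodb_log_file_size": "<LARGE>",      # placeholder from the slide, not a real value
}


def render_mysqld_section(settings: dict) -> str:
    """Render the settings as a [mysqld] section of a my.cnf file."""
    lines = ["[mysqld]"]
    for name, value in settings.items():
        lines.append(name if value is None else f"{name}={value}")
    return "\n".join(lines) + "\n"


config_text = render_mysqld_section(FLASH_SETTINGS)
print(config_text)
```

Replace the `<LARGE>` placeholder with a size chosen for your own write workload before using the output.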
Logs on RAID can be fast
Single Intel 730 Sysbench
IOPS
Consistency
Is Flash Too Fast ?
Multiple instances might scale better
Other Thoughts
Host hardware and OS matter, especially with high-end flash
Virtualization has higher relative overhead
Network has higher relative overhead
Thank you!
www.linkedin.com/in/vadimtk
@VadimTk
www.linkedin.com/in/peterzaitsev
https://twitter.com/peterzaitsev