How Shit Works: Storage
Tomer Gabel, Wix
@ GeeCON Kraków 2016
Like all good stories…
• We’ll start with a question.
• “What’s wrong with this picture?”
MY, OH, MY. WHAT COULD IT BE?
Axioms
• Not a trick question
– Servers are properly configured
– System architecture makes sense
– No obvious bugs
– No scheduled jobs
• So what else goes bump in the night?
PROLOGUE
“A LAUGHABLE CLAIM”
I/O is simple
• Just open a file, write, flush, close
• Nothing to it, right?
[Diagram: Application → File → HDD]
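Even the "simple" version hides a subtlety. In Python (as with C stdio), `flush()` only moves data from the application's user-space buffer into the kernel's page cache. Asking the kernel to write it out to the storage device takes a separate `os.fsync()`. A minimal sketch, assuming a POSIX-like system; `durable_write` is a hypothetical helper name:

```python
import os
import tempfile

def durable_write(path: str, data: bytes) -> None:
    """Write data, then ask the kernel to push it to the storage device."""
    with open(path, "wb") as f:     # open
        f.write(data)               # write: lands in Python's user-space buffer
        f.flush()                   # flush: user-space buffer -> kernel page cache
        os.fsync(f.fileno())        # fsync: ask the kernel to write dirty pages to the device
    # close: happens automatically when the with-block exits

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.bin")
    durable_write(path, b"hello, disk")
    with open(path, "rb") as f:
        print(f.read())
```

Each call in this sketch crosses a different layer of the stack, which is exactly where the "nothing to it" picture starts to break down.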
I/O is simple
• A little closer…
[Diagram: Application → File → Kernel (Virtual File System → File system (ext4) → Logical Volume Manager → I/O scheduler → SCSI driver stack) → HDD]
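One of those kernel layers is easy to inspect. On Linux, each block device exposes its I/O scheduler in sysfs at `/sys/block/<device>/queue/scheduler`; the file lists the available schedulers and puts the active one in square brackets (e.g. `noop deadline [cfq]` on older kernels, `[none] mq-deadline` on newer ones). A minimal sketch; the default device name `sda` is an assumption, and the file only exists on Linux:

```python
from pathlib import Path
from typing import Optional

def active_scheduler(line: str) -> Optional[str]:
    """Return the bracketed (active) entry from a sysfs scheduler line."""
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None

def read_scheduler(device: str = "sda") -> Optional[str]:
    """Read the active scheduler for a block device, or None if unavailable."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    if not path.exists():           # not Linux, or no such device
        return None
    return active_scheduler(path.read_text())

if __name__ == "__main__":
    print(active_scheduler("noop deadline [cfq]"))  # prints "cfq"
    print(read_scheduler())
```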
I/O is simple
• But really…
[Diagram: Application → File → Kernel (Storage Subsystem → System Bus Drivers) → Hardware (PCI Express Bus → SATA Controller) → HDD]
THE ONION OF ABSTRACTION
ACT I: THESE BOOTS ARE MADE FOR WALKIN’
Everybody knows…
• Sequential access is fast
• Random access is slow
• … so what?
Everybody knows…
“Disk seeks are a huge performance bottleneck… When the amount of data starts to grow so large that effective caching becomes impossible… you need at least one disk seek to read and a couple of disk seeks to write things.”
-- MySQL Reference Manual (8.12.3)
But why?
Rotational Latency
Throughput
• So you understand latency…
• What about throughput?
• Depends on two factors:
– Areal density
– Newtonian physics
Areal Density
Interlude: Math
• Rotation is fixed
– Constant angular velocity (CAV)
• Newton tells us that…
v = ω ∙ r
(v: linear velocity, ω: angular velocity, r: radius)
• Throughput increases with radius!
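To put numbers on v = ω ∙ r, here is a small sketch. The track radii are hypothetical round figures for a 3.5" platter, not measurements of any particular drive:

```python
import math

def linear_velocity(rpm: float, radius_m: float) -> float:
    """v = ω · r, with ω in rad/s derived from revolutions per minute."""
    omega = rpm / 60 * 2 * math.pi   # rad/s
    return omega * radius_m          # m/s

RPM = 7200
INNER_R = 0.020   # hypothetical innermost track radius (m)
OUTER_R = 0.046   # hypothetical outermost track radius (m)

v_in = linear_velocity(RPM, INNER_R)
v_out = linear_velocity(RPM, OUTER_R)

# At a fixed linear bit density, bits passing under the head per second scale
# with v, so the outer/inner throughput ratio equals the radius ratio.
print(f"inner: {v_in:.1f} m/s, outer: {v_out:.1f} m/s, ratio: {v_out / v_in:.2f}")
```

This is why sequential benchmarks typically slow down as they move from the outer tracks (which usually hold the lowest LBAs) toward the spindle.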
Interlude: Math
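• Commodity drives are available at:
– 5400-15000 RPM
– Usually 7200 RPM
• What does it mean for latency?
7200 / 60 = 120 revolutions/second
1 / 120 s ≈ 0.00833 s ≈ 8.33 ms per revolution!
– That is the worst-case rotational latency; on average the sector is half a turn away (~4.17 ms)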
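The same arithmetic for the whole commodity RPM range, as a quick sketch (full revolution time, and the average of half a revolution):

```python
def full_rotation_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    revolutions_per_second = rpm / 60
    return 1000 / revolutions_per_second

def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the target sector is half a revolution away."""
    return full_rotation_ms(rpm) / 2

for rpm in (5400, 7200, 10000, 15000):
    print(f"{rpm:>5} RPM: full turn {full_rotation_ms(rpm):5.2f} ms, "
          f"average {avg_rotational_latency_ms(rpm):5.2f} ms")
```

Note that this is rotational latency only; seek time (moving the head to the right track) comes on top of it.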
In practice?
• Modern drives give you:
– 200+ MB/s (sequential)
– 300 IOPS
• Pure random access nets only 1.2 MB/s! (300 IOPS × 4 KB)
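The 1.2 MB/s figure is just IOPS multiplied by I/O size. A sketch using the slide's 200 MB/s and 300 IOPS, with the I/O sizes chosen for illustration and decimal units throughout:

```python
def random_throughput_mb_s(iops: float, io_size_kb: float) -> float:
    """Throughput when every I/O is a random access: IOPS × I/O size."""
    return iops * io_size_kb / 1000   # KB/s -> MB/s (decimal units)

SEQUENTIAL_MB_S = 200    # figure from the slide
IOPS = 300               # figure from the slide

for io_kb in (4, 16, 64, 256):
    mb_s = random_throughput_mb_s(IOPS, io_kb)
    print(f"{io_kb:>4} KB random I/O: {mb_s:6.1f} MB/s "
          f"({mb_s / SEQUENTIAL_MB_S:.1%} of sequential)")
```

Larger I/Os amortize the seek and rotation cost, which is the whole argument for sequential access patterns.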
RIGHT. WHAT CAN WE DO ABOUT IT?
Fine-tuning
• Provision more RAM
• Careful index structure
– Represent IPv4 addresses as UNSIGNED INT for 75% reduction
– Implement better UUIDs¹ for 30% reduction
¹ Store UUID in an optimized way, Percona blog
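Both tricks come down to storing the binary value instead of its text form. A sketch using Python's standard `ipaddress` and `uuid` modules (in MySQL itself, `INET_ATON()` and `INET_NTOA()` do the IPv4 conversion). The helper names are hypothetical, the byte counts assume a VARCHAR(15) in a single-byte character set with a 1-byte length prefix, and the UUID part shows only plain binary packing, not necessarily the exact scheme of the cited Percona post:

```python
import ipaddress
import uuid

def ipv4_to_uint(ip: str) -> int:
    """Pack a dotted-quad IPv4 address into a 32-bit unsigned integer."""
    return int(ipaddress.IPv4Address(ip))

def uint_to_ipv4(value: int) -> str:
    """Inverse of ipv4_to_uint."""
    return str(ipaddress.IPv4Address(value))

def uuid_to_binary(u: str) -> bytes:
    """16 raw bytes instead of a 36-character hex-and-dashes string."""
    return uuid.UUID(u).bytes

ip = "192.168.1.1"
print(ipv4_to_uint(ip), "->", uint_to_ipv4(ipv4_to_uint(ip)))

# Worst-case text "255.255.255.255" is 15 chars: 15 bytes + 1 length byte = 16,
# versus 4 bytes for an UNSIGNED INT.
print(f"IPv4 saving: {1 - 4 / 16:.0%}")

u = str(uuid.uuid4())
print(len(u), "chars as text vs", len(uuid_to_binary(u)), "bytes as binary")
```

Smaller keys mean more index entries per page, so more of the index fits in RAM and fewer lookups hit the disk.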
… or use a sledgehammer!
• RAID 0 (and variants) employ striping
• Data is distributed to multiple spindles
• If it sounds familiar…
– It is!
– We call it “sharding”
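Striping really is sharding at the block level. A sketch of one simple RAID 0 layout, in which fixed-size chunks are dealt round-robin across the disks; the function name and chunk size parameter are illustrative, and real implementations (hardware controllers, Linux md) have their own on-disk layouts:

```python
from typing import Tuple

def raid0_locate(lba: int, disks: int, chunk_blocks: int) -> Tuple[int, int]:
    """Map a logical block address to (disk index, block offset on that disk).

    Consecutive chunks of `chunk_blocks` blocks go to disk 0, 1, ..., n-1 and
    then wrap around, much like assigning keys to shards.
    """
    chunk, offset_in_chunk = divmod(lba, chunk_blocks)
    disk = chunk % disks
    stripe = chunk // disks                # which row of chunks on that disk
    return disk, stripe * chunk_blocks + offset_in_chunk

for lba in range(8):
    print(lba, raid0_locate(lba, disks=2, chunk_blocks=2))
```

A large sequential read touches every disk in turn, so all spindles work in parallel; the flip side is that losing any one disk loses a slice of every file.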
It’s turtles all the way down
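• Don’t jump to conclusions!
– RAID 0 is impractical (no redundancy: one failed disk loses the array)
– RAID 5 may be slow (parity makes small writes costly)
– RAID 10 is expensive (mirroring uses half the raw capacity)
– etc.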
• Do your homework
• Benchmark!
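The classic rules of thumb behind those trade-offs fit in a few lines. This sketch uses the textbook write penalties (1 for RAID 0, 4 for RAID 5, 2 for RAID 10) and ignores controller caches, full-stripe writes and read performance, which is exactly why benchmarking still matters. The class and function names are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RaidProfile:
    usable_disks: int     # capacity, in units of one disk
    write_penalty: int    # back-end I/Os per small random write
    survives: int         # disk failures always tolerated

def raid_profile(level: int, disks: int) -> RaidProfile:
    if level == 0:
        return RaidProfile(disks, 1, 0)
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        # read old data + old parity, write new data + new parity
        return RaidProfile(disks - 1, 4, 1)
    if level == 10:
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        # every write goes to both halves of a mirror pair
        return RaidProfile(disks // 2, 2, 1)
    raise ValueError(f"unsupported RAID level {level}")

def effective_write_iops(level: int, disks: int, iops_per_disk: float) -> float:
    """Rough small-random-write IOPS: raw IOPS divided by the write penalty."""
    return disks * iops_per_disk / raid_profile(level, disks).write_penalty

for level in (0, 5, 10):
    p = raid_profile(level, 4)
    print(f"RAID {level:>2}: {p.usable_disks} disks usable, "
          f"{effective_write_iops(level, 4, 300):.0f} write IOPS, "
          f"survives {p.survives} failure(s)")
```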
ACT II: I’LL USE MY CREDIT CARD
Let’s talk SSDs
• Non-volatile flash memory
• Lots of IOPS
• Expensive :-)
• Same caveats apply…
Let’s talk SSDs
• Value starts at “1”
• Electrons accrue in the floating gate
• After programming, value becomes “0”
• Electrons are drained (erased) to reset value to “1”
Surprise and Terror
• “Draining” is destructive!
• Limited program/erase (P/E) cycles
• Limited lifespan!
Wear Leveling
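Because each block survives only a limited number of erases, the controller spreads erasures across the whole chip. A toy sketch of *dynamic* wear leveling (always hand out the free block with the lowest erase count); the class is hypothetical, and real controllers also do *static* wear leveling, relocating cold data that would otherwise pin rarely erased blocks:

```python
import heapq
from typing import List, Tuple

class WearLevelingAllocator:
    """Toy dynamic wear leveling: allocate the least-erased free block."""

    def __init__(self, num_blocks: int) -> None:
        self.erase_counts = [0] * num_blocks
        # min-heap of (erase_count, block_id) for blocks ready to be written
        self._free: List[Tuple[int, int]] = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self._free)

    def allocate(self) -> int:
        _, block = heapq.heappop(self._free)
        return block

    def erase(self, block: int) -> None:
        """Erase a block (one P/E cycle) and return it to the free pool."""
        self.erase_counts[block] += 1
        heapq.heappush(self._free, (self.erase_counts[block], block))

alloc = WearLevelingAllocator(num_blocks=4)
for _ in range(100):            # rewrite "the same" data 100 times
    b = alloc.allocate()
    alloc.erase(b)              # data superseded: block erased and recycled
print(alloc.erase_counts)       # prints [25, 25, 25, 25]
```

Without leveling, rewriting the same logical data in place would erase one physical block 100 times while its neighbours sat idle.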
Caveats, remember?
• Addressing
– Cells (1 bit) – not addressable
– Pages (0.5-8KB)
– Blocks (32-64 pages)
• Why do you care?
– Reads/writes on a page
– But erasure on a block
Write Amplification
[Diagram: changing a single bit (Δ = 1 bit) forces the whole block to be rewritten (Δ = 1 block!)]
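Write amplification is usually expressed as a factor: data physically written to flash divided by data the host asked to write. A sketch comparing a naive in-place update with the out-of-place strategy real controllers use. The page and block sizes are assumptions picked from the ranges on the previous slide, and the host is assumed to write at least one full page even when only one bit changed:

```python
PAGE_KB = 4              # assumed page size (slides: 0.5-8 KB)
PAGES_PER_BLOCK = 64     # assumed block size (slides: 32-64 pages)

def write_amplification(host_kb: float, flash_kb: float) -> float:
    """WAF = data physically written to flash / data the host asked to write."""
    return flash_kb / host_kb

# Naive in-place update: to change one page, read the whole block, erase it,
# and program every page back.
naive = write_amplification(PAGE_KB, PAGE_KB * PAGES_PER_BLOCK)

# Out-of-place update: write the new version to a free page and mark the old
# one stale; the erase cost is deferred to garbage collection.
out_of_place = write_amplification(PAGE_KB, PAGE_KB)

print(f"naive in-place: WAF = {naive:.0f}, out-of-place: WAF = {out_of_place:.0f}")
```

Out-of-place writes do not make the cost disappear; they move it into background garbage collection, which has to relocate live pages before it can erase a block.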
Surprising Results
• Defragmentation
– Relocates blocks
– Contiguous files
– Lower LBAs
– Background job
• Bad, bad, bad!
– No benefit with SSDs (no seek penalty)
– Major write load!
Background GC
[Diagram: valid pages 1, 2, 5, 6, 7, scattered across Blocks A–D, are consolidated by background garbage collection]
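A toy version of that background job. It uses greedy victim selection: reclaim the block with the fewest valid pages, since it is cheapest to copy out. This is one common policy, not necessarily what any given controller does. The page layout below is hypothetical, loosely echoing the diagram's page numbers, and one fully erased spare block is assumed:

```python
from typing import List, Tuple, Union

FREE, STALE = "free", "stale"
Page = Union[int, str]          # an int is a valid logical page number

def is_valid(page: Page) -> bool:
    return page not in (FREE, STALE)

def gc_step(blocks: List[List[Page]], spare: int) -> Tuple[int, int]:
    """One round of greedy garbage collection.

    Picks the block with the fewest valid pages among those holding stale
    pages, copies its valid pages into the erased `spare` block, then erases
    the victim. Returns (new spare index, pages copied).
    """
    candidates = [i for i, b in enumerate(blocks) if i != spare and STALE in b]
    if not candidates:
        return spare, 0                                # nothing to reclaim
    victim = min(candidates, key=lambda i: sum(map(is_valid, blocks[i])))
    survivors = [p for p in blocks[victim] if is_valid(p)]
    assert all(p == FREE for p in blocks[spare]), "spare must be fully erased"
    blocks[spare][:len(survivors)] = survivors         # relocate live data
    blocks[victim] = [FREE] * len(blocks[victim])      # erase the whole block
    return victim, len(survivors)

blocks: List[List[Page]] = [
    [7, STALE, 5],         # Block A
    [STALE, 6, STALE],     # Block B
    [1, 2, STALE],         # Block C
    [FREE, FREE, FREE],    # Block D (erased spare)
]
spare, total_copies = 3, 0
while True:
    new_spare, copies = gc_step(blocks, spare)
    if new_spare == spare:
        break
    spare, total_copies = new_spare, total_copies + copies
print(blocks, "extra page writes:", total_copies)
```

Every page copy is a flash write the host never issued, which is where write amplification shows up in practice.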
Surprising Results
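• What happens when you delete a file?
– Not much
– Bit flip on file table
– Space is not reclaimed: the SSD still treats those pages as live data
• The fix?
– SATA TRIM command (the OS tells the SSD which pages are no longer in use)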
[Diagram: pages 1, 2, 5, 6, 7 spread across Blocks A–D]
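Why TRIM matters for garbage collection, as a toy model. The drive only knows which logical pages were written, not which files they belong to, so without TRIM a deleted file's pages still look live and get copied around. The class and function names are hypothetical; on Linux, TRIM is actually issued by the file system, e.g. via the `fstrim` tool or the `discard` mount option:

```python
from typing import Dict, Iterable, List, Set

class ToySSD:
    """Hypothetical model: the set of logical pages the drive believes are live."""

    def __init__(self) -> None:
        self.live: Set[int] = set()

    def write(self, page: int) -> None:
        self.live.add(page)

    def trim(self, pages: Iterable[int]) -> None:
        # Models what TRIM conveys: these pages no longer hold useful data.
        self.live.difference_update(pages)

    def gc_copy_cost(self, block: Iterable[int]) -> int:
        """Pages GC would have to relocate before erasing this block."""
        return sum(1 for p in block if p in self.live)

def gc_cost_after_delete(use_trim: bool) -> int:
    ssd = ToySSD()
    files: Dict[str, List[int]] = {"a.log": [0, 1, 2], "b.log": [3]}
    for pages in files.values():
        for p in pages:
            ssd.write(p)
    deleted = files.pop("a.log")   # the file system only updates its own table...
    if use_trim:
        ssd.trim(deleted)          # ...unless it also tells the drive
    return ssd.gc_copy_cost([0, 1, 2, 3])   # assume all four pages share one block

for use_trim in (False, True):
    print(f"TRIM={use_trim}: GC relocates {gc_cost_after_delete(use_trim)} page(s)")
```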
SSD Takeaways
• A moving target
– File systems
– Data structures
– Longevity
• As usual:
– Benchmark
– Monitor
EPILOGUE
“LET ME EMBRACE THEE, SOUR ADVERSITY, FOR WISE MEN SAY IT IS THE WISEST COURSE.”
WE’RE DONE HERE! … AND YES, WE’RE HIRING :-)
Thank you for listening
@tomerg
http://il.linkedin.com/in/tomergabel
Wix Engineering blog:
http://engineering.wix.com