How Shit Works: Storage

How Shit Works: Storage Tomer Gabel, Wix @ GeeCON Kraków 2016

Page 1: How Shit Works: Storage

How Shit Works:

Storage

Tomer Gabel, Wix

@ GeeCON Kraków 2016

Page 2: How Shit Works: Storage

Like all good stories…

• We’ll start with a question.

• “What’s wrong with this picture?”

Page 4: How Shit Works: Storage

MY, OH, MY. WHAT COULD IT BE?

Page 5: How Shit Works: Storage

Axioms

• Not a trick question

– Servers are properly configured

– System architecture makes sense

– No obvious bugs

– No scheduled jobs

• So what else goes bump in the night?

Page 6: How Shit Works: Storage

PROLOGUE

“A LAUGHABLE CLAIM”

Page 7: How Shit Works: Storage

I/O is simple

• Just open a file, write, flush, close

• Nothing to it, right?

[Diagram: Application → File → HDD]
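
The four steps on this slide can be sketched in Python roughly as follows. The file name `example.dat` is a placeholder, and the `os.fsync` call is an addition that is not on the slide: `flush` only hands the application's buffer to the kernel, so the sketch assumes you also want to ask the kernel to push the data toward the device.

```python
import os
import tempfile


def write_durably(path: str, payload: bytes) -> int:
    """Open, write, flush, fsync, close. Returns the number of bytes written."""
    with open(path, "wb") as f:       # open
        written = f.write(payload)    # write: lands in a user-space buffer
        f.flush()                     # flush: user-space buffer -> kernel
        os.fsync(f.fileno())          # assumption: also ask the kernel to persist it
    return written                    # close happens when the with-block exits


if __name__ == "__main__":
    # Hypothetical target path, used for illustration only.
    target = os.path.join(tempfile.gettempdir(), "example.dat")
    print(write_durably(target, b"hello, storage"))
```

Even with `fsync`, what "written" means still depends on every layer between the application and the disk.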

Page 8: How Shit Works: Storage

I/O is simple

• A little closer…

[Diagram: Application → File → Kernel → HDD]

Kernel layers:

– Virtual File System

– File system (ext4)

– Logical Volume Manager

– I/O scheduler

– SCSI driver stack

Page 9: How Shit Works: Storage

I/O is simple

• But really…

[Diagram: Application → File → Kernel → Hardware → HDD, with layers: Storage Subsystem, System Bus Drivers, PCI Express Bus, SATA Controller]

Page 10: How Shit Works: Storage

THE ONION OF ABSTRACTION

Page 11: How Shit Works: Storage

ACT I: THESE BOOTS ARE MADE FOR WALKIN’

Page 12: How Shit Works: Storage

Everybody knows…

• Sequential access is fast

• Random access is slow

• … so what?

Page 13: How Shit Works: Storage

Everybody knows…

“Disk seeks are a huge performance bottleneck… When the amount of data starts to grow so large that effective caching becomes impossible… you need at least one disk seek to read and a couple of disk seeks to write things.”

-- MySQL Reference Manual (8.12.3)

Page 15: How Shit Works: Storage

But why?

Page 16: How Shit Works: Storage

Rotational Latency

Page 20: How Shit Works: Storage

Throughput

• So you understand latency…

• What about throughput?

• Depends on two factors:

– Areal density

– Newtonian physics

Page 21: How Shit Works: Storage

Areal Density

Page 22: How Shit Works: Storage

Interlude: Math

• Rotation is fixed

– Constant angular velocity (CAV)

• Newton tells us that…

v = ω ∙ r (linear velocity = angular velocity × radius)

• Throughput increases with radius!
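
As a rough illustration of v = ω ∙ r, the sketch below compares the linear velocity under the head at an inner and an outer track. The two radii are assumed values picked for illustration, not measurements from the talk, and the link to throughput assumes bits are packed at a similar linear density on every track.

```python
import math


def linear_velocity(rpm: float, radius_m: float) -> float:
    """v = omega * r, with omega in radians/second and r in metres."""
    omega = rpm / 60.0 * 2.0 * math.pi
    return omega * radius_m


if __name__ == "__main__":
    inner, outer = 0.020, 0.046   # assumed radii in metres, illustration only
    v_in = linear_velocity(7200, inner)
    v_out = linear_velocity(7200, outer)
    print(f"inner {v_in:.1f} m/s, outer {v_out:.1f} m/s, ratio {v_out / v_in:.2f}")
```

Because ω is constant, the ratio of the two velocities is simply the ratio of the radii.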

Page 23: How Shit Works: Storage

Interlude: Math

• Commodity drives are available at:

– 5400-15000 RPM

– Usually 7200 RPM

• What does it mean for latency?

7200 / 60 = 120 revolutions/second

1 / 120 ≈ 0.00833 s ≈ 8.33 ms per revolution!
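
The same arithmetic for a few spindle speeds, as a small sketch. The 10000 RPM entry is just a sample from inside the 5400-15000 range, and the "half a revolution on average" helper is an addition to the slide: it assumes the target sector is equally likely to be anywhere on the track.

```python
def rotation_time_ms(rpm: int) -> float:
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm: int) -> float:
    """Assumption: on average the target sector is half a revolution away."""
    return rotation_time_ms(rpm) / 2.0


if __name__ == "__main__":
    for rpm in (5400, 7200, 10000, 15000):
        full = rotation_time_ms(rpm)
        half = average_rotational_latency_ms(rpm)
        print(f"{rpm} RPM: {full:.2f} ms per revolution, {half:.2f} ms on average")
```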

Page 24: How Shit Works: Storage

In practice?

• Modern drives give you:

200+ MB/s

300 IOPS

• Pure random access nets only 1.2 MB/s!
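
The 1.2 MB/s figure can be reproduced with one multiplication, assuming each random I/O moves 4 KB. The slide does not state the I/O size, so that value is a guess.

```python
def random_throughput_mb_s(iops: int, io_size_bytes: int) -> float:
    """Throughput when every operation is a separate random I/O."""
    return iops * io_size_bytes / 1_000_000


if __name__ == "__main__":
    # 300 IOPS is from the slide; 4096 bytes per I/O is an assumption.
    print(f"{random_throughput_mb_s(300, 4096):.2f} MB/s")
```

Compared with 200+ MB/s sequential, that is a gap of more than two orders of magnitude on the same drive.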

Page 25: How Shit Works: Storage

RIGHT.WHAT CAN WE DO ABOUT IT?

Page 26: How Shit Works: Storage

Fine-tuning

• Provision more RAM

• Careful index structure

– Represent IPs as UNSIGNED INT for 75% reduction

– Implement better UUIDs¹ for 30% reduction

¹ Store UUID in an optimized way, Percona blog
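
A minimal sketch of the IP trick, using Python's standard `ipaddress` module to stand in for whatever the database does. One possible reading of the 75% figure, and it is only a guess, is a 16-byte text column shrinking to a 4-byte integer.

```python
import ipaddress


def ip_to_uint(dotted: str) -> int:
    """Pack a dotted-quad IPv4 address into an unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))


def uint_to_ip(value: int) -> str:
    """Inverse of ip_to_uint."""
    return str(ipaddress.IPv4Address(value))


def size_reduction(text_bytes: int, int_bytes: int = 4) -> float:
    """Fraction of space saved by storing the integer instead of the text."""
    return 1 - int_bytes / text_bytes


if __name__ == "__main__":
    packed = ip_to_uint("192.168.0.1")
    print(packed, uint_to_ip(packed))
    print(size_reduction(16))   # assumed 16 bytes for the text form
```

Smaller keys mean more index entries per page, so more of the index fits in RAM and fewer lookups end in a disk seek.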

Page 27: How Shit Works: Storage

… or use a sledgehammer!

• RAID 0 (and variants) employ striping

• Data is distributed to multiple spindles

• If it sounds familiar…

– It is!

– We call it “sharding”
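
Striping boils down to a modulo, much like picking a shard. The sketch below maps a logical block number to a disk and an offset on that disk. The function name and the stripe size parameter are illustrative and not taken from any particular RAID implementation.

```python
from typing import Tuple


def locate(logical_block: int, disks: int, stripe_blocks: int = 1) -> Tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk)."""
    stripe = logical_block // stripe_blocks      # which stripe unit the block is in
    disk = stripe % disks                        # stripe units rotate across disks
    offset = (stripe // disks) * stripe_blocks + logical_block % stripe_blocks
    return disk, offset


if __name__ == "__main__":
    for block in range(8):
        print(block, locate(block, disks=4))
```

Consecutive blocks land on different spindles, so a large sequential read keeps all of them busy at once.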

Page 28: How Shit Works: Storage

It’s turtles all the way down

• Don’t jump to conclusions!

– RAID 0 is impractical

– RAID 5 may be slow

– RAID 10 is expensive

– etc.

• Do your homework

• Benchmark!

Page 29: How Shit Works: Storage

ACT II: I’LL USE MY CREDIT CARD

Page 30: How Shit Works: Storage

Let’s talk SSDs

• Non-volatile RAM

• Lots of IOPS

• Expensive :-)

• Same caveats apply…

Page 31: How Shit Works: Storage

Let’s talk SSDs

• Value starts at “1”

• Electrons accrue in the floating gate

• After programming, value becomes “0”

• Electrons are drained to reset value to “1”

Page 32: How Shit Works: Storage

Surprise and Terror

• “Draining” is destructive!

• Limited erases

• Limited lifespan!

Page 33: How Shit Works: Storage

Wear Leveling
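
A toy model of the idea, not how any real controller works: since every block survives only a limited number of erases, always reuse the block that has been erased the fewest times. Block names and the tie-breaking rule are made up for the sketch.

```python
from typing import Dict, List


def pick_block(erase_counts: Dict[str, int]) -> str:
    """Choose the block erased the fewest times (ties broken by name)."""
    return min(erase_counts, key=lambda name: (erase_counts[name], name))


def simulate(blocks: List[str], rewrites: int) -> Dict[str, int]:
    """Spread a number of erase cycles across the given blocks."""
    counts = {name: 0 for name in blocks}
    for _ in range(rewrites):
        counts[pick_block(counts)] += 1
    return counts


if __name__ == "__main__":
    print(simulate(["A", "B", "C", "D"], 10))
```

Without leveling, ten rewrites of the same logical location would all hit one block. With it, no block ends up more than one erase ahead of any other.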

Page 37: How Shit Works: Storage

Caveats, remember?

• Addressing

– Cells (1 bit) – not addressable

– Pages (0.5-8KB)

– Blocks (32-64 pages)

• Why do you care?

– Reads/writes on a page

– But erasure on a block

Page 38: How Shit Works: Storage

Write Amplification

[Figure: a change of Δ = 1 bit turns into a rewrite of Δ = 1 block!]
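
To put a number on it, the sketch below computes the amplification factor for a naive in-place update, where any change forces the whole containing block to be erased and reprogrammed. The page and block sizes are picked from the ranges given earlier (4 KB pages, 64 pages per block), and the model assumes the change starts at a block boundary. Real drives write updates to fresh pages instead, so treat this as a worst case.

```python
def write_amplification(changed_bytes: int,
                        page_bytes: int = 4096,
                        pages_per_block: int = 64) -> float:
    """Bytes physically rewritten per byte logically changed (naive model)."""
    block_bytes = page_bytes * pages_per_block
    blocks_touched = -(-changed_bytes // block_bytes)   # ceiling division
    return blocks_touched * block_bytes / changed_bytes


if __name__ == "__main__":
    print(write_amplification(1))            # one byte changed
    print(write_amplification(64 * 4096))    # one full block changed
```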

Page 39: How Shit Works: Storage

Surprising Results

• Defragmentation

– Relocates blocks

– Contiguous files

– Lower LBAs

– Background job

• Bad, bad, bad!

– No benefit with SSDs

– Major write load!

Page 40: How Shit Works: Storage

Background GC

[Figure: before GC, pages 1, 2, 5, 6 and 7 are scattered across Blocks A, B, C and D; after GC, they are packed together as “1 2 5” and “6 7”]
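
A toy version of what the Background GC figure depicts: live pages are copied out of sparsely used blocks and packed into as few blocks as possible, leaving the rest free to erase. The starting layout below is invented, since the figure only names pages 1, 2, 5, 6, 7 and Blocks A through D, and sorting the pages is done only to keep the output tidy.

```python
from typing import Dict, List, Optional

Block = List[Optional[int]]   # None marks a stale or empty page


def collect(blocks: Dict[str, Block], pages_per_block: int) -> Dict[str, Block]:
    """Pack live pages into the first blocks; the remaining blocks end up empty."""
    live = sorted(p for block in blocks.values() for p in block if p is not None)
    compacted: Dict[str, Block] = {}
    for i, name in enumerate(sorted(blocks)):
        chunk: Block = list(live[i * pages_per_block:(i + 1) * pages_per_block])
        compacted[name] = chunk + [None] * (pages_per_block - len(chunk))
    return compacted


if __name__ == "__main__":
    before = {                      # hypothetical layout
        "A": [7, None, 5],
        "B": [None, 6, None],
        "C": [1, None, None],
        "D": [None, 2, None],
    }
    print(collect(before, pages_per_block=3))
```

Every page the collector copies is a write the application never asked for, which is one more source of write amplification.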

Page 41: How Shit Works: Storage

Surprising Results

• What happens when you delete a file?

– Not much

– Bit flip on file table

– Space is not reclaimed

• Result?

– SATA TRIM command: the OS tells the drive which pages no longer hold live data

[Figure: pages 1, 2, 5, 6 and 7 scattered across Blocks A, B, C and D]
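
The gap between what the file system knows and what the drive knows can be shown with a toy model. Both classes are invented for this sketch and do not correspond to any real driver or file system API.

```python
from typing import Dict, Iterable, List, Set


class ToySSD:
    """Tracks which logical block addresses (LBAs) the drive believes are live."""

    def __init__(self) -> None:
        self.live: Set[int] = set()

    def write(self, lba: int) -> None:
        self.live.add(lba)

    def trim(self, lbas: Iterable[int]) -> None:
        self.live -= set(lbas)


class ToyFileSystem:
    def __init__(self, ssd: ToySSD, use_trim: bool) -> None:
        self.ssd = ssd
        self.use_trim = use_trim
        self.files: Dict[str, List[int]] = {}

    def create(self, name: str, lbas: List[int]) -> None:
        for lba in lbas:
            self.ssd.write(lba)
        self.files[name] = list(lbas)

    def delete(self, name: str) -> None:
        lbas = self.files.pop(name)      # the "bit flip on the file table"
        if self.use_trim:
            self.ssd.trim(lbas)          # only now does the drive find out


def live_after_delete(use_trim: bool) -> Set[int]:
    ssd = ToySSD()
    fs = ToyFileSystem(ssd, use_trim)
    fs.create("log.txt", [10, 11, 12])
    fs.delete("log.txt")
    return ssd.live


if __name__ == "__main__":
    print(live_after_delete(use_trim=False))
    print(live_after_delete(use_trim=True))
```

Without TRIM, the drive keeps copying the deleted file's pages around during garbage collection as if they still mattered.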

Page 42: How Shit Works: Storage

SSD Takeaways

• A moving target

– File systems

– Data structures

– Longevity

• As usual:

– Benchmark

– Monitor

Page 43: How Shit Works: Storage

EPILOGUE

“LET ME EMBRACE THEE, SOUR ADVERSITY, FOR WISE MEN SAY IT IS THE WISEST COURSE.”

Page 44: How Shit Works: Storage

WE’RE DONE HERE! … AND YES, WE’RE HIRING :-)

Thank you for listening

[email protected]

@tomerg

http://il.linkedin.com/in/tomergabel

Wix Engineering blog:

http://engineering.wix.com