How Shit Works: Storage

Posted on 08-Jan-2017

Tomer Gabel, Wix

@ GeeCON Kraków 2016

Like all good stories…

• We’ll start with a question.

• “What’s wrong with this picture?”

MY, OH, MY. WHAT COULD IT BE?

Axioms

• Not a trick question

– Servers are properly configured

– System architecture makes sense

– No obvious bugs

– No scheduled jobs

• So what else goes bump in the night?

PROLOGUE

“A LAUGHABLE CLAIM”

I/O is simple

• Just open a file, write, flush, close

• Nothing to it, right?

[Diagram: Application → File → HDD]
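Even the "simple" version hides layers. In Python, for instance, `flush()` only moves bytes from the process's buffer into the kernel; `os.fsync()` is what asks the kernel to push them toward the device. A minimal sketch, assuming a POSIX system (opening a directory with `os.open` does not work on Windows); the function name `durable_write` is hypothetical:

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data and push it down through the layers the slides describe."""
    with open(path, "wb") as f:
        f.write(data)         # lands in Python's userspace buffer
        f.flush()             # hands the bytes to the kernel (page cache); not yet on disk
        os.fsync(f.fileno())  # asks the kernel to write them out to the device
    # On POSIX, a newly created file's directory entry is only crash-safe
    # once the containing directory has been fsync'ed as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Even after `fsync`, a drive's own volatile write cache is one more layer whose behavior depends on the platform and configuration.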

I/O is simple

• A little closer…

[Diagram: Application → File → Kernel (Virtual File System → file system (ext4) → Logical Volume Manager → I/O scheduler → SCSI driver stack) → HDD]

I/O is simple

• But really…

[Diagram: Application → File → Kernel (storage subsystem, system bus drivers) → Hardware (PCI Express bus → SATA controller) → HDD]

THE ONION OF ABSTRACTION

ACT I: THESE BOOTS ARE MADE FOR WALKIN’

Everybody knows...

• Sequential access is fast

• Random access is slow

• … so what?

Everybody knows…

“Disk seeks are a huge performance bottleneck… When the amount of data starts to grow so large that effective caching becomes impossible… you need at least one disk seek to read and a couple of disk seeks to write things.”

-- MySQL Reference Manual (8.12.3)

But why?

Rotational Latency

Throughput

• So you understand latency…

• What about throughput?

• Depends on two factors:

– Areal density

– Newtonian physics

Areal Density

Interlude: Math

• Rotation is fixed

– Constant angular velocity (CAV)

• Newton tells us that…

v = ω ∙ r

• Outer tracks pass more surface under the head per second, so at a given areal density, throughput increases with radius!
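The v = ω ∙ r relationship is easy to put numbers to. A minimal sketch, assuming hypothetical inner and outer data radii for a 3.5" platter (not figures from the talk) and assuming bits per millimeter of track stay roughly constant across the platter, so throughput scales with linear velocity:

```python
import math

# Hypothetical radii for a 3.5" platter (assumption, not from the talk).
INNER_RADIUS_M = 0.020
OUTER_RADIUS_M = 0.046


def linear_velocity_m_per_s(rpm: float, radius_m: float) -> float:
    """Surface speed under the head: v = omega * r."""
    omega = rpm / 60.0 * 2.0 * math.pi  # angular velocity in rad/s
    return omega * radius_m


def outer_to_inner_ratio(rpm: float = 7200) -> float:
    """How much faster the outer track moves than the inner one (independent of rpm)."""
    return linear_velocity_m_per_s(rpm, OUTER_RADIUS_M) / linear_velocity_m_per_s(rpm, INNER_RADIUS_M)


if __name__ == "__main__":
    for name, r in (("inner", INNER_RADIUS_M), ("outer", OUTER_RADIUS_M)):
        print(f"{name}: {linear_velocity_m_per_s(7200, r):.1f} m/s")
    print(f"outer/inner: {outer_to_inner_ratio():.2f}x")
```

Under these assumed radii the outer track moves about 2.3 times faster than the inner one, which is why sequential throughput typically drops as a drive fills toward its inner tracks.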

Interlude: Math

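• Commodity drives are available at:

– 5400-15000 RPM

– Usually 7200 RPM

• What does it mean for latency?

$$\frac{7200}{60} = 120 \text{ revolutions/second}$$

$$\frac{1}{120}\,\text{s} \approx 0.00833\,\text{s} \approx 8.33\,\text{ms!}$$

• That is one full revolution, the worst-case rotational latency; on average the target sector is half a turn away, about 4.17 ms.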
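The same arithmetic for the whole commodity RPM range, as a small sketch (the function names are illustrative). It counts rotational delay only; seek time comes on top:

```python
def ms_per_revolution(rpm: float) -> float:
    """Time for one full platter revolution, i.e. worst-case rotational latency."""
    revolutions_per_second = rpm / 60.0
    return 1000.0 / revolutions_per_second


def avg_rotational_latency_ms(rpm: float) -> float:
    """On average, the target sector is half a revolution away."""
    return ms_per_revolution(rpm) / 2.0


if __name__ == "__main__":
    for rpm in (5400, 7200, 10000, 15000):
        print(f"{rpm:>5} RPM: worst {ms_per_revolution(rpm):5.2f} ms, "
              f"average {avg_rotational_latency_ms(rpm):5.2f} ms")
```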

In practice?

• Modern drives give you:

– 200+ MB/s (sequential)

– 300 IOPS

• Pure random access (4KB per I/O) nets only 300 × 4KB ≈ 1.2MB/s!
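The 1.2MB/s figure falls out of IOPS times I/O size. A sketch assuming 4KB per random I/O and 1 MB = 1000 KB, the rough convention that reproduces the slide's number:

```python
def random_throughput_mb_per_s(iops: float, io_size_kb: float = 4.0) -> float:
    """Throughput when every I/O pays a full seek plus rotational delay."""
    return iops * io_size_kb / 1000.0  # 1 MB taken as 1000 KB


if __name__ == "__main__":
    random_mb = random_throughput_mb_per_s(300)
    sequential_mb = 200.0  # the slide's "200+ MB/s"
    print(f"random: {random_mb:.1f} MB/s, "
          f"sequential is ~{sequential_mb / random_mb:.0f}x faster")
```

With these inputs, sequential access comes out roughly two orders of magnitude faster than purely random 4KB access.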

RIGHT. WHAT CAN WE DO ABOUT IT?

Fine-tuning

• Provision more RAM

• Careful index structure

– Represent IPv4 addresses as UNSIGNED INT for 75% reduction

– Implement better UUIDs¹ for 30% reduction

¹ “Store UUID in an optimized way”, Percona blog
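Both index tricks boil down to storing a compact binary form instead of text. A sketch using Python's standard `ipaddress` and `uuid` modules; the function names are hypothetical, and the UUID field reordering is an assumption based on the approach the cited Percona post describes for time-based (version 1) UUIDs:

```python
import ipaddress
import uuid


def ipv4_to_uint(addr: str) -> int:
    """Dotted-quad IPv4 string -> 32-bit unsigned integer (fits an UNSIGNED INT column)."""
    return int(ipaddress.IPv4Address(addr))


def uint_to_ipv4(value: int) -> str:
    return str(ipaddress.IPv4Address(value))


def uuid_to_binary(text: str) -> bytes:
    """36-character UUID string -> 16 raw bytes (e.g. for a BINARY(16) column)."""
    return uuid.UUID(text).bytes


def uuid1_to_ordered_binary(text: str) -> bytes:
    """Assumed optimization for v1 UUIDs: put the high time bits first.

    The standard layout is time_low-time_mid-time_hi-...; reordering to
    time_hi + time_mid + time_low makes UUIDs generated close together
    sort close together, so B-tree inserts stay near the end of the index.
    """
    h = uuid.UUID(text).hex  # 32 hex digits, no dashes
    time_low, time_mid, time_hi, rest = h[0:8], h[8:12], h[12:16], h[16:]
    return bytes.fromhex(time_hi + time_mid + time_low + rest)


if __name__ == "__main__":
    ip = "192.168.1.1"
    print(f"{ip}: {len(ip)} chars -> 4 bytes ({ipv4_to_uint(ip)})")
    u = str(uuid.uuid1())
    print(f"{u}: {len(u)} chars -> {len(uuid1_to_ordered_binary(u))} bytes")
```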

… or use a sledgehammer!

• RAID 0 (and variants) employ striping

• Data is distributed to multiple spindles

• If it sounds familiar…

– It is!

– We call it “sharding”
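The sharding analogy holds at the level of addressing: striping is a deterministic function from a logical block number to a disk, much like a modulo-based shard key. A minimal sketch, assuming a plain RAID 0 layout with fixed-size chunks dealt round-robin across disks; `raid0_locate` and `chunk_blocks` are hypothetical names:

```python
from typing import Tuple


def raid0_locate(logical_block: int, num_disks: int, chunk_blocks: int) -> Tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    chunk = logical_block // chunk_blocks        # which chunk the block falls in
    within_chunk = logical_block % chunk_blocks  # position inside that chunk
    disk = chunk % num_disks                     # chunks rotate across disks
    row = chunk // num_disks                     # full stripes before this one
    return disk, row * chunk_blocks + within_chunk


if __name__ == "__main__":
    # Hypothetical array: 4 disks, 16-block chunks.
    for lba in (0, 15, 16, 17, 64, 70):
        print(lba, "->", raid0_locate(lba, num_disks=4, chunk_blocks=16))
```

A large sequential read touches consecutive chunks, which land on different disks and can be served in parallel; that is where the speedup comes from.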

It’s turtles all the way down

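• Don’t jump to conclusions!

– RAID 0 is impractical (no redundancy: one failed disk loses the whole array)

– RAID 5 may be slow (every small write also has to update parity)

– RAID 10 is expensive (half the raw capacity goes to mirrors)

– etc.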

• Do your homework

• Benchmark!
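As a starting point for "benchmark!", here is a rough sketch that reads the same 4KB blocks of a file in order and then in shuffled order. It assumes a Unix system (`os.pread` is Unix-only) and is no substitute for a purpose-built tool such as fio. In particular, a freshly written file sits in the page cache, so unless the file is much larger than RAM (or caches are dropped), this mostly measures memory, not the disk:

```python
import os
import random
import tempfile
import time
from typing import Dict, List

BLOCK_SIZE = 4096  # bytes per read; assumed to match typical 4KB random I/O


def timed_reads(path: str, offsets: List[int]) -> float:
    """Read BLOCK_SIZE bytes at each offset, in the given order; return elapsed seconds."""
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for offset in offsets:
            os.pread(fd, BLOCK_SIZE, offset)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def sequential_vs_random(path: str, seed: int = 42) -> Dict[str, float]:
    """Time the same set of blocks read in order and in a shuffled order."""
    num_blocks = os.path.getsize(path) // BLOCK_SIZE
    in_order = [i * BLOCK_SIZE for i in range(num_blocks)]
    shuffled = in_order[:]
    random.Random(seed).shuffle(shuffled)
    return {
        "sequential_s": timed_reads(path, in_order),
        "random_s": timed_reads(path, shuffled),
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(16 * 1024 * 1024))  # 16 MiB test file (will be cached)
        path = tmp.name
    try:
        print(sequential_vs_random(path))
    finally:
        os.remove(path)
```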

ACT II: I’LL USE MY CREDIT CARD

Let’s talk SSDs

• Non-volatile memory (NAND flash)

• Lots of IOPS

• Expensive :-)

• Same caveats apply…

Let’s talk SSDs

• Value starts at “1”

• Programming: electrons accrue in the floating gate

• After programming, value becomes “0”

• Erasing: electrons are drained to reset value to “1”
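The asymmetry matters: programming can only turn 1s into 0s, and only an erase brings the 1s back. A toy model of that rule; the class name is hypothetical, and real chips are stricter still (they generally restrict reprogramming a page before it is erased, and erase whole blocks at a time, as the next slides show):

```python
from typing import List


class NandCells:
    """Toy row of NAND flash cells, 1 bit each.

    Erased cells read as 1. Programming stores charge in the floating gate,
    which reads as 0. Programming can only move bits 1 -> 0; only an erase
    drains the charge and restores the 1s.
    """

    def __init__(self, num_cells: int) -> None:
        self.bits: List[int] = [1] * num_cells  # erased state

    def program(self, data: List[int]) -> None:
        if len(data) != len(self.bits):
            raise ValueError("data length must match the number of cells")
        for current, wanted in zip(self.bits, data):
            if current == 0 and wanted == 1:
                # Would require removing charge, which only an erase can do.
                raise ValueError("cannot change 0 -> 1 without an erase")
        self.bits = list(data)

    def erase(self) -> None:
        # Drain the floating gates: every cell goes back to 1.
        self.bits = [1] * len(self.bits)


if __name__ == "__main__":
    cells = NandCells(4)
    cells.program([1, 0, 1, 0])  # fine: only 1 -> 0 transitions
    try:
        cells.program([1, 1, 1, 1])
    except ValueError as e:
        print("rejected:", e)
    cells.erase()
    print(cells.bits)
```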

Surprise and Terror

• “Draining” is destructive!

• Limited erases

• Limited lifespan!

Wear Leveling
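Since every erase uses up part of a block's lifespan, the controller spreads erases across the device. A toy sketch of dynamic wear leveling, assuming the simplest policy of handing out the free block with the fewest erases (the class and method names are hypothetical; real controllers also relocate rarely-written data, known as static wear leveling):

```python
import heapq
from typing import List, Tuple


class WearLeveler:
    """Toy dynamic wear leveling: always allocate the least-erased free block."""

    def __init__(self, num_blocks: int) -> None:
        self.erase_counts: List[int] = [0] * num_blocks
        # Min-heap of (erase count, block id) for blocks that are free.
        self._free: List[Tuple[int, int]] = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self._free)

    def allocate(self) -> int:
        _, block = heapq.heappop(self._free)
        return block

    def release(self, block: int) -> None:
        # A block must be erased before reuse; that costs one erase cycle.
        self.erase_counts[block] += 1
        heapq.heappush(self._free, (self.erase_counts[block], block))


if __name__ == "__main__":
    wl = WearLeveler(8)
    for _ in range(1000):
        wl.release(wl.allocate())
    print(wl.erase_counts)  # erases end up spread evenly across all blocks
```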

Caveats, remember?

• Addressing

– Cells (1 bit) – not addressable

– Pages (0.5-8KB)

– Blocks (32-64 pages)

• Why do you care?

– Reads/writes on a page

– But erasure on a block

Write Amplification

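[Figure: a block of pages storing “1”s; changing a single bit (Δ = 1 bit) forces the whole block to be erased and rewritten (Δ = 1 block!)]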
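Write amplification is the ratio of bytes physically written to flash over bytes the host asked to write. A sketch of the naive in-place update shown above, assuming 4KB pages and 64 pages per block (values picked from the ranges on the earlier slide; the function names are hypothetical):

```python
PAGE_SIZE_KB = 4      # assumed; the slide gives a 0.5-8KB range
PAGES_PER_BLOCK = 64  # assumed; the slide gives 32-64 pages


def in_place_update_flash_kb(pages_changed: int) -> int:
    """KB physically written when pages in a full block are updated in place.

    Pages can't be overwritten, and erasure works on whole blocks, so a naive
    in-place update erases the block and programs every page in it again.
    """
    if not 1 <= pages_changed <= PAGES_PER_BLOCK:
        raise ValueError("pages_changed must be within a single block")
    return PAGES_PER_BLOCK * PAGE_SIZE_KB


def write_amplification(host_kb: float, flash_kb: float) -> float:
    """Bytes written to flash divided by bytes the host asked to write."""
    return flash_kb / host_kb


if __name__ == "__main__":
    # The host flips one bit, but the smallest writable unit is a page.
    print(write_amplification(PAGE_SIZE_KB, in_place_update_flash_kb(1)))
```

Under these assumptions a one-bit change costs 64x its size in flash writes. Real controllers avoid this worst case by writing the new data to an already-erased page and leaving the old page stale for garbage collection.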

Surprising Results

• Defragmentation

– Relocates blocks

– Contiguous files

– Lower LBAs

– Background job

• Bad, bad, bad!

– No benefit with SSDs

– Major write load!

Background GC

[Figure: Background GC. Before: valid pages 7, 5, 6, 1 and 2 scattered across Blocks A, B, C and D. After: pages 1, 2, 5, 6 and 7 consolidated, so the emptied blocks can be erased.]
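A toy version of the compaction in the figure. It assumes 4 pages per block and a hypothetical arrangement of the slide's pages 7, 5, 6, 1 and 2; it also collapses the whole device in one pass, whereas a real controller picks victim blocks (typically those with the most stale pages) and copies their live pages into an already-erased block:

```python
from typing import List, Optional, Tuple

PAGES_PER_BLOCK = 4          # toy value
Block = List[Optional[int]]  # int = valid logical page, None = stale or erased slot


def compact(blocks: List[Block]) -> Tuple[List[Block], int]:
    """Toy garbage collection: relocate valid pages densely, leave the rest erased.

    Returns the new layout and the number of pages copied (extra flash writes).
    """
    # Gather every still-valid page; sorted only to make the result easy to read.
    valid = sorted(page for block in blocks for page in block if page is not None)
    # Start from erased blocks, then write the survivors back contiguously.
    compacted: List[Block] = [[None] * PAGES_PER_BLOCK for _ in blocks]
    for i, page in enumerate(valid):
        compacted[i // PAGES_PER_BLOCK][i % PAGES_PER_BLOCK] = page
    return compacted, len(valid)


if __name__ == "__main__":
    # Hypothetical layout of Blocks A-D before GC.
    before = [[7, None, 5, None], [None, 6, None, None],
              [1, None, None, None], [None, None, 2, None]]
    after, copied = compact(before)
    for name, block in zip("ABCD", after):
        print(name, block)
    print("pages copied:", copied)
```

Note that every copied page is another write the host never asked for, which is one more source of write amplification.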

Surprising Results

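• What happens when you delete a file?

– Not much

– Bit flip on file table

– Space is not reclaimed: the SSD still treats those pages as live data

• The fix?

– SATA TRIM command: the OS tells the drive which pages are no longer in use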

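Why TRIM helps: without it, garbage collection keeps relocating pages that belong to files the OS has already deleted. A sketch counting the copies GC would make, assuming a hypothetical deleted file that occupied pages 5, 6 and 7 (on Linux, TRIM is typically issued via the `fstrim` utility or the `discard` mount option):

```python
from typing import Iterable, List, Optional

Block = List[Optional[int]]  # int = page the drive believes is live, None = free/stale


def pages_gc_must_copy(blocks: List[Block], trimmed: Iterable[int] = ()) -> int:
    """Count pages garbage collection would relocate.

    Without TRIM the drive can't tell that a deleted file's pages are dead,
    so it keeps copying them. Pages the OS has TRIMmed are skipped.
    """
    dead = set(trimmed)
    return sum(1 for block in blocks for page in block
               if page is not None and page not in dead)


if __name__ == "__main__":
    blocks = [[7, None, 5, None], [None, 6, None, None],
              [1, None, None, None], [None, None, 2, None]]
    print("without TRIM:", pages_gc_must_copy(blocks))
    print("with TRIM:   ", pages_gc_must_copy(blocks, trimmed=[5, 6, 7]))
```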

SSD Takeaways

• A moving target

– File systems

– Data structures

– Longevity

• As usual:

– Benchmark

– Monitor

EPILOGUE

“LET ME EMBRACE THEE, SOUR ADVERSITY, FOR WISE MEN SAY IT IS THE WISEST COURSE.”

WE’RE DONE HERE!… AND YES, WE’RE HIRING :-)

Thank you for listening

tomer@tomergabel.com

@tomerg

http://il.linkedin.com/in/tomergabel

Wix Engineering blog:

http://engineering.wix.com