Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng...

27
Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin Microsoft Corporation USENIX ATC, Boston, MA, June 2012

Transcript of Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng...

Page 1: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

Erasure Coding in Windows Azure Storage

Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin Microsoft Corporation USENIX ATC, Boston, MA, June 2012

Page 2: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

2

Page 3: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

North America Region Europe Region Asia Pacific Region

3

Major datacenter

CDN PoPs

Page 4: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

Windows Azure Storage

• Blobs

• CDN

• Drives

• Tables

• Queues

4

Page 5: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

Massive Distributed Storage Systems in the Cloud

nMTTFMTTF OneFirst /

5

Page 6: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

Replication vs. Erasure Coding

a=2

b=3

a=2

b=3 b=3

a=2

a+b=5

replication

erasure

coding

a=2

b=3

a=2

6

2a+b=7

Page 7: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

WAS Stream Layer

7

Storage Location Service

Storage Stamp

Partition Layer

Stream Layer

Intra-stamp replication

Storage Stamp

Partition Layer

Stream Layer

Intra-stamp replication

Inter-stamp (Geo)

replication

Page 8: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

Microsoft Confidential 8

Page 9: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

Conventional Erasure Coding – Reed-Solomon 6+3

sealed extent ( 3 GB )

9

d0 d1 d2 d3 d4 d5

p1

p2

p3

Page 10: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

Designing For Erasure Coding - 1

10

Page 11: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

Designing For Erasure Coding - 2

11

Page 12: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

Space Savings with RS 6+3 (over 3-replication)

12

Savings

Replicated

Data Fragments

Code Fragments

Page 13: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

d0 d1 d2 d3 d4 d5 p0 p1 p2

sealed extent ( 3 GB )

overhead

(6+3)/6 = 1.5x

13

Page 14: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

p0 p1 p2 p3

d0 d1 d2 d3 d4 d5 p0 p1 p2

sealed extent ( 3 GB )

d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11

overhead

(6+3)/6 = 1.5x

(12+4)/12 = 1.33x

14

Page 15: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

reconstruction read is expensive –

reading d0 during unavailability requiring 12 fragments (12 disk I/Os, 12 net transfers)

d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11

p0

p1

p3

p2

15

Page 16: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

16

can we achieve1.33x overhead

while requiring only 6 fragments for reconstruction?

Page 17: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

17

optimize erasure coding for cloud storage making single failure reconstruction most efficient

Page 18: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

y0 y1 y2 y3 y4 y5 x0 x1 x2 x3 x4 x5

• LRC12+2+2: 12 data fragments, 2 local parities and 2 global parities

• storage overhead: (12 + 2 + 2) / 12 = 1.33x

• Local parity: reconstruction requires only 6 fragments

18

sealed extent ( 3 GB )

Page 19: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

• LRC12+2+2 needs to recover • arbitrary 3 failures

• as many 4 failures as possible

19

Page 20: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

y0 y1 y2 y3 y4 y5 x0 x1 x2 x3 x4 x5

q0

q1

py px

recover y1 from py (group y)

20

Page 21: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

y0 y1 y2 y3 y4 y5 x0 x1 x2 x3 x4 x5

q0

q1

py px

recover y1 from py (group y)

recover x0 and x2 from q0 and q1

21

Page 22: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

y0 y1 y2 y3 y4 y5 x0 x1 x2 x3 x4 x5

q0

q1

py px

how to recover the 4 failures and all similar cases?

(see paper)

22

Page 23: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

• LRC12+2+2:

• reliability: RS12+4 > LRC12+2+2 > RS6+3

23

Page 24: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

24

Page 25: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

0

2

4

6

8

10

12

1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2

Re

con

stru

ctio

n R

ead

Co

st

Storage Overhead

Reed-Solomon

LRC

same overhead

half cost (6 3)

same cost

1.5x 1.33x

LRC (12+4+2)

LRC (12+2+2)

RS6+3

25

RS10+4

• RS10+4: HDFS-RAID

at Facebook

• RS6+3: GFS II (Colossus)

at Google

RS12+4

Page 26: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

RS (6 + 3)

reconstruction cost = 6

LRC (14 + 2 + 2)

reconstruction cost = 7

RS (14 + 4)

reconstruction cost = 14

26

14% savings

Page 27: Erasure Coding in Windows Azure Storage - USENIX · Erasure Coding in Windows Azure Storage Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin

http://blogs.msdn.com/b/windowsazurestorage/

27