Network Coding for Distributed Storage Systems

IEEE TRANSACTIONS ON INFORMATION THEORY, SEPTEMBER 2010. Alexandros G. Dimakis, Brighten Godfrey, Yunnan Wu, Martin J. Wainwright, Kannan Ramchandran

Transcript of Network Coding for Distributed Storage Systems

Page 1: Network Coding for Distributed Storage Systems

1

Network Coding for Distributed Storage Systems

IEEE TRANSACTIONS ON INFORMATION THEORY, SEPTEMBER 2010

Alexandros G. Dimakis, Brighten Godfrey, Yunnan Wu, Martin J. Wainwright, Kannan Ramchandran

Page 2: Network Coding for Distributed Storage Systems

2

Outline

• Introduction
• Background
• Analysis
• Evaluation
• Conclusion

Page 3: Network Coding for Distributed Storage Systems

3

Introduction

• Distributed storage systems provide reliable access to data through redundancy spread over individually unreliable nodes.
• Storing data in distributed storage systems with erasure coding:
  - the encoded data are spread across nodes.
  - it requires less redundancy than replication.
  - stored data must be replaced periodically as nodes fail.

Page 4: Network Coding for Distributed Storage Systems

4

Introduction

• Key issues in distributed storage systems:
  - repair bandwidth
  - storage space
• How can encoded data be generated in a distributed way while transferring as little data as possible?

Page 5: Network Coding for Distributed Storage Systems

5

MDS Codes

• A common practice for repairing a single node failure in an erasure-coded system:
  1. a new node reconstructs the whole encoded data object.
  2. it then generates just one encoded block.
• Maximum Distance Separable (MDS) code.
  - (n, k)-MDS property: the original file can be recovered from any k of the n encoded blocks.

Page 6: Network Coding for Distributed Storage Systems

6

MDS Codes

[Figure: the file is divided into pieces of size M/k, MDS-encoded, and the encoded blocks are stored at n nodes.]
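
The (n, k)-MDS property can be illustrated with a toy code. The sketch below is not the construction used in the paper; it assumes a small prime field GF(257) and a Reed-Solomon-style polynomial evaluation, and the names `P`, `encode`, and `decode` are hypothetical. Each encoded block is one evaluation of a polynomial whose coefficients are the k data symbols, so any k blocks determine the polynomial.

```python
from itertools import combinations

P = 257  # assumed prime field size; must exceed n and every data symbol


def encode(data, n):
    """Treat the k data symbols as polynomial coefficients; block x is (x, poly(x))."""
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(data)) % P)
            for x in range(1, n + 1)]


def decode(blocks, k):
    """Recover the k coefficients from any k blocks by Lagrange interpolation mod P."""
    pts = list(blocks)[:k]
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(pts):
        basis = [1]   # coefficients of prod_{m != j} (X - xm), lowest degree first
        denom = 1     # prod_{m != j} (xj - xm)
        for m, (xm, _) in enumerate(pts):
            if m == j:
                continue
            nxt = [0] * (len(basis) + 1)
            for i, b in enumerate(basis):
                nxt[i] = (nxt[i] - xm * b) % P
                nxt[i + 1] = (nxt[i + 1] + b) % P
            basis = nxt
            denom = denom * (xj - xm) % P
        scale = yj * pow(denom, -1, P) % P  # modular inverse, Python 3.8+
        for i, b in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * b) % P
    return coeffs


data = [10, 20]                  # k = 2 data symbols
blocks = encode(data, 4)         # n = 4 encoded blocks
recovered = [decode(pair, 2) for pair in combinations(blocks, 2)]
```

Every pair of blocks in `recovered` yields the original two symbols, which is the "any k of n" behaviour the slide describes.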

Page 7: Network Coding for Distributed Storage Systems

7

Introduction

• Redundancy must be continually refreshed as nodes fail in distributed storage systems.
  - this causes large data transfers across the network.

Page 8: Network Coding for Distributed Storage Systems

8

Introduction

• Erasure codes can be repaired without communicating the whole data object.
• (4, 2)-MSR example when a node fails:
  - the surviving nodes generate smaller parity packets from their data.
  - they forward them to the newcomer.
  - the newcomer mixes the packets to generate two new packets.

[Figure: (4, 2) repair example with packets labelled 0.5.]

Page 9: Network Coding for Distributed Storage Systems

9

Introduction

• This paper identifies an optimal tradeoff curve between storage and repair bandwidth.
  - smaller storage space => less redundancy => more repair bandwidth
• This paper calls codes that lie on this optimal tradeoff curve regenerating codes.

Page 10: Network Coding for Distributed Storage Systems

10

Introduction

• Minimum-Storage Regenerating (MSR) codes.
  - can be efficiently repaired.
• Minimum-Bandwidth Regenerating (MBR) codes.
  - each storage node stores slightly more than M/k.
  - in return, the repair bandwidth can be reduced.

Page 11: Network Coding for Distributed Storage Systems

11

Outline

• Introduction
• Background
• Analysis
• Evaluation
• Conclusion

Page 12: Network Coding for Distributed Storage Systems

12

Erasure Codes

• Classical coding theory focuses on the tradeoff between redundancy and error tolerance.
• In terms of the redundancy-reliability tradeoff, Maximum Distance Separable (MDS) codes are optimal.
  - the best-known examples are Reed-Solomon codes.

Page 13: Network Coding for Distributed Storage Systems

13

Network Coding

• Network coding allows
  - the intermediate nodes to generate output data by encoding previously received input data.
  - information to be "mixed" at intermediate nodes.
• This paper investigates the application of network coding to the repair problem in distributed storage.
  - tradeoff between storage and repair network bandwidth
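
As a minimal sketch of what "mixing" at an intermediate node means, the example below combines two packets with a bitwise XOR (addition over GF(2)). This is a generic illustration rather than the coding scheme analysed in the paper; the function name `xor_bytes` and the packet contents are made up for the example.

```python
def xor_bytes(p, q):
    """Bitwise XOR of two equal-length packets (addition over GF(2))."""
    if len(p) != len(q):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(p, q))


packet_a = b"DATA-A"
packet_b = b"DATA-B"

# The intermediate node forwards one mixed packet instead of two originals.
mixed = xor_bytes(packet_a, packet_b)

# A receiver that already holds one original recovers the other from the mix.
recovered_b = xor_bytes(mixed, packet_a)
recovered_a = xor_bytes(mixed, packet_b)
```

The mixed packet is no larger than either input, yet it is useful to any receiver that holds one of the two originals.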

Page 14: Network Coding for Distributed Storage Systems

14

Distributed Storage Systems

• Erasure codes could reduce bandwidth use by an order of magnitude compared with replication.
• Hybrid strategy:
  - one special storage node maintains one full replica.
  - the other nodes store erasure-encoded data.
  - the replica node generates each new encoded block, so only M/k bytes are transferred per repair.
  - a problem arises when the replica itself is lost.

Page 15: Network Coding for Distributed Storage Systems

15

Outline

• Introduction
• Background
• Analysis
• Evaluation
• Conclusion

Page 16: Network Coding for Distributed Storage Systems

16

Information Flow Graph

Page 17: Network Coding for Distributed Storage Systems

17

Storage-Bandwidth Tradeoff

• The normal redundancy we want to maintain requires n active storage nodes,
  - each storing α bits.
  - a newcomer downloads β bits each from any d surviving nodes.
  - the total repair bandwidth is γ = dβ.
• For each set of parameters (n, k, d, α, γ), there is a family of information flow graphs, each of which corresponds to a particular evolution of node failures and repairs.

Page 18: Network Coding for Distributed Storage Systems

18

Storage-Bandwidth Tradeoff

• Denote this family of directed acyclic graphs by G(n, k, d, α, γ).
• The tuple (n, k, d, α, γ) = (4, 2, 3, 1 Mb, 1.5 Mb) is feasible.

Page 19: Network Coding for Distributed Storage Systems

19

Storage-Bandwidth Tradeoff

• Theorem 1: For any α ≥ α*(n, k, d, γ), the points (n, k, d, α, γ) are feasible.
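
The definition of the threshold function α* is not captured in this transcript. As a hedged reconstruction of the statement in the published paper (to be checked against it), with M denoting the file size, the threshold is piecewise linear in γ:

```latex
\alpha^{*}(n,k,d,\gamma) =
\begin{cases}
  \dfrac{M}{k}, & \gamma \in [\,f(0), +\infty) \\[2ex]
  \dfrac{M - g(i)\,\gamma}{k - i}, & \gamma \in [\,f(i), f(i-1)), \quad i = 1, \dots, k-1
\end{cases}
\qquad\text{where}\qquad
f(i) = \frac{2Md}{(2k - i - 1)\,i + 2k(d - k + 1)},
\qquad
g(i) = \frac{(2d - 2k + i + 1)\,i}{2d}.
```

For k = 2, d = 3, M = 2 this gives f(0) = 1.5 and f(1) = 1.2, so α* = 1 for γ ≥ 1.5 and α* = 2 − (2/3)γ for 1.2 ≤ γ < 1.5; below γ = f(k − 1) no value of α is feasible.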

Page 20: Network Coding for Distributed Storage Systems

20

Theorem Proof (1/4)

Page 21: Network Coding for Distributed Storage Systems

21

Theorem Proof (2/4)


Page 22: Network Coding for Distributed Storage Systems

22

Theorem Proof (3/4)


Page 23: Network Coding for Distributed Storage Systems

23

Theorem Proof (4/4)


Page 24: Network Coding for Distributed Storage Systems

24

Storage-Bandwidth Tradeoff

• Code repair can be achieved if and only if the underlying information flow graph has sufficiently large min-cuts.
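
The min-cut condition can be checked numerically. The sketch below assumes the bound from the paper's analysis, namely that the min-cut is at least the sum over i = 0, ..., k − 1 of min{(d − i)β, α} with β = γ/d, and that a point is feasible when this sum is at least the file size M. The function names are hypothetical, and the file size M = 2 Mb used for the (4, 2, 3, 1 Mb, 1.5 Mb) example is an assumption, since the slides do not state it.

```python
from fractions import Fraction


def min_cut_bound(k, d, alpha, gamma):
    """Sum over i = 0..k-1 of min((d - i) * beta, alpha), with beta = gamma / d."""
    beta = Fraction(gamma) / d
    return sum(min((d - i) * beta, Fraction(alpha)) for i in range(k))


def is_feasible(M, k, d, alpha, gamma):
    """A point is feasible when the min-cut bound covers the file size M."""
    return min_cut_bound(k, d, alpha, gamma) >= M


def min_storage(M, k, d, gamma):
    """Smallest alpha meeting the bound for this gamma, or None if none exists."""
    beta = Fraction(gamma) / d
    caps = sorted((d - i) * beta for i in range(k))  # link capacities, ascending
    if sum(caps) < M:
        return None  # even unlimited storage cannot meet the bound
    used = Fraction(0)
    for j, cap in enumerate(caps):
        # Assume the j smallest links are saturated and the rest are limited by alpha.
        alpha = (M - used) / (k - j)
        if alpha <= cap:
            return alpha
        used += cap
    return None


# Assumed example: M = 2 Mb, (n, k, d) = (4, 2, 3), gamma = 1.5 Mb.
example_ok = is_feasible(2, 2, 3, 1, Fraction(3, 2))
example_alpha = min_storage(2, 2, 3, Fraction(3, 2))
```

Exact rationals (`Fraction`) are used so that boundary cases, where the bound equals M exactly, are not misclassified by floating-point rounding.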

Page 25: Network Coding for Distributed Storage Systems

25

Storage-Bandwidth Tradeoff

• Optimal tradeoff curve between storage α and repair bandwidth γ
  - (γ = 1, α = 0.2) (γ = 1, α = 0.1)

Page 26: Network Coding for Distributed Storage Systems

26

Special Cases (1/2)

• Minimum-Storage Regenerating (MSR) Codes

Page 27: Network Coding for Distributed Storage Systems

27

Special Cases (2/2)

• Minimum-Bandwidth Regenerating (MBR) Codes
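
The formulas for the two special cases are not captured in this transcript. As a hedged sketch, the functions below compute the two extreme points of the tradeoff curve as given in the published paper (to be checked against it): the MSR point (α, γ) = (M/k, Md / (k(d − k + 1))) and the MBR point α = γ = 2Md / (2kd − k² + k). The function names are hypothetical, and M, k, d are assumed to be integers.

```python
from fractions import Fraction


def msr_point(M, k, d):
    """Minimum-storage point: (alpha, gamma) = (M/k, M*d / (k*(d - k + 1)))."""
    return Fraction(M, k), Fraction(M * d, k * (d - k + 1))


def mbr_point(M, k, d):
    """Minimum-bandwidth point: alpha = gamma = 2*M*d / (2*k*d - k*k + k)."""
    value = Fraction(2 * M * d, 2 * k * d - k * k + k)
    return value, value


# Assumed example parameters: M = 2, k = 2, d = 3.
msr = msr_point(2, 2, 3)
mbr = mbr_point(2, 2, 3)
```

With these assumed parameters, the MSR point stores 1 unit per node and downloads 3/2 per repair, while the MBR point stores 6/5 per node and downloads 6/5, which is the "slightly more storage, less repair bandwidth" relationship the Introduction describes.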

Page 28: Network Coding for Distributed Storage Systems

28

Outline

• Introduction
• Background
• Analysis
• Evaluation
  - Node Dynamics and Objectives
  - Model
  - Quantitative Results
• Conclusion

Page 29: Network Coding for Distributed Storage Systems

29

Node Dynamics and Objectives (1/2)

• A permanent failure
  - the permanent departure of a node from the system
  - a disk failure resulting in loss of the data stored on the node
• A transient failure
  - node reboot
  - temporary network disconnection

Page 30: Network Coding for Distributed Storage Systems

30

Node Dynamics and Objectives (2/2)

• A file is available
  - if it can be reconstructed from the data stored on currently available nodes.
• A file is durable
  - if, after permanent node failures, it may still be available at some point in the future.

Page 31: Network Coding for Distributed Storage Systems

31

Model (1/5)

• The model has two key parameters, f and a.
  - a fraction f of the nodes storing file data fail permanently per unit time.
  - at any given time, a node storing data is available with some probability a.
• The expected availability and maintenance bandwidth of various redundancy schemes can be computed for maintaining a file of M bytes.

Page 32: Network Coding for Distributed Storage Systems

32

Model (2/5)

• Replication
  - redundancy of R replicas
  - stores RM bytes in total
  - replaces fRM bytes per unit time
  - the file is unavailable if no replica is available
    - probability (1 − a)^R, assuming nodes are available independently
• Ideal Erasure Codes
  - n = kR packets, redundancy R = n/k
  - a new packet is created by transferring just M/k bytes
  - replaces fRM bytes per unit time
  - the file is unavailable if fewer than k of the n packets are available

Page 33: Network Coding for Distributed Storage Systems

33

Model (3/5)

• Hybrid
  - n = k(R − 1)
  - stores RM bytes in total
  - transfers fRM bytes per unit time
  - the file is unavailable if the replica is unavailable and fewer than k erasure-coded packets are available
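
The unavailability probabilities for replication, ideal erasure codes, and the hybrid scheme follow from the model if each node is assumed to be available independently with probability a. The sketch below makes that independence assumption explicit; the function names are hypothetical.

```python
from math import comb


def unavail_replication(a, R):
    """All R replicas are unavailable at once."""
    return (1 - a) ** R


def unavail_erasure(a, n, k):
    """Fewer than k of the n encoded packets are available (binomial tail)."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k))


def unavail_hybrid(a, n, k):
    """The replica is unavailable and fewer than k erasure-coded packets are available."""
    return (1 - a) * unavail_erasure(a, n, k)


# Illustrative numbers only: a = 0.5, R = 2 replicas versus n = 2, k = 1.
p_rep = unavail_replication(0.5, 2)
p_ec = unavail_erasure(0.5, 2, 1)
p_hyb = unavail_hybrid(0.5, 2, 1)
```

With k = 1 the erasure-code expression reduces to the replication expression, which is a quick sanity check on the binomial sum.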

Page 34: Network Coding for Distributed Storage Systems

34

Model (4/5)

• Minimum-Storage Regenerating Codes
  - stores RM bytes in total
  - redundancy R = n/k
  - replaces fRM bytes per unit time
  - extra amount of information
  - unavailability

Page 35: Network Coding for Distributed Storage Systems

35

Model (5/5)

• Minimum-Bandwidth Regenerating Codes
  - stores Mn bytes in total
  - redundancy R = n/k
  - replaces fMn bytes per unit time
  - extra amount of information
  - unavailability

Page 36: Network Coding for Distributed Storage Systems

36

Estimating f and a

Page 37: Network Coding for Distributed Storage Systems

37

Quantitative Results (1/2)

Page 38: Network Coding for Distributed Storage Systems

38

Quantitative Results (2/2)

Page 39: Network Coding for Distributed Storage Systems

39

Quantitative Comparison

• Comparison with Hybrid
  - Disadvantage of Hybrid: asymmetric design
• MBR codes
  - Disadvantage:
    - reconstructing the entire file requires communication with n − 1 nodes.
    - if the reading frequency of a file is sufficiently high and k is sufficiently small, this inefficiency could become unacceptable.

Page 40: Network Coding for Distributed Storage Systems

40

Outline

• Introduction
• Background
• Analysis
• Evaluation
• Conclusion

Page 41: Network Coding for Distributed Storage Systems

41

Conclusion

• This paper presented a general theoretic framework that can
  - determine the information that must be communicated to repair failures in encoded systems.
  - identify a tradeoff between storage and repair bandwidth.
• One potential application area for the proposed regenerating codes is distributed archival storage or backup.
  - regenerating codes can potentially offer desirable tradeoffs in terms of redundancy, reliability, and repair bandwidth.