Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding Yuchong Hu, Yinlong Xu, Xiaozhao Wang, Cheng Zhan and Pei Li IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 28, NO. 2, FEBRUARY 2010


Page 1: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Yuchong Hu, Yinlong Xu, Xiaozhao Wang, Cheng Zhan and Pei Li

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 28, NO. 2, FEBRUARY 2010

Page 2: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Outline

• Introduction
• Problem Statement
• Mutually Cooperative Recovery (MCR)
• MCR Transmission And Coding Schemes
• Conclusion

Page 3: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Introduction

• This work addresses the recovery from multiple node failures in distributed storage systems.

• We design a mutually cooperative recovery (MCR) mechanism for multiple node failures.

Page 4: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Introduction

• Dimakis et al. [8][10] prove that the file reconstruction problem in distributed storage systems is equivalent to the multicasting problem.

• Two symmetric mechanisms for maintaining redundancy
– Minimum-storage regenerating (MSR) codes
– Minimum-bandwidth regenerating (MBR) codes

Page 5: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Introduction

• In MCR, all the new nodes repair the lost data cooperatively and simultaneously.

• We will prove that there exists a random linear coding scheme with the minimal maintenance bandwidth in MCR that keeps the strong-MDS property.


Page 7: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Problem Statement

• All nodes have an identical storage capability.
• Communications between any two nodes are symmetric in a distributed storage system.
• The original file is (n, k) MDS encoded.
• The n encoded fragments are stored evenly at n nodes chosen from the system.
• When r nodes become unavailable, the system chooses another r nodes to repair the lost data.

Page 8: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Problem Statement

• As time goes by, the source node may leave the storage network.
– Define a virtual source (VS).
• An initial set of n nodes is chosen.
– Each node stores one encoded fragment.
– The destination node D can connect to any k nodes to download k fragments for the file reconstruction.
• When r storage nodes become unavailable
– the n − r surviving nodes are named “old nodes”;
– the r replacement nodes are named “new nodes”.

Page 9: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Outline

• Introduction
• Problem Statement
• Mutually Cooperative Recovery (MCR)
– Model based on MCR
– Assumptions in MCR
– Information flow graph G(n, k, r, β)
– Lower bound of maintenance bandwidth
– Comparisons of MCR, MSR and MBR
• MCR Transmission And Coding Schemes
• Conclusion

Page 10: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Mutually Cooperative Recovery (MCR)

• Our study is based on the assumption that all the new nodes can cooperate with one another to complete their fragment reconstructions.

• We denote such a distributed storage network by DSN(n, k, r).


Page 12: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding


Model based on MCR

• The repair process of our mutually cooperative recovery in DSN(n, k, r) is specified as follows.

Page 13: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding


Model based on MCR


Page 15: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Assumptions in MCR

• Assume that all βi,j and β′j,j are the same as β.
• The total bandwidth overhead for the recovery is r(n − 1)β, since each of the r new nodes receives β bytes from each of the other n − 1 nodes.
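
As a sanity check on the total, here is a short derivation under the equal-β assumption. It counts the n − r old nodes and the r − 1 other new nodes as the senders to each new node; the symbol γ for the total is introduced here only for illustration and does not come from the slides.

```latex
\gamma \;=\; r\,\bigl[(n-r)\,\beta + (r-1)\,\beta\bigr] \;=\; r\,(n-1)\,\beta
```

With β = M/[k(n − k)] this equals [(n − 1)/(n − k)]·(M/k)·r, which agrees with the maintenance bandwidth quoted for MCR in the comparison slides.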


Page 17: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Information flow graph G(n, k, r, β)

• We construct an information flow graph G(n, k, r, β), following an idea similar to that in [8].

• Nodes in G(n, k, r, β)
• Edges in G(n, k, r, β)

Page 18: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding


Information flow graph G(n, k, r, β)


Page 20: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Lower bound of maintenance bandwidth

• We find the lower bound of β by studying the capacity of the min-cut of G(n, k, r, β).

• To keep the (n, k) MDS property in DSN(n, k, r):
– the min-cut capacity of every possible information flow graph must be at least the original file size, M bytes.

Page 21: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Lower bound of maintenance bandwidth

• Lemma 1
– To keep the (n, k) MDS property in DSN(n, k, r), the min-cuts separating VS from D in all possible information flow graphs G(n, k, r, β) must be no smaller than M bytes.

Page 22: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Lower bound of maintenance bandwidth

• Lemma 2
– Let (S, S̄) be a cut of G(n, k, r, β), where VS ∈ S and D ∈ S̄.
– If β ≥ M/[k(n − k)], the min-cut capacity of every possible G(n, k, r, β) is no smaller than M bytes.

Page 23: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Lower bound of maintenance bandwidth

• Proof of Lemma 2
• Assume that D connects to t new nodes and k − t old nodes for the file reconstruction.
– Case 1 : is in . ()
– Case 2 : , . ()
– Case 3 : , but are in S. ()

Page 24: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding


Lower bound of maintenance bandwidth

Page 25: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding


Lower bound of maintenance bandwidth

Part 1: The sum of capacities of edges from to for in S and is (M/k) .

Page 26: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding


Lower bound of maintenance bandwidth

Part 2: The sum of capacities of edges from to for in S and is β(n−r−)

Page 27: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding


Lower bound of maintenance bandwidth

Part 3: The sum of capacities of edges between and for in S and is β(r−) .

Page 28: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding


Lower bound of maintenance bandwidth

Part 4: The sum of capacities of edges from to for in S and is (M/k) .

Page 29: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Lower bound of maintenance bandwidth

We analyze c(S, S̄) in two cases, distinguished by the sign of the M/k − β term.

Page 30: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding


Lower bound of maintenance bandwidth

• Case 1 : M/k − β ≥ 0

Page 31: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Lower bound of maintenance bandwidth

• Case 1 : M/k − β ≥ 0

• This means that, for β = M/[k(n − k)], there are cases where the capacity of the min-cut is exactly M.

Page 32: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Lower bound of maintenance bandwidth

• Case 2 : M/k − β ≤ 0

• n − r ≥ k implies c(S, S̄) ≥ M

Page 33: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Lower bound of maintenance bandwidth

• By Case 1 and Case 2, Lemma 2 holds:
– Let (S, S̄) be a cut of G(n, k, r, β), where VS ∈ S and D ∈ S̄.
– If β ≥ M/[k(n − k)], the min-cut capacity of every possible G(n, k, r, β) is no smaller than M bytes.

Page 34: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Lower bound of maintenance bandwidth

• Lemma 3:
– There exists a random linear network coding scheme guaranteeing that D can reconstruct the original file for any connection choice,
– with a probability that can be driven arbitrarily close to 1 by increasing the size of the field F.
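
To illustrate why a larger field helps, consider the simplest instance of random linear coding: a single m × m coefficient matrix with entries drawn uniformly from GF(q) is invertible with probability (1 − 1/q)(1 − 1/q²)···(1 − 1/q^m). The sketch below only shows this trend; it is not the bound used in the paper, and the names `prob_invertible`, `q` and `m` are chosen here for illustration.

```python
from fractions import Fraction


def prob_invertible(q: int, m: int) -> Fraction:
    """Exact probability that a uniformly random m x m matrix over GF(q) is invertible."""
    prob = Fraction(1)
    for i in range(1, m + 1):
        # Each factor accounts for one more row avoiding the span of the previous ones.
        prob *= 1 - Fraction(1, q ** i)
    return prob


if __name__ == "__main__":
    # The probability approaches 1 as the field size q grows (m = 6 is arbitrary).
    for q in (2, 17, 257, 65537):
        print(q, float(prob_invertible(q, 6)))
```

The same qualitative behaviour, failure probability shrinking as the field grows, is what Lemma 3 asserts for the whole coding scheme.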

Page 35: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Lower bound of maintenance bandwidth

• Theorem 1:
– The (n, k) MDS property is still kept after the recovery if β is not smaller than M/[(n − k)k].
• Proof: by Lemma 2 and Lemma 3.


Page 37: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Comparisons of MCR, MSR and MBR

• MCR
– Each node stores α = M/k bytes.
– r nodes become unavailable.
– Each of the r new nodes downloads β bytes from each of any d available nodes.
• Storage cost = (M/k)·n
• Maintenance bandwidth = [(n − 1)/(n − k)]·(M/k)·r
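
The two MCR cost expressions above can be evaluated directly. The following is a minimal sketch: the function name `mcr_costs` and the parameter check are assumptions made here, and β is taken at the lower bound M/[k(n − k)] from Theorem 1.

```python
from fractions import Fraction


def mcr_costs(M: int, n: int, k: int, r: int):
    """Return (beta, storage_cost, maintenance_bandwidth) for MCR in DSN(n, k, r).

    M is the file size in bytes. Exact fractions avoid rounding noise.
    """
    # Assumed validity range: at least k nodes must survive (n - r >= k).
    if not (0 < k < n and 1 <= r <= n - k):
        raise ValueError("need 0 < k < n and 1 <= r <= n - k")
    beta = Fraction(M, k * (n - k))   # bytes sent per link at the lower bound
    storage = Fraction(M, k) * n      # (M/k) * n
    bandwidth = r * (n - 1) * beta    # equals [(n-1)/(n-k)] * (M/k) * r
    return beta, storage, bandwidth


if __name__ == "__main__":
    # Hypothetical numbers: a 60-byte file, n = 12, k = 2, r = 8.
    print(mcr_costs(60, 12, 2, 8))
```

Using `Fraction` keeps the result exact, so the output can be compared symbol for symbol with the closed-form expression on the slide.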

Page 38: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Comparisons of MCR, MSR and MBR

• = (n − r)/k, = n/k.
• For example, with n = 6k and r = 4k, the multi-loss recovery is triggered when 4k of the 6k original nodes become unavailable.

• We set d to its maximum, d = n − r, to compare MCR with the best tradeoff of MSR and MBR.
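
For the example setting n = 6k, r = 4k, the MCR quantities can be worked out by hand from β = M/[k(n − k)], storage cost (M/k)·n and maintenance bandwidth [(n − 1)/(n − k)]·(M/k)·r. The names B_MCR and C_MCR below are introduced here only for this worked sketch.

```latex
\begin{aligned}
\beta &= \frac{M}{k(n-k)} = \frac{M}{k\cdot 5k} = \frac{M}{5k^{2}},\\
B_{\mathrm{MCR}} &= r\,(n-1)\,\beta = 4k\,(6k-1)\,\frac{M}{5k^{2}} = \frac{4(6k-1)}{5k}\,M,\\
C_{\mathrm{MCR}} &= \frac{M}{k}\,n = 6M .
\end{aligned}
```

The condition n − r ≥ k holds here because n − r = 2k.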

Page 39: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Comparisons of MCR, MSR and MBR

• Compared with MSR
– maintenance bandwidth: 22%.
– storage cost: same.

• Compared with MBR
– maintenance bandwidth: same.
– storage cost: 23%.

Page 40: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Comparisons of MCR, MSR and MBR

• Compared with MSR
– maintenance bandwidth: 23%.
– storage cost: same.

• Compared with MBR
– maintenance bandwidth: 11%.
– storage cost: 23%.

Page 41: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding


Comparisons of MCR, MSR and MBR

• We can conclude that MCR has a better performance in the storage cost and maintenance bandwidth in multi-loss recovery of distributed storage systems compared with other non-cooperative recovery mechanisms.

Page 42: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Outline

• Introduction
• Problem Statement
• Mutually Cooperative Recovery (MCR)
• MCR Transmission And Coding Schemes
– Transmission scheme in MCR
– Coding scheme in MCR
• Conclusion

Page 43: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

MCR Transmission And Coding Schemes

• Theorem 1 gives a lower bound of maintenance bandwidth with β = M/[k(n − k)].

• We construct a recovery transmission scheme in MCR based on β = M/[k(n − k)], and a linear coding scheme based on a Strong-MDS code.


Page 45: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Transmission scheme in MCR

• To satisfy β = M/[k(n − k)], the original file of size M is represented as k(n − k) packets in MCR, each of size M/[k(n − k)].

• Assumptions:
– The entries of , , are randomly selected from a finite field F in the following scheme.
– The file can be viewed as k(n − k) packets, each of size z.
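
The packetization step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `packetize` is made up here, and zero-padding a file whose length is not a multiple of k(n − k) is an assumption the slides do not address.

```python
def packetize(data: bytes, n: int, k: int) -> list[bytes]:
    """Split a file into k(n - k) equal-size packets of z bytes each."""
    count = k * (n - k)                       # number of packets required by the scheme
    z = -(-len(data) // count)                # ceil(len / count): packet size in bytes
    padded = data.ljust(z * count, b"\0")     # assumption: pad the tail with zero bytes
    return [padded[i * z:(i + 1) * z] for i in range(count)]


if __name__ == "__main__":
    packets = packetize(b"abcdefghijkl", n=5, k=2)
    print(len(packets), packets)
```

When the file length M is already a multiple of k(n − k), each packet has exactly z = M/[k(n − k)] bytes, matching β.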

Page 46: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Transmission scheme in MCR

• (1) File distribution
– 1.1) Choose a set of n nodes from idle nodes for a file distribution.
– 1.2) Encode the k(n − k) packets of the original file into n(n − k) packets.
– 1.3) Each initial node stores n − k packets as a fragment.
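
The file-distribution steps can be sketched on coefficient vectors alone: each encoded packet is represented by the vector of coefficients it applies to the k(n − k) original packets. Everything named below (`P`, `rank_mod_p`, `distribute`, `is_mds`) is hypothetical, and the prime field GF(2^31 − 1) is an assumption standing in for the finite field F.

```python
import random
from itertools import combinations

P = 2_147_483_647  # assumed field size: the prime 2^31 - 1


def rank_mod_p(rows, p=P):
    """Rank of a list of row vectors over GF(p), by Gaussian elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def distribute(n, k, rng):
    """Steps 1.1-1.3: n nodes, each storing n - k random coefficient vectors."""
    dim = k * (n - k)  # number of original packets
    return [[[rng.randrange(P) for _ in range(dim)] for _ in range(n - k)]
            for _ in range(n)]


def is_mds(nodes, k):
    """True if the packets of every choice of k nodes span all original packets."""
    dim = len(nodes[0][0])
    return all(rank_mod_p([v for node in group for v in node]) == dim
               for group in combinations(nodes, k))


if __name__ == "__main__":
    nodes = distribute(5, 2, random.Random(0))
    print(len(nodes), len(nodes[0]), is_mds(nodes, 2))
```

With random coefficients the MDS check can fail, but only with a probability that shrinks as the field grows, which is why a large prime is used in this sketch.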

Page 47: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Transmission scheme in MCR

• (2) Data recovery from r failed nodes
– 2.1) Choose a set of r new nodes from idle nodes for repairing.
– 2.2) Each old node transmits one encoded packet to each new node.
– 2.3) Each new node transmits one encoded packet to each of the other new nodes.
– 2.4) Each new node encodes the n − 1 received packets into n − k linearly independent packets.
– 2.5) Each new node stores n − k packets as a fragment.
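
The recovery steps 2.1 to 2.5 can be simulated on coefficient vectors. This is a sketch under stated assumptions, not the authors' construction: all names are hypothetical, every "encoded packet" is taken to be a uniformly random linear combination over the prime field GF(2^31 − 1), and the packet a new node forwards in step 2.3 is assumed to combine only what it received from old nodes.

```python
import random
from itertools import combinations

P = 2_147_483_647  # assumed field size: the prime 2^31 - 1


def rank_mod_p(rows, p=P):
    """Rank of a list of row vectors over GF(p), by Gaussian elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def distribute(n, k, rng):
    """Initial placement: n nodes, each with n - k random coefficient vectors."""
    dim = k * (n - k)
    return [[[rng.randrange(P) for _ in range(dim)] for _ in range(n - k)]
            for _ in range(n)]


def is_mds(nodes, k):
    """True if every choice of k nodes spans all k(n - k) original packets."""
    dim = len(nodes[0][0])
    return all(rank_mod_p([v for node in group for v in node]) == dim
               for group in combinations(nodes, k))


def combine(vectors, rng):
    """One uniformly random linear combination of the given vectors over GF(P)."""
    coeffs = [rng.randrange(P) for _ in vectors]
    return [sum(c * v[j] for c, v in zip(coeffs, vectors)) % P
            for j in range(len(vectors[0]))]


def recover(old_nodes, n, k, r, rng):
    """Steps 2.2-2.5; returns (packets received per new node, new fragments)."""
    # 2.2: each old node sends one encoded packet to each new node.
    from_old = [[combine(node, rng) for node in old_nodes] for _ in range(r)]
    # 2.3: each new node sends one encoded packet to each other new node.
    received = [list(packets) for packets in from_old]
    for src in range(r):
        for dst in range(r):
            if src != dst:
                received[dst].append(combine(from_old[src], rng))
    # 2.4-2.5: each new node re-encodes its n - 1 packets into n - k stored packets.
    new_nodes = [[combine(packets, rng) for _ in range(n - k)] for packets in received]
    return received, new_nodes


if __name__ == "__main__":
    rng = random.Random(1)
    n, k, r = 5, 2, 2
    nodes = distribute(n, k, rng)
    old = nodes[r:]                       # the first r nodes are treated as failed
    received, new = recover(old, n, k, r, rng)
    print(len(received[0]), len(new[0]), is_mds(old + new, k))
```

Each new node receives n − r packets from old nodes and r − 1 from other new nodes, n − 1 in total, so the traffic is r(n − 1) packets of size β.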


Page 49: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Coding scheme in MCR

• (n, k) Strong-MDS code:
– An original file is divided into k(n − k) packets and encoded into n(n − k) packets.
– The n(n − k) encoded packets are stored at n nodes,
– each node storing n − k encoded packets.

• From each node, a number of encoded packets between 0 and n − k can be selected such that the original file can be reconstructed from the k(n − k) selected packets in total.

Page 50: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Coding scheme in MCR

• The coding scheme in MCR via linear coding is based on the following Lemma 4.

• Lemma 4 (Schwartz-Zippel Theorem) [11]
• Theorem 2 shows that if a file in a distributed storage system is Strong-MDS encoded, it will still satisfy the Strong-MDS property after multi-loss recovery in MCR.

[11] R. Motwani and P. Raghavan, Randomized Algorithms. Cambridge University Press, 1995.
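
For reference, the standard statement of the Schwartz-Zippel lemma is given below, in notation chosen here (P, d, S and m are not taken from the slides). P is a non-zero polynomial of total degree d in m variables over a field F, S is a finite subset of F, and r_1, …, r_m are drawn independently and uniformly from S.

```latex
\Pr_{r_1,\dots,r_m \in S}\bigl[\,P(r_1,\dots,r_m) = 0\,\bigr] \;\le\; \frac{d}{|S|}
```

Taking S = F shows why enlarging the field drives the probability of hitting a zero of the polynomial toward 0.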

Page 51: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Coding scheme in MCR

• Theorem 2:
– Let an (n, k) Strong-MDS encoded file be distributed at n nodes.
– Suppose r of these nodes fail.
– The lost data are recovered at r new nodes.
– The file stored at the surviving old nodes and the new nodes will still satisfy the (n, k) Strong-MDS property
– with a probability that can be driven arbitrarily close to 1 by increasing the size of the field F.

Page 52: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Coding scheme in MCR

• To prove that the file in the storage system still satisfies the (n, k) Strong-MDS property after the recovery from multiple losses:
– it is equivalent to showing that

• is a matrix that corresponds to old or new node mapping.

Page 53: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Coding scheme in MCR

• Lemma 5 shows that there is an assignment such that the left-hand side of (10) is a non-zero polynomial.
– is equivalent to whether and are transmissive

– is a row vector that corresponds to old or new node mapping.

Page 54: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Coding scheme in MCR

• By Lemma 4 and Lemma 5, Theorem 2 holds.
– A file in a distributed storage system with the MCR transmission scheme will keep the (n, k) strong-MDS property with a probability that can be driven arbitrarily close to 1 by increasing the size of the field F.

– The original file can be reconstructed after multiple losses with a probability that can be driven arbitrarily close to 1 by increasing the size of the field F.


Page 56: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Conclusion

• The paper gives a tight lower bound on the maintenance bandwidth.

• It designs an MCR transmission scheme matching the tight lower bound.

• It designs an MCR coding scheme, presents a strong-MDS code, and shows the existence of a random linear strong-MDS code over a sufficiently large finite field.

Page 57: Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding

Conclusion

• The decoding complexity of our random linear coding scheme is high.

• Future work is to design a deterministic coding algorithm to reduce the decoding complexity.