On Coding for Distributed Networked Storage...

57
On Coding for Distributed Networked Storage Systems Fr´ ed´ erique Oggier, Joint work with Anwitaman Datta Nanyang Technological University, Singapore IMS-NTU Workshop on Coding and Cryptography, Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 1 / 34

Transcript of On Coding for Distributed Networked Storage...

Page 1: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

On Coding for Distributed Networked Storage Systems

Frederique Oggier,Joint work with Anwitaman Datta

Nanyang Technological University, Singapore

IMS-NTU Workshop on Coding and Cryptography,Singapore, May 2011

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 1 / 34

Page 2: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Outline

1 Coding for Distributed Networked Storage

2 Self-Repairing Codes: Definition and Constructions

3 Self-Repairing Codes: Analysis and Properties

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 2 / 34

Page 3: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Coding for Distributed Networked Storage

Distributed Networked Storage

A data owner wants to store data over a network of nodes (e.g. datacenter, back-up or archival in peer-to-peer networks).

Redundancy is essential for resilience

Replication: good availability and durability, but very costly.Erasure codes: good trade-off of availability, durability and storagecost.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 3 / 34

Page 4: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Coding for Distributed Networked Storage

Distributed Networked Storage

A data owner wants to store data over a network of nodes (e.g. datacenter, back-up or archival in peer-to-peer networks).

Redundancy is essential for resilience

Replication: good availability and durability, but very costly.Erasure codes: good trade-off of availability, durability and storagecost.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 3 / 34

Page 5: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Coding for Distributed Networked Storage

Distributed Networked Storage

A data owner wants to store data over a network of nodes (e.g. datacenter, back-up or archival in peer-to-peer networks).

Redundancy is essential for resilience

Replication: good availability and durability, but very costly.

Erasure codes: good trade-off of availability, durability and storagecost.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 3 / 34

Page 6: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Coding for Distributed Networked Storage

Distributed Networked Storage

A data owner wants to store data over a network of nodes (e.g. datacenter, back-up or archival in peer-to-peer networks).

Redundancy is essential for resilience

Replication: good availability and durability, but very costly.Erasure codes: good trade-off of availability, durability and storagecost.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 3 / 34

Page 7: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Coding for Distributed Networked Storage

Erasure codes for storage systems

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 4 / 34

Page 8: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Coding for Distributed Networked Storage

Repair

Nodes may go offline, or may fail, so that the data they storebecomes unavailable.

Redundancy needs to be replenished, else data may be permanentlylost over time (after multiple storage node failures)

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 5 / 34

Page 9: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Coding for Distributed Networked Storage

Repair

Nodes may go offline, or may fail, so that the data they storebecomes unavailable.

Redundancy needs to be replenished, else data may be permanentlylost over time (after multiple storage node failures)

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 5 / 34

Page 10: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Coding for Distributed Networked Storage

Repair process using traditional Erasure Codes

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 6 / 34

Page 11: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Coding for Distributed Networked Storage

Related work

1 J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D.Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C.Wells, and B. Zhao. OceanStore: An Architecture for Global-ScalePersistent Storage, ASPLOS 2000.

2 H. Weatherspoon, J. Kubiatowicz. Erasure Coding Vs. Replication: AQuantitative Comparison, Peer-to-Peer Systems, LNCS, 2002.

3 A. G. Dimakis, P. Brighten Godfrey, M. J. Wainwright, K.Ramchandran, The Benefits of Network Coding for Peer-to-PeerStorage Systems, Netcod 2007.

4 A. Duminuco, E. Biersack, Hierarchical Codes: How to Make ErasureCodes Attractive for Peer-to-Peer Storage Systems, Peer-to-PeerComputing (P2P), 2008.

5 K. V. Rashmi, N. B. Shah, P. V. Kumar and K. Ramchandran, ExplicitConstruction of Optimal Exact Regenerating Codes for DistributedStorage, Allerton Conf. on Control, Computing and Comm., 2009.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 7 / 34

Page 12: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Coding for Distributed Networked Storage

Related work

1 J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D.Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C.Wells, and B. Zhao. OceanStore: An Architecture for Global-ScalePersistent Storage, ASPLOS 2000.

2 H. Weatherspoon, J. Kubiatowicz. Erasure Coding Vs. Replication: AQuantitative Comparison, Peer-to-Peer Systems, LNCS, 2002.

3 A. G. Dimakis, P. Brighten Godfrey, M. J. Wainwright, K.Ramchandran, The Benefits of Network Coding for Peer-to-PeerStorage Systems, Netcod 2007.

4 A. Duminuco, E. Biersack, Hierarchical Codes: How to Make ErasureCodes Attractive for Peer-to-Peer Storage Systems, Peer-to-PeerComputing (P2P), 2008.

5 K. V. Rashmi, N. B. Shah, P. V. Kumar and K. Ramchandran, ExplicitConstruction of Optimal Exact Regenerating Codes for DistributedStorage, Allerton Conf. on Control, Computing and Comm., 2009.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 7 / 34

Page 13: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

Outline

1 Coding for Distributed Networked Storage

2 Self-Repairing Codes: Definition and Constructions

3 Self-Repairing Codes: Analysis and Properties

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 8 / 34

Page 14: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

Self-Repairing Codes (SRC)

Motivation: minimize the number of nodes necessary to repair amissing block.

Gain: lower bandwidth consumption, lower computational complexityof repair, possibility for faster and parallel replenishment of lostredundancy.

Self-repairing codes are (n, k) codes such that

encoded fragments can be repaired directly from other subsets ofencoded fragments.a fragment can be repaired from a fixed number of encoded fragments,independently of which specific blocks are missing(analogous to erasure codes supporting reconstruction using any n − klosses, independently of which).

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 9 / 34

Page 15: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

Self-Repairing Codes (SRC)

Motivation: minimize the number of nodes necessary to repair amissing block.

Gain: lower bandwidth consumption, lower computational complexityof repair, possibility for faster and parallel replenishment of lostredundancy.

Self-repairing codes are (n, k) codes such that

encoded fragments can be repaired directly from other subsets ofencoded fragments.a fragment can be repaired from a fixed number of encoded fragments,independently of which specific blocks are missing(analogous to erasure codes supporting reconstruction using any n − klosses, independently of which).

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 9 / 34

Page 16: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

Self-Repairing Codes (SRC)

Motivation: minimize the number of nodes necessary to repair amissing block.

Gain: lower bandwidth consumption, lower computational complexityof repair, possibility for faster and parallel replenishment of lostredundancy.

Self-repairing codes are (n, k) codes such that

encoded fragments can be repaired directly from other subsets ofencoded fragments.a fragment can be repaired from a fixed number of encoded fragments,independently of which specific blocks are missing(analogous to erasure codes supporting reconstruction using any n − klosses, independently of which).

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 9 / 34

Page 17: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

Self-Repairing Codes (SRC)

Motivation: minimize the number of nodes necessary to repair amissing block.

Gain: lower bandwidth consumption, lower computational complexityof repair, possibility for faster and parallel replenishment of lostredundancy.

Self-repairing codes are (n, k) codes such that

encoded fragments can be repaired directly from other subsets ofencoded fragments.

a fragment can be repaired from a fixed number of encoded fragments,independently of which specific blocks are missing(analogous to erasure codes supporting reconstruction using any n − klosses, independently of which).

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 9 / 34

Page 18: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

Self-Repairing Codes (SRC)

Motivation: minimize the number of nodes necessary to repair amissing block.

Gain: lower bandwidth consumption, lower computational complexityof repair, possibility for faster and parallel replenishment of lostredundancy.

Self-repairing codes are (n, k) codes such that

encoded fragments can be repaired directly from other subsets ofencoded fragments.a fragment can be repaired from a fixed number of encoded fragments,independently of which specific blocks are missing(analogous to erasure codes supporting reconstruction using any n − klosses, independently of which).

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 9 / 34

Page 19: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

Self-Repairing Codes (a black-box view)

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 10 / 34

Page 20: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

Homomorphic SRC (HSRC)

A first instance of self-repairing code.

Self-repairing Homomorphic Codes for Distributed Storage SystemsF. Oggier, A. Datta, INFOCOM 2011

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 11 / 34

Page 21: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

Preliminaries: Weakly linearized polynomials

A weakly linearized polynomial p(X ) over Fq, q = 2m, has the form

p(X ) =k−1∑i=0

piX2i , pi ∈ Fq.

Let a, b ∈ F2m and let p(X ) be a weakly linearized polynomial given

by p(X ) =∑k−1

i=0 piX2i . We have

p(a + b) = p(a) + p(b).

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 12 / 34

Page 22: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

HSRC: Encoding

1 Take an object o of length M:

o = (o1, . . . , ok), oi ∈ F2M/k .

2 Take a linearized polynomial with coefficients in F2M/k

p(X ) =k−1∑i=0

piX2i ,

and encode the k fragments pi = oi+1, i = 0, . . . , k − 1.

3 Evaluate p(X ) in n non-zero values α1, . . . , αn of F2M/k to get an-dimensional codeword

(p(α1), . . . , p(αn)),

and each p(αi ) is given to node i for storage.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 13 / 34

Page 23: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

HSRC: Encoding Illustration

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 14 / 34

Page 24: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

HSRC: Decoding and Repair

1 Decoding is ensured by Lagrange interpolation.

2 Repair: Express αi in a F2-basis B = {b1, . . . , bM/k} of F2M/k , then

αi =

M/k∑j=1

αijbj , αij ∈ F2 ⇒ p(αi ) =

M/k∑j=1

αijp(bj).

3 Computational cost of a repair: XORs.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 15 / 34

Page 25: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

HSRC: Decoding and Repair

1 Decoding is ensured by Lagrange interpolation.

2 Repair: Express αi in a F2-basis B = {b1, . . . , bM/k} of F2M/k , then

αi =

M/k∑j=1

αijbj , αij ∈ F2 ⇒ p(αi ) =

M/k∑j=1

αijp(bj).

3 Computational cost of a repair: XORs.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 15 / 34

Page 26: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

HSRC: Decoding and Repair

1 Decoding is ensured by Lagrange interpolation.

2 Repair: Express αi in a F2-basis B = {b1, . . . , bM/k} of F2M/k , then

αi =

M/k∑j=1

αijbj , αij ∈ F2 ⇒ p(αi ) =

M/k∑j=1

αijp(bj).

3 Computational cost of a repair: XORs.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 15 / 34

Page 27: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

HSRC: A toy example (I)

A data file o = (o1, . . . , o12) of M = 12 bits is cut into k = 3fragments (M/k = 4)

o1 = (o1, . . . , o4), o2 = (o5, . . . , o8), o3 = (o9, . . . , o12) ∈ F24 .

F∗24 = 〈w〉, with w4 = w + 1. Encode with the polynomial

p(X ) =4∑

i=1

oiwiX +

4∑i=1

oi+4wiX 2 +

4∑i=1

oi+8wiX 4.

For n = 7, evaluate p(X ) at say 1,w ,w2,w4,w5,w8,w10. We get:

(p(1), p(w), p(w2), p(w4), p(w5), p(w8), p(w10))

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 16 / 34

Page 28: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

HSRC: A toy example (I)

A data file o = (o1, . . . , o12) of M = 12 bits is cut into k = 3fragments (M/k = 4)

o1 = (o1, . . . , o4), o2 = (o5, . . . , o8), o3 = (o9, . . . , o12) ∈ F24 .

F∗24 = 〈w〉, with w4 = w + 1. Encode with the polynomial

p(X ) =4∑

i=1

oiwiX +

4∑i=1

oi+4wiX 2 +

4∑i=1

oi+8wiX 4.

For n = 7, evaluate p(X ) at say 1,w ,w2,w4,w5,w8,w10. We get:

(p(1), p(w), p(w2), p(w4), p(w5), p(w8), p(w10))

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 16 / 34

Page 29: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

HSRC: A toy example (I)

A data file o = (o1, . . . , o12) of M = 12 bits is cut into k = 3fragments (M/k = 4)

o1 = (o1, . . . , o4), o2 = (o5, . . . , o8), o3 = (o9, . . . , o12) ∈ F24 .

F∗24 = 〈w〉, with w4 = w + 1. Encode with the polynomial

p(X ) =4∑

i=1

oiwiX +

4∑i=1

oi+4wiX 2 +

4∑i=1

oi+8wiX 4.

For n = 7, evaluate p(X ) at say 1,w ,w2,w4,w5,w8,w10. We get:

(p(1), p(w), p(w2), p(w4), p(w5), p(w8), p(w10))

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 16 / 34

Page 30: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

HSRC: A toy example (II)

missing pairs to reconstruct missing fragment(s)fragment(s)

p(1) (p(w), p(w4));(p(w2), p(w8));(p(w5), p(w10))p(w) (p(1), p(w4));(p(w2), p(w5));(p(w8), p(w10))p(w2) (p(1), p(w8));(p(w), p(w5));(p(w4), p(w10))

p(1) and (p(w2), p(w8)) or (p(w5), p(w10)) for p(1)p(w) (p(w8), p(w10)) or (p(w2), p(w5)) for p(w)

p(1) and (p(w5), p(w10)) for p(1)p(w) and (p(w8), p(w10)) for p(w)p(w2) (p(w4), p(w10)) for p(w2)

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 17 / 34

Page 31: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

Self-Repairing Codes from Projective Geometry (PSRC)

A second instance of self-repairing code.

Self-Repairing Codes for Distributed Storage - A Projective GeometricConstruction, F. Oggier, A. Datta, preprint 2011

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 18 / 34

Page 32: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

Preliminaries: Spreads

Consider a vector space of dimension m over Fq, namely, a projectivespace PG (m − 1, q).

Let P be a projective space. A t-spread of P is a set S oft-dimensional subspaces of P which partitions P.

Theorem (Andre)

In PG(m − 1, q), a t-spread exists if and only if t + 1| m.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 19 / 34

Page 33: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

Preliminaries: Spreads

Consider a vector space of dimension m over Fq, namely, a projectivespace PG (m − 1, q).

Let P be a projective space. A t-spread of P is a set S oft-dimensional subspaces of P which partitions P.

Theorem (Andre)

In PG(m − 1, q), a t-spread exists if and only if t + 1| m.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 19 / 34

Page 34: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

Spreads from Field Extensions

Suppose that t + 1| m. Consider the finite fields F0 = Fq, F1 = Fqt+1

and F2 = Fqm .

Then F0 ⊆ F1 ⊆ F2. The field F2 is an m-dimensional vector space Vover F0.

The subspaces of V form the projective space P=PG(m, q). The fieldF1 is a (t + 1)-dimensional subspace of V and hence a t-dimensional(projective) subspace of P.

The same holds for all cosets aF1, (a ∈ F2). These cosets partitionthe multiplicative group of F2. Hence they form a t-spread of P.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 20 / 34

Page 35: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

Spreads from Field Extensions

Suppose that t + 1| m. Consider the finite fields F0 = Fq, F1 = Fqt+1

and F2 = Fqm .

Then F0 ⊆ F1 ⊆ F2. The field F2 is an m-dimensional vector space Vover F0.

The subspaces of V form the projective space P=PG(m, q). The fieldF1 is a (t + 1)-dimensional subspace of V and hence a t-dimensional(projective) subspace of P.

The same holds for all cosets aF1, (a ∈ F2). These cosets partitionthe multiplicative group of F2. Hence they form a t-spread of P.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 20 / 34

Page 36: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

Spreads from Field Extensions

Suppose that t + 1| m. Consider the finite fields F0 = Fq, F1 = Fqt+1

and F2 = Fqm .

Then F0 ⊆ F1 ⊆ F2. The field F2 is an m-dimensional vector space Vover F0.

The subspaces of V form the projective space P=PG(m, q). The fieldF1 is a (t + 1)-dimensional subspace of V and hence a t-dimensional(projective) subspace of P.

The same holds for all cosets aF1, (a ∈ F2). These cosets partitionthe multiplicative group of F2. Hence they form a t-spread of P.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 20 / 34

Page 37: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

Spreads from Field Extensions

Suppose that t + 1| m. Consider the finite fields F0 = Fq, F1 = Fqt+1

and F2 = Fqm .

Then F0 ⊆ F1 ⊆ F2. The field F2 is an m-dimensional vector space Vover F0.

The subspaces of V form the projective space P=PG(m, q). The fieldF1 is a (t + 1)-dimensional subspace of V and hence a t-dimensional(projective) subspace of P.

The same holds for all cosets aF1, (a ∈ F2). These cosets partitionthe multiplicative group of F2. Hence they form a t-spread of P.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 20 / 34

Page 38: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

PSRC: Encoding

1 For an object o of size M, consider the finite field F2M .

2 Consider a t-spread S formed of t-dimensional subspaces of P suchthat t + 1|M. Set α = t + 1. Assign to each node an F2-basiscontaining α vectors. The number of storage nodes is (at most)

n =2M − 1

2α − 1.

3 The ith node will actually store

{ovTiα+1, . . . , ov(i+1)α}

for a total storage of α.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 21 / 34

Page 39: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

PSRC: Decoding and Repair

1 Decoding is solving a system of linear equations.

2 Repair The lth node Nl stores ν lF∗2α , l = 1, . . . , n. Let us assume this

lth node fails, and a new comer Ni joins. Contact the jth node Nj

such that ν j = ν i + ν l . By combining the data stored at node Ni andNj , we get

ν iF∗2α

∐(ν i + ν l)F∗

which contains ν lF∗2α .

Lemma

For any choice of node Ni among the remaining n − 1 live nodes, thereexists at least one node Nj such that Nl can be repaired by downloadingthe data stored at nodes Ni and Nj .

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 22 / 34

Page 40: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

PSRC: Decoding and Repair

1 Decoding is solving a system of linear equations.

2 Repair The lth node Nl stores ν lF∗2α , l = 1, . . . , n. Let us assume this

lth node fails, and a new comer Ni joins. Contact the jth node Nj

such that ν j = ν i + ν l . By combining the data stored at node Ni andNj , we get

ν iF∗2α

∐(ν i + ν l)F∗

which contains ν lF∗2α .

Lemma

For any choice of node Ni among the remaining n − 1 live nodes, thereexists at least one node Nj such that Nl can be repaired by downloadingthe data stored at nodes Ni and Nj .

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 22 / 34

Page 41: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Definition and Constructions

PSRC: A toy example

node basis vectors data stored

N1 v1 = (1000), v2 = (0110) {o1, o2 + o3}N2 v3 = (0100), v4 = (0011) {o2, o3 + o4}N3 v5 = (0010), v6 = (1101) {o3, o1 + o2 + o4}N4 v7 = (0001), v8 = (1010) {o4, o1 + o3}N5 v9 = (1100), v10 = (0101) {o1 + o2, o2 + o4}

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 23 / 34

Page 42: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Analysis and Properties

Outline

1 Coding for Distributed Networked Storage

2 Self-Repairing Codes: Definition and Constructions

3 Self-Repairing Codes: Analysis and Properties

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 24 / 34

Page 43: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Analysis and Properties

Static resilience

There is at least one pair to repair a node, for up to (n − 1)/2simultaneous failures

Static resilience of a distributed storage system is the probability thatan object stored in the system stays available without any furthermaintenance, even when a fraction of nodes become unavailable.

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 25 / 34

Page 44: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Analysis and Properties

Static resilience: HSRC versus EC

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

pfrag

p obj

sim SRC(31,5)ana SRC(31,5)sim SRC(15,4)ana SRC(15,4)

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

pfrag

p obj

EC(63,5)SRC(63,5)EC(31,5)SRC(31,5)

Figure: Static resilience of self-repairing codes (SRC): Validation of analysis, andcomparison with erasure codes (EC)

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 26 / 34

Page 45: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Analysis and Properties

Static resilience: PSRC versus EC

0 10 20 30 40 50 60 70 80 90 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

pnode

p obj

PSRC(21,4)EC(21,4)

Figure: Static resilience of self-repairing codes (HSRC): Comparison with erasurecodes (EC)

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 27 / 34

Page 46: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Analysis and Properties

More on Resilience: HSRC versus EC

0 5 10 15 20 25 300

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

x

ρ x

HSRC(31,5)EC(31,5)

Figure: Static resilience of self-repairing codes (HSRC): Comparison with erasurecodes (EC)

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 28 / 34

Page 47: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Analysis and Properties

More on Resilience: PSRC versus EC

0 2 4 6 8 10 12 14 16 18 200

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

x

ρ x

PSRC(21,4)EC(21,4)

Figure: Static resilience of self-repairing codes (HSRC): Comparison with erasurecodes (EC)

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 29 / 34

Page 48: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Analysis and Properties

Fast & parallel repairs using HSRC: A toy example

Consider:

(15,4) code, nodes storing p(w i ) for i = 0, 1, 2, 3, 4, 5, 6 are missingNodes have upload/download bandwidth limit: one block per time unit

Possible pairs to repair each missing block:

fragment suitable pairs to reconstruct

p(1) (p(w 7), p(w 9));(p(w 11), p(w 12))p(w) (p(w 7), p(w 14));(p(w 8), p(w 10))p(w 2) (p(w 7), p(w 12));(p(w 9), p(w 11));(p(w 12), p(w 10))p(w 3) (p(w 8), p(w 13));(p(w 10), p(w 12))p(w 4) (p(w 9), p(w 14));(p(w 11), p(w 13))p(w 5) (p(w 7), p(w 13));(p(w 12), p(w 14))p(w 6) (p(w 7), p(w 10));(p(w 8), p(w 14))

A parallelized schedule:node p(w 0) p(w 1) p(w 2) p(w 3) p(w 4) p(w 5) p(w 6)

Time 1 p(w 7) p(w 8) p(w 9) p(w 13) p(w 11) p(w 12) p(w 10)Time 2 p(w 9) p(w 10) p(w 11) p(w 8) p(w 13) p(w 14) p(w 7)

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 30 / 34

Page 49: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Analysis and Properties

Fast & parallel repairs using HSRC: A toy example

Consider:

(15,4) code, nodes storing p(w i ) for i = 0, 1, 2, 3, 4, 5, 6 are missingNodes have upload/download bandwidth limit: one block per time unit

Possible pairs to repair each missing block:

fragment suitable pairs to reconstruct

p(1) (p(w 7), p(w 9));(p(w 11), p(w 12))p(w) (p(w 7), p(w 14));(p(w 8), p(w 10))p(w 2) (p(w 7), p(w 12));(p(w 9), p(w 11));(p(w 12), p(w 10))p(w 3) (p(w 8), p(w 13));(p(w 10), p(w 12))p(w 4) (p(w 9), p(w 14));(p(w 11), p(w 13))p(w 5) (p(w 7), p(w 13));(p(w 12), p(w 14))p(w 6) (p(w 7), p(w 10));(p(w 8), p(w 14))

A parallelized schedule:node p(w 0) p(w 1) p(w 2) p(w 3) p(w 4) p(w 5) p(w 6)

Time 1 p(w 7) p(w 8) p(w 9) p(w 13) p(w 11) p(w 12) p(w 10)Time 2 p(w 9) p(w 10) p(w 11) p(w 8) p(w 13) p(w 14) p(w 7)

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 30 / 34

Page 50: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Analysis and Properties

Fast & parallel repairs using HSRC: A toy example

Consider:

(15,4) code, nodes storing p(w i ) for i = 0, 1, 2, 3, 4, 5, 6 are missingNodes have upload/download bandwidth limit: one block per time unit

Possible pairs to repair each missing block:

fragment suitable pairs to reconstruct

p(1) (p(w 7), p(w 9));(p(w 11), p(w 12))p(w) (p(w 7), p(w 14));(p(w 8), p(w 10))p(w 2) (p(w 7), p(w 12));(p(w 9), p(w 11));(p(w 12), p(w 10))p(w 3) (p(w 8), p(w 13));(p(w 10), p(w 12))p(w 4) (p(w 9), p(w 14));(p(w 11), p(w 13))p(w 5) (p(w 7), p(w 13));(p(w 12), p(w 14))p(w 6) (p(w 7), p(w 10));(p(w 8), p(w 14))

A parallelized schedule:node p(w 0) p(w 1) p(w 2) p(w 3) p(w 4) p(w 5) p(w 6)

Time 1 p(w 7) p(w 8) p(w 9) p(w 13) p(w 11) p(w 12) p(w 10)Time 2 p(w 9) p(w 10) p(w 11) p(w 8) p(w 13) p(w 14) p(w 7)

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 30 / 34

Page 51: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Analysis and Properties

Systematic Object Retrieval using PSRC: A toy example

node basis vectors data stored

N1 v1 = (1000), v2 = (0110) {o1, o2 + o3}N2 v3 = (0100), v4 = (0011) {o2, o3 + o4}N3 v5 = (0010), v6 = (1101) {o3, o1 + o2 + o4}N4 v7 = (0001), v8 = (1010) {o4, o1 + o3}N5 v9 = (1100), v10 = (0101) {o1 + o2, o2 + o4}

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 31 / 34

Page 52: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Analysis and Properties

Future/ongoing work

Efficient decoding, other instances of SRC

Implementation & integration in a distributed storage system

Various systems/algorithmic issues: Topology optimized placement,repair scheduling

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 32 / 34

Page 53: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Analysis and Properties

Wrap Up

Design of codes for distributed networked storage

Self-Repairing Codes

New research topic in coding theory!

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 33 / 34

Page 54: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Analysis and Properties

Wrap Up

Design of codes for distributed networked storage

Self-Repairing Codes

New research topic in coding theory!

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 33 / 34

Page 55: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Analysis and Properties

Wrap Up

Design of codes for distributed networked storage

Self-Repairing Codes

New research topic in coding theory!

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 33 / 34

Page 56: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Analysis and Properties

Q&A

More information:http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage/

Contact: {frederique,anwitaman}@ntu.edu.sg

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 34 / 34

Page 57: On Coding for Distributed Networked Storage Systemssands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/talkS...Singapore, May 2011 F. Oggier (NTU) Coding for Storage IMS-NTU Workshop

Self-Repairing Codes: Analysis and Properties

Q&A

More information:http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage/

Contact: {frederique,anwitaman}@ntu.edu.sg

F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 34 / 34