Network Coding for Distributed Storage Systems(Group Meeting Talk)

Network Coding for Distributed Storage Systems*. Presented by Jayant Apte, ASPITRG, 7/9/13 & 7/11/13. *Dimakis, A.G.; Godfrey, P.B.; Wu, Y.; Wainwright, M.J.; Ramchandran, K., "Network Coding for Distributed Storage Systems," IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539-4551, Sept. 2010

description

Reviews the work of Koetter et al. and Dimakis et al. The former provides an algebraic framework for linear network coding. The latter reduces the so-called repair problem to a single-source multicast network coding problem and shows that there is a tradeoff between the amount of data stored in a distributed storage system and the amount of data transfer required to repair the system if a node (hard drive) fails.

Transcript of Network Coding for Distributed Storage Systems(Group Meeting Talk)

Page 1: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Network Coding for Distributed Storage Systems*

Presented by Jayant Apte, ASPITRG

7/9/13 & 7/11/13

*Dimakis, A.G.; Godfrey, P.B.; Wu, Y.; Wainwright, M.J.; Ramchandran, K., "Network Coding for Distributed Storage Systems," IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539-4551, Sept. 2010

Page 2: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Outline

● Part 1– Single Source Multi-cast Linear Network Coding

● Part 2– The repair problem

– Reduction of repair problem to single source multicast network

– Family of single source multi-cast networks arising from the reduction

– A lower bound on min-cuts (and hence, by max-flow min-cut, on max-flow and the coding capacity of the network)

– Minimization of storage bandwidth subject to this lower bound

Page 3: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Some background on single source multi-cast network coding

*Koetter, R.; Medard, M., "An algebraic approach to network coding," IEEE/ACM Transactions on Networking, vol. 11, no. 5, pp. 782-795, Oct. 2003

Page 4: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Some background on single source multi-cast network coding

*Koetter, R.; Medard, M., "An algebraic approach to network coding," IEEE/ACM Transactions on Networking, vol. 11, no. 5, pp. 782-795, Oct. 2003

Page 5: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Max-Flow-Min-Cut Theorem

Page 6: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Max-Flow-Min-Cut Theorem

Page 7: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Max-Flow-Min-Cut Theorem
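
The slide bodies for the theorem are not preserved in this transcript. As a minimal sketch of what the theorem asserts (the value of a maximum s-t flow equals the capacity of a minimum s-t cut), the following runs a standard Edmonds-Karp augmenting-path search on a small made-up graph. The graph, the node names and the `max_flow` helper are illustrative assumptions, not taken from the slides.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp: returns (flow value, source side of a minimum cut)."""
    # Residual capacities; every edge gets a reverse edge of capacity 0.
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # No augmenting path left: the reached nodes are the source side of a min cut.
            return flow, set(parent)
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push

# Hypothetical example graph: capacities on directed edges.
CAP = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
value, source_side = max_flow(CAP, "s", "t")
# Capacity of the cut separating source_side from the rest.
cut = sum(c for u, nbrs in CAP.items() for v, c in nbrs.items()
          if u in source_side and v not in source_side)
print(value, cut)
```

The two printed numbers agree, which is the content of the theorem for this instance.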

Page 8: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Some background on single source multi-cast network coding

*Koetter, R.; Medard, M., "An algebraic approach to network coding," IEEE/ACM Transactions on Networking, vol. 11, no. 5, pp. 782-795, Oct. 2003

Page 9: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Basic Network Model

Page 10: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Basic Network Model

Page 11: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Local coding coefficients

Page 12: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Global coding coefficients

Page 13: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Matrix formulation

Page 14: Network Coding for Distributed Storage Systems(Group Meeting Talk)

The transfer matrix
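
The slide body is not preserved here. In Koetter and Medard's framework [2], the end-to-end transfer matrix of an acyclic linear network code is M = A (I - F)^(-1) B^T, where A injects source symbols onto edges, F holds the local coding coefficients between adjacent edges, and B selects the edges a sink observes. The sketch below evaluates this over GF(2) for the classic butterfly network. The edge numbering, the all-ones local coefficients and the helper names are assumptions made for illustration.

```python
def matmul(X, Y):
    """Matrix product over GF(2)."""
    return [[sum(x * y for x, y in zip(row, col)) % 2 for col in zip(*Y)]
            for row in X]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def inv_I_minus_F(F):
    """(I - F)^-1 = I + F + F^2 + ...; finite because F is nilpotent (acyclic network)."""
    n = len(F)
    total, power = identity(n), identity(n)
    for _ in range(n):
        power = matmul(power, F)
        total = [[(a + b) % 2 for a, b in zip(r1, r2)]
                 for r1, r2 in zip(total, power)]
    return total

# Butterfly network, edges numbered 0..8 (hypothetical labelling):
# 0: s->t, 1: s->u, 2: t->y, 3: t->w, 4: u->w, 5: u->z, 6: w->x, 7: x->y, 8: x->z
N = 9
F = [[0] * N for _ in range(N)]
for i, j in [(0, 2), (0, 3), (1, 4), (1, 5), (3, 6), (4, 6), (6, 7), (6, 8)]:
    F[i][j] = 1  # the symbol on edge i feeds into edge j with local coefficient 1

A = [[0] * N for _ in range(2)]
A[0][0] = 1  # source symbol X1 is sent on edge 0
A[1][1] = 1  # source symbol X2 is sent on edge 1

def transfer(observed):
    """Transfer matrix for a sink that observes the given edges."""
    B = [[int(e == obs) for e in range(N)] for obs in observed]
    Bt = [list(col) for col in zip(*B)]
    return matmul(matmul(A, inv_I_minus_F(F)), Bt)

M_y = transfer([2, 7])  # sink y sees edges t->y and x->y
M_z = transfer([5, 8])  # sink z sees edges u->z and x->z
print(M_y, M_z)
```

Both 2x2 matrices are invertible over GF(2), so each sink can recover X1 and X2 from what it observes.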

Page 15: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Proof of Theorem 2

Page 16: Network Coding for Distributed Storage Systems(Group Meeting Talk)
Page 17: Network Coding for Distributed Storage Systems(Group Meeting Talk)
Page 18: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Proof of Theorem 3

Page 19: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Some background on single source multi-cast network coding

*Koetter, R.; Medard, M., "An algebraic approach to network coding," IEEE/ACM Transactions on Networking, vol. 11, no. 5, pp. 782-795, Oct. 2003

Page 20: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Extension to multicast

Page 21: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Part 2- Outline

● Introduction

● The repair problem

● Reduction of repair problem to single source multicast network

● Family of single source multi-cast networks arising from the reduction

● A lower bound on min-cuts (and hence, by max-flow min-cut, on max-flow and the coding capacity of the network)

● Minimization of storage bandwidth subject to this lower bound

Page 22: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Distributed storage

● We are living in an internet age

● Demand for large scale data storage has increased significantly

● Social networks, file and video sharing require seamless storage, access and security for massive amounts of data

● Storage mediums(viz. hard-drives) are individually unreliable

● Hence we introduce redundancy via the use of erasure codes to improve reliability

Page 23: Network Coding for Distributed Storage Systems(Group Meeting Talk)

A storage code((4,2) MDS)

[Figure: a file is split into two fragments, with blocks A1, A2, B1, B2, and encoded onto four disks]

Disk 1: A1, A2

Disk 2: B1, B2

Disk 3: A1+B1, A2+B2

Disk 4: A2+B1, A1+A2+B2
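
As a quick check of the MDS property of this (4,2) code, the sketch below represents each stored block as a bitmask over the four source blocks (A1, A2, B1, B2), reading + as XOR (an assumption: the slide does not name the field), and confirms that the blocks on any two disks span all four source blocks. The helper names are illustrative.

```python
from itertools import combinations

A1, A2, B1, B2 = 1, 2, 4, 8  # one bit per source block

# Blocks stored on each disk; XOR of bitmasks is addition over GF(2).
DISKS = {
    1: [A1, A2],
    2: [B1, B2],
    3: [A1 ^ B1, A2 ^ B2],
    4: [A2 ^ B1, A1 ^ A2 ^ B2],
}

def rank_gf2(vectors):
    """Rank over GF(2) of bitmask vectors, via an incrementally built XOR basis."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit in v if it is set
        if v:
            basis.append(v)
    return len(basis)

# Rank of the combined blocks for every pair of disks.
pair_ranks = {pair: rank_gf2(DISKS[pair[0]] + DISKS[pair[1]])
              for pair in combinations(DISKS, 2)}
print(pair_ranks)
```

Every pair reaches rank 4, so any two disks suffice to rebuild the file, while a single disk (rank 2) does not.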

Page 24: Network Coding for Distributed Storage Systems(Group Meeting Talk)

A storage code((4,2) MDS)


Page 25: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Part 2- Outline

● Introduction

● The repair problem

● Reduction of repair problem to single source multicast network

● Family of single source multi-cast networks arising from the reduction

● A lower bound on min-cuts (and hence, by max-flow min-cut, on max-flow and the coding capacity of the network)

● Minimization of storage bandwidth subject to this lower bound

Page 26: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Problem Definition

● Storage nodes are distributed and connected in a network

● Together they represent some storage code (MDS, or approximately MDS such as LDPC)

● The issue of repairing a node arises when a storage node of the system fails

● The still-functioning nodes are called active nodes

● A newcomer node, called the repair node, must connect to a subset of active nodes, obtain information from them and reconstruct the storage code, i.e. repair the code

● The objective is to minimize the amount of information transferred in this process

Page 27: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Notation

Page 28: Network Coding for Distributed Storage Systems(Group Meeting Talk)

The repair problem

[Figure: source fragments y1, y2; storage nodes x1, x2, x3, x4; newcomer x5]

Example: A (4,2) MDS code ( = repair bandwidth per node )

Page 29: Network Coding for Distributed Storage Systems(Group Meeting Talk)

The repair problem

● Data object (2 Mb) is divided into two fragments: y1, y2 (1 Mb each)

● 4 encoded fragments are generated: x1, x2, x3, x4 (1 Mb each)

● x4 fails; the newcomer x5 needs to communicate with the existing nodes and create a new encoded packet

● Any two out of x1, x2, x3, x5 must suffice to recover the original data object

Page 30: Network Coding for Distributed Storage Systems(Group Meeting Talk)

The repair problem

● What (and how much) should x1, x2, x3 communicate to x5 so that the amount of data transferred is minimized?

[Figure: source fragments y1, y2; storage nodes x1, x2, x3, x4; newcomer x5]

Example 1: A (4,2) MDS code

Page 31: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Variants of the repair problem

● Exact Repair: Failed blocks are exactly regenerated, i.e. the newcomer node must reconstruct an exact replica of the encoded block in the failed node

● Functional Repair: The newly generated data block need not be an exact replica of the encoded block on the failed node

● Exact repair of the systematic part: Only the systematic part is repaired exactly, so there is always an uncoded copy of the original file available

Page 32: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Variants of the repair problem


Page 33: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Functional repair example(Using RLNC)

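File fragments (each box is 0.5 Mb): a1, b1, a2, b2

Encoded data blocks

Node 1: a1, b1

Node 2: a2, b2

Node 3: a1+b1+a2+b2, a1+2b1+a2+2b2

Node 4 (the node being replaced): a1+2b1+3a2+b2, 3a1+2b1+2a2+3b2

Encoded repair packets (edge labels in the figure are the combining coefficients)

p1 = a1+2b1 (from node 1, coefficients 1, 2)

p2 = 2a2+b2 (from node 2, coefficients 2, 1)

p3 = 4a1+5b1+4a2+5b2 (from node 3, coefficients 3, 1)

Repair node

5a1+7b1+8a2+7b2 (= p1 + 2p2 + p3)

6a1+9b1+6a2+6b2 (= 2p1 + p2 + p3)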

Page 34: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Functional repair example(Using RLNC)

[Figure repeated from the previous slide, with a cut marked]

Flow across this cut is the repair bandwidth
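
The arithmetic in this example can be checked mechanically. The sketch below treats each block as a coefficient vector over (a1, b1, a2, b2) with ordinary rational arithmetic (an assumption: the slide does not state the field), rebuilds the three repair packets and the two new blocks from the coefficients shown in the figure, and verifies that any two of the four resulting nodes still determine the file. Node numbering and helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def rank(rows):
    """Rank of a small matrix by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def combine(coeffs, blocks):
    """Linear combination sum_i coeffs[i] * blocks[i] of coefficient vectors."""
    return [sum(c * b[j] for c, b in zip(coeffs, blocks)) for j in range(4)]

# Coefficient vectors over (a1, b1, a2, b2); each block is 0.5 Mb.
nodes = {
    1: [[1, 0, 0, 0], [0, 1, 0, 0]],
    2: [[0, 0, 1, 0], [0, 0, 0, 1]],
    3: [[1, 1, 1, 1], [1, 2, 1, 2]],
}
# Each surviving node sends one packet (0.5 Mb) to the repair node.
p1 = combine([1, 2], nodes[1])
p2 = combine([2, 1], nodes[2])
p3 = combine([3, 1], nodes[3])
# The repair node (called node 5 here) stores two combinations of the packets.
nodes[5] = [combine([1, 2, 1], [p1, p2, p3]),
            combine([2, 1, 1], [p1, p2, p3])]

repair_traffic_mb = 3 * 0.5
pair_ranks = [rank(nodes[i] + nodes[j]) for i, j in combinations(nodes, 2)]
print(p3, nodes[5], repair_traffic_mb, pair_ranks)
```

The repair moves 1.5 Mb in total, less than the 2 Mb file, and every pair of nodes still has rank 4.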

Page 35: Network Coding for Distributed Storage Systems(Group Meeting Talk)

An attempt at solution

[Figure: source fragments y1, y2; storage nodes x1, x2, x3, x4; newcomer x5]

Example 1: A (4,2) MDS code

Page 36: Network Coding for Distributed Storage Systems(Group Meeting Talk)

An attempt at solution

[Figure: source fragments y1, y2; storage nodes x1, x2, x3, x4; newcomer x5]

Example 1: A (4,2) MDS code

x5 recovers the original data object and creates a new independent linear combination
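
A minimal sketch of this baseline repair, assuming a hypothetical (4,2) MDS code over the prime field GF(257), with each 1 Mb fragment stood in for by a single field symbol (the encoding vectors below are illustrative, not the ones on the slide): the newcomer x5 downloads two whole fragments, decodes y1 and y2, and stores a new combination, so the repair traffic equals the full 2 Mb object.

```python
from itertools import combinations

P = 257  # hypothetical prime field size

# Hypothetical encoding vectors (c1, c2): fragment = c1*y1 + c2*y2 mod P.
ENC = {"x1": (1, 0), "x2": (0, 1), "x3": (1, 1), "x4": (1, 2)}

def encode(vec, y):
    return (vec[0] * y[0] + vec[1] * y[1]) % P

def decode(v1, s1, v2, s2):
    """Solve the 2x2 system for (y1, y2) by Cramer's rule modulo P."""
    det = (v1[0] * v2[1] - v1[1] * v2[0]) % P
    inv = pow(det, P - 2, P)  # modular inverse via Fermat's little theorem
    y1 = (s1 * v2[1] - s2 * v1[1]) * inv % P
    y2 = (v1[0] * s2 - v2[0] * s1) * inv % P
    return (y1, y2)

y = (42, 199)  # the two original fragments (arbitrary test values)
stored = {name: encode(vec, y) for name, vec in ENC.items()}

# x4 fails; x5 downloads x1 and x2 in full and decodes the object.
del stored["x4"]
downloaded = ["x1", "x2"]
recovered = decode(ENC["x1"], stored["x1"], ENC["x2"], stored["x2"])
repair_traffic_mb = len(downloaded) * 1  # 1 Mb per fragment

# x5 stores a new combination, independent of every surviving vector.
ENC["x5"] = (1, 3)
stored["x5"] = encode(ENC["x5"], recovered)

# Any two of x1, x2, x3, x5 must still recover the object.
ok = all(decode(ENC[a], stored[a], ENC[b], stored[b]) == y
         for a, b in combinations(stored, 2))
print(recovered, repair_traffic_mb, ok)
```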

Page 37: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Can we do better than this?

Page 38: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Can we do better than this?

YES!

Page 39: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Part 2- Outline

● Introduction

● The repair problem

● Reduction of repair problem to single source multicast network

● Family of single source multi-cast networks arising from the reduction

● A lower bound on min-cuts (and hence, by max-flow min-cut, on max-flow and the coding capacity of the network)

● Minimization of storage bandwidth subject to this lower bound

Page 40: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Reduction to information flow graph

Page 41: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Example

[Figure: information flow graph with source S, in/out vertex pairs x1in/x1out through x5in/x5out, and data collector DC]

Information flow graph corresponding to Example 1: A (4,2) MDS code

Node 4 has failed
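
The construction can be made concrete with a small sketch. It assumes the standard model from [1]: each storage node i becomes an edge from x_i-in to x_i-out of capacity α, the newcomer downloads β from each of the d = 3 surviving nodes, and a data collector DC reads from any k = 2 active nodes over infinite-capacity edges. Capacities are in units of 0.1 Mb so that everything stays integral; the function names and the specific α, β values are illustrative assumptions.

```python
from collections import deque
from itertools import combinations

INF = 10**9

def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity map."""
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push

def flow_graph(alpha, beta, collector):
    """Graph for Example 1: node 4 failed, newcomer 5 contacts nodes 1-3."""
    cap = {"S": {f"x{i}in": INF for i in range(1, 5)}}
    for i in range(1, 6):
        cap[f"x{i}in"] = {f"x{i}out": alpha}   # storage at node i
        cap[f"x{i}out"] = {}
    for i in (1, 2, 3):
        cap[f"x{i}out"]["x5in"] = beta          # repair download
    for i in collector:
        cap[f"x{i}out"]["DC"] = INF             # data collector link
    return cap

def worst_case_min_cut(alpha, beta):
    """Smallest S-DC max-flow over all collectors reading k = 2 active nodes."""
    active = (1, 2, 3, 5)
    return min(max_flow(flow_graph(alpha, beta, dc), "S", "DC")
               for dc in combinations(active, 2))

print(worst_case_min_cut(10, 5), worst_case_min_cut(10, 4))
```

With α = 1 Mb and β = 0.5 Mb the worst-case min-cut is 2 Mb, the size of the data object. With β = 0.4 Mb it drops to 1.8 Mb, so a collector that reads the newcomer and one old node could no longer receive the whole object.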

Page 42: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Dynamic nature of information flow graph due to given failure pattern

[Figure: information flow graph with source S, in/out vertex pairs x1in/x1out through x5in/x5out, and data collector DC]

Information flow graph corresponding to Example 1: A (4,2) MDS code

Node 4 has failed

Page 43: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Family of information flow graphs

[Figure: information flow graph with source S, in/out vertex pairs x1in/x1out through x6in/x6out, and data collector DC]

Information flow graph corresponding to Example 1: A (4,2) MDS code

Node 3 also failed, say a few minutes later

Page 44: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Lemma 1

Page 45: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Outline

● The repair problem

● Reduction of repair problem to single source multicast network

● Family of single source multi-cast networks arising from the reduction

● A lower bound on min-cuts (and hence, by max-flow min-cut, on max-flow and the coding capacity of the network)

● Minimization of storage bandwidth subject to this lower bound

Page 46: Network Coding for Distributed Storage Systems(Group Meeting Talk)
Page 47: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Information flow graph

S

Page 48: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Information flow graph

S

Page 49: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Information flow graph

S

Page 50: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Information flow graph

S

Page 51: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Information flow graph

S

Page 52: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Information flow graph

S

Page 53: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Proof

Page 54: Network Coding for Distributed Storage Systems(Group Meeting Talk)
Page 55: Network Coding for Distributed Storage Systems(Group Meeting Talk)
Page 56: Network Coding for Distributed Storage Systems(Group Meeting Talk)

WLOG

Page 57: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Outline

● The repair problem

● Reduction of repair problem to single source multicast network

● Family of single source multi-cast networks arising from the reduction

● A lower bound on min-cuts (and hence, by max-flow min-cut, on max-flow and the coding capacity of the network)

● Minimization of storage bandwidth subject to this lower bound

Page 58: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Minimize α subject to the lower bound
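
The slide body is missing from this transcript. In the notation of [1] (file size M, per-node storage α, total repair bandwidth γ = dβ drawn from d helper nodes, with d ≥ k), the optimization this title refers to is usually written as follows. It is quoted from the paper as a reading aid, not recovered from the slide.

```latex
\alpha^{*}(d,\gamma) \;=\; \min \alpha
\quad \text{subject to} \quad
\sum_{i=0}^{k-1} \min\!\left\{ \left(1 - \frac{i}{d}\right)\gamma,\; \alpha \right\} \;\ge\; M .
```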

Page 59: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Nature of constraint

Page 60: Network Coding for Distributed Storage Systems(Group Meeting Talk)

LHS of constraint as a function of α

Page 61: Network Coding for Distributed Storage Systems(Group Meeting Talk)

LHS of constraint as a function of α

Page 62: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Solution to the optimization

Page 63: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Simplification of solution

Page 64: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Simplification of solution

Page 65: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Solution

Page 66: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Minimum repair bandwidth

Page 67: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Storage-Bandwidth Tradeoff: Relationship between α and γ [1]
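
A minimal sketch of how points on this tradeoff curve can be computed, assuming the constraint from [1], sum over i = 0..k-1 of min{(1 - i/d)γ, α} ≥ M with d ≥ k: for a given γ it returns the smallest feasible α exactly, using the fact that the left-hand side is piecewise linear in α. The function name and the (k, d, M) = (2, 3, 2) values, chosen to match the (4,2) example, are illustrative assumptions.

```python
from fractions import Fraction

def min_storage(M, k, d, gamma):
    """Smallest alpha with sum_{i<k} min((1 - i/d)*gamma, alpha) >= M, or None if infeasible."""
    M, gamma = Fraction(M), Fraction(gamma)
    # Breakpoints of the piecewise-linear left-hand side, decreasing in i.
    b = [(1 - Fraction(i, d)) * gamma for i in range(k)]
    if sum(b) < M:
        return None  # even unlimited storage cannot reach M at this bandwidth
    for j in range(k):
        # Suppose the j smallest terms are capped at their breakpoint
        # and the remaining k - j terms all equal alpha.
        capped = sum(b[k - j:])
        alpha = (M - capped) / (k - j)
        if alpha <= b[k - j - 1]:
            return alpha
    return None

k, d, M = 2, 3, 2  # (4,2) example: 2 Mb object, newcomer contacts d = 3 nodes
min_storage_point = (min_storage(M, k, d, Fraction(3, 2)), Fraction(3, 2))
min_bandwidth_point = (min_storage(M, k, d, Fraction(6, 5)), Fraction(6, 5))
# Sample the curve for gamma = 1.0, 1.1, ..., 2.0 Mb.
curve = [(g, min_storage(M, k, d, g))
         for g in (Fraction(n, 10) for n in range(10, 21))]
print(min_storage_point, min_bandwidth_point)
```

For these parameters the two ends of the curve are (α, γ) = (1, 3/2) and (6/5, 6/5), which agree with the closed forms in [1] for the minimum-storage point, α = M/k and γ = Md/(k(d-k+1)), and the minimum-bandwidth point, α = γ = 2Md/(2kd - k² + k). Below γ = 6/5 no storage size is enough.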

Page 68: Network Coding for Distributed Storage Systems(Group Meeting Talk)

References

● [1] Dimakis, A.G.; Godfrey, P.B.; Wu, Y.; Wainwright, M.J.; Ramchandran, K., "Network Coding for Distributed Storage Systems," IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539-4551, Sept. 2010

● [2] Koetter, R.; Medard, M., "An algebraic approach to network coding," IEEE/ACM Transactions on Networking, vol. 11, no. 5, pp. 782-795, Oct. 2003

● [3] Ho, T.; Lun, D., Network Coding: An Introduction, Cambridge University Press, New York, NY, USA, 2008

● [4] Dimakis, A.G.; Ramchandran, K.; Wu, Y.; Suh, C., "A Survey on Network Codes for Distributed Storage," Proceedings of the IEEE, vol. 99, no. 3, pp. 476-489, March 2011