Network Coding for Distributed Storage Systems (Group Meeting Talk)

Post on 27-Jun-2015


Reviews the work of Koetter et al. and Dimakis et al. The former provides an algebraic framework for linear network coding. The latter reduces the so-called repair problem to a single-source multicast network coding problem and shows that there is a tradeoff between the amount of data stored in a distributed storage system and the amount of data transfer required to repair the system if a node (hard drive) fails.

Network Coding for Distributed Storage Systems*

Presented by Jayant Apte (ASPITRG)

7/9/13 & 7/11/13

*Dimakis, A.G.; Godfrey, P.B.; Wu, Y.; Wainwright, M.J.; Ramchandran, K., "Network Coding for Distributed Storage Systems," Information Theory, IEEE Transactions on, vol. 56, no. 9, pp. 4539-4551, Sept. 2010

Outline

● Part 1– Single Source Multi-cast Linear Network Coding

● Part 2– The repair problem

– Reduction of repair problem to single source multicast network

– Family of single source multi-cast networks arising from the reduction

– A lower bound on min-cuts (and hence, by max-flow min-cut, on the max-flow and coding capacity of the network)

– Minimization of storage bandwidth subject to this lower bound

Some background on single source multi-cast network coding

*Koetter, R.; Medard, M., "An algebraic approach to network coding," Networking, IEEE/ACM Transactions on, vol. 11, no. 5, pp. 782-795, Oct. 2003


Max-Flow-Min-Cut Theorem


Basic Network Model

Local coding coefficients

Global coding coefficients

Matrix formulation

The transfer matrix

Proof of Theorem 2

Proof of Theorem 3


Extension to multicast

Part 2- Outline

● Introduction

● The repair problem

● Reduction of repair problem to single source multicast network

● Family of single source multi-cast networks arising from the reduction

● A lower bound on min-cuts (and hence, by max-flow min-cut, on the max-flow and coding capacity of the network)

● Minimization of storage bandwidth subject to this lower bound

Distributed storage

● We are living in an internet age

● Demand for large scale data storage has increased significantly

● Social networks, file and video sharing require seamless storage, access and security for massive amounts of data

● Storage media (viz. hard drives) are individually unreliable

● Hence we introduce redundancy via the use of erasure codes to improve reliability

A storage code ((4,2) MDS)

[Figure: a file is split into two fragments, with sub-blocks A1, A2 and B1, B2, which are encoded onto four disks]

Disk 1: A1, A2

Disk 2: B1, B2

Disk 3: A1+B1, A2+B2

Disk 4: A2+B1, A1+A2+B2
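The MDS property of the (4,2) code on this slide (any two disks suffice to rebuild the file) can be checked mechanically. The following is a minimal sketch, assuming that "+" means bitwise XOR, i.e. addition over GF(2); the slide does not state the field. The bitmask encoding and the helper name `gf2_rank` are illustrative choices, not part of the talk.

```python
from itertools import combinations

# Each stored block is a bitmask over the basis (A1, A2, B1, B2).
# Assumption: "+" on the slide is XOR, i.e. addition in GF(2).
A1, A2, B1, B2 = 0b1000, 0b0100, 0b0010, 0b0001

disks = {
    "Disk 1": [A1, A2],
    "Disk 2": [B1, B2],
    "Disk 3": [A1 ^ B1, A2 ^ B2],
    "Disk 4": [A2 ^ B1, A1 ^ A2 ^ B2],
}

def gf2_rank(vectors):
    """Rank over GF(2) of a list of bitmask vectors (Gaussian elimination)."""
    rows = list(vectors)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low_bit = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low_bit else r for r in rows]
    return rank

# The file has 4 sub-blocks, so a pair of disks recovers it iff its
# 4 stored blocks have rank 4.
pair_ranks = {
    (d1, d2): gf2_rank(disks[d1] + disks[d2])
    for d1, d2 in combinations(disks, 2)
}
is_mds = all(rank == 4 for rank in pair_ranks.values())
print(is_mds)
```

Under the XOR assumption all six disk pairs have full rank, so losing any two disks still leaves the file recoverable.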


Problem Definition

● Storage nodes are distributed and connected in a network

● Together they represent some storage code (MDS, or approximately MDS like LDPC)

● The issue of repairing a node arises when a storage node of the system fails

● The still-functioning nodes are called active nodes

● A newcomer node, called the repair node, must connect to a subset of active nodes, obtain information from them and reconstruct the storage code, i.e. repair the code

● The objective is to minimize the amount of information transferred in this process

Notation

The repair problem

[Figure: data fragments y1, y2; storage nodes x1, x2, x3, x4; newcomer x5]

Example: A (4,2) MDS code ( = repair bandwidth per node )

The repair problem

● Data object (2 Mb) is divided into two fragments: y1, y2 (1 Mb each)

● 4 encoded fragments are generated: x1, x2, x3, x4 (1 Mb each)

● x4 fails; x5, the newcomer, needs to communicate with existing nodes and create a new encoded packet

● Any two out of x1, x2, x3, x5 must suffice to recover the original data object

The repair problem

● What (and how much) should x1, x2, x3 communicate to x5 so that the amount of information transferred is minimized?


Example 1: A (4,2) MDS code

Variants of the repair problem

● Exact Repair: Failed blocks are exactly regenerated i.e. newcomer node must reconstruct exact replica of encoded block in the failed node

● Functional Repair: Newly generated data block need not be exact replica of encoded block on the failed node

● Exact repair of the systematic part: Only the systematic part is repaired exactly, so that an uncoded copy of the original file is always available


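Functional repair example (using RLNC)

[Figure: functional repair of a failed node using random linear network coding; each box is 0.5 Mb]

File fragments: a1, b1, a2, b2

Encoded data blocks:

Node 1: a1, b1

Node 2: a2, b2

Node 3: a1+b1+a2+b2, a1+2b1+a2+2b2

Node 4 (failed): a1+2b1+3a2+b2, 3a1+2b1+2a2+3b2

Encoded repair packets (one per surviving node):

p1 = a1+2b1

p2 = 2a2+b2

p3 = 3(a1+b1+a2+b2) + (a1+2b1+a2+2b2) = 4a1+5b1+4a2+5b2

Repair node:

p1+2p2+p3 = 5a1+7b1+8a2+7b2

2p1+p2+p3 = 6a1+9b1+6a2+6b2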
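The arithmetic of this functional repair example can be replayed in a few lines. The sketch below is illustrative only: it treats every block as a coefficient vector over the basis (a1, b1, a2, b2), and it assumes exact rational arithmetic because the slide does not name the field. The node numbering and the helper names `combine` and `rank` are hypothetical. It checks that the two blocks built by the repair node, together with any other active node, still determine the file.

```python
from fractions import Fraction
from itertools import combinations

# Coefficient vectors over the basis (a1, b1, a2, b2).
# Assumption: arithmetic over the rationals; the slide does not state the field.
nodes = {
    1: [(1, 0, 0, 0), (0, 1, 0, 0)],
    2: [(0, 0, 1, 0), (0, 0, 0, 1)],
    3: [(1, 1, 1, 1), (1, 2, 1, 2)],
    4: [(1, 2, 3, 1), (3, 2, 2, 3)],  # the node that fails
}

def combine(coeffs, vectors):
    """Linear combination sum_i coeffs[i] * vectors[i]."""
    return tuple(sum(c * v[j] for c, v in zip(coeffs, vectors)) for j in range(4))

def rank(vectors):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(4):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# One 0.5 Mb repair packet per surviving node.
p1 = combine((1, 2), nodes[1])
p2 = combine((2, 1), nodes[2])
p3 = combine((3, 1), nodes[3])

# The repair node mixes the three packets into two new stored blocks.
packets = (p1, p2, p3)
new_blocks = [combine((1, 2, 1), packets), combine((2, 1, 1), packets)]

# After repair, any two active nodes must still span all four fragments.
active = {1: nodes[1], 2: nodes[2], 3: nodes[3], 5: new_blocks}
any_two_suffice = all(
    rank(active[i] + active[j]) == 4 for i, j in combinations(active, 2)
)
repair_bandwidth_mb = 3 * Fraction(1, 2)  # three packets of 0.5 Mb each
print(new_blocks, any_two_suffice, repair_bandwidth_mb)
```

The new blocks differ from the ones lost with the failed node, which is what makes the repair functional rather than exact, and the repair moves 1.5 Mb instead of the full 2 Mb file.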


The flow across this cut is the repair bandwidth

An attempt at solution

[Figure: data fragments y1, y2; storage nodes x1, x2, x3, x4; newcomer x5]

Example 1: A (4,2) MDS code

x5 recovers the original data object and creates a new independent linear combination

Can we do better than this?

YES!


Reduction to information flow graph

Example

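[Figure: information flow graph with source S, storage nodes x1 to x5, each split into an "in" and an "out" vertex (x1in/x1out, ..., x5in/x5out), and data collector DC]

Information flow graph corresponding to Example 1: A (4,2) MDS code

Node 4 has failed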
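The information flow graph for Example 1 can be built and probed with a small max-flow routine. This is a sketch under stated assumptions: the edge capacities follow the construction in Dimakis et al. as commonly stated (infinite capacity from S to each "in" vertex and from each "out" vertex to DC, capacity alpha on each in-to-out edge, capacity beta on each edge from a helper to the newcomer), and the names `alpha`, `beta`, `flow_graph` and `max_flow` are chosen here for illustration. Units are Mb, with a 2 Mb file.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

INF = Fraction(10**9)  # stands in for infinite capacity

def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow on a dict {(u, v): capacity}."""
    residual = dict(capacity)
    for (u, v) in capacity:
        residual.setdefault((v, u), Fraction(0))
    neighbours = {}
    for (u, v) in residual:
        neighbours.setdefault(u, []).append(v)
    flow = Fraction(0)
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbours.get(u, []):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        flow += bottleneck

def flow_graph(alpha, beta, collector_nodes):
    """Graph for Example 1: node 4 failed, newcomer x5 contacts x1, x2, x3."""
    cap = {}
    for i in (1, 2, 3, 4):
        cap[("S", f"x{i}in")] = INF
        cap[(f"x{i}in", f"x{i}out")] = alpha
    for i in (1, 2, 3):
        cap[(f"x{i}out", "x5in")] = beta
    cap[("x5in", "x5out")] = alpha
    for i in collector_nodes:
        cap[(f"x{i}out", "DC")] = INF
    return cap

def worst_case_flow(alpha, beta):
    """Smallest S-to-DC max-flow over all data collectors that contact k = 2 active nodes."""
    active = (1, 2, 3, 5)
    return min(
        max_flow(flow_graph(alpha, beta, pair), "S", "DC")
        for pair in combinations(active, 2)
    )

print(worst_case_flow(Fraction(1), Fraction(1, 2)))  # every collector can pull the 2 Mb file
print(worst_case_flow(Fraction(1), Fraction(2, 5)))  # some collector falls short of 2 Mb
```

With alpha = 1 Mb, a per-helper download of 0.5 Mb keeps every min-cut at the file size, while 0.4 Mb lets the cut through the newcomer drop below it.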

Dynamic nature of information flow graph due to given failure pattern


Family of information flow graphs

[Figure: information flow graph with source S, nodes x1 to x6, each split into an "in" and an "out" vertex, and data collector DC]

Information flow graph corresponding to Example 1: A (4,2) MDS code

Node 3 also fails, say a few minutes later

Lemma 1


Information flow graph


Proof

WLOG


Minimize subject to the lower bound

Nature of constraint

LHS of constraint as function of
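The shape of the constraint can be explored numerically. The sketch below assumes the min-cut bound of Dimakis et al. [1] in its usual form, namely that the sum over i = 0, ..., k-1 of min((d - i) * beta, alpha) must be at least the file size, where alpha is the storage per node, beta the download per helper node, d the number of helpers and k the number of nodes a data collector contacts. These symbols and the function names are assumptions of this example, since the slide formulas did not survive in the transcript.

```python
from fractions import Fraction

def min_cut_bound(alpha, beta, k, d):
    """Left-hand side of the constraint: sum_{i=0}^{k-1} min((d - i) * beta, alpha)."""
    return sum(min((d - i) * beta, alpha) for i in range(k))

def min_storage(file_size, beta, k, d):
    """Smallest alpha with min_cut_bound(alpha, beta, k, d) >= file_size, else None.

    As a function of alpha the left-hand side is piecewise linear and
    nondecreasing, with breakpoints at (d - i) * beta, so it is enough
    to locate the first breakpoint at which the bound is met.
    """
    breakpoints = [(d - k + 1 + j) * beta for j in range(k)]  # ascending order
    saturated = Fraction(0)  # sum of the terms already capped at (d - i) * beta
    for j, b in enumerate(breakpoints):
        if min_cut_bound(b, beta, k, d) >= file_size:
            return (file_size - saturated) / (k - j)
        saturated += b
    return None  # even unlimited storage cannot compensate for this beta

# Example 1: 2 Mb file, k = 2, d = 3 helpers.
for beta in (Fraction(1, 2), Fraction(2, 5), Fraction(1, 3)):
    print(beta, min_storage(2, beta, k=2, d=3))
```

For Example 1 this gives alpha = 1 Mb at beta = 1/2, alpha = 6/5 Mb at beta = 2/5, and no feasible alpha at beta = 1/3, which illustrates how less repair traffic has to be paid for with more storage, up to a point.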

Solution to the optimization

Simplification of solution

Solution

Minimum repair bandwidth

Storage-Bandwidth Tradeoff

Relationship between storage per node and repair bandwidth [1]
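The two end points of the tradeoff curve can be evaluated for Example 1. The sketch below uses the closed forms reported in [1] for the minimum-storage (MSR) and minimum-bandwidth (MBR) points, with M the file size, k the number of nodes a data collector contacts and d the number of helper nodes; the function names are illustrative, and the slide's own notation is not available in the transcript.

```python
from fractions import Fraction

def msr_point(file_size, k, d):
    """Minimum-storage point: (storage per node, total repair bandwidth)."""
    alpha = Fraction(file_size, k)
    gamma = Fraction(file_size * d, k * (d - k + 1))
    return alpha, gamma

def mbr_point(file_size, k, d):
    """Minimum-bandwidth point: storage per node equals total repair bandwidth."""
    gamma = Fraction(2 * file_size * d, 2 * k * d - k * k + k)
    return gamma, gamma

# Example 1: 2 Mb file, k = 2, newcomer contacts d = 3 surviving nodes.
msr = msr_point(2, k=2, d=3)
mbr = mbr_point(2, k=2, d=3)
print("MSR (alpha, gamma):", msr)
print("MBR (alpha, gamma):", mbr)
```

For these parameters the MSR point stores 1 Mb per node and repairs with 1.5 Mb, matching the three 0.5 Mb packets of the functional repair example, while the MBR point stores 1.2 Mb per node and repairs with 1.2 Mb.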

References

● [1] Dimakis, A.G.; Godfrey, P.B.; Wu, Y.; Wainwright, M.J.; Ramchandran, K., "Network Coding for Distributed Storage Systems," Information Theory, IEEE Transactions on, vol. 56, no. 9, pp. 4539-4551, Sept. 2010

● [2] Koetter, R.; Medard, M., "An algebraic approach to network coding," Networking, IEEE/ACM Transactions on, vol. 11, no. 5, pp. 782-795, Oct. 2003

● [3] Ho, T.; Lun, D., Network Coding: An Introduction. Cambridge University Press, New York, NY, USA, 2008

● [4] Dimakis, A.G.; Ramchandran, K.; Wu, Y.; Suh, C., "A Survey on Network Codes for Distributed Storage," Proceedings of the IEEE, vol. 99, no. 3, pp. 476-489, March 2011