Page 1: Network Coding for Distributed Storage Systems(Group Meeting Talk)

Network Coding for Distributed Storage Systems*

Presented by Jayant Apte, ASPITRG

7/9/13 & 7/11/13

*Dimakis, A.G.; Godfrey, P.B.; Wu, Y.; Wainwright, M.J.; Ramchandran, K., "Network Coding for Distributed Storage Systems," IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539-4551, Sept. 2010

Outline

● Part 1 – Single Source Multi-cast Linear Network Coding

● Part 2 – The repair problem

– Reduction of repair problem to single source multicast network

– Family of single source multi-cast networks arising from the reduction

– A lower bound on min-cuts (by max-flow min-cut, a bound on max-flow and hence on the coding capacity of the network)

– Minimization of storage bandwidth subject to this lower bound

Some background on single source multi-cast network coding

*Koetter, R.; Medard, M., "An algebraic approach to network coding," IEEE/ACM Transactions on Networking, vol. 11, no. 5, pp. 782-795, Oct. 2003

Max-Flow-Min-Cut Theorem

Basic Network Model

Local coding coefficients

Global coding coefficients

Matrix formulation

The transfer matrix

Proof of Theorem 2

Proof of Theorem 3

Extension to multicast

Part 2- Outline

● Introduction

● The repair problem

● Reduction of repair problem to single source multicast network

● Family of single source multi-cast networks arising from the reduction

● A lower bound on min-cuts (by max-flow min-cut, a bound on max-flow and hence on the coding capacity of the network)

● Minimization of storage bandwidth subject to this lower bound

Distributed storage

● We are living in an internet age

● Demand for large scale data storage has increased significantly

● Social networks, file and video sharing require seamless storage, access and security for massive amounts of data

● Storage media (viz. hard drives) are individually unreliable

● Hence we introduce redundancy via the use of erasure codes to improve reliability

A storage code ((4,2) MDS)

[Figure: a file is split into Fragment 1 (A1, A2) and Fragment 2 (B1, B2) and encoded onto four disks]

● Disk 1: A1, A2

● Disk 2: B1, B2

● Disk 3: A1+B1, A2+B2

● Disk 4: A2+B1, A1+A2+B2
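
The defining MDS property of this code, that any two of the four disks suffice to recover the file, can be checked mechanically. The sketch below is a minimal check, assuming the disk contents shown on the slide (Disk 1: A1, A2; Disk 2: B1, B2; Disk 3: A1+B1, A2+B2; Disk 4: A2+B1, A1+A2+B2) and assuming that "+" denotes XOR, i.e. addition over GF(2). The bitmask encoding and the helper names `gf2_rank` and `decodable_pairs` are illustrative and not part of the talk.

```python
from itertools import combinations

# Each stored symbol is a bitmask over the four file symbols (A1, A2, B1, B2).
A1, A2, B1, B2 = 0b1000, 0b0100, 0b0010, 0b0001

# Disk contents as listed on the slide; "+" is taken to be XOR (an assumption).
DISKS = {
    1: [A1, A2],
    2: [B1, B2],
    3: [A1 ^ B1, A2 ^ B2],
    4: [A2 ^ B1, A1 ^ A2 ^ B2],
}


def gf2_rank(vectors):
    """Rank over GF(2) of vectors given as integer bitmasks."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit in v when it is set
        if v:
            basis.append(v)
    return len(basis)


def decodable_pairs():
    """Pairs of disks whose four stored symbols span A1, A2, B1, B2."""
    return [
        pair
        for pair in combinations(sorted(DISKS), 2)
        if gf2_rank(DISKS[pair[0]] + DISKS[pair[1]]) == 4
    ]


if __name__ == "__main__":
    print(decodable_pairs())
```

Every one of the six disk pairs passes the rank test, which is exactly the "any k = 2 out of n = 4" property of a (4,2) MDS code.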

Problem Definition

● Storage nodes are distributed and connected in a network

● Together they represent some storage code (MDS, or approximately MDS like LDPC)

● The issue of repairing a node arises when a storage node of the system fails

● The still functioning nodes are called active nodes

● A newcomer node, called the repair node, must connect to a subset of active nodes, obtain information from them and reconstruct the storage code, i.e., repair the code

● The objective is to minimize the amount of information transferred in this process

Notation

The repair problem

[Figure: data fragments y1, y2 are encoded into x1, x2, x3, x4; x5 is the newcomer]

Example: A (4,2) MDS code (β = repair bandwidth per node)

The repair problem

● Data object (2Mb) is divided into two fragments: y1,y2 (1 Mb each)

● 4 encoded fragments generated: x1,x2,x3,x4 (1 Mb each)

● x4 fails; the newcomer x5 needs to communicate with existing nodes and create a new encoded packet

● Any two out of x1, x2, x3, x5 must suffice to recover the original data object

The repair problem

● What (and how much) should x1, x2, x3 communicate to x5 such that the amounts transferred are minimized?

[Figure: y1, y2 encoded into x1, x2, x3, x4; newcomer x5 downloads from x1, x2, x3]

Example 1: A (4,2) MDS code

Variants of the repair problem

● Exact Repair: Failed blocks are exactly regenerated i.e. newcomer node must reconstruct exact replica of encoded block in the failed node

● Functional Repair: Newly generated data block need not be exact replica of encoded block on the failed node

● Exact repair of the systematic part: Only repair the systematic part exactly, so there is always an uncoded copy of the original file available

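Functional repair example (using RLNC)

[Figure: functional repair with random linear network coding; each box is 0.5 Mb]

File fragments: a1, b1, a2, b2

Encoded data blocks:

● Node 1: a1, b1

● Node 2: a2, b2

● Node 3: a1+b1+a2+b2, a1+2b1+a2+2b2

● Node 4 (failed): a1+2b1+3a2+b2, 3a1+2b1+2a2+3b2

Encoded repair packets:

● p1 = a1+2b1 (from node 1, coefficients 1, 2)

● p2 = 2a2+b2 (from node 2, coefficients 2, 1)

● p3 = 4a1+5b1+4a2+5b2 (from node 3, coefficients 3, 1)

Repair node:

● p1+2p2+p3 = 5a1+7b1+8a2+7b2

● 2p1+p2+p3 = 6a1+9b1+6a2+6b2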

Flow across this cut is the repair bandwidth (3 repair packets of 0.5 Mb each, 1.5 Mb in total)
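
The functional repair example can be replayed numerically. The sketch below assumes the reading of the figure in which node 1 stores a1, b1, node 2 stores a2, b2, node 3 stores a1+b1+a2+b2 and a1+2b1+a2+2b2, and the repair packets are formed with coefficients (1, 2), (2, 1) and (3, 1). It also assumes arithmetic over the rationals, since the slide does not state the field; a deployed code would use a finite field. The helper names `combine`, `rank` and `all_pairs_decodable` are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def combine(coeffs, rows):
    """Linear combination of coefficient vectors (one entry per file symbol)."""
    return [sum(c * r[j] for c, r in zip(coeffs, rows)) for j in range(len(rows[0]))]


def rank(rows):
    """Rank over the rationals by Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                f = m[i][col] / m[rk][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


# Coefficient vectors in the order (a1, b1, a2, b2).
node1 = [[1, 0, 0, 0], [0, 1, 0, 0]]
node2 = [[0, 0, 1, 0], [0, 0, 0, 1]]
node3 = [[1, 1, 1, 1], [1, 2, 1, 2]]

# One 0.5 Mb repair packet per surviving node.
p1 = combine([1, 2], node1)
p2 = combine([2, 1], node2)
p3 = combine([3, 1], node3)
packets = [p1, p2, p3]

# The repair node stores two combinations of the three packets.
new_node = [combine([1, 2, 1], packets), combine([2, 1, 1], packets)]
nodes = {"node1": node1, "node2": node2, "node3": node3, "new": new_node}


def all_pairs_decodable(stored):
    """True if every pair of nodes holds four independent combinations."""
    return all(rank(stored[a] + stored[b]) == 4 for a, b in combinations(stored, 2))


if __name__ == "__main__":
    print(new_node, all_pairs_decodable(nodes))
```

The new node's contents differ from those of the failed node, yet any two of the four nodes still determine the file, which is what distinguishes functional repair from exact repair.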

An attempt at solution

[Figure: y1, y2 encoded into x1, x2, x3, x4; newcomer x5]

Example 1: A (4,2) MDS code

x5 recovers the original data object (downloading 2 Mb) and creates a new independent linear combination

Can we do better than this?

YES!

Reduction to information flow graph

Example

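[Figure: information flow graph with source S, storage nodes x1 to x4 each split into an in/out pair (x1in, x1out, ...), newcomer x5 (x5in, x5out) and data collector DC]

Information flow graph corresponding to Example 1: A (4,2) MDS code

Node 4 has failed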
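
The information flow graph for Example 1 can be built and its min-cut computed directly. The sketch below is a minimal illustration, assuming the edge capacities used in [1]: infinite capacity from S to each x_in and from each x_out to DC, capacity α on each x_in to x_out edge, and capacity β on each edge from a helper's x_out to the newcomer's x_in. Capacities are in units of 0.5 Mb, so α = 2 (1 Mb) and β = 1 (0.5 Mb). The functions `flow_graph` and `max_flow` (a plain Edmonds-Karp) are illustrative helpers, not code from the talk.

```python
from collections import defaultdict, deque
from itertools import combinations

INF = 10**9  # stands in for infinite capacity


def flow_graph(alpha, beta, helpers, dc_nodes):
    """Information flow graph: node 4 failed, newcomer x5 contacts `helpers`."""
    cap = defaultdict(dict)
    for i in (1, 2, 3, 4):
        cap["S"][f"x{i}in"] = INF
        cap[f"x{i}in"][f"x{i}out"] = alpha      # storage constraint
    for j in helpers:
        cap[f"x{j}out"]["x5in"] = beta          # repair download
    cap["x5in"]["x5out"] = alpha
    for i in dc_nodes:
        cap[f"x{i}out"]["DC"] = INF             # data collector links
    return cap


def max_flow(cap, s, t):
    """Edmonds-Karp max-flow; by max-flow min-cut this equals the min-cut."""
    res = defaultdict(lambda: defaultdict(int))
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            res[u][v] += c
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        bottleneck, v = INF, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck


def min_cut_over_collectors(alpha, beta, helpers):
    """Smallest S-DC min-cut over all collectors reading k = 2 active nodes."""
    return min(
        max_flow(flow_graph(alpha, beta, helpers, pair), "S", "DC")
        for pair in combinations((1, 2, 3, 5), 2)
    )


if __name__ == "__main__":
    print(min_cut_over_collectors(2, 1, (1, 2, 3)))
    print(min_cut_over_collectors(2, 1, (1, 2)))
```

With three helpers the smallest min-cut is 4 units (2 Mb), enough to carry the 2 Mb file to any data collector. With only two helpers at the same β it drops to 3 units, so some data collector could no longer reconstruct the file.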

Dynamic nature of information flow graph due to given failure pattern

Family of information flow graphs

[Figure: the information flow graph above extended with a second newcomer x6 (x6in, x6out)]

Information flow graph corresponding to Example 1: A (4,2) MDS code

Node 3 also failed, say, a few minutes later

Lemma 1

Information flow graph

[Figure: information flow graph with source S]

Proof

WLOG

Minimize α subject to the lower bound
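
The formulas on this slide did not survive in the transcript. The sketch below therefore assumes the optimization as stated in [1]: for file size M, parameters k and d (with d ≥ k) and total repair bandwidth γ = dβ, find the smallest per-node storage α such that the sum over i = 0, ..., k-1 of min{(1 - i/d)γ, α} is at least M. The function name `min_storage` and the use of exact fractions are illustrative choices.

```python
from fractions import Fraction


def min_storage(M, k, d, gamma):
    """Smallest alpha with sum_{i<k} min((1 - i/d) * gamma, alpha) >= M.

    Returns None when no alpha satisfies the constraint for this gamma.
    The constraint is taken from [1]; it is an assumption about the slide.
    """
    M, gamma = Fraction(M), Fraction(gamma)
    # Each term saturates at its own cap; sort the caps in ascending order.
    caps = sorted(gamma * (d - i) / d for i in range(k))
    if sum(caps) < M:
        return None  # even unlimited storage cannot reach M
    used = Fraction(0)  # contribution of the terms that are already saturated
    for j, cap in enumerate(caps):
        growing = k - j  # terms still increasing linearly in alpha
        if used + growing * cap >= M:
            return (M - used) / growing
        used += cap
    return None


if __name__ == "__main__":
    for g in (Fraction(1), Fraction(6, 5), Fraction(3, 2), Fraction(2)):
        print(g, min_storage(2, 2, 3, g))
```

For the running example (M = 2 Mb, k = 2, d = 3) the loop shows the shape of the tradeoff: no solution below γ = 1.2 Mb, α = 1.2 Mb at γ = 1.2 Mb, and α = 1 Mb once γ reaches 1.5 Mb, with no further gain from a larger γ.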

Nature of constraint

LHS of constraint as a function of α

Solution to the optimization

Simplification of solution

Solution

Minimum repair bandwidth

Storage-Bandwidth Tradeoff: relationship between α and γ [1]
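
The two end points of the tradeoff curve have closed forms in [1]; the slide's own plot and formulas are not preserved here, so the expressions below are taken from that paper rather than from the talk. The minimum-storage point (MSR) has α = M/k and γ = Md / (k(d - k + 1)); the minimum-bandwidth point (MBR) has α = γ = 2Md / (2kd - k² + k). The function names `msr_point` and `mbr_point` are illustrative.

```python
from fractions import Fraction


def msr_point(M, k, d):
    """(alpha, gamma) at the minimum-storage regenerating point, per [1]."""
    alpha = Fraction(M, k)
    gamma = Fraction(M * d, k * (d - k + 1))
    return alpha, gamma


def mbr_point(M, k, d):
    """(alpha, gamma) at the minimum-bandwidth regenerating point, per [1]."""
    g = Fraction(2 * M * d, 2 * k * d - k * k + k)
    return g, g


if __name__ == "__main__":
    # Running example of the talk: 2 Mb file, (4,2) code, d = 3 helpers.
    print(msr_point(2, 2, 3))
    print(mbr_point(2, 2, 3))
```

For the (4,2) example with d = 3, the MSR point stores 1 Mb per node and repairs with 1.5 Mb, compared with the 2 Mb needed when the newcomer first reconstructs the whole file. The MBR point lowers the repair bandwidth to 1.2 Mb at the cost of storing 1.2 Mb per node.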

References

● [1] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, "Network coding for distributed storage systems," IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539-4551, Sept. 2010.

● [2] R. Koetter and M. Medard, "An algebraic approach to network coding," IEEE/ACM Transactions on Networking, vol. 11, no. 5, pp. 782-795, Oct. 2003.

● [3] T. Ho and D. Lun, Network Coding: An Introduction. New York, NY, USA: Cambridge University Press, 2008.

● [4] A. G. Dimakis, K. Ramchandran, Y. Wu, and C. Suh, "A survey on network codes for distributed storage," Proceedings of the IEEE, vol. 99, no. 3, pp. 476-489, March 2011.