236601 - Coding and Algorithms for Memories Lecture 13
Large Scale Storage Systems
• Big Data Players: Facebook, Amazon, Google, Yahoo,…
Cluster of machines running Hadoop at Yahoo! (Source: Yahoo!)
• Failures are the norm
Node failures at Facebook
[Figure: x-axis: Date]
XORing Elephants: Novel Erasure Codes for Big Data. M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, VLDB 2013
Problem Setup
• Disks are stored together in a group (rack)
• Disk failures should be supported
• Requirements:
– Support as many disk failures as possible
– And yet…
  • Optimal and fast recovery
  • Low complexity
Reed Solomon Codes
• A code with parity check matrix of the form
[matrix not preserved in the transcript]
where α is a primitive element of some extension field and O(α) > n-1
Claim: Every sub-matrix of size d×d has full rank
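The claim can be checked numerically on a toy instance. The sketch below makes two assumptions that are not in the slide: the parity check matrix is taken to be a Vandermonde matrix with entries α^(i·j), and a prime field GF(7) with α = 3 (order 6) stands in for the extension field. The helper names are hypothetical.

```python
from itertools import combinations

P = 7        # assumption: prime field GF(7) instead of an extension field
ALPHA = 3    # 3 is a primitive element of GF(7); its order is 6


def vandermonde(n, d, alpha, p):
    """Assumed form of H: d rows, n columns, entry (i, j) = alpha^(i*j)."""
    return [[pow(alpha, i * j, p) for j in range(n)] for i in range(d)]


def det_mod(matrix, p):
    """Determinant over GF(p) by Gaussian elimination."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = 1
    for c in range(size):
        pivot = next((r for r in range(c, size) if m[r][c] % p), None)
        if pivot is None:
            return 0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det = det * m[c][c] % p
        inv = pow(m[c][c], -1, p)  # modular inverse (Python 3.8+)
        for r in range(c + 1, size):
            factor = m[r][c] * inv % p
            for k in range(c, size):
                m[r][k] = (m[r][k] - factor * m[c][k]) % p
    return det % p


def all_submatrices_full_rank(h, p):
    """True if every choice of d columns of the d x n matrix h is invertible."""
    d, n = len(h), len(h[0])
    return all(
        det_mod([[h[i][j] for j in cols] for i in range(d)], p) != 0
        for cols in combinations(range(n), d)
    )
```

With n = 6 and d = 3 the order of α exceeds n-1 and `all_submatrices_full_rank(vandermonde(6, 3, ALPHA, P), P)` holds. With n = 7 the condition O(α) > n-1 is violated, two columns coincide, and the check fails.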
Reed Solomon Codes
• Advantages:
– Support the maximum number of disk failures
– Are very common in practice and have relatively efficient encoding/decoding schemes
• Disadvantages:
– Require working over large fields. Solution: EvenOdd Codes
– Need to read all the disks in order to recover even a single disk failure, so rebuild is not efficient. Solution: ZigZag Codes
The Repair Problem
[Figure: data disks 1 to 10 and parity disks P1 to P4]
• A disk is lost – Repair job starts
• Access, read, and transmit data of disks!
• Overuse of system resources during single repair
• Goal: Reduce repair cost in a single disk repair
• Facebook's storage scheme (RS code):
– 10 data blocks
– 4 parity blocks
– Can tolerate any four disk failures
ZigZag Codes
• Designed by Itzhak Tamo, Zhiying Wang, and Jehoshua Bruck
• The goal: construct codes correcting the max number of erasures and yet allow efficient reconstruction if only a single drive fails
ZigZag Codes
• Lower bound: the min amount of data required to be read to recover a single drive failure
– (n,k) code: n drives, k information, and n-k redundancy
– M: size of a single drive in bits
• For an (n,n-2) code it is required to read at least 1/2 from the remaining drives, that is, at least (1/2)(n-1)M bits
– The last example is optimal
• In general, for an (n,n-r) code it is required to read at least 1/r from the remaining drives, that is, at least (1/r)(n-1)M bits
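To get a feel for the bound, the following sketch compares it with a naive rebuild. It assumes the naive rebuild decodes from k = n-r full drives, and the function names are hypothetical. Sizes are expressed in units of M.

```python
from fractions import Fraction


def naive_repair_read(n, r, drive_size=1):
    """Assumed baseline: rebuild one drive by reading k = n - r full drives."""
    return (n - r) * drive_size


def min_repair_read(n, r, drive_size=1):
    """Lower bound stated above: (1/r)(n-1)M."""
    return Fraction(n - 1, r) * drive_size


# A (14,10) code such as the 10 data + 4 parity scheme: n = 14, r = 4.
print(naive_repair_read(14, 4))  # 10 drive sizes
print(min_repair_read(14, 4))    # 13/4 drive sizes
```

For n = 14 and r = 4 the bound is 13/4 of a drive, against 10 full drives for the assumed naive rebuild.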
ZigZag Codes
• Example

info 1   info 2   info 3   Row parity   ZigZag parity
  0        2        1                        0
  1        3        0                        1
  2        0        3                        2
  3        1        2                        3
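The example can be explored with a short search. This is a sketch under an assumed reading of the table: the number in each info cell names the zigzag parity element that the cell is added into, and each row parity element is the sum of the info cells in its row. Coefficients are ignored, since only the access pattern matters here. The function name is hypothetical.

```python
from itertools import combinations

# Assumed reading of the example: ZIGZAG[c][r] is the index of the zigzag
# parity element that info column c, row r contributes to.
ZIGZAG = [
    [0, 1, 2, 3],  # info 1
    [2, 3, 0, 1],  # info 2
    [1, 0, 3, 2],  # info 3
]
ROWS = 4


def rebuild_rows(lost):
    """Find two rows such that reading only those rows of the surviving info
    columns (plus matching parity elements) rebuilds column `lost`."""
    others = [c for c in range(len(ZIGZAG)) if c != lost]
    for rows in combinations(range(ROWS), ROWS // 2):
        row_set = set(rows)
        # Lost cells in row_set are rebuilt from row parity; the remaining
        # lost cells are rebuilt from these zigzag parity elements.
        needed_zigzag = {ZIGZAG[lost][r] for r in range(ROWS) if r not in row_set}
        if all(
            {r for r in range(ROWS) if ZIGZAG[c][r] in needed_zigzag} <= row_set
            for c in others
        ):
            return rows
    return None


for column in range(3):
    print(f"info {column + 1}: read rows {rebuild_rows(column)}")
```

Under this reading, each info column can be rebuilt from two of the four rows of every surviving column, which is half of the remaining data and matches the (1/2)(n-1)M bound for r = 2.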
Network Coding for Distributed Storage
• Goal: show the following. In general, for an (n,n-r) code it is required to read at least 1/r from the remaining drives, that is, at least (1/r)(n-1)M bits
• Network Coding for Distributed Storage, by Dimakis, Godfrey, Wu, Wainwright, Ramchandran
• File of size M is partitioned into k pieces of size M/k
• The k pieces are encoded into n encoded pieces using an (n,k) MDS code
[Figure: pieces y1, y2 and encoded pieces x1, x2, x3, x4]
[Figure: pieces y1, y2, encoded pieces x1, x2, x3, x4, and a new node x5 with three incoming links labeled β; β=?]
[Figure: flow graph with source S, nodes x1 to x5 each drawn as an "in" and an "out" vertex, edges labeled α=1 and ∞, three edges labeled β into x5 (β=?), and a data collector DC]
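The question β=? can be explored numerically. The sketch below assumes the cut-set condition from the Dimakis et al. paper cited above: a data collector contacting k nodes can recover a file of size M only if the sum over i = 0..k-1 of min((d-i)β, α) is at least M, where d is the number of nodes the new node downloads from. The values k = 2, d = 3 and M = 2 are read off the figure as an assumption, and the function names are hypothetical.

```python
from fractions import Fraction


def min_cut(alpha, beta, d, k):
    """Cut-set value: sum over i = 0..k-1 of min((d - i) * beta, alpha)."""
    return sum(min((d - i) * beta, alpha) for i in range(k))


def min_beta(file_size, k, d):
    """Smallest beta meeting the condition when each node stores
    alpha = file_size / k (closed form: M / (k * (d - k + 1)))."""
    return Fraction(file_size, k * (d - k + 1))


# Assumed parameters: alpha = 1, k = 2, d = 3, file size M = k * alpha = 2.
alpha, k, d, file_size = Fraction(1), 2, 3, 2
beta = min_beta(file_size, k, d)
print(beta)                         # 1/2
print(min_cut(alpha, beta, d, k))   # 2
print(d * beta)                     # 3/2 downloaded in total for the repair
```

With these values β = 1/2, so the new node downloads 3/2 in total, compared with 2 if it first reconstructed the whole file. This agrees with (1/r)(n-1) times the node size for n = 4, r = 2.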
ZigZag Codes
• Example

a    b    a+b    a+2d
c    d    c+d    c+b
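The small example can be checked by brute force. This sketch assumes the symbols live in GF(3) (the coefficient 2 suggests a field with at least three elements) and that the four columns are (a, c), (b, d), (a+b, c+d) and (a+2d, c+b). The function names are hypothetical.

```python
from itertools import combinations, product

Q = 3  # assumption: arithmetic over GF(3)


def encode(a, b, c, d):
    """Return the four columns of the example, each with two rows."""
    return [
        (a, c),                          # first info column
        (b, d),                          # second info column
        ((a + b) % Q, (c + d) % Q),      # third column: a+b, c+d
        ((a + 2 * d) % Q, (c + b) % Q),  # fourth column: a+2d, c+b
    ]


def is_mds():
    """True if any two surviving columns determine (a, b, c, d) uniquely."""
    for kept in combinations(range(4), 2):
        seen = set()
        for message in product(range(Q), repeat=4):
            columns = encode(*message)
            view = tuple(columns[i] for i in kept)
            if view in seen:
                return False
            seen.add(view)
    return True


def rebuild_first_column(b, a_plus_b, c_plus_b):
    """Rebuild (a, c) from three of the six surviving elements."""
    return (a_plus_b - b) % Q, (c_plus_b - b) % Q


cols = encode(1, 2, 0, 1)
print(is_mds())
print(rebuild_first_column(cols[1][0], cols[2][0], cols[3][1]))
```

Rebuilding the first column reads b, a+b and c+b, which is three of the six surviving elements, or half of the remaining data.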