Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of...
-
Upload
rylie-paskin -
Category
Documents
-
view
215 -
download
1
Transcript of Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of...
![Page 1: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/1.jpg)
Reliability of Disk Systems
![Page 2: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/2.jpg)
Reliability• So far, we looked at ways to improve the performance of
disk systems.• Next, we will look at ways to improve the reliability of disk
systems.• What is reliability?
– Essentially, it is the availability of data when there is a disk “failure” of some sort.
• This is achieved at the cost of some redundancy – data and/or disks.
![Page 3: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/3.jpg)
Intermittent Failures• In an intermittent failure, we may get several “bad” reads,
for example, but with repeated attempts we may eventually get a “good”.
• Disk sectors are stored with some redundant bits that can be used to tell us if an I/O operation was successful.
• For writes, we may want to again check the status– We can, of course, re-read the sector and compare it to the
original
– But this is expensive
– Instead, we simply re-read the sector and check the status bits
![Page 4: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/4.jpg)
Checksums for failure detection• A useful tool for status validation is the checksum
– One or more bits that, with high probability, verify the correctness of the operation
– The checksum is written by the disk controller.
• A simple form of checksum is the parity bit:– Here, a bit is added to the data so that the number of 1’s
amongst the data bits + the parity bit is always even.
– A disk read (per sector) would return a status value of “good” if the bit string has an even number of 1’s; otherwise, status = bad
11101110 0 Good
00101010 0 Bad
![Page 5: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/5.jpg)
(Interleaved) Parity bits• It is possible that more than one bit in a sector be corrupted
– Error(s) may not be detected.• Suppose bits error randomly: Probability of undetected error
(i.e. even 1’s) is thus 50% (Why?)
• Let’s have 8 parity bits
01110110 Byte 111001101 Byte 2 00001111 Byte 310110100 Byte of parity bits
• Probability of error is 1/28 = 1/256• With n parity bits, the probability of undetected error = 1/2n
![Page 6: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/6.jpg)
Recovery from disk crashes• Mean time to failure (MTTF) = when 50% of the disks
have crashed, typically 10 years• Simplified (assuming this happens linearly)
– In the 1st year = 5%,
– In the 2nd year = 5%,
– …
– In the 20th year = 5%
• However the mean time to a disk crash doesn’t have to be the same as the mean time to data loss; there are solutions.
![Page 7: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/7.jpg)
Redundant Array of Independent Disks, RAID• RAID 1:Mirror each disk (data/redundant disks)
• If a disk fails, restore using the mirror
Assume: • 5% failure per year; MTTF = 10 years (for disks). • 3 hours to replace and restore failed disk.
If a failure to one disk occurs, then the other better not fail in the next three hours. • Probability of failure = 5% 3/(24 365) = 1/58400. • If one disk fails every 10 years (10 5% = 50%), then one of two will fail every
5 years (5 (5% + 5%) = 50% ). • One in 58,400 of those failures results in data loss; MTTF = 292,000 years (5
58,400 = 292,000).
Drawback: We need one redundant disk for each data disk.This is the mean time to failure for data.
![Page 8: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/8.jpg)
RAID 4• RAID 4: One redundant disk only.• n data disks & 1 redundant disk (for any n)
• We’ll refer to the expression xy as modulo-2 sum of x and y (XOR)– E.g. 11110000 10101010 = 01011010
• Now, each block in the redundant disk has the modulo-2 sum for the corresponding blocks in the other disks.
i th Block of Disk 1: 11110000i th Block of Disk 2: 10101010i th Block of Disk 3: 00111000i th Block of red. disk: 01100010
• In effect this is just a distributed form of the block-interleaved parity discussed earlier.
![Page 9: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/9.jpg)
Properties of XOR: • Commutativity: xy = yx• Associativity: x(yz) = (xy)z• Identity: x0 = 0x = x (0 is vector 00…0)• Self-inverse: xx = 0
– As a useful consequence, if xy=z, then we can “add” x to both sides and get y=xz
– More generally:
0 = x1...xn+1
Then “adding” xi to both sides, we get:
xi = x1…xi-1 xi+1...xn+1
![Page 10: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/10.jpg)
Failure recovery in RAID 4We must be able to restore whatever disk crashes. • Just compute the modulo 2 sum of corresponding blocks of
the other disks.
• Use equation
• Example:
i th Block of Disk1: 11110000i th Block of Disk 2: 10101010i th Block of Disk 3: 00111000i th Block of red disk: 01100010
rednjjj xxxxxx ...... 111
Disk 2 crashes. Compute it by taking the modulo 2
sum of the rest.
![Page 11: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/11.jpg)
RAID 4 (Cont’d)Maintaining RAID 4 is relatively easy:
•Reading: as usual– Interesting possibility: If we want to read from disk i, but it is
busy and all other disks are free, then instead we can read the corresponding blocks from all other disks and modulo 2 sum them.
•Writing: – Write block.
– Update redundant block
![Page 12: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/12.jpg)
How do we get the value for the redundant block?
• Naively: Read all n-1 corresponding blocks
n+1 disk I/O’s, which is
n-1 blocks read,
1 data block write,
1 redundant block write.
• Better: How?
nired xxxx ......1
![Page 13: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/13.jpg)
How do we get the value for the redundant block?
• Better Writing: To write block j of data disk i (new value = v): – Read old value of that block, say o.
– Read the jth block of the redundant disk, value = r.
– Compute w = v o r.
– Write v in block j of disk i.
– Write w in block j of the redundant disk. • Total: 4 disk I/O; (true for any number of data disks)• Problem Why does this work?
– Intuition: v o is the “change” to the parity. – Redundant disk must change to compensate.
![Page 14: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/14.jpg)
Examplei th Block of Disk1: 11110000 x1i th Block of Disk 2: 10101010 x2 = oi th Block of Disk 3: 00111000 x3i th Block of red disk: 01100010 r
Suppose we change 10101010 into 0110111010101010 o01101110 v01100010 r---------------10100110 w
11110000 x101101110 x2 = v00111000 x3-------------10100110 w = new r
If done the naïve way
![Page 15: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/15.jpg)
RAID 5• RAID 4: Problem: The redundant disk is involved in every write Bottleneck!
• Solution is RAID 5: vary the redundant disk for different blocks. – Example: n+1 disks; – cylinder j is redundant on disk i if i = j mod n+1.
• Example: n=3. So, there are 4 disks. – First disk numbered 0, would be the “redundant” when considering cylinders
numbered: 0, 4, 8, 12 etc. (because they leave reminder 0 when divided by 4).
– Disk numbered 1, would be the “redundant” for its cylinders numbered: 1, 5, 9, 13. And so on
Cylinder 2
Cylinder 3
123
Cylinder 1
Disk 0
Cylinder 2
Cylinder 3
023
Cylinder 0
Disk 1
Cylinder 1
Cylinder 3
013
Cylinder 0
Disk 2
Cylinder 1
Cylinder 2
012
Cylinder 0
Disk 3
![Page 16: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/16.jpg)
RAID 5 (Cont’d)• The reading/writing load for each disk is the same.• In one block write what’s the probability that a
disk is involved?– Each disk has 1/(n+1) probability to have the block.– If not, i.e. with probability n/(n+1), then it has 1/n chance
that it will be the redundant block for that block number. – So, each of the four disks is involved in:
1/(n+1) * 1 + (n/(n+1))*(1/n) *1= 2/(n+1) of the writes.
![Page 17: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/17.jpg)
RAID 6 - for multiple disk crashesLet’s focus on recovering from two disk crashes.
Setup:• 7 disks, numbered 1 through 7• The first 4 are data disks, and disks 5 through 7 are redundant.• The relationship between data and redundant disks is summarized by
a 3 x 7 matrix of 0's and 1's
1 2 3 4 5 6 7
1 1 1 0 1 0 0
1 1 0 1 0 1 0
1 0 1 1 0 0 1
The columns for the redundant disks have a single 1.
All columns are different. No all-0’s column.
Data disks
Redundant disks The disks with 1
in a given row of the matrix are treated as if they were the entire set of disks in a RAID level 4 scheme.
![Page 18: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/18.jpg)
RAID 6 - example1) 11110000
2) 10101010
3) 00111000
4) 01000001
5) 01100010
6) 00011011
7) 10001001
1 2 3 4 5 6 7
1 1 1 0 1 0 0
1 1 0 1 0 1 0
1 0 1 1 0 0 1
Redundant disksData disks
disk 5 is modulo 2 sum of disks 1,2,3
disk 6 is modulo 2 sum of disks 1,2,4
disk 7 is modulo 2 sum of disks 1,3,4
![Page 19: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/19.jpg)
1 2 3 4 5 6 7
1 1 1 0 1 0 0
1 1 0 1 0 1 0
1 0 1 1 0 0 1
Why is it possible to recover from b a
two disk crashes?
r• Let the failed disks be a and b.• Since all columns of the redundancy matrix are different, we
must be able to find some row r in which the columns for a and b are different. – Suppose that a has 0 in row r, while b has 1 there.
• Then we can compute the correct b by taking the modulo-2 sum of corresponding bits from all the disks other than b that have 1 in row r. – Note that a is not among these, so none of them have failed.
• Having done so, we must recompute a, with all other disks available.
RAID 6 Failure Recovery
![Page 20: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/20.jpg)
• Example:
Before failure After failure
1 2 3 4 5 6 7
1 1 1 0 1 0 0
1 1 0 1 0 1 0
1 0 1 1 0 0 1
![Page 21: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/21.jpg)
RAID 6 – How many redundant disks?• The number of disks can be one less than any power of 2, say 2k – 1.
• Of these disks, k are redundant, and the remaining 2k– 1– k are data disks, so the redundancy grows roughly as the logarithm of the number of data disks.
• For any k, we can construct the redundancy matrix by writing all possible columns of k 0's and 1's, except the all-0's column.
– The columns with a single 1 correspond to the redundant disks, and the columns with more than one 1 are the data disks.
Note finally that we can combine RAID 6 with RAID 5 to reduce the performance bottleneck on the redundant disks
![Page 22: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/22.jpg)
Exercises
![Page 23: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/23.jpg)
RAID 4i th Block of Disk 1: 11110000
i th Block of Disk 2: 10101010
i th Block of Disk 3: 00111000
i th Block of Disk 3: 11111011
i th Block of red. disk:
Now suppose that Disk 1 crashed. Recover it.
![Page 24: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/24.jpg)
RAID 61) 11110000
2) 10101010
3) 00111000
4) 01000001
5)
6)
7)
1 2 3 4 5 6 7
1 1 1 0 1 0 0
1 1 0 1 0 1 0
1 0 1 1 0 0 1
Redundant disksData disks
Now suppose that Disk 2 and Disk 5 crash.
Recover them.
![Page 25: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/25.jpg)
RAID 6 - exercise• Find a RAID level 6 scheme using 15 disks, 4 of which are
redundant.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 0 1 1 1 0 0 0 1 1 1 1 0 0 0
1 1 0 1 1 0 1 1 0 1 0 0 1 0 0
1 1 1 0 1 1 0 1 0 0 1 0 0 1 0
1 1 1 1 0 1 1 0 1 0 0 0 0 0 1
![Page 26: Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.](https://reader031.fdocuments.in/reader031/viewer/2022013100/5517a4a45503460e6e8b5ebe/html5/thumbnails/26.jpg)
In-Class exercise• Suppose we have four disks: 1 and 2 are data disks, 3 and
4 are redundant• Disk 3 is a mirror of 1. Disk 4 holds parity check bits for
disks 2 and 3• which combination of simultaneous 2-disk failures can we
recover from?