Lecture 4: A Case for RAID (Part 2), Prof. Shahram Ghandeharizadeh, Computer Science Department, University of Southern California


Transcript of Lecture 4: A Case for RAID (Part 2), Prof. Shahram Ghandeharizadeh, Computer Science Department, University of Southern California.

Page 1:

Lecture 4: A Case for RAID (Part 2)

Prof. Shahram Ghandeharizadeh
Computer Science Department
University of Southern California

Page 2:

Smaller & Inexpensive Disks

25% annual reduction in size; 40% annual drop in price.

1 GB, Year 1980: IBM 3380 @ $40,000. Size of a refrigerator, 550 pounds (250 kg).

1 GB, Year 2008: IBM Microdrive @ $125. 1 inch in height, weighs roughly half an ounce (16 grams).

Page 3:

Inexpensive Disks

Less than 9 Cents / Gigabyte of storage

Page 4:

Challenge: Managing Data is Expensive

Cost of Managing Data is $100K/TB/Year:

High availability: down time is estimated at thousands of dollars per minute.

Data loss results in lost productivity: 20 Megabytes of accounting data requires 21 days and costs $19K to reproduce.

50% of companies that lose their data due to a disaster never re-open; 90% go out of business in 2 years!

Page 5:

Challenge: Managing Data is Expensive

Cost of Managing Data is $100K/TB/Year:

High availability: down time is estimated at thousands of dollars per minute.

Data loss results in lost productivity: 20 Megabytes of accounting data requires 21 days and costs $19K to reproduce.

50% of companies that lose their data due to a disaster never re-open; 90% go out of business in 2 years!

RAID

Page 6:

MTTF, MTBF, MTTR, AFR

MTBF: Mean Time Between Failures
Designed for repairable devices. Number of hours since the system was started until its failure.

MTTF: Mean Time To Failure
Designed for non-repairable devices such as magnetic disk drives. Disks of 2008 are more than 40 times more reliable than disks of 1988.

MTTR: Mean Time To Repair
Number of hours required to replace a disk drive, AND reconstruct the data stored on the failed disk drive.

AFR: Annualized Failure Rate
Computed by assuming a temperature for the case (40 degrees centigrade), power-on hours per year (say 8,760, i.e., 24x7), and 250 average motor start/stop cycles per year.

Page 7:

Focus on MTTF & MTTR

MTTF: Mean Time To Failure
Designed for non-repairable devices such as magnetic disk drives. Disks of 2008 are more than 40 times more reliable than disks of 1988.

MTTR: Mean Time To Repair
Number of hours required to replace a disk drive, AND reconstruct the data stored on the failed disk drive.

Page 8:

Assumptions

The MTTF of a disk is independent of other disks in a RAID.

Assume:
1. The MTTF of a disk is once every 100 years, and
2. An array of 1000 such disks.

The MTTF of any single disk in the array is then once every 37 days (100 years / 1000 ≈ 36.5 days).
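A minimal sketch of this arithmetic in Python (assuming independent disks with exponentially distributed lifetimes; variable names are illustrative, not from the lecture):

    # With n independent disks, the aggregate failure rate is n times one disk's
    # rate, so the expected time until some disk in the array fails is MTTF_disk / n.
    DAYS_PER_YEAR = 365

    mttf_disk_years = 100      # assumed per-disk MTTF (from the slide)
    num_disks = 1000           # array size (from the slide)

    mttf_any_disk_days = (mttf_disk_years / num_disks) * DAYS_PER_YEAR
    print(mttf_any_disk_days)  # 36.5 days, i.e., roughly once every 37 days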

Page 9:

RAID

RAID organizes D disks into nG groups, where each group consists of G disks and C parity disks.

Example: D = 8, G = 4, C = 1, nG = 8/4 = 2

Parity Group 1: Disk 1, Disk 2, Disk 3, Disk 4, Parity 1
Parity Group 2: Disk 5, Disk 6, Disk 7, Disk 8, Parity 2


Page 11:

RAID With 1 Group

With G disks in a group and C check disks, a failure is encountered when:
1. A disk in the group fails, AND
2. A second disk fails before the failed disk of step 1 is repaired.

The MTTF of a group of disks with RAID is:
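A sketch of this expression, following the derivation in the original RAID paper (Patterson, Gibson, and Katz) under the independence assumption above:

    MTTF(group) = [ MTTF(disk) / (G + C) ] * [ 1 / Probability(another failure before the repair completes) ]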

Page 12:

RAID With 1 Group (Cont…)

Probability of another failure:
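A sketch of this term, again following the original RAID paper and assuming MTTR is much smaller than the MTTF of a disk: after the first failure, G + C - 1 disks remain in the group, so

    Probability(another failure before the repair completes) ≈ MTTR / [ MTTF(disk) / (G + C - 1) ]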

MTTR includes the time required to:
1. Replace the failed disk drive, AND
2. Reconstruct the content of the failed disk.

Performing step 2 in a lazy manner increases the duration of MTTR, and hence the probability of another failure.

What happens if we increase the number of data disks in a group?

Page 13:

RAID with nG Groups

With nG groups, the Mean Time To Failure of the RAID is computed in a similar manner:
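Combining the two sketches above (still following the original RAID paper):

    MTTF(RAID) = MTTF(group) / nG = MTTF(disk)^2 / [ (G + C) * nG * (G + C - 1) * MTTR ]

A minimal Python illustration (the per-disk MTTF and MTTR values are assumed for illustration, not taken from the lecture):

    HOURS_PER_YEAR = 8760

    mttf_disk = 100 * HOURS_PER_YEAR   # assume a 100-year per-disk MTTF, in hours
    G, C, nG = 4, 1, 2                 # the example configuration used earlier (D = 8)
    mttr = 24                          # assume a one-day replace-and-reconstruct window

    mttf_group = mttf_disk ** 2 / ((G + C) * (G + C - 1) * mttr)
    mttf_raid = mttf_group / nG
    print(mttf_raid / HOURS_PER_YEAR, "years")   # roughly 91,000 years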

Page 14:

Review

RAID 1 and 3 were presented in the previous lecture.

Here is a quick review.

Page 15:

RAID 1: Disk Mirroring

Contents of disks 1 and 2 are identical.

Redundant paths keep data available in the presence of either a controller or disk failure.

A write operation by a CPU is directed to both disks.

A read operation is directed to one of the disks. Each disk might be reading different sectors simultaneously.

Tandem's architecture.

[Diagram: CPU 1, Controller 1, Controller 2, and mirrored Disk 1 and Disk 2, connected by redundant paths.]

Page 16:

RAID 3: Small Block Reads

Bit-interleaved. Bad news: a small read of less than the group size requires reading the whole group. E.g., a read of one sector requires a read of 4 sectors. One parity group has a read rate identical to one disk.

[Diagram: the bit stream 01011110101010000001101001111 is bit-interleaved across Disk 1, Disk 2, Disk 3, and Disk 4, with the Parity disk holding their exclusive-or; e.g., bits 01, 11, 01, and 10 on the data disks give parity 01.]

Page 17:

RAID 3: Small Block Reads

Given a large number of disks, say D = 12, enhance performance by constructing several parity groups, say 3.

With G (4) disks per group and D (say 8), the number of read requests supported by RAID 3 when compared with one disk is the number of groups (2). The number of groups is D/G.

Parity Group 1: Disk 1, Disk 2, Disk 3, Disk 4, Parity 1
Parity Group 2: Disk 5, Disk 6, Disk 7, Disk 8, Parity 2
…

Page 18:

Any Questions?

Page 19:

A Few Questions?

Assume one instance of RAID-1 organization. What are the values for: D G C nG

Page 20:

A Few Questions?

Assume one instance of RAID-1 organization. What are the values for: D=1 G=1 C=1 nG=1

Page 21:

A Few Questions?

Assume one instance of RAID-1 organization. What are the values for: D=1 G=1 C=1 nG=1

Are the availability characteristics of the following Level 3 RAID better than those of RAID 1?

Parity Group: Disk 1, Disk 2, Disk 3, Disk 4, Parity 1

Page 22:

RAID 4

Enhances performance of small reads/writes/read-modify-writes. How? Interleave data across disks at the granularity of a transfer unit; the minimum size is a sector.

Parity block ECC 1 is an exclusive-or of the bits in blocks a, b, c, and d.

Disk 1 Disk 2 Disk 3 Disk 4 Parity

Block a Block b Block c Block d ECC 1
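A minimal sketch of this exclusive-or parity in Python (the one-byte block contents and the helper below are made up for illustration; real transfer units are at least a sector):

    from functools import reduce
    from operator import xor

    def xor_blocks(*blocks):
        # Bytewise exclusive-or across equal-length blocks.
        return bytes(reduce(xor, column) for column in zip(*blocks))

    block_a, block_b = bytes([0x01]), bytes([0xA8])
    block_c, block_d = bytes([0x0D]), bytes([0x4F])

    ecc1 = xor_blocks(block_a, block_b, block_c, block_d)   # stored on the Parity disk

    # If the disk holding Block b fails, the block is rebuilt from the
    # surviving blocks plus the parity block:
    recovered_b = xor_blocks(block_a, block_c, block_d, ecc1)
    assert recovered_b == block_b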

Page 23:

RAID 4

A small read retrieves its block from one disk.

Now, 4 requests referencing blocks on different data disks may proceed in parallel.

When compared with 1 disk, the throughput of a D disk system is D times higher.

Disk 1 Disk 2 Disk 3 Disk 4 Parity

Block a Block b Block c Block d ECC 1

Page 24:

RAID 4: Failures (Cont…)

If Disk 2 fails, a small read for Block b retrieves blocks a, c, d, and ECC 1 from disks 1, 3, 4, and the Parity disk to compute the missing block. What is the throughput relative to one disk now?

Once Disk 2 is replaced with a new one, its content is constructed either eagerly or in a lazy manner. The system cannot be too lazy, because we want to minimize MTTR.

Disk 1 Disk 2 Disk 3 Disk 4 Parity

Block a Block b Block c Block d ECC 1

Page 25:

RAID 4: Failures (Cont…)

If the Parity disk fails, reads of data blocks may proceed as in the normal mode of operation.

Once the Parity disk is replaced, the content of the new Parity disk is constructed either eagerly or lazily.

Disk 1 Disk 2 Disk 3 Disk 4 Parity

Block a Block b Block c Block d ECC 1

Page 26:

RAID 4: Small Writes

Performance of small writes is improved.

To write Block b:
1. Read the old Block b and the old parity block ECC 1.
2. Compute the new parity using the old Block b, the new Block b, and the old parity: new parity = (old block xor new block) xor old parity.
3. Write the new Block b and the new parity block ECC 1.

A write requires 4 accesses: 2 reads and 2 writes.

Disk 1 Disk 2 Disk 3 Disk 4 Parity

Block a Block b Block c Block d ECC 1
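A small sketch of this parity-update rule in Python (made-up one-byte blocks again; the xor_blocks helper is the same hypothetical one used earlier):

    from functools import reduce
    from operator import xor

    def xor_blocks(*blocks):
        # Bytewise exclusive-or across equal-length blocks.
        return bytes(reduce(xor, column) for column in zip(*blocks))

    block_a, old_b = bytes([0x01]), bytes([0xA8])
    block_c, block_d = bytes([0x0D]), bytes([0x4F])
    old_parity = xor_blocks(block_a, old_b, block_c, block_d)   # ECC 1 before the write

    new_b = bytes([0x77])                                       # new contents for Block b

    # New parity = (old block xor new block) xor old parity:
    new_parity = xor_blocks(xor_blocks(old_b, new_b), old_parity)

    # Sanity check: identical to recomputing parity over all four (updated) blocks,
    # without ever reading Blocks a, c, or d from disk.
    assert new_parity == xor_blocks(block_a, new_b, block_c, block_d)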

Page 27:

RAID 4: Bottlenecks

For writes, the parity disk is a bottleneck. Two different writes, to Block b and Block g, must read ECC 1 and ECC 2, both of which reside on the Parity disk. A queue will form on the Parity disk.

Performance of small writes is the same as RAID 3: D/2G.

Disk 1 Disk 2 Disk 3 Disk 4 Parity

Block a Block b Block c Block d ECC 1

Block e Block f Block g Block h ECC 2

Page 28:

RAID 4: Summary

Page 29:

RAID 5: Resolve the Bottleneck

Distribute data and check blocks across all disks.

Disk 1    Disk 2    Disk 3    Disk 4    Disk 5
Block a   Block b   Block c   Block d   ECC 1
Block e   Block f   Block g   ECC 2     Block h
Block i   Block j   ECC 3     Block k   Block l
Block m   ECC 4     Block n   Block o   Block p
ECC 5     Block q   Block r   Block s   Block t
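A minimal sketch of one rotating-parity placement that produces the layout above (the left-rotating scheme and the function names are assumptions for illustration; real RAID 5 implementations use various placements):

    NUM_DISKS = 5

    def parity_disk(stripe):
        # Stripe 0 places its parity on the last disk; each following stripe
        # rotates the parity one disk to the left.
        return (NUM_DISKS - 1 - stripe) % NUM_DISKS

    def data_disks(stripe):
        return [d for d in range(NUM_DISKS) if d != parity_disk(stripe)]

    for stripe in range(5):
        print("stripe", stripe + 1,
              "-> parity on Disk", parity_disk(stripe) + 1,
              ", data on Disks", [d + 1 for d in data_disks(stripe)])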

Page 30:

RAID 5: Resolve the Bottleneck

Write of Blocks a and j may proceed in parallel now.

Disk 1    Disk 2    Disk 3    Disk 4    Disk 5
Block a   Block b   Block c   Block d   ECC 1
Block e   Block f   Block g   ECC 2     Block h
Block i   Block j   ECC 3     Block k   Block l
Block m   ECC 4     Block n   Block o   Block p
ECC 5     Block q   Block r   Block s   Block t

Page 31:

RAID 5: Read Performance

Check disks also service read requests. With D disks broken into nG groups, the number of parity disks is nG*C, where nG = D/G.

When compared with one disk, the throughput of a D disk system is D + CD/G times higher.

Disk 1    Disk 2    Disk 3    Disk 4    Disk 5
Block a   Block b   Block c   Block d   ECC 1
Block e   Block f   Block g   ECC 2     Block h
Block i   Block j   ECC 3     Block k   Block l

Page 32:

RAID 5: Write Performance

For writes, read the referenced block and its parity block. Compute the new parity block. Write the new data block and its parity block. Continue to use the parity disk.

With D disks broken into nG groups, the number of parity disks is nG*C, where nG = D/G.

When compared with one disk, the throughput of a D disk system is D/4 + (CD/G)/4 times higher.

Disk 1    Disk 2    Disk 3    Disk 4    Disk 5
Block a   Block b   Block c   Block d   ECC 1
Block e   Block f   Block g   ECC 2     Block h
Block i   Block j   ECC 3     Block k   Block l

Page 33:

RAID 5: R-M-W Performance

For R-M-W, the read and write of the data block come for free: the referenced block is already retrieved. One extra disk I/O must be performed to read the parity block. Compute the new parity block. Write the new data block and its parity block. Continue to use the parity disk.

With D disks broken into nG groups, the number of parity disks is nG*C, where nG = D/G.

When compared with one disk, the throughput of a D disk system is D/2 + (CD/G)/2 times higher.

Disk 1    Disk 2    Disk 3    Disk 4    Disk 5
Block a   Block b   Block c   Block d   ECC 1
Block e   Block f   Block g   ECC 2     Block h
Block i   Block j   ECC 3     Block k   Block l
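A minimal sketch of these three throughput multipliers in Python, plugging in the example configuration used earlier in the lecture (D = 8, G = 4, C = 1; illustrative only):

    D, G, C = 8, 4, 1                 # data disks, data disks per group, check disks per group
    nG = D // G                       # number of groups (2)

    reads  = D + C * D / G            # every disk, including the nG*C check disks, serves reads
    writes = D / 4 + (C * D / G) / 4  # each small write costs 4 disk accesses (2 reads, 2 writes)
    rmw    = D / 2 + (C * D / G) / 2  # R-M-W pays only the 2 extra parity accesses

    print(reads, writes, rmw)         # 10.0, 2.5, 5.0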

Page 34:

RAID 5: Summary

Page 35:

RAID 5: Summary

Significant improvement in the performance of small writes/R-M-W:

Page 36:

RAID Summary

If your workload consists of small R-M-W operations, which RAID would you choose?