Reliability vs Availability


Transcript of Reliability vs Availability

Page 1: Reliability vs Availability

Lecture 4 1

Reliability vs Availability

• Reliability: Is anything broken?

• Availability: Is the system still available to the user?

Page 2: Reliability vs Availability

RAID

• Redundant Array of Inexpensive Disks
  – Higher throughput
  – Ability to recover from failures
• Disk Array (Availability reduced to 1/N)
• Data Integrity
• Striping
• MTTR (mean time to repair)
• MTTF (mean time to failure)
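The slide lists MTTF and MTTR without relating them. A common steady-state definition is availability = MTTF / (MTTF + MTTR). The sketch below assumes that definition. The function name is illustrative, and the 24-hour repair time is a hypothetical value, not one from the lecture.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is usable: MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# 50,000 hours is the small-disk MTTF quoted later in the lecture;
# the 24-hour repair time is an assumed value for illustration only.
single_disk = availability(mttf_hours=50_000, mttr_hours=24)
print(f"single-disk availability: {single_disk:.5f}")
```

Shortening the repair time raises availability even when MTTF (how often things break) is unchanged, which is one way to read the reliability versus availability distinction.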

Page 3: Reliability vs Availability

Manufacturing Advantages of Disk Arrays

[Figure: Disk Product Families, from Low End to High End]

• Conventional: 4 disk designs (14", 10", 5.25", 3.5")
• Disk Array: 1 disk design (3.5")

Page 4: Reliability vs Availability

Replace Small # of Large Disks with Large # of Small Disks! (1988 Disks)

                 IBM 3390 (K)    IBM 3.5" 0061    x70
Data Capacity    20 GBytes       320 MBytes       23 GBytes
Volume           97 cu. ft.      0.1 cu. ft.      11 cu. ft.
Power            3 KW            11 W             1 KW
Data Rate        15 MB/s         1.5 MB/s         120 MB/s
I/O Rate         600 I/Os/s      55 I/Os/s        3900 I/Os/s
MTTF             250 KHrs        50 KHrs          ??? Hrs
Cost             $250K           $2K              $150K

Disk Arrays have potential for
• large data and I/O rates
• high MB per cu. ft., high MB per KW
• reliability?

Page 5: Reliability vs Availability

Array Reliability

• Reliability of N disks = Reliability of 1 disk / N
  – 50,000 hours / 70 disks ≈ 700 hours
  – Disk system MTTF: drops from 6 years to 1 month!
• Arrays (without redundancy) too unreliable to be useful!

Hot spares support reconstruction in parallel with access: very high media availability can be achieved
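The arithmetic on this slide can be checked with a short sketch. It assumes independent disk failures at a constant rate and that the loss of any one disk loses the array (no redundancy). The function name is illustrative.

```python
HOURS_PER_YEAR = 24 * 365
HOURS_PER_MONTH = 24 * 30  # rough month, enough for an order-of-magnitude check


def array_mttf(disk_mttf_hours: float, num_disks: int) -> float:
    """MTTF of a non-redundant array: single-disk MTTF divided by N."""
    if num_disks < 1:
        raise ValueError("an array needs at least one disk")
    return disk_mttf_hours / num_disks


one_disk = 50_000                 # hours, from the slide
seventy = array_mttf(one_disk, 70)

print(f"1 disk:   {one_disk / HOURS_PER_YEAR:.1f} years")
print(f"70 disks: {seventy:.0f} hours, about {seventy / HOURS_PER_MONTH:.1f} months")
```

The exact quotient is about 714 hours, which the slide rounds to 700; 50,000 hours is a little under 6 years.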

Page 6: Reliability vs Availability

Redundant Arrays of Disks

• Files are "striped" across multiple spindles
• Redundancy yields high data availability
  – Disks will fail
  – Contents reconstructed from data redundantly stored in the array
    – Capacity penalty to store it
    – Bandwidth penalty to update

Techniques:
• Mirroring/Shadowing (high capacity cost)
• Horizontal Hamming Codes (overkill)
• Parity & Reed-Solomon Codes
• Failure Prediction (no capacity overhead!)
  – VaxSimPlus: technique is controversial
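To make "striped across multiple spindles" concrete, here is a minimal sketch of one possible mapping. It assumes simple round-robin placement of fixed-size blocks with no redundancy. The function name and the 4-disk array are hypothetical, not taken from the lecture.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Assumes round-robin striping: consecutive logical blocks land on
    consecutive disks, wrapping around after the last disk.
    """
    if logical_block < 0 or num_disks < 1:
        raise ValueError("block must be >= 0 and num_disks >= 1")
    return logical_block % num_disks, logical_block // num_disks


# Eight consecutive blocks of one file spread over a hypothetical 4-disk array.
for block in range(8):
    disk, offset = locate_block(block, num_disks=4)
    print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Because neighbouring blocks sit on different disks, a large sequential transfer can keep all spindles busy at once, which is where the throughput gain comes from.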

Page 7: Reliability vs Availability

Redundant Arrays of Disks
RAID 1: Disk Mirroring/Shadowing

• Each disk is fully duplicated onto its "shadow"
  – Very high availability can be achieved
• Bandwidth sacrifice on write: Logical write = two physical writes
• Reads may be optimized
• Most expensive solution: 100% capacity overhead

Targeted for high I/O rate, high availability environments

[Figure: a disk and its shadow form a recovery group]

Page 8: Reliability vs Availability


Redundant Arrays of Disks RAID 3: Parity Disk

[Figure: a logical record (10010011 11001101 10010011 . . .) striped into physical records 10010011, 11001101, 10010011, 00110000 across the disks, with a parity disk P]

• Parity computed across recovery group to protect against hard disk failures
  – 33% capacity cost for parity in this configuration
  – wider arrays reduce capacity costs, decrease expected availability, increase reconstruction time
• Arms logically synchronized, spindles rotationally synchronized
  – logically a single high capacity, high transfer rate disk

Targeted for high bandwidth applications: Scientific, Image Processing
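A minimal sketch of parity protection, assuming byte-wise XOR parity over a recovery group of three data disks plus one parity disk. The byte values and function names are illustrative and are not the contents of the slide's figure.

```python
from functools import reduce


def xor_parity(blocks: list[int]) -> int:
    """XOR all blocks together; used both to build parity and to rebuild a lost block."""
    return reduce(lambda a, b: a ^ b, blocks, 0)


def parity_overhead(num_data_disks: int) -> float:
    """Parity capacity cost relative to data capacity: one parity disk per N data disks."""
    return 1 / num_data_disks


data = [0b10010011, 0b11001101, 0b00110000]   # illustrative bytes, one per data disk
parity = xor_parity(data)

# Simulate losing disk 1: XOR of the survivors and the parity restores its contents.
survivors = [data[0], data[2], parity]
rebuilt = xor_parity(survivors)

print(f"parity  = {parity:08b}")
print(f"rebuilt = {rebuilt:08b} (original {data[1]:08b})")
print(f"overhead with 3 data disks: {parity_overhead(3):.0%}")
print(f"overhead with 8 data disks: {parity_overhead(8):.1%}")
```

With three data disks the overhead works out to the 33% quoted on the slide, and widening the group lowers it, at the price of more disks to read during reconstruction.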

Page 9: Reliability vs Availability

Redundant Arrays of Disks
RAID 5+: High I/O Rate Parity

• A logical write becomes four physical I/Os
• Independent writes possible because of interleaved parity
• Reed-Solomon Codes ("Q") for protection during reconstruction

[Figure: stripe layout. Each row is a stripe and each entry a stripe unit; columns are disks, and logical disk addresses increase from top to bottom.]

D0    D1    D2    D3    P
D4    D5    D6    P     D7
D8    D9    P     D10   D11
D12   P     D13   D14   D15
P     D16   D17   D18   D19
D20   D21   D22   D23   P
.     .     .     .     .
(Disk Columns)

Targeted for mixed applications
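The rotating parity placement in the layout above can be reproduced with a short sketch. It assumes the parity unit starts in the last column and moves one column to the left on each successive stripe, with data units filling the remaining columns left to right. The function name is illustrative.

```python
def raid5_layout(num_disks: int, num_stripes: int) -> list[list[str]]:
    """Return one row per stripe, with 'P' marking the parity stripe unit."""
    rows = []
    next_block = 0
    for stripe in range(num_stripes):
        # Parity starts in the last column and rotates left, wrapping around.
        parity_col = (num_disks - 1 - stripe) % num_disks
        row = []
        for col in range(num_disks):
            if col == parity_col:
                row.append("P")
            else:
                row.append(f"D{next_block}")
                next_block += 1
        rows.append(row)
    return rows


for row in raid5_layout(num_disks=5, num_stripes=6):
    print("  ".join(f"{unit:<4}" for unit in row))
```

Since no single disk holds all the parity, writes to different stripes usually touch different parity disks and can proceed independently.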

Page 10: Reliability vs Availability

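Problems of Disk Arrays: Small Writes

RAID-5: Small Write Algorithm
1 Logical Write = 2 Physical Reads + 2 Physical Writes

[Figure: updating D0 to D0' in the stripe D0 D1 D2 D3 P]

(1. Read) old data D0
(2. Read) old parity P
XOR the new data D0' with the old data, then XOR the result with the old parity to get the new parity P'
(3. Write) new data D0'
(4. Write) new parity P'

Resulting stripe: D0' D1 D2 D3 P'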
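The small-write algorithm can be sketched as follows, assuming XOR parity over integer-valued blocks. The stripe contents and helper names are hypothetical. The check at the end compares the shortcut against recomputing parity from the whole stripe.

```python
from functools import reduce


def full_parity(blocks: list[int]) -> int:
    """Parity recomputed from every data block in the stripe."""
    return reduce(lambda a, b: a ^ b, blocks, 0)


def small_write_parity(old_data: int, old_parity: int, new_data: int) -> int:
    """New parity from only the old data, old parity, and new data (two XORs)."""
    return old_parity ^ old_data ^ new_data


stripe = [0x93, 0xCD, 0x30, 0x5A]      # D0..D3, illustrative values
parity = full_parity(stripe)

new_d0 = 0x0F
old_d0 = stripe[0]                      # 1. read old data
old_parity = parity                     # 2. read old parity
new_parity = small_write_parity(old_d0, old_parity, new_d0)
stripe[0] = new_d0                      # 3. write new data
parity = new_parity                     # 4. write new parity

print(parity == full_parity(stripe))    # shortcut agrees with a full recompute
```

The update touches only two disks regardless of how wide the stripe is, which is why the cost is fixed at two reads and two writes.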

Page 11: Reliability vs Availability

Subsystem Organization

[Figure: host, host adapter, array controller, and four single board disk controllers]

• host adapter: manages interface to host, DMA
• array controller: control, buffering, parity logic
• single board disk controller: physical device control
  – often piggy-backed in small format devices

striping software off-loaded from host to array controller
• no applications modifications
• no reduction of host performance

Summary: A Little Queuing Theory

• Queuing models assume state of equilibrium: input rate = output rate
• Notation:
  r      average number of arriving customers/second
  Tser   average time to service a customer (traditionally µ = 1/Tser)
  u      server utilization (0..1): u = r x Tser
  Tq     average time/customer in queue
  Tsys   average time/customer in system: Tsys = Tq + Tser
  Lq     average length of queue: Lq = r x Tq
  Lsys   average length of system: Lsys = r x Tsys
• Little's Law: Length_system = rate x Time_system
  (Mean number of customers = arrival rate x mean time in system)

[Figure: Proc, IOC, Device; the System consists of the Queue plus the server]
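The notation above can be exercised with a small sketch. It takes the arrival rate, service time, and queue time as given inputs and applies only the relations listed on the slide. The function name and the sample numbers are hypothetical.

```python
def queue_metrics(r: float, t_ser: float, t_q: float) -> dict[str, float]:
    """Derive u, Tsys, Lq and Lsys from arrival rate r, service time Tser, queue time Tq."""
    u = r * t_ser                       # server utilization
    if not 0 <= u < 1:
        # Assumption: equilibrium needs arrivals not to outpace the server.
        raise ValueError("utilization must be in [0, 1) for equilibrium")
    t_sys = t_q + t_ser                 # time in system = queue time + service time
    return {
        "u": u,
        "Tsys": t_sys,
        "Lq": r * t_q,                  # Little's Law applied to the queue
        "Lsys": r * t_sys,              # Little's Law applied to the whole system
    }


# Hypothetical disk: 10 requests/s, 20 ms service time, 5 ms average wait in queue.
for name, value in queue_metrics(r=10, t_ser=0.020, t_q=0.005).items():
    print(f"{name:5s} = {value:.3f}")
```

Note that Lsys - Lq equals u: on average, the part of the system population that is not waiting is the one request being served.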

Page 13: Reliability vs Availability

Summary: Redundant Arrays of Disks (RAID) Techniques

• Disk Mirroring, Shadowing (RAID 1)
  – Each disk is fully duplicated onto its "shadow"
  – Logical write = two physical writes
  – 100% capacity overhead
• Parity Data Bandwidth Array (RAID 3)
  – Parity computed horizontally
  – Logically a single high data bw disk
• High I/O Rate Parity Array (RAID 5)
  – Interleaved parity blocks
  – Independent reads and writes
  – Logical write = 2 reads + 2 writes
  – Parity + Reed-Solomon codes

[Figure: example disk contents for the techniques above: 10010011, 11001101, 10010011, 00110010, 10010011, 10010011]

Page 14: Reliability vs Availability


Homework #3

• Problem 6.5 and 6.6

• Due date: June 26, 2001

Page 15: Reliability vs Availability


Assignment #1

• Explain RAID 0-6 in greater detail.

• Due date: July 3, 2001