Disk Arrays
COEN 180
Large Storage Systems
A collection of disks used to store large amounts of data.
Performance advantage: each drive can satisfy only so many I/Os per second; data spread across more drives is more accessible.
JBOD: Just a Bunch Of Disks
Large Storage Systems
Principal difficulty: reliability. Data needs to be stored redundantly:
Mirroring / replication: simple; expensive (double, triple, … storage costs); good performance.
Erasure-correcting codes: complex; save storage; moderate performance.
Large Storage Systems
Mirrored Disks
Used by Tandem (1970 – 1997, bought by Compaq) in its NonStop architecture.
Used redundancy (CPU, storage) for fail-over capability.
Data is replicated on both drives.
Performance:
Writes: as fast as in the single-disk model.
Reads: slightly faster, since we can serve the read from the drive with the best expected service time.
Disk Performance Modeling Basics
Service Time: Time to satisfy a request if system is otherwise idle.
Response Time: Time to satisfy a request at a given system load. Response time = service time + waiting time
Utilization: fraction of time the system is busy.
Disk Performance Modeling Basics
M/M/1 queue: single server. Assume Poisson arrivals and exponential service times.
Arrival rate λ, service time S. Utilization U = λS (Little's law). Response time R.
Determine R by: R = S + U·R, hence
R = S/(1 − U) = S/(1 − λS)
[Plot: response time R versus utilization U for S = 1 (hence U = λ); R = 1/(1 − U) grows without bound as U approaches 1.]
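The R = S/(1 − U) relation is easy to sketch directly (a minimal illustration, not from the slides; the function name is my own):

```python
# M/M/1 response time: R = S + U*R  =>  R = S/(1 - U), with U = lam*S.
def response_time(S, lam):
    """Response time for service time S and Poisson arrival rate lam."""
    U = lam * S                  # utilization, by Little's law
    if U >= 1:
        raise ValueError("unstable queue: utilization >= 1")
    return S / (1 - U)

# With S = 1 (so U = lam), the response time blows up as U approaches 1:
for U in (0.2, 0.4, 0.6, 0.8):
    print(f"U={U}: R={response_time(1.0, U):.2f}")
```

Note how R equals twice the service time already at 50% utilization, which foreshadows the 50%–80% rule of thumb given later.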
Disk Performance Modeling Basics
Need to determine service time of disk request.
Service time = seek time + latency + transfer time
Industrial (but wrong) determination: seek time = time to travel one third of the disk. Why one third?
Disk Performance Modeling Basics
Assume that head position is randomly on any track.
Assume that target track is another random track.
Given x ∈ [0,1], calculate D(x) = the distance of a random point in [0,1] from x.
Disk Performance Modeling Basics
Given x ∈ [0,1], calculate D(x), the average distance of a random point in [0,1] from x:

D(x) = ∫₀¹ |x − y| dy = ∫₀ˣ (x − y) dy + ∫ₓ¹ (y − x) dy = x²/2 + (1 − x)²/2 = x² − x + 1/2

[Plot: D(x) on [0,1], falling from 1/2 at the endpoints to its minimum 1/4 at x = 1/2.]
Disk Performance Modeling Basics
Now calculate the average distance from a random point to a random point in [0,1]
D̄ = ∫₀¹ D(x) dx = ∫₀¹ (x² − x + 1/2) dx = [x³/3 − x²/2 + x/2]₀¹ = 1/3
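The 1/3 result is easy to check numerically. The following Monte Carlo sketch (my own illustration) samples random head and target positions on the normalized track range [0,1]:

```python
import random

def mean_seek_distance(trials=200_000, seed=1):
    """Average distance between two uniform random points in [0, 1]."""
    rng = random.Random(seed)
    return sum(abs(rng.random() - rng.random()) for _ in range(trials)) / trials

print(mean_seek_distance())  # close to 1/3
```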
Disk Performance Modeling Basics
Is Average Seek Time = Seek Time for Average Distance?
NO: seek time does not depend linearly on seek distance. A seek consists of:
acceleration, cruising (if the seek distance is long), braking, and exact positioning.
Disk Performance Modeling Basics
Is Average Seek Time = Seek Time for Average Distance?
Practical measurements suggest that seek time depends on seek distance roughly as the square root of the distance.
[Plot: seek time versus seek distance, growing like a square root.]
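A square-root seek model can be sketched as follows; the coefficients a (fixed overhead) and b (scale) are illustrative values of my own, not measurements from the lecture:

```python
import math

# Hypothetical seek-time curve: fixed overhead plus a square-root term.
# a = settling overhead (msec), b = scale factor -- illustrative only.
def seek_time(distance, a=0.5, b=0.3):
    return 0.0 if distance == 0 else a + b * math.sqrt(distance)

# Doubling the seek distance costs far less than double the time:
print(seek_time(100), seek_time(200))
```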
Disk Performance Modeling Basics
Rules of thumb: keep the utilization of disks between 50% and 80%.
Disk Arrays
Dealing with reliability RAID
Redundant array of inexpensive (independent) disks
RAID Levels
RAID Level 0: JBOD (striping)
RAID Level 1: Mirroring
RAID Level 2: Encodes symbols (bytes) with a Hamming code and stores each bit of a symbol on a different disk. Not used in practice.
Disk Arrays
Dealing with Reliability: RAID Levels
RAID Level 3: Encodes symbols (bytes) with the simple parity code. Breaks a file up into n stripes, calculates a parity stripe, and stores all n + 1 stripes on n + 1 disks.
Disk Arrays
Dealing with Reliability: RAID Levels
RAID Level 4: Maintains n data drives. Files are stored completely on one drive, or perhaps in stripes if files become very large. An additional drive stores the byte-wise parity of the data drives.
[Figure: Parity | Data | Data | Data]
Disk Arrays
Level 4 RAID: uneven load on the parity drive versus the data drives.
Disk Arrays
Dealing with Reliability: RAID Level 5
No dedicated parity disk. Data is stored in blocks. Blocks in parallel positions on the disks form a reliability stripe; one block in each reliability stripe is the parity of the others.
No performance bottleneck.
Disk Arrays
Dealing with Reliability: RAID Level 6
Like RAID Level 5, but every stripe has two parity blocks.
Lower write performance; 2-failure resilience.
RAID Level 7: proprietary name for a RAID Level 3 with lots of caching. (Marketing bogus.)
Disk Arrays
Disk Array Operations
Reads: directly from the data in RAID Levels 3–6.
Writes:
Large writes: write all blocks in a single reliability stripe. Calculate the parity from the data and write it.
Small writes: need to maintain parity.
Option 1: Write the data, then read all other blocks in the stripe and recalculate the parity.
Option 2: Read the old data, then overwrite it. Calculate the difference (XOR) between the old and new data. Then read the old parity, XOR it with the result of the previous operation, and overwrite the parity block with the result.
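Option 2 can be sketched with byte-wise XOR (a minimal illustration of my own; function names are not from the slides):

```python
def xor_blocks(a, b):
    """Byte-wise XOR of two equal-sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def small_write_parity(old_data, new_data, old_parity):
    """New parity = old parity XOR (old data XOR new data); only two disks touched."""
    return xor_blocks(old_parity, xor_blocks(old_data, new_data))

# Consistency check against recomputing parity over the whole stripe:
d0, d1, d2 = b"\x0f\x0f", b"\xf0\xf0", b"\x33\x33"
parity = xor_blocks(xor_blocks(d0, d1), d2)
new_d1 = b"\xaa\xaa"
new_parity = small_write_parity(d1, new_d1, parity)
assert new_parity == xor_blocks(xor_blocks(d0, new_d1), d2)
```

The point of Option 2 is that the update touches only the data disk and the parity disk, regardless of how many disks the stripe spans.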
Disk Arrays
Disk Array Operations: Reconstruction (RAID Levels 4–5):
Systematically: reconstruct only the lost data. Read all surviving blocks in the reliability stripe and calculate their parity; this is the lost data block. Write the reconstructed data block back in place.
Out-of-order reconstruction for data that is being read.
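Reconstruction is just the parity (XOR) of the surviving blocks, as this sketch of my own shows:

```python
from functools import reduce

def reconstruct(surviving):
    """Byte-wise XOR of all surviving blocks of a reliability stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*surviving))

data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
parity = reconstruct(data)                    # the stripe's parity block
lost = data.pop(1)                            # one data disk fails
assert reconstruct(data + [parity]) == lost   # lost block recovered
```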
Disk Arrays Performance Analysis
Assume that read and write service times are the same: seek + latency + transfer.
A small-write operation involves a read-modify-write: seek + latency + transfer for the read, then (almost) a full rotation and a second transfer to write back. About twice as long as the read/write service time.
Disk Arrays
Performance Analysis: Level 4 RAID
Offered read load r, offered write load w, n disks.
Utilization at a data disk: r·S/(n − 1) + 2w·S/(n − 1)
Utilization at the parity disk: 2w·S
Equal utilization only if r = 2(n − 2)·w
[Plot: utilization versus offered load for Level 4 RAID.]
Disk Arrays
Performance Analysis Level 4 RAID
Offered load λ. Assume only small writes. Assume a read ratio of ρ.
Utilization at a data disk: ρλS/n
Utilization at the parity disk: 2(1 − ρ)λS
[Plot: utilization versus offered load (IO/sec) for the parity disk and a data disk. Parameters: 4+1 layout, 70% reads, service time 10 msec.]
Disk Arrays
Performance Analysis: RAID Level 5. Offered load λ, read ratio ρ, n disks.
Read load per disk: ρλS/n
Write load per disk: 4(1 − ρ)λS/n. Every write leads to two read-modify-write operations.
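The RAID 4 parity-disk bottleneck versus RAID 5's spread-out load can be compared numerically. This sketch of my own plugs in the slide's parameters (4+1 layout, 70% reads, service time 10 msec):

```python
S, n, rho = 0.010, 5, 0.7   # service time (s), number of disks, read ratio

def raid4_utilization(lam):
    """(data-disk, parity-disk) utilization: the parity disk sees every small write."""
    return rho * lam * S / n, 2 * (1 - rho) * lam * S

def raid5_utilization(lam):
    """Per-disk utilization: reads plus two read-modify-writes per write, spread over n disks."""
    return rho * lam * S / n + 4 * (1 - rho) * lam * S / n

lam = 100.0  # offered load, IO/sec
print("RAID 4 (data, parity):", raid4_utilization(lam))
print("RAID 5 per disk:", raid5_utilization(lam))
# The RAID 4 parity disk saturates first: U = 1 at lam = 1/(2*(1-rho)*S) ~ 167 IO/sec.
```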
[Plots: utilization versus offered load for Level 4 RAID and RAID Level 5.]
Disk Arrays
Level 4 RAID vs Level 5 RAID
[Plot: utilization versus offered load for the Level 4 RAID parity drive and data drives, for RAID Level 5, and without a parity disk (JBOD). Parameters: 4+1 layout, 70% reads, service time 10 msec.]
Disk Arrays
Performance: small writes are expensive.
Parity logging (Daniel Stodolsky, Garth Gibson, Mark Holland):
Write operation: read the old data, write the new data, send the XOR to a parity log file.
Whenever the parity log file becomes too big, process it by updating the parity information.
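A toy sketch of the idea (class and names are my own illustration, not the authors' implementation):

```python
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

class ParityLog:
    """Defer parity updates: log XOR deltas, apply them in one batched pass."""
    def __init__(self, parity, limit=8):
        self.parity, self.log, self.limit = parity, [], limit

    def small_write(self, old_data, new_data):
        self.log.append(xor(old_data, new_data))  # no parity I/O yet
        if len(self.log) >= self.limit:           # log too big: process it
            self.flush()

    def flush(self):
        for delta in self.log:                    # batched parity update
            self.parity = xor(self.parity, delta)
        self.log.clear()

plog = ParityLog(xor(b"\x01", b"\x02"))
plog.small_write(b"\x01", b"\xff")   # data block changes 0x01 -> 0xff
plog.flush()
assert plog.parity == xor(b"\xff", b"\x02")
```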
Disk Arrays
Reliability is accurately given by the probability of failure at every moment in time.
[Plot: reliability (probability of survival) versus time for two systems.]
Disk Arrays
Reliability is often given by the Mean Time To Data Loss (MTTDL). Warning:
MTTDL numbers can be deceiving.
In the plot above, the red line is more reliable during the design life, but has a lower MTTDL.
Disk Arrays
Use a Markov model to model the system in its various states.
States describe the system.
Assumes constant rates of transition.
Transitions correspond to: component failure, component repair.
Disk Arrays
One component system
[Diagram: Initial State → (rate λ) → Failure State (absorbing)]
MTTDL = MTTF = 1/λ
Disk Arrays
Two component system without repair
[Diagram: Initial State (2 components working) → (rate 2λ) → (1 component working, one failed) → (rate λ) → Failure State (absorbing)]
Disk Arrays
Two component system with repair
[Diagram: Initial State (2 components working) → (rate 2λ) → (1 component working, one failed) → (rate λ) → Failure State (absorbing); repair at rate ρ returns from the one-failed state to the initial state]
Disk Arrays
How to calculate the MTTF:
Start with the original Markov model.
Remove the failure state.
Replace the transition(s) to the failure state with failure transitions to the initial state. This models a meta-system in which a failed system is immediately replaced with a new one.
Now calculate the steady-state solution of the Markov model; it has typically become ergodic.
Use this to calculate the average rate at which a failure transition is taken. This gives the MTTF.
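For the two-component system with repair, the recipe reduces to a closed form; this sketch of my own encodes it:

```python
# Recipe applied to the two-component system with repair (rates lam, rho):
# redirect the failure transition back to the initial state, solve the
# balance equations, and invert the loss rate.
def mttf_two_component(lam, rho):
    # Balance: 2*lam*x = (lam + rho)*y with x + y = 1
    y = 2 * lam / (3 * lam + rho)   # probability of the "one failed" state
    loss_rate = lam * y             # rate at which the failure transition is taken
    return 1 / loss_rate            # = (3*lam + rho) / (2 * lam**2)

print(mttf_two_component(0.001, 0.1))   # repair boosts MTTF considerably
```

Setting rho = 0 recovers the no-repair result 1.5·(1/λ).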
Disk Arrays
One component system
[Diagram: Initial State with the failure transition, rate λ, redirected back to itself]
The system is in the initial state all the time.
The failure transition is taken at rate λ.
“Loss rate” L = λ.
MTTDL = 1/L = 1/λ
Disk Arrays
Two component system without repair
[Diagram: Initial State (2 components working, probability x) → (2λ) → (1 component working, one failed, probability y) → (λ) → back to the initial state]
Steady-state solution
Let x be the probability to be in state 2, y the probability to be in state 1.
Then:
Inflow into state 2 = outflow from state 2:
λy = 2λx, i.e., 2x = y
Total sum of probabilities is 1:
x+y = 1.
Disk Arrays
Two component system without repair
[Diagram as before: 2 components working (x) → (2λ) → 1 working, one failed (y) → (λ) → back to the initial state]
Steady-state solution
2x = y
x+y = 1.
Solution is:
x = 1/3, y = 2/3.
Loss rate is L = (2/3)λ.
MTTF = 1/L = 1.5·(1/λ)
(1.5 times better than the single-component system).
Disk Arrays
Two component system with repair
[Diagram: 2 components working (x) → (2λ) → 1 working, one failed (y); failure at rate λ redirected back to the initial state, repair at rate ρ also back to the initial state]
Steady-state solution:
2λx = (λ + ρ)y, x + y = 1
x = (λ + ρ)/(3λ + ρ), y = 2λ/(3λ + ρ)
Loss rate L = λ·y = 2λ²/(3λ + ρ)
MTTF = 1/L = (3λ + ρ)/(2λ²)
Disk Arrays
RAID Level 4/5 Reliability
[Diagram: Initial State (n disks) → (rate nλ) → (n − 1 disks) → (rate (n − 1)λ) → Failure State (absorbing); repair at rate ρ returns from the (n − 1)-disk state to the initial state]
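Applying the same meta-system recipe to this model yields a closed form for the MTTDL. The derivation is mine (following the recipe above), and the numbers in the example are illustrative:

```python
def raid45_mttdl(n, lam, rho):
    """MTTDL of an n-disk RAID 4/5: ((2n-1)*lam + rho) / (n*(n-1)*lam**2)."""
    return ((2 * n - 1) * lam + rho) / (n * (n - 1) * lam ** 2)

# Illustrative numbers: 5 disks, disk MTTF 100,000 h, mean repair time 24 h.
lam = 1 / 100_000          # failures per hour
rho = 1 / 24               # repairs per hour
print(raid45_mttdl(5, lam, rho) / (24 * 365), "years")
```

For n = 2 and rho = 0 this reduces to the two-component result 1.5·(1/λ).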
Disk Arrays
RAID Level 6 Reliability
[Diagram: Initial State (n disks) → (rate nλ) → (n − 1 disks) → (rate (n − 1)λ) → (n − 2 disks) → (rate (n − 2)λ) → Failure State (absorbing); repair transitions return at rate ρ from the (n − 1)-disk state and at rate 2ρ from the (n − 2)-disk state]
Disk Arrays
Sparing
Create more resilience by adding a hot spare. Failover to the hot spare reconstructs the contents of the lost disk and places them on the spare disk.
Distributed sparing (Menon et al.): distribute the spare space throughout the disk array.