Storage Networks: How to Handle Heterogeneity
Bálint Miklós, January 24th, 2005, ETH Zürich
External Memory Algorithms and Data Structures
What Are Storage Networks?
• Persistent storage: hard disks
• Device capacity doubles every 14-18 months, but data grows faster
• Use many disks
• Need to protect, access, and manage the ever-growing volume of storage assets
Storage Networks – Motivation
Hardware Failures
• power supply: 6%
• FS error: 6%
• disk subsystem: 10%
• disk error: 10%
• disk failure: 42%
• others: 26%
Trace collected from the Internet Archive (March 2003), courtesy of David Pease (UCSC) & Kelly Gottlib
Heterogeneous Storage Networks
• To increase system speed and capacity, add new disks
• New disks usually have different characteristics than the older disks in the system.
• Many modern storage systems are distributed: Ethernet, FibreChannel.
• How to exploit this heterogeneity?
Goal
• Storage system requirements:
  – space and access balance
  – availability
  – resource efficiency
  – access efficiency
  – heterogeneity
  – adaptivity
  – locality
• Very difficult to meet ALL requirements.
Outline
• Model
• AdaptRaid
• HERA
• RIO
• Conclusions
What Model to Use?
• Why not use the layout of external memory algorithms?
  – We need a solution for all the (sub)problems
  – One has to bypass the operating system, which is a complex task
• Therefore a different abstraction level:
  – A set of disks characterized by capacity and bandwidth
  – The connection network is unrestricted, e.g. SCSI, P2P
Storage Networks – Model
Model assumptions
• Disk access patterns are generated by the file system (OS)
• These patterns are difficult to predict and can change over time
• We assume a uniform access pattern; the goal is to distribute data evenly
Heterogeneous Storage Networks
• Straightforward solution:
  – Cluster disks according to their characteristics
  – We can have many clusters
  – Easy to extend
  – New, faster disks do not improve overall response time
• Randomized batched solution [Sanders]:
  – Map data randomly to disks
  – Schedule a batch of accesses by solving a network flow problem
  – Infeasible for large systems: many flow problems have to be solved
  – Batch-like behavior is a disadvantage
Storage Networks – Heterogeneity
RAID
• Redundant Array of Inexpensive Disks
• RAID level 0:
  – Stripes data across a set of disks
• RAID level 5:
  – Adds a redundancy block per stripe
  – Distributes redundancy information evenly over all disks
Storage Networks – AdaptRaid
www.raidrecoveryguide.com
AdaptRaid 0
• Basic idea:
  – Load each disk depending on its characteristics
• First solution:
  – Use all disks as in RAID 0 until the smallest disk is full
  – Then discard the full disks and continue the same way
  – Distribution continues until all disks are full
• The lower portion of the address space has better access times
• Extends the RAID layout for heterogeneity [Cortes, Labarta]
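The first solution can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the authors' implementation: the function name `adaptraid0_layout` is hypothetical, and each disk is described only by its capacity in blocks.

```python
def adaptraid0_layout(capacities):
    """Return a list of stripes; each stripe is a list of (disk, block_on_disk).

    capacities[d] is the number of blocks on disk d (hypothetical input format).
    """
    used = [0] * len(capacities)
    stripes = []
    while True:
        # Only disks that still have free blocks take part in the next stripe.
        active = [d for d, cap in enumerate(capacities) if used[d] < cap]
        if not active:
            break
        stripes.append([(d, used[d]) for d in active])
        for d in active:
            used[d] += 1
    return stripes


layout = adaptraid0_layout([2, 4, 4])
for number, stripe in enumerate(layout):
    print(number, stripe)
```

With capacities 2, 4 and 4, the first two stripes span all three disks and the last two span only the two larger disks, which is why the lower part of the address space sees more parallelism.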
AdaptRaid 0 – Reducing Variance
• Reduce variance:
  – The algorithm temporarily assumes that the disks are smaller
  – The resulting pattern is repeated several times
• Stripes in a Pattern (SIP) defines the size of the pattern and the degree of variance
• Each disk holds the same number of blocks as before
AdaptRaid 5
• Similar idea, but one block per stripe is used for parity information
• Difference: a write implies updating the parity
• If not all the blocks in a stripe are written, the write needs an additional read: the small-write problem
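The small-write problem follows from how XOR parity is maintained. The sketch below uses hypothetical helper names and tiny two-byte blocks. It shows that a partial write must first read the old data block and the old parity block, and that the result matches a full recomputation of the parity.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe_parity(data_blocks):
    """Parity of a complete stripe: XOR of all its data blocks."""
    parity = bytes(len(data_blocks[0]))  # all-zero block
    for block in data_blocks:
        parity = xor_blocks(parity, block)
    return parity


def small_write_parity(old_parity, old_data, new_data):
    """Parity after overwriting one block.

    Both old_parity and old_data have to be read from disk first;
    these are the additional reads of the small-write problem.
    """
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = full_stripe_parity(stripe)

new_block = b"\xaa\x55"
updated_parity = small_write_parity(parity, stripe[1], new_block)
stripe[1] = new_block
print(updated_parity == full_stripe_parity(stripe))
```

When a whole stripe is written at once, the parity can be computed from the new data alone and no extra read is needed.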
AdaptRaid 5 – Small-write Solution
• Reference stripe: the stripe the OS assumes to be a full stripe
• The size of every stripe is a divisor of the reference stripe size
• Logically three steps:
  – Decrease the stripe size
  – Distribute the empty space evenly over all disks
  – Apply a Tetris-like method to fill the empty blocks
AdaptRaid 5 – variance reduction
• We can use a variance reduction similar to the one in AdaptRaid 0:
  – Repeat a smaller pattern several times
AdaptRaid – generalization
• What if the bigger disks are not the faster ones?
• Until now we tried to use all blocks of a disk; now we want to use fewer blocks on slow disks
• Utilization Factor (UF):
  – A value between 0 and 1 per disk
• UF can be set based on:
  – disk size (as until now)
  – performance
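A small sketch of how a utilization factor could be derived and used. Two things here are assumptions made for illustration and are not stated on the slides: the UF is obtained by normalizing a per-disk score (size or performance) by the largest score, and a disk contributes roughly UF times SIP blocks to one pattern. The function names are hypothetical.

```python
def utilization_factors(scores):
    """Normalize per-disk scores (size or performance) to values in (0, 1]."""
    largest = max(scores)
    return [score / largest for score in scores]


def blocks_per_pattern(ufs, sip):
    """Assumed rule: a disk holds about UF * SIP blocks of each pattern."""
    return [round(uf * sip) for uf in ufs]


capacities_gb = [40, 80, 80, 160]          # made-up disk sizes
ufs = utilization_factors(capacities_gb)
print(ufs)
print(blocks_per_pattern(ufs, sip=8))
```

Passing measured bandwidths instead of capacities as `scores` would bias the layout toward the faster disks rather than the larger ones.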
AdaptRaid – summary
• Decide the UF for every disk:
  – How much we want to load a disk
• Decide the SIP for the system:
  – How big the pattern is
• Performance (measured by simulators):

  System        Adaptivity   Speedup
  AdaptRaid 0   RAID 0       8%-35%
  AdaptRaid 5   ?            < 30%
Heterogeneous Extension of RAID
• Disk merging technique
• Disks are partitioned into logical disks
• Logical disks have the same bandwidth and capacity
• We group logical disks into G parity groups
• We have G homogeneous systems
Storage Networks – HERA
Heterogeneous Extension of RAID
• Constraint: each logical disk in a parity group must map to a different physical disk
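One way to satisfy the constraint is a greedy assignment, sketched below. This is not HERA's configuration planner. The function name, the `group_size` parameter and the input format (number of logical disks per physical disk) are assumptions made for the example. Each group is built from the physical disks that currently have the most unassigned logical disks, so a group never contains two logical disks of the same physical disk.

```python
import heapq


def build_parity_groups(logical_per_disk, group_size):
    """Greedily form parity groups of distinct physical disks.

    logical_per_disk: {physical disk name: number of logical disks on it}
    Returns (groups, leftover) where leftover counts unassigned logical disks.
    """
    # heapq is a min-heap, so counts are negated to pop the fullest disk first.
    heap = [(-count, name) for name, count in logical_per_disk.items() if count > 0]
    heapq.heapify(heap)
    groups = []
    while len(heap) >= group_size:
        picked = [heapq.heappop(heap) for _ in range(group_size)]
        groups.append([name for _, name in picked])
        for negative_count, name in picked:
            if negative_count + 1 < 0:  # the disk still has logical disks left
                heapq.heappush(heap, (negative_count + 1, name))
    leftover = {name: -negative_count for negative_count, name in heap}
    return groups, leftover


groups, leftover = build_parity_groups({"A": 3, "B": 2, "C": 2, "D": 1}, group_size=2)
print(groups)
print(leftover)
```

Logical disks that cannot be placed without violating the constraint are reported in `leftover` instead of being forced into a group.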
Heterogeneous Extension of RAID
• Read: an online load balancing algorithm directs the request for a block to the least loaded disk
• Every disk has a queue with all reads and their deadlines
• Requested blocks are delivered based on deadline and location on disk (to minimize seek-time overhead)
Heterogeneous Extension of RAID
• The availability is almost as good as the homogeneous case (RAID 5).
• But much more flexible than RAID 5.
• Performance relies on the logical disk distribution, which is the task of the administrator
• The authors recently proposed a configuration planning algorithm that optimizes for bandwidth and storage [Zimmermann, Ghandeharizadeh: Highly Available and Heterogeneous Continuous Media Storage Systems, December 2004]
Random I/O Mediaserver
• Randomized distribution strategy
• Concentrates on delivering multimedia objects
• Optimized for real-time reading:
  – Video on demand
  – 3D interactive virtual world navigation
  – Interactive scientific visualization
• Idea: place each data unit on a random disk at a random position. This ensures long-term load balance.
Storage Networks – RIO
Homogeneous RIO – Data Placement
• A multimedia object is composed of a sequence of constant-size data blocks
• Each data block is placed on a random disk at a random location -> long-term load balancing
• Replicating a fraction of the data blocks allows short-term load balancing
Homogeneous RIO – Read Scheduler
• All reads have a deadline. Non-real-time requests have an infinite deadline.
• A request for a block is routed to the disk with the least load
• A disk serves several block requests in a cycle:
  – A number of requests are selected from the disk request queue
  – The selected requests are reordered according to their location on disk to minimize the seek-time overhead, and then serviced
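The scheduler described above can be sketched as follows. The class and function names are hypothetical, load is approximated by queue length, and requests for a cycle are chosen by earliest deadline. These are assumptions for illustration and not details confirmed by the RIO authors.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Request:
    block: str
    deadline: float  # math.inf for non-real-time requests
    position: int    # location on disk, e.g. a cylinder number


@dataclass
class Disk:
    name: str
    queue: list = field(default_factory=list)


def route(replicas, block, deadline):
    """Send the request to the least loaded disk that holds a copy.

    replicas: list of (disk, position) pairs for this block.
    """
    disk, position = min(replicas, key=lambda pair: len(pair[0].queue))
    disk.queue.append(Request(block, deadline, position))
    return disk


def next_cycle(disk, batch_size):
    """Select the most urgent requests, then order them by disk position."""
    disk.queue.sort(key=lambda request: request.deadline)
    batch, disk.queue = disk.queue[:batch_size], disk.queue[batch_size:]
    return sorted(batch, key=lambda request: request.position)


d0, d1 = Disk("d0"), Disk("d1")
route([(d0, 500)], "a", 3.0)
route([(d0, 100), (d1, 900)], "b", 1.0)  # replicated block: d1 is less loaded
route([(d0, 50)], "c", math.inf)         # non-real-time request
route([(d0, 300)], "d", 2.0)

batch = next_cycle(d0, batch_size=2)
print([request.block for request in batch])
```

Only replicated blocks give the router a choice, which is why replicating a fraction of the data is what enables short-term balancing.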
Heterogeneous RIO – Data Placement
• Place data on a disk with probability proportional to its size
• Probability of placing data on disk $j$: $d_j = \frac{C_j}{S}$, where $C_j$ is the capacity of disk $j$ and $S$ is the total capacity
• Note that: $\sum_{j=1}^{n} d_j = 1$
• Disk capacity is increasing faster than disk bandwidth -> faster, bigger disks are going to be the bottleneck
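Capacity-proportional placement can be sketched with weighted random selection from Python's standard library. The function names and the capacities are made up for the example, and the fixed seed is there only to make the run repeatable.

```python
import random
from collections import Counter


def placement_probabilities(capacities):
    """Probability per disk: its capacity divided by the total capacity."""
    total = sum(capacities)
    return [capacity / total for capacity in capacities]


def place_blocks(num_blocks, capacities, seed=0):
    """Pick a disk index for every block, weighted by disk capacity."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    disks = list(range(len(capacities)))
    return rng.choices(disks, weights=capacities, k=num_blocks)


capacities = [100, 100, 200]  # made-up capacities in GB
probabilities = placement_probabilities(capacities)
placement = place_blocks(10_000, capacities)
share = Counter(placement)
for disk, probability in enumerate(probabilities):
    print(disk, probability, share[disk] / len(placement))
```

The observed shares approach the probabilities as the number of blocks grows, so each disk fills at the same relative rate. The access load then also follows capacity rather than bandwidth.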
Heterogeneous RIO – BSR
• $n$ disks ($D_i$):
  – Capacity: $C_i$
  – Bandwidth: $B_i$
• Total capacity: $C = \sum_{i=1}^{n} C_i$
• Total bandwidth: $B = \sum_{i=1}^{n} B_i$
• Bandwidth space ratio (BSR): $bs_i = \frac{B_i / B}{C_i / C}$
• The BSR is a hint of how much load a disk can take
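A short numerical illustration of the bandwidth space ratio, read as a disk's share of the total bandwidth divided by its share of the total capacity. The function name and the disk values are made up.

```python
def bandwidth_space_ratios(capacities, bandwidths):
    """BSR per disk: bandwidth share divided by capacity share."""
    total_capacity = sum(capacities)
    total_bandwidth = sum(bandwidths)
    return [
        (bandwidth / total_bandwidth) / (capacity / total_capacity)
        for capacity, bandwidth in zip(capacities, bandwidths)
    ]


capacities = [100, 300]  # GB, made-up values
bandwidths = [20, 20]    # MB/s, made-up values
ratios = bandwidth_space_ratios(capacities, bandwidths)
for disk, ratio in enumerate(ratios):
    print(disk, round(ratio, 3))
```

Here the small disk has a BSR of 2 and the large disk a BSR of about 0.67. With capacity-proportional placement, the large disk receives three quarters of the requests but provides only half of the bandwidth.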
Heterogeneous RIO – Clusters
• Goal: redirect load from low-BSR disks to higher-BSR disks
• Group disks into clusters based on their BSRs
• Low-BSR clusters would have high load
• How much replication do we need to sustain a certain load?
Heterogeneous RIO – Replication Factor
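• We want to sustain a maximum load of $B_{\max} = B$
• Data without replicas, for replication factor $r$: $D = \frac{C}{1 + r}$
• Maximum load on cluster $i$: $B_i^{\max} = \frac{C_i}{D} B_{\max} = (1 + r) \frac{C_i}{C} B_{\max}$
• To use all bandwidth we need $B_i^{\max} \geq B_i$ for every cluster $i$:
  $1 + r \geq \frac{B_i / B}{C_i / C} = bs_i$ -> $r \geq \max_i(bs_i) - 1$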
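The bound on the replication factor can be checked numerically. The sketch below assumes the reading that the replication factor must be at least the largest bandwidth space ratio minus one. The function names and disk values are hypothetical.

```python
def bandwidth_space_ratios(capacities, bandwidths):
    """BSR per cluster: bandwidth share divided by capacity share."""
    total_capacity = sum(capacities)
    total_bandwidth = sum(bandwidths)
    return [
        (bandwidth / total_bandwidth) / (capacity / total_capacity)
        for capacity, bandwidth in zip(capacities, bandwidths)
    ]


def min_replication_factor(capacities, bandwidths):
    """Smallest r such that every cluster can be driven at its full bandwidth."""
    return max(bandwidth_space_ratios(capacities, bandwidths)) - 1


capacities = [100, 120]  # made-up values
bandwidths = [20, 20]
r = min_replication_factor(capacities, bandwidths)
print(f"replication factor r >= {r:.2f}")

# Check: with this r, the maximum load on each cluster covers its bandwidth.
total_capacity, total_bandwidth = sum(capacities), sum(bandwidths)
for capacity, bandwidth in zip(capacities, bandwidths):
    max_load = (1 + r) * capacity / total_capacity * total_bandwidth
    print(round(max_load, 3), ">=", bandwidth)
```

For these two disks about 10% replication is enough. The capacity-weighted average of the ratios is always 1, so the largest ratio is at least 1 and the bound is never negative.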
Heterogeneous RIO – Summary
• Randomized data placement
• Read scheduler to optimize read bandwidth
• Based on disk characteristics, a different replication factor is needed to sustain a certain bandwidth
• The authors claim that in a few years 10% to 40% replication will be sufficient to use the full aggregate bandwidth of the network
Conclusions
• All three methods concentrate on optimizing bandwidth and space utilization. Adaptivity is hard to achieve.
• AdaptRaid and HERA
  – Deterministic
  – Extend homogeneous RAID
  – AdaptRaid 5 wastes space?
• RIO
  – Randomized
  – How fast is the read scheduler?
  – The only one where the authors showed a real-life implementation (Virtual World Data Center)
Storage Networks – Conclusions
Thank You!
Questions?
Bálint Miklós