
DRAFT -- Revision 2.0, August 22, 1996

Choosing the Right Disk Technology in a High Availability Environment

A Technical Whitepaper

Bob [email protected]

General Systems Division
General Systems Solution Lab
Advanced Technology Center

URL http://gsslweb.cup.hp.com/ATC

May 1996

© Copyright 1996, Hewlett-Packard Company


Table of Contents

INTRODUCTION

Disk Link Technology Comparisons
    HP-FL disk links
    SCSI disk links
    Fibre Channel
    Disk Link Comparisons

Disk Technologies
    Standalone disk drives with LVM mirroring
        High Availability Storage Systems
        Mirrored JBODs and SCSI targets
        Mirrored JBODs and on-line replacement
        Advantages and disadvantages of standalone disks with LVM mirroring
    Disk Arrays using RAID technology
        HP-FL Disk Arrays
        F/W SCSI Disk Arrays
        High Availability Disk Arrays
        Advantages and Disadvantages of High Availability Disk Arrays (HADAs)
        Disk Arrays with AutoRAID
        Advantages and Disadvantages of AutoRAID Disk Arrays
        Symmetrix 3000 Series Integrated Cache Disk Arrays (ICDA)
        Advantages and Disadvantages of Symmetrix ICDAs
    Solid State Disks
        Advantages and disadvantages of solid state disks

CAPACITY

PERFORMANCE COMPARISONS
    LVM versus non-LVM managed disks
    Striping
    Data Protection
        I/O Size
    Ratio of reads to writes
    Sequential versus random access
    Failed mechanisms in RAID 3 or 5 configurations
    Disk Link Technology
    Number of Targets on a Disk Link
    Performance benchmarks
    Summary of performance of various disk technologies

HIGH AVAILABILITY (HA) DISK CONSIDERATIONS

DECISION CRITERIA
    The need for on-line failed disk replacement versus scheduling downtime
    The need for data redundancy
    The need for double data redundancy
    The need for continuous data redundancy
    Purchase cost
    Footprint
    Performance
    Backup strategy
    Total capacity requirements
    Power source redundancy
    Total distance among computer systems

DISK SELECTION MATRIX

SUMMARY


[1] RAID was described in the paper "A Case for Redundant Arrays of Inexpensive Disks (RAID)", Patterson, Gibson and Katz, Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, 1988.


INTRODUCTION

Disk technologies have been rapidly changing, with new generation hardware available every nine to twelve months. Computer buyers have been having trouble keeping up with the changes, let alone understanding them. RAID technology [1] in particular has become popular in the marketplace. RAID is an acronym for Redundant Arrays of Inexpensive Disks and was designed as an alternative to the Single Large Expensive Disks (SLEDs) used on supercomputers and mainframes. Often, buyers choose RAID technology for emotional rather than logical reasons, or just because it is the latest hot technology, without understanding the tradeoffs needed to make an informed decision.

The High Availability (HA) environment seems to be where much of the controversy exists. Computer buyers want a disk technology that provides the best availability for the least cost; in other words, price/availability. Like price/performance, price/availability has many tradeoffs that need to be evaluated. The highest availability will not cost the least amount of money!

If asked, a computer buyer usually says that application performance is as important as availability. In other words, they are not willing to compromise performance for availability. However, the performance tradeoffs among different disk technologies are not always well understood.

Version 2.0 of this whitepaper was written to incorporate four new disk technologies:

! High Availability Storage Systems
! HP Disk Array with AutoRAID
! EMC Symmetrix Integrated Cache Disk Array (ICDA)
! HP Fibre Channel-SCSI Disk Multiplexer

Data redundancy is necessary to prevent a single disk failure from causing an outage to the users. There are two methods available for providing data redundancy: mirrored standalone disks and Disk Arrays with RAID protection in hardware. Selecting between these two choices is often more emotional than logical. Each has its place and its pros and cons. Mirrored standalone disks require the optional product called MirrorDisk/UX. MirrorDisk/UX can also be used with Disk Arrays to provide double data redundancy. For example, this might be done to continue data redundancy while the mirrors are split for an offline backup.


This Whitepaper will attempt to remove some of the mystery associated with various disk technologies and will discuss the advantages and disadvantages of each, with the goal being an easy-to-use roadmap for choosing the right disk technology.

This Whitepaper is divided into six major sections:

! Disk Link Technology Comparisons
! Types of Disks
! Capacities
! Performance Comparisons
! High Availability Considerations
! Decision Criteria

After first discussing the different links used with disk drives, a comparison between standalone disks and disk arrays will be made. Then, there will be a discussion of capacity limitations and performance issues. The first five sections provide the background information needed to use the decision criteria for selecting the disk technology most appropriate to a given situation.

Disk Link Technology Comparisons

HP has three different disk link technologies currently available:

! HP-FL (Fiber Link)
! SCSI (Small Computer System Interface)
! Fibre Channel

HP-FL disk links

HP-FL is an HP proprietary link developed as a higher-speed alternative to HP-IB, an older disk link technology. Disk drives are the only devices available that use HP-FL. HP-FL uses a dual fiber optic link between the computer interface and the first disk drive in the chain. This fiber can be as long as 500 meters, providing flexibility in the placement of disk drives for environmental, security and high availability reasons.

SCSI disk links

SCSI is a "standard" link that was initially developed for Personal Computers. It hasbeen adopted by Unix workstation and server vendors to connect disk drives and manyother devices such as 4 and 8 mm tapes, scanners and CD-ROM drives.


There are many terms associated with SCSI. Several versions of the protocol and several versions of the bus exist. SCSI-1, SCSI-2 and SCSI-3 refer to the protocol to which a particular device conforms. On the HP 9000 Series 800, HP supports devices that conform to the SCSI-2 protocol only. SCSI-3 specifies an enhanced protocol that includes, for example, support for 32 LUNs (SCSI Logical Units) rather than the 8 supported in SCSI-2.

SCSI has defined two speeds: Standard SCSI and Fast SCSI. There are two bus definitions: Narrow and Wide. Both Standard and Fast SCSI have 8-bit data busses and have maximum bandwidths of 5 and 10 MB/second, respectively. Fast/Wide SCSI uses a 16-bit data bus and specifies a maximum bandwidth of 20 MB/second.

Vendors were given the choice of using a single-ended or differential bus when implementing Standard and Fast SCSI. HP implemented Standard SCSI with a single-ended bus, which means that there is one wire per signal line referenced to signal ground. HP implemented Fast SCSI as a differential bus, with positive and negative signal wires referenced together, which typically allows a longer cable without severe signal degradation. Fast SCSI is available only on certain HP 9000 Series 700 workstations, using an EISA I/O adapter. HP has also always implemented Fast/Wide SCSI with a differential bus.

There are two styles of connectors that can be used with SCSI: (1) low-density, which uses an Amphenol-type connector, and (2) high-density, which is smaller and uses pins. The high-density connector has become popular since it requires less room on the back of the computer I/O adapter and the device.

Fibre Channel

Fibre Channel is an emerging networking technology that provides a maximum bandwidth of 1 Gbit/sec. HP and IBM agreed several years ago to jointly develop a standard for this technology. The link uses a fiber optic cable to connect the nodes in the network. It is expected that the maximum distance between nodes in the network will be 2 kilometers initially, expanding to 10 kilometers in the future.

A protocol has been defined for Fibre Channel, called the Fibre Channel Protocol for SCSI for Class 3 services, that would allow disk connectivity. Various Fibre Channel solutions are being developed that will allow disks to be located at distances greater than F/W SCSI allows. This technology will greatly increase the flexibility in the location of the disk drives for High Availability and data replication purposes. Some disk vendors are implementing a native Fibre Channel (FC) interface directly in the disk drive. This native FC would not make sense for individual disk devices because of the need for point-to-point connections, but would be useful for large Disk Arrays.


HP offers a Fibre Channel-SCSI Multiplexer, enabling existing F/W SCSI disks to be located up to 2 kilometers from the host computers.

The Fibre Channel-SCSI Multiplexer is henceforth called the FC/SCSI Mux for brevity. The FC/SCSI Mux can have two Fibre Channel (FC) ports, a capability that is very important for High Availability, since it provides a redundant link to the disks if the primary link goes down. Alternatively, the second FC port can be used to connect to a different host in a High Availability Cluster. Because of the increased distance, the FC/SCSI Mux can be used to implement campus-based Disaster Recovery (DR) solutions that would simplify the recovery from failures affecting an entire data center, for example.

The FC/SCSI Mux has four slots, each of which can contain a card that connects to a separate F/W SCSI bus with up to 15 disk devices on it. So, a single Fibre Channel link on a host can provide connectivity to up to 60 disk devices using a point-to-point topology. Throughput on the FC/SCSI Mux peaks at 60 MB/sec (or 5000 I/Os per second), providing the throughput of three host-based F/W SCSI cards and the connectivity of four host-based F/W SCSI cards, while using only a single slot in the host. Thus, hosts with fewer slots can now connect to greater capacities of disk storage. However, host I/O bus bandwidth must be considered when determining the practical amount of disk storage. This issue is discussed further in the section on Performance Comparisons.
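The connectivity and throughput figures follow from simple arithmetic:

    4 F/W SCSI busses x 15 disk devices per bus  = 60 disk devices per Fibre Channel link
    60 MB/sec peak / 20 MB/sec per F/W SCSI card = throughput of 3 host-based cards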

This new technology provides vastly increased flexibility in the location of disk devices, connectivity to multiple hosts, and high bandwidth using a single host adapter slot.

Disk Link Comparisons

The choice of I/O link depends on the particular disk and on the requirements for distance, performance and capacity. These factors are summarized in Table A.

Table A: Disk Link Comparisons

Link                      Bandwidth (MB/sec)            Maximum Link             Maximum Devices
                          Peak          Sustained       Length                   per Link
------------------------------------------------------------------------------------------------
HP-FL                     5             4.7             500 meters               8 *
Standard SCSI             5             2.5             6 meters ***             8 **
F/W SCSI (HP-PB)          20            7 - 10          25 meters ***            16 **
F/W SCSI (HSC)            20            12 - 15         25 meters ***            16 **
Fibre Channel             1 Gbit/sec                    2+ kilometers ****       N/A
Fibre Channel-SCSI        60            30 - 40         2 kilometers per         60
Multiplexer               (5000 I/Os    (2-3 K I/Os     Fibre Channel port       (15 per F/W
                          per second)   per second)     (max 2 ports)            SCSI link)

*    Computer systems do NOT count as devices
**   Computer systems DO count as devices
***  Includes cabling internal to the disk drive
**** Distance between nodes in the network, or between a node and a disk, etc.

Although the peak transfer rate for an individual disk mechanism might be higher (e.g., 10 MB/sec with synchronous transfers on F/W SCSI), most disk mechanisms have sustained transfer rates in the range of 1.5 to 5 MB/second. To maximize performance potential, this sustained transfer rate should be used to calculate the maximum number of active disk drives per link.
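As an illustration, assuming the 12 MB/sec sustained bandwidth of an HSC F/W SCSI link from Table A and disks that each sustain 3 MB/sec:

    12 MB/sec sustained link bandwidth / 3 MB/sec per disk = 4 concurrently active disks

More disks can be attached to the link, but only about this many can transfer at full speed simultaneously.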

The maximum link length determines the maximum distance between computers in an HA cluster cabled to the same disks. The F/W SCSI standard limits the maximum cable length of the bus to 25 meters. This length includes any internal SCSI cabling. The cable extending from the computer to the first device is usually 2.5 meters. Therefore, in a two-node cluster, only 20 meters remain for cabling between and inside the disk trays or towers. There can be as much as 1.75 meters of SCSI cabling inside disk trays or towers. The system configuration guide shows exact internal cable lengths.
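Using those figures, the cable budget for a two-node cluster with two disk towers works out as follows:

    25 m bus limit - (2 nodes x 2.5 m host cable) = 20 m remaining
    20 m - (2 towers x 1.75 m internal cabling)   = 16.5 m for external cables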

The maximum number of devices per link is a factor in determining the maximum amount of disk space that is possible on a given system. This issue will be discussed in the next section. In the future, Fibre Channel should provide improvements in bandwidth, distance, and number of devices.

Disk Technologies

Three major disk technologies will be discussed:

! Standalone disk drives with LVM mirroring


! Disk arrays using RAID technology
! Solid state disks

Standalone disk drives with LVM mirroring

Standalone disk drives are simple, non-array type disks; i.e., they do not implement any RAID level in hardware. A new acronym has emerged to describe these simple disks: JBOD, which stands for "Just a Bunch Of Disks". JBODs can be single disk spindles with a controller and power supply, or can be combined into towers or racks with a single power supply. Even in combination, each disk drive retains its own controller and is addressed individually. However, there is only one physical path from the computer system to the entire tower or rack; i.e., each disk does not have its own SCSI connector.

JBODs are available in many capacities. The most popular disk mechanisms have capacities of 2 GB and 4 GB and are typically based on the newest, fastest disk technology. One GB mechanisms, although still available, are declining in popularity.

Due to their simplicity, standalone disk drives are the least expensive solution. In a High Availability environment, however, it makes no sense to discuss standalone disk drives without data protection. JBODs can be combined with software solutions such as MirrorDisk/UX to provide higher availability than they would provide alone, although this significantly increases hardware costs. JBODs also provide the greatest flexibility in configuration, such as data placement, power source, racking, etc.

Figure 1 shows a typical way of configuring standalone mirrored disks in an HA environment. It is important to notice that each copy of the data is on a different disk link. This configuration is necessary to minimize Single Points of Failure (SPOFs).

Each group of disks is shown physically connected to multiple computer systems. Logical Volume Manager (LVM) is necessary to control shared or exclusive access to the data. However, this does not automatically imply that concurrent access to the same data is possible or desired. Data corruption can occur unless network-based software controls the access to the multiply-connected disks.

The High Availability software MC/LockManager, in conjunction with Oracle Parallel Server, is required to arbitrate concurrent write access to the same data. The application runs on both systems, and access to the data continues even if one of the SPUs fails.

This configuration can also be used with the HA software MC/ServiceGuard to provide exclusive access to the data. MC/ServiceGuard ensures that an application that is accessing a particular group of disks runs on only one system at a time, thus preventing data corruption. The application can run on either of the SPUs, but only one at a time. When the primary system fails, access to the data shifts to the surviving SPU.

Figure 1: Typical High Availability Configuration with Standalone Mirrored Disks
[Figure shows Mirror Copy A and Mirror Copy A' on separate disk links, each connected to both systems.]

Also with this configuration, the loss of one of the disk drives, the cable, or even one of the host adapters in the SPU will not prevent access to the data, since another copy resides on a separate disk link. MirrorDisk/UX automatically handles the access to multiple copies of the data in both normal and failure scenarios.
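As a sketch of how such a mirrored configuration is typically created with LVM and MirrorDisk/UX (the device file names, volume group minor number and sizes below are hypothetical and depend on the system):

    # Initialize one disk on each F/W SCSI interface
    pvcreate /dev/rdsk/c0t4d0        # primary copy, first disk link
    pvcreate /dev/rdsk/c1t4d0        # mirror copy, second disk link

    # Create a volume group containing both disks
    mkdir /dev/vg01
    mknod /dev/vg01/group c 64 0x010000
    vgcreate /dev/vg01 /dev/dsk/c0t4d0 /dev/dsk/c1t4d0

    # Create a 500 MB logical volume with one mirror copy (requires MirrorDisk/UX);
    # strict allocation (-s y) forces the mirror copy onto a different physical disk
    lvcreate -L 500 -m 1 -s y -n data /dev/vg01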

High Availability Storage Systems

A new disk storage solution, called the HP High Availability Storage System (HASS), is available that improves the availability of JBODs. The HASS (product number A3311A for a deskside unit and product number A3312A[Z] for a rack-mount unit) has the following HA features:

! optional dual, hot-pluggable power supplies
! dual power cords (with dual power supplies)
! dual, hot-pluggable cooling fans
! up to 8 hot-pluggable 2 GB storage modules or up to 4 hot-pluggable 4 GB storage modules
! dual, internal SCSI busses (optional single internal SCSI bus)

The above high availability features make the HASS a superior choice over previous JBOD solutions. As with any JBOD configuration, the main limitation is the maximum number of targets possible on a given system. From a practical perspective, the HASS is a good solution for capacities up to 400 GB mirrored (200 GB of usable storage) when performance is the primary decision criterion. For smaller systems, this capacity is feasible only with the FC/SCSI Mux.


Since there are two internal SCSI busses, each with its own connectors, power cords, power supplies, and fans, LVM mirroring can be accomplished using a single HASS without having any SPOFs. Each SCSI bus must be connected to a different F/W SCSI host adapter when mirroring in a single HASS to maintain HA and remove the host adapter as an SPOF.

As with older JBODs, each storage module in a HASS consumes one SCSI target address. This limits the total storage capacity possible for a given system, according to the number of system slots and targets per SCSI bus.

The HASS can be ordered with one internal SCSI bus connecting all of the (up to 8) modules. This configuration is not supported with LVM mirroring when the mirrors are in the same HASS, since there would be several SPOFs.

The dual SCSI busses can also be used to improve performance when connected to separate host adapters. Data protection must still be dealt with by mirroring across HASS chassis.

Although the HASS supports secondary storage modules such as DDS-2 drives and CD-ROMs, and the SCSI busses can be used for either standard SCSI or F/W SCSI, the secondary storage modules may not be used in a multi-initiator configuration.

When ordering HASS units that will be mounted in cabinets, it is important that the cabinets be ordered with a second power distribution unit (PDU). This second PDU provides a second power strip inside the cabinet and a second power cord that can be plugged into a separate power source. This configuration removes power as an SPOF.

The disk storage modules are hot-pluggable, which means that the bus and connectors are made so that a disk module can be inserted or removed without un-terminating the SCSI bus. All modules are easily removed from the front of the chassis. The HASS does not have the problems of previous JBOD configurations, which required extra long F/W SCSI cables, the removal of the chassis from the cabinet, and the removal of the cover before individual disk mechanisms could be replaced.

However, it is important to note that OS cooperation is still required when removing a disk module from the HASS, since the HASS does NOT provide any data protection or regeneration of data on a newly replaced disk module. LVM mirroring must regenerate the data on the storage module after the LVM configuration has been restored with the vgcfgrestore command.
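A minimal sketch of that recovery sequence, assuming the failed module is the hypothetical device c2t3d0 in volume group /dev/vg01:

    # After the failed storage module has been physically replaced:
    vgcfgrestore -n /dev/vg01 /dev/rdsk/c2t3d0    # restore the LVM configuration to the new disk
    vgchange -a y /dev/vg01                       # re-attach the disk to the active volume group
    vgsync /dev/vg01                              # resynchronize the stale mirror extents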


Figure 2: Typical High Availability Configuration with the High Availability Storage System (HASS)
[Figure shows two HASS chassis, each containing primary and mirror storage modules.]

LVM Mirror Consistency

LVM has two mechanisms for maintaining consistency between the mirrors. The default mechanism is to use the Mirror Write Cache (MWC). The MWC is kept in memory and is periodically written to the disk as a Mirror Consistency Record (MCR) in case of OS failure. Upon reboot, LVM knows which mirror copy is current and which one is stale and will quickly resynchronize the mirror copies. Unfortunately, posting the MCR to disk might degrade the performance of write-intensive applications. The exact amount of degradation is extremely application dependent.

Disabling the MWC enables the alternate method of keeping the mirror copies synchronized, called NOMWC. Although enabling NOMWC might improve on-line performance slightly, it greatly increases recovery time, since the entire Logical Volume must be copied from one side of the mirror to the other. This method might also result in lost transactions since, without the MWC, LVM has no reliable way of choosing which side of the mirror is more current than the other side(s).
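The consistency policy is selected per logical volume; a sketch using a hypothetical volume (see the lvchange(1M) manual page for the authoritative option letters):

    lvchange -M y /dev/vg01/data          # default: Mirror Write Cache enabled
    lvchange -M n -c y /dev/vg01/data     # NOMWC: no MWC; full resynchronization on recovery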

Mirrored JBODs and SCSI targets

A major disadvantage of JBODs is that each disk consumes a separate SCSI target address. There is a hardware limit to the number of targets on a SCSI bus, as shown in the section on Disk Link Technologies. The F/W SCSI link provides the greatest connectivity today, with a maximum of 15 targets. High availability clusters and performance considerations reduce this to a maximum of 10 disk targets.


[2] Technical HPPA Newsletter # 218, June 17, 1995, "Update: Hot Swap Configuration Now Possible". E-mail subscriptions to PA NEWS are available by contacting [email protected] or pa-news(pa-news)/hp4700/um. Access to PA NEWS is available through Mosaic or Netscape via http://hpfcme.atl.hp.com/pa-news.html.


Using 4 GB capacity disk drives, this results in a limit of 40 GB per F/W SCSI card. The number of F/W SCSI cards in a system is constrained first by the slots in the SPU and further by performance considerations. Since this paper is concentrating on High Availability solutions, the standalone disks must then be mirrored, cutting the maximum usable capacity by 50%. This issue will be discussed further in the section on Capacity.
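The capacity ceiling follows directly from the target limit:

    10 disk targets x 4 GB per disk = 40 GB of raw storage per F/W SCSI card
    2 cards x 40 GB = 80 GB raw, mirrored = 40 GB usable (50% of raw)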

Mirrored JBODs and on-line replacement

The HASS has been designed to facilitate the on-line replacement of failed disk mechanisms from a hardware perspective. Since the HASS does not have any built-in data redundancy or protection, LVM mirroring must be used to provide that data protection.

Older JBODs are not constructed for on-line replacement of failed disk mechanisms. However, there is a procedure that allows the field service technician to approximate an on-line disk replacement. Older disk trays have to be slid out from the rack without disconnecting the SCSI bus, and then the cover must be removed (which is necessary before replacing a failed mechanism). The cover also has to be removed from external disk towers to replace a failed disk.

Two supported procedures have been developed for the HASS and older disk trays and towers that allow either on-line or quiescent (but powered-up) replacement of failed disk mechanisms. In other words, the computer system does NOT have to be shut down before replacing these mechanisms, thus saving time and interruption to the users. The first procedure involves fewer steps, but the application must be quiesced before the replacement. The second procedure allows the application to continue running, but is more prone to user error due to the multiple complicated steps. These procedures are documented in a Technical HPPA Newsletter article; [2] PA NEWS is currently the only place where these procedures are documented.

Since JBODs are attractive to computer buyers for performance reasons, this new procedure allows computer buyers to use standalone mirrored disks with almost the same availability as RAID disk arrays that support on-line mechanism replacement. It is important to note, however, that there will be a scheduled interruption to the application during the quiescent replacement of a failed disk mechanism in a JBOD, unless the more complex but error-prone procedure is followed.


Most organizations can live with the scheduled aspect of the replacement, especially since the replacement will occur in much less time than if a T500 had to be shut down and rebooted.

Global hot spare functionality can be simulated with JBODs if a spare spindle is available on each side of the mirror. With the modification of one step, the procedure that allows quiescent or on-line replacement of a JBOD will instead allow a hot spare to be reassigned to take over for a failed mechanism. However, this reassignment is not automatic, and involves the procedure described in Technical HPPA Newsletter # 218.

Advantages and disadvantages of standalone disks with LVM mirroring are:

+ 2- or 3-way mirroring is possible
+ offline backup can be done by splitting off one side of the mirror and activating it read-only on another system
+ potential for highest read performance due to LVM queuing to the least busy side of the mirror
+ I/O can be spread across many SCSI interfaces to increase the bandwidth and do more concurrent I/Os
+ multiple disk controllers reduce the chance of a controller bottleneck
+ total control of data placement for application performance tuning
+ fewer Single Points of Failure (SPOFs)

# each disk has its own controller
# there is no master controller that might be a bottleneck or an SPOF
# each tray or tower has its own power supply and power cord and can therefore be powered from different power sources
# mirrored trays or towers can be placed on separate SCSI interfaces

- each individual disk requires a SCSI target address which, in turn, limits the total disk capacity on the system
- 1 - 3% additional software overhead due to LVM code paths (only a consideration if the system is 100% CPU bound)
- additional cost due to 100% duplication of disk space
- additional rack and floorspace requirements
- use of a hot standby is difficult and not directly supported by LVM
- failed disk replacement might require quiescence of the application
- failed disk replacement might require special operational procedures

Disk Arrays using RAID technology

Disk Arrays are collections of disk drives, usually in a common enclosure, each with its own controller. Typically, one of two methods is used to implement RAID levels with disk arrays.


(1) One controller may be designated as a master controller, with the other controllers acting as slaves. This method was implemented in the HP-FL array models C2258HA and C2259HA and the Fast/Wide SCSI array models C2439JA/JZ and C2440JA/JZ.

(2) The second method incorporates a higher level storage processor that controls the complete functions of the RAID array. The High Availability Disk Array models A3231A (Model 10) and A3232A (Model 20) have either one or two storage processors besides the individual disk controllers. The Disk Array with AutoRAID models A3515A and A3516A have two controllers. These storage processors or controllers control caching, RAID level, sparing and data recovery.

Each disk array appears to the operating system as one large disk, or as several large disks using SCSI LUNs (Logical Unit Numbers). This LUN has nothing to do with the HP-UX 9.X LU (Logical Unit), which was an index into an OS table used to keep track of devices connected to the system. In HP-UX 10.0, card instance numbers are used in place of device LU numbers.

Each SCSI LUN addresses one or more physical disk drive(s) inside the array. It may be fixed by the array firmware or be configurable by the system administrator. LUNs will be discussed further in the sections on disk arrays.
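For orientation, an HP-UX 10.x device file name encodes the card instance, SCSI target and LUN, so two LUNs behind one array controller appear to the system as two separate disks (hypothetical names):

    /dev/dsk/c2t4d0    # card instance 2, SCSI target 4, LUN 0
    /dev/dsk/c2t4d1    # same card and target, LUN 1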

There were five RAID levels defined by the Patterson, et al. paper. They were:

RAID level 1    mirrored disks

RAID level 2    multiple check disks using Hamming Code

RAID level 3    single check disk using Parity, byte interleaved (also called byte striped)

RAID level 4    single check disk using Parity, sector interleaved

RAID level 5    no single check disk; data and parity spread across all disks, sector interleaved (also called block striped)

Since the paper was written, several new RAID levels have been defined:

RAID level 0    no check disk, no data protection, sector interleaved (also called block striped) across a group of disks


RAID level 0/1    sector interleaved groups of mirrored disks; this is sometimes called RAID level 1/0 or RAID level 10

Since there is no data protection with RAID level 0, the only benefit is the potential for increased performance due to the data being spread across multiple disks. HP has implemented a special case of RAID level 0 called Independent Mode, in which the interleaving group size is one, effectively resulting in each disk being treated as if it were a non-RAID disk. Independent Mode can sometimes increase performance over RAID level 0 in cases of small random I/Os.

Since the sector size on most disks is 512 bytes, the minimum I/O size is 1 Kbyte for 2 disks in a group or 2 Kbytes for 4 disks in a group for RAID levels 0 and 3, not counting the parity disk with RAID level 3. RAID 0 is usually implemented with groups of two disks. RAID level 3 has been implemented by HP with groups of 3 or 5 disks, one of which is the check disk.
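The minimum I/O sizes follow from interleaving one 512-byte sector per data disk:

    2 data disks x 512 bytes = 1 KB minimum I/O
    4 data disks x 512 bytes = 2 KB minimum I/O (RAID 3 parity disk not counted)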

HP-UX imposes a 1 Kbyte minimum I/O size requirement for disk drives. So, even with RAID level 5, the minimum I/O size is 1 KB. However, depending on the particular implementation, an I/O from 1 KB to 128 KB can involve a single disk spindle.

Modern RAID implementations include levels 0, 1, 0/1, 3 and 5 only. Disk arrays support one or more RAID levels and may allow several RAID levels to operate concurrently, as with the High Availability Disk Arrays. RAID disk arrays are built so that the disk mechanism is encased, and can be removed and inserted easily and correctly without disconnecting the SCSI bus or affecting I/Os in progress.

HP-FL Disk Arrays

HP-FL disk arrays, henceforth called FLDAs for brevity, are available from HP as four different products.

! C2258HA (three 2 GB disk mechanisms with data protection)
! C2258B (two 2 GB disk mechanisms without data protection)
! C2259HA (five 2 GB disk mechanisms with data protection)
! C2259B (four 2 GB disk mechanisms without data protection)

Only two of these products, with the suffix "HA", provide data protection. The others must be used with MirrorDisk/UX for data protection.

Internally, the FLDAs use 5-1/4", 5400 RPM SCSI disks and SCSI controllers in 1 and 2 GB versions. However, the external connection is HP-FL, whose advantage is distance from the computer system of up to 500 meters.


When two FLDAs are daisy-chained together, each supports the connection to a computer system, thus supporting a total distance of 1 km end-to-end. The physical link peaks at 5 MB/sec.

On-line replacement of failed disk mechanisms is available in RAID 3 mode only. Since RAID level 3 provides relatively poor performance on HP-UX systems, these arrays are most often used in mirrored configurations using LVM mirroring, with the array set up in Independent Mode (a special case of RAID 0). Unfortunately, on-line replacement of failed mechanisms is not supported in this configuration. So, although on-line replacement of failed disks with RAID level 3 improves HA, performance is so negatively affected that HP recommends the use of these arrays in Independent Mode (RAID 0) together with LVM mirroring. This configuration would increase total capacity over JBODs, since the array consumes only one target address but provides 10 GB of disk space.

It is suggested that the only reason for choosing FLDAs is compatibility with, or reuse of, existing disk storage, despite the lower performance of the HP-FL link and the lower RPM disk drives. Due to the long cable length of HP-FL, it is possible to connect these disks to systems located in different buildings, even up to 500 meters away. However, Fibre Channel disk storage solutions provide an even greater distance.

Figure 3 shows how HP-FL can be configured for a High Availability environment. HP-FL allows up to 8 FLDAs on the link. Performance considerations may limit the number of arrays to fewer than 8. This configuration can be used with HA software as discussed in the section on JBODs.


Figure 3: Typical High Availability Configuration with HP-FL Disk Arrays (FLDAs)

F/W SCSI Disk Arrays

A disk array similar to the FLDA is available from HP with a F/W SCSI interface, in two configurations. These disk arrays will henceforth be called SCSI DAs for brevity.

! C2440JA (five 2 GB disk mechanisms with data protection)
! C2439JA (three 2 GB disk mechanisms with data protection)

Besides RAID level 3 and Independent Mode, the F/W SCSI DAs also support RAID level 5. On-line disk mechanism replacement is supported in RAID levels 3 and 5 only. The F/W SCSI bus supports a peak transfer rate of 20 MB/sec. This provides a 4x increase in bandwidth over the HP-FL link. However, the typically low performance of a RAID 3 configuration in an HP-UX environment implies that RAID 5 would be the usual choice for this array.

The F/W SCSI DAs include up to five 2 GB, 5-1/4", 6400 RPM disk mechanisms, making the physical size and rack space requirements for the SCSI DAs larger than those of the HADAs, which use 3-1/2" mechanisms.

Figure 4 shows a typical HA configuration using F/W SCSI DAs.


Figure 4: Typical High Availability Configuration with F/W SCSI Disk Arrays (SCSI DAs)

Although F/W SCSI allows up to 14 devices in this configuration, and performance guidelines limit it to 10, a maximum of 6 F/W SCSI DAs is possible due to the 3-bit DIP switch for the SCSI address setting. The number is 6 rather than 8, since the two SPUs should have the highest priority SCSI addresses of 7 and 6. Like the FLDA, SCSI DAs can be used with HA software as discussed in the section on JBODs.
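The limit of 6 arrays falls out of the addressing arithmetic:

    3-bit DIP switch: 2^3 = 8 possible SCSI addresses
    8 - 2 reserved for the SPUs (addresses 7 and 6) = 6 addresses available for SCSI DAs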

High Availability Disk Arrays

High Availability Disk Arrays, henceforth called HADAs for brevity, are available from HP in the following models:

# A3231A (Model 10 with 2.1 GB disk mechanisms)
# A3232A (Model 20 with 2.1 GB disk mechanisms)
# A3388A (Model 10 with 4.2 GB disk mechanisms)
# A3389A (Model 20 with 4.2 GB disk mechanisms)

HADAs have several features not available in FLDAs and SCSI DAs:

! support for multiple concurrent RAID levels
! support of RAID level 0/1
! up to 20, 3-1/2", 4 GB, 7200 RPM disk mechanisms
! easy addition of disk mechanisms online


! multiple internal SCSI busses
! deskside and rackmount models
! redundant and hot-swappable:
# power supply
# fan module
# dual storage processors optional on Models 10 & 20
! dual storage processors (optional) on Models 10 & 20 for performance
! redundant storage processor(s) in case of failure

Cost, capacity and the features listed above make the HADA the practical choice in many situations where several hundred gigabytes of storage are required. HADAs can be configured in RAID 0, 1, 0/1 and 5. Although the hardware supports RAID level 3, that level is not supported on an HP-UX system due to its generally poor performance in comparison to the other RAID levels. Note that on-line disk replacement is supported only in RAID levels 1, 0/1, and 5.

Disk mechanisms are grouped together and assigned to a SCSI LUN, which is another level of addressing beyond the SCSI target address. In HP-UX, there is a limit of 8 LUNs (0 through 7) per SCSI device. Adding new disk mechanisms does not require a reload of the data, since new disks may be assigned to a separate LUN. Each LUN can be configured for any supported RAID mode.

Internally, the Model 10 HADA has two standard speed SCSI busses with up to 5 disk mechanisms on each bus. Data is striped across disks on the same SCSI bus. Internally, the Model 20 HADA has five standard speed SCSI busses with up to 4 disk mechanisms on each bus. Data is striped across disks on different SCSI busses.

Both the Model 10 and the Model 20 arrays come with one storage processor, and a second one can be added. When the second optional storage processor is added, disks can be assigned in groups to either storage processor.

Each active or redundant storage processor requires a SCSI target address. So, a HADA requires one or two SCSI target addresses, depending on whether a second storage processor has been purchased. The second storage processor can be used both to distribute the I/O load for increased performance and for redundancy. Initially, each storage processor is assigned certain disk mechanisms in specific RAID modes. Upon failure of one of the storage processors, the surviving storage processor can take over all of the disk mechanisms.

HP-UX version 10.0 introduced a new capability to LVM called PV Links. With PV Links, LVM detects that there are multiple paths to a disk or array. The first device file configured with vgcreate or vgextend becomes the primary path. The device file corresponding to the alternate path then becomes the redundant path.


In case of failure of the primary path (host adapter, cable or storage processor), LVM will automatically switch to the redundant path.

The PV Links feature is necessary to support the redundant storage processor available on both the Model 10 and Model 20. This feature causes LVM to switch to the hardware path of the redundant storage processor when the corresponding primary storage processor fails. Note that the disk arrays must be configured with LVM to make use of this automatic switching feature. Figure 5 shows how the PV Links feature is used with dual storage processors. Use of the redundant device file name without LVM can result in severe performance degradation, due to the several-second delay associated with storage processor switching in the HADAs.
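Configuring PV Links is simply a matter of naming both paths in the volume group commands; a sketch with hypothetical device files, where c3t0d0 reaches a LUN through storage processor A and c4t0d0 reaches the same LUN through storage processor B:

    vgcreate /dev/vg02 /dev/dsk/c3t0d0    # first path named becomes the primary path
    vgextend /dev/vg02 /dev/dsk/c4t0d0    # second path to the same LUN becomes the alternate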

The recommended number of HADAs per F/W SCSI interface changes depending upon the I/O load and upon the number of active storage processors. For example, with two active storage processors in the Model 20 and a heavy I/O load, only four Model 20 HADAs are recommended on a given F/W SCSI interface, although seven would be theoretically supported. This configuration also assumes that each of the two storage processors is connected to a different F/W SCSI host adapter. The consequence of exceeding the recommended number of HADAs per F/W SCSI link is reduced overall performance of the application. Performance and capacity are discussed in greater detail later.

Figure 5 shows the only supported configuration for HADAs with two Storage Processors when configured in an HA cluster. This configuration was established for both performance and availability reasons. There are no SPOFs in this configuration, since each storage processor is attached to a different SCSI bus and host adapter. The combination of PV Links and dual storage processors with HADAs provides very similar availability to JBODs in association with the HA software MC/ServiceGuard and MC/LockManager.


Figure 5: High Availability Disk Array (HADA) with two Storage Processors using PV Links

Advantages and Disadvantages of High Availability Disk Arrays (HADAs)
(Note: not all of these advantages and disadvantages apply to the FLDAs and SCSI DAs)

+ smaller overall footprint for a given amount of storage


+ lower cost for moderate to large configurations
+ easy on-line replacement of failed disk mechanisms in RAID 1, 0/1, and 5
+ capability to immediately assign a hot standby spindle to take over for a failed spindle
+ highest storage connectivity since each array uses only 1 or 2 SCSI target addresses
+ flexibility of configuration

# select among a choice of RAID levels 0, 0/1, 5
# multiple RAID levels in one array concurrently

+ potential for high performance in a small I/O size, read-intensive environment
+ dual storage processors (controllers) and power supplies reduce the number of SPOFs
+ automatic storage processor failover with HP-UX Revision 10.0
+ global hot spare disk mechanisms

- little control over the placement of data for application performance tuning
- variable overall performance depending on RAID level, I/O size, and read or write
- the Storage Processors may potentially be a bottleneck
- performance degradation will occur for small reads if a disk spindle in RAID level 5 mode has failed, since the array controller must read from the other spindles to recalculate the missing data
- although HADAs have redundant power supplies, there is only one power cord, which means that the power source is an SPOF
- no boot support in a multi-initiator (shared bus) environment
- UPSs required for powerfail support
- RS-232 link is required for configuration

There is an additional disadvantage worth mentioning that applies only to the FLDAs and SCSI DAs:

- adding disk mechanisms requires a reload of the data, except in Independent Mode

Disk Arrays with AutoRAID

HP offers a disk array with a patented technology named AutoRAID, called the HP Disk Array with AutoRAID, henceforth called the AutoRAID array for brevity. Two configurations are available, as product numbers A3515A (Deskside) and A3516A/[Z] (Rackmount). These arrays have the following features:

! combines RAID 0/1 and RAID 5 in the same array
! automatically migrates data between RAID modes according to access patterns


! up to 12 disk modules of 2.1 GB each
! active hot spare disk module
! on-line capacity expansion
! on-line replacement of failed disk modules
! redundant dual active controllers with automatic failover
! ECC cache
! redundant fans
! optional redundant power supply and power cord
! on-line replacement of controllers, fans and power supplies

Cost, capacity and the features listed above make the AutoRAID the practical choice in many situations where 50 - 200 GB of storage is required and where knowledge of the application's performance and access patterns is not available. The RAID level in AutoRAID cannot be configured. The AutoRAID unit determines the best RAID level for the data depending upon access patterns and the configured limit of RAID 1 storage.

When ordering AutoRAID units that will be mounted in cabinets, it is important that the cabinets be ordered with a second power distribution unit (PDU). This second PDU provides a second power strip inside the cabinet and a second power cord that can be plugged into a separate power source. This configuration removes power as an SPOF.

Disk storage is assigned to a SCSI LUN according to desired capacity. Adding new disk mechanisms does not require a reload of the data, since new disks may be assigned to an existing LUN online, a feature unique to AutoRAID.

The AutoRAID can be configured with up to 12 disk mechanisms assigned to a combination of RAID 0/1, RAID 5 and hot spare. Internally, there are four SCSI busses with up to 3 storage modules on each bus. There is no control possible over data placement on physical disks. The LUNs are defined according to the amount of disk space desired.

The amount of RAID 0/1 space is dynamic, limited by the amount of space left unallocated to LUNs, and is allocated according to the data access patterns of the application, with a minimum of 2 GB. The most active data is kept in RAID 0/1 for best performance. Data that is less frequently accessed is kept in RAID 5 for lower cost. The administrator controls only the total amount of space available for RAID 1.

The AutoRAID comes with two controllers that are active and redundant. That is, both controllers can be used to access the data. In the event that one controller fails, the other controller can be used to access all of the data, in conjunction with the PVLinks feature of the operating system, described in a previous section. AutoRAID does not have the performance impact that the HADA has when switching between controllers for the same LUN.


Note that PVLinks does not currently allow concurrent access to the same LUN via both paths. The second path is for redundancy, in case of failure of the host adapter, cable or AutoRAID controller. PVLinks does permit concurrent access to different LUNs via both paths. To take advantage of this capability, multiple LUNs must be defined in the AutoRAID.

The recommended number of AutoRAIDs per F/W SCSI interface changes depending upon the I/O load and upon the number of active AutoRAID controllers. For example, with two active AutoRAID controllers and a heavy I/O load, only four AutoRAIDs are recommended on a given F/W SCSI interface, although seven would be theoretically supported. This configuration also assumes that each of the two AutoRAID controllers is connected to a different F/W SCSI host adapter. The consequence of exceeding the recommended number of AutoRAIDs per F/W SCSI link is reduced overall performance of the application. Performance and capacity are discussed in greater detail later.

Figure 6 shows the only supported configuration for AutoRAIDs when configured in an HA cluster. This configuration was established for both performance and availability reasons. There are no SPOFs in this configuration, since each AutoRAID controller is attached to a different SCSI bus and host adapter. The combination of PV Links and dual controllers with AutoRAID provides very similar availability to JBODs in association with the HA software MC/ServiceGuard and MC/LockManager.



Figure 6: High Availability Configuration using the AutoRAID Disk Array and PVLinks

Advantages and Disadvantages of AutoRAID Disk Arrays

+ smaller overall footprint for a given amount of storage
+ lower cost for moderate to large configurations
+ easy on-line replacement of failed disk mechanisms
+ hot standby is active and automatically assigned
+ highest storage connectivity since each array uses only 2 SCSI target addresses
+ flexibility of configuration

# administrator chooses the amount of space allocated to RAID 1 mode
# automatic and dynamic migration of data between RAID 5 and RAID 0/1 according to the application's data access patterns

+ potential for high performance with no a priori knowledge of application data access patterns and no need for data placement (especially for write-intensive applications)
+ no SPOFs when used with two separate power sources


+ automatic AutoRAID controller failover with HP-UX Revision 10.0
+ performance approaches that of JBODs/HASS at lower cost

- no control over the placement of data for application performance tuning
- overall performance depends entirely on workload and the amount of RAID 1 space configured
- no boot support in a multi-initiator (shared bus) environment
- UPSs required for powerfail support

Symmetrix 3000 Series Integrated Cache Disk Arrays (ICDA)

HP resells the Symmetrix 3000 Series Integrated Cache Disk Arrays manufactured by EMC Corporation, henceforth called Symmetrix for brevity. Many configurations are available, with product numbers in the S11xxA and S12xxA ranges. These arrays have the following features:

! combines RAID 1 and RAID S (a special case of RAID 5) in the same array
! capability for remote placement of mirrored unit on campus or inter-city with the Symmetrix Remote Data Facility (SRDF)
! on-line replacement of failed disk modules
! up to 8 redundant active controllers with automatic failover
! up to 4 GB cache
! redundant fans
! redundant power supply
! on-line replacement of controllers, fans and power supplies
! connection to up to 16 different host systems with up to 32 SCSI ports

Capacity and the features listed above make the Symmetrix the practical choice in many situations where hundreds of Gigabytes to multiple Terabytes of storage are required and where multi-system connectivity and performance are critical factors. Installation, configuration and support are provided for Symmetrix directly by EMC Corporation.

Symmetrix units are external, standalone units that must be powered separately. They contain enough battery capacity to flush the cache completely in case of power failure.

Disk storage is assigned to a SCSI LUN according to desired capacity. LUNs are configured by EMC according to application needs and then assigned for access by one or more of the external SCSI ports, thus allowing multiple system access to the same SCSI LUNs, if desired. Arbitration such as that available with MC/ServiceGuard and MC/LockManager must be done among the systems so that multiple systems do not modify the same data at the same time.


Multiple system connections are not made with multi-initiator busses. Rather, separate connections from each system are made to the Symmetrix using different SCSI ports on the Symmetrix.

The Symmetrix can be configured with up to 128 disk mechanisms of 4 GB or 9 GB each, assigned to up to 256 LUNs that can be configured in RAID 1 and RAID S groups. Internally, there are 32 SCSI busses (4 per disk director) with up to 4 disk mechanisms on each bus. There can be up to 8 Channel Directors, which can be a combination of Open Storage Directors that provide 4 F/W SCSI ports for Unix hosts and Remote Link Directors for SRDF configurations.

The Symmetrix must be connected to HP 9000 host systems in pairs. That is, at least two F/W SCSI links must be connected to a host system for redundancy purposes. Both links can be used to access the data. In the event that one link fails, the other link can be used to access all of the data in conjunction with the PVLinks feature of the operating system, described in a previous section. Symmetrix does not have the performance impact that the HADA has when switching between controllers for the same LUN. Note that PVLinks does not currently allow concurrent access to the same LUN via both paths. The second path is for redundancy, in case of failure of the host adapter, cable or Symmetrix Open Storage Director. PVLinks does permit concurrent access to different LUNs via both paths. To take advantage of this capability, multiple LUNs must be defined in the Symmetrix.

Only one Symmetrix is allowed per F/W SCSI host adapter; no daisy chaining is allowed. Figure 7 shows the only supported configuration for Symmetrix when configured in an HA cluster. This configuration was established for both performance and availability reasons. There are no SPOFs in this configuration since each Open Storage Director is attached to a different SCSI bus and host adapter. The combination of PVLinks and dual Open Storage Directors provides availability very similar to that of JBODs used with the HA software MC/ServiceGuard and MC/LockManager.

[Figure 7 diagram: primary links and redundant links from the host systems to the Symmetrix.]

Figure 7: Typical High Availability Configuration using the EMC Symmetrix ICDA and PVLinks

Advantages and Disadvantages of Symmetrix ICDAs

+ smaller overall footprint for a given amount of storage
+ availability of 9 GB disk mechanisms
+ most cost effective solution for very large configurations
+ easy on-line replacement of failed disk mechanisms
+ highest storage connectivity
+ flexibility of configuration

# administrator chooses the amount of space allocated to RAID 1 and RAID S modes

+ potential for highest performance (especially for write-intensive applications since writes are always done to the cache)
+ very large global cache (for all disk mechanisms)
+ automatic Open Storage Director failover with HP-UX Revision 10.0 (PVLinks)
+ performance approaches and possibly exceeds that of JBODs/HASS


- overall performance depends entirely on workload and use of RAID 1 versus RAID S and SRDF
- no boot support in a multi-initiator (shared bus) environment
- high cost for small configurations
- internal RAID group configuration and LUN assignment must be performed by EMC support personnel

Solid State Disks

Solid state disks are relatively new to the marketplace. The name is really a misnomer: a solid state disk is not a disk drive but a disk drive replacement. It emulates a regular disk but uses non-volatile semiconductor RAM instead of rotating disk media. The main reason for using a solid state disk is to increase performance. Seek time, the factor that most affects rotating disk performance, is nonexistent on a solid state disk.

Solid state disks are very expensive compared to regular disk drives. However, their very fast access time might compensate for the increased price in performance-sensitive situations. Several companies market solid state disks. Examples are:

! Disk Emulation Systems of Santa Clara, California
! Quantum Corporation of Milpitas, California
! Storage Computer of Nashua, New Hampshire

One product from Disk Emulation Systems (DES) provides ECC memory, battery backup, and automatic backup to an internal disk in case of power outage. Another product provides transparent mirroring to disk and redundant power supplies in addition to the previous features. Solid state disks from DES have capacities up to 4 GB. Hardware support for DES products is provided by HP field repair personnel on all vendors' platforms.

Advantages and disadvantages of solid state disks

Solid state disks have the following advantages and disadvantages:

+ much higher performance than any standalone disk or RAID array
+ no rotational latency
+ no seek time
+ appears to the OS as if it were a normal disk

- much higher cost
- not purchasable from HP
- not officially supported by HP-UX drivers

CAPACITY

Capacity is determined by the following factors: available interface slots in the SPU, operating system limitations, maximum number of disks per link, largest capacity disk drive available, performance guidelines, and testing. Tables B through F can be used to compare maximum capacities for the various disk types. The numbers in these tables are maximums. Actual configuration limits will be smaller in multi-initiator (shared bus) configurations. See the section on performance to learn the impact of multi-initiator environments on the maximum number of disks on a disk link.

Table B: Maximum Link Capacity by Disk Type

Disk Type                          Recommended   Maximum     Maximum        Maximum Total
                                   Targets       Targets     Capacity per   Capacity per F/W
                                                 per Link    Target†        SCSI Interface

Standalone JBODs *                 5 - 10        15          2 GB           20 GB
High Availability Storage
System (HASS)                      5 - 10        15          4 GB           40 GB
HP-FL Disk Arrays (FLDAs)          4             8           10 GB          40 GB
F/W SCSI Disk Arrays (SCSI DAs)    3             7           10 GB          30 GB
HA Disk Arrays (HADAs) **          4             15          40 GB ***      160 GB
AutoRAID Disk Array                4             15          25.2 GB ****   100.8 GB ****

NOTES:
† maximum capacity per target does NOT account for mirroring or data parity
* standalone JBODs should be mirrored on different F/W SCSI busses
** configured with dual Storage Processors for maximum redundancy; each storage processor is attached to a different F/W SCSI bus
*** 40 GB divided equally across two Storage Processors that are on separate F/W SCSI busses
**** Unlike other arrays, AutoRAID arrays may not be used without data protection

According to the March 1995 configuration guide, the maximum disk capacity using F/W SCSI interfaces (the most common choice today) for various SPUs is shown in Tables D and E. Capacity is determined by the maximum number of disk link host adapters (interfaces) supported and the recommended maximum number of disks per disk link. Table D is for standalone mirrored JBODs; Table E is for disk arrays.

Standalone disk drives are available in many different capacities. The most common capacities are 1, 2 and 4 GB 3-1/2" disk mechanisms. Table D shows system maximum disk capacity based upon 2 GB disk mechanisms.

Table C: Maximum Supported Disks & Host Adapters by SPU Family

Series                     E     G     H     I     K     T

Maximum F/W SCSI cards     2     2     4     5     9     40
Maximum F/W SCSI disks     30    30    60    75    149   600
Maximum F/W SCSI arrays    30    30    60    75    59    600
Maximum SE SCSI cards      4     4     8     12    8     20
Maximum SE SCSI disks      34    34    57    85    56    84
Maximum HP-FL cards        0     2     4     4     4     40
Maximum HP-FL disks        0     16    32    32    32    160

Table D: Maximum Disk Capacity using F/W SCSI Standalone Disks

SPU Model    Using Standalone Disks    Using Mirrored Standalone Disks

T5XX         2,400 GB                  1,200 GB
K4XX         596 GB                    298 GB
K2XX         236 GB                    118 GB
K100         236 GB                    118 GB
Ixx          300 GB                    150 GB
Hxx          240 GB                    120 GB
Gxx          120 GB                    60 GB
Exx          120 GB                    60 GB

Table E: Maximum Disk Capacity using F/W SCSI Disk Arrays

SPU Model    Using Disk Arrays in RAID Level 5    Using Disk Arrays in RAID Level 0/1

T5XX         20,000 GB                            12,000 GB
K4XX         10,480 GB                            5,900 GB
K2XX         4,720 GB                             2,950 GB
K100         4,720 GB                             2,950 GB
Ixx          4,800 GB                             3,000 GB
Hxx          3,840 GB                             3,000 GB
Gxx          1,920 GB                             1,190 GB
Exx          1,920 GB                             1,190 GB

Table F shows the maximum disk capacity supported on the various HP disk arrays at each supported RAID level.

Table F: Maximum Supported Usable Capacities

RAID      HP-FL Arrays    F/W SCSI Arrays    F/W SCSI HA       AutoRAID Disk
Level     (FLDAs)         (SCSI DAs)         Arrays (HADAs)    Arrays

0 *       N/A             N/A                42 GB             N/A
1         N/A             N/A                21 GB             **
0/1       N/A             N/A                21 GB             N/A
3         8 GB            8 GB               N/A               N/A
5         N/A             8 GB               33.6 GB           18.5 GB
Ind *     10 GB           10 GB              42 GB             N/A

NOTES:
* Independent Mode is NOT a High Availability Solution due to the lack of data redundancy
** The amount of disk capacity in RAID 1 mode is configurable and subtracts from the amount available for RAID 5

PERFORMANCE COMPARISONS

The issue of the performance of standalone disks in comparison to RAID disk arrays is very complex. Actual performance depends upon:

! use of LVM versus non-LVM managed disks
! whether hardware or software striping is used
! whether hardware or software data protection is used
! I/O size
! ratio of reads to writes (read or write intensity)
! sequential or random access patterns
! whether a mechanism has failed in a RAID 3 or 5 configuration
! disk link (HP-FL, SCSI, F/W SCSI)
! the number of targets on a disk link

LVM versus non-LVM managed disks

Organizations now usually employ Logical Volume Manager (LVM) to configure disks. All new high availability products, including MC/ServiceGuard and MC/LockManager, require the disks to be configured with LVM since these HA products use LVM to enforce exclusive or shared mode activation and to perform disk locking. LVM typically consumes 1 - 3 % additional software overhead due to LVM code paths, but this is a consideration only if the system is 100 % CPU bound. Therefore, although LVM typically adds only a slight amount of CPU overhead, performance testing should be done with LVM-configured disks to duplicate the exact customer environment.

Standalone disks work well in almost any I/O environment; there is no truly sub-optimum environment for them. In contrast, RAID disks excel in certain environments but provide extremely poor performance in others. Standalone disks usually provide the best performance in all environments.

Striping

Striping of data across multiple disk mechanisms is a standard feature in RAID arrays and can be performed with standalone disks via LVM. Striping can greatly improve the speed of random access.

For sequential access, the effect of striping depends upon I/O size. The performance of small, independent sequential I/Os is unaffected by striping. However, large sequential I/Os (>= 64 KB) can benefit greatly from striping. HP-UX version 10.0 has an internal feature called I/O merging that will automatically merge multiple sequential I/Os that are currently in the disk queue into a single I/O request. This feature reduces overhead and improves performance by reducing the commands and setup associated with many I/O requests. Since I/O merging increases the size of the I/O, performance can be further increased by striping.

Unlike earlier HP-UX versions, HP-UX version 10.0 explicitly supports software striping via LVM with an easy-to-use option to the lvcreate and lvextend commands, and can stripe in units as small as 4 KB. In contrast, hardware striping in RAID arrays is done in units of sectors, which are 512 bytes in size. LVM striping appears to provide better performance than does hardware striping! (See Performance News Notes, Volume 13, No. 1, January/March 1995.)
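As an illustration of the LVM striping option mentioned above, here is a minimal sketch; the volume group, logical volume name, and sizes are hypothetical:

    # Create a 1024 MB logical volume striped across 4 disks in 64 KB
    # units (-i = number of disks to stripe across, -I = stripe size in KB).
    lvcreate -i 4 -I 64 -L 1024 -n lvol_stripe /dev/vg01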


Data Protection

Mirroring can also be done in software or in hardware. Mirroring in software is accomplished via LVM, and uses more CPU cycles than does hardware mirroring. LVM mirroring appears to provide better read performance than does hardware mirroring using RAID level 0/1. This is because LVM queues each read to the disk drive that has the shortest queue. Read performance has been seen to improve by as much as 40% with LVM mirroring. A new version of the firmware for the HADA disk arrays should improve mirrored read performance by allowing reads from both sides of the mirror instead of always reading from one disk.

However, RAID level 0/1 appears to offer better write performance than does LVM mirroring, which degrades write performance by about 10% due to the dual writes. With RAID 0/1, a single write operation is sent to the disk array and is handled very efficiently when a write cache is configured.
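For comparison, a minimal sketch of LVM software mirroring (this assumes the MirrorDisk/UX product is installed; the volume group and device file are hypothetical):

    # Add one mirror copy (-m 1) of lvol1 on a disk that should sit on
    # a different F/W SCSI bus than the original copy.
    lvextend -m 1 /dev/vg01/lvol1 /dev/dsk/c1t2d0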

RAID levels 3 and 5 also provide data protection. The data protection is parity, so this solution is less expensive than mirroring, which requires double the disk space. However, RAID 3 provides consistently lower performance as compared to standalone disks except when the I/O size is >= 64 KB. RAID 5 performance is high for large I/Os (>= 64 KB) but poor for smaller I/O sizes.

AutoRAID combines RAID 5 and RAID 0/1 in the same disk array and trades off the increased storage requirements of RAID 0/1 against the lower cost of RAID 5. AutoRAID automatically migrates data between the two RAID modes as data access patterns change. The most frequently accessed data is kept in RAID 0/1 mode while the rest of the data is kept in lower-cost RAID 5. The amount of space used for RAID 0/1 is configurable.

I/O Size

On a Unix system used for business applications, including HP-UX, most I/Os are relatively small, between 2 KB (raw I/O) and 8 KB (file system). I/O performance is linear in relation to size using standalone LVM mirrored disks if I/O setup and seek time are ignored. HP-UX version 10.0 implements a new feature called I/O merging. During the disk sort operation that sorts I/Os in the disk queue by cylinder location, the driver will also look for I/Os that are contiguous on the disk and will merge them into a single I/O. This feature can improve disk-intensive application performance when I/Os are sequential.

With disk arrays configured in RAID 3 or 5 mode, performance varies in a number of ways. RAID 3 provides consistent I/O performance for reads and writes at small I/O sizes, although the performance is much lower than with standalone mirrored disks. RAID 3 performs best for I/Os of 64 KB or larger.

The performance of a disk array configured in RAID 5 is very inconsistent. Small I/Os are most efficient for read operations since, in the random case, multiple small I/Os can be processed concurrently by the disk array when they reference data on different disk mechanisms. Large I/Os (>= 64 KB) involve all disk mechanisms in the array and therefore prevent multiple concurrent I/Os. However, large writes are much more efficient than small writes since large writes rewrite all of the data in the stripe including the parity information. Small writes require a read-modify-write sequence to occur since the entire stripe is not rewritten. The stripe must first be read, the appropriate portion modified, the new parity information calculated, and the entire stripe written back out.
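The small-write penalty can be made concrete with simple arithmetic. The sketch below assumes a hypothetical five-mechanism RAID 5 group (four data chunks plus one parity chunk per stripe) and the whole-stripe read-modify-write behavior described above:

    # Full-stripe (large) write: write 4 data chunks + 1 parity chunk.
    echo "large write: `expr 4 + 1` physical I/Os"

    # Small write touching one chunk: read the whole stripe (5 chunks),
    # recompute parity, then write the whole stripe back (5 chunks).
    echo "small write: `expr 5 + 5` physical I/Os"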

Ratio of reads to writes

Many commercial applications are read intensive. OLTP applications with 70% or more reads and Decision Support (DSS) applications that are 100% reads are considered read-intensive. Since reads are more efficient than writes, read-intensive applications will benefit most from an LVM mirrored environment or a HADA RAID 1 configuration with the new firmware release. Both take advantage of the multiple copies of the data and split the I/Os among the copies.

Administrators of non-read-intensive applications must carefully weigh the disk technology employed based on the other considerations discussed in this section, such as I/O size, striping, and data protection mechanism.

Sequential versus random access

Sequential I/O performance is usually better than random when the I/O size is large. Therefore, the 10.0 feature of I/O merging improves sequential performance even when the application writes sequentially, but in small blocks.

Random I/O performance is usually best when the I/O size is small, since many I/Os that reference data on different disk mechanisms can be queued up; the OS can then issue many I/O requests in parallel.

Failed mechanisms in RAID 3 or 5 configurations

When a mechanism has failed in a RAID 3 or 5 configuration, the disk array controller must recreate the missing data mathematically. The mathematical computation prevents the controller from working on other I/Os and therefore degrades overall performance. Small I/Os that don't involve the entire RAID 5 stripe will require the controller to read the rest of the stripe in order to have enough data to recreate the missing data that resides on the failed mechanism. This situation will severely degrade the overall performance of the disk array.

Disk Link Technology

The various disk link technologies have different peak and sustained bandwidths. Obviously, a higher-bandwidth link will provide better performance. Currently, F/W SCSI offers the highest bandwidth and therefore the best performance.

Number of Targets on a Disk Link

The bandwidth and target limitations of the various disk link technologies were discussed in an earlier section. Each disk link technology has a physical limit on the number of targets allowed. This limit is based on electrical and protocol requirements and varies from 8 to 16. However, in a real environment, performance requirements will often dictate that fewer targets be configured on a particular disk link, thus requiring that additional disk links be configured in order to meet the total storage requirements. The number of disk links is constrained by the number of I/O slots in a given SPU and by testing and other support limitations.

Furthermore, in multi-initiator bus environments, fewer targets may be configured on a disk link when each of the initiators is active on the bus. It is attractive for cost reasons to configure HA clusters such that disks belonging to different applications are configured on the same bus. These applications could be running on different SPUs, and it is this case that degrades performance the most. Therefore, it is highly recommended that disks on a particular bus belong to only one application.

Following are guidelines on how to configure a disk link for maximum throughput. One must always consider, though, that one key issue is the number of active disks on a link, not just the number of disks on a link. The consequence of exceeding these guidelines is reduced overall application performance due to disk link (bus) saturation; i.e., some I/Os will be queued up since the bus is busy.

For performance reasons, the maximum number of Performance Loads allowed on a F/W SCSI bus is 11.5. Performance Loads are established for the host adapter as well as disks, and in a multi-initiator HA environment, each host adapter must be counted if it is active on the same bus.

The assigned Performance Load Factor for a particular host adapter, disk, or array is based on the sustained transfer rate. This definition includes the possibility that multiple host adapters on different nodes might be accessing the same F/W SCSI bus concurrently, although this is not recommended. Host adapters should be assigned the highest priority SCSI addresses in order to minimize starvation.


The Performance Load Factor is normalized to a modern JBOD, which can sustain a 3.7 MB/sec transfer rate. For example, a HADA disk array Storage Processor can sustain an 8 MB/sec transfer rate, so it counts as two devices.

Table G summarizes the Performance Load Factors associated with each device type.

Table G: Performance Load Factor by Device Type

Device Type                                                        Performance Load Factor

Standalone JBOD                                                    1
Each storage module in a HASS                                      1
HADA disk array with one storage processor                         2
HADA disk array with two storage processors, but each storage
processor attached to a different F/W SCSI bus (on each bus)       2
HADA disk array with two storage processors on a single bus
(NOT a supported configuration in an HA cluster)                   4
AutoRAID disk array with two controllers, each on a different
F/W SCSI bus (on each bus)                                         2
Each active host adapter on the same F/W SCSI bus                  1.5

Here are several examples giving configurations that add up to the maximum of 11.5 Performance Loads based on the numbers in Table G above:

! A single system

1.5 for the one active host adapter AND
10.0 for 5 HADA disk arrays with one Storage Processor each OR
8.0 for 2 HADA disk arrays with two Storage Processors each OR
10.0 for 10 JBODs

! A two-node HA cluster with only one node active on the F/W SCSI bus

1.5 for the one active host adapter AND
10.0 for 5 HADA disk arrays with one Storage Processor each OR
8.0 for 2 HADA disk arrays with two Storage Processors each OR
10.0 for 5 HADA disk arrays with two Storage Processors each, where each Storage Processor is attached to a different F/W SCSI bus OR
10.0 for 10 JBODs

! A two-node HA cluster with both host adapters active on the F/W SCSI bus

3.0 for the two host adapters AND
8.0 for 4 HADA disk arrays with one Storage Processor each OR
8.0 for 4 HADA disk arrays with two Storage Processors each, where each Storage Processor is attached to a different F/W SCSI bus OR
8.0 for 8 JBODs

! A three-node HA cluster with all three host adapters active on the same F/WSCSI bus

4.5 for the three host adapters AND
6.0 for 3 HADA disk arrays with one Storage Processor each OR
4.0 for 1 HADA disk array with two Storage Processors OR
7.0 for 7 JBODs
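A quick way to sanity-check a proposed bus configuration against the 11.5-load guideline is a small script; the device counts are hypothetical inputs, and the load factors come from Table G:

    #!/usr/bin/sh
    ADAPTERS=2    # active host adapters on the bus (1.5 loads each)
    JBODS=8       # standalone JBODs on the bus (1.0 load each)

    # awk handles the fractional adapter load and prints a verdict.
    awk -v a="$ADAPTERS" -v j="$JBODS" 'BEGIN {
        total = a * 1.5 + j * 1.0
        printf "total Performance Load: %.1f (limit 11.5)\n", total
        if (total > 11.5) print "WARNING: disk link saturation likely"
    }'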

Performance benchmarks

Various studies have been done to compare the performance of the various disk drives and arrays. Most recently, a rigorous study has been done by the Capacity Planning Center. Preliminary findings have been reported in the publication "Performance News Notes," also known as PN², Volume 13, No. 1, January/March 1995. This report is available on ESP, keyword "SCSI Disk Performance." These tests were done with LVM raw logical volumes on various disk configurations.

Summary of performance of various disk technologies

Tables H and I summarize the areas that are optimum (best) and sub-optimum (worst) for the various RAID levels and for standalone disks. LVM mirrored disks provide the best overall performance. RAID 5 provides better general performance than does RAID 3.


Table H: Optimum I/O Environment (Best)

I/O Size     Random                Random                Sequential            Sequential
             Read-intensive        Write-intensive       Read-intensive        Write-intensive

Small        ! Standalone /        ! RAID 0 **           ! Standalone /        ! RAID 3 *
(1-2 KB)       Mirrored (LVM)      ! Standalone /          Mirrored (LVM) *    ! Standalone /
             ! RAID 0 **             Striped (LVM) **                            Striped (LVM) **
             ! RAID 5              ! RAID 0/1

Large        ! Standalone /        ! RAID 0 **           ! Standalone /        ! RAID 0 **
(>= 64 KB)     Mirrored (LVM)      ! RAID 0/1              Mirrored (LVM)      ! RAID 3
             ! RAID 0 **           ! RAID 5              ! RAID 0 **
             ! RAID 0/1

NOTES:
* with the HP-UX 10.0 feature of merging sequential small I/Os into a single large I/O
** NOT a High Availability configuration

Table I: Sub-Optimum I/O Environment (Worst)

I/O Size     Random                Random                Sequential            Sequential
             Read-intensive        Write-intensive       Read-intensive        Write-intensive

Small        ! RAID 3              ! RAID 3              ! RAID 3 ***          ! RAID 3 ***
(1-2 KB)     ! RAID 0/1            ! RAID 5

Large        ! RAID 3              ! RAID 3              ! RAID 0/1            ! Standalone /
(>= 64 KB)                                                                       Mirrored (LVM)

NOTES:
*** prior to HP-UX 10.0

HIGH AVAILABILITY (HA) DISK CONSIDERATIONS

When implementing an HA solution, a primary goal is to minimize the number of Single Points of Failure (SPOFs). To achieve the highest availability, all of these SPOFs must be eliminated. Table J summarizes the SPOFs that apply to disk drives and the solutions that are available.


Table J: Solutions to Disk Single Points of Failure

SPOF                       JBODs or HASS with LVM Mirroring     RAID Disk Arrays

Disk mechanism             ! Mirror to different disk           ! Use RAID level 1, 0/1, 3 or 5
                             mechanism                            in HADA, FLDA or SCSI DA

Disk Power Supply          ! Mirror to disk in different        ! Use HADA, AutoRAID or
                             tray or tower                        Symmetrix with redundant
                                                                  power supply option

Power Source               ! Mirror to disk in different        No solution, except for
                             tray or tower; HASS has 2          AutoRAID which has 2
                             power cords in one chassis         power cords in one chassis

Disk Cooling System        ! Mirror to different disk           ! Use HADA, AutoRAID or
                             mechanism                            Symmetrix, which have
                                                                  redundant cooling systems

Disk or Array Controller   ! Mirror to different disk           ! Use HADA, AutoRAID or
                             mechanism                            Symmetrix

Cable from SPU             ! Mirror to disk on different        ! Use HADA, AutoRAID or
                             interface                            Symmetrix with PV Links (two
                                                                  F/W SCSI interface cards)

Computer Interface         ! Mirror to disk on different        ! Use HADA, AutoRAID or
                             interface                            Symmetrix with PV Links (two
                                                                  F/W SCSI interface cards)

DECISION CRITERIA

The major factors that should be considered in deciding the appropriate disk technology are:

! on-line failed disk replacement
! the need for data redundancy
! the need for double data redundancy
! the need for continuous data redundancy
! purchase cost
! footprint
! performance
! backup strategy
! total capacity requirements
! power source redundancy
! total distance

The need for on-line failed disk replacement versus scheduling downtime

Can up to one hour of downtime be scheduled to replace a failed disk? If yes, then on-line failed disk replacement is not a requirement. Of course, replacement depends on the availability of a spare disk mechanism and the knowledge of how to do the replacement.

RAID disk arrays support true on-line replacement of failed disk mechanisms only if they are configured in RAID levels 1, 0/1, 3, or 5. The application can continue to run since the master controller or storage processor limits access to the failed mechanism. Also, the SCSI busses remain connected and properly terminated due to the design of the disk array.

With LVM mirrored standalone disks or arrays, it is recommended that the application be halted so that the simpler replacement procedure can be used. This also ensures that no I/Os are occurring on the bus at the time of the replacement. Inadvertent disconnection of the SCSI bus might cause OS and/or data corruption problems if an I/O were attempted while the bus was disconnected.

A good discussion of the correct procedure for replacing a failed disk mechanism in an LVM-mirrored environment can be found in PA NEWS # 205 (Technical HPPA Newsletter # 205, August 1, 1994, "LVM Mirrored Disk Recovery"). The Australian Response Center has created a cookbook that can be accessed as described in PA NEWS # 217 (Technical HPPA Newsletter # 217, May 23, 1995, "Recovery Cookbook").

The need for data redundancy

Data redundancy can be provided in several ways. One must decide first whether one level of redundancy is sufficient. One-level data redundancy can be provided with:


- 2-way LVM mirroring of standalone disks
- RAID level 0/1 or 1
- RAID levels 3 or 5

Data redundancy is lost with the failure of any one disk mechanism with one-level redundancy. Exposure time is the length of time it takes to replace the failed mechanism and recover the data onto the replacement disk.

The need for double data redundancy

Two-level data redundancy can be accomplished with:

- 3-way LVM mirroring of standalone disks
- 2-way LVM mirroring of RAID level 0/1, 1, 3, or 5 disk arrays

Although much more expensive due to the additional standalone disks or arrays, two-level data redundancy provides protection from a double failure. In addition, it gives the ability to take one side of the mirror offline for backup or data maintenance and still maintain data redundancy. Some have used this feature to maintain an original copy of the data that can be restored very quickly in case of corruption or other data loss.

The need for continuous data redundancy after a disk mechanism has failed and before it is replaced

Do I require data redundancy during the time that a disk mechanism has failed and not yet been repaired? This is really an exposure-to-risk issue; in other words, is the risk of a second failure before repair of the first failure acceptable?

LVM mirroring supports 3-way mirroring that could meet this requirement. There is still a primary and a backup copy even if one disk in the mirror fails.

HADA disk arrays have a feature to designate one of the disk mechanisms as a hot standby that will be reassigned by the storage processor in case of a mechanism failure. This feature continues data protection while waiting for a failed disk mechanism to be replaced and is most useful in lights-out environments or when failed disk mechanism replacement has to be scheduled.

Hot standby disks can be reassigned in an LVM mirrored environment. However, to do this, a series of LVM commands must be used to reassign a spare standalone disk, as described in Technical HPPA Newsletter # 218.
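A rough outline of such a reassignment follows; this is only a sketch (PA NEWS # 218 gives the supported procedure), and all device files are hypothetical:

    # Drop the mirror copy that lived on the failed disk (c0t5d0),
    # then re-mirror the logical volume onto the spare disk (c0t6d0).
    lvreduce -m 0 /dev/vg01/lvol1 /dev/dsk/c0t5d0
    pvcreate -f /dev/rdsk/c0t6d0
    vgextend /dev/vg01 /dev/dsk/c0t6d0
    lvextend -m 1 /dev/vg01/lvol1 /dev/dsk/c0t6d0   # resynchronizes the mirror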


Purchase cost

Purchase cost is usually a factor in determining how much availability can be provided. Cost is affected by:

- the number of cabinets required
- the number of I/O expansion modules (T500) required
- the number of SCSI cards required
- whether one- or two-level data redundancy is required

An example cost comparison among various standalone disks and disk arrays is given in Table K, following the next section on footprint.

Footprint

The footprint is the amount of floor space required by the system. It changes according to which disk technology is chosen. It is affected by:

- the number of I/O expansion units required
- whether UPSs are required
- the number of cabinets required to hold the disks

Here is an example footprint and price comparison among five configurations using LVM mirrored standalone disks, HADA disk arrays, and SCSI DAs for 64 GB of protected data. RAID 5 provides 64 GB of protected data using 80 GB total disk space. LVM mirroring provides 64 GB of protected data using 128 GB total disk space. This comparison includes the following assumptions:

- space or cost for UPSs is not included
- space or cost for T500 expansion modules is not included
- standalone JBODs are configured in trays with four rather than five mechanisms for simplicity of configuration and comparison with the disk arrays, whose capacities are multiples of 8 GB
- the cache on the HADA, FLDA and SCSI DA, divided among the number of mechanisms in the array, is equivalent to the cache on each JBOD
- 1.6 meter cabinets are used that have room for up to 32 EIA units and power capacity of 16 amperes at 230 VAC

When reviewing Table K, consider that Configuration A will probably yield the greatest performance due to the larger number of mechanisms spread over more F/W SCSI links. Configuration B, however, is the least expensive and occupies the smallest floorspace.


Table K: Cost & Footprint Comparison for 64 GB of Protected Disk Space

                                     Config A    Config B    Config C    Config D    Config E

Method of data protection            LVM         RAID 5      RAID 1      RAID 5      LVM
                                     Mirroring                                       Mirroring
Quantity of arrays or trays          16          2           4           8           16
Quantity of 2 GB disk mechanisms     64          40          64          40          64
Rack space in EIA units (each)       4           9           9           7           7
Power consumption in Amperes (each)  1.2         4           4           3           3
# of F/W SCSI cards required         8           2           2           3           6
# of cabinets required               2           1           2           2           4
Square meters of floorspace          1.08        0.54        1.08        1.08        2.16
Cost                                 $176,700    $129,400    $221,190    $192,785    $374,370


Notes:
Config A: C3553RZ 2 GB JBODs in 8 GB tray configuration, mirrored, for 128 GB total disk space (64 GB protected)
Config B: A3232A HADA arrays in RAID 5 mode for 80 GB total disk space (64 GB protected), configured with 2 Storage Processors and 32 MB cache; 20 mechanisms per array
Config C: A3232A HADA arrays in RAID 1 mode for 128 GB total disk space (64 GB protected), configured with 2 Storage Processors and 32 MB cache; 16 mechanisms per array
Config D: C2440JZ SCSI DAs in RAID 5 mode for 80 GB total disk space (64 GB protected); 5 mechanisms per array
Config E: C2439JZ+C2431A SCSI DAs in Independent mode, mirrored, for 128 GB total disk space (64 GB protected); 4 mechanisms per array

The exact configurations used are shown below. Prices are from the August 1995 price list and are subject to change.

Table L: Detail of Configurations Compared in Table K

Configuration         Qty   Product #   Price each   Total Price

A                     16    C3553RZ     10,090       161,440
Mirrored JBODs        2     A1897A      2,450        4,900
                      8     28696A      1,295        10,360
                                                     176,700

B                     2     A3232A      15,000
RAID 5                2     #315        35,020
HADA Arrays           2     #421        2,000
                      2     #532        10,160       124,360
                      2     28696A      1,295        2,590
                      1     A1897A      2,450        2,450
                                                     129,400

C                     4     A3232A      15,000
RAID 1                4     #320        26,265
HADA Arrays           4     #421        2,000
                      4     #532        10,160       213,700
                      2     28696A      1,295        2,590
                      2     A1897A      2,450        4,900
                                                     221,190

D                     8     C2440JZ     23,000       184,000
RAID 5                3     28696A      1,295        3,885
SCSI DAs              2     A1897A      2,450        4,900
                                                     192,785

E                     16    C2439JZ     18,000       288,000
Independent Mode      16    C2431A      4,300        68,800
SCSI DAs              6     28696A      1,295        7,770
                      4     A1897A      2,450        9,800
                                                     374,370

Performance

Performance of the disk subsystem is a complex subject and was discussed in the section on performance. Remember that RAID often, but not always, degrades the performance of the disk subsystem. Factors which affect performance of the disk subsystem are:

- read versus write intensity
- volatility of the data (update or insert versus read-only)
- random versus sequential access (OLTP vs. DSS)
- I/O size
- RAID level chosen
- whether LVM mirroring is used
- the use of multiple controllers for redundancy and performance

Backup strategy

The application together with business requirements determine whether an on-line backup can be used. On-line backups are usually done on the same system that runs the application. Not all applications provide the facility for a consistent on-line backup. Consistency refers to the ability of the backup to provide a self-contained copy of all of the data. Some Relational Database systems provide an on-line consistent backup facility either as a tool, or as a procedure using OS backup tools.

Off-line backups can be performed by first shutting down the application and then doing the backup. During the time that the backup occurs, access to the data is not allowed.

Alternatively, splitting off one side of mirrored disks and then doing the backup allows the application to continue to access the data while the backup is occurring. This is an on-line backup based on a snapshot of the data and can be done on the same system or a different system from the one where the application is running. The offline mirror is then merged back into the online mirror after the backup has completed.

The HP-UX version 10.X feature called read-only volume groups is required to perform the on-line backup on a different system. An on-line concurrent backup is normally inconsistent unless the application is first shut down. The procedure for ensuring that the backup is consistent is:

1. Quiesce the application for data consistency.
2. Perform an lvsplit operation to split off one side of the mirror.
3. Resume the application.
4. Optionally, activate the volume group as read-only on a different system.
5. Do the backup.
6. Deactivate the volume group if it was activated on another system.
7. Perform an lvmerge operation to return the offline mirror to the mirrored set.
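At the command level, steps 2 and 7 map onto the lvsplit and lvmerge commands; the following sketch uses a hypothetical logical volume /dev/vg01/lvol1 (lvsplit creates the split-off copy as lvol1b by default):

    # Step 2: after quiescing the application, split off one mirror copy.
    lvsplit /dev/vg01/lvol1

    # Steps 3-6: resume the application and back up from /dev/vg01/lvol1b.

    # Step 7: merge the offline copy back and resynchronize it.
    lvmerge /dev/vg01/lvol1b /dev/vg01/lvol1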

For those applications that use a file system, the optional OnLine JFS product also provides the capability to do an online backup. A snapshot file system is created to record the original data that is being changed while the application is online during the backup. The backup accesses the original data by referring to the snapshot file system rather than the original file system.

Total capacity requirements

The total amount of disk space required may force a choice of disk technology. Disk arrays provide the largest total capacity on a system, so it might be necessary to choose that technology just to get the needed capacity.


Power source redundancy

To ensure the highest level of availability, data should be protected by multiple power sources. This is possible today only with standalone mirrored disks. The loss of one power source or circuit will not prevent access to the data. Of course, this only makes sense if used together with MC/ServiceGuard, where the systems that can run the application are also powered separately.

Total distance among computer systems connected to the disks

The more systems connected to the same set of disks, the longer the total cable length required. Although HP-FL allows the longest cable length (up to 500 meters from each system to the string of disks), its performance is much lower than newer technologies such as F/W SCSI.

The total cable length for F/W SCSI is limited to 25 meters, including all cabling inside towers and trays. Since each system in the cluster consumes 2.5 meters of cabling and each "V" cable consumes an extra 2 meters, a four-node cluster will consume 16 meters of cable (4 x 2.5 m = 10 m for the systems plus 3 x 2 m = 6 m for the V cables), not including any internal cabling.
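The cable budget is easy to check with a one-line calculation; this sketch assumes the figures above (four nodes at 2.5 m each and, reading the 16 m total as implying three V cables, 2 m per V cable):

    # Prints the cable consumed and what remains of the 25 m budget
    # for internal tower and tray cabling.
    awk 'BEGIN { used = 4 * 2.5 + 3 * 2;
                 printf "used: %.1f m, remaining: %.1f m of 25 m\n",
                        used, 25 - used }'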

DISK SELECTION MATRIX

Table M summarizes the decision criteria for selection of the appropriate disk technology. Each of these criteria involves a business decision that should be made. These criteria should also be ranked when more than one is required.

Table M: Disk Selection Matrix by Technology

Technology Criterion                       If Required, then choose       If Not a Requirement,
                                           from this list of options      then choose from this list

on-line replacement of failed disk         RAID 0/1                       any
mechanism                                  RAID 3
                                           RAID 5

global hot spare disk mechanism for        RAID 0/1                       any
automatic reassignment in case of          RAID 5
failed disk

data redundancy                                                           N/A
  ! 3-way full protection                  LVM mirrored
  ! 2-way full protection                  LVM mirrored or RAID 0/1
  ! parity protection                      RAID 3 or 5

lowest cost                                RAID 5 HADA or AutoRAID        LVM mirrored

smallest footprint                         RAID 0/1                       LVM mirrored
                                           RAID 5 HADA or AutoRAID

No Single Points of Failure (SPOFs)        LVM mirrored, HASS,            RAID 3
                                           RAID 0/1, 1, AutoRAID,         RAID 5
                                           Symmetrix, RAID 5 HADA
                                           (except power)

capacity > mirrored standalone capacity    RAID 5 HADA, AutoRAID,         any
                                           Symmetrix

backup from offline snapshot               LVM mirrored                   any

greatest general performance               LVM mirrored                   any

greatest absolute performance              solid state disks              any

power source redundancy                    LVM mirrored                   any

total disk cable length > 25 m             FC/SCSI Mux                    F/W SCSI

Table N shows which products meet the decision criteria discussed above. The column codes refer to the following products:

A - standalone LVM mirrored disks
B - High Availability Storage System (HASS) using LVM mirroring
C - HP-FL RAID disk arrays (FLDA)
D - F/W SCSI RAID disk arrays (SCSI DA)
E - High Availability disk arrays (HADA)
F - Solid state disks
G - High Availability Disk Array with AutoRAID

NOTES: RAID levels are given as Rn, where n is the RAID level. Notes are referenced as *n, where n is the note number. Relative rankings are given as VH, H, M and L for very high, high, medium and low.

Table N: Disk Selection Matrix by Product

Availability Requirement          A       B       C       D        E         F      G

On-line mechanism                 Yes     Yes     Yes     Yes      Yes       No     Yes
replacement                       *1      *2      R3      R3, R5   R5, R1,
                                                                   R01

Quiescent mechanism               Yes     Yes     Yes     Yes      Yes       No     N/A
replacement                       *3      *4      R0      R0       R0

Global hot spare capability       Yes     Yes     No      No       Yes       No     Yes
                                  *5      *5

One-level data redundancy         Yes     Yes     Yes     Yes      Yes       Yes    Yes
                                                  R3      R3, R5   R1, R01,
                                                                   R5

Two-level data redundancy         Yes     Yes     Yes     Yes      Yes       No     Yes
                                                  *6      *6       *6               *6

Cost                              H       H       H       M        M/L       VH     M

Footprint                         H       H       H       H        M         VH     M

redundant disk mechanism          Yes     Yes     Yes     Yes      Yes       Yes    Yes

redundant power supply            Yes     Yes     No      No       Yes       Yes    Yes
                                  *7      *7

redundant power source            Yes     Yes     No      No       No        No     Yes
                                  *7      *7

redundant cooling                 Yes     Yes     Yes     Yes      Yes       ???    Yes
                                  *7

hot-replaceable cooling           N/A     Yes     No      No       Yes       ???    Yes

redundant controller              Yes     Yes     No      No       Yes       ???    Yes
                                  *7      *7

hot-replaceable controller        N/A     N/A     No      No       Yes       ???    Yes

redundant link cable              Yes     Yes     Yes     Yes      Yes       Yes    Yes
                                  *7      *7      *8      *9       *10              *9

redundant SPU interface           Yes     Yes     Yes     Yes      Yes       Yes    Yes
                                  *7      *7      *8      *9       *10              *9

concurrent offline backup         Yes     Yes     No      No       No        No     No

performance                       H       H       L       M/L      M/L       VH     M

cable length                      25 m    25 m    500 m   25 m     25 m      25 m   25 m

automatic disk backup in case     No      No      No      No       No        Yes    No
of power failure


NOTES:

*1 requires possible removal from rack, opening of the tower or tray, and a many-step complex procedure

*2 requires a multi-step complex procedure; mechanisms can be easily removed and replaced from the front

*3 requires quiescence of the application, several LVM commands, possible slide-out from the rack, opening of the tower or tray

*4 requires quiescence of the application; mechanisms can be easily removed and replaced from the front

*5 requires a procedure similar to that of on-line or quiescent JBOD replacement

*6 with LVM mirroring of multiple RAID disk arrays

*7 inherent capability due to mirroring of standalone disks

*8 with multiple disk arrays and PV Links

*9 with V-cable and PV Links

*10 with redundant controller and PV Links

SUMMARY

If it is possible to simplify the myriad factors associated with choosing the right disk in an HA environment, one can reduce to the general case by concluding:

If performance is more important than cost and capacity, LVM mirrored standalone disks (JBODs) are the best choice.

If cost and/or capacity are more important than performance, HADA disk arrays in RAID 5 are the best choice.
