Fault Tolerance Jacob Holt Nick Chaconas John Aucoin.

Server Mirroring

[Diagram labels: 11/19/14, 11/16/14]

• Identical files are not mirrored

• New files are sent from primary server to secondary server

• All files can be replicated

• Modules are used to copy files to primary server
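
The mirroring rules on this slide (send new files, skip identical ones) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two "servers" are plain directories, and the function name `mirror` is hypothetical rather than part of any mirroring product.

```python
import filecmp
import shutil
from pathlib import Path


def mirror(primary: Path, secondary: Path) -> list[str]:
    """Copy new or changed files from primary to secondary; skip identical ones."""
    copied = []
    for src in sorted(primary.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(primary)
        dst = secondary / rel
        # Identical files are not mirrored again (byte-for-byte comparison).
        if dst.exists() and filecmp.cmp(src, dst, shallow=False):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also carries over the timestamps
        copied.append(rel.as_posix())
    return copied
```

Calling `mirror` a second time without changing anything on the primary side copies nothing, which is the "identical files are not mirrored" behaviour.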

Server Mirroring

• Pro: Very effective for fault-tolerance

• Con: Expensive

RAID

• RAID initially meant Redundant Array of Inexpensive Disks but nowadays means Redundant Array of Independent Disks.

• We can have a hardware-based array if RAID is achieved through a proprietary expansion card controller (or a component integrated into the motherboard)

or

• a software-based array if everything is achieved through the main CPU (or CPUs).

• Either way, the RAID levels are the same.

RAID 0

Striping

Drive 1: A1, A3, A5, A7

Drive 2: A2, A4, A6, A8

Pros: Read and write operations on your computer will occur faster

Cons: No redundancy, so one dead drive loses the whole array; very little of what a computer does is pure read/write, so the real-world gain is limited; exposed to the "brown out" problem
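
The striping layout on this slide can be sketched as a round-robin distribution of blocks. This is a minimal illustration, assuming in-memory lists stand in for drives; the names `stripe` and `unstripe` are hypothetical.

```python
def stripe(blocks, n_drives=2):
    """Deal blocks out to the drives in round-robin order."""
    drives = [[] for _ in range(n_drives)]
    for i, block in enumerate(blocks):
        drives[i % n_drives].append(block)
    return drives


def unstripe(drives):
    """Read the blocks back in their original order."""
    n = len(drives)
    total = sum(len(d) for d in drives)
    return [drives[i % n][i // n] for i in range(total)]


blocks = [f"A{i}" for i in range(1, 9)]
drive1, drive2 = stripe(blocks)
# drive1 holds A1, A3, A5, A7 and drive2 holds A2, A4, A6, A8
```

Because every drive holds a slice of every file, losing either list makes `unstripe` impossible, which is the dead-drive problem listed under Cons.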

RAID 1

Mirrored data to both drives

Drive 1: A1, A2, A3, A4

Drive 2: A1, A2, A3, A4

Pros: One Write or two Reads possible per mirrored pair; 100% redundancy of data means no rebuild is necessary in case of a disk failure, just a copy to the replacement disk; simplest RAID storage subsystem design

Cons: Highest disk overhead of all RAID types (100%), which is inefficient; typically the RAID function is done by system software, loading the CPU/server and possibly degrading throughput at high activity levels, so a hardware implementation is strongly recommended
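
The "no rebuild, just a copy" point can be illustrated with a toy mirrored pair. This is a sketch under loud assumptions: dictionaries stand in for drives, and the class `MirroredPair` and its methods are hypothetical names, not a real RAID API.

```python
class MirroredPair:
    """Toy RAID 1: every write goes to both drives, reads use any live drive."""

    def __init__(self):
        self.drives = [{}, {}]  # block number -> data; None marks a failed drive

    def write(self, block_no, data):
        for drive in self.drives:
            if drive is not None:
                drive[block_no] = data

    def read(self, block_no):
        for drive in self.drives:
            if drive is not None:
                return drive[block_no]
        raise IOError("both drives have failed")

    def fail(self, index):
        self.drives[index] = None

    def replace(self, index):
        survivor = self.drives[1 - index]
        if survivor is None:
            raise IOError("no surviving drive to copy from")
        # Recovery is a straight copy of the survivor, no parity maths needed.
        self.drives[index] = dict(survivor)
```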

RAID 2

Bit-level striping with dedicated Hamming-code parity

Drive 1: A0, B0, C0, D0

Drive 2: A1, B1, C1, D1

Drive 3: A2, B2, C2, D2

Drive 4: A3, B3, C3, D3

ECC drives:

ECC AX, ECC BX, ECC CX, ECC DX

ECC AY, ECC BY, ECC CY, ECC DY

ECC AZ, ECC BZ, ECC CZ, ECC DZ

Pros: "On the fly" data error correction; extremely high data transfer rates possible; the higher the data transfer rate required, the better the ratio of data disks to ECC disks

Cons: Very high ratio of ECC disks to data disks with smaller word sizes, which is inefficient; entry level cost very high, so it requires a very high transfer rate requirement to justify; no commercial implementations exist
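
To make the Hamming-code idea concrete, here is a small sketch of a Hamming(7,4) code: four data bits protected by three check bits, which lines up with the four data drives and three ECC drives drawn on this slide. Treating each drive as one bit position is an assumption made for illustration; the function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Fix at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    bad_position = s1 + 2 * s2 + 4 * s3  # 0 means no error detected
    if bad_position:
        c[bad_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

The syndrome points directly at the faulty position, which is what allows correction "on the fly" without knowing in advance which drive misbehaved.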

RAID 3

Parallel transfer with dedicated parity

Drive 1: A0, B0, C0, D0

Drive 2: A1, B1, C1, D1

Drive 3: A2, B2, C2, D2

Drive 4: A3, B3, C3, D3

Parity drive: AP, BP, CP, DP

Pros: Very high Read data transfer rate; very high Write data transfer rate; disk failure has an insignificant impact on throughput; low ratio of ECC (Parity) disks to data disks means high efficiency

Cons: Transaction rate equal to that of a single disk drive at best (if spindles are synchronized); controller design is fairly complex; very difficult and resource intensive to do as a "software" RAID
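
Dedicated parity is commonly computed as the bytewise XOR of the data blocks in a stripe, and the same XOR recovers a lost block. The sketch below assumes that XOR scheme and equal-sized blocks; `xor_blocks` is a hypothetical helper name and the byte strings are made-up sample data.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over four data drives (sample contents).
stripe_a = [b"dat0", b"dat1", b"dat2", b"dat3"]
parity_a = xor_blocks(stripe_a)  # what the parity drive would store

# Drive 3 fails: XOR the three survivors with the parity block.
survivors = [stripe_a[0], stripe_a[1], stripe_a[3]]
rebuilt = xor_blocks(survivors + [parity_a])
```

Here `rebuilt` equals the block that was on the failed drive, because XOR-ing a value in twice cancels it out.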

RAID 4

Independent data disks with shared parity disk

Drive 1: A0, B0, C0, D0

Drive 2: A1, B1, C1, D1

Drive 3: A2, B2, C2, D2

Drive 4: A3, B3, C3, D3

Parity drive: AP, BP, CP, DP

Pros: Very high Read data transaction rate; high aggregate Read transfer rate; low ratio of ECC (Parity) disks to data disks means high efficiency

Cons: Quite complex controller design; worst Write transaction rate and Write aggregate transfer rate; difficult and inefficient data rebuild in the event of disk failure; block Read transfer rate equal to that of a single disk
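
One way to see why writes are the weak spot: with a single shared parity disk, every small write has to read and rewrite that same disk. The sketch below assumes XOR parity and uses Python lists as stand-ins for disks; `small_write` and the other names are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(blocks):
    """XOR of all data blocks in one stripe."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = xor_bytes(parity, block)
    return parity


def small_write(data_disks, parity_disk, disk_no, stripe_no, new_block):
    """Read-modify-write: touches one data disk and always the parity disk."""
    old_block = data_disks[disk_no][stripe_no]
    old_parity = parity_disk[stripe_no]
    # new parity = old parity XOR old data XOR new data
    parity_disk[stripe_no] = xor_bytes(xor_bytes(old_parity, old_block), new_block)
    data_disks[disk_no][stripe_no] = new_block
```

Two writes to different data disks still queue up behind the one parity disk, so they cannot proceed in parallel.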

RAID 5

Independent data disks with distributed parity blocks

Drive 1: A1, B1, C1, DP

Drive 2: A2, B2, CP, D1

Drive 3: A3, BP, C2, D2

Drive 4: AP, B3, C3, D3

Pros: Highest Read data transaction rate; medium Write data transaction rate; low ratio of ECC (Parity) disks to data disks means high efficiency; good aggregate transfer rate

Cons: Disk failure has a medium impact on throughput; most complex controller design; difficult to rebuild in the event of a disk failure (as compared to RAID level 1); individual block data transfer rate same as single disk
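
The rotation of the parity block across drives can be sketched as follows. The rule used here (parity starts on the last drive and moves one drive to the left per stripe) is an assumption chosen to match the four-drive picture on this slide; real controllers may rotate differently, and `raid5_layout` is a hypothetical name.

```python
def raid5_layout(stripes, n_drives=4):
    """Return one row per stripe, listing what each drive stores."""
    rows = []
    for s, name in enumerate(stripes):
        parity_at = n_drives - 1 - (s % n_drives)  # rotate right to left
        row, k = [], 1
        for drive in range(n_drives):
            if drive == parity_at:
                row.append(f"{name}P")
            else:
                row.append(f"{name}{k}")
                k += 1
        rows.append(row)
    return rows


for row in raid5_layout("ABCD"):
    print("  ".join(row))
```

Since no single drive holds all the parity, parity updates are spread over every drive instead of queuing on one.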

RAID 6

Independent data disks with two independent distributed parity schemes

Drive 1: A1, B1, C1, DP

Drive 2: A2, B2, CP, DQ

Drive 3: A3, BP, CQ, D1

Drive 4: AP, BQ, C2, D2

Drive 5: AQ, B3, C3, D3

RAID 6 is essentially an extension of RAID level 5 that adds fault tolerance through a second independent distributed parity scheme (two-dimensional parity). Data is striped at block level across a set of drives, just as in RAID 5, and a second set of parity is calculated and written across all the drives.

Pros: Extremely high data fault tolerance; can sustain two simultaneous drive failures; well suited to mission-critical applications

Cons: Very complex controller design; controller overhead to compute parity addresses is extremely high; very poor write performance; requires N+2 drives to implement because of the two-dimensional parity scheme
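
The N+2 requirement can be put next to the overhead of the other levels with a little arithmetic. The sketch below assumes equal-sized drives and the usual minimum drive counts; `usable_fraction` is a hypothetical helper, not part of any RAID tool.

```python
def usable_fraction(level: int, n_drives: int) -> float:
    """Fraction of raw capacity left for data, assuming equal-sized drives."""
    if level == 0:
        minimum, redundant = 2, 0
    elif level == 1:
        if n_drives != 2:
            raise ValueError("this sketch models RAID 1 as a single mirrored pair")
        minimum, redundant = 2, 1
    elif level in (3, 4, 5):
        minimum, redundant = 3, 1  # one drive's worth of parity
    elif level == 6:
        minimum, redundant = 4, 2  # two drives' worth of parity (N+2)
    else:
        raise ValueError(f"RAID level {level} is not covered here")
    if n_drives < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} drives")
    return (n_drives - redundant) / n_drives


for level, n in [(0, 2), (1, 2), (5, 4), (6, 5)]:
    print(f"RAID {level} on {n} drives: {usable_fraction(level, n):.0%} usable")
```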

Scripting DiskRAID

• DiskRAID can be scripted on any computer running Windows Server 2008 or Windows Server 2003 with an associated VDS hardware provider.

• To invoke a DiskRAID script, at the command prompt type:

diskraid /s <script.txt>

• By default, DiskRAID stops processing commands and returns an error code if there is a problem in the script. To continue running the script and ignore errors, include the NOERR parameter on the command.

DiskRAID error codes

Code  Description

0     No error occurred. The entire script ran without failure.

1     A fatal exception occurred.

2     The arguments specified on a DiskRAID command line were incorrect.

3     DiskRAID was unable to open the specified script or output file.

4     One of the services DiskRAID uses returned a failure.

5     A command syntax error occurred. The script failed because an object was improperly selected or was invalid for use with that command.
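
A wrapper script can turn these codes into readable messages. The sketch below is an assumption-laden illustration: it presumes the codes above arrive as the process exit status, that `diskraid` is on the PATH of a Windows server, and the Python function names are hypothetical. Only the `diskraid /s <script.txt>` invocation comes from the slides.

```python
import subprocess

DISKRAID_EXIT_CODES = {
    0: "No error occurred. The entire script ran without failure.",
    1: "A fatal exception occurred.",
    2: "The arguments specified on a DiskRAID command line were incorrect.",
    3: "DiskRAID was unable to open the specified script or output file.",
    4: "One of the services DiskRAID uses returned a failure.",
    5: "A command syntax error occurred.",
}


def describe_exit_code(code: int) -> str:
    """Map a DiskRAID error code to its description."""
    return DISKRAID_EXIT_CODES.get(code, f"Unknown exit code {code}")


def run_diskraid_script(script_path: str) -> tuple[int, str]:
    """Run `diskraid /s <script>` and report the result (Windows only)."""
    result = subprocess.run(["diskraid", "/s", script_path])
    return result.returncode, describe_exit_code(result.returncode)
```

On a suitable server, `run_diskraid_script("script.txt")` would return the code together with its description; `describe_exit_code` can be used on its own anywhere.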

Network Attached Storage

• A network attached storage (NAS) device is a server that is dedicated to nothing more than file sharing.

• NAS does not provide any of the activities that a server in a server-centric system typically provides, such as email, authentication or file management.

• NAS allows more hard disk storage space to be added to a network that already utilizes servers without shutting them down for maintenance and upgrades.

• With a NAS device, storage is not an integral part of the server. Instead, in this storage-centric design, the server still handles all of the processing of data but a NAS device delivers the data to the user.

• A NAS device does not need to be located within the server but can exist anywhere in a LAN and can be made up of multiple networked NAS devices.

DLink DNS-323

Netgear ReadyNAS Duo

Netgear ReadyNAS Ultra 2

Iomega StorCenter ix4-200d

Netgear ReadyNAS Ultra 4

Netgear ReadyNAS Ultra 6

Storage Area Network

• A storage area network, or SAN, is a high-speed network of storage devices that also connects those storage devices with servers.

• SAN provides block-level storage that can be accessed by the applications running on any networked servers.

• SAN storage devices can include tape libraries, and, more commonly, disk-based devices, like RAID hardware.
