
Microsoft Corporation ©2009

Microsoft Cross-Site Disaster Recovery Solutions

End-to-End Solutions Enabled by Windows Server 2008 Failover Clustering, Hyper-V, and Partner Solutions for Data Replication

Published: December 2009

Introduction: This white paper describes various end-to-end disaster recovery solutions for Windows virtualized environments. These solutions are enabled by Windows Server 2008 Failover Clustering, Hyper-V technology, and partner solutions for data replication. These solutions demonstrate automated failover capabilities in a geographically dispersed virtualized Microsoft environment.

Disaster Recovery in a Geographically Dispersed Cross-Site Virtual Environment

Copyright

The information contained in this document represents the current view of Microsoft Corporation on the issues

discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it

should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the

accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS,

IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT. Complying with all applicable

copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this

document may be reproduced, stored in, or introduced into a retrieval system, or transmitted in any form or

by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without

the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications,

trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except

as expressly provided in any written license agreement from Microsoft, the furnishing of this document does

not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2009 Microsoft Corporation. All rights reserved. Microsoft, Hyper-V, Windows, Windows PowerShell, and

Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the United States

and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective

owners.


Contents

Executive Summary
Introduction
Key Concepts
    DR Server Functionality
        Clustering
        Virtualization
        Multipath I/O Support
    DR Replication Functionality
        Synchronous Replication
        Asynchronous Replication
Microsoft DR Solution Components
    Windows Server 2008 Hyper-V
    Windows Server Failover Clustering
    System Center Virtual Machine Manager
DR Solutions in Hyper-V Environments
    Software-Based Solutions
    Appliance-Based Solutions
    Array-Based Solutions
Key Benefits of Windows Server 2008 Hyper-V and WSFC in DR
Conclusion
Additional Information


Executive Summary

This white paper discusses how organizations can build effective and highly available

disaster recovery (DR) solutions using Microsoft virtualization and failover capabilities

complemented with partner data replication products. It showcases various available

end-to-end disaster recovery solutions for different scenarios. Each scenario enables

organizations to select a Microsoft partner product to provide cross-site data

management and replication that is appropriate for a company’s particular

environment. The solutions discussed in this document are generic approaches for

structuring DR solutions in a geographically dispersed Hyper-V environment with the

help of clustering.

The solutions discussed are intended to provide disaster recovery solution configuration

examples that customers can use to evaluate and select end-to-end disaster recovery

solutions that best fit their needs. However, this white paper is not intended to be an

exhaustive study of specific architectures for every environment. To evaluate specific

data center disaster recovery requirements, please contact a Microsoft sales

representative.

This paper is written for those who have a working knowledge of Windows Server 2008

and virtualized Windows Server environments. This paper also assumes that the reader

understands replication and clustering terminology.

Introduction

For information technology (IT) organizations, mitigating risks to critical data, systems, applications, and computing infrastructure in the event of system outages or complete disasters presents an ongoing challenge from both a technological and a business perspective. Organizations need to consistently find solutions that not only meet application and data requirements for capacity, performance, and availability, but also have proven return on investment (ROI) and cost reduction capabilities.

Virtualization has been a game changer for many companies. It has allowed companies

that previously were unable to afford DR to begin implementing DR solutions.

Virtualization has also enabled companies to justify costs by providing full DR for

additional applications. In addition, it has provided more flexible options for effective

DR.

The business challenge is to acquire the ability to create a cost-effective, highly

available, and protected virtual server infrastructure. This infrastructure needs to make

certain that applications meet business-defined service-level agreements (SLAs) for

availability and disaster recovery preparedness. This white paper discusses several

options that meet these requirements.

Windows Server 2008 Failover Clustering and Hyper-V technology, coupled with partner data replication solutions, can be used to build robust, highly available, and cost-effective end-to-end DR solutions. To understand these solutions, it helps to first understand key business continuity planning and DR concepts, as well as the key technical components of the solutions.


Key Concepts

Business continuity planning aims to minimize scheduled and unscheduled downtime for IT systems in an organization. Hyper-V™ technology from Microsoft includes powerful business continuity features, such as live migration, which enable businesses to deliver rigorous uptime and response service levels.

Good business continuity planning is required to minimize damage and quickly return to a normal operating state after scheduled downtime or a disaster. This requires an understanding of which systems need protection and what level of protection each one needs. This knowledge is usually formalized into a service-level agreement (SLA), which becomes the responsibility of the IT department to uphold. The SLA consists of recovery time objectives (RTO) and recovery point objectives (RPO), which are defined for each system that requires protection. The RTO is the length of time within which, and the service level to which, a business process must be restored after a disaster or disruption in order to avoid unacceptable consequences associated with a break in business continuity. The RPO is the point in time to which data must be recoverable, as defined by the organization; it bounds how much recent data the organization can tolerate losing.
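The RTO and RPO definitions above can be illustrated with a short calculation. The following is a minimal sketch rather than part of any Microsoft tooling; the class name, function name, and the sample objectives (a one-hour RTO and a five-minute RPO) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Per-system targets taken from the SLA (values here are illustrative)."""
    rto: timedelta  # maximum tolerable time to restore service
    rpo: timedelta  # maximum tolerable window of lost data


def meets_sla(objectives, failure_at, last_replicated_at, restored_at):
    """Return (rto_met, rpo_met) for one recovery event."""
    downtime = restored_at - failure_at
    data_loss_window = failure_at - last_replicated_at
    return downtime <= objectives.rto, data_loss_window <= objectives.rpo


failure = datetime(2009, 12, 1, 10, 0, 0)
objectives = RecoveryObjectives(rto=timedelta(hours=1), rpo=timedelta(minutes=5))
result = meets_sla(
    objectives,
    failure_at=failure,
    last_replicated_at=failure - timedelta(minutes=12),
    restored_at=failure + timedelta(minutes=40),
)
print(result)  # (True, False): restored in time, but 12 minutes of data exceed the RPO
```

In this hypothetical event, service returns within the RTO, yet the last replicated copy is older than the RPO allows, so the SLA is only partly met.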

DR is a key component of business continuity. As shown in Figure 1, it facilitates the resumption of IT operations for key systems after a site-level crisis, in accordance with the SLA. Hyper-V utilizes the clustering capabilities of Windows Server 2008 to provide support for disaster recovery within IT environments and across data centers, using geographically dispersed clustering capabilities.

Business Continuity: Resumption of full operations combining people, processes, and platforms.

Disaster Recovery: Protection zone or site-level crisis; facilitates IT operations resumption for applications and access to data.

High Availability: Local clustering use cases presume a contained failure and that the rest of the environment is active.

Backup and Recovery: Generally presumes that the infrastructure is whole and 97 percent of use cases are file/small unit related.

Replication: Synchronous or asynchronous coordination of data on different storage devices.

Figure 1. Business Continuity Framework


To understand how DR can be optimized in a virtualized environment, it helps to understand the related server and replication functionality in a DR environment, which is described in the following sections.

DR Server Functionality

Clustering is a key server function in a DR environment. Adding virtualization to clustering enhances DR capabilities and can improve DR performance. To be complete, a DR plan also needs to include Multipath I/O support to ensure the high availability of the connected storage.

Clustering

Clustering is normally divided into three conceptual types: failover cluster, load

balancing cluster, and grid computing. Microsoft has all three clustering capabilities in

the Windows Server operating system. In this paper, failover clustering is the focus.

Failover clustering is a very mature technology that can be used for mission-critical applications such as file and print servers, application servers, database servers, and so on. A cluster enables two or more servers to work together as a computer group. This provides failover and increases the availability of applications and data when an individual node fails. If a primary node goes down, failover software based on a heartbeat technique triggers an automatic restart of services on the secondary nodes of the cluster. There are two types of failover clustering: local clustering and stretch clustering.

Local Clustering: With local clustering, all participating cluster nodes are at the same facility or data center and are physically coupled by the heartbeat link. This configuration can provide application failover, but cannot sustain hosting during downtime that affects the entire facility or data center. For example, if a catastrophic event affects the whole data center, every node in the cluster goes down with it. Still, local clustering is a preferred solution for applications that need to be failed over immediately. Authentication domain servers and financial transaction Web servers are examples of servers that can negatively affect the infrastructure if there is failover delay.

Stretch Clustering: Stretch clustering, or geographically dispersed clustering, mitigates the issues involved with local clustering. When a primary site goes down due to natural or man-made disasters, local clustering is not enough to achieve the required uptime of mission-critical applications. If the cluster spans sites in different seismic zones, applications can be failed over to a secondary site that is unaffected by the primary site downtime. Stretch clustering writes data to both the primary storage system and the remote storage system. This extends the capabilities of a single failover cluster solution and guards against downtime with Windows Server Failover Clustering (WSFC) failover.

Virtualization

WSFC supports physical and virtual environments, including physical-to-physical, virtual-to-virtual, and physical-to-virtual configurations. Support for virtual environments brings new functionality that benefits DR solutions. For example, Windows Server 2008 R2 has extended its WSFC feature set with the Cluster Shared Volumes (CSV) feature. This feature enables multiple cluster nodes to concurrently access the virtual hard disk (VHD) files on a CSV logical unit number (LUN). This is highly beneficial during live migration. With concurrent access to the VHD files, there is no access delay in site disaster scenarios. This assists greatly in trimming down the RTO.

Multipath I/O Support

Windows Server 2008 R2 includes many enhancements for the connectivity of a computer running a Windows Server–class operating system to storage area network (SAN) devices. Among the enhancements that enable high availability for connecting Windows-based servers to SANs is integrated Multipath I/O (MPIO) support. The Microsoft MPIO architecture supports iSCSI, Fibre Channel, and Serial Attached SCSI (SAS) SAN connectivity by establishing multiple sessions or connections to the storage array.

Companies can take advantage of MPIO to implement an infrastructure for a reliable shared storage solution with built-in redundancy and tight integration of virtualization management capabilities. The Microsoft MPIO framework provides high availability and dynamic load balancing to SAN devices through redundant network or fabric connections. Microsoft MPIO dynamically routes input/output (I/O) to the best path and protects against failures at any connection point between a Hyper-V host and shared storage, including NICs/adapters, switches, or array ports.
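The idea of routing I/O around a failed connection point can be sketched in a few lines. This is a simplified model, not the Microsoft MPIO implementation: it uses a plain round-robin policy over healthy paths rather than a "best path" policy, and the class names and path names are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Path:
    """One route from host to storage (hypothetical adapter/switch pairing)."""
    name: str
    healthy: bool = True


class MultipathSelector:
    """Round-robin over healthy paths; skips failed ones (illustrative only)."""

    def __init__(self, paths):
        self.paths = list(paths)
        self._ring = cycle(self.paths)

    def next_path(self):
        # Try each path at most once per call so a total outage is detected.
        for _ in range(len(self.paths)):
            path = next(self._ring)
            if path.healthy:
                return path
        raise RuntimeError("no healthy path to storage")


paths = [Path("hba0-switchA"), Path("hba1-switchB")]
selector = MultipathSelector(paths)
first = [selector.next_path().name for _ in range(2)]
paths[0].healthy = False  # simulate an adapter or switch failure
after_failure = [selector.next_path().name for _ in range(2)]
print(first, after_failure)
```

Before the simulated failure the two paths alternate; afterwards every request goes to the surviving path, which is the behavior a redundant fabric is meant to provide.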

DR Replication Functionality

Microsoft partner data replication products help organizations to maintain consistent data sets between sites to avoid data loss during cross-site failovers. Because existing data replication techniques support both Fibre Channel and Gigabit Ethernet technology, they can be easily integrated into any existing IT infrastructure without major modifications.

Data replication modes are generally classified as synchronous replication or

asynchronous replication. Both synchronous and asynchronous replication techniques

can use either byte-level or block-level copy methods. Byte-level copy maintains a copy

of the data on each node of the cluster and updates each copy as the data changes.

Block-level copy gives continuous data protection that enables the data to be restored

at any point in time, ensuring that the whole process of replication is quick and

efficient. The following sections discuss synchronous and asynchronous replication in

detail.

Synchronous Replication

With synchronous replication, each I/O update is written to the cache at the remote storage array before it is committed at the primary location. The remote storage array confirms the I/O completion to the primary site, which in turn completes the write at the primary site. Thus, the writing application only receives a write-completed response from the storage system when the I/O write operation is completed at both the remote storage location and the local storage location. Performance therefore depends heavily on network bandwidth, network latency, and distance. To minimize the data round-trip time, network bandwidth should be high and distance and network latency should be low. This makes synchronous mirroring a viable replication solution only for shorter distances, up to 100 miles. It also carries the extra cost of requiring higher network bandwidth. In return, synchronous remote mirroring enables the most stringent RPO and RTO, because the remote copy is always current.
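The dependence of synchronous write latency on distance can be estimated with simple arithmetic. The sketch below is a rough model under stated assumptions: light travels roughly 200 km per millisecond in optical fiber, only propagation delay is counted, and the 0.5 ms local write time is an arbitrary illustrative value. Real links add switching, protocol, and array latency on top.

```python
MILES_TO_KM = 1.609344
FIBER_KM_PER_MS = 200.0  # assumed: light covers roughly 200 km per ms in fiber


def round_trip_ms(distance_miles):
    """Propagation delay only, there and back; ignores equipment latency."""
    return 2 * distance_miles * MILES_TO_KM / FIBER_KM_PER_MS


def sync_write_latency_ms(local_write_ms, distance_miles):
    """A synchronous write completes only after the remote acknowledgement."""
    return local_write_ms + round_trip_ms(distance_miles)


for miles in (10, 100, 1000):
    print(miles, round(sync_write_latency_ms(0.5, miles), 2))
```

Under these assumptions, the round trip alone costs about 1.6 ms at 100 miles and about 16 ms at 1,000 miles, added to every write. This is why synchronous replication is usually confined to shorter distances.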


Asynchronous Replication

In asynchronous replication, I/O is completed at the primary site without any acknowledgement requirement from the secondary site. The distance limitation of synchronous replication can be overcome in asynchronous replication mode by a delta set architecture, which supports long-distance replication with customizable synchronization times. Delta sets are collections of writes that have occurred in a specific amount of time. Unlike the synchronous replication mode, hosts are not involved in the replication process, which also helps to achieve better I/O performance on the hosts. Performance is greatly increased, but at the risk of potentially losing data. If the local storage is lost due to failure, asynchronous replication does not guarantee that the remote storage has the most current data copy. Therefore, the most recent data might be lost depending on the configuration and the circumstances.
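The delta set behavior described above, including the data-loss exposure, can be modeled with a small in-memory sketch. This is an illustration only and does not represent any partner product: the class and method names are hypothetical, and storage is reduced to dictionaries keyed by block number.

```python
class DeltaSetReplicator:
    """Collects acknowledged writes and ships them in periodic delta sets."""

    def __init__(self):
        self.primary = {}
        self.remote = {}
        self.pending = {}  # the delta set currently being collected

    def write(self, block, data):
        # The write is acknowledged as soon as the primary has it.
        self.primary[block] = data
        self.pending[block] = data  # a later write to a block replaces the earlier one

    def ship_delta_set(self):
        """Apply the collected delta set at the remote site; return its size."""
        shipped = len(self.pending)
        self.remote.update(self.pending)
        self.pending = {}
        return shipped

    def unreplicated_blocks(self):
        """Blocks that would be stale at the remote site if the primary failed now."""
        return sorted(self.pending)


rep = DeltaSetReplicator()
rep.write(1, "a")
rep.write(2, "b")
rep.ship_delta_set()
rep.write(2, "c")  # acknowledged locally, not yet at the remote site
lost_if_primary_fails = rep.unreplicated_blocks()
print(lost_if_primary_fails, rep.remote)
```

In this sketch, the host never waits for the remote site, but any write still sitting in the uncollected delta set is exposed to loss. The interval between shipments therefore sets the effective RPO.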

Microsoft partner products available today support both synchronous and asynchronous replication modes, along with byte-level or block-level copy modes, to maintain data connectivity even during hardware, link, or complete hosting facility downtime. Users can select a single-storage-vendor product, a multi-storage-vendor product, or a completely software-based product to deploy a DR-ready IT environment.

Microsoft DR Solution Components

When considering DR options, virtualization is a game changer. Virtualization makes DR

affordable to companies that could not afford it before. Because it is cost effective, DR

planning can be expanded further into the application pool to offer better service levels

to more applications for which the investment was not previously justifiable. With

Windows Server 2008, everything required to start using virtualization is available.

Virtualization functionality is built right into Windows Server 2008 as the Hyper-V role.

The key Microsoft components of the DR solutions include:

Windows Server 2008 with Hyper-V

Windows Server Failover Clustering

Microsoft System Center Virtual Machine Manager


Windows Server 2008 with Hyper-V

Hyper‐V is the hypervisor‐based virtualization technology from Microsoft that is

integrated into all Windows Server 2008 x64 Edition operating systems. As a

virtualization solution, Hyper‐V enables users to take maximum advantage of the server

hardware by providing the capability to run multiple operating systems (on virtual

machines) on a single physical server.

The availability of Hyper‐V as a role in a standard Windows operating system provides

several key advantages:

Features                                  Benefits
Built-in technology                       Hyper-V enables enterprises to easily utilize the benefits of virtualization without adopting a new technology.
Broad device driver support               The new 64-bit micro-kernelized hypervisor architecture uses the broad device driver support in the Windows Server 2008 parent partition to extend support to a broad array of servers, storage, and devices.
SMP support                               Hyper-V supports symmetric multiprocessors (SMP) in virtual machines.
Host high availability                    Windows Server 2008 clustering provides high availability to virtual machines to minimize unplanned downtime.
Shared storage high availability          Microsoft MPIO dynamically routes I/O to the best path and safeguards against connection failures at any point between a Hyper-V host and shared storage, including NICs/adapters, switches, or array ports.
Easy virtual machine migration            Live migration capability to support business continuity during planned and unplanned downtime and over a distance.
Volume Shadow Copy Service (VSS) support  Robust host-based backup of virtual machines by utilizing the existing Windows VSS-based infrastructure.
Easy extensibility                        Easy extensibility using the standards-based Windows Management Instrumentation (WMI) interfaces and APIs.
Simplified integrated management          With its tight integration into the Microsoft System Center family of products, customers have end-to-end physical and virtual infrastructure management capability for Hyper-V environments.

Table 1. Hyper-V Features


Windows Server Failover Clustering

Failover clustering in Windows Server 2008 helps to ensure that mission-critical

applications and services, such as e-mail and line-of-business applications, are available

when required. Beyond the capabilities already mentioned in the previous stretch

clustering section, some other important capabilities of WSFC for DR solutions include:

Features                                  Benefits
No single-subnet limitation               Enables cluster nodes to communicate across network routers. It is no longer necessary to connect nodes with virtual local area networks (VLANs).
Configurable heartbeat timeouts           Increase to extend geographically dispersed clusters over greater distances. Decrease to detect failures faster and take recovery actions for quicker failover.
Common toolset                            Similar management experience to managing a local cluster.
Automated failover                        Automatic failover on complete disaster in one site.
VSS support                               VSS support to back up cluster settings.
Automation support                        Automation support, starting with Windows Server 2008 R2, through failover cluster Windows PowerShell cmdlets.
Cross-site replication tool combination   Mirrored storage between stretched locations. Seamless integration with partner hardware or software-based data replication solutions.

Table 2. Windows Server Failover Clustering Features

System Center Virtual Machine Manager

Microsoft System Center Virtual Machine Manager 2008 is enterprise‐class management

software that enables administrators to easily and effectively manage both the physical

and virtual environments from a single management console and thus avoid the

complexity of using multiple consoles typically associated with managing an IT

infrastructure. The key capabilities of Virtual Machine Manager 2008 include:


Features                                                  Benefits
Enterprise-class management suite                         Manages both Hyper-V and VMware ESX virtualization environments.
Intelligent virtual machine placement                     Supports intelligent placement of virtual machines.
System Center Operations Manager 2007 integration         Works with System Center Operations Manager 2007 to provide proactive management of both virtual and physical environments through a single console by leveraging Performance and Resource Optimization (PRO).
Native physical-to-virtual/virtual-to-virtual migration   Offers native capability for physical-to-virtual and virtual-to-virtual migrations.
Failover integration                                      Works with failover clustering to support the high availability and live migration of virtual machines.
Automation                                                Offers easy automation capabilities that utilize Windows PowerShell.

Table 3. System Center Virtual Machine Manager Features

System Center Virtual Machine Manager 2008 can be configured in multiple ways,

depending on the implementation requirements. A basic configuration will have Virtual

Machine Manager 2008 installed and running on a standalone server with local disks on

the server as storage. Attaching a storage enclosure to the standalone server hosting

Virtual Machine Manager is recommended if the deployment requires a relatively large

library server. The library server is a capability built into Virtual Machine Manager for

storing VHD templates, inactive virtual machine files, ISO images, and so on.

DR Solutions in Hyper-V Environments

Even though high availability can be achieved with local clustering, this will not

safeguard a company from the entire data center or hosting facility going down. In this

event, a DR solution needs to have geographically dispersed clusters as well as the

means to replicate data over these distances and restart the total infrastructure from

the secondary cluster site.

Virtualization solutions addressing DR, including data replication, can be classified into three categories. As shown in Figure 2, these categories are:

Software-based solutions: Software-based data replication solutions are third-party software suites hosted by the application server, which replicate data over the wide area network (WAN).

Appliance-based solutions: With appliance-based data replication solutions, all intelligence needed to perform the replication is housed in an appliance that resides in the I/O path between the host and the storage, typically in a SAN.

Array-based solutions: In an array-based data replication solution, the replication is native to the storage controllers.

Disaster Recovery Solution Types (Protection Zone or Site-level Crisis)

Software-Based: Host-based solutions to manage storage replication and failover.

Appliance-Based: Data replication and storage failover based on the SAN controller/appliance.

Array-Based: Data replication and storage failover based on a mid-range and enterprise SAN infrastructure.

Figure 2. Disaster Recovery Framework

Software-Based Solutions

Software-based solutions use third-party software applications that work with Windows Server Failover Clustering technology to provide synchronous or asynchronous data replication. Software-based replication uses vendor-specific, often patented, technologies to replicate data over the WAN. This enables clusters to span different geographic locations as well as different seismic zones, thereby eliminating the site as a single point of failure. These solutions can work equally well with or without the use of shared storage within the cluster. Such software suites coupled with WSFC can provide a cost-effective DR solution for Hyper-V–based setups.

These solutions are also called host-based solutions because they reside on the

application server that needs to have its data replicated. Therefore, an issue with host-

based replication is that it takes processing cycles away from the applications running

on the host. Another issue is that as more servers need to use replication, the cost goes

up, including the initial cost of each software license, implementation, and service, as

well as ongoing maintenance. On the other hand, the major benefits for a host-based

solution are that the cost can be very low because a SAN is not required and

heterogeneous storage can be used.


Figure 3. Software-Based Data Replication

As shown in Figure 3, in the event of any failure that causes a heartbeat timeout, WSFC automatically and seamlessly fails over to the second node. The software maintains a copy of the data on each node of the cluster and uses a patented replication technology to update each copy as the data changes. These replication technologies range from byte-level copy to block-level copy operations. The software senses application failover by WSFC and mounts the replicated volumes at secondary sites with read/write access. Virtual machines are automatically restarted on the backup node with minimal downtime.
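The heartbeat-timeout trigger for failover can be sketched as follows. This is a simplified model and not the WSFC implementation: the class, the function, the node names, and the five-second timeout are all hypothetical, and real clusters also use quorum and resource ownership rules that are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each node (hypothetical model)."""
    timeout_s: float
    last_seen: dict = field(default_factory=dict)

    def record(self, node, now_s):
        self.last_seen[node] = now_s

    def failed_nodes(self, now_s):
        # A node is considered failed once it has been silent longer than the timeout.
        return sorted(n for n, t in self.last_seen.items() if now_s - t > self.timeout_s)


def choose_owner(preferred_order, failed):
    """Pick the first healthy node in preference order, or None."""
    for node in preferred_order:
        if node not in failed:
            return node
    return None


monitor = HeartbeatMonitor(timeout_s=5.0)
monitor.record("site-a-node1", now_s=0.0)
monitor.record("site-b-node1", now_s=0.0)
monitor.record("site-b-node1", now_s=8.0)  # only the remote node keeps beating

failed = monitor.failed_nodes(now_s=10.0)
owner = choose_owner(["site-a-node1", "site-b-node1"], failed)
print(failed, owner)
```

The timeout value carries a trade-off: a longer timeout tolerates the latency of a long-distance link without false failovers, while a shorter one detects real failures sooner.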

The integration of the user interface with WSFC further simplifies the management and

monitoring of the events. For example, open file replication technology is used to back

up files that are currently being used and applications that are open. This eliminates the

need to bring the virtual machines offline to replicate the data.

Appliance-Based Solutions

Appliance-based replication technology, like host-based replication, supports all the types of

replication. Unlike host-based solutions, all the intelligence needed to perform the

replication is housed in an appliance. This appliance resides in the I/O path between the

host and the storage, typically in a SAN.

Appliance-based replication has many advantages over host-based replication. For

example, there is no replication overhead on the application server. In fact, the

application has little or no knowledge that the appliance exists or that the replication is

taking place. In addition, replication management is centralized on the appliance, and

like host-based solutions, a heterogeneous storage pool can be utilized.

However, there are some major issues with an appliance-based solution. For a highly

available solution, there should be at least two appliances in the local site, configured

as failover for each other, and at least one appliance available remotely. Because the

appliance is involved with every I/O and not just the replicated data, each appliance

should use at least four switch ports. This can add significant cost and complexity to the

SAN infrastructure. In addition, modern disk subsystems can deliver very high I/O rates,

measured in I/Os per second and megabytes per second. This enables application servers to drive these

appliances to their limits. Therefore, a SAN appliance in the stream of the I/O can

become a major bottleneck. An environment with large I/O needs can easily overpower

a pair of appliances. Some appliance-based solutions are limited to a pair of appliances

while others can scale beyond two. As additional appliances are added, the cost of the

solution rises, including the cost of each appliance and SAN switch port, as well as

support and other incidentals.
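
The sizing concern above can be made concrete with a rough calculation. The following Python sketch is illustrative only: the function names, the N+1 sparing rule, and the per-appliance IOPS figure are assumptions, while the minimum of two local appliances, one remote appliance, and four switch ports per appliance comes from the text.

```python
import math


def appliances_needed(peak_iops: int, iops_per_appliance: int) -> int:
    """Local appliances required to carry peak_iops, plus one spare for failover."""
    if peak_iops < 0 or iops_per_appliance <= 0:
        raise ValueError("peak_iops must be >= 0 and iops_per_appliance > 0")
    carrying = max(1, math.ceil(peak_iops / iops_per_appliance))
    # One extra appliance (assumed N+1 rule) so that the survivors can absorb
    # the load if one fails; this also enforces the minimum of two locally.
    return carrying + 1


def switch_ports_needed(local_appliances: int, remote_appliances: int = 1,
                        ports_per_appliance: int = 4) -> int:
    """SAN switch ports consumed by the appliances across both sites."""
    return (local_appliances + remote_appliances) * ports_per_appliance


# Hypothetical workload: 150,000 peak IOPS, 100,000 IOPS per appliance.
local = appliances_needed(peak_iops=150_000, iops_per_appliance=100_000)
print(local, switch_ports_needed(local))  # 3 16
```

Even this small example shows how quickly port counts grow: each additional appliance consumes four more switch ports before any host or storage port is connected.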

With appliance-based replication, when data is being written to the primary storage, the

appliance saves the data temporarily in the local hardware cache and then transfers it

to the local hard drives. When the data is ready to be synchronized, it is placed in the

mirror queue. With the help of the network link, data is then copied to the peer array.

Generally, destination arrays are unavailable for direct access or are given read-only

access to avoid corrupting the golden copy of the data at the destination. Customers

can deploy this type of robust infrastructure coupled with WSFC and Hyper-V

virtualization to withstand any catastrophic events.
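
The write path described above (hardware cache, local disks, mirror queue, peer array) can be modeled as a toy in Python. This is a conceptual sketch rather than any vendor's implementation; the class ReplicationAppliance and its method names are hypothetical, and in-memory dictionaries stand in for disks and arrays.

```python
from collections import deque


class ReplicationAppliance:
    """Toy model of the write path: cache -> local disk -> mirror queue -> peer."""

    def __init__(self) -> None:
        self.cache = {}              # block number -> data not yet on local disk
        self.local_disk = {}         # block number -> data on the primary storage
        self.mirror_queue = deque()  # (block, data) pairs waiting to be mirrored
        self.peer_disk = {}          # stands in for the peer array

    def write(self, block: int, data: bytes) -> None:
        # Incoming writes land in the cache first.
        self.cache[block] = data

    def flush_cache(self) -> None:
        # Destage cached blocks to local disk and queue them for mirroring.
        for block, data in self.cache.items():
            self.local_disk[block] = data
            self.mirror_queue.append((block, data))
        self.cache.clear()

    def drain_mirror_queue(self) -> int:
        # Copy queued blocks to the peer in the order they were queued.
        sent = 0
        while self.mirror_queue:
            block, data = self.mirror_queue.popleft()
            self.peer_disk[block] = data
            sent += 1
        return sent


appliance = ReplicationAppliance()
appliance.write(0, b"vm-disk-block")
appliance.flush_cache()
appliance.drain_mirror_queue()
print(appliance.peer_disk == appliance.local_disk)  # True
```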

The data replication link has to be configured between the primary site array and

secondary site array. Replication link setup is necessary in order to establish paths

between the primary and secondary storage system for appliance-based data

replication.

[Figure 4 diagram: two sites, each with a Windows Server 2008 cluster with Hyper-V (file, application, and database servers), an FC/iSCSI switch, a storage area network, appliance-based data replication, and heterogeneous storage (EMC, IBM, HDC, HP). The sites are joined over the Internet/WAN by a stretch cluster link and heartbeat links, with a file share witness.]

Figure 4. Appliance-Based Data Replication

In the scenario shown in Figure 4, a sudden network or site disruption causes the two

sites to lose communication. After the next heartbeat is missed, WSFC fails the application

over to the peer Hyper-V server. On the storage side, replicated LUNs in the

consistency group on the secondary array are made available to the peer Hyper-V

server for I/O. This process can be performed manually or completely automated.

Automation can be in the form of customized scripts or storage vendor–specific

automation enablers. LUNs having Hyper-V virtual machines can be replicated in either

a synchronous or asynchronous mode as required to support the SLA.
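
The practical difference between the two replication modes is when a write is acknowledged, which determines how much data can be lost at failover. The Python sketch below illustrates this under simplifying assumptions; the class ReplicatedLun and its methods are hypothetical and do not correspond to any vendor interface.

```python
class ReplicatedLun:
    """Toy LUN pair illustrating synchronous versus asynchronous replication."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.primary = []    # writes committed at the primary site
        self.secondary = []  # writes committed at the secondary site
        self.pending = []    # writes acknowledged but not yet replicated

    def write(self, data: str) -> None:
        self.primary.append(data)
        if self.synchronous:
            # Synchronous: acknowledge only after the secondary has the write.
            self.secondary.append(data)
        else:
            # Asynchronous: acknowledge immediately and replicate later.
            self.pending.append(data)

    def replicate_pending(self) -> None:
        # Ship the backlog of acknowledged writes to the secondary site.
        self.secondary.extend(self.pending)
        self.pending.clear()

    def writes_lost_on_failover(self) -> int:
        # Writes the secondary would be missing if the primary failed now.
        return len(self.primary) - len(self.secondary)


sync_lun = ReplicatedLun(synchronous=True)
async_lun = ReplicatedLun(synchronous=False)
for record in ("a", "b", "c"):
    sync_lun.write(record)
    async_lun.write(record)
print(sync_lun.writes_lost_on_failover())   # 0
print(async_lun.writes_lost_on_failover())  # 3
```

Synchronous mode loses nothing but makes every write wait for the remote site, so it is usually limited by distance and link latency. Asynchronous mode tolerates longer distances at the cost of a nonzero recovery point.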

Some storage vendors also provide a data logging or journaling feature that uses

time stamps to avoid replicating the same changes more than once. This feature also shortens the

resynchronization time during failback to the original cluster node.
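
One way such a time-stamped journal could shorten resynchronization is sketched below in Python. This is an assumption about how the feature might work, not a description of any vendor's product: the JournalEntry type and the entries_to_resync function are hypothetical.

```python
from typing import NamedTuple


class JournalEntry(NamedTuple):
    timestamp: float
    block: int
    data: bytes


def entries_to_resync(journal, last_synced):
    """Return only the newest write per block made after last_synced."""
    latest = {}
    for entry in sorted(journal, key=lambda e: e.timestamp):
        if entry.timestamp > last_synced:
            # A later write to the same block replaces the earlier one,
            # so each block is sent at most once.
            latest[entry.block] = entry
    return sorted(latest.values(), key=lambda e: e.timestamp)


journal = [
    JournalEntry(1.0, 7, b"a"),
    JournalEntry(2.0, 7, b"b"),
    JournalEntry(3.0, 9, b"c"),
    JournalEntry(4.0, 7, b"d"),
]
for entry in entries_to_resync(journal, last_synced=1.5):
    print(entry.timestamp, entry.block)
```

Here only two of the four journaled writes need to cross the WAN during failback: the entry already synchronized is skipped, and the superseded write to block 7 is dropped.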

Array-Based Solutions

Array-based data replication combines the aspects of software-based and appliance-

based solutions. As with appliance-based replication, there is no overhead on the

application servers. Management is centralized and any host supported by the storage

system can use the replication functions of the storage device. Unlike appliance-based

solutions, no extra SAN switch ports are needed to implement storage-based

replication. Since the replication is native to the storage controllers, the impact is

minimal to the application servers utilizing the storage. The only drawback to storage-

based replication is that replication can only take place between homogeneous storage

systems. In the past, homogeneous-only support could be costly, but today, many

storage devices enable replication from more expensive Fibre Channel drives to less

expensive Serial Advanced Technology Attachment (SATA) drives, and also support

remote replication from a higher-end model to a lower-end model.

There is an interesting exception to the homogeneous-only support. When using SAN

controllers that support virtualized heterogeneous storage pools, heterogeneous

replication is typically also supported. In this case, SAN administrators can virtualize

heterogeneous storage into one storage pool by configuring the replication at volume

levels. Volumes to be replicated are grouped in replication groups to ensure that all

similar volume entities are managed together. The storage pooling techniques also

minimize management overhead and reduce setup complexity.

[Figure 5 diagram: two sites, each with a Windows Server 2008 cluster with Hyper-V (file, application, and database servers), an FC/iSCSI switch, a storage area network, array-based data replication, and heterogeneous storage (EMC, IBM, HDC, HP). The sites are joined over the Internet/WAN by a stretch cluster link and heartbeat links, with a file share witness.]

Figure 5. Array-Based Data Replication

As shown in Figure 5, the hardware controller works at the network layer of the SAN

and distributes the I/O from the hosts to storage as well as to the hardware controller.

In the event of server, hardware, or network failure, WSFC initiates failover actions

based on the policy used, and will restart the resource group in the cluster. In this

stretch cluster scenario, every cluster node sees quorum as a local resource and stores

all configuration information on a local disk. WSFC ensures cluster integrity by

replicating changes across cluster nodes. WSFC guards against server hardware failure

and network outages and initiates failover actions to restart the resource group, whereas

hardware data replicators provide remote mirroring in asynchronous or synchronous

mode to replicate virtual machines on Hyper-V. During cluster configuration, the cluster

administrator has to define applications, services, IP addresses, or even disks in the resource

group so that they can be transferred to a backup node during actual failover.

When the active node at the primary site fails or the complete site fails, a heartbeat

timeout occurs. At this point, the cluster re-forms between the secondary site

and the FSW node. The backup cluster node at the secondary site brings resource

groups online with the help of the FSW. The hardware data replicator recovers the volumes

listed in the cluster resource group at the secondary site and mounts them on the backup

node with read/write access. Applications listed in the resource group are

automatically started by WSFC on the backup cluster node. After these steps complete,

virtual machines on the backup cluster node at the recovery site are automatically

restarted and operations resume from the point of failure. Total automated failover and

failback can be achieved with customized scripts or with automation enablers provided

by the hardware data replicator vendor.
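
The role of the FSW in this sequence can be illustrated with a simple majority-vote check. The Python sketch below assumes a two-node stretch cluster in which each node and the file share witness hold one vote; the voter names and the function partition_has_quorum are hypothetical and serve only to illustrate the principle.

```python
# Hypothetical voters in a two-node stretch cluster with a file share witness.
VOTERS = frozenset({"primary-node", "secondary-node", "file-share-witness"})


def partition_has_quorum(reachable) -> bool:
    """A partition stays online only if it holds a strict majority of votes."""
    votes = len(VOTERS & set(reachable))
    return votes > len(VOTERS) // 2


# Primary site fails: the secondary node can still reach the witness.
print(partition_has_quorum({"secondary-node", "file-share-witness"}))  # True

# An isolated primary node holds one vote out of three and must stop.
print(partition_has_quorum({"primary-node"}))  # False
```

Because only one side of a split can reach the witness and hold a majority, the two sites cannot both bring the same resource groups online at the same time.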

In conjunction with vendor-based enablers, the array-based solutions can seamlessly

work with WSFC for remote mirroring operations. This combination can help increase

efficiency, simplify storage management practices, and mitigate risk in line with business

objectives in a heterogeneous storage pool.

Key Benefits of Windows Server 2008 Hyper-V and WSFC in DR

The combination of Hyper-V and WSFC can reduce the impact of hardware outages in

case of failure of any hardware component by relocating the virtual machines to the

working node. By stretching the cluster geographically and adding a data replication

partner solution, the high availability solution is turned into a complete end-to-end DR

solution.

Virtualization coupled with stretch clustering is a cost-effective and differentiated

solution for maintaining almost 100 percent uptime for mission-critical, high availability

applications. Windows Server 2008 with Hyper-V enables any organization to benefit

from virtualization and WSFC technology across different geographic locations. Hyper-V

provides an inherently security-enhanced architecture that seamlessly merges into the

existing IT environment, simplifying processes, provisioning, and management.

Other high availability advantages of virtualization with Hyper-V include operating

system updates. Updates are considerably simplified because the virtual machines can

be migrated to the peer node with live migration whenever a restart of the primary

node is necessary. This helps ensure that there is no downtime for the end user. CSV

support with live migration in Hyper-V further enhances failover with minimal to no

downtime.

Some of the additional benefits of Hyper-V for the overall IT organization are:

64-bit micro-kernel architecture.

Multiple operating system support.

Symmetric multiprocessor support.

Network load balancing with a virtual switch.

New virtual service provider/virtual service client hardware sharing architecture.

Virtual machine snapshot.

Windows Management Instrumentation (WMI) interfaces and APIs provide the

extensibility to develop custom tools or utilities.

Live backup with VSS.

Conclusion

The reasons that a DR plan is a necessity for any global organization are obvious, but

cost has always been a prohibitive factor that has limited the existence and breadth of

these plans. Virtualization is now a game changer for DR planning, providing DR

solutions for a fraction of the nonvirtualized costs. This is enabling companies that

previously could not afford DR to implement flexible and effective solutions.

Furthermore, its affordability is enabling companies to justify full DR for additional

applications, further improving their SLAs. In addition, virtualization has created more

options for effective DR to address a wider range of needs.

The DR solutions discussed in this white paper utilize the power of Hyper-V and WSFC

from Microsoft, combined with partner data replication solutions. The approach provides

credible solutions that cover any DR requirements. However, the actual selection among possible

DR approaches in a Hyper-V environment depends on important factors like cost,

criticality, and the distance between sites and should be discussed with a Microsoft

representative trained to assist in DR solution planning.

Additional Information

The extensive Microsoft partner ecosystem complements and extends its virtualization

toolset with products for desktops, servers, applications, storage, and networks.

Together with the partners, Microsoft delivers robust and complete solutions for a

virtualized DR-ready infrastructure.

Windows Server 2008 with Hyper-V and WSFC, paired with the right partner software,

appliance, or array-based data replication product, provides a range of promising

solutions for critical DR needs.

For more information, see:

Microsoft Virtualization Solutions:

www.microsoft.com/virtualization/solutions

Microsoft Virtualization Partners:

www.microsoft.com/virtualization/partners

Microsoft Windows Server 2008 with Hyper-V product information:

www.microsoft.com/windowsserver2008/en/us/hyperv-main.aspx

Microsoft System Center Virtual Machine Manager product information:

www.microsoft.com/systemcenter/virtualmachinemanager/en/us/default.aspx