BCO2982-Stretched Clusters and VMware vCenter Site Recovery Manager How and When to Choose One, the...

Stretched Clusters and VMware vCenter Site Recovery Manager: How and When to Choose One, the Other, or Both
Chad Sakac, EMC Corporation
Vaughn Stewart, NetApp
INF-BCO2982 ##vmworldinf


BCO2982-Stretched Clusters and VMware vCenter Site Recovery Manager How and When to Choose One, the Other, or Both_Final_US.pdf

Transcript of BCO2982-Stretched Clusters and VMware vCenter Site Recovery Manager How and When to Choose One, the...

Page 1: BCO2982-Stretched Clusters and VMware vCenter Site Recovery Manager How and When to Choose One, the Other, or Both_Final_US.pdf

Stretched Clusters and VMware vCenter Site Recovery Manager: How and When to Choose One, the Other, or Both

Chad Sakac, EMC Corporation

Vaughn Stewart, NetApp

INF-BCO2982

##vmworldinf

Page 2

Where were we last year?
• Covered at VMworld 2011
– BCO2863: Using Distance to Your Advantage (NetApp)
– BCO2479: Understanding vSphere Stretched Clusters (EMC)
• Stretched clusters have existed since VI3 with NetApp MetroCluster, and adoption accelerated when EMC VPLEX entered the market
– vSphere 5 introduced vMSC certification, initially with EMC VPLEX, and it is accelerating with new entrants
• Customers are actually seeking availability
– Blending backup, disaster recovery & disaster avoidance

Page 3

The State of the Union
• Adoption continues to accelerate!
• The vSphere Metro Storage Cluster (vMSC) HCL is expanding
• Hardening of VM HA for stretched clusters
– terminateVMonPDLByDefault (PDL: Permanent Device Loss) in vSphere 5.0 U1 and vSphere 5.1
– Timeout of I/O on APD (All Paths Down)
• Stretched Clusters + SRM = AND, not an OR
• Expanding the use cases
– Longer and longer distances
– Reducing hardware dependencies

Page 4

Customers Want Geo-Spanned Availability
• Stretched clusters: across data centers, at synchronous distances
• Future: across data centers, at asynchronous distances
• Disaster recovery: operational and 3rd-site recovery

Page 5

…Words Matter
“Disaster Recovery” • “Disaster Avoidance” • “High Availability”

Page 8

“Disaster” Avoidance – Host Level

“Hey… That host WILL need to go down for maintenance. Let’s vMotion to avoid a disaster and outage.”

This is vMotion.

Most important characteristics:
• By definition, avoidance, not recovery
• “Non-disruptive” is massively different from “almost non-disruptive”

Page 11

“Disaster” Recovery – Host Level

“Hey… That host WENT down due to an unplanned failure, causing an unplanned outage. Let’s automate the RESTART of the affected VMs on another host.”

This is VM HA.

Most important characteristics:
• By definition, recovery (restart), not avoidance
• Simplicity, automation, sequencing

Page 14

Disaster Avoidance – Site Level

“Hey… That site WILL need to go down for maintenance. Let’s vMotion to avoid a disaster and outage.”

This is inter-site vMotion.

Most important characteristics:
• By definition, avoidance, not recovery
• “Non-disruptive” is massively different from “almost non-disruptive”

Page 17

Disaster Recovery – Site Level

“Hey… That site WENT down due to an unplanned failure, causing an unplanned outage. Let’s automate the RESTART of the affected VMs at another site.”

This is Disaster Recovery.

Most important characteristics:
• By definition, recovery (restart), not avoidance
• Simplicity, testing, split-brain behavior, automation, sequencing, IP address changes

Page 22

VMware High Availability

vSphere HA Cluster, stretched across a campus or metro area

• VMware High Availability
– Extended between distributed parts of the same virtual datacenter
– Automatic rapid recovery from host failures
– No complex clustering software in the VM

Page 27

VMware Fault Tolerance

• VMware Fault Tolerance
– Easily enabled/disabled per virtual machine
– Eliminates VM downtime due to hardware failures
– Protects homegrown applications without a clustering solution

(Diagram: an FT-protected VM in a vSphere HA cluster, with its primary (1) and secondary (2) copies running on different hosts.)

Note – not part of vMSC, ergo not VMware supported; MAY be vendor supported.

Page 28

VMware VM Host Affinity

• VMware Host Affinity
– Provides a “site affinity” capability
– Keeps workloads local to storage until failure
– Keeps primary and secondary FT VMs in appropriate sites

(Diagram: a vSphere HA cluster divided into a Site 1 affinity group and a Site 2 affinity group.)

Note – considerations later.
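To make “keeps workloads local to storage until failure” concrete, site affinity can be modelled as a VM group, a host group, and a non-mandatory “should run on” rule between them. The sketch below is plain Python, not the vSphere API: the `SiteAffinityRule` type and every VM and host name are hypothetical. It lists VMs currently running outside their preferred site, which is the state a “should” rule tolerates after a failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteAffinityRule:
    """Hypothetical model of a DRS 'VMs should run on hosts in group' rule."""
    vm_group: frozenset
    host_group: frozenset


def misplaced_vms(rules, placement):
    """Return VMs currently running outside the host group their rule prefers.

    placement maps VM name -> host it runs on. A 'should' rule may be broken
    after a failure (HA restarts VMs where it can) and can later be corrected
    by DRS, so a non-empty result is not an error in itself.
    """
    misplaced = []
    for rule in rules:
        for vm in sorted(rule.vm_group):
            host = placement.get(vm)
            if host is not None and host not in rule.host_group:
                misplaced.append(vm)
    return misplaced


site1 = SiteAffinityRule(frozenset({"app01", "db01"}), frozenset({"esx01", "esx02"}))
site2 = SiteAffinityRule(frozenset({"app02"}), frozenset({"esx03", "esx04"}))

# After a failure, HA restarted db01 on a Site 2 host.
placement = {"app01": "esx01", "db01": "esx03", "app02": "esx04"}
print(misplaced_vms([site1, site2], placement))  # ['db01']
```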

Page 29

Type 1: “Stretched Single vSphere Cluster”

Page 33

Stretching VMware vSphere Clusters

(Diagram: one vSphere HA cluster spanning two sites on stretched storage, e.g. EMC VPLEX or NetApp MetroCluster, kept in step by array-based synchronous replication.)

Page 34

Planned Datacenter Migration
• Standard vMotion of virtual machines within one vSphere HA cluster on stretched storage (e.g. EMC VPLEX, NetApp MetroCluster)
• Moving all operations between locations

Page 36

One little note re: “intra-cluster” vMotion
• Intra-cluster vMotions can be highly parallelized
– …and more and more with each passing vSphere release
– With vSphere 4.1 and vSphere 5.x, it’s up to 4 per host/128 per datastore if using 1GbE
– 8 per host/128 per datastore if using 10GbE
– …and that’s before you tweak settings for more, and shoot yourself in the foot :-)
• Need to meet the vMotion network requirements
– 622 Mbps or more, 5 ms RTT (upped to 10 ms RTT if using Metro vMotion, vSphere 5 Enterprise Plus)
– Layer 2 equivalence for vmkernel (support requirement)
– Layer 2 equivalence for VM network traffic (required)
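The network requirements above can be turned into a quick pre-flight check for an inter-site link. This is a minimal sketch: the `InterSiteLink` type and the `metro_vmotion` flag are hypothetical, and the thresholds are copied from the slide (622 Mbps, 5 ms RTT or 10 ms with Metro vMotion, 4 or 8 concurrent vMotions per host). Confirm them against the documentation for your vSphere release.

```python
from dataclasses import dataclass


@dataclass
class InterSiteLink:
    bandwidth_mbps: float
    rtt_ms: float
    layer2_vmkernel: bool
    layer2_vm_network: bool


def vmotion_problems(link, metro_vmotion=False):
    """Return the reasons a link fails the vMotion requirements listed above."""
    problems = []
    if link.bandwidth_mbps < 622:
        problems.append("bandwidth below 622 Mbps")
    max_rtt = 10 if metro_vmotion else 5  # Metro vMotion raises the RTT limit
    if link.rtt_ms > max_rtt:
        problems.append(f"RTT {link.rtt_ms} ms exceeds {max_rtt} ms")
    if not link.layer2_vmkernel:
        problems.append("no Layer 2 equivalence for vmkernel (support requirement)")
    if not link.layer2_vm_network:
        problems.append("no Layer 2 equivalence for VM network (required)")
    return problems


def max_concurrent_vmotions_per_host(nic_gbe):
    """Per-host concurrency quoted on the slide for vSphere 4.1/5.x."""
    return 8 if nic_gbe >= 10 else 4


link = InterSiteLink(bandwidth_mbps=1000, rtt_ms=7, layer2_vmkernel=True, layer2_vm_network=True)
print(vmotion_problems(link))                      # ['RTT 7 ms exceeds 5 ms']
print(vmotion_problems(link, metro_vmotion=True))  # []
```

A 7 ms link is therefore only usable for inter-site vMotion if Metro vMotion is available.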

Page 37

Type 2: “Two Clusters, Stretched Storage, inter-cluster vMotion”

We don’t see this much, so we’ll skip it.

Page 38

Type 3: “Classic Site Recovery Manager”

Page 39

Type 3: “Classic Site Recovery Manager”
• vSphere Cluster A, managed by the protected-site vCenter, runs on Datastore A
• vSphere Cluster B, managed by the recovery-site vCenter, sits at a distance and holds a read-only replica of Datastore A (the replica gets promoted or snapshotted to become writeable)
• Replication: array-based (sync, async or continuous) or vSphere Replication v1.0 (async)

Page 40

Type 4: “Stretched Cluster + Site Recovery Manager”

Page 41

Can You Have Stretched Clusters + SRM? YES!

(Diagram: a stretched vSphere cluster protected by VMware vSphere 5 Site Recovery Manager, with an array replica such as EMC RecoverPoint or NetApp SnapMirror.)
• Deduped replication
• Native WAN compression and/or replication compression
• VMware vSphere 5 integration
• Robust DR testing and sequencing
• Automated failback

Page 42

Summary of “Taxonomy Matters”
• Disaster Avoidance != High Availability != Disaster Recovery
– The same logic that applies at the server level applies at the site level
– The same value (non-disruptive for avoidance, automation/simplicity for recovery) that applies at the server level applies at the site level
– Don’t underestimate the importance of DR testing
• Stretched clusters have complex considerations
• vMotion = single vCenter domain vs. SRM = two or more vCenter domains
• Straightforward SRM for most (~1 of every 5)
• Stretched clusters and SRM are no longer mutually exclusive
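The taxonomy boils down to two questions: is the event planned or unplanned, and is its scope a host or a whole site? A minimal sketch (the `Scope` enum and `mechanism` function are hypothetical names) mapping each combination to the mechanism these slides assign to it:

```python
from enum import Enum


class Scope(Enum):
    HOST = "host"
    SITE = "site"


def mechanism(scope, planned):
    """Map an event to the mechanism the taxonomy assigns to it.

    Planned events are avoided (non-disruptive move before the outage);
    unplanned events are recovered (VMs restarted after the outage).
    """
    if planned:
        return "vMotion" if scope is Scope.HOST else "inter-site vMotion"
    return "VM HA" if scope is Scope.HOST else "Disaster Recovery"


for scope in Scope:
    for planned in (True, False):
        kind = "avoidance" if planned else "recovery"
        print(f"{scope.value:4} {kind:9} -> {mechanism(scope, planned)}")
```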

Page 43

Thinking of Stretching?

vSphere Stretched Clusters Considerations

Page 44

Stretched Cluster Design Considerations
• Understand the difference compared to DR
– HA does not follow a robust, scriptable recovery plan workflow
– HA is not site aware for applications: where are all the moving parts of my app? Same site or dispersed? How will I know what needs to be recovered?
– DR usually involves a regular, structured “DR test”
• Single stretched site = single vCenter
– During a disaster, what about vCenter setting consistency across sites? (DRS affinity, cluster settings, network)
• Will the network support it? Layer 2 stretch? IP mobility?
• Cluster split brain: how will it be handled?
• Not necessarily a cheaper solution vs. SRM licensing; read between the lines (hidden storage, networking and WAN costs)
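What a “scriptable recovery plan workflow” adds over HA is ordering: a DR plan restarts VMs in a defined sequence, while HA restarts whatever was on the failed hosts. The sketch below uses a hypothetical plan format (not SRM’s own): VMs are ordered by priority group, then by declared dependencies within each group, using Python’s standard `graphlib` (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def restart_order(priority, depends_on):
    """Return VMs in restart order.

    priority:   VM name -> priority group (1 restarts first)
    depends_on: VM name -> set of VMs that must be up before it; only
                dependencies inside the same group are honoured here
    Raises graphlib.CycleError if dependencies within a group form a cycle.
    """
    order = []
    for group in sorted(set(priority.values())):
        members = {vm for vm, p in priority.items() if p == group}
        graph = {vm: depends_on.get(vm, set()) & members for vm in sorted(members)}
        order.extend(TopologicalSorter(graph).static_order())
    return order


priority = {"dc01": 1, "db01": 2, "app01": 2, "web01": 3}
depends_on = {"app01": {"db01"}}
print(restart_order(priority, depends_on))  # ['dc01', 'db01', 'app01', 'web01']
```

Because the plan is data, it can be tested on a schedule, which is exactly the “regular, structured DR test” HA alone does not give you.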

Page 45

vSphere 5.x – HA
• Complete rewrite of vSphere HA
• Elimination of the Primary/Secondary concept
• Foundation for increased scale and functionality
– Eliminates common issues (DNS resolution)
• Multiple communication paths
– Can leverage storage as well as the management network for communications
– Enhances the ability to detect certain types of failures and provides redundancy
• IPv6 support
• Enhanced user interface
• Enhanced deployment

(Diagram: a four-host cluster, ESX 01 to ESX 04.)

Page 46

vSphere 5.x HA – Heartbeat Datastores
• Monitor availability of slave hosts and the VMs running on them
• Determine whether a host is network isolated vs. network partitioned
• Coordinate with other masters
– A VM can only be owned by one master
• By default, vCenter will automatically pick 2 datastores
• Very useful for hardening stretched storage models

(Diagram: a four-host cluster, ESX 01 to ESX 04.)
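A simplified model of why heartbeat datastores help: they let an HA master tell a dead host from one that is merely unreachable over the network. This sketches the reasoning on the slide, not VMware’s implementation. The input flags are hypothetical stand-ins for network heartbeats, datastore heartbeats, ping responses, and an isolation marker the slave leaves on shared storage. The isolation-response values mirror the vSphere UI choices.

```python
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    ISOLATED = "network isolated"
    PARTITIONED = "network partitioned"
    DEAD = "failed"


def classify_slave(network_hb, datastore_hb, answers_ping, reports_isolated):
    """Simplified master-side view of one slave host."""
    if network_hb:
        return HostState.LIVE
    if datastore_hb or answers_ping:
        # Alive but unreachable over the HA network; the isolation marker
        # separates "cut off from everything" from "in another partition".
        return HostState.ISOLATED if reports_isolated else HostState.PARTITIONED
    # No heartbeat of any kind and no ping response: treat the host as failed.
    return HostState.DEAD


def should_restart_vms(state, isolation_response="leave powered on"):
    """Whether the master restarts the host's VMs elsewhere (simplified)."""
    if state is HostState.DEAD:
        return True
    # An isolated host that powers off or shuts down its VMs frees them for
    # restart elsewhere; otherwise they keep running where they are.
    return state is HostState.ISOLATED and isolation_response in ("power off", "shut down")
```

Without the datastore heartbeat, a partitioned host with no ping response would look identical to a dead one, which is the ambiguity that makes stretched configurations fragile.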

Page 47

Something to understand re: yanking and “suspending” storage with VM HA
• What happens when you “yank” storage?
– The behavior of VMs whose storage “disappears” or goes “read-only” is more complex than people think at first.
– Responding to a ping doesn’t mean a system is available (if it doesn’t respond to any services, for example)
• In vSphere 5.0 or earlier:
– Yanked: http://www.youtube.com/watch?v=6Op0i0cekLg
– Suspended: http://www.youtube.com/watch?v=WJQfy7-udOY
• What’s new in vSphere 5.0 U1 or 5.1?
– terminateVMonPDLByDefault
– In vSphere 5.1: timeout of I/O on APD
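The two storage-loss cases call for different responses, which the following decision function sketches. Assumptions, stated loudly: the function and parameter names are hypothetical. The 140-second default is the vSphere 5.1 APD timeout (`Misc.APDTimeout`) as we understand it. The code models the behavior described on the slide, not ESXi internals.

```python
from enum import Enum


class StorageLoss(Enum):
    PDL = "permanent device loss"  # array reports the device is gone for good
    APD = "all paths down"         # no response on any path; may come back


def io_outcome(loss, seconds_since_loss, terminate_on_pdl, apd_timeout_s=140):
    """Simplified model of what happens to a VM when its storage disappears.

    terminate_on_pdl: host is set to kill VMs on PDL (the behavior the
                      slide calls terminateVMonPDLByDefault)
    apd_timeout_s:    assumed APD timeout, modelling the vSphere 5.1 change
    """
    if loss is StorageLoss.PDL:
        # A terminated VM can be restarted by HA on a host that still sees
        # the storage; otherwise it lingers, as in the "yanked" demo.
        return "terminate VM" if terminate_on_pdl else "VM left running"
    # APD: the host cannot tell a transient outage from a permanent one,
    # so it keeps retrying until the timeout expires.
    if seconds_since_loss < apd_timeout_s:
        return "retry I/O"
    return "time out I/O"
```

The asymmetry is the point: PDL is a definitive answer from the array, so acting immediately is safe, while APD is silence, so the host has to wait.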

Page 48

Stretched Storage Configuration
• Literally just stretching the SAN fabric (or NFS exports over LAN) between locations, with a failover on failure
• Requires synchronous replication
• Limited in distance to ~100 km in most cases
• Typically read/write in one location, read-only in the second location
• Implementations with only a single storage controller at each location create other considerations
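Why ~100 km? With synchronous replication a write is acknowledged only after the remote copy has it, so every write pays at least one round trip over the inter-site link. A back-of-the-envelope sketch, assuming light in fibre travels roughly 200,000 km/s (about 5 µs per km one way) and ignoring switch, array and protocol overheads, which only add to the figure:

```python
FIBRE_KM_PER_MS = 200.0  # ~200,000 km/s in fibre, i.e. ~5 microseconds per km


def added_write_latency_ms(fibre_km, round_trips_per_write=1):
    """Minimum propagation delay a synchronous write pays over a fibre route."""
    one_way_ms = fibre_km / FIBRE_KM_PER_MS
    return 2 * one_way_ms * round_trips_per_write


for km in (10, 50, 100, 200):
    print(f"{km:4} km -> +{added_write_latency_ms(km):.2f} ms per write")
```

At 100 km of fibre that is already about 1 ms added to every write. Fibre routes are usually longer than the straight-line distance between sites. The `round_trips_per_write` parameter is there because some storage protocols may need more than one round trip per write.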

Page 49

Stretched Storage Configuration

(Diagram: stretched storage fabric(s) between the sites; storage is read/write at one site and read-only at the other.)

Page 50

Distributed Virtual Storage Configuration
• Leverages storage technologies to distribute storage across multiple sites
• Requires some sort of synchronous replication
• Limited in distance to ~100 km in most cases
• Read/write storage in both locations; employs data locality and caching algorithms
• Typically uses multiple controllers in a scale-out fashion
• Must address “split brain” scenarios
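“Split brain” means both sites keep writing to the same volume after losing contact with each other. A common defense, and the reason the next diagrams show a witness at a third site, is to let a site continue I/O only when it is safe to assume the other side has stopped. A generic sketch of that arbitration with hypothetical names; real products such as VPLEX with its Witness each have their own rules:

```python
def sites_allowed_to_write(link_up, witness_sees, preferred="A"):
    """Return the sites ('A', 'B') allowed to keep writing to a distributed volume.

    link_up:      the inter-site link is healthy
    witness_sees: sites the third-site witness can reach (None = no witness)
    preferred:    site that wins when the witness cannot break the tie
    """
    if link_up:
        return {"A", "B"}  # normal operation: read/write at both sites
    if not witness_sees or len(witness_sees) == 2:
        # No usable witness, or both sites alive with only the link down:
        # fall back to the static preference. The other site suspends I/O,
        # even when the preferred site is the one that actually failed.
        return {preferred}
    # The witness sees exactly one site: the survivor continues.
    return set(witness_sees)


# Site A is lost entirely; the witness can still reach B.
print(sites_allowed_to_write(link_up=False, witness_sees={"B"}))  # {'B'}
# Without a witness, B suspends even though A is the site that failed.
print(sites_allowed_to_write(link_up=False, witness_sees=None))   # {'A'}
```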

Page 51

Distributed Virtual Storage Configuration

(Diagram: storage is read/write at both sites.)

Page 52

Understanding – “Uniform Access”

(Diagram: a stretched cluster on stretched storage. vCenter and VMs run at both sites, with an array at Site-A, an array at Site-B and a witness at a 3rd site. Hosts connect to the underlying storage over FC or IP and have logical paths both to the array at the same site and to the array at the other site; the witness connects over IP.)

• Pros:
– One more failure mode that doesn’t trigger VM HA
• Cons:
– Operational complexity, including multipathing
– If non-locally cached, latency

Page 53

Understanding – “Non-Uniform Access”
• Pros:
– Simple
• Cons:
– Cluster failure = VM HA event

(Diagram: the same stretched-cluster layout, with an array at Site-A, an array at Site-B and a witness at a 3rd site, but hosts have logical paths only to the array at their own site.)
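The pros and cons of the two access models come down to which paths a host has. A minimal sketch (hypothetical function names, two sites “A” and “B”) checking whether a host loses every path to a distributed volume when one site’s array fails:

```python
def visible_arrays(host_site, uniform):
    """Arrays a host has logical paths to."""
    other = "B" if host_site == "A" else "A"
    return {host_site, other} if uniform else {host_site}


def needs_ha_restart(host_site, failed_array_site, uniform):
    """True if the host loses every path to the volume when one array fails."""
    surviving = visible_arrays(host_site, uniform) - {failed_array_site}
    return not surviving


for uniform in (True, False):
    label = "uniform" if uniform else "non-uniform"
    print(label, needs_ha_restart("A", failed_array_site="A", uniform=uniform))
```

With uniform access the Site A host keeps running over its cross-site paths (paying the latency listed as a con). With non-uniform access its VMs must be restarted by VM HA at Site B.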

Page 54

Understanding… Network Options
• Stretched VLAN approaches (VPLS, Ethernet fabrics, etc.)
• Cisco OTV
• VXLAN (we haven’t seen this widely used for this use case yet)

Page 55

Stretched Cluster Considerations #1

Consideration: Prior to and including vSphere 4.1, you can’t control HA/DRS behavior for “sidedness”
• With stretched storage network configurations:
– Additional latency is introduced when VM storage resides in the other location
– Storage vMotion is required to remove this latency
• With distributed virtual storage configurations:
– Need to keep cluster behaviors in mind
– Data is accessed locally due to data locality algorithms

Page 56

Stretched Cluster Considerations #2

Consideration: With vSphere 5, you can use DRS host affinity rules to control DRS behavior
– NOTE: Doesn’t address HA primary/secondary node selection
• With stretched storage network configurations:
– Caution when using single-controller implementations
– Storage latency still present in the event of a controller failure
• With distributed virtual storage configurations:
– Plan for cluster failure/cluster partition behaviors
• Understand/embrace “VMware supported” vs. “vendor supported”
– This is what vMSC is really all about….

Page 57

Stretched Cluster Considerations #3

Consideration: There is no supported way to control VMware HA primary/secondary node selection with vSphere 4.x
• With vSphere 4.x
– Limits cluster size to 8 hosts (4 in each site)
– No supported mechanism for controlling/specifying primary/secondary node selection
– Methods for increasing the number of primary nodes are also not supported by VMware
• With vSphere 5.x
– Better VM HA implementation (heartbeat datastores help)
– Still no supported mechanism for controlling/specifying primary/secondary node selection
– Host affinity groups + DRS may sort it out – but may not.
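The 8-host (4 per site) limit follows from how vSphere 4.x HA works. A cluster has at most five primary nodes, and a primary must survive for HA to restart anything. With more than four hosts in one site, all five primaries can land there, and losing that site leaves none. A quick worst-case check, assuming (as the slide says) that you cannot control where primaries are placed; the function name is hypothetical:

```python
MAX_PRIMARIES = 5  # vSphere 4.x HA elects at most five primary nodes


def worst_case_primaries_left(hosts_per_site):
    """Primaries remaining after the worst-case loss of one site.

    hosts_per_site: host count for each site, e.g. [4, 4].
    Placement is uncontrolled, so assume as many primaries as possible sit in
    whichever site fails.
    """
    primaries = min(MAX_PRIMARIES, sum(hosts_per_site))
    max_lost = max(min(primaries, hosts) for hosts in hosts_per_site)
    return primaries - max_lost


print(worst_case_primaries_left([4, 4]))  # 1 -> HA can still restart VMs
print(worst_case_primaries_left([5, 5]))  # 0 -> a site loss may leave no primary
```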

Page 58

Stretched Cluster Considerations #4

Consideration: Stretched clusters require Layer 2 “equivalence” at the network layer
• Complicates the network infrastructure
• The “re-IP” approach with SRM is relatively simple
• Requires use of technologies like VXLAN, OTV, VPLS
• Main question: “Do you have the equipment and the networking expertise?”
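The “re-IP” alternative trades Layer 2 stretching for an address change at failover. A minimal sketch of one common scheme maps each protected-site subnet to a recovery-site subnet and keeps the host part of the address. The subnets, the mapping rule and `recovery_ip` are hypothetical; SRM has its own IP customization settings, and this is not their format.

```python
import ipaddress

# Hypothetical protected-site -> recovery-site subnet pairs (same prefix length).
SUBNET_MAP = {
    ipaddress.ip_network("10.1.20.0/24"): ipaddress.ip_network("10.2.20.0/24"),
    ipaddress.ip_network("10.1.30.0/24"): ipaddress.ip_network("10.2.30.0/24"),
}


def recovery_ip(protected_ip):
    """Translate an address to its recovery-site equivalent, keeping the host part."""
    addr = ipaddress.ip_address(protected_ip)
    for src, dst in SUBNET_MAP.items():
        if addr in src:
            offset = int(addr) - int(src.network_address)
            return str(dst.network_address + offset)
    raise ValueError(f"{protected_ip} is not in a mapped subnet")


print(recovery_ip("10.1.20.57"))  # 10.2.20.57
```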

Page 59

Stretched Cluster Considerations #5

Consideration: The network lacks site awareness, so stretched clusters introduce new networking challenges.
• The movement of VMs from one site to another doesn’t update the network
• VM movement can cause “horseshoe/trombone routing” (LISP and other approaches can help)
• You’ll need to use multiple isolation addresses in your VMware HA configuration
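“Multiple isolation addresses” means giving HA a pingable address in each site, so a host in either site can test for isolation without depending on the other site. A minimal sketch that builds the corresponding HA advanced options. The option names `das.isolationaddress0` to `das.isolationaddress9` and `das.usedefaultisolationaddress` are those of vSphere 5.x HA as we understand them (check the availability guide for your release); the site names and gateway addresses are hypothetical.

```python
def ha_isolation_options(site_addresses, use_default_gateway=False):
    """Build HA advanced options with one or more isolation addresses per site.

    site_addresses: site name -> list of pingable addresses (e.g. local gateways)
    """
    options = {"das.usedefaultisolationaddress": str(use_default_gateway).lower()}
    index = 0
    for site in sorted(site_addresses):
        for addr in site_addresses[site]:
            if index > 9:
                raise ValueError("HA accepts at most 10 isolation addresses (0-9)")
            options[f"das.isolationaddress{index}"] = addr
            index += 1
    return options


opts = ha_isolation_options({"site-a": ["192.168.10.1"], "site-b": ["192.168.20.1"]})
for key, value in opts.items():
    print(f"{key} = {value}")
```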


Page 61

Nope. Not Sci-Fi. 500+ EMC examples
• 70% more utilization
• Migrated 250 live systems
• Multi-vendor migrations
• 15% more efficiency
• Always-on availability
• Online migrations
• Implemented private cloud
• 83% less management
• Active/active data centers

Page 62

6,000+ NetApp examples…

• In Germany alone!

• 11,000+ global installations

Page 63

Summary of what’s new….
• NOW – Expanding vMSC (includes NetApp, IBM, HP)
• NOW – Site Recovery Manager 5.1
• NOW – vSphere 5 VM HA rewrite & heartbeat datastores, help on partition scenarios
• NOW – vSphere 5 Metro vMotion
• NOW – vSphere 5.0 Update 1 and 5.1 PDL changes
• NOW – PDL response in VPLEX & MetroCluster, VAAI support, Cluster Interconnect, Witness

Page 64

For More Information…
• EMC VPLEX vMSC
– VMware: Using VPLEX Metro with VMware HA
• http://kb.vmware.com/kb/1026692
• http://kb.vmware.com/kb/1021215
– VMware: Implementing Uniform and Non-Uniform VPLEX Metro configs
• http://kb.vmware.com/kb/2007545
– EMC: VPLEX Metro HA techbook: h7113
– EMC: VPLEX Metro with VMware HA: h8218
• NetApp MetroCluster vMSC
– VMware: vSphere Metro Storage Cluster Case Study
– NetApp: TR3548: Best Practices for MetroCluster Design and Implementation

Page 65

So… What’s Next?

Page 66

VM Component Protection
• Detect and recover from catastrophic infrastructure failures affecting a VM
– Loss of storage path
– Loss of network link connectivity
• VMware HA restarts the VM on an available healthy host

(Diagram: two VMware ESX hosts.)

Page 67

Automated Stretched Cluster Config
• Leverage the work in VASA and VM Granular Storage
• Automated site protection for all VMs
• Benefits of the single cluster model
• Automated setup of HA and DRS affinity rules

(Diagram: an HA/DRS cluster spanning Site A and Site B over a Layer 2 network, backed by distributed storage volumes.)

Page 68

Stretched Cluster + vCOPS

Page 70

More to come…
• RecoverPoint (future)
1. VM granular operations
2. “vRecoverpoint”
3. Multi-site
• RAPIDpath (future)
– Network transformation
• VPLEX (future)
1. VM granular operations = async
2. “vVPLEX”

Page 71

Q & A – Some Questions from Us to You
• “Stretched clustering sounds like awesomesauce, why not?”
• “Our storage vendor/team tells us their disaster avoidance solution will do everything we want: HA, DA, DR. We are not experts here; should we be wary?”
• “Our corporate SLAs for recovery are simple, BUT we have LOTS of expertise and think we can handle the bleeding-edge stuff. Should we just go for it???”
• “My datacenter server rooms are 50 ft apart, but I definitely want a DR solution. What’s wrong with that idea?”
• Is “cold migration” over distance good enough for you, or is it live or nothing?

Page 72

THANK YOU

Page 73

FILL OUT A SURVEY

Every complete survey is entered into a drawing for a $25 VMware Company Store gift certificate.
