VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

Implementing a Holistic BC/DR Strategy with VMware - Part Two Jeff Hunter, VMware Ken Werneburg, VMware BCO5162 #BCO5162

VMworld 2013. Jeff Hunter, VMware; Ken Werneburg, VMware. Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare

Transcript of VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

Page 1: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

Implementing a Holistic BC/DR Strategy with VMware - Part Two

Jeff Hunter, VMware

Ken Werneburg, VMware

BCO5162

#BCO5162

Page 2: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

IT Business Continuity

Page 3: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

Is It a Real Problem?

Page 4: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

What’s the Difference?

Disaster Avoidance

Disaster Recovery

Planned vs. Unplanned

Page 5: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

Disaster Recovery vs. Business Continuity

Example: Tuesday, August 23, 2011 at 1:51 PM EDT - Magnitude 5.8 earthquake near Mineral, Virginia

Disaster recovery required?

No

Interruption to business continuance?

YES!

Page 6: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

Fault Tolerance vs. High Availability

Fault tolerance

• Ability to recover from component loss

• Example: Hard drive failure

High availability

Uptime percentage in one year    Downtime in one year
99                               3.65 days
99.9                             8.76 hours
99.99                            52 minutes
99.999 (“five nines”)            5 minutes
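
The downtime figures in the table follow directly from the uptime percentage. As a minimal sketch (the function name and the 365-day year are assumptions made for illustration, not part of the slides), the arithmetic can be reproduced like this:

```python
def downtime_hours_per_year(uptime_percent: float, days_per_year: float = 365.0) -> float:
    """Return the downtime, in hours per year, implied by an uptime percentage.

    Assumes a 365-day year by default; this is an illustrative helper only.
    """
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    unavailable_fraction = 1.0 - uptime_percent / 100.0
    return unavailable_fraction * days_per_year * 24.0


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% uptime: {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

The slide rounds the last two rows down to whole minutes (about 52.6 and 5.3 minutes respectively).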

Page 7: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

RTO, RPO, and MTD

Recovery Time Objective (RTO)

• How long it should take to recover

Recovery Point Objective (RPO)

• Amount of data loss that can be incurred

Maximum Tolerable Downtime (MTD)

• Downtime that can occur before significant loss is incurred

• Examples: Financial, reputation
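
To make the three terms concrete, here is a small sketch that compares one outage against a set of objectives. The class, function, field names, and sample timestamps are all hypothetical and chosen only for illustration; the slides define the terms but prescribe no tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Objectives:
    """Hypothetical container for the three targets defined above."""
    rto: timedelta  # how long recovery should take
    rpo: timedelta  # how much data loss (measured as time) is acceptable
    mtd: timedelta  # downtime beyond which significant loss is incurred


def evaluate_outage(objectives: Objectives,
                    last_recovery_point: datetime,
                    failure_time: datetime,
                    service_restored: datetime) -> dict:
    """Compare one outage against the objectives."""
    data_loss_window = failure_time - last_recovery_point
    downtime = service_restored - failure_time
    return {
        "rpo_met": data_loss_window <= objectives.rpo,
        "rto_met": downtime <= objectives.rto,
        "within_mtd": downtime <= objectives.mtd,
    }


if __name__ == "__main__":
    targets = Objectives(rto=timedelta(hours=4),
                         rpo=timedelta(hours=1),
                         mtd=timedelta(hours=24))
    result = evaluate_outage(
        targets,
        last_recovery_point=datetime(2013, 8, 26, 13, 0),
        failure_time=datetime(2013, 8, 26, 13, 51),
        service_restored=datetime(2013, 8, 26, 16, 30),
    )
    print(result)
```

In this sketch the RTO is expected to sit at or below the MTD, since recovery that meets its target should never exceed the tolerable downtime.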

Page 8: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

Making an Application Service Highly Available

vSphere HA

NEW: vSphere App HA

Page 9: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

vSphere App HA (New)

Policy-based

Protect off-the-shelf apps

VMware vFabric™ tc Server

Page 10: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

vSphere App HA (New)

[Architecture diagram. Labels: vSphere HA Cluster with four vSphere hosts, vCenter Server, vFabric Hyperic Virtual Appliance, vSphere App HA Virtual Appliance, Hyperic Agents Running in VMs]

Page 11: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

vSphere App HA (New)

Page 12: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

vSphere HA – Keep In Mind…

RTO – measured in minutes (not seconds)

Requires shared storage

Best practices

• Use admission control – percentage policy

• Test post-failure performance with host maintenance mode

• Isolation response – leave powered on

• Network and storage redundancy

• Also see BCO5047
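
For the percentage-based admission control policy, a common rule of thumb is to reserve the share of cluster capacity that the tolerated host failures represent. The sketch below assumes equally sized hosts and uses a hypothetical helper name; it is a sizing aid only, not a vSphere API call.

```python
import math


def failover_capacity_percent(host_count: int, host_failures_to_tolerate: int = 1) -> int:
    """Return the percentage of cluster resources to reserve for failover.

    Assumes all hosts are the same size. Rounds up so the reservation is
    never smaller than the capacity of the hosts that may fail.
    """
    if host_count < 2:
        raise ValueError("an HA cluster needs at least two hosts")
    if not 1 <= host_failures_to_tolerate < host_count:
        raise ValueError("host_failures_to_tolerate must be between 1 and host_count - 1")
    return math.ceil(100 * host_failures_to_tolerate / host_count)


if __name__ == "__main__":
    for hosts in (3, 4, 8):
        print(f"{hosts} hosts, 1 failure tolerated: reserve "
              f"{failover_capacity_percent(hosts)}% of CPU and memory")
```

Clusters with hosts of different sizes need a larger reservation than this estimate, because the largest host may be the one that fails.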

Page 13: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

vSphere Fault Tolerance (FT)

Zero recovery time, zero data loss

• Host hardware failure only

• Does not protect against OS and application failure

Works fine with HA, App HA

Why not FT?

• Resource requirements – does workload really need it?

• VM has multiple vCPUs – not supported by FT; see BCO5065

• No VM snapshots – backups require agent

Page 14: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

Data Protection (Backup and Restore)

Agents? No Agents? – Both!

• No agents for majority of workloads – keep it simple

• Agents for certain apps

vSphere Data Protection (VDP) Advanced

• Backup and recovery for VMware, from VMware

• Based on proven, mature EMC Avamar™

• Agent-less VM backup and restore

• Agents for granular tier-1 application protection

Page 15: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

vSphere Data Protection (New)

Page 16: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

VDP Advanced – Keep In Mind…

Engineered for SMB environments

Uses VADP – VM snapshots, CBT

Utilizes Windows VSS in VMware Tools

Works fine with HA, not with FT

RDM – virtual yes, physical no

Is it DR?

• Maybe – depends on RTO, RPO

• Needs replication offsite, right? – see BCO5041

Page 17: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

VDP Advanced – Keep In Mind…

Best Practices

• Prepopulate DNS, always use FQDN

• Manage VM snapshots

• Avoid deploying to slow storage

• Do not power-off, always shut down gracefully

• Do not schedule backups during maintenance window

• Also see BCO4756 and BCO5041

Page 18: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

vCenter Availability

Run vCenter Server application in a VM

Run vCenter Server database in a VM

Run both in same VM?

Protect with vSphere HA

• vCenter and DB VM restart priority set to High

• Enable guest OS and App monitoring

App HA can protect SQL Server database

Page 19: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

vCenter Availability

Back up vCenter Server VM and database

• Image-level backup for vCenter Server VM

• App-level backup using agent for database backup

Why not FT for vCenter Server?

• vCenter Server requires minimum of 2 vCPUs

• FT does not protect against application failure

Replicate vCenter Server, database VMs?

Page 20: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

vCenter Availability – vCenter Server Heartbeat

Pros

• Better RTO and RPO – typically ~5 minutes

• Protects against host and guest OS failure

• Checks network connectivity

• Monitors application services and performance

Cons

• Complexity

• Requires double the resources

• Licensing cost

Page 21: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

vSphere Replication – DR

Native tool built into the platform

Per-VM hypervisor replication, managed in vCenter Server

Selectable RPO from 15 min up to 24 hours

Selectable destination datastore (disk-type agnostic)
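
The selectable RPO range stated above (15 minutes to 24 hours) can be captured as a simple check when planning per-VM settings. The constant and function names are hypothetical and exist only for this sketch; they are not part of any vSphere Replication interface.

```python
MIN_RPO_MINUTES = 15        # lower bound stated on the slide
MAX_RPO_MINUTES = 24 * 60   # upper bound stated on the slide (24 hours)


def validate_rpo_minutes(rpo_minutes: int) -> int:
    """Return rpo_minutes unchanged if it lies in the selectable range, else raise."""
    if not MIN_RPO_MINUTES <= rpo_minutes <= MAX_RPO_MINUTES:
        raise ValueError(
            f"RPO must be between {MIN_RPO_MINUTES} and {MAX_RPO_MINUTES} minutes, "
            f"got {rpo_minutes}"
        )
    return rpo_minutes


if __name__ == "__main__":
    # Hypothetical planning sheet: VM name -> requested RPO in minutes.
    requested = {"web01": 240, "db01": 15, "file01": 1440}
    for vm, rpo in requested.items():
        print(vm, validate_rpo_minutes(rpo))
```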

Page 22: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

Replication Across Sites

[Diagram: two sites. Labels at each site: vCenter Server, VR Appliance, three ESXi hosts (each with NFC and VRA), and Storage holding VMDK1]

Page 23: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

Four Steps for Full Recovery

Right-click, select “Recover”

Select a target folder

Select a target resource

Click Finish

Your choices are validated as you go

Page 24: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

New Feature – Retain Historical Replicas

After recovery, use the snapshot manager to revert to earlier points

Retention of multiple points in time allows reversion to earlier known good states

[Diagram labels: vSphere, VR Agent]

Page 25: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

MPIT Presented as VM Snapshots after Failover

Use the snapshot manager to revert to earlier points, an interface all administrators have been comfortable with for many years.

Page 26: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

vSphere Replication – Interoperability

Fault tolerance – Doesn’t work with VR

• FT conflicts at the vSCSI disk filter level.

VDP

• Mostly no problem!

• If using VSS… ensure you are using 5.5!!

HA, vMotion, DRS

Storage vMotion and Storage DRS

• Now supported!

Page 27: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

vSphere Replication – Best Practices

RPO

• Only what is necessary!

• Just because you can…

RTO

• Don’t set one! No testing, no automation, manual process.

VSS – Only if necessary!

What about bandwidth?

• Very hard to determine. Do a local loopback first.

RDMs?

• Don’t use them. If you must, use virtual compatibility mode.

Don’t mix ABR and VR!
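
Bandwidth is hard to determine precisely, but a first-order lower bound can be sketched: the data that changes during one RPO window has to cross the link within roughly that window. The function name and the sample figures are assumptions for illustration; the estimate ignores compression, protocol overhead, the initial full sync, and bursts in change rate, so real requirements can differ considerably.

```python
def min_replication_mbps(changed_megabytes_in_window: float, rpo_minutes: float) -> float:
    """Rough lower bound on link throughput in megabits per second.

    changed_megabytes_in_window: data changed during one RPO window, ideally
    measured at the busiest time of day.
    """
    if changed_megabytes_in_window < 0:
        raise ValueError("changed_megabytes_in_window must not be negative")
    if rpo_minutes <= 0:
        raise ValueError("rpo_minutes must be positive")
    megabits = changed_megabytes_in_window * 8
    return megabits / (rpo_minutes * 60)


if __name__ == "__main__":
    # Hypothetical measurement: 900 MB changes during a 15-minute RPO window.
    print(f"{min_replication_mbps(900, 15):.1f} Mbps")
```

With the hypothetical figures above the result is 8.0 Mbps, which should be read as a floor to validate with a test replication rather than as a sizing guarantee.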

Page 28: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

SRM

What is it?

• A Disaster Recovery engine

• A tool that uses externally replicated data (VR or array based) to speed the RTO of a BCP

• A product that allows for DR to be tested, automated, planned, repeatable and customizable

What is it not?

• A replication engine

• A tool for systems that need near-instant RPO

• A disaster avoidance stretched cluster

Page 29: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

Key Components of SRM

• One vCenter Server (Windows or VCVA) per site, same versions

• One SRM Server per site, same versions

• vSphere hosts, recommend same versions per site (pre vSphere 5.x only if using array replication)

vSphere Essentials Plus and higher editions supported

[Diagram labels: vCenter Server, SRM Server, Replication]

Page 30: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

SRM Replication Options

SRM can utilize BOTH array based AND vSphere Replication

SRM will “see” existing standalone vSphere Replication protected VMs

SRM can install vSphere Replication from scratch if needed

[Diagram: a multi-tier app (Web, App, DB) shown with vSphere Replication and with storage-based replication. Other labels: LUN 1, LUN 2, Hub]

Page 31: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

Recovery Workflows

Failover Automation

• User defined recovery plan

• Minimize errors

Non-disruptive Failover Testing

• Isolated test environment

• Increase confidence in DR process

Planned Migration

• Zero data loss

• Operational migration

Failback Automation

• Re-protect VMs, migrate back

Page 32: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

SRM Interoperability

Works with VR and ABR

Backups, VADP or other, are fine

HA is no problem at all

vMotion and DRS are fine

Storage vMotion and Storage DRS – Sort of…

• Replication dependent

FT is “yellow”

• Array replicated only and the FT status is not recovered

Web vs vSphere Client

Page 33: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

SRM – A Few Best Practices

Not exhaustive

How long is VMworld?

Big ones:

• Storage layout

• Test network configuration

• Test often!

• Size vCenter correctly

Biggest one:

Do a Business Impact Analysis

RPO, RTO, cost of downtime, interdependencies, criticality of applications, priorities, units of failover, overlooked externalities, executive buy-in, …

Page 34: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

SRM Further Detail at VMworld

• BCO5733 - vCenter Site Recovery Manager – Solution Overview and Lessons from a Fortune 500 Health Care Company Implementation

• BCO5129 - Protection for All - vSphere Replication & SRM Technical Update

• BCO5170 - DR to The Cloud with VMware Site Recovery Manager and Rackspace Disaster Recovery Planning Services

• BCO5652 - Three Quirky Ways to Simplify DR with Site Recovery Manager

• BCO4905 - Disaster Recovery Solution with Oracle Data Guard and Site Recovery Manager

Page 35: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

Protection Groups (PGs)

More PGs = more granular testing/failover

• DR testing is easier – fewer resource requirements

• Fail over only what is needed

• More configuration/complexity

Fewer protection groups = less complex

• Fewer LUNs, PGs, recovery plans

• Less flexibility

Find a good balance between flexibility and simplicity

• Fewer LUNs/PGs: less complexity, less flexibility

• More LUNs/PGs: more complexity, more flexibility

• The right combination of complexity and flexibility varies by customer

Majority of outages are partial (not entire data center) – design accordingly

Page 36: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

Test Network

• Use VLAN or isolated network for test environment

• Default “Auto” setting does not allow VM communication between hosts

• Different vSwitch can be specified in SRM for test versus run

• Specified in Recovery Plan

Page 37: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

vSphere Infrastructure Navigator

Page 38: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

VMware – Multiple Levels of Protection

SQL

vSphere HA/FT

Site A

Page 39: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

VMware – Multiple Levels of Protection

SQL

vSphere HA/FT

VDPA

Site A

Page 40: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

VMware – Multiple Levels of Protection

SQL

vSphere HA/FT

VR/SRM SQL

VDPA

Site A Site B

Page 41: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

Other VMware Activities Related to This Session

HOL:

HOL-SDC-1305 - Business Continuity and Disaster Recovery In Action

VMworld Session:

BCO-5160 - Implementing a Holistic BC/DR Strategy – Part 1

Page 42: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

THANK YOU

Page 43: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two
Page 44: VMworld 2013: Implementing a Holistic BC/DR Strategy with VMware - Part Two

Architecting the Software-Defined Data Center

Aidan Dalgleish, VMware

David Hill, VMware

Kamau Wanguhu, VMware

VSVC7371

#VSVC7371