Post on 04-May-2018

VMware vSphere HA Recommendations to Maximize Virtual Machine Uptime

Josh Gray, VMware, Inc.

Jeff Hunter, VMware, Inc.

INF-BCO2382

#vmworldinf

2

Disclaimer

This session may contain product features that are currently under development.

This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product.

Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Technical feasibility and market demand will affect final delivery.

Pricing and packaging for any new technologies or features discussed or presented have not been determined.

3

High Availability is Part of IT Business Continuity

4

Just a Few Clicks to Higher Availability

Turn ON vSphere HA

OK

5

Global Support Services (GSS)

Support offices

• Bangalore, India

• Tokyo, Japan

• Cork, Ireland

• Burlington, Canada

• Palo Alto, CA

• Broomfield, CO

Local language support

• Spanish, Portuguese, French, German, Japanese, Chinese

Global Coverage

• 24x7, 365 days/year

• 6 Support Centers

• 1000+ Support Engineers

• Follow-the-sun support for Severity 1 issues

• Support relationships with 100% of the Fortune 100; 99% of Fortune 500

6

Recent Enhancements

7

vSphere 5.0 Major Redesign

Fault Domain Manager (FDM)

8

vSphere 5.1 Minor Updates

9

Recommendations: Networking

Redundant Management Network

Fewest hops possible

Route based on originating port ID

Failback policy = No

Enable PortFast, Edge, etc.

MTU size the same

Keep things simple

10

Recommendations: Networking

Consistent portgroup names, network labels

Host Monitoring during network maintenance

Use Maintenance Mode

Separate subnet for vSphere HA

Specify additional network isolation address

Each host can communicate with all other hosts

Keep things simple

11

Recommendations: Networking

12

Recommendations: Networking

Advanced Configuration Options

• das.allowNetwork[0-9]=

• das.isolationAddress[0-9]=

• das.useDefaultIsolationAddress= (true/false)

• das.failuredetectiontime

  • Not supported in vCenter 5.x
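As a rough illustration of how the isolation-address options above fit together, the following Python sketch builds the key/value pairs one might enter as advanced options. The helper name `build_isolation_options`, the example addresses, and the check that each address is an IP literal are assumptions made for this sketch; it does not talk to vCenter and only prints the pairs.

```python
from ipaddress import ip_address


def build_isolation_options(addresses, use_default=True):
    """Return a dict of HA advanced options for the given isolation addresses.

    Hypothetical helper: it only formats key/value pairs using the option
    names listed on the slide (das.isolationAddress[0-9] and
    das.useDefaultIsolationAddress).
    """
    if len(addresses) > 10:
        raise ValueError("only das.isolationAddress0 through 9 are available")
    options = {}
    for index, address in enumerate(addresses):
        # Assumption for this sketch: addresses are IP literals.
        ip_address(address)  # raises ValueError on a malformed address
        options[f"das.isolationAddress{index}"] = address
    options["das.useDefaultIsolationAddress"] = "true" if use_default else "false"
    return options


if __name__ == "__main__":
    # Documentation-range example addresses, not real gateways.
    for key, value in build_isolation_options(["192.0.2.1", "198.51.100.1"]).items():
        print(f"{key}={value}")
```

Whether to leave the default isolation address enabled alongside the additional ones depends on the environment; the sketch simply exposes it as a flag.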

13

Recommendations: Storage

Implement multiple paths

• HBAs, storage processors (SPs), NICs, switches

• Appropriate multipathing policy

14

Recommendations: Storage

Storage Heartbeats

• HA selects two datastores by default

15

Recommendations: Storage

Storage Heartbeats

• Override auto-selected datastores if necessary

16

HA Events (How to Avoid Problems)

17

Possible HA Events

• Host failure

• Network partition

• Host isolation

18

HA Events: Host Failures

19

HA Events: Network Partition

20

Recommendations: Network Partition

Symptoms: Network Partition

[Diagram sequence, slides 20 to 24: the labels progress from a single "Master" to a "New Master", and finally to two "New Master" labels.]

25

HA Events: Host Isolation

26

Host Isolation Policies

• Leave Powered On

• Power Off

• Shutdown

27

Which Policy? (How to Avoid Problems)

28

Depends. (on HOW You Want to Avoid Problems)

29

Likelihood….

30

Recommendations: Isolation Response

Host will retain access to datastores? | VMs will retain access to VM network? | Recommended Isolation Policy | Rationale
Likely | Likely | Leave Powered On | VM is running fine, why power it off
Likely | Unlikely | Leave Powered On or Shutdown | Allow HA to restart on hosts that are not isolated, likely to have access to storage
Unlikely | Likely | Power off | Avoid having two instances of the same VM on the network
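The isolation response table above can be read as a simple lookup. The Python sketch below encodes the three rows as a dictionary; the function name and the boolean parameters are illustrative choices, and the combination the table does not cover (both "Unlikely") deliberately returns `None` rather than guessing a policy.

```python
# Rows of the isolation response table, keyed by
# (host likely to retain datastore access, VMs likely to retain VM network access).
RECOMMENDED_POLICY = {
    (True, True): "Leave Powered On",
    (True, False): "Leave Powered On or Shutdown",
    (False, True): "Power Off",
}


def recommended_isolation_policy(datastore_access_likely, vm_network_likely):
    """Return the policy from the table, or None if the table has no such row."""
    key = (bool(datastore_access_likely), bool(vm_network_likely))
    return RECOMMENDED_POLICY.get(key)


if __name__ == "__main__":
    print(recommended_isolation_policy(True, False))
```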

37

Admission Control (How to Avoid Problems)

38

Admission Control Policies

• Static number of hosts

• Percentage of cluster resources

• Dedicated failover hosts

39

Static Number of Hosts Admission Control Policy

40

Recommendations: Admission Control

Number of Hosts (Host Failures Cluster Tolerates)

• Each host: 4 CPU x 2.40 GHz, 16 GB memory

• Cluster: 38 GHz, 64 GB memory

• VM reservation: 2 GHz, 1024 MB

• VM reservation: 1 GHz, 2048 MB
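The slot arithmetic behind this policy can be sketched with the figures on the slide. The Python below is a simplified model, not VMware's implementation: it takes the slot size as the largest CPU and largest memory reservation, ignores memory overhead and the defaults used when a VM has no reservation, and assumes four identical hosts (inferred from 64 GB cluster memory at 16 GB per host). All function names are hypothetical.

```python
def slot_size(reservations):
    """Slot size = largest CPU and largest memory reservation among the VMs."""
    cpu_mhz = max(cpu for cpu, _ in reservations)
    mem_mb = max(mem for _, mem in reservations)
    if cpu_mhz <= 0 or mem_mb <= 0:
        raise ValueError("this sketch needs non-zero reservations")
    return cpu_mhz, mem_mb


def slots_per_host(host_cpu_mhz, host_mem_mb, slot):
    """A host holds as many slots as its scarcer resource allows."""
    slot_cpu_mhz, slot_mem_mb = slot
    return min(host_cpu_mhz // slot_cpu_mhz, host_mem_mb // slot_mem_mb)


def usable_slots(hosts, slot, host_failures_to_tolerate):
    """Slots left after setting aside the largest hosts as failover capacity."""
    if not 0 <= host_failures_to_tolerate <= len(hosts):
        raise ValueError("host_failures_to_tolerate out of range")
    per_host = sorted(slots_per_host(cpu, mem, slot) for cpu, mem in hosts)
    surviving = per_host[: len(per_host) - host_failures_to_tolerate]
    return sum(surviving)


if __name__ == "__main__":
    reservations = [(2000, 1024), (1000, 2048)]  # (MHz, MB) from the slide
    hosts = [(4 * 2400, 16 * 1024)] * 4          # four hosts is an inference
    slot = slot_size(reservations)
    print("slot size:", slot)
    print("slots per host:", slots_per_host(4 * 2400, 16 * 1024, slot))
    print("usable slots with 1 host failure tolerated:", usable_slots(hosts, slot, 1))
```

Note how a single VM with a large reservation drives the slot size for the whole cluster: here the 2 GHz CPU reservation and the 2048 MB memory reservation come from different VMs, yet together they define every slot.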

51

Recommendations: Admission Control

Number of Hosts (Host Failures Cluster Tolerates)

Override default behavior

• vSphere Windows Client

  • Sets a "cap" on the slot size

• vSphere Web Client

  • Sets the exact size. Important difference.
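The difference between a cap and an exact size is easiest to see with numbers. The following Python sketch models only that distinction as stated on the slides; the function name, the `client` values, and the sample figures are assumptions for illustration.

```python
def apply_slot_override(computed, override, client):
    """Return the effective slot dimension (MHz or MB) after an override.

    client == "windows": the override acts as a cap (upper bound).
    client == "web":     the override is used as the exact size.
    """
    if client == "windows":
        return min(computed, override)
    if client == "web":
        return override
    raise ValueError(f"unknown client: {client!r}")


if __name__ == "__main__":
    # Hypothetical case: computed slot CPU is 500 MHz, override is 1000 MHz.
    print(apply_slot_override(500, 1000, "windows"))  # cap does not raise it
    print(apply_slot_override(500, 1000, "web"))      # exact size replaces it
```

The two clients agree when the computed size exceeds the override and differ when it is smaller, which is why the same number entered in each client can yield different slot counts.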

58

Recap: Static Number of Hosts Admission Control Policy

59

% of Cluster Resources Admission Control Policy

60

Recommendations: Admission Control

Percentage of cluster resources
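A minimal sketch of the arithmetic behind a percentage-based policy, assuming failover capacity is measured as the share of cluster resources not claimed by reservations. This simplified model ignores memory overhead and per-VM defaults; the cluster totals and reservations reuse figures from the earlier slides, while the 25% threshold and the function names are hypothetical.

```python
def failover_capacity_percent(total, reserved):
    """Share of a cluster resource (CPU or memory) not claimed by reservations."""
    return (total - reserved) / total * 100


def meets_policy(total, reserved, configured_percent):
    """True if the unreserved share is at least the configured percentage."""
    return failover_capacity_percent(total, reserved) >= configured_percent


if __name__ == "__main__":
    cpu_total_mhz, cpu_reserved_mhz = 38_000, 2_000 + 1_000
    mem_total_mb, mem_reserved_mb = 64 * 1024, 1024 + 2048
    configured = 25  # hypothetical setting, not taken from the slides

    print(f"CPU failover capacity: {failover_capacity_percent(cpu_total_mhz, cpu_reserved_mhz):.1f}%")
    print(f"Memory failover capacity: {failover_capacity_percent(mem_total_mb, mem_reserved_mb):.1f}%")
    print("Policy satisfied:",
          meets_policy(cpu_total_mhz, cpu_reserved_mhz, configured)
          and meets_policy(mem_total_mb, mem_reserved_mb, configured))
```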

64

Dedicated Failover Hosts Admission Control Policy

65

Recommendations: Admission Control

66

Which Do I Use?!?!

67

Recommendations: Admission Control

“Basic design principle: Do the math, and take customer requirements into account. If you need flexibility a ‘Percentage’ is the way to go.”

– Frank Denneman & Duncan Epping, VMware vSphere 5 Clustering – Technical Deepdive

68

vSphere HA VM Monitoring

VM Monitoring restarts VM if…

• VMware Tools Heartbeat not received

• No network or disk activity within I/O stats interval

  • Default 120 seconds – customize in vSphere Web Client
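The two conditions above can be sketched as a small decision function. This Python example assumes both conditions must hold before a restart (a missing heartbeat alone is not enough if the VM still shows I/O); the function and parameter names are hypothetical, and heartbeat timing and sensitivity settings are not modeled.

```python
DEFAULT_IO_STATS_INTERVAL_S = 120  # default I/O stats interval from the slide


def should_restart_vm(heartbeat_received, seconds_since_last_io,
                      io_stats_interval_s=DEFAULT_IO_STATS_INTERVAL_S):
    """Decide whether VM Monitoring would restart the VM in this simplified model.

    Assumption: a restart needs BOTH a missing VMware Tools heartbeat AND
    no network or disk activity within the I/O stats interval.
    """
    if heartbeat_received:
        return False
    return seconds_since_last_io >= io_stats_interval_s


if __name__ == "__main__":
    print(should_restart_vm(False, 300))  # no heartbeat, no recent I/O
    print(should_restart_vm(False, 15))   # no heartbeat, but recent I/O
```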

69

vSphere HA Application Monitoring

3rd-Party Solutions

• Symantec ApplicationHA

• Neverfail vAppHA

Application Awareness API open with vSphere 5.0

• Download VMware GuestAppMonitor SDK with 5.0

• Download VMware Guest SDK for vSphere 5.1

70

vSphere HA Futures

VMware vSphere HA Today

• Storage interconnect most commonly queried KB issue

• Assumes storage connected on other hosts

• Improvements with vSphere 5.0 U1 and 5.1

Virtual Machine Component Protection (VMCP)

• Fine-grained controls for VM restart policy

• Queries destination host(s) for storage health

• Demo in VMware booth on show floor

71

vSphere HA Futures

VMware vSphere Fault Tolerance (FT) Today

• Protects only VMs with 1 vCPU

• Many mission-critical apps require multiple vCPUs

SMP Fault Tolerance (FT)

• Protect VMs that have more than one vCPU

72

Customer Support Day Events

Coming to a location near you: sharing of VMware best practices!

Support Days are a collaboration between VMware Support, Sales and customers – you learn directly from the experts

Topics are driven by customer input, and typically include:

• Best practices

• Tips/tricks

• Top issues

• Product roadmaps/demos

• Certification offerings

http://www.vmware.com/go/supportdays

73

VMware GSS: Important Links

Blogs

• Support Insider: blogs.vmware.com/kb

• KBTV: blogs.vmware.com/kbtv

• KB Digest: blogs.vmware.com/kbdigest

Twitter

• @vmwarecares: twitter.com/vmwarecares

• @vmwarekb: twitter.com/vmwarekb

Facebook

• https://www.facebook.com/vmwkb

Communities

• communities.vmware.com

YouTube

• KBTV: youtube.com/user/vmwarekb

Support and Downloads: vmware.com/support

Technical Support Welcome Guide: vmware.com/go/supportguide

Get Support via My VMware: my.vmware.com/group/vmware/get-help

Licensing Help Center: vmware.com/support/licensing

Knowledge Base: kb.vmware.com

Customer Support Days: vmware.com/go/supportdays

Renewals: vmware.com/go/renew

Customer Advocacy: customerfeedback@vmware.com

Product Support Centers: vmware.com/support/product-support
