
White Paper

Abstract

This white paper describes VMware High Availability (HA) support for EMC ViPR SRM. It lists the common disaster-like scenarios, describes recovery results, and provides qualification details. December 2014

IMPROVING EMC VIPR SRM HIGH AVAILABILITY WITH VMWARE HA


Copyright © 2014 EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. VMware is a registered trademark of VMware, Inc. in the United States and/or other jurisdictions. All other trademarks used herein are the property of their respective owners. This document was created using the official VMware icon and diagram library. Copyright © 2014 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents. VMware does not endorse or make any representations about third party information included in this document, nor does the inclusion of any VMware icon or diagram in this document imply such an endorsement.


Table of contents

Executive summary

About EMC M&R

Audience

Terms and Definitions

High Availability vs. Disaster Recovery

vSphere High Availability (HA)

vSphere Fault Tolerance (FT)

vMotion and Storage vMotion

VMware Distributed Resource Scheduler (DRS)

vCenter Site Recovery Manager

ViPR SRM

Introduction

Fault Tolerance and EMC M&R

Scalable Architecture

Single Server Implementation

The Collection Layer

The Presentation Layer

The Database & Normalization Layer

EMC M&R Failover Dependencies

Recommended Disaster Recovery / High Availability solutions

Understanding when HA takes place

HA support qualification

Network considerations

Storage considerations

VMware HA Best Practices

Example of HA Cluster settings

Conclusion

References


Executive summary

With the maturity of virtualization technologies comes the improvement of tools to manage the entire lifecycle of applications, from deployment to retirement. Advances in hardware and software have also improved the performance of virtualized applications when compared to physical deployments, making the virtualization decision even more compelling.

At its core, EMC® ViPR SRM, release 3.6, includes the EMC M&R foundation as a new class of monitoring and reporting architecture specifically targeted to leverage virtualization advances and dramatically lower the cost of ownership. Along with leveraging virtualization advancements for easy deployment, ViPR SRM also supports some of the industry standard tools provided by VMware for Disaster Recovery and High Availability.

This document demonstrates how the VMware High Availability feature can be leveraged for ViPR SRM virtual machines. It also demonstrates how a redundant network and storage architecture serves as an initial safety net.

About EMC M&R

EMC M&R provides state-of-the-art visibility and forecasting across various devices and technologies, processing and storing millions of indicators. When designing an EMC M&R solution, consider how to avoid common availability pitfalls and, in the case of a failure, how to gracefully reestablish services to maintain consistency and performance.

Audience

This white paper is intended for end users (e.g. system and storage administrators), system implementers (e.g. solution architects), support personnel, and EMC partners interested in the Disaster Recovery and High Availability solutions currently supported by ViPR SRM.

Terms and Definitions

High Availability vs. Disaster Recovery

When deciding how to put a fault tolerant solution in place, it is important to understand how much downtime the customer can tolerate if their EMC M&R solution ceases to work. This seemingly trivial question determines whether you should plan for a High Availability (HA) or a Disaster Recovery (DR) approach.

By definition, a High Availability design should be chosen when little to no downtime can be tolerated. HA is usually expressed as the percentage of uptime for the system within a certain period of time. A Disaster Recovery situation, on the other hand, means that a disaster has occurred, resulting in downtime, and that there are guidelines for bringing the system back online.
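To make the uptime percentage notation concrete, the following minimal sketch converts an availability target into the downtime it permits over a period. The function name, the default 365-day period and the sample targets are illustrative assumptions and do not come from the ViPR SRM or VMware documentation.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 365.0) -> float:
    """Return the downtime budget, in minutes, implied by an uptime percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The fraction of time that is NOT covered by the uptime target is the budget.
    return total_minutes * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical targets chosen only for illustration.
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Comparing the resulting budget with the time a VM restart takes in a given environment is one way to judge whether a restart-based HA approach is sufficient.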


vSphere High Availability (HA)

Figure 1: vSphere HA

When using vSphere HA, it is important to understand that HA does not prevent VM downtime. On the contrary, it only takes action when the ESX/ESXi host is down or when HA fails to communicate with the VMs it is set to protect. This differs from standard industry HA solutions, such as an active-active dual stack in which, if one of the stacks fails, the other one takes over transparently, resulting in no downtime.

VMware does have a solution that provides the highest uptime possible. It is called Fault Tolerance and is discussed further below.

vSphere HA was designed to protect against ESX/ESXi server failures by restarting the VMs on a different ESX/ESXi server in the vSphere Cluster. When VMware Tools is used, HA can also monitor for failures of the VM operating system. In all cases, VMware HA restarts the VMs, either on the same ESX/ESXi host or on another one.

When migrating from vCenter 4.x to 5.x, be aware that vCenter Server 5.0 uses Fault Domain Manager (FDM) agents for High Availability (HA) rather than Automated Availability Manager (AAM) agents, so the troubleshooting process has changed.

When using a vApp, HA does not take into consideration any predefined boot order for the virtual machines that make up the vApp.
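The restart behavior described above can be pictured with a small model. This is only a sketch of the decision as this paper describes it (host failure leads to a restart on another host, guest operating system failure leads to a restart on the same host). The `Host` class, the failure labels and the "first surviving host" choice are hypothetical and are not the vSphere placement algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    alive: bool = True


def choose_restart_host(failure: str, current: Host, cluster: List[Host]) -> Optional[Host]:
    """Return the host on which the VM would be restarted in this simplified model.

    failure is "host" (the ESX/ESXi host went down) or "guest_os"
    (the VM operating system failed while the host kept running).
    """
    if failure == "guest_os":
        # The host is healthy, so the VM is restarted where it already runs.
        return current if current.alive else None
    if failure == "host":
        # Pick the first surviving host; real placement also considers capacity.
        for candidate in cluster:
            if candidate.alive and candidate.name != current.name:
                return candidate
        return None
    raise ValueError(f"unknown failure type: {failure}")


if __name__ == "__main__":
    esx1, esx2 = Host("esx1"), Host("esx2")
    print(choose_restart_host("guest_os", esx1, [esx1, esx2]).name)
    esx1.alive = False
    print(choose_restart_host("host", esx1, [esx1, esx2]).name)
```

In both branches the VM is restarted, which is why HA reduces the length of an outage rather than eliminating it.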


vSphere Fault Tolerance (FT)

Figure 2: vSphere FT

With vSphere Fault Tolerance it is possible to have a VM running all of the time, thus avoiding downtime. Fault Tolerance relies on a technology known as VMware vLockstep, which works by duplicating the exact same sequence of instructions of the protected VM on another ESX/ESXi host in the same cluster. In the event of a failure, the second VM is ready to take over with minimal, if any, impact on performance and uptime.

This, however, comes at a price: it works with only one vCPU and has a limited list of supported operating systems. In its current state, vSphere FT's limitations make it unsuitable as an HA solution for ViPR SRM. The list of supported guest operating systems includes Windows 7, Windows Server 2003 (32 bit), Windows XP (32 bit), Windows 2000, Windows NT 4.0, Solaris 10 (64-bit) and Solaris 10 (32 bit).

The virtual machines that are deployed as part of ViPR SRM are tailored to use 4 vCPUs, with SUSE Linux Enterprise Server as the underlying operating system. Both characteristics fall outside the FT limits listed above.
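The reasoning above can be written as a simple eligibility check. The sketch below encodes only the limits stated in this paper (one vCPU and the listed guest operating systems). The function name and the way the guest OS is identified by a plain string are assumptions made for illustration.

```python
from typing import List

# Limits as stated in this paper for vSphere FT at the time of writing.
FT_MAX_VCPUS = 1
FT_SUPPORTED_GUESTS = {
    "Windows 7",
    "Windows Server 2003 (32 bit)",
    "Windows XP (32 bit)",
    "Windows 2000",
    "Windows NT 4.0",
    "Solaris 10 (64-bit)",
    "Solaris 10 (32 bit)",
}


def ft_blockers(vcpus: int, guest_os: str) -> List[str]:
    """Return the reasons a VM cannot be protected by FT (empty if none)."""
    reasons = []
    if vcpus > FT_MAX_VCPUS:
        reasons.append(f"{vcpus} vCPUs exceeds the FT limit of {FT_MAX_VCPUS}")
    if guest_os not in FT_SUPPORTED_GUESTS:
        reasons.append(f"guest OS '{guest_os}' is not in the supported list")
    return reasons


if __name__ == "__main__":
    # A ViPR SRM VM as described in this paper: 4 vCPUs on SUSE Linux.
    for reason in ft_blockers(4, "SUSE Linux Enterprise Server"):
        print(reason)
```

A ViPR SRM VM fails both checks, which is why this paper turns to vSphere HA instead.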


vMotion and Storage vMotion

Figure 3: VMware vMotion

vMotion is a technology that enables the live relocation of running VMs from one ESX/ESXi host to another. This process is done without service interruption and is fully transparent to end users. It greatly enhances the ability to perform maintenance on ESX/ESXi servers without causing downtime.

Storage vMotion follows the same concept as traditional vMotion, in that it makes it possible to avoid downtime related to the storage layer attached to the vSphere Cluster. When maintenance takes place on a datastore used by the Cluster, Storage vMotion can be used to transparently migrate the disk files of the virtual machines resident on that datastore to another datastore. Both vMotion and Storage vMotion are considered proactive measures to avoid downtime.

Figure 4: Storage vMotion


VMware Distributed Resource Scheduler (DRS)

Figure 5: VMware DRS

Although not required to achieve HA, having DRS enabled greatly enhances the overall usage of the cluster, as it balances resource allocation across the available ESX/ESXi hosts by migrating virtual machines from busy servers to other ESX/ESXi hosts where capacity is available. Whether or not to enable DRS is a choice only for deployments that do not involve the vApp. Since vCenter Server 2.0, DRS is needed when ViPR SRM is deployed in a cluster using the officially released vApp. The rationale is that a vApp relies on a resource pool, and DRS is required to manage resource pools in a cluster. Also bear in mind that when DRS is turned off on a cluster that contains a vApp, the administrator will be prompted to delete all of the resources associated with it.

To illustrate DRS capabilities, imagine a scenario where HA and DRS are enabled for a cluster with 3 ESX/ESXi servers. If one of the ESX/ESXi servers goes offline, e.g. by going into maintenance mode or by being powered off unexpectedly, HA restarts the VMs that were allocated to that ESX/ESXi server on the other servers and, because DRS runs on top of HA, DRS makes sure that the protected VMs are properly balanced across the remaining servers. Once the offline ESX/ESXi server comes back, DRS moves VMs back to it, keeping the cluster balanced. This happens when DRS is set to run in fully automated mode with “moderate aggressiveness”. In fully automated mode, DRS balances the load throughout the Cluster with vMotion.

Without DRS, the administrator would have to manually balance the VMs across the cluster, possibly resulting in sub-optimal performance and contention.
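The three-host scenario above can be walked through with a toy simulation. It is a sketch under strong assumptions: load is measured only as the number of VMs per host, and the host and VM names are hypothetical. Real DRS decisions are based on CPU and memory demand, so this only illustrates the sequence of events.

```python
from typing import Dict, List

Placement = Dict[str, List[str]]


def ha_restart(placement: Placement, failed_host: str) -> None:
    """Restart the failed host's VMs on the least-loaded surviving hosts."""
    orphans = placement.pop(failed_host)
    for vm in orphans:
        target = min(placement, key=lambda host: len(placement[host]))
        placement[target].append(vm)


def drs_rebalance(placement: Placement) -> None:
    """Move VMs from the busiest to the idlest host until counts differ by at most one."""
    while True:
        busiest = max(placement, key=lambda host: len(placement[host]))
        idlest = min(placement, key=lambda host: len(placement[host]))
        if len(placement[busiest]) - len(placement[idlest]) <= 1:
            return
        placement[idlest].append(placement[busiest].pop())


if __name__ == "__main__":
    cluster: Placement = {
        "esx1": ["vm1", "vm2"],
        "esx2": ["vm3", "vm4"],
        "esx3": ["vm5", "vm6"],
    }
    ha_restart(cluster, "esx3")   # esx3 goes offline; HA restarts its VMs elsewhere
    print("after failure:", cluster)
    cluster["esx3"] = []          # esx3 comes back empty
    drs_rebalance(cluster)        # DRS spreads the VMs out again
    print("after rebalance:", cluster)
```

Without the `drs_rebalance` step the returning host would stay empty, which corresponds to the manual balancing an administrator would otherwise have to do.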

If you have a boot order for your machines (inside the vApp), VMware HA does not take it into account.


vCenter Site Recovery Manager

VMware’s vCenter Site Recovery Manager is a complete Disaster Recovery solution that allows orchestration of repeatable, reviewable and automated testing and execution of recovery plans. It is possible to conduct non-intrusive tests, simulating the current recovery plan and generating documentation of the steps that will take place when a disaster happens.

As a Disaster Recovery approach, Site Recovery Manager was designed to be used after an outage has occurred, in the same fashion as vSphere HA, but for an entire site. Site Recovery Manager is the best option when prioritization and dependencies are important, such as when VMs must start in a specific order or when customization is needed at failover, e.g. changing VM IP addresses or updating firewall rules.

vCenter Site Recovery Manager, in conjunction with vSphere HA, is the ultimate solution not only for Highly Available services but also for Disaster Recovery scenarios.

ViPR SRM

Introduction

ViPR SRM can be installed as a Virtual Appliance (vApp) in a VMware environment or via a binary installation. This technical white paper focuses on the vApp installation, but the recommendations here can also be used when individual virtual machines are deployed.

The Open Virtualization Format (OVF) package that is distributed with ViPR SRM provides two types of solutions: a four VM vApp and a one VM vApp. The four VM option distributes the fundamental components across four VMs (Frontend, Primary Backend, Additional Backend and Collector), each pre-configured to interact properly with its counterparts. The one VM vApp solution can be used either as an All-in-One (AIO) solution for proofs of concept, demonstrations and small environments, or as a specific core component (e.g. Backend or Collector VM) to augment the initial four VM vApp deployment. For more information please refer to the EMC ViPR SRM 3.6 installation documentation.

The four virtual machines that are part of the four VM vApp solution are:

Frontend VM: Contains the web portal, Centralized Management, License controls.

Primary Backend VM: Contains a backend and database, Load Balancer Arbiter, Topology database and Alerting database.


Additional Backend VM: Contains backend and timeseries databases. This is used when scaling out.

Collector VM: Contains collectors used to retrieve data from devices, arrays and other supported technologies.

For more information please refer to the EMC ViPR SRM 3.6 installation documentation.

Fault Tolerance and EMC M&R

There are three layers in EMC M&R where a fault tolerant solution can be implemented:

The Presentation layer (Frontend VM)

The Database layer (Primary and Additional Backends)

The Collection layer (Collector VM)

These layers are logical in nature and can be further broken down into their individual modular components.

The Presentation layer provides client access, via a standard Web browser, to EMC M&R’s reporting capabilities; more specifically the Web Portal, and its servlets. This does not include the database(s), where all metrics that populate the reports are stored. However, the database is required for report generation and to service user requests.

The Database or Storage layer is responsible for EMC M&R’s storage capabilities, for both time and event based data. This includes the capability to store incoming metrics and events and the ability to serve up previously collected (historical) time and event based data to the Presentation layer.

The Collection layer is charged with the task of collecting and normalizing the incoming raw data after which it is staged in a temporary directory before being pushed to the storage layer.

A failover solution is available for each of these layers and can be combined or modified to eliminate any single point of failure (SPOF). An HA solution allows EMC M&R to continue running should a fatal fault occur in any of the three layers. The fault may originate in software, the operating system or hardware.

In addition to EMC M&R’s failover capabilities, there are several caching mechanisms at the Collection and Database layer ensuring the collected data is cached in the event the storage layer becomes unavailable.


Scalable Architecture

EMC M&R is a highly modular solution, allowing it to grow vertically and horizontally as demand requires (Figure 6: Scaling Capabilities). The expansion can be made on all three layers (Presentation, Database and Collection).

Scale the collection layer horizontally as the number of devices and sites grows

Scale the backend layer horizontally as the number of metrics grows

Scale the presentation layer horizontally as the number of concurrent users grows

Figure 6: Scaling Capabilities (diagram labels: Backend/Database Layer, Backend/Aggregation, Embedded Database, Alerting/Thresholds; Presentation Layer, Watch4net Portal; Collection Layer, Data Collection)

Single Server Implementation

ViPR SRM can be implemented on a single server, where all of its components are installed on the same host, as illustrated in Figure 7. This configuration has no failover or caching mechanism.


Figure 7: Logical Topology of a Standard EMC M&R Architecture (diagram labels: Presentation Layer; Storage & Normalization Layer; Collection Layer; APG Collector / NMS Adapter; APG Backend; APG Datastore; APG Tomcat; Performance Management; Services Dashboard / Alerting; Service Level Agreements; Network)

In this architecture, a failure which could severely cripple service is possible at each layer. Below is an explanation of some of the possible failures that can occur and their impact on EMC M&R and its users.

The Collection Layer

If the Collection layer fails, no new data will be collected for the duration of the outage, resulting in a permanent gap in performance data.

In such cases the Database and Presentation layers are still 100% operational, allowing EMC M&R users unfettered access to the Web portal to view and manage EMC M&R reports, users and Solution Packs. However, the missing data will manifest itself as a temporal gap in reports.

The Presentation Layer

If the Presentation layer of EMC M&R should fail, users will no longer be able to access the Web portal to view or manage reports, users and Solution Packs. In this case, although service to EMC M&R’s users has been interrupted, no data is lost and data collection continues unaffected.


The Database & Normalization Layer

The Database layer stores incoming data from the Collection layer and provides the Presentation layer with historical data, user information and the report templates. As shown previously, the other two layers are tightly coupled with the Database layer.

Fortunately, in addition to the caching mechanism in the Collection component, which stores incoming data when the Normalization component is not available, the Normalization component has the ability to cache incoming data when the database is unavailable.

When the Database is not available, new metrics are cached and the EMC M&R portal is unavailable, because the Presentation layer cannot retrieve the required data and cannot authenticate users. Cached data is written to the database when it returns to service.
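The cache-then-replay behavior described above can be sketched as follows. This is not the EMC M&R implementation: the `CachingWriter` class, the in-memory list standing in for the database and the `database_available` flag are all hypothetical, chosen only to show how metrics collected during an outage can be preserved in arrival order.

```python
from collections import deque
from typing import Deque, List, Tuple

Metric = Tuple[str, float]  # (metric name, value)


class CachingWriter:
    """Writes metrics to a database, caching them while it is unavailable."""

    def __init__(self) -> None:
        self.database: List[Metric] = []     # stand-in for the real database
        self.cache: Deque[Metric] = deque()  # metrics held during an outage
        self.database_available = True

    def write(self, metric: Metric) -> None:
        if self.database_available:
            self.flush()                     # replay older metrics first
            self.database.append(metric)
        else:
            self.cache.append(metric)

    def flush(self) -> None:
        """Write cached metrics to the database in arrival order."""
        while self.database_available and self.cache:
            self.database.append(self.cache.popleft())


if __name__ == "__main__":
    writer = CachingWriter()
    writer.write(("capacity_used", 1.0))
    writer.database_available = False        # database outage begins
    writer.write(("capacity_used", 2.0))
    writer.write(("capacity_used", 3.0))
    writer.database_available = True         # database returns to service
    writer.flush()
    print(writer.database)
```

Because the cached metrics are replayed in order, reports show no gap for the outage window once the database is back, which is the property the footnote to Table 1 relies on.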

EMC M&R Failover Dependencies

Function | Collection Layer Failover Required? | Storage Layer Failover Required? | Presentation Layer Failover Required?

Data Collection √ √*

Database/Reporting √ √

Web Portal √ √

Table 1: Table summarizing the dependencies of each EMC M&R layer

* Only required if caching the collected data until the storage layer recovers is not acceptable

The table above summarizes the three main EMC M&R services and the failover layers required to provide each service. When summarized in this fashion, it is clear that, at minimum, priority should be placed on a robust failover solution for the Storage layer.

Recommended Disaster Recovery / High Availability solutions

We recommend the following DR/HA solutions for ViPR SRM:

VMware High Availability (HA)

vCenter Site Recovery Manager (vCenter SRM)

For the reasons previously discussed, each of the solutions proposed here is best suited to tackle a specific problem. VMware HA addresses cases where prioritization and dependencies are not crucial and where there is no need to orchestrate the recovery effort. It is usually deployed in a single datacenter, although there is an approach known as stretched clusters in which a vSphere Cluster is stretched across different geographical locations. Stretched clusters extend the use of VMware HA, vMotion and Storage vMotion to more than one site. For more information please consult the technical white paper Stretched Clusters and VMware vCenter Site Recovery Manager: Understanding the Options and Goals.


Along with the above solutions, ViPR SRM also supports VMware Snapshots and VMware Data Recovery (VDR) as backup and restore solutions in case ViPR SRM VMs need to be restored to a working state after a failed update or any other failure caused by a disaster scenario.

Understanding when HA takes place

VMware vSphere High Availability (HA) provides easy-to-use, cost effective high availability for applications running in virtual machines. In the event of physical server failure, affected virtual machines are automatically restarted on other production servers with spare capacity. In the case of operating system failure, vSphere HA restarts the affected virtual machine on the same physical server.

Disaster scenarios that trigger migration of VMs from one host to another:

When a host managing a ViPR SRM VM goes down due to a catastrophic hardware failure, such as the loss of both primary and redundant power supplies.

Other failure scenarios that don’t trigger VM migrations:

Loss of primary and redundant IP network connectivity to the Management/Service console.

Loss of primary and redundant IP network connectivity to Production vLANs (VMPort group).

Loss of one or more I/O paths to Storage LUNs, provided at least one redundant path exists.

Loss of network connectivity to a vMotion (VMKernel) port.
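The two lists above can be captured as a lookup table, which is convenient when writing a runbook or a monitoring rule. The scenario keys and the `triggers_vm_migration` helper below are hypothetical names; the outcomes simply restate what this paper lists for the qualified configuration and may differ with other host isolation response settings.

```python
from typing import Dict

# True means the paper lists the scenario as triggering migration of VMs to another host.
SCENARIOS: Dict[str, bool] = {
    "host_catastrophic_hardware_failure": True,
    "management_network_loss_primary_and_redundant": False,
    "production_vlan_loss_primary_and_redundant": False,
    "storage_path_loss_with_redundant_path_remaining": False,
    "vmotion_port_network_loss": False,
}


def triggers_vm_migration(scenario: str) -> bool:
    """Return whether the named scenario triggers migration, per the lists above."""
    if scenario not in SCENARIOS:
        raise KeyError(f"scenario not covered by this paper: {scenario}")
    return SCENARIOS[scenario]


if __name__ == "__main__":
    for name, migrates in SCENARIOS.items():
        outcome = "VMs restarted on another host" if migrates else "no VM migration"
        print(f"{name}: {outcome}")
```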

HA support qualification

HA support qualification involved configuring the HA cluster according to the recommendations in the vSphere Availability Guide and the vSphere High Availability Deployment Best Practices Guide. It assumes that the majority of end users are already familiar with these best practices and follow them for network and storage design, as well as for the host isolation response and admission control settings.

The ViPR SRM VM was able to recover from a catastrophic hardware failure, such as the loss of both primary and redundant power supplies on the host managing the ViPR SRM VM. The VM in question rebooted on a standby host that was part of the cluster. After the successful reboot, the data in the system was retained and ViPR SRM functionality was intact.


Qualification was done on the following ESX/ESXi versions:

ESX/ESXi version

VMware ESXi 5.5 Build 1331820 (Sept 22, 2013)

VMware ESXi 5.0 Update 1 Build 623860 (March 15, 2012)

VMware ESX 4.1 Update 2 Build 502767 (Oct 27, 2011)

Table 2: ESX/ESXi Host Network and Storage designs

Since there can be various network and storage designs, the standard network and storage designs were configured during qualification after following the vSphere High Availability Deployment Best Practices Guide.

Network considerations

Figure 8: ViPR SRM VM deployed on an ESX/ESXi Host where Management/Service Console & Production vLANs (VMPort Groups) are configured on the same vSwitch.

Observations:

ViPR SRM was able to recover from a disaster-like situation in which the Host that was managing the ViPR SRM VMs went into “Host Isolation” mode, that is, lost IP network connectivity to both primary (VMNIC-0) and redundant IO paths (VMNIC-1). During this time, the state of ViPR SRM VMs shows up as “Disconnected” when viewed from the vSphere client. Once the connectivity is restored, the VMs reflect the normal state. The data in the system was retained and functionality was intact.

ViPR SRM functionality wasn’t affected when one of the redundant network IO paths was lost at a given point in time (either VMNIC-0 or VMNIC-1). The data in the system was retained and functionality was intact.

Figure 9: ViPR SRM VM deployed on an ESX/ESXi Host where Management/Service Console & Production vLANs (VMPort Groups) are configured on a different vSwitch.

Observations:

ViPR SRM was able to recover from a disaster-like situation in which the Host that was managing the ViPR SRM VMs went into “Host Isolation” mode, that is, lost IP network connectivity to both primary (VMNIC-0) and redundant IO paths (VMNIC-1). During this time, the state of ViPR SRM VMs shows up as “Disconnected” when viewed from the vSphere client. Once the connectivity is restored, the VM(s) reflect the normal state. The data in the system was retained and functionality was intact.

ViPR SRM functionality wasn’t affected when one of the redundant network IO paths to Management/Service console (VMNIC-0 or VMNIC-1) or for the Production vLANs (VMNIC-2 or VMNIC-3) was lost at a given point in time. The data in the system was retained and functionality was intact.

ViPR SRM was able to recover from a failure scenario in which the host managing the ViPR SRM VMs lost IP network connectivity on both its primary (VMNIC-2) and its redundant (VMNIC-3) paths. The data in the system was retained and functionality was intact.

Storage considerations

Figure 10: Storage configuration for ESX/ESXi hosts

Observations:

ViPR SRM functionality was intact and wasn’t affected when one of the redundant Storage IO paths connected to Host (VMHBA-x or VMHBA-y) was lost temporarily at a given point in time.

VMware HA Best Practices

When using VMware HA, please bear in mind the following best practices:

All of the virtual machines on the vSphere Cluster must be located on shared storage that is accessible to the hosts in the cluster. Currently supported storage technologies are: Fibre Channel SAN, iSCSI SAN or NAS.

The service console/management network should have redundant paths as VMware HA monitors the heartbeat broadcasts sent to this network. Multiple network cards or NIC Teaming are feasible options.


Using DRS with HA will greatly improve load balancing of VMs when an ESX/ESXi host comes back from a failover.

To use DRS in its fully automated mode, the ESX/ESXi hosts need to meet the vMotion requirements. For the latest information on vMotion, refer to the VMware vSphere 5.5 Documentation Center.

If you are running the base version of vCenter 5 and ESX/ESXi 5.0, live Storage vMotion of ViPR SRM VMs may not work, and you may encounter the error “A specified parameter was not correct”. For a resolution, refer to this VMware KB article: http://kb.vmware.com/kb/2012122

Example of HA Cluster settings

These settings are shown for illustration only.

| Features | Comments |
|---|---|
| Cluster Features | |
| HA set to “Enabled” | vSphere HA detects failures and provides rapid recovery for virtual machines running within a cluster. |
| DRS set to “Enabled” | vSphere DRS enables vCenter Server to manage hosts as an aggregate pool of resources. It also enables vCenter Server to manage the assignment of VMs to hosts automatically, suggesting placement when VMs are powered on, and migrating running VMs to balance load and enforce resource allocation policies. |
| vSphere HA settings | |
| Host Monitoring Status set to “Enabled” | ESX hosts in the cluster exchange network heartbeats. |
| Admission Control set to “Enabled” | The vSphere HA admission control policy determines the amount of cluster capacity that is reserved for VM failovers. Reserving more failover capacity allows more failures to be tolerated but reduces the number of VMs that can be run. |
| Admission Control Policy set to “Percentage of cluster resources reserved as failover spare capacity” | Default values: CPU = 25%, Memory = 25% |
| Virtual Machine Options | Set options that define the behavior of virtual machines for vSphere HA. |
| VM restart Policy set to “Medium” | |
| Host Isolation response set to “Leave powered on” | |
| VM Monitoring Status set to “VM and Application Monitoring” | VM Monitoring restarts individual VMs if their VMware Tools heartbeats are not received within a set time. Application Monitoring restarts individual VMs if their VMware Tools application heartbeats are not received within a set time. |
| Datastore Heartbeating set to “Select any of the cluster datastores” | vSphere HA uses datastores to monitor hosts and VMs when the management network has failed. vCenter Server selects two datastores for each host using the policy and datastore preferences set |
| vSphere DRS | |
| Automation Level set to “Fully automated” | VMs will be automatically placed onto hosts when powered on, and will be automatically migrated from one host to another to optimize resource usage. |
| DRS Group Manager | DRS Group Membership will apply to hosts and VMs only when they remain in the cluster, and will be lost if the VM or host is moved out of the cluster. Each host or VM can be in more than one DRS group. |
| Virtual Machine Options | Set individual automation level options for VMs in the cluster. |
| Power Management set to “Off” (default option) | Power management settings |
| VMware EVC set to “Disabled” | Enhanced vMotion Compatibility configures a cluster and its hosts to maximize vMotion compatibility. Once enabled, EVC will ensure that only hosts that are compatible with those in the cluster may be added to the cluster. |

Table 3 HA Cluster Settings
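The percentage-based admission control policy in Table 3 can be illustrated with a little arithmetic. The Python sketch below is a simplified model. It assumes that failover capacity is the unreserved fraction of total cluster resources, and that a VM is admitted only if both CPU and memory stay at or above the configured percentage. The function names and sample numbers are hypothetical, and the real vSphere HA calculation accounts for further details, such as per-VM overhead, that are omitted here.

```python
# Simplified model of percentage-based admission control (an assumption-laden
# sketch, not the vSphere HA implementation).

def failover_capacity(total: float, reserved: float) -> float:
    """Fraction of a cluster resource not reserved by powered-on VMs."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - reserved) / total


def can_power_on(total_cpu_mhz, total_mem_mb,
                 reserved_cpu_mhz, reserved_mem_mb,
                 vm_cpu_mhz, vm_mem_mb,
                 cpu_reserve_pct=25.0, mem_reserve_pct=25.0) -> bool:
    """Admit the VM only if CPU and memory both keep the configured spare share."""
    cpu_after = failover_capacity(total_cpu_mhz, reserved_cpu_mhz + vm_cpu_mhz)
    mem_after = failover_capacity(total_mem_mb, reserved_mem_mb + vm_mem_mb)
    return (cpu_after * 100 >= cpu_reserve_pct
            and mem_after * 100 >= mem_reserve_pct)


if __name__ == "__main__":
    # Hypothetical cluster: 20,000 MHz CPU and 64,000 MB memory in total,
    # with 12,000 MHz and 40,000 MB already reserved by running VMs.
    cluster = dict(total_cpu_mhz=20_000, total_mem_mb=64_000,
                   reserved_cpu_mhz=12_000, reserved_mem_mb=40_000)
    print(can_power_on(**cluster, vm_cpu_mhz=2_000, vm_mem_mb=4_000))
    print(can_power_on(**cluster, vm_cpu_mhz=2_000, vm_mem_mb=16_000))
```

With the 25% defaults, the first hypothetical VM leaves 30% of CPU and about 31% of memory spare and is admitted. The second would leave only 12.5% of memory spare and is refused. This is the trade-off the table describes: a larger reserve tolerates more failures but allows fewer VMs to run.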

Conclusion

ViPR SRM is a versatile tool for monitoring and reporting performance data for all aspects of your infrastructure. Because ViPR SRM takes advantage of virtualization advances and supports the industry-standard VMware tools for Disaster Recovery and High Availability, users can restore ViPR SRM VMs to their working state with those standard tools after a host hardware failure or any other disaster-like scenario. This eliminates the need to deploy afresh and start over.

References

ViPR SRM documentation and release notes on https://support.emc.com/products/

ViPR SRM articles and related content on the EMC Community Network at https://community.emc.com/community/products/vipr

VMware Inc, (n.d). VMware Compatibility Guide. Retrieved April 1, 2014, from http://www.vmware.com/resources/compatibility/search.php?action=base&deviceCategory=other

VMware Inc, Dec 14, 2012. Processors and guest operating systems that support VMware Fault Tolerance (1008027). Retrieved March 27, 2014, from http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008027

VMware Inc, May 23, 2014. Retaining resource pools when disabling VMware DRS clusters in the vSphere Web Client (2032893). Retrieved March 27, 2014, from http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2032893

VMware Inc, (n.d). VMware vSphere 5.5 Documentation Center. Retrieved April 1, 2014, from http://pubs.vmware.com/vsphere-55/index.jsp

VMware Inc, (n.d). VMware vCenter Site Recovery Manager 5.5 Documentation Center. Retrieved March 31, 2014, from http://pubs.vmware.com/srm-55/index.jsp

VMware Inc, (n.d). VMware vCenter™ Site Recovery Manager Performance and Best Practices for Performance Architecting Your Recovery Plan to Minimize Recover Time. Retrieved March 27, 2014, from http://www.vmware.com/pdf/Perf_SiteRecoveryManager10_Best-Practices.pdf

VMware Inc, January, 2013. VMware vSphere High Availability 5.0 Deployment Best Practices. Retrieved March 27, 2014, from http://www.vmware.com/files/pdf/techpaper/vmw-vsphere-high-availability.pdf

VMware Inc, (n.d). Automating High Availability (HA) Services with VMware HA. Retrieved March 27, 2014, from http://www.vmware.com/pdf/vmware_ha_wp.pdf

VMware Inc, (n.d). Stretched Clusters and VMware vCenter Site Recovery Manager Understanding the Options Goals. Retrieved March 27, 2014, from http://www.vmware.com/files/pdf/techpaper/Stretched_Clusters_and_VMware_vCenter_Site_Recovery_Manage_USLTR_Regalix.pdf

Guthrie, Forbes, Scott D. Lowe, and Kendrick Coleman. VMware vSphere Design, 2nd Edition. Indianapolis, IN: Wiley Pub., 2013. Print.