Architecting a VMware vCloud® Availability for vCloud ... · Note Refer to the Architecting a...

75
1 | VMware vCloud ® Architecture Toolkit™ for Service Providers VMware vCloud ® Architecture Toolkit™ for Service Providers Architecting a VMware vCloud ® Availability for vCloud Director ® Solution Version 2.9 June 2018 Tomas Fojta Senior Staff Architect, VCDX

Transcript of Architecting a VMware vCloud® Availability for vCloud ... · Note Refer to the Architecting a...

1 | VMware vCloud® Architecture Toolkit™ for Service Providers

VMware vCloud® Architecture Toolkit™ for Service Providers

Architecting a VMware vCloud® Availability for vCloud Director® Solution

Version 2.9 June 2018

Tomas Fojta Senior Staff Architect, VCDX

Architecting a VMware vCloud Availability for vCloud Director Solution

2 | VMware vCloud® Architecture Toolkit™ for Service Providers

© 2018 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. This product is covered by one or more patents listed at http://www.vmware.com/download/patents.html.

VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.

VMware, Inc. 3401 Hillview Ave Palo Alto, CA 94304 www.vmware.com

Architecting a VMware vCloud Availability for vCloud Director Solution

3 | VMware vCloud® Architecture Toolkit™ for Service Providers

Contents

Introduction .............................................................................................. 7

Use Cases ............................................................................................... 8

2.1 Disaster Recovery ............................................................................................................... 8

2.2 Migration ............................................................................................................................. 9

vCloud Availability Architecture Design Overview ................................. 10

3.1 vCloud Availability Architecture ........................................................................................ 10

3.2 Network Flows .................................................................................................................. 12

3.3 Conceptual Architecture ................................................................................................... 12

vCloud Availability Management Components ...................................... 14

4.1 Logical Architecture .......................................................................................................... 14

4.2 vCloud Availability Installer Appliance .............................................................................. 15

4.3 vCloud Availability Portal .................................................................................................. 17

4.4 vCloud Availability Administration Portal .......................................................................... 18

4.5 Cloud Proxy ...................................................................................................................... 19

4.6 RabbitMQ .......................................................................................................................... 24

4.7 Cassandra Database ........................................................................................................ 25

4.8 VMware Platform Services Controller ............................................................................... 26

4.9 vSphere Replication Cloud Service .................................................................................. 29

4.10 vSphere Replication Manager ..................................................................................... 29

4.11 vSphere Replication Servers ....................................................................................... 29

4.12 ESXi Hosts ................................................................................................................... 30

4.13 vCloud Availability Metering ......................................................................................... 32

4.14 vRealize Orchestrator .................................................................................................. 32

4.15 Management Component Resiliency Considerations.................................................. 34

vCloud Director Configuration ............................................................... 36

5.1 User Roles ........................................................................................................................ 36

5.2 Tenant Limits and Leases ................................................................................................. 39

5.3 Other Configurations ......................................................................................................... 40

5.4 Organization Virtual Data Center ...................................................................................... 40

5.5 Network Management ....................................................................................................... 43

5.6 Storage Management ....................................................................................................... 44

5.7 vApps and VMs ................................................................................................................. 47

Network Bandwidth Requirements ........................................................ 48

Architecting a VMware vCloud Availability for vCloud Director Solution

4 | VMware vCloud® Architecture Toolkit™ for Service Providers

Tenant Feature and Component Compatibility ...................................... 50

Billing ..................................................................................................... 51

vRealize Orchestrator Configuration ..................................................... 52

9.1 On-Premises Deployment ................................................................................................. 52

9.2 In-the-Cloud Deployment .................................................................................................. 52

9.3 Provider Deployment ........................................................................................................ 53

9.4 Failover Orchestration ....................................................................................................... 53

Monitoring .............................................................................................. 55

10.1 Component Monitoring ................................................................................................ 55

10.2 VM Replication Monitoring ........................................................................................... 57

10.3 Backup Strategy........................................................................................................... 58

Updates and Upgrades ......................................................................... 60

11.1 Cloud Proxies ............................................................................................................... 60

11.2 Portal Appliances ......................................................................................................... 60

11.3 vSphere Replication Components ............................................................................... 60

11.4 vCloud Availability Installer Appliance ......................................................................... 61

11.5 Third-Party Components .............................................................................................. 61

11.6 vSphere ........................................................................................................................ 61

Appendix A – Port Requirements and Firewall Rules ............................ 62

Appendix B – Glossary .......................................................................... 65

Appendix C – Maximums ...................................................................... 67

Appendix D – Reference Documents .................................................... 68

Appendix E – Tenant API Structure ...................................................... 69

Appendix F – Undocumented Hybrid Settings vCloud API .................... 70

Appendix G – Monitoring with vRealize Operations .............................. 74

Architecting a VMware vCloud Availability for vCloud Director Solution

5 | VMware vCloud® Architecture Toolkit™ for Service Providers

List of Tables Table 1. Components ............................................................................................................................. 10

Table 2. vCloud Availability for vCloud Director Portal Appliance Scaling ............................................ 18

Table 3. Number of Cloud Proxies ......................................................................................................... 19

Table 4. Example of Load Balancer Configuration ................................................................................ 20

Table 5. From-the-Cloud Specific Cloud Proxy Configuration ............................................................... 23

Table 6. FQDN Example ........................................................................................................................ 23

Table 7. Example of RabbitMQ Load Balancer Configuration ............................................................... 24

Table 8. Example of Cassandra Configuration ...................................................................................... 26

Table 9. Number of vSphere Replication Cloud Service Nodes ............................................................ 29

Table 10. Number of vSphere Replication Server Nodes ...................................................................... 29

Table 11. Load Balancer Configuration Example .................................................................................. 33

Table 12. Management Component Resiliency ..................................................................................... 34

Table 13. vCloud Director Role Adjustments ......................................................................................... 38

Table 14. Org VDC Replication Features .............................................................................................. 41

Table 15. Tenant Feature and Component Compatibility ...................................................................... 50

Table 16. Syslog Monitoring .................................................................................................................. 55

Table 17. Monitoring using vCloud API ................................................................................................. 55

Table 18. Monitoring using Service Manager Portal API ....................................................................... 56

Table 19. Replication Details ................................................................................................................. 58

Table 20. Backup Strategy..................................................................................................................... 58

Table 21. vCloud Availability Port Requirements (Service Provider) .................................................... 62

Table 22. vCloud Availability Port Requirements (Tenant) .................................................................... 64

Table 23. Glossary ................................................................................................................................. 65

Table 24. vCloud Availability Maximums ............................................................................................... 67

Table 25. vCloud Availability for vCloud Director Reference Documents ............................................. 68

Table 26. Monitoring with Endpoint Operations Management for vRealize Operations ........................ 74

Architecting a VMware vCloud Availability for vCloud Director Solution

6 | VMware vCloud® Architecture Toolkit™ for Service Providers

List of Figures Figure 1. vCloud Availability for vCloud Director Component Diagram ................................................. 10

Figure 2. Network Flows ........................................................................................................................ 12

Figure 3. vCloud Conceptual Architecture Overview ............................................................................. 13

Figure 4.Installer Appliance Registry File Example ............................................................................... 16

Figure 5. vCloud Availability Portal UI ................................................................................................... 17

Figure 6. vCloud Availability Administration Portal UI ........................................................................... 18

Figure 7. Cloud Proxy Application ......................................................................................................... 19

Figure 8. vCloud Director Cells and Cloud Proxies ............................................................................... 19

Figure 9. From-the-Cloud Tunnel Workflow ........................................................................................... 22

Figure 10. Cloud Proxy global.properties FQDN Override .................................................................... 22

Figure 11. Load Balancing Design Example .......................................................................................... 24

Figure 12. Cassandra Nodes Registration in Lookup Service Viewed Through JXplorer ..................... 27

Figure 13. Platform Service Controller Nodes in Single-Site Configuration .......................................... 28

Figure 14. Platform Service Controller Nodes in Multi-Site Configuration ............................................. 28

Figure 15. Provider vSphere Replication Traffic .................................................................................... 31

Figure 16. VMkernel vSphere Replication Traffic Types ....................................................................... 31

Figure 17. Network I/O Control Traffic Shaping of vSphere Replication Traffic Type ........................... 32

Figure 18. vCloud Director vSphere Replication Role ........................................................................... 36

Figure 19. Cloud Proxy-Related Rights ................................................................................................. 37

Figure 20. Tenant Connect to a Cloud Provider Dialog Box .................................................................. 38

Figure 21. Organization Limits ............................................................................................................... 39

Figure 22. Organization Leases ............................................................................................................. 39

Figure 23. Upload Quarantine Setting ................................................................................................... 40

Figure 24. Example of Org VDC Layout ................................................................................................ 42

Figure 25. Supported and Unsupported Org VDC Topologies .............................................................. 42

Figure 26. vSphere Replication Target Network Configuration ............................................................. 43

Figure 27. Tenant View of Storage Consumption in vSphere Web Client ............................................. 46

Figure 28. Sync Now of powered-off VM ............................................................................................... 47

Figure 29. vSphere Replication Calculator ............................................................................................ 48

Figure 30. Run Real Recovery to Cloud with IP Address Change ........................................................ 53

Figure 31. Update VM Address Workflow .............................................................................................. 53

Figure 32. To-the-Cloud Tenant APIs .................................................................................................... 69

Figure 33. From-the-Cloud Tenant APIs ................................................................................................ 69

Figure 34. Example of Process Configuration ....................................................................................... 75

Architecting a VMware vCloud Availability for vCloud Director Solution

7 | VMware vCloud® Architecture Toolkit™ for Service Providers

Introduction The VMware vCloud® Availability for vCloud Director® solution extends existing hybrid cloud offerings of VMware Cloud Providers™ on top of VMware vCloud Director with disaster recovery and migration services. With vCloud Availability for vCloud Director, vCloud Director can act as cloud replication endpoint for on-premises or co-located VMware vSphere® infrastructures and leverage hardware independence to VMware vSphere Replication™ technology.

This guide provides information on how to properly architect and design vCloud Availability for vCloud Director on top of a vCloud Director infrastructure.

Note Refer to the Architecting a VMware vCloud Director Solution for VMware Cloud Providers white paper1 for vCloud Director specific architectural considerations.

This release updates the changes in vCloud Availability for vCloud Director version 2.0.

1 https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/vcat/vmware-architecting-a-vcloud-director-solution.pdf

Architecting a VMware vCloud Availability for vCloud Director Solution

8 | VMware vCloud® Architecture Toolkit™ for Service Providers

Use Cases The vCloud Availability for vCloud Director solution uses a public vCloud Director endpoint as a target for vSphere Replication. vSphere Replication that is provided by the hypervisor supports essentially any virtualized workload without any dependence on program or operating system disaster recovery (DR) capability. vSphere version 5.5, 6.0, 6.5 and 6.7 environments are supported, while only vSphere version 6.x provides replication in both directions (to and from the cloud).

Although the solution is mainly used for disaster recovery, it can also be used for migration services to shorten workload downtime.

2.1 Disaster Recovery Description: In on-premises or co-located vSphere environment, you may want a simple disaster recovery (DR) solution that can replicate virtual workloads to a multitenant public cloud service. In case of a complete loss of the vSphere infrastructure, the tenant can power on all replicated workloads in the public cloud with minimal loss of data (RPO = 5 mins).

Typical Use Case Workflow: 1. Purchase cloud Organization Virtual Data Center (Org VDC) with DR option. You can purchase

additional Org VDCs without DR option to run permanent workloads that are protected by solutions other than storage-level replication (for example, Active Directory servers or databases).

2. Set up cloud networking infrastructure with Org VDC networks that use the same subnets as the on-premises or co-located environment. Optionally, additional networks (isolated networks) can be created for failover test.

3. Configure a new vCloud replication endpoint in the VMware vSphere® Web Client with its vCloud Director credentials.

4. Map the vSphere networks in vSphere Web Client with vCloud Org VDC networks for failover and failback.

5. Enable protection in vSphere Web Client for each necessary workload. Recovery Point Objective (RPO) and number of point-in-time snapshots (vSphere version 6.x feature only) must be specified. RPO varies among workloads. Guest quiescing and replicated data compression are supported as well.

6. Run initial synchronization for each protected workload before the workload becomes protected. Using pre-seeding of cold clones can speed up the synchronization.

7. Monitor vSphere Web Client replication dashboard to see if all protected workloads satisfy and comply with the required RPO.

Expected Result:

• You can at any time initiate vSphere Web Client test failover for a given subset of protected workloads. The failed over workloads will be connected to vCloud Org VDC networks designated as test networks and powered on.

• In case of loss in on-premises or co-located vSphere environment, you can initiate failover. Replicated workloads will be connected to the vCloud Org VDC networks designated as recovery networks. You can initiate failover from the vCloud Availability Portal, from within the vSphere Web Client or alternatively through the vCloud API.

• After on-premises vSphere environment is rebuilt, you can reverse replication to prepare the workloads for a failback (only supported with vSphere version 6.x on-premises).

Architecting a VMware vCloud Availability for vCloud Director Solution

9 | VMware vCloud® Architecture Toolkit™ for Service Providers

Other comments:

• Metering for VMware licensing is based on the number of protected VMs.

• Tenant billing is based on resource use (CPU, memory, disk space).

The DR can be offered as an optional managed service.

2.2 Migration Description:

You can migrate a large number of workloads to the public cloud with minimal downtime.

Typical Use Case Workflow: 1. Request DR option for existing or new Organization Virtual Data.

2. Set up cloud networking infrastructure with Org VDC networks that use the same subnets as the on-premises or co-located environment.

3. Configure the new vCloud replication endpoint in vSphere Web Client with its vCloud Director credentials.

4. Configure the vSphere networks in vSphere Web Client to map to the vCloud Org VDC networks for failover.

5. Enable replication for each workload that will be migrated in the vSphere Web Client. RPO, point-in-time snapshots, and guest quiescing are not critical for migration. Data compression can be optionally enabled.

6. After the initial synchronization of each protected workload has finished, migrate each VM. To secure minimal workload downtime, migration is performed in following way:

• New delta synchronization is performed.

• vSphere workload is shut down.

• Last data synchronization is performed.

• Workload is powered on in the cloud.

Expected Result:

• Workload is migrated to the public cloud with no data loss and with minimal downtime (minutes)

Other comments:

• vCloud Availability licensing is not applicable for workload replicated for migration purposes.

• Tenant billing is based on resource use (CPU, memory, disk space).

The DR can be offered as an optional managed service.

Architecting a VMware vCloud Availability for vCloud Director Solution

10 | VMware vCloud® Architecture Toolkit™ for Service Providers

vCloud Availability Architecture Design Overview

3.1 vCloud Availability Architecture VMware vSphere Replication is the underlying replication technology that is used by vCloud Availability for vCloud Director. vSphere Replication provides a hypervisor-based, asynchronous replication of VMs, independent from the underlying storage hardware, which is essential for replications to and from the public cloud with disparate hardware used.

The vSphere Replication Agent is built into the VMware ESXi™ hypervisor, and with the help of the vSCSI filter, tracks block changes to virtual disks (VMDKs) of running VMs. Depending on the required RPO, the hypervisor then transmits the changed blocks in regular intervals to the target environment.

The following figure shows the components of vCloud Availability for vCloud Director.

Figure 1. vCloud Availability for vCloud Director Component Diagram

Table 1. Components

Component Description

VMware vCloud Director Existing vCloud Director components consist of multiple vCloud Director cells, vCloud Director database (VCD-SP DB), vSphere (vCenter Server, ESXi hosts and datastores) and VMware NSX® resources.

RabbitMQ AMQP message broker provides communication between vCloud Director and the other components that extend vCloud Director functionality.

Architecting a VMware vCloud Availability for vCloud Director Solution

11 | VMware vCloud® Architecture Toolkit™ for Service Providers

Component Description

vSphere Replication Cloud Service It is a tenant-aware replication manager that extends vCloud API with vCloud Availability for vCloud Director APIs and manages replications and failovers.

CassandraTM Cassandra noSQL database cluster is used for replication metadata, and for configuration of vCloud Availability for vCloud Director infrastructure components.

Cloud Proxy Cells vCloud Director cell-like components are to tunnel replicated traffic to and from the cloud and establish control connections with on-premises vSphere Replication Appliance.

vCloud Availability Portal The appliance provides a tenant-level UI portal to manage replications in the cloud.

vCloud Availability Service Manager (Administration) Portal

The appliance provides service provider UI to manage and monitor the whole vCloud Availability environment.

vSphere Replication Manager Server The component manages the replication process of tenant VMs. The ratio of Replication Manager Server to the cloud resource VMware vCenter Server® instances is 1 to 1.

vSphere Replication Server Replication server receives the replication data and records the changes for each replicated VM. The ratio of Replication Server to the cloud resource vCenter Server instances is 1 to many.

vSphere Replication Appliance The on-premises vSphere Replication component combines both manager and replication server functionality. It also contains the vCloud Tunneling Agent that stops the to-cloud and from-cloud replication tunnels. vSphere Replication Appliance can be extended with deployment of additional vSphere Replication Server instances for scale.

VMware Platform Services Controller™ An external Platform Services Controller runs the lookup service necessary for detection of vCloud Availability components (Cassandra nodes, vSphere Replication Manager Server, and vSphere Replication Cloud Service), and provides secure token service to authenticate replication solution users to vCloud Director.

vCloud Availability Installer The appliance is used to install, configure and run day 2 operations of vCloud Availability.

Architecting a VMware vCloud Availability for vCloud Director Solution

12 | VMware vCloud® Architecture Toolkit™ for Service Providers

3.2 Network Flows The following diagram shows network flows between all components of the vCloud Availability for vCloud Director solution.

Figure 2. Network Flows

3.3 Conceptual Architecture The following diagram shows all vCloud Director and vCloud Availability for vCloud Director components in a production environment where a highly available and scalable configuration is used.

Architecting a VMware vCloud Availability for vCloud Director Solution

13 | VMware vCloud® Architecture Toolkit™ for Service Providers

Figure 3. vCloud Conceptual Architecture Overview

Architecting a VMware vCloud Availability for vCloud Director Solution

14 | VMware vCloud® Architecture Toolkit™ for Service Providers

vCloud Availability Management Components

4.1 Logical Architecture There are two types of vCloud Availability for vCloud Director components—vCloud Director management components that are coupled with each vCloud Director instance, and vSphere management components that are required for each vCloud Resource vCenter Server instance.

4.1.1 vCloud Director Management Components • The vCloud Availability Portal is a virtual appliance that provides a graphical user interface for

tenants to facilitate the management of vCloud Availability for vCloud Director operations.

• The vCloud Availability Administration Portal (also called Service Manager Portal) is a single appliance used by the Service Provider to manage and monitor use of the whole vCloud Availability solution.

• Existing vCloud Director cells are extended with Cloud Proxies. The Cloud Proxy nodes use the same Red Hat Enterprise Linux / CentOS guest OS and vCloud Director binaries as the cells, but have most of the vCloud services disabled. The Cloud Proxies still require to have access to the vCloud database and to the NFS transfer share. They can be load balanced with different public virtual IP addresses (VIPs) and have different SSL certificates than the vCloud Director cells. Deployment of Cloud Proxies scales out with the number of concurrent tunnels.

• RabbitMQ is used for vCloud API extension to provide communication between vCloud Director cells and vSphere Replication Cloud Service. To obtain high availability, deploy at least two clustered nodes with mirrored queues in RabbitMQ. The nodes must be behind a load balancer and presented under one VIP.

• Cassandra noSQL database cluster is used for vSphere Replication Cloud Service replication metadata and configuration storage. To obtain high availability, deploy at least three nodes to provide quorum consistency level, with replication factor 3. A network load balancer is not required to access the Cassandra database because vSphere Replication Cloud Service automatically chooses an available node to communicate with. The node is detected through the Platform Services Controller lookup service where each Cassandra node is registered.

• vSphere Replication Cloud Service is a virtual appliance that provides the management functionality and APIs for vCloud Availability for vCloud Director. To obtain high availability, deploy at least two appliances without the requirement for a network load balancer. The appliances use a common Cassandra database and communicate with vCloud Director through the RabbitMQ message bus. The appliances detect other service components through Platform Services Controller lookup service.

• VMware Platform Service Controller provides lookup service to detect vCloud Availability for vCloud Director components (Cassandra nodes, vSphere Replication Manager Server, and vSphere Replication Cloud Service), and provides secure token service to authenticate replication solution users to vCloud Director. To obtain high availability, deploy an external Platform Services Controller in form of VMware vCenter Server Appliance except in the case of Platform Services Controller services disabled or Windows installable. This Platform Services Controller can be separate from Platform Services Controller instances used by Resource Group vCenter Server instances but must be within the same vSphere Single Sign-on domain. Although a two-node load balanced Platform Services Controller setup in a single site is recommended, a single node Platform Services Controller protected with VMware vSphere High Availability might be sufficient, depending on required availability SLAs.

• vCloud Availability for vCloud Director Installer is a virtual appliance that simplifies the deployment, configuration, and management of all vCloud Availability for vCloud Director components.

Architecting a VMware vCloud Availability for vCloud Director Solution

15 | VMware vCloud® Architecture Toolkit™ for Service Providers

4.1.2 vSphere Management Components • Each Resource Group vCenter Server is coupled with vSphere Replication Manager, which is

provided as a virtual appliance. vSphere Replication Manager must be registered in the same vSphere Single Sign-on domain as the vCloud Director Platform Services Controller. vSphere Replication Manager is protected with vSphere HA.

• Multiple vSphere Replication servers are deployed for each vSphere Replication Manager Server. vSphere Replication servers receive the replication streams from the tunnel provided by cloud proxies and upload the data to ESXi hosts. vSphere Replication servers are deployed in scale-out fashion based on the number of replicated VMs.

4.2 vCloud Availability Installer Appliance The vCloud Availability Installer appliance is used to install and configure vCloud Availability solution and to run commands for day 2 operations, such as certificate replacements, storage migrations and upgrades.

It contains necessary OVF files to deploy the following appliances:

• vCloud Availability Portal

• vCloud Availability Service Manager Portal

• vSphere Replication Manager Server

• vCloud Replication Cloud Service

• vSphere Replication Servers

It can also optionally deploy Docker host appliance with Cassandra and RabbitMQ containers to simplify proof-of-concept installations. Such containerized deployment of 3rd party components should not be used in production environments.

Note The Installer appliance does not deploy Cloud Proxies because they are deployed from the same binaries as vCloud Director.

The installation and configuration process is managed with CLI vcav commands and text registry file which describes the whole desired or existing deployment.

The registry file contains vSphere endpoint(s) where management appliances will be deployed with compute, network and storage locations. There might be multiple entries for each (for example, Management, Management.DMZ, Resource-01, Resource-02, and so on) if necessary.

The vCloud Availability installation (or reconfiguration) usually consists of running the following vcav commands:

• vcav prevalidate … pre-validates all tasks required for a complete deployment

• vcav resume … initiates deployment

• vcav cancel … resets ongoing deployment

• vcav status … lists status of all tasks of an ongoing deployment

It is also possible to deploy and configure each component with a specific vcav command.

Architecting a VMware vCloud Availability for vCloud Director Solution

16 | VMware vCloud® Architecture Toolkit™ for Service Providers

Figure 4. Installer Appliance Registry File Example

Architecting a VMware vCloud Availability for vCloud Director Solution

17 | VMware vCloud® Architecture Toolkit™ for Service Providers

The day 2 operations include follows:

• Configure to-the-cloud and from-the-cloud replication endpoint addresses

• List vCloud Director organizations and Organization Virtual Datacenters (Org VDC)

• Enable an Org VDC for replications

• Add replication rights to a specific organization or role

• List and delete replications of a particular organization

• List and delete existing tunnels

• Move replications between datastores and vSphere Replication Server instances

• Consolidate VM point-in-time snapshot

• Manage Appliance certificate

• Uninstall appliances, solution users and vCloud API extension

4.3 vCloud Availability Portal The vCloud Availability for vCloud Director portal is a virtual appliance that provides a graphical user interface for tenants to manage their replications when they have no access to on-premises vSphere Web Client user interface. The portal provides a web-based user interface on TCP port 8443 and requires a vCloud API connection.

Figure 5. vCloud Availability Portal UI

The appliance is stateless. With a network load balancer supporting sticky sessions, the appliance can be deployed in highly available and horizontally scalable configuration. Additionally, the appliance can be scaled up vertically in three different virtual hardware configurations, depending on the number of concurrent sessions each appliance must serve.

Architecting a VMware vCloud Availability for vCloud Director Solution

18 | VMware vCloud® Architecture Toolkit™ for Service Providers

Table 2. vCloud Availability for vCloud Director Portal Appliance Scaling

Deployment Type vCPU RAM Java VM Concurrent Sessions

Small 2 2 GB 512 MB 150

Medium 2 4 GB 1.5 GB 400

Large 4 6 GB 3 GB 800

4.4 vCloud Availability Administration Portal The vCloud Availability Administration (Service Manager) portal is a single virtual appliance that provides a graphical user interface for service provider to monitor use and perform day 2 operations in the vCloud Availability environment. The portal provides a web-based user interface on TCP port 8443 and requires a vCloud API and RabbitMQ connection.

Figure 6. vCloud Availability Administration Portal UI

Although the appliance contains internal MongoDB database, it is used only for caching of resource intensive data, and therefore is essentially stateless. RabbitMQ connection is used for receiving real-time updates. vCloud API connection is used for retrieving inventory data (12-hour intervals), authentication and day 2 operations. The service provider can trigger storage migrations (relocation of replication to a different datastore), scrubbing stale migrations (for example, replications with missing placeholder vApps) and tenant impersonation feature (system administrator is redirected to tenant portal logged in as an organization user).

Service provider has access to dashboards and reports monitoring use of compute and storage resources of replicated workloads grouped by Organizations or Org VDCs with the ability to navigate.

Architecting a VMware vCloud Availability for vCloud Director Solution

19 | VMware vCloud® Architecture Toolkit™ for Service Providers

4.5 Cloud Proxy Cloud proxies are similar to vCloud Director cells. The Cloud proxies are installed from the same binary on RHEL/CentOS VM using the responses.properties file, and they require access to the vCloud Director database and vCloud Director cell NFS transfer share. Cloud proxies, however, have one vCloud Director service disabled that requires editing the global.properties configuration file.

Figure 7. Cloud Proxy Application

Although in theory a single cell can provide both vCloud Director (UI, API, remote console) and cloud proxy services, such a setup should be used only for test and lab purposes. For the production environments, multiple cloud proxies should be deployed based on expected VM replication count.

Table 3. Number of Cloud Proxies

Cloud Proxies Replicated VMs

2 500

3 5000

5 10000

Note Although cloud proxies are displayed as regular cells in the vCloud Director Administrator UI, they do not count against vCloud Director maximums (see Appendix C – Maximums).

Figure 8. vCloud Director Cells and Cloud Proxies

Timekeeping of the cloud proxies is critical for correct vCloud Availability for vCloud Director functionality. The same NTP-based internal time source must be used for the whole solution.

4.5.1 Cloud Proxy Load Balancing Configurations with multiple cloud proxies require an external network load balancer to manage traffic among the clustered proxies. The replicated traffic can come from the internet and internally from the cloud. Due to the volume of traffic, deploy dedicated load balancers for each replication direction.

Architecting a VMware vCloud Availability for vCloud Director Solution

20 | VMware vCloud® Architecture Toolkit™ for Service Providers

Both the public cloud proxy VIP endpoint (FQDN) for to-the-cloud tunnel termination and the internal IP address VIP for from-the-cloud traffic (used by ESXi host-based replication) must be specifically configured in vCloud Director either with the vcav CLI or with a vCloud API call (see Appendix F – Undocumented Hybrid Settings vCloud API for more information):

• vcav CLI, provided by the vCloud Availability for vCloud Director installer appliance:

vcav vcd set-cloud-proxy \\ --to-the-cloud-address=<cloud-proxy-fqdn> \\ --from-the-cloud-address=<from-cloud-IP> \\ --vcd=<vcd instance>

• vCloud API:

PUT /api/admin/hybrid/settings

Headers:

Accept: application/*+xml;version=6.0 Content-Type: application/vnd.vmware.vcloud.hybridSettings+xml

Body:

<HybridSettings xmlns="http://www.vmware.com/vcloud/v1.5"> <CloudProxyBaseUriOverride>wss://<cloud-proxy-fqdn>:443/socket/cloudProxy</CloudProxyBaseUriOverride> <CloudProxyFromCloudTunnelHostOverride><from-cloud-IP></CloudProxyFromCloudTunnelHostOverride> </HybridSettings>

Table 4. Example of Load Balancer Configuration

Attribute Specification

Cloud Proxy FQDN (TCP 443) <public VIP>

From Cloud IP (TCP 31031) <internal VIP>

Application type TCP

SSL mode SSL passthrough

Persistence SSL Session ID

Pool members <CP1 internal IP> <CP2 internal IP>

Pool health check TCP

Pool algorithm LEASTCONN

Pool TCP ports 443 and 31031

Architecting a VMware vCloud Availability for vCloud Director Solution

21 | VMware vCloud® Architecture Toolkit™ for Service Providers

4.5.2 From-the-Cloud Tunnel The from-the-cloud replication is a complex process that involves a control connection (initiated by the on-premises vSphere Replication vCloud Tunneling Agent) and replication data coming from the vSphere Replication agent (on the cloud ESXi host) to internally load balance cloud proxies. To properly “stitch” external and internal parts of the tunnel, the following mechanism is used:

1. The vSphere Replication vCloud Tunneling Agent opens a control connection (based on CloudProxyBaseUri set in vCloud API /hybrid/settings). See Appendix F – Undocumented Hybrid Settings vCloud API. The connection is forwarded by a cloud proxy load balancer to a cloud proxy (in Figure 9, it is Cloud Proxy 2).

2. The ESXi host starts a vSphere Replication session for a given replication group-id.

3. An internal load balancer forwards the TCP connection to an arbitrary cloud proxy (based on CloudProxyFromCloudTunnelHost configured in vCloud API /hybrid/settings). In Figure 9, it is Cloud Proxy 1.

4. The cloud proxy (through the vSphere Replication content-aware plug-in) decodes the group-id and requests the vSphere Replication Cloud Service to resolve it to a destination-id and tenant-id. The cloud proxy acts as the vSphere Replication destination and “fabricates” the replication reply frames expected by the ESXi host until the real replication connection from the on-premises vSphere Replication Agent (relayed by the vSphere Replication vCloud Tunneling Agent) joins.

5. The cloud proxy broadcasts a reverse connection request over the internal message bus for the resolved tunnel (destination-id, tenant-id). The cloud proxy provides reverse Fully Qualified Domain Name (FQDN) based on global.properties override. See Figure 10.

6. Cloud Proxy 2 detects a control connection corresponding to the given destination-id.

7. Cloud Proxy 2 sends a reverse connection request to the specific vSphere Replication vCloud Tunneling Agent over the corresponding control connection and announces the FQDN of Cloud Proxy 1 to connect.

8. The vSphere Replication vCloud Tunneling Agent opens the requested reverse connection (to the specific Cloud Proxy 1 FQDN).

9. Firewall forwards (Destination NAT) to specific Cloud Proxy 1.

10. Cloud Proxy 1 “stitches” the incoming vSphere Replication vCloud Tunneling Agent reverse connection with the pending context and verifies that the corresponding stitched replication sessions are consistent with each other.

Architecting a VMware vCloud Availability for vCloud Director Solution

22 | VMware vCloud® Architecture Toolkit™ for Service Providers

Figure 9. From-the-Cloud Tunnel Workflow

Figure 10. Cloud Proxy global.properties FQDN Override

Architecting a VMware vCloud Availability for vCloud Director Solution

23 | VMware vCloud® Architecture Toolkit™ for Service Providers

Table 5. From-the-Cloud Specific Cloud Proxy Configuration

Attribute Cloud Proxy 1 Cloud Proxy 2

Public IP <CP1 public IP> <CP2 public IP>

Internal IP <CP1 internal IP> <CP2 internal IP>

Fully Qualified Domain Name <FQDN CP1> <FQDN CP2>

DNAT DNAT: CP#N public IP:TCP 443 > CP#N internal IP:TCP 443

global.properties: cloudproxy.reverseconnection.fqdn=

<FQDN CP1>:443

<FQDN CP2>:443

4.5.3 Cloud Proxy Certificates A publicly trusted certificate for the FQDN of cloud proxy VIP must be imported to the cloud proxy cell with the alias http. Another certificate must be available with the alias consoleproxy even though it will be unused. This is because the cloud proxies should not be used for VM console proxy sessions. Both certificates are imported with the same process as vCloud Director cell certificates.

Because each cloud proxy is accessed from the internet under two different FQDNs (cloud proxy VIP and specific cloud proxy FQDN in Table 5), the http certificate must match both FQDNs. The easiest approach is to use a wildcard certificate. If that is not possible, certificates with Subject Alternate Name (listing both FQDNs) can be used instead.

Table 6. FQDN Example

Attribute Specification

vCloud Director UI/API vcloud.example.com

vCloud Remote Console Proxy console.example.com

vCloud Availability Portal availability.example.com

Cloud Proxy VIP vip.proxy.example.com

Cloud Proxy 1 cp1.proxy.example.com

Cloud Proxy 2 cp2.proxy.example.com

Certificate CN on Cloud Proxy Nodes

*.proxy.example.com

Architecting a VMware vCloud Availability for vCloud Director Solution

24 | VMware vCloud® Architecture Toolkit™ for Service Providers

4.5.4 vCloud API Load Balancing Multiple internal management components, such as vSphere Replication Cloud Service, vCloud Availability Administration portal, VMware vCloud® Usage Meter, and VMware vRealize® Orchestrator™, require access to the vCloud API. If the public vCloud API endpoint is not routable from the internal management networks, a separate internal VIP can be configured on the VCD DMZ network. Split-brain DNS or local hosts entries can then be used to provide correct internal VIP resolution of the vCloud API FQDN.

Figure 11. Load Balancing Design Example

4.6 RabbitMQ vCloud can extend vCloud Director with external systems through the AMQP message bus provided by the RabbitMQ highly available cluster.

To obtain high availability, configure at least two RabbitMQ load-balanced nodes running with RabbitMQ clustering enabled and mirrored queues.

RabbitMQ must have SSL/TLS communication enabled because vSphere Replication Cloud Service uses an encrypted connection. Optionally, the vCloud Director connection can be encrypted as well.

Table 7. Example of RabbitMQ Load Balancer Configuration

Attribute Specification

Virtual IP …

Port 5671

Protocol TCP

Pool Members …

Persistence None

Internet

Internet Firewall

Internet DMZ

VCD Cell LB

VCD DMZ

Cloud Proxy LB

VCD Cells Cloud Proxies

From the Cloud LB

Proxy-VIPProxy1-VIPProxy2-VIP ...

From-the-cloud-VIPvCloud API internal VIP

http and consoleproxy VIPs

vCloud Availability Portal LB

portal VIP

vCAv Portal

Architecting a VMware vCloud Availability for vCloud Director Solution

25 | VMware vCloud® Architecture Toolkit™ for Service Providers

Attribute Specification

Application Type TCP

LB Algorithm LEASTCONN

Health check Default TCP Monitor

The RabbitMQ SSL configuration (/etc/rabbitmq/rabbitmq.conf) is as follows: [

{ssl, [{versions, ['tlsv1.2', 'tlsv1.1', tlsv1]}]},

{rabbit, [

{ssl_listeners, [5671]},

{ssl_options, [{cacertfile,"/etc/rabbitmq/server/cacert.pem"},

{certfile,"/etc/rabbitmq/server/cert.pem"},

{keyfile,"/etc/rabbitmq/server/key.pem"},

{versions, ['tlsv1.2', 'tlsv1.1', tlsv1]},

{ciphers, ["ECDHE-ECDSA-AES256-GCM-SHA384”,…]}

{verify,verify_none},

{fail_if_no_peer_cert,false}]}]}

].

All nodes must have identical certificate files: cacert.pem (SSL certificate of the signing Certificate Authority), cert.pem (RabbitMQ node certificate), and key.pem (private key of the RabbitMQ node certificate). The common name of cert.pem must be the FQDN of the load balancer VIP or by using a wildcard certificate.

4.7 Cassandra Database The Cassandra noSQL database cluster is used for vSphere Replication Cloud Service replication metadata and configuration storage. vSphere Replication Cloud Service requires client to node encryption. To obtain high availability, deploy three nodes to provide quorum consistency, with a replication factor of 3. A load balancer is not required for accessing the Cassandra database, because vSphere Replication Cloud Service automatically chooses an available node to communicate with. The node detection is performed by vRCS server upon its startup through the Platform Services Controller lookup service where each Cassandra node must be registered. In case that multiple Cassandra cluster instances are existing in the same vSphere Single Sign-On domain, specify vSphere Single Sign-On site name text file filter on vRCS server in /opt/vmware/hms/conf/sites. Enter vSphere Single Sign-on site name(s) on separate lines.

Cassandra default database (keyspace) name is vr2c and is configured in hcs-config.properties file on vRCS server (/opt/vmware/hms/conf/hcs-config.properties).

Cassandra is available as Apache Software Foundation project or commercially by Datastax who provides enterprise and community editions.

Architecting a VMware vCloud Availability for vCloud Director Solution

26 | VMware vCloud® Architecture Toolkit™ for Service Providers

Table 8. Example of Cassandra Configuration

Attribute Specification

Nodes 3

Seeds node 1 IP address, node 2 IP address

Endpoint Snitch GossipingPropertyFileSnitch

Internode encryption All

Client encryption Enabled = True

Optional = True

Required client authentication False

Certificates Self-signed certificates

4.8 VMware Platform Services Controller vSphere Replication Cloud Service relies on the Platform Services Controller infrastructure to detect other services and for vCloud Director authentication. The following requirements must be met:

• All resource vCenter Server instances must be registered in a single Platform Services Controller vSphere domain (for example, vsphere.local).

• All Cassandra nodes must be registered in the Platform Services Controller lookup service.

• The vSphere Replication Manager Server must be registered in the Platform Services Controller lookup service.

• vCloud Director must be federated with the Platform Services Controller.

• The Platform Services Controller solution user for each vSphere Replication Cloud Service node is added to vCloud Director system administrator users.

Architecting a VMware vCloud Availability for vCloud Director Solution

27 | VMware vCloud® Architecture Toolkit™ for Service Providers

Figure 12. Cassandra Nodes Registration in Lookup Service Viewed Through JXplorer

As soon as more than one resource vCenter Server is deployed, you must have an external Platform Services Controller because you cannot have multiple embedded Platform Services Controller instances in a single Platform Services Controller domain (https://kb.vmware.com/kb/2108548).

An existing resource vCenter Server with an embedded Platform Services Controller can be re-pointed to an external Platform Services Controller if required. To do this, use the instructions documented at http://vmw.re/emb2ext.

High availability for Platform Services Controller instances can be achieved with VMware vSphere High Availability or by deploying a load balancer in front of two nodes. For more information, see https://kb.vmware.com/kb/2113315.

The Platform Services Controller (vSphere 6.5) has the following scalability constraints:

• 10 Platform Services Controller nodes per domain

• 4 Platform Services Controller nodes per Platform Services Controller site (behind a load balancer)

• 15 vCenter Server nodes per Platform Services Controller domain

To fulfill these requirements and constraints while keeping the complexity of the solution low, you can use the Platform Services Controller design options described in the following sections.

4.8.1 Single-Site vCloud Director management components are co-located at the same site as the resource infrastructure. A load balanced, highly available pair of Platform Services Controller nodes is used for all components.

Architecting a VMware vCloud Availability for vCloud Director Solution

28 | VMware vCloud® Architecture Toolkit™ for Service Providers

Figure 13. Platform Service Controller Nodes in Single-Site Configuration

4.8.2 Multi-Site Each availability zone (site) has a shared Platform Services Controller node with a dedicated Platform Services Controller site name (Site A, Site B). Each Platform Services Controller node can be used for multiple resource vCenter Server nodes that are installed in the same site and for related resource group management components (VMware NSX Manager™, vSphere Replication Manager).

A third Platform Services Controller node is deployed with Platform Services Controller Site C. This Platform Services Controller node is then used solely by non-resource group components, such as vCloud Director, vSphere Replication Cloud Service servers, and Cassandra., The Platform Services Controller is not site-specific – it can run on any site and fail over to another site (for example, with Site Recovery Manager protection).

Figure 14. Platform Service Controller Nodes in Multi-Site Configuration

Note All Platform Services Controller nodes must be in the same Single Sign-On domain (for example, vsphere.local). It is also possible to deploy each node in a highly available configuration with two nodes in the same site behind a load balancer (as in the single-site case).

PSC domain: vsphere.local

ResourcevCenter Server 1

PSC Node 2Default-Site

vCloud DirectorvRCS

PSC Node 1 Default-Site

vRMSVC1

ResourcevCenter Server 2

vRMSVC2

LoadBalancer

PSC domain: vsphere.local

ResourcevCenter Server 1

ResourcevCenter Server 2

PSC Node 1Site A

PSC Node 2Site B

vCloud DirectorvRCS

PSC Node 3Site C

Availability Zone A Availability Zone B

SRM protected between sites

vRMSVC1

vRMSVC2

ResourcevCenter Server 3

ResourcevCenter Server 4

vRMSVC3

vRMSVC4

Architecting a VMware vCloud Availability for vCloud Director Solution

29 | VMware vCloud® Architecture Toolkit™ for Service Providers

4.9 vSphere Replication Cloud Service vSphere Replication Cloud Service virtual appliances provide management functionality and APIs for vCloud Availability for vCloud Director.

To obtain high availability, deploy a pair of vSphere Replication Cloud Service nodes in active – active configuration without a load balancer because they share a common Cassandra database and RabbitMQ message bus for communication with vCloud Director.

Table 9. Number of vSphere Replication Cloud Service Nodes

vSphere Replication Cloud Service Nodes

Replicated VMs

1 Only for PoCs

2 0-3000

3 3000-10000

4.10 vSphere Replication Manager There must be one vSphere Replication Manager appliance for each resource group vCenter Server. For vSphere Replication Manager Server nodes, high availability is achieved with the vSphere HA mechanism

vSphere Replication Manager can use the internal vPostgreSQL database, an external Microsoft® SQL database or an external Oracle® database. All replication configurations in a paired vCenter Server are stored in the database. From a recoverability perspective, it is therefore important to provide continuous backup for the database. This is more easily accomplished with the external database because only a VMware vSphere Storage APIs – Data Protection based backup mechanism can be used for protection of vPostgreSQL. The configuration of external database is performed through appliance VAMI interface before vCloud Availability configuration.

4.11 vSphere Replication Servers Multiple vSphere Replication Servers can be deployed for each vSphere Replication Manager Server. They receive the replication data from the tunnel provided by cloud proxies and upload them to ESXi hosts. A high-bandwidth uplink to ESXi vmknic ports is recommended. One vSphere Replication Server can be deployed for every 250 replicated VMs with a minimum configuration of two vSphere Replication Servers.

Table 10. Number of vSphere Replication Server Nodes

vSphere Replication Server Nodes

Replicated VMs

2 250

2 500

4 1000

… …

Architecting a VMware vCloud Availability for vCloud Director Solution

30 | VMware vCloud® Architecture Toolkit™ for Service Providers

Each replication is associated with one vSphere Replication Server. However, a replication can be manually moved to another vSphere Replication Server with an API call or vcav switch-vr command from the Installer appliance. When the vSphere Replication Server fails, recovery and reconfiguration of the associated replications is not possible. For this reason, it is essential to protect all vSphere Replication Servers.

GET /api/admin/extension/vr/vrServers … returns vSphere Replication Server Node references (including UUID)

GET /api/admin/extension/vr/vrs/<vrs-id>/statistics2 … returns replication count, disk count and data transfer statistics.

The following API call moves a specific replication to another vSphere Replication Server:

POST /api/vr/replications/<replication-id>/action/switchVrServer

Headers:

Accept: application/*+xml;version=20.0;vr-version=4.0

Content-Type: application/vnd.vmware.hcs.switchVrServerParams+xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<ns2:SwitchVrServerParams xmlns="http://www.vmware.com/vcloud/v1.5" xmlns:ns2="http://www.vmware.com/vr/v6.0">

<ns2:vrServerUuid><vrs-uuid></ns2:vrServerUuid>

</ns2:SwitchVrServerParams>

4.12 ESXi Hosts The cloud ESXi hosts already run vSphere Replication Agent and vSCSI filter because these services are embedded in the hypervisor. Incoming replications are coming through vSphere Replication Servers over Network File Copy protocol (NFC). Outgoing replications go directly from hosts to Cloud Proxies (HBR port TCP 31031).

2 https://vdc-repo.vmware.com/vmwb-repository/dcr-public/9673486d-16ff-4673-b18c-ea87d2315ff7/ad7f6352-26f3-4f38-8d42-4e275288d4d9/operations/GET-HbrServerStatistics.html

Architecting a VMware vCloud Availability for vCloud Director Solution

31 | VMware vCloud® Architecture Toolkit™ for Service Providers

Figure 15. Provider vSphere Replication Traffic

In vSphere, you can specify which VMkernel ports must be used for both traffic types, so it is possible to dedicate specific network uplinks.

Figure 16. VMkernel vSphere Replication Traffic Types

In most cases, vSphere Replication traffic shares the same uplink with other traffic types (Management, vSphere vMotion, and so on). In such cases, VMware vSphere Network I/O Control on VMware vSphere Distributed Switch™ can be used to define service quality by setting shares, reservations, or limits.

Architecting a VMware vCloud Availability for vCloud Director Solution

32 | VMware vCloud® Architecture Toolkit™ for Service Providers

Figure 17. Network I/O Control Traffic Shaping of vSphere Replication Traffic Type

Make sure that every resource ESXi host can reach the from-the-cloud endpoint (usually Cloud Proxy internal load balancer – see Section 4.5.1) on port 31031. This process can be tested with the following command: cat /dev/zero | nc -v <from-the-cloud-IP> 31031

4.13 vCloud Availability Metering vCloud Availability is licensed on top of VMware Cloud Provider Program bundles with 18 points per protected VM per month.

Note The VM that is replicated for migration is not subject to vCloud Availability licensing if its migration is fully finished within 30 days.

The metering is provided by vCloud Usage Meter version 3.6 or later. As vCloud Usage Meter cannot automatically distinguish the replication purpose for disaster recovery or for migration purposes, the service provider license administrator must subtract migrated workloads from the end-of-month report manually.

4.14 vRealize Orchestrator VMware vRealize® Orchestrator™ provides service orchestration. It can automate tasks across VMware products leveraging vCloud API, VIM API, NSX API, and so on. Using generic plug-ins (SSH, SOAP, HTTP REST, SQL, PowerShell, or Active Directory) or third-party specific plug-ins, vRealize Orchestrator can orchestrate other systems as well.

vRealize Orchestrator has two distinct roles in vCloud Director environments:

• Act as an extension that is subscribed to RabbitMQ, and consume vCloud Director messages. In this role, vRealize Orchestrator extends vCloud Director by providing additional services (backup, additional controls, CMDB integration, and so on).

Architecting a VMware vCloud Availability for vCloud Director Solution

33 | VMware vCloud® Architecture Toolkit™ for Service Providers

• Act as an orchestrator for common onboarding or tenant life cycle tasks. The tasks are triggered by an external portal (VMware vRealize Automation™ Advanced Service Designer, or the vRealize Orchestrator REST API). By using vRealize Orchestrator plug-ins, you can configure the tenant service end-to-end.

For vCloud Availability use cases, the second role can be used to provide tenant onboarding and managing disaster recovery failover for fully provider managed services.

To obtain high availability, deploy two vRealize Orchestrator appliances in active – standby configuration, with a network load balancer that provides access to the active instance behind common VIP. Both appliances must be configured in the exact same way and must share a common database. Failover detection is provided through database heartbeats.

Table 11. Load Balancer Configuration Example

Attribute Specification

Virtual IP …

Port 8281

Protocol HTTPS

Pool Members vRealize Orchestrator node1, vRealize Orchestrator node2

Persistence None

Application Type HTTPS

LB Algorithm ROUND-ROBIN

Health check Interval: 3s

Timeout: 9s

Max Retries: 3

Type: HTTPS

Expected: 200

Method: GET

URL: /vco/api/healthstatus

Receive: RUNNING

For more information on vRealize Orchestrator integration, see Using VMware vRealize Orchestrator with VMware vCloud Availability for vCloud Director3 whitepaper.

3 https://cloudsolutions.vmware.com/assets/blt9002ef127884838b/Using_vRealize_Orchestrator_ with_vCloud_Availability_for_vCloud_Director.pdf

Architecting a VMware vCloud Availability for vCloud Director Solution

34 | VMware vCloud® Architecture Toolkit™ for Service Providers

4.15 Management Component Resiliency Considerations The following table summarizes the high availability concept for each solution component that can be used to fulfill the requirements of SLAs.

Table 12. Management Component Resiliency

Management Component HA Enabled? Additional Availability

vCloud Director Cells Yes Load balancing, Active – Active

Cloud Proxies Yes Load balancing, Active – Active

vCloud Availability Portal Yes Load balancing, Active – Active

Service Manager Portal Yes Stateless, can be redeployed

Platform Services Controller nodes Yes Distributed

vCloud Director Database Yes AlwaysOn SQL, Oracle RAC, PostgreSQL HA

vCloud NFS Transfer Share Yes Storage array provided

vSphere Replication Cloud Service Yes Active – Active

RabbitMQ Yes Application clustering, queue mirroring, load balancing

Cassandra Yes Application clustering, Active – Active

Resource vCenter Server nodes Yes Application clustering with VCHA

Resource NSX Manager instances Yes

vSphere Replication Manager instances

Yes

vSphere Replication Manager databases

Yes AlwaysOn SQL, Oracle RAC

vSphere Replication Servers Yes Active – Active (manual failover)

vCloud Availability Installer Appliance

Yes

Active Directory servers Yes Application replication

vCloud Usage Meter Yes

vRealize Orchestrator Server nodes Yes Load balancing, Active – Passive

Architecting a VMware vCloud Availability for vCloud Director Solution

35 | VMware vCloud® Architecture Toolkit™ for Service Providers

Management Component HA Enabled? Additional Availability

vRealize Orchestrator database Yes AlwaysOn SQL, Oracle RAC

VMware vRealize Operations Manager™ servers

Yes Application clustering

Architecting a VMware vCloud Availability for vCloud Director Solution

36 | VMware vCloud® Architecture Toolkit™ for Service Providers

vCloud Director Configuration vCloud Availability for vCloud Director extends vCloud Director functionality, which is used as a platform for its services.

The following points summarize vCloud Availability for vCloud Director interaction with vCloud Director:

• vCloud Availability for vCloud Director registers itself as a vCloud API extension, which is provided by vSphere Replication Cloud Service appliances.

• RabbitMQ messaging bus and vCloud API are used for communication with vCloud Director.

• vSphere Replication Cloud Service uses the VMware vCenter Single-Sign On solution user to access the vCloud API as the system administrator.

• Cloud Proxy is provided as part of vCloud Director cell binaries and its address is configured with vCloud API. For more information, see Section 4.5.2.

• vCloud Availability Portal relies on tenant vCloud API to collect and configure replications.

5.1 User Roles When vCloud Availability extension is registered, a new vSphere Replication role and rights are created in vCloud Director.

Figure 18. vCloud Director vSphere Replication Role

The Organization Administrator does not have these rights assigned by default.

Architecting a VMware vCloud Availability for vCloud Director Solution

37 | VMware vCloud® Architecture Toolkit™ for Service Providers

Cloud proxy-related rights (To-the-Cloud Tunnel and From-the-Cloud Tunnel) are available in vCloud Director, and are already assigned to the Organization Administrator.

Figure 19. Cloud Proxy-Related Rights

When tenants configure the vCloud Availability for vCloud Director replication endpoint, they can enter two different credentials. One credential is for administration and used only during configurations. The other credential is for replication monitoring and vCTA-Cloud Proxy tunnel setup, and it is permanently stored on vSphere Replication Appliance.

Architecting a VMware vCloud Availability for vCloud Director Solution

38 | VMware vCloud® Architecture Toolkit™ for Service Providers

Figure 20. Tenant Connect to a Cloud Provider Dialog Box

To support two different roles, the changes in the following table should be made to default vCloud Director roles.

Table 13. vCloud Director Role Adjustments

Role Rights

Organization Administrator (existing role)

Add:

{com.vmware.vr}.ManageRight

{com.vmware.vr}:ViewRight

Replication Monitoring

(new role)

Organizations – View Organization Networks

Organizations – View Organizations

Organization VDC – View Organization VDCs

{com.vmware.vr}:ViewRight

{com.vmware.vr}:ManageRight

5.1.1 Organization-Specific Roles and Rights in vCloud Director 8.20 or Later vCloud Director 8.20 or later can create organization-specific assignments of roles and rights.

• Service providers can still create global roles.

• Service providers can selectively grant rights to specific tenants.

• Organization administrators can create tenant-specific roles from a subset of granted rights.

The procedure to add new vCloud Availability for vCloud Director rights to a predefined Organization Administrator role is different and must be performed from the Installer appliance with vcav hcs add-rights-to-role or with vCloud API.

Architecting a VMware vCloud Availability for vCloud Director Solution

39 | VMware vCloud® Architecture Toolkit™ for Service Providers

5.2 Tenant Limits and Leases

5.2.1 Limits vCloud Director can configure the limits for resource-intensive operations that each tenant can run at the same time. Limits are configurable at the global, organization, and user levels.

Figure 21. Organization Limits

The limit cannot be set too low at the user level, otherwise, it negatively impacts configuration of replications. Because replications are all done under one Organization Administrator user that is defined in the VMware vSphere Web Client cloud endpoint configuration (Figure 20).

Note It might take up to 24 hours for the initial VM configuration task to expire. Increase these limits substantially during tenant onboarding when many VM replications are configured concurrently.

5.2.2 Leases To avoid expiration and deletion of the placeholder VM, set the vApp runtime and storage leases to Never Expires.

Figure 22. Organization Leases

Architecting a VMware vCloud Availability for vCloud Director Solution

40 | VMware vCloud® Architecture Toolkit™ for Service Providers

5.3 Other Configurations Increase vCloud API extension response timeout on each vCloud Director cell by appending to the $VCLOUD_HOME/etc/global.properties file.

extensibility.timeout = 60

Also disable the upload quarantine in vCloud Director General Settings because it leads to placeholder vApp creation timeout.

Figure 23. Upload Quarantine Setting

5.4 Organization Virtual Data Center An organization virtual data center (Org VDC) is a subgrouping of compute and storage resources allocated from a provider virtual data center and mapped to an organization. Org VDCs are a deployment environment where vApps can be instantiated, deployed, and powered on.

5.4.1 Org VDCs with Disaster Recovery Capabilities vCloud Availability for vCloud Director is enabled at the Org VDC level using an API call or using CLI from the installer appliance.

• Using an API call

POST /api/vdc/{orgvdc-uuid}/action/enableReplication

Headers: Accept: application/*+xml;version=20.0;vr-version=4.0 Content Type: application/vnd.vmware.hcs.enableReplicationParams+xml

• Using CLI from the installer appliance

vcav org-vdc enable-replication --vcd=<vcd instance> --org=<org name> --vdc=<org vdc name>

The provider can further view and limit vCloud Availability for vCloud Director features that are available to tenants at the Org VDC limit with the following API calls.

GET /api/vdc/{orgvdc-uuid}/recoveryDetails

POST /api/admin/vdc/{orgvdc-uuid}/action/editRecoveryDetails

Architecting a VMware vCloud Availability for vCloud Director Solution

41 | VMware vCloud® Architecture Toolkit™ for Service Providers

Headers: Accept: application/*+xml;version=20.0;vr-version=4.0 Content-Type: application/vnd.vmware.hcs.editRecoveryDetailsParams+xml

Body:

<EditRecoveryDetailsParams xmlns="http://www.vmware.com/vr/v6.0" xmlns:vcloud_v1.5="http://www.vmware.com/vcloud/v1.5"> <IsTestFailoverEnabled> xs:boolean </IsTestFailoverEnabled> <IsFailoverEnabled> xs:boolean </IsFailoverEnabled> <IsPlannedMigrationEnabled> xs:boolean </IsPlannedMigrationEnabled> </EditRecoveryDetailsParams>

Table 14. Org VDC Replication Features

Element Description

IsPlannedMigrationEnabled Can run planned migrations? (Boolean)

IsFailoverEnabled Can configure failover? (Boolean)

IsTestFailoverEnabled Can configure test failover (Boolean)

Limiting replication features at the Org VDC level is useful in scenarios where vCloud Availability for vCloud Director is used only for onboarding of tenants’ VMs and only planned migration is enabled.

Note Only Org VDCs with thin storage provisioning can be enabled for replication.

5.4.2 Org VDC Architecture vCloud Availability for vCloud Director services can be offered on top of existing vCloud Director environments. In such cases, provide a separate Org VDC with enabled replication next to existing regular Org VDCs. Tenants can run regular workloads or the workloads that use application-level replication (for example, Active Directory servers or SQL AlwaysOn Database cluster nodes) in the regular Org VDC. Org VDC networks can be shared among regular and replication-enabled Org VDCs.

Take note of the following considerations:

• Provider VDC hardware version must be equal or higher than that of the on-premises VM hardware. It is therefore strongly advised that the provider is running the latest vCloud Availability compatible vSphere release so the tenants are not limited by the hardware version.

• Set Organization VDC to storage thin provisioning because the replication uses thin base disk files and delta files for point-in-time snapshots

• Set Org VDC allocation model to Pay-as-you-go so that compute resources are not reserved all the time but are used and charged for only during test and real failovers.

• Providers can use higher over-subscription rates for compute because it is expected that not all tenants fail over their workloads at the same time.

Architecting a VMware vCloud Availability for vCloud Director Solution

42 | VMware vCloud® Architecture Toolkit™ for Service Providers

Figure 24. Example of Org VDC Layout

Org VDC (regular)

Tenant Organization

Org VDC (replicated)

Org VDC Network (routed)

External Network Internet

Internet / MPLS

Org VDC Network (shared)

Test Org VDC Network (isolated)

Replicated VMs

Test JumphostMS SQLActive

Directory

External Network - Direct Connect

5.4.3 Supported Org VDC Topologies It is supported to have only one peer (on-premises vSphere to Org VDC pair) per vCloud Availability vSphere Replication Manager Server – essentially resource vCenter Server as there is 1:1 relationship between resource vCenter Server and vSphere Replication Manager Server. Tenants can replicate from the same vSphere Replication appliance only to one Org VDC per provider vCenter Server.

The following figure shows supported and unsupported topologies.

Figure 25. Supported and Unsupported Org VDC Topologies

On-premisesvCenter Server

Provider Resource vCenter Server

Org VDC A Org VDC B

On-premisesvCenter Server A

Provider Resource vCenter Server

Org VDC A

On-premvCenter Server B

On-premisesvCenter Server

Provider Resource vCenter Server A

Org VDC A Org VDC B

Provider Resource vCenter Server B

XOn-premises

vCenter Server A

Provider Resource vCenter Server

Org VDC A Org VDC B

On-premvCenter Server B

Architecting a VMware vCloud Availability for vCloud Director Solution

43 | VMware vCloud® Architecture Toolkit™ for Service Providers

5.5 Network Management During vSphere Replication configuration, tenants select one recovery network and one test network for each Org VDC in the Target Networks Configuration dialog box. All replications that tenants create subsequently use the recovery network setting. If the recovery network is changed later, existing replications are not affected. However, changing the test network is applicable to all replications for the given Org VDC.

If more than one target network is required, the following options are possible:

• If the topologies are supported, use multiple Org VDCs. See Section 5.4.3.

• After a replication is configured, change the network in the placeholder vApp.

• Configure vApp networks in vCloud Director after a test failover and before the vApp power-on.

• Create a recovery plan with the tools like vRealize Orchestrator where VMs are attached to correct the network after a failover and before the power-on.

Figure 26. vSphere Replication Target Network Configuration

5.5.1 Application Network Design Moving a service physically from one location (for example, an on-premises data center) to another location (for example, the cloud) represents a networking challenge, especially if applications have hard-coded IP addresses.

vCloud Availability for vCloud Director does not natively provide any mechanism for network extension from on-premises environments to the cloud. Network connectivity can be achieved with Layer 2 or Layer 3 VPNs, direct connects, and similar technologies. In general, the following options are available to tenants and the selection of the right one depends on the use case, application requirements, the specifics and technologies that the disaster recovery involves.

5.5.1.1 VM IP Address Change

VMs fail over to the other location where their IP addresses are changed to match destination subnets. The advantage is that no network extensions are required, however, such an approach must be orchestrated and tested to verify application compatibility. For more information, see Section 9.4.

Architecting a VMware vCloud Availability for vCloud Director Solution

44 | VMware vCloud® Architecture Toolkit™ for Service Providers

5.5.1.2 DNS Change

The access to applications is provided through public DNS entries. VMs fail over to identical subnets in the target location, but access to VMs is provided through Destination Network Address Translation (DNAT) on public IPs from the service provider range. Part of the DR process is to update DNS entries to match the changed public IP addresses.

This scenario does not require any network extensions but must be orchestrated together with the DNS changes. All components of the application must fail over together.

5.5.1.3 Layer 3 Network Extension

Tenants establish Layer 3 connectivity between the on-premises data center and the Org VDC in the cloud. This connectivity can be achieved using IPSec VPN between on-premises router and Org VDC Edge Gateway or through direct routable connectivity (for example, MPLS) that is offered by the service provider as an add-on service. There are identical subnets in the cloud. To avoid route flapping, subnets are not announced externally. During the DR situation, tenants fail over the whole subnet from their own data center to the cloud, disconnect the subnet from the router, and start advertising the subnet routes from the cloud routers.

This scenario does not require configuring any IP re-addressing, just routing reconfiguration. VMs from the whole subnet must fail over together.

5.5.1.4 Layer 2 Network Extension

Tenants establish Layer 2 connectivity between on-premises data center and the Org VDC networks in the cloud. This connectivity can be accomplished with a tunnel between the L2VPN server on the Org VDC Edge Gateway and the L2VPN client on-premises. The L2VPN client is provided by licensed NSX or free standalone NSX Edge Gateway.

This scenario does not require any workload changes and enables partial failover and migrations from a given subnet. However, due to tunneling and non-optimized network flows, it might have network performance limitations. The typical use case migrates applications to the cloud when the L2 tunnel exists until all workloads from a particular subnet are migrated.

5.6 Storage Management Storage management is crucial for vCloud Availability for vCloud Director because storage is consumed by replicated base disks, change deltas, and point-in-time snapshots.

In vCloud Director, the tenant consumes cloud resources through Org VDCs that represent compute, networking, and storage resources. Storage is presented in terms of storage policies (with different performance or availability SLAs).

vCloud Availability for vCloud Director consumes replication-enabled Org VDCs, and Org VDC storage policies are presented as targets for replicated VMs. The files that represent replicated disks and multiple point-in-time (MPIT) storage snapshots of protected VMs are not real vSphere objects. Therefore, vCloud Director has no visibility as to how much space VMs consume until VMs fail over. Therefore, Org VDC storage quota management has no effect on the cold data.

Note If VMware vSAN™ is used as target storage policy, the replicated disk is stored on Default Virtual SAN Storage Policy which might be different from the VM vSAN storage policy assigned to the Org VDC. In such cases, the provider needs to orchestrate remediation at the vSphere level after a failover.

When a VM is configured for replication, vCloud Director creates a placeholder VM initially with full disk size. The vCloud Director placement engine places the placeholder VM to a particular datastore and checks it against Org VDC storage quota. VM disks are then replaced by 4 MB dummy disks, so that no storage consumption is reported to Org VDC. The VM folder location is used for replication data. vSphere recognizes the replication files as independent data (hbrdisk.RDID) that is not coupled

Architecting a VMware vCloud Availability for vCloud Director Solution

45 | VMware vCloud® Architecture Toolkit™ for Service Providers

with the VM. VMware vSphere® Storage vMotion® and vCloud API move replica disk call must be used to move the replication data. See Section 5.6.2, Moving Replica Disks.

VMware vSphere Storage DRS™ is supported only for placement purposes and not for the data movement purposes for load balancing or maintenance.

5.6.1 MPIT Storage Snapshots A tenant can create up to 24 multiple point-in-time storage snapshots. Currently, the service provider cannot control this number. In theory, if all disk block data has changed, a storage snapshot can grow to the full size of the base disk. As a general practice, reserve two to three times of the disk size for each virtual machine.

Upon failover or migration, the MPIT storage snapshots are not automatically consolidated because the process is time consuming and impacts VM Recovery Time Objective (RTO). Therefore, service providers must consolidate these VMs because MPIT files consume additional storage space and delta disks negatively affect VM storage performance. Consolidation can be scheduled automatically against the resource vCenter Server with the following VMware PowerCLI™ script.

Get-VM | Where-Object {$_.Extensiondata.Runtime.ConsolidationNeeded} | ForEach-Object { $_.ExtensionData.ConsolidateVMDisks() }

The installer appliance provides the following command that can trigger consolidation at the Organization, Org VDC, vApp, and VM levels.

vcav vcd consolidate

Note This consolidation does not affect VMs or disks that are in the replication state.

Both the provider and tenant can track disk usage of each replication and MPIT with vCloud API.

• Lists all replications at the Org VDC level. GET /api/org/<org-id>/replications

• Lists space requirements for replicated VM per disk. GET /api/admin/vr/replications/<replication-id>/details

• Lists total space requirements and the size and duration of each transfer session. GET /api/vr/replications/<replication-id>/instances

<ns2:ReplicaSpaceRequirements>6486491136</ns2:ReplicaSpaceRequirements> <ns2:Instance> <ns2:Id>GINST-Hbr.Replica.GroupInstance.000000000000014c/CGID-5345cefe-3ad9-457a-8452-93759e8e54d6</ns2:Id> <ns2:TransferStartTime>2016-10-12T12:36:48.000Z</ns2:TransferStartTime> <ns2:TransferSeconds>46</ns2:TransferSeconds> <ns2:TransferBytes>1151172608</ns2:TransferBytes> </ns2:Instance> <ns2:Instance> <ns2:Id>GINST-Hbr.Replica.GroupInstance.000000000000013a/CGID-5345cefe-3ad9-457a-8452-93759e8e54d6</ns2:Id> <ns2:TransferStartTime>2016-10-12T10:22:22.000Z</ns2:TransferStartTime> <ns2:TransferSeconds>1</ns2:TransferSeconds> <ns2:TransferBytes>598016</ns2:TransferBytes> </ns2:Instance>

Architecting a VMware vCloud Availability for vCloud Director Solution

46 | VMware vCloud® Architecture Toolkit™ for Service Providers

Figure 27. Tenant View of Storage Consumption in vSphere Web Client

5.6.2 Moving Replica Disks In case service providers need to migrate or free space on a datastore that is used for replication, the replica disks can be moved with from the Service Manager Portal or with the following API call.

POST /api/admin/vr/replications/{id}/action/moveReplication

Accept: application/*+xml;version=6.0;vr-version=2.0

Content-Type: application/vnd.vmware.hcs.adminMoveReplicationGroupParams+xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<MoveReplicationGroupParams xmlns="http://www.vmware.com/vr/v6.0" xmlns:vmext="http://www.vmware.com/vcloud/extension/v1.5">

<NewDatastore> <vmext:VimServerRef href="https://10.10.10.10/api/admin/extension/vimServer/c1abda33-81a8-4cf1-aa89-5af79658f84a" type="application/vnd.vmware.admin.vmwvirtualcenter+xml"/> <vmext:MoRef>datastore-26</vmext:MoRef>

Architecting a VMware vCloud Availability for vCloud Director Solution

47 | VMware vCloud® Architecture Toolkit™ for Service Providers

<vmext:VimObjectType>DATASTORE</vmext:VimObjectType> </NewDatastore>

</MoveReplicationGroupParams>

You cannot relocate VMs in the test or recovery process. Placeholder VM storage is moved by vSphere Storage vMotion after VM’s replica disks are moved.

5.6.3 Storage Management Recommendations • Dedicate datastores or storage policies to Org VDCs that are used for replication.

• Use conservative datastore thresholds in vCloud Director. Keep the red threshold to 33%-50% of the total datastore size.

• Charge tenants for cold and hot disk data.

5.7 vApps and VMs A vApp is created in the target Org VDC for each replicated VM. Inside the vApp is a placeholder VM with the same compute (vCPU, RAM) size and operating system as the protected VM. The placeholder VM has certain operations disabled and cannot be moved or powered on by vCloud Director. These operations can be performed only through vCloud Availability for vCloud Director APIs.

A vApp is created with the recovery network. In case of test failover, a test network is added and the recovered VM is connected to it.

The vApp can be renamed, and VMs and vApp networks can be added. However, there must be only one recovery VM in a vApp.

The recovery VM has guest customization disabled. The network IP Allocation Mode is set to DHCP that guarantees the recovery VM having the same IP and MAC address as the protected VM.

It is possible to change IP address of the recovered VM with guest customization but the change must be orchestrated during the failover. For more information, see Section 9.4, Failover Orchestration.

A running VM configured for replication starts replicating immediately, however, the powered-off VM cannot be replicated until the replication is explicitly enabled with the Sync Now button. Immediate start of replication locks VM disks. Such a VM cannot powered on until the initial sync finishes and that might potentially take a long time.

Figure 28. Sync Now of powered-off VM

While replication of multiple VMs can be configured together at the same time, vSphere Replication does not provide cross-VM consistency group mechanism that ensures quiescing those VMs at the same time to provide consistency among themselves. If applications have such consistency requirements, suggested approaches include lowering RPO to five minutes and using the application level replication for database augmented with vCloud Availability for the rest of the virtual machines.

Architecting a VMware vCloud Availability for vCloud Director Solution

48 | VMware vCloud® Architecture Toolkit™ for Service Providers

Network Bandwidth Requirements The compliance of replications RPO depends on the ability to transfer workload changed data from the protected site to the recovery site in a given (RTO) time. The data transfer is constrained by the network bandwidth available between the sites. Considerations must be given to have wide enough pipe to the Internet because data centers of the service provider aggregates replications from on-premises data centers of multiple tenants.

The replication bandwidth requirements mostly depend on the following factors:

• Number of protected VMs

• Average disk size

• Average utilization

• Average rate of change

• RPO

• Enabled compression

Service providers and tenants can estimate the network bandwidth usage and model the impact based on changed parameters by using the VMware vSphere Replication Calculator4. For more information, see https://kb.vmware.com/s/article/2037268.

Figure 29. vSphere Replication Calculator

4 https://vmware.com/vrcalculator

Architecting a VMware vCloud Availability for vCloud Director Solution

49 | VMware vCloud® Architecture Toolkit™ for Service Providers

Service providers can limit the maximum number of VMs in Org VDC and the maximum number of protected VMs per tenant, but cannot directly limit other parameters like RPO, snapshots, and compression. In case network bandwidth is limited, provide billing mechanism that charges per network usage so that customers use bandwidth effectively. For more information, see Section 8.

Architecting a VMware vCloud Availability for vCloud Director Solution

50 | VMware vCloud® Architecture Toolkit™ for Service Providers

Tenant Feature and Component Compatibility To use the vCloud Availability service, tenants deploy in the data center standard vSphere Replication Appliance compatible with the on-premises vCenter Server. The following tables shows the vSphere editions compatible with vCloud Availability 2.0 and the features that are available with each version.

Table 15. Tenant Feature and Component Compatibility

Feature vCenter 6.7 vCenter 6.5 vCenter 6.0 vCenter 5.5/5.1

ESXi supported

6.7***, 6.5***, 6.0 6.5***, 6.0 5.5 6.0, 5.5, 5.1, 5.0 5.5****, 5.1, 5.0

vSphere Replication Support

8.1 6.5.1 6.1.2 5.8.1**

Failover Yes Yes Yes Yes

Failover test Yes Yes Yes Yes

Planned migration

Yes Yes Yes Yes

Failback Yes Yes Yes No

MPIT snapshots

Yes Yes Yes No

5 min RPO Yes

Yes* No No

*) Requires vSphere Replication version 6.5.1.2 or later.

**) For TLS 1.1/1.2 compatibility see https://kb.vmware.com/s/article/52590

***) For ESXi 6.5U1, 6.5U2 and ESXi 6.7 compatibility see https://kb.vmware.com/s/article/2110304

****) Requires vCenter Server 5.5.

vSphere Replication does not require a separate license. It is used for free on vSphere Essentials Plus and later.

Architecting a VMware vCloud Availability for vCloud Director Solution

51 | VMware vCloud® Architecture Toolkit™ for Service Providers

Billing Service providers can use various mechanisms to charge tenants for vCloud Availability for vCloud Director usage.

Pay per protected VM

• The simplest model aligns with VMware licensing. Protected VMs can be collected with the vCloud API and usage metering script. See Section 4.13, vCloud Availability Metering.

Pay per used storage

• Cold storage – used for replicated data (base disks, MPIT snapshots). Cold storage can be metered with vCloud API. See Section 5.6.1, MPIT Storage Snapshots.

• Hot storage – used by (recovered) running VMs. Hot storage is reported as Org VDC allocated storage and can be metered through standard vCloud Director mechanisms.

Pay per used compute

• A replication enabled Org VDC should be set to the Pay-As-You-Go allocation model. After a failover, allocated compute of running VMs is reported through standard vCloud Director metering mechanisms.

Pay per replicated network traffic

• vCloud API can be used to collect information about how much data was transferred for each replication transfer. Note: This number does not include compression overhead and encryption efficiencies. For more information about vCloud API, see Section 5.6.1, MPIT Storage Snapshots.

Architecting a VMware vCloud Availability for vCloud Director Solution

52 | VMware vCloud® Architecture Toolkit™ for Service Providers

vRealize Orchestrator Configuration The vSphere Replication UI is provided within the vSphere Web Client of the on-premises vCenter Server. It is the primary interface where you can configure the replications in to-the-cloud and from-the cloud directions.

There is, however, potential need for an additional interface that can be provided by vRealize Orchestrator for two main use cases:

• Provider-managed DR service

• Tenant-initiated automated failover

vSphere Replication includes the vRealize Orchestrator plug-in with out-of-the-box basic workflows for replication configuration and recovery, migration, and test actions at the VM level. These workflows can be combined into elaborate recovery plans, thanks to vRealize Orchestrator automation and orchestration capabilities.

The placement of vRealize Orchestrator affects available workflows.

9.1 On-Premises Deployment Tenants deploy vRealize Orchestrator within their own data centers. The vSphere Replication plug-in communicates with the protected vCenter Server and the cloud provider endpoint. All plug-in workflows are available.

Deployment summary:

• Tenant self-service use case

• Simple networking setup

• An alternative interface to vSphere Web Client UI

• Configuration of replications to and from the cloud

• Management of migrations and test failovers

• In case of complete loss of data centers, tenants lose the vSphere Replication user interface and vRealize Orchestrator

9.2 In-the-Cloud Deployment Tenants deploy vRealize Orchestrator in the provider cloud (Org VDC). The deployment can be simplified by the provider who can offer a pre-installed vRealize Orchestrator template with a vSphere Replication plug-in included in the public catalog. Tenants must decide if they connect the cloud-based vRealize Orchestrator only to cloud endpoint or to their own vCenter Server as well. Connection to the tenant’s on-premises vCenter Server requires either a direct connect network or an IPsec VPN.

If vRealize Orchestrator does not have connectivity to the on-premises vCenter Server, it can be used only for failover purposes during a DR situation. The following workflows are available:

• Run Real Recovery to Cloud

• Run Test Recovery at the Cloud Site

• Run Test Clean-up at the Cloud Site Deployment summary:

• Tenant self-service use case

• Complex networking setup if vCenter Server access is required for full configuration and management functionality

• Available during complete loss of a tenant data center

Architecting a VMware vCloud Availability for vCloud Director Solution

53 | VMware vCloud® Architecture Toolkit™ for Service Providers

9.3 Provider Deployment Providers deploy vRealize Orchestrator into their own management infrastructure in a highly available and scalable configuration. vRealize Orchestrator is used for tenant on-boarding activities with vCloud Director plug-in as well as for managed DR services for multiple tenants.

The provider vRealize Orchestrator cannot access tenant’s on-premises vCenter Server nodes. Therefore, it can be used only for real and test recovery to the cloud.

Deployment summary:

• Provider-managed use cases

• Scalable multi-tenant vRealize Orchestrator usage

• No connectivity to tenant on-premises vCenter Servers (uses Register Standalone Org configuration workflow)

• Only for recovery to the cloud

In this release, the vSphere Replication plug-in supports only tenant-level credentials. Therefore, providers must have an Organization Administrator account for each managed organization and configure separate endpoints within vRealize Orchestrator.

9.4 Failover Orchestration Combining the vCloud Availability for vCloud Director vRealize Orchestrator plug-in with vCloud Director (and optionally other) plug-ins enables orchestration of complex workflows that require additional tasks during a VM failover.

A simple example is the requirement to change the IP address of the recovered VM. The change can be accomplished with the following workflow.

1. Run real and test failovers to the cloud without powering on a VM.

2. Enable guest customization of the recovered VM.

3. Change recovered VM IP address and IP address allocation mode to manual.

4. Power on the VM.

In this release, the vCloud Director vRealize Orchestrator plug-in does not provide a workflow that can complete steps 2 and 3. Therefore, a new workflow, Update VM IP Address, is required.

Figure 30. Run Real Recovery to Cloud with IP Address Change

The Update VM IP Address Workflow consists of two scripted tasks and the Wait for a Task workflow following each scripted task.

Figure 31. Update VM Address Workflow

• The Enable Guest Customization script

Architecting a VMware vCloud Availability for vCloud Director Solution

54 | VMware vCloud® Architecture Toolkit™ for Service Providers

Input: (vCloud:VM) vm Output: (vCloud:Task) task

vm.updateInternalState(); var guestCustomizationSection = vm.getGuestCustomizationSection(); if (guestCustomizationSection.Enabled != "true") { guestCustomizationSection.Enabled = "true"; System.log("Enabling guest customization"); } task = vm.updateSection(guestCustomizationSection);

• The Update IP Address script.

Input: (vCloud:VM) vm, (Array/string) ipAddress Output: (vCloud:Task) task vm.updateInternalState(); var networkConnectionSection = vm.getNetworkConnectionSection(); var networkConnections = networkConnectionSection.networkConnection.enumerate(); var poolAllocationModeNetworkConnectionIndexProp = new Properties(); var i = 0; for each (var networkConnection in networkConnections) {

networkConnection.IpAddress = ipAddress[i]; if (networkConnection.ipAddressAllocationMode != "MANUAL") { networkConnection.ipAddressAllocationMode = "MANUAL"; poolAllocationModeNetworkConnectionIndexProp.put(networkConnection.networkConnectionIndex,""); } i++; } task = vm.updateSection(networkConnectionSection);

Architecting a VMware vCloud Availability for vCloud Director Solution

55 | VMware vCloud® Architecture Toolkit™ for Service Providers

Monitoring

10.1 Component Monitoring vCloud Availability for vCloud Director components can be monitored with an API (for example, vCloud API or SMP API), an agent-based monitoring solution (VMware vRealize® Hyperic® or Endpoint Operations Management for VMware vRealize® Operations™) and a syslog-based solution (VMware vRealize® Log Insight™).

Table 16. Syslog Monitoring

Component Syslog configuration Log files

vSphere Replication Cloud Service Append:

.*. @<syslog IP> to /etc/rsyslog.d

/opt/vmware/hms/logs/hcs.log

vSphere Replication Server /var/log/vmware/hbrsrv.log

vSphere Replication Manager Server

/opt/vmware/hms/logs/hms.log

Cloud Proxy http://kb.vmware.com/kb/2004564

Tenant Portal /opt/vmware/conf/vcav-ui/log4j2.xmls

/opt/vmware/logs/vcav-ui/access.log /opt/vmware/logs/vcav-ui/dr2c.log /opt/vmware/logs/vcav-ui/error.log

Service Manager Portal /opt/vmware/conf/vcav-smp/log4j2.xmls

/opt/vmware/logs/vcav-smp/dr-service-manager.log /opt/vmware/logs/vcav-smp/service.log

Platform Services Controller /var/log/vmware/sso/*

vRealize Orchestrator /var/log/vco/app-server/server.log

Table 17. Monitoring using vCloud API

Component vCloud API call Response

vSphere Replication Cloud Service

GET /api/admin/vr/nodes

<ns2:Node id="{vrcs-node-id}"> <ns2:MaxSupportedApiVersion>4.0</ns2 MaxSupportedApiVersion> <ns2:LastHeartbeatTimestamp>{epochtime}</ns2:LastHeartbeatTimestamp> </ns2:Node>

vSphere Replication Server GET /api/admin/extension/vr/ vrs/{vrs_id}

<ns2:IsConnected>true</ns2:IsConnected>

vSphere Replication Manager Server

GET /api/admin/extension/ vimServer/{vim-server-id}/vrmServer

<ns2:IsConnected>true</ns2:IsConnected>

Architecting a VMware vCloud Availability for vCloud Director Solution

56 | VMware vCloud® Architecture Toolkit™ for Service Providers

Component vCloud API call Response

Cloud Proxy GET /api/query?type=cell

or alternatively

GET <proxy-ip>/ api/server_status

<CellRecord … isActive="1" … name="{cell-hostname}" …/>

Service is up.

Portal GET <portal-ip>:8443/ api/vcd/server_status

Service is up.

Table 18. Monitoring using Service Manager Portal API

Component SMP API call Response

Retrieve all monitored objects

GET /vRObjectList

{

"vRMSList": [

{

"name": "vcsa-mgmt",

id": "355072cf-0626-4c51-bf04-c7e075cfbcce"

}

],

"vRSList": [

{

"id": "52030010-0098-19be-23ab-fca03479627e"

},

{

"id": "52d8e82e-1345-24a5-d8c9-a2839b9dab79"

}

],

"vRCSList": [

{

"id": "0fcea3e7-3cd9-4e1f-a42c-9c838c82cbb7"

},

{

"id": "27aca21f-b25b-4e4e-8002-0bf6bf7bfa90"

}

]

}

vSphere Replication Cloud Service

GET /vRCSStatus?id=<vrcs_id>

<VRCSStatus>

<lastHeartbeatTimestamp>1518529429408</lastHeartbeatTimestamp>

</VRCSStatus>

Architecting a VMware vCloud Availability for vCloud Director Solution

57 | VMware vCloud® Architecture Toolkit™ for Service Providers

Component SMP API call Response

vSphere Replication Server

GET /vRSStatus?id=<vrs_id>

<VRSStatus>

<uuid>52d8e82e-1345-24a5-d8c9-a2839b9dab79</uuid>

<vcUuid>cae0acb5-d21d-47f4-be68-b1f14de844f3</vcUuid>

<isConnected>true</isConnected>

<replicationTrafficAddress>10.172.158.35</replicationTrafficAddress>

<vrServerStatistics>

<replicationsCount>0</replicationsCount>

<disksCount>0</disksCount>

<startTime>1509408830398</startTime>

<lwdStatistics>

<inbound>5598235</inbound>

<outbound>828105</outbound>

<errors>0</errors>

<startTime>1509408830398</startTime>

</lwdStatistics>

<nfcStatistics>

<inbound>16396</inbound>

<outbound>22469123</outbound>

<errors>0</errors>

<startTime>1509408830398</startTime>

</nfcStatistics>

</vrServerStatistics>

</VRSStatus>

vSphere Replication Manager Server

GET /vRMSStatus?id=<vrms_id>

<VRMSStatus>

<uuid>355072cf-0626-4c51-bf04-c7e075cfbcce</uuid>

<vcUuid>cae0acb5-d21d-47f4-be68-b1f14de844f3</vcUuid>

<isConnected>true</isConnected>

</VRMSStatus>

10.2 VM Replication Monitoring Tenants and the provider (in managed service use case) monitor the VM replication state through the vCloud Availability for vCloud Director API. A list of replications can be obtained on the Organization object or the Peer (vSphere endpoint) object associated with a particular Org VDC. For more information, see Appendix E – Tenant API Structure. Detailed information about replication is provided through the calls:

• GET /api/vr/replications

• /api/vr/failbackreplications

Architecting a VMware vCloud Availability for vCloud Director Solution

58 | VMware vCloud® Architecture Toolkit™ for Service Providers

Table 19. Replication Details

Element Description

Name Replication name, same as protected VM name

RPO Recovery Point Objective (in minutes)

QuiesceGuestEnabled Guest OS quiescing enabled? (Boolean)

NetworkCompressionEnabled Network compression enabled? (Boolean)

PlaceholderVappId vCloud vApp URN

EventPartitionId vSphere Replication Manager Server UUID who manages replication

ReplicationState OK or replication error

VrServerInfo vSphere Replication Server that stages replicated data

NextInstanceSequenceNumber Replication data sequence number

Paused Is replication paused (Boolean)

CurrentRpoViolation The time in minutes that has elapsed since the last RPO violation. It has a value of 0 if there is no RPO violation, and a positive value otherwise.

TestRecoveryState Unknown, none, test in progress, complete, cleanup in progress, test error, cleanup error

RecoveryState Unknown, not started, in progress, error, OK

ReplicationGroupInstance MPIT snapshots (date, duration of transfer, amount of bytes)

LastTestFailoverOperationId Test failover operation ID

Vm vCloud VM URN and replication tag (original replication ID)

10.3 Backup Strategy The backup and recovery are performed with the combination of VMware vSphere Storage APIs – Data Protection compatible backup solution at the VM level with database-level backups. The following table summarizes the backup strategy for each solution component.

Table 20. Backup Strategy

Component Backup strategy Notes

vSphere Replication Cloud Service VM level stateless

vSphere Replication Server VM level

Architecting a VMware vCloud Availability for vCloud Director Solution

59 | VMware vCloud® Architecture Toolkit™ for Service Providers

Component Backup strategy Notes

vSphere Replication Manager Server VM level + external database backups

Cloud Proxy VM level stateless

vCloud Availability for vCloud Director Portal

VM Level stateless

Platform Services Controller VM level http://pubs.vmware.com/vsphere-60/index.jsp?topic=%2Fcom.vmware.vsphere.install.doc%2FGUID-7FB2DA06-3737-41F0-AE53-12FBD0D1E6B7.html

vRealize Orchestrator appliance VM level + external database backups

RabbitMQ VM level Messages are in durable HA queues

Cassandra VM level https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsBackupRestore.html

Architecting a VMware vCloud Availability for vCloud Director Solution

60 | VMware vCloud® Architecture Toolkit™ for Service Providers

Updates and Upgrades The vCloud Availability solution consists of multiple components that have specific upgrade procedures and are potentially upgraded in different life cycles.

11.1 Cloud Proxies Cloud Proxies are essentially vCloud Director cells and therefore their upgrade life cycle depends on vCloud Director. It is recommended to use the multi-cell orchestrated approach5 because all the cells (including cloud proxies) are upgraded together with one command. If there are multiple load-balanced cloud proxies, maintenance (like base OS update) can be done at any time. Because the connections are established again to the remaining cloud proxies with minimum impact on ongoing replications.

During the period when vCloud Director is down and does not accept vCloud API commands, vCloud Availability is not accessible.

11.2 Portal Appliances The tenant and service manager portals are stateless components that can be redeployed at any time for update and upgrade purposes. The SSL certificate is supplied externally from the installer appliance.

If the tenant portal appliances are load balanced, maintenance can be performed at any time without any user impact.

11.3 vSphere Replication Components vSphere Replication components are:

• vSphere Replication Cloud Service nodes

• vSphere Replication Manager Servers

• vSphere Replication Servers

All these appliances are updated or upgraded through its Virtual Appliance Management Interface (VAMI). The application and OS upgrade binaries are provided on ISO media that is attached to each appliance.

When multiple vSphere Replication Cloud Service nodes are used, one node can be taken down at any time.

Maintenance on vSphere Replication Manager Server impacts configuration of new and existing replications in the associated vCenter Server but does not impact RPO compliance of ongoing replications.

Maintenance of a specific vSphere Replication Server impacts RPO compliance of incoming replications that the server processes. The replications can be potentially migrated before the maintenance. For more information, see Section 4.11.

5 https://docs.vmware.com/en/vCloud-Director/9.0/com.vmware.vcloud.install.doc/GUID-FC13BB98-B7CD-4932-BC8A-697351619D92.html

Architecting a VMware vCloud Availability for vCloud Director Solution

61 | VMware vCloud® Architecture Toolkit™ for Service Providers

11.4 vCloud Availability Installer Appliance The upgrade of the vCloud Availability Installer appliance is always performed by deploying a new one. Before deleting the old appliance, back up the registry file and other files that the registry refers to (for example, certificates and private keys).

If the installer appliance is down there is no impact on vCloud Availability functionality with the exception that the administrator cannot perform day 2 actions (certificate replacements, moving replications, enabling Org VDCs for replications).

11.5 Third-Party Components Cassandra and RabbitMQ are essential for the correct functionality of vCloud Availability. therefore both must be clustered with at least 3 nodes for Cassandra or 2 nodes for RabbitMQ. Clustering allows for one node maintenance any time without impact on the correct functionality. For information about performing upgrades with minimum impact on functionality, see respective third-party documentation.

11.6 vSphere ESXi host maintenance mode is fully supported by vCloud Availability. If there are other functional ESXi hosts in the cluster, upgrades can be performed without any impact on existing replications.

Maintenance on a datastore can be performed after all replications are migrated to a different datastore of the same storage policy. For more information, see Section 5.6.2. vCloud Availability does not support the datastore maintenance mode that Datastore Cluster provides with SDRS.

vCenter Server is essential for the ability to perform configurations of new or existing replications, as well as to perform failovers, failbacks, clean ups and migrations. Data traffic of existing replications and their RPOs are not affected by vCenter Server maintenance.

Architecting a VMware vCloud Availability for vCloud Director Solution

62 | VMware vCloud® Architecture Toolkit™ for Service Providers

Appendix A – Port Requirements and Firewall Rules The following table shows the required TCP and UDP ports for vCloud Availability for vCloud Director.

Table 21. vCloud Availability Port Requirements (Service Provider)

Description Ports Protocol

Cloud Proxy to MS SQL database 1433 TCP

Cloud Proxy to Oracle database 1521 TCP

Cloud Proxy to PostgreSQL database 5432 TCP

Cloud Proxy to DNS Server 53 TCP, UDP

Cloud Proxy to RabbitMQ 5671, 5672 TCP, UDP

Cloud Proxy to/from NFS share 111,920 TCP, UDP

Cloud Proxy to Syslog Server 514 UDP

Cloud Proxy to/from vCloud Director cell / Cloud Proxies (Active MQ)

61611, 61616

TCP

Remote Manager to Cloud Proxy (JMX) 8999 TCP

Cloud Proxy to vSphere Replication Server 31031

9998 (stunnel)

TCP

External to Cloud Proxy 443 TCP

ESXi host to Cloud Proxy 31031 TCP

Administrator to vSphere Replication Manager 5480 TCP

Administrator to vSphere Replication Cloud Service 5480 TCP

Administrator to vSphere Replication Server 5480 TCP

Platform Services Controller to Active Directory 389 TCP

vCloud Director cells to RabbitMQ 5671, 5672 TCP

vCloud Director cells to Platform Services Controller 443, 7444 TCP

vSphere Replication Manager Server to Platform Services Controller 443, 7444 TCP

vSphere Replication Cloud Service to Platform Services Controller 443, 7444 TCP

Architecting a VMware vCloud Availability for vCloud Director Solution

63 | VMware vCloud® Architecture Toolkit™ for Service Providers

Description Ports Protocol

vSphere Replication Cloud Service to vCenter Server 80, 443 TCP

vSphere Replication Cloud Service to vCloud Director API 443 TCP

vSphere Replication Cloud Service to RabbitMQ 5671 TCP

vSphere Replication Cloud Service to vSphere Replication Manager Server 8043 TCP

vSphere Replication Cloud Service to Cassandra 9042, 9160 TCP

vSphere Replication Manager to vCenter Server 80, 443 TCP

vSphere Replication Manager to vSphere Replication Server 8123 TCP

vSphere Replication Server to ESXi hosts 80, 902 TCP, UDP

vCloud Availability for vCloud Director Portal to vCloud Director 443 TCP

External to vCloud Availability for vCloud Director Portal 8443 TCP

Internal to vCloud Availability Service Manager Portal 8443 TCP

vCloud Availability Installer Appliance to vCloud Director API 443 TCP

vCloud Availability Installer Appliance to Platform Services Controller 443, 7444 TCP

vCloud Availability Installer Appliance to vCenter Servers 80, 443 TCP

vCloud Availability Installer Appliance to ESXi hosts 443 TCP

vCloud Availability Installer Appliance to RabbitMQ 5671 TCP

vCloud Availability Installer Appliance to Cassandra 22, 9042 TCP

vCloud Availability Installer Appliance to vSphere Replication Manager Server

22, 5480, 8043 TCP

vCloud Availability Installer Appliance to vSphere Replication Cloud Service

22, 5480, 8043 TCP

vCloud Availability Installer Appliance to vSphere Replication Server

22, 5480, 8043 TCP

vCloud Availability Installer Appliance to vCloud Availability Portal 22, 8443 TCP

vCloud Availability Installer Appliance to Service Manager Portal 22, 8443 TCP

Architecting a VMware vCloud Availability for vCloud Director Solution

64 | VMware vCloud® Architecture Toolkit™ for Service Providers

Description Ports Protocol

vCloud Availability Portal to vCloud Director API 443 TCP

Service Manager Portal to vCloud Director API 443 TCP

Service Manager Portal to RabbitMQ 5671 TCP

Service Manager Portal to vCloud Availability Portal 8443 TCP

The follow table shows the requirements on TCP and UDP ports for tenants.

Table 22. vCloud Availability Port Requirements (Tenant)

Description Ports Protocol

vSphere Replication Appliance to vCenter Server 80 TCP

vSphere Replication Appliance or Server to ESXi hosts 80, 902 TCP, UDP

Administrator to vSphere Replication Appliance or Server 5480 TCP

vSphere Replication Server to vSphere Replication Appliance 8043 TCP

vSphere Replication Appliance to vSphere Replication Server

8123, 31031 44046 (VR 5.x)

TCP

vSphere Replication Appliance to vCloud Availability public endpoints 443 TCP

vCenter Server to vCloud Availability public endpoints 443 TCP

ESXi hosts to vSphere Replication Appliance 10000-10010 TCP

Architecting a VMware vCloud Availability for vCloud Director Solution

65 | VMware vCloud® Architecture Toolkit™ for Service Providers

Appendix B – Glossary The following table defines the terms in this document.

Table 23. Glossary

Term Definition

Cassandra An open source distributed highly scalable and available database.

CBT Changed Block Tracking mechanism vSphere Storage API – Data Protection use. Not used by vSphere Replication.

Cloud proxy Multitenant tunneling component running in the service provider environment.

FastLZ Compression library used by vSphere Replication. Provides good CPU: compression ratio.

FQDN Fully Qualified Domain Name (for example, cloudproxy.example.com)

HBR Host-based replication (another name for vSphere Replication)

HCS HBR Cloud Service (another name for vSphere Replication Cloud Service)

HMS HBR Management Service (another name for vSphere Replication Manager Server)

hbrdisk.RDID Redo log file in the target datastore location

MPIT Multiple point-in-time (snapshots)

NFC Network File Copy. Protocol used to move replicated data from a vSphere Replication Server to a vSphere datastore.

RabbitMQ An open source Message Queue Service used to route messages between vCloud Director and vCloud Availability for vCloud Director components.

Replication group Group of disks that belong to the same VM and are replicated together.

RPO Recovery Point Objective. Policy that defines the maximum tolerable amount of data loss measured as a period of time.

RTO Recovery Time Objective. Policy that defines the maximum tolerable downtime of an application.

Architecting a VMware vCloud Availability for vCloud Director Solution

66 | VMware vCloud® Architecture Toolkit™ for Service Providers

Term Definition

vCloud Tunneling Agent

Software component running as part of vSphere Replication Manager Server on premises. Terminates to and from the cloud tunnels with Cloud Proxy.

vSphere Replication Hypervisor-based replication technology used by vCloud Availability for vCloud Director.

vSphere Replication Agent vSphere component that manages replication at ESXi host level.

vSphere Replication Cloud Service Provides vSphere Replication APIs through vCloud extension and multitenancy.

vSphere Replication Manager Server Appliance that manages the environment and receives replicated data.

vRealize Orchestrator Extensible workflow solution used for orchestrating complex DR procedures.

vSphere Replication Server Appliance that receives replicated data.

vSCSI Filter vSphere component that capture writes for replication.

VSS Microsoft Volume Shadow Copy Service. Feature of Microsoft products enabling app-level and file-level consistency during backup or replication.

Architecting a VMware vCloud Availability for vCloud Director Solution

67 | VMware vCloud® Architecture Toolkit™ for Service Providers

Appendix C – Maximums The following table shows the constraint limits.

Table 24. vCloud Availability Maximums

Constraint Limit Explanation

Resource vCenter Server Instances 15

vSphere Replication Cloud Service 3

vSphere Replication Manager Servers 15 One per each resource vCenter Server

vSphere Replication Servers per vSphere Replication Manager Server

10

Total vSphere Replication Server Nodes 150

Replications per vCenter Server 2500

Replication per vSphere Replication Server 250

Multiple Point-in-Time Snapshots 24

Recovery Point Objective 5 minutes – 24 hours

A minimum of 15 minutes for VC 6.0 or older

Cloud Proxy Nodes 5

Connections per Cloud Proxy 2000

vCloud Director Tenants 500

vCenter Server Instances per Platform Services Controller Node

15

Platform Services Controller Nodes in vSphere Domain

10

vCenter Server instances in vSphere Domain 15

Architecting a VMware vCloud Availability for vCloud Director Solution

68 | VMware vCloud® Architecture Toolkit™ for Service Providers

Appendix D – Reference Documents The following table provides additional information resources

Table 25. vCloud Availability for vCloud Director Reference Documents

Item URL

Product Documentation https://docs.vmware.com/en/vCloud-Availability-for-vCloud-Director/index.html

vCloud Director Documentation https://docs.vmware.com/en/vCloud-Director/index.html

VMware vCloud® Architecture Toolkit™ for Service Providers

http://www.vmware.com/solutions/cloud-computing/vcat-sp.html

vSphere Replication Documentation https://docs.vmware.com/en/vSphere-Replication/index.html

vSphere Replication Technical Overview https://www.vmware.com/files/pdf/vsphere/vmw-vsphere-replication-6-1.pdf

Architecting a VMware vCloud Availability for vCloud Director Solution

69 | VMware vCloud® Architecture Toolkit™ for Service Providers

Appendix E – Tenant API Structure The following figures shows the structure of To-the-Cloud and From-the-Cloud Tenant APIs.

Figure 32. To-the-Cloud Tenant APIs

Organization

Org VDC

Peers

Replications

/api/org/{org-id}/enabledForReplicationVdcs

/api/vdc/{vdc-id}/peers

/api/vr/peers/{peer-id}/replications

/api/org/{org-id}/replications

Replication

/api/vr/replications/{replication-id}

RecoveryDetails

/api/vdc/{vdc-id}/recoveryDetails

Failover/action/failover

Test Failover/action/testFailover

Test Cleanup/action/testCleanup

sync /action/sync

Power Off /action/powerOffVm

Reverse /acton/reverse

Figure 33. From-the-Cloud Tenant APIs

Organization

Org VDC

Peers

Replications

/api/org/{org-id}/enabledForReplicationVdcs

/api/vdc/{vdc-id}/peers

/api/vr/peers/{peer-id}/failbackreplications

/api/org/{org-id}/failbackreplications

Replication

/api/vr/failbackreplications/{replication-id}

RecoveryDetails

/api/vdc/{vdc-id}/recoveryDetails

Failover/action/failover

Test Failover/action/testFailover

Test Cleanup/action/testCleanup

sync /action/sync

Power Off /action/powerOffVm

Power On /acton/powerOnVm

Reverse /acton/reverse

Architecting a VMware vCloud Availability for vCloud Director Solution

70 | VMware vCloud® Architecture Toolkit™ for Service Providers

Appendix F – Undocumented Hybrid Settings vCloud API HybridSettingsType Element: HybridSettings

Type: HybridSettingsType

Namespace: http://www.vmware.com/vcloud/v1.5

Description: Public URL and certificate of the Cloud Proxy endpoint

Since: 5.6

Internal: Yes

Schema: hybrid.xsd

Media type: application/vnd.vmware.vcloud.hybridSettings+xml

Extends: ResourceType

XML Representation: <HybridSettings xmlns="http://www.vmware.com/vcloud/v1.5" href="xs:anyURI" type="xs:string">

<Link href="xs:anyURI" id="xs:string" type="xs:string" name="xs:string"

rel="xs:string"/>

<CloudProxyBaseUri> xs:string </CloudProxyBaseUri>

<CloudProxyBaseUriPublicCertChain> xs:string </CloudProxyBaseUriPublicCertChain>

<CloudProxyBaseUriOverride> xs:string </CloudProxyBaseUriOverride>

<CloudProxyBaseUriPublicCertChainOverride> xs:string </CloudProxyBaseUriPublicCertChainOverride>

<CloudProxyFromCloudTunnelHost> xs:string </CloudProxyFromCloudTunnelHost>

<CloudProxyFromCloudTunnelHostOverride> xs:string </CloudProxyFromCloudTunnelHostOverride>

</HybridSettings>

Architecting a VMware vCloud Availability for vCloud Director Solution

71 | VMware vCloud® Architecture Toolkit™ for Service Providers

Attribute Type Required Modifiable Since Description

href anyURI No Always 5.6 The URI of the entity.

type string No Always 5.6 The MIME of the entity.

Element Type Required Modifiable Since Description

CloudProxyBaseUri string No none 5.6 Effective base URI for Cloud Proxy (wss) endpoint. By default, this value is the same as the base URI for the API endpoint. This value can be overridden by specifying the CloudProxyBaseUriOverride element.

CloudProxyBaseUriOverride string No always 5.6 Base URI for Cloud Proxy (wss) endpoint. Leave empty to use the default.

CloudProxyBaseUriPublicCertChain string No none 5.6 Effective SSL public certificate chain for the effective base URI for cloud proxy (wss) endpoint. The certificate chain is PEM encoded X.509 certificates. By default, this value is the same as the public certificate chain for the API endpoint. This value can be overridden by specifying the CloudProxyBaseUriPublicCertChainOverride element. This value will be empty if URI is overridden but the certificate is not.

CloudProxyBaseUriPublicCertChainOverride string No always 5.6 SSL public certificate chain for the base URI for cloud proxy (wss) endpoint. The certificate chain must be PEM encoded X.509 certificates. A Base URI override must be specified if this value is included in a request.

Architecting a VMware vCloud Availability for vCloud Director Solution

72 | VMware vCloud® Architecture Toolkit™ for Service Providers

Element Type Required Modifiable Since Description

CloudProxyFromCloudTunnelHost string No none 6 IP address or hostname of the effective host used for from-the-cloud tunnels.

CloudProxyFromCloudTunnelHostOverride string No none 6 IP address or hostname of the host used for from-the-cloud tunnels. Leave empty to use the default value.

Link LinkType No none 5.6 A reference to an entity or operation associated with this object.

VCloudExtension VCloudExtensionType No always 5.6 An optional extension element that can contain an arbitrary number of elements and attributes. Not related to extension services.

Architecting a VMware vCloud Availability for vCloud Director Solution

73 | VMware vCloud® Architecture Toolkit™ for Service Providers

GET /admin/hybrid/settings Operation: GET /admin/hybrid/settings

Internal: Yes

Input parameters Consume media type: None

Input type: None

Output parameters current settings

Produce media type: application/vnd.vmware.vcloud.hybridSettings+xml

Output type: HybridSettingsType

PUT /admin/hybrid/settings Operation: PUT /admin/hybrid/settings

Description: Updates settings

Internal: Yes

Input parameters

New settings

Consume media type: application/vnd.vmware.vcloud.hybridSettings+xml

Input type: HybridSettingsType

Output parameters

updated settings

Produce media type: application/vnd.vmware.vcloud.hybridSettings+xml

Output type: HybridSettingsType

Architecting a VMware vCloud Availability for vCloud Director Solution

74 | VMware vCloud® Architecture Toolkit™ for Service Providers

Appendix G – Monitoring with vRealize Operations The essential vCloud Availability processes can be monitored with Endpoint Operations Management for vRealize Operations. The monitoring is achieved by installing vRealize Endpoint Operations Management Agent plug-in (64-bit RPM) into each appliance and by configuring process monitoring in vRealize Operations with the process queries in the following table.

Table 26. Monitoring with Endpoint Operations Management for vRealize Operations

Component Process Process query

vCloud Director cells

vcd-watchdog

vCloud Director

State.Name.eq=vmware-vcd-watc

State.Name.eq=java,Args.*.eq=/opt/vmware/vcloud-director/etc

CloudProxies vcd-watchdog

Cloud Proxy

State.Name.eq=vmware-vcd-watc

State.Name.eq=java,Args.*.eq=/opt/vmware/vcloud-director/etc

vSphere Replication Cloud Service

hcs State.Name.eq=java,Args.*.eq=libs/hcs-6.5.0.jar

vSphere Replication Manager Server

hms State.Name.eq=java,Args.*.eq=libs/hms.jar

vSphere Replication Server

hbr State.Name.eq=hbrsrv-bin

Tenant Portal nginx State.Name.eq=nginx-linux-x64

SMP Portal Dashboard

MongoDB

State.Name.eq=java,Args.*.eq=com.vmware.dr2. DRServiceDashboard

State.Name.eq=mongod

Architecting a VMware vCloud Availability for vCloud Director Solution

75 | VMware vCloud® Architecture Toolkit™ for Service Providers

Figure 34. Example of Process Configuration