VMworld 2013: Part 2: How to Build a Self-Healing Data Center with vCenter Orchestrator

Post on 19-Jun-2015


Part 2: How to Build a Self-Healing Data Center with vCenter Orchestrator

Nicholas Colyer, Catamaran RX

Dan Mitchell, VMware

VCM5695

#VCM5695


Session Agenda

• vCenter Orchestrator Overview: A quick look at the vCenter Orchestrator platform
• VMware Example: vCenter Operations Manager Remediation package, using vCenter Orchestrator and vCenter Operations Manager
• Customer Example: Real-world use cases addressed by one customer using vCenter Orchestrator
• Partner Example: vCenter Orchestrator plug-ins by partners like EMC that address common use cases for remediation


Key Takeaways

1. Understand the concept of the self-healing data center and how vCenter Orchestrator supports it
2. Hear from a customer regarding their experiences today and how they will continue to take advantage of vCO remediation capabilities
3. Advice, considerations and implementation tips for real-world use cases


VMware vCenter Orchestrator Product Overview


vCenter Orchestrator Overview

Features
• Drag-and-drop design: Create powerful workflows easily by dragging and dropping pre-built actions
• Cloud scalability: Execute hundreds to thousands of workflows in parallel to meet cloud scale
• Flexible triggers: Launch workflows from the vSphere Web Client, vCloud Automation Center, a browser, a schedule, an event, or the API
• Automate VMware: 100% coverage of vSphere and vCloud APIs; unmatched VMware content

Key Benefits
• Integrate VMware solutions into your IT environment and processes
• Reduce IT OpEx and total cost of ownership of VMware solutions
• Automate your cloud and accelerate the transition to an “IT as a Service” model

Platform
• Plug-ins ecosystem
• Included with vCenter Server: included with vSphere at no extra cost; installed with vCenter out of the box (OOTB)
• Fully integrated with vCAC: trigger vCO workflows from vCAC; use vCO to configure and extend vCAC


vCO Workflow Designer

• Drag and drop actions

• Conditional logic

• Pause, wait until, counters, etc.

• Exception handling

• Version control

• Role-based access control

• And more ...

~500 workflows and actions for vCenter Server and vCloud Director


High-level vCO Product Architecture

• vCO Platform (Access points): Designer (Windows, Mac & Linux); Web Services (SOAP, REST); Operator (vSphere Web Client)
• vCO Platform (Engine, 64-bit): Workflow Engine, Workflow Library, Webview Library
• Database: Oracle, MS SQL Server, PostgreSQL
• vCO Plug-Ins: connect to Management Systems and IT Infrastructure
• External: vCloud Automation Center, Service Catalogs, AMQP, SNMP, Notifications


Thousands of Out of the Box Workflows & Actions

VMware Applications
• vCenter Server 4.0, 4.1, 5.0 & 5.1
• vCloud Director 1.0, 1.5 & 5.1
• vCloud Automation Center 5.1 & 5.2
• vCenter Update Manager 4.1, 5.0 & 5.1
• vCenter Chargeback 2.0
• vCenter Configuration Manager 5.5
• vCenter Orchestrator Multi-node 5.0 & 5.1
• vSphere Auto Deploy
• VMware Service Manager 9.1
• VMware Service Elasticity
• Microsoft AD & PowerShell

Standard Protocols
• AMQP / RabbitMQ
• Email (POP3)
• Email (SMTP)
• HTTP-REST
• JDBC
• SOAP
• SNMP v1, v2c, v3
• SQL
• SSH
• Telnet
• XML

Partner Applications
• BMC Atrium CMDB & Remedy – NEW
• EMC Unified Infrastructure Manager – NEW
• Infoblox NIOS – UPDATED
• Egenera PAN Manager – NEW
• Radware vDirect
• ServiceNow
• Up.time Software

Upcoming releases
• F5 Networks BigIP – NEW
• EMC ViPR – NEW
• Cisco UCS Manager 2.x – NEW
• NetApp storage
• Bluecat Networks
• VMware vCenter Network and Security
• VMware Site Recovery Manager
• HP ServiceManager


Optimized for Growing Clouds: Orchestration HA and dynamic elasticity!

Overview
• Improve scalability & availability
• Built-in HA & clustering
• Support external load balancers
• Extend the vCO REST API to vCO server installation and vCO server configuration

Benefits
• Provide higher availability
• Scale orchestration capacity along with the growth of your cloud
• Enable dynamic scale-up and scale-down of orchestration capacity


Primary Role & Use Cases for vCenter Orchestrator

VMware Cloud Automation
• vCloud Automation Center (IaaS & DaaS Automation)
• vCenter Orchestrator (IT Process Automation)
• Infrastructure Integration: CMDB, DNS, IPAM, Load Balancers, Service Desk, Monitoring Systems, Databases, Web Services, etc.
• Fabric Management Automation

Some Use Cases:
• Automation of vSphere administrative tasks
• Remediation of infrastructure failures
• Automation of general IT admin tasks


VMware Example – vCenter Operations Manager with vCenter Orchestrator: Automated Remediation


vCenter Operations Remediation Workflow Package

What is its purpose?
• The purpose of the vCenter Operations Manager Remediation Workflow Package is to launch remediation workflows in vCenter Orchestrator in response to alerts received from vCenter Operations

Requirements on which the solution is based
• Launch workflows when vCenter Operations alerts are received
• The solution should be simple and should not require any programming or scripting from the user
• The user should be able to launch any workflow, from the library or of his/her own creation, in response to an alert
• It should be easily configurable
• The user should be able to filter the incoming events based on different alert properties


vCenter Operations Remediation Workflow Package

What do I need to use it?

• vCenter Orchestrator virtual appliance (v5.1 or later)

• vCenter Orchestrator SNMP plugin

• vCenter Operations integration package

• vCenter Operations Manager

How does it work?

• vCenter Operations Manager sends SNMP traps to vCenter Orchestrator

• vCenter Orchestrator acts on the appropriate traps by executing workflows


vCenter Orchestrator + vCOps Remediation

1. vCenter health and operational data is continually passed to vCOps for analysis
2. When vCOps identifies an operational issue, it sends an SNMP trap to vCO, triggering a vCO Policy to process the trap
3. vCO verifies the incoming trap is mapped to an alert definition
4. vCO verifies there are filter conditions defined for the trap
5. vCO launches the appropriate remediation workflow
6. The vCO remediation workflow corrects the operational issue
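Steps 3 to 5 boil down to a lookup followed by a filter check. The sketch below models that decision in plain JavaScript, outside vCO; the `alertDefinitions` structure, the property names (`alertType`, `resourceKind`, `criticality`) and the workflow name are hypothetical and do not reflect the package's actual configuration format.

```javascript
// Hypothetical alert definitions: each maps an alert type to a remediation
// workflow plus filter conditions on alert properties (not the package's real format).
const alertDefinitions = [
  {
    alertType: "capacity",
    workflow: "Datastore capacity remediation",
    filters: [
      { property: "resourceKind", equals: "Datastore" },
      { property: "criticality", atLeast: 3 },
    ],
  },
];

// Evaluate one filter condition against the alert's properties.
function passesFilter(alert, filter) {
  const value = alert[filter.property];
  if ("equals" in filter) return value === filter.equals;
  if ("atLeast" in filter) return typeof value === "number" && value >= filter.atLeast;
  return false; // unknown condition type: fail closed rather than launch a workflow
}

// Return the workflow to launch for an alert, or null if nothing should run.
function selectWorkflow(alert, definitions) {
  // Step 3: is the incoming trap mapped to an alert definition?
  const definition = definitions.find((d) => d.alertType === alert.alertType);
  if (!definition) return null;
  // Step 4: do all filter conditions defined for it hold?
  if (!definition.filters.every((f) => passesFilter(alert, f))) return null;
  // Step 5: this is the workflow to launch.
  return definition.workflow;
}
```

For example, a capacity alert on a datastore with criticality 4 selects "Datastore capacity remediation", while the same alert with criticality 2, or an alert type with no definition, selects nothing.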


Example Use Case: Identify a Datastore Capacity Issue

(Screenshot callout: datastore running out of capacity)


Example Use Case: Identify Powered Off VMs

(Screenshot callout: powered-off VMs on the datastore)


vCenter Operations Alerts Trigger Outbound Notification

(Screenshot callout: alerts trigger outbound notification via email and SNMP traps)


vCenter Orchestrator SNMP Trap Policy Workflow

(Screenshot callouts: the SNMP trap policy workflow; sample code that starts the remediation workflow if a capacity remaining alert is received)
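The sample code in the screenshot is not reproduced in the transcript. As a rough stand-in, the sketch below shows the kind of check such a policy makes: inspect the alert carried by the trap and start the remediation workflow only for a capacity remaining alert. It assumes the trap's variable bindings have already been decoded into a plain object; the field names `alertName` and `resourceName`, the workflow name, and the `startWorkflow` callback are all hypothetical.

```javascript
// Hypothetical trap object: varbinds already decoded into { field: value }.
// Field names are illustrative, not taken from the vCenter Operations MIB.
function onTrap(trap, startWorkflow) {
  const alertName = String(trap.alertName || "");
  // Only capacity-remaining alerts are remediated by this policy.
  if (!/capacity remaining/i.test(alertName)) {
    return false;
  }
  // Hand the affected datastore to the remediation workflow.
  startWorkflow("Datastore capacity remediation", { datastore: trap.resourceName });
  return true;
}
```

Passing the workflow launcher in as a callback keeps the decision logic testable on its own; in vCO the equivalent step would start the real remediation workflow.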


Automate Remediation Using vCenter Orchestrator Workflows

(Screenshot callouts: a workflow that lists powered-off VMs and VM snapshots to resolve the capacity issue; a step that prepares a report and sends an email notification)
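The report-building half of that workflow can be sketched in plain JavaScript. This assumes the VM inventory has already been collected into simple records; the record shape and the report wording are hypothetical, and sending the mail is left to the workflow's own email step.

```javascript
// Hypothetical VM record shape (plain data, not a vCO scripting type):
//   { name, powerState, datastore, snapshots: [{ name, sizeGB }] }
// powerState uses the vSphere values "poweredOn", "poweredOff", "suspended".
function buildCapacityReport(vms, datastoreName) {
  const onDatastore = vms.filter((vm) => vm.datastore === datastoreName);
  const poweredOff = onDatastore.filter((vm) => vm.powerState === "poweredOff");
  const withSnapshots = onDatastore.filter((vm) => vm.snapshots.length > 0);

  const lines = [`Cleanup candidates on datastore ${datastoreName}`, ""];
  lines.push(`Powered-off VMs (${poweredOff.length}):`);
  for (const vm of poweredOff) lines.push(`  - ${vm.name}`);

  // List every snapshot and total the space they occupy.
  let snapshotGB = 0;
  lines.push("", `VMs with snapshots (${withSnapshots.length}):`);
  for (const vm of withSnapshots) {
    for (const snap of vm.snapshots) {
      lines.push(`  - ${vm.name}: ${snap.name} (${snap.sizeGB} GB)`);
      snapshotGB += snap.sizeGB;
    }
  }
  lines.push("", `Total snapshot space: ${snapshotGB} GB`);

  return {
    poweredOff: poweredOff.map((vm) => vm.name),
    snapshotGB,
    body: lines.join("\n"), // text for the email notification
  };
}
```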


Email Notification from the Datastore Remediation Workflow

(Screenshot callout: email listing powered-off VMs and associated snapshots)


Customer Example – CatamaranRX

Nick Colyer, Team Lead – Server Engineering, CatamaranRX


Customer Examples of Automation – Nick Colyer, CatamaranRX

Who is Nick Colyer?
• Brief History
• Blog: v-nick.com
• Twitter: @vNickC

How I got into automation

My Examples:
• Example #1: Self Healing: Automating Configurations
• Example #2: Self Healing: Automating Incident Responses


Example 1: Automating Configuration of HA and DRS Settings

Start with a Goal in mind:
“I want to make sure that my ESXi clusters are checked every day to ensure HA is on and DRS is fully automated.”

Customer Example 1 – Automating Configuration for HA / DRS


Break it down - HA Settings

1. Admission Control settings
2. Enable Host Monitoring


Break it down - DRS Settings

3. Set DRS to Fully Automated
4. Ensure other settings (affinity rules, etc.) remain!


Customer Example 1 - Building the Workflow

1. Feed in clusters
2. Run corrective action
3. Repeat for every cluster in your environment
4. Schedule the workflow to run every night
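Steps 1 to 3 form a simple loop. The plain-JavaScript sketch below adds one design choice the slide does not spell out: a failure on one cluster is recorded and the loop moves on, so a single bad cluster does not stop the nightly run. `clusters` and `correct` are hypothetical stand-ins for the vCenter cluster objects and the reusable HA/DRS action.

```javascript
// Sketch of "feed in clusters, run the corrective action, repeat for every
// cluster". `clusters` and `correct` are hypothetical stand-ins for the vCenter
// cluster objects and the reusable HA/DRS action.
function remediateAllClusters(clusters, correct) {
  const report = { fixed: [], failed: [] };
  for (const cluster of clusters) {
    try {
      correct(cluster); // e.g. enable HA/DRS on this cluster
      report.fixed.push(cluster.name);
    } catch (err) {
      // Record the failure and keep going with the remaining clusters.
      report.failed.push({
        name: cluster.name,
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return report;
}
```

The returned report can feed an email notification; the nightly schedule itself (step 4) is configured in vCO rather than in the script.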


Customer Example 1 - Create a Reusable Action Item

Put the logic in scriptable tasks within a workflow, or in an action.


Action Item: Enable HA/DRS Javascript

1. Calculate HA % based on number of hosts

//Get all the hosts in the cluster
var Hosts = System.getModule("com.vmware.library.vc.cluster").getAllHostSystemsOfCluster(cluster);
System.log("Number of Hosts in Cluster: " + Hosts.length);
//Calculate HA percentage to tolerate 1 host worth of resources being offline
//Math.round keeps the value numeric (toFixed would return a string)
var HApercent = Math.round((1 / Hosts.length) * 100);
//Log it (variable names are case-sensitive: HApercent, not Hapercent)
System.log("HA Percent which will be used for cluster is: " + HApercent);
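One detail worth checking in the calculation above: rounding to the nearest whole number (whether via `toFixed(0)` or `Math.round`) means that with three hosts the cluster reserves 33% while one host's share is 33.3%, slightly less than a full host. The plain-JavaScript sketch below rounds up instead and generalizes to tolerating more than one host failure. `haReservePercent` is a hypothetical helper, not part of the vCO library, and it assumes equally sized hosts.

```javascript
// Hypothetical helper (not a vCO library action): percentage of cluster
// resources to reserve so the cluster survives `failures` hosts going offline.
// Assumes all hosts contribute equal CPU and memory.
function haReservePercent(hostCount, failures = 1) {
  if (!Number.isInteger(hostCount) || hostCount < 1) {
    throw new RangeError("hostCount must be a positive integer");
  }
  if (!Number.isInteger(failures) || failures < 1 || failures >= hostCount) {
    throw new RangeError("failures must be between 1 and hostCount - 1");
  }
  // Integer arithmetic first, then round UP so the reserve is never short.
  return Math.ceil((failures * 100) / hostCount);
}
```

With 3 hosts this gives 34 rather than 33; with 4 hosts it gives exactly 25.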


2. Turn on HA and DRS (partial code)

var clusterConfigSpec = new VcClusterConfigSpecEx();
clusterConfigSpec.drsConfig = new VcClusterDrsConfigInfo();
clusterConfigSpec.dasConfig = new VcClusterDasConfigInfo();
//Enable DRS/HA
System.log("Setting HA and DRS to Enabled (even if they were already)");
clusterConfigSpec.dasConfig.enabled = true;
clusterConfigSpec.drsConfig.enabled = true;
//Reconfigure the cluster. Passing true as the second (modify) argument applies
//the spec incrementally, so any previous settings remain
System.log("Executing Cluster Reconfiguration for " + cluster.name);
var task = cluster.reconfigureComputeResource_Task(clusterConfigSpec, true);

IMPORTANT!
If you don’t pass the true option, it will remove all your other existing HA/DRS settings, e.g. affinity rules.
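To see why the `true` argument matters, here is a toy model in plain JavaScript of the two behaviors the vSphere API documents for the `modify` flag of ReconfigureComputeResource_Task: the spec is applied incrementally when it is true, and the configuration is made to match the spec exactly (unset parts reverting to defaults) when it is false. The object shapes and default values are hypothetical simplifications, not real vSphere types.

```javascript
// Conceptual model only, not the vSphere API: how a reconfigure call treats a
// partial spec depending on the `modify` flag. Settings are plain objects here.
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Overlay the fields that `spec` sets onto a shallow copy of `base`,
// recursing into nested objects.
function overlay(base, spec) {
  const result = { ...base };
  for (const [key, value] of Object.entries(spec)) {
    result[key] =
      isPlainObject(value) && isPlainObject(base[key]) ? overlay(base[key], value) : value;
  }
  return result;
}

// modify = true: incremental change on top of the current configuration.
// modify = false: the result matches the spec, and anything it leaves unset
// falls back to defaults.
function applySpec(defaults, current, spec, modify) {
  return modify ? overlay(current, spec) : overlay(defaults, spec);
}
```

Applying a spec that only enables HA and DRS keeps an existing affinity rule and the "fullyAutomated" DRS behavior when `modify` is true, but wipes both back to defaults when it is false.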


Example 2: Automation in response to an event

Start with a Goal in mind:
“Enable repeatable scripted actions to be initiated in response to an SNMP Trap”


Customer Example 2: Breaking it Down…

1. vCenter critical alarm for a datastore over 95% full
2. Send trap to vCO
3. Run Storage DRS on the storage pool


Customer Example 2: How Do We Achieve This…

1. Configure the SNMP trap receiver on vCenter Orchestrator
• http://blogs.vmware.com/orchestrator/2011/09/snmp-plug-in-integration-with-vcenter.html
• http://www.vcoportal.de/2012/05/integrate-vcops-and-vco/
2. Create a workflow that interprets traps
3. Create workflows for repeatable automated corrective actions:
a. Locate the datastore cluster that the datastore is a member of
b. Execute SDRS
c. Expand on it further: auto-provision a LUN from the SAN


Customer Example 2: Master Workflow That Feeds Corrective Action Workflows

1. Scriptable task to interpret trap data
2. Does the trap contain something we know how to handle?
3. Run corrective action
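The master workflow's decision can be sketched as a lookup table plus a fallback. The plain-JavaScript sketch below assumes step 1 has already decoded the trap into `alarmName` and `targetName` fields; those field names, the table entry, and the `runWorkflow`/`notify` callbacks are hypothetical. Unknown alarms produce a notification rather than a guess, in line with the "know when to give up" advice later in the session.

```javascript
// Hypothetical table of trap alarms the master workflow knows how to handle,
// mapped to the corrective-action workflow to run. Names are illustrative.
const correctiveActions = new Map([
  ["Datastore usage on disk", "Run SDRS on storage pool"],
]);

// Step 1 is assumed done: the trap has been decoded into { alarmName, targetName }.
function dispatchTrap(trap, runWorkflow, notify) {
  // Step 2: does the trap contain something we know how to handle?
  const workflow = correctiveActions.get(trap.alarmName);
  if (workflow === undefined) {
    // Unknown alarm: ask a human instead of guessing.
    notify(`No automated fix for "${trap.alarmName}" on ${trap.targetName}; manual intervention required`);
    return false;
  }
  // Step 3: run the corrective action against the object named in the trap.
  runWorkflow(workflow, { target: trap.targetName });
  return true;
}
```

Adding a new self-healing case then means adding one table entry and one corrective-action workflow, without touching the dispatch logic.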


Customer Example 2: Run SDRS Workflow in Detail

1. Search vCenter for a datastore with the same name as the one in the trap.
2. Check SDRS pods to see if they contain the datastore object.
3. Refresh Storage DRS recommendations:
task = m.refreshStorageDrsRecommendation(podToRunSDRSOn)

Full script on my web site: v-nick.com
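Steps 1 and 2 can be sketched against plain inventory data. The shapes below (`datastores` with `id`/`name`, `pods` with `datastoreIds`) are hypothetical stand-ins for vCenter objects, and the sketch is not the script published on v-nick.com. One design choice is added: because a datastore name can appear more than once in a large inventory (for example across datacenters), an ambiguous name returns no pod rather than picking one arbitrarily.

```javascript
// Hypothetical inventory snapshots standing in for vCenter objects:
//   datastores: [{ id, name }]
//   pods:       [{ name, datastoreIds: [...] }]   (SDRS pods / datastore clusters)
function findPodForDatastore(datastoreName, datastores, pods) {
  // Step 1: find the datastore whose name matches the one in the trap.
  const matchesByName = datastores.filter((d) => d.name === datastoreName);
  if (matchesByName.length !== 1) {
    return null; // not found, or the name is ambiguous: do not guess
  }
  const datastore = matchesByName[0];
  // Step 2: find the pod that contains that datastore object.
  const pod = pods.find((p) => p.datastoreIds.includes(datastore.id));
  return pod || null; // step 3 would refresh SDRS recommendations on this pod
}
```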


Taking It to the Next Level…

1. Instead of just running SDRS, create a workflow to auto-provision storage from the array when the space left in an SDRS pool drops below a threshold
2. Have a workflow that automatically creates the change order, but waits for someone to actually release the workflow
3. Corrective actions from other monitoring systems
• e.g. SolarWinds/SCOM alerts when a Windows 2008 Server drive falls below a critical amount of free space.
• vCO can automatically expand the disk in vSphere, and then expand it inside the OS.
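For the first idea, an auto-provisioning workflow needs to decide how much to add. One possible rule, offered as an assumption rather than anything from the session, is to add just enough capacity to bring the pool back to a target free percentage, treating new capacity as entirely free. With free space f, capacity c and target fraction t, solving (f + x) / (c + x) ≥ t gives x ≥ (t·c − f) / (1 − t). For a 1000 GB pool with 50 GB free and a 20% target, that is 150 / 0.8 = 187.5, rounded up to 188 GB. `gbToProvision` is a hypothetical helper; it works in whole percentages to keep the arithmetic exact.

```javascript
// Hypothetical sizing rule for "auto-provision storage when space runs low":
// how many GB to add to an SDRS pool so its free space returns to
// targetFreePercent, assuming all newly added capacity is free.
// Solves (free + x) / (capacity + x) >= target / 100 for the smallest whole x.
function gbToProvision(capacityGB, freeGB, targetFreePercent) {
  if (!(targetFreePercent > 0 && targetFreePercent < 100)) {
    throw new RangeError("targetFreePercent must be between 0 and 100");
  }
  if (freeGB * 100 >= targetFreePercent * capacityGB) {
    return 0; // already at or above the target
  }
  const x = (targetFreePercent * capacityGB - 100 * freeGB) / (100 - targetFreePercent);
  return Math.ceil(x);
}
```

In a real workflow the result would likely be rounded up again to whatever LUN sizes the array offers.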


Being Successful in a Corporate Environment

How do you start?
• Get upper leadership to buy into the idea of automation.
• Standardize > Write Procedures > Automate
• Adopt an automate-first approach

Develop a team that will become “Stewards” of vCO
• Empower others to automate

Keep it simple
• Re-use existing code
• Look at the built-in workflows

Know what other tools in your environment can integrate with vCO (e.g. ServiceNow)


Partner Example – EMC Unified Infrastructure Manager


vCenter Orchestrator + EMC Unified Infrastructure Manager plug-in

Use Case 1: vSphere Cluster capacity at maximum, need to add host

• Virtual machines are running slow and you find out hosts are overloaded and running low on CPU and memory
• VMware administrator can initiate adding a new server to the UIM VDI service, making it available as a new host to vCenter, either through the vCO interface or through the vSphere web client


vCenter Orchestrator + EMC Unified Infrastructure Manager plug-in

Use Case 2: Low remaining capacity on Oracle database server

• An Oracle database is running out of storage, which could impact availability of the production applications
• VMware administrator can initiate adding additional storage array LUNs to the UIM Oracle service, making them available as datastores within the vCenter cluster, either through the vCO interface or through the vSphere web client


Advice, Considerations and Tips

Map out your process
• Before trying to automate anything, map out how YOU would fix the problem, step by step

Factor in alert storms
• Design your workflows to be aware of their active instances to prevent overlap

Know when to give up
• Remediation workflows only know as much as you teach them. If fixing an issue goes beyond the capabilities of your workflows, add notifications to let you know when manual intervention is required

Establish credibility with the low-hanging fruit
• Start out by fixing the easy stuff: stray snapshots, remounting of datastores

Don’t reinvent the wheel!
• Leverage the established community of vCenter Orchestrator experts; many have example workflows and packages to offer!
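For the alert-storm advice, the core logic is "one active remediation per target". The sketch below keeps that state in an in-memory set keyed by workflow and target; in vCO you would more likely check the workflow's running instances, so `RemediationGuard` and its key format are hypothetical and only illustrate the decision.

```javascript
// Hypothetical in-memory guard keyed by (workflow, target). In vCO you would
// check the workflow's running instances instead; this only shows the logic.
class RemediationGuard {
  constructor() {
    this.active = new Set();
  }

  key(workflow, target) {
    return `${workflow}::${target}`;
  }

  // Returns true if the caller may start remediation, false if one is already running.
  tryStart(workflow, target) {
    const k = this.key(workflow, target);
    if (this.active.has(k)) return false; // duplicate alert in a storm: skip
    this.active.add(k);
    return true;
  }

  // Call when the remediation finishes (successfully or not) to release the target.
  finish(workflow, target) {
    this.active.delete(this.key(workflow, target));
  }
}
```

Keying on the target rather than on the workflow alone still lets the same remediation run in parallel for different datastores while suppressing repeats for the one already being fixed.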



Questions?


VMworld on Social Media

@startswithv – Dan M

#CloudMgmt

#CloudAutomation

#VMworld


Summary: SDDC Delivers Transformational Benefits

• Choice: Any App Anywhere. Support for over 500 ISV solutions and 80 operating systems
• Cloud Service Provider Economics: Reduce IT capex by 75% and opex by 56%*
• Control: Cloud on Your Terms. Reduce downtime for tier 1 applications by 36%*
• Agility: Apps at Business Speed. Increase IT productivity by 67%*

* Claims being validated by the Taneja Group (final numbers expected August 2013).

Start Your Journey with the VMware SDDC Today


Other VMware Activities Related to This Session

HOL: HOL-SDC-1307 vCloud Automation Solutions

VCM5695

THANK YOU
