HA & DRS Gotchas

HA & DRS Gotchas that will Kill your Infrastructure. Greg Shields, Partner and Principal Technologist, Concentrated Technology, www.ConcentratedTech.com


 

Transcript of HA & DRS Gotchas

Page 1: Ha & drs gotcha's

HA & DRS Gotchas that will Kill your Infrastructure

Greg Shields
Partner and Principal Technologist
Concentrated Technology
www.ConcentratedTech.com

Page 2: Ha & drs gotcha's

This slide deck was used in one of our many conference presentations. We hope you enjoy it, and invite you to use it within your own organization however you like.

For more information on our company, including information on private classes and upcoming conference appearances, please visit our Web site, www.ConcentratedTech.com.

For links to newly posted decks, follow us on Twitter: @concentrateddon or @concentratdgreg

This work is copyright ©Concentrated Technology, LLC

Page 4: Ha & drs gotcha's

Class Discussion

Two Questions:

So… how often do you actually vMotion your virtual machines?
– Or, how often would you, if you haven’t yet deployed ESX?

What failure states does vMotion protect against?
– Hint: They’re fewer than you’d think…

Page 5: Ha & drs gotcha's

vMotion Solves Two Problems

Problem #1: Protection from Host Failures or Scheduled Host Downtime

(These are relatively rare)

(They will get even rarer as we migrate to ESXi)

Page 6: Ha & drs gotcha's

vMotion Solves Two Problems

[Diagram: live migration of a VM from an overloaded virtual host to an underloaded virtual host; both hosts connect to shared storage and the network]

Problem #2: Load Balancing of VM & Host Resources

(Much more common, where turned on)

Page 7: Ha & drs gotcha's

Costs vs. Benefits

High availability adds dramatically greater uptime for virtual machines.
– Protection against host failures
– Protection against resource overuse
– Protection against scheduled/unscheduled downtime

High availability also adds much greater cost…
– Shared storage between hosts
– Connectivity
– Higher (and more expensive) software editions

Not every environment needs HA!
– Does anyone want to argue this point?

Page 8: Ha & drs gotcha's

Contrary to Popular Belief…

…seeing the actual vMotion process occur isn’t all that sexy.

DEMO: Watching a vMotion occur…

Page 11: Ha & drs gotcha's

vMotion: What Really Happens?

Sexy is recognizing what’s going on under the hood.

Let’s compare to Hyper-V’s first release.
– Remember those days? A Hyper-V VM could be relocated “with a minimum of downtime.”

– This downtime was directly related to the amount of memory assigned to the virtual machine and the connection speed between virtual hosts and shared storage.

– A VM with 2 GB of vRAM could take 32 seconds or longer to migrate!

– Virtual machines with more assigned virtual memory and/or slower networks took longer to complete.

– Those with less completed the migration faster.
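A rough way to see why memory size and link speed drove Quick Migration downtime: in this simplified model (an assumption, not a measurement), the VM is offline while its memory is written to shared storage and then read back by the target host, so downtime is about twice the memory size divided by the storage path's throughput. The function name and the throughput figure are hypothetical.

```python
def quick_migration_downtime_s(vram_mb: float, throughput_mb_s: float) -> float:
    """Estimate Quick Migration downtime in seconds (simplified, hypothetical model).

    Assumes the saved state is written to shared storage and then read back
    by the target host at the same throughput, with no overlap between the two.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    write_s = vram_mb / throughput_mb_s   # save memory pages to disk on the source
    read_s = vram_mb / throughput_mb_s    # restore them on the target host
    return write_s + read_s


if __name__ == "__main__":
    # 2 GB of vRAM over a path sustaining roughly 128 MB/s (about a 1 Gbps link)
    print(quick_migration_downtime_s(2048, 128))  # 32.0
```

Under these assumptions, the 2 GB / 32-second figure corresponds to roughly 128 MB/s of sustained throughput, and doubling vRAM or halving throughput doubles the outage.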

Page 12: Ha & drs gotcha's

vMotion: What Really Happens?

So… how did it work?
– When you invoked a Hyper-V “Quick Migration,” the virtual machine was immediately put into a saved state.

– That state was not a power down, nor was it the same as pausing.

– In the saved state, and unlike pausing, the virtual machine released its memory reservation on the host machine and stored the contents of its memory pages to disk.

– Once this completed, a target host could take over ownership of the virtual machine and bring it back into operation.

Page 13: Ha & drs gotcha's

vMotion: What Really Happens?

[Diagram: live migration of a VM from an overloaded virtual host to an underloaded virtual host; both hosts connect to shared storage and the network]

Page 15: Ha & drs gotcha's

vMotion: What Really Happens?

This saving of virtual machine state and transferring of memory contents consumed downtime.

So, how is this different from vMotion (& Hyper-V today)?
– vMotion and today’s Hyper-V use a pre-copy mechanism.

– It transfers VM memory pages from the source to the target host while the VM is still running, before the switchover.

– During the pre-copy, memory changes are logged.

– These changes tend to be relatively small in size.

– Once the initial copy has completed, vMotion then…

…pauses the virtual machine
…copies the memory deltas
…notifies network switches (RARP) & the fibre fabric
…transfers ownership to the target host

The result: Effectively “zero” downtime.
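To see why the final pause is so short, here is a toy simulation of iterative pre-copy. vMotion repeats the pre-copy in several passes, re-sending the pages dirtied during the previous pass, until the remainder is small enough to send during a brief pause. The page counts, rates, and stopping threshold below are hypothetical, and real hypervisors use more elaborate convergence logic than this sketch.

```python
from dataclasses import dataclass


@dataclass
class PreCopyResult:
    passes: int          # live copy passes performed while the VM kept running
    downtime_s: float    # pause needed to send the last dirty pages


def simulate_precopy(total_pages: int, copy_rate: float, dirty_rate: float,
                     stop_pages: int = 1000, max_passes: int = 10) -> PreCopyResult:
    """Toy model of iterative pre-copy; rates are in pages per second."""
    to_copy = float(total_pages)          # the first pass sends every page
    passes = 0
    while passes < max_passes:
        elapsed = to_copy / copy_rate     # the VM keeps running during this pass
        passes += 1
        # Pages dirtied during the pass must be re-sent in the next one.
        to_copy = min(float(total_pages), dirty_rate * elapsed)
        if to_copy <= stop_pages:         # small enough: pause and send the rest
            break
    # If dirty_rate >= copy_rate the remainder never shrinks; max_passes forces the cutover.
    return PreCopyResult(passes=passes, downtime_s=to_copy / copy_rate)


if __name__ == "__main__":
    # 2 GB of 4 KB pages, ~30,000 pages/s link, VM dirtying 3,000 pages/s
    result = simulate_precopy(524_288, 30_000, 3_000)
    print(f"{result.passes} passes, {result.downtime_s * 1000:.1f} ms paused")
    # 3 passes, 17.5 ms paused
```

Each pass shrinks the remainder by the ratio of dirty rate to copy rate, which is why a VM that rewrites memory faster than the link can carry it is the classic case where live migration struggles.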

Page 16: Ha & drs gotcha's

vMotion: Gotchas

Successful vMotion requires similar processors.
– Processors must be from the same manufacturer. No Intel-to-AMD or AMD-to-Intel vMotioning.
– Processors must be of proximate families.
– This bites people a few years down the road all the time!
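A simplified way to think about the processor check: a running VM has already seen the source host's CPU features, so the target must come from the same vendor and offer at least those features. The Host type, host names, and feature-flag sets here are hypothetical; vCenter's real check compares CPUID feature bits and is more detailed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    vendor: str                  # "Intel" or "AMD"
    features: frozenset[str]     # CPU feature flags the VM can see (illustrative)


def can_vmotion(source: Host, target: Host) -> tuple[bool, str]:
    """Simplified check for moving a running VM from source to target."""
    if source.vendor != target.vendor:
        return False, f"vendor mismatch: {source.vendor} -> {target.vendor}"
    # A running VM may already be using any feature the source exposed.
    missing = source.features - target.features
    if missing:
        return False, "target lacks: " + ", ".join(sorted(missing))
    return True, "compatible"


if __name__ == "__main__":
    old = Host("esx01", "Intel", frozenset({"sse2", "sse3", "ssse3"}))
    new = Host("esx05", "Intel", frozenset({"sse2", "sse3", "ssse3", "sse4_1", "sse4_2"}))
    print(can_vmotion(old, new))  # (True, 'compatible')
    print(can_vmotion(new, old))  # (False, 'target lacks: sse4_1, sse4_2')
```

The asymmetry is the point: VMs move from old hosts to new ones without complaint, but VMs booted on the newest hosts can't move back, which is how the problem surfaces years after the purchase.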

Page 17: Ha & drs gotcha's

vMotion: Gotchas

Page 21: Ha & drs gotcha's

vMotion: Gotchas

Big problem: As a virtual environment ages, hardware is refreshed and new hardware is added.
– New servers sometimes create “islands” of vMotion capability.

How can we always vMotion between computers?

– You can always refresh all hardware at the same time (Har!)
– You can cold migrate, with the machine powered down. This always works, but ain’t all that friendly.
– You can use Enhanced vMotion Compatibility (EVC) to manage your vMotion-ability. Create islands as individual clusters.

DEMO: vMotion EVC
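EVC works by masking CPU features so that every host in a cluster presents the same baseline to VMs. The sketch below approximates that baseline as the intersection of the hosts' feature sets. That is a simplification: real EVC offers a fixed list of VMware-defined, vendor-specific baseline modes rather than computing an intersection, and it never bridges Intel and AMD. Host names and flags are hypothetical.

```python
def evc_baseline(host_features: dict[str, set[str]]) -> set[str]:
    """Features every host in the cluster supports (simplified stand-in for an EVC mode)."""
    if not host_features:
        return set()
    return set.intersection(*(set(f) for f in host_features.values()))


def can_join(baseline: set[str], new_host_features: set[str]) -> bool:
    """A host may join only if it offers every feature the baseline exposes to VMs."""
    return baseline <= new_host_features


if __name__ == "__main__":
    cluster = {
        "esx01": {"sse2", "sse3"},
        "esx02": {"sse2", "sse3", "ssse3"},
        "esx03": {"sse2", "sse3", "ssse3", "sse4_1"},
    }
    baseline = evc_baseline(cluster)
    print(sorted(baseline))                                # ['sse2', 'sse3']
    print(can_join(baseline, {"sse2", "sse3", "aes"}))     # True
```

Because every VM in the cluster only ever sees the baseline, and every host provides it, any VM can land on any host. Putting each hardware generation in its own cluster, as suggested above, keeps one old host from dragging the baseline down for everybody.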

Page 23: Ha & drs gotcha's

Class Discussion

Blades versus servers has been a common battle in virtualization spaces.
– Even I hated them for years.
– Then, I saw the light. <Cue angels singing>

Why are blades and EVC perfect for each other?

How are blades and Private Clouds perfect for each other?

Page 24: Ha & drs gotcha's

New Topic!
vMotion vs. Storage vMotion

Page 27: Ha & drs gotcha's

vMotion vs. Storage vMotion

Three options for migrating machines and disks:
– Online & Powered On: vMotion
– Online & Powered On: Storage vMotion
– Offline & Powered Off: vMotion + Storage vMotion at the same time

Page 28: Ha & drs gotcha's

vMotion vs. Storage vMotion

Requirements:
– Virtual machines with snapshots cannot be svMotioned.
– Virtual machine disks must be in persistent mode or be RDMs.
– The host must have sufficient resources to support two instances of the VM running concurrently for a brief time.
– The host must have a vMotion license, and be correctly configured for vMotion.
– The host must have access to both the source and target datastores.
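The requirements above read naturally as a pre-flight checklist. This sketch encodes them over hypothetical VM, Disk, and Host records (none of these are vSphere API types) and treats "sufficient resources" as simply having free memory and CPU at least equal to the VM's footprint, which is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    mode: str              # e.g. "persistent" or "independent-nonpersistent" (illustrative labels)
    is_rdm: bool = False


@dataclass
class VM:
    name: str
    memory_mb: int
    cpu_mhz: int
    has_snapshots: bool = False
    disks: list[Disk] = field(default_factory=list)


@dataclass
class Host:
    free_memory_mb: int
    free_cpu_mhz: int
    vmotion_licensed: bool
    vmotion_configured: bool
    datastores: set[str] = field(default_factory=set)


def svmotion_blockers(vm: VM, host: Host, source_ds: str, target_ds: str) -> list[str]:
    """Return the reasons this Storage vMotion would be refused (empty list = OK)."""
    problems: list[str] = []
    if vm.has_snapshots:
        problems.append("VM has snapshots")
    for i, disk in enumerate(vm.disks):
        if disk.mode != "persistent" and not disk.is_rdm:
            problems.append(f"disk {i} is in {disk.mode} mode")
    # Two instances of the VM run briefly, so the host needs room for a second copy.
    if host.free_memory_mb < vm.memory_mb or host.free_cpu_mhz < vm.cpu_mhz:
        problems.append("host lacks headroom for a second VM instance")
    if not (host.vmotion_licensed and host.vmotion_configured):
        problems.append("host is not licensed and configured for vMotion")
    for ds in (source_ds, target_ds):
        if ds not in host.datastores:
            problems.append(f"host cannot see datastore {ds}")
    return problems


if __name__ == "__main__":
    vm = VM("sql01", 4096, 2000, disks=[Disk("persistent"), Disk("physical", is_rdm=True)])
    host = Host(8192, 4000, True, True, {"lun01", "lun02"})
    print(svmotion_blockers(vm, host, "lun01", "lun02"))  # []
```

Returning every blocker at once, rather than stopping at the first, mirrors how you'd want to fix a VM before scheduling the move.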

Page 29: Ha & drs gotcha's

HA & DRS:
Combining vMotion + Math

Page 32: Ha & drs gotcha's

vMotion Itself Might Not be Sexy…

…but what-you-can-do-with-it-once-you-combine-it-with-monitoring-and-a-set-of-smart-calculations…is! Rawrrr…

HA
– A host goes down. Where do I relocate its VMs?

DRS
– A host is overloaded. How should I rebalance VMs to best match resource capacity to resource demand?

These aren’t trivial questions to answer!

Page 33: Ha & drs gotcha's

vCenter Constructs

Datacenter
– The boundary of a virtual infrastructure (& vMotion)

Cluster
– Collection of ESX hosts for centralized management

Resource Pool
– Logical abstraction of processing and memory capacity, which can be distributed to VMs as necessary.
– Sub-collections of resource capacity for distribution to VMs based on business rules.

DEMO: Properties of a Cluster

Page 34: Ha & drs gotcha's

Shares, Reservations, & Limits

Shares
– Identify the relative proportion of resources a VM can consume during contention

Reservations
– Identify the minimum resources a VM is guaranteed
– Ensure a minimum service level during resource contention

Limits
– Identify the maximum resources a VM may have
– Protect against resource overuse (spiking)
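How the three settings interact under contention can be sketched as a "water-filling" calculation: each VM's share-proportional slice is raised to its reservation or cut back to its limit, and the per-share factor is adjusted until the host's capacity is fully handed out. This is a simplified model (it assumes every VM wants as much as it can get and ignores actual demand and overhead); the VMAlloc type and the MHz figures are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class VMAlloc:
    name: str
    shares: int                    # relative weight; must be positive
    reservation: float = 0.0       # guaranteed minimum (e.g. MHz)
    limit: float = math.inf        # hard maximum (e.g. MHz)


def allocate(capacity: float, vms: list[VMAlloc]) -> dict[str, float]:
    """Split contended capacity by shares, floored at reservations and capped at limits."""
    if any(vm.shares <= 0 for vm in vms):
        raise ValueError("shares must be positive")
    if sum(vm.reservation for vm in vms) > capacity:
        raise ValueError("reservations exceed capacity")
    if sum(vm.limit for vm in vms) <= capacity:
        return {vm.name: vm.limit for vm in vms}   # no contention: everyone gets its limit

    def grant(vm: VMAlloc, k: float) -> float:
        # Share-proportional slice, clamped between reservation and limit.
        return min(vm.limit, max(vm.reservation, k * vm.shares))

    def total(k: float) -> float:
        return sum(grant(vm, k) for vm in vms)

    lo, hi = 0.0, 1.0
    while total(hi) < capacity:    # grow the bracket until it covers the capacity
        hi *= 2
    for _ in range(200):           # bisection on the per-share factor k
        mid = (lo + hi) / 2
        if total(mid) < capacity:
            lo = mid
        else:
            hi = mid
    return {vm.name: grant(vm, hi) for vm in vms}


if __name__ == "__main__":
    vms = [VMAlloc("a", 1000, reservation=400), VMAlloc("b", 1000)]
    print({name: round(mhz) for name, mhz in allocate(1000, vms).items()})  # {'a': 500, 'b': 500}
```

In this model a reservation is a floor, not a bonus on top of the share-based slice: VM "a" above already earns 500 MHz from its shares, so its 400 MHz reservation changes nothing until contention pushes its fair share below that level.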

Page 35: Ha & drs gotcha's

Understanding Shares & Resource Pools

[Diagram: TechMentorCluster containing two resource pools and four class VMs]
– PowerShell Deep Dive resource pool: 1000 CPU & RAM shares, 2048 MB RAM
– ESXpert Deep Dive resource pool: 2000 CPU & RAM shares, 4096 MB RAM
– WMI Class VM: 200 CPU shares, 512 MB RAM
– AD Class VM: 600 CPU shares, 784 MB RAM
– Storage Class VM: 1200 CPU shares, 3 GB RAM
– vMotion Class VM: 800 CPU shares, 512 MB RAM
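Shares nest: a pool's shares are weighed against its sibling pools, and a VM's shares only against the other VMs in the same pool. The sketch below computes each VM's slice of a fully contended cluster that way. The listing above doesn't say which VM sits in which pool, so the membership used here (WMI and AD in PowerShell Deep Dive, Storage and vMotion in ESXpert Deep Dive) is a hypothetical assumption chosen for illustration.

```python
# Hypothetical pool membership: assumed for illustration, not taken from the slide.
POOL_SHARES = {"PowerShell Deep Dive": 1000, "ESXpert Deep Dive": 2000}
VM_SHARES = {
    "PowerShell Deep Dive": {"WMI Class": 200, "AD Class": 600},
    "ESXpert Deep Dive": {"Storage Class": 1200, "vMotion Class": 800},
}


def effective_fraction(pool_shares: dict[str, int],
                       vm_shares: dict[str, dict[str, int]]) -> dict[str, float]:
    """Fraction of fully contended cluster CPU each VM receives under nested shares."""
    total_pool = sum(pool_shares.values())
    result: dict[str, float] = {}
    for pool, members in vm_shares.items():
        pool_fraction = pool_shares[pool] / total_pool     # pools compete with sibling pools
        total_members = sum(members.values())
        for vm, shares in members.items():
            # VMs compete only with siblings inside their own pool.
            result[vm] = pool_fraction * shares / total_members
    return result


if __name__ == "__main__":
    for vm, fraction in effective_fraction(POOL_SHARES, VM_SHARES).items():
        print(f"{vm}: {fraction:.1%}")
    # WMI Class: 8.3%
    # AD Class: 25.0%
    # Storage Class: 40.0%
    # vMotion Class: 26.7%
```

Under that assumed layout, the Storage Class VM's 1200 shares buy it 40% while the AD Class VM's 600 shares buy 25%, not half as much: share numbers only mean something relative to their siblings, and the pool's own shares set the size of the pie being divided.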

Page 36: Ha & drs gotcha's

Abstracting Resources

EXTENDED DEMO:
– Reservations
– Limits
– Shares
– Resource Pools

Remember: These all factor into HA/DRS cluster automation calculations.

Page 37: Ha & drs gotcha's

What Role Does DRS Play?

[Diagram: TechMentorCluster spanning Chassis 1, Chassis 2, and Chassis 3, with one chassis flagged HOT!]

Page 38: Ha & drs gotcha's

What Role Does DRS Play?

Automation Level | Initial VM Placement | Dynamic Balancing | Administrator Involvement
Manual | Manual | Manual | High – All actions
Partially Automated | Automatic | Manual | Moderate – Approval actions
Fully Automated | Automatic | Automatic | Low – Monitoring

Administrator trust is often the deciding factor in choosing the automation level.

Page 39: Ha & drs gotcha's

How Does DRS Do its Doo-Doo?

Five recommendation levels:
– 1 = DO THIS NOW!
– 2 = DO THIS (SORT OF) NOW
– 3 = Do this very soon-ish
– 4 = You know, if you have the time, you might consider…
– 5 = Meh. Get around to it when you have time.

Page 41: Ha & drs gotcha's

How Does DRS Do its Doo-Doo?

Great, but how is this calculated?
– Anybody wanna get super geeky?

Uh, duh...

“Ceiling” operator
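The slide points at a ceiling operator without showing the formula. A formula that circulated in the VMware community for vSphere 4-era DRS (a community reconstruction, not official VMware documentation) rates a load-balancing recommendation as 6 minus the ceiling of the load imbalance metric divided by (0.1 × √N), where the metric is the standard deviation of the hosts' normalized loads and N is the number of hosts. The sketch implements that reading; the formula, the grouping of its denominator, and the sample loads are all assumptions.

```python
import math
from statistics import pstdev


def load_imbalance(host_loads: list[float]) -> float:
    """Standard deviation of normalized host loads (VM entitlements / host capacity)."""
    return pstdev(host_loads)


def recommendation_priority(host_loads: list[float]) -> int:
    """Assumed priority formula: 6 - ceil(imbalance / (0.1 * sqrt(N)))."""
    n = len(host_loads)
    if n == 0:
        raise ValueError("need at least one host")
    raw = 6 - math.ceil(load_imbalance(host_loads) / (0.1 * math.sqrt(n)))
    # Priority 1 is reserved for mandatory moves (rule violations, maintenance mode),
    # so imbalance-driven recommendations are clamped to the 2..5 range here.
    return max(2, min(5, raw))


if __name__ == "__main__":
    print(recommendation_priority([0.5, 0.5, 0.5]))  # 5
    print(recommendation_priority([0.9, 0.3, 0.3]))  # 4
    print(recommendation_priority([1.0, 0.0]))       # 2
```

With an evenly loaded cluster the result is 5 ("Meh"); as the spread of host loads grows, the ceiling pushes the recommendation toward 2. Because of the √N term, larger clusters tolerate a bit more spread before a move is rated urgent.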

Page 42: Ha & drs gotcha's

What About HA?

[Diagram: TechMentorCluster spanning Chassis 1, Chassis 2, and Chassis 3; one chassis crashes and its VMs restart on the surviving chassis]

Page 43: Ha & drs gotcha's

What About HA?

VMware HA must be configured for each virtual machine.

HA and DRS work together. DRS will analyze the resource load of the system at an HA event and decide where to restart the failed servers.

A crashed system will restart on a new chassis. This incurs an outage, but that outage will be short.

Both HA and DRS require DNS.
– This is exceptionally important!
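A toy version of the restart decision: sort the failed host's VMs by restart priority (largest first within a priority), then place each on the surviving host with the most free memory. This greedy heuristic is an illustration only, not VMware's placement algorithm; the FailedVM type, host names, and sizes are hypothetical.

```python
from dataclasses import dataclass

PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class FailedVM:
    name: str
    memory_mb: int
    restart_priority: str = "medium"   # "high", "medium", or "low"


def plan_restarts(failed_vms: list[FailedVM],
                  free_memory_mb: dict[str, int]) -> tuple[dict[str, str], list[str]]:
    """Greedy placement: returns (vm -> host, names of VMs that could not be placed)."""
    free = dict(free_memory_mb)            # work on a copy; don't mutate the caller's dict
    placement: dict[str, str] = {}
    stranded: list[str] = []
    order = sorted(failed_vms, key=lambda v: (PRIORITY_ORDER[v.restart_priority], -v.memory_mb))
    for vm in order:
        host = max(free, key=free.get, default=None)   # surviving host with the most room
        if host is None or free[host] < vm.memory_mb:
            stranded.append(vm.name)
            continue
        placement[vm.name] = host
        free[host] -= vm.memory_mb
    return placement, stranded


if __name__ == "__main__":
    failed = [FailedVM("dc01", 2048, "high"), FailedVM("sql01", 8192, "high"),
              FailedVM("web01", 4096, "low")]
    print(plan_restarts(failed, {"esx02": 10240, "esx03": 6144}))
    print(plan_restarts(failed, {"esx02": 6144}))
```

The second call shows the catch: with too little spare capacity, the largest high-priority VM is the one left stranded.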

Page 44: Ha & drs gotcha's

Affinity / Anti-Affinity

Within a DRS cluster, certain machines should remain on the same chassis.
– E.g., an application server and its database server

Others should never be on the same chassis.
– E.g., two domain controllers

Use affinity rules and anti-affinity rules to ensure correct placement of systems during DRS load balancing.
– Ensure systems aren’t given conflicting rules.

DEMO: Affinity

“Serve the public trust. Protect the innocent. Uphold the law.”

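The advice to avoid conflicting rules can be checked mechanically. In this sketch, affinity pairs are merged transitively with a union-find structure, so an anti-affinity pair that ends up in the same group is flagged even when no single rule contradicts it directly. The VM and host names are hypothetical, and the transitive reading is this example's modeling choice, not a description of how vCenter validates rules.

```python
def _find(parent: dict[str, str], x: str) -> str:
    """Return the group representative for x (union-find with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving keeps later lookups short
        x = parent[x]
    return x


def conflicting_rules(affinity: list[tuple[str, str]],
                      anti_affinity: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Anti-affinity pairs that affinity rules, applied transitively, force together."""
    parent: dict[str, str] = {}
    for a, b in affinity:
        parent[_find(parent, a)] = _find(parent, b)   # merge the two groups
    return [(a, b) for a, b in anti_affinity if _find(parent, a) == _find(parent, b)]


def rule_violations(placement: dict[str, str],
                    affinity: list[tuple[str, str]],
                    anti_affinity: list[tuple[str, str]]) -> list[str]:
    """Check a VM -> host placement against both kinds of rule."""
    problems = []
    for a, b in affinity:
        if placement[a] != placement[b]:
            problems.append(f"{a} and {b} should share a host")
    for a, b in anti_affinity:
        if placement[a] == placement[b]:
            problems.append(f"{a} and {b} must not share a host")
    return problems


if __name__ == "__main__":
    affinity = [("app01", "db01"), ("db01", "cache01")]
    anti_affinity = [("app01", "cache01"), ("dc01", "dc02")]
    print(conflicting_rules(affinity, anti_affinity))  # [('app01', 'cache01')]
```

Running the conflict check before adding a rule is cheaper than discovering at 2 a.m. that DRS can't satisfy every rule at once.
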
Page 45: Ha & drs gotcha's

Building in Cluster Reserve

All of these calculations highlight the notion that you’re always going to need somewhere to go.
– Hosts that die need to send VMs somewhere.
– DRS needs a resource buffer if it is to do its job.
– You need expansion potential.

That’s why maintaining a cluster reserve is important.

LIVE DRAW: Cluster Reserve
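One assumed sizing rule for that reserve: keep enough spare capacity that the cluster can lose its largest host (or largest k hosts) and still carry the current load. This is similar in spirit to HA admission control's percentage-of-cluster-resources policy. The function names and the capacities (GB of RAM) are hypothetical.

```python
def reserve_fraction(host_capacity: list[float], failures: int = 1) -> float:
    """Share of total capacity to keep free so the largest `failures` hosts can be lost."""
    if not 0 <= failures < len(host_capacity):
        raise ValueError("failures must be less than the number of hosts")
    caps = sorted(host_capacity, reverse=True)
    return sum(caps[:failures]) / sum(caps)


def survives_failures(host_capacity: list[float], demand: float, failures: int = 1) -> bool:
    """Worst case: the largest hosts fail; can the rest still carry the demand?"""
    if not 0 <= failures < len(host_capacity):
        return False
    caps = sorted(host_capacity, reverse=True)
    return sum(caps[failures:]) >= demand


if __name__ == "__main__":
    print(reserve_fraction([64, 64, 64, 64]))             # 0.25
    print(survives_failures([64, 64, 64, 64], 180))       # True
    print(round(reserve_fraction([64, 64, 96]), 3))       # 0.429
```

Note the mixed cluster: one larger host pushes the reserve from a third to about 43%, because the worst case is always losing the biggest box.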

Page 46: Ha & drs gotcha's

Easter Egg: Change DRS Invocation Frequency

You can customize how often DRS will automatically take its own advice.
– I wish my wife had this setting…

On your vCenter Server, locate C:\Users\All Users\Application Data\VMware\VMware VirtualCenter\vpxd.cfg

Add in the following lines (appropriately!):

Page 47: Ha & drs gotcha's

Final Thoughts

For the love of gosh, turn on HA/DRS.
– But only if you have enough hardware!
– You’ve already paid for it.
– It is smarter than you.

Understand why your VMs move around.
– …and always make sure that you’ve got the correct connected resources that they need on every host!

Save some cluster resources in reserve.
– You’ll thank me for it!

Page 48: Ha & drs gotcha's


Please fill out evaluations, or your karma will suffer!

!!!
