Availability Management
Jonas Berglund
Business Development Manager, Jardine
John Mah
Agenda
High Availability
Storage VMotion
VMotion
- What are they used for?
- How do they work?
High Availability with VMware HA

What is VMware HA?
- Automatic restart of virtual machines in case of physical server failures

Customer Benefits:
- High availability with minimal capital & operational costs
  - Reduced need for passive stand-by hardware & dedicated administrators
- Broadly applicable
  - All virtualized x86 workloads are automatically protected, without agents or scripting
- Manageability & flexibility
  - Very simple to deploy and maintain
  - Application manageability unchanged
  - N-to-N failovers; hosts and VMs easily moved in and out of clusters
Introduction
- ESX hosts in a VMware HA cluster continuously communicate with each other via heartbeats.
- If an ESX server crashes, the remaining ESX servers restart its VMs.
- If an ESX server may be isolated:
  - First, the default gateway is pinged
  - No response triggers the isolation response: the ESX server declares itself isolated and shuts down its virtual machines
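The isolation test can be sketched as follows (a hypothetical illustration in Python, not VMware's actual implementation; the `ping` probe is injected as a callable for clarity):

```python
def is_isolated(isolation_addresses, ping):
    """Sketch of the HA isolation test: the host pings its default
    gateway (and any other configured isolation addresses); if nothing
    answers, it declares itself isolated."""
    return not any(ping(addr) for addr in isolation_addresses)

def isolation_response(local_vms):
    """On isolation, shut down local VMs so other hosts can restart them."""
    return [f"power off {vm}" for vm in local_vms]

# Gateway unreachable -> the host considers itself isolated
assert is_isolated(["192.0.2.1"], ping=lambda addr: False)
```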
VMware HA Under the Covers
- HA agents are installed & configured on ESX hosts via VirtualCenter
- Post-configuration, HA agents maintain a heartbeat over the network
- VMFS & shared storage allow hosts to access & power on each other's virtual machines
- During a failover, a quick restart is the primary goal
Architecture Overview - Storage
- HA requires that a virtual machine can be started on any ESX server in the cluster
- Therefore, storage must be shared between all ESX servers
- A quick test: VMotion a machine across all hosts
Failure Detection & Failover Details
- ESX host failures are detected via missing heartbeats
- Failover operations are coordinated as follows:
  1. One of the primary hosts is selected to coordinate the failover, and one of the remaining hosts with spare capacity becomes the failover target
  2. The affected virtual machines are sorted by priority and powered on until the failover target runs out of spare capacity
  3. If the host selected as coordinator fails, another primary continues the effort
  4. If one of the failed hosts was a primary node, one of the remaining secondary nodes is promoted to primary
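The placement logic in steps 1-2 can be sketched roughly as follows (hypothetical names and data shapes; real HA admission and placement logic is more involved):

```python
def plan_failover(affected_vms, spare_capacity):
    """Sort the failed host's VMs by restart priority and place them on
    hosts with spare capacity until capacity runs out.

    affected_vms: list of (name, priority, required_capacity);
                  higher priority restarts first.
    spare_capacity: dict of host name -> spare capacity.
    """
    plan, skipped = [], []
    for name, priority, need in sorted(affected_vms, key=lambda v: -v[1]):
        # pick any host that still has enough spare capacity
        target = next((h for h, cap in spare_capacity.items() if cap >= need), None)
        if target is None:
            skipped.append(name)   # failover targets out of spare capacity
        else:
            spare_capacity[target] -= need
            plan.append((name, target))
    return plan, skipped
```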
Isolation Response
- Network failures can cause "split-brain" conditions
- The isolation response is used to prevent split-brain conditions
- Powering virtual machines off releases their VMFS locks, enabling other hosts to recover them
VMware HA Best Practices – Resource Management
1. Larger groups of homogeneous servers allow higher average utilization across an HA/DRS-enabled cluster
2. To define the sizing estimates used for admission control, set reasonable reservations as the minimum resources needed
3. Pay attention to cluster warnings and messages
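As a rough illustration of point 2, admission control can be thought of as checking that the cluster still covers all reservations after a host failure (a simplified sketch only; VMware's actual algorithm is slot-based and more conservative):

```python
def admission_check(host_capacities, vm_reservations, failures_to_tolerate=1):
    """Assume the largest host(s) fail; the surviving capacity must
    still cover the sum of the VMs' reservations (the sizing estimates)."""
    survivors = sorted(host_capacities)[:len(host_capacities) - failures_to_tolerate]
    return sum(survivors) >= sum(vm_reservations)
```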
VMware HA Best Practices – Setup & Networking
1. Proper DNS & Network settings are needed for initial configuration
2. Redundancy to ESX Service Console is essential (several options)
3. Network maintenance activities should take into account dependencies on the ESX Service Console network(s)
4. Valid VM network label names required for proper failover
VMotion
About Migration

Cold Migration
- VM is powered off or suspended
- Does not require shared storage
- Supports VM migration between datacenters

Migration with VMotion
- VM is powered on
- Requires VMotion licensing
- Requires shared storage
- Does not support VM migration between datacenters
- Required for VMware DRS
Migration With VMotion
[Animation: VM "A" is migrated from the source host to the destination host over the dedicated VMotion network, while its disks stay on shared storage "B" and clients stay connected via the production network]
Requirements for VMotion
- Source and destination hosts must be:
  - Part of the same datacenter
  - Connected to the same Gigabit subnet
  - Connected to the same storage (SAN, NAS, iSCSI)
- Recommended: a dedicated Gigabit network for VMotion
- The destination host must have enough resources
- The VM must not use physical devices such as CD-ROM or floppy drives
- Source and destination hosts must have compatible CPU models
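A pre-flight check over these requirements could look like the following (an illustrative sketch with made-up field names, not a VMware API):

```python
def vmotion_precheck(src, dst, vm):
    """Return a list of human-readable problems (empty means OK).
    src/dst describe hosts, vm describes the virtual machine."""
    problems = []
    if src["datacenter"] != dst["datacenter"]:
        problems.append("hosts are in different datacenters")
    if src["vmotion_subnet"] != dst["vmotion_subnet"]:
        problems.append("hosts are not on the same Gigabit subnet")
    if not src["datastores"] & dst["datastores"]:   # shared storage required
        problems.append("no shared storage between hosts")
    if src["cpu_model"] != dst["cpu_model"]:
        problems.append("incompatible CPU models")
    if vm.get("physical_devices"):
        problems.append("VM uses physical devices (CD-ROM/floppy)")
    if vm["memory_mb"] > dst["free_memory_mb"]:
        problems.append("destination host lacks resources")
    return problems
```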
VMware Enhanced VMotion Compatibility
- Based on Intel's and AMD's hardware support, VMware developed Enhanced VMotion Compatibility (EVC)
- General idea: a cluster of physical machines with different CPUs can be made VMotion compatible
- Each host in an EVC cluster presents the same CPUID feature bits to a VM
  - An Intel-based cluster will present the features present in Merom cores
  - An AMD-based cluster will present the features present in Opteron Rev E CPUs
  - Of course, the cross-vendor restriction remains
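The effect of an EVC baseline can be pictured as the intersection of the hosts' CPUID feature bits (a simplified illustration with made-up masks; real EVC applies fixed per-generation baselines, such as the Merom set mentioned above):

```python
from functools import reduce

def evc_baseline(host_feature_masks):
    """Bitwise AND of all hosts' CPUID feature masks: every host then
    presents only the features common to the whole cluster."""
    return reduce(lambda a, b: a & b, host_feature_masks)

# Three hosts of different generations (made-up feature masks):
assert evc_baseline([0b1111, 0b0111, 0b0110]) == 0b0110
```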
EVC screenshots
EVC in Future Releases
- The general idea and concept remain the same
- Hide as much of the complexity as possible
- Possibly introduce more choices as to which CPU features are exposed to the guest:
  - Most capable: expose the largest set of CPU features supported by the CPUs in the cluster (older hardware may not be addable to the cluster later)
  - Broadest compatible: a minimal set of CPU features is exposed to the guest so that older hardware may be added to the cluster
Storage VMotion
VMotion: Decouples VM From Host
- VMotion
  - Migrates a running VM transparently between physical hosts
  - No service interruption
- VM decoupled from host
  - Host downtime no longer equates to VM or application downtime
What About Storage?
- VMotion doesn't help with storage downtime
- Storage downtime/migration is currently a time-consuming process
- Migrating VM storage without VM or application downtime is very useful
Upgrade VMotion
- Non-disruptive migration of a running VM from a host running ESX Server 2 to a host running ESX Server 3
  - Debuted in ESX Server 3.0.1/VirtualCenter 2.0.1
  - VMotion the VM from ESX Server 2 to ESX Server 3
  - Online virtual disk relocation from VMFS 2 to VMFS 3
Storage VMotion
- Migrate a running VM to new storage
  - VM stays on the same host
  - VM disks may be individually placed
  - Storage-type independent
- Migration does not disturb the VM
  - No downtime
  - Transparent to the guest OS and applications
  - Minimal performance impact
Use Cases
- Storage maintenance
- Retire or migrate between arrays
  - Arrays coming off maintenance/leasing cycles
- Storage tiering
  - Migrate from FC to iSCSI or NAS, or within or between enclosures
- Eliminate performance bottlenecks
  - Load balance through LUN reconfiguration
  - Seamlessly add and begin using new LUNs
- Upgrade VMFS non-disruptively
  - Future-proofing the disk format
What We Need To Move
- VM's "home" directory
  - Config file
  - Logs
  - Swap file
  - Snapshots
  - Other miscellaneous files
- VM disks
  - Disks are treated separately to support independent placement
Storage VMotion In Action
1. Copy the VM home to the new location
2. "Self"-VMotion to the new VM home
3. Take a disk-only snapshot (creates a child disk)
4. Copy the disk to the destination
5. Consolidate the child disk into the copied disk
6. Delete the original VM home and disks
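The sequence can be summarized as an ordered plan (a descriptive sketch only; the VM and datastore names are hypothetical):

```python
def storage_vmotion_plan(vm, src_ds, dst_ds):
    """Return the six Storage VMotion steps, in order, as descriptions."""
    return [
        f"copy {vm} home from {src_ds} to {dst_ds}",
        f"self-VMotion {vm} to its new home on {dst_ds}",
        f"take disk-only snapshot of {vm} (creates child disk)",
        f"copy parent disk to {dst_ds}",
        "consolidate child disk into the copied disk",
        f"delete original {vm} home and disks on {src_ds}",
    ]
```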
Storage Type Agnostic
- Developed to work on all storage types
  - High-level copier technologies are used
    - VM home is copied using the NFC copier
    - VM disks are moved with snapshot technology
  - Copiers are not storage-type specific
    - They are located "above" the filesystem layer
  - Source and destination can be different storage types
- Only FC SAN is officially supported in VI 3.5
Storage VMotion Requirements
- Use of Storage VMotion requires:
  - Temporarily doubling the VM's CPU/memory resources
    - Two copies of the VM exist simultaneously on a single host
    - Each copy uses memory and has its own CPU/memory reservation
  - A properly configured VMotion network interface
    - Does not require a physical link; it is an intrahost "network" connection between the two VMs
  - That the VM is not using a device that prevents VMotion
Invoking a Storage VMotion
- Performed through a command-line script
  - The script is part of the RCLI
  - Can be installed on Windows or Linux, or run from a virtual appliance
  - Connects to VirtualCenter and performs the Storage VMotion
- The script is interactive
  - Prompts for input
  - Non-interactive mode for CLI experts
- Download the RCLI from http://www.vmware.com/go/remotecli
  - Separate download from VirtualCenter!
Disclaimer
This session may contain product features that are currently under development.
This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product.
Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new technologies or features discussed or presented have not been determined.
Virtual Datacenter OS from VMware
Application vServices
Application Availability
- VMware Infrastructure improves application availability in a variety of ways
  - Including VMware VMotion, VMware HA, Site Recovery Manager, etc.
  - Provides increased availability to all OSes and applications
- New technology: VMware Fault Tolerance (FT)
  - Allows a virtual machine to survive a server failure without any interruption in service: zero downtime and zero data loss!
  - Based on vLockstep technology, which runs a primary and a secondary machine in virtual lockstep with no special hardware
  - Allows users to protect more applications, at lower cost and complexity
Transforming Availability Service Levels
[Chart: application coverage (0% to 100%) vs. hardware failure tolerance, from UNPROTECTED through AUTOMATED RESTART (with VMware HA) to CONTINUOUS (with VMware FT)]
VMware Fault Tolerance (FT) Highlights
[Animation: a primary and a secondary VM run on two hosts; when the primary's host fails, the secondary takes over]

Exciting technology for mission-critical VMs:
- A single VM runs on two hosts for extra redundancy
- Zero downtime and zero data loss in case of hardware failures
- Redundancy is automatically restored after a failover

Functionality in 2009:
- Support for all guest OSes supported by the platform (uniprocessor only)
- Integrated with VMware HA/DRS
- Multiple FT VMs per host
- FT can be disabled/re-enabled dynamically
Virtual Machine Record & Replay
VMware has developed the capability to record & replay VM execution.
- RECORD: log the causes of non-determinism
  - Input (network, user), asynchronous I/O (disk, devices), CPU timer interrupts
- REPLAY: deterministic delivery of the previously logged events
  - Result: repeatable VM execution
Determinism
- Given the exact same inputs as a previous run, a processor will deterministically execute the same instruction stream and end up in the exact same state
- "Input" in this context means anything outside the CPU/memory that is visible to software:
  - I/O (disk, networking, keyboard, etc.) and interrupts (I/O devices, timers)
- It also includes any non-deterministic behavior of the processor itself that is visible to software:
  - e.g., reading the processor timestamp
- Replaying execution is hard on physical machines, but doable for virtual machines
- Record/replay technology has been in VMware Workstation for more than a year
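A toy demonstration of the principle: if the non-deterministic inputs are logged on a first run and delivered verbatim on a second, the deterministic computation in between reaches the same final state (illustrative Python only; this is not how the hypervisor implements it):

```python
import random

def run_vm(inputs=None, log=None):
    """Record mode (inputs=None): draw random 'inputs' (standing in for
    interrupts/I/O) and append them to log. Replay mode: deliver the
    logged inputs instead. The state update itself is deterministic."""
    state = 0
    for step in range(5):
        if inputs is None:
            value = random.randint(0, 9)  # non-deterministic input...
            log.append(value)             # ...recorded
        else:
            value = inputs[step]          # replayed verbatim
        state = (state * 31 + value) % 1000
    return state

log = []
recorded = run_vm(log=log)             # record run
assert run_vm(inputs=log) == recorded  # replay reaches the same state
```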
Virtual Machine Fault Tolerance - Basic Concepts
- For a given primary VM, run a secondary VM on a different host
- The secondary shares virtual disks with the primary
- The secondary VM is kept in "virtual lockstep" (vLockstep) via logging information sent over a private network connection
- Only the primary VM sends and receives network packets; the secondary is a "silent partner"
- If the primary host fails, the secondary VM takes over with no interruption to applications!
VMware Fault Tolerance and HA Work Together
- FT VMs run only in an HA cluster
- Mission-critical VMs are protected by both FT and HA; the remaining VMs are protected by HA
- When a host fails:
  - The FT secondary takes over
  - A new FT secondary is started by HA
  - HA-only VMs are restarted
Lifecycle of a Fault-Tolerant VM
- The user selects a VM on which to enable FT
- A secondary VM that shares disks with the primary is automatically created
- When the primary VM is powered on, the secondary VM is started on another host via a special kind of VMotion
- The secondary VM is kept in virtual lockstep (vLockstep)
- If the primary host goes down, the secondary VM "goes live" and becomes the primary VM
- VMware HA automatically starts another secondary VM to restore redundancy
- The secondary VM powers off when the primary does, or when FT is disabled
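The failover portion of this lifecycle can be condensed into a small state machine (hypothetical class and host names, for illustration only):

```python
class FTPair:
    """Primary/secondary placement for one fault-tolerant VM."""
    def __init__(self, primary_host, secondary_host):
        self.primary_host = primary_host
        self.secondary_host = secondary_host

    def host_failed(self, host, spare_hosts):
        if host == self.primary_host:
            # the secondary "goes live" and becomes the primary
            self.primary_host, self.secondary_host = self.secondary_host, None
        elif host == self.secondary_host:
            self.secondary_host = None
        if self.secondary_host is None and spare_hosts:
            # HA starts a new secondary to restore redundancy
            self.secondary_host = spare_hosts.pop()

pair = FTPair("esx1", "esx2")
pair.host_failed("esx1", spare_hosts=["esx3"])
assert (pair.primary_host, pair.secondary_host) == ("esx2", "esx3")
```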
Configuring a Fault-Tolerant VM (screenshots)
Questions and Answers

Thank You!