Host fencing in oVirt - Fixing the unknown and allowing VMs to be highly available
DevConf.cz 2016 1/21
Host fencing in oVirt
Fixing the unknown and allowing VMs to be highly available
Martin Peřina
Software Engineer at Red Hat
Agenda
● Introduction
● Fencing in real life
● Future plans
Introduction
oVirt architecture
Engine
VDSM
VDSM
Storage
Cluster
Data Center
Terminology
● Host - physical server on which the hypervisor runs
● Cluster - set of hosts with the same architecture/capabilities, so that VMs can be migrated between those hosts
● Data Center - set of clusters and storage
● Highly Available VM - VM that is automatically restarted on a different host in case of failure
Terminology
● Power Management Interface - interface of a host that allows power management (PM) operations to be performed on it
● Fence Agent - tool that exposes the PM interface of a host through a common API
● Non Responsive Host - host that has not responded to engine communication requests for some time
● Fence Proxy - host on which the fence agent is executed to perform a power management operation on a non responsive host
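The idea of a fence agent as a "common API" over device-specific power management interfaces can be illustrated with a small sketch. Everything here is hypothetical: the class names, the method names and the in-memory stand-in are assumptions made for illustration, not the interface of any real fence agent or of oVirt itself.

```python
from abc import ABC, abstractmethod


class FenceAgent(ABC):
    """Hypothetical common API over a host's power management interface."""

    @abstractmethod
    def status(self) -> str:
        """Return the power state, 'on' or 'off'."""

    @abstractmethod
    def stop(self) -> None:
        """Power the host off."""

    @abstractmethod
    def start(self) -> None:
        """Power the host on."""

    def restart(self) -> None:
        # Assumption: restart is modelled as stop, verify, then start.
        self.stop()
        if self.status() != "off":
            raise RuntimeError("host did not power off")
        self.start()


class InMemoryAgent(FenceAgent):
    """Stand-in for a device-specific agent; keeps the power state in memory."""

    def __init__(self) -> None:
        self.power = "on"
        self.calls = []  # records the operations in the order they were called

    def status(self) -> str:
        self.calls.append("status")
        return self.power

    def stop(self) -> None:
        self.calls.append("stop")
        self.power = "off"

    def start(self) -> None:
        self.calls.append("start")
        self.power = "on"
```

A caller that only knows `FenceAgent` can restart any host without caring which device sits behind it, which is the point of hiding the PM interface behind one API.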
Host detail / Power Management tab
Power management operation
Engine
Fence Proxy Host
Target Host
- Power Management Restart
- Fence Agent Call
Fence proxy selection
● Process of selecting the host on which the fence agent will be executed
● Hosts are preferred according to their status
● Hosts are evaluated by their location:
– Cluster
– Data Center
– Other Data Center (not by default)
● Proxy host location preference can be customized either globally or per host
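The selection rules above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the engine's actual algorithm: the `Host` type, the location keys `"cluster"`, `"dc"` and `"other_dc"`, the status names, and the rule that "up" hosts win over other usable hosts are all hypothetical.

```python
from dataclasses import dataclass

# Assumed default: same cluster first, then same data center.
# "other_dc" is only considered when explicitly added to the preference.
DEFAULT_PREFERENCE = ("cluster", "dc")

# Assumed status ranking: lower is better, unknown statuses rank last.
STATUS_RANK = {"up": 0}


@dataclass(frozen=True)
class Host:
    name: str
    cluster: str
    data_center: str
    status: str  # illustrative values: "up", "maintenance", "non_responsive"


def location_of(candidate: Host, target: Host) -> str:
    """Classify where a candidate proxy sits relative to the target host."""
    if candidate.data_center != target.data_center:
        return "other_dc"
    return "cluster" if candidate.cluster == target.cluster else "dc"


def select_fence_proxy(target, hosts, preference=DEFAULT_PREFERENCE):
    """Return the best proxy for `target`, or None if no candidate qualifies."""
    for location in preference:
        candidates = [
            h for h in hosts
            if h.name != target.name
            and h.status != "non_responsive"
            and location_of(h, target) == location
        ]
        if candidates:
            # Prefer better status, then name for a deterministic result.
            return min(candidates, key=lambda h: (STATUS_RANK.get(h.status, 1), h.name))
    return None
```

Passing `preference=("cluster", "dc", "other_dc")` for a single call models a per-host override, while changing `DEFAULT_PREFERENCE` models the global setting.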
Power management proxy preference
Fencing
● Process that tries to make a non responsive host responsive again using various techniques
● Successful detection of the host dumping flow or successful execution of a power management stop is the only way to ensure that VMs running on the host are no longer alive -> those VMs can then be restarted on a different host
● Preventing data corruption is the most important goal
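The safety rule in the second bullet can be written down as a small guard. This is a sketch with hypothetical names (`Vm`, `FencingOutcome`, `vms_to_restart`); it only encodes the condition stated above, that highly available VMs may be restarted elsewhere only after kdump was detected or a power management stop succeeded.

```python
from dataclasses import dataclass


@dataclass
class Vm:
    name: str
    highly_available: bool


@dataclass
class FencingOutcome:
    kdump_detected: bool = False
    pm_stop_succeeded: bool = False


def vms_to_restart(vms, outcome):
    """Return names of HA VMs that are safe to restart on another host."""
    # Without proof that the host is down, its VMs may still be running and
    # writing to shared storage, so nothing is restarted.
    if not (outcome.kdump_detected or outcome.pm_stop_succeeded):
        return []
    return [vm.name for vm in vms if vm.highly_available]
```

Returning an empty list when neither condition holds reflects the priority on preventing data corruption: a VM started twice on shared storage is worse than a VM that stays down.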
Fencing flow steps
1. SSH Soft Fencing
– Attempt to restart VDSM using an SSH connection
2. Kdump Detection
– Detect if the host is dumping and wait until it finishes dumping to preserve kdump data
3. Power Management Restart
– Restart the host using the power management interface
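The three steps can be sketched as an ordered escalation. The function below is an illustration only: the callables and the returned labels are hypothetical, and it assumes the flow stops at the first step that resolves the situation, which the slides imply but do not spell out.

```python
from typing import Callable


def fence_host(
    ssh_soft_fence: Callable[[], bool],
    is_dumping: Callable[[], bool],
    wait_for_dump: Callable[[], None],
    pm_restart: Callable[[], bool],
) -> str:
    """Run the fencing steps in order and report which one ended the flow."""
    # Step 1: try to restart VDSM over SSH; cheapest and least disruptive.
    if ssh_soft_fence():
        return "recovered_by_ssh_soft_fencing"
    # Step 2: if the host is dumping, wait so the kdump data is preserved.
    if is_dumping():
        wait_for_dump()
        return "kdump_finished"
    # Step 3: last resort, restart through the power management interface.
    if pm_restart():
        return "restarted_by_power_management"
    return "fencing_failed"
```

Passing the steps in as callables keeps the ordering logic separate from how each step is carried out, so the escalation can be exercised without a real host.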
Fencing in real life
VDSM crashed
Engine
Fence Proxy Host
Non Responding Host
- SSH Soft Fencing
Link failure - simple network configuration
Engine
Fence Proxy Host
Non Responding Host
- SSH Soft Fencing
X
- Power Management Restart
- Fence Agent Call
Host is dumping
Engine
Fence Proxy Host
Dumping Host
- SSH Soft Fencing
- Host starts dumping - notification to engine
- Host finished dumping - notification to engine
Link failure - advanced network configuration
Engine
Fence Proxy Host (cluster 2)
Non Responding Host (cluster 1)
- SSH Soft Fencing
- Power Management Restart
- Fence Agent Call
X
Storage
Cluster Fencing Policy
Future plans
Fencing – Future plans
● Storage fencing
● Detection of hardware failures