Host fencing in oVirt - Fixing the unknown and allowing VMs to be highly available

Posted on 08-Feb-2017

DevConf.cz 2016 1/21

Host fencing in oVirt

Fixing the unknown and allowing VMs to be highly available

Martin Peřina

Software Engineer at Red Hat

Agenda

● Introduction

● Fencing in real life

● Future plans

Introduction

oVirt architecture

[Diagram: the Engine manages hosts running VDSM; the hosts form a Cluster, which together with Storage makes up a Data Center]

Terminology

● Host - physical server that runs the hypervisor

● Cluster - set of hosts with the same architecture/capabilities, which enables VM migration between those hosts

● Data Center - set of clusters and storage

● Highly Available VM - VM that is automatically restarted on a different host in case of failure

Terminology

● Power Management Interface - interface of a host that allows power management (PM) operations to be performed on it

● Fence Agent - tool that exposes the PM interface of a host through a common API

● Non Responsive Host - host that has not responded to the engine's communication requests for some time

● Fence Proxy - host on which the fence agent is executed to perform a power management operation on a non-responsive host

Host detail / Power Management tab

Power management operation

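[Diagram: Engine → Fence Proxy Host → Target Host. The Engine requests a Power Management Restart; the Fence Proxy Host performs the Fence Agent Call against the Target Host's power management interface.]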
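As a rough illustration of the fence agent call, the sketch below shows how a fence proxy might prepare the input for an agent. It assumes agents from the fence-agents project (for example `fence_ipmilan`), which can read their options as `name=value` lines on stdin; the option names `action`, `ip`, `username` and `password` and the `on`/`off` actions follow those agents, but check the man page of the agent you actually use. `PowerManagementConfig`, `build_agent_stdin` and `plan_restart` are hypothetical names, and modelling a restart as a stop followed by a start is an assumption of this sketch.

```python
from dataclasses import dataclass

# Hypothetical mapping from engine-level PM operations to fence agent actions.
PM_TO_AGENT_ACTION = {"start": "on", "stop": "off", "status": "status"}


@dataclass
class PowerManagementConfig:
    """PM settings of the target host (illustrative fields only)."""
    agent: str      # fence agent binary, e.g. "fence_ipmilan"
    address: str    # address of the target host's PM interface
    username: str
    password: str


def build_agent_stdin(pm: PowerManagementConfig, operation: str) -> str:
    """Render the name=value lines a fence agent reads on stdin."""
    options = {
        "action": PM_TO_AGENT_ACTION[operation],
        "ip": pm.address,
        "username": pm.username,
        "password": pm.password,
    }
    return "".join(f"{key}={value}\n" for key, value in options.items())


def plan_restart(pm: PowerManagementConfig) -> list[str]:
    """Model a restart as stop followed by start (assumption of this sketch)."""
    return [build_agent_stdin(pm, op) for op in ("stop", "start")]
```

On the proxy host each payload could then be passed to the agent, for example with `subprocess.run([pm.agent], input=payload, text=True)`. In this model the engine only decides which proxy runs which operation; the agent itself runs on a healthy host that can still reach the target's PM interface.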

Fence proxy selection

● Process of selecting the host on which the fence agent will be executed

● Hosts are preferred according to their status

● Hosts are evaluated by their location:
  – Cluster
  – Data Center
  – Other Data Center (not by default)

● Proxy host location preference can be customized either globally or per host
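The selection rules above can be sketched as follows. This is a minimal model, not oVirt's implementation: the `Host` fields, the status ranking (`up` before `maintenance`, every other status excluded) and the location labels `cluster`, `dc` and `other_dc` are assumptions made for illustration. Status ranks candidates within each location, and the default preference tries the cluster first, then the data center, leaving other data centers out unless they are explicitly enabled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical status ranking: lower is better. Hosts with any other status
# (e.g. non responsive) are never used as proxies in this sketch.
STATUS_RANK = {"up": 0, "maintenance": 1}

# Default location preference: cluster, then data center (other DCs off).
DEFAULT_PREFERENCES = ("cluster", "dc")


@dataclass(frozen=True)
class Host:
    name: str
    status: str
    cluster: str
    data_center: str


def location_of(candidate: Host, target: Host) -> str:
    """Classify a candidate host relative to the target host."""
    if candidate.cluster == target.cluster:
        return "cluster"
    if candidate.data_center == target.data_center:
        return "dc"
    return "other_dc"


def select_fence_proxy(
    target: Host,
    hosts: Sequence[Host],
    preferences: Sequence[str] = DEFAULT_PREFERENCES,
) -> Optional[Host]:
    """Return the best proxy for target, or None if no host qualifies."""
    for location in preferences:
        candidates = [
            h for h in hosts
            if h.name != target.name
            and h.status in STATUS_RANK
            and location_of(h, target) == location
        ]
        if candidates:
            # Within one location, prefer the host with the best status.
            return min(candidates, key=lambda h: STATUS_RANK[h.status])
    return None
```

Passing a different `preferences` tuple stands in for the global or per-host customization mentioned on the slide, e.g. `("cluster", "dc", "other_dc")` to allow proxies from other data centers.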

Power management proxy preference

Fencing

● Process that tries to make a non-responsive host responsive again using various techniques

● Successfully detecting that the host is dumping, or successfully executing a power management stop, is the only way to ensure that the VMs running on the host are no longer alive; only then can those VMs be restarted on a different host

● Preventing data corruption is the most important goal
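The safety rule behind fencing can be captured in a few lines. The sketch below, with hypothetical names (`FencingResult`, `ha_vms_to_restart`), allows highly available VMs to be restarted elsewhere only when dumping was detected or a power management stop succeeded. In every other case it restarts nothing, since a VM that may still be running on the old host must not be started a second time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FencingResult:
    kdump_detected: bool      # the host was detected to be dumping
    pm_stop_succeeded: bool   # a power management stop completed successfully


def vms_confirmed_dead(result: FencingResult) -> bool:
    """Only these two outcomes prove the host's VMs are no longer alive."""
    return result.kdump_detected or result.pm_stop_succeeded


def ha_vms_to_restart(result: FencingResult, vms: dict[str, bool]) -> list[str]:
    """Return highly available VMs that may be restarted on another host.

    vms maps VM name -> whether the VM is marked highly available.
    If death cannot be confirmed, nothing is restarted: a second running
    copy of the same VM could write to its disks and corrupt them.
    """
    if not vms_confirmed_dead(result):
        return []
    return sorted(name for name, highly_available in vms.items() if highly_available)
```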

Fencing flow steps

1. SSH Soft Fencing
   – Attempt to restart VDSM over an SSH connection

2. Kdump Detection
   – Detect whether the host is dumping and wait until dumping finishes, to preserve the kdump data

3. Power Management Restart
   – Restart the host using its power management interface
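The three steps form an escalation: each one is tried only if the previous one did not resolve the situation. In the sketch below every step is a hypothetical callable (in practice they would involve SSH, kdump notifications and a fence agent call through a proxy), and the function reports which step ended the flow and whether the host's VMs are confirmed dead. Skipping the power management restart once dumping is detected is an assumption based on the "Host is dumping" scenario, which shows no restart step.

```python
from typing import Callable


def run_fencing_flow(
    ssh_soft_fence: Callable[[], bool],
    host_is_dumping: Callable[[], bool],
    wait_for_dump_to_finish: Callable[[], None],
    pm_restart: Callable[[], bool],
) -> tuple[str, bool]:
    """Run the fencing steps in order (hypothetical step callables).

    Returns (step that ended the flow, whether the host's VMs are
    confirmed dead and may be restarted elsewhere).
    """
    # 1. SSH Soft Fencing: True means the host became responsive again.
    #    Restarting VDSM does not stop running VMs, so they stay where they are.
    if ssh_soft_fence():
        return "ssh_soft_fencing", False
    # 2. Kdump Detection: a dumping host has crashed, so its VMs are gone.
    #    Wait for the dump to finish so the kdump data is preserved.
    if host_is_dumping():
        wait_for_dump_to_finish()
        return "kdump_detected", True
    # 3. Power Management Restart through a fence proxy.
    if pm_restart():
        return "pm_restart", True
    # Nothing worked: VMs cannot be confirmed dead, so none are restarted.
    return "failed", False
```

The real-life scenarios that follow map onto these branches: a crashed VDSM ends at step 1, a host that is dumping ends at step 2, and a link failure that also breaks SSH falls through to step 3.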

Fencing in real life

VDSM crashed

[Diagram: Engine, Fence Proxy Host, Non Responding Host. Step: SSH Soft Fencing restarts the crashed VDSM.]

Link failure - simple network configuration

[Diagram: Engine, Fence Proxy Host, Non Responding Host; X marks the failed network link. Steps: SSH Soft Fencing, then Power Management Restart via a Fence Agent Call from the Fence Proxy Host.]

Host is dumping

[Diagram: Engine, Fence Proxy Host, Dumping Host. Steps: SSH Soft Fencing; the host starts dumping and notifies the engine; the host finishes dumping and notifies the engine.]

Link failure - advanced network configuration

[Diagram: Engine; Fence Proxy Host (cluster 2); Non Responding Host (cluster 1); X marks a failed link next to Storage. Steps: SSH Soft Fencing, then Power Management Restart via a Fence Agent Call.]

Cluster Fencing Policy

Future plans

Fencing – Future plans

● Storage fencing

● Detection of hardware failures

THANK YOU!

http://www.ovirt.org
mperina@redhat.com
mperina at #ovirt (irc.oftc.net)