Host fencing in oVirt - Fixing the unknown and allowing VMs to be highly available

DevConf.cz 2016 1/21

Host fencing in oVirt

Fixing the unknown and allowing VMs to be highly available

Martin Peřina

Software Engineer at Red Hat

Agenda

● Introduction

● Fencing in real life

● Future plans

Introduction

oVirt architecture

[Diagram: Engine, two hosts running VDSM, and Storage; the hosts form a Cluster inside a Data Center]

Terminology

● Host - physical server that runs the hypervisor

● Cluster - set of hosts with the same architecture/capabilities, which enables VM migration between those hosts

● Data Center - set of clusters and storage

● Highly Available VM - VM that is automatically restarted on a different host in case of failure

Terminology

● Power Management Interface - interface of a host that allows power management (PM) operations to be performed on it

● Fence Agent - tool that exposes the PM interface of a host through a common API

● Non Responsive Host - host that has not responded to engine communication requests for some time

● Fence Proxy - host on which the fence agent is executed to perform a power management operation on a non responsive host

Host detail / Power Management tab

Power management operation

[Diagram: Engine, Fence Proxy Host, Target Host. Steps shown: Power Management Restart, Fence Agent Call]

Fence proxy selection

● Process to select a host on which fence agent will be executed

● Hosts are preferred according to their status

● Hosts are evaluated by their location:
   – Cluster
   – Data Center
   – Other Data Center (not by default)

● Proxy host location preference can be customized either globally or per host
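
The selection rules above can be sketched in a few lines of Python. This is only an illustration, not the engine's code: the `Host` type, the `STATUS_RANK` table, the preference keywords, and the choice to consider location before status are all assumptions, since the slide does not say which statuses qualify or how the two criteria interact.

```python
from dataclasses import dataclass

# Hypothetical status ranking, lower is preferred. The slide only states
# that status matters, not which statuses qualify or how they rank.
STATUS_RANK = {"up": 0, "maintenance": 1}


@dataclass(frozen=True)
class Host:
    name: str
    cluster: str
    data_center: str
    status: str


def select_fence_proxy(target, hosts, preference=("cluster", "dc")):
    """Return a proxy host for `target`, or None if no host qualifies.

    `preference` lists locations in the order they are tried. The default
    leaves out "other_dc", mirroring "not by default" on the slide.
    """
    for location in preference:
        if location == "cluster":
            pool = [h for h in hosts
                    if h.data_center == target.data_center
                    and h.cluster == target.cluster]
        elif location == "dc":
            pool = [h for h in hosts if h.data_center == target.data_center]
        elif location == "other_dc":
            pool = [h for h in hosts if h.data_center != target.data_center]
        else:
            raise ValueError(f"unknown location: {location}")
        # The target cannot proxy for itself; hosts with an unranked status
        # (for example non responsive ones) are skipped.
        pool = [h for h in pool if h != target and h.status in STATUS_RANK]
        if pool:
            return min(pool, key=lambda h: STATUS_RANK[h.status])
    return None
```

Passing `preference=("cluster", "dc", "other_dc")` models the customized preference that also allows hosts from other data centers.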

Power management proxy preference

Fencing

● Process that tries to make a non responsive host responsive again using various techniques

● Successful detection of the host dumping flow or successful execution of a power management stop is the only way to ensure that VMs running on the host are no longer alive -> only then can those VMs be restarted on a different host

● Preventing data corruption is the most important goal
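
The safety rule on this slide can be written as a small predicate. The sketch below is illustrative only: the `Vm` type and the function name are hypothetical, and the two boolean inputs stand in for whatever the engine actually records about the host.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vm:
    name: str
    highly_available: bool


def vms_safe_to_restart(vms, dumping_detected, pm_stop_succeeded):
    """Return the VMs that may be restarted on a different host.

    Nothing is restarted unless the host is confirmed down, because a VM
    that is still alive on the old host could corrupt shared data.
    """
    host_confirmed_down = dumping_detected or pm_stop_succeeded
    if not host_confirmed_down:
        return []
    # Only highly available VMs are restarted automatically.
    return [vm for vm in vms if vm.highly_available]
```

A host that is merely unreachable satisfies neither condition, so the function returns an empty list for it.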

Fencing flow steps

1. SSH Soft Fencing
   – Attempt to restart VDSM using SSH connection

2. Kdump Detection
   – Detect if host is dumping and wait until it finishes dumping to preserve kdump data

3. Power Management Restart
   – Restart the host using power management interface
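
The three steps above can be sketched as a simple escalation. Everything here is hypothetical scaffolding: the four callables stand in for the real SSH, kdump, and power management operations, and ending the flow once the dump has finished (rather than continuing to a power management restart) is an assumption the slide does not confirm.

```python
from enum import Enum


class Outcome(Enum):
    RESPONSIVE = "VDSM restarted over SSH"
    DUMP_FINISHED = "host finished dumping"
    RESTARTED = "host restarted via power management"
    FAILED = "fencing failed"


def fence_host(ssh_soft_fence, is_dumping, wait_for_dump, pm_restart):
    """Run the fencing steps in order and stop at the first that resolves.

    Each argument is a zero-argument callable; all except `wait_for_dump`
    return True on success.
    """
    # Step 1: try the least disruptive fix, restarting VDSM over SSH.
    if ssh_soft_fence():
        return Outcome.RESPONSIVE
    # Step 2: if the host is dumping, wait so the kdump data is preserved.
    if is_dumping():
        wait_for_dump()
        return Outcome.DUMP_FINISHED
    # Step 3: last resort, restart through the power management interface.
    return Outcome.RESTARTED if pm_restart() else Outcome.FAILED
```

Ordering the steps from least to most disruptive means a crashed VDSM never costs a full host restart.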

Fencing in real life

VDSM crashed

[Diagram: Engine, Fence Proxy Host, Non Responding Host. Step shown: SSH Soft Fencing]

Link failure - simple network configuration

[Diagram: Engine, Fence Proxy Host, Non Responding Host, with a failure marker (X). Steps shown: SSH Soft Fencing, Power Management Restart, Fence Agent Call]

Host is dumping

[Diagram: Engine, Fence Proxy Host, Dumping Host. Steps shown: SSH Soft Fencing; host starts dumping and notifies the engine; host finishes dumping and notifies the engine]

Link failure - advanced network configuration

[Diagram: Engine, Fence Proxy Host (cluster 2), Non Responding Host (cluster 1), Storage, with a failure marker (x). Steps shown: SSH Soft Fencing, Power Management Restart, Fence Agent Call]

Cluster Fencing Policy

Future plans

Fencing – Future plans

● Storage fencing

● Detection of hardware failures

THANK YOU!

http://[email protected] at #ovirt (irc.oftc.net)