High Availability
This section explains the XenServer implementation of virtual machine high availability (HA), and how to
configure it using the xe CLI.
Note:
XenServer HA is only available with XenServer Advanced edition or above. To find out about XenServer
editions, visit the Citrix website.

HA Overview

When HA is enabled, XenServer continually monitors the health of the hosts in a pool. The HA mechanism
automatically moves protected VMs to a healthy host if the current VM host fails. Additionally, if the host
that fails is the master, HA selects another host to take over the master role automatically, so that you can
continue to manage the XenServer pool.
To be certain that a host is unreachable, a resource pool configured for high availability uses several
heartbeat mechanisms to check on hosts regularly. These heartbeats go through both the storage
interfaces (to the Heartbeat SR) and the networking interfaces (over the management interfaces). Both of
these heartbeat routes can be multi-homed for additional resilience to prevent false positives.
XenServer dynamically maintains a failover plan which details what to do if a set of hosts in a pool fails at
any given time. An important concept to understand is the host failures to tolerate value, which is defined
as part of HA configuration. This value determines the number of host failures that are allowed without any
loss of service. For example, if a resource pool consists of 16 hosts and the number of tolerated failures is
set to 3, the pool calculates a failover plan that allows any 3 hosts to fail while still being able to restart
VMs on other hosts. If a plan cannot be found, then the pool is considered to be overcommitted. The plan is
dynamically recalculated based on VM lifecycle operations and movement. Alerts are sent (either through
XenCenter or email) if changes (for example, the addition of new VMs to the pool) cause your pool to
become overcommitted.
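
The idea of a failover plan can be illustrated with a small model. The following is a minimal sketch, not XenServer's actual planner: it assumes that memory is the only constraint, and that each host is described by its total memory and the memory of the protected VMs it runs. The function names (can_restart, tolerates, max_failures_to_tolerate) and the sample pool are hypothetical. Placement uses a first-fit-decreasing heuristic, so the check is conservative: it may report that no plan exists when a cleverer packing would succeed, but it never claims a plan that does not exist.

```python
from itertools import combinations

def can_restart(vms_mb, free_mb):
    """Try to place every displaced VM on a surviving host (first-fit-decreasing)."""
    free = list(free_mb)
    for need in sorted(vms_mb, reverse=True):
        for i, avail in enumerate(free):
            if avail >= need:
                free[i] -= need
                break
        else:
            return False  # no surviving host has room for this VM
    return True

def tolerates(hosts, n):
    """hosts maps a host name to (total_mb, [protected VM memory in MB]).

    Returns True if the VMs can be restarted after ANY n hosts fail.
    """
    names = list(hosts)
    for failed in combinations(names, n):
        survivors = [h for h in names if h not in failed]
        displaced = [m for h in failed for m in hosts[h][1]]
        free = [hosts[h][0] - sum(hosts[h][1]) for h in survivors]
        if not can_restart(displaced, free):
            return False
    return True

def max_failures_to_tolerate(hosts):
    """Largest n for which a plan exists, keeping at least one host alive."""
    n = 0
    while n + 1 < len(hosts) and tolerates(hosts, n + 1):
        n += 1
    return n

# Hypothetical three-host pool, each host with 16 GB of memory.
pool = {
    "host1": (16384, [4096, 4096]),
    "host2": (16384, [8192]),
    "host3": (16384, [4096]),
}
print(max_failures_to_tolerate(pool))  # prints 1
```

In this model the pool is overcommitted whenever tolerates(pool, n) is False for the configured number of host failures to tolerate. Checking every combination of failed hosts is only practical for small pools and is used here for clarity.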

Overcommitting

A pool is overcommitted if the VMs that are currently running could not be restarted elsewhere following a
user-defined number of host failures.
This would happen if there was not enough free memory across the pool to run those VMs following failure.
However, there are also more subtle changes which can make HA guarantees unsustainable: changes to
Virtual Block Devices (VBDs) and networks can affect which VMs may be restarted on which hosts. Currently
it is not possible for XenServer to check all actions before they occur and determine whether they will
violate the HA requirements. However, an asynchronous notification is sent if HA becomes unsustainable.

Overcommitment Warning

If you attempt to start or resume a VM and that action causes the pool to be overcommitted, a warning
alert is raised. This warning is displayed in XenCenter and is also available as a message instance through
the Xen API. The message may also be sent to an email address if configured. You will then be allowed
to cancel the operation, or proceed anyway. Proceeding causes the pool to become overcommitted. The
amount of memory used by VMs of different priorities is displayed at the pool and host levels.
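
The warn-then-decide behaviour described above can be sketched as follows. This is an illustrative model only, assuming memory is the sole constraint. The function names (plan_exists, start_vm), the proceed_anyway flag, and the returned strings are hypothetical and do not correspond to XenServer or Xen API calls.

```python
from itertools import combinations

def plan_exists(hosts, n):
    """hosts maps a host name to (total_mb, [vm_mb, ...]).

    Returns True if the VMs can be restarted after any n hosts fail,
    using a conservative first-fit-decreasing placement.
    """
    for failed in combinations(hosts, n):
        free = [t - sum(v) for h, (t, v) in hosts.items() if h not in failed]
        displaced = sorted((m for h in failed for m in hosts[h][1]), reverse=True)
        for need in displaced:
            slot = next((i for i, f in enumerate(free) if f >= need), None)
            if slot is None:
                return False
            free[slot] -= need
    return True

def start_vm(hosts, host, vm_mb, n, proceed_anyway=False):
    """Start a VM on `host`, warning first if that would overcommit the pool."""
    total, vms = hosts[host]
    trial = dict(hosts)
    trial[host] = (total, vms + [vm_mb])  # pool state if the start goes ahead
    if not plan_exists(trial, n):
        print(f"warning: starting this VM overcommits the pool (failures to tolerate = {n})")
        if not proceed_anyway:
            return "cancelled"
    hosts[host] = trial[host]
    return "started"

# Hypothetical two-host pool that can currently tolerate one host failure.
pool = {"host1": (8192, [4096]), "host2": (8192, [4096])}
print(start_vm(pool, "host1", 2048, n=1))                       # warning, then "cancelled"
print(start_vm(pool, "host1", 2048, n=1, proceed_anyway=True))  # warning, then "started"
```

The check runs against a trial copy of the pool state, so cancelling leaves the pool unchanged, while proceeding commits the change and leaves the pool overcommitted.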

Host Fencing

If a server failure occurs, such as the loss of network connectivity, or a problem with the control stack
is encountered, the XenServer host self-fences to ensure that the VMs are not running on two servers
simultaneously. When a fence action is taken, the server immediately and abruptly restarts, causing all VMs