High Availability, HA Overview, Overcommitting

Warning:

Citrix highly recommends rebooting all XenServers prior to installing an update and then verifying their configuration. This is because some configuration changes only take effect when a XenServer is rebooted, so the reboot may uncover configuration problems that would cause the update to fail.

To prepare a XenServer host in a pool for maintenance operations using the CLI

1. Run the commands:

xe host-disable uuid=<xenserver_host_uuid>
xe host-evacuate uuid=<xenserver_host_uuid>

This will disable the XenServer host and then migrate any running VMs to other XenServer hosts in the pool.

2. Perform the desired maintenance operation.

3. Once the maintenance operation is completed, enable the XenServer host:

xe host-enable uuid=<xenserver_host_uuid>

Restart any halted VMs and/or resume any suspended VMs.
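
The sequence above could also be scripted. The following is a minimal sketch in Python, assuming the xe CLI is on the PATH of the machine that runs it. The function names and the runner parameter are illustrative only and are not part of XenServer; the uuid argument to host-enable is assumed to mirror the one used for host-disable and host-evacuate.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes the xe arguments (without the leading "xe") and executes them.
Runner = Callable[[Sequence[str]], None]


def run_xe(args: Sequence[str]) -> None:
    """Run one xe command; raises CalledProcessError if xe exits non-zero."""
    subprocess.run(["xe", *args], check=True)


def enter_maintenance(host_uuid: str, runner: Runner = run_xe) -> None:
    """Disable the host, then migrate its running VMs to other pool members."""
    runner(["host-disable", f"uuid={host_uuid}"])
    runner(["host-evacuate", f"uuid={host_uuid}"])


def exit_maintenance(host_uuid: str, runner: Runner = run_xe) -> None:
    """Re-enable the host once the maintenance operation is finished."""
    runner(["host-enable", f"uuid={host_uuid}"])
```

Passing a different runner (for example, one that only prints the commands) allows a dry run of the sequence before it is executed against a real pool.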

High Availability

This section explains the XenServer implementation of virtual machine high availability (HA), and how to configure it using the xe CLI.

Note:

XenServer HA is only available with XenServer Advanced edition or above. To find out about XenServer editions, visit the Citrix website.

HA Overview

When HA is enabled, XenServer continually monitors the health of the hosts in a pool. The HA mechanism automatically moves protected VMs to a healthy host if the current VM host fails. Additionally, if the host that fails is the master, HA selects another host to take over the master role automatically, so that you can continue to manage the XenServer pool.

To establish with confidence that a host is unreachable, a resource pool configured for high availability uses several heartbeat mechanisms to check on hosts regularly. These heartbeats travel over both the storage interfaces (to the Heartbeat SR) and the networking interfaces (over the management interfaces). Both heartbeat routes can be multi-homed for additional resilience against false positives.

XenServer dynamically maintains a failover plan which details what to do if a set of hosts in a pool fails at any given time. An important concept to understand is the host failures to tolerate value, which is defined as part of the HA configuration. This value determines the number of host failures that can occur without any loss of service. For example, if a resource pool consists of 16 hosts and the number of tolerated failures is set to 3, the pool calculates a failover plan that allows any 3 hosts to fail while still being able to restart VMs on other hosts. If a plan cannot be found, the pool is considered to be overcommitted. The plan is dynamically recalculated based on VM lifecycle operations and movement. Alerts are sent (either through XenCenter or e-mail) if changes (for example, the addition of new VMs to the pool) cause your pool to become overcommitted.
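
To make the idea of a failover plan more concrete, the following toy model in Python checks whether a pool would be overcommitted for a given host failures to tolerate value. It is a sketch under strong assumptions and not XenServer's actual planner: memory is the only resource considered, every VM is treated as protected, and VMs are placed with a simple first-fit-decreasing heuristic, which can report a pool as overcommitted even when a cleverer placement exists. All names (can_restart, is_overcommitted, the "total" and "vms" keys) are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def can_restart(vm_mem: List[int], free_mem: List[int]) -> bool:
    """Try to place each displaced VM on a surviving host (first-fit decreasing)."""
    free = sorted(free_mem, reverse=True)
    for need in sorted(vm_mem, reverse=True):
        for i, avail in enumerate(free):
            if avail >= need:
                free[i] -= need  # reserve memory on this surviving host
                break
        else:
            return False  # no surviving host has room for this VM
    return True


def is_overcommitted(hosts: Dict[str, dict], failures_to_tolerate: int) -> bool:
    """Return True if some set of failed hosts leaves VMs that cannot be restarted.

    hosts maps a host name to {"total": memory, "vms": [memory per running VM]}.
    """
    for failed in combinations(hosts, failures_to_tolerate):
        displaced = [m for h in failed for m in hosts[h]["vms"]]
        free = [
            hosts[h]["total"] - sum(hosts[h]["vms"])
            for h in hosts
            if h not in failed
        ]
        if not can_restart(displaced, free):
            return True
    return False
```

For instance, a pool built as `{f"host{i}": {"total": 16, "vms": [4, 4]} for i in range(4)}` can be passed to `is_overcommitted` with different tolerance values, and again after appending a VM to one host, to see how adding VMs can push a pool into an overcommitted state.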

Overcommitting

A pool is overcommitted if the VMs that are currently running could not be restarted elsewhere following a user-defined number of host failures.

This happens if there is not enough free memory across the pool to run those VMs following a failure. However, there are also more subtle changes which can make HA guarantees unsustainable: changes to Virtual

Citrix Systems 6 manual High Availability, HA Overview, Overcommitting
