Member failures

In the absence of HA, master nodes detect the failures of members by receiving regular heartbeat messages. If no heartbeat has been received for 600 seconds, the master assumes the member is dead. There are two ways to recover from this problem:

Repair the dead host (e.g. by physically rebooting it). When the connection to the member is restored, the master will mark the member as alive again.

Shut down the host and instruct the master to forget about the member node using the xe host-forget CLI command. Once the member has been forgotten, all the VMs that were running there are marked as offline and can be restarted on other XenServer hosts. It is very important to ensure that the XenServer host is actually offline; otherwise, VM data corruption might occur. Be careful not to split your pool into multiple single-host pools by using xe host-forget, since this could result in them all mapping the same shared storage and corrupting VM data.

Warning:

If you are going to use the forgotten host as a XenServer host again, perform a fresh installation of the XenServer software.

Do not use the xe host-forget command if HA is enabled on the pool. Disable HA first, then forget the host, and then re-enable HA.
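The disable, forget, re-enable order described above can be sketched as a small shell wrapper. This is an illustrative sketch only: the forget_host function, the run helper, and the DRY_RUN variable are hypothetical names introduced here, and the heartbeat-sr-uuids parameter is assumed to match the xe pool-ha-enable command in your XenServer release. By default the wrapper only prints the commands so that they can be reviewed before anything is executed.

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 only after confirming that the host is really offline.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# forget_host <host_UUID> [heartbeat_SR_UUID]
# With a heartbeat SR UUID, HA is disabled before the host is forgotten
# and re-enabled afterwards. Without one, HA is assumed to be off already.
forget_host() {
    host_uuid="${1:-}"
    heartbeat_sr_uuid="${2:-}"
    if [ -z "$host_uuid" ]; then
        echo "usage: forget_host <host_UUID> [heartbeat_SR_UUID]" >&2
        return 1
    fi
    if [ -n "$heartbeat_sr_uuid" ]; then
        # Stop at the first failing step; HA then stays disabled
        # until it is re-enabled by hand.
        run xe pool-ha-disable &&
        run xe host-forget uuid="$host_uuid" &&
        run xe pool-ha-enable heartbeat-sr-uuids="$heartbeat_sr_uuid"
    else
        run xe host-forget uuid="$host_uuid"
    fi
}
```

For example, forget_host <host_UUID> prints the single xe host-forget command, while forget_host <host_UUID> <SR_UUID> prints the three-step HA sequence.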

When a member XenServer host fails, there may be VMs still registered in the running state. If you are sure that the member XenServer host is definitely down, use the xe vm-reset-powerstate CLI command to set the power state of the VMs to halted. See the section called “vm-reset-powerstate” for more details.

Warning:

Incorrect use of this command can lead to data corruption. Only use this command if absolutely necessary.
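When several VMs were resident on the failed host, the reset can be applied to each of them in turn. The following is a minimal sketch, not a supported tool: reset_vms, run, and DRY_RUN are hypothetical names, and the function expects a comma-separated list of VM UUIDs such as the one xe vm-list prints with --minimal. It prints the commands by default rather than executing them.

```shell
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# reset_vms <comma-separated VM UUIDs>
# Marks each listed VM as halted. Only use this when the host that was
# running the VMs is definitely down.
reset_vms() {
    # UUIDs contain no spaces, so turning commas into spaces lets the
    # shell split the list into one word per VM.
    for vm_uuid in $(printf '%s' "${1:-}" | tr ',' ' '); do
        run xe vm-reset-powerstate uuid="$vm_uuid" force=true
    done
}

# Possible usage on the pool master (assumes the resident-on filter and
# --minimal output of xe vm-list):
#   reset_vms "$(xe vm-list resident-on=<host_UUID> is-control-domain=false --minimal)"
```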

Before you can start VMs on another XenServer host, you are also required to release the locks on VM storage. Each disk in an SR can only be used by one host at a time, so it is key to make the disk accessible to other XenServer hosts once a host has failed. To do so, run the following script on the pool master for each SR that contains disks of any affected VMs:

/opt/xensource/sm/resetvdis.py <host_UUID> <SR_UUID> [master]

You need only supply the third argument ("master") if the failed host was the SR master (the pool master, or a XenServer host using local storage) at the time of the crash.

Warning:

Be absolutely sure that the host is down before executing this command. Incorrect use of this command can lead to data corruption.

If you attempt to start a VM on another XenServer host before running the script above, then you will receive the following error message: VDI <UUID> already attached RW.
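Because the script must be run once for each affected SR, the calls can be wrapped in a loop. The sketch below is illustrative only: release_vdi_locks, run, and DRY_RUN are hypothetical names, and the yes/no flag is an assumed way of stating whether the failed host was the SR master. It prints the resetvdis.py invocations by default so that they can be checked before being run on the pool master.

```shell
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# release_vdi_locks <host_UUID> <was_sr_master: yes|no> <SR_UUID>...
# Calls resetvdis.py once per SR, adding "master" only when the failed
# host was the SR master at the time of the crash.
release_vdi_locks() {
    if [ "$#" -lt 3 ]; then
        echo "usage: release_vdi_locks <host_UUID> <yes|no> <SR_UUID>..." >&2
        return 1
    fi
    host_uuid="$1"
    was_sr_master="$2"
    shift 2
    for sr_uuid in "$@"; do
        if [ "$was_sr_master" = "yes" ]; then
            run /opt/xensource/sm/resetvdis.py "$host_uuid" "$sr_uuid" master
        else
            run /opt/xensource/sm/resetvdis.py "$host_uuid" "$sr_uuid"
        fi
    done
}
```

For example, release_vdi_locks <host_UUID> no <SR_UUID_1> <SR_UUID_2> prints one resetvdis.py command per SR without the third argument.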

Master failures

Every member of a resource pool contains all the information necessary to take over the role of master if required. When a master node fails, the following sequence of events occurs:

1. If HA is enabled, another master is elected automatically.

2. If HA is not enabled, each member will wait for the master to return.

If the master comes back up at this point, it re-establishes communication with its members, and operation returns to normal.
