Member failures
When HA is not enabled, the master detects member failures through regular heartbeat messages. If the master receives no heartbeat from a member for 600 seconds, it assumes the member is dead. There are two ways to recover from this problem:
•Repair the dead host (e.g. by physically rebooting it). When the connection to the member is restored, the master will mark the member as alive again.
•Shut down the host and instruct the master to forget about the member node using the xe
Warning:
•If you are going to use the forgotten host as a XenServer host again, perform a fresh installation of the XenServer software.
•Do not use xe
When a member XenServer host fails, there may be VMs still registered in the running state. If you are sure that the member XenServer host is definitely down, use the xe
Warning:
Incorrect use of this command can lead to data corruption. Only use this command if absolutely necessary.
Before you can start VMs on another XenServer host, you must also release the locks on VM storage. Each disk in an SR can be used by only one host at a time, so the disks must be made accessible to other XenServer hosts once a host has failed. To do so, run the following script on the pool master for each SR that contains disks of any affected VMs:
/opt/xensource/sm/resetvdis.py <host_UUID> <SR_UUID> [master]
Supply the third argument ("master") only if the failed host was the SR master at the time of the crash, that is, the pool master or a XenServer host using local storage.
Warning:
Be absolutely sure that the host is down before executing this command. Incorrect use of this command can lead to data corruption.
If you attempt to start a VM on another XenServer host before running the script above, the start fails with the following error message: VDI <UUID> already attached RW.
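When several SRs hold disks of the affected VMs, the lock-release step has to be repeated once per SR. The following is a minimal sketch of how that repetition might be scripted. The helper names build_reset_cmd and reset_all and the DRY_RUN variable are hypothetical and not part of XenServer; only the script path and its argument order come from the text above. The sketch prints the commands by default so that they can be reviewed before anything is executed.

```shell
#!/bin/sh
# Illustrative sketch only: helper names and DRY_RUN are assumptions,
# not part of the XenServer tooling. The script path is the one given above.
RESETVDIS=/opt/xensource/sm/resetvdis.py

# build_reset_cmd HOST_UUID SR_UUID [master]
# Echoes the command line for one SR. Pass "master" as the third argument
# only if the failed host was the SR master at the time of the crash.
build_reset_cmd() {
    host_uuid=$1
    sr_uuid=$2
    role=${3:-}
    if [ -z "$host_uuid" ] || [ -z "$sr_uuid" ]; then
        echo "usage: build_reset_cmd <host_UUID> <SR_UUID> [master]" >&2
        return 1
    fi
    if [ -n "$role" ] && [ "$role" != "master" ]; then
        echo "third argument must be 'master' or omitted" >&2
        return 1
    fi
    echo "$RESETVDIS $host_uuid $sr_uuid${role:+ $role}"
}

# reset_all HOST_UUID SR_UUID...
# Handles SRs for which the failed host was NOT the SR master.
# Dry run by default: prints one command per SR. Set DRY_RUN=0 to execute.
reset_all() {
    host_uuid=$1
    shift
    for sr_uuid in "$@"; do
        cmd=$(build_reset_cmd "$host_uuid" "$sr_uuid") || return 1
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            $cmd || return 1
        fi
    done
}
```

For example, reset_all <host_UUID> <SR_UUID_1> <SR_UUID_2> prints two command lines for review. An SR for which the failed host was the SR master would be handled separately with build_reset_cmd <host_UUID> <SR_UUID> master. The warning above still applies: run the printed commands only once you are certain the host is down.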
Master failures
Every member of a resource pool contains all the information necessary to take over the role of master if required. When a master node fails, the following sequence of events occurs:
1.If HA is enabled, another master is elected automatically.
2.If HA is not enabled, each member will wait for the master to return.
If the master comes back up at this point, it