Arbitration for Data Integrity in Serviceguard Clusters

How Serviceguard Uses Arbitration

The cluster lock is used as a tie-breaker only when a running cluster fails and, as Serviceguard attempts to form a new cluster, the cluster splits into two sub-clusters of equal size. Each sub-cluster attempts to acquire the cluster lock. The sub-cluster that gets the cluster lock forms the new cluster, which prevents two sub-clusters from running at the same time. If the two sub-clusters are of unequal size, the sub-cluster with more than 50% of the nodes forms the new cluster, and the cluster lock is not used.
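The decision rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this section, not Serviceguard code: the names `Outcome` and `reformation_outcome` are hypothetical, and the treatment of a sub-cluster with fewer than 50% of the nodes (it does not form the new cluster) is an assumption inferred from the text.

```python
from enum import Enum


class Outcome(Enum):
    FORMS_CLUSTER = "forms the new cluster"
    DOES_NOT_FORM = "does not form the new cluster"


def reformation_outcome(sub_cluster_size, previous_cluster_size, wins_lock=False):
    """Apply the majority / tie-breaker rule to one sub-cluster.

    sub_cluster_size      -- nodes in this sub-cluster
    previous_cluster_size -- nodes in the cluster that was running before the failure
    wins_lock             -- True if this sub-cluster acquired the cluster lock
                             (only consulted for an exact 50/50 split)
    """
    if not 0 < sub_cluster_size <= previous_cluster_size:
        raise ValueError("sub-cluster size must be between 1 and the previous cluster size")

    # Compare 2 * size with the total to avoid fractional arithmetic.
    doubled = 2 * sub_cluster_size
    if doubled > previous_cluster_size:
        # More than 50% of the nodes: the lock is not used.
        return Outcome.FORMS_CLUSTER
    if doubled == previous_cluster_size and wins_lock:
        # Exactly half the nodes: the cluster lock breaks the tie.
        return Outcome.FORMS_CLUSTER
    # Assumption: a minority, or the half that loses the lock, does not re-form.
    return Outcome.DOES_NOT_FORM


if __name__ == "__main__":
    print(reformation_outcome(3, 4))                  # majority, no lock needed
    print(reformation_outcome(2, 4, wins_lock=True))  # even split, lock acquired
    print(reformation_outcome(2, 4))                  # even split, lock not acquired
```

In a two-node cluster, every loss of communication between the nodes is an even split, which is why the lock is consulted every time in that configuration.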

If you have a two-node cluster, you are required to configure the cluster lock. If communications are lost between these two nodes, the node that obtains the cluster lock will take over the cluster and the other node will undergo a forced halt. Without a cluster lock, a failure of either node in the cluster will result in a forced immediate system halt of the other node, and therefore the cluster will halt.

If the cluster lock fails or is unavailable during an attempt to acquire it, the cluster will halt. You can avoid this problem by configuring the cluster’s hardware so that the cluster lock is not lost due to an event that causes a failure in another cluster component.

No Cluster Lock

Normally, you should not configure a cluster of three or fewer nodes without a cluster lock. In two-node clusters, a cluster lock is required. You may consider using no cluster lock in configurations of three or more nodes, but weigh that decision against the fact that any cluster may require tie-breaking. For example, if one node in a three-node cluster is removed for maintenance, the cluster re-forms as a two-node cluster. If a tie-breaking scenario later occurs because of a node or communication failure, there is no cluster lock to break the tie, and the entire cluster will become unavailable.

In a cluster with four or more nodes, you may not need a cluster lock since the chance of the cluster being split into two halves of equal size is very small. However, be sure to configure your cluster to prevent the failure of exactly half the nodes at one time. For example, make sure there is no potential single point of failure such as a single LAN between equal numbers of nodes, and that you use multiple power circuits with less than half of the nodes on any single power circuit.
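As a rough planning aid, the "less than half of the nodes" guideline above can be checked mechanically. The sketch below is a hypothetical helper, not a Serviceguard tool: `groups_at_risk`, the circuit names, and the node names are all invented for illustration. It flags any shared component (a power circuit, for example) that carries half or more of the cluster's nodes.

```python
def groups_at_risk(groups, total_nodes):
    """Return the names of groups that hold half or more of the nodes.

    groups      -- dict mapping a shared component (e.g. a power circuit)
                   to the list of node names that depend on it
    total_nodes -- number of nodes in the cluster
    """
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    # A group is at risk when its failure would remove 50% or more of the
    # nodes, i.e. when it does NOT hold less than half of them.
    return sorted(
        name
        for name, nodes in groups.items()
        if 2 * len(set(nodes)) >= total_nodes
    )


if __name__ == "__main__":
    # Hypothetical four-node layout: circuit_a carries exactly half the nodes.
    power_circuits = {
        "circuit_a": ["node1", "node2"],
        "circuit_b": ["node3"],
        "circuit_c": ["node4"],
    }
    print(groups_at_risk(power_circuits, total_nodes=4))
```

Moving one node off the flagged circuit, so that every circuit carries fewer than half of the nodes, leaves the result empty.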

Cluster lock disks are not allowed in clusters of more than four nodes. A quorum server or arbitrator nodes may be used with larger clusters, and this kind of arbitration is necessary for extended distance clusters and for MetroCluster configurations to provide disaster tolerance.
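The sizing rules in this section can be collected into a simple pre-configuration check. The following is a minimal sketch under stated assumptions: the function `arbitration_problems` and the method labels (`"lock_disk"`, `"quorum_server"`, `"arbitrator_nodes"`, `"none"`) are hypothetical names chosen for this example and are not Serviceguard configuration keywords. It encodes only the two hard rules stated here: a two-node cluster needs arbitration, and a lock disk is not allowed with more than four nodes.

```python
# Hypothetical labels for this sketch; not Serviceguard configuration keywords.
METHODS = {"lock_disk", "quorum_server", "arbitrator_nodes", "none"}


def arbitration_problems(node_count, method):
    """Return a list of rule violations for a planned cluster (empty if none)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if method not in METHODS:
        raise ValueError(f"unknown arbitration method: {method!r}")

    problems = []
    if node_count == 2 and method == "none":
        problems.append("a two-node cluster requires a cluster lock")
    if node_count > 4 and method == "lock_disk":
        problems.append("cluster lock disks are not allowed with more than four nodes")
    return problems


if __name__ == "__main__":
    for plan in [(2, "none"), (6, "lock_disk"), (6, "quorum_server")]:
        print(plan, arbitration_problems(*plan))
```

The check deliberately does not flag a three-node cluster with no lock, because this section treats that as a judgment call rather than a prohibition.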
