Arbitration for Data Integrity in Serviceguard Clusters
How Serviceguard Uses Arbitration
Dynamic Cluster Re-Formation
A dynamic re-formation is a temporary change in cluster membership that takes place as nodes join or leave a running cluster. Re-formation differs from reconfiguration, which is a permanent modification of the configuration files. Re-formation of the cluster occurs under the following conditions (not a complete list):
• An SPU or network failure was detected on an active node.
• An inactive node wants to join the cluster; the cluster manager daemon has been started on that node.
• The system administrator halted a node.
• A node halts because of a package failure.
• A node halts because of a service failure.
• Heavy network traffic prevented the cluster from receiving the heartbeat signal.
• The heartbeat network failed, and no other network is configured to carry heartbeat.
Typically, re-formation results in a cluster with a different composition. The new cluster may contain fewer or more nodes than the previous incarnation of the cluster.
Cluster Quorum and Cluster Locking
Recall that the algorithm for cluster re-formation requires a cluster quorum of a strict majority (that is, more than 50%) of the nodes previously running. If both halves (exactly 50%) of a previously running cluster were allowed to re-form, there would be a split-brain situation in which two instances of the same cluster were running.
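The strict-majority rule can be expressed as a one-line comparison. The following is a minimal sketch, not Serviceguard code; the function name `has_quorum` and its parameters are hypothetical. It uses integer arithmetic (twice the surviving count must exceed the previous count) so that "more than 50%" is evaluated exactly, with no floating-point rounding.

```python
def has_quorum(surviving_nodes: int, previous_nodes: int) -> bool:
    """Hypothetical helper: True if surviving_nodes is a strict majority
    (more than 50%) of the nodes previously running in the cluster."""
    if previous_nodes <= 0:
        raise ValueError("previous_nodes must be positive")
    # 2 * s > p is equivalent to s / p > 0.5, but stays in integers.
    return 2 * surviving_nodes > previous_nodes


# A 5-node cluster partitioned into groups of 3 and 2:
print(has_quorum(3, 5))  # True: 3 of 5 is a strict majority
print(has_quorum(2, 5))  # False
# A 4-node cluster split exactly in half: neither half has quorum.
print(has_quorum(2, 4))  # False
```

Note that in an even split neither half passes this test, which is why an additional tie-breaking mechanism is needed to keep the cluster running after such a failure.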
Cluster Lock
Although a cluster quorum of more than 50% is generally required, Serviceguard allows exactly 50% of the previously running nodes to re-form as a new cluster provided that the other 50% of the previously running nodes do not also re-form. This is guaranteed by the use of an arbiter or tie-breaker to choose between the two equal-sized node groups, allowing one group to form the cluster and forcing the other group to shut down. This type of arbitration is known as a cluster lock.
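The decision each surviving node group makes can be sketched as follows. This is an illustrative model under stated assumptions, not Serviceguard's implementation: `ClusterLock` is a hypothetical in-memory object standing in for the real arbiter, and `reform_decision` is a hypothetical helper. The essential property modeled is that the lock can be claimed by only one group, so when the cluster splits exactly in half, the first group to claim it forms the cluster and the other group halts.

```python
import threading
from typing import Optional


class ClusterLock:
    """Hypothetical stand-in for the cluster lock: a one-time claim that
    only the first group can win. Serviceguard's actual arbitration
    devices are not modeled here; this only illustrates tie-breaking."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()  # makes check-and-claim atomic
        self._owner: Optional[str] = None

    def try_acquire(self, group_id: str) -> bool:
        with self._mutex:
            if self._owner is None:
                self._owner = group_id
            # True only for the group that claimed the lock first.
            return self._owner == group_id


def reform_decision(surviving: int, previous: int,
                    lock: ClusterLock, group_id: str) -> str:
    """Hypothetical helper: return "form" or "halt" for a node group."""
    if 2 * surviving > previous:
        return "form"  # strict majority: no arbitration needed
    if 2 * surviving == previous:
        # Exactly 50%: the cluster lock breaks the tie.
        return "form" if lock.try_acquire(group_id) else "halt"
    return "halt"  # minority: must not re-form


# A 4-node cluster splits into two groups of 2; group A reaches the lock first.
lock = ClusterLock()
print(reform_decision(2, 4, lock, "A"))  # form
print(reform_decision(2, 4, lock, "B"))  # halt
```

Because the lock records a single owner, at most one of the two equal-sized groups can ever receive "form", which is exactly the guarantee that prevents the split-brain situation described above.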