Quorum, Split-Brain | HP Serviceguard instruction

Arbitration for Data Integrity in Serviceguard Clusters

Cluster Membership Concepts

Quorum

Cluster re-formation takes place when there is some change in the cluster membership. In general, the algorithm for cluster re-formation requires the new cluster to achieve a cluster quorum of a strict majority (that is, more than 50%) of the nodes previously running. If both halves (exactly 50% each) of a previously running cluster were allowed to re-form, there would be a split-brain situation in which two instances of the same cluster were running.
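The strict-majority rule can be illustrated with a minimal Python sketch. The function name has_quorum and its parameters are hypothetical, chosen only for illustration; this is not Serviceguard's actual implementation.

```python
def has_quorum(surviving_nodes: int, previous_nodes: int) -> bool:
    """Return True if the surviving group is a strict majority (> 50%)
    of the nodes that were previously running.

    Hypothetical helper for illustration only.
    """
    if previous_nodes <= 0 or not 0 <= surviving_nodes <= previous_nodes:
        raise ValueError("node counts out of range")
    # Integer form of surviving / previous > 0.5, avoiding float comparison.
    return 2 * surviving_nodes > previous_nodes


if __name__ == "__main__":
    print(has_quorum(3, 4))  # 3 of 4 nodes: more than 50%
    print(has_quorum(2, 4))  # 2 of 4 nodes: exactly 50%
```

Under this rule, three of four nodes form a quorum, while exactly half of the nodes do not.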

Split-Brain

How could a split-brain situation arise? Suppose a two-node cluster experiences the loss of all network connections between the nodes. This means that cluster heartbeat ceases. Each node will then try to re-form the cluster separately. If this were allowed to occur, it would have the potential to run the same application in two different locations and to corrupt application data. In a split-brain scenario, different incarnations of an application could end up simultaneously accessing the same disks. One incarnation might well be initiating recovery activity while the other is modifying the state of the disks. Serviceguard’s quorum requirement is designed to prevent a split-brain situation.
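As a rough sketch of why the quorum requirement prevents this, the following Python fragment applies the strict-majority test to each group of nodes left after a network partition. The function name groups_allowed_to_reform and the example partitions are assumptions made for illustration, not part of Serviceguard.

```python
def groups_allowed_to_reform(partition_sizes, previous_nodes):
    """Return the sizes of the groups that hold a strict majority
    (more than 50%) of the previously running nodes.

    Hypothetical helper for illustration only.
    """
    return [size for size in partition_sizes if 2 * size > previous_nodes]


# Two-node cluster that loses all network connections:
# each node is isolated in its own group of one.
two_node_split = groups_allowed_to_reform([1, 1], previous_nodes=2)

# Five-node cluster that splits into groups of three and two.
uneven_split = groups_allowed_to_reform([3, 2], previous_nodes=5)

print(two_node_split)
print(uneven_split)
```

Because two disjoint groups cannot both contain more than half of the previous nodes, at most one group ever passes the test. In the two-node case neither node has a strict majority, so under the majority rule alone neither could re-form the cluster on its own.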

How likely is a split-brain situation? The answer depends partly on the types of inter-node communication the cluster is using: some types are more robust than others. For example, the use of the older coaxial cable technology makes communication loss a significant problem. In that technology, the loss of termination would frequently result in the loss of an entire LAN. On the other hand, the use of redundant groups of current Ethernet hubs makes the loss of communication between nodes extremely unlikely, but it is still possible. In general, with mission-critical data, it is worth the cost to eliminate even small risks associated with split-brain scenarios.

A split-brain situation is more likely to occur in a two-node cluster than in a larger local cluster that splits into two even-sized sub-groups. Split-brain is also more likely to occur in a disaster-tolerant cluster where separate groups of nodes are located in different data centers.