
Arbitration for Data Integrity in Serviceguard Clusters

Cluster Membership Concepts

Quorum

Cluster re-formation takes place when there is some change in the cluster membership. In general, the algorithm for cluster re-formation requires the new cluster to achieve a cluster quorum of a strict majority (that is, more than 50%) of the nodes previously running. If both halves (exactly 50%) of a previously running cluster were allowed to re-form, there would be a split-brain situation in which two instances of the same cluster were running.
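The strict-majority rule described above can be expressed as a simple predicate. The following is a minimal illustrative sketch, not Serviceguard's actual implementation; the function name and parameters are hypothetical.

```python
# Hypothetical sketch of the strict-majority quorum rule:
# a re-forming cluster has quorum only if it holds MORE than
# 50% of the nodes that were previously running.

def has_quorum(surviving_nodes: int, previous_members: int) -> bool:
    """Return True if surviving_nodes is a strict majority of
    previous_members (integer comparison avoids float rounding)."""
    return surviving_nodes * 2 > previous_members

# A 4-node cluster that splits into two 2-node halves:
# neither half reaches quorum, so neither re-forms alone.
print(has_quorum(2, 4))  # False - exactly 50% is not enough
print(has_quorum(3, 4))  # True  - strict majority
```

Note that an exact 50/50 split fails the test on both sides, which is precisely why a tie-breaker (such as a cluster lock) is needed for even-sized splits.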

Split-Brain

How could a split-brain situation arise? Suppose a two-node cluster experiences the loss of all network connections between the nodes. Cluster heartbeat then ceases, and each node tries to re-form the cluster separately. If this were allowed to occur, the same application could run in two different locations and corrupt its data. In a split-brain scenario, different incarnations of an application could end up simultaneously accessing the same disks: one incarnation might be initiating recovery activity while the other is modifying the state of the disks. Serviceguard’s quorum requirement is designed to prevent a split-brain situation.

How likely is a split-brain situation? The answer depends partly on the types of inter-node communication the cluster is using: some types are more robust than others. For example, the older coaxial cable technology makes communication loss a significant problem; with that technology, the loss of termination would frequently bring down an entire LAN. On the other hand, the use of redundant groups of current Ethernet hubs makes the loss of communication between nodes extremely unlikely, though still possible. In general, with mission-critical data, it is worth the cost to eliminate even small risks associated with split-brain scenarios.

A split-brain situation is more likely to occur in a two-node cluster than in a larger local cluster that splits into two even-sized sub-groups. Split-brain is also more likely to occur in a disaster-tolerant cluster where separate groups of nodes are located in different data centers.

