HP Serviceguard manual Cluster Membership Concepts

Arbitration for Data Integrity in Serviceguard Clusters

Cluster Membership Concepts

When the cluster is part of a disaster tolerant solution that has nodes located in more than one data center, loss of communication can easily happen unless redundant networking is implemented with different routing for the redundant links.

In all the above cases, the loss of heartbeat communication with other nodes in the cluster causes the re-formation protocol to be carried out. This means that nodes attempt to communicate with one another to rebuild the membership list. In case (1) above, the running nodes choose a coordinator and re-form the cluster with one fewer node. But in case (3), there are two sets of running nodes, and the nodes in each set attempt to communicate with the other nodes in the same set to rebuild the membership list. The result is that the two sets of nodes build different membership lists for the new cluster. Now, if both sets of nodes were allowed to re-form the cluster, there would be two instances of the same cluster running in two locations. In this situation, the same application could start up in two different places, and both copies could modify the same data without coordination. The result would be data corruption.
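The difference between a single node failure and a partition can be illustrated with a small sketch. It is not Serviceguard code: the function name `membership_views`, the node names, and the representation of working heartbeat paths as pairs are all assumptions made for illustration. Each group of running nodes that can still reach one another ends up with its own membership list.

```python
from collections import deque


def membership_views(running_nodes, links):
    """Group running nodes into sets that can still exchange heartbeats.

    running_nodes: iterable of names of nodes that are still up.
    links: iterable of (a, b) pairs with working heartbeat communication.
    Returns one sorted membership list per isolated group of nodes.
    """
    neighbours = {n: set() for n in running_nodes}
    for a, b in links:
        # Ignore links that touch a node that is no longer running.
        if a in neighbours and b in neighbours:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, views = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Breadth-first walk over every node reachable from `start`.
        group, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.append(node)
            for peer in sorted(neighbours[node] - seen):
                seen.add(peer)
                queue.append(peer)
        views.append(sorted(group))
    return views


# One node of four is down; the rest still reach each other.
single = membership_views(["n1", "n2", "n3"], [("n1", "n2"), ("n2", "n3")])

# All four nodes are up, but the two halves cannot reach each other.
split = membership_views(["n1", "n2", "n3", "n4"], [("n1", "n2"), ("n3", "n4")])
```

In the first call the running nodes agree on one smaller membership list. In the second call the function returns two lists, which corresponds to the two competing instances of the cluster described above.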

How does Serviceguard handle cases like the above partitioning of the cluster? The process is called arbitration. In the Serviceguard user’s manual, the process is known as tie-breaking, because it is a means to decide on a definitive cluster membership when different competing groups of cluster nodes are independently trying to re-form a cluster.
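This passage does not describe how Serviceguard carries out the tie-break. As a generic illustration only, one simple way to decide between competing groups is a first-claim rule: whichever group reaches a shared arbiter first is allowed to re-form the cluster, and every later claim is refused. The class `TieBreaker` and its `claim` method below are hypothetical names, not part of any Serviceguard interface.

```python
import threading


class TieBreaker:
    """Hypothetical arbiter: the first group to claim it wins."""

    def __init__(self):
        self._lock = threading.Lock()
        self.winner = None

    def claim(self, group):
        # Non-blocking acquire succeeds only for the first caller;
        # the lock is never released, so later claims always fail.
        if self._lock.acquire(blocking=False):
            self.winner = tuple(group)
            return True
        return False


arbiter = TieBreaker()
results = {
    "A": arbiter.claim(["n1", "n2"]),  # first claim wins
    "B": arbiter.claim(["n3", "n4"]),  # second claim is refused
}
```

Under this assumed rule, the group whose claim is refused must not re-form the cluster, so only one definitive membership list survives.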

At cluster startup time, nodes join the cluster, and a tally of the cluster membership is created and maintained in memory on all cluster nodes. Occasionally, changes in membership occur. For example, when the administrator halts a node, the node leaves the cluster, and the cluster membership data in memory is changed accordingly.

When a node crashes, the other nodes become aware of this because no cluster heartbeat is received from that node after the expected interval. Thus, the transmission and receipt of heartbeat messages are essential for keeping the membership data continuously up to date. Why is this membership data important? In Serviceguard, a basic package, containing an application and its data, can only be allowed to run on one node at a time. Therefore, the cluster needs to know which nodes are running in order to tell whether it is appropriate to start a package, and where the package should be started. A package should not be started if it is already running; it should be started on an alternate node if the primary node is down; and so forth.
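The two ideas in this paragraph, detecting a failed node from missing heartbeats and using the membership data to decide where a package may start, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `live_nodes` and `choose_node`, the timestamps, and the timeout value are invented for the example and do not reflect Serviceguard's actual parameters or interfaces.

```python
def live_nodes(last_heartbeat, now, timeout):
    """Return the nodes whose most recent heartbeat is within the timeout.

    last_heartbeat: dict mapping node name to the time of its last heartbeat.
    now, timeout: current time and allowed silence, in the same units.
    """
    return {n for n, t in last_heartbeat.items() if now - t <= timeout}


def choose_node(package_nodes, running_on, members):
    """Pick where to start a package, or None if it must not be started.

    package_nodes: candidate nodes in preferred order, primary first.
    running_on: node currently running the package, or None.
    members: set of nodes currently in the cluster.
    """
    if running_on in members:
        return None  # already running on a live node; never start a second copy
    for node in package_nodes:
        if node in members:
            return node  # first candidate that is still a cluster member
    return None  # no eligible node is up


# Assumed values: n3 has been silent for 9 time units, the timeout is 5.
heartbeats = {"n1": 100.0, "n2": 100.0, "n3": 92.0}
members = live_nodes(heartbeats, now=101.0, timeout=5.0)

# Package with primary n3 was running there; n3 is gone, so fail over to n1.
failover_target = choose_node(["n3", "n1"], running_on="n3", members=members)

# Package already running on the live node n1 must not be started again.
no_restart = choose_node(["n1", "n2"], running_on="n1", members=members)
```

The sketch depends on the membership set being accurate. If two partitions each computed their own set, both could decide to start the same package.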

Manufacturing Part Number B3936-90078