HP Serviceguard manual Cluster Membership Concepts

Arbitration for Data Integrity in Serviceguard Clusters

Cluster Membership Concepts

What is arbitration? Why is it necessary? When and how is it carried out? To answer these questions, it is necessary to explain a number of clustering concepts that are central to the processes of cluster formation and re-formation. These concepts are membership, quorum, split-brain, and tie-breaking.

Membership

A cluster is a networked collection of nodes. The key to success in controlling the location of applications in the cluster and ensuring there is no inappropriate duplication is maintaining a well-defined cluster node list. When the cluster starts up, all the nodes communicate and build this membership list, a copy of which is in the memory of every node. The list is validated continuously as the cluster runs; this is done by means of heartbeat messages that are transmitted among all the nodes. As nodes enter and leave the cluster, the list is changed in memory. Changes in membership can result from an operator’s issuing a command to run or halt a node, or from system events that cause a node to halt, reboot, or crash. Some of these events are routine, and some may be unexpected. There are frequent cases in cluster operation when cluster membership is changing and when the cluster software must determine which node in the cluster should run an application.
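The idea of an in-memory membership list that is kept current by heartbeat messages can be sketched as follows. This is only an illustration of the concept, not Serviceguard's implementation: the class name `MembershipList`, its methods, and the `timeout` parameter are all hypothetical, and times are passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class MembershipList:
    """Membership list held in the memory of one node (illustrative only)."""

    timeout: float  # hypothetical: seconds of silence before a node is suspect
    last_seen: dict = field(default_factory=dict)  # node name -> last heartbeat time

    def join(self, node: str, now: float) -> None:
        # A node enters the cluster (for example, an operator runs it).
        self.last_seen[node] = now

    def leave(self, node: str) -> None:
        # A node leaves the cluster (for example, an operator halts it).
        self.last_seen.pop(node, None)

    def heartbeat(self, node: str, now: float) -> None:
        # Only current members refresh their entry.
        if node in self.last_seen:
            self.last_seen[node] = now

    def silent_nodes(self, now: float) -> list:
        # Members whose last heartbeat is older than the timeout.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)

    def members(self) -> list:
        return sorted(self.last_seen)


cluster = MembershipList(timeout=5.0)
cluster.join("node1", now=0.0)
cluster.join("node2", now=0.0)
cluster.heartbeat("node1", now=4.0)
suspects = cluster.silent_nodes(now=6.0)  # node2 has been silent for 6 seconds
```

Note that a silent node is only flagged here, not removed from the list: silence alone does not say why the node stopped communicating.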

How does the cluster software tell where an application should run? In a running cluster, when one system cannot communicate with the others for a significant amount of time, there can be several possible reasons:

1. The node has crashed.

2. The node is experiencing a kernel hang, and processing has stopped.

3. The cluster is partitioned because of a network problem. Either all the network cards connecting the node to the rest of the cluster have failed, or all the cables connecting the cards to the network have failed, or there has been a failure of the network itself.

It is often impossible for the cluster manager software to distinguish (1) from (2) and (3), and therein lies a problem. In case (1), it is safe to restart the application on another node in the cluster. In cases (2) and (3), it is not safe, because the unreachable node may still be running the application or may resume running it.
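The ambiguity can be made concrete with a small sketch. It assumes, for illustration only, that missing heartbeats are the sole information available to the surviving nodes. The names `Cause`, `SAFE_TO_RESTART`, `possible_causes`, and `restart_is_provably_safe` are hypothetical and do not come from Serviceguard.

```python
from enum import Enum


class Cause(Enum):
    # Numbered to match cases (1), (2), and (3) in the text.
    CRASHED = 1
    KERNEL_HANG = 2
    PARTITIONED = 3


# Per the text: restarting the application elsewhere is safe only in case (1).
SAFE_TO_RESTART = {
    Cause.CRASHED: True,
    Cause.KERNEL_HANG: False,
    Cause.PARTITIONED: False,
}


def possible_causes(heartbeats_missing: bool) -> set:
    # All three causes look identical from outside: no heartbeats arrive.
    return set(Cause) if heartbeats_missing else set()


def restart_is_provably_safe(heartbeats_missing: bool) -> bool:
    # Safe only if every cause consistent with the observation is a safe one.
    causes = possible_causes(heartbeats_missing)
    return bool(causes) and all(SAFE_TO_RESTART[c] for c in causes)


decision = restart_is_provably_safe(heartbeats_missing=True)
```

Because the observation is consistent with two unsafe causes, the check cannot conclude that a restart is safe from heartbeat silence alone; some further source of information is needed to settle the question.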
