NOTE: For most clusters that use an LVM cluster lock or lock LUN, a minimum MEMBER_TIMEOUT of 14 seconds is appropriate.

For most clusters that use a MEMBER_TIMEOUT value lower than 14 seconds, a quorum server is more appropriate than a lock disk or lock LUN.

The cluster will fail if the time it takes to acquire the disk lock exceeds 0.2 times the MEMBER_TIMEOUT. This means that if you use a disk-based quorum device (lock disk or lock LUN), you must be certain that the nodes in the cluster, the connection to the disk, and the disk itself can respond quickly enough to perform 10 disk writes within 0.2 times the MEMBER_TIMEOUT.
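As a rough illustration of the 0.2 rule above, the following sketch computes the lock-acquisition budget for a given MEMBER_TIMEOUT. The function names are hypothetical and are not part of Serviceguard. The per-write figure is only an average obtained by dividing the budget evenly across the 10 writes, which is an assumption made for illustration. Values are kept in integer microseconds to avoid floating-point rounding.

```python
# Hypothetical helpers; not part of any Serviceguard tool or API.
DISK_WRITES = 10  # number of disk writes named in the text above


def lock_budget_us(member_timeout_us: int) -> int:
    """Return 0.2 x MEMBER_TIMEOUT, in microseconds."""
    return member_timeout_us // 5


def avg_write_budget_us(member_timeout_us: int) -> int:
    """Average time available per write (assumes an even split)."""
    return lock_budget_us(member_timeout_us) // DISK_WRITES


if __name__ == "__main__":
    for seconds in (3, 14, 25):
        timeout_us = seconds * 1_000_000
        print(
            f"MEMBER_TIMEOUT {seconds:>2} s: "
            f"lock budget {lock_budget_us(timeout_us) / 1_000_000:.1f} s, "
            f"about {avg_write_budget_us(timeout_us) // 1000} ms per write"
        )
```

For a MEMBER_TIMEOUT of 14 seconds, this gives a budget of 2.8 seconds, or about 280 milliseconds per write on average; at 3 seconds the budget shrinks to 0.6 seconds, or about 60 milliseconds per write.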

With the lowest supported value of 3 seconds, a failover time of 4 to 5 seconds can be achieved.

NOTE: The failover estimates provided here apply to the Serviceguard component of failover; that is, the package is expected to be up and running on the adoptive node in this time, but the application that the package runs may take more time to start.

Keep the following guidelines in mind when deciding how to set the value.

Guidelines: You need to decide whether it's more important for your installation to have fewer (but slower) cluster re-formations, or faster (but possibly more frequent) re-formations:

To ensure the fastest cluster re-formations, use the minimum value applicable to your cluster. But keep in mind that this setting will lead to a cluster re-formation, and to the node being removed from the cluster and rebooted, if a system hang or network load spike prevents the node from sending a heartbeat signal within the MEMBER_TIMEOUT value. More than one node could be affected if, for example, a network event such as a broadcast storm caused kernel interrupts to be turned off on some or all nodes while the packets were being processed, preventing those nodes from sending and processing heartbeat messages.

See “Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low” (page 333) for troubleshooting information.

For fewer re-formations, use a setting in the range of 10 to 25 seconds (10,000,000 to 25,000,000 microseconds), keeping in mind that a value larger than the default will lead to slower re-formations than the default value does. A value in this range is appropriate for most installations.
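The guidelines above can be summarized in a small sketch that converts a candidate value from seconds to microseconds and reports which of the stated thresholds it meets. The function names and the `uses_lock_disk` parameter are hypothetical and exist only for this illustration; the thresholds (3, 10, 14, and 25 seconds) are the ones given in this section.

```python
# Hypothetical helpers; not part of any Serviceguard tool or API.
US_PER_SECOND = 1_000_000


def seconds_to_us(seconds: int) -> int:
    """Convert whole seconds to microseconds."""
    return seconds * US_PER_SECOND


def describe(member_timeout_us: int, uses_lock_disk: bool) -> list:
    """Return notes on how a value compares with the guidelines."""
    notes = []
    if member_timeout_us < seconds_to_us(3):
        notes.append("below the lowest supported value of 3 seconds")
    if uses_lock_disk and member_timeout_us < seconds_to_us(14):
        notes.append(
            "below 14 seconds: a quorum server is usually more "
            "appropriate than a lock disk or lock LUN"
        )
    if seconds_to_us(10) <= member_timeout_us <= seconds_to_us(25):
        notes.append("within the 10 to 25 second range for fewer re-formations")
    return notes


if __name__ == "__main__":
    for seconds, lock_disk in ((3, False), (10, True), (14, True)):
        value = seconds_to_us(seconds)
        print(seconds, "s =", value, "us:", describe(value, lock_disk))
```

A value of 10 seconds with a lock disk, for example, produces two notes: it falls in the range suggested for fewer re-formations, but it is below the 14 seconds considered appropriate for a disk-based quorum device.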

120 Planning and Documenting an HA Cluster

HP Serviceguard manual Planning and Documenting an HA Cluster