Arbitration for Data Integrity in Serviceguard Clusters

Arbitration in Disaster-Tolerant Clusters

Note that if the heartbeat is lost and the first lock disk is located in the first data center, the first data center will normally obtain the lock first because it is closest to the disk. Thus in this scenario, the first data center will re-form the cluster.

3. If a node in one data center succeeds in obtaining the first lock disk, but the disk link is not viable because the other data center is down, then the first data center will not be able to obtain the second lock disk. Because the lock was not refused, however, it will still be allowed to re-form the cluster. This is the expected behavior when there is a disaster.

4. If both the heartbeat and the disk link are lost, there is a danger of split brain, because each sub-cluster, attempting to acquire both lock disks, is able to obtain the lock in its own data center and is not refused the other lock. It is important to minimize or eliminate this slight danger by ensuring that data and heartbeat links are separately routed between data centers.

NOTE

A dual lock disk configuration does not provide a redundant cluster lock. In fact, the dual lock is a compound lock, and both disks have to participate in the protocol of lock acquisition by the two equal-sized sets of nodes. Even when mirrored LVM is used via MirrorDisk/UX, the lock disk area is not mirrored.

At cluster formation time, a set of nodes must gain access to one disk, and must either gain access to the other disk or not be denied access to it. (“Not being denied” occurs when a disk is not accessible to a set of nodes.) The group of nodes that gains access to at least one disk and is not denied access by any disk is allowed to form the new cluster.

If one of the dual lock disks fails, Serviceguard will detect this when it carries out periodic checking, and it will write a message to the syslog file. After the loss of one of the lock disks, if the failure of a cluster node results in the need for arbitration, the cluster will go down.
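The formation rule stated in this note (gain at least one lock disk, and be denied by none) can be illustrated with a minimal sketch. This is not Serviceguard code: the names LockResult and may_form_cluster are hypothetical, chosen only for illustration, and the sketch models only the formation rule, not the failed-lock-disk case described at the end of the note.

```python
from enum import Enum


class LockResult(Enum):
    """Hypothetical outcome of one group's attempt on one lock disk."""
    ACQUIRED = "acquired"        # the group obtained the lock on this disk
    DENIED = "denied"            # disk reachable, but the lock was refused
    UNREACHABLE = "unreachable"  # disk not accessible, which counts as "not denied"


def may_form_cluster(results):
    """Return True if a group with these per-disk results may form the cluster.

    The group must have acquired at least one lock disk and must not
    have been denied by any lock disk.
    """
    results = list(results)
    gained_one = any(r is LockResult.ACQUIRED for r in results)
    denied_any = any(r is LockResult.DENIED for r in results)
    return gained_one and not denied_any


# Scenario 3: the other data center is down, so the second disk is unreachable.
survivor = [LockResult.ACQUIRED, LockResult.UNREACHABLE]

# Scenario 4: heartbeat and disk link are both lost. Each sub-cluster
# acquires its local disk and cannot reach (is not denied by) the remote one.
site_a = [LockResult.ACQUIRED, LockResult.UNREACHABLE]
site_b = [LockResult.UNREACHABLE, LockResult.ACQUIRED]

# Heartbeat lost only: the second group is refused on both disks.
loser = [LockResult.DENIED, LockResult.DENIED]

print(may_form_cluster(survivor))                          # scenario 3
print(may_form_cluster(site_a), may_form_cluster(site_b))  # scenario 4
print(may_form_cluster(loser))                             # heartbeat lost only
```

In scenario 4, both sub-clusters satisfy the rule at the same time, which is the split-brain danger that separately routed data and heartbeat links are meant to reduce.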

 

 
