NOTE: If a simple resource is down on a particular node, it is down on that node for all packages that use it. An extended resource, by contrast, may be up on a node for one package and down for another, because its status is evaluated against each package's own generic_resource_up_criteria.
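The per-package evaluation of an extended resource can be illustrated with a small model. This is a minimal sketch, assuming the resource reports a single integer value on the node and that each package's up criteria is an operator followed by a number (for example "<50"). The function name is_up, the resource name cpu_free, and the package names are hypothetical; this is not Serviceguard's implementation.

```python
import operator

# Hypothetical parser for criteria strings in the style of
# generic_resource_up_criteria; an illustration, not Serviceguard code.
# Dict order matters: two-character operators are tried before "<" and ">".
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}


def is_up(value: int, up_criteria: str) -> bool:
    """Return True if the resource value satisfies one package's up criteria."""
    criteria = up_criteria.strip()
    for symbol, compare in _OPS.items():
        if criteria.startswith(symbol):
            return compare(value, int(criteria[len(symbol):]))
    raise ValueError(f"unsupported criteria: {up_criteria!r}")


# One extended resource value on one node, evaluated for two packages.
cpu_free = 40
packages = {"pkg_a": ">30", "pkg_b": ">=50"}
for pkg, criteria in packages.items():
    print(pkg, "up" if is_up(cpu_free, criteria) else "down")
# pkg_a up
# pkg_b down
```

The same value of 40 is "up" for pkg_a and "down" for pkg_b, which is the behavior the note describes. A simple resource, by contrast, would carry a single up/down status shared by every package on the node.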

Additionally, in a running package configured with a generic resource:

A failure of a generic resource of evaluation type "before_package_start" configured in the package does not disable node switching for the package.

A failure of a generic resource of evaluation type "during_package_start" configured in the package disables node switching for the package.
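The two rules above amount to a simple mapping from evaluation type to the package's node-switching flag. The toy model below is only a sketch of that mapping: the class RunningPackage and its method name are hypothetical, and the model tracks nothing but the node-switching flag, not the failover that Serviceguard performs.

```python
from dataclasses import dataclass


@dataclass
class RunningPackage:
    """Hypothetical model of a running package's node-switching flag."""

    name: str
    node_switching_enabled: bool = True

    def on_generic_resource_failure(self, evaluation_type: str) -> None:
        if evaluation_type == "during_package_start":
            # A failure of this evaluation type disables node switching.
            self.node_switching_enabled = False
        elif evaluation_type == "before_package_start":
            # A failure of this evaluation type leaves node switching unchanged.
            pass
        else:
            raise ValueError(f"unknown evaluation type: {evaluation_type!r}")


pkg = RunningPackage("pkg1")
pkg.on_generic_resource_failure("before_package_start")
print(pkg.node_switching_enabled)  # True
pkg.on_generic_resource_failure("during_package_start")
print(pkg.node_switching_enabled)  # False
```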

“Choosing Switching and Failover Behavior” (page 132) provides advice on choosing appropriate failover behavior.

Service Restarts

You can allow a service to restart locally following a failure. To do this, specify the number of restarts for each service in the package control script. When a service starts, the variable RESTART_COUNT is set in the service’s environment. As it executes, the service can examine this variable to determine whether it has been restarted after a failure and, if so, take appropriate action such as cleanup.
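A service written in Python might check RESTART_COUNT at startup as shown below. This is a sketch under stated assumptions: it assumes the value is an integer that is 0 (or absent) on the first start and greater than 0 after a restart, and it treats a missing or non-numeric value as 0. The helper cleanup_after_failure is a hypothetical placeholder for whatever cleanup the service actually needs.

```python
import os


def restart_count() -> int:
    """Read RESTART_COUNT from the environment.

    Assumption: a missing or non-numeric value is treated as 0 (first start).
    """
    try:
        return int(os.environ.get("RESTART_COUNT", "0"))
    except ValueError:
        return 0


def cleanup_after_failure() -> None:
    # Hypothetical hook: remove stale lock files, partial output, and so on.
    pass


def main() -> None:
    count = restart_count()
    if count > 0:
        print(f"restarted after failure (restart {count}); cleaning up")
        cleanup_after_failure()
    else:
        print("initial start")
    # ... the service's normal work would continue here ...


if __name__ == "__main__":
    main()
```

Keeping the cleanup in a separate function lets the normal startup path stay identical whether or not the service was restarted.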

Network Communication Failure

An important element in the cluster is the health of the network itself. As it continuously monitors the cluster, each node listens for heartbeat messages from the other nodes confirming that all nodes are able to communicate with each other. If a node does not hear these messages within the configured amount of time, a node timeout occurs; see “What Happens when a Node Times Out” (page 88).
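The heartbeat check described above can be pictured as a per-peer "last heard" timestamp compared against a timeout. The class below is a toy illustration of that idea only, not Serviceguard's heartbeat protocol. HeartbeatMonitor, its methods, and timeout_s (standing in for the configured amount of time) are hypothetical names, and an injectable clock is used so the behavior can be shown deterministically.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Toy tracker: reports peers silent for longer than timeout_s seconds."""

    def __init__(self, peers: List[str], timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock
        now = clock()
        # Every peer's timer starts when monitoring begins.
        self._last_heard: Dict[str, float] = {p: now for p in peers}

    def record_heartbeat(self, peer: str) -> None:
        """Note that a heartbeat message from this peer was just received."""
        self._last_heard[peer] = self._clock()

    def timed_out_peers(self) -> List[str]:
        """Return, sorted, the peers not heard from within the timeout."""
        now = self._clock()
        return sorted(p for p, t in self._last_heard.items()
                      if now - t > self.timeout_s)


# Usage with a fake clock: node2 sends a heartbeat at t=5, node3 never does.
fake_now = [0.0]
mon = HeartbeatMonitor(["node2", "node3"], timeout_s=8.0,
                       clock=lambda: fake_now[0])
fake_now[0] = 5.0
mon.record_heartbeat("node2")
fake_now[0] = 10.0
print(mon.timed_out_peers())  # ['node3']
```

In a real cluster, a timeout is only the trigger; what the cluster does next is described in "What Happens when a Node Times Out".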

HP Serviceguard manual Service Restarts, Network Communication Failure