CFS primary placement for failover applications

For failover applications that run on only one node at a time, the location of the CFS primary is also important. The best performance is expected when the CFS primaries of all file systems used by an application are co-located with the application package on the same node, rather than evenly distributed among all cluster nodes.
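To make this placement check concrete, the following Python sketch compares where each CFS primary currently resides with the node running the package, and lists the file systems whose primary would need to be moved. The mount points, node names, and the function misplaced_primaries are hypothetical. The sketch assumes the primary-to-node mapping has already been collected from the cluster and does not query CFS itself.

```python
from typing import Dict, List


def misplaced_primaries(primary_of: Dict[str, str], package_node: str) -> List[str]:
    """Return the mount points whose CFS primary is not on the package node.

    primary_of maps each CFS mount point used by the application to the node
    that currently holds its CFS primary role. This is a hypothetical input;
    in a real cluster it would be gathered from the cluster, not hard-coded.
    """
    return sorted(mp for mp, node in primary_of.items() if node != package_node)


if __name__ == "__main__":
    # Hypothetical cluster state: two of the three file systems have their
    # primary on a node other than the one running the package (node1).
    primaries = {
        "/cfs/app_data": "node1",
        "/cfs/app_logs": "node2",
        "/cfs/app_bin": "node3",
    }
    print(misplaced_primaries(primaries, "node1"))  # → ['/cfs/app_bin', '/cfs/app_logs']
```

An empty result means all primaries are already co-located with the package.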

Shutdown for single-instance failover applications in CFS clusters

Single-instance failover applications might benefit from a CFS cluster through reduced failover times, because some tasks required during failover in a VxVM/VxFS environment (for example, importing disk groups and mounting file systems) are not needed with CVM/CFS. However, the concurrent availability of CFS on all cluster nodes can also introduce new challenges.

In a non-CFS cluster, the data of a single-instance failover application moves with the application package from node to node. The volumes are activated exclusively on the node that currently runs the package, which prevents the application data from being altered concurrently by multiple nodes. The Serviceguard package control scripts ensure that, on package failure or shutdown, the storage is made inaccessible on the node where the package failed or was halted.

This behavior changes with CFS. The application data is accessible on multiple nodes at the same time, even if the application package is not running on a given node or is not running anywhere in the cluster. If an application does not shut down completely on one node before it is started on another, both instances can modify the same data concurrently, which can result in data corruption. To address this risk, the start and stop functions of single-instance failover applications require special attention in a CFS cluster.

When CFS is used to store the application data, application shutdown procedures must be analyzed for robustness. HP provides the following recommendations for robust application shutdown within the cluster:

Before the application shutdown procedure returns control to the Serviceguard control script, it should verify that the shutdown succeeded, and report a failure if any application processes are still running. A failure code directs Serviceguard to disable the global AUTO_RUN flag of the application package, which prevents the package from starting on any other node without operator intervention.
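The verification step can be sketched as follows. This is a minimal Python illustration, not part of Serviceguard: list_app_pids is a hypothetical callback that returns the PIDs of the application processes still running on the node, and the timeout and polling interval are arbitrary example values. The function returns 0 when all processes have exited and 1 otherwise, mirroring the success or failure code that the halt function hands back to the control script.

```python
import time
from typing import Callable, List


def verify_shutdown(
    list_app_pids: Callable[[], List[int]],
    timeout_s: float = 30.0,
    poll_s: float = 1.0,
) -> int:
    """Wait for the application's processes to exit.

    Returns 0 if no processes remain before the timeout expires, 1 otherwise.
    list_app_pids is a hypothetical callback supplied by the integrator.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = list_app_pids()
        if not remaining:
            return 0  # clean shutdown: report success
        if time.monotonic() >= deadline:
            # Report which processes survived so the operator can investigate.
            print(f"halt failed: processes still running: {remaining}")
            return 1  # non-zero status signals a failed halt
        time.sleep(poll_s)
```

In a halt wrapper, the result would typically be passed to sys.exit() so that a non-zero status reaches the control script. Whether such a wrapper is called from customer_defined_halt_cmds is an integration choice, not a Serviceguard requirement.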

Serviceguard package control scripts use the fuser command to terminate all processes that have a mount point open before they unmount exclusively mounted file systems and deactivate exclusively activated volume groups. The same fuser command can be used in the application-specific shutdown function.
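The selection rule that fuser applies to a mount point (every process with an open file at or below that mount point) can be illustrated in Python. In this sketch the open_files table, mapping each PID to its open paths, is a hypothetical input; a real control script would simply call fuser. The kill parameter defaults to os.kill and can be replaced for testing. SIGKILL is sent to mirror fuser -k.

```python
import os
import signal
from pathlib import PurePosixPath
from typing import Callable, Dict, Iterable, List


def pids_using_mount(open_files: Dict[int, Iterable[str]], mount_point: str) -> List[int]:
    """Return the PIDs that hold any path at or below mount_point open."""
    mp = PurePosixPath(mount_point)
    hits = []
    for pid, paths in open_files.items():
        for p in map(PurePosixPath, paths):
            # Compare path components, not string prefixes, so that
            # /cfs/database is not treated as being under /cfs/data.
            if p == mp or mp in p.parents:
                hits.append(pid)
                break
    return sorted(hits)


def kill_mount_users(
    open_files: Dict[int, Iterable[str]],
    mount_point: str,
    kill: Callable[[int, int], None] = os.kill,
) -> List[int]:
    """Send SIGKILL to every process using mount_point, as fuser -k does."""
    victims = pids_using_mount(open_files, mount_point)
    for pid in victims:
        kill(pid, signal.SIGKILL)
    return victims
```

Comparing path components rather than string prefixes keeps a mount point such as /cfs/data from matching an unrelated file system mounted at /cfs/database.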

You can make CFS volumes temporarily unavailable on application shutdown by specifying the mount point and disk group packages in the same cmhaltpkg command as the application package. However, this approach might not be suitable in all cases, because the file system may then require operator intervention to be made available again on the node where the package was halted.
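A minimal Python sketch of assembling such a combined halt request is shown below. The package names are hypothetical examples, and build_halt_command is not a Serviceguard interface. The sketch only builds the argument list for cmhaltpkg; running it (for example, with subprocess.run) and checking its exit status are left out.

```python
from typing import List


def build_halt_command(app_pkg: str, storage_pkgs: List[str]) -> List[str]:
    """Build one cmhaltpkg argument list that names the application package
    together with its CFS mount point and disk group packages."""
    if not storage_pkgs:
        raise ValueError("expected at least one mount point or disk group package")
    # The application package is listed first for readability; the
    # storage packages follow in the order given by the caller.
    return ["cmhaltpkg", app_pkg, *storage_pkgs]


if __name__ == "__main__":
    # Hypothetical package names, for illustration only.
    argv = build_halt_command("app_pkg", ["SG-CFS-MP-1", "SG-CFS-DG-1"])
    print(" ".join(argv))  # → cmhaltpkg app_pkg SG-CFS-MP-1 SG-CFS-DG-1
```

Building the list in one place makes it harder to halt the application package while forgetting its storage packages.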

As a last resort, the NODE_FAIL_FAST_ENABLED flag can be set to YES so that Serviceguard halts the node on a package failure. This applies both when the package fails and when it fails to shut down.
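Because this setting is easy to overlook, an integrator might check it before deploying the package. The Python sketch below is a hypothetical lint helper, not a Serviceguard tool. It assumes that the package configuration file uses whitespace-separated KEYWORD VALUE lines with # comments, matches the keyword case-insensitively, and treats the last occurrence as the effective value.

```python
from typing import Optional


def node_fail_fast_enabled(config_text: str) -> Optional[bool]:
    """Return True or False for NODE_FAIL_FAST_ENABLED, or None if it is absent.

    Assumed file layout: one 'KEYWORD VALUE' pair per line, '#' starts a
    comment, and the last occurrence of the keyword wins.
    """
    value = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0].upper() == "NODE_FAIL_FAST_ENABLED":
            value = parts[1].upper() == "YES"
    return value
```

A result of None or False indicates that the node will not be halted on package failure.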

Serviceguard package management provides interfaces and mechanisms for integrating robust application shutdown procedures. It is up to the application integrator to choose and use them to prevent unwanted concurrent access to the application data.

The following pseudocode lists the major components of a robust shutdown procedure that should be added to the customer_defined_run_cmds and customer_defined_halt_cmds functions of the package control script.
