nodes are listed in decreasing order of preference. In a Metrocluster configuration, the node names list in the package configuration is ordered by site. Node names from the same site are listed sequentially. The node names of the site with the primary node must be specified first in the list. In this section, the site with the primary node is referred to as the primary site and the other site is referred to as the alternate site.

The Metrocluster package fails over to the alternate site only when no other node is available to run it on the primary site. However, once a package has failed over to the alternate site, any subsequent failure can result in the package failing back to an available node on the primary site. If the failback policy is set to "automatic", the Metrocluster package is moved back to the primary site even without a failure, as soon as the primary node is capable of running it. In both cases, the package is unnecessarily moved back to the primary site even when there are nodes in the alternate site that are capable of running it.

Starting with HP Serviceguard version A.11.18 (see footnote 1), a new Serviceguard failover policy, site_preferred, enables operators to avoid unnecessary movement of workloads across sites. When a package is configured with the site_preferred failover policy, Serviceguard uses a site-aware evaluation method to select target nodes during a failover. Nodes within the site that the package last ran on are considered before nodes on the other site.
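The difference between selecting by node list order and selecting with site awareness can be sketched as a small model. This is an illustration only, not Serviceguard code: the function names are hypothetical, the node SJC_2 is an assumed second node on the alternate site, and the model assumes that nodes within a site are tried in list order.

```python
# Illustrative model only; not Serviceguard code.
# Node list ordered by site, primary site first.
# SJC_2 is a hypothetical node added for the example.
NODE_LIST = [
    ("SFO_1", "san_francisco"),
    ("SFO_2", "san_francisco"),
    ("SJC_1", "san_jose"),
    ("SJC_2", "san_jose"),
]


def pick_by_list_order(node_list, available):
    """Select the first available node in list order, ignoring sites."""
    for name, _site in node_list:
        if name in available:
            return name
    return None


def pick_site_preferred(node_list, available, last_site):
    """Consider nodes on the site the package last ran on before other nodes."""
    same_site = [name for name, site in node_list if site == last_site]
    other_site = [name for name, site in node_list if site != last_site]
    for name in same_site + other_site:
        if name in available:
            return name
    return None


# The package last ran on SJC_1 (alternate site), which has now failed.
# SFO_1 on the primary site has recovered in the meantime.
available = {"SFO_1", "SJC_2"}
by_order = pick_by_list_order(NODE_LIST, available)
by_site = pick_site_preferred(NODE_LIST, available, "san_jose")
```

In this scenario, list-order selection returns SFO_1 and moves the package across sites, while the site-aware selection returns SJC_2 and keeps the package on the alternate site. If no san_jose node were available, the site-aware selection would still fall through to the primary site.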

Starting with HP Serviceguard version A.11.20, a new Serviceguard failover policy, site_preferred_manual, is introduced for failover packages configured in a Metrocluster. This failover policy provides automatic failover of packages within a site and manual failover across sites. This policy is supported only on sites configured with Metrocluster.

During a failover, HP Serviceguard moves the package to the next available node from the list of NODE_NAME entries that belong to the site that the package last ran on. If no node is available in the list of NODE_NAME entries for that SITE, the package does not automatically fail over to the other site. In such instances, manual intervention is required to start the package. The package can either be started on the same site after cleaning up the failed nodes, or it can be started on a node in the other site. To start the package, run the following command:

# cmrunpkg -n <node_name> <package_name>

where <node_name> is the name of any node in either of the sites.
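The site_preferred_manual behavior described above can be modeled in a few lines. This is a sketch under stated assumptions, not Serviceguard code: the function name is hypothetical, and a return value of None stands for the case in which an operator must start the package with cmrunpkg.

```python
# Illustrative model only; not Serviceguard code.
NODE_LIST = [
    ("SFO_1", "san_francisco"),
    ("SFO_2", "san_francisco"),
    ("SJC_1", "san_jose"),
]


def pick_site_preferred_manual(node_list, available, last_site):
    """Return the next available node on the last-run site, or None.

    None models "no automatic failover": the package stays down until
    an operator starts it manually on a node in either site.
    """
    for name, site in node_list:
        if site == last_site and name in available:
            return name
    return None


# SFO_1 fails while SFO_2 is up: automatic failover within the site.
within_site = pick_site_preferred_manual(NODE_LIST, {"SFO_2", "SJC_1"}, "san_francisco")

# Both primary-site nodes are down: no automatic move to san_jose.
cross_site = pick_site_preferred_manual(NODE_LIST, {"SJC_1"}, "san_francisco")
```

Here within_site is SFO_2, and cross_site is None even though SJC_1 is available, which corresponds to the case that requires manual intervention.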

To use either of these policies, the underlying cluster must be configured with sites and each cluster node must be associated with a site. The Serviceguard cluster configuration file includes the following attributes to define sites:

SITE_NAME

To define a unique name for a site in the cluster.

SITE

To associate a node to a site, specify the site name using the SITE keyword under the node's NODE_NAME definition.

Following is a sample of the site definition in a Serviceguard cluster configuration file:

SITE_NAME san_francisco
SITE_NAME san_jose

NODE_NAME SFO_1
  SITE san_francisco
.....
NODE_NAME SFO_2
  SITE san_francisco
........
NODE_NAME SJC_1
  SITE san_jose
.......
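The requirement that every node be associated with a defined site can be checked with a short script before the configuration is applied. The following is a minimal sketch, assuming a simplified file that contains only the SITE_NAME, NODE_NAME, and SITE keywords shown in the sample; the function parse_sites is hypothetical and is not part of Serviceguard.

```python
# Illustrative helper only; not part of Serviceguard.
SAMPLE = """\
SITE_NAME san_francisco
SITE_NAME san_jose
NODE_NAME SFO_1
  SITE san_francisco
NODE_NAME SFO_2
  SITE san_francisco
NODE_NAME SJC_1
  SITE san_jose
"""


def parse_sites(text):
    """Return (site names, node-to-site mapping) from simplified config text."""
    sites = []
    node_site = {}
    current_node = None
    for raw in text.splitlines():
        parts = raw.split()
        # Skip blank lines, comments, and lines without a value.
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        key, value = parts[0], parts[1]
        if key == "SITE_NAME":
            sites.append(value)
        elif key == "NODE_NAME":
            current_node = value
            node_site[current_node] = None
        elif key == "SITE" and current_node is not None:
            if value not in sites:
                raise ValueError(f"node {current_node} uses undefined site {value}")
            node_site[current_node] = value
    return sites, node_site


sites, node_site = parse_sites(SAMPLE)
# Nodes that were never given a SITE entry.
unassigned = [node for node, site in node_site.items() if site is None]
```

For the sample above, unassigned is empty, so every node belongs to a defined site. A real cluster configuration file contains many other parameters, which this sketch ignores.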

1. HP Serviceguard A.11.18 requires additional patches to enable SITE definitions and site_preferred policy.

Designing a Disaster Recovery Architecture for use with Metrocluster Products
