HP Serviceguard Continentalcluster 252

Switching to the Recovery Cluster in Case of Disaster

It is vital that the administrator verify that recovery is actually needed after receiving a cluster alert or alarm, because network failures may produce false alarms. After validating a failure, start the recovery process using the cmrecovercl [-f] command. Note the following:

•During an alert, cmrecovercl will not start the recovery packages unless the -f option is used.

•During an alarm, cmrecovercl will start the recovery packages without the -f option.

•When there is neither an alert nor an alarm condition, cmrecovercl cannot start the recovery packages on the target disk site. This applies both when no alert or alarm was issued and when an alert or alarm was issued but the source disk site has since recovered and its current status is Up.
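The three rules above can be summarized as a small decision function. This is only an illustrative sketch: the names MonitorState and recovery_packages_start are hypothetical and are not part of Continentalclusters, and the function models only the documented behavior rather than calling cmrecovercl.

```python
from enum import Enum


class MonitorState(Enum):
    """Hypothetical labels for the condition reported for the source disk site."""
    UP = "up"        # no alert or alarm, or the site has recovered
    ALERT = "alert"
    ALARM = "alarm"


def recovery_packages_start(state: MonitorState, force: bool = False) -> bool:
    """Return True if, per the rules above, cmrecovercl would start the
    recovery packages. `force` stands for the -f option."""
    if state is MonitorState.ALARM:
        return True      # alarm: packages start with or without -f
    if state is MonitorState.ALERT:
        return force     # alert: packages start only when -f is given
    return False         # neither alert nor alarm: packages cannot start


if __name__ == "__main__":
    for state in MonitorState:
        for force in (False, True):
            print(state.name, "force" if force else "no force",
                  recovery_packages_start(state, force))
```

Modeling the rules this way makes the asymmetry explicit: the -f option matters only in the alert state.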

Failover to Recovery Site

After receiving the Continentalclusters alerts and alarms, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the target disk site.

The recovery package control script evaluates the status of the DR group used by the package and fails the DR group over to the EVA in the recovery site. After a successful failover, the DR group in the recovery site's EVA becomes the source and is accessible in read/write mode.

NOTE: If the Continuous Access links between the two EVAs are down, the recovery package will start up only if one of the following conditions is true:

•The package failover policy variable “DT_APPLICATION_STARTUP_POLICY” in the package’s environment file is set to “Availability_Preferred”.

•The package failover policy variable “DT_APPLICATION_STARTUP_POLICY” in the package's environment file is set to “Data_Currency_Preferred”, and a FORCEFLAG file exists in the package directory.
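The two conditions in this note can be expressed as a short check. The sketch below is an assumption-laden illustration, not the logic of the actual package control script: the function name may_start_with_links_down is hypothetical, and the policy value is passed in as a string rather than parsed from a real environment file.

```python
from pathlib import Path


def may_start_with_links_down(policy: str, package_dir: Path) -> bool:
    """Return True if, per the note above, the recovery package may start
    while the Continuous Access links are down.

    policy      -- value of DT_APPLICATION_STARTUP_POLICY (assumed already
                   read from the package's environment file)
    package_dir -- the package directory in which FORCEFLAG is looked up
    """
    policy = policy.strip()  # tolerate stray whitespace around the value
    if policy == "Availability_Preferred":
        return True
    if policy == "Data_Currency_Preferred":
        # Start-up is allowed only when a FORCEFLAG file exists
        return (package_dir / "FORCEFLAG").exists()
    return False  # any other value: treated as not allowed in this sketch
```

In this model, Data_Currency_Preferred blocks start-up by default, and creating the FORCEFLAG file is the explicit override.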

After the recovery package is up and running, the EVA in the recovery site will have more current data than the one in the primary site.

Failover Scenarios

The goal of HP Continentalclusters is to maximize system and application availability. However, even systems configured with Continentalclusters can experience hardware failures at the primary site or the recovery site, as well as failures of the hardware or networking that connects the two sites. The following scenarios address some of those failures and suggest recovery approaches applicable to environments using data replication provided by HP StorageWorks EVA series disk arrays and Continuous Access.

Scenario 1

The primary site has lost power for a prolonged time, including backup power (UPS), to both the systems and disk arrays that make up the Serviceguard Cluster at the primary site. There is no loss of data on either the EVA disk array or the operating systems of the systems at the primary site.

Failback to the Primary Site

In this scenario, the EVA in the primary site is down only because of the loss of power, so the storage configuration information and the application data from before the power failure remain intact in the EVA. Once the primary site’s power is restored, the EVA is up and running, and the Continuous Access links are up, the Continuous Access EVA software automatically resynchronizes the data from the recovery site's EVA back to the primary site’s EVA. If the resynchronization is a full copy operation,

252 Building Disaster Recovery Serviceguard Solutions Using Metrocluster with Continuous Access EVA