7. If using physical data replication, do not resync from the recovery cluster to the primary cluster. Instead, manually issue a command that will overwrite any changes on the recovery disk array that may inadvertently have been made.

8. Start the package on the primary cluster and allow connections to the application.

Testing Continentalclusters Operations

Use the following procedures to exercise typical Continentalclusters behaviors:

1. Halt both clusters in a recovery pair, then restart both clusters. The monitor packages on both clusters should start automatically. The Continentalclusters packages (primary, data sender, data receiver, and recovery) should not start automatically. Any other packages may or may not start automatically, subject to their configuration.

NOTE: If an UP status is configured for a cluster, then an appropriate alert notification (email, SNMP, etc.) should be received at the configured time interval from the node running the monitor package on the other cluster. Due to delays in email or SNMP, the notifications may arrive later than expected.

In addition to being sent using the mechanisms defined in the Continentalclusters configuration file, alerts and alarms are also recorded in the file /var/opt/resmon/log/cc/eventlog on the system reporting the event.

2. While the monitor package is running on a monitoring cluster, halt the monitored cluster (cmhaltcl -f). An appropriate alert notification (email, SNMP, etc.) should be received at the configured time interval from the node running the monitor package. Run cmrecovercl. The command should fail. Additional notifications should be received at the configured time intervals. After the alarm notification is received, run cmrecovercl. Any data receiver packages on the monitoring cluster should halt and the recovery package(s) should start with package switching enabled. Halt the recovery packages.

3. Rerun test 2 under a variety of conditions (and combinations of conditions), such as the following:

Rebooting and powering off systems one at a time

Rebooting and powering off all systems at the same time

Running the monitor package on each node in each cluster

Disconnecting the WAN connection between the clusters

If physical data replication is used, disconnecting the physical replication links between the disk arrays:

Powering off the disk array at the primary site

Powering off the disk array at the recovery site

Testing cmrecovercl -f as well as cmrecovercl

Depending on the condition, the primary packages should be running so that the test exercises real-life failures and recovery procedures.

4. After each scenario in tests 2 and 3, restore both clusters to their production state, restart the primary package(s) (as well as any data sender and data receiver packages), and note any issues, time delays, etc.

5. Halt the monitor package on one cluster. Halt the other cluster. No notifications are generated that the other cluster has failed. What mechanism is available to the organization to monitor the monitor?

6. Halt the packages on one cluster, but do not halt the cluster. No notifications are generated that the packages on that cluster have failed. What mechanism is available to the organization to monitor package status?
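
Because test 3 asks for test 2 to be rerun under single and multiple conditions, it can help to enumerate the scenarios before starting so that none is skipped. The following is a minimal planning sketch in Python, not part of Continentalclusters. The function names (build_scenarios, format_checklist), the condition labels, and the max_combined limit are all hypothetical and chosen for illustration. The script does not run any cluster commands; it only prints a checklist to be filled in by hand.

```python
from itertools import combinations

# Conditions paraphrased from test 3. The labels are illustrative only.
# The last three are the disk-array conditions from the same list.
CONDITIONS = [
    "reboot or power off systems one at a time",
    "reboot or power off all systems at the same time",
    "run the monitor package on each node in each cluster",
    "disconnect the WAN connection between the clusters",
    "disconnect the physical replication links between the disk arrays",
    "power off the disk array at the primary site",
    "power off the disk array at the recovery site",
]

# The two recovery command variants named in test 3.
RECOVERY_COMMANDS = ["cmrecovercl", "cmrecovercl -f"]


def build_scenarios(conditions, commands, max_combined=2):
    """Return (conditions_tuple, command) pairs.

    Covers every single condition and every combination of up to
    max_combined conditions (an assumed limit), once per command.
    """
    scenarios = []
    for size in range(1, max_combined + 1):
        for combo in combinations(conditions, size):
            for command in commands:
                scenarios.append((combo, command))
    return scenarios


def format_checklist(scenarios):
    """Render the scenarios as a plain-text checklist to fill in by hand."""
    lines = []
    for number, (combo, command) in enumerate(scenarios, start=1):
        lines.append(f"Scenario {number}: {' + '.join(combo)}")
        lines.append(f"  recovery command: {command}")
        lines.append("  [ ] alert received  [ ] alarm received  "
                     "[ ] recovery packages started")
        lines.append("  [ ] clusters restored to production state; "
                     "issues and delays noted")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_checklist(build_scenarios(CONDITIONS, RECOVERY_COMMANDS)))
```

Some generated pairs are contradictory or do not apply to a given site (for example, powering off systems one at a time together with all at once, or the disk-array conditions where physical data replication is not used), so the list should be pruned by hand before it is used.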
