Failback Scenarios
There is no failback counterpart to the “pushbutton” failover from the source disk site to the target disk site. Failback depends on the original nature of the failover, the state of the primary and secondary Symmetrix SRDF volumes (R1 and R2), and the condition of the source disk site. Chapter 2: “Designing Continentalclusters” discusses failback mechanisms and methodologies in the section “Restoring Disaster Tolerance” (page 98).
The goal of HP Continentalclusters is to maximize system and application availability. However, even systems configured with Continentalclusters can experience hardware failures at the primary site or the recovery site, as well as failures of the hardware or networking that connects the two sites. The following discussion addresses some of those failures and suggests recovery approaches applicable to environments using data replication provided by Symmetrix disk arrays and the Symmetrix Remote Data Facility (SRDF).
Scenario 1
The primary site has lost power, including backup power (UPS), to both the systems and the disk arrays that make up the Serviceguard cluster at the primary site. There is no loss of data on either the Symmetrix or the operating systems of the systems at the primary site. After receiving the Continentalclusters alerts and alarm, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the target disk site. The Continentalclusters package control file invokes Metrocluster with EMC SRDF to evaluate the status of the R1 and R2 paired group volumes. The symrdf list command displays the status of the device group:
              Source (R1) View                 Target (R2) View       MODES
---------------------------------    -------------------------------  -----
           ST                   LI            ST
Standard    A                    N             A
Logical     T  R1 Inv   R2 Inv   K       T  R1 Inv   R2 Inv                   RDF Pair
Device  Dev E  Tracks   Tracks   S  Dev E  Tracks   Tracks   MDA              STATE
---------------------------------    -------------------------------  -----  -----------
DEV001  009F WD      0       0  NR  00A5 RW      0       0   S..       Failed Over
DEV002  00A0 WD      0       0  NR  00A6 RW      0       0   S..       Failed Over
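The same per-group view can also be obtained with the SYMCLI query verb; a minimal sketch, where the device group name is illustrative:
# symrdf -g <device_group> query
An RDF Pair STATE of “Failed Over” for every device in the group confirms that the R2 devices are read/write enabled (RW) while the R1 devices remain write disabled (WD), as shown in the output above.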
After power is restored to the primary site, the Symmetrix device groups may be in the Failed Over state. The procedure for moving the application packages back to the primary site differs depending on the status of the device groups.
The following procedure applies when the device groups have a status of “Failed Over”:
1. Halt the Continentalclusters recovery packages at the recovery site.
# cmhaltpkg <pkg_name>
This halts the applications, removes any floating IP addresses, unmounts file systems, and deactivates volume groups, as programmed into the package control files. The status of the device groups will remain “Synchronized” at the recovery site and “Failed Over” at the primary site.
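Before halting the cluster, you can confirm that the recovery packages are down; a sketch using the standard Serviceguard status command:
# cmviewcl -v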
2. Halt the recovery cluster, which also halts the monitor package ccmonpkg.
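A minimal sketch of this step; since ccmonpkg is still running, the -f option is used to halt the cluster along with any running packages:
# cmhaltcl -f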
3. Start the cluster at the primary site. Assuming they have been properly configured, the Continentalclusters primary packages should not start. The monitor package should start automatically. See the sketch after this step.
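A sketch of starting the primary cluster with the standard Serviceguard command:
# cmruncl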
4. Manually start the Continentalclusters primary packages at the primary site.
# cmrunpkg <pkg_name>
or
# cmmodpkg -e <pkg_name>
The package control script is programmed to handle this case: it issues an SRDF failback command to move the device group back to the R1 side and to resynchronize the R1 from the R2 side. Until the resynchronization is complete, the SRDF
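The failback operation that the control script performs corresponds to the following SYMCLI command; a sketch with an illustrative device group name, shown only to clarify what the script does (the script issues it automatically, so it is not normally run by hand):
# symrdf -g <device_group> failback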