Timing Considerations

Because a journal group can contain many journal volumes, it can hold a significant amount of journal data (host-write data). As a result, package startup time may increase significantly when a Metrocluster Continuous Access package fails over. Package startup is delayed in the following situations:

1. When recovering from broken pair affinity. On failover, the SVOL pulls all the journal data from the PVOL site. The time needed to transfer all of this data to the SVOL depends on the amount of outstanding journal data in the PVOL and the bandwidth of the Continuous Access links.

2. When host I/O is faster than Continuous Access data replication. Data that has not yet been replicated to the SVOL accumulates in the journal volumes. When the package fails over to the SVOL site, the SVOL pulls all the journal data from the PVOL site. The time needed to complete the transfer depends on the bandwidth of the Continuous Access links and the amount of outstanding data in the PVOL journal volume.
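In both situations above, the extra startup delay is driven mainly by the outstanding journal data divided by the usable link bandwidth. The following is a back-of-the-envelope sketch, not a Metrocluster tool: the function names, the decimal units (GB = 10^9 bytes, Mbit/s = 10^6 bits/s), and the `link_efficiency` factor are assumptions for illustration, and the model ignores compression, protocol overhead, and link contention.

```python
def estimate_flush_seconds(outstanding_gb: float, link_mbps: float,
                           link_efficiency: float = 1.0) -> float:
    """Approximate time for the SVOL to pull all outstanding journal data.

    outstanding_gb: journal data not yet replicated (decimal GB, 10**9 bytes).
    link_mbps: aggregate Continuous Access link bandwidth (10**6 bits/s).
    link_efficiency: hypothetical fraction of raw bandwidth actually usable.
    """
    if link_mbps <= 0 or not 0 < link_efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    bits = outstanding_gb * 1e9 * 8
    return bits / (link_mbps * 1e6 * link_efficiency)


def backlog_growth_gb(host_write_mbps: float, link_mbps: float,
                      hours: float) -> float:
    """Journal backlog accumulated when host I/O outpaces replication.

    Both rates are in Mbit/s; returns decimal GB (0 if the links keep up).
    """
    surplus_mbps = max(host_write_mbps - link_mbps, 0.0)
    return surplus_mbps * 1e6 * hours * 3600 / 8 / 1e9


# Situation 1: 100 GB of outstanding journal data over a 1 Gbit/s link
print(estimate_flush_seconds(100, 1000))  # 800.0
# Situation 2: host writes at 1200 Mbit/s, links carry 1000 Mbit/s, for 1 hour
print(backlog_growth_gb(1200, 1000, 1))  # 90.0
```

Treat the result as a lower bound on the added startup time; real transfers are usually slower than the raw link rate.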

Data maintenance with the failure of a Metrocluster with Continuous Access for P9000 and XP Failover

The following sections, “Swap Takeover Failure (Asynchronous/Journal mode)” and “Takeover Timeout (for Continuous Access Journal mode)”, describe data maintenance when a Metrocluster with Continuous Access for P9000 and XP failover fails.

Swap Takeover Failure (Asynchronous/Journal mode)

When the device group pair state is SVOL-PAIR at the local site and PVOL-PAIR at the remote site, Metrocluster Continuous Access performs a swap takeover. The swap takeover fails if there is an internal (unseen) error in the device group pair, for example a cache or shared memory failure. In this case, if AUTO_NONCURDATA is set to 0, the package is not started and the takeover command changes the SVOL state to SVOL-PSUE(SSWS). The PVOL site either remains in PVOL-PAIR or changes to PVOL-PSUE.

SVOL-PSUE(SSWS) means that the SVOL is read/write enabled and its data is usable, but not as current as the data on the PVOL.
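The swap takeover behavior described above can be summarized as a small decision table. This is a hypothetical model for illustration only: the function and field names are not part of Metrocluster, the state strings are the labels used in this section rather than raw pairdisplay output, and only the AUTO_NONCURDATA=0 case is modeled because that is the only case this section describes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SwapFailureOutcome:
    package_started: bool
    svol_state: str               # state the takeover command leaves on the SVOL
    pvol_states: Tuple[str, ...]  # states the PVOL site may end up in
    svol_data_current: bool       # SSWS: read/write enabled, but not as current


def uses_swap_takeover(local_state: str, remote_state: str) -> bool:
    """Swap takeover applies when local is SVOL-PAIR and remote is PVOL-PAIR."""
    return local_state == "SVOL-PAIR" and remote_state == "PVOL-PAIR"


def failed_swap_outcome(auto_noncurdata: int) -> Optional[SwapFailureOutcome]:
    """Outcome of a swap takeover that fails on an internal array error.

    Only AUTO_NONCURDATA=0 is modeled; any other value returns None.
    """
    if auto_noncurdata != 0:
        return None
    return SwapFailureOutcome(
        package_started=False,
        svol_state="SVOL-PSUE(SSWS)",
        pvol_states=("PVOL-PAIR", "PVOL-PSUE"),
        svol_data_current=False,
    )


print(uses_swap_takeover("SVOL-PAIR", "PVOL-PAIR"))  # True
print(failed_swap_outcome(0).package_started)        # False
```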

In this case, either use FORCEFLAG to start the package on the SVOL site, or fix the problem and resume data replication with the following procedure:

1. Split the device group pair completely (pairsplit -g <dg> -S).

2. Re-create the pair with the original PVOL as the source (use the paircreate command).

3. Start the package on either the PVOL site or the SVOL site.
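The three steps above can be sketched as a command sequence. This sketch only builds the command lines and does not run them. `pairsplit -g <dg> -S` comes from step 1; the `-vl` option on paircreate (make the local volume the primary, so the command is issued from a host at the original PVOL site) is an assumption about your RAID Manager setup, and journal-mode options such as journal IDs are deliberately omitted. `cmrunpkg -n <node> <package>` is the Serviceguard command for starting a package on a given node. The names `dg01`, `node1`, and `pkgA` are hypothetical placeholders.

```python
import shlex
from typing import List


def recovery_commands(dg: str, node: str, package: str) -> List[List[str]]:
    """Build (but do not run) the recovery sequence as argv lists."""
    return [
        # Step 1: split the device group pair completely.
        ["pairsplit", "-g", dg, "-S"],
        # Step 2 (assumption): re-create the pair from the original PVOL host;
        # -vl makes the local volume the primary. Add your journal-mode
        # options from the RAID Manager reference before using this.
        ["paircreate", "-g", dg, "-vl"],
        # Step 3: start the package on the chosen node (PVOL or SVOL site)
        # once the pair is back in PAIR state (check with pairdisplay).
        ["cmrunpkg", "-n", node, package],
    ]


for argv in recovery_commands("dg01", "node1", "pkgA"):
    print(shlex.join(argv))
```

Keeping the commands as argv lists makes it easy to review them before handing them to `subprocess.run` one at a time, checking the pair state between steps.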

Takeover Timeout (for Continuous Access Journal mode)

A takeover timeout can occur when a package fails over to the secondary site (SVOL) and Metrocluster Continuous Access issues a takeover command (either swap takeover or SVOL takeover) on the SVOL. If the journal group pair is still flushing journal data from the PVOL to the SVOL when the takeover times out, the package does not start and the following occurs:

1. The device group pair state remains PVOL-PAIR/SVOL-PAIR.

2. Journal data continues to transfer to the SVOL.

In this case, wait until the journal data flush completes and the pair states are as follows:

Primary site: PVOL-PAIR or PVOL-PSUS(E)

Secondary site: SVOL-PSUS(SSWS) or SVOL-PSUE(SSWS)
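Waiting for the flush to complete amounts to polling both sites until each reaches one of the states listed above. In this minimal sketch, `get_states` is a hypothetical callable you would supply (for example, a wrapper that parses pairdisplay output on each site), the state strings are this section's labels, and "PVOL-PSUS(E)" is read as either PVOL-PSUS or PVOL-PSUE, which is an assumption.

```python
import time
from typing import Callable, Tuple

# "PVOL-PSUS(E)" is interpreted here as PVOL-PSUS or PVOL-PSUE (assumption).
PRIMARY_READY = {"PVOL-PAIR", "PVOL-PSUS", "PVOL-PSUE"}
SECONDARY_READY = {"SVOL-PSUS(SSWS)", "SVOL-PSUE(SSWS)"}


def flush_complete(primary: str, secondary: str) -> bool:
    """True when both sites have reached a state listed in this section."""
    return primary in PRIMARY_READY and secondary in SECONDARY_READY


def wait_for_flush(get_states: Callable[[], Tuple[str, str]],
                   timeout_s: float, poll_s: float = 30.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll (primary, secondary) states until ready or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        primary, secondary = get_states()
        if flush_complete(primary, secondary):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

A `False` return means the journal data is still flushing; re-estimate the remaining transfer time from the outstanding journal data and link bandwidth before deciding how long to keep waiting.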

At this point, do one of the following: (1) use FORCEFLAG to start the package on the SVOL site, or (2) fix the problem (for example, a failed Continuous Access link) and resume data replication with the following procedure:

196 Building Disaster Recovery Serviceguard Solutions Using Metrocluster with Continuous Access for P9000 and XP