CAUTION: When fault tolerance is compromised, data loss can occur. However, it may be possible to recover the data. For more information, see "Recovering from compromised fault tolerance" (on page 31).

If more drives fail than the fault-tolerance method can manage, fault tolerance is compromised, and the logical drive fails. If this failure occurs, the operating system rejects all requests and indicates unrecoverable errors.

For example, fault tolerance might be compromised when a drive in an array fails while another drive in the array is being rebuilt.

Compromised fault tolerance can also be caused by problems unrelated to drives. In such cases, replacing the physical drives is not required.

Recovering from compromised fault tolerance

If fault tolerance is compromised, inserting replacement drives does not improve the condition of the logical volume. Perform the following procedure to recover data:

1. Check for loose, dirty, broken, or bent cabling and connectors on all devices.

2. Power down the MDS600 ("Power down" on page 13).

3. Power up the MDS600 ("Power up" on page 12).

In some cases, a marginal drive is operational long enough to allow backup of important files.

4. Make copies of important data, if possible.

5. Replace any failed drives ("Installing the hard drives" on page 19).

Factors to consider before replacing hard drives

You can replace hard drives without powering down the system. However, before replacing a degraded drive:

Open HP SIM and inspect the Error Counter window for each physical drive in the same array to confirm that no other drives have any errors. (For details, refer to the HP SIM documentation on the Management CD.)

Be sure that the array has a current, valid backup.

Use replacement drives that have a capacity at least as great as that of the smallest drive in the array. The controller immediately fails drives that have insufficient capacity.
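
The capacity rule above can be illustrated with a short check. This is a minimal sketch in Python, not part of any HP utility: the function name, the use of gigabytes as the unit, and the example drive sizes are all assumptions made for illustration.

```python
def replacement_capacity_ok(replacement_gb, array_drive_gbs):
    """Return True if the replacement drive is at least as large as
    the smallest drive currently in the array.

    Hypothetical helper: sizes are given in gigabytes.
    """
    if not array_drive_gbs:
        raise ValueError("array_drive_gbs must list at least one drive")
    return replacement_gb >= min(array_drive_gbs)


# Hypothetical array with mixed drive sizes; the smallest member is 450 GB.
array = [450, 450, 600, 600]
print(replacement_capacity_ok(450, array))  # True
print(replacement_capacity_ok(300, array))  # False
```

Note that the comparison is against the smallest drive in the array, not the largest, so a 450 GB replacement is acceptable here even though some members are 600 GB.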

To minimize the likelihood of fatal system errors when removing failed drives, take the following precautions:

Do not remove a degraded drive if any other drive in the array is offline (the online LED is off). In this situation, removing any other drive in the array causes data loss.

Exceptions:

o When RAID 1+0 is used, drives are mirrored in pairs. Several drives can be in a failed condition simultaneously (and they can all be replaced simultaneously) without data loss, if no two failed drives belong to the same mirrored pair.

o When RAID 6 with ADG is used, two drives can fail simultaneously (and be replaced simultaneously) without data loss.

o If the offline drive is a spare, the degraded drive can be replaced.
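
The RAID 1+0 and RAID 6 exceptions above can be sketched as a simple check. This is an illustrative Python sketch only, not a controller API: the function name, the drive identifiers, and the representation of mirrored pairs as tuples are assumptions. Spare drives are assumed to be excluded from the set of failed drives, which reflects the third exception.

```python
def fault_tolerance_intact(raid_level, failed, mirror_pairs=None):
    """Return True if the failed data drives can be tolerated without data loss.

    raid_level   -- "1+0" or "6" (RAID 6 with ADG)
    failed       -- identifiers of failed data drives (spares excluded)
    mirror_pairs -- for RAID 1+0, a list of (drive_a, drive_b) tuples
    """
    failed = set(failed)
    if raid_level == "1+0":
        if mirror_pairs is None:
            raise ValueError("mirror_pairs is required for RAID 1+0")
        # Safe as long as no mirrored pair has lost both of its members.
        return not any(a in failed and b in failed for a, b in mirror_pairs)
    if raid_level == "6":
        # RAID 6 with ADG tolerates up to two simultaneous drive failures.
        return len(failed) <= 2
    raise ValueError(f"unsupported RAID level: {raid_level!r}")


# Hypothetical four-drive RAID 1+0 array made of two mirrored pairs.
pairs = [("d1", "d2"), ("d3", "d4")]
print(fault_tolerance_intact("1+0", {"d1", "d3"}, pairs))  # True
print(fault_tolerance_intact("1+0", {"d1", "d2"}, pairs))  # False
print(fault_tolerance_intact("6", {"d1", "d2"}))           # True
print(fault_tolerance_intact("6", {"d1", "d2", "d3"}))     # False
```

In the RAID 1+0 case the number of failed drives does not matter by itself; what matters is whether both members of any one mirrored pair have failed.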
