The surviving controller sends event messages announcing the failure of the other controller.

A “Redundant Controller Failure Detected” alert message is displayed and written to the event log.

If one controller in the redundant controller configuration fails, the surviving controller takes over for the failed controller until it is replaced.

When a controller fails, the surviving controller disables and disconnects from its failed counterpart and gains access to all the signal paths. The surviving controller then manages the ensuing event notifications and takes over all processes. It becomes the primary controller regardless of its original status and keeps that role after the failed unit is replaced; the replacement controller assumes the role of the secondary controller.
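The role assignment described above can be modeled as a small state change. The following is a minimal sketch, not the controller firmware's logic: the `ControllerPair` class and the slot names "upper" and "lower" are hypothetical, chosen only to show that the survivor always ends up primary and a replacement always joins as secondary.

```python
from dataclasses import dataclass


@dataclass
class ControllerPair:
    """Hypothetical model of a redundant controller pair.

    `roles` maps an illustrative slot name to "primary", "secondary", or "failed".
    """
    roles: dict[str, str]

    def __post_init__(self) -> None:
        # A redundant configuration is modeled as exactly two controllers.
        if len(self.roles) != 2:
            raise ValueError("a redundant configuration has exactly two controllers")

    def fail(self, slot: str) -> None:
        """Mark `slot` as failed; the survivor becomes primary whatever its prior role."""
        if slot not in self.roles:
            raise KeyError(slot)
        survivor = next(s for s in self.roles if s != slot)
        self.roles[slot] = "failed"
        self.roles[survivor] = "primary"

    def replace(self, slot: str) -> None:
        """Install a replacement in a failed slot; it always joins as secondary."""
        if self.roles.get(slot) != "failed":
            raise ValueError(f"slot {slot!r} does not hold a failed controller")
        self.roles[slot] = "secondary"


pair = ControllerPair({"upper": "primary", "lower": "secondary"})
pair.fail("upper")      # the lower controller takes over as primary
pair.replace("upper")   # the replacement joins as secondary
print(pair.roles)       # {'upper': 'secondary', 'lower': 'primary'}
```

Because the survivor is never demoted, no role switch occurs when the replacement is installed, which is consistent with failback being transparent to the host.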

The failover and failback processes are completely transparent to the host.

Controllers are hot-swappable in a redundant configuration, and replacing a failed unit takes only a few minutes. However, because the I/O connections are on the controllers, you might experience some unavailability between the time the failed controller is removed and the time a new one is installed in its place.

To maintain your redundant controller configuration, replace the failed controller as soon as possible. For details, refer to the SANnet II Family FRU Installation Guide.

8.5 Recovering From Fatal Drive Failure

With a redundant RAID array system, your system is protected with the RAID parity drive and a global spare or spares.

A FATAL FAIL occurs when more drives fail than your RAID redundancy can accommodate. The redundancy of your RAID array depends on your configuration. In a RAID 3 or RAID 5 configuration, at least two more drives than the number of available spares must fail for a FATAL FAIL status. In a RAID 1 configuration, you can lose multiple drives without a fatal failure as long as no two failed drives belong to the same mirrored pair.
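The tolerance rules above can be expressed as a simple check. This is a minimal sketch under a stated assumption: `failed` holds only drives whose data has not yet been rebuilt onto a spare, because a drive that has already been rebuilt no longer reduces redundancy. The function name and the drive labels are hypothetical and do not correspond to any firmware or CLI output.

```python
from collections.abc import Iterable


def is_fatal_fail(raid_level: int, failed: set[str],
                  mirror_pairs: Iterable[tuple[str, str]] = ()) -> bool:
    """Return True if the failed, not-yet-rebuilt drives exceed RAID redundancy.

    Assumption: drives already rebuilt onto a spare are not in `failed`.
    """
    if raid_level in (3, 5):
        # Single parity: one missing member can be reconstructed, two cannot.
        return len(failed) >= 2
    if raid_level == 1:
        # Fatal only if both members of some mirrored pair are missing.
        return any(a in failed and b in failed for a, b in mirror_pairs)
    raise ValueError(f"unsupported RAID level: {raid_level}")


pairs = [("d0", "d1"), ("d2", "d3")]
print(is_fatal_fail(1, {"d0", "d2"}, pairs))  # False: one drive lost from each pair
print(is_fatal_fail(1, {"d0", "d1"}, pairs))  # True: both halves of one pair lost
```

The RAID 1 case shows why losing several drives can still be survivable, while RAID 3 and RAID 5 depend only on how many members are missing at once.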

It might be possible to recover the RAID array from a FATAL FAIL. However, depending on the circumstances of the failure, a full data recovery might not be possible. After a FATAL FAIL recovery, check your recovered data using the data application or host-based tools.

It is rare for two or more drives to fail at the same time. To minimize the chance of this happening, perform regular RAID integrity checks. For RAID 3 and RAID 5, you can do this with the “regenerate Parity” option in the firmware application or with the SANscape CLI check parity command. Refer to the SANnet II Family RAID Firmware User’s Guide for details on the “regenerate Parity” option, and to the SANscape CLI User’s Guide for details on the check parity command.
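Conceptually, a parity integrity check confirms that each stored parity block still equals the XOR of the data blocks in its stripe. The sketch below illustrates only that idea; it is not how the firmware's “regenerate Parity” option or the check parity command is implemented, and it ignores real layout details such as RAID 5's rotating parity. All function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and equal in length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def stripe_parity_ok(data_blocks: list[bytes], parity_block: bytes) -> bool:
    """Check that the stored parity equals the XOR of the stripe's data blocks."""
    return xor_blocks(data_blocks) == parity_block


def rebuild_missing(surviving: list[bytes], parity_block: bytes) -> bytes:
    """Reconstruct one missing data block from the survivors plus parity."""
    return xor_blocks(surviving + [parity_block])


data = [b"\x0f\x00", b"\xf0\x01", b"\x00\x10"]
parity = xor_blocks(data)                               # b"\xff\x11"
print(stripe_parity_ok(data, parity))                   # True
print(rebuild_missing([data[0], data[2]], parity))      # b'\xf0\x01'
```

A stripe whose parity no longer matches cannot be trusted for reconstruction, which is why checking parity before a second drive fails reduces the risk of an unrecoverable FATAL FAIL.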

8-8SANnet II 200 FC, SATA, and SATA SE Array Installation, Operation, and Service Manual • March 2005
