Chapter 2. RAID Array Controller
When one controller fails, the surviving controller processes all I/O requests until the failed controller is repaired and powered on again. The subsystem then returns to its previous mode (ACTIVE/ACTIVE or ACTIVE/PASSIVE).
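The mode change described above can be pictured as a small state machine. The sketch below is a hypothetical model, not RAID 3000 firmware. The class `ControllerPair`, its methods, and the `SURVIVOR_ONLY` label for the single-controller period are illustrative names. The key point is that the pair remembers its configured mode so it can restore it once the failed controller is back.

```python
from enum import Enum


class Mode(Enum):
    ACTIVE_ACTIVE = "ACTIVE/ACTIVE"
    ACTIVE_PASSIVE = "ACTIVE/PASSIVE"
    SURVIVOR_ONLY = "SURVIVOR_ONLY"  # hypothetical label: one controller handles all I/O


class ControllerPair:
    """Hypothetical model of the pair's operating mode, not RAID 3000 firmware."""

    def __init__(self, configured: Mode) -> None:
        if configured is Mode.SURVIVOR_ONLY:
            raise ValueError("pair must be configured ACTIVE/ACTIVE or ACTIVE/PASSIVE")
        self.configured = configured  # remembered so it can be restored later
        self.mode = configured

    def peer_failed(self) -> None:
        # The survivor now processes every I/O request.
        self.mode = Mode.SURVIVOR_ONLY

    def peer_repaired_and_powered_on(self) -> None:
        # Return to the mode the pair was in before the failure.
        if self.mode is Mode.SURVIVOR_ONLY:
            self.mode = self.configured


pair = ControllerPair(Mode.ACTIVE_PASSIVE)
pair.peer_failed()
print(pair.mode.value)  # SURVIVOR_ONLY
pair.peer_repaired_and_powered_on()
print(pair.mode.value)  # ACTIVE/PASSIVE
```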
2.8.1 Initialization
During initialization, the firmware in the RAID 3000 verifies that both controllers have consistent configurations, including identical cache memory and system parameters. If the controller setups are incompatible, the set is not bound and each controller operates in
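The binding check amounts to an equality test over each controller's settings. This is a minimal sketch under stated assumptions: `ControllerConfig`, `can_bind`, the cache size in MB, and the parameter names are hypothetical stand-ins for whatever the firmware actually compares. The rule shown is simply "bind only if everything matches".

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ControllerConfig:
    """Hypothetical subset of the settings the firmware compares."""
    cache_mb: int
    system_params: Dict[str, str] = field(default_factory=dict)


def can_bind(a: ControllerConfig, b: ControllerConfig) -> bool:
    """Bind the pair only when cache size and every system parameter match."""
    return a.cache_mb == b.cache_mb and a.system_params == b.system_params


params = {"stripe_kb": "64", "write_cache": "on"}  # illustrative values only
print(can_bind(ControllerConfig(512, params), ControllerConfig(512, dict(params))))  # True
print(can_bind(ControllerConfig(512, params), ControllerConfig(256, params)))        # False
```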
2.8.2 Message Passing
The two controllers share information by exchanging messages through the backplane connectors. The messages carry configuration data as well as a heartbeat, which each controller transmits every 500 ms. If a controller does not receive a heartbeat from its peer within one second, it assumes the peer has become inoperable and begins the failover process.
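The heartbeat rule can be expressed as a timestamp comparison. The sketch below is hypothetical: `HeartbeatMonitor`, `next_heartbeat_due`, and the injectable `now` argument are illustrative, not firmware interfaces. Only the 500 ms interval and one-second timeout come from the text. Because the timeout is twice the send interval, the peer is not declared inoperable until strictly more than one second passes with no heartbeat.

```python
import time
from typing import Optional

HEARTBEAT_INTERVAL_S = 0.5  # each controller sends a heartbeat every 500 ms
HEARTBEAT_TIMEOUT_S = 1.0   # peer presumed inoperable after one second of silence


def _now(now: Optional[float]) -> float:
    # Use the injected time when given (for testing), else a monotonic clock.
    return time.monotonic() if now is None else now


def next_heartbeat_due(last_sent: float) -> float:
    """Time at which this controller should send its next heartbeat."""
    return last_sent + HEARTBEAT_INTERVAL_S


class HeartbeatMonitor:
    """Watches for the peer's heartbeats (hypothetical, not firmware code)."""

    def __init__(self, now: Optional[float] = None) -> None:
        self.last_seen = _now(now)

    def heartbeat_received(self, now: Optional[float] = None) -> None:
        self.last_seen = _now(now)

    def peer_is_inoperable(self, now: Optional[float] = None) -> bool:
        # True once more than HEARTBEAT_TIMEOUT_S has passed without a heartbeat.
        return _now(now) - self.last_seen > HEARTBEAT_TIMEOUT_S


mon = HeartbeatMonitor(now=0.0)
mon.heartbeat_received(now=0.5)
print(mon.peer_is_inoperable(now=1.4))  # False: 0.9 s since the last heartbeat
print(mon.peer_is_inoperable(now=1.6))  # True: 1.1 s of silence, begin failover
```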
If the controllers cannot exchange messages because of a communication problem on the backplane, they break the connection and each controller switches to a
2.8.3 Failover
Failover is the process of transferring data from a failed controller to the survivor and completing any active tasks. When one controller begins the failover process, it sends a reset to the other controller; this prevents the failing unit from processing any more information and enables any host ports that are passive. The survivor then downloads the failed controller's cache into the unused portion of its own cache and begins acting upon that data.
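The failover steps can be sketched as an ordered procedure. This is a hypothetical model: `Controller`, `fail_over`, and the field names are illustrative, and "acting upon that data" is modeled as queuing each cached block for destaging to disk. The text does not say whose passive host ports are enabled; this sketch assumes they are the survivor's own.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Controller:
    """Hypothetical controller model; field names are illustrative only."""
    name: str
    cache: Dict[int, bytes] = field(default_factory=dict)           # block -> data
    takeover_cache: Dict[int, bytes] = field(default_factory=dict)  # "unused portion"
    host_ports_active: List[bool] = field(default_factory=list)
    held_in_reset: bool = False


def fail_over(survivor: Controller, failed: Controller) -> List[str]:
    """Run the failover steps in the order the text describes; return a log."""
    log: List[str] = []
    # 1. Reset the peer so it cannot process any more information.
    failed.held_in_reset = True
    log.append(f"reset {failed.name}")
    # 2. Enable passive host ports (assumed here to be the survivor's).
    for i, active in enumerate(survivor.host_ports_active):
        if not active:
            survivor.host_ports_active[i] = True
            log.append(f"enable {survivor.name} port {i}")
    # 3. Copy the failed controller's cache into the survivor's unused cache space.
    survivor.takeover_cache = dict(failed.cache)
    log.append(f"copied {len(failed.cache)} cache blocks")
    # 4. Act on the recovered data: queue each block for destaging to disk.
    for addr in sorted(survivor.takeover_cache):
        log.append(f"destage block {addr}")
    return log


a = Controller("A", host_ports_active=[True, False])
b = Controller("B", cache={7: b"x", 3: b"y"}, host_ports_active=[True, True])
for line in fail_over(a, b):
    print(line)
```

Running the usage lines prints the reset, the port enable for port 1, the copy of two blocks, and then destage entries for blocks 3 and 7, in that order.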
While downloading the data, the controller responds to I/O requests by disconnecting (if allowed) and waiting approximately three seconds before reconnecting and presenting a BUSY status. The delay keeps host operating systems from seeing too many errors and fencing off the controller.
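The host-facing behavior during the cache download can be sketched as follows. `respond_during_cache_download` and its injectable `sleep` parameter are hypothetical names; `0x08` is the standard SCSI BUSY status code. The text does not say what happens when disconnection is not allowed; this sketch assumes the three-second wait applies either way and only the disconnect/reconnect steps are skipped.

```python
import time
from typing import Callable, List, Tuple

RECONNECT_DELAY_S = 3.0  # "approximately three seconds" per the text
SCSI_STATUS_BUSY = 0x08  # SCSI status code for BUSY


def respond_during_cache_download(
    disconnect_allowed: bool,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[str], int]:
    """Answer a host command that arrives while the peer's cache is being copied."""
    actions: List[str] = []
    if disconnect_allowed:
        actions.append("disconnect")  # release the bus while waiting
    sleep(RECONNECT_DELAY_S)          # throttle the host's retry rate
    actions.append(f"wait {RECONNECT_DELAY_S:g} s")
    if disconnect_allowed:
        actions.append("reconnect")
    actions.append("status BUSY")     # host is expected to retry later
    return actions, SCSI_STATUS_BUSY


# Inject a fake sleep so the example does not actually block for three seconds.
steps, status = respond_during_cache_download(True, sleep=lambda s: None)
print(steps, hex(status))
```

Injecting `sleep` keeps the timing policy testable without real delays; production code would use the real clock.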