OLA/R Overview and Concepts

Planning and Preparation

could replace the card when the server is off-line, or if you must take action immediately, you can use rad to attempt an on-line addition of a backup card and deletion of the target card.

Failover Actions / Single Points of Failure

In most cases, the system automatically fails over to the alternate resource when a card is suspended. However, some subsystems might require manual intervention. For example, the Logical Volume Manager (LVM) automatically redirects I/O for a temporarily disconnected disk resource to a mirror, logging errors as it handles this situation.

As a result, if the resource is suspended for an extended period of time, a large number of error log entries could accumulate.

In this type of situation, you may want to manually switch over to a mirror beforehand. When you have completed the OLA/R procedure, the mirror and disk can be resynchronized.

If you suspend a card and the backup takes over, the system can contain a single point of failure. If the backup resource fails before the new card is on-line, the system could potentially crash. This window of vulnerability can be minimized by keeping the period of suspension as short as possible. Doing so requires careful planning and gathering as much information as possible before actually suspending driver operation and powering down a card slot.

When an extended suspension period is unavoidable, or when the system is mission-critical, it is desirable to configure a second backup resource if possible.
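
The relationship between suspension time, the window of vulnerability, and the benefit of a second backup can be illustrated with a rough numerical sketch. The Python example below is not part of the OLA/R tools. It assumes, purely for illustration, that backup resources fail independently at a constant rate (an exponential failure model); the function names and the mean-time-between-failures parameter are hypothetical.

```python
import math


def backup_failure_probability(suspension_hours: float,
                               backup_mtbf_hours: float) -> float:
    """Probability that a single backup resource fails at least once
    while the target card is suspended.

    Assumption (not from the manual): failures follow an exponential
    distribution with mean time between failures backup_mtbf_hours.
    """
    if suspension_hours < 0 or backup_mtbf_hours <= 0:
        raise ValueError("suspension must be >= 0 and MTBF must be > 0")
    # 1 - exp(-t / MTBF), computed with expm1 for accuracy at small t.
    return -math.expm1(-suspension_hours / backup_mtbf_hours)


def both_backups_fail_probability(suspension_hours: float,
                                  backup_mtbf_hours: float) -> float:
    """Probability that two independent, identical backup resources
    both fail during the suspension window (assumed model)."""
    p = backup_failure_probability(suspension_hours, backup_mtbf_hours)
    return p * p


if __name__ == "__main__":
    mtbf = 100_000.0  # hypothetical MTBF in hours
    for hours in (0.25, 1.0, 8.0, 72.0):
        one = backup_failure_probability(hours, mtbf)
        two = both_backups_fail_probability(hours, mtbf)
        print(f"{hours:6.2f} h suspended: one backup {one:.3e}, "
              f"two backups {two:.3e}")
```

Under these assumptions, the exposure grows with the length of the suspension, and a second backup reduces it sharply, which matches the guidance above to keep suspensions short and to add a second backup for extended or mission-critical cases.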
