Disaster Tolerance and Recovery in a Serviceguard Cluster

What is a Disaster Tolerant Architecture?

impact. For these types of installations, and many more like them, it is important to guard not only against single points of failure, but against multiple points of failure (MPOF), or against single massive failures that cause many components to fail, such as the failure of a data center, of an entire site, or of a small area. A data center, in the context of disaster recovery, is a physically proximate collection of nodes and disks, usually all in one room.

Creating clusters that are resistant to multiple points of failure or single massive failures requires a different type of cluster architecture called a disaster tolerant architecture. This architecture provides you with the ability to fail over automatically to another part of the cluster or manually to a different cluster after certain disasters. Specifically, the disaster tolerant cluster provides appropriate failover in the case where a disaster causes an entire data center to fail, as in Figure 1-2.

Figure 1-2

Disaster Tolerant Architecture

Chapter 1

17