Decreasing Single Points of Failure

Chapter 2: Capacity and Availability Management 17

Decreasing Single Points of Failure

You can maintain availability in Exchange 2000, even in the event of a failure, provided you ensure that it is not a single point of failure. In some areas, such as database corrup- tion, it is not possible to eliminate single points of failure, but in many cases you can guard against individual failures and still maintain reliability. An obvious example is the direc- tory. By having multiple domain controllers and Global Catalog servers available in any part of your network, you maintain availability of Exchange even in the event of failure of a particular domain controller or Global Catalog server. Having local domain controllers or Global Catalog servers keeps Exchange available in the event of a non-local network failure.

Using front-end servers is another way to avoid single points of failure. The failure of a single front-end server will have no effect on the availability of Exchange to non-MAPI clients. The clients will simply be rerouted to another front-end server, with no loss of service.

Exchange 2000 routing can be modified to minimize single points of failure. In particular, you can modify Routing Group connectors to ensure that there are multiple bridgeheads available, and thus maintain delivery from one part of the organization to another. You can also set up Routing Group meshes, which consist of a series of fully interconnected Routing Groups with multiple possible routes between them.

Multiple messaging routes between servers are useless if they all rely on the same net- work connections and the network goes down. You should therefore ensure that there are multiple network paths (using differing technologies) that Exchange and Windows 2000 can use.

One of the most significant single points of failure is a mailbox server. This can affect very large numbers of users, depending on the server. Mailbox servers can be clustered to ensure their continued high availability. If you are running Exchange 2000 on Windows 2000 Advanced Server, you can cluster over two nodes and you have two possible ways to cluster the servers—active/passive and active/active. Active/passive clustering is the current recom- mended clustering implementation for Exchange. If you choose to implement active/active clustering, you should realize that it requires careful planning to ensure that Exchange can fail over correctly to the other node. With Service Pack 1 of Exchange 2000 and Windows 2000 Datacenter server, you can have four nodes in your cluster. In this implementation consider active/active/active/passive clustering.

In a standard clustered environment, however, the disk array is still the single point of failure, so you should think seriously about using a storage area network (SAN) to maxi- mize the availability of all your servers running Exchange.

If you are creating truly redundant Exchange 2000 servers, you shouldn’t stop at the disk subsystem. Your servers should be equipped with redundant RAID controllers, network interface cards (NICs), and power supplies. In fact, you should aim to have redundancy everywhere.

Single points of failure can also be created by improper maintenance of systems. For example, if you are using a RAID 5 array on a server running Exchange with a hot spare,

Microsoft 1 manual Decreasing Single Points of Failure

Models: 1

Chapter 2: Capacity and Availability Management 17

Decreasing Single Points of Failure