Service Availability, Minimizing System Failures

Microsoft Exchange 2000 Operations, Version 1.0

Of course, just because you have allowed for a certain amount of downtime per server per month, this does not mean that you have to use it, and in most cases you will not. On the other hand, just because you haven’t performed offline maintenance one month does not mean that the hours can be carried over to the following month. Your user community will be very unhappy if you take a system down for 2 days, even if it has been up solidly for 2 years!

You might wish to define different service hours for the different services available in Exchange (mail, public folders, and so on). This would depend on the amount of offline maintenance that is typically required for each service. For example, you might determine that your SMTP bridgehead servers and firewall servers never require offline maintenance, and so set the service hours for mail delivery significantly higher than those for mailbox access. If you are prepared to spend the appropriate money on resources, it is quite possible to achieve extremely low levels of scheduled downtime, and this can be reflected in your SLA.

Service Availability

Service availability is a measure of how available your Exchange services are during the service hours you have defined. In other words, it defines the level of unscheduled downtime you can tolerate within your organization. Typically, enterprise SLAs specify availability levels between 99.9 and 99.999 percent. For a service with round-the-clock service hours, this corresponds to as much as about 525 minutes and as little as about 5 minutes of downtime per service per year.
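The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal illustration, not part of the Exchange tooling: the function name allowed_downtime_minutes is hypothetical, and the default assumes a 365-day year with service hours that run 24 hours a day, 7 days a week.

```python
# Assumption: a 365-day year with round-the-clock (24x7) service hours.
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_percent, service_minutes=MINUTES_PER_YEAR):
    """Return the unscheduled downtime, in minutes, that an availability target permits.

    availability_percent: the SLA target, for example 99.9.
    service_minutes: total minutes of defined service hours in the period.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (100 - availability_percent) / 100 * service_minutes


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.1f} minutes per year")
```

If your defined service hours are shorter than 24x7, pass the smaller total as service_minutes; the permitted downtime shrinks in proportion.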

Of course, ANY unscheduled downtime is inconvenient at best, and very costly at worst, so you need to do your best to minimize it.

To ensure high levels of availability, you need to consider two key questions:

◆How often, on average, is there downtime for a service?

◆How long does it take to recover the service if there is downtime?

Once you have considered these questions, you can set about minimizing the number of times a service fails and the time taken to recover that service.
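One common way to combine the two questions into a single number is the steady-state formula availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to recover. The guide itself does not name this formula, so treat the following as an illustrative sketch; the function name and the sample figures are assumptions.

```python
def availability(mtbf_hours, mttr_hours):
    """Estimate availability as a fraction between 0 and 1.

    mtbf_hours: mean time between failures (how often downtime occurs).
    mttr_hours: mean time to recover (how long recovery takes).
    """
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    if mttr_hours < 0:
        raise ValueError("mttr_hours must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical baseline: one failure every 30 days, 2 hours to recover.
    baseline = availability(mtbf_hours=720, mttr_hours=2)
    # Failing half as often, or recovering twice as fast, both raise availability.
    fewer_failures = availability(mtbf_hours=1440, mttr_hours=2)
    faster_recovery = availability(mtbf_hours=720, mttr_hours=1)
    for label, value in (
        ("baseline", baseline),
        ("fewer failures", fewer_failures),
        ("faster recovery", faster_recovery),
    ):
        print(f"{label}: {value * 100:.3f}%")
```

The sketch shows why both levers matter: availability improves whether you reduce the number of failures or shorten the recovery time.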

Availability management is intrinsically linked with capacity management. If capacity is not managed properly, then overloaded servers running Exchange might fail, causing availability problems. A classic example of this would be running out of disk space on a server running Exchange, which would result in the databases shutting down and in users losing a number of services.
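As a rough illustration of the disk space example, the following Python sketch checks how much free space remains on a volume. It uses the standard library call shutil.disk_usage; the function names, the path, and the 10 percent threshold are hypothetical choices rather than values recommended by this guide.

```python
import shutil


def free_fraction(path):
    """Return the fraction of the volume containing path that is still free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / usage.total


def is_low_on_space(path, threshold=0.10):
    """Return True when free space falls below the threshold fraction.

    threshold: hypothetical warning level; choose one that suits your servers.
    """
    return free_fraction(path) < threshold


if __name__ == "__main__":
    # Assumption: "." stands in for the volume that holds the databases.
    volume = "."
    print(f"Free space on {volume!r}: {free_fraction(volume) * 100:.1f}%")
    if is_low_on_space(volume):
        print("Warning: free space is below the threshold")
```

A check like this would typically run on a schedule, so that capacity problems surface before they turn into availability problems.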

Minimizing System Failures

To minimize the frequency of failure in Exchange 2000, you need to do the following:

◆Decrease single points of failure.

◆Increase the reliability of Exchange 2000 itself.
