Chapter 5: Protection | 87 |
In training your staff, you should note that restoring to another online server while the first is online is a very different procedure from the majority of disaster recovery procedures, because you have to recover to a different forest under those circumstances. The best way to simulate the type of restore you may have to perform in an emergency is to use a test network that is completely separate from the main network. This allows you to simulate anything from the failure of individual stores to the total hardware failure of a server, and to learn what to do under those circumstances.
However, this does not mean you should not perform alternate server restores. These restores give you other information: they confirm that the backup software, tapes, and storage procedures are working properly, and that a particular live database can be backed up and restored with no hitches. After all, there is no point in having highly trained staff to do a restore if the restores themselves will fail because of faulty tapes. You should ensure that every one of your databases is restored to an alternate server at least once every six months.
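The six-month rule above is easy to check mechanically if you keep a record of when each database was last restored to an alternate server. The following is only a minimal sketch under stated assumptions: the function name, the database names, the dates, and the choice of 183 days as "six months" are all hypothetical and do not come from Exchange or any backup product.

```python
from datetime import date, timedelta

# Assumption: "six months" is approximated as 183 days.
MAX_RESTORE_AGE = timedelta(days=183)


def overdue_restores(last_restored: dict[str, date], today: date) -> list[str]:
    """Return the databases whose last alternate server restore is too old."""
    return sorted(
        name
        for name, restored_on in last_restored.items()
        if today - restored_on > MAX_RESTORE_AGE
    )


# Hypothetical database names and dates, for illustration only.
restore_log = {
    "Mailbox Store 1": date(2001, 1, 15),
    "Public Folder Store": date(2001, 6, 1),
}
print(overdue_restores(restore_log, today=date(2001, 9, 1)))
```

A database that has never been test-restored would not appear in such a log at all, so in practice the list of databases should be compared against the log as well.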
The Operations Manager should be responsible for ensuring that the organization is fully prepared for disaster recovery. This involves regular restores being performed for each Exchange server and each backup device, by the staff who would be involved in restores when they are actively required.
Summary
In an ideal world, Exchange would never suffer problems. However, we live in a world of very diverse hardware and software, viruses and hackers, so it is inevitable that sometimes you are going to run into difficulties with your Exchange configuration.
As you have seen here, to help you meet your SLAs it is vital to minimize the recovery time in the event of a failure. However, in some cases it is very important that you shut down services to protect the system, even though this will affect the availability measurement defined in your SLA.
To protect operations against the unforeseen, it is important to factor unusual circumstances into your service level agreements. For example, you may have established that you have such resilient hardware and such efficient restore technology that you are able to achieve 99.998 percent uptime in your organization (around 10 minutes of downtime across your organization per year). However, if a new virus hits your company because the anti-virus vendor hasn’t informed you about it, then you may end up having to shut down Exchange services just to prevent more damage. You can deal with this eventuality in two ways. Either you can carry out a risk assessment of the effect of such unforeseen circumstances and reduce the SLA accordingly, or you can simply modify the SLA so it states that if the corporation falls victim to an unforeseen hacker or virus attack and you are able to show that you used your best efforts to combat the problem, then this downtime will not count against the service level agreement.
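The arithmetic behind the uptime figure, and the effect of excluding downtime from the SLA, can be sketched as follows. This is an illustration only: the function names are hypothetical, a 365-day year is assumed, and the 250 and 240 minute figures are invented to show the calculation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def measured_availability(downtime_minutes: float, excluded_minutes: float = 0.0) -> float:
    """Return yearly availability in percent, ignoring downtime the SLA excludes."""
    counted = max(downtime_minutes - excluded_minutes, 0.0)
    return 100 * (1 - counted / MINUTES_PER_YEAR)


# A 99.998 percent target leaves roughly 10.5 minutes of downtime per year.
print(f"{allowed_downtime_minutes(99.998):.1f} minutes per year")

# Invented figures: 250 minutes of total downtime, of which 240 minutes were a
# protective shutdown that the modified SLA does not count.
print(measured_availability(250) >= 99.998)
print(measured_availability(250, excluded_minutes=240) >= 99.998)
```

With these invented figures the target is missed if the protective shutdown counts against the SLA and met if it is excluded, which is the difference between the two approaches described above.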