■Since ECC protects functional data in all routes including a computing unit, a register, cache memory, and a system bus, all
2.4.2 Availability
Availability is characterized by how rarely a server fails and how quickly service to the user can be restored after a failure. It is expressed as the percentage of time during which the system is usable.
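As an illustration of the percentage described above, the following minimal sketch computes availability from usable time and downtime, and converts an availability percentage back into downtime per year. The function names and the sample figures are assumptions made for this example; they are not measurements or specifications of the M8000/M9000 servers.

```python
HOURS_PER_YEAR = 8760.0  # 365 days x 24 hours (assumption: non-leap year)


def availability_percent(uptime_hours: float, downtime_hours: float) -> float:
    """Return the share of total time the system was usable, as a percentage."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return 100.0 * uptime_hours / total


def annual_downtime_hours(availability_pct: float) -> float:
    """Return the downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    # Hypothetical figures: 8.76 hours of downtime in one year of operation.
    pct = availability_percent(HOURS_PER_YEAR - 8.76, 8.76)
    print(f"availability: {pct:.3f} %")
    print(f"downtime at 99.99 %: {annual_downtime_hours(99.99):.3f} hours/year")
```

Both terms of the definition appear in the calculation: fewer failures increase the usable time, and faster recovery reduces the downtime.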
Hardware and software faults in the system cannot be completely eliminated. To provide high availability, the system must include mechanisms that enable continuous system operation even if a failure occurs in hardware, such as components and devices, or in software, such as the operating system or application software.
SPARC Enterprise M8000/M9000 servers provide the functions listed below to obtain high availability. Higher availability can also be obtained by combining the server with clustering software or management software.
■Supporting redundant configurations and hot replacement of power supplies and fans
■Supporting redundant configuration of disks, mirroring by software, and active replacement
■Extended range of automatic correction of temporary faults in memory, system buses, and LSI internal data
■Support of an enhanced retry function and degradation function for detected faults
■Shortening the downtime by using automatic system reboot
■Shortening the time taken for system startup
■XSCF collection of fault information, and preventive maintenance using different types of warnings
■Supporting the Chipkill function in the memory subsystem, which enables single-bit error correction so that processing continues in response to continuous burst read errors caused by the failure of a memory device
■Supporting the memory mirroring function, which enables normal data processing to continue through the other memory bus, thereby preventing system failures in response to an error in a memory bus or in a device connected to it
■Supporting the memory patrol function, which has no influence on the workload of software operation because it is implemented in hardware
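To illustrate what single-bit error correction means in the memory-related items above, the following sketch uses a textbook Hamming(7,4) code. This is an assumption chosen for brevity: this section does not describe the codes actually used by the server's ECC or Chipkill hardware, which operate on much wider words, so the example shows only the general principle of locating and flipping one faulty bit.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7).

    Parity bits sit at positions 1, 2 and 4; data bits at 3, 5, 6 and 7.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Correct at most one flipped bit; return (data_bits, error_position).

    error_position is 0 when no error is detected, else the 1-based
    position of the bit that was corrected.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # equals the position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate a single-bit fault at position 5
    data, position = hamming74_correct(word)
    print(data, position)
```

Because the correction is a pure function of the stored bits, it can be applied on every read without involving software, which is the property that lets processing continue after a transient memory fault.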
Chapter 2 System Features