• Fan speed is controlled by monitoring actual temperatures on critical components and adjusting accordingly. If internal component temperatures reach critical levels, the system will shut down immediately, regardless of fan speed. When a redundant fan fails, the system calls out the failing fan and continues running. When a nonredundant fan fails, the system shuts down immediately.
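The fan and temperature rules above can be read as a small decision procedure. The following is a minimal sketch only, not the actual service processor firmware: the threshold value `CRITICAL_TEMP_C`, the `Action` names, and the function `thermal_action` are all hypothetical and chosen for illustration.

```python
from enum import Enum
from typing import Iterable, Optional


class Action(Enum):
    CONTINUE = "continue running"
    CALL_OUT_AND_CONTINUE = "call out failed fan and continue running"
    SHUT_DOWN = "shut down immediately"


# Hypothetical threshold; the announcement does not state a value.
CRITICAL_TEMP_C = 95.0


def thermal_action(component_temps_c: Iterable[float],
                   failed_fan_is_redundant: Optional[bool] = None) -> Action:
    """Return the action described in the text.

    failed_fan_is_redundant is None when no fan has failed, True when the
    failed fan is redundant, and False when it is nonredundant.
    """
    # A critical component temperature forces shutdown regardless of fans.
    if any(t >= CRITICAL_TEMP_C for t in component_temps_c):
        return Action.SHUT_DOWN
    if failed_fan_is_redundant is None:
        return Action.CONTINUE
    if failed_fan_is_redundant:
        return Action.CALL_OUT_AND_CONTINUE
    return Action.SHUT_DOWN
```

For example, `thermal_action([60.0, 72.5], failed_fan_is_redundant=True)` yields `Action.CALL_OUT_AND_CONTINUE`, while the same readings with a nonredundant fan failure yield `Action.SHUT_DOWN`.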
Availability enhancement functions
The POWER7 family of systems continues to offer and introduce significant enhancements designed to increase system availability.
POWER7 processor functions
As in POWER6, the POWER7 processor has the ability to do processor instruction retry and alternate processor recovery for a number of
Hard failures are more difficult, being true logical errors that will be replicated each time the instruction is repeated. Retrying the instruction will not help in this situation. As in POWER6, POWER7 processors have the ability to extract the failing instruction from the faulty core and retry it elsewhere in the system for a number of faults, after which the failing core is dynamically deconfigured and called out for replacement. These systems are designed to avoid a full system outage.
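As a rough illustration of the recovery sequence described above, the sketch below first retries an instruction on the same core and, if the fault persists, moves it to an alternate core and deconfigures the failing one. This is a conceptual model under stated assumptions, not the hardware or firmware implementation: the `Core` class, the `recover` function, the retry count, and the final "unrecovered" outcome are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Core:
    name: str
    run: Callable[[str], bool]  # returns True if the instruction completes
    configured: bool = True


def recover(instruction: str, failing: Core, spares: List[Core],
            retries: int = 1) -> str:
    """Model processor instruction retry followed by alternate processor recovery."""
    # Step 1: retry on the same core. This only helps when the fault
    # does not repeat on re-execution.
    for _ in range(retries):
        if failing.run(instruction):
            return f"retried on {failing.name}"
    # Step 2: the fault repeats (a hard failure), so run the instruction
    # on another core, then deconfigure and call out the failing core.
    for spare in spares:
        if spare.configured and spare.run(instruction):
            failing.configured = False
            return (f"moved to {spare.name}; "
                    f"{failing.name} deconfigured and called out")
    # Assumption: no spare could take over; the text does not cover this case.
    return "unrecovered"
```

With a core whose `run` always fails and one healthy spare, `recover` reports that the work moved to the spare and marks the failing core as deconfigured.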
POWER7 single processor checkstopping
As in POWER6, POWER7 provides single processor checkstopping. This significantly reduces the probability of a fault in any one processor affecting total system availability.
Partition availability priority
Also available is the ability to assign availability priorities to partitions. If an alternate processor recovery event requires spare processor resources in order to protect a workload, when no other means of obtaining the spare resources is available, the system will determine which partition has the lowest priority and attempt to claim the needed resource. On a properly configured POWER7 processor-based server, this allows that capacity to be first obtained from, for example, a test partition instead of a financial accounting system.
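The selection rule above can be sketched as choosing the lowest-priority partition that can supply the needed processors. The example is illustrative only: the `Partition` class, the `pick_donor` function, the numeric priority values, the convention that a smaller number means lower priority, and the capacity filter are assumptions, not details taken from this announcement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Partition:
    name: str
    priority: int    # assumption: smaller number = lower availability priority
    processors: int  # processors that could be claimed from this partition


def pick_donor(partitions: List[Partition], needed: int) -> Optional[Partition]:
    """Return the lowest-priority partition able to give up `needed` processors."""
    candidates = [p for p in partitions if p.processors >= needed]
    # min() with default=None returns None when no partition qualifies.
    return min(candidates, key=lambda p: p.priority, default=None)


partitions = [
    Partition("financial-accounting", priority=200, processors=4),
    Partition("test", priority=10, processors=2),
]
donor = pick_donor(partitions, needed=1)
```

Here `donor` is the `test` partition, so the financial accounting workload keeps its processors.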
POWER7 cache availability
The POWER
Special uncorrectable error handling
Special Uncorrectable Error (SUE) handling was an IBM innovation introduced for POWER5™ processors, where an uncorrectable error in memory or cache does not immediately cause the system to terminate. Rather, the system tags the data and determines whether it will ever be used again. If the data is never used again, the error is irrelevant and does not force a checkstop.
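The idea of tagging bad data and deferring the failure until the data is actually used can be modeled in a few lines. This is a conceptual sketch only: the `MemoryLine` class, its method names, the exception, and the rule that overwriting clears the tag are hypothetical and do not describe the hardware mechanism.

```python
class UncorrectableDataConsumed(Exception):
    """Raised when tagged data is actually read (the checkstop case)."""


class MemoryLine:
    def __init__(self, value: int) -> None:
        self.value = value
        self.sue_tagged = False

    def mark_uncorrectable(self) -> None:
        # Detection only tags the data; nothing terminates yet.
        self.sue_tagged = True

    def read(self) -> int:
        # The error matters only if the tagged data is consumed.
        if self.sue_tagged:
            raise UncorrectableDataConsumed("tagged data was used")
        return self.value

    def overwrite(self, value: int) -> None:
        # Assumption: fresh data replaces the bad data, so the tag is cleared.
        self.value = value
        self.sue_tagged = False
```

A line that is tagged and then overwritten before any read never raises, which corresponds to the case where the error turns out to be irrelevant.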
PCI extended error handling