IBM 755 manual Availability enhancement functions, POWER7 processor functions

redundant fan fails, the system calls out the failing fan and continues running. When a nonredundant fan fails, the system shuts down immediately.

Availability enhancement functions

The POWER7 family of systems continues to offer and introduce significant enhancements designed to increase system availability.

POWER7 processor functions

As in POWER6™, the POWER7 processor can perform processor instruction retry and alternate processor recovery for a number of core-related faults. This significantly reduces exposure to both hard (logic) and soft (transient) errors in the processor core. Soft failures in the processor core are transient (intermittent) errors, often due to cosmic rays or other sources of radiation, and generally are not repeatable. With this function, when an error is encountered in the core, the POWER7 processor first automatically retries the instruction. If the source of the error was truly transient, the instruction succeeds and the system continues as before. On IBM systems prior to POWER6, this error would have caused a checkstop.

Hard failures are more difficult, being true logical errors that will be replicated each time the instruction is repeated. Retrying the instruction will not help in this situation because the instruction will continue to fail. As in POWER6, POWER7 processors have the ability to extract the failing instruction from the faulty core and retry it elsewhere in the system for a number of faults, after which the failing core is dynamically deconfigured and called out for replacement. The entire process is transparent to the partition owning the failing instruction. These systems are designed to avoid a full system outage.
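The retry-then-relocate sequence described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the names Core, CoreFault, and run_with_recovery are hypothetical, a single retry is assumed, and the real mechanism is implemented in processor hardware and firmware, not in application code.

```python
from dataclasses import dataclass


class CoreFault(Exception):
    """Raised by a simulated core when an instruction hits an error."""


@dataclass
class Core:
    # Hypothetical model of a core; not an IBM interface.
    name: str
    hard_fault: bool = False      # fails on every attempt
    transient_faults: int = 0     # number of one-off failures still pending
    configured: bool = True

    def execute(self, instruction):
        if self.hard_fault:
            raise CoreFault(f"{self.name}: hard fault")
        if self.transient_faults > 0:
            self.transient_faults -= 1
            raise CoreFault(f"{self.name}: soft fault")
        return instruction()


def run_with_recovery(instruction, core, spares, log):
    """Run an instruction, retrying on the same core, then on a spare core."""
    try:
        return core.execute(instruction)
    except CoreFault:
        log.append("retry")

    # Processor instruction retry: a transient error should not recur.
    try:
        return core.execute(instruction)
    except CoreFault:
        log.append("alternate")

    # Alternate processor recovery: rerun elsewhere, then deconfigure
    # the failing core so it can be called out for replacement.
    spare = next((c for c in spares if c.configured), None)
    if spare is None:
        raise RuntimeError("no spare core available")
    result = spare.execute(instruction)
    core.configured = False
    log.append(f"deconfigured {core.name}")
    return result
```

In this model a core with one pending transient fault completes the instruction on the retry and stays configured, while a core with a hard fault hands the instruction to a spare and is then marked as deconfigured.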

POWER7 single processor checkstopping

As in POWER6, POWER7 provides single processor checkstopping. This significantly reduces the probability of any one processor affecting total system availability.

Partition availability priority

Also available is the ability to assign availability priorities to partitions. If an alternate processor recovery event requires spare processor resources in order to protect a workload, when no other means of obtaining the spare resources is available, the system will determine which partition has the lowest priority and attempt to claim the needed resource. On a properly configured POWER7 processor-based server, this allows that capacity to be first obtained from, for example, a test partition instead of a financial accounting system.
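The selection rule can be sketched as follows. The Partition class, the numeric priority scale (lower number meaning lower availability priority), and the claim_core function are assumptions made for illustration; they do not represent the actual hypervisor interface or its priority values.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    # Hypothetical model; the numeric scale is an assumption.
    name: str
    priority: int   # lower number = lower availability priority
    cores: int


def claim_core(partitions, protected):
    """Take one core from the lowest-priority partition other than `protected`.

    Returns the donor partition, or None if no partition can donate.
    """
    donors = [p for p in partitions if p is not protected and p.cores > 0]
    if not donors:
        return None
    donor = min(donors, key=lambda p: p.priority)
    donor.cores -= 1
    return donor
```

With a test partition at priority 10 and a financial accounting partition at priority 200, a recovery event that protects the accounting workload takes its core from the test partition.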

POWER7 cache availability

The POWER® processor-based line of servers continues to be at the forefront of cache availability enhancements. The L3 cache is now integrated on the POWER7 processor. The POWER7 processor provides both L2 and L3 cache line delete functions.

Special uncorrectable error handling

Uncorrectable errors are difficult for any system to tolerate, although there are some situations where they can be shown to be irrelevant. For example, if an uncorrectable error occurs in cached data that will never again be read or where a fresh write of the data is imminent, it would be unwise to "protect" the user by forcing an immediate reboot.

Special Uncorrectable Error (SUE) handling was an IBM innovation introduced for POWER5™ processors, where an uncorrectable error in memory or cache does not immediately cause the system to terminate. Rather, the system tags the data and determines whether it will ever be used again. If the error is irrelevant, it will not force a checkstop.
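The tag-and-defer idea can be modeled in a few lines. This is a simplified sketch, not the hardware mechanism: the Memory class, its method names, and the PoisonedDataError exception are hypothetical, and the model assumes that a fresh write makes an earlier error irrelevant and that an error only takes effect when the tagged data is read.

```python
class PoisonedDataError(Exception):
    """Raised when tagged (uncorrectable) data is actually consumed."""


class Memory:
    # Hypothetical model of memory with deferred error handling.
    def __init__(self):
        self.cells = {}
        self.tagged = set()

    def write(self, addr, value):
        self.cells[addr] = value
        # A fresh write replaces the bad data, so the tag is dropped.
        self.tagged.discard(addr)

    def mark_uncorrectable(self, addr):
        # Tag the data instead of terminating the system immediately.
        self.tagged.add(addr)

    def read(self, addr):
        # The error becomes relevant only if the data is used again.
        if addr in self.tagged:
            raise PoisonedDataError(f"uncorrectable data at {addr:#x}")
        return self.cells[addr]
```

In this model an error at an address that is overwritten before the next read never surfaces, while a read of still-tagged data raises the exception at the point of use.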

IBM United States Hardware Announcement 110-008

IBM is a registered trademark of International Business Machines Corporation
