It also uses the hardware error detection logic in the processor to capture run-time recoverable and irrecoverable error indications. The firmware uses the error signatures in the hardware to analyze and isolate the error to a specific processor.

The processors that are deconfigured remain off-line for subsequent reboots until the faulty processor hardware is replaced.

This function allows users to manually deconfigure or re-enable a previously deconfigured processor through the Service Processor menu. The user can also enable or disable this function through the Service Processor.

Processor Run-Time Deconfiguration (CPU-Gard)

Processor run-time deconfiguration allows for the dynamic removal of CPUs from the system configuration. The objective is to minimize system failures or data integrity exposures due to a faulty processor. The processor to be removed is the one that has experienced repeated run-time recoverable internal errors (over a predefined threshold).

The function uses the hardware error detection logic in the processor to capture run-time recoverable error indications. The firmware uses the error signatures in the hardware to analyze and isolate the error to a specific CPU. The firmware also maintains error-threshold information.

When the internal recoverable errors for a processor reach a predefined threshold, the firmware notifies the AIX operating system. The AIX operating system migrates all software processes and interrupts to another processor and places the faulty processor in a stopped state.
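The threshold-and-migrate sequence described above can be illustrated with a minimal Python sketch. It is not the firmware or AIX implementation. The class `RecoverableErrorTracker`, the function `migrate_work`, the threshold value, and the round-robin choice of target processor are all hypothetical names and assumptions introduced for illustration only.

```python
from collections import Counter


class RecoverableErrorTracker:
    """Counts recoverable errors per CPU and flags a CPU that reaches the threshold."""

    def __init__(self, threshold):
        # Hypothetical value; the real threshold is predefined by the firmware.
        self.threshold = threshold
        self.counts = Counter()
        self.stopped = set()

    def record_error(self, cpu_id):
        """Record one recoverable error.

        Returns True only on the error that makes the CPU reach the threshold.
        """
        if cpu_id in self.stopped:
            return False
        self.counts[cpu_id] += 1
        if self.counts[cpu_id] >= self.threshold:
            self.stopped.add(cpu_id)
            return True
        return False


def migrate_work(assignments, faulty_cpu, online_cpus):
    """Return a new task-to-CPU mapping with nothing left on the faulty CPU.

    Tasks are spread round-robin over the remaining CPUs (an assumption;
    the manual only says that work moves to another processor).
    """
    targets = [cpu for cpu in online_cpus if cpu != faulty_cpu]
    if not targets:
        raise RuntimeError("no remaining processor to migrate to")
    moved = {}
    next_target = 0
    for task, cpu in assignments.items():
        if cpu == faulty_cpu:
            moved[task] = targets[next_target % len(targets)]
            next_target += 1
        else:
            moved[task] = cpu
    return moved
```

In this sketch, a caller would invoke `record_error` for each recoverable error indication and, when it returns True, call `migrate_work` before treating the processor as stopped.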

CPUs that are deconfigured at run time remain off-line for subsequent reboots through the CPU Boot Time Deconfiguration function, until the faulty CPU hardware is replaced. The user can also enable or disable this function via the AIX system management function.
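The way a deconfiguration record persists across reboots can be sketched as follows. This is an illustration under stated assumptions, not the actual mechanism: the system keeps this state in firmware, whereas the sketch uses a JSON file as a stand-in. All function names and the `boot_deconfig_enabled` parameter are hypothetical.

```python
import json
from pathlib import Path


def load_deconfigured(path):
    """Read the set of deconfigured CPU ids; empty if no record exists yet."""
    record = Path(path)
    if not record.exists():
        return set()
    return set(json.loads(record.read_text()))


def save_deconfigured(path, cpus):
    """Persist the set of deconfigured CPU ids (a JSON file stands in for firmware storage)."""
    Path(path).write_text(json.dumps(sorted(cpus)))


def cpus_to_bring_online(installed, path, boot_deconfig_enabled=True):
    """Return the CPUs to configure at boot, skipping any that are recorded as deconfigured."""
    if not boot_deconfig_enabled:
        return sorted(installed)
    return sorted(set(installed) - load_deconfigured(path))


def mark_replaced(path, cpu_id):
    """Clear a CPU from the record once the faulty hardware has been replaced."""
    cpus = load_deconfigured(path)
    cpus.discard(cpu_id)
    save_deconfigured(path, cpus)
```

A CPU recorded with `save_deconfigured` is skipped by `cpus_to_bring_online` on every later call until `mark_replaced` clears it, which mirrors the behavior described above.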

Memory Boot-Time Deconfiguration (Memory Repeat-Gard)

Memory boot-time deconfiguration allows for the removal of a memory segment or DIMM from the system configuration at boot time. The objective is to minimize system failures or data integrity exposures due to faulty memory hardware. The hardware resources to be removed are those that have experienced any of the following failures:

• A boot-time test failure.

• Run-time recoverable errors over threshold prior to the current boot phase.

• Run-time irrecoverable errors prior to the current boot phase.

This function uses firmware Power-On Self-Test (POST) to discover and isolate memory hardware failures during boot time. It also uses the hardware error detection logic in the memory controller to capture run-time recoverable and irrecoverable error indications.

The firmware uses the error signatures in the hardware to analyze and isolate the error to the specific memory segment or DIMM.
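The three removal criteria listed above can be expressed as a single check. The following Python sketch is illustrative only. The `DimmHistory` record, its field names, and the threshold value are hypothetical, and the actual firmware data structures are not described in this guide.

```python
from dataclasses import dataclass


@dataclass
class DimmHistory:
    """Hypothetical per-DIMM (or per-segment) failure history examined at boot."""
    name: str
    post_failed: bool = False        # failed the boot-time (POST) memory test
    recoverable_errors: int = 0      # run-time recoverable errors before this boot
    irrecoverable_errors: int = 0    # run-time irrecoverable errors before this boot


def should_deconfigure(dimm, threshold):
    """Return True if any of the three removal criteria applies."""
    return (
        dimm.post_failed
        or dimm.recoverable_errors > threshold   # "over threshold"
        or dimm.irrecoverable_errors > 0
    )


def select_for_removal(dimms, threshold):
    """Return the names of the memory resources to leave out of the configuration."""
    return [d.name for d in dimms if should_deconfigure(d, threshold)]
```

Note that a recoverable-error count equal to the threshold does not qualify in this sketch, because the criterion is stated as errors over threshold.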

44P Series Model 170 User's Guide
