
Configuring and Deconfiguring Processors or Memory
All failures that crash the system with a machine check or check stop, even if
intermittent, are reported as a diagnostic callout for service repair. To prevent the recurrence of intermittent problems and improve the availability of the system until a scheduled maintenance window, processors and memory modules with a failure
history are marked "bad" to prevent their being configured on subsequent boots.
A processor or memory module is marked "bad" under the following circumstances:
• A processor or memory module fails built-in self-test (BIST) or power-on self-test (POST) testing during boot (as determined by the Service Processor).
• A processor or memory module causes a machine check or check stop during runtime, and the failure can be isolated specifically to that processor or memory module (as determined by the processor runtime diagnostics in the Service Processor).
• A processor or memory module reaches a threshold of recovered failures that results in a predictive callout (as determined by the processor runtime diagnostics in the Service Processor).
During boot time, the Service Processor does not configure processors or memory modules that are marked "bad," much in the same way that it would deconfigure them for BIST/POST failures.
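The marking rules and the boot-time behavior described above can be pictured with a small model. This is only an illustrative sketch, not the Service Processor firmware: the names `FailureKind`, `Unit`, `configure_at_boot`, and the value of `RECOVERED_THRESHOLD` are all assumptions made for the example, since this guide does not state the real threshold or any internal interfaces.

```python
from dataclasses import dataclass
from enum import Enum, auto


class FailureKind(Enum):
    SELF_TEST = auto()           # BIST/POST failure during boot
    ISOLATED_CHECKSTOP = auto()  # machine check or check stop isolated to this unit
    RECOVERED = auto()           # recovered failure seen at runtime


# Hypothetical value; the real threshold is not given in this guide.
RECOVERED_THRESHOLD = 3


@dataclass
class Unit:
    """A processor or memory module with its failure history."""
    name: str
    recovered_errors: int = 0
    bad: bool = False

    def record(self, kind: FailureKind) -> None:
        """Apply the three marking rules to one reported failure."""
        if kind is FailureKind.RECOVERED:
            # Recovered failures only mark the unit once the threshold is reached.
            self.recovered_errors += 1
            if self.recovered_errors >= RECOVERED_THRESHOLD:
                self.bad = True
        else:
            # Self-test failures and isolated check stops mark the unit at once.
            self.bad = True


def configure_at_boot(units):
    """Return the names of the units that would be configured on the next boot."""
    return [u.name for u in units if not u.bad]


units = [Unit("cpu0"), Unit("cpu1"), Unit("dimm0")]
units[1].record(FailureKind.ISOLATED_CHECKSTOP)
print(configure_at_boot(units))
```

In this model a unit stays marked until something clears the flag, which mirrors the statement that marked units are not configured on subsequent boots.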
If a processor is deconfigured, the processor remains offline for subsequent reboots
until the faulty processor is replaced. The Repeat Gard function also provides
users with the option of manually deconfiguring a processor, or re-enabling a
previously deconfigured processor. For information on how to configure or
deconfigure a processor, see the Processor Configuration/Deconfiguration Menu on
page 46.
You can enable or disable CPU Repeat Gard or Memory Repeat Gard using the Processor Configuration/Deconfiguration Menu, which is a submenu under the
System Information Menu.
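As a rough picture of how the menu settings could interact, the sketch below models the two Repeat Gard switches together with manual configuration and deconfiguration. It is not the Service Processor's implementation. The class `GardState` and its methods are hypothetical, and the sketch assumes that disabling Repeat Gard for a resource class means recorded deconfigurations of that class are ignored at boot; this guide does not spell out that behavior.

```python
from dataclasses import dataclass, field


@dataclass
class GardState:
    """Hypothetical model of the Repeat Gard settings and deconfiguration records."""
    cpu_repeat_gard: bool = True       # CPU Repeat Gard enabled/disabled
    memory_repeat_gard: bool = True    # Memory Repeat Gard enabled/disabled
    deconfigured: set = field(default_factory=set)

    def deconfigure(self, unit: str) -> None:
        """Manually take a unit out of the configuration."""
        self.deconfigured.add(unit)

    def configure(self, unit: str) -> None:
        """Re-enable a previously deconfigured unit."""
        self.deconfigured.discard(unit)

    def boot_set(self, cpus, dimms):
        """Units configured at the next boot (assumption: a disabled
        Repeat Gard switch means the records for that class are ignored)."""
        keep_cpus = [c for c in cpus
                     if not (self.cpu_repeat_gard and c in self.deconfigured)]
        keep_mem = [d for d in dimms
                    if not (self.memory_repeat_gard and d in self.deconfigured)]
        return keep_cpus + keep_mem


state = GardState()
state.deconfigure("cpu1")
print(state.boot_set(["cpu0", "cpu1"], ["dimm0"]))
```

Keeping the records separate from the on/off switches means that turning Repeat Gard back on in this model restores the earlier deconfigurations without re-entering them.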
Run-Time CPU Deconfiguration (CPU Gard)
L1 instruction cache recoverable errors, L1 data cache correctable errors, and L2
cache correctable errors are monitored by the processor runtime diagnostics (PRD)
code running in the Service Processor. When a predefined error threshold is met, an
error log with warning severity and threshold exceeded status is returned to AIX. At
the same time, PRD marks the CPU for deconfiguration at the next boot. AIX will
70 RS/6000 Enterprise Server Model H80 Series User's Guide