Memory, Run-Time CPU Deconfiguration CPU Gard, Prd

Configuring and Deconfiguring Processors or Memory

All failures that crash the system with a machine check or check stop, even if intermittent, are reported as a diagnostic callout for service repair. To prevent the recurrence of intermittent problems and improve the availability of the system until a scheduled maintenance window, processors and memory modules with a failure history are marked "bad" to prevent their being configured on subsequent boots.

A processor or memory module is marked "bad" under the following circumstances:

• A processor or memory module fails built-in self test (BIST) or power-on self test (POST) testing during boot (as determined by the Service Processor).

• A processor or memory module causes a machine check or check stop during runtime, and the failure can be isolated specifically to that processor or memory module (as determined by the processor runtime diagnostics in the Service Processor).

• A processor or memory module reaches a threshold of recovered failures that results in a predictive callout (as determined by the processor runtime diagnostics in the Service Processor).

During boot time, the Service Processor does not configure processors or memory modules that are marked "bad," much in the same way that it would deconfigure them for BIST/POST failures.
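
The marking rules and the boot-time behavior described above can be pictured with a small model. This is only an illustrative sketch in Python, not Service Processor firmware: the `Unit` class, its field names, and the functions `should_mark_bad` and `boot_configure` are all hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical record for one processor or memory module."""
    name: str
    failed_bist_post: bool = False          # failed BIST or POST during boot
    isolated_runtime_failure: bool = False  # machine check / check stop isolated to this unit
    predictive_callout: bool = False        # recovered-failure threshold reached
    marked_bad: bool = False                # failure history kept from earlier boots


def should_mark_bad(unit: Unit) -> bool:
    """Return True if any of the three listed circumstances applies."""
    return (
        unit.failed_bist_post
        or unit.isolated_runtime_failure
        or unit.predictive_callout
    )


def boot_configure(units: list[Unit]) -> list[str]:
    """Return the names of the units that would be configured at boot.

    Units that are already marked bad, or that qualify for marking now,
    are skipped and stay marked for subsequent boots.
    """
    configured = []
    for unit in units:
        if unit.marked_bad or should_mark_bad(unit):
            unit.marked_bad = True
            continue
        configured.append(unit.name)
    return configured
```

In this model a unit with a failure history is left out of the configuration in the same way as a unit that fails BIST or POST on the current boot, which mirrors the behavior the paragraph describes.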

If a processor is deconfigured, the processor remains offline for subsequent reboots until the faulty processor is replaced. The Repeat Gard function also provides the users with the option of manually deconfiguring a processor, or re-enabling a previously deconfigured processor. For information on how to configure or deconfigure a processor, see the Processor Configuration/Deconfiguration Menu on page 46.

You can enable or disable CPU Repeat Gard or Memory Repeat Gard using the Processor Configuration/Deconfiguration Menu, which is a submenu under the System Information Menu.

Run-Time CPU Deconfiguration (CPU Gard)

L1 instruction cache recoverable errors, L1 data cache correctable errors, and L2 cache correctable errors are monitored by the processor runtime diagnostics (PRD) code running in the Service Processor. When a predefined error threshold is met, an error log with warning severity and threshold exceeded status is returned to AIX. At the same time, PRD marks the CPU for deconfiguration at the next boot. AIX will
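
The threshold mechanism described in this section can be sketched as a per-CPU error counter. The following Python is a simplified illustration under stated assumptions, not the actual PRD code: the class `PrdMonitor`, its method and attribute names, the log entry layout, and the threshold value of 3 are all hypothetical, since the guide does not state the real threshold or log format.

```python
from collections import Counter

ERROR_THRESHOLD = 3  # hypothetical value; the guide only says "predefined"


class PrdMonitor:
    """Toy model of threshold-based run-time CPU deconfiguration."""

    def __init__(self, threshold: int = ERROR_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()                # recovered errors seen per CPU
        self.error_log = []                    # entries that would be returned to AIX
        self.deconfigure_at_next_boot = set()  # CPUs marked for deconfiguration

    def record_recoverable_error(self, cpu: str) -> bool:
        """Count one recovered cache error; return True when the threshold is met."""
        self.counts[cpu] += 1
        if self.counts[cpu] == self.threshold:
            # Log with warning severity and threshold exceeded status,
            # and mark the CPU for deconfiguration at the next boot.
            self.error_log.append(
                {"cpu": cpu, "severity": "warning", "status": "threshold exceeded"}
            )
            self.deconfigure_at_next_boot.add(cpu)
            return True
        return False
```

A caller would invoke `record_recoverable_error` once per recovered error. The log entry and the deconfiguration mark are produced together, at the moment the count reaches the threshold, which matches the "at the same time" wording above.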

70 RS/6000 Enterprise Server Model H80 Series User's Guide
