During boot time, the service processor does not configure processors or memory DIMMs that are marked "bad."
If a processor or memory DIMM is deconfigured, the processor or memory DIMM remains offline for subsequent reboots until it is replaced or repeat gard is disabled. The repeat gard function also provides the user with the option of manually deconfiguring a processor or memory DIMM, or
For information about configuring or deconfiguring a processor, see the Processor Configuration/Deconfiguration Menu on page 46. For information about configuring or deconfiguring a memory DIMM, see the Memory Configuration/Deconfiguration Menu on page 47. Both of these menus are submenus under the System Information Menu. You can enable or disable CPU Repeat Gard or Memory Repeat Gard using the Processor Configuration/Deconfiguration Menu.
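The boot-time behavior described above can be modeled with a short Python sketch. This is an illustration only, not the service processor firmware: the `Component` class, the `components_to_configure` function, and the single `repeat_gard_enabled` flag are hypothetical names, and the model folds the separate CPU and memory repeat gard settings into one switch for brevity.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str          # hypothetical label, such as "CPU1" or "DIMM0"
    marked_bad: bool   # True if a deconfiguration record exists for this part


def components_to_configure(components, repeat_gard_enabled):
    """Return the names of the components brought online at this boot."""
    online = []
    for component in components:
        # With repeat gard enabled, a part marked bad stays offline
        # on every subsequent boot.
        if repeat_gard_enabled and component.marked_bad:
            continue
        online.append(component.name)
    return online


# Assumed inventory for illustration: one processor carries a "bad" mark.
inventory = [
    Component("CPU0", marked_bad=False),
    Component("CPU1", marked_bad=True),
    Component("DIMM0", marked_bad=False),
]

print(components_to_configure(inventory, repeat_gard_enabled=True))
print(components_to_configure(inventory, repeat_gard_enabled=False))
```

In this model, replacing the failed part corresponds to clearing its `marked_bad` flag, which is the other way a deconfigured component returns to service.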
Run-Time CPU Deconfiguration (CPU Gard)
L1 instruction cache recoverable errors, L1 data cache correctable errors, and L2 cache correctable errors are monitored by the processor runtime diagnostics (PRD) code running in the service processor. When a predefined error threshold is met, an error log with warning severity and threshold exceeded status is returned to AIX. At the same time, PRD marks the CPU for deconfiguration at the next boot. AIX will attempt to migrate all resources associated with that processor to another processor and then stop the defective processor.
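The threshold logic in this paragraph can be sketched in Python as follows. The sketch is a simplified model, not the PRD code itself: the class name, the method names, and the threshold value of 3 are assumptions, since the manual does not state the actual threshold or how errors are counted internally.

```python
from collections import Counter

ERROR_THRESHOLD = 3  # hypothetical value; the real threshold is not given here


class RecoverableErrorMonitor:
    """Counts recoverable cache errors per processor."""

    def __init__(self, threshold=ERROR_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()
        self.deconfigure_at_next_boot = set()

    def record_error(self, cpu):
        """Count one error; return True when the threshold is first met.

        A True result stands for the point at which a warning-severity
        error log would be returned to the operating system.
        """
        self.counts[cpu] += 1
        if self.counts[cpu] == self.threshold:
            # Mark the processor so that it is left offline at the next boot.
            self.deconfigure_at_next_boot.add(cpu)
            return True
        return False


monitor = RecoverableErrorMonitor()
results = [monitor.record_error("CPU1") for _ in range(4)]
print(results)
print(sorted(monitor.deconfigure_at_next_boot))
```

The model reports the threshold crossing only once per processor, so later errors on the same processor do not generate repeated warnings. Whether the real firmware behaves this way is an assumption of the sketch.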
Service Processor System Monitoring - Surveillance
Surveillance is a function in which the service processor monitors the system, and the system monitors the service processor. This monitoring is accomplished by periodic samplings called heartbeats.
Surveillance is available during the following phases:
- System firmware bringup (automatic)
- Operating system runtime (optional)
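The heartbeat mechanism behind surveillance can be modeled as a simple timeout check. The Python sketch below is illustrative only: the `HeartbeatWatch` class and the 5-second timeout are hypothetical, and timestamps are passed in explicitly so that the example is deterministic.

```python
class HeartbeatWatch:
    """Tracks the most recent heartbeat received from a monitored party."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # assumed allowed gap between heartbeats
        self.last_beat = None        # time of the most recent heartbeat

    def beat(self, now):
        """Record a heartbeat received at time `now` (in seconds)."""
        self.last_beat = now

    def is_overdue(self, now):
        """Return True if no heartbeat arrived within the timeout."""
        if self.last_beat is None:
            return False  # monitoring has not started yet
        return now - self.last_beat > self.timeout_s


watch = HeartbeatWatch(timeout_s=5.0)
watch.beat(now=0.0)
print(watch.is_overdue(now=3.0))
print(watch.is_overdue(now=6.0))
```

Because the monitoring is mutual, each side would hold a watch of this kind for the other: the service processor for the system, and the system for the service processor.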
System Firmware Surveillance
System firmware surveillance is automatically enabled during system
If the service processor detects no heartbeats during system IPL (for a set period of time), it cycles the system power to attempt a reboot. The maximum number of retries is set from the service processor menus. If the fail condition persists, the service processor leaves the machine powered on, logs an error, and displays menus to the user. If