IBM P5 570 manual Serviceability, Error indication and LED indicators

Models: P5 570


If the output shows CPU Guard as disabled, enter the following command to enable it:

chdev -l sys0 -a cpuguard='enable'
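The check-then-enable step can also be scripted. The following is a minimal sketch, not an IBM-supplied procedure. It assumes that `lsattr -El sys0 -a cpuguard` prints a single line whose first field is the attribute name and whose second field is the current value (`enable` or `disable`). The function names `cpuguard_state` and `ensure_cpuguard` are hypothetical helpers introduced only for illustration.

```shell
#!/bin/sh

# Hypothetical helper: read `lsattr -El sys0 -a cpuguard` output on stdin
# and print the second field of the cpuguard line (assumed to be the value).
cpuguard_state() {
    awk '$1 == "cpuguard" { print $2; exit }'
}

# Hypothetical helper: run chdev only when CPU Guard is reported as disabled.
# Intended to be run as root on an AIX partition.
ensure_cpuguard() {
    state=$(lsattr -El sys0 -a cpuguard | cpuguard_state)
    case "$state" in
        disable)
            # Same command as shown above.
            chdev -l sys0 -a cpuguard='enable'
            ;;
        enable)
            echo "CPU Guard is already enabled"
            ;;
        *)
            # Unexpected or empty output: make no change.
            echo "could not determine cpuguard state" >&2
            return 1
            ;;
    esac
}
```

Sourcing this file and calling `ensure_cpuguard` issues the `chdev` command only when the attribute is reported as disabled. Any other output is treated as an error and leaves the setting unchanged.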

Cache or cache-line deallocation performs dynamic reconfiguration to bypass potentially failing components. This capability is provided for both L2 and L3 caches. Dynamic run-time deconfiguration is provided if a threshold of L1 or L2 recovered errors is exceeded.

In the case of an L3 cache run-time array single-bit solid error, the spare chip resources are used to perform an L3 cache line delete on the failing line.

PCI hot-plug slot fault tracking helps prevent slot errors from causing a system machine check interrupt and subsequent reboot. This provides superior fault isolation, and the error affects only the single adapter. Run-time errors on the PCI bus that are caused by failing adapters will result in recovery action. If this is unsuccessful, the PCI device will be gracefully shut down. Parity errors on the PCI bus itself will result in bus retry and, if uncorrected, the bus and any I/O adapters or devices on that bus will be deconfigured.

The p5-570 supports PCI Extended Error Handling (EEH) if it is supported by the PCI-X adapter. In the past, PCI bus parity errors caused a global machine check interrupt, which eventually required a system reboot in order to continue. In the p5-570 system, the interaction of hardware, system firmware, and AIX has been designed to allow transparent recovery from intermittent PCI bus parity errors and a graceful transition to the I/O device available state in the case of a permanent parity error in the PCI bus.

EEH-enabled adapters respond to a special data packet that is generated from the affected PCI slot hardware by calling system firmware, which examines the affected bus, allows the device driver to reset it, and continues without a system reboot.

Persistent deallocation functions include:

- Processor
- Memory
- Deconfigure or bypass failing I/O adapters
- L3 cache

Following a hardware error that has been flagged by the service processor, the subsequent reboot of the system invokes extended diagnostics. If a processor or L3 cache has been marked for deconfiguration by persistent processor deallocation, the boot process will attempt to proceed to completion with the faulty device automatically deconfigured. Failing I/O adapters will be deconfigured or bypassed during the boot process.

Note: The auto-restart (reboot) option, when enabled, can reboot the system automatically following an unrecoverable software error, software hang, hardware failure, or environmentally induced failure (such as loss of power supply).

3.2.8 Serviceability

Increasing service productivity keeps the system up and running for a longer time. The p5-570 improves service productivity by providing the following functions.

Error indication and LED indicators

The p5-570 is designed to be installed by an IBM service representative. Adding most hardware features after the initial installation is a customer setup task. To help both the customer and the IBM service representative, the p5-570 provides internal LED diagnostics that identify parts that require service. Indication of an error is provided through a series of light attention signals,

