IBM P5 570 manual 3.2.5 N+1 redundancy, 3.2.6 Fault masking, 3.2.7 Resource deallocation

Models: P5 570


(dynamic bit-steering). Memory scrubbing is the process of reading the contents of memory during idle time and passing the data through the ECC logic, which checks for and corrects any single-bit errors that have accumulated. Scrubbing is a hardware function of the memory controller chip and does not affect normal system memory performance.
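The scrub pass can be illustrated in software. This is only a sketch: it assumes a Hamming(7,4) single-error-correcting code in place of the much wider ECC with Chipkill protection that the memory controller applies in hardware, and the functions `encode`, `syndrome`, `decode`, and `scrub` are hypothetical names used for illustration.

```python
# Illustrative sketch only: Hamming(7,4) stands in for the real hardware ECC.
# Codeword positions are 1..7 (list index = position - 1); parity bits sit at 1, 2, 4.

DATA_POSITIONS = (3, 5, 6, 7)


def encode(nibble: int) -> list[int]:
    """Encode 4 data bits into a 7-bit single-error-correcting codeword."""
    code = [0] * 7
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos - 1] = (nibble >> i) & 1
    for p in (1, 2, 4):
        # Parity bit p makes the number of 1s over all positions with bit p set even.
        code[p - 1] = sum(code[pos - 1] for pos in range(1, 8) if pos & p) % 2
    return code


def syndrome(code: list[int]) -> int:
    """Return 0 for a clean word, else the 1-based position of a single flipped bit."""
    s = 0
    for p in (1, 2, 4):
        if sum(code[pos - 1] for pos in range(1, 8) if pos & p) % 2:
            s |= p
    return s


def decode(code: list[int]) -> int:
    """Extract the 4 data bits from a (corrected) codeword."""
    return sum(code[pos - 1] << i for i, pos in enumerate(DATA_POSITIONS))


def scrub(memory: list[list[int]]) -> int:
    """One scrub pass: read every word, correct single-bit errors in place
    (the write-back step), and return how many words were corrected."""
    corrected = 0
    for word in memory:
        pos = syndrome(word)
        if pos:
            word[pos - 1] ^= 1
            corrected += 1
    return corrected


memory = [encode(n) for n in range(16)]
memory[5][3] ^= 1  # simulate a soft error accumulating in word 5
memory[9][0] ^= 1  # and another in word 9
print(scrub(memory))                                   # 2
print([decode(w) for w in memory] == list(range(16)))  # True
```

A code like this corrects only one bad bit per word. Scrubbing during idle time repairs single-bit errors before a second error can land in the same word and make it uncorrectable.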

3.2.5 N+1 redundancy

The use of redundant parts allows the p5-570 to remain operational with full resources:

- Redundant spare memory bits in L1, L2, L3, and main memory
- Redundant fans
- Redundant power supplies
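The N+1 principle can be shown with a toy model. This sketch describes no p5-570 interface: the `RedundantGroup` class is hypothetical, and the unit counts are invented for illustration. A group needs N working units for full resources and carries one spare.

```python
from dataclasses import dataclass


@dataclass
class RedundantGroup:
    """A set of interchangeable units, such as fans or power supplies (hypothetical model)."""
    name: str
    required: int   # N: units needed to run with full resources
    installed: int  # N + 1 in an N+1 design
    failed: int = 0

    def fail_one(self) -> None:
        self.failed += 1

    @property
    def full_resources(self) -> bool:
        return self.installed - self.failed >= self.required


fans = RedundantGroup("fans", required=2, installed=3)  # hypothetical counts
fans.fail_one()
print(fans.full_resources)  # True: the spare absorbs the first failure
fans.fail_one()
print(fans.full_resources)  # False: a second failure leaves fewer than N units
```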

3.2.6 Fault masking

If corrections and retries succeed and do not exceed threshold limits, the system remains operational with full resources, and no client or IBM customer engineer intervention is required:

- CEC bus retry and recovery
- PCI-X bus recovery
- ECC Chipkill soft error
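Retry-based masking follows a simple pattern, sketched below. The sketch is a hypothetical software analogy of the hardware bus-retry logic: `TransientError`, `retry_transfer`, `make_flaky`, and the default of three retries are all assumptions, not documented p5-570 values. A transient failure is retried within a limit, and only an error that outlasts the limit is escalated.

```python
class TransientError(Exception):
    """A recoverable error that a retry can clear, e.g. a bus parity error."""


def retry_transfer(op, max_retries: int = 3):
    """Run op(), retrying on TransientError up to max_retries times.

    Returns (result, retries_used). If the limit is exhausted the error is
    re-raised: it could not be masked and must be escalated.
    max_retries is a hypothetical threshold, not a documented p5-570 value.
    """
    for attempt in range(max_retries + 1):
        try:
            return op(), attempt
        except TransientError:
            if attempt == max_retries:
                raise


def make_flaky(failures: int):
    """Build a test operation that fails transiently `failures` times, then succeeds."""
    remaining = [failures]

    def op():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientError("simulated bus parity error")
        return "data"

    return op


print(retry_transfer(make_flaky(2)))  # ('data', 2): masked, full resources kept
```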

3.2.7 Resource deallocation

If recoverable errors exceed threshold limits, resources can be deallocated with the system remaining operational, allowing deferred maintenance at a convenient time.

Dynamic or persistent deallocation

Dynamic deallocation of potentially failing components is non-disruptive, so the system continues to run. Persistent deallocation occurs when a failed component is detected; the component is then deactivated at the next reboot.
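The two modes differ mainly in timing. The toy model below is hypothetical: the `System` class, its methods, and the component names are assumptions for illustration, not firmware structures. Dynamic deallocation takes effect immediately. Persistent deallocation is only recorded and takes effect at the next boot.

```python
class System:
    """Toy model contrasting dynamic and persistent deallocation (hypothetical, not firmware)."""

    def __init__(self, components):
        self.online = {name: True for name in components}
        self.deallocate_at_boot = set()  # hypothetical persistent record

    def dynamic_deallocate(self, name):
        # Non-disruptive: the component is removed now and the system keeps running.
        self.online[name] = False

    def persistent_deallocate(self, name):
        # The failure is recorded; the component stays configured until the next reboot.
        self.deallocate_at_boot.add(name)

    def reboot(self):
        for name in self.deallocate_at_boot:
            self.online[name] = False


s = System(["proc0", "proc1"])
s.dynamic_deallocate("proc1")
print(s.online["proc1"])  # False: removed immediately
s.persistent_deallocate("proc0")
print(s.online["proc0"])  # True: still active until reboot
s.reboot()
print(s.online["proc0"])  # False: deactivated at boot
```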

Dynamic deallocation functions include:

- Processor
- L3 cache line delete
- Partial L2 cache deallocation
- PCI-X bus and slots

For dynamic processor deallocation, the service processor performs a predictive failure analysis based on any recoverable processor errors that have been recorded. If these transient errors exceed a defined threshold, the event is logged and the processor is deallocated from the system while the operating system continues to run. This feature, called CPU Guard, allows maintenance to be deferred until a convenient time. Processor deallocation can occur only if there are sufficient functional processors (at least two).
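The decision logic can be sketched in simplified form. This is a hypothetical model in the spirit of CPU Guard, not the service processor's actual algorithm: the `CpuGuardSketch` class, its threshold value, and the log format are assumptions. Recoverable errors are counted per processor. Once a count exceeds the threshold, the processor is removed, but only if at least two processors are functional, so the last one is never taken away.

```python
class CpuGuardSketch:
    """Predictive processor deallocation in the spirit of CPU Guard (hypothetical model).

    The threshold value and data structures are assumptions for illustration.
    """

    def __init__(self, cpus, threshold):
        self.online = set(cpus)
        self.errors = dict.fromkeys(cpus, 0)
        self.threshold = threshold
        self.log = []

    def record_recoverable_error(self, cpu):
        self.errors[cpu] += 1
        if cpu not in self.online or self.errors[cpu] <= self.threshold:
            return
        if len(self.online) >= 2:
            # Enough functional processors: deallocate while the OS keeps running.
            self.online.discard(cpu)
            self.log.append(f"{cpu} deallocated after {self.errors[cpu]} recoverable errors")
        else:
            # Never remove the last processor; the event is only logged.
            self.log.append(f"{cpu} over threshold, but no other processor is available")


guard = CpuGuardSketch(["proc0", "proc1"], threshold=3)
for _ in range(4):
    guard.record_recoverable_error("proc1")
print(sorted(guard.online))  # ['proc0']
for _ in range(4):
    guard.record_recoverable_error("proc0")
print(sorted(guard.online))  # ['proc0']: the last processor is kept
```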

To verify whether CPU Guard has been enabled, run the following command:

lsattr -El sys0 | grep cpuguard

If CPU Guard is enabled, the output will be similar to:

cpuguard   enable   CPU Guard   True
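For scripted checks, the same test can be wrapped in Python. This is a sketch that assumes the column layout shown above, with the attribute name first and the current value second. Both function names are hypothetical, and only `query_cpuguard` needs an AIX host, because it runs `lsattr -El sys0` and filters the output in the same way as the `grep` pipeline.

```python
import subprocess


def cpuguard_enabled(lsattr_output: str) -> bool:
    """Return True if the cpuguard line in `lsattr -El sys0` output has the value 'enable'."""
    for line in lsattr_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpuguard":
            return len(fields) > 1 and fields[1] == "enable"
    return False


def query_cpuguard() -> bool:
    """Run on an AIX host; performs the same check as `lsattr -El sys0 | grep cpuguard`."""
    result = subprocess.run(["lsattr", "-El", "sys0"],
                            capture_output=True, text=True, check=True)
    return cpuguard_enabled(result.stdout)


print(cpuguard_enabled("cpuguard   enable   CPU Guard   True"))  # True
```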

58 p5-570 Technical Overview and Introduction
