data in the damaged DIMM is used until the system is powered down and the DIMM replaced.

Certain restrictions exist with respect to placement and size of memory DIMMs when memory mirroring is enabled. These are discussed in “Memory mirroring” on page 67.

Chipkill memory

Chipkill is integrated into the XA-32 chipset and does not require special Chipkill DIMMs. Chipkill corrects multiple single-bit errors to keep a DIMM from failing. Combined with Memory ProteXion and Active Memory, Chipkill gives the x440 very high reliability in the memory subsystem. Chipkill memory is approximately 100 times more effective than ECC technology, correcting up to four bits per DIMM (eight bits per memory controller), whether the errors are on a single chip or on multiple chips.

If a memory chip error does occur, Chipkill is designed to automatically take the inoperative memory chip offline while the server keeps running. The memory controller provides memory protection similar in concept to disk array striping with parity, writing the memory bits across multiple memory chips on the DIMM. The controller is able to reconstruct the “missing” bit from the failed chip and continue working as usual.
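The striping-with-parity analogy can be illustrated with a minimal sketch. It uses simple XOR parity across a handful of hypothetical chips, as in disk array striping. It is not the actual error-correcting code used by the XA-32 memory controller, and the function names and the chip count are assumptions chosen for illustration only.

```python
from functools import reduce
from operator import xor


def write_striped(data_bits):
    """Spread data bits across chips (one bit per chip) and append one parity bit."""
    parity = reduce(xor, data_bits, 0)
    return list(data_bits) + [parity]


def read_with_failed_chip(stripe, failed_chip):
    """Rebuild the bit held by the failed chip from the surviving chips.

    The XOR of all surviving bits (data and parity) equals the missing bit.
    """
    surviving = [bit for i, bit in enumerate(stripe) if i != failed_chip]
    missing = reduce(xor, surviving, 0)
    rebuilt = list(stripe)
    rebuilt[failed_chip] = missing
    return rebuilt[:-1]  # drop the parity bit and return only the data bits


# Hypothetical example: four data chips plus one parity chip.
data = [1, 0, 1, 1]
stripe = write_striped(data)
recovered = read_with_failed_chip(stripe, failed_chip=2)
```

In this sketch, losing any single chip (including the parity chip) still allows the original data bits to be returned, which mirrors the "missing bit" reconstruction described above.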

Chipkill support is provided in the memory controller and implemented using standard ECC DIMMs, so it is transparent to the operating system.

In addition, to maintain the highest levels of system availability, if a memory error is detected during POST or memory configuration, the server can automatically disable the failing memory bank and continue operating with reduced memory capacity. After the problem is corrected, you can manually re-enable the memory bank through the Setup menu in BIOS.

Memory mirroring, Chipkill, and Memory ProteXion provide multiple levels of redundancy to the memory subsystem. Combining Chipkill with Memory ProteXion allows the x440 to tolerate up to two memory chip failures per memory port (8 DIMMs). An eight-way x440, with its four memory ports, could therefore sustain up to eight memory chip failures. Memory mirroring provides additional protection by allowing operations to continue despite memory module failures.

1. The first failure detected by the Chipkill algorithm on each port doesn’t generate a Light Path Diagnostics error, since Memory ProteXion recovers from the problem automatically.

2. Each memory port could then sustain a second chip failure without shutting down.

3. Provided that memory mirroring is enabled, the third chip failure on that port would send the alert and take the DIMM offline, but keep the system running out of the redundant memory bank.
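The three steps above, together with the per-port arithmetic, can be summarized in a short sketch. This is only a model of the sequence as described in this section, not firmware behavior. The function name, the constants, and the returned strings are hypothetical. Cases that the text does not cover are reported as such rather than guessed.

```python
def port_response(chip_failures, mirroring_enabled):
    """Return the described response to the Nth chip failure on one memory port."""
    if chip_failures <= 0:
        return "no failure"
    if chip_failures == 1:
        # Step 1: Memory ProteXion recovers automatically, no alert is raised.
        return "recovered by Memory ProteXion, no Light Path Diagnostics error"
    if chip_failures == 2:
        # Step 2: the port sustains a second chip failure.
        return "tolerated without shutting down"
    if chip_failures == 3 and mirroring_enabled:
        # Step 3: applies only when memory mirroring is enabled.
        return "alert sent, DIMM taken offline, running from redundant memory bank"
    return "not described in this section"


# Arithmetic from the text: two tolerated chip failures per port,
# four memory ports in an eight-way x440.
FAILURES_PER_PORT = 2
PORTS_EIGHT_WAY = 4
max_chip_failures = FAILURES_PER_PORT * PORTS_EIGHT_WAY  # 8
```

For example, `port_response(3, mirroring_enabled=True)` returns the step 3 outcome, while the same call with mirroring disabled falls through to the "not described" case.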

22 IBM xSeries 440 Planning and Installation Guide
