Sun Microsystems X4440, X4240 DIMM Replacement Policy, How DIMM Errors Are Handled by the System

Models: X4140 X4440 X4240

DIMM Replacement Policy

Replace a DIMM when one of the following events takes place:

The DIMM fails memory testing under BIOS due to Uncorrectable Memory Errors (UCEs).

UCEs occur and investigation shows that the errors originated from memory.

In addition, replace a DIMM whenever more than 24 Correctable Errors (CEs) originate from a single DIMM within 24 hours and no other DIMM is showing further CEs.

If more than one DIMM has experienced multiple CEs, other possible causes of CEs have to be ruled out by a qualified Sun Support specialist before replacing any DIMMs.

Before calling Sun, retain copies of the logs that show the memory errors meeting the above rules, so that they can be sent to Sun for verification.
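The replacement rules above can be sketched as a small decision routine. This is only an illustrative sketch, not a Sun tool: the function names, the DIMM labels, and the input format (one label and timestamp per CE, already extracted from the system logs) are all assumptions. It also assumes that "no other DIMM is showing further CEs" means no other DIMM has any CE in the supplied data, and that "multiple CEs" means two or more.

```python
from collections import defaultdict
from datetime import datetime, timedelta

CE_THRESHOLD = 24               # policy: "more than 24 CEs"
WINDOW = timedelta(hours=24)    # policy: "in 24 hours"


def max_ces_in_window(timestamps, window=WINDOW):
    """Return the largest number of CEs that fall within any one window."""
    ts = sorted(timestamps)
    best = 0
    start = 0
    for end in range(len(ts)):
        # Advance the left edge until the span fits inside the window.
        while ts[end] - ts[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def recommend(ce_events, uce_dimms=()):
    """Map each affected DIMM label to 'replace' or 'escalate'.

    ce_events: iterable of (dimm_label, datetime) pairs, one per CE.
    uce_dimms: labels of DIMMs to which a UCE has been traced.
    DIMMs that match no rule are left out of the result.
    """
    by_dimm = defaultdict(list)
    for dimm, when in ce_events:
        by_dimm[dimm].append(when)

    # UCEs traced to memory always call for replacement.
    result = {dimm: "replace" for dimm in uce_dimms}

    # More than one DIMM with multiple CEs: a Sun Support specialist
    # has to rule out other causes before any DIMM is replaced.
    multi = [d for d, ts in by_dimm.items() if len(ts) > 1]
    if len(multi) > 1:
        for d in multi:
            result.setdefault(d, "escalate")
        return result

    for dimm, ts in by_dimm.items():
        others_have_ces = any(d != dimm for d in by_dimm)
        if max_ces_in_window(ts) > CE_THRESHOLD and not others_have_ces:
            result.setdefault(dimm, "replace")
    return result


if __name__ == "__main__":
    base = datetime(2008, 8, 1)
    # 25 CEs on one hypothetical DIMM label, 30 minutes apart.
    events = [("DIMM A0", base + timedelta(minutes=30 * i)) for i in range(25)]
    print(recommend(events))
```

The sliding window counts CEs in any 24-hour span rather than per calendar day, which is one reasonable reading of the policy. A DIMM that exceeds the threshold while another DIMM shows a single CE gets no recommendation here, because the policy text does not say how to handle that case.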

How DIMM Errors Are Handled by the System

This section describes system behavior for the two types of DIMM errors, UCEs and CEs, and also describes BIOS DIMM error messages.

Uncorrectable DIMM Errors

For all operating systems (OSs), the behavior for UCEs is the same:

1. When a UCE occurs, the memory controller causes an immediate reboot of the system.

2. During reboot, the BIOS checks the Machine Check registers and determines that the previous reboot was due to a UCE, then reports this in POST after the memtest stage:

A Hypertransport Sync Flood occurred on last boot

12 Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008
