Intel® Server Board Set SE8500HW4

System BIOS

DIMM bank’s uncorrectable error count. If the error count is less than 10 per hour, the BIOS reports the uncorrectable ECC error to the SEL. When the DIMM uncorrectable error count reaches 10, the BIOS lights the LEDs of the bad DIMMs and disables the DIMM bank for subsequent boots. The system continues to function from redundant memory.
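The uncorrectable error policy above can be sketched as a small model. This is an illustration only, not BIOS code: the names `DimmBank` and `on_uncorrectable_error` are hypothetical, "per hour" is assumed to mean a sliding one-hour window, and the error that reaches the limit is assumed not to be logged to the SEL, since the text does not say either way.

```python
from dataclasses import dataclass, field

UNCORRECTABLE_LIMIT = 10   # from the text: 10 errors per hour
WINDOW_SECONDS = 3600      # assumption: "per hour" is a sliding one-hour window


@dataclass
class DimmBank:
    """Hypothetical model of one DIMM bank's uncorrectable error state."""
    name: str
    error_times: list = field(default_factory=list)
    led_on: bool = False
    disabled_next_boot: bool = False


def on_uncorrectable_error(bank, now, sel):
    """Handle one uncorrectable ECC error at time `now` (in seconds).

    `sel` is a plain list standing in for the System Event Log.
    """
    # Drop errors older than one hour, then count the new one.
    bank.error_times = [t for t in bank.error_times if now - t < WINDOW_SECONDS]
    bank.error_times.append(now)

    if len(bank.error_times) < UNCORRECTABLE_LIMIT:
        # Below the limit: report the error to the SEL.
        sel.append(f"uncorrectable ECC error: {bank.name}")
    else:
        # Limit reached: light the LEDs and disable the bank for later boots.
        # Assumption: this error is not logged to the SEL.
        bank.led_on = True
        bank.disabled_next_boot = True
```

In this model, ten errors inside one hour disable the bank, while the same ten errors spread over many hours are each logged and the bank stays enabled.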

Multiple consecutive uncorrectable ECC errors may cause an XMB fail condition and the entire Memory Board to be disabled. When the XMB fail occurs, the BIOS is no longer able to access the XMB registers in order to locate the failing DIMM(s). Hence, the BIOS does NOT light the bad DIMM LED, log the failed DIMM information, or disable the failed DIMMs.

If the XMB fails due to uncorrectable ECC errors while the system is operating in a redundant state, the system continues operation in a non-redundant state. The BIOS logs a SEL event to indicate that an uncorrectable ECC error has occurred on the failed Memory Board. The BIOS also sends commands to the BMC to update the DIMM state to “Not Present”. The user may perform a memory hot replace operation to replace the bad Memory Board with a good Memory Board and restore the system to redundant mode.

If multiple uncorrectable ECC errors occur while the system is operating in non-redundant mode, the system will hang.
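The mode changes described above can be summarized as a small state machine. This is a sketch under stated assumptions: `MemoryMode`, `on_xmb_fail`, and `on_hot_replace` are hypothetical names, the SEL is modeled as a list, and the BMC DIMM state is tracked per Memory Board for brevity rather than per DIMM.

```python
from enum import Enum


class MemoryMode(Enum):
    REDUNDANT = "redundant"
    NON_REDUNDANT = "non-redundant"
    HUNG = "hung"


def on_xmb_fail(mode, board, sel, bmc_dimm_state):
    """Return the new mode after multiple uncorrectable ECC errors fail an XMB.

    `board` names the failed Memory Board; `sel` and `bmc_dimm_state` are
    stand-ins for the System Event Log and the BMC's DIMM state table.
    """
    if mode is MemoryMode.REDUNDANT:
        # The failing DIMM cannot be located, so only the board is reported.
        sel.append(f"uncorrectable ECC error on Memory Board {board}")
        bmc_dimm_state[board] = "Not Present"
        return MemoryMode.NON_REDUNDANT
    # No redundant copy is left to fall back on: the system hangs.
    return MemoryMode.HUNG


def on_hot_replace(mode):
    """Return the new mode after a bad Memory Board is hot replaced."""
    if mode is MemoryMode.NON_REDUNDANT:
        return MemoryMode.REDUNDANT
    return mode
```

The sketch treats an XMB fail as the only outcome of multiple uncorrectable errors; the text says only that such errors may cause one.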

When a correctable ECC error occurs during runtime, the DIMM correctable error count is incremented. If the error count is less than the error stop report threshold, the BIOS reports the correctable ECC error to the SEL. If the board containing the DIMM with the correctable error has available spares, the error stop report threshold shall be the same as the error threshold for switching to spare. If the board has no available spare, the error stop report threshold shall be 10 errors per hour. When the error count reaches the error stop report threshold, the BIOS reports to the SEL that the correctable error stop report threshold has been reached and stops reporting subsequent correctable ECC errors for the DIMM.

If a spare Rank is available on the Memory Board with the error when the error threshold for switching to spare is reached, the system copies the contents of the bad Rank to the spare Rank, switches to the spare Rank, sets the Memory Board LED to indicate the bad DIMM(s), and disables the bad DIMM bank and sparing for subsequent boots. With sparing disabled, the ranks previously reserved for spares are used for system memory.
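The correctable error handling above can be sketched as follows. The names `MemoryBoard`, `Dimm`, `stop_report_threshold`, and `on_correctable_error` are hypothetical. The threshold for switching to spare is not given in this section, so it is a caller-supplied parameter. The per-hour window and the Rank copy itself are not modeled.

```python
from dataclasses import dataclass

NO_SPARE_STOP_THRESHOLD = 10  # from the text: 10 errors per hour without spares


@dataclass
class MemoryBoard:
    """Hypothetical model of a Memory Board's sparing state."""
    spare_available: bool
    spare_switch_threshold: int  # value not given in the text; caller supplies it
    using_spare: bool = False
    led_on: bool = False
    sparing_disabled_next_boot: bool = False


@dataclass
class Dimm:
    """Hypothetical model of one DIMM's correctable error state."""
    name: str
    correctable_count: int = 0
    reporting_stopped: bool = False
    disabled_next_boot: bool = False


def stop_report_threshold(board):
    """Pick the stop report threshold based on spare availability."""
    if board.spare_available:
        return board.spare_switch_threshold
    return NO_SPARE_STOP_THRESHOLD


def on_correctable_error(board, dimm, sel):
    """Handle one correctable ECC error; `sel` is a list standing in for the SEL."""
    dimm.correctable_count += 1
    if dimm.reporting_stopped:
        return  # reporting stays off until the next reboot

    if dimm.correctable_count < stop_report_threshold(board):
        sel.append(f"correctable ECC error: {dimm.name}")
        return

    # Threshold reached: log once, then stop reporting for this DIMM.
    sel.append(f"correctable error stop report threshold reached: {dimm.name}")
    dimm.reporting_stopped = True

    if board.spare_available:
        # Switch to the spare Rank (the copy of the bad Rank is not modeled).
        board.using_spare = True
        board.spare_available = False
        board.led_on = True
        dimm.disabled_next_boot = True
        board.sparing_disabled_next_boot = True
```

With a spare available, the stop report threshold and the spare switch threshold coincide, so the final SEL entry and the switch to the spare Rank happen on the same error.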

Any disabled event reporting will be re-enabled on the next reboot.

10.3.3 I/O Devices

Revision 1.0

Intel order number D22893-001