Functional Architecture

Intel® Server Board SE7520JR2

Uncorrectable memory errors are critical errors that may cause the system to fail. The BIOS normally detects and logs these errors as IPMI SEL events for all management levels, except in the case described below.

A critical hardware error (an uncorrectable memory or bus error) may prevent the BIOS from running, reporting the error, and restarting the system. In the Professional and Advanced management models, the Sahalee BMC monitors the SMI signal; if the signal stays asserted for a long period of time, this indicates that the BIOS cannot run. In this case, the Sahalee BMC logs an SMI Timeout event and probes for errors. If it finds a memory error, it logs data against the IPMI type 0Ch Memory sensor; if it finds a bus error, it logs against the IPMI type 13h Critical Interrupt sensor. Both events can include additional data in bytes 2 and 3, depending on the exact nature of the error and what the chipset reports to the Sahalee BMC.
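The SMI timeout behavior described above can be illustrated with a minimal sketch. It is not the Sahalee BMC firmware: the tick-based sampling, the `timeout_ticks` threshold, the `probe` callback, and the `SelEvent` record are all hypothetical names introduced for illustration. Only the two IPMI sensor type codes (0Ch and 13h) come from the text, and the mapping of memory errors to 0Ch and bus errors to 13h is one reading of it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

SENSOR_TYPE_MEMORY = 0x0C              # IPMI sensor type 0Ch (from the text)
SENSOR_TYPE_CRITICAL_INTERRUPT = 0x13  # IPMI sensor type 13h (from the text)


@dataclass
class SelEvent:
    """Hypothetical, simplified stand-in for a SEL record."""
    description: str
    sensor_type: Optional[int] = None


def monitor_smi(samples: Iterable[bool],
                timeout_ticks: int,
                probe: Callable[[], Optional[str]]) -> List[SelEvent]:
    """Scan SMI samples (True = asserted) and return the events logged.

    `timeout_ticks` is an assumed threshold; the source only says
    "a long period of time". `probe` is a hypothetical callback that
    returns "memory", "bus", or None.
    """
    events: List[SelEvent] = []
    asserted_for = 0
    for asserted in samples:
        # Count consecutive asserted samples; any deassertion resets the count.
        asserted_for = asserted_for + 1 if asserted else 0
        # Fire once, at the moment the threshold is reached.
        if asserted_for == timeout_ticks:
            events.append(SelEvent("SMI Timeout"))
            found = probe()
            if found == "memory":
                events.append(SelEvent("Memory error", SENSOR_TYPE_MEMORY))
            elif found == "bus":
                events.append(SelEvent("Bus error",
                                       SENSOR_TYPE_CRITICAL_INTERRUPT))
    return events
```

For example, `monitor_smi([True] * 5, 3, lambda: "bus")` yields an SMI Timeout event followed by an event against sensor type 13h, while a signal that deasserts before the threshold yields no events.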

3.3.6 Memory RASUM Features

The Intel E7520 MCH supports several memory RASUM (Reliability, Availability, Serviceability, Usability, and Manageability) features. These features include the Intel® x4 Single Device Data Correction (x4 SDDC) for memory error detection and correction, Memory Scrubbing, Retry on Correctable Errors, Integrated Memory Initialization, DIMM Sparing, and Memory Mirroring. The following sections describe how each is supported.

Note: The operation of the memory RASUM features listed below is supported regardless of the platform management model used. However, with no Intel® Management Module installed, the system has limited memory monitoring and logging capabilities. It is possible for a RASUM feature to be initiated without notification that the action has occurred when standard Onboard Platform Instrumentation is used.

3.3.6.1 DRAM ECC – Intel® x4 Single Device Data Correction (x4 SDDC)

The DRAM interface uses two different ECC algorithms. The first is a standard SEC/DED ECC across a 64-bit data quantity. The second ECC method is a distributed, 144-bit S4EC-D4ED mechanism, which provides x4 SDDC protection for DIMMs that utilize x4 devices. Bits from x4 parts are presented in an interleaved fashion such that each bit from a particular part is represented in a different ECC word. DIMMs that use x8 devices can use the same algorithm but will not have x4 SDDC protection, since at most four bits can be corrected with this method. The algorithm does provide enhanced protection for the x8 parts over a standard SEC/DED implementation. With two memory channels, either ECC method can be utilized with equal performance, although single-channel mode only supports standard SEC/DED.
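To make the SEC/DED behavior concrete, the following is a minimal sketch of a generic extended Hamming (72,64) code: single-error correction, double-error detection over a 64-bit data quantity. It is a textbook construction chosen for illustration; the actual check-bit matrix used by the E7520 MCH is not given here and may differ, and the S4EC-D4ED code is not modeled. All names (`encode`, `decode`, `DATA_POSITIONS`) are hypothetical.

```python
N = 72  # codeword length: 64 data bits + 7 Hamming check bits + 1 overall parity

# Positions 1..71 that are not powers of two carry data (64 of them).
# Positions 1, 2, 4, ..., 64 carry Hamming check bits; position 0 is overall parity.
DATA_POSITIONS = [p for p in range(1, N) if p & (p - 1)]


def encode(data):
    """Encode a 64-bit integer into a 72-bit codeword (list of 0/1)."""
    code = [0] * N
    for i, p in enumerate(DATA_POSITIONS):
        code[p] = (data >> i) & 1
    for k in range(7):
        check = 1 << k
        # Check bit k makes parity even over all positions with bit k set.
        code[check] = sum(code[p] for p in range(1, N) if p & check) % 2
    code[0] = sum(code[1:]) % 2  # overall parity makes the whole word even
    return code


def decode(code):
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for p in range(1, N):
        if code[p]:
            syndrome ^= p  # XOR of set-bit positions is 0 for a valid word
    overall = sum(code) % 2
    if overall == 1:
        # Odd overall parity: treat as a single-bit error at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        if syndrome >= N:
            return None, "uncorrectable"
        code[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        # Even overall parity but nonzero syndrome: double-bit error.
        return None, "uncorrectable"
    else:
        status = "ok"
    data = 0
    for i, p in enumerate(DATA_POSITIONS):
        data |= code[p] << i
    return data, status
```

Flipping any one bit of an encoded word still decodes to the original data with status `corrected`; flipping two bits is reported as `uncorrectable` rather than being miscorrected.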

When memory mirroring is enabled, x4 SDDC ECC is supported in single-channel mode when the second channel has been disabled during a fail-down phase. The x4 SDDC ECC is not supported during single-channel operation outside of DIMM mirroring fail-down because it has a significant performance impact in that environment.
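The support rules in this section can be summarized as a small decision function. This is only a sketch of the rules as stated in the text, not a BIOS or chipset interface; the function and parameter names are hypothetical.

```python
def x4_sddc_supported(active_channels, mirroring_fail_down=False):
    """Is the x4 SDDC (S4EC-D4ED) ECC method supported in this configuration?"""
    if active_channels == 2:
        return True  # dual-channel: either ECC method can be used
    if active_channels == 1:
        # Single-channel: only during memory mirroring fail-down.
        return mirroring_fail_down
    raise ValueError("active_channels must be 1 or 2")


def has_single_device_correction(active_channels, device_width,
                                 mirroring_fail_down=False):
    """Full x4 SDDC protection needs the x4 SDDC method AND x4 DRAM devices."""
    return (x4_sddc_supported(active_channels, mirroring_fail_down)
            and device_width == 4)
```

For instance, a dual-channel configuration populated with x8 devices can run the x4 SDDC algorithm (`x4_sddc_supported(2)` is true) but does not get single-device correction (`has_single_device_correction(2, 8)` is false).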

3.3.6.2 Integrated Memory Scrub Engine

The Intel E7520 MCH includes an integrated engine to walk the populated memory space proactively seeking out soft errors in the memory subsystem. In the case of a single bit correctable error, this hardware detects, logs, and corrects the data except when an incoming write to the same memory address is detected. For any uncorrectable errors detected, the scrub

Revision 1.0

 

C78844-002