Intel® Server Board SE7520JR2

Functional Architecture

engine logs the failure. Both types of errors may be reported via multiple alternate mechanisms under configuration control. The scrub hardware will also execute “demand scrub” writes when correctable errors are encountered during normal operation (on demand reads, rather than scrub-initiated reads). This functionality provides incremental protection against time-based deterioration of soft memory errors from correctable to uncorrectable.

Using this method, a 16GB system can be completely scrubbed in less than one day. The effect of the scrub writes do not cause any noticeable degradation to memory bandwidth, although they will cause a greater latency for that one very infrequent read that is delayed due to the scrub write cycle.

Note that an uncorrectable error encountered by the memory scrub engine is a “speculative error.” This designation is applied because no system agent has specifically requested use of the corrupt data, and no real error condition exists in the system until that occurs. It is possible that the error resides in an unmodified page of memory that will be simply dropped on a swap back to disk. Were that to occur, the speculative error would simply “vanish” from the system undetected without adverse consequences.

3.3.6.3Retry on Uncorrectable Error

The Intel E7520 MCH includes specialized hardware to resubmit a memory read request upon detection of an uncorrectable error. When a demand fetch (as opposed to a scrub) of memory encounters an uncorrectable error as determined by the enabled ECC algorithm, the memory control hardware will cause a (single) full resubmission of the entire cache line request from memory to verify the existence of corrupt data. This feature is expected to greatly reduce or eliminate the reporting of false or transient uncorrectable errors in the DRAM array.

Note that any given read request will only be retried a single time on behalf of this error detection mechanism. If the uncorrectable error is repeated, it will be logged and escalated as directed by device configuration. In the memory mirror mode, the retry on an uncorrectable error will be issued to the mirror copy of the target data, rather than back to the devices responsible for the initial error detection. This has the added benefit of making uncorrectable errors in DRAM fully correctable unless the same location in both primary and mirror happens to be corrupt. This RASUM feature may be enabled and disabled via configuration.

3.3.6.4Integrated Memory Initialization Engine

The Intel E7520 MCH provides hardware managed ECC auto-initialization of all populated DRAM space under software control. Once internal configuration has been updated to reflect the types and sizes of populated DIMM devices, the MCH will traverse the populated address space initializing all locations with good ECC. This not only speeds up the mandatory memory initialization step, but also frees the processor to pursue other machine initialization and configuration tasks.

Additional features have been added to the initialization engine to support high speed population and verification of a programmable memory range with one of four known data patterns (0/F, A/5, 3/C, and 6/9). This function facilitates a limited, very high speed memory test, as well as provides a BIOS accessible memory zeroing capability for use by the operating system.

Revision 1.0

43

 

C78844-002

Page 43
Image 43
Intel SE7520JR2 manual Retry on Uncorrectable Error, Integrated Memory Initialization Engine