Data Integrity and Error Handling

capable of any recovery. The first error, especially if it is fatal, may itself have caused downstream errors to be flagged. The error flagged as first should be treated as genuine and as an indication of a real problem, which may have been a transient condition or a true hardware failure. Errors flagged in the NERR register should not automatically be attributed to a hardware failure, since the first error may have precipitated them.

6.8.1 SDC Multiple Errors

There are several important cases of multiple errors in the SDC. These are:

The data buffer in the SDC is 16B wide, and each 16B chunk is checked for errors. Therefore, if a single line has errors in multiple 16B chunks, both the SDC_FERR and SDC_NERR registers will be set. ECC checking is done on 8B units, so both 8B halves of a 16B chunk may be bad; this case is reported as a single error and only SDC_FERR is set. Since this is one line, only one error is reported from the SDC to the SAC, so only FERR_SAC is set (or NERR_SAC if FERR_SAC was previously set by a different error).
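The chunk-level rule can be sketched as a small model. This is only an illustration under stated assumptions: the function name, the tuple representation of a line, and the returned keys are hypothetical, and both SDC registers are assumed to be clear before the line is checked.

```python
def sdc_line_status(chunks):
    """Model the SDC flags raised by one line.

    chunks: list of (low_8B_bad, high_8B_bad) pairs, one per 16B chunk.
    Assumes SDC_FERR and SDC_NERR were both clear beforehand (hypothetical).
    """
    # A 16B chunk counts as ONE error even when both 8B halves are bad.
    bad_chunks = sum(1 for low_bad, high_bad in chunks if low_bad or high_bad)
    return {
        "SDC_FERR": bad_chunks >= 1,   # first bad chunk in the line
        "SDC_NERR": bad_chunks >= 2,   # any further bad chunk in the line
        # One line yields at most one error report from the SDC to the SAC.
        "reports_to_SAC": 1 if bad_chunks else 0,
    }
```

For example, a line whose only bad chunk has both 8B halves bad sets SDC_FERR alone, while a line with two bad chunks sets both registers but still sends a single report to the SAC.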

Multiple lines in the SDC may have errors. For example, memory ECC errors are marked in the Data Buffer as the data is read from memory and placed in the buffer. As the data is sent from the SDC to the appropriate target, an indication is sent to the SAC as to whether or not there was an error on that transfer. If multiple lines have a 2x memory ECC error, only the first line that was retrieved by the SDC will have the SAC FERR error set. Later 2x memory errors are flagged as generic errors. For the SAC_FERR, generic errors set the SNE bit. Therefore, if SAC_NERR has the SNE bit set, this simply means that the SDC had multiple errors that were non-fatal. These may be multiple errors of the same type or of different types. Even 1x correctable memory errors will set the SNE bit of SAC_NERR if there is more than one.

One transaction may have any combination of a front-side bus error, a single-bit memory ECC error, and/or a double-bit memory ECC error. The following shows what is captured for the different cases, where FSE is a front-side bus error, DED is a 2x memory ECC error, and SEC is a 1x memory ECC error:

| FSE ERR | SEC ERR | DED ERR | Status |
|---------|---------|---------|--------|
| 0 | 0 | 0 | Store nothing, no errors |
| 0 | 1 | 0 | Store ITID in SECTID, log SCME in FERR |
| 0 | 0 | 1 | Store ITID in DEDTID, log SNE in FERR |
| 0 | 1 | 1 | Store ITID in DEDTID, log only SNE in FERR |
| 1 | 0 | 0 | Store ITID in FSETID, log SNE in FERR |
| 1 | 0 | 1 | Store ITID in DEDTID, log SNE in FERR |
| 1 | 1 | 0 | Store ITID in FSETID, log only SNE in FERR |
| 1 | 1 | 1 | Store ITID in DEDTID, log only SNE in FERR |
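The eight cases listed reduce to a fixed priority order: DED takes precedence over FSE, which takes precedence over SEC. A minimal sketch of that reading follows; the function name and the tuple return value are hypothetical and chosen only for illustration, while the register and bit names are taken from the cases listed.

```python
def capture_for_transaction(fse, sec, ded):
    """Return (register that stores the ITID, bit logged in FERR).

    Hypothetical helper that mirrors the eight listed cases.
    Priority, as read from those cases: DED > FSE > SEC.
    """
    if ded:
        return ("DEDTID", "SNE")
    if fse:
        return ("FSETID", "SNE")
    if sec:
        return ("SECTID", "SCME")
    return (None, None)  # no errors: nothing is stored or logged
```

Note that SCME is logged only when SEC is the sole error on the transaction; whenever SEC coincides with FSE or DED, only SNE is logged.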

If there are multiple 1x memory ECC errors, the FERR[SCME] bit is set in the SAC. The FERR[SNE] bit is also set on the second 1x memory ECC error (since SCME does not prevent FERR from being written). If there are more than two errors, NERR[SNE] is also set. Software must examine the SDC to determine that all the errors were single-bit errors.
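The progression for repeated 1x memory ECC errors can be illustrated with a short sketch. It assumes the SAC registers start clear and that no other error types occur in between; the function name and the dictionary keys are hypothetical labels for the bits named in this paragraph.

```python
def sac_flags_after_sec_errors(count):
    """SAC bits expected after `count` 1x memory ECC errors.

    Assumes FERR and NERR start clear and only 1x errors occur
    (an illustrative simplification, not a register-accurate model).
    """
    return {
        "FERR[SCME]": count >= 1,  # set by the first 1x error
        "FERR[SNE]": count >= 2,   # SCME does not block a second FERR write
        "NERR[SNE]": count > 2,    # third and later errors land in NERR
    }
```

Because SNE alone does not identify the error type, the result for three or more errors looks the same as for a mix of error types, which is why the SDC must be consulted to confirm that every error was single-bit.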


Intel® 460GX Chipset Software Developer’s Manual