
If the quadword in error is used to satisfy a load instruction, then the flow is very similar to that used for a Dcache ECC error:

The load instruction’s destination register is written with incorrect data; however, the load queue will retain the state associated with the load instruction.

A consumer of the load instruction’s data may be issued before the error is recognized; however, the Ibox will invoke a replay trap at an instruction that is older than (or equal to) any instruction that consumes the load instruction’s data. The Ibox stalls the replayed Istream in the map stage of the pipeline until the error is corrected.

With a READ_ERR read type from the Mbox for the load instruction in error, the Cbox scrubs the block in the Dcache by evicting the block into the victim buffer and writing it back into the Dcache.

C_STAT[DSTREAM_MEM_ERR] is set.

C_ADDR contains bits [42:6] of the system memory fill address of the block that contains the error.

C_SYNDROME_0[7:0] and C_SYNDROME_1[7:0] contain the syndrome of quadword 0 and 1, respectively, of the octaword subblock that contains the error.

The load queue retries the load instruction and rewrites the register.

DC_STAT[ECC_ERR_LD] is set.

A corrected read data (CRD) error interrupt is posted, when enabled.

Note: Errors in speculative load instructions cause a CRD error to be posted but the data is not scrubbed by hardware. The PALcode cannot scrub the data because C_STAT is zero, and C_ADDR does not have the address of the block with the error.
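The register contents listed above could be decoded along the following lines. This is a minimal sketch, not PALcode. It assumes that the handler has already read the IPRs into plain integers and that the C_ADDR value passed in holds address bits [42:6] right-justified (the position of the field within the IPR is not stated in this section). The function names are hypothetical. The sketch also relies on the general ECC property that a zero syndrome means no error was detected in that quadword.

```python
BLOCK_SHIFT = 6        # C_ADDR supplies address bits [42:6]; bits [5:0] select a byte in the 64-byte block
ADDR_FIELD_BITS = 37   # 42 - 6 + 1 address bits

def block_address(c_addr_field):
    """Rebuild the 64-byte-aligned block address from address bits [42:6].

    Assumption: c_addr_field holds the field right-justified.
    """
    field = c_addr_field & ((1 << ADDR_FIELD_BITS) - 1)
    return field << BLOCK_SHIFT

def quadwords_in_error(c_syndrome_0, c_syndrome_1):
    """Return the quadword numbers (0 and/or 1) of the octaword with a nonzero syndrome."""
    syndromes = (c_syndrome_0 & 0xFF, c_syndrome_1 & 0xFF)
    return [qw for qw, syndrome in enumerate(syndromes) if syndrome != 0]

def can_palcode_scrub(c_stat):
    """Per the note above: if C_STAT is zero (speculative load), no error address was latched."""
    return c_stat != 0
```

For example, a C_ADDR field value of 1 corresponds to block address 0x40, and a call such as quadwords_in_error(0x00, 0x13) reports quadword 1 only.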

8.10 Bcache Data Single-Bit Correctable ECC Error on a Probe

The probed processor extracts the block from its Bcache, signals a corrected read data (CRD) error, and latches error information. The probed processor does not correct the data that contains the single-bit ECC error; it forwards the data uncorrected to the requesting processor. The requesting processor then detects a related system fill error as a result of this system probe transaction.

No hardware correction is performed.

C_STAT[PROBE_BC_ERR] is set.

C_ADDR contains bits [42:6] of the Bcache address of the block that contains the error.

C_SYNDROME_0[7:0] and C_SYNDROME_1[7:0] contain the syndrome of quadword 0 and 1, respectively, of the octaword subblock that contains the error.

A CRD error interrupt is posted, when enabled.

The PALcode on the probed processor may choose to scrub the error, though it will probably be scrubbed by the requesting processor.
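The choice left to PALcode on the probed processor might be modeled as follows. This is a sketch under stated assumptions, not a definitive handler. The caller is assumed to have already determined from C_STAT that PROBE_BC_ERR is indicated (the C_STAT encoding is not given in this section, so it is passed in as a boolean), and the C_ADDR value is assumed to hold address bits [42:6] right-justified. The names ProbeErrorReport, handle_probe_crd, and scrub_on_probed_cpu are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ProbeErrorReport:
    block_address: int    # Bcache address of the block, rebuilt from bits [42:6]
    syndrome_qw0: int     # C_SYNDROME_0[7:0]
    syndrome_qw1: int     # C_SYNDROME_1[7:0]
    scrub_locally: bool   # whether the probed processor's PALcode will scrub

def handle_probe_crd(probe_bc_err, c_addr_field, c_syndrome_0, c_syndrome_1,
                     scrub_on_probed_cpu=False):
    """Build a log record for a probe-time single-bit Bcache ECC error.

    Hardware performs no correction in this case, so the handler only records
    the latched state and notes whether a local scrub is wanted. The default
    is no local scrub, on the expectation that the requesting processor
    scrubs the block.
    """
    if not probe_bc_err:
        return None  # not a probe Bcache error; handled elsewhere
    return ProbeErrorReport(
        block_address=(c_addr_field & ((1 << 37) - 1)) << 6,
        syndrome_qw0=c_syndrome_0 & 0xFF,
        syndrome_qw1=c_syndrome_1 & 0xFF,
        scrub_locally=scrub_on_probed_cpu,
    )
```

Keeping the scrub decision as an explicit parameter makes the policy visible: a system that cannot rely on the requesting processor to scrub would pass scrub_on_probed_cpu=True.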


Error Detection and Error Handling

Alpha 21264/EV67 Hardware Reference Manual
