D.42 Restriction 46: Avoiding Live-locks in Speculative Load CRD Handlers

Speculative load CRD handlers that release from the interrupt without scrubbing a cache block could suffer from the following live-lock condition:

1. An initial error on a speculative load forces a CRD interrupt.

2. The CRD releases without scrubbing the block. A speculative load in the shadow of the hw_ret (or hw_ret_stall) touches a Dcache location that has the single-bit error, forcing a CRD.

3. The CRD handler is entered again immediately.

4. Go to (2).
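
The loop in steps 1 through 4 can be illustrated with a toy software model. This is a minimal sketch, not a description of the hardware: the function name `run_crd_handler` and its parameters `scrub_block`, `hold_loads`, and `max_entries` are hypothetical, and the iteration cap exists only so that the model terminates.

```python
def run_crd_handler(scrub_block, hold_loads, max_entries=5):
    """Toy model of steps 1-4; returns how many times the handler is entered.

    All names here are illustrative assumptions, not hardware or PALcode names.
    """
    block_has_error = True   # single-bit error present in the Dcache block
    crd_pending = True       # step 1: the initial error forces a CRD interrupt
    entries = 0
    while crd_pending and entries < max_entries:
        entries += 1                      # the CRD handler is entered
        if scrub_block:
            block_has_error = False       # scrubbing removes the error
        # On return, a speculative load in the shadow of the return
        # touches the block unless something holds loads up.
        speculative_load_issues = not hold_loads
        crd_pending = block_has_error and speculative_load_issues
    return entries


if __name__ == "__main__":
    print(run_crd_handler(scrub_block=False, hold_loads=False))  # reaches the cap
    print(run_crd_handler(scrub_block=True, hold_loads=False))   # entered once
    print(run_crd_handler(scrub_block=False, hold_loads=True))   # entered once
```

In this model the handler re-enters until the cap is reached only when the block is left unscrubbed and the speculative load is free to issue; removing either condition ends the loop after one entry.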

This problem can be avoided if all jumps in the CRD handler path for speculative loads use the following sequence:

    mb                                   ; make sure hw_ret goes
    ALIGN_FETCH_BLOCK <^x47FF041F>
    mulq     p6, #1, p6                  ; Hold up loads
    mulq     p6, #1, p6                  ; Hold up loads
    hw_mtpr  p6, <EV6__MM_STAT ! ^x44>   ; Hold up loads
    PVC_VIOLATE<43>                      ; Ignore restriction 43
    hw_ret_stall (p23)                   ; Return

This sequence prevents speculative loads from issuing in the shadow of the hw_ret_stall. Note that it is a violation of restriction 4 to have in the same fetch block an MTPR that specifies scoreboard bit 2 (an explicit writer in the memory operation group) and a HW_RET (an implicit reader in the memory operation group). Under normal circumstances, the intention would be for a HW_RET to wait until the MTPR issues, and that can only be enforced by putting the two instructions in different fetch blocks. In this case, the intention is for the HW_RET to issue before the MTPR. The hardware does not enforce the scoreboarding when the two instructions are in the same fetch block, and thus the HW_RET can issue and mispredict before any speculative loads (which are held up by the MTPR) can issue.
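
The same-fetch-block condition can be expressed as a small check, in the spirit of what a PALcode checking tool might flag. This is a sketch under stated assumptions: a fetch block is taken to be four instructions, the low byte of the HW_MTPR operand (^x44 in the sequence above) is taken to be the scoreboard mask, and the function name and tuple layout are hypothetical.

```python
FETCH_BLOCK_SIZE = 4      # assumption: four instructions per aligned fetch block
SCOREBOARD_BIT_2 = 1 << 2


def same_fetch_block_violations(instrs):
    """Return indices of fetch blocks holding both an MTPR that specifies
    scoreboard bit 2 and a HW_RET.

    instrs is a list of (mnemonic, scoreboard_mask) tuples in program order,
    starting at a fetch-block boundary. This layout is illustrative only.
    """
    flagged = []
    for start in range(0, len(instrs), FETCH_BLOCK_SIZE):
        block = instrs[start:start + FETCH_BLOCK_SIZE]
        has_mtpr = any(name == "hw_mtpr" and (mask & SCOREBOARD_BIT_2)
                       for name, mask in block)
        has_ret = any(name in ("hw_ret", "hw_ret_stall") for name, _ in block)
        if has_mtpr and has_ret:
            flagged.append(start // FETCH_BLOCK_SIZE)
    return flagged


# The four instructions that follow ALIGN_FETCH_BLOCK in the sequence above.
crd_exit = [("mulq", 0), ("mulq", 0), ("hw_mtpr", 0x44), ("hw_ret_stall", 0)]

# The usual arrangement: the return placed in the next fetch block.
separated = [("mulq", 0), ("mulq", 0), ("mulq", 0), ("hw_mtpr", 0x44),
             ("hw_ret_stall", 0)]

if __name__ == "__main__":
    print(same_fetch_block_violations(crd_exit))
    print(same_fetch_block_violations(separated))
```

The CRD exit sequence is flagged deliberately: placing both instructions in one fetch block is what lets the HW_RET issue ahead of the MTPR.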

D.43 Restriction 47: Cache Eviction for Single-Bit Cache Errors

A live-lock can occur if issuing instructions out-of-order causes a floating-point store instruction (with sberr) to replay trap.

A hardware mechanism exists that keeps track of replayed floating-point store instructions, and cancels the dirty register check. See Section D.5 for more details.

If the floating-point store instruction has an sberr and the CRD_HANDLER is entered/exited before the instruction is replayed, the mechanism will lose track of the instruction. When the instruction is replayed, the dirty register check is not canceled, and a replay trap occurs, causing the floating-point store instruction to continually replay the trap until the sberr is evicted from cache. The sberr will not evict, because the floating-point store instruction is killed by the replay trap. Killed instructions are not scrubbed by the Error Recovery Machine, and CBOX_ERR[C_ADDR] may not contain the address of the sberr. Because CBOX_ERR[C_ADDR] is not guaranteed, the CRD_HANDLER might not evict the sberr.
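
The interaction described above can be pictured with a toy model. This is a loose sketch based on one reading of the text, not a description of the actual hardware: the function `fp_store_replays`, the `tracked` flag, and the replay cap are hypothetical, and the model assumes the CRD_HANDLER runs between every replay for as long as the sberr remains in the cache.

```python
def fp_store_replays(sberr_evicted, max_replays=6):
    """Toy model: count the replay traps taken by one floating-point store.

    tracked stands in for the hardware mechanism that remembers a replayed
    floating-point store. All names are illustrative assumptions.
    """
    replays = 0
    tracked = False
    while replays < max_replays:
        if tracked:
            # Dirty register check is canceled; the store completes.
            return replays
        replays += 1          # replay trap; the mechanism records the store
        tracked = True
        if not sberr_evicted:
            # CRD_HANDLER is entered and exited before the store replays,
            # so the mechanism loses track of the instruction.
            tracked = False
    return replays            # cap reached: the store never completed


if __name__ == "__main__":
    print(fp_store_replays(sberr_evicted=True))    # store completes after one replay
    print(fp_store_replays(sberr_evicted=False))   # replays until the cap
```

In this model the store completes only once the sberr is out of the cache, which matches the statement that the replay continues until the sberr is evicted.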

Alpha 21264/EV67 Hardware Reference Manual