Compaq EV68A In-Order Processing of LDxL/STxC Instructions, Internal Eviction of LDxL Blocks

Models: EV68A

Lock Mechanism

4.6.1 In-Order Processing of LDx_L/STx_C Instructions

The 21264/EV68A uses the stWait logic in the IQ to ensure that LDx_L/STx_C pairs are issued in order. The stWait logic treats an LDx_L instruction like an STx instruction. STx_C instructions are always loaded into the IQ with their associated stWait bit set.

Thus, a STx_C instruction is not issued until the older LDx_L is out of the IQ.
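The ordering rule above can be sketched as a toy C model. This is an illustration only, not the actual IQ design: the names `op_t`, `iq_entry_t`, `iq_load`, and `iq_can_issue` are hypothetical, index 0 is assumed to be the oldest entry, and the only behavior modeled is that an entry with its stWait bit set waits for older LDx_L or STx entries to leave the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: opcode classes and queue layout are hypothetical. */
typedef enum { OP_OTHER, OP_LDX_L, OP_STX, OP_STX_C } op_t;

typedef struct {
    op_t op;
    bool st_wait;  /* stWait bit, assigned when the entry is loaded */
    bool issued;   /* true once the entry has left the queue */
} iq_entry_t;

/* Load an instruction into the queue; STx_C always gets stWait set. */
void iq_load(iq_entry_t *e, op_t op) {
    e->op = op;
    e->st_wait = (op == OP_STX_C);
    e->issued = false;
}

/* Entry i may issue unless its stWait bit is set and an older
 * LDx_L (treated like a store) or STx is still in the queue.
 * Entries are ordered oldest first. */
bool iq_can_issue(const iq_entry_t *iq, size_t n, size_t i) {
    assert(i < n);
    if (iq[i].issued) return false;
    if (!iq[i].st_wait) return true;
    for (size_t k = 0; k < i; k++) {
        bool store_like = (iq[k].op == OP_LDX_L || iq[k].op == OP_STX);
        if (store_like && !iq[k].issued) return false;
    }
    return true;
}
```

With an LDx_L at index 0 and an STx_C at index 2, `iq_can_issue` reports the STx_C as blocked until the LDx_L entry is marked issued.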

4.6.2 Internal Eviction of LDx_L Blocks

The 21264/EV68A prevents the eviction of cache blocks in the Dcache due to either of the following references:

• Istream references with a Bcache index that matches the Dcache block and a Bcache tag that mismatches the Dcache block.

To avoid evictions of LDx_L blocks, Istream references that match the index of a block in the Dcache are converted to noncached references.

• LDx or STx references with a Dcache index that matches the block.

In the Alpha architecture, Dstream references between a LDx_L/STx_C pair force the value of the STx_C success flag to be UNPREDICTABLE. The 21264/EV68A forces all STx_C instructions that interrupt an LDx_L/STx_C pair to fail in program order.

There should be no Dstream references between LDx_L/STx_C pairs; however, the out-of-order nature of the 21264/EV68A can introduce Dstream references between LDx_L/STx_C pairs. To prevent load or store instructions older than the LDx_L from evicting the LDx_L cache block, the Mbox invokes a replay trap on the incoming load or store instruction, which also aborts the LDx_L. These instructions are issued in program order in the next iteration of the trap retry down the pipeline. To prevent newer load or store instructions from evicting the locked cache line, the Ibox ensures that a STx_C is issued before any newer load or store instruction by placing the STx_C into the IQ and stalling all subsequent instructions in the map stage of the pipe until the IQ is empty.

Branch instructions between the LDx_L/STx_C pair may be mispredicted, introducing load and store instructions that evict the locked cache block. To prevent that from happening, there is a bit in the instruction fetcher that is set for a LDx_L reference and cleared on any other memory reference. When this bit is set, the branch predictor predicts all branches to fall through.
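The fetcher bit described above behaves like a one-bit state machine, sketched here in C. The sketch is illustrative only: `fetcher_t`, `mem_ref_t`, `fetch_observe`, and `predict_taken` are hypothetical names, and the assumption that non-memory instructions leave the bit unchanged is inferred from the text rather than stated by it.

```c
#include <stdbool.h>

/* Hypothetical classification of a fetched instruction. */
typedef enum { REF_NONE, REF_LDX_L, REF_OTHER_MEM } mem_ref_t;

typedef struct {
    bool ldxl_pending;  /* set by LDx_L, cleared by any other memory reference */
} fetcher_t;

/* Update the bit as each instruction is fetched. */
void fetch_observe(fetcher_t *f, mem_ref_t ref) {
    if (ref == REF_LDX_L) {
        f->ldxl_pending = true;
    } else if (ref == REF_OTHER_MEM) {
        f->ldxl_pending = false;  /* includes the closing STx_C */
    }
    /* REF_NONE: assumed to leave the bit unchanged */
}

/* While the bit is set, every branch is predicted to fall through;
 * otherwise the normal predictor output is used. */
bool predict_taken(const fetcher_t *f, bool predictor_says_taken) {
    return f->ldxl_pending ? false : predictor_says_taken;
}
```

In this model a branch fetched after an LDx_L and before the next memory reference is always predicted not taken, whatever the underlying predictor suggests.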

4.6.3 Liveness and Fairness

To prevent a livelock condition, the 21264/EV68A processes the STx_C as follows:

1. If a STx_C misses the Dcache, then no system port transaction is started and the STx_C fails.

2. If a STx_C hits a block that is not dirty, then a ChangeToDirty (Shared or Clean) is launched after the STx_C retires and all older store queue entries are in the writable state. This ensures that once the ChangeToDirty command is launched on behalf of the STx_C, the STx_C will be executed to completion if the ChangeToDirty command succeeds.
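The two cases listed above can be summarized as a small decision function. This is a sketch under stated assumptions, not the Mbox logic itself: the type and function names are hypothetical, and the case of a STx_C hitting a dirty block is not described in this excerpt, so the function reports it as not covered instead of guessing.

```c
#include <stdbool.h>

/* Hypothetical view of the Dcache lookup result for the STx_C address. */
typedef enum { BLK_MISS, BLK_HIT_NOT_DIRTY, BLK_HIT_DIRTY } dcache_state_t;

typedef enum {
    STC_FAIL_NO_TRANSACTION,     /* case 1: miss, STx_C fails, no system port traffic */
    STC_WAIT,                    /* case 2: preconditions not yet met */
    STC_LAUNCH_CHANGE_TO_DIRTY,  /* case 2: ChangeToDirty may be launched */
    STC_NOT_COVERED              /* state not described in this excerpt */
} stc_action_t;

/* Decide the next step for a STx_C according to cases 1 and 2. */
stc_action_t stx_c_action(dcache_state_t state,
                          bool stx_c_retired,
                          bool older_stores_writable) {
    switch (state) {
    case BLK_MISS:
        return STC_FAIL_NO_TRANSACTION;
    case BLK_HIT_NOT_DIRTY:
        /* Launch only after retirement and once all older store
         * queue entries are in the writable state. */
        return (stx_c_retired && older_stores_writable)
                   ? STC_LAUNCH_CHANGE_TO_DIRTY
                   : STC_WAIT;
    default:
        return STC_NOT_COVERED;
    }
}
```

Holding the ChangeToDirty until both conditions are true mirrors the stated goal: once the command is launched, nothing older can prevent the STx_C from completing if the command succeeds.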

21264/EV68A Hardware Reference Manual

Cache and External Interfaces 4–15
