I/O Write Buffer and the WMB Instruction

2.11.1.1 Load-Load Order Trap

The Mbox ensures that load instructions that read the same physical byte(s) ultimately issue in correct order by using the load-loadorder trap. The Mbox compares the address of each load instruction, as it is issued, to the address of all load instructions in the load queue. If the Mbox finds a newer load instruction in the load queue, it invokes a load-loadorder trap on the newer instruction. This is a replay trap that aborts the tar- get of the trap and all newer instructions from the machine and refetches instructions starting at the target of the trap.

2.11.1.2 Store-Load Order Trap

The Mbox ensures that a load instruction ultimately issues after an older store instruc- tion that writes some portion of its memory operand by using the store-loadorder trap. The Mbox compares the address of each store instruction, as it is issued, to the address of all load instructions in the load queue. If the Mbox finds a newer load instruction in the load queue, it invokes a store-loadorder trap on the load instruction. This is a replay trap. It functions like the load-loadorder trap.

The Ibox contains extra hardware to reduce the frequency of the store-loadtrap. There is a 1-bit by 1024-entry VPC-indexed table in the Ibox called the stWait table. When an Icache instruction is fetched, the associated stWait table entry is fetched along with the Icache instruction. The stWait table produces 1 bit for each instruction accessed from the Icache. When a load instruction gets a store-loadorder replay trap, its associated bit in the stWait table is set during the cycle that the load is refetched. Hence, the trapping load instruction’s stWait bit will be set the next time it is fetched.

The IQ will not issue load instructions whose stWait bit is set while there are older unis- sued store instructions in the queue. A load instruction whose stWait bit is set can be issued the cycle immediately after the last older store instruction is issued from the queue. All the bits in the stWait table are unconditionally cleared every 16384 cycles, or every 65536 cycles if I_CTL[ST_WAIT_64K] is set.

2.11.2 Other Mbox Replay Traps

The Mbox also uses replay traps to control the flow of the load queue and store queue, and to ensure that there are never multiple outstanding misses to different physical addresses that map to the same Dcache or Bcache line. Unlike the order traps, however, these replay traps are invoked on the incoming instruction that triggered the condition.

2.12 I/O Write Buffer and the WMB Instruction

The I/O write buffer (IOWB) consists of four 64-byte entries with the associated address and control logic used to buffer I/O write data between the store queue (SQ) and the system port.

2.12.1 Memory Barrier (MB/WMB/TB Fill Flow)

The Cbox CSR SYSBUS_MB_ENABLE bit determines if MB instructions produce external system port transactions. When the SYSBUS_MB_ENABLE bit equals 0, the Cbox CSR MB_CNT[3:0] field contains the number of pending uncommitted transac- tions. The counter will increment for each of the following commands:

RdBlk, RdBlkMod, RdBlkI

2–32Internal Architecture

Alpha 21264/EV67 Hardware Reference Manual

Page 60
Image 60
Compaq EV67 12 I/O Write Buffer and the WMB Instruction, Other Mbox Replay Traps, Memory Barrier MB/WMB/TB Fill Flow