Memory and I/O Address Space Instructions

The Mbox allocates a new MAF entry to an I/O load instruction and increases I/O band- width by attempting to merge I/O load instructions in a merge register. Table 2–7shows the rules for merging data. The columns represent the load instructions replayed to the MAF while the rows represent the size of the load in the merge register.

Table 2–7 Rules for I/O Address Space Load Instruction Data Merging

Merge Register/

 

 

 

Replayed Instruction

Load Byte/Word

Load Longword

Load Quadword

 

 

 

 

Byte/Word

No merge

No merge

No merge

Longword

No merge

Merge up to 32 bytes

No merge

Quadword

No merge

No merge

Merge up to 64 bytes

 

 

 

 

In summary, Table 2–7shows some of the following rules:

Byte/word load instructions and different size load instructions are not allowed to merge.

A stream of ascending non-overlapping, but not necessarily consecutive, longword load instructions are allowed to merge into naturally aligned 32-byte blocks.

A stream of ascending non-overlapping, but not necessarily consecutive, quadword load instructions are allowed to merge into naturally aligned 64-byte blocks.

Merging of quadwords can be limited to naturally-aligned 32-byte blocks based on the Cbox WRITE_ONCE chain 32_BYTE_IO field.

Issued MB, WMB, and I/O load instructions close the I/O register merge window. To minimize latency, the merge window is also closed when a timer detects no I/O store instruction activity for 1024 cycles.

After the Mbox I/O register has closed its merge window, the Cbox sends I/O read requests offchip in the order that they were received from the Mbox.

2.8.3 Memory Address Space Store Instructions

The Mbox begins execution of a store instruction by translating its virtual address to a physical address using the DTB and by probing the Dcache. The Mbox puts informa- tion about the store instruction, including its physical address, its data and the results of the Dcache probe, into the store queue (SQ).

If the Mbox does not find the addressed location in the Dcache, it places the address into the MAF for processing by the Cbox. If the Mbox finds the addressed location in a Dcache block that is not dirty, then it places a ChangeToDirty request into the MAF.

A store instruction can write its data into the Dcache when it is retired, and when the Dcache block containing its address is dirty and not shared. SQ entries that meet these two conditions can be placed into the writable state. These SQ entries are placed into the writable state in program order at a maximum rate of two entries per cycle. The Mbox transfers writable store queue entry data from the SQ to the Dcache in program order at a maximum rate of two entries per cycle. Dcache lines associated with writable store queue entries are locked by the Mbox. System port probe commands cannot evict these blocks until their associated writable SQ entries have been transferred into the Dcache. This restriction assists in STx_C instruction and Dcache ECC processing.

2–28Internal Architecture

21264/EV68A Hardware Reference Manual

Page 56
Image 56
Compaq EV68A Memory Address Space Store Instructions, Rules for I/O Address Space Load Instruction Data Merging