I/O Write Buffer and the WMB Instruction

Also consider the related sequence shown in Table 2–13. In this case, the data could be cached in the Bcache; Pj should fetch datai if it is using PTEi.

Table 2–13 TB Fill Flow Example Sequence 2

Pi             Pj
-------------  ---------------------------------------------
Write Datai    Istream read datai
MB             <TB miss>
Write PTEi     Load-PTE
               <write TB>
               Istream read (restart) - will miss the Icache

The 21264/EV67 processes Dstream loads to the PTE by injecting, in hardware, some memory barrier processing between the PTE transaction and any subsequent load or store instruction. This is accomplished by the following mechanism:

1. The integer queue issues a HW_LD instruction with VPTE.

2. The integer queue issues a HW_MTPR instruction with a DTB_PTE0 that is data-dependent on the HW_LD instruction with a VPTE and is required in order to fill the DTBs. The HW_MTPR instruction, when queued, sets IPR scoreboard bits [4] and [0].

3. When a HW_MTPR instruction with a DTB_PTE0 is issued, the Ibox signals the Cbox indicating that a HW_LD instruction with a VPTE has been processed. This causes the Cbox to begin processing the MB instruction. The Ibox prevents any subsequent memory operations from being issued by not clearing IPR scoreboard bit [0]. IPR scoreboard bit [0] is one of the scoreboard bits associated with the HW_MTPR instruction with DTB_PTE0.

4. When the Cbox completes processing the MB instruction (using one of the above sequences, depending upon the state of SYSBUS_MB_ENABLE), the Cbox signals the Ibox to clear IPR scoreboard bit [0].
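
The four steps above can be read as a small handshake between the Ibox and the Cbox around IPR scoreboard bit [0]. The following Python sketch is a toy behavioral model of that handshake, not a description of the actual hardware logic. The class and method names are hypothetical, and because this section does not say when scoreboard bit [4] is cleared, the model simply leaves it set.

```python
class DtbFillModel:
    """Toy model of the scoreboard handshake in a DTB fill (names are hypothetical)."""

    BIT0 = 1 << 0  # held set until the Cbox finishes MB processing
    BIT4 = 1 << 4  # set at queue time; its clearing is not modeled here

    def __init__(self):
        self.scoreboard = 0
        self.cbox_mb_busy = False

    def queue_hw_mtpr_dtb_pte0(self):
        # Step 2: queuing HW_MTPR DTB_PTE0 sets IPR scoreboard bits [4] and [0].
        self.scoreboard |= self.BIT4 | self.BIT0

    def issue_hw_mtpr_dtb_pte0(self):
        # Step 3: on issue, the Ibox signals the Cbox, which begins MB processing.
        # Bit [0] is deliberately left set.
        self.cbox_mb_busy = True

    def can_issue_memory_op(self):
        # Subsequent loads and stores are held off while bit [0] is set.
        return (self.scoreboard & self.BIT0) == 0

    def cbox_mb_done(self):
        # Step 4: the Cbox signals the Ibox to clear scoreboard bit [0].
        self.cbox_mb_busy = False
        self.scoreboard &= ~self.BIT0


model = DtbFillModel()
model.queue_hw_mtpr_dtb_pte0()
model.issue_hw_mtpr_dtb_pte0()
blocked_during_mb = not model.can_issue_memory_op()  # memory ops held off
model.cbox_mb_done()
released_after_mb = model.can_issue_memory_op()      # memory ops may issue again
```

In this model the only thing that releases later memory operations is the Cbox completion signal, which mirrors the ordering guarantee the text describes.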

The 21264/EV67 uses a similar mechanism to process Istream TB misses and fills to the PTE for the Istream.

1. The integer queue issues a HW_LD instruction with VPTE.

2. The integer queue (IQ) issues a HW_MTPR instruction with an ITB_PTE that is data-dependent upon the HW_LD instruction with VPTE. This is required in order to fill the ITB. The HW_MTPR instruction, when queued, sets IPR scoreboard bits [4] and [0].

3. The Cbox issues a HW_MTPR instruction for the ITB_PTE and signals the Ibox that a HW_LD/VPTE instruction has been processed, causing the Cbox to start processing the MB instruction. The Mbox stalls Ibox fetching from when the HW_LD/VPTE instruction finishes until the probe queue is drained.

4. When the 21264/EV67 is finished (SYS_MB selects one of the above sequences), the Cbox directs the Ibox to clear IPR scoreboard bit [0]. Also, the Mbox directs the Ibox to start prefetching.

Inserting MB instruction processing within the TB fill flow is only required for multiprocessor systems. Uniprocessor systems can disable MB instruction processing by deasserting Ibox CSR I_CTL[TB_MB_EN].
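
As a minimal sketch of that configuration choice, the Python helper below computes an I_CTL value with TB_MB_EN set for multiprocessor systems and cleared for uniprocessor systems. The function name is hypothetical, and the bit mask for TB_MB_EN is taken as a parameter because its position within I_CTL is not given in this section. It illustrates only the bit manipulation, not how the register is actually written.

```python
def i_ctl_with_tb_mb_en(i_ctl: int, tb_mb_en_mask: int, cpu_count: int) -> int:
    """Return i_ctl with the TB_MB_EN field adjusted for the system size.

    tb_mb_en_mask is supplied by the caller (assumption: a single-bit mask);
    this sketch does not know the real bit position.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    if cpu_count > 1:
        # Multiprocessor: keep MB processing in the TB fill flow.
        return i_ctl | tb_mb_en_mask
    # Uniprocessor: MB processing in the TB fill flow can be disabled.
    return i_ctl & ~tb_mb_en_mask
```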

Alpha 21264/EV67 Hardware Reference Manual
