Architecture

4.2 Delay Slot

Some R3900 Processor Core instructions are executed with a delay of one instruction cycle. The cycle in which an instruction is delayed is called a delay slot. Delays of this kind occur with load instructions and with branch/jump instructions.

4.2.1 Delayed load

With load instructions, a one-cycle delay occurs while waiting for the data being loaded to become available for use by another instruction. The R3900 Processor Core checks the instruction in the delay slot (the instruction immediately following the load instruction) to see if that instruction needs to use the load result; if so, it stalls the pipeline (see Figure 4-2).
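The interlock check described above can be sketched as follows. This is a minimal model, not Toshiba's implementation: instructions use a hypothetical `(opcode, destination, sources)` tuple encoding, the set of load opcodes is an assumption (the MIPS I integer loads), and `total_cycles` assumes an ideal F-D-E-M-W pipeline whose only stalls are one-cycle load-use interlocks (no cache misses or other hazards).

```python
from typing import List, Optional, Tuple

# Hypothetical encoding for illustration: (opcode, destination register, source registers).
Instr = Tuple[str, Optional[str], Tuple[str, ...]]

# Assumed set of load opcodes (MIPS I integer loads); not taken from this manual.
LOADS = {"LB", "LBU", "LH", "LHU", "LW"}


def load_use_stalls(program: List[Instr]) -> List[int]:
    """Return indices of instructions in a load delay slot that read the loaded register."""
    stalls = []
    for i in range(1, len(program)):
        prev_op, prev_dest, _ = program[i - 1]
        _, _, cur_srcs = program[i]
        # r0 is hardwired to zero, so a load into r0 creates no dependency.
        if prev_op in LOADS and prev_dest not in (None, "r0") and prev_dest in cur_srcs:
            stalls.append(i)
    return stalls


def total_cycles(program: List[Instr], stages: int = 5) -> int:
    """Cycles until the last instruction leaves the pipeline, one extra cycle per interlock stall."""
    if not program:
        return 0
    return len(program) + (stages - 1) + len(load_use_stalls(program))


if __name__ == "__main__":
    fig_4_2 = [
        ("LW", "r2", ("r0",)),        # LW  r2, 20(r0)
        ("ADD", "r3", ("r1", "r2")),  # ADD r3, r1, r2
    ]
    print(load_use_stalls(fig_4_2))  # [1]: the ADD stalls
    print(total_cycles(fig_4_2))     # 7
```

For the two instructions of Figure 4-2, the ADD reads r2 in the load delay slot, so the model adds one stall cycle and the ADD completes its W stage one cycle later than it would without the dependency.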

With the R3000A, if the instruction following a load instruction required access to the loaded data, a NOP had to be inserted immediately after the load instruction. Because the R3900 Processor Core detects this case itself and stalls the pipeline, the NOP is no longer needed, which results in smaller code than with the R3000A.
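The code-size effect can be illustrated with a small sketch that pads a program the way R3000A code had to be padded and compares byte counts. The tuple encoding, the `pad_for_r3000a` and `code_size` names, and the set of load opcodes are hypothetical; the only fixed fact used is that MIPS I instructions are 32-bit words. The padding routine deliberately inserts a NOP rather than trying to schedule a useful independent instruction into the slot.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding for illustration: (opcode, destination register, source registers).
Instr = Tuple[str, Optional[str], Tuple[str, ...]]

LOADS = {"LB", "LBU", "LH", "LHU", "LW"}  # assumed MIPS I integer loads
NOP: Instr = ("NOP", None, ())
INSTR_BYTES = 4  # every MIPS I instruction is one 32-bit word


def pad_for_r3000a(program: List[Instr]) -> List[Instr]:
    """Insert a NOP after each load whose result the next instruction reads."""
    out: List[Instr] = []
    for i, ins in enumerate(program):
        out.append(ins)
        op, dest, _ = ins
        if op in LOADS and dest not in (None, "r0") and i + 1 < len(program):
            if dest in program[i + 1][2]:
                out.append(NOP)
    return out


def code_size(program: List[Instr]) -> int:
    """Size in bytes of a sequence of 32-bit instructions."""
    return INSTR_BYTES * len(program)


if __name__ == "__main__":
    prog = [("LW", "r2", ("r0",)), ("ADD", "r3", ("r1", "r2"))]
    print(code_size(pad_for_r3000a(prog)))  # 12 bytes with the R3000A NOP
    print(code_size(prog))                  # 8 bytes on the R3900 (hardware interlock)
```

The execution time of the dependent pair is the same on both cores (one lost cycle); the difference is that the R3900 spends that cycle as a hardware stall instead of fetching an extra NOP word.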

    LW  r2, 20(r0)     F   D   E   M   W
    ADD r3, r1, r2         F   D   ES  E   M   W
                                   ^
                                   Pipeline stall

Figure 4-2. Load delay slot and pipeline stall

4.2.2 Delayed branching

Figure 4-3 shows the pipeline flow for jump/branch instructions. The branch target address that must be generated for these types of instructions does not become available until the E stage, which is too late to redirect the fetch of the instruction immediately following the branch (the instruction in the branch delay slot). The branch target instruction is therefore fetched immediately after the branch delay slot cycle.

It is, however, possible to use the branch delay slot for a different instruction: one that would normally be executed before the branch instruction.
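The architectural effect of the delay slot can be sketched with a toy interpreter that keeps two program counters: the instruction executing now and the one that executes next. The instruction set here (`ADDI`, `BEQ`, `J`, `HALT`), its tuple encoding, and the use of instruction indices instead of byte addresses are all hypothetical simplifications; the sketch models execution order only, not pipeline timing, and does not model a branch placed inside another branch's delay slot.

```python
from typing import Dict, List, Tuple

# Toy ISA (hypothetical): ("ADDI", rd, rs, imm), ("BEQ", rs, rt, target), ("J", target), ("HALT",)
# Branch targets are instruction indices, not byte addresses.


def run(program: List[tuple], max_steps: int = 1000) -> Tuple[Dict[str, int], List[int]]:
    """Run a toy program with one branch delay slot; return registers and executed indices."""
    regs: Dict[str, int] = {"r0": 0}
    trace: List[int] = []
    pc, npc = 0, 1  # current instruction and the one that executes after it
    for _ in range(max_steps):
        if pc >= len(program):
            break
        ins = program[pc]
        trace.append(pc)
        op = ins[0]
        next_npc = npc + 1  # fall through by default
        if op == "HALT":
            break
        if op == "ADDI":
            _, rd, rs, imm = ins
            if rd != "r0":  # r0 is hardwired to zero
                regs[rd] = regs.get(rs, 0) + imm
        elif op == "BEQ":
            _, rs, rt, target = ins
            if regs.get(rs, 0) == regs.get(rt, 0):
                next_npc = target
        elif op == "J":
            next_npc = ins[1]
        # The instruction at npc (the delay slot after a branch) always runs next;
        # a taken branch only redirects the instruction after that one.
        pc, npc = npc, next_npc
    return regs, trace


if __name__ == "__main__":
    program = [
        ("J", 3),                # jump to index 3
        ("ADDI", "r1", "r0", 7), # branch delay slot: executed even though the jump is taken
        ("ADDI", "r2", "r0", 99),# skipped
        ("HALT",),
    ]
    print(run(program))  # ({'r0': 0, 'r1': 7}, [0, 1, 3])
```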

    Branch/Jump instruction     F   D   E   M   W
    Branch delay slot               F   D   E   M   W
    Branch target address               F   D   E   M   W

    (Target address: generated by the branch/jump instruction and used to fetch the branch target instruction)

Figure 4-3. Branch instruction delay slot

You can make effective use of the branch delay slot as follows.

Because the instruction immediately following a branch instruction is executed before control transfers to the branch target, you can place an instruction that logically should be executed just before the branch into the delay slot following the branch instruction.
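A simple form of this scheduling step can be sketched as a function that moves the instruction preceding a branch into a NOP-filled delay slot when doing so is safe. The `fill_delay_slot` name, the tuple encoding, and the branch opcode set are hypothetical, and only the single preceding instruction is considered; real compilers and assemblers search more widely. The two legality checks follow from execution order: after the move the branch executes first, so it must not depend on the moved instruction, and a linking jump must not write a register the moved instruction uses.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding for illustration: (opcode, destination register, source registers).
# Immediates and branch targets are omitted because they do not affect the legality check.
Instr = Tuple[str, Optional[str], Tuple[str, ...]]

BRANCHES = {"BEQ", "BNE", "J", "JR", "JAL", "JALR"}  # assumed subset of jump/branch opcodes


def fill_delay_slot(block: List[Instr]) -> List[Instr]:
    """Move the instruction before a branch into the branch's NOP delay slot when safe.

    Expects a block ending in [..., candidate, branch, NOP]; otherwise returns a copy unchanged.
    """
    if len(block) < 3:
        return list(block)
    cand, branch, slot = block[-3], block[-2], block[-1]
    if branch[0] not in BRANCHES or slot[0] != "NOP" or cand[0] in BRANCHES:
        return list(block)
    _, cand_dest, cand_srcs = cand
    _, br_dest, br_srcs = branch
    # The branch must not read the candidate's result: after the move the
    # branch executes first and would see the old register value.
    if cand_dest is not None and cand_dest in br_srcs:
        return list(block)
    # A linking jump (JAL writes r31) must not write a register the candidate uses.
    if br_dest is not None and (br_dest in cand_srcs or br_dest == cand_dest):
        return list(block)
    return list(block[:-3]) + [branch, cand]


if __name__ == "__main__":
    block = [("ADDI", "r4", ("r5",)), ("BEQ", None, ("r1", "r2")), ("NOP", None, ())]
    print(fill_delay_slot(block))  # BEQ followed by the ADDI; the NOP is gone
```

When the move succeeds, the block is one instruction shorter and the delay slot does useful work instead of executing a NOP.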

Toshiba TX39 user manual Delay Slot, Delayed load, Delayed branching