User’s Manual

IBM PowerPC 750GX and 750GL RISC Microprocessor

Figure 6-7. Branch Taken

Branch Folding (Taken Branch/BTIC Hit)

        Clock 0   Clock 1   Clock 2
IQ5     add5
IQ4     add4
IQ3     add3                and6
IQ2     b                   and5
IQ1     add2      and2      and4
IQ0     add1      and1      and3

Branch Folding (Taken Branch/BTIC Miss)

        Clock 0   Clock 1   Clock 2
IQ5     add5
IQ4     add4
IQ3     add3                and4
IQ2     b                   and3
IQ1     add2                and2
IQ0     add1                and1

Figure 6-8 shows the removal of fall-through branch instructions, which occurs when a branch is not taken or is predicted as not taken.

Figure 6-8. Removal of Fall-Through Branch Instruction

Branch Fall-Through (Not-Taken Branch)

        Clock 0   Clock 1   Clock 2
IQ5     add5      add8      etc.
IQ4     add4      add7      add9
IQ3     add3      add6      add8
IQ2     b         add5      add7
IQ1     add2      add4      add6
IQ0     add1      add3      add5

When a branch instruction is detected before it reaches a dispatch position and is correctly predicted as taken, folding the branch instruction (and discarding any instructions from the incorrect path) reduces the latency required for flow control to zero. Instruction execution proceeds as though the branch were never there.

The advantage of removing the fall-through branch instructions at dispatch is only marginally less than that of branch folding. Because the branch is not taken, only the branch instruction needs to be discarded. The only cost of expelling the branch instruction from one of the dispatch entries rather than folding it is missing a chance to dispatch an executable instruction from that position.
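The difference between the two cases can be illustrated with a minimal sketch that treats the instruction queue as a list in program order (oldest first), using the instruction names from Figures 6-7 and 6-8. The function names are hypothetical, and the model deliberately ignores fetch timing, dispatch rate, and the BTIC. It shows only which entries are discarded in each case.

```python
# Illustrative model only: queue entries are mnemonics in program order,
# oldest (IQ0) first. Function names are assumptions made for this sketch.

def fold_taken_branch(queue, target_instrs):
    """Taken branch: drop the branch and the younger sequential-path
    entries fetched after it, then continue with the target stream."""
    i = queue.index("b")
    return queue[:i] + list(target_instrs)

def remove_fall_through(queue):
    """Not-taken branch: only the branch itself is discarded."""
    i = queue.index("b")
    return queue[:i] + queue[i + 1:]

queue = ["add1", "add2", "b", "add3", "add4", "add5"]

print(fold_taken_branch(queue, ["and1", "and2"]))
print(remove_fall_through(queue))
```

In the taken case, add3 through add5 are lost along with the branch. In the fall-through case, the sequential instructions survive, and the only loss is the queue position that the branch occupied.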

6.4.1.2 Branch Instructions and Completion

As described in the previous section, branch instructions that do not update either the LR or CTR are removed from the instruction stream before they reach the completion queue, either by branch folding (for taken branches) or by removal of fall-through branch instructions at dispatch. However, branch instructions that update the architected LR or CTR must do so in program order. Therefore, they must perform write-back in the completion stage, like instructions that update the FPRs and GPRs.

Branch instructions that update the CTR or LR pass through the instruction queue like nonbranch instructions. At the point of dispatch, however, they are not sent to an execution unit, but rather are assigned a slot in the completion queue, as shown in Figure 6-9 on page 228.
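The routing rule described above can be sketched as a small decision function. This is an illustrative model, not the processor's actual logic: the Branch class, the route_branch function, and the returned labels are hypothetical names chosen for this example. The mnemonics are standard PowerPC branches: b updates neither register, bl updates the LR, and bdnz updates the CTR.

```python
from dataclasses import dataclass

@dataclass
class Branch:
    mnemonic: str
    updates_lr: bool = False   # for example, bl (LK = 1)
    updates_ctr: bool = False  # for example, bdnz (decrements the CTR)

def route_branch(branch, completion_queue):
    """Decide what happens to a branch at dispatch in this simplified model.

    Branches that update the LR or CTR get a completion queue slot so the
    update is written back in program order. All other branches are removed
    from the stream (folded or discarded as fall-through)."""
    if branch.updates_lr or branch.updates_ctr:
        completion_queue.append(branch.mnemonic)
        return "completion-queue"
    return "removed"

completion_queue = []
for br in (Branch("b"), Branch("bl", updates_lr=True), Branch("bdnz", updates_ctr=True)):
    print(br.mnemonic, "->", route_branch(br, completion_queue))
```

Only bl and bdnz occupy completion queue entries in this model. The plain b never reaches the completion queue.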


Instruction Timing

March 27, 2006

Page 227 of 377