User’s Manual
IBM PowerPC 750GX and 750GL RISC Microprocessor
Performance features such as branch folding, BTIC, dynamic branch prediction (implemented in the BHT),
• Whether the branch is taken
• Whether instructions in the target stream, typically the first two instructions in the target stream, are in the branch target instruction cache (BTIC)
• Whether the target instruction stream is in the L1 cache
• Whether the branch is predicted
• Whether the prediction is correct
6.4.1.1 Branch Folding
When a branch instruction is encountered by the fetcher, the BPU immediately begins to decode it and tries to resolve it. Branch folding is the removal of branches from the instruction stream, and it is independent of whether the branch is taken or not taken. However, if the branch instruction updates either the LR or the CTR, it cannot be removed and must be allocated a position in the completion queue. If a branch cannot be resolved immediately, it is predicted, and instruction fetching resumes along the predicted path. Those instructions are conditionally fed into the instruction queue. Later, if the prediction is resolved as correct, the fetched instructions are validated and allowed to complete and be retired. If the prediction is resolved as incorrect, the fetched instructions are invalidated, and instruction fetching resumes along the other path of the branch.
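The folding rule and the handling of speculatively fetched instructions described above can be sketched as a small behavioral model. This is an illustrative sketch only, not a description of the 750GX hardware logic: the names Branch, can_fold, and resolve_prediction are hypothetical, and the mnemonics in the usage lines (b, bl, bdnz) are used only as familiar examples of branches that do or do not update the LR or CTR.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Branch:
    """Hypothetical record for a branch seen by the fetcher."""
    mnemonic: str
    updates_lr: bool = False   # for example, a branch with the link bit set
    updates_ctr: bool = False  # for example, a decrementing conditional branch


def can_fold(branch: Branch) -> bool:
    """Return True if the branch can be removed from the instruction stream.

    Folding does not depend on whether the branch is taken. A branch that
    updates the LR or the CTR cannot be folded and needs a completion
    queue position instead.
    """
    return not (branch.updates_lr or branch.updates_ctr)


def resolve_prediction(predicted_taken: bool,
                       actual_taken: bool,
                       speculative: List[str]) -> Tuple[List[str], str]:
    """Model what happens to speculatively fetched instructions.

    Returns the instructions that are validated and the path (a label
    chosen for this sketch) from which fetching continues.
    """
    if predicted_taken == actual_taken:
        # Correct prediction: fetched instructions may complete and retire.
        return list(speculative), "predicted path"
    # Incorrect prediction: fetched instructions are invalidated and
    # fetching resumes along the other path of the branch.
    return [], "other path"


print(can_fold(Branch("b")))                       # plain branch
print(can_fold(Branch("bl", updates_lr=True)))     # updates the LR
print(can_fold(Branch("bdnz", updates_ctr=True)))  # updates the CTR
print(resolve_prediction(True, False, ["add", "lwz"]))
```

The model deliberately ignores timing; it only captures which branches occupy a completion queue position and what happens to instructions fetched along a predicted path.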
If the target instruction is not in the BTIC, there is an idle cycle while the fetcher attempts to fetch the first four instructions from the instruction cache (on the next clock cycle). In the example in Figure
If the target instruction misses in the BTIC or L1 caches, an L2 cache or memory access is required. The latency of this access depends on several factors, such as processor/bus clock ratios. In most cases, new instructions arrive in the IQ before the execution units become idle.
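The lookup order for a taken branch's target can be summarized in a short sketch. It is a simplified model under stated assumptions: TargetSource, locate_target, and idle_fetch_cycles are hypothetical names, the BTIC hit case is modeled as having no idle fetch cycle (inferred from the text, which attributes the idle cycle to a BTIC miss), and the L2 or memory case returns no number because the latency depends on factors such as processor/bus clock ratios.

```python
from enum import Enum
from typing import Optional


class TargetSource(Enum):
    """Where the first target instructions come from (labels for this sketch)."""
    BTIC = "branch target instruction cache"
    L1_ICACHE = "L1 instruction cache"
    L2_OR_MEMORY = "L2 cache or memory"


def locate_target(in_btic: bool, in_l1: bool) -> TargetSource:
    """Pick the source of the target instructions for a taken branch."""
    if in_btic:
        return TargetSource.BTIC
    if in_l1:
        return TargetSource.L1_ICACHE
    return TargetSource.L2_OR_MEMORY


def idle_fetch_cycles(source: TargetSource) -> Optional[int]:
    """Return the idle fetch cycles for a source, or None if not fixed."""
    if source is TargetSource.BTIC:
        return 0  # assumption of this sketch: no idle cycle on a BTIC hit
    if source is TargetSource.L1_ICACHE:
        return 1  # one idle cycle while the fetcher goes to the instruction cache
    return None   # depends on clock ratios and other factors


for hit_btic, hit_l1 in [(True, True), (False, True), (False, False)]:
    src = locate_target(hit_btic, hit_l1)
    print(src.value, idle_fetch_cycles(src))
```

Returning None for the L2 or memory case is a design choice that keeps the sketch from implying a fixed latency the text does not give.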
Instruction Timing | gx_06.fm.(1.2) |
Page 226 of 377 | March 27, 2006 |