Prefetch Unit

5.2 Branch prediction

The PFU normally fetches instructions from sequential addresses. If a branch instruction is fetched, the next instruction to be fetched can only be determined with certainty after the instruction has completed execution at the end of the pipeline in the DPU. If the branch is taken, the next instruction to be executed is not sequential. The sequential instructions that the PFU has fetched while the branch instruction was executing must be flushed from the pipeline and the correct instruction fetched. This has the effect of reducing the performance of the processor.

The PFU can detect branches in the Pd-stage of the pipeline, predict whether or not the branch is taken, and determine or predict the target address for a taken branch. This enables the PFU to start fetching instructions at the destination of a taken branch before the branch has completed execution in the DPU. The branch instruction is still executed in the DPU to determine the accuracy of the prediction. If the branch was mispredicted, the pipeline must be flushed and the correct instruction fetched. In general, more branches are correctly predicted than mispredicted so fewer pipeline flushes occur and the performance of the processor is enhanced.
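The check made when a predicted branch finally executes can be sketched in C as follows. This is an illustrative model only, not a description of the processor's logic: the types prediction_t and outcome_t and both function names are hypothetical, introduced here to show when a flush is needed and where fetching must resume.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the prefetch unit guessed when it detected the branch (hypothetical type). */
typedef struct {
    bool     taken;   /* predicted direction */
    uint32_t target;  /* predicted destination, meaningful only if taken */
} prediction_t;

/* What the branch actually did once it executed (hypothetical type). */
typedef struct {
    bool     taken;        /* actual direction */
    uint32_t target;       /* actual destination, meaningful only if taken */
    uint32_t fallthrough;  /* address of the next sequential instruction */
} outcome_t;

/* Address that execution must continue from after the branch. */
static uint32_t correct_next_address(const outcome_t *o)
{
    return o->taken ? o->target : o->fallthrough;
}

/* True if the instructions fetched after the branch must be flushed. */
static bool needs_flush(const prediction_t *p, const outcome_t *o)
{
    if (p->taken != o->taken)
        return true;   /* wrong direction */
    if (o->taken && p->target != o->target)
        return true;   /* right direction, wrong destination */
    return false;      /* prediction was correct, no flush */
}
```

In this model, a correct prediction costs nothing extra, while either kind of misprediction forces a flush followed by a fetch from correct_next_address().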

Two major classes of branch are addressed in the processor prediction scheme:

1. Direct branches, including B, BL, CZB, and BLX immediate, where the target address is a fixed offset, encoded in the instruction, from the program counter. If such an instruction has been fetched, and the program counter is known, predicting the destination of the branch only involves predicting whether the instruction passes or fails its condition code, that is, whether the branch is taken or not taken.

2. Indirect branches, such as load and Branch and eXchange (BX) instructions that write to the PC, when they can be identified as a likely return from a procedure call. Two identifiable cases are:

loads to the PC from an address derived from R13

BX from R0-R14.

In these cases, if the calling operation can also be identified, the likely return address can be stored in the return stack. Typical calling operations are BL and BLX instructions.
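For the direct branches in class 1 above, the destination follows from the instruction word and the program counter alone. As a minimal sketch, the following C function computes the target of an ARM-state B or BL instruction, whose low 24 bits hold a signed word offset. The function name is hypothetical, and the sketch covers only this one encoding; Thumb encodings and BLX immediate are laid out differently and are not handled.

```c
#include <stdint.h>

/*
 * Target of an ARM-state B or BL instruction (illustrative helper).
 * instr   : the 32-bit instruction word
 * address : the address the instruction was fetched from
 */
static uint32_t arm_branch_target(uint32_t instr, uint32_t address)
{
    /* Bits [23:0] hold a signed offset counted in words. */
    uint32_t imm24 = instr & 0x00FFFFFFu;

    /* Sign-extend from 24 to 32 bits, then convert words to bytes. */
    uint32_t offset = (imm24 ^ 0x00800000u) - 0x00800000u;
    offset <<= 2;

    /* In ARM state the PC reads as the instruction address plus 8. */
    return address + 8u + offset;
}
```

For example, the word 0xEAFFFFFE fetched from address 0x1000 is a branch to itself, so arm_branch_target(0xEAFFFFFE, 0x1000) evaluates to 0x1000. Only the taken or not-taken decision remains to be predicted.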

Note

Unconditional instructions of either class of program flow are always executed, and do not affect prediction history. Unconditional return stack operations always affect the return stack.
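The return stack idea can be modelled in C as a small last-in, first-out buffer: a call pushes its return address, and a likely return pops the most recent one as the predicted target. This is a sketch under stated assumptions. The depth RS_DEPTH, the policy of overwriting the oldest entry when full, and all names are chosen for illustration and are not taken from this manual.

```c
#include <stdbool.h>
#include <stdint.h>

#define RS_DEPTH 4u  /* assumed depth, for illustration only */

typedef struct {
    uint32_t entry[RS_DEPTH];
    unsigned top;    /* index of the next free slot, modulo RS_DEPTH */
    unsigned count;  /* number of valid entries, at most RS_DEPTH */
} return_stack_t;

static void rs_init(return_stack_t *rs)
{
    rs->top = 0;
    rs->count = 0;
}

/* On a call such as BL or BLX: remember the address to return to. */
static void rs_push(return_stack_t *rs, uint32_t return_address)
{
    rs->entry[rs->top] = return_address;
    rs->top = (rs->top + 1u) % RS_DEPTH;
    if (rs->count < RS_DEPTH)
        rs->count++;  /* when full, the oldest entry is overwritten (assumed policy) */
}

/* On a likely return: predict the most recently pushed address.
 * Returns false if the stack is empty and no prediction is available. */
static bool rs_pop(return_stack_t *rs, uint32_t *predicted)
{
    if (rs->count == 0)
        return false;
    rs->top = (rs->top + RS_DEPTH - 1u) % RS_DEPTH;
    rs->count--;
    *predicted = rs->entry[rs->top];
    return true;
}
```

With this model, calls nested no deeper than RS_DEPTH are all predicted correctly on return. Deeper nesting loses the oldest return addresses.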

This section describes:

Disabling program flow prediction

Branch predictor on page 5-4

Incorrect predictions and correction on page 5-4.

5.2.1 Disabling program flow prediction

You cannot disable program flow prediction using the Z bit, bit [11], of CP15 Register c1. The Z bit is tied to 1. To disable the program flow prediction you must disable the return stack and set the branch prediction policy to not-taken. For more information see c1, System Control Register on page 4-35.

You can also control the return stack, the branch predictor, and the fetch rate using the Auxiliary Control Register. For more information see Auxiliary Control Registers on page 4-38.
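The two steps described above, disabling the return stack and selecting the not-taken policy, amount to a read-modify-write of the Auxiliary Control Register. The C sketch below shows only the bit manipulation. The field positions and the not-taken encoding are assumptions made for illustration and must be confirmed against the Auxiliary Control Registers description. The macro and function names are hypothetical, and the CP15 read and write of the register are not shown.

```c
#include <stdint.h>

/* Assumed field layout; confirm against the register description before use. */
#define ACTLR_RSDIS         (1u << 17)              /* assumed: return stack disable */
#define ACTLR_BP_SHIFT      15u                     /* assumed: branch policy field position */
#define ACTLR_BP_MASK       (3u << ACTLR_BP_SHIFT)  /* assumed: two-bit policy field */
#define ACTLR_BP_NOT_TAKEN  (2u << ACTLR_BP_SHIFT)  /* assumed: always not-taken encoding */

/* Returns the register value with program flow prediction disabled,
 * leaving every other bit of the input value unchanged. */
static uint32_t actlr_disable_prediction(uint32_t actlr)
{
    actlr |= ACTLR_RSDIS;                                    /* disable the return stack */
    actlr = (actlr & ~ACTLR_BP_MASK) | ACTLR_BP_NOT_TAKEN;   /* policy: not-taken */
    return actlr;
}
```

Keeping the computation in a pure function separates the choice of bits from the privileged register access, so the masks can be checked on a host before being used on the target.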

ARM DDI 0363E

Copyright © 2009 ARM Limited. All rights reserved.

5-3

ID013010

Non-Confidential, Unrestricted Access

 
