Intel IXC1100, IXP42X 3.9.2 Branch Prediction, 3.9.3 Addressing Modes, 3.9.4 Instruction Latencies

Intel® IXP42X product line and IXC1100 control plane processors—Intel XScale® Processor

Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor

DM September 2006

160 Order Number: 252480-006US

Maximum interrupt latency can be reduced by:

• Ensuring that the interrupt vector and interrupt service routine are resident in the instruction cache. This can be accomplished by locking them down into the cache.

• Removing or reducing the occurrences of hardware page table walks. This can also be accomplished by locking down the application's page table entries into the TLBs, along with the page table entry for the interrupt service routine.

3.9.2 Branch Prediction

The IXP42X product line and IXC1100 control plane processors implement dynamic branch prediction for the ARM instructions B and BL and for the Thumb instruction B. Any instruction that specifies the PC as its destination is predicted as not taken. For example, an LDR or a MOV that loads or moves directly to the PC is predicted not taken and incurs a branch latency penalty.
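The rule above splits control-flow instructions into two groups: those the branch target buffer can predict, and everything else that writes the PC, which is always predicted not taken. A minimal Python sketch of that classification follows. The function name, its arguments, and the string labels are hypothetical and serve only as illustration.

```python
# Instructions eligible for dynamic prediction, per the text above:
# ARM B, ARM BL, and Thumb B.
BTB_ELIGIBLE = {("ARM", "B"), ("ARM", "BL"), ("THUMB", "B")}


def static_prediction(isa: str, mnemonic: str, writes_pc: bool) -> str:
    """Classify how the core predicts an instruction (hypothetical helper).

    'dynamic'   -> predicted through the branch target buffer
    'not-taken' -> any other instruction whose destination is the PC
    'n/a'       -> does not write the PC, so no branch prediction applies
    """
    key = (isa.upper(), mnemonic.upper())
    if key in BTB_ELIGIBLE:
        return "dynamic"
    if writes_pc:
        # e.g. LDR pc, [...] or MOV pc, lr: always predicted not taken
        return "not-taken"
    return "n/a"


print(static_prediction("ARM", "MOV", writes_pc=True))  # prints not-taken
```

Because Thumb BL is not in the eligible set, this model treats it like any other PC-writing instruction, which follows the text's statement that only Thumb B is dynamically predicted.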

These instructions (ARM B, ARM BL, and Thumb B) enter the branch target buffer the first time they are taken, that is, the first time their condition evaluates to true. Once an instruction is in the branch target buffer, the IXP42X product line and IXC1100 control plane processors dynamically predict its outcome based on its previous outcomes. Table 76 shows the branch latency penalty when these instructions are correctly predicted and when they are not. A penalty of zero for a correct prediction means that the IXP42X product line and IXC1100 control plane processors can execute the next instruction in the program flow in the cycle following the branch.
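The allocation rule above can be modeled in a few lines of Python. This is a toy model, not a description of the hardware. The text only says that outcomes are predicted from previous outcomes, so the 2-bit saturating counter per entry, its initial "weakly taken" value, and the unbounded dictionary standing in for the buffer (no capacity or indexing limits) are all assumptions.

```python
class BranchTargetBuffer:
    """Toy branch target buffer: an entry is created on the first taken outcome.

    ASSUMPTION: each entry holds a 2-bit saturating counter
    (0-1 = predict not taken, 2-3 = predict taken). Capacity is ignored.
    """

    def __init__(self) -> None:
        self.counters: dict[int, int] = {}  # branch address -> counter

    def predict(self, addr: int) -> bool:
        # A branch that is not in the buffer is predicted not taken.
        counter = self.counters.get(addr)
        return counter is not None and counter >= 2

    def update(self, addr: int, taken: bool) -> None:
        if addr not in self.counters:
            if taken:
                # First taken outcome allocates the entry.
                self.counters[addr] = 2  # ASSUMPTION: start weakly taken
            return
        c = self.counters[addr]
        self.counters[addr] = min(c + 1, 3) if taken else max(c - 1, 0)

    def step(self, addr: int, taken: bool) -> bool:
        """Predict, then train; return True if the prediction was correct."""
        correct = self.predict(addr) == taken
        self.update(addr, taken)
        return correct


btb = BranchTargetBuffer()
# A loop-closing branch at a made-up address, taken 3 times, then not taken.
print([btb.step(0x100, t) for t in [True, True, True, False]])
# prints [False, True, True, False]
```

In this model the first execution mispredicts because the branch is not yet in the buffer, and the final fall-through mispredicts because the entry predicts taken. A branch that is never taken is never allocated and is always predicted correctly.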

3.9.3 Addressing Modes

None of the load and store addressing modes implemented in the IXP42X product line and IXC1100 control plane processors add to the instruction latency numbers.

3.9.4 Instruction Latencies

The latencies for all instructions are shown in the following sections, grouped by function: branch, data processing, multiply, status register access, load/store, semaphore, and coprocessor.

The following section explains how to read these tables.

3.9.4.1 Performance Terms

• Issue Clock (cycle 0)

The first cycle in which an instruction is decoded and allowed to proceed to later stages of the execution pipeline (that is, when the instruction is actually issued).

Table 76. Branch Latency Penalty

Core Clock Cycles
ARM    Thumb    Description
+0     +0       Predicted correctly. The instruction is in the branch target buffer and is
                correctly predicted.
+4     +5       Mispredicted. There are three cases of branch misprediction, all of which
                incur the penalty shown (4 cycles for ARM, 5 cycles for Thumb):
                1. The instruction is in the branch target buffer and is predicted not
                   taken, but is actually taken.
                2. The instruction is not in the branch target buffer and is a taken branch.
                3. The instruction is in the branch target buffer and is predicted taken,
                   but is actually not taken.
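Table 76 can be turned into a small cost function for estimating the branch overhead of a sequence of executions. The following Python sketch uses hypothetical function and argument names. It also assumes that a not-taken branch absent from the branch target buffer counts as a zero-penalty correct prediction, because the table does not list that case explicitly.

```python
# Misprediction penalties from Table 76, in core clock cycles.
MISPREDICT_PENALTY = {"ARM": 4, "THUMB": 5}


def branch_penalty(isa: str, in_btb: bool, predicted_taken: bool, taken: bool) -> int:
    """Penalty for one execution of ARM B/BL or Thumb B, following Table 76."""
    if not in_btb:
        # A branch that is not in the branch target buffer is predicted not taken.
        predicted_taken = False
    if predicted_taken == taken:
        # Correct prediction: +0. (ASSUMPTION for the not-in-buffer, not-taken
        # case, which Table 76 does not list explicitly.)
        return 0
    # Cases 1-3 of Table 76: every misprediction costs the full penalty.
    return MISPREDICT_PENALTY[isa.upper()]


# Example: a loop-closing ARM branch that is taken three times, then falls
# through. Each tuple is (in_btb, predicted_taken, taken).
history = [(False, False, True), (True, True, True), (True, True, True), (True, True, False)]
print(sum(branch_penalty("ARM", *h) for h in history))  # prints 8
```

The two mispredictions in this history are case 2 (first execution, not yet in the buffer) and case 3 (final fall-through). The same history compiled as Thumb code would cost 10 cycles.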