Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor
September 2006 DM
Order Number: 252480-006US
Intel XScale® Processor—Intel® IXP42X product line and IXC1100 control plane processors
3.10.3.1.2 Optimizing Branches
Branches decrease application performance by indirectly causing pipeline stalls. Branch
prediction improves performance by reducing the delay inherent in fetching a new
instruction stream. The number of branches that can be predicted accurately is limited
by the size of the branch target buffer. Because the total number of branches executed in
a program is relatively large compared to the size of the branch target buffer, it is often
beneficial to minimize the number of branches in a program. Consider the following C
code segment.

    int foo(int a)
    {
        if (a > 10)
            return 0;
        else
            return 1;
    }

The code generated for the if-else portion of this code segment using branches is:

        cmp   r0, #10
        ble   L1
        mov   r0, #0
        b     L2
    L1:
        mov   r0, #1
    L2:

The code generated above takes three cycles to execute the else-part and four cycles
for the if-part, assuming best-case conditions and no branch misprediction penalties. In
the case of the IXP42X product line and IXC1100 control plane processors, a branch
misprediction incurs a penalty of four cycles. If the branch is incorrectly predicted 50
percent of the time, and if we assume that both the if-part and the else-part are
equally likely to be taken, on average the code above takes 5.5 cycles to execute:

\[ \left( \frac{50}{100} \times 4 \right) + \frac{3 + 4}{2} = 5.5 \text{ cycles} \]

If we instead use the ability of the IXP42X product line and IXC1100 control plane
processors to execute instructions conditionally, the code generated for the above
if-else statement is:

        cmp   r0, #10
        movgt r0, #0
        movle r0, #1
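
The average-cycle estimate for the branch-based sequence can be reproduced with a small C sketch. The function names and parameters below are hypothetical and are not part of the manual. The model is the simple one the text uses: the expected misprediction penalty added to the best-case path cost, weighted by how often each path is taken.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the manual): average cycles for a two-way
 * branch sequence. p_if is the probability that the if-part runs.
 */
double avg_branch_cycles(double mispredict_rate, double penalty_cycles,
                         double if_cycles, double else_cycles,
                         double p_if)
{
    assert(mispredict_rate >= 0.0 && mispredict_rate <= 1.0);
    assert(p_if >= 0.0 && p_if <= 1.0);

    /* Best-case cost, weighted by how often each path is taken. */
    double base = p_if * if_cycles + (1.0 - p_if) * else_cycles;

    /* Add the expected misprediction penalty. */
    return mispredict_rate * penalty_cycles + base;
}

/* Prints the estimate for the numbers quoted in the text:
 * 50% mispredicted, 4-cycle penalty, 4-cycle if-part, 3-cycle else-part. */
void print_branch_estimate(void)
{
    double cycles = avg_branch_cycles(0.50, 4.0, 4.0, 3.0, 0.50);
    printf("branch version: %.1f cycles on average\n", cycles);
}
```

Calling print_branch_estimate() prints "branch version: 5.5 cycles on average", matching the figure in the text. The conditionally executed sequence contains no branch, so the misprediction term does not apply to it.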