Intel® IXP42X product line and IXC1100 control plane processors—Intel XScale® Processor
Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor
DM September 2006
192 Order Number: 252480-006US
    ldr r0, [r5]
    add r1, r2, r3
    sub r8, r2, r3
    add r6, r0, r1
    mul r9, r2, r3

Note that this rearrangement is not always possible. Consider the following
example:

    cmp r1, #0
    addne r4, r5, #4
    subeq r4, r5, #4
    ldr r0, [r4]
    cmp r0, #10

In the example above, the LDR instruction cannot be moved before the ADDNE or the
SUBEQ instructions because the LDR instruction depends on the result of these
instructions. The code above can be rewritten to run faster, at the expense of
increased code size:

    cmp r1, #0
    ldrne r0, [r5, #4]
    ldreq r0, [r5, #-4]
    addne r4, r5, #4
    subeq r4, r5, #4
    cmp r0, #10

The optimized code takes six cycles to execute, compared to the seven cycles taken by
the unoptimized version.

The result latency for an LDR instruction is significantly higher if the data being loaded
is not in the data cache. To minimize the number of pipeline stalls in such a situation,
the LDR instruction should be moved as far away as possible from the instruction that
uses the result of the load. Note that this increases register pressure and may at times
cause certain register values to be spilled to memory. In such cases, use a preload
instruction or a preload hint to ensure that the data access in the LDR instruction hits
the cache when it executes. A preload hint should be used when it is not certain that
the load instruction will be executed. A preload instruction should be used when it is
certain that the load instruction will be executed.

Consider the following code sample:
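The six- and seven-cycle figures quoted for the CMP/ADDNE/SUBEQ/LDR example in this section can be reproduced with a small issue-cycle model. The C sketch below is only an illustration, not a description of the actual pipeline. It assumes single-issue, in-order execution, a load result that is usable three cycles after issue on a cache hit, a one-cycle latency for every other result, and conditional instructions that always occupy an issue slot. The latencies and the names `Insn`, `count_cycles`, `unoptimized_cycles`, and `optimized_cycles` are assumptions made for this example and do not come from the text.

```c
#include <assert.h>

/* Register indices: r0..r15 plus one slot for the condition flags. */
enum { FLAGS = 16, NREGS = 17, NONE = -1 };

/* Assumed latencies (not taken from the text): a load result is usable
   3 cycles after issue on a cache hit, any other result after 1 cycle. */
enum { LOAD_LATENCY = 3, ALU_LATENCY = 1 };

typedef struct {
    int is_load;   /* 1 for LDR-type instructions, 0 otherwise */
    int dst;       /* register (or FLAGS) written, NONE if nothing */
    int src[3];    /* registers (or FLAGS) read; NONE marks unused slots */
} Insn;

/* Single-issue, in-order model: each instruction issues one cycle after
   its predecessor, or later if one of its source values is not ready.
   Returns the cycle in which the last instruction issues. */
int count_cycles(const Insn *code, int n) {
    int ready[NREGS] = {0};   /* cycle at which each value becomes usable */
    int cycle = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int issue = cycle + 1;
        for (int s = 0; s < 3; s++) {
            int r = code[i].src[s];
            if (r != NONE && ready[r] > issue)
                issue = ready[r];   /* stall until the operand is ready */
        }
        if (code[i].dst != NONE)
            ready[code[i].dst] =
                issue + (code[i].is_load ? LOAD_LATENCY : ALU_LATENCY);
        cycle = issue;
    }
    return cycle;
}

int unoptimized_cycles(void) {
    static const Insn code[] = {
        {0, FLAGS, {1, NONE, NONE}},    /* cmp   r1, #0     */
        {0, 4,     {5, FLAGS, NONE}},   /* addne r4, r5, #4 */
        {0, 4,     {5, FLAGS, NONE}},   /* subeq r4, r5, #4 */
        {1, 0,     {4, NONE, NONE}},    /* ldr   r0, [r4]   */
        {0, FLAGS, {0, NONE, NONE}},    /* cmp   r0, #10    */
    };
    return count_cycles(code, (int)(sizeof code / sizeof code[0]));
}

int optimized_cycles(void) {
    static const Insn code[] = {
        {0, FLAGS, {1, NONE, NONE}},    /* cmp   r1, #0       */
        {1, 0,     {5, FLAGS, NONE}},   /* ldrne r0, [r5, #4]  */
        {1, 0,     {5, FLAGS, NONE}},   /* ldreq r0, [r5, #-4] */
        {0, 4,     {5, FLAGS, NONE}},   /* addne r4, r5, #4   */
        {0, 4,     {5, FLAGS, NONE}},   /* subeq r4, r5, #4   */
        {0, FLAGS, {0, NONE, NONE}},    /* cmp   r0, #10      */
    };
    return count_cycles(code, (int)(sizeof code / sizeof code[0]));
}
```

Under these assumptions `unoptimized_cycles()` returns 7 and `optimized_cycles()` returns 6. In the first sequence the final CMP waits two cycles for the load. In the second, the ADDNE and SUBEQ instructions fill the slots between the loads and the CMP.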
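The preload guidance in this section can also be applied from C. The sketch below assumes a GCC- or Clang-compatible compiler, where `__builtin_prefetch(addr, rw, locality)` is the documented prefetch builtin. Such compilers may lower it to a preload such as PLD on targets that have one, and to nothing on targets that do not. The function `sum_selected` and its `ahead` parameter are hypothetical names chosen for illustration. Whether `values[i]` is loaded depends on `keep[i]`, so the example issues a non-faulting hint for a later element rather than an unconditional early load.

```c
#include <assert.h>
#include <stddef.h>

/* Sums values[i] for every i where keep[i] is nonzero.
   `ahead` is a tuning knob (hypothetical): how many elements in advance
   the hint is issued so the data can arrive before it is needed. */
long sum_selected(const int *values, const unsigned char *keep,
                  size_t n, size_t ahead) {
    long total = 0;
    assert(n == 0 || (values != NULL && keep != NULL));
    for (size_t i = 0; i < n; i++) {
        /* Hint only: rw = 0 (read), locality = 3 (keep in cache).
           The element may never be loaded, so nothing is lost if the
           hint turns out to be unnecessary. */
        if (i + ahead < n)
            __builtin_prefetch(&values[i + ahead], 0, 3);

        if (keep[i])
            total += values[i];   /* the load the hint is meant to speed up */
    }
    return total;
}
```

The hint changes only timing, never the result: for `values = {1, 2, 3, 4, 5}` and `keep = {1, 0, 1, 0, 1}` the function returns 9 for any value of `ahead`. A suitable distance depends on memory latency and loop length and would have to be measured on the target.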