Intel® IXP42X product line and IXC1100 control plane processors—Intel XScale® Processor
Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor
DM September 2006
196 Order Number: 252480-006US
    add r1, r2, r3
    sub r6, r7, r8
    mov r4, r1, LSL #2

All data processing instructions have a two-cycle issue latency and a two-cycle result latency when the shifter operand is a shift or rotate by a register, or when it is RRX. The instruction that follows must always wait out this issue latency, so reordering cannot hide the stall. The only way to avoid it is to rewrite the instruction itself. Consider the following segment of code:

    mov r3, #10
    mul r4, r2, r3
    add r5, r6, r2, LSL r3
    sub r7, r8, r2

The sub instruction would incur a one-cycle stall due to the issue latency of the add instruction, whose shifter operand is a shift by a register. Because r3 always holds the constant 10 at this point, the register shift can be replaced by an immediate shift, which avoids the issue latency:

    mov r3, #10
    mul r4, r2, r3
    add r5, r6, r2, LSL #10
    sub r7, r8, r2

3.10.5.3 Scheduling Multiply Instructions

Multiply instructions can cause pipeline stalls due to either resource conflicts or result latencies. The following code segment would incur a stall of zero to three cycles due to resource conflicts. The exact number depends on the values in registers r1, r2, r4 and r5:

    mul r0, r1, r2
    mul r3, r4, r5

The following code segment would incur a stall of one to three cycles due to result latency. The exact number depends on the values in registers r1 and r2:

    mul r0, r1, r2
    mov r4, r0

Note that a multiply instruction that sets the condition codes blocks the whole pipeline. A four-cycle multiply that sets the condition codes therefore behaves the same as an instruction with a four-cycle issue latency. Consider the following code segment:

    muls r0, r1, r2
    add r3, r3, #1
    sub r4, r4, #1
    sub r5, r5, #1

The add instruction above would stall for three cycles if the multiply takes four cycles to complete. It is better to replace the code segment above with the following sequence. Here a plain mul lets the independent add and sub instructions proceed, and a separate cmp sets the condition codes afterwards:

    mul r0, r1, r2
    add r3, r3, #1
    sub r4, r4, #1
    sub r5, r5, #1
    cmp r0, #0

The cmp sets the N and Z flags exactly as muls would, but it also sets C to 1 and V to 0. The substitution therefore assumes that the code that follows tests only N or Z (for example, EQ, NE, MI or PL conditions).
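The resource-conflict stall between back-to-back multiplies can often be hidden the same way the muls example hides its stall: by placing independent instructions between the two multiplies. The fragment below is a minimal sketch, not an example from this manual. It assumes the following:

- r6 through r11 are hypothetical free registers, unrelated to the multiply operands.
- Comments use armasm syntax (;). GNU as would use @ instead.

How much of the stall the fragment hides still depends on the operand values, as described above for r1, r2, r4 and r5.

    ; Assumption: r6-r11 are hypothetical free registers, independent of r0-r5.
    mul r0, r1, r2      ; first multiply occupies the multiplier
    add r6, r7, r8      ; independent ALU work can issue while the multiply runs
    sub r9, r10, r11    ; more independent ALU work
    mul r3, r4, r5      ; issues two instructions later, so it waits less (possibly not at all) for the multiplier

This only pays off when genuinely independent work is available. Padding with nop instructions would consume the same cycles as the stall it replaces.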