Intel IXC1100, IXP42X 3.10.2.2 Instruction Flow Through the Pipeline

Intel® IXP42X product line and IXC1100 control plane processors—Intel XScale® Processor

DM September 2006

170 Order Number: 252480-006US

While instructions are issued in order, the main execution, memory, and MAC pipelines

are not lock-stepped and therefore have different execution times. This

means that instructions may finish out of program order. Short ‘younger’ instructions

may be finished earlier than long ‘older’ ones. (The term ‘to finish’ is used here to

indicate that the operation has been completed and the result has been written back to

the register file.)
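
The effect described above can be illustrated with a small Python sketch. It is a toy model only: the latency values and instruction names are hypothetical, chosen for illustration, and are not taken from this manual. One instruction is issued per cycle in program order, and each finishes after the latency of the pipeline it uses.

```python
# Toy model: in-order issue, out-of-order finish.
# HYPOTHETICAL_LATENCY holds illustrative values, not figures from this manual.
HYPOTHETICAL_LATENCY = {"alu": 1, "mac": 4, "load_miss": 6}


def finish_order(program):
    """Issue one instruction per cycle in program order and return the
    instruction names ordered by the cycle in which each one finishes."""
    finished = []
    for issue_cycle, (name, kind) in enumerate(program):
        finish_cycle = issue_cycle + HYPOTHETICAL_LATENCY[kind]
        finished.append((finish_cycle, issue_cycle, name))
    # Sort by finish cycle; ties fall back to issue order.
    return [name for _, _, name in sorted(finished)]


program = [
    ("mul_old", "mac"),        # older, long instruction
    ("add_young", "alu"),      # younger, short instruction
    ("sub_younger", "alu"),    # youngest, short instruction
]
print(finish_order(program))   # ['add_young', 'sub_younger', 'mul_old']
```

In this sketch the two short ALU instructions finish before the older multiply, even though all three were issued in program order.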

3.10.2.1.4 Register Scoreboarding

In certain situations, the pipeline may need to be stalled because of register

dependencies between instructions. A register dependency occurs when a previous

MAC or load instruction is about to modify a register value that has not been returned

to the register file and the current instruction needs access to the same register. Only

the destinations of MAC operations and memory loads are scoreboarded. The

destinations of ALU instructions are not scoreboarded.

If no register dependencies exist, the pipeline will not be stalled. For example, if a load

operation has missed the data cache, subsequent instructions that do not depend on

the load may complete independently.
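
A minimal Python sketch of the scoreboarding behavior described above follows. It is a simplified model under stated assumptions: the instruction tuples, register names, and latency values are hypothetical, and only source-operand dependencies are checked. As in the text, only MAC and load destinations are tracked.

```python
# Toy register scoreboard. Only MAC and load destinations are tracked;
# ALU destinations are not. All latencies are hypothetical.
SCOREBOARDED_KINDS = {"mac", "load"}


def issue_cycles(program, latency):
    """Return the cycle in which each instruction issues.

    program: list of (kind, dest, sources) tuples in program order
    latency: dict mapping a scoreboarded kind to the number of cycles
             until its destination register is available
    """
    pending = {}   # register -> cycle at which its value becomes available
    cycle = 0
    issued = []
    for kind, dest, sources in program:
        # Stall until every source register is clear of pending writes.
        cycle = max([cycle] + [pending.get(reg, 0) for reg in sources])
        issued.append(cycle)
        if kind in SCOREBOARDED_KINDS:
            pending[dest] = cycle + latency[kind]
        cycle += 1   # at most one instruction issues per cycle
    return issued


latency = {"load": 10, "mac": 3}   # hypothetical: load models a cache miss
independent = [
    ("load", "r0", ("r1",)),
    ("alu", "r2", ("r3", "r4")),   # does not use r0: no stall
    ("alu", "r5", ("r2", "r3")),   # does not use r0: no stall
]
dependent = [
    ("load", "r0", ("r1",)),
    ("alu", "r2", ("r3", "r4")),   # does not use r0: no stall
    ("alu", "r5", ("r0", "r2")),   # uses r0: stalls until the load returns
]
print(issue_cycles(independent, latency))   # [0, 1, 2]
print(issue_cycles(dependent, latency))     # [0, 1, 10]
```

Only the instruction that reads the pending load destination waits; the independent instructions proceed past the outstanding load.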

3.10.2.1.5 Use of Bypassing

The pipeline of the IXP42X product line and IXC1100 control plane processors makes

extensive use of bypassing to minimize data hazards. Bypassing allows results to be forwarded from

multiple sources, eliminating the need to stall the pipeline.
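
As a rough illustration of why forwarding removes stalls, the following Python sketch compares a dependent instruction's wait with and without a bypass path. The cycle offsets are hypothetical and describe a generic pipeline, not the actual stage timing of this processor.

```python
def stall_cycles(result_ready, writeback, consumer_needs, bypass):
    """Cycles a dependent instruction must wait for its operand.

    All arguments are cycle offsets measured from the producer's issue:
    result_ready   - cycle at which the producer's result exists
    writeback      - cycle at which the result reaches the register file
    consumer_needs - cycle at which the consumer wants to read the operand
    bypass         - True if the result can be forwarded as soon as it exists
    """
    available = result_ready if bypass else writeback
    return max(0, available - consumer_needs)


# Hypothetical timing: result exists after 1 cycle, written back after 3,
# and the dependent instruction wants the operand 1 cycle after the producer.
print(stall_cycles(1, 3, 1, bypass=True))    # 0
print(stall_cycles(1, 3, 1, bypass=False))   # 2
```

With forwarding the consumer reads the value as soon as it is produced; without it, the consumer would wait for the register file write.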

3.10.2.2 Instruction Flow Through the Pipeline

The IXP42X product line and IXC1100 control plane processors’ pipeline issues a single

instruction per clock cycle. Instruction execution begins at the F1 pipe stage and

completes at the WB pipe stage.

Although only a single instruction may be issued per clock cycle, all three pipelines (MAC,

memory, and main execution) may be processing instructions simultaneously. If there

are no data hazards, then each instruction may complete independently of the others.

Each pipe stage takes a single clock cycle or machine cycle to perform its subtask, with

the exception of the MAC unit.

3.10.2.2.1 ARM* V5TE Instruction Execution

Figure 29 on page 169 uses arrows to show the possible flow of instructions in the

pipeline. Instruction execution flows from the F1 pipe stage to the RF pipe stage. The

RF pipe stage may issue a single instruction to either the X1 pipe stage or the MAC unit

(multiply instructions go to the MAC, while all others continue to X1). This means that

either M1 or X1 will be idle in any given cycle.

All load/store instructions are routed to the memory pipeline after the effective

addresses have been calculated in X1.
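
The routing rules above can be summarized in a short Python sketch. The function name and the mnemonic sets are illustrative assumptions: the sets list only a few common ARM instructions and are not a complete decode table. The stage names follow the text.

```python
# Illustrative, incomplete mnemonic sets (assumptions for this sketch).
MULTIPLY = {"mul", "mla", "smull", "umull", "smlal", "umlal"}
LOAD_STORE = {"ldr", "str", "ldrb", "strb", "ldm", "stm"}


def route_from_rf(mnemonic):
    """Return the path an instruction takes once it leaves the RF stage."""
    op = mnemonic.lower()
    if op in MULTIPLY:
        # Multiply instructions are issued to the MAC unit.
        return ["RF", "MAC"]
    if op in LOAD_STORE:
        # The effective address is calculated in X1, then the
        # instruction is routed to the memory pipeline.
        return ["RF", "X1", "memory pipeline"]
    # All other instructions continue down the main execution pipeline.
    return ["RF", "X1"]


for op in ("add", "mul", "ldr"):
    print(op, "->", " -> ".join(route_from_rf(op)))
```

Because RF issues one instruction per cycle, each call yields exactly one path, which matches the observation that the MAC entry stage and X1 do not both receive an instruction in the same cycle.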

The ARM V5TE bx (branch and exchange) instruction, which is used to branch between

ARM and Thumb code, causes the entire pipeline to be flushed (the bx instruction is not

dynamically predicted by the BTB). If the processor is in Thumb mode, then the ID pipe

stage dynamically expands each Thumb instruction into a normal ARM V5TE RISC

instruction and execution resumes as usual.