Intel IXC1100, IXP42X 3.10.2 Processors Pipeline

Intel® IXP42X product line and IXC1100 control plane processors—Intel XScale® Processor

Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor

DM September 2006

168 Order Number: 252480-006US

3.10.1.1 About This Section

This guide assumes that you are familiar with the ARM instruction set and the C

language. It consists of the following sections:

• “Introduction” on page 167: Outlines the contents of this guide.

• “Processors’ Pipeline” on page 168: This section provides an overview of the pipeline behavior of the IXP42X product line and IXC1100 control plane processors.

• “Basic Optimizations” on page 173: This section outlines basic optimizations that can be applied to IXP42X product line and IXC1100 control plane processors.

• “Cache and Prefetch Optimizations” on page 180: This section contains optimizations for efficient use of caches. Also included are optimizations that take advantage of the prefetch instruction of IXP42X product line and IXC1100 control plane processors.

• “Instruction Scheduling” on page 191: This section shows how to optimally schedule code for the pipeline of the IXP42X product line and IXC1100 control plane processors.

• “Optimizing C Libraries” on page 199: This section contains information relating to optimizations for C library routines.

• “Optimizations for Size” on page 199: This section contains optimizations that reduce the size of the generated code. Thumb optimizations are also included.

3.10.2 Processors’ Pipeline

The pipeline is one of the biggest differences separating the IXP42X product line and IXC1100 control plane processors from ARM processors. Many of the differences are summarized in Figure 29. This section provides a brief description of the structure and behavior of the IXP42X product line and IXC1100 control plane processors’ pipeline.

3.10.2.1 General Pipeline Characteristics

Although the pipeline of the IXP42X product line and IXC1100 control plane processors is scalar and single-issue, instructions may occupy all three pipelines at once. Out-of-order completion is possible. The following sections discuss general pipeline characteristics.

3.10.2.1.1 Number of Pipeline Stages

The IXP42X product line and IXC1100 control plane processors have a longer pipeline (seven stages versus five stages), which operates at a much higher frequency than its predecessors do. This allows for greater overall performance. The longer pipeline has several negative consequences, however:

• Larger branch misprediction penalty (four cycles in the IXP42X product line and

IXC1100 control plane processors instead of one in ARM Architecture). This is

mitigated by dynamic branch prediction.

• Larger load-use delay (LUD): LUDs arise from load-use dependencies. A load-use dependency gives rise to a LUD if the pipeline cannot make the result of the load instruction available in time for the subsequent instruction. An optimizing compiler should find independent instructions to fill the slot following the load.

• Certain instructions (LDM, STM) incur a few extra cycles of delay on the IXP42X product line and IXC1100 control plane processors as compared to ARM processors.

• Decode and register file lookups are spread out over two cycles in the IXP42X

product line and IXC1100 control plane processors, instead of one cycle in

predecessors.
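The first two consequences above can be illustrated with a minimal C sketch. It is not taken from this guide: the function names (`mispredict_cycles`, `sum_naive`, `sum_scheduled`) are hypothetical, and the only figure drawn from the text is the four-cycle branch misprediction penalty. The rescheduled loop issues the load for the next iteration before it consumes the current value, which is one way of giving the compiler an independent instruction to place after each load. Whether this actually removes a stall depends on the compiler and on the real load latency, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch misprediction penalty in cycles, as stated in the text above. */
#define MISPREDICT_PENALTY 4u

/* Hypothetical helper: cycles lost to a given number of mispredicted
 * branches, using the four-cycle penalty. */
static unsigned long mispredict_cycles(unsigned long mispredicted)
{
    return mispredicted * MISPREDICT_PENALTY;
}

/* Naive form: each add consumes the value loaded immediately before it,
 * so every iteration contains a back-to-back load-use dependency. */
static uint32_t sum_naive(const uint32_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Rescheduled form: the load for element i is issued before the add
 * that uses element i-1, so the add is independent of the newest load. */
static uint32_t sum_scheduled(const uint32_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n == 0)
        return 0;
    uint32_t sum = 0;
    uint32_t cur = a[0];           /* first load, hoisted out of the loop */
    for (size_t i = 1; i < n; i++) {
        uint32_t next = a[i];      /* load for the next iteration */
        sum += cur;                /* use of the previously loaded value */
        cur = next;
    }
    return sum + cur;              /* consume the final loaded value */
}
```

Both summing functions return the same result for any input; only the ordering of loads relative to their uses differs. An optimizing compiler may already perform this transformation itself, so the hand-scheduled form is mainly useful as an illustration of what the scheduler is trying to achieve.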