Intel® IXP42X product line and IXC1100 control plane processors—Intel XScale® Processor
Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor
DM September 2006
168 Order Number: 252480-006US

3.10.1.1 About This Section

This guide assumes that you are familiar with the ARM instruction set and the C
language. It consists of the following sections:
• “Introduction” on page 167 - Outlines the contents of this guide.
• “Processors’ Pipeline” on page 168 - Provides an overview of the pipeline behavior of
  the IXP42X product line and IXC1100 control plane processors.
• “Basic Optimizations” on page 173 - Outlines basic optimizations that can be applied
  to the IXP42X product line and IXC1100 control plane processors.
• “Cache and Prefetch Optimizations” on page 180 - Contains optimizations for efficient
  use of the caches, including optimizations that take advantage of the prefetch
  instruction of the IXP42X product line and IXC1100 control plane processors.
• “Instruction Scheduling” on page 191 - Shows how to schedule code optimally for the
  pipeline of the IXP42X product line and IXC1100 control plane processors.
• “Optimizing C Libraries” on page 199 - Contains information relating to optimizations
  for C library routines.
• “Optimizations for Size” on page 199 - Contains optimizations that reduce the size of
  the generated code. Thumb optimizations are also included.

3.10.2 Processors’ Pipeline

One of the biggest differences between the IXP42X product line and IXC1100 control
plane processors and ARM processors is the pipeline. Many of the differences are
summarized in Figure 29. This section provides a brief description of the structure and
behavior of the pipeline of the IXP42X product line and IXC1100 control plane processors.

3.10.2.1 General Pipeline Characteristics

Although the pipeline of the IXP42X product line and IXC1100 control plane processors is
scalar and single issue, instructions may occupy all three pipelines at once, and
out-of-order completion is possible. The following sections discuss general pipeline
characteristics.

3.10.2.1.1 Number of Pipeline Stages

The IXP42X product line and IXC1100 control plane processors have a longer pipeline
(seven stages versus five stages) that operates at a much higher frequency than that of
their predecessors. This allows for greater overall performance. The longer pipeline
has several negative consequences, however:
• Larger branch misprediction penalty (four cycles in the IXP42X product line and
  IXC1100 control plane processors instead of one in ARM Architecture). This is
  mitigated by dynamic branch prediction.
• Larger load use delay (LUD) - LUDs arise from load-use dependencies. A load-use
  dependency gives rise to a LUD if the pipeline cannot make the result of the load
  instruction available in time for the subsequent instruction. An optimizing
  compiler should find independent instructions to fill the slot following the load.
• Certain instructions (LDM, STM) incur a few extra cycles of delay on the IXP42X
  product line and IXC1100 control plane processors as compared to ARM processors.
• Decode and register file lookups are spread out over two cycles in the IXP42X
  product line and IXC1100 control plane processors, instead of one cycle in
  predecessors.
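
The load-use delay described above can be illustrated at the C source level. The
following is a minimal sketch, not code from this manual: the function names
sum_naive and sum_scheduled are hypothetical, and whether the second form actually
avoids a stall depends on the compiler, the optimization level, and the load latency
of the target. In the first function each loaded value is consumed by the very next
operation; in the second, the load of the next element is issued before the add that
consumes the current one, so an independent instruction sits between the load and
its use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive form: the value loaded from data[i] is used immediately by the add,
 * which creates a load-use dependency on every iteration. */
uint32_t sum_naive(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}

/* Scheduled form (illustrative): the next element is loaded before the
 * current one is added, so the add does not depend on the load that
 * immediately precedes it. */
uint32_t sum_scheduled(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    if (n == 0) {
        return 0;
    }
    uint32_t sum = 0;
    uint32_t cur = data[0];          /* first load, used one iteration later */
    for (size_t i = 1; i < n; i++) {
        uint32_t next = data[i];     /* independent load for the next step */
        sum += cur;                  /* uses the value loaded earlier */
        cur = next;
    }
    return sum + cur;                /* add the last loaded element */
}
```

Both functions return the same result for the same input. An optimizing compiler may
perform this reordering on its own, so the rewritten form is mainly useful for
reasoning about what the scheduler is trying to achieve.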