Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor
September 2006 DM
Order Number: 252480-006US 185
Intel XScale® Processor—Intel® IXP42X product line and IXC1100 control plane processors
3.10.4.3.2 Memory Page Thrashing
Memory page thrashing occurs because of the nature of SDRAM. SDRAMs are typically
divided into four banks. Each bank can have one selected (open) page, where the page
size for current memory components is typically 4 KB. The lookup time, or latency,
for an access to the currently selected page is two to three bus clocks. Thrashing
occurs when subsequent memory accesses within the same memory bank fall on
different pages. Each memory page change adds three to four bus clock cycles to
memory latency. This added delay extends the required prefetch distance
correspondingly, making it more difficult to hide memory access latencies. This type
of thrashing can be resolved by placing the conflicting data structures in different
memory banks or by interleaving the data structures so that concurrently accessed
data resides within the same memory page. It is also extremely important to ensure
that instruction and data sections are in different memory banks; otherwise they will
continually thrash the memory page selection.
3.10.4.4 Prefetch Considerations
The IXP42X product line and IXC1100 control plane processors have a true prefetch
load instruction (PLD). The purpose of this instruction is to preload data into the data
and mini-data caches. Data prefetching allows hiding of memory transfer latency while
the processor continues to execute instructions. The prefetch is important to compiler
and assembly code because judicious use of the prefetch instruction can enormously
improve throughput performance of the IXP42X product line and IXC1100 control plane
processors. Data prefetch can be applied not only to loops but also to any data
references within a block of code. Prefetch also applies to data writing when the
memory type is enabled as write-allocate.
The IXP42X product line and IXC1100 control plane processors’ prefetch load
instruction is a true prefetch instruction because the load destination is the data or
mini-data cache and not a register. Compilers for processors that have data caches
but do not support prefetch sometimes use a load instruction to preload the data
cache. This technique has the disadvantage of consuming a register for the loaded
data, and of requiring additional registers for subsequent preloads, thus increasing
register pressure. By contrast, the prefetch instruction can be used to reduce register
pressure instead of increasing it.
The prefetch load is a hint instruction and does not guarantee that the data will be
loaded. Whenever the load would cause a fault or require a table walk, the processor
ignores the prefetch instruction and continues with the next instruction; no fault is
taken and no table walk is performed. This is particularly advantageous when a linked
list or recursive data structure is terminated by a NULL pointer: prefetching through
the NULL pointer does not fault or disturb program flow.
3.10.4.4.1 Prefetch Loop Limitations
It is not always advantageous to add prefetch to a loop. Loop characteristics that
limit the usefulness of prefetch are discussed below.
3.10.4.4.2 Compute versus Data Bus Bound
At one extreme, a loop that is data-bus bound will not benefit from prefetch, because
all the system resources for transferring data are quickly allocated and there are no
instructions that can profitably be executed in the meantime. At the other end of the
scale, compute-bound loops allow complete hiding of all data transfer latencies.
3.10.4.4.3 Low Number of Iterations
Loops with very low iteration counts may see the advantages of prefetch completely
negated. A loop with a small, fixed number of iterations may be faster if it is
completely unrolled rather than scheduling prefetch instructions into it.