Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor
September 2006 DM
Order Number: 252480-006US 141
Intel XScale® Processor—Intel® IXP42X product line and IXC1100 control plane processors
3.7.4.4 Data/Bus Request Buffer Full Mode
The Data Cache has buffers available to service cache misses or uncacheable accesses.
For every memory request that the Data Cache receives from the processor core, a
buffer is speculatively allocated in case an external memory request is required or
temporary storage is needed for an unaligned access. If no buffers are available, the
Data Cache stalls the processor core. How often the Data Cache stalls depends on
the performance of the external bus and on the memory access latency for Data Cache
miss requests to external memory. If that latency is high, possibly because of
starvation, the Data Cache buffers fill up. This performance monitoring mode shows
whether the IXP42X product line and IXC1100 control plane processors are being
starved of the external bus, which affects the performance of the application
running on them.
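As a rough illustration of the behavior described above, the following Python sketch models a small pool of request buffers and counts how many cycles the core would stall, and how often, when no buffer is free. It is a toy model under stated assumptions, not a description of the hardware: the buffer count, memory latency and request spacing are hypothetical parameters, every request is treated as an external memory access, and the function name is invented for this example.

```python
import heapq


def simulate_buffer_stalls(num_requests, num_buffers, latency, interval):
    """Toy model: return (stall_cycles, stall_events) for a request stream.

    num_buffers, latency and interval are hypothetical values chosen for
    illustration; they are not taken from the processor documentation.
    """
    if num_buffers < 1:
        raise ValueError("num_buffers must be at least 1")
    release_times = []  # min-heap of cycles at which busy buffers free up
    cycle = 0
    stall_cycles = 0    # total cycles spent waiting for a buffer
    stall_events = 0    # number of times no buffer was available
    for _ in range(num_requests):
        # Return buffers whose requests have already completed.
        while release_times and release_times[0] <= cycle:
            heapq.heappop(release_times)
        if len(release_times) == num_buffers:
            # No buffer free: stall until the oldest request completes.
            freed_at = heapq.heappop(release_times)
            stall_cycles += freed_at - cycle
            stall_events += 1
            cycle = freed_at
        heapq.heappush(release_times, cycle + latency)
        cycle += interval
    return stall_cycles, stall_events


# Same request stream, low versus high memory latency (illustrative values).
print(simulate_buffer_stalls(6, num_buffers=2, latency=2, interval=2))
print(simulate_buffer_stalls(6, num_buffers=2, latency=10, interval=2))
```

With these illustrative parameters the low-latency run never stalls, while the high-latency run stalls twice for six cycles each, which is the pattern this monitoring mode is intended to expose.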
PMN0 accumulates the number of clock cycles the processor is stalled due to this
condition, and PMN1 counts the number of times this condition occurs.
Statistics derived from these two events:
• The average number of cycles the processor stalled on a data-cache access that
  may overflow the data-cache buffers. This is calculated by dividing PMN0 by PMN1.
  This statistic shows whether the stall cycles are spread over many requests or
  attributable to just a few. If the average is high, the IXP42X product line
  and IXC1100 control plane processors may be starved of the external bus.
• The percentage of total execution cycles the processor stalled because a Data
  Cache request buffer was not available. This is calculated by dividing PMN0 by
  CCNT, which was used to measure total execution time.
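The two statistics above reduce to simple ratios of the counter readings. The following Python sketch shows the arithmetic as host-side post-processing; the function name, the sample values and the `ccnt_divider` parameter are assumptions made for this example (the divider is included only for the case where the clock counter was configured to count every 64th cycle rather than every cycle).

```python
def buffer_full_stats(pmn0, pmn1, ccnt, ccnt_divider=1):
    """Derive the two buffer-full statistics from raw counter readings.

    pmn0: stall cycles accumulated while no request buffer was available
    pmn1: number of times the buffer-full condition occurred
    ccnt: clock counter reading for the measured interval
    ccnt_divider: cycles per CCNT tick (assumed 1; pass 64 if CCNT was
                  set up to count every 64th cycle)
    """
    total_cycles = ccnt * ccnt_divider
    avg_stall = pmn0 / pmn1 if pmn1 else 0.0
    stall_pct = 100.0 * pmn0 / total_cycles if total_cycles else 0.0
    return avg_stall, stall_pct


# Illustrative numbers only, not measured data.
avg, pct = buffer_full_stats(pmn0=120_000, pmn1=4_000, ccnt=2_000_000)
print(f"average stall: {avg:.1f} cycles per occurrence")
print(f"stalled: {pct:.1f}% of execution cycles")
```

A high average with a low event count points to a few long waits, while a low average with a high event count points to many short ones.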
3.7.4.5 Stall/Write-Back Statistics
When an instruction requires the result of a previous instruction and that result is not
yet available, the IXP42X product line and IXC1100 control plane processors stall in
order to preserve the correct data dependencies. PMN0 counts the number of stall
cycles due to data-dependencies. Not all data-dependencies cause a stall; only the
following dependencies cause such a stall penalty:
• Load-use penalty: attempting to use the result of a load before the load completes.
  To avoid the penalty, software should delay using the result of a load until it’s
  available. This penalty shows the latency effect of data-cache access.
• Multiply/Accumulate-use penalty: attempting to use the result of a multiply or
  multiply-accumulate operation before the operation completes. Again, to avoid the
  penalty, software should delay using the result until it’s available.
• ALU-use penalty: there are a few isolated cases where back-to-back ALU operations
  may result in a one-cycle delay in execution. These cases are defined in
  Table 3.9, “Performance Considerations” on page 159.
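The effect of a use penalty, and of rescheduling code to avoid it, can be illustrated with a toy single-issue, in-order model written in Python. This is only a sketch: the result latencies (3 cycles for a load, 1 for an ALU operation), the register names and the function name are assumptions made for the example and are not taken from this section.

```python
def count_stall_cycles(program):
    """Count stall cycles for a list of (dest, sources, result_latency).

    Toy model: one instruction issues per cycle unless a source register
    is not yet available, in which case issue waits until it is.
    """
    ready_at = {}  # register -> first cycle at which its value can be used
    cycle = 0      # cycle at which the next instruction would issue
    stalls = 0
    for dest, sources, latency in program:
        earliest = max([ready_at.get(r, 0) for r in sources], default=0)
        if earliest > cycle:
            stalls += earliest - cycle
            cycle = earliest
        if dest is not None:
            ready_at[dest] = cycle + latency
        cycle += 1
    return stalls


LOAD, ALU = 3, 1  # assumed result latencies, for illustration only

# The load result (r0) is used by the very next instruction.
naive = [("r0", ["r4"], LOAD), ("r1", ["r0"], ALU),
         ("r2", ["r5"], ALU), ("r3", ["r6"], ALU)]
# Same work, with the two independent operations moved ahead of the use.
scheduled = [("r0", ["r4"], LOAD), ("r2", ["r5"], ALU),
             ("r3", ["r6"], ALU), ("r1", ["r0"], ALU)]

print(count_stall_cycles(naive), count_stall_cycles(scheduled))
```

Under these assumed latencies the first ordering stalls for two cycles and the second for none, even though both execute the same four operations.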
PMN1 counts the number of write-back operations emitted by the data cache. These
write-backs occur when the data cache evicts a dirty line of data to make room for a
newly requested line or as the result of a clean operation (CP15, register 7).
Statistics derived from these two events:
• The percentage of total execution cycles the processor stalled because of a data
  dependency. This is calculated by dividing PMN0 by CCNT, which was used to
  measure total execution time. Often a compiler can reschedule code to avoid these
  penalties when given the right optimization switches.
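When the counters are sampled before and after a region of interest, the percentage can be computed from the differences between the two snapshots. The Python sketch below assumes 32-bit counters and at most one wraparound between snapshots; the function names and sample values are hypothetical.

```python
COUNTER_MASK = 0xFFFFFFFF  # PMN0 and CCNT assumed to be 32-bit counters


def counter_delta(start, end):
    """Counts elapsed between two snapshots, tolerating one wraparound."""
    return (end - start) & COUNTER_MASK


def dependency_stall_percent(pmn0_start, pmn0_end, ccnt_start, ccnt_end):
    """Percentage of execution cycles stalled on data dependencies."""
    stall = counter_delta(pmn0_start, pmn0_end)
    total = counter_delta(ccnt_start, ccnt_end)
    return 100.0 * stall / total if total else 0.0


# Illustrative snapshots only, not measured data.
print(dependency_stall_percent(0, 250_000, 0, 1_000_000))
```

Comparing this percentage before and after changing compiler optimization switches gives a direct measure of how much rescheduling helped.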