Developer’s Manual March, 2003 12-9
Intel® 80200 Processor based on Intel® XScale Microarchitecture
Performance Monitoring
12.5.4 Data/Bus Request Buffer Full Mode
The Data Cache has buffers available to service cache misses or uncacheable accesses. For every
memory request that the Data Cache receives from the processor core, a buffer is speculatively
allocated in case an external memory request is required or temporary storage is needed for an
unaligned access. If no buffers are available, the Data Cache stalls the processor core. How often
the Data Cache stalls depends on the performance of the bus external to the Intel® 80200 processor
and on the memory access latency of Data Cache miss requests to external memory.
If the memory access latency seen by the Intel® 80200 processor is high, possibly due to bus
starvation, these Data Cache buffers become full. This performance monitoring mode is provided to
determine whether the Intel® 80200 processor is being starved of the external bus, which affects
the performance of the application running on the Intel® 80200 processor.
PMN0 accumulates the number of clock cycles during which the processor is stalled due to this
condition, and PMN1 counts the number of times this condition occurs.
Statistics derived from these two events:
• The average number of cycles the processor stalled on a data-cache access that may overflow the
  data-cache buffers, calculated by dividing PMN0 by PMN1. This statistic shows whether the stall
  cycles come from many short stalls or from a few long ones. A high average suggests that the
  Intel® 80200 processor may be starved of the external bus.
• The percentage of total execution cycles the processor stalled because a Data Cache request
  buffer was not available, calculated by dividing PMN0 by CCNT (which measures total execution
  time) and multiplying by 100.
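Both statistics are simple ratios of the counter values. The C sketch below assumes that PMN0, PMN1, and CCNT have already been read into 32-bit integers over the same measurement interval with no counter overflow (reading the coprocessor registers is not shown); the names dbuf_stats and dbuf_statistics are hypothetical. It returns 0 when a divisor is zero rather than dividing by zero.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    double avg_stall_cycles; /* PMN0 / PMN1: cycles per stall occurrence */
    double stall_percent;    /* 100 * PMN0 / CCNT: share of total cycles */
} dbuf_stats;

/* Hypothetical helper: the counter values are assumed to cover the same
   interval, with no counter overflow. */
static dbuf_stats dbuf_statistics(uint32_t pmn0, uint32_t pmn1, uint32_t ccnt)
{
    dbuf_stats r;

    /* Every stall occurrence lasts at least one cycle, so without
       overflow PMN0 can never be smaller than PMN1. */
    assert(pmn0 >= pmn1);

    r.avg_stall_cycles = pmn1 != 0 ? (double)pmn0 / pmn1 : 0.0;
    r.stall_percent = ccnt != 0 ? 100.0 * pmn0 / ccnt : 0.0;
    return r;
}
```

For example, PMN0 = 300, PMN1 = 10, and CCNT = 3000 give an average of 30 stall cycles per occurrence, with 10% of execution cycles spent stalled on full buffers.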
12.5.5 Stall/Writeback Statistics
When an instruction requires the result of a previous instruction that is not yet available, the
Intel® 80200 processor stalls in order to preserve the correct data dependencies. PMN0 counts the
number of stall cycles due to data dependencies. Not all data dependencies cause a stall; only the
following dependencies incur a stall penalty:
• Load-use penalty: attempting to use the result of a load before the load completes. To avoid the
  penalty, software should delay using the result of a load until it is available. This penalty
  reflects the latency of data-cache accesses.
• Multiply/Accumulate-use penalty: attempting to use the result of a multiply or
  multiply-accumulate operation before the operation completes. Again, to avoid the penalty,
  software should delay using the result until it is available.
• ALU-use penalty: in a few isolated cases, back-to-back ALU operations may cause a one-cycle
  delay in execution. These cases are described in Chapter 14, “Performance Considerations”.
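How dependency stalls accumulate, and why delaying the use of a result avoids them, can be sketched with a simple in-order scoreboard model. This C example is illustrative only: the latencies (ALU_LATENCY, LOAD_LATENCY, MAC_LATENCY), the one-instruction-per-cycle issue rule, and the instruction encoding are assumptions rather than Intel® 80200 timings, and the model ignores the isolated ALU-use cases described in Chapter 14.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 16
#define NO_REG (-1)

/* Hypothetical latencies (cycles from issue until a dependent instruction
   may use the result); illustrative values, not 80200 timings. */
enum { ALU_LATENCY = 1, LOAD_LATENCY = 3, MAC_LATENCY = 4 };

typedef struct {
    int dest;    /* destination register, or NO_REG */
    int src[2];  /* source registers, NO_REG if unused */
    int latency; /* cycles until dest becomes usable */
} insn;

/* In-order model: at most one instruction issues per cycle, and an
   instruction waits until all of its sources are ready. The returned
   total corresponds to what PMN0 counts in this mode. */
static unsigned long count_dependency_stalls(const insn *prog, size_t n)
{
    unsigned long ready_at[NUM_REGS] = {0};
    unsigned long cycle = 0, stalls = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long issue = cycle;
        for (int k = 0; k < 2; k++) {
            int r = prog[i].src[k];
            if (r == NO_REG)
                continue;
            assert(r >= 0 && r < NUM_REGS);
            if (ready_at[r] > issue)
                issue = ready_at[r]; /* wait for the producing instruction */
        }
        stalls += issue - cycle;
        if (prog[i].dest != NO_REG) {
            assert(prog[i].dest >= 0 && prog[i].dest < NUM_REGS);
            ready_at[prog[i].dest] = issue + (unsigned long)prog[i].latency;
        }
        cycle = issue + 1;
    }
    return stalls;
}

/* LDR r1,[r0]; ADD r2,r1,#1; SUB r3,r4,r5; SUB r6,r4,r5 */
static const insn load_use_early[] = {
    {1, {0, NO_REG}, LOAD_LATENCY},
    {2, {1, NO_REG}, ALU_LATENCY},
    {3, {4, 5}, ALU_LATENCY},
    {6, {4, 5}, ALU_LATENCY},
};

/* Same work with the ADD moved after the two independent SUBs. */
static const insn load_use_delayed[] = {
    {1, {0, NO_REG}, LOAD_LATENCY},
    {3, {4, 5}, ALU_LATENCY},
    {6, {4, 5}, ALU_LATENCY},
    {2, {1, NO_REG}, ALU_LATENCY},
};
```

With these assumed latencies, load_use_early stalls for 2 cycles because the ADD immediately follows the load it depends on, while load_use_delayed places two independent instructions in between and stalls for 0 cycles. The same reasoning applies to the multiply/accumulate-use penalty, only with a longer assumed latency.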