B-28 March, 2003 Developers Manual
Intel® 80200 Processor based on Intel® XScale Microarchitecture
Optimization Guide
B.4.4.6. Bandwidth Limitations
Overuse of prefetches can usurp resources and degrade performance. This happens because once
the bus traffic requests exceed the system resource capacity, the processor stalls. The Intel® 80200
processor data transfer resources are:
4 fill buffers
4 pending buffers
8 half-cache-line write buffers
SDRAM resources are typically:
4 memory banks
1 page buffer per bank referencing a 4K address range
4 transfer request buffers
Consider how these resources work together. A fill buffer is allocated for each cache read miss. If
the memory space is write-allocate, each cache write miss also allocates a fill buffer along with a
pending buffer. A subsequent read of the same cache line does not require a new fill buffer but
does require a pending buffer, and a subsequent write also requires a new pending buffer. A fill
buffer is also allocated for each read from non-cached memory, and a write buffer is needed for each
write to non-cached memory that is non-coalescing. Consequently, an STM instruction
listing eight registers and referencing non-cached memory uses eight write buffers if the writes
do not coalesce and two write buffers if they do. A cache eviction requires a write buffer
for each dirty bit set in the cache line. A prefetch instruction requires a fill buffer for each cache
line and 0, 1, or 2 write buffers for an eviction.
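The write buffer accounting above can be sketched as a small C model. This is only an illustration, not part of the processor documentation: the function names are hypothetical, and the model assumes 4-byte registers, 16-byte half cache lines, two dirty bits per cache line, and that coalescing stores merge whenever they fall in the same 16-byte block.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: 4-byte words and 16-byte half cache lines. */
enum { WORD_BYTES = 4, HALF_LINE_BYTES = 16 };

/* Hypothetical helper: write buffers used by an STM of nregs registers
   to non-cached memory starting at addr. */
unsigned stm_write_buffers(uint32_t addr, unsigned nregs, int coalescing)
{
    assert(nregs >= 1 && nregs <= 16);
    assert(addr % WORD_BYTES == 0);

    if (!coalescing)
        return nregs;   /* one write buffer per word stored */

    /* Coalescing: count the distinct half-line blocks the store touches. */
    uint32_t first = addr / HALF_LINE_BYTES;
    uint32_t last  = (addr + nregs * WORD_BYTES - 1) / HALF_LINE_BYTES;
    return (unsigned)(last - first + 1);
}

/* Hypothetical helper: write buffers needed to evict one cache line,
   one per dirty bit set (bit 0 and bit 1 of dirty_bits). */
unsigned eviction_write_buffers(unsigned dirty_bits)
{
    assert(dirty_bits <= 3u);
    return (dirty_bits & 1u) + ((dirty_bits >> 1) & 1u);
}
```

Under these assumptions, an eight-register STM to a 16-byte-aligned address gives eight write buffers without coalescing and two with it, matching the figures above. The same store started at an address that is not 16-byte aligned would touch a third half-line block in this model.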
When adding prefetch instructions, take care to ensure that the combination of
prefetch and instruction bus requests does not exceed the system resource capacity described above;
otherwise performance is degraded instead of improved. The important points are to spread prefetch
operations over calculations, so that bus traffic can flow freely, and to keep the number of
prefetches to the minimum necessary.
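As a minimal sketch of both points, the C loop below issues at most one prefetch per cache line and interleaves each prefetch with the work on a full line of data. It uses the GCC/Clang builtin `__builtin_prefetch` as a stand-in for the prefetch instruction. The function name, the 32-byte line size (eight 4-byte words), and the two-line prefetch distance are assumptions chosen for illustration. The right distance depends on memory latency and on how many fill buffers other traffic is using.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed: 8 ints per 32-byte cache line; prefetch 2 lines ahead. */
enum { LINE_WORDS = 8, AHEAD_LINES = 2 };

/* Hypothetical example: sum an array, prefetching one line at a time. */
long sum_with_prefetch(const int *data, size_t n)
{
    assert(data != NULL || n == 0);

    long total = 0;
    for (size_t i = 0; i < n; i += LINE_WORDS) {
        /* One prefetch per cache line, not one per element. */
        size_t ahead = i + (size_t)AHEAD_LINES * LINE_WORDS;
        if (ahead < n)
            __builtin_prefetch(&data[ahead], 0 /* read */, 3 /* keep */);

        /* The work on the current line separates successive prefetches. */
        size_t end = (i + LINE_WORDS < n) ? i + LINE_WORDS : n;
        for (size_t j = i; j < end; j++)
            total += data[j];
    }
    return total;
}
```

With a distance of two lines, this loop requests at most one new fill buffer per iteration, which leaves headroom in the four available for demand misses. Issuing all prefetches in a burst before the loop would instead risk exhausting the fill buffers and stalling the processor.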