25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

frequencies increase, so will the ratio of operating frequencies between processor caches and DDR memory. The processor-to-write-back cache bandwidth is also higher than processor-to-AGP-aperture bandwidth (write-combining memory type), since the DDR writes are avoided (as well as GART translation latencies).

It may be possible to prevent DMA data from polluting the L1 data and L2 caches by using the nontemporal PREFETCHNTA instruction on the DMA buffer and limiting prefetching of the DMA buffer to less than 32 Kbytes (PREFETCHNTA uses only one way of the L1 data cache).

Use PREFETCHNTA on the linear address to the DMA buffer, and not the AGP aperture address, before reading or writing the DMA buffer.
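The following C fragment is a minimal sketch of this recommendation, not a tuned implementation. It assumes a GCC- or Clang-compatible compiler, where __builtin_prefetch with a locality argument of 0 is typically compiled to PREFETCHNTA on x86-64 targets. The 64-byte line size, the 16-Kbyte window (chosen only to stay below the 32-Kbyte limit), and the names NTA_WINDOW, prefetch_nta_window, and checksum_dma_buffer are illustrative assumptions. The pointer buf stands for the linear address of the DMA buffer, not the AGP aperture address.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u             /* assumed cache-line size in bytes */
#define NTA_WINDOW (16u * 1024u)   /* assumed window, kept below 32 Kbytes */

/* Issue one nontemporal prefetch hint per cache line of [p, p + len).
   Locality 0 asks the compiler for a nontemporal hint (assumption: this
   becomes PREFETCHNTA on x86-64 with GCC or Clang). */
static void prefetch_nta_window(const uint8_t *p, size_t len)
{
    for (size_t off = 0; off < len; off += CACHE_LINE)
        __builtin_prefetch(p + off, 0, 0);
}

/* Hypothetical consumer: sums the bytes of a DMA buffer, prefetching each
   window through its linear address before reading it. */
uint32_t checksum_dma_buffer(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    for (size_t base = 0; base < len; base += NTA_WINDOW) {
        size_t chunk = (len - base < NTA_WINDOW) ? (len - base) : NTA_WINDOW;

        prefetch_nta_window(buf + base, chunk);
        for (size_t i = 0; i < chunk; i++)
            sum += buf[base + i];
    }
    return sum;
}
```

Prefetching one bounded window at a time, rather than the whole buffer, is what keeps the amount of outstanding nontemporal data under the limit described above. The best window size depends on the workload and should be measured.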

Another key optimization for the DMA model on AMD Athlon 64 and AMD Opteron systems is that coherency is maintained between processor caches and an AGP master making accesses outside of the AGP aperture.

This is a key AGP enhancement that is required of AGP 3.0 target (host platform) systems.

In effect, this means that a device driver can create a DMA buffer in normal write-back memory and then pass the physical DRAM page address to the AGP master; in other words, the AGP virtual address and GART translation are not used.

Use PREFETCHNTA on the linear address to the DMA buffer, before reading or writing the DMA buffer.

If the AGP card hardware is capable of buffering the physical DRAM page addresses sent to the AGP card in a FIFO, then in effect the AGP card’s device driver is getting AGP scatter-gather capabilities, with cache coherency provided by the processor.
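As a rough illustration of the FIFO idea, the C fragment below models a queue of physical DRAM page addresses as a small ring buffer. It is a sketch under stated assumptions: the type sg_fifo, the functions sg_fifo_push and sg_fifo_pop, and the depth of 8 entries are all hypothetical. How a driver obtains and pins physical pages, and how the addresses actually reach the card, is specific to the operating system and the hardware and is not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SG_FIFO_DEPTH 8u   /* hypothetical depth; must be a power of two */

/* Software model of a FIFO of physical DRAM page addresses. */
typedef struct {
    uint64_t page_addr[SG_FIFO_DEPTH];
    unsigned head;   /* index of the next entry to consume */
    unsigned tail;   /* index of the next free slot */
} sg_fifo;

/* Queue one physical page address; returns false if the FIFO is full. */
bool sg_fifo_push(sg_fifo *f, uint64_t phys_page)
{
    if (f->tail - f->head == SG_FIFO_DEPTH)
        return false;
    f->page_addr[f->tail % SG_FIFO_DEPTH] = phys_page;
    f->tail++;
    return true;
}

/* Dequeue the oldest page address; returns false if the FIFO is empty. */
bool sg_fifo_pop(sg_fifo *f, uint64_t *phys_page)
{
    if (f->tail == f->head)
        return false;
    *phys_page = f->page_addr[f->head % SG_FIFO_DEPTH];
    f->head++;
    return true;
}
```

Because each entry is an independent page address, the queued pages need not be physically contiguous, which is the scatter-gather property the text refers to.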

D.6 Optimizations for Texture-Map Copies to AGP Memory

To avoid cache pollution, use the same technique described in “Fast-Write Optimizations for Video-Memory Copies” on page 349 to copy texture data into AGP memory, since this data tends to be nontemporal.

D.7 Optimizations for Vertex-Geometry Copies to AGP Memory

To avoid cache pollution, use the same technique described in “Fast-Write Optimizations for Video-Memory Copies” on page 349 to copy vertex data into AGP memory, since this data tends to be nontemporal.

Appendix D

AGP Considerations
