25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

movdqa [rax + 48], xmm3

/* Allocate and fill another WC buffer. */

movdqa [rax + 64], xmm4
movdqa [rax + 80], xmm5
movdqa [rax + 96], xmm6

/* The second WC buffer is forced after the next write. */
/* The linear ascending order between cache lines */
/* is maintained since buffer is sent when filled. */

movdqa [rax + 112], xmm7

SFENCE

/* The SFENCE forces the write-combining buffer */
/* out of the processor and to the graphics chip. */
/* Set up the next drawing commands in cached */
/* memory structure ShadowRegs_Structure. */
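The store pattern above (fill each 64-byte write-combining buffer with ascending 16-byte stores, then issue SFENCE) can also be sketched in C. This is a minimal illustration only, not part of the original listing: the helper name wc_write_lines and the constant WC_BUFFER_BYTES are hypothetical, the SSE2 intrinsics are assumed to be available through <emmintrin.h>, and a plain memcpy stands in on targets without SSE2. The destination here is ordinary memory; mapping it as write-combining is left to the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_load_si128, _mm_store_si128, _mm_sfence */
#endif

/* Assumption taken from the listing: one WC buffer covers 64 bytes. */
enum { WC_BUFFER_BYTES = 64 };

/* Hypothetical helper: copy n_lines 64-byte lines from src to dst in
 * strictly ascending address order, then fence so that any partially
 * drained WC buffer is pushed out. Both pointers must be 16-byte aligned. */
static void wc_write_lines(void *dst, const void *src, size_t n_lines)
{
    assert(((uintptr_t)dst % 16) == 0 && ((uintptr_t)src % 16) == 0);
#if defined(__SSE2__)
    const __m128i *s = (const __m128i *)src;
    __m128i *d = (__m128i *)dst;
    size_t n_stores = n_lines * (WC_BUFFER_BYTES / 16);

    /* Four consecutive 16-byte stores fill one WC buffer; ascending
     * order lets each buffer be sent as soon as it is full. */
    for (size_t i = 0; i < n_stores; ++i) {
        _mm_store_si128(d + i, _mm_load_si128(s + i));
    }

    /* Counterpart of the SFENCE instruction in the listing. */
    _mm_sfence();
#else
    /* Fallback for non-SSE2 targets: same result, no ordering control. */
    memcpy(dst, src, n_lines * WC_BUFFER_BYTES);
#endif
}
```

Calling wc_write_lines(dst, src, 2) writes the same 128 bytes as the two WC buffers in the listing. Whether the compiler preserves the exact store order is not guaranteed by this sketch; hand-written assembly, as in the listing, remains the way to control it precisely.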

D.3 Fast-Write Optimizations for Video-Memory Copies

When performing block copies of an image to the graphics accelerator’s local memory, you can preserve the contents of the L1 and L2 caches and reduce cache-line-replacement traffic to system memory by block prefetching the image data with the nontemporal PREFETCHNTA instruction. This works well with images loaded into system memory through disk DMA because PREFETCHNTA keeps the data out of the L2 cache and mostly out of the L1 data cache. Listing 32 illustrates this technique.

Note: On the AMD Athlon™ 64 and AMD Opteron™ processors, PREFETCHNTA uses one way of the two-way set-associative L1 data cache. One way of the L1 data cache is 32 Kbytes, so limit the block prefetch size to less than or equal to 32 Kbytes.
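The block-size rule in this note can be sketched in C as a two-pass loop: prefetch one block nontemporally, copy it, then move to the next block. This is a rough illustration under stated assumptions, not the guide's listing. The helper name copy_image_nta and both constants are hypothetical, a 64-byte cache line is assumed, and the GCC/Clang builtin __builtin_prefetch with locality 0 is used as a portable stand-in (on x86 these compilers typically emit PREFETCHNTA for it, but that is not guaranteed). The destination is copied with memcpy rather than with explicit write-combining stores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
    CACHE_LINE_BYTES = 64,            /* assumed cache-line size */
    PREFETCH_BLOCK_BYTES = 16 * 1024  /* must not exceed 32 Kbytes (one L1 way) */
};

/* Hypothetical helper: copy n_bytes from src to dst one block at a time.
 * Each block is first prefetched with a nontemporal hint, then copied.
 * Returns the number of blocks processed. */
static size_t copy_image_nta(void *dst, const void *src, size_t n_bytes)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    size_t blocks = 0;

    while (n_bytes > 0) {
        size_t chunk = n_bytes < (size_t)PREFETCH_BLOCK_BYTES
                           ? n_bytes
                           : (size_t)PREFETCH_BLOCK_BYTES;

        /* Pass 1: request every cache line of the block.
         * Arguments: address, 0 = read access, 0 = no temporal locality. */
        for (size_t off = 0; off < chunk; off += CACHE_LINE_BYTES) {
            __builtin_prefetch(s + off, 0, 0);
        }

        /* Pass 2: copy the block, which should now be in the L1 data cache. */
        memcpy(d, s, chunk);

        s += chunk;
        d += chunk;
        n_bytes -= chunk;
        ++blocks;
    }
    return blocks;
}
```

With a 16-Kbyte block, a 40,000-byte image is handled as two full blocks plus one partial block. Raising PREFETCH_BLOCK_BYTES above 32 Kbytes would violate the limit stated in the note.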

Listing 32. Writing Nontemporal Data to Video RAM

/* Copy an image larger than 32 Kbytes into local memory, */
/* but limit the block prefetch so as not to exceed 32 Kbytes, */
/* which is the size of the nontemporal cache. */

/* First, block prefetch 16 Kbytes into the L1 data cache, then write */
/* it to the frame buffer. */

/* On AMD Athlon 64 and AMD Opteron processors, the PREFETCHNTA */
/* instruction must execute prior to subsequent instructions. */

/* Cache lines that are prefetched via PREFETCHNTA and later replaced are */
/* not evicted to the L2 cache or system memory. */

Appendix D

AGP Considerations
