25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

On the AMD Athlon™ 64 and AMD Opteron™ processors, write-combining can be used, and software can take advantage of the fact that write-combined data is sent out of the processor's write buffers in ascending order, from low quadword to high quadword, and appears on the HyperTransport link in that same order.

Use the Memory Type Range Register (MTRR) mechanism in conjunction with the PAT MSR (model-specific register 277h) to enable write-combining as the memory type for the FIFO address space.

To enable write-combining as the memory type for the FIFO address space, follow these steps:

1. Change the PAT MSR entries that contain a type value of 00h (UC, uncacheable) to a type value of 07h (UC-, uncacheable minus).

2. Program an MTRR with the physical base address and address mask of the command FIFO, using the write-combining memory type (01h).

Note: MTRRs mark address ranges with a granularity of 4 Kbytes, so the FIFO should begin on a 4-Kbyte-aligned address boundary.
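The two steps can be sketched in C as pure functions that compute the register values. This is a minimal sketch, assuming the architectural PAT layout (eight 8-bit entries, PA0 in the low byte) and a variable-range MTRR pair (MTRRphysBase/MTRRphysMask); the function and macro names are hypothetical. Writing the values requires WRMSR at privilege level 0 (for example, from a kernel driver), which is not shown. UC- rather than UC matters because a UC PAT entry overrides the MTRR type, while UC- lets the MTRR's write-combining type take effect. A variable-range MTRR also requires the range size to be a power of two, with the base aligned to that size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Memory-type encodings shared by the PAT and the MTRRs. */
#define MT_UC       0x00u /* uncacheable */
#define MT_WC       0x01u /* write-combining */
#define MT_UC_MINUS 0x07u /* UC- (valid in the PAT only) */

#define MTRR_VALID  (1ull << 11) /* MTRRphysMask bit 11: pair enabled */
#define ADDR_LOW12  0xFFFull     /* low 12 bits are not part of the address fields */

/* Step 1 (hypothetical helper): return a copy of the PAT MSR (277h) value
 * in which every entry that selects UC (00h) selects UC- (07h) instead.
 * Each of the eight entries PA0..PA7 occupies one byte, PA0 in bits 7:0. */
static uint64_t pat_uc_to_uc_minus(uint64_t pat)
{
    for (unsigned i = 0; i < 8; i++) {
        unsigned shift = i * 8;
        if (((pat >> shift) & 0xFFu) == MT_UC)
            pat |= (uint64_t)MT_UC_MINUS << shift; /* entry was 00h, so OR sets 07h */
    }
    return pat;
}

/* Step 2 (hypothetical helper): compute a variable-range MTRR pair
 * (MTRRphysBase/MTRRphysMask) that marks [base, base + size) as WC.
 * Assumed: size is a power of two of at least 4 Kbytes, base is aligned
 * to size, and phys_bits (< 64) is the physical-address width reported
 * by CPUID Fn8000_0008 EAX[7:0]. Returns false if the range does not fit. */
static bool mtrr_wc_pair(uint64_t base, uint64_t size, unsigned phys_bits,
                         uint64_t *phys_base, uint64_t *phys_mask)
{
    if (size < 0x1000 || (size & (size - 1)) != 0 || (base & (size - 1)) != 0)
        return false;
    uint64_t addr_bits = ((uint64_t)1 << phys_bits) - 1;
    if ((base & ~addr_bits) != 0)
        return false; /* base lies beyond the physical address width */
    *phys_base = (base & addr_bits & ~ADDR_LOW12) | MT_WC;
    *phys_mask = (~(size - 1) & addr_bits & ~ADDR_LOW12) | MTRR_VALID;
    return true;
}
```

For example, the power-on default PAT value 0007040600070406h becomes 0707040607070406h, and a 4-Kbyte FIFO at F000_0000h with a 40-bit physical address width gives MTRRphysBase = F000_0001h and MTRRphysMask = FF_FFFF_F800h.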

For more information, see Chapter 7, “Memory System,” in volume 2 of the AMD64 Architecture Programmer’s Manual, order# 24593.

Many graphics engines have a front-end command FIFO that requires the render command to be issued first, followed by a variable number of doublewords, depending on the render command.

Create a cache-aligned command structure in cacheable memory, map the rendering command into the lowest doubleword of the structure (which will be issued first), map the next data required in the command into the next structure element, and so on, until all the data “registers” for this command are included in the structure. An example is given in Figure 11.

Top of cache line (byte 3Fh)
...
Doubleword 2 (8h): Parameter 2
Doubleword 1 (4h): Parameter 1
Doubleword 0 (0h): Render command 1

Figure 11. Cacheable-Memory Command Structure
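A C declaration of a structure like the one in Figure 11 might look as follows. This is a sketch assuming a 64-byte cache line holding one 32-bit command doubleword followed by up to 15 parameter doublewords; the type and function names are hypothetical, and a real command layout comes from the graphics engine's own documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shadow command structure in cacheable memory: the render
 * command sits in the lowest doubleword so that it is issued first. */
typedef struct {
    _Alignas(64) uint32_t command; /* doubleword 0, offset 0h */
    uint32_t param[15];            /* doublewords 1..15, offsets 4h..3Ch */
} render_cmd_line;

_Static_assert(sizeof(render_cmd_line) == 64, "structure fills one cache line");
_Static_assert(offsetof(render_cmd_line, param) == 4, "parameters follow the command");

/* Fill the shadow structure, lowest doubleword first. Parameters beyond
 * the 15 that fit in the line are ignored. Returns the doublewords used. */
static size_t build_render_cmd(render_cmd_line *line, uint32_t command,
                               const uint32_t *params, size_t nparams)
{
    size_t n = nparams < 15 ? nparams : 15;
    line->command = command;
    for (size_t i = 0; i < n; i++)
        line->param[i] = params[i];
    return 1 + n;
}
```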

When the command (or commands) have been written into the shadow structure, use a high-speed copy routine, such as the one shown in Listing 31 on page 348, to copy the structure to the graphics accelerator's write-combining FIFO address space. Locating the write-combining command FIFO at a cache-aligned address is slightly better, because one HyperTransport link-size write then occurs instead of two.
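A portable C stand-in for the copy step is sketched below; it is not the optimized routine from Listing 31. It assumes the FIFO has already been mapped write-combining and that the copy length is a multiple of 8 bytes, and it issues quadword stores in ascending address order so that the command doubleword leaves the write buffers first. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a shadow command structure to the WC FIFO as
 * quadword stores in ascending address order. The volatile destination
 * keeps the compiler from merging, reordering, or eliding the stores. */
static void copy_to_wc_fifo(volatile uint64_t *fifo, const void *shadow, size_t nbytes)
{
    const unsigned char *src = shadow;
    for (size_t off = 0; off + 8 <= nbytes; off += 8) {
        uint64_t q;
        memcpy(&q, src + off, sizeof q); /* aliasing-safe quadword load */
        fifo[off / 8] = q;               /* low quadword first */
    }
}
```

On real hardware, an SFENCE instruction (for example, the `_mm_sfence` intrinsic) after the copy forces any partially filled write-combining buffers out before later stores that depend on the command having been delivered.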

Appendix D

AGP Considerations

