Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

Table 12. Write-Combining Completion Events (Continued)

Event            | Comment
WT Nonsequential | If a subsequent WT write is not in ascending sequential order, the write-combining completes. WC writes have no addressing constraints within the 64-byte line being combined.
TLB AD bit set   | Write-combining is closed whenever a TLB reload sets the accessed [A] or dirty [D] bits of a PDE or PTE.

B.4 Sending Write-Buffer Data to the System

The maximum write-combined throughput is achieved when all quadwords or doublewords in the buffer are valid, so that the AMD Athlon 64 and AMD Opteron processors can use one efficient 64-byte memory write instead of multiple 8-byte memory writes.

B.5 Write-Combining Optimization on Revision D and E AMD Athlon™ 64 and AMD Opteron™ Processors

The number of write-combining buffers on revision D and revision E AMD Athlon 64 and AMD Opteron processors differs from that of earlier CPU revisions. Although the number of buffers available for write-combining depends on the specific CPU revision, current designs provide as many as four write buffers for WC memory-mapped I/O address spaces. These same buffers are also used by streaming store instructions. The number of write buffers determines how many independent, linear 64-byte streams of WC data the CPU can buffer simultaneously.

Having multiple write-combining buffers that can combine independent WC streams affects data throughput (bandwidth), especially when the CPU writes data to WC memory-mapped I/O devices residing on the AGP, PCI, PCI-X, and PCI Express buses. Such targets include:

Memory-mapped I/O registers: command FIFO, etc.

Memory-mapped I/O apertures: windows through which the CPU uses programmed I/O to send data to a hardware device

Sequential blocks of 2D/3D graphics engine registers written using programmed I/O

Video memory residing on the graphics accelerator: frame buffer, render buffers, textures, etc.

HyperTransport tunnels are HyperTransport-to-bus bridges. There are tunnels for AGP, PCI Express, PCI and PCI-X. Examples of tunnels are the AMD-8151™ graphics tunnel, the AMD-8131™ I/O bus tunnel, and the AMD-8132™ PCI-X tunnel. Many HyperTransport tunnels use a hardware optimization feature called write-chaining. In write-chaining, the tunnel device buffers and combines separate HyperTransport packets of data sent by the CPU, creating one large burst on the underlying bus when the data is received by the tunnel in sequential address order. Using larger bursts results in


Implementation of Write-Combining

Appendix B
