25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

byte boundary in a quadword, but is closed by a write that is either not “contiguous and ascending” or fills byte 7.

All other memory types for stores that go through the write buffer (UC and WP) cannot be combined, except when the WB memory type is overridden for streaming store instructions such as MOVNTQ and MOVNTI. These instructions use the write buffers and are write-combined in the same way as address spaces mapped by the MTRR registers and PAT extensions. When WC is used for streaming store instructions, the buffers are subject to the same flushing events as write-combined address spaces.
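As an illustration of the streaming-store case, the following minimal C sketch fills whole 64-byte lines with contiguous, ascending quadword stores. The function name fill_lines_streaming is hypothetical. The intrinsic _mm_stream_si64 is assumed to compile to MOVNTI on x86-64, and the closing SFENCE is included on the assumption that it drains any partially combined buffer before later stores. On other targets the sketch falls back to ordinary stores so that it still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>              /* _mm_stream_si64, _mm_sfence */
#define HAVE_STREAMING_STORES 1
#else
#define HAVE_STREAMING_STORES 0     /* fallback: plain stores, no combining hint */
#endif

enum { WC_LINE_BYTES = 64 };        /* size of one write-combining buffer */

/* Hypothetical helper: fill nbytes at dst with the quadword `value`.
 * dst must be 64-byte aligned and nbytes a multiple of 64, so every
 * buffer is filled completely by ascending, contiguous writes. */
static void fill_lines_streaming(void *dst, uint64_t value, size_t nbytes)
{
    assert(((uintptr_t)dst % WC_LINE_BYTES) == 0);
    assert(nbytes % WC_LINE_BYTES == 0);

    long long *p = (long long *)dst;
    size_t quadwords = nbytes / sizeof(long long);

    for (size_t i = 0; i < quadwords; ++i) {
#if HAVE_STREAMING_STORES
        _mm_stream_si64(&p[i], (long long)value);   /* non-temporal store */
#else
        p[i] = (long long)value;
#endif
    }

#if HAVE_STREAMING_STORES
    _mm_sfence();   /* order the streamed data before any later stores */
#endif
}
```

Writing each line in full and in ascending order is a design choice intended to let a buffer close because it is full rather than because of an unrelated completion event.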

Combining continues until it is interrupted by one of the conditions listed in Table 12 on page 265. When combining is interrupted, one or more bus commands are issued to the system for that write buffer, as described in “Sending Write-Buffer Data to the System” on page 266.

Table 12. Write-Combining Completion Events

| Event | Comment |
|---|---|
| Non-WB write outside of current buffer | (On revisions A–C processors only) The first non-WB write to a different cache block address closes combining for previous writes. WB writes do not affect write-combining. Only one line-sized buffer can be open for write-combining at a time. Once a buffer is closed for write-combining, it cannot be reopened for write-combining. |
| I/O Read or Write | Any IN/INS or OUT/OUTS instruction closes combining. The implied memory type for all IN/OUT instructions is UC, which cannot be combined. |
| Serializing instructions | Any serializing instruction closes combining. These instructions include: MOVCRx, MOVDRx, WRMSR, INVD, INVLPG, WBINVD, LGDT, LLDT, LIDT, LTR, CPUID, IRET, RSM, INIT, and HALT. |
| Flushing instructions | Any flush instruction causes the WC to complete. |
| Locks | Any instruction or processor operation that requires a cache or bus lock closes write-combining before starting the lock. Writes within a lock can be combined. |
| Uncacheable Read | A UC read closes write-combining. A WC read closes combining only if a cache block address match occurs between the WC read and a write in the write buffer. |
| Different memory type | Any WT write while write-combining for WC memory or any WC write while write-combining for WT memory closes write-combining. |
| Buffer full | Write-combining is closed if all 64 bytes of the write buffer are valid. |
| WT time-out | If 16 processor clocks have passed since the most recent write for WT write-combining, write-combining is closed. There is no time-out for WC write-combining. |
| WT write fills byte 7 | Write-combining is closed if a write fills the most significant byte of a quadword, which includes writes that are misaligned across a quadword boundary. In the misaligned case, combining is closed by the LS part of the misaligned write and combining is opened by the MS part of the misaligned store. |
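The WT-specific rows of Table 12 can be explored with a toy software model. The sketch below is not a description of the hardware. All type and function names are hypothetical, the order in which the conditions are checked is an assumption, and "contiguous and ascending" is interpreted as "starts at the byte immediately after the previous write". A write that crosses a quadword boundary is assumed to have been split into its LS and MS parts before it reaches the model.

```c
#include <assert.h>
#include <stdint.h>

enum { LINE_BYTES = 64, WT_TIMEOUT_CLOCKS = 16 };

typedef enum {
    WC_CONTINUE,            /* write merged; combining stays open          */
    WC_CLOSE_OTHER_LINE,    /* write targets a different 64-byte block     */
    WC_CLOSE_TIMEOUT,       /* 16 or more clocks since the last WT write   */
    WC_CLOSE_NOT_ASCENDING, /* write is not contiguous and ascending       */
    WC_CLOSE_BYTE7          /* write merged and filled byte 7 of a quadword */
} wc_event;

typedef struct {
    uint64_t line_addr;     /* 64-byte-aligned address of the open line    */
    uint64_t valid;         /* one bit per valid byte in the line          */
    uint64_t next_addr;     /* address just past the most recent write     */
    uint64_t last_clock;    /* clock of the most recent merged write       */
} wt_buffer;

/* Classify one WT write of len bytes (1..8, within a single quadword)
 * arriving at time `clock`, and merge it when combining may continue. */
static wc_event wt_write(wt_buffer *b, uint64_t addr, unsigned len,
                         uint64_t clock)
{
    assert(len >= 1 && len <= 8);
    assert((addr & 7) + len <= 8);      /* caller splits misaligned writes */

    if ((addr & ~(uint64_t)(LINE_BYTES - 1)) != b->line_addr)
        return WC_CLOSE_OTHER_LINE;
    if (b->valid != 0 && clock - b->last_clock >= WT_TIMEOUT_CLOCKS)
        return WC_CLOSE_TIMEOUT;
    if (b->valid != 0 && addr != b->next_addr)
        return WC_CLOSE_NOT_ASCENDING;

    /* Merge the bytes into the buffer's valid mask. */
    b->valid |= ((1ULL << len) - 1) << (addr & (LINE_BYTES - 1));
    b->next_addr = addr + len;
    b->last_clock = clock;

    /* Filling the most significant byte of the quadword closes combining. */
    if (((addr + len - 1) & 7) == 7)
        return WC_CLOSE_BYTE7;
    return WC_CONTINUE;
}
```

In this model, a 4-byte write at offset 0 followed by a 2-byte write at offset 4 keeps combining open, and a further 2-byte write at offset 6 fills byte 7 and closes it.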

 

 

Appendix B

Implementation of Write-Combining

265
