Software Optimization Guide for AMD64 Processors | 25112 Rev. 3.06 September 2005 |
Note: PREFETCHNTA should NOT be used for large arrays that are only being written, not read. In such cases,
Current AMD Athlon 64 and AMD Opteron processors implement the PREFETCHT0, PREFETCHT1 and PREFETCHT2 instructions in exactly the same way as the PREFETCH instructions. That is, the data is brought into the L1 data cache. This functionality could be changed in future implementations.
PREFETCHW versus PREFETCH
Code that intends to modify the cache line that is brought in through prefetching should use the PREFETCHW instruction. PREFETCHW gives a hint to the AMD Athlon 64 and AMD Opteron processors of an intent to modify the cache line. The AMD Athlon 64 and AMD Opteron processors mark the cache line being read by PREFETCHW as modified. Using PREFETCHW can save additional cycles compared to PREFETCH, and avoid the subsequent cache state change caused by a write to the prefetched cache line. Only use PREFETCHW if there is a write to the same cache line afterwards.
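The distinction between the two hints can be sketched in C. This is a minimal illustration, not code from this guide. It assumes a GCC- or Clang-compatible compiler, whose `__builtin_prefetch(addr, rw, locality)` builtin may be lowered to PREFETCHW when `rw` is 1 and the target supports that instruction, and to an ordinary prefetch otherwise. The function names, the 64-byte line size, and the prefetch distance of four lines are assumptions chosen for illustration and should be tuned for the actual workload.

```c
#include <stddef.h>

/* Assumed values for illustration only; tune for the target platform. */
#define CACHE_LINE_BYTES        64
#define PREFETCH_DISTANCE_LINES 4

/* Hypothetical helper: every element is written, so the lines ahead are
 * requested with write intent (rw = 1). */
void increment_in_place(int *data, size_t count)
{
    const size_t per_line = CACHE_LINE_BYTES / sizeof(int);
    const size_t ahead    = PREFETCH_DISTANCE_LINES * per_line;

    for (size_t i = 0; i < count; ++i) {
        /* Issue one hint per cache line, a few lines ahead of the store. */
        if (i % per_line == 0 && i + ahead < count) {
            __builtin_prefetch(&data[i + ahead], 1 /* write */, 3 /* keep in cache */);
        }
        data[i] += 1;
    }
}

/* Hypothetical helper: the data is only read, so a plain read prefetch
 * (rw = 0) is used and the line is not requested with write intent. */
long sum_read_only(const int *data, size_t count)
{
    const size_t per_line = CACHE_LINE_BYTES / sizeof(int);
    const size_t ahead    = PREFETCH_DISTANCE_LINES * per_line;
    long total = 0;

    for (size_t i = 0; i < count; ++i) {
        if (i % per_line == 0 && i + ahead < count) {
            __builtin_prefetch(&data[i + ahead], 0 /* read */, 3 /* keep in cache */);
        }
        total += data[i];
    }
    return total;
}
```

Both loops return the same results with or without the hints; only the cache behavior differs. The write-intent hint is used solely in the loop that stores to every prefetched line, in keeping with the rule above.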
Write-Combining Usage
Use
• The code will overwrite one or more complete cache lines with new data.
• The new data will not be used again soon.
One application where
Note: The
Note: For best performance, do not mix
106 | Cache and Memory Optimizations | Chapter 5 |