25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

iteration. However, unrolling the loop may not be feasible if the original loop stride is very small (for example, only two bytes).

Prefetch at Least 64 Bytes Away from Surrounding Stores

Prefetch instructions can be affected by false dependencies on stores. If a store targets an address that matches a prefetch request, the prefetch instruction may be blocked until the store is written to the cache. Therefore, code should prefetch data that is located at least 64 bytes away from the data address of any surrounding store.
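
As an illustration of the 64-byte rule, the following is a minimal sketch in C, not code from this guide. It assumes a GCC- or Clang-compatible compiler that provides `__builtin_prefetch`; the function name `scale_in_place` and the constants `CACHE_LINE_BYTES` and `PREFETCH_AHEAD_BYTES` are hypothetical, and the 256-byte prefetch distance is an untuned placeholder. The loop stores to `data[i]` and prefetches an address well over 64 bytes beyond that store.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, matching the 64-byte rule above. */
#define CACHE_LINE_BYTES 64

/* Hypothetical, untuned prefetch distance. It must be at least 64 bytes
   so that the prefetch address stays clear of the current store. */
#define PREFETCH_AHEAD_BYTES (4 * CACHE_LINE_BYTES)

/* Multiply every element of data[0..n-1] by factor, in place. */
void scale_in_place(float *data, size_t n, float factor)
{
    const size_t per_line = CACHE_LINE_BYTES / sizeof(float);
    const size_t ahead = PREFETCH_AHEAD_BYTES / sizeof(float);

    assert(PREFETCH_AHEAD_BYTES >= 64);
    assert(data != NULL || n == 0);

    for (size_t i = 0; i < n; ++i) {
        /* Issue one prefetch per cache line's worth of elements, and only
           while the target is still inside the array. The target lies
           PREFETCH_AHEAD_BYTES beyond the store to data[i] below. */
        if (i % per_line == 0 && i + ahead < n) {
            /* Second argument 1 = prefetch in anticipation of a write;
               third argument 3 = high temporal locality. */
            __builtin_prefetch(&data[i + ahead], 1, 3);
        }
        data[i] *= factor;
    }
}
```

Prefetching only a short distance ahead, for example `&data[i + 1]`, would place the prefetch address within 64 bytes of the store to `data[i]`, which is the situation the rule above advises against.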

Chapter 5

Cache and Memory Optimizations

