22007E/0 — November 1999

Determining Prefetch Distance

Prefetch at Least 64 Bytes Away from Surrounding Stores

AMD Athlon™ Processor x86 Code Optimization

The following optimization rules were applied to this example.

Loops should be unrolled to make sure that the data stride per loop iteration is equal to the length of a cache line. This avoids overlapping PREFETCH instructions and thus makes optimal use of the available number of outstanding PREFETCHes.

Since the array "array_a" is written rather than read, PREFETCHW is used instead of PREFETCH to avoid overhead for switching cache lines to the correct MESI state. The PREFETCH lookahead has been optimized such that each loop iteration is working on three cache lines while six active PREFETCHes bring in the next six cache lines.

Index arithmetic has been reduced to a minimum by use of complex addressing modes and biasing of the array base addresses in order to cut down on loop overhead.
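The rules above can be sketched in C. This is only an illustration, not the example the rules refer to: the multiply kernel, the names array_b and array_c, and the two-line lookahead are assumptions, and __builtin_prefetch is a GCC/Clang extension standing in for the PREFETCH and PREFETCHW instructions (its second argument of 1 requests a prefetch for writing). The sketch also assumes 8-byte doubles, arrays aligned to 64 bytes, and an element count that is a multiple of eight.

```c
#include <stddef.h>

#define CACHE_LINE_BYTES 64
#define DOUBLES_PER_LINE (CACHE_LINE_BYTES / sizeof(double)) /* 8 with 8-byte doubles */
#define LOOKAHEAD_LINES  2  /* assumed: 2 lines ahead x 3 arrays = 6 prefetches in flight */

/* Hypothetical kernel: array_a[i] = array_b[i] * array_c[i].
   n is assumed to be a multiple of DOUBLES_PER_LINE. */
static void multiply_arrays(double *array_a, const double *array_b,
                            const double *array_c, size_t n)
{
    const size_t ahead = LOOKAHEAD_LINES * DOUBLES_PER_LINE;

    /* Unrolled by eight, so each iteration advances every array
       by exactly one 64-byte cache line. */
    for (size_t i = 0; i < n; i += DOUBLES_PER_LINE) {
        if (i + ahead < n) {
            /* array_a is only written, so request it for writing. */
            __builtin_prefetch(&array_a[i + ahead], 1);
            __builtin_prefetch(&array_b[i + ahead], 0);
            __builtin_prefetch(&array_c[i + ahead], 0);
        }
        array_a[i + 0] = array_b[i + 0] * array_c[i + 0];
        array_a[i + 1] = array_b[i + 1] * array_c[i + 1];
        array_a[i + 2] = array_b[i + 2] * array_c[i + 2];
        array_a[i + 3] = array_b[i + 3] * array_c[i + 3];
        array_a[i + 4] = array_b[i + 4] * array_c[i + 4];
        array_a[i + 5] = array_b[i + 5] * array_c[i + 5];
        array_a[i + 6] = array_b[i + 6] * array_c[i + 6];
        array_a[i + 7] = array_b[i + 7] * array_c[i + 7];
    }
}
```

Because one prefetch per array is issued per cache-line-sized iteration, no two prefetches target the same line. The bounds check keeps the address expression inside the arrays; a hand-tuned assembly version would typically drop it and rely on biased base addresses instead.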

Given the latency of a typical AMD Athlon processor system and expected processor speeds, the following formula should be used to determine the prefetch distance in bytes for a single array:

Prefetch Distance = 200 × (DS/C) bytes

Round up to the nearest 64-byte cache line.

The number 200 is a constant based upon expected AMD Athlon processor clock frequencies and typical system memory latencies.

DS is the data stride in bytes per loop iteration.

C is the number of cycles for one loop to execute entirely from the L1 cache.

Prefetch distances for multiple arrays are typically even longer.
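As a worked illustration of the single-array formula, the following C helper computes 200 × (DS/C) and rounds the result up to a whole 64-byte cache line. The function name and the sample cycle counts are hypothetical; only the constant 200, the 64-byte line size, and the meanings of DS and C come from the text.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64u
#define LATENCY_CONSTANT 200u  /* constant from the formula above */

/* Prefetch distance in bytes for a single array.
   ds_bytes:        DS, the data stride in bytes per loop iteration
   cycles_per_iter: C, cycles for one iteration running from the L1 cache */
static size_t prefetch_distance(size_t ds_bytes, size_t cycles_per_iter)
{
    assert(cycles_per_iter > 0);

    /* ceil(200 * DS / C) in integer arithmetic */
    size_t raw = (LATENCY_CONSTANT * ds_bytes + cycles_per_iter - 1)
                 / cycles_per_iter;

    /* Round up to a multiple of the 64-byte cache line. */
    return (raw + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES * CACHE_LINE_BYTES;
}
```

For instance, with DS = 64 bytes and an assumed C = 20 cycles, the formula gives 200 × 64/20 = 640 bytes, which is already a multiple of 64. With an assumed C = 30 cycles it gives about 426.7 bytes, which rounds up to 448 bytes (seven cache lines).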

The PREFETCH and PREFETCHW instructions can be affected by false dependencies on stores. If there is a store to an address that matches a request, that request (the PREFETCH or PREFETCHW instruction) may be blocked until the store is written to the cache. Therefore, code should prefetch data that is located at least 64 bytes away from any surrounding store’s data address.
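One way to apply this rule while tuning a loop is a small debug-time check on the addresses involved. The helper below is a hypothetical sketch, not part of any AMD tool or library: it simply reports whether a prefetch address lies at least 64 bytes from a store address, in either direction.

```c
#include <stdint.h>

#define MIN_STORE_GAP_BYTES 64u

/* Returns 1 if the prefetch address is at least 64 bytes away from the
   store address (before or after it), 0 otherwise. */
static int prefetch_clear_of_store(const void *prefetch_addr,
                                   const void *store_addr)
{
    uintptr_t p = (uintptr_t)prefetch_addr;
    uintptr_t s = (uintptr_t)store_addr;
    uintptr_t gap = (p > s) ? (p - s) : (s - p);

    return gap >= MIN_STORE_GAP_BYTES;
}
```

In a loop that stores to element i of an array and prefetches the same array at a fixed byte offset ahead, the check reduces to confirming that the offset is 64 bytes or more.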

Use the 3DNow!™ PREFETCH and PREFETCHW Instructions

