AMD x86 manual Example Multiple Prefetches, Code


22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

PREFETCH/W versus PREFETCHNTA/T0/T1/T2

The PREFETCHNTA/T0/T1/T2 instructions in the MMX extensions are processor implementation dependent. To maintain compatibility with the 25 million AMD-K6®-2 and AMD-K6-III processors already sold, use the 3DNow! PREFETCH/W instructions instead of the various prefetch flavors in the new MMX extensions.

PREFETCHW Usage

Code that intends to modify the cache line brought in through prefetching should use the PREFETCHW instruction. While PREFETCHW works the same as a PREFETCH on the AMD-K6-2 and AMD-K6-III processors, PREFETCHW gives a hint to the AMD Athlon processor of an intent to modify the cache line. The AMD Athlon processor will mark the cache line being brought in by PREFETCHW as Modified. Using PREFETCHW can save an additional 15-25 cycles compared to a PREFETCH and the subsequent cache state change caused by a write to the prefetched cache line.
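The write-intent hint described above can be sketched at the C level. This is only an illustration, not the manual's own code: it assumes a GCC- or Clang-compatible compiler, whose `__builtin_prefetch(addr, rw, locality)` builtin takes `rw = 1` to signal an intended write. Whether the compiler actually emits PREFETCHW depends on the target options. The function name `scale_in_place`, the 64-byte line size, and the prefetch distance are assumptions chosen for the example.

```c
#include <stddef.h>

enum {
    DOUBLES_PER_LINE  = 8,   /* assumed 64-byte cache line / 8-byte double */
    PREFETCH_DISTANCE = 64   /* assumed distance in elements; tune by measurement */
};

/* Hypothetical helper: multiplies every element of data by factor, in place.
 * Each element is written, so later lines are requested with write intent. */
void scale_in_place(double *data, size_t n, double factor)
{
    for (size_t i = 0; i < n; i++) {
        /* Roughly once per cache line, hint that a later line will be modified. */
        if (i % DOUBLES_PER_LINE == 0 && i + PREFETCH_DISTANCE < n) {
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 1, 3);
        }
        data[i] *= factor;
    }
}
```

A prefetch is only a hint, so the result is the same with or without it; only the timing may differ.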

Multiple Prefetches

Programmers can initiate multiple outstanding prefetches on the AMD Athlon processor. While the AMD-K6-2 and AMD-K6-III processors can have only one outstanding prefetch, the AMD Athlon processor can have up to six outstanding prefetches. When all six buffers are filled by various memory read requests, the processor will simply ignore any new prefetch requests until a buffer frees up. Multiple prefetch requests are essentially handled in order, so issue prefetches in the order in which the data is needed: data that is needed first should be prefetched first.
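As a rough C-level illustration of issuing several prefetches per iteration, the sketch below walks three arrays and requests a later cache line from each. It is not the manual's assembly listing. It assumes a GCC- or Clang-compatible compiler (`__builtin_prefetch`), a 64-byte cache line, and a hypothetical function name `multiply_arrays`. Three streams at an assumed distance of two lines correspond loosely to the six outstanding prefetches mentioned above, but the right distance has to be found by measurement.

```c
#include <stddef.h>

enum {
    DOUBLES_PER_LINE = 8,  /* assumed 64-byte cache line / 8-byte double */
    LINES_AHEAD      = 2   /* assumed prefetch distance, in cache lines */
};

/* Hypothetical helper: computes a[i] = b[i] * c[i] for i in [0, n). */
void multiply_arrays(double *a, const double *b, const double *c, size_t n)
{
    const size_t ahead = (size_t)DOUBLES_PER_LINE * LINES_AHEAD;

    for (size_t i = 0; i < n; i++) {
        /* Roughly once per cache line, request one later line per array. */
        if (i % DOUBLES_PER_LINE == 0 && i + ahead < n) {
            /* The inputs are read before the result is stored,
             * so they are requested first. */
            __builtin_prefetch(&b[i + ahead], 0, 3);
            __builtin_prefetch(&c[i + ahead], 0, 3);
            /* The output line will be written: request it with write intent. */
            __builtin_prefetch(&a[i + ahead], 1, 3);
        }
        a[i] = b[i] * c[i];
    }
}
```

The two read prefetches come before the write prefetch because requests are handled essentially in order and the multiply needs its operands before the store needs its destination line.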

The example below shows how to initiate multiple prefetches when traversing more than one array.

Example (Multiple Prefetches):

.CODE
.K3D
;original C code
;
;#define LARGE_NUM 65536
;
;double array_a[LARGE_NUM];
;double array_b[LARGE_NUM];
;double array_c[LARGE_NUM];
;int i;
;
;for (i = 0; i < LARGE_NUM; i++) {
;    array_a[i] = array_b[i] * array_c[i];
;}

Use the 3DNow!™ PREFETCH and PREFETCHW Instructions
