Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

Definitions

Unit-stride access refers to a memory access pattern where consecutive memory accesses are made to consecutive array elements, in ascending or descending order. If the array elements are of elementary types, the accesses are also to adjacent memory locations. For example:

char j, k[MAX];

for (i = 0; i < MAX; i++) {
    ...
    j += k[i];    // Every byte is used.
    ...
}

 

 

double x, y[MAX];

for (i = 0; i < MAX; i++) {
    ...
    x += y[i];    // Every byte is used.
    ...
}
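As a rough illustration of the definition, the following C sketch walks an array of doubles in ascending and in descending order and measures the byte distance between two consecutive accesses. The helper names (byte_stride, sum_ascending, sum_descending) are hypothetical and are not taken from this guide.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance between the addresses touched by two consecutive accesses. */
static ptrdiff_t byte_stride(const void *first, const void *second)
{
    return (const char *)second - (const char *)first;
}

/* Ascending unit-stride walk: y[0], y[1], ..., y[n-1]. */
static double sum_ascending(const double *y, size_t n)
{
    assert(y != NULL || n == 0);
    double x = 0.0;
    for (size_t i = 0; i < n; i++) {
        x += y[i];              /* every byte of the array is used */
    }
    return x;
}

/* Descending unit-stride walk: y[n-1], y[n-2], ..., y[0]. */
static double sum_descending(const double *y, size_t n)
{
    assert(y != NULL || n == 0);
    double x = 0.0;
    for (size_t i = n; i-- > 0; ) {
        x += y[i];              /* same bytes, visited in reverse order */
    }
    return x;
}
```

For an array of char, byte_stride(&k[0], &k[1]) is 1; for an array of double it is sizeof(double). In both cases the accesses are to adjacent memory locations, which is what makes the pattern unit stride.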

 

 

Exception to Unit Stride

The unit-stride concept works well when stepping through arrays of elementary data types. In some instances, unit stride alone may not be sufficient to determine how to use the PREFETCH instruction properly. For example, assume that there is a vertex structure of 256 bytes and the code steps through the vertices in unit stride but uses only the x, y, z, and w components, each of type float (that is, the first 16 bytes of each vertex). In this case, the prefetch distance should be some function of the size of the data structure (for a properly chosen n):

prefetch [eax+n*structure_size]
...
add      eax, structure_size

You should experiment to find the optimal prefetch distance; there is no formula that works for all situations.
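The same idea can be sketched in C. This is a minimal sketch under stated assumptions, not the guide's own code: the Vertex layout, the function name sum_xyzw, and the starting distance PREFETCH_AHEAD (standing in for n) are all hypothetical, and __builtin_prefetch is a GCC/Clang builtin rather than anything defined by this guide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 256-byte vertex; the loop reads only the first 16 bytes. */
typedef struct {
    float x, y, z, w;            /* the 16 bytes actually used */
    unsigned char other[240];    /* remaining attributes, never touched here */
} Vertex;

_Static_assert(sizeof(Vertex) == 256, "Vertex is expected to be 256 bytes");

/* Assumed starting value for n; tune it by measurement. */
enum { PREFETCH_AHEAD = 4 };

static float sum_xyzw(const Vertex *v, size_t count)
{
    assert(v != NULL || count == 0);
    float total = 0.0f;
    for (size_t i = 0; i < count; i++) {
        /* Prefetch distance is n * sizeof(Vertex) bytes, not n cache lines. */
        if (i + PREFETCH_AHEAD < count) {
            __builtin_prefetch(&v[i + PREFETCH_AHEAD], 0, 3);
        }
        total += v[i].x + v[i].y + v[i].z + v[i].w;
    }
    return total;
}
```

Because each iteration advances by a whole structure, only one prefetch per vertex is issued here, aimed at the 16 bytes the loop will read. Timing the loop with several values of PREFETCH_AHEAD is one way to carry out the experiment described above.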

Data Stride per Loop Iteration

Assuming unit-stride access to a single array, the data stride of a loop (the loop stride) refers to the number of bytes accessed in the array per loop iteration. For example:

      fldz
add_loop:
      fadd QWORD PTR [ebx*8+base_address]
      dec  ebx
      jnz  add_loop

The data stride of the above loop is eight bytes. In general, for optimal use of prefetching, the data stride per iteration should equal the length of a cache line (64 bytes in the AMD Athlon 64 and AMD Opteron processors). If the loop stride is smaller, unroll the loop enough to use a whole cache line of data per


Cache and Memory Optimizations

Chapter 5
