25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

Avoid an assembly-language equivalent like this, which uses base and displacement components (for example, [esi+a]) to compute array-element addresses, requiring additional pointer arithmetic to increment the offsets into the forward-traversed arrays:

mov ecx, MAXSIZE      ; Initialize loop counter.
xor esi, esi          ; Initialize offset into array a.
xor edi, edi          ; Initialize offset into array b.
xor ebx, ebx          ; Initialize offset into array c.
add_loop:
mov eax, [esi+a]      ; Get element from a.
mov edx, [edi+b]      ; Get element from b.
add eax, edx          ; a[i] + b[i]
mov [ebx+c], eax      ; Write result to c.
add esi, 4            ; Increment offset into a.
add edi, 4            ; Increment offset into b.
add ebx, 4            ; Increment offset into c.
dec ecx               ; Decrement loop count
jnz add_loop          ; until loop count is 0.

Instead, traverse the arrays in a downward direction (from higher to lower addresses) to take advantage of scaled-index addressing (for example, [ecx*4+a]). A single index register then addresses all three arrays and also serves as the loop counter, which minimizes pointer arithmetic within the loop:

mov ecx, MAXSIZE - 1  ; Initialize index.
add_loop:
mov eax, [ecx*4+a]    ; Get element from a.
mov edx, [ecx*4+b]    ; Get element from b.
add eax, edx          ; a[i] + b[i]
mov [ecx*4+c], eax    ; Write result to c.
dec ecx               ; Decrement index
jns add_loop          ; until index is negative.
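
For readers working at the source level, the downward traversal above can be sketched in C as follows. This is an illustrative sketch only: the function name add_arrays_down and its parameters are assumptions and do not appear in this guide, and whether a given compiler emits scaled-index addressing for this loop depends on the compiler and its settings.

```c
/* Hypothetical helper (name and signature are assumptions for illustration):
 * element-wise sum of a and b into c, traversing from the highest index
 * down to 0. */
void add_arrays_down(const int *a, const int *b, int *c, int n)
{
    /* One signed index serves all three arrays. The loop ends when the
     * index goes negative, mirroring the dec/jns pair in the listing. */
    for (int i = n - 1; i >= 0; i--) {
        c[i] = a[i] + b[i];
    }
}
```

The index must be a signed type here; with an unsigned index the test i >= 0 would always be true and the loop would not terminate correctly.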

A change in the direction of traversal is possible only if each loop iteration is completely independent of the others. If you cannot change the direction of traversal for a given array, you can still minimize pointer arithmetic: use as the base address a displacement that points to the byte just past the end of the array, and use an index that starts at a negative value and reaches zero when the loop terminates:

mov ecx, (-MAXSIZE)             ; Initialize index.
add_loop:
mov eax, [ecx*4+a+MAXSIZE*4]    ; Get element from a.
mov edx, [ecx*4+b+MAXSIZE*4]    ; Get element from b.
add eax, edx                    ; a[i] + b[i]
mov [ecx*4+c+MAXSIZE*4], eax    ; Write result to c.
inc ecx                         ; Increment index
jnz add_loop                    ; until index is 0.
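
The negative-index technique can likewise be sketched in C. The names add_arrays_neg_index, a_end, b_end, and c_end are assumptions introduced for illustration; they are not part of this guide. The sketch only shows the indexing scheme (forward traversal with an index that counts up from -n to 0), not a guaranteed instruction sequence.

```c
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions for illustration):
 * element-wise sum of a and b into c, traversing forward (low to high
 * addresses) with a negative index that counts up to zero. */
void add_arrays_neg_index(const int *a, const int *b, int *c, ptrdiff_t n)
{
    /* Base pointers one element past the end of each array, playing the
     * role of the a+MAXSIZE*4 displacement in the listing. */
    const int *a_end = a + n;
    const int *b_end = b + n;
    int *c_end = c + n;

    /* i runs from -n up to -1, so x_end[i] visits x[0] through x[n-1].
     * The loop ends when i reaches 0, mirroring the inc/jnz pair. */
    for (ptrdiff_t i = -n; i != 0; i++) {
        c_end[i] = a_end[i] + b_end[i];
    }
}
```

Because the first access is a_end[-n], which is a[0], the arrays are still read and written in ascending address order, so loops whose iterations depend on earlier ones keep their original ordering.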

Chapter 7

Scheduling Optimizations
