Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

Example

Avoid code that places a load whose address takes longer to calculate before a load whose address can be determined more quickly:

add ebx, ecx                  ; Instruction 1
mov eax, DWORD PTR [10h]      ; Instruction 2 (fast address calc.)
mov ecx, DWORD PTR [eax+ebx]  ; Instruction 3 (slow address calc.)
mov edx, DWORD PTR [24h]      ; This load is stalled from accessing the
                              ; data cache due to the long latency
                              ; caused by generating the address for
                              ; instruction 3.

Where possible, reorder instructions so that loads with simpler address calculations come before those with more complex address calculations:

add ebx, ecx                  ; Instruction 1
mov eax, DWORD PTR [10h]      ; Instruction 2
mov edx, DWORD PTR [24h]      ; Place load above instruction 3 to avoid
                              ; address-generation interlock stall.
mov ecx, DWORD PTR [eax+ebx]  ; Instruction 3
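
The effect of the reordering can be illustrated with a small timing model. The sketch below is not a processor simulator. It assumes only that loads access the data cache in program order, so a load cannot access the cache before every older load has its address. The function name cache_access_cycles and the cycle counts (address ready at cycle 0 for the absolute-address loads, cycle 4 for the [eax+ebx] load) are hypothetical values chosen for illustration, not measured latencies.

```python
def cache_access_cycles(loads):
    """Return the earliest cycle at which each load can access the data cache.

    loads is a list of (name, address_ready_cycle) pairs in program order.
    Simplifying assumption: a load must wait until its own address is ready
    and until every older load has accessed the cache. Cache port limits and
    all other pipeline effects are ignored.
    """
    access = {}
    earliest = 0
    for name, address_ready in loads:
        # A younger load can never access the cache before an older one.
        earliest = max(earliest, address_ready)
        access[name] = earliest
    return access


# Hypothetical address-ready cycles: absolute addresses are ready at once,
# while [eax+ebx] must wait for eax (loaded from [10h]) and ebx (from the add).
original = [("load [10h]", 0), ("load [eax+ebx]", 4), ("load [24h]", 0)]
reordered = [("load [10h]", 0), ("load [24h]", 0), ("load [eax+ebx]", 4)]

original_cycles = cache_access_cycles(original)
reordered_cycles = cache_access_cycles(reordered)

print(original_cycles["load [24h]"])       # 4: stalled behind instruction 3
print(reordered_cycles["load [24h]"])      # 0: no longer waits
print(reordered_cycles["load [eax+ebx]"])  # 4: instruction 3 is not delayed
```

Under these assumed numbers, moving the [24h] load ahead of instruction 3 removes its stall without delaying instruction 3 itself, which is the point of the reordering.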


Scheduling Optimizations

Chapter 7
