
25112 Rev. 3.06 September 2005

Example

Avoid code that places a load whose address takes longer to calculate before a load whose address can be determined more quickly:

add ebx, ecx                    ; Instruction 1
mov eax, DWORD PTR [10h]        ; Instruction 2 (fast address calc.)
mov ecx, DWORD PTR [eax+ebx]    ; Instruction 3 (slow address calc.)
mov edx, DWORD PTR [24h]        ; This load is stalled from accessing the
                                ; data cache due to the long latency
                                ; caused by generating the address for
                                ; instruction 3.

Where possible, reorder instructions so that loads with simpler address calculations come before those with more complex address calculations:

add ebx, ecx                    ; Instruction 1
mov eax, DWORD PTR [10h]        ; Instruction 2
mov edx, DWORD PTR [24h]        ; Place load above instruction 3 to avoid
                                ; address-generation interlock stall.
mov ecx, DWORD PTR [eax+ebx]    ; Instruction 3
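
The effect of the reordering can be illustrated with a toy timing model. The sketch below is not taken from this guide: it assumes that loads access the data cache strictly in program order, at most one per cycle, and that each load cannot access the cache before its address is ready. The function name cache_access_cycles and the address-ready cycle numbers (0 for the absolute addresses, 5 for [eax+ebx]) are hypothetical values chosen only to show the shape of the stall, not measured latencies.

```python
def cache_access_cycles(loads):
    """Toy model of in-order data-cache access (an assumption, not a spec).

    loads: list of (name, address_ready_cycle) tuples in program order.
    Returns a dict mapping each load name to the cycle in which it
    accesses the data cache, given that a load must wait both for its own
    address and for every older load to have accessed the cache.
    """
    access = {}
    previous = -1  # cycle in which the previous (older) load accessed the cache
    for name, ready in loads:
        cycle = max(ready, previous + 1)
        access[name] = cycle
        previous = cycle
    return access


# Hypothetical address-ready cycles; [eax+ebx] must wait for eax and ebx.
original = [("load [10h]", 0), ("load [eax+ebx]", 5), ("load [24h]", 0)]
reordered = [("load [10h]", 0), ("load [24h]", 0), ("load [eax+ebx]", 5)]

if __name__ == "__main__":
    for label, order in (("original", original), ("reordered", reordered)):
        print(label, cache_access_cycles(order))
```

Under these assumed numbers, the load from [24h] accesses the cache in cycle 6 in the original order and in cycle 1 after reordering, while the load from [eax+ebx] accesses it in cycle 5 in both cases. The reordering removes the wait on the simple load without delaying the complex one.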

Chapter 7 Scheduling Optimizations