22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

Example 1 (Avoid):

ADD EBX, ECX                   ;inst 1
MOV EAX, DWORD PTR [10h]       ;inst 2 (fast address calc.)
MOV ECX, DWORD PTR [EAX+EBX]   ;inst 3 (slow address calc.)
MOV EDX, DWORD PTR [24h]       ;this load is stalled from
                               ; accessing data cache due
                               ; to long latency for
                               ; generating address for
                               ; inst 3

Example 2 (Preferred):

ADD EBX, ECX                   ;inst 1
MOV EAX, DWORD PTR [10h]       ;inst 2
MOV EDX, DWORD PTR [24h]       ;place load above inst 3
                               ; to avoid address
                               ; generation interlock stall
MOV ECX, DWORD PTR [EAX+EBX]   ;inst 3

Use MOVZX and MOVSX

Use the MOVZX and MOVSX instructions to zero-extend and sign-extend byte-size and word-size operands to doubleword length. For example, typical code for zero extension creates a superset dependency when the zero-extended value is used, as in the following code:

Example 1 (Avoid):

XOR EAX, EAX
MOV AL, [MEM]

Example 2 (Preferred):

MOVZX EAX, BYTE PTR [MEM]
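For readers who want to check the values the two instructions produce, the following C sketch mirrors the semantics of zero extension and sign extension for byte-size and word-size operands. The function names are hypothetical and chosen only for illustration. A compiler targeting x86 would typically, though not necessarily, emit MOVZX and MOVSX for these conversions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names, for illustration only. */

/* Zero extension: the upper bits of the 32-bit result are cleared
   (the effect of MOVZX on a byte or word operand). */
uint32_t zero_extend_byte(uint8_t b)  { return (uint32_t)b; }
uint32_t zero_extend_word(uint16_t w) { return (uint32_t)w; }

/* Sign extension: the upper bits of the 32-bit result are copies of
   the operand's sign bit (the effect of MOVSX). */
int32_t sign_extend_byte(int8_t b)  { return (int32_t)b; }
int32_t sign_extend_word(int16_t w) { return (int32_t)w; }

/* Self-check on operands whose top bit is set, where the two kinds
   of extension give different results. */
void check_extension_examples(void)
{
    assert(zero_extend_byte(0x80u) == 0x00000080u);
    assert((uint32_t)sign_extend_byte(-128) == 0xFFFFFF80u);
    assert(zero_extend_word(0x8000u) == 0x00008000u);
    assert((uint32_t)sign_extend_word(-32768) == 0xFFFF8000u);
}
```

Each conversion is a single operation that writes the full 32-bit result, which is the property the preferred MOVZX form has and the XOR/MOV pair lacks.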

Minimize Pointer Arithmetic in Loops

Minimize pointer arithmetic in loops, especially if the loop body is small. In this case, the pointer arithmetic would cause significant overhead. Instead, take advantage of the complex addressing modes to utilize the loop counter to index into memory arrays. Using complex addressing modes does not have any negative impact on execution speed, but the reduced number of instructions preserves decode bandwidth.
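The difference can be sketched at the source level in C. This is a minimal illustration, not the manual's own listing: the function names are hypothetical, and an optimizing compiler is free to transform either form, so the comments describe only the instruction shape each loop suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function names, for illustration only. */

/* Pointer-arithmetic form: three pointers are advanced on every
   iteration, which adds update instructions to a very small body. */
void add_arrays_pointers(const int *a, const int *b, int *c, size_t n)
{
    assert(a && b && c);
    const int *end = a + n;
    while (a != end) {
        *c++ = *a++ + *b++;
    }
}

/* Indexed form: the loop counter i is the only value updated.
   Each access can use a base + index*4 addressing mode, so no
   separate pointer updates are needed. */
void add_arrays_indexed(const int *a, const int *b, int *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```

Both functions compute the same result. The indexed form keeps the per-iteration bookkeeping to a single counter update and leaves the address computation to the addressing mode.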
