AMD Athlon™ Processor x86 Code Optimization

22007E/0 — November 1999

Replace Certain SHLD Instructions with Alternative Code

 

Certain instances of the SHLD instruction can be replaced by alternative code using SHR and LEA. The alternative code has lower latency and requires fewer execution resources. SHR and LEA (32-bit version) are DirectPath instructions, while SHLD is a VectorPath instruction. Using SHR and LEA preserves decode bandwidth, as it potentially enables the decoding of a third DirectPath instruction.

Example 1

(Avoid):
    SHLD REG1, REG2, 1

(Preferred):
    SHR REG2, 31
    LEA REG1, [REG1*2 + REG2]

Example 2

(Avoid):
    SHLD REG1, REG2, 2

(Preferred):
    SHR REG2, 30
    LEA REG1, [REG1*4 + REG2]

Example 3

(Avoid):
    SHLD REG1, REG2, 3

(Preferred):
    SHR REG2, 29
    LEA REG1, [REG1*8 + REG2]
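
The three examples share one pattern: for a shift count n of 1, 2, or 3, SHR moves the top n bits of REG2 into its low n bits, and LEA then scales REG1 by 2, 4, or 8 and adds them in. The following is a minimal C sketch of that arithmetic, not code from this guide. The function names shld32, shr_lea32, and check_shld_replacement are hypothetical, and the model assumes 32-bit registers. It ignores flags (SHLD and SHR modify them, LEA does not) and it does not capture the fact that SHR overwrites REG2, so the replacement is only suitable where the original REG2 value and the SHLD flag results are not needed afterward.

```c
#include <assert.h>
#include <stdint.h>

/* Model of SHLD REG1, REG2, n for 32-bit registers, 1 <= n <= 31:
   REG1 is shifted left by n and its low n bits are filled with the
   top n bits of REG2. */
static uint32_t shld32(uint32_t reg1, uint32_t reg2, unsigned n)
{
    return (uint32_t)((reg1 << n) | (reg2 >> (32 - n)));
}

/* Model of the preferred sequence for n = 1, 2, or 3:
       SHR REG2, 32-n
       LEA REG1, [REG1*(1<<n) + REG2]
   In real code SHR destroys REG2; here only the local copy changes. */
static uint32_t shr_lea32(uint32_t reg1, uint32_t reg2, unsigned n)
{
    reg2 >>= (32 - n);                            /* SHR REG2, 32-n */
    return (uint32_t)(reg1 * (1u << n) + reg2);   /* LEA with scale 2, 4, or 8 */
}

/* Compare both models for n = 1..3 over a few sample register values. */
static void check_shld_replacement(void)
{
    static const uint32_t samples[] = {
        0x00000000u, 0x00000001u, 0x80000000u, 0xFFFFFFFFu,
        0x12345678u, 0xDEADBEEFu, 0x7FFFFFFFu, 0xA5A5A5A5u
    };
    const unsigned count = sizeof samples / sizeof samples[0];

    for (unsigned n = 1; n <= 3; n++)
        for (unsigned i = 0; i < count; i++)
            for (unsigned j = 0; j < count; j++)
                assert(shld32(samples[i], samples[j], n) ==
                       shr_lea32(samples[i], samples[j], n));
}
```

The addition in the LEA model is equivalent to the OR in the SHLD model because the low n bits of the scaled REG1 are zero and the shifted REG2 occupies only those n bits. LEA scale factors are limited to 1, 2, 4, and 8, which is why the pattern applies only to shift counts 1 through 3.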

Use 8-Bit Sign-Extended Immediates

Using 8-bit sign-extended immediates improves code density with no negative effects on the AMD Athlon processor. For example, ADD BX, -5 should be encoded “83 C3 FB” and not “81 C3 FB FF” (the 16-bit immediate FFFBh is stored low byte first).
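
As an illustration of the choice an assembler or code generator makes here, the following C sketch picks the short form whenever the immediate lies in the range -128 to 127. It is not taken from this guide. The helper names fits_simm8, encode_add_bx, and check_add_bx_example are hypothetical, and the sketch assumes 16-bit code, so no operand-size prefix is emitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True when the immediate can be represented as a sign-extended
   8-bit value, i.e. it lies in [-128, 127]. */
static int fits_simm8(int16_t imm)
{
    return imm >= -128 && imm <= 127;
}

/* Hypothetical helper: encode ADD BX, imm for 16-bit code.
   Opcode 83 /0 ib takes a sign-extended 8-bit immediate.
   Opcode 81 /0 iw takes a full 16-bit immediate, low byte first.
   ModRM byte C3 selects register BX with opcode extension /0 (ADD).
   Returns the number of bytes written to out (3 or 4). */
static size_t encode_add_bx(int16_t imm, uint8_t out[4])
{
    size_t n = 0;
    if (fits_simm8(imm)) {
        out[n++] = 0x83;
        out[n++] = 0xC3;
        out[n++] = (uint8_t)imm;                    /* one immediate byte */
    } else {
        out[n++] = 0x81;
        out[n++] = 0xC3;
        out[n++] = (uint8_t)((uint16_t)imm & 0xFF); /* low byte */
        out[n++] = (uint8_t)((uint16_t)imm >> 8);   /* high byte */
    }
    return n;
}

/* Check the example from the text: ADD BX, -5 encodes as 83 C3 FB. */
static void check_add_bx_example(void)
{
    uint8_t buf[4];
    assert(encode_add_bx(-5, buf) == 3);
    assert(buf[0] == 0x83 && buf[1] == 0xC3 && buf[2] == 0xFB);
}
```

An immediate outside the 8-bit range, such as 300, still needs the longer form, so the saving is one byte per instruction only where the value fits.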
