25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

4.9 Alternatives to SHLD Instruction

Optimization

Where register pressure is low, replace the SHLD instruction with alternative code using ADD and ADC, or SHR and LEA.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

Using alternative code in place of SHLD achieves lower overall latency and requires fewer execution resources. The 32-bit and 64-bit forms of ADD, ADC, SHR, and LEA are DirectPath instructions, while SHLD is a VectorPath instruction. Use of the replacement code optimizes decode bandwidth because it potentially enables the simultaneous decoding of a third DirectPath instruction. However, the replacement code may increase register pressure because it destroys the contents of one register (reg2 in the following examples) whereas the register is preserved by SHLD.

Example 1

Replace this instruction:

shld reg1, reg2, 1

with this code sequence:

add reg2, reg2

adc reg1, reg1
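
The equivalence in Example 1 can be checked with a small C model. This is only a sketch for 32-bit operands: the function names shld32_ref and add_adc32 are hypothetical, the carry flag is modeled as a plain variable, and the other flag results are ignored. It also shows that the replacement sequence overwrites reg2, which SHLD leaves intact.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of "shld reg1, reg2, count" for 32-bit operands.
   Only valid for 0 < count < 32; flags are not modeled. */
static uint32_t shld32_ref(uint32_t reg1, uint32_t reg2, unsigned count)
{
    return (reg1 << count) | (reg2 >> (32u - count));
}

/* Model of the replacement sequence:
       add reg2, reg2    ; CF receives the old bit 31 of reg2
       adc reg1, reg1    ; reg1 = reg1 + reg1 + CF
   reg2 is passed by pointer because the sequence destroys it. */
static uint32_t add_adc32(uint32_t reg1, uint32_t *reg2)
{
    uint32_t carry = *reg2 >> 31;   /* carry out of the ADD */
    *reg2 += *reg2;                 /* reg2 is overwritten */
    return reg1 + reg1 + carry;     /* wraps modulo 2^32, like ADC */
}
```

If the original value of reg2 is needed afterward, it has to be copied to another register first, which is the register-pressure cost described in the rationale.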

Example 2

Replace this instruction:

shld reg1, reg2, 2

with this code sequence:

shr reg2, 30

lea reg1, [reg1*4+reg2]
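
Example 2 can be modeled the same way. The sketch below again assumes 32-bit operands, which is what the shift count of 30 implies (for 64-bit registers the corresponding count would presumably be 62). The names shld32_model and shr_lea32 are hypothetical, and flag results are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of "shld reg1, reg2, count" for 32-bit operands.
   Only valid for 0 < count < 32; flags are not modeled. */
static uint32_t shld32_model(uint32_t reg1, uint32_t reg2, unsigned count)
{
    return (reg1 << count) | (reg2 >> (32u - count));
}

/* Model of the replacement sequence:
       shr reg2, 30               ; keep only the top two bits of reg2
       lea reg1, [reg1*4+reg2]    ; shift reg1 left by 2 and merge them in
   reg2 is passed by pointer because the sequence destroys it. */
static uint32_t shr_lea32(uint32_t reg1, uint32_t *reg2)
{
    *reg2 >>= 30;                   /* reg2 now holds a value in 0..3 */
    return reg1 * 4u + *reg2;       /* low two bits of reg1*4 are zero,
                                       so the addition acts as an OR */
}
```

Because the low two bits of reg1*4 are always zero, adding the two-bit value in reg2 cannot carry, so the LEA result matches the SHLD result bit for bit.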

Chapter 4

Instruction-Decoding Optimizations
