Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

4.5 Take Advantage of x86 and AMD64 Complex Addressing Modes

Optimization

When porting from other architectures, or if you are new to x86 assembly language, remember that the x86 architecture provides many complex addressing modes. Building the effective address within a single instruction can sometimes reduce the instruction count, leading to better code density and greater decode bandwidth. Refer to the section on effective addresses in the AMD64 Architecture Programmer's Manual Volume 1: Application Programming for more detailed information on how effective addresses are formed.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

Building an effective address can appear to require several instructions when it involves a base address (such as the base of an array), an index, and perhaps a displacement. The x86 architecture can often handle all of this in one instruction, which reduces code size and the number of instructions to decode. As always, pay attention to total instruction length, latencies, and whether the chosen instructions are DirectPath (fastest) or VectorPath (slower).
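The address arithmetic described in this rationale can be modeled in C. The following is a minimal sketch, not code from this guide: the function names (ea_stepwise, ea_combined, byte_matches) are hypothetical, the displacement is assumed to be non-negative for simplicity, and the scale is restricted to 1, 2, 4, or 8 to match the factors the x86 addressing modes accept. Whether a compiler actually folds the combined form into one memory operand depends on the compiler and its optimization settings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step-by-step form: one statement per piece of address arithmetic,
   as a direct port from a load/store architecture might be written. */
static const uint8_t *ea_stepwise(const uint8_t *base, size_t index,
                                  size_t scale, size_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    size_t offset = index;      /* copy the index            */
    offset *= scale;            /* apply the scale factor    */
    const uint8_t *p = base;    /* start from the base       */
    p += offset;                /* add the scaled index      */
    p += disp;                  /* add the displacement      */
    return p;
}

/* Combined form: base + index*scale + displacement in one expression.
   This is the shape a single x86 memory operand can express. */
static const uint8_t *ea_combined(const uint8_t *base, size_t index,
                                  size_t scale, size_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + disp;
}

/* Load a byte through the combined address (scale 1) and compare it
   with a value; returns 1 on a match and 0 otherwise. */
static int byte_matches(const uint8_t *base, size_t index,
                        size_t disp, uint8_t value)
{
    return *ea_combined(base, index, 1, disp) == value;
}
```

Both helpers return the same pointer for the same arguments; the difference is only in how many separate operations the source spells out. Calling byte_matches(buf, i, d, v) on a byte buffer reads buf[i + d] and compares it with v in a single expression.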

Example

The following sequence of five instructions, with a total latency of 8, can be replaced by a single instruction.

Number of Bytes   Latency   Instruction
3                 1         movl %r10d,%r11d
8                 2         leaq 0x68E35,%rcx
3                 1         addq %rcx,%r11
5                 3         movb (%r11,%r13),%cl
2                 1         cmpb %al,%cl

The following instruction replaces the functionality of the above sequence.

Instruction-Decoding Optimizations

Chapter 4
