Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

5.14 Stack Considerations

Make sure the stack is suitably aligned for the local variable with the largest base type. Then, using the technique described in “Sorting and Padding C and C++ Structures” on page 117, all variables can be properly aligned with no padding.

Application

This optimization applies to:

32-bit software

Extend Arguments to 32 Bits Before Pushing onto Stack

Function arguments smaller than 32 bits should be extended to 32 bits before being pushed onto the stack, which ensures that the stack is always doubleword aligned on entry to a function.
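The effect of extending arguments can be modeled in C. The following is only a sketch: the helper names extend_signed8, extend_unsigned16, and esp_after_pushes are hypothetical and do not appear in this guide. The first two mirror what a MOVSX or MOVZX would do before the PUSH, and the third models ESP as a plain 32-bit number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend an 8-bit argument to a full 32-bit
   stack slot, as MOVSX would before the value is pushed. */
static uint32_t extend_signed8(int8_t arg)
{
    return (uint32_t)(int32_t)arg;
}

/* Hypothetical helper: zero-extend a 16-bit argument to 32 bits,
   as MOVZX would before the value is pushed. */
static uint32_t extend_unsigned16(uint16_t arg)
{
    return (uint32_t)arg;
}

/* Model of ESP after pushing n_args extended arguments. Each push
   subtracts exactly 4 bytes, so an ESP that starts doubleword aligned
   is still doubleword aligned afterwards. */
static uint32_t esp_after_pushes(uint32_t esp, unsigned n_args)
{
    assert(esp % 4u == 0u); /* assumed precondition: aligned on entry */
    return esp - 4u * n_args;
}
```

In this model, pushing any number of extended arguments leaves the result of esp_after_pushes divisible by 4, which is the property the callee relies on.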

If a function has no local variables with a base type larger than a doubleword, no further work is necessary. If the function does have local variables whose base type is larger than a doubleword, insert additional code to ensure proper alignment of the stack. For example, the following code achieves quadword alignment:

prologue:
    push  ebp
    mov   ebp, esp
    sub   esp, SIZE_OF_LOCALS   ; Size of local variables
    and   esp, -8
    ...                         ; Push registers that need to be preserved.
epilogue:
                                ; Pop register that needed to be preserved.
    leave
    ret

With this technique, function arguments can be accessed through EBP, and local variables can be accessed through ESP. Save and restore EBP between the prologue and the epilogue to keep it free for general use.
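The arithmetic behind the SUB and AND pair can be checked with a small C model. This is a sketch under stated assumptions: reserve_aligned is a hypothetical name, ESP is treated as a plain 32-bit number, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "sub esp, SIZE_OF_LOCALS" followed by "and esp, -align".
   The stack grows downward, so clearing the low bits can only move ESP
   further down; it never overlaps the saved EBP or the return address. */
static uint32_t reserve_aligned(uint32_t esp, uint32_t size_of_locals,
                                uint32_t align)
{
    assert(align != 0u && (align & (align - 1u)) == 0u); /* power of two */
    esp -= size_of_locals;
    esp &= ~(align - 1u); /* for align == 8 the mask is -8 (0xFFFFFFF8) */
    return esp;
}
```

For example, with ESP at 0x1004 and 16 bytes of locals, the subtraction gives 0xFF4 and the mask rounds it down to 0xFF0. The number of bytes skipped by the AND is not known at assembly time, which is why the epilogue restores ESP from EBP with LEAVE rather than adding a constant back.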

Optimized Stack Usage

It is sometimes possible to improve performance in frequently executed routines by altering the way variables and parameters are passed and accessed on the stack. Replacing PUSH and POP instructions with MOV instructions can reduce stack pointer dependencies and use fewer execution resources.

This optimization is usually most effective in smaller routines. Excessive use of this optimization can result in increased code size as MOV instructions are considerably larger than PUSH and POP instructions.
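The difference between the two styles can be illustrated with a toy C model of a downward-growing stack. This is a simplified sketch, not processor behavior: toy_stack, store_with_push, and store_with_mov are hypothetical names, and the model shows only that both styles produce the same stack contents while the MOV style updates the stack pointer once.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_WORDS = 16 };

/* Toy stack of 32-bit slots; sp is a slot index that moves downward. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    unsigned sp;
} toy_stack;

static void toy_init(toy_stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->sp = STACK_WORDS;
}

/* PUSH style: every store first modifies sp, so each push depends on
   the sp value produced by the previous one. */
static void store_with_push(toy_stack *s, uint32_t a, uint32_t b, uint32_t c)
{
    assert(s->sp >= 3u);
    s->mem[--s->sp] = a;
    s->mem[--s->sp] = b;
    s->mem[--s->sp] = c;
}

/* MOV style: sp is adjusted once (like "sub esp, 12"), then the stores
   use fixed offsets from it and do not depend on one another. */
static void store_with_mov(toy_stack *s, uint32_t a, uint32_t b, uint32_t c)
{
    assert(s->sp >= 3u);
    s->sp -= 3u;
    s->mem[s->sp + 2u] = a;
    s->mem[s->sp + 1u] = b;
    s->mem[s->sp + 0u] = c;
}
```

Calling both functions with the same three values on freshly initialized stacks leaves identical memory and the same final sp. In the MOV style the three stores are independent of each other, at the cost of longer instruction encodings.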


Cache and Memory Optimizations

Chapter 5
