25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

7.3Inline Functions

Optimization

Use function inlining when:

A function is called from just one site in the code. (For the C language, determination of this characteristic is made easier if functions are explicitly declared static unless they require external linkage.)

A function—once inlined—contains fewer than 25 machine instructions.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

There are advantages and disadvantages to function inlining. On the one hand, function inlining eliminates function-call overhead and allows better register allocation and instruction scheduling at the site of the function call. The disadvantage of function inlining is decreased code reference locality, which can increase execution time due to instruction cache misses.

For functions that create fewer than 25 machine instructions once inlined, it is likely that the function- call overhead is close to, or more than, the time spent executing the function body. In these cases, function inlining is recommended.

Function-call overhead on the AMD Athlon 64 and AMD Opteron processors can be low because calls and returns are executed very quickly due to the use of prediction mechanisms. However, there is still overhead due to passing function arguments through memory, which creates store-to-load-

forwarding dependencies. (In 64-bit mode, this overhead is typically avoided by passing more arguments in registers, as specified in the AMD64 Application Binary Interface [ABI] for the

operating system.)

For longer functions, the benefits of reduced function-call overhead give diminishing returns. A function that results in the insertion of more than 500 machine instructions at the call site should probably not be inlined. Some larger functions might consist of multiple, relatively short paths that are negatively affected by function overhead. In such a case, it can be advantageous to inline larger functions. Profiling information is the best guide in determining whether to inline such large functions.

Chapter 7

Scheduling Optimizations

149

Page 165
Image 165
AMD 250 manual Inline Functions, 149