Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

7.1 Instruction Scheduling by Latency

Optimization

In general, select instructions with shorter latencies that are DirectPath (not VectorPath) instructions. For a list of instruction latencies and classifications, see Appendix C, “Instruction Latencies.”

The AMD Athlon™ 64 and AMD Opteron™ processors can execute up to three AMD64 instructions per cycle, with each instruction possibly having a different latency. The AMD Athlon 64 and AMD Opteron processors have flexible scheduling, but for absolute maximum performance, schedule instructions according to their latencies and data dependencies. The goal is to reduce the overall length of dependency chains.
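Example

The following C sketch is not taken from this guide; it only illustrates what shortening a dependency chain can look like at the source level. The function names and the choice of three accumulators are assumptions made for the illustration. The first routine carries every addition through a single accumulator, so each add must wait for the result of the previous one. The second routine splits the work across three independent accumulators, which gives the scheduler three shorter chains whose additions can overlap. Whether this helps in practice depends on the latencies of the instructions the compiler actually emits (see Appendix C) and on whether the compiler has already performed a similar transformation.

```c
#include <stddef.h>

/* One accumulator: each add depends on the result of the previous add,
   so the loop forms a single dependency chain of length n. */
double sum_single_chain(const double *data, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Three accumulators (an illustrative choice, not a tuned value):
   the adds into s0, s1, and s2 do not depend on one another, so the
   loop forms three chains of roughly n/3 adds each. */
double sum_three_chains(const double *data, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0;
    size_t i = 0;

    /* Main loop: three independent adds per iteration. */
    for (; i + 3 <= n; i += 3) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
    }

    /* Remainder: at most two leftover elements. */
    for (; i < n; i++)
        s0 += data[i];

    /* Combine the partial sums once, after the loop. */
    return s0 + s1 + s2;
}
```

Because floating-point addition is not associative, the two routines can differ in the last bits of the result for general inputs; they agree exactly when every intermediate sum is exactly representable, as with small integer values.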

Application

This optimization applies to:

32-bit software

64-bit software

Scheduling Optimizations

Chapter 7
