AMD Athlon™ Processor x86 Code Optimization

22007E/0 — November 1999

Select DirectPath Over VectorPath Instructions

 

Use DirectPath instructions rather than VectorPath instructions. DirectPath instructions are optimized to decode and execute efficiently by minimizing the number of operations per x86 instruction; they include the ‘register ← register op memory’ as well as the ‘register ← register op register’ forms of instructions. Up to three DirectPath instructions can be decoded per cycle. VectorPath instructions block the decoding of DirectPath instructions.

 

The vast majority of instructions used by a compiler are implemented as DirectPath instructions in the AMD Athlon processor. Assembly writers must still take into consideration the usage of DirectPath versus VectorPath instructions.
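As an illustration of the kind of choice an assembly writer faces, the sketch below replaces a single complex instruction with a pair of simple ones. It assumes that LOOP is classified as VectorPath and that DEC and JNZ are DirectPath on this processor; confirm the classification against the tables in Appendix G before relying on it. The label name and the use of ECX as the loop counter are hypothetical.

```asm
; Assumption: LOOP decodes as a VectorPath instruction (verify in Appendix G).
; Avoid:
top_of_loop:
        ; ... loop body ...
        LOOP    top_of_loop     ; decrement ECX, branch if ECX != 0

; Prefer (assumed DirectPath equivalents):
top_of_loop:
        ; ... loop body ...
        DEC     ECX             ; decrement the counter (modifies flags, unlike LOOP)
        JNZ     top_of_loop     ; branch while ECX != 0
```

The two forms are not strictly interchangeable: DEC updates the arithmetic flags while LOOP leaves them untouched, so the substitution is only safe when no later instruction depends on the flags from the loop body.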

 

See Appendix F, “Instruction Dispatch and Execution Resources” on page 187 and Appendix G, “DirectPath versus VectorPath Instructions” on page 219 for tables of DirectPath and VectorPath instructions.

Load-Execute Instruction Usage

Use Load-Execute Integer Instructions

Most load-execute integer instructions are DirectPath decodable and can be decoded at the rate of three per cycle. Splitting a load-execute integer instruction into two separate instructions (a load instruction and a “reg, reg” instruction) reduces decoding bandwidth and increases register pressure, which results in lower performance. The split-instruction form can be used to avoid scheduler stalls for longer executing instructions and to explicitly schedule the load and execute operations.
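The two forms described above can be sketched as follows. This is a minimal illustration rather than a tuned sequence: the register choices are hypothetical, and ECX is assumed to be free for use as a scratch register in the split form.

```asm
; Load-execute form: one instruction, no scratch register needed.
        ADD     EAX, [EBX]      ; EAX = EAX + dword at address EBX

; Split form: two instructions to decode, plus one extra register (ECX).
        MOV     ECX, [EBX]      ; load the operand
        ADD     EAX, ECX        ; "reg, reg" execute step
```

The split form costs an additional decode slot and ties up ECX, which is the decoding-bandwidth and register-pressure penalty the text refers to. It remains useful when the load needs to be hoisted well ahead of the operation that consumes it.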
