AMD x86 manual Group II Optimizations-Secondary Optimizations, Load-Execute Instruction Usage


22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

anywhere, in any type of code (integer, x87, 3DNow!, MMX, etc.). Use the following formula to determine prefetch distance:

Prefetch Length = 200 (DS/C)

Round up to the nearest cache line.

DS is the data stride per loop iteration.

C is the number of cycles per loop iteration when hitting in the L1 cache.

See “Use the 3DNow!™ PREFETCH and PREFETCHW Instructions” on page 46 for more details.
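The formula above can be sketched in C as follows. This is a minimal illustration, not code from the manual: the function name `prefetch_distance`, the constant `CACHE_LINE_BYTES`, the 64-byte line size, and the choice of bytes as the unit for DS are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines; adjust for the target processor. */
#define CACHE_LINE_BYTES 64u

/*
 * Hypothetical helper: evaluates Prefetch Length = 200 * (DS / C)
 * and rounds the result up to a whole number of cache lines.
 *
 * stride_bytes    - DS, data stride per loop iteration (assumed in bytes)
 * cycles_per_iter - C, cycles per loop iteration when hitting in the L1 cache
 */
size_t prefetch_distance(size_t stride_bytes, size_t cycles_per_iter)
{
    assert(cycles_per_iter > 0);

    /* Integer ceiling of 200 * DS / C. */
    size_t raw = (200u * stride_bytes + cycles_per_iter - 1u) / cycles_per_iter;

    /* Round up to the next multiple of the cache-line size. */
    size_t lines = (raw + CACHE_LINE_BYTES - 1u) / CACHE_LINE_BYTES;
    return lines * CACHE_LINE_BYTES;
}
```

For instance, with a 16-byte stride and 10 cycles per iteration, the raw value is 200 × 16 / 10 = 320 bytes, which is already a multiple of 64. With an 8-byte stride and 3 cycles per iteration, the raw value of about 533 bytes rounds up to 576 bytes (nine 64-byte lines).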

Select DirectPath Over VectorPath Instructions

 

Use DirectPath instructions rather than VectorPath instructions. DirectPath instructions are optimized for decode and execute efficiently by minimizing the number of operations per x86 instruction. Three DirectPath instructions can be decoded in parallel. Using VectorPath instructions will block DirectPath instructions from decoding simultaneously.

See Appendix G, “DirectPath versus VectorPath Instructions” on page 219 for a list of DirectPath and VectorPath instructions.

Group II Optimizations—Secondary Optimizations

Load-Execute Instruction Usage

See “Load-Execute Instruction Usage” on page 34 for more details.

Use Load-Execute Instructions

 

Wherever possible, use load-execute instructions to increase code density, with the one exception described below. The split-instruction form of load-execute instructions can be used to avoid scheduler stalls for longer-executing instructions and to explicitly schedule the load and execute operations.
