22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

Use 8-Bit Sign-Extended Displacements . . . . . 39
Code Padding Using Neutral Code Fillers . . . . . 39
    Recommendations for the AMD Athlon Processor . . . . . 40
    Recommendations for AMD-K6® Family and AMD Athlon Processor Blended Code . . . . . 41

5  Cache and Memory Optimizations . . . . . 45

Memory Size and Alignment Issues . . . . . 45
    Avoid Memory Size Mismatches . . . . . 45
    Align Data Where Possible . . . . . 46
Use the 3DNow! PREFETCH and PREFETCHW Instructions . . . . . 46
Take Advantage of Write Combining . . . . . 50
Avoid Placing Code and Data in the Same 64-Byte Cache Line . . . . . 50
Store-to-Load Forwarding Restrictions . . . . . 51
    Store-to-Load Forwarding Pitfalls—True Dependencies . . . . . 51
    Summary of Store-to-Load Forwarding Pitfalls to Avoid . . . . . 54
Stack Alignment Considerations . . . . . 54
Align TBYTE Variables on Quadword Aligned Addresses . . . . . 55
C Language Structure Component Considerations . . . . . 55
Sort Variables According to Base Type Size . . . . . 56

6  Branch Optimizations . . . . . 57

Avoid Branches Dependent on Random Data . . . . . 57
    AMD Athlon Processor Specific Code . . . . . 58
    Blended AMD-K6 and AMD Athlon Processor Code . . . . . 58
Always Pair CALL and RETURN . . . . . 59
Replace Branches with Computation in 3DNow! Code . . . . . 60
    Muxing Constructs . . . . . 60
    Sample Code Translated into 3DNow! Code . . . . . 61
Avoid the Loop Instruction . . . . . 65
Avoid Far Control Transfer Instructions . . . . . 65
Avoid Recursive Functions . . . . . 66

Contents

v
