22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

Signed Derivation for Algorithm, Multiplier, and Shift Factor . . . . . . . . . . 95

9 Floating-Point Optimizations . . . . . . . . . . 97

Ensure All FPU Data is Aligned . . . . . . . . . . 97
Use Multiplies Rather than Divides . . . . . . . . . . 97
Use FFREEP Macro to Pop One Register from the FPU Stack . . . . . . . . . . 98
Floating-Point Compare Instructions . . . . . . . . . . 98
Use the FXCH Instruction Rather than FST/FLD Pairs . . . . . . . . . . 99
Avoid Using Extended-Precision Data . . . . . . . . . . 99
Minimize Floating-Point-to-Integer Conversions . . . . . . . . . . 100
Floating-Point Subexpression Elimination . . . . . . . . . . 103
Check Argument Range of Trigonometric Instructions Efficiently . . . . . . . . . . 103
Take Advantage of the FSINCOS Instruction . . . . . . . . . . 105

10 3DNow!™ and MMX™ Optimizations . . . . . . . . . . 107

Use 3DNow! Instructions . . . . . . . . . . 107
Use FEMMS Instruction . . . . . . . . . . 107
Use 3DNow! Instructions for Fast Division . . . . . . . . . . 108
    Optimized 14-Bit Precision Divide . . . . . . . . . . 108
    Optimized Full 24-Bit Precision Divide . . . . . . . . . . 108
    Pipelined Pair of 24-Bit Precision Divides . . . . . . . . . . 109
    Newton-Raphson Reciprocal . . . . . . . . . . 109
Use 3DNow! Instructions for Fast Square Root and Reciprocal Square Root . . . . . . . . . . 110
    Optimized 15-Bit Precision Square Root . . . . . . . . . . 110
    Optimized 24-Bit Precision Square Root . . . . . . . . . . 110
    Newton-Raphson Reciprocal Square Root . . . . . . . . . . 111
Use MMX PMADDWD Instruction to Perform Two 32-Bit Multiplies in Parallel . . . . . . . . . . 111
3DNow! and MMX Intra-Operand Swapping . . . . . . . . . . 112

Contents

vii
