25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

Chapter 10 x87 Floating-Point Optimizations

AMD Athlon™ 64 and AMD Opteron™ processors support multiple methods of performing floating-point operations. They support the older x87 assembly instructions in addition to the more recent SIMD instructions (SSE, SSE2, and 3DNow!™ technologies). Many of the suggestions in this chapter are also generally applicable to the AMD Athlon 64 and AMD Opteron processors, with the exception of SSE2 optimizations and expanded register usage.

AMD Athlon 64 and AMD Opteron processors are 64-bit processors that are fully backwards compatible with 32-bit code. In general, 64-bit operating systems support the x87 and 3DNow! instructions in 32-bit threads; however, 64-bit operating systems may not support x87 and 3DNow! instructions in 64-bit threads. To make it easier to later migrate from 32-bit to 64-bit code, you may want to avoid x87 and 3DNow! instructions altogether and use only SSE and SSE2 instructions when writing new 32-bit code.
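As an illustration of the migration advice above, the following is a minimal sketch of a floating-point routine written with SSE2 intrinsics rather than x87 instructions. It is not taken from this guide: the helper name scale_array and the HAVE_SSE2 macro are hypothetical, and the sketch assumes a compiler that provides the standard <emmintrin.h> SSE2 intrinsics header. A plain C loop handles any leftover element and serves as the fallback when SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stddef.h>

/* HAVE_SSE2 is a hypothetical macro for this sketch: it is set when the
   compiler reports SSE2 support at compile time. */
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Hypothetical helper: multiplies n doubles in place by a constant factor.
   The same source builds as 32-bit or 64-bit code without x87 intrinsics. */
static void scale_array(double *data, size_t n, double factor)
{
    size_t i = 0;

    assert(data != NULL || n == 0);

#if HAVE_SSE2
    /* Process two double-precision values per iteration with packed SSE2. */
    __m128d f = _mm_set1_pd(factor);
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(data + i);   /* unaligned load of 2 doubles */
        _mm_storeu_pd(data + i, _mm_mul_pd(v, f));
    }
#endif

    /* Scalar loop: handles the odd trailing element, or the whole array
       when SSE2 is not available. */
    for (; i < n; ++i)
        data[i] *= factor;
}
```

Note that the scalar loop is ordinary C, so whether it compiles to x87 or SSE2 instructions in a 32-bit build depends on the compiler settings. With GCC, for example, the options -msse2 -mfpmath=sse request SSE2 code generation for scalar floating-point arithmetic.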

This chapter details methods for optimizing floating-point code that uses the pipelined x87 floating-point registers.

This chapter covers the following topics:

Topic                                                         Page
Using Multiplication Rather Than Division                     238
Achieving Two Floating-Point Operations per Clock Cycle       239
Floating-Point Compare Instructions                           244
Using the FXCH Instruction Rather Than FST/FLD Pairs          245
Floating-Point Subexpression Elimination                      246
Accumulating Precision-Sensitive Quantities in x87 Registers  247
Avoiding Extended-Precision Data                              248
