25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

10.6 Accumulating Precision-Sensitive Quantities in x87 Registers

Optimization

Accumulate results in the x87 registers rather than in the SSE and SSE2 XMM registers if more than 64 bits of accuracy are required.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

More than 64 bits of accuracy may be required when accumulating a result (for example, during the calculation of a dot product). The precision of floating-point operations in the x87 registers ST(0)–ST(7) is 80 bits internally, whereas the precision of operations using SIMD instructions is only 64 bits.
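Example

The following C sketch illustrates the dot-product case. It is not taken from this guide, and the function names dot_double and dot_extended are hypothetical. It assumes a compiler that maps long double to the 80-bit x87 format, as GCC and Clang typically do on x86 and x86-64 Linux targets; some compilers treat long double as a 64-bit type, in which case both functions accumulate with the same precision.

```c
#include <stddef.h>

/* Accumulate in double. In 64-bit code this is normally compiled to
   SSE2 instructions, so every partial sum is rounded to the 64-bit
   double-precision format. */
double dot_double(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}

/* Accumulate in long double. ASSUMPTION: the compiler implements
   long double as the 80-bit x87 extended format, so the products and
   the running sum are held at extended precision. The result is
   rounded to double only once, on return. */
double dot_extended(const double *a, const double *b, size_t n)
{
    long double acc = 0.0L;
    for (size_t i = 0; i < n; ++i)
        acc += (long double)a[i] * (long double)b[i];
    return (double)acc;
}
```

Both functions return the same value when every intermediate result is exactly representable in double precision. They can differ when large and small terms are mixed, because the extended-precision accumulator retains low-order bits that a double-precision accumulator rounds away.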

Chapter 10

x87 Floating-Point Optimizations

