Software Optimization Guide for AMD64 Processors | 25112 Rev. 3.06 September 2005 |
2.25 Accelerating Floating-Point Division and Square Root
Optimization
In applications that make heavy use of single-precision division and square root operations, port the code to SSE or 3DNow!™ inline assembly, or use a compiler that can generate SSE or 3DNow! technology code. If neither of these methods is possible, the x87 FPU
control word register precision control specification bits (PC) can be set to single precision to improve performance. (The processor defaults to
control register.)
Application
This optimization applies to
Rationale
Division and square root have a much longer latency than other
The x87 FPU has a
The Microsoft® Visual C environment provides functions to manipulate the FPU control word and thus the precision control. Because these functions are not very fast, change the precision control only where doing so adds little overhead, such as outside a
overhead created by the function calls outweighs the benefit from reducing the latencies of divide and
The following example shows how to set the precision control to single precision and later restore the original settings in the Microsoft Visual C environment.
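A minimal sketch of this set-and-restore pattern, offered as an illustration under stated assumptions rather than as the guide's own listing. It assumes a 32-bit x86 build under Microsoft Visual C in which the compiler emits x87 instructions for the loop. `_controlfp`, `_PC_24`, and `_MCW_PC` come from the Microsoft C runtime's `<float.h>`; the names `fpu_enter_single`, `fpu_restore`, and `sum_of_quotients` are hypothetical and chosen only for this example. On any other target the two helpers compile to no-ops so the file still builds.

```c
#include <stddef.h>

#if defined(_MSC_VER) && defined(_M_IX86)
#include <float.h>              /* _controlfp, _PC_24, _MCW_PC */
#define HAVE_X87_PC 1
#else
#define HAVE_X87_PC 0           /* assumption: no x87 precision control here */
#endif

/* Hypothetical helper: select single precision and return the
   control word that was in effect before the change. */
static unsigned int fpu_enter_single(void)
{
#if HAVE_X87_PC
    unsigned int saved = _controlfp(0, 0);  /* mask 0: read without changing */
    _controlfp(_PC_24, _MCW_PC);            /* modify only the PC field */
    return saved;
#else
    return 0;
#endif
}

/* Hypothetical helper: restore the precision control field saved above. */
static void fpu_restore(unsigned int saved)
{
#if HAVE_X87_PC
    _controlfp(saved, _MCW_PC);             /* restore only the PC field */
#else
    (void)saved;
#endif
}

/* Sums num[i] / den[i]; the precision change is hoisted out of the loop. */
float sum_of_quotients(const float *num, const float *den, size_t n)
{
    unsigned int saved = fpu_enter_single();
    float sum = 0.0f;
    size_t i;

    for (i = 0; i < n; ++i)
        sum += num[i] / den[i];

    fpu_restore(saved);
    return sum;
}
```

The precision control is changed once before the loop and restored once after it, so the cost of the two runtime calls is spread over all `n` divisions instead of being paid on each one.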