52 C and C++ Source-Level Optimizations Chapter 2
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
2.26 Fast Floating-Point-to-Integer Conversion
Optimization
Use the 3DNow! PF2ID instruction, which performs truncating conversion, to accomplish rapid
floating-point-to-integer conversion if the floating-point operand is of type float.
Application
This optimization applies to 32-bit software.
Rationale
Floating-point-to-integer conversion in C programs is typically a very slow operation. The semantics
of C and C++ demand that the conversion use truncation. If the floating-point operand is of type
float, and the compiler supports 3DNow! code generation, then the 3DNow! PF2ID instruction,
which performs truncating conversion, can be used by the compiler to accomplish rapid
floating-point-to-integer conversion.
Note: The PF2ID instruction does not provide conversion compliant with the IEEE-754 standard.
Some operands of type float (IEEE-754 single precision) such as NaNs, infinities, and
denormals, are either unsupported or not handled in compliance with the IEEE-754 standard
by 3DNow! technology.
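The truncation semantics mentioned in the rationale above can be illustrated with a short C sketch. The helper name float_to_int_truncating is hypothetical and is used only for illustration; the cast itself is standard C.

```c
/* Hypothetical helper: wraps the standard C float-to-int cast. */
static int float_to_int_truncating(float f)
{
    /* The cast discards the fractional part, rounding toward zero. */
    return (int)f;
}
```

With this helper, 2.7f converts to 2 and -2.7f converts to -2 (not -3), because truncation rounds toward zero rather than toward negative infinity.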
For double precision operands, the usual way to accomplish truncating conversion involves the
following algorithm:
1. Save the current x87 rounding mode (this is usually round to nearest, with ties to even).
2. Set the x87 rounding mode to truncation.
3. Load the floating-point source operand and store the integer result.
4. Restore the original x87 rounding mode.
This algorithm is typically implemented through the C run-time library function ftol. While the
AMD Athlon 64 and AMD Opteron processors have special hardware optimizations to speed up the
changing of x87 rounding modes, and therefore ftol, calls to ftol can still be slow.
For situations where very fast floating-point-to-integer conversion is required, the conversion code in
Listing 24 on page 53 may be helpful. This code uses the current rounding mode instead of truncation
when performing the conversion. Therefore, the result may differ by 1 from the ftol result. The
replacement code adds the “magic number” 2^52 + 2^51 to the source operand, then stores the double
precision result to memory and retrieves the lower doubleword of the stored result. Adding the magic
number shifts the original argument to the right inside the double precision mantissa, placing the
binary point of the sum immediately to the right of the least-significant mantissa bit. Extracting the
lower doubleword of the sum then delivers the integral portion of the original argument.
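The magic-number technique described above can be sketched in C as follows. This is not the guide's Listing 24; it is a minimal illustration that assumes IEEE-754 double precision, a little-endian target, the default round-to-nearest mode, and an argument whose rounded value fits in 32 bits. The helper name fast_double_to_int is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: conversion using the current rounding mode. */
static int32_t fast_double_to_int(double x)
{
    /* Magic number 2^52 + 2^51 = 6755399441055744. */
    const double magic = 6755399441055744.0;

    /* volatile forces the sum to be stored to memory as a 64-bit double. */
    volatile double sum = x + magic;
    double stored = sum;

    /* On a little-endian target the lower doubleword is the first 4 bytes. */
    int32_t low;
    memcpy(&low, &stored, sizeof low);
    return low;
}
```

Under round-to-nearest, an input of 2.7 yields 3 where a truncating cast yields 2, which is the difference of 1 noted above. The 2^51 term keeps the sum in the range from 2^52 to 2^53 for negative arguments as well, so the lower doubleword holds the two's complement form of the result (for example, -1.0 yields -1).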