Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

8.9 Optimizing Integer Division

Optimization

When possible, use smaller data types for integer division.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

Division by a 16-bit value is significantly faster than division by a 32-bit value: about 26 clocks of latency versus 42. Likewise, division by a 32-bit value is faster than division by a 64-bit value: about 42 clocks versus 74 (refer to IDIV in Table 15). In algorithms where integer division accounts for a substantial share of execution time, it may be beneficial to check whether a smaller divide type can be used. Study the assembly-language output generated by high-level language compilers to verify that the desired code is generated. Compilers often convert 16-bit types into 32-bit values and then perform a 32-bit division, which eliminates the advantage of using 16-bit integer types. If the compiler cannot be coerced into producing the desired code, compiler intrinsics or assembly language are required.
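The following is a minimal sketch of the last point, not code from this guide. In C, the integer promotions widen 16-bit operands to int before dividing, so an expression such as a / b on two uint16_t values is usually compiled to a 32-bit divide. The helper name div_u16 is hypothetical, and the sketch assumes GCC or Clang extended inline assembly on an x86 target. On any other compiler or target it falls back to ordinary C division. The divisor must be nonzero, and any gain should be confirmed by inspecting the generated assembly and measuring on the target processor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the guide): unsigned 16-bit division.
 * On x86 with GCC/Clang it issues a 16-bit DIV, which divides DX:AX by
 * the operand and leaves the quotient in AX and the remainder in DX.
 * Elsewhere it falls back to plain C division. */
static inline uint16_t div_u16(uint16_t dividend, uint16_t divisor,
                               uint16_t *remainder)
{
    assert(divisor != 0);       /* DIV faults on a zero divisor */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    uint16_t quot = dividend;   /* AX: low word of the dividend */
    uint16_t rem = 0;           /* DX: high word of the dividend, zero here */
    __asm__("divw %2"
            : "+a"(quot), "+d"(rem)   /* AX and DX are read and written */
            : "rm"(divisor)           /* 16-bit register or memory operand */
            : "cc");
    *remainder = rem;
    return quot;
#else
    /* Portable fallback: the compiler chooses the divide width. */
    *remainder = (uint16_t)(dividend % divisor);
    return (uint16_t)(dividend / divisor);
#endif
}
```

Because DX is cleared before the divide, the quotient always fits in 16 bits and the instruction cannot overflow. A signed variant would use IDIV with the dividend sign-extended into DX, and it would also need to handle the overflowing case of -32768 divided by -1.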


Integer Optimizations

Chapter 8
