Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

6.7 Replacing Branches with Computation

Optimization

Use computation to simulate predicated execution or conditional moves.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

Branches can hurt performance, particularly when they are difficult to predict. If the body of the branch is small, you can often achieve higher performance by replacing the branch with computation that simulates predicated execution or conditional moves. Many SSE and SSE2 instructions are useful for this purpose. The principal instructions are as follows: ANDPS, ANDPD, ANDNPS, ANDNPD, CMPPS, CMPSS, CMPPD, CMPSD, MINPS, MINSS, MINPD, MINSD, MAXPS, MAXSS, MAXPD, MAXSD, ORPS, ORPD, PAND, PANDN, PCMPEQB, PCMPEQD, PCMPEQW, PCMPGTB, PCMPGTD, PCMPGTW, PMAXSW, PMAXUB, PMINSW, PMINUB, POR, PXOR, XORPS, and XORPD.
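As an illustration of a small branch body replaced by computation, the following C sketch clamps an array to a range, first with branches and then with the SSE intrinsics that correspond to MAXPS and MINPS. The function names clamp_branchy and clamp_sse are hypothetical and chosen only for this example. The sketch assumes a compiler that provides <xmmintrin.h>, that lo <= hi, and that the data contains no NaN values (MAXPS and MINPS treat NaN operands differently from the branchy code).

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_max_ps, _mm_min_ps, ... */

/* Reference version: two data-dependent branches per element. */
void clamp_branchy(float *x, size_t n, float lo, float hi)
{
    for (size_t i = 0; i < n; ++i) {
        if (x[i] < lo)
            x[i] = lo;
        else if (x[i] > hi)
            x[i] = hi;
    }
}

/* Branch-free version (hypothetical helper): the only branches left
 * are the loop conditions. Assumes lo <= hi and no NaN inputs. */
void clamp_sse(float *x, size_t n, float lo, float hi)
{
    const __m128 vlo = _mm_set1_ps(lo);  /* broadcast lo to 4 lanes */
    const __m128 vhi = _mm_set1_ps(hi);  /* broadcast hi to 4 lanes */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(x + i);  /* unaligned load of 4 floats */
        v = _mm_max_ps(v, vlo);          /* MAXPS: raise values below lo */
        v = _mm_min_ps(v, vhi);          /* MINPS: lower values above hi */
        _mm_storeu_ps(x + i, v);
    }

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; ++i) {
        float v = x[i];
        v = (v < lo) ? lo : v;
        v = (v > hi) ? hi : v;
        x[i] = v;
    }
}
```

Calling clamp_sse(buf, n, 0.0f, 1.0f) should produce the same result as clamp_branchy on NaN-free input. Unaligned loads are used here only so that the sketch makes no alignment assumption about the caller's buffer.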

For 32-bit code using 3DNow!™ instructions, try to avoid moving the MMX™ data to integer registers to perform comparisons and branches. Moving MMX data to integer registers requires either transport through memory or the use of MOVD reg, mmreg instructions, which are relatively inefficient. When using 3DNow! technology and MMX registers, the following instructions may be useful for eliminating branches: PCMPGTB, PCMPGTD, PCMPGTW, PFCMPGT, PFCMPGE, PFMIN, PFMAX, PAND, PANDN, POR, and PXOR.

Muxing Constructs

The most important construct to use in avoiding branches in SIMD code is a two-way muxing construct that is equivalent to the ternary operator (?:) in C and C++.
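A minimal C sketch of such a two-way mux, written with SSE intrinsics, is shown here. It computes r = (x > y) ? a : b independently in each of four single-precision lanes by combining a compare mask (CMPPS) with ANDPS, ANDNPS, and ORPS. The names mux_gt_ps and select_gt4 are hypothetical, and the choice of a greater-than comparison is only an assumption for illustration; any comparison that yields an all-ones or all-zeros mask per lane could be used instead.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Per-lane equivalent of: r = (x > y) ? a : b
 * mask is all ones in lanes where x > y and all zeros elsewhere. */
static __m128 mux_gt_ps(__m128 x, __m128 y, __m128 a, __m128 b)
{
    __m128 mask = _mm_cmpgt_ps(x, y);          /* CMPPS: build the mask   */
    __m128 from_a = _mm_and_ps(mask, a);       /* ANDPS:  mask & a        */
    __m128 from_b = _mm_andnot_ps(mask, b);    /* ANDNPS: (~mask) & b     */
    return _mm_or_ps(from_a, from_b);          /* ORPS: merge the halves  */
}

/* Hypothetical wrapper that applies the mux to four floats in memory. */
void select_gt4(const float *x, const float *y,
                const float *a, const float *b, float *r)
{
    __m128 result = mux_gt_ps(_mm_loadu_ps(x), _mm_loadu_ps(y),
                              _mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(r, result);
}
```

Both candidate values are always computed and the mask only selects between them, so this form is worthwhile when a and b are inexpensive to produce.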


Branch Optimizations

Chapter 6
