AMD x86 manual Replace Branches with Computation in 3DNow! Code, Muxing Constructs

AMD Athlon™ Processor x86 Code Optimization

22007E/0 — November 1999

Replace Branches with Computation in 3DNow!™ Code

Branches degrade the performance of 3DNow! code. A branch can operate on only one data item at a time; that is, branches are inherently scalar and inhibit the SIMD processing that gives 3DNow! code its performance advantage. In addition, branches based on 3DNow! comparisons require data to be passed to the integer units, either through memory or by means of “MOVD reg, MMreg” instructions. If the body of the branch is small, higher performance can be achieved by replacing the branch with computation that simulates predicated execution or conditional moves. The principal tools for this are the following instructions: PCMPGT, PFCMPGT, PFCMPGE, PFMIN, PFMAX, PAND, PANDN, POR, PXOR.
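As an illustration of replacing a small branch body with computation, the following is a minimal C sketch that models PFMAX and PFMIN on two packed single-precision lanes and uses them to clamp a value to a range. The type `v2f` and the functions `pfmax`, `pfmin`, and `clamp` are hypothetical names introduced only for this model; NaN and signed-zero behavior of the real instructions is not modeled.

```c
/* Hypothetical model of one 3DNow! register: two packed float lanes. */
typedef struct { float lane[2]; } v2f;

/* Models PFMAX lane by lane (NaN and signed-zero details ignored). */
static v2f pfmax(v2f a, v2f b) {
    v2f r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = (a.lane[i] > b.lane[i]) ? a.lane[i] : b.lane[i];
    return r;
}

/* Models PFMIN lane by lane (NaN and signed-zero details ignored). */
static v2f pfmin(v2f a, v2f b) {
    v2f r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = (a.lane[i] < b.lane[i]) ? a.lane[i] : b.lane[i];
    return r;
}

/* Scalar, branchy form:  if (x < lo) x = lo; else if (x > hi) x = hi;
 * Computed form: both lanes are clamped with one PFMAX and one PFMIN,
 * and no data-dependent branch is taken. */
static v2f clamp(v2f x, v2f lo, v2f hi) {
    return pfmin(pfmax(x, lo), hi);
}
```

The conditional expressions inside `pfmax` and `pfmin` only describe what each instruction computes per lane; in 3DNow! code each function corresponds to a single instruction operating on both lanes at once.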

Muxing Constructs

The most important construct for avoiding branches in 3DNow!™ and MMX™ code is a 2-way muxing construct that is equivalent to the ternary operator “?:” in C and C++. It is implemented using the PCMP/PFCMP, PAND, PANDN, and POR instructions. To maximize performance, it is important to apply the PAND and PANDN instructions in the proper order.
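The muxing construct can be sketched in C as a mask-and-merge over two 32-bit lanes. This is a minimal model, not the instruction sequence itself: the type `v2d` and the functions `pcmpgtd`, `mux`, and `select_lt` are hypothetical names, and each lane is assumed to hold a signed 32-bit integer in two's complement form.

```c
#include <stdint.h>

/* Hypothetical model of one MMX register: two packed 32-bit lanes. */
typedef struct { uint32_t lane[2]; } v2d;

/* Models PCMPGTD: a lane becomes 0xffffffff if a > b (signed), else 0. */
static v2d pcmpgtd(v2d a, v2d b) {
    v2d r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = 0u - (uint32_t)((int32_t)a.lane[i] > (int32_t)b.lane[i]);
    return r;
}

/* 2-way mux: takes a where the mask lane is all ones, b where it is zero.
 * (mask & a) models PAND, (~mask & b) models PANDN, and | models POR. */
static v2d mux(v2d mask, v2d a, v2d b) {
    v2d r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = (mask.lane[i] & a.lane[i]) | (~mask.lane[i] & b.lane[i]);
    return r;
}

/* r = (x < y) ? a : b, evaluated independently in each lane. */
static v2d select_lt(v2d x, v2d y, v2d a, v2d b) {
    return mux(pcmpgtd(y, x), a, b);
}
```

Because the selection is driven by a mask rather than a jump, each lane can take a different side of the condition in the same pass.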

Example 1 (Avoid):

;r = (x < y) ? b : a

;in:  mm0 a
;     mm1 b
;     mm2 x
;     mm3 y
;out: mm1 r

PCMPGTD MM3, MM2    ; y > x ? 0xffffffff : 0
MOVQ    MM4, MM3    ; duplicate mask
PANDN   MM3, MM0    ; y > x ? 0 : a
PAND    MM1, MM4    ; y > x ? b : 0
POR     MM1, MM3    ; r = y > x ? b : a

Because the use of PANDN destroys the mask created by PCMP, the mask needs to be saved, which requires an additional register. This adds an instruction, lengthens the dependency chain, and increases register pressure. Therefore 2-way muxing constructs should be written as follows.
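The effect of instruction order can be modeled in C by treating each instruction as a destructive two-operand operation. The sketch below is an assumption-laden illustration, not a listing from this guide: `mm_t`, `pand`, `pandn`, `por`, `mux_pandn_first`, and `mux_pand_first` are hypothetical names, and a register is modeled as a plain 64-bit integer. It shows one ordering consistent with the reasoning above, in which PAND reads the mask before PANDN overwrites it.

```c
#include <stdint.h>

/* Bit-level model of one 64-bit MMX register (hypothetical). */
typedef uint64_t mm_t;

/* Two-operand forms: the first operand is both a source and the destination. */
static void pand(mm_t *dst, mm_t src)  { *dst &= src; }
static void pandn(mm_t *dst, mm_t src) { *dst = ~*dst & src; }
static void por(mm_t *dst, mm_t src)   { *dst |= src; }

/* PANDN first: the mask in m is overwritten, so a copy must be kept. */
static mm_t mux_pandn_first(mm_t m, mm_t a, mm_t b) {
    mm_t tmp = m;      /* models MOVQ: extra instruction and extra register */
    pandn(&m, a);      /* m = mask ? 0 : a  (mask is destroyed here) */
    pand(&b, tmp);     /* b = mask ? b : 0 */
    por(&b, m);        /* b = mask ? b : a */
    return b;
}

/* PAND first: the mask is only read by PAND, then consumed by PANDN. */
static mm_t mux_pand_first(mm_t m, mm_t a, mm_t b) {
    pand(&b, m);       /* b = mask ? b : 0  (mask still intact) */
    pandn(&m, a);      /* m = mask ? 0 : a */
    por(&b, m);        /* b = mask ? b : a */
    return b;
}
```

Both functions compute the same result; the second needs no temporary, which corresponds to one fewer instruction and one fewer register in the instruction sequence.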
