22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

Example 2 (Preferred):

;r = (x < y) ? b : a
;in:  mm0 a
;     mm1 b
;     mm2 x
;     mm3 y
;out: mm1 r
PCMPGTD  MM3, MM2   ; y > x ? 0xffffffff : 0
PAND     MM1, MM3   ; y > x ? b : 0
PANDN    MM3, MM0   ; y > x ? 0 : a
POR      MM1, MM3   ; r = y > x ? b : a
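
As a rough scalar model of the mask, AND, AND-NOT, OR sequence in Example 2, the following C sketch performs the same four steps on a single 32-bit lane. The function names `pcmpgtd_lane` and `mux_lane` are illustrative and do not come from this manual. The sketch assumes signed 32-bit elements, which is how PCMPGTD compares them, and is meant to show the mask logic rather than to replace the MMX code.

```c
#include <stdint.h>

/* Models one 32-bit lane of PCMPGTD: all ones if a > b (signed), else 0.
   Hypothetical helper name, chosen for this sketch only. */
static uint32_t pcmpgtd_lane(int32_t a, int32_t b)
{
    /* (a > b) is 0 or 1; subtracting it from 0 gives 0 or 0xffffffff. */
    return (uint32_t)0 - (uint32_t)(a > b);
}

/* Branchless select using the same steps as the listing:
   returns b where y > x, and a otherwise. */
static uint32_t mux_lane(uint32_t a, uint32_t b, int32_t x, int32_t y)
{
    uint32_t mask   = pcmpgtd_lane(y, x); /* PCMPGTD: y > x ? 0xffffffff : 0 */
    uint32_t keep_b = b & mask;           /* PAND:    y > x ? b : 0          */
    uint32_t keep_a = ~mask & a;          /* PANDN:   y > x ? 0 : a          */
    return keep_b | keep_a;               /* POR:     y > x ? b : a          */
}
```

For example, `mux_lane(10, 20, 1, 2)` yields 20 because y > x selects b, while `mux_lane(10, 20, 2, 1)` yields 10.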


Sample Code Translated into 3DNow!™ Code

The following examples translate scalar code into 3DNow! code. Using 3DNow! SIMD instructions for scalar code is not recommended, because the advantage of 3DNow! instructions lies in their “SIMDness”. The examples are meant to demonstrate general techniques for translating source code with branches into branchless 3DNow! code. Scalar source code was chosen to keep the examples simple; the techniques work in an identical fashion for vector code.

Each example shows the C code and the resulting 3DNow! code.

Example 1:

C code:

float x,y,z;
if (x < y) {
   z += 1.0;
}
else {
   z -= 1.0;
}

3DNow! code:

;in:  MM0 = x
;     MM1 = y
;     MM2 = z
;out: MM0 = z
MOVQ     MM3, MM0   ;save x
MOVQ     MM4, one   ;1.0
PFCMPGE  MM0, MM1   ;x < y ? 0 : 0xffffffff
PSLLD    MM0, 31    ;x < y ? 0 : 0x80000000
PXOR     MM0, MM4   ;x < y ? 1.0 : -1.0
PFADD    MM0, MM2   ;x < y ? z+1.0 : z-1.0
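
The sign-bit trick in this sequence can be modeled in portable C, as a sketch only. It assumes `float` is a 32-bit IEEE 754 single, so that 1.0 has the bit pattern 0x3F800000 and bit 31 is the sign. The helper names `float_bits`, `bits_float`, and `add_or_sub_one` are hypothetical and chosen for illustration. The redundant save of x from the listing is omitted, and NaN inputs are not considered.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back (hypothetical helpers). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Scalar model of the 3DNow! sequence: x < y ? z + 1.0 : z - 1.0,
   computed without a branch in the source. */
static float add_or_sub_one(float x, float y, float z)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(x >= y); /* PFCMPGE: x < y ? 0 : 0xffffffff */
    uint32_t sign = mask << 31;                       /* PSLLD:   x < y ? 0 : 0x80000000 */
    uint32_t one  = float_bits(1.0f) ^ sign;          /* PXOR:    x < y ? 1.0 : -1.0     */
    return bits_float(one) + z;                       /* PFADD:   adds +1.0 or -1.0 to z */
}
```

With these definitions, `add_or_sub_one(1.0f, 2.0f, 5.0f)` gives 6.0 and `add_or_sub_one(2.0f, 1.0f, 5.0f)` gives 4.0. Flipping only the sign bit of the constant avoids selecting between two separate constants with a mask.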
