25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

Examples

SSE Solution (Preferred)

; r = (x < y) ? a : b
; In:  XMM0 = a
;      XMM1 = b
;      XMM2 = x
;      XMM3 = y
; Out: XMM0 = r

cmpps  xmm2, xmm3, 1   ; x < y ? 0xffffffff : 0
andps  xmm0, xmm2      ; x < y ? a : 0
andnps xmm2, xmm1      ; x < y ? 0 : b
orps   xmm0, xmm2      ; x < y ? a : b
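The SSE sequence above can be checked without an assembler by modeling a single 32-bit lane in portable C. The following is only a sketch under stated assumptions: the helper name `select_lt` is hypothetical, `memcpy` stands in for reinterpreting the float bits, and a compiler is free to implement the C comparison with a branch, so this models the logic of the sequence rather than its performance.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: one 32-bit lane of the CMPPS/ANDPS/ANDNPS/ORPS
   sequence. Computes r = (x < y) ? a : b using only mask operations
   after the compare. */
static float select_lt(float x, float y, float a, float b)
{
    /* cmpps xmm2, xmm3, 1 : all ones if x < y, else all zeros */
    uint32_t mask = (x < y) ? 0xffffffffu : 0u;
    uint32_t ua, ub, ur;
    float r;

    /* Reinterpret the float operands as raw 32-bit patterns. */
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);

    ua &= mask;     /* andps  xmm0, xmm2 : x < y ? a : 0 */
    ub &= ~mask;    /* andnps xmm2, xmm1 : x < y ? 0 : b */
    ur = ua | ub;   /* orps   xmm0, xmm2 : x < y ? a : b */

    memcpy(&r, &ur, sizeof r);
    return r;
}
```

Calling `select_lt(1.0f, 2.0f, 10.0f, 20.0f)` selects `a`, while swapping `x` and `y` selects `b`. When `x` equals `y` the comparison is false, so `b` is returned.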

MMX™ Solution (Avoid)

; r = (x < y) ? a : b
; In:  MM0 = a
;      MM1 = b
;      MM2 = x
;      MM3 = y
; Out: MM0 = r

pcmpgtd mm3, mm2   ; y > x ? 0xffffffff : 0
movq    mm4, mm3   ; Duplicate mask
pandn   mm3, mm1   ; y > x ? 0 : b
pand    mm0, mm4   ; y > x ? a : 0
por     mm0, mm3   ; r = y > x ? a : b

Because the use of PANDN destroys the mask created by PCMPGTD, the mask needs to be saved, which requires an additional register. This adds an instruction, lengthens the dependency chain, and increases register pressure. Therefore, write two-way muxing constructs as follows:

MMX™ Solution (Preferred)

; r = (x < y) ? a : b
; In:  MM0 = a
;      MM1 = b
;      MM2 = x
;      MM3 = y
; Out: MM0 = r

pcmpgtd mm3, mm2   ; y > x ? 0xffffffff : 0
pand    mm0, mm3   ; y > x ? a : 0
pandn   mm3, mm1   ; y > x ? 0 : b
por     mm0, mm3   ; r = y > x ? a : b
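The reason the ordering matters is that PANDN inverts and overwrites its destination operand, whereas PAND leaves its source operand intact. A small C model of one 32-bit lane makes this visible. This is a sketch only: the function names mirror the instruction mnemonics but are hypothetical C helpers, and the first argument of each plays the role of the destination register.

```c
#include <stdint.h>

/* Hypothetical one-lane models of the MMX operations.
   The first argument is the destination, the second the source. */
static uint32_t pcmpgtd(int32_t dst, int32_t src)
{
    return dst > src ? 0xffffffffu : 0u;   /* signed compare */
}
static uint32_t pand(uint32_t dst, uint32_t src)  { return dst & src; }
static uint32_t pandn(uint32_t dst, uint32_t src) { return ~dst & src; }
static uint32_t por(uint32_t dst, uint32_t src)   { return dst | src; }

/* Preferred ordering: r = (x < y) ? a : b without copying the mask. */
static uint32_t mux_preferred(uint32_t a, uint32_t b, int32_t x, int32_t y)
{
    uint32_t mm0 = a, mm1 = b, mm3;

    mm3 = pcmpgtd(y, x);    /* mm3 = mask: y > x ? 0xffffffff : 0   */
    mm0 = pand(mm0, mm3);   /* mask is the source, so it survives   */
    mm3 = pandn(mm3, mm1);  /* mask is the destination, overwritten */
    mm0 = por(mm0, mm3);    /* r = y > x ? a : b                    */
    return mm0;
}
```

Because the mask is consumed as a source by PAND before PANDN overwrites it, no MOVQ copy and no fifth register are needed.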

Chapter 6

Branch Optimizations

