Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

Sample Code Translated into AMD64 Code

The following examples translate scalar source code into AMD64 code. Using 3DNow! SIMD instructions for scalar code is not recommended, because the advantage of 3DNow! instructions lies in their “SIMDness.” The examples are meant only to demonstrate general techniques for translating source code with branches into branchless 3DNow! code. Scalar source code was chosen to keep the examples simple; the same techniques work identically for vector code.

Each example shows the C code and the resulting 3DNow! code.

Example 1: C Code

float x, y, z;
if (x < y) {
    z += 1.0;
} else {
    z -= 1.0;
}

Example 1: 3DNow!™ Code

;In:  MM0 = x
;     MM1 = y
;     MM2 = z
;Out: MM0 = z

movq     mm3, mm0   ; Save x.
movq     mm4, one   ; 1.0
pfcmpge  mm0, mm1   ; x < y ? 0 : 0xffffffff
pslld    mm0, 31    ; x < y ? 0 : 0x80000000
pxor     mm0, mm4   ; x < y ? 1.0 : -1.0
pfadd    mm0, mm2   ; x < y ? z + 1.0 : z - 1.0
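
The mask-and-sign-flip idea in Example 1 can also be sketched in portable scalar C. This is only an illustrative sketch, not code from the guide: the names float_bits, bits_float, and step_branchless are hypothetical, the code assumes 32-bit IEEE 754 floats, and a compiler may or may not turn the comparison into branch-free machine code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the bit pattern of a float into a 32-bit integer. */
static uint32_t float_bits(float f) {
    uint32_t u;
    assert(sizeof u == sizeof f);   /* assumption: float is 32 bits wide */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: copy a 32-bit pattern back into a float. */
static float bits_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Computes (x < y) ? z + 1.0f : z - 1.0f without an if/else,
   following the same steps as the 3DNow! listing. */
static float step_branchless(float x, float y, float z) {
    /* Like pfcmpge: all ones when x >= y, otherwise zero. */
    uint32_t mask = (uint32_t)0 - (uint32_t)(x >= y);
    /* Like pslld by 31: keep only the sign-bit position of the mask. */
    uint32_t sign = mask << 31;
    /* Like pxor: flip the sign of 1.0 when x >= y, giving +1.0 or -1.0. */
    float delta = bits_float(float_bits(1.0f) ^ sign);
    /* Like pfadd: add the signed constant to z. */
    return z + delta;
}
```

For example, step_branchless(1.0f, 2.0f, 5.0f) takes the x < y path and yields 6.0f, while step_branchless(2.0f, 1.0f, 5.0f) yields 4.0f.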

Example 2: C Code

float x, z;
z = fabsf(x);   /* floating-point absolute value; abs() operates on int in C */
if (z >= 1) {
    z = 1 / z;
}

Example 2: 3DNow!™ Code

;In:  MM0 = x
;Out: MM0 = z

movq      mm5, mabs   ; 0x7fffffff
pand      mm0, mm5    ; z = abs(x)
pfrcp     mm2, mm0    ; 1 / z approximation
movq      mm1, mm0    ; Save z.
pfrcpit1  mm0, mm2    ; 1 / z step
pfrcpit2  mm0, mm2    ; 1 / z final
pfmin     mm0, mm1    ; z = z < 1 ? z : 1 / z
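
Example 2 relies on the observation that, for a nonnegative z, the smaller of z and 1 / z is the desired result, so the comparison against 1 can be replaced by a minimum. A rough scalar C sketch of the same idea follows. The function name abs_or_reciprocal is hypothetical, 32-bit IEEE 754 floats are assumed, and an ordinary division stands in for the pfrcp, pfrcpit1, pfrcpit2 sequence; whether the final selection compiles to a branch-free minimum depends on the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Computes z = |x|, then returns 1 / z when z >= 1 and z otherwise,
   written as a minimum instead of an if statement. */
static float abs_or_reciprocal(float x) {
    uint32_t bits;
    float z, r;

    assert(sizeof bits == sizeof x);   /* assumption: float is 32 bits wide */

    /* Like pand with the 0x7fffffff mask: clear the sign bit. */
    memcpy(&bits, &x, sizeof bits);
    bits &= 0x7fffffffu;
    memcpy(&z, &bits, sizeof z);

    /* Plain division used here in place of the 3DNow! reciprocal steps. */
    r = 1.0f / z;

    /* Like pfmin: for z >= 1 the reciprocal is the smaller value,
       for z < 1 the original z is. */
    return (r < z) ? r : z;
}
```

For example, abs_or_reciprocal(-4.0f) returns 0.25f, and abs_or_reciprocal(0.5f) returns 0.5f unchanged.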


Branch Optimizations

Chapter 6
