138 Branch Optimizations Chapter 6
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
Sample Code Translated into AMD64 Code
The following examples show scalar C code translated into branchless AMD64 code that uses
3DNow! instructions. Using 3DNow! SIMD instructions for scalar code is not recommended, because
the advantage of 3DNow! instructions lies in their “SIMDness.” The examples are meant only to
demonstrate general techniques for translating source code with branches into branchless
3DNow! code. Scalar source code was chosen to keep the examples simple; the same techniques
apply unchanged to vector code. Each example shows the C code and the resulting 3DNow! code.
Example 1: C Code
float x, y, z;
if (x < y) {
    z += 1.0f;
} else {
    z -= 1.0f;
}
Example 1: 3DNow!™ Code
; In:  MM0 = x
;      MM1 = y
;      MM2 = z
; Out: MM0 = z
movq    mm3, mm0   ; Save x.
movq    mm4, one   ; 1.0
pfcmpge mm0, mm1   ; x < y ? 0 : 0xffffffff
pslld   mm0, 31    ; x < y ? 0 : 0x80000000
pxor    mm0, mm4   ; x < y ? 1.0 : -1.0
pfadd   mm0, mm2   ; x < y ? z + 1.0 : z - 1.0
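The same mask, shift, and XOR sequence can be sketched in portable C to make each step easier to follow. This is only an illustration, not code from the guide: the function name add_or_sub_one and the helpers float_bits and bits_float are hypothetical, the sketch assumes 32-bit IEEE-754 floats, and it ignores NaN inputs (for which the x >= y mask and the original x < y test would disagree).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret a float as its 32-bit pattern and back. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

float bits_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Branchless form of: if (x < y) z += 1.0f; else z -= 1.0f;
   Each step mirrors one instruction of the 3DNow! sequence. */
float add_or_sub_one(float x, float y, float z) {
    assert(sizeof(float) == sizeof(uint32_t));          /* assumption of this sketch */
    uint32_t mask = 0u - (uint32_t)(x >= y);            /* pfcmpge: 0 or 0xffffffff */
    uint32_t sign = mask << 31;                         /* pslld:   0 or 0x80000000 */
    float delta = bits_float(sign ^ float_bits(1.0f));  /* pxor:    +1.0 or -1.0 */
    return z + delta;                                   /* pfadd */
}
```

For instance, add_or_sub_one(1.0f, 2.0f, 5.0f) takes the x < y path and evaluates to 6.0f, while add_or_sub_one(2.0f, 1.0f, 5.0f) evaluates to 4.0f. The key idea is that the comparison result is turned into a sign bit and applied to the constant, so no conditional jump is needed.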
Example 2: C Code
float x, z;
z = fabsf(x);        /* Requires <math.h>; abs() would convert x to int. */
if (z >= 1.0f) {
    z = 1.0f / z;
}
Example 2: 3DNow!™ Code
; In:  MM0 = x
; Out: MM0 = z
movq     mm5, mabs  ; 0x7fffffff
pand     mm0, mm5   ; z = abs(x)
pfrcp    mm2, mm0   ; 1 / z approximation
movq     mm1, mm0   ; Save z.
pfrcpit1 mm0, mm2   ; 1 / z step
pfrcpit2 mm0, mm2   ; 1 / z final
pfmin    mm0, mm1   ; z = z < 1 ? z : 1 / z
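The reasoning behind the final pfmin can be checked with a small C sketch: for z = |x|, whichever of z and 1 / z does not exceed 1 is also the smaller of the two, so taking the minimum replaces the branch. The names below (clamp_recip, float_bits, bits_float) are hypothetical, and the sketch makes several assumptions that are not part of the guide: floats are 32-bit IEEE-754, a full-precision division stands in for the pfrcp, pfrcpit1, pfrcpit2 sequence, and the minimum is taken on the bit patterns, which is valid here only because both operands are non-negative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret a float as its 32-bit pattern and back. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

float bits_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Branchless form of: z = fabsf(x); if (z >= 1.0f) z = 1.0f / z; */
float clamp_recip(float x) {
    assert(sizeof(float) == sizeof(uint32_t));    /* assumption of this sketch */
    uint32_t zb = float_bits(x) & 0x7fffffffu;    /* pand with mabs: z = |x| */
    float z = bits_float(zb);
    uint32_t rb = float_bits(1.0f / z);           /* stands in for pfrcp/pfrcpit1/pfrcpit2 */
    /* For non-negative floats, unsigned order of the bit patterns matches
       numeric order, so an integer select implements the minimum. */
    uint32_t lt = 0u - (uint32_t)(zb < rb);       /* all ones if z < 1 / z */
    return bits_float(rb ^ ((zb ^ rb) & lt));     /* pfmin: smaller of z and 1 / z */
}
```

For instance, clamp_recip(-4.0f) evaluates to 0.25f and clamp_recip(0.5f) evaluates to 0.5f. The result always lies in the range 0 to 1, which is why the minimum of z and its reciprocal reproduces the conditional assignment.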