22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

Use MMX™ PCMPEQD to Set All Bits in an MMX™ Register

To set all the bits in an MMX register to one, use:

PCMPEQD MMreg, MMreg

Note that PCMPEQD MMreg, MMreg is dependent on previous writes to MMreg. Therefore, using PCMPEQD in this manner can lengthen dependency chains, which in turn may reduce performance. An alternative in such cases is to use:

ones DQ 0FFFFFFFFFFFFFFFFh

MOVQ MMreg, QWORD PTR [ones]

i.e., to load a quadword of 0xFFFFFFFFFFFFFFFF from a statically initialized and properly aligned memory location. However, loading the data from memory runs the risk of cache misses. Cases where MOVQ is superior to PCMPEQD are therefore rare and PCMPEQD should be used in general.
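
The reason PCMPEQD MMreg, MMreg always yields all ones can be illustrated with a small portable C model of the instruction's lane-wise behavior. This is a sketch only, not MMX code: the function names pcmpeqd_model and set_all_ones are hypothetical, and a 64-bit integer stands in for the MMX register.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of PCMPEQD on a 64-bit MMX register (an assumption for
   illustration): each 32-bit lane of the result is 0xFFFFFFFF if the two
   source lanes are equal, and 0 otherwise. */
uint64_t pcmpeqd_model(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);
    uint64_t lo = (a_lo == b_lo) ? 0xFFFFFFFFu : 0u;
    uint64_t hi = (a_hi == b_hi) ? 0xFFFFFFFFu : 0u;
    return (hi << 32) | lo;
}

/* Comparing a register with itself: every lane matches, so all 64 bits
   are set regardless of the register's previous contents. */
uint64_t set_all_ones(uint64_t mmreg)
{
    return pcmpeqd_model(mmreg, mmreg);
}

/* Small self-check of the model. */
void check_set_all_ones(void)
{
    assert(set_all_ones(0) == 0xFFFFFFFFFFFFFFFFULL);
    assert(set_all_ones(0x0123456789ABCDEFULL) == 0xFFFFFFFFFFFFFFFFULL);
}
```

The result does not depend on the old value of the register, but the hardware still has to wait for that value before it can execute the compare. The model cannot show this scheduling dependency; it only shows the result.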

Use MMX™ PAND to Find Absolute Value in 3DNow!™ Code

Use the following to compute the absolute value of 3DNow! floating-point operands:

mabs DQ 7FFFFFFF7FFFFFFFh

PAND MM0, [mabs] ;mask out sign bit
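
The effect of the mask can be checked with a portable C model. This is a sketch under stated assumptions, not 3DNow! code: it assumes that float is a 32-bit IEEE 754 single-precision value, that a 64-bit integer stands in for MM0 with element 0 in the low 32 bits, and the helper names pand_abs_model, pack2, and lane are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same constant as mabs in the listing: one 0x7FFFFFFF mask per lane. */
#define MABS 0x7FFFFFFF7FFFFFFFULL

/* Model of PAND MM0, [mabs]: a 64-bit AND that clears bit 31 (the sign
   bit) of each 32-bit lane and leaves exponent and mantissa untouched. */
uint64_t pand_abs_model(uint64_t mm0)
{
    return mm0 & MABS;
}

/* Pack two floats into one 64-bit value, element 0 in the low lane. */
uint64_t pack2(float lo, float hi)
{
    uint32_t l, h;
    memcpy(&l, &lo, sizeof l);
    memcpy(&h, &hi, sizeof h);
    return ((uint64_t)h << 32) | l;
}

/* Extract lane i (0 or 1) as a float. */
float lane(uint64_t v, int i)
{
    uint32_t bits = (uint32_t)(v >> (32 * i));
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Small self-check: both lanes become non-negative. */
void check_abs(void)
{
    uint64_t r = pand_abs_model(pack2(-1.5f, 2.25f));
    assert(lane(r, 0) == 1.5f);
    assert(lane(r, 1) == 2.25f);
}
```

Because only the sign bit changes, a single PAND produces the absolute value of both packed operands at once, with no comparison or branch.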

Optimized Matrix Multiplication

The multiplication of a 4x4 matrix with a 4x1 vector is commonly used in 3D graphics for geometry transformation. This routine serves to translate, scale, rotate, and apply perspective to 3D coordinates represented in homogeneous coordinates. The following code sample is a 3DNow! optimized, general 3D vertex transformation routine that completes in 16 cycles on the AMD Athlon processor:
