AMD Athlon™ Processor x86 Code Optimization

22007E/0 — November 1999

cycle bypassing penalty, and another one-cycle penalty if the result goes to a 3DNow! operation. The PFMUL execution latency is four cycles; therefore, in the worst case, the PXOR and PFMUL instructions have the same latency. On the AMD-K6 processor, PXOR has a latency of only one cycle, versus two cycles for the 3DNow! PFMUL instruction.

Use the following code to negate 3DNow! data:

msgn    DQ      8000000080000000h

        PXOR    MM0, [msgn]         ;toggle sign bit
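For readers without 3DNow! hardware, the same sign-bit toggle can be modeled in scalar C. This is a minimal sketch, assuming `float` is a 32-bit IEEE-754 single; the function names `negate_by_xor` and `negate_pair` are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Flip the sign of one single-precision value by toggling bit 31,
   the scalar analogue of PXOR with 80000000h.
   Assumes float is a 32-bit IEEE-754 single. */
static float negate_by_xor(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the float as raw bits */
    bits ^= UINT32_C(0x80000000);     /* toggle the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Apply the same mask to both elements of a packed pair, as the 64-bit
   constant 8000000080000000h does for one MMX register. */
static void negate_pair(float v[2])
{
    v[0] = negate_by_xor(v[0]);
    v[1] = negate_by_xor(v[1]);
}
```

Because only the sign bit changes, the operation is exact and needs no floating-point arithmetic, which is why a logical instruction can stand in for a multiply by -1.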

Use MMX™ PCMP Instead of 3DNow!™ PFCMP

Use the MMX PCMP instruction instead of the 3DNow! PFCMP instruction. On the AMD Athlon processor, PCMP has a latency of two cycles, while PFCMP has a latency of four cycles. In addition to the shorter latency, PCMP can be issued to either the FADD or the FMUL pipe, while PFCMP is restricted to the FADD pipe.

Note: The PFCMP instruction has a ‘GE’ (greater or equal) version (PFCMPGE) that is missing from PCMP.

Both Numbers Positive

If both arguments are positive, PCMP always works.

One Negative, One Positive

If one number is negative and the other is positive, PCMP still works, except when one number is a positive zero and the other is a negative zero.
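The signed-zero exception can be checked with a scalar model of the integer compares. The sketch below assumes 32-bit IEEE-754 floats; `float_bits`, `int_gt`, and `int_eq` are hypothetical helpers that model a single 32-bit lane of the integer greater-than and equality compares, not the instructions themselves.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as a signed 32-bit integer (assumes IEEE-754 single). */
static int32_t float_bits(float x)
{
    int32_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

/* Model of one lane of an integer greater-than compare on raw bit patterns. */
static int int_gt(float a, float b)
{
    return float_bits(a) > float_bits(b);
}

/* Model of one lane of an integer equality compare on raw bit patterns. */
static int int_eq(float a, float b)
{
    return float_bits(a) == float_bits(b);
}
```

With these helpers, `int_gt(3.0f, -1.0f)` agrees with the floating-point result because the sign bit alone decides the integer comparison. For `0.0f` and `-0.0f`, however, the bit patterns differ (00000000h versus 80000000h), so the integer compares report the two values as unequal and ordered even though they compare equal as floating-point numbers.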

Both Numbers Negative

Be careful when performing integer comparison using PCMPGT on two negative 3DNow! numbers. The result is the inverse of the PFCMPGT floating-point comparison. For example:

    –2 = C0000000h
    –4 = C0800000h

Treated as signed integers, PCMPGT gives C0800000h > C0000000h, but –4 < –2. To address this issue, simply reverse the comparison by swapping the source operands.
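The inversion and the operand swap can be verified with a scalar model. This is a sketch under the assumption of 32-bit IEEE-754 floats; `as_int32`, `pcmpgt_lane`, and `gt_both_negative` are hypothetical names, and `gt_both_negative` is meaningful only when both inputs are negative and neither is a NaN.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as a signed 32-bit integer (assumes IEEE-754 single). */
static int32_t as_int32(float x)
{
    int32_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

/* Model of one 32-bit lane of PCMPGT: signed integer a > b on raw bits. */
static int pcmpgt_lane(float a, float b)
{
    return as_int32(a) > as_int32(b);
}

/* Floating-point a > b for two negative inputs, obtained by swapping the
   operands of the integer compare. Not valid for other sign combinations. */
static int gt_both_negative(float a, float b)
{
    return pcmpgt_lane(b, a);
}
```

For negative values, a larger magnitude yields a larger bit pattern within the sign-bit-set range, so the integer ordering runs opposite to the numeric ordering; swapping the operands restores the floating-point result without an extra instruction.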
