25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

9.14 Finding the Floating-Point Absolute Value of Operands of SSE, SSE2, and 3DNow!™ Instructions

Optimization

Use instructions that perform AND operations (PAND, ANDPS, and ANDPD) to determine the absolute value of floating-point operands of SSE, SSE2, and 3DNow! instructions.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

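The sign of an IEEE 754 floating-point value is held in its most significant bit, so ANDing an operand with a mask that clears only that bit yields the absolute value. The MMX PAND instruction has a latency of 2 cycles, whereas the SSE and SSE2 AND instructions (ANDPS and ANDPD, respectively) have latencies of 3 cycles. The following examples illustrate how to clear the sign bits: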

; 3DNow!

absmask DQ 7FFFFFFF7FFFFFFFh

pand mm0, [absmask] ; Clear the sign bits of both floats in MM0.

; SSE

absmask DQ 7FFFFFFF7FFFFFFFh,7FFFFFFF7FFFFFFFh

andps xmm0, [absmask] ; Clear the sign bits of all four floats in XMM0.

; SSE2

absmask DQ 7FFFFFFFFFFFFFFFh,7FFFFFFFFFFFFFFFh

andpd xmm0, [absmask] ; Clear the sign bits of both doubles in XMM0.
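For code written in C rather than assembly, the same masking can be sketched with compiler intrinsics. This is a minimal sketch and not part of the original guide: the helper names abs_ps, abs_pd, abs4f, and abs2d are hypothetical, and the code assumes an x86 compiler that provides <emmintrin.h> with SSE2 enabled. The compiler selects the actual instructions, so ANDPS and ANDPD are the expected result, not a guaranteed one.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE ones */

/* Clear the sign bits of all four floats in x (the ANDPS case).
   abs_ps is a hypothetical helper name used for illustration. */
static __m128 abs_ps(__m128 x)
{
    /* Same bit pattern as the SSE absmask above: 7FFFFFFFh in each lane. */
    const __m128 absmask = _mm_castsi128_ps(_mm_set1_epi32(0x7FFFFFFF));
    return _mm_and_ps(x, absmask);
}

/* Clear the sign bits of both doubles in x (the ANDPD case).
   abs_pd is a hypothetical helper name used for illustration. */
static __m128d abs_pd(__m128d x)
{
    /* Same bit pattern as the SSE2 absmask above. */
    const __m128d absmask =
        _mm_castsi128_pd(_mm_set1_epi64x(0x7FFFFFFFFFFFFFFFLL));
    return _mm_and_pd(x, absmask);
}

/* Convenience wrappers (also hypothetical) that work on plain arrays.
   Unaligned loads and stores are used so callers need not align data. */
static void abs4f(const float in[4], float out[4])
{
    _mm_storeu_ps(out, abs_ps(_mm_loadu_ps(in)));
}

static void abs2d(const double in[2], double out[2])
{
    _mm_storeu_pd(out, abs_pd(_mm_loadu_pd(in)));
}
```

Building the mask in a register avoids declaring a separate memory constant. Whether the compiler materializes it from memory or with integer instructions is left to the compiler.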

Chapter 9

Optimizing with SIMD Instructions
