Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

9.13 Clearing MMX™ and XMM Registers with XOR Instructions

Optimization

Use instructions that perform XOR operations (PXOR, XORPS, and XORPD) to clear all the bits in MMX and XMM registers.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

The latency of the MMX XOR instruction (PXOR) is only 3 cycles, which is comparable to the 3 cycles required to load the data from memory, assuming it is in the L1 data cache. The SSE and SSE2 XOR instructions (XORPS and XORPD, respectively) also have latencies of 3 cycles.

Examples

The following examples illustrate how to clear the bits in a register using the different exclusive-OR instructions:

; MMX
pxor mm0, mm0       ; Clear the MM0 register.

; SSE
xorps xmm0, xmm0    ; Clear the XMM0 register.

; SSE2
xorpd xmm0, xmm0    ; Clear the XMM0 register.
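For code written in C rather than assembly, the same idea can be sketched with the SSE and SSE2 intrinsics. This is a minimal illustration, not part of the original guide: the helper names all_zero and cleared_registers_are_zero are hypothetical, and the sketch assumes an x86 compiler that provides emmintrin.h. Compilers generally lower the setzero intrinsics to a register XOR idiom instead of a load from memory, but the exact instruction chosen (XORPS, XORPD, or PXOR) is up to the compiler.

```c
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE ones */

/* Returns 1 if the n bytes at p are all zero, 0 otherwise. */
static int all_zero(const void *p, size_t n)
{
    const unsigned char *b = (const unsigned char *)p;
    for (size_t i = 0; i < n; i++) {
        if (b[i] != 0)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: clears one XMM value of each type, stores it
   to memory, and reports whether every stored bit is zero. */
int cleared_registers_are_zero(void)
{
    __m128  f = _mm_setzero_ps();     /* single-precision: usually an XOR idiom */
    __m128d d = _mm_setzero_pd();     /* double-precision: usually an XOR idiom */
    __m128i i = _mm_setzero_si128();  /* integer: usually an XOR idiom */

    float         f_out[4];
    double        d_out[2];
    unsigned char i_out[16];

    /* Unaligned stores, so the buffers need no special alignment. */
    _mm_storeu_ps(f_out, f);
    _mm_storeu_pd(d_out, d);
    _mm_storeu_si128((__m128i *)i_out, i);

    return all_zero(f_out, sizeof f_out)
        && all_zero(d_out, sizeof d_out)
        && all_zero(i_out, sizeof i_out);
}
```

Inspecting the compiler's assembly output is the way to confirm which instruction was actually emitted for a given build.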


Optimizing with SIMD Instructions

Chapter 9
