
Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

9.13 Clearing MMX™ and XMM Registers with XOR Instructions

Optimization

To clear all the bits in an MMX or XMM register, XOR the register with itself using an XOR instruction (PXOR, XORPS, or XORPD).
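In C, the same idiom is usually reached through compiler intrinsics rather than hand-written assembly. The following is a minimal sketch. It assumes an x86-64 compiler with SSE2 enabled (the x86-64 default) and the standard <emmintrin.h> header. The helper names zero_ps, zero_pd, zero_si128, and xmm_is_zero are hypothetical and exist only for this illustration. Each helper XORs a value with itself. That clears every bit whatever the input was, because b XOR b = 0 for any bit b. Compilers typically emit XORPS or PXOR for _mm_setzero_ps() and _mm_setzero_si128() as well, but the exact code generated depends on the compiler.

```c
#include <assert.h>    /* for the assert-based usage shown after this block */
#include <emmintrin.h> /* SSE2 intrinsics; also provides the SSE (XORPS) ones */

/* XOR a value with itself: each bit b gives b ^ b == 0, so the result is
   all zeros regardless of what v contained. */
static inline __m128  zero_ps(__m128 v)     { return _mm_xor_ps(v, v); }    /* XORPS */
static inline __m128d zero_pd(__m128d v)    { return _mm_xor_pd(v, v); }    /* XORPD */
static inline __m128i zero_si128(__m128i v) { return _mm_xor_si128(v, v); } /* PXOR, XMM form */

/* Returns 1 if all 16 bytes of an XMM value are zero, else 0. */
static inline int xmm_is_zero(__m128i v) {
    /* cmpeq_epi8 yields 0xFF in each byte equal to zero; movemask gathers
       the top bit of all 16 bytes, so 0xFFFF means every byte was zero. */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_setzero_si128())) == 0xFFFF;
}
```

For instance, assert(xmm_is_zero(zero_si128(_mm_set1_epi32(-1)))) should hold even though every bit of the input is set. Use _mm_castps_si128 or _mm_castpd_si128 to check the results of zero_ps and zero_pd the same way.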

Application

This optimization applies to:

• 32-bit software

• 64-bit software

Rationale

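XORing a register with itself sets every bit to 0, because b XOR b = 0 for any bit b. This clears the register without a memory operand. The MMX XOR instruction (PXOR) has a latency of only 3 cycles, about the same as the 3 cycles needed to load data that is already in the L1 data cache. The SSE and SSE2 XOR instructions (XORPS and XORPD, respectively) also have latencies of 3 cycles.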

Examples

The following examples illustrate how to clear the bits in a register using the different exclusive-OR instructions:

; MMX
pxor mm0, mm0	; Clear the MM0 register.
; SSE
xorps xmm0, xmm0	; Clear the XMM0 register.
; SSE2
xorpd xmm0, xmm0	; Clear the XMM0 register.
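The three instructions above can also be exercised from C with GCC/Clang extended inline assembly. This sketch assumes an x86-64 target and GNU inline-assembly syntax, which uses AT&T operand order (source first). The function names pxor_mm0, xorps_clear, and xorpd_clear are hypothetical. Each routine first places non-zero data in the register and then XORs the register with itself, so a zero result shows that the clear does not depend on the previous contents. The MMX routine ends with EMMS, which the MMX programming model requires before x87 floating-point code runs again.

```c
#include <assert.h>    /* for the assert-based usage shown after this block */
#include <emmintrin.h> /* __m128, __m128d and SSE/SSE2 load/store intrinsics */
#include <stdint.h>

/* PXOR on an MMX register: load a seed into MM0, clear MM0 by XORing it
   with itself, copy MM0 out, then execute EMMS so later x87 code sees an
   empty register stack. Assumes no x87 values are live at this point. */
static uint64_t pxor_mm0(uint64_t seed) {
    uint64_t out;
    __asm__ volatile(
        "movq  %1, %%mm0\n\t"    /* MM0 = seed (non-zero on purpose)  */
        "pxor  %%mm0, %%mm0\n\t" /* MM0 = MM0 XOR MM0 = 0             */
        "movq  %%mm0, %0\n\t"    /* out = MM0                         */
        "emms"                   /* leave MMX state                   */
        : "=m"(out)
        : "m"(seed)
        : "mm0");
    return out;
}

/* XORPS: load four floats into some XMM register, clear it, store back. */
static void xorps_clear(float v[4]) {
    __m128 x = _mm_loadu_ps(v);        /* register starts with caller's data */
    __asm__("xorps %0, %0" : "+x"(x)); /* XMMn = XMMn XOR XMMn = 0           */
    _mm_storeu_ps(v, x);
}

/* XORPD: the same pattern for two doubles. */
static void xorpd_clear(double v[2]) {
    __m128d x = _mm_loadu_pd(v);
    __asm__("xorpd %0, %0" : "+x"(x)); /* XMMn = XMMn XOR XMMn = 0 */
    _mm_storeu_pd(v, x);
}
```

For example, assert(pxor_mm0(0x0123456789ABCDEFULL) == 0) should hold, and any array passed to xorps_clear or xorpd_clear should come back as all zeros. The "x" constraint lets the compiler choose the XMM register, whereas the MMX routine names MM0 explicitly to mirror the example above.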


Optimizing with SIMD Instructions

Chapter 9
