25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

9.4 Use MOVAPD and MOVAPS Instead of MOVUPD and MOVUPS

Optimization

For best performance, use the aligned versions of these instructions (MOVAPD and MOVAPS) when one of the operands is a memory location.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

Both MOVUPS and MOVUPD are VectorPath instructions when one of the operands is a memory location. MOVAPS and MOVAPD are preferable because both are DirectPath Double decode types. Misaligned memory accesses also reduce the available memory bandwidth, and SSE and SSE2 instructions have shorter latencies when operating on aligned memory operands. Aligning data on 16-byte boundaries allows you to use the aligned load instructions (MOVAPS, MOVAPD, and MOVDQA), which move through the floating-point unit with shorter latencies and reduce the possibility of stalling addition or multiplication instructions that depend on the loaded data.
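Example

The following C sketch illustrates the recommendation using compiler intrinsics. It is not taken from this guide: the helper names is_aligned16 and sum_floats are hypothetical, and it assumes a compiler that provides the SSE2 header emmintrin.h and supports C11 _Alignas. The intrinsic _mm_load_ps requires a 16-byte-aligned address and typically compiles to MOVAPS, while _mm_loadu_ps accepts any address and typically compiles to MOVUPS; the exact instruction selection is up to the compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE ones */

/* Hypothetical helper: returns 1 if p lies on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: sums n floats, where n is a multiple of 4.
   Takes the aligned-load path (_mm_load_ps) when src is 16-byte
   aligned and falls back to unaligned loads (_mm_loadu_ps) otherwise. */
static float sum_floats(const float *src, size_t n)
{
    __m128 acc = _mm_setzero_ps();

    if (is_aligned16(src)) {
        for (size_t i = 0; i < n; i += 4)
            acc = _mm_add_ps(acc, _mm_load_ps(src + i));   /* aligned load */
    } else {
        for (size_t i = 0; i < n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(src + i));  /* unaligned load */
    }

    /* Spill the four partial sums to an aligned temporary and add them. */
    _Alignas(16) float lanes[4];
    _mm_store_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Declaring buffers with _Alignas(16), or allocating them with an aligned allocator, keeps the code on the aligned path so that the runtime check always succeeds. In code where alignment is guaranteed by construction, the check and the unaligned fallback can be dropped entirely.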

Chapter 9

Optimizing with SIMD Instructions

