25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

9.1 Ensure All Packed Floating-Point Data are Aligned

Optimization

Align all packed floating-point data on 16-byte boundaries.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

Misaligned memory accesses reduce the available memory bandwidth, and SSE and SSE2 instructions have shorter latencies when operating on aligned memory operands.

Aligning data on 16-byte boundaries allows you to use the aligned load instructions (MOVAPS, MOVAPD, and MOVDQA), which move through the floating-point unit with shorter latencies and reduce the possibility of stalling addition or multiplication instructions that are dependent on the load data.
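The following C fragment is a minimal sketch of this recommendation, not code from this guide. It assumes a compiler that provides the SSE intrinsics in <xmmintrin.h>. The helper names is_aligned_16 and add_packed_aligned are hypothetical and chosen only for illustration. The _mm_load_ps and _mm_store_ps intrinsics require 16-byte-aligned addresses and are typically compiled to MOVAPS, although the exact instruction selection is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_load_ps, _mm_add_ps, _mm_store_ps */

/* Hypothetical helper: returns nonzero if p lies on a 16-byte boundary. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: dst[i] = a[i] + b[i] for n single-precision values.
 * Preconditions (checked with assert): all three pointers are 16-byte
 * aligned and n is a multiple of 4, so every access is an aligned
 * packed load or store. */
static void add_packed_aligned(float *dst, const float *a, const float *b,
                               size_t n)
{
    assert(is_aligned_16(dst) && is_aligned_16(a) && is_aligned_16(b));
    assert(n % 4 == 0);

    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);             /* aligned packed load */
        __m128 vb = _mm_load_ps(b + i);             /* aligned packed load */
        _mm_store_ps(dst + i, _mm_add_ps(va, vb));  /* aligned packed store */
    }
}
```

One way to satisfy the alignment precondition for statically sized data, assuming a C11 compiler, is to declare the arrays with _Alignas(16), for example `_Alignas(16) float a[8];`. Dynamically allocated buffers need an allocator that guarantees 16-byte alignment.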

Chapter 9

Optimizing with SIMD Instructions

