25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

The statement movlps xmm1, mem64 marks the lower half of XMM1 as FPS (floating-point single-precision) but leaves the upper half of XMM1 unchanged. If XMM1 is later used in any instruction that uses the full 128 bits of XMM1, there can be a performance penalty if the top half is not also in FPS format. Examples of instructions that expect the full 128 bits of XMM1 to be in FPS format are MOVAPS, ANDPS, ANDNPS, and ORPS. For more information on XMM-register data types, see “Half-Register Operations” on page 356.

Rationale—Double Precision

Unlike the MOVSD and MOVQ instructions, the MOVLPD instruction does not clear the upper 64 bits of an XMM register when it loads 64 bits of floating-point data into the lower 64 bits of the register. Using the MOVLPD instruction can significantly increase performance in processor-limited, scalar floating-point-intensive SSE2 code.
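The difference between the two load forms can be sketched in C with SSE2 intrinsics. This is a minimal illustration, not code from this guide: the function names are hypothetical, and it assumes the compiler maps _mm_loadl_pd to MOVLPD and _mm_load_sd to MOVSD, which should be confirmed in the generated assembly.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: MOVLPD-style load. Writes *p into the low
   64 bits of dst and leaves the upper 64 bits of dst unchanged. */
__m128d load_low_keep_upper(__m128d dst, const double *p) {
    return _mm_loadl_pd(dst, p);
}

/* Hypothetical helper: MOVSD-style load. Writes *p into the low
   64 bits and zeroes the upper 64 bits of the result. */
__m128d load_low_zero_upper(const double *p) {
    return _mm_load_sd(p);
}

/* Hypothetical helpers that read back each half for inspection. */
double lower_half(__m128d v) {
    return _mm_cvtsd_f64(v);
}

double upper_half(__m128d v) {
    return _mm_cvtsd_f64(_mm_unpackhi_pd(v, v));
}
```

Starting from a register whose upper half holds 7.0, load_low_keep_upper leaves that 7.0 in place, while load_low_zero_upper returns a register whose upper half is 0.0.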

Consider the following caveat when using the MOVLPD instruction:

The statement movlpd xmm1, mem64 marks the lower half of XMM1 as FPD (floating-point double-precision) but leaves the upper half of XMM1 unchanged. If XMM1 is later used in any instruction that uses the full 128 bits of XMM1, there can be a performance penalty if the top half is not also in FPD format. Examples of instructions that expect the full 128 bits of XMM1 to be in FPD format are ANDPD, ANDNPD, and ORPD. For more information on XMM-register data types, see “Half-Register Operations” on page 356.
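One possible way to respect this caveat is to fill both halves of the register with double-precision loads before a full-width operation. The following C sketch is illustrative only, and the function names are hypothetical. It assumes that the compiler emits MOVLPD and MOVHPD for the two half loads, and that the high-half load marks the upper half as FPD in the same way; this section does not state the latter.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: fill both halves with double-precision half
   loads (expected to compile to MOVLPD followed by MOVHPD). */
__m128d load_both_halves(const double *lo, const double *hi) {
    __m128d v = _mm_setzero_pd();
    v = _mm_loadl_pd(v, lo);   /* low 64 bits  <- *lo */
    v = _mm_loadh_pd(v, hi);   /* high 64 bits <- *hi */
    return v;
}

/* Hypothetical full-width operation: clear the sign bit of both
   lanes with an ANDNPD-style mask, computing (~sign_mask) & v. */
__m128d abs_pd(__m128d v) {
    const __m128d sign_mask = _mm_set1_pd(-0.0);
    return _mm_andnot_pd(sign_mask, v);
}
```

Because abs_pd operates on all 128 bits, it is the kind of instruction the caveat applies to. The C code shows only the data flow; whether a penalty actually occurs depends on the instructions the compiler selects and on the processor.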

Chapter 9

Optimizing with SIMD Instructions
