Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

E.1 Half-Register Operations

Optimization

Take care when mixing data types of operands within the same register.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

Mixing data types in a single register is harmless if only scalar operations are used. However, this practice can cause performance problems if the register is later used as a source for a vector operation.

Example 1

Avoid code like this:

addps    xmm1, xmm2   ; Add the four packed single-precision (FPS) values in XMM2
                      ; to their corresponding values in XMM1.
cvtss2sd xmm1, xmm2   ; Convert the low-order single-precision value in XMM2
                      ; to 64-bit double-precision (FPD) format and store it in
                      ; the lower 64 bits of XMM1.

In this example, the second instruction leaves the upper half of XMM1 in FPS format and the lower half in FPD format.
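The mixed state described above can be observed from C. The following is a minimal sketch, assuming a compiler that provides the SSE2 intrinsics in `<emmintrin.h>`; the helper names (`example1_mixed`, `upper_half_as_floats`, `check_example1`) are hypothetical and are not part of the guide. The compiler chooses the actual registers, so this shows only the data layout, not the performance effect.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <string.h>

/* Hypothetical helper (not from the guide): mirrors the two instructions
   of Example 1. Register allocation is left to the compiler. */
static __m128d example1_mixed(__m128 x1, __m128 x2)
{
    __m128 sum = _mm_add_ps(x1, x2);              /* addps    xmm1, xmm2 */
    return _mm_cvtss_sd(_mm_castps_pd(sum), x2);  /* cvtss2sd xmm1, xmm2 */
}

/* Hypothetical helper: reinterpret the upper 64 bits of v as two floats. */
static void upper_half_as_floats(__m128d v, float out[2])
{
    float lanes[4];
    memcpy(lanes, &v, sizeof lanes);
    out[0] = lanes[2];
    out[1] = lanes[3];
}

/* Checks that the low half is a double and the high half is still FPS data. */
static void check_example1(void)
{
    __m128 x1 = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);      /* lanes 0..3 = 1, 2, 3, 4     */
    __m128 x2 = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);  /* lanes 0..3 = 10, 20, 30, 40 */
    __m128d r = example1_mixed(x1, x2);
    float hi[2];

    upper_half_as_floats(r, hi);
    assert(_mm_cvtsd_f64(r) == 10.0);          /* low 64 bits: FPD value          */
    assert(hi[0] == 33.0f && hi[1] == 44.0f);  /* high 64 bits: two FPS sums kept */
}
```

Calling `check_example1()` from a test program should pass silently: the conversion overwrites only the low 64 bits, so the two single-precision sums remain in the high half of the result.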

Example 2

Avoid code like this:

addps    xmm1, xmm2   ; Add the four packed single-precision (FPS) values in XMM2
                      ; to their corresponding values in XMM1.
movlpd   xmm1, mem64  ; Move the double-precision value in mem64 to the lower
                      ; half of XMM1.

 

In this example, the MOVLPD instruction sets the low half of XMM1 to FPD format but leaves the high half unchanged (in FPS format).
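The same layout can be reproduced with the `_mm_loadl_pd` intrinsic, which corresponds to MOVLPD with a memory source. This is a rough sketch under the same assumptions as any intrinsics code (SSE2 support, registers chosen by the compiler); `example2_mixed` and `check_example2` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <string.h>

/* Hypothetical helper (not from the guide): mirrors the two instructions
   of Example 2. mem64 points to the double loaded into the low half. */
static __m128d example2_mixed(__m128 x1, __m128 x2, const double *mem64)
{
    __m128 sum = _mm_add_ps(x1, x2);                 /* addps  xmm1, xmm2  */
    return _mm_loadl_pd(_mm_castps_pd(sum), mem64);  /* movlpd xmm1, mem64 */
}

/* Checks that only the low half was replaced by the load. */
static void check_example2(void)
{
    const double mem64 = 2.5;
    __m128 x1 = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);      /* lanes 0..3 = 1, 2, 3, 4     */
    __m128 x2 = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);  /* lanes 0..3 = 10, 20, 30, 40 */
    __m128d r = example2_mixed(x1, x2, &mem64);
    float lanes[4];

    memcpy(lanes, &r, sizeof lanes);
    assert(_mm_cvtsd_f64(r) == 2.5);                 /* low half: FPD value from memory */
    assert(lanes[2] == 33.0f && lanes[3] == 44.0f);  /* high half: FPS sums unchanged   */
}
```

Calling `check_example2()` should pass silently, confirming that the load touches only the low 64 bits and leaves the single-precision data in the high half.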


SSE and SSE2 Optimizations

Appendix E
