Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

4.2.2 Load-Execute Floating-Point Instructions with Floating-Point Operands

Optimization

When performing floating-point computations using floating-point (not integer) source operands, use load-execute instructions instead of discrete load and execute instructions.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

Using load-execute floating-point instructions that take floating-point operands improves performance for the following reasons:

Denser code allows more work to be held in the instruction cache.

Denser code generates fewer internal macro-ops, allowing the floating-point scheduler to hold more work, which increases the chances of extracting parallelism from the code.

Example

Avoid code like this, which uses discrete load and execute instructions:

movss xmm0, [float_var1]
movss xmm12, [float_var2]
mulss xmm0, xmm12

Instead, use code like this, which uses a load-execute floating-point instruction:

movss xmm0, [float_var1]
mulss xmm0, [float_var2]

4.2.3 Load-Execute Floating-Point Instructions with Integer Operands

Optimization

Avoid x87 load-execute floating-point instructions that take integer operands (FIADD, FICOM, FICOMP, FIDIV, FIDIVR, FIMUL, FISUB, and FISUBR). When performing floating-point computations using integer source operands, use discrete load (FILD) and execute instructions instead.
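As an illustrative sketch of this recommendation (not taken from the guide's own listings), the fragment below adds a 32-bit integer to a double-precision value in two ways. The variable names double_var and int_var are hypothetical, and the MASM-style QWORD PTR and DWORD PTR size qualifiers are an assumption, since x87 memory operands need an explicit size. The two sequences are alternatives and are not meant to run back to back.

```asm
; Discouraged: FIADD is a load-execute instruction with an integer operand.
fld   QWORD PTR [double_var]   ; ST(0) = double_var
fiadd DWORD PTR [int_var]      ; ST(0) = ST(0) + int_var (converted on the fly)

; Preferred: load the integer with a discrete FILD, then execute separately.
fild  DWORD PTR [int_var]      ; ST(0) = int_var converted to floating point
fadd  QWORD PTR [double_var]   ; ST(0) = ST(0) + double_var
```

Both sequences leave the sum in ST(0). In the preferred form the remaining memory operand is a floating-point value, so the FADD load-execute form stays consistent with section 4.2.2.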


Instruction-Decoding Optimizations

Chapter 4
