movss xmm0, [float_var1]
mulss xmm0, [float_var2]

Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

4.2.2 Load-Execute Floating-Point Instructions with Floating-Point Operands

Optimization

❖ When performing floating-point computations using floating-point (not integer) source operands, use load-execute instructions instead of discrete load and execute instructions.

Application

This optimization applies to:

• 32-bit software

• 64-bit software

Rationale

Using load-execute floating-point instructions that take floating-point operands improves performance for the following reasons:

• Denser code allows more work to be held in the instruction cache.

• Denser code generates fewer internal macro-ops, allowing the floating-point scheduler to hold more work, which increases the chances of extracting parallelism from the code.
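The difference between the two instruction forms can be sketched in C with inline assembly. This is an illustration only, not code from the guide: the helper names mul_discrete and mul_load_execute are hypothetical, the asm blocks assume GCC or Clang targeting x86-64 (AT&T operand order, source first), and a plain C fallback is included so the sketch still builds on other targets.

```c
/* Hypothetical helpers contrasting discrete load + execute with the
   load-execute form. Assumes GCC/Clang extended asm on x86-64;
   AT&T syntax puts the source operand first, the destination last. */
#if defined(__GNUC__) && defined(__x86_64__)

/* Discrete form: two separate loads, then a register-register multiply. */
static float mul_discrete(const float *a, const float *b)
{
    float r, tmp;
    __asm__("movss %2, %0\n\t"   /* r   = *a        */
            "movss %3, %1\n\t"   /* tmp = *b        */
            "mulss %1, %0"       /* r  *= tmp       */
            : "=&x"(r), "=&x"(tmp)
            : "m"(*a), "m"(*b));
    return r;
}

/* Load-execute form: the second load is folded into mulss as a
   memory operand, so one instruction and one register are saved. */
static float mul_load_execute(const float *a, const float *b)
{
    float r;
    __asm__("movss %1, %0\n\t"   /* r  = *a                  */
            "mulss %2, %0"       /* r *= *b (memory operand) */
            : "=&x"(r)
            : "m"(*a), "m"(*b));
    return r;
}

#else

/* Portable fallback (assumption: non-x86-64 or non-GNU compiler).
   The compiler chooses the instruction forms here. */
static float mul_discrete(const float *a, const float *b)     { return *a * *b; }
static float mul_load_execute(const float *a, const float *b) { return *a * *b; }

#endif
```

Both helpers return the same product; only the instruction count and register usage differ. When writing ordinary C rather than assembly, an optimizing compiler will often fold the load by itself, so it is worth inspecting the generated assembly before restructuring source code.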

Example

Avoid code like this, which uses discrete load and execute instructions:

movss xmm0, [float_var1]
movss xmm12, [float_var2]
mulss xmm0, xmm12

Instead, use code like this, which uses a load-execute floating-point instruction: