25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

The load-execute floating-point instructions that take integer operands are VectorPath instructions and generate two micro-ops in a cycle, while discrete load and execute instructions enable a third DirectPath instruction to be decoded in the same cycle. In some situations, this optimization can also reduce execution time if the FILD can be scheduled several instructions ahead of the arithmetic instruction in order to cover the FILD latency.

Example

Avoid code such as the following, which uses load-execute floating-point instructions that take integer operands:

    fld   QWORD PTR [foo]   ; Push foo onto FP stack [ST(0) = foo].
    fimul DWORD PTR [bar]   ; Multiply bar by ST(0) [ST(0) = bar * foo].
    fiadd DWORD PTR [baz]   ; Add baz to ST(0) [ST(0) = baz + (bar * foo)].

Instead, use code such as the following, which uses discrete load and execute instructions:

    fild  DWORD PTR [bar]   ; Push bar onto FP stack.
    fild  DWORD PTR [baz]   ; Push baz onto FP stack.
    fld   QWORD PTR [foo]   ; Push foo onto FP stack.
    fmulp st(2), st         ; Multiply and pop [ST(1) = foo * bar, ST(0) = baz].
    faddp st(1), st         ; Add and pop [ST(0) = baz + (foo * bar)].
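The stack bookkeeping in the two sequences above can be checked with a small model. The following Python sketch is illustrative only and is not part of the guide: the class X87Stack and the functions load_execute and discrete are hypothetical names, the model uses Python doubles rather than the 80-bit x87 format, and it ignores decode paths, latency, and exceptions. It only shows that both sequences leave the same value in ST(0) with one register occupied.

```python
class X87Stack:
    """Toy model of the x87 register stack (assumption: values are Python floats)."""

    def __init__(self):
        self._regs = []  # the last element plays the role of ST(0)

    def st(self, i):
        """Return the value held in ST(i)."""
        return self._regs[-1 - i]

    def depth(self):
        """Return the number of occupied stack registers."""
        return len(self._regs)

    def fld(self, value):
        """fld: push a floating-point memory operand."""
        self._regs.append(float(value))

    def fild(self, value):
        """fild: convert an integer memory operand and push it."""
        self._regs.append(float(int(value)))

    def fimul(self, value):
        """fimul: ST(0) = ST(0) * integer memory operand."""
        self._regs[-1] *= float(int(value))

    def fiadd(self, value):
        """fiadd: ST(0) = ST(0) + integer memory operand."""
        self._regs[-1] += float(int(value))

    def fmulp(self, i):
        """fmulp st(i), st: ST(i) = ST(i) * ST(0), then pop."""
        self._regs[-1 - i] *= self._regs[-1]
        self._regs.pop()

    def faddp(self, i):
        """faddp st(i), st: ST(i) = ST(i) + ST(0), then pop."""
        self._regs[-1 - i] += self._regs[-1]
        self._regs.pop()


def load_execute(foo, bar, baz):
    """Sequence to avoid: load-execute forms with integer operands."""
    fpu = X87Stack()
    fpu.fld(foo)    # ST(0) = foo
    fpu.fimul(bar)  # ST(0) = bar * foo
    fpu.fiadd(baz)  # ST(0) = baz + (bar * foo)
    return fpu.st(0), fpu.depth()


def discrete(foo, bar, baz):
    """Recommended sequence: discrete loads followed by register arithmetic."""
    fpu = X87Stack()
    fpu.fild(bar)   # push bar
    fpu.fild(baz)   # push baz
    fpu.fld(foo)    # push foo
    fpu.fmulp(2)    # ST(1) = foo * bar, ST(0) = baz
    fpu.faddp(1)    # ST(0) = baz + (foo * bar)
    return fpu.st(0), fpu.depth()
```

Calling load_execute(1.5, 4, 3) and discrete(1.5, 4, 3) returns the same pair in this model: the final ST(0) value and a stack depth of 1. The discrete form temporarily uses three stack registers instead of one, which is worth keeping in mind when the FP stack is already nearly full.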

Chapter 4

Instruction-Decoding Optimizations
