25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

4.2 Load-Execute Instructions

A load-execute instruction loads a value from memory into a register and then performs an operation on that value. Many general-purpose instructions, such as ADD, SUB, and AND, have load-execute forms:

add rax, QWORD PTR [foo]

This instruction loads the value foo from memory and then adds it to the value in the RAX register.

The work performed by a load-execute instruction can also be accomplished by using two discrete instructions—a load instruction followed by an execute instruction. The following example employs discrete load and execute stages:

mov rbx, QWORD PTR [foo]
add rax, rbx

The first statement loads the value foo from memory into the RBX register. The second statement adds the value in RBX to the value in RAX.
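The two forms described above can be compared side by side in a small C sketch. It is an illustration only, not code from the guide: the function names, the initial value of foo, and the use of GCC-style inline assembly (AT&T operand order, source first) are all assumptions, and on compilers or targets without that support the code falls back to plain C, where the compiler chooses the instruction form itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory operand standing in for "foo" in the text. */
uint64_t foo = 40;

/* Load-execute form: a single ADD reads foo from memory and adds it to acc. */
uint64_t add_load_execute(uint64_t acc)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__("addq %1, %0" : "+r"(acc) : "m"(foo) : "cc");
#else
    acc += foo; /* portable fallback; instruction selection is left to the compiler */
#endif
    return acc;
}

/* Discrete form: a MOV loads foo into a scratch register, then a
 * register-to-register ADD performs the operation. */
uint64_t add_discrete(uint64_t acc)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t tmp; /* occupies an extra register, as RBX does in the text */
    __asm__("movq %2, %1\n\t"
            "addq %1, %0"
            : "+r"(acc), "=&r"(tmp)
            : "m"(foo)
            : "cc");
#else
    uint64_t tmp = foo;
    acc += tmp;
#endif
    return acc;
}

/* Both forms compute the same result; only the instruction mix differs. */
void check_forms_agree(void)
{
    assert(add_load_execute(2) == add_discrete(2));
}
```

Both functions return the same sum. The discrete version needs one more instruction and one more register to get there.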

The following optimizations govern the use of load-execute instructions:

Load-Execute Integer Instructions on page 73.

Load-Execute Floating-Point Instructions with Floating-Point Operands on page 74.

Load-Execute Floating-Point Instructions with Integer Operands on page 74.

4.2.1 Load-Execute Integer Instructions

Optimization

When performing integer computations, use load-execute instructions instead of discrete load and execute instructions. Use discrete load and execute instructions only to avoid scheduler stalls for longer-executing instructions and to explicitly schedule load and execute operations.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

Most load-execute integer instructions are DirectPath decodable and can be decoded at the rate of three per cycle. Splitting a load-execute integer instruction into two separate instructions reduces decoding bandwidth and increases register pressure, which results in lower performance.
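The decode cost of splitting can be estimated with a rough model. The sketch below is a simplification, not a description of the actual pipeline: it assumes every instruction involved is DirectPath, that decode is the only bottleneck, and that the decoder sustains the three-per-cycle rate stated above. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decode rate taken from the text: three DirectPath instructions per cycle. */
#define DIRECTPATH_PER_CYCLE 3u

/* Lower bound on decode cycles for n DirectPath instructions (ceiling division). */
uint64_t min_decode_cycles(uint64_t n_instructions)
{
    return (n_instructions + DIRECTPATH_PER_CYCLE - 1) / DIRECTPATH_PER_CYCLE;
}

/* Splitting turns each load-execute operation into two instructions:
 * one load and one register-to-register execute. */
uint64_t split_instruction_count(uint64_t n_load_execute)
{
    return 2 * n_load_execute;
}

/* Under this model, splitting 300 operations doubles the decode lower bound. */
void check_decode_model(void)
{
    assert(min_decode_cycles(300) == 100);
    assert(min_decode_cycles(split_instruction_count(300)) == 200);
}
```

Under these assumptions, 300 load-execute operations need at least 100 decode cycles, while the same work written as discrete loads and executes needs at least 200, in addition to the extra register each split pair occupies.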

Chapter 4

Instruction-Decoding Optimizations

73
