Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

6.2 Two-Byte Near-Return RET Instruction

Optimization

Use of a two-byte near-return can improve performance. The single-byte near-return (opcode C3h) of the RET instruction should be used carefully. Specifically, avoid the following two situations:

Any kind of branch (either conditional or unconditional) that has the single-byte near-return RET instruction as its target. See “Examples.”

A conditional branch that occurs in the code directly before the single-byte near-return RET instruction. See “Examples.”

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

The processor cannot apply branch prediction to the single-byte near-return form (opcode C3h) of the RET instruction.

The easiest way to ensure that the branch prediction mechanism is used is to use a two-byte RET instruction. A two-byte RET has a REP prefix (F3h) inserted before the RET, which produces the functional equivalent of the single-byte near-return RET instruction, but is not affected by the prediction limitations outlined above. To use a two-byte RET, define a text macro named REPRET and use it instead of the RET instruction to force the intended object code.

REPRET TEXTEQU <DB 0F3h, 0C3h>

Examples

Avoid branches in which the target of the branch is a single-byte near-return:

jmp label      ; Jump to a single-byte near-return RET instruction.
...
label:
   ret         ; RET is potentially mispredicted.

Avoid branches that immediately precede a single-byte near-return:

jz label       ; Conditional branch is not taken.
ret            ; RET is a fall-through instruction,
               ; potentially mispredicted.
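
As an illustration only, the two cases above might be rewritten with the REPRET text macro from the Rationale as sketched below. The label names return_here and elsewhere are hypothetical placeholders that do not appear in the original examples, and the fragment assumes it sits inside an existing MASM code segment.

REPRET TEXTEQU <DB 0F3h, 0C3h>   ; Text macro from this section: emits F3h C3h (REP RET).

; Case 1: the return is the target of a branch.
   jmp return_here               ; Branch to the return (hypothetical label).
   ; (intervening code)
return_here:
   REPRET                        ; Two-byte return in place of the single-byte RET.

; Case 2: the return directly follows a conditional branch.
   jz elsewhere                  ; Conditional branch (hypothetical label).
   REPRET                        ; Fall-through return uses the two-byte form.

In both cases only the encoding of the return changes; the surrounding control flow is the same as in the examples to avoid.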


Branch Optimizations

Chapter 6
