Two-Byte Near-Return RET Instruction, 128

Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

6.2 Two-Byte Near-Return RET Instruction

Optimization

Using the two-byte near-return form of the RET instruction can improve performance. Use the single-byte near-return form (opcode C3h) with care. Specifically, avoid the following two situations:

• Any branch, conditional or unconditional, whose target is the single-byte near-return RET instruction. See “Examples.”

• A conditional branch placed immediately before the single-byte near-return RET instruction. See “Examples.”

Application

This optimization applies to:

• 32-bit software

• 64-bit software

Rationale

In the two situations listed above, the processor is unable to apply branch prediction to the single-byte near-return form (opcode C3h) of the RET instruction.

The easiest way to ensure that the branch-prediction mechanism is used is to use a two-byte RET instruction. A two-byte RET places a REP prefix (F3h) before the RET opcode. The result is functionally equivalent to the single-byte near-return RET instruction but is not affected by the prediction limitations outlined above. To use a two-byte RET, define a text macro named REPRET and use it in place of the RET instruction to force the intended object code.

REPRET TEXTEQU <DB 0F3h, 0C3h>	; REP prefix (F3h) followed by near RET (C3h).

Examples

Avoid branches whose target is a single-byte near-return:

	jmp  target	; Jump to a single-byte near-return RET instruction.
	...
target:
	ret		; RET is potentially mispredicted.

Avoid conditional branches that immediately precede a single-byte near-return:

	jz   target	; Conditional branch is not taken.
	ret		; RET is a fall-through instruction,
			; potentially mispredicted.
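Both patterns can be rewritten with the REPRET macro so that every RET reached by a branch, or falling through after a conditional branch, is the two-byte form. The fragment below is a sketch in MASM syntax; the labels `done` and `skip` and the elided surrounding code are hypothetical placeholders, not code from this guide.

```asm
REPRET TEXTEQU <DB 0F3h, 0C3h>	; Two-byte RET (REP prefix + RET), as defined above.

; Case 1: the branch target is a two-byte RET instead of a bare C3h.
	jmp  done	; Jump target is now REP RET (F3h C3h).
	...
done:
	REPRET		; Assembles to F3h C3h.

; Case 2: the fall-through instruction after a conditional branch
; is a two-byte RET. (skip is a hypothetical label defined elsewhere.)
	jz   skip	; Conditional branch is not taken.
	REPRET		; Fall-through RET uses the two-byte form.
```

Because REPRET expands to raw bytes through DB, the assembler cannot substitute the single-byte C3h encoding.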


Branch Optimizations

Chapter 6
