AMD Athlon™ Processor x86 Code Optimization

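22007E/0 — November 1999

Example 1 (Avoid):

    FLD     QWORD PTR [foo]     ;ST(0) = foo
    FIMUL   DWORD PTR [bar]     ;ST(0) = foo * bar (integer memory operand)
    FIADD   DWORD PTR [baz]     ;ST(0) = foo * bar + baz (integer memory operand)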

Example 2 (Preferred):

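    FILD    DWORD PTR [bar]     ;ST(0) = bar
    FILD    DWORD PTR [baz]     ;ST(0) = baz, ST(1) = bar
    FLD     QWORD PTR [foo]     ;ST(0) = foo, ST(1) = baz, ST(2) = bar
    FMULP   ST(2), ST           ;ST(0) = baz, ST(1) = foo * bar
    FADDP   ST(1), ST           ;ST(0) = foo * bar + baz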

Align Branch Targets in Program Hot Spots

In program hot spots (i.e., innermost loops in the absence of profiling data), place branch targets at or near the beginning of 16-byte aligned code windows. This technique helps to maximize the number of instructions that are filled into the instruction-byte queue while preserving I-cache space in branch-intensive code.
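As a rough illustration of the arithmetic behind this advice, the sketch below computes how many padding bytes a code generator would have to emit so that a branch target starts on a 16-byte boundary. The helper name pad_to_boundary and the sample address are assumptions made for this example; they do not come from the manual.

```c
#include <assert.h>
#include <stdint.h>

/* Number of padding bytes needed to advance `addr` to the next multiple of
   `align` (0 if `addr` is already aligned). `align` must be a power of two.
   Hypothetical helper, shown only to illustrate the alignment arithmetic. */
static unsigned pad_to_boundary(uintptr_t addr, unsigned align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (unsigned)((align - (addr & (align - 1))) & (align - 1));
}
```

For a hypothetical loop top at address 0x40100B, pad_to_boundary(0x40100B, 16) yields 5, which moves the target to 0x401010. Because padding itself consumes I-cache space, it is worth spending only on hot branch targets.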

Use Short Instruction Lengths

Assemblers and compilers should generate the tightest code possible to optimize use of the I-cache and increase average decode rate. Wherever possible, use instructions with shorter lengths. Using shorter instructions increases the number of instructions that can fit into the instruction-byte queue. For example, use 8-bit displacements as opposed to 32-bit displacements. In addition, use the single-byte format of simple integer instructions whenever possible, as opposed to the 2-byte opcode ModR/M format.
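The selection an assembler might make for ADD with a 32-bit register and an immediate can be sketched as follows. This is a minimal illustration, not the logic of any particular assembler: the function name encode_add_r32_imm and the REG_* constants are hypothetical, while the three opcode forms (83 /0 ib, 05 id, 81 /0 id) are the standard x86 encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register numbers as they appear in the ModR/M r/m field (EAX = 0 ... EDI = 7).
   Only a few are named here; the names are assumptions for this sketch. */
enum { REG_EAX = 0, REG_ECX = 1, REG_EDX = 2, REG_EBX = 3 };

/* Write "ADD r32, imm" into out[] (room for 6 bytes) using the shortest of
   three encodings, and return the number of bytes written. */
static size_t encode_add_r32_imm(uint8_t *out, unsigned reg, int32_t imm)
{
    size_t n = 0;
    uint32_t u = (uint32_t)imm;
    int i;

    assert(reg < 8);

    if (imm >= -128 && imm <= 127) {
        /* 83 /0 ib: 8-bit immediate, sign-extended by the processor. */
        out[n++] = 0x83;
        out[n++] = (uint8_t)(0xC0 | reg);   /* ModR/M: mod = 11, /0, r/m = reg */
        out[n++] = (uint8_t)imm;
        return n;                           /* 3 bytes */
    }

    if (reg == REG_EAX) {
        out[n++] = 0x05;                    /* 05 id: single-byte opcode, no ModR/M */
    } else {
        out[n++] = 0x81;                    /* 81 /0 id: general form with ModR/M */
        out[n++] = (uint8_t)(0xC0 | reg);
    }
    for (i = 0; i < 4; i++)                 /* 32-bit immediate, little-endian */
        out[n++] = (uint8_t)(u >> (8 * i));
    return n;                               /* 5 bytes for EAX, 6 otherwise */
}
```

With this sketch, add ebx, -5 comes out as the 3-byte sequence 83 C3 FB, and add eax, 12345678h as the 5-byte sequence 05 78 56 34 12. Both are shorter than the 6-byte 81-opcode encodings.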

Example 1 (Avoid):

    81 C0 78 56 34 12    add eax, 12345678h    ;uses 2-byte opcode form (with ModR/M)
    81 C3 FB FF FF FF    add ebx, -5           ;uses 32-bit immediate
    0F 84 05 00 00 00    jz  $label1           ;uses 2-byte opcode, 32-bit immediate
