25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

or fewer exist in a given 16-byte code window. For the best possible performance, try to limit branches to one per 16-byte code window. Avoid code sequences like the following, in which four branches share one 16-byte window:

    ALIGN 16
    label3:
        call  label1    ; 1st branch in 16-byte code window
        jc    label3    ; 2nd branch in 16-byte code window
        call  label2    ; 3rd branch in 16-byte code window
        jnz   label4    ; 4th branch in 16-byte code window
                        ; Cannot be predicted.
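The following sketch checks a code layout against the three-branch limit by placing instructions back to back and counting branches per 16-byte window. It is an illustration, not a tool from this guide. The encoded sizes are assumptions (call rel32 = 5 bytes, jc/jnz rel8 = 2 bytes). The guide does not say how a branch that straddles a window boundary is attributed, so the sketch conservatively counts such a branch in every window it touches. The names `Instr`, `layout`, `branches_per_window`, and `overloaded_windows` are hypothetical.

```python
from dataclasses import dataclass

WINDOW = 16        # bytes per code window, as stated in the text
MAX_BRANCHES = 3   # branches per window that can be predicted


@dataclass(frozen=True)
class Instr:
    text: str
    size: int        # encoded length in bytes (assumed, not from the guide)
    is_branch: bool


def layout(instrs, start=0):
    """Place instructions back to back; return (offset, instr) pairs."""
    placed, offset = [], start
    for ins in instrs:
        placed.append((offset, ins))
        offset += ins.size
    return placed


def branches_per_window(placed, window=WINDOW):
    """Count branches per window index. A branch that straddles a window
    boundary is counted in every window it touches (conservative assumption)."""
    counts = {}
    for offset, ins in placed:
        if not ins.is_branch:
            continue
        first = offset // window
        last = (offset + ins.size - 1) // window
        for w in range(first, last + 1):
            counts[w] = counts.get(w, 0) + 1
    return counts


def overloaded_windows(placed):
    """Return the window indices holding more than MAX_BRANCHES branches."""
    return sorted(w for w, n in branches_per_window(placed).items()
                  if n > MAX_BRANCHES)


# The sequence from the text, with assumed encodings. ALIGN 16 places
# label3 at offset 0 of a window, so layout() starts at offset 0.
example = [
    Instr("call label1", 5, True),
    Instr("jc   label3", 2, True),
    Instr("call label2", 5, True),
    Instr("jnz  label4", 2, True),
]

if __name__ == "__main__":
    placed = layout(example)
    print(branches_per_window(placed))   # {0: 4}
    print(overloaded_windows(placed))    # [0]
```

Under these assumptions all four branches fall in window 0. Inserting 4 bytes of padding after the `jc` (a hypothetical fix) pushes the `jnz` to offset 16, which leaves three branches in window 0 and one in window 1.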

If a jump table contains many frequently executed branches, pad each table entry to 8 bytes so that there are never more than three branches per 16-byte block of code. With 8-byte entries, no more than two entries can begin within any 16-byte block.
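The padding rule can be checked arithmetically. The sketch below lays out a table of jump instructions at a fixed stride and reports the worst-case number of branches touching a single 16-byte block. It assumes each entry is a 5-byte `jmp rel32` and that the table starts on a 16-byte boundary by default. As before, a branch that straddles a block boundary is conservatively counted in each block it touches. The function names are hypothetical.

```python
WINDOW = 16        # bytes per code block, as in the text
ENTRY_STRIDE = 8   # padded entry size recommended in the text
JMP_REL32 = 5      # assumed encoded size of a 'jmp rel32' entry


def entry_offsets(n_entries, stride, base=0):
    """Start offset of each jump-table entry laid out at a fixed stride."""
    return [base + i * stride for i in range(n_entries)]


def max_branches_per_window(offsets, branch_size, window=WINDOW):
    """Largest number of branches touching any single window; a branch that
    straddles a boundary counts in each window it touches (conservative)."""
    counts = {}
    for off in offsets:
        for w in range(off // window, (off + branch_size - 1) // window + 1):
            counts[w] = counts.get(w, 0) + 1
    return max(counts.values(), default=0)


if __name__ == "__main__":
    n = 32
    packed = entry_offsets(n, JMP_REL32)     # entries placed back to back
    padded = entry_offsets(n, ENTRY_STRIDE)  # entries padded to 8 bytes
    print(max_branches_per_window(packed, JMP_REL32))  # 4
    print(max_branches_per_window(padded, JMP_REL32))  # 2
```

Packed 5-byte entries put four branches in every block. With an 8-byte stride, the worst case drops to two, or to three if the table base is not 8-byte aligned and entries straddle block boundaries.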

Only branches that have been taken at least once are entered into the dynamic branch-prediction logic, so only those branches count toward the three-branch limit.
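A layout check can apply this rule by ignoring branches that have never been taken. In the sketch below, the `taken_at_least_once` flag stands in for hypothetical profile data. The offsets and sizes reuse the example sequence above with assumed encodings (call rel32 = 5 bytes, jc/jnz rel8 = 2 bytes). It is further assumed, purely for illustration, that the `jnz` has never been taken. The class and function names are hypothetical.

```python
from dataclasses import dataclass

WINDOW = 16
MAX_BRANCHES = 3


@dataclass(frozen=True)
class Branch:
    addr: int                  # start offset of the branch instruction
    size: int                  # encoded length in bytes (assumed)
    taken_at_least_once: bool  # hypothetical profile flag


def predictor_load(branches, window=WINDOW):
    """Per window, count only branches that have been taken at least once;
    straddling branches count in every window they touch (conservative)."""
    counts = {}
    for b in branches:
        if not b.taken_at_least_once:
            continue  # never-taken branches are not entered into the predictor
        for w in range(b.addr // window, (b.addr + b.size - 1) // window + 1):
            counts[w] = counts.get(w, 0) + 1
    return counts


def within_limit(branches):
    """True if no window holds more than MAX_BRANCHES counted branches."""
    return all(n <= MAX_BRANCHES for n in predictor_load(branches).values())


seq = [
    Branch(0, 5, True),    # call label1
    Branch(5, 2, True),    # jc   label3
    Branch(7, 5, True),    # call label2
    Branch(12, 2, False),  # jnz  label4, assumed never taken so far
]

if __name__ == "__main__":
    print(predictor_load(seq))  # {0: 3}
    print(within_limit(seq))    # True
```

If the `jnz` is later taken, it enters the predictor, window 0 then holds four counted branches, and `within_limit` returns `False`.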

Chapter 6

Branch Optimizations

127
