Chapter 6 Branch Optimizations 127
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
or fewer exist in a given 16-byte code window. For best performance, try to limit
branches to one per 16-byte code window. Avoid code sequences like the following:
ALIGN 16
label3:
     call   label1   ; 1st branch in 16-byte code window
     jc     label3   ; 2nd branch in 16-byte code window
     call   label2   ; 3rd branch in 16-byte code window
     jnz    label4   ; 4th branch in 16-byte code window
                     ; Cannot be predicted.
If a jump table contains many frequently executed branches, pad the table entries to
8 bytes each. Each 16-byte block of code then holds at most two table entries, so it never
contains more than three branches.
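The padding can be sketched as follows. This is a minimal illustration, not a listing from the guide: it assumes the same MASM-style syntax as the example above, and the labels jump_table and case0 through case3 are hypothetical. An ALIGN 8 directive before each entry starts every branch on an 8-byte boundary:

```asm
        ALIGN 16                ; start the table on a 16-byte boundary
jump_table:                     ; hypothetical label
        jmp   case0             ; entry 0, at jump_table + 0
        ALIGN 8                 ; pad to the next 8-byte boundary
        jmp   case1             ; entry 1, at jump_table + 8
        ALIGN 8
        jmp   case2             ; entry 2, at jump_table + 16
        ALIGN 8
        jmp   case3             ; entry 3, at jump_table + 24
```

A direct jmp occupies 2 or 5 bytes, so entry i begins at jump_table + 8*i and any 16-byte block holds at most two of these branches. The padding bytes are never executed, because each entry transfers control out of the table.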
Only branches that have been taken at least once are entered into the dynamic branch prediction
mechanism, so only those branches count toward the three-branch limit.