368 Index
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
REP string with low variable counts 168
unroll small loops 13
unrolling loops 145
M
memory
  dynamic memory allocation 19
  pushing memory data 157
MMX™ instructions
  PANDN instruction 137
  PREFETCHNTA/T0/T1/T2 instructions 105
MOVZX and MOVSX instructions 153
multiplication
  by constant 164
  multiplies over division, floating-point 238
muxing constructs 136
N
Nonuniform Memory Access 96
O
operands
  largest possible operand size, repeated string 168
P
parallelism 35
PF2ID instructions 52
pointers
  dereferenced arguments 44
  use array-style code instead 10
population-count function 179
prefetch
  determining distance 108
  multiple 107
PREFETCH and PREFETCHW instructions 104, 106, 108
prototypes 29
R
recursive functions 132
register reads and writes, partial 81
REP prefix 168
S
scalar code translated into 3DNow! code 138
scheduling 144
SHLD instruction 85
SHR instruction 85
single-byte near-return RET instruction (opcode C3h) 128
SSE 193, 355
SSE2 193, 355
stack
  alignment considerations 122
store-to-load forwarding 20, 22, 100–103
string instructions 167
structure (struct) 41, 117, 119
subexpressions, explicitly extract common 37
superscalar processor 251
switch statement 25, 28, 33
U
unit-stride access 105, 110
W
write combining 113, 260, 263–264, 266
X
XOR instruction 169