25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

Index

Numerics

3DNow! 210, 215, 217–218, 221, 224, 230, 233

A

address-generation interlocks 151

AMD Athlon™ processor

  microarchitecture 250–251

AMD Athlon™ system bus 260

arrays 10

B

binary-to-ASCII decimal conversion 181

boolean operators 17

branch target buffer (BTB) 126, 253

branches

  align branch targets 76

  based on comparisons between floats 54

  compound branch conditions 14

  dependent on random data 130

  optimizing density of 126

  prediction 253

  replace with computation in 3DNow! code 136

C

C language 14

  array notation versus pointers 10

  C code to 3DNow! code examples 138–140

  structures 39, 117

cache

  64-byte cache line 116

CALL and RETURN instructions 132

ccNUMA 96

code padding using neutral code fillers 89

code segment (CS) base, nonzero 135

const type qualifier 30

D

data cache 255

decoding 254

DirectPath

  DirectPath over VectorPath instructions 72

displacements, 8-bit sign-extended 88

division 160–162, 186

  replace division with multiplication, integer 43, 160

dynamic memory allocation consideration 19

E

extended-precision data 248

F

far control-transfer instructions 142

floating-point

  compare instructions 244

  division and square roots 50

  execution unit 258

  scheduler 257

  to integer conversions 52

  variables and expressions are type float 9

FXCH instruction 245

I

if statement 16, 33

immediates, 8-bit sign-extended 87

IMUL instruction 164

inline functions 149, 170

inline REP string with low counts 168

instruction

  cache 252

  control unit 254

  short encodings 80

integer

  arithmetic, 64-bit 170

  division 43

  execution unit 256

  operand, consider sign 48

  scheduler 256

  use 32-bit data types for integer code 47

L

L2 cache controller 259

LEA instruction 77, 85

LEAVE instruction 83

load/store 22, 258

load-execute instructions 73

  floating-point instructions 74

  integer instructions 73

local functions 34

local variables 41, 44

LOOP instruction 141

loops

  generic loop hoisting 31

  minimize pointer arithmetic 154

  partial loop unrolling 146
