Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

Align and Pack DirectPath x87 Instructions

The last optimization to be performed is code packing and alignment. Having an abundance of operations in the decoder keeps the processor’s schedulers well fed in circumstances where instructions cannot be immediately provided to the decoders. Floating-point x87 code can be aligned to 8-byte boundaries as illustrated here, which is optimal on AMD Athlon, AMD Athlon 64, and AMD Opteron processors:

;Instruction Address   Opcode     Instruction
;==================================================
00000360               66         DB    066h
00000361               DD 06      fld   QWORD PTR [esi]
00000363               66         DB    066h
00000364               DD 07      fld   QWORD PTR [edi]
00000366               D8 C9      fmul  st(0), st(1)
00000368               DE C7      faddp st(7), st(0)
0000036A               DD 04 38   fld   QWORD PTR [edi+eax]
0000036D               66         DB    066h
0000036E               D8 C9      fmul  st(0), st(1)
00000370               DE C6      faddp st(6), st(0)
00000372               DD 04 47   fld   QWORD PTR [edi+eax*2]
00000375               66         DB    066h
00000376               D8 C9      fmul  st(0), st(1)
00000378               DE C5      faddp st(5), st(0)
0000037A               DD 04 3B   fld   QWORD PTR [edi+ebx]
0000037D               66         DB    066h
0000037E               D8 C9      fmul  st(0), st(1)
00000380               DE C4      faddp st(4), st(0)
00000382               DD 04 87   fld   QWORD PTR [edi+eax*4]
00000385               66         DB    066h
00000386               D8 C9      fmul  st(0), st(1)
00000388               DE C3      faddp st(3), st(0)
0000038A               DC 0C 39   fmul  QWORD PTR [edi+ecx]
0000038D               66         DB    066h
0000038E               DE C1      faddp st(1), st(0)

The instruction address specifies the address (in hexadecimal) of the instruction to the right.
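
The addresses in the listing can be cross-checked with a little arithmetic. The following Python sketch is illustrative only and is not part of the original guide: it sums the encoded lengths from the listing, treating each 66h prefix byte as part of the instruction that follows it, and confirms that every group of three instructions starts on an 8-byte boundary. The names CODE, group_starts, and padding_needed are hypothetical helpers introduced for this example.

```python
START = 0x360   # address of the first byte in the listing
ALIGN = 8       # target alignment in bytes

# (encoded bytes, instruction), copied from the listing. Each 66h padding
# prefix is merged with the instruction it precedes.
CODE = [
    ("66 DD 06", "fld QWORD PTR [esi]"),
    ("66 DD 07", "fld QWORD PTR [edi]"),
    ("D8 C9",    "fmul st(0), st(1)"),
    ("DE C7",    "faddp st(7), st(0)"),
    ("DD 04 38", "fld QWORD PTR [edi+eax]"),
    ("66 D8 C9", "fmul st(0), st(1)"),
    ("DE C6",    "faddp st(6), st(0)"),
    ("DD 04 47", "fld QWORD PTR [edi+eax*2]"),
    ("66 D8 C9", "fmul st(0), st(1)"),
    ("DE C5",    "faddp st(5), st(0)"),
    ("DD 04 3B", "fld QWORD PTR [edi+ebx]"),
    ("66 D8 C9", "fmul st(0), st(1)"),
    ("DE C4",    "faddp st(4), st(0)"),
    ("DD 04 87", "fld QWORD PTR [edi+eax*4]"),
    ("66 D8 C9", "fmul st(0), st(1)"),
    ("DE C3",    "faddp st(3), st(0)"),
    ("DC 0C 39", "fmul QWORD PTR [edi+ecx]"),
    ("66 DE C1", "faddp st(1), st(0)"),
]


def group_starts(code, start=START, group=3):
    """Return (start addresses of each group of `group` instructions, end address)."""
    addr = start
    starts = []
    for i, (encoding, _mnemonic) in enumerate(code):
        if i % group == 0:
            starts.append(addr)
        addr += len(encoding.split())   # one list item per encoded byte
    return starts, addr


def padding_needed(size, align=ALIGN):
    """Bytes of padding required to round `size` up to a multiple of `align`."""
    return -size % align


starts, end = group_starts(CODE)
for s in starts:
    print(f"{s:08X}  aligned={s % ALIGN == 0}")
print(f"end={end:08X}  pad for a 7-byte group={padding_needed(7)}")
```

Running the sketch shows that each three-instruction group in the listing occupies exactly 8 bytes once its single prefix byte is counted, so every group begins on an 8-byte boundary.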

Typically three DirectPath instructions occupy 7 bytes. Maintaining 8-byte alignment for the next group of three instructions requires the addition of a single byte. A 1-byte padding can easily be achieved using the single-byte NOP instruction (opcode 90h), as recommended in “Code Padding with Operand-Size Override and NOP” on page 89. However, for the special case of x87 instructions,


x87 Floating-Point Optimizations

Chapter 10
