25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

Avoid

mov eax, 10h
mov WORD PTR [eax], bx     ; Word store
...
mov ecx, DWORD PTR [eax]   ; Doubleword load--cannot forward upper byte
                           ; from store buffer

Avoid

mov eax, 10h
mov BYTE PTR [eax+3], bl   ; Byte store
...
mov ecx, DWORD PTR [eax]   ; Doubleword load--cannot forward upper byte
                           ; from store buffer

Wide-to-Narrow Store-Buffer Data-Forwarding Restriction

If both of the following conditions are present, there is a wide-to-narrow store-buffer data-forwarding restriction:

The operand size of the store data is greater than the operand size of the load data.

The start address of the store data does not match the start address of the load data.
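
The two conditions above can be sketched as a small C predicate. This is only an illustration, not part of the processor documentation: the function name `wide_to_narrow_restricted` and its parameters are hypothetical, and the overlap check is an added assumption (a load that reads none of the stored bytes has nothing to forward in the first place).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an AMD API): returns true when a load that
   follows a store meets both wide-to-narrow conditions listed above.
   Sizes are in bytes. */
static bool wide_to_narrow_restricted(uintptr_t store_addr, size_t store_size,
                                      uintptr_t load_addr, size_t load_size)
{
    assert(store_size > 0 && load_size > 0);

    /* Assumption: only loads that read at least one stored byte matter. */
    bool overlaps = load_addr < store_addr + store_size &&
                    store_addr < load_addr + load_size;

    /* Condition 1: the store operand is wider than the load operand.
       Condition 2: the two accesses start at different addresses. */
    return overlaps && store_size > load_size && store_addr != load_addr;
}
```

For instance, a doubleword store at 10h followed by a word load at 12h satisfies both conditions, whereas a word load at 10h itself satisfies only the first.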

Avoid

mov eax, 10h
add DWORD PTR [eax], ebx   ; Doubleword store
mov cx, WORD PTR [eax+2]   ; Word load--cannot forward high word
                           ; from store buffer

Avoid

movq [foo], mm1     ; Store upper and lower half.
...
add  eax, [foo]     ; Fine
add  edx, [foo+4]   ; Not good!

Preferred

movd      [foo], mm1     ; Store lower half.
punpckhdq mm1, mm1       ; Copy upper half into lower half.
movd      [foo+4], mm1   ; Store lower half.
...
add       eax, [foo]     ; Fine
add       edx, [foo+4]   ; Fine

Misaligned Store-Buffer Data-Forwarding Restriction

If the following condition is present, there is a misaligned store-buffer data-forwarding restriction:

The store or load address is misaligned. For example, a quadword store is not aligned to a quadword boundary.
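
As a rough illustration of the condition above, the following C sketch flags a store/load pair when either access is misaligned. The helper names are hypothetical. It also assumes that "misaligned" means the address is not a multiple of the operand size, which matches the quadword example in the text, and that operand sizes are powers of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when an access of `size` bytes does not start
   on a multiple of its own size. Assumes `size` is a power of two. */
static bool is_misaligned(uintptr_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) != 0;
}

/* Hypothetical helper: true when the store or the load is misaligned,
   which is the condition stated above. Sizes are in bytes. */
static bool misaligned_forwarding_restricted(uintptr_t store_addr,
                                             size_t store_size,
                                             uintptr_t load_addr,
                                             size_t load_size)
{
    return is_misaligned(store_addr, store_size) ||
           is_misaligned(load_addr, load_size);
}
```

Under these assumptions, a quadword store at 14h is flagged because 14h is not a multiple of 8, while the same store at 10h is not.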

Chapter 5

Cache and Memory Optimizations
