Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

A common case of misaligned store-data forwarding involves the passing of misaligned quadword floating-point data on the doubleword-aligned integer stack. Avoid the type of code shown in the following example:

mov  esp, 24h
fstp QWORD PTR [esp]    ; ESP = 24h
...                     ; Store occurs to quadword misaligned address.
fld  QWORD PTR [esp]    ; Quadword load cannot forward from quadword
                        ; misaligned 'FSTP[ESP]' store operation.
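The arithmetic behind this example can be checked directly: 24h is 36 decimal, which is a multiple of 4 but not of 8, so the address is doubleword-aligned yet quadword-misaligned. The following C sketch makes that check explicit. The helper names `is_aligned` and `align_down` are illustrative assumptions and do not come from this guide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of size.
 * size must be a power of two (4 for doubleword, 8 for quadword). */
static bool is_aligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1u)) == 0u;
}

/* Hypothetical helper: round addr down to the nearest multiple of size.
 * size must be a power of two. */
static uint32_t align_down(uint32_t addr, uint32_t size)
{
    return addr & ~(size - 1u);
}
```

With ESP = 24h, `is_aligned(0x24, 4)` is true and `is_aligned(0x24, 8)` is false, which matches the situation in the example. `align_down(0x24, 8)` yields 20h, a quadword-aligned address that would not fall under this particular restriction.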

High-Byte Store-Buffer Data-Forwarding Restriction

Store data cannot be forwarded to a load if the store data comes from a high-byte register (AH, BH, CH, or DH).

Avoid the type of code shown in the following example:

mov eax, 10h
mov [eax], bh    ; High-byte store
...
mov dl, [eax]    ; Load cannot forward from high-byte store.

One Supported Store-to-Load Forwarding Case

AMD Athlon 64 and AMD Opteron processors support one case of mismatched store-to-load forwarding: the lower 32 bits of an aligned quadword write can be forwarded to a doubleword read, as illustrated in the following example:

movq [alignedQword], mm0
...
mov eax, [alignedQword]
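The condition for this supported case can be written as a small address check. The C sketch below is a hypothetical model of only the case described above, not of the processor's full forwarding logic, and the function name `is_supported_qword_to_dword` is an assumption made for illustration. It relies on the fact that x86 is little-endian, so the lower 32 bits of a quadword begin at the same address as the quadword itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate for the single mismatched case described above:
 * an 8-byte store to a quadword-aligned address, followed by a 4-byte load
 * of the lower 32 bits of that quadword.
 * It answers "is this the supported case?", not "will the hardware forward?".
 */
static bool is_supported_qword_to_dword(uint32_t store_addr, uint32_t load_addr)
{
    bool store_is_qword_aligned = (store_addr % 8u) == 0u;
    /* Little-endian: the low doubleword starts at the quadword's own address. */
    bool load_reads_low_dword = (load_addr == store_addr);
    return store_is_qword_aligned && load_reads_low_dword;
}
```

For a store to 20h, a doubleword load from 20h matches the supported case, while a load from 24h reads the upper 32 bits and does not.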

Store-to-Load Forwarding—False Dependencies

A load may detect a false dependency on a store-buffer entry if the load does not have a true dependency on the most recent store that matches address bits 11–2 of the load. A false match could occur on the most recent store that writes somewhere within the same doubleword of memory as the load. In addition, a false match could occur if a store address is located at an exact multiple of


Cache and Memory Optimizations

Chapter 5
