Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

5.1 Memory-Size Mismatches

Optimization

Avoid memory-size mismatches when different instructions operate on the same data. When one instruction stores and another instruction subsequently loads the same data, keep their operands aligned and keep the loads/stores of each operand the same size.
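The rule can be written as a check over a sequence of accesses. The sketch below uses hypothetical names (`mem_access`, `has_size_mismatch`); it is not part of any AMD tool. It flags a load when the most recent earlier store that touches any of the load's bytes does not have exactly the same address and size. It follows the guideline literally and does not model which mismatched cases a particular processor can still forward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One memory access: byte offset from a base such as foo, and width in bytes. */
typedef struct {
    size_t offset;
    size_t size;
} mem_access;

/* True if the two accesses touch at least one common byte. */
static bool overlaps(mem_access a, mem_access b) {
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

/* Flags `load` when the most recent earlier store that touches any of its
   bytes does not have exactly the same offset and size. `stores` is in
   program order, oldest first. A load that overlaps no earlier store is
   not flagged. */
static bool has_size_mismatch(const mem_access *stores, size_t n_stores,
                              mem_access load) {
    assert(load.size > 0);
    for (size_t i = n_stores; i > 0; i--) {
        mem_access s = stores[i - 1];
        if (overlaps(s, load)) {
            return s.offset != load.offset || s.size != load.size;
        }
    }
    return false;
}
```

For the 64-bit (Avoid) example, the stores are {0, 4} and {4, 4}. A load of {0, 8} is flagged, while separate 4-byte loads at offsets 0 and 4, as in the MMX Preferred sequence, are not.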

Application

This optimization applies to:

32-bit software

64-bit software

Examples—Store-to-Load-Forwarding Stalls

The following code examples result in a store-to-load-forwarding stall:

64-bit (Avoid)

foo DQ ?                    ; Assume foo is 8-byte aligned.
...
mov DWORD PTR foo, eax      ; Store a DWORD to foo.
mov DWORD PTR foo+4, ebx    ; Now store to foo+4.
mov rcx, QWORD PTR foo      ; Load a QWORD from foo.
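Only the form to avoid is listed for this case. One way to keep the store and load sizes equal, sketched here in C rather than assembly, is to merge the two 32-bit halves in a register and write foo with a single 64-bit store. `store_qword` and `load_qword` are hypothetical names. The sketch assumes a little-endian target such as x86-64, so `lo` (eax above) lands at foo and `hi` (ebx above) at foo+4, exactly as with the two DWORD stores. `volatile` keeps each access as written; GCC and Clang on x86-64 typically emit one 8-byte access for an aligned volatile `uint64_t`, but the C standard does not promise an instruction width, so check the generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Merge both halves in a register, then write foo with one QWORD store. */
static void store_qword(volatile uint64_t *foo, uint32_t lo, uint32_t hi) {
    assert(((uintptr_t)foo & 7u) == 0);   /* foo is 8-byte aligned */
    *foo = ((uint64_t)hi << 32) | lo;     /* single 64-bit store */
}

/* One QWORD load: same address and size as the store that wrote foo. */
static uint64_t load_qword(const volatile uint64_t *foo) {
    return *foo;
}
```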

32-bit (Avoid)

foo DQ ?                    ; Assume foo is 4-byte aligned.
...
mov DWORD PTR foo, eax      ; Store a DWORD in foo.
mov DWORD PTR foo+4, edx    ; Store a DWORD in foo+4.
fld QWORD PTR foo           ; Load a QWORD from foo.

Avoid

mov foo, eax                ; Store a DWORD to foo.
mov foo+4, edx              ; Store a DWORD to foo+4.
...
movq mm0, foo               ; Load a QWORD from foo.

Preferred

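mov foo, eax                ; Store a DWORD to foo.
mov foo+4, edx              ; Store a DWORD to foo+4.
...
movd mm0, foo               ; Load a DWORD from foo (same size as the store).
punpckldq mm0, foo+4        ; Load a DWORD from foo+4 into the upper half of mm0.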
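The Preferred sequence keeps each load the same size as the store that wrote its bytes. A C analogue, offered as a sketch: two 32-bit stores, then two 32-bit loads combined in a register, with the DWORD from foo in bits 31:0 and the DWORD from foo+4 in bits 63:32, the layout `punpckldq` leaves in mm0. `store_dwords` and `load_dwords` are hypothetical names; `volatile` is there only so the compiler keeps the accesses separate and at the written width in practice, and the shift-and-OR stands in for `punpckldq` rather than calling an MMX intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* foo written as two 32-bit halves, as in mov foo, eax / mov foo+4, edx. */
static void store_dwords(volatile uint32_t foo[2], uint32_t lo, uint32_t hi) {
    foo[0] = lo;   /* DWORD store to foo   */
    foo[1] = hi;   /* DWORD store to foo+4 */
}

/* Two DWORD loads, each matching one store above, merged in a register. */
static uint64_t load_dwords(const volatile uint32_t foo[2]) {
    assert(((uintptr_t)foo & 3u) == 0);  /* foo is 4-byte aligned */
    uint64_t lo = foo[0];                /* like movd mm0, foo */
    uint64_t hi = foo[1];                /* like the DWORD load in punpckldq mm0, foo+4 */
    return (hi << 32) | lo;              /* foo+4 in the upper half, as in mm0 */
}
```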


Cache and Memory Optimizations

Chapter 5
