100 Cache and Memory Optimizations Chapter 5
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
5.5 Store-to-Load Forwarding Restrictions
Store-to-load forwarding is the process by which a load reads (forwards) data directly from the store
buffer. When forwarding can occur, it improves performance because the load does not have to wait for
the recently written (stored) data to be written to the cache and then read back out again. In some
cases, however, the load-store architecture of the AMD Athlon 64 and AMD Opteron processors does not
allow a load operation to read the data it needs from a store in the store buffer.
In these cases, the load cannot complete (load the needed data into a register) until the store has
retired out of the store buffer and written to the data cache. A store-buffer entry cannot retire and
write to the data cache until every instruction before the store has completed and retired from the
reorder buffer.
The implication of this restriction is that all instructions in the reorder buffer, up to and including the
store, must complete and retire out of the reorder buffer before the load can complete. Effectively, the
load has a false dependency on every instruction up to the store.
Because the LS buffer of the AMD Athlon 64 and AMD Opteron processors is deep, any load that depends
on a store that cannot forward data through the LS buffer may be delayed by up to tens of clock cycles.
The exact delay depends on pipeline conditions.
The following sections describe store-to-load forwarding examples.

Store-to-Load Forwarding Pitfalls—True Dependencies

A load is allowed to read data from the store-buffer entry only if all of the following conditions are
satisfied:
• The start address of the load matches the start address of the store.
• The load operand size is equal to or smaller than the store operand size.
• Neither the load nor the store is misaligned.
• The store data is not from a high-byte register (AH, BH, CH, or DH).
The following sections describe common scenarios to avoid. In each of these scenarios, a load has a
true dependency on an LS2-buffered store but cannot read (forward) data from the store-buffer entry.

Narrow-to-Wide Store-Buffer Data-Forwarding Restriction

If the following conditions are present, there is a narrow-to-wide store-buffer data-forwarding
restriction:
• The operand size of the store data is smaller than the operand size of the load data.
• The range of addresses spanned by the store data covers some subrange of the addresses spanned
by the load data.