Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

5.5Store-to-Load Forwarding Restrictions

Store-to-load forwarding refers to the process of a load reading (forwarding) data from the store buffer. When this can occur, it improves performance because the load does not have to wait for the recently written (stored) data to be written to cache and then read back out again. There are instances in the load-store architecture of the AMD Athlon 64 and AMD Opteron processors when a load operation is not allowed to read needed data from a store in the store buffer.

In these cases, the load cannot complete (load the needed data into a register) until the store has retired out of the store buffer and written to the data cache. A store-buffer entry cannot retire and write to the data cache until every instruction before the store has completed and retired from the reorder buffer.

The implication of this restriction is that all instructions in the reorder buffer, up to and including the store, must complete and retire out of the reorder buffer before the load can complete. Effectively, the load has a false dependency on every instruction up to the store.

Due to the significant depth of the LS buffer of the AMD Athlon 64 and AMD Opteron processors, any load that is dependent on a store that cannot bypass data through the LS buffer may experience significant delays of up to tens of clock cycles, where the exact delay is a function of pipeline conditions.

The following sections describe store-to-load forwarding examples.

Store-to-Load Forwarding Pitfalls—True Dependencies

A load is allowed to read data from the store-buffer entry only if all of the following conditions are satisfied:

The start address of the load matches the start address of the store.

The load operand size is equal to or smaller than the store operand size.

Neither the load nor the store is misaligned.

The store data is not from a high-byte register (AH, BH, CH, or DH).

The following sections describe common-case scenarios to avoid. In these scenarios, a load has a true dependency on an LS2-buffered store, but cannot read (forward) data from a store-buffer entry.

Narrow-to-Wide Store-Buffer Data-Forwarding Restriction

If the following conditions are present, there is a narrow-to-wide store-buffer data-forwarding restriction:

The operand size of the store data is smaller than the operand size of the load data.

The range of addresses spanned by the store data covers some subrange of the addresses spanned by the load data.

100

Cache and Memory Optimizations

Chapter 5

Page 116
Image 116
AMD 250 manual Store-to-Load Forwarding Restrictions, Store-to-Load Forwarding Pitfalls-True Dependencies, 100