Software Optimization Guide for AMD64 Processors | 25112 Rev. 3.06 September 2005 |
5.5 Store-to-Load Forwarding Restrictions
In these cases, the load cannot complete (load the needed data into a register) until the store has retired out of the store buffer and written to the data cache. A
The implication of this restriction is that all instructions in the reorder buffer, up to and including the store, must complete and retire out of the reorder buffer before the load can complete. Effectively, the load has a false dependency on every instruction up to the store.
Due to the significant depth of the LS buffer of the AMD Athlon 64 and AMD Opteron processors, any load that is dependent on a store that cannot bypass data through the LS buffer may experience significant delays of up to tens of clock cycles, where the exact delay is a function of pipeline conditions.
The following sections describe
Store-to-Load Forwarding Pitfalls—True Dependencies
A load is allowed to read data from the store buffer only if all of the following conditions are satisfied:
• The start address of the load matches the start address of the store.
• The load operand size is equal to or smaller than the store operand size.
• Neither the load nor the store is misaligned.
• The store data is not from a
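The conditions listed above can be read as a simple predicate. The following is a minimal sketch in C, not a model of the actual hardware: the `mem_access` type and the functions `is_misaligned` and `can_forward` are hypothetical names introduced for illustration. It treats "misaligned" as "not naturally aligned to the operand size", which is a simplifying assumption. Because the last condition is incomplete in the text above, it is represented only by an opaque flag, `store_src_ok`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one memory access (illustration only). */
typedef struct {
    uintptr_t addr;  /* start address of the access */
    size_t    size;  /* operand size in bytes, e.g. 1, 2, 4, 8 */
} mem_access;

/* Assumption: an access is "misaligned" when its start address is not a
 * multiple of its operand size. The processor's exact definition may differ. */
static bool is_misaligned(mem_access a)
{
    return a.size == 0 || (a.addr % a.size) != 0;
}

/* Returns true when a load may take its data from an earlier store under
 * the listed conditions. store_src_ok stands in for the last condition,
 * whose wording is incomplete in the text; the caller must supply it. */
static bool can_forward(mem_access store, mem_access load, bool store_src_ok)
{
    return load.addr == store.addr      /* start addresses match           */
        && load.size <= store.size      /* load is no wider than the store */
        && !is_misaligned(load)         /* neither access is misaligned    */
        && !is_misaligned(store)
        && store_src_ok;                /* placeholder for last condition  */
}
```

For example, a 4-byte load from the same aligned address as an earlier 8-byte store passes the check, while an 8-byte load following a 4-byte store to that address does not.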
The following sections describe
Narrow-to-Wide Store-Buffer Data-Forwarding Restriction
If the following conditions are present, there is a narrow-to-wide store-buffer data-forwarding restriction:
• The operand size of the store data is smaller than the operand size of the load data.
• The range of addresses spanned by the store data covers some subrange of the addresses spanned by the load data.
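The two conditions above amount to a size comparison plus an address-range overlap test. The sketch below expresses them in C under stated assumptions: `byte_range`, `ranges_overlap`, and `narrow_to_wide_restricted` are hypothetical names, sizes are assumed to be nonzero, and the code only classifies a store/load pair. It does not measure or reproduce the hardware delay.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: bytes [addr, addr + size) touched by an access. */
typedef struct {
    uintptr_t addr;  /* start address */
    size_t    size;  /* operand size in bytes, assumed nonzero */
} byte_range;

/* True when the two half-open byte ranges share at least one address. */
static bool ranges_overlap(byte_range a, byte_range b)
{
    return a.addr < b.addr + b.size && b.addr < a.addr + a.size;
}

/* True when a store/load pair matches the narrow-to-wide pattern:
 * the store is narrower than the load, and the store's bytes fall on
 * some part of the bytes the load reads. */
static bool narrow_to_wide_restricted(byte_range store, byte_range load)
{
    return store.size < load.size && ranges_overlap(store, load);
}
```

As an illustration, two 4-byte stores to addresses `p` and `p + 4` followed by one 8-byte load from `p` meet both conditions for each store. A 4-byte store to `p + 8` does not, because it lies outside the bytes read by the load.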
100 | Cache and Memory Optimizations | Chapter 5 |