Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

2.9 Matching Store and Load Size

Optimization

Align memory accesses and match addresses and sizes of stores and dependent loads.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

The AMD Athlon 64 and AMD Opteron processors contain a load-store buffer to speed up the forwarding of store data to dependent loads. However, this store-to-load forwarding (STLF) inside the load-store buffer occurs, in general, only when the addresses and sizes of the store and the dependent load match, and when both memory accesses are aligned. For details, see “Store-to-Load Forwarding Restrictions” on page 100.

It is not possible to control load and store activity at the source level well enough to avoid every case that violates the restrictions placed on store-to-load forwarding. In some instances, however, such cases can be spotted in the source code. Size mismatches can easily occur when data items of different sizes are joined in a union. Address mismatches can result from pointer manipulation.
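As an illustration of an address and size mismatch caused by pointer manipulation, the following minimal sketch contrasts two ways of reading back the upper half of a 32-bit value that was just stored. The function names upper16_mismatched and upper16_matched are hypothetical and do not come from this guide. The byte offset of 2 assumes a little-endian layout, as on AMD64. An optimizing compiler may keep the value in a register and remove the memory accesses entirely, so the sketch shows the source pattern to look for, not guaranteed machine code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical example: a 4-byte store followed by a 2-byte load from a
   different address (A + 2). The load differs from the store in both
   address and size, which is the pattern the text warns about. */
static uint16_t upper16_mismatched(uint32_t *slot, uint32_t value)
{
    uint16_t hi;
    *slot = value;                                   /* 4-byte store at address A */
    memcpy(&hi, (unsigned char *)slot + 2, sizeof hi); /* 2-byte load at A + 2 */
    return hi;
}

/* Hypothetical example: the dependent load uses the same address and size
   as the store; the upper half is then extracted in a register. */
static uint16_t upper16_matched(uint32_t *slot, uint32_t value)
{
    uint32_t whole;
    *slot = value;                  /* 4-byte store at address A */
    whole = *slot;                  /* 4-byte load at address A  */
    return (uint16_t)(whole >> 16); /* shift, no further memory access */
}
```

Both functions return the same value on a little-endian machine. They differ only in whether the dependent load matches the preceding store.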

The following examples show a situation involving a union of different-size data items. The examples show a user-defined unsigned 16.16 fixed-point type and two operations defined on this type. Function fixed_add adds two fixed-point numbers, and function fixed_int extracts the integer portion of a fixed-point number. Listing 5 shows an inappropriate implementation of fixed_int, which, when used on the result of fixed_add, causes misalignment, address mismatch, or size mismatch between memory operands, such that no store-to-load forwarding in the load-store buffer takes place. Listing 6 shows how to properly implement fixed_int in order to allow store-to-load forwarding in the load-store buffer.
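For readers unfamiliar with the 16.16 format, the sketch below shows the arithmetic on a plain 32-bit word: the upper 16 bits hold the integer part and the lower 16 bits hold the fraction in units of 1/65536. The names fx16_16, fx_from_parts, fx_add, and fx_int are hypothetical and are independent of the guide's own listings. The sketch only illustrates the number format, assuming unsigned values that do not overflow 32 bits.

```c
#include <stdint.h>

/* Hypothetical type: unsigned 16.16 fixed point stored in one 32-bit word. */
typedef uint32_t fx16_16;

/* Build a value from an integer part and a fraction given in 1/65536 units. */
static fx16_16 fx_from_parts(uint16_t intg, uint16_t frac)
{
    return ((uint32_t)intg << 16) | frac;
}

/* Fixed-point addition is ordinary 32-bit integer addition. */
static fx16_16 fx_add(fx16_16 a, fx16_16 b)
{
    return a + b;
}

/* The integer portion is the upper 16 bits of the whole word. */
static uint32_t fx_int(fx16_16 x)
{
    return x >> 16;
}
```

For example, 1.5 is fx_from_parts(1, 0x8000) and 2.25 is fx_from_parts(2, 0x4000). Their sum is 0x0003C000, which represents 3.75, and fx_int of that sum is 3.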

Examples

Listing 5. Avoid

typedef union {
   unsigned int whole;
   struct {
      unsigned short frac; /* Lower 16 bits are fraction. */
      unsigned short intg; /* Upper 16 bits are integer. */
   } parts;
} FIXED_U_16_16;


C and C++ Source-Level Optimizations

Chapter 2
