2.8 Unnecessary Store-to-Load Dependencies
A store-to-load dependency exists when data is stored to memory, only to be read back shortly thereafter. For details, see “Store-to-Load Forwarding Restrictions” on page 100. The AMD Athlon™ 64 and AMD Opteron™ processors contain hardware to accelerate such store-to-load dependencies, allowing the load to obtain the store data before it has been written to memory. However, it is still faster to avoid such dependencies altogether and keep the data in an internal register.
Avoiding store-to-load dependencies is especially important if they are part of a long dependency chain, as may occur in a recurrence computation. If the dependency occurs while operating on arrays, many compilers are unable to optimize the code in a way that avoids the store-to-load dependency. In some instances the language definition may prohibit the compiler from using code transformations that would remove the store-to-load dependency. Therefore, it is recommended that the programmer remove the dependency manually, for example, by introducing a temporary variable that can be kept in a register, as in the following example. This can result in a significant performance increase.
Listing 3. Avoid
double x[VECLEN], y[VECLEN], z[VECLEN];
unsigned int k;

for (k = 1; k < VECLEN; k++) {
   x[k] = x[k-1] + y[k];
}

for (k = 1; k < VECLEN; k++) {
   x[k] = z[k] * (y[k] - x[k-1]);
}
Listing 4. Preferred
double x[VECLEN], y[VECLEN], z[VECLEN];
unsigned int k;
double t;

t = x[0];
for (k = 1; k < VECLEN; k++) {
   t = t + y[k];
   x[k] = t;
}

t = x[0];
for (k = 1; k < VECLEN; k++) {
   t = z[k] * (y[k] - t);
   x[k] = t;
}
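The two listings can be packaged as functions and compared against each other. The following is a minimal sketch, not part of the original listings: the function names (running_sum_reload, running_sum_temp, scaled_diff_reload, scaled_diff_temp), the pointer-plus-length signatures, and the guard for an empty array are assumptions made for illustration.

```c
#include <stddef.h>

/* Listing 3 style: each iteration reloads x[k-1], which the previous
   iteration has just stored. */
void running_sum_reload(double *x, const double *y, size_t n)
{
    for (size_t k = 1; k < n; k++) {
        x[k] = x[k - 1] + y[k];
    }
}

/* Listing 4 style: the carried value lives in the temporary t, so the
   loop stores x[k] but does not need to read it back. */
void running_sum_temp(double *x, const double *y, size_t n)
{
    if (n == 0) {
        return;                 /* assumption: nothing to do for an empty array */
    }
    double t = x[0];
    for (size_t k = 1; k < n; k++) {
        t = t + y[k];
        x[k] = t;
    }
}

/* Second recurrence, Listing 3 style. */
void scaled_diff_reload(double *x, const double *y, const double *z, size_t n)
{
    for (size_t k = 1; k < n; k++) {
        x[k] = z[k] * (y[k] - x[k - 1]);
    }
}

/* Second recurrence, Listing 4 style. */
void scaled_diff_temp(double *x, const double *y, const double *z, size_t n)
{
    if (n == 0) {
        return;                 /* assumption: nothing to do for an empty array */
    }
    double t = x[0];
    for (size_t k = 1; k < n; k++) {
        t = z[k] * (y[k] - t);
        x[k] = t;
    }
}
```

Calling each pair on identical copies of the input arrays and comparing the outputs element by element is a simple way to confirm that introducing the temporary did not change the computation. Whether the rewritten loops are actually faster depends on the compiler and the target, so measure on the code in question.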