AMD Athlon™ Processor x86 Code Optimization

22007E/0 — November 1999

lead to unexpected results. Fortunately, in the vast majority of cases, the final result will differ only in the least significant bits.

Example 1 (Avoid):

double a[100], sum;
int i;

sum = 0.0;
for (i = 0; i < 100; i++) {
    sum += a[i];
}

Example 2 (Preferred):

double a[100], sum1, sum2, sum3, sum4, sum;
int i;

sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0;
for (i = 0; i < 100; i += 4) {
    sum1 += a[i];
    sum2 += a[i+1];
    sum3 += a[i+2];
    sum4 += a[i+3];
}
sum = (sum4 + sum3) + (sum1 + sum2);

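Notice that the 4-way unrolling was chosen to exploit the 4-stage fully pipelined floating-point adder. Because the four partial sums are independent of one another, a new addition can enter the adder on every clock cycle. Each stage of the floating-point adder is therefore occupied on every clock cycle, ensuring maximal sustained utilization.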
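The two summation styles can be packaged as functions for side-by-side comparison. The following is a minimal sketch, not part of the original examples: the function names sum_sequential and sum_four_way are hypothetical, and the unrolled version assumes the element count is a multiple of 4 (as it is for the 100-element array above).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: one accumulator, so every add must wait
   for the result of the previous add. */
double sum_sequential(const double *a, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

/* Hypothetical helper: four independent accumulators, giving four
   dependency chains that can overlap in a pipelined adder.
   Assumption: n is a multiple of 4. */
double sum_four_way(const double *a, size_t n)
{
    double sum1 = 0.0, sum2 = 0.0, sum3 = 0.0, sum4 = 0.0;
    size_t i;

    assert(n % 4 == 0);
    for (i = 0; i < n; i += 4) {
        sum1 += a[i];
        sum2 += a[i + 1];
        sum3 += a[i + 2];
        sum4 += a[i + 3];
    }
    /* Combine the partial sums pairwise, as in Example 2. */
    return (sum4 + sum3) + (sum1 + sum2);
}
```

For inputs whose partial sums are exactly representable, both functions return the same value. For general data the two results may differ in the least significant bits, because the additions are performed in a different order.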

Explicitly Extract Common Subexpressions

In certain situations, C compilers are unable to extract common subexpressions from floating-point expressions due to the guarantee against reordering of such expressions in the ANSI standard. Specifically, the compiler cannot re-arrange the computation according to algebraic equivalencies before extracting common subexpressions. In such cases, the programmer should manually extract the common subexpression. It should be noted that re-arranging the expression may result in different computational results due to the lack of associativity of floating-point operations, but the results usually differ in only the least significant bits.
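As an illustration of the manual extraction described above, consider two expressions that both contain the quotient b/d algebraically, but only one of which computes it as written. The following is a minimal sketch; the function names eval_direct and eval_extracted and the variable names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

/* Hypothetical helper: the expressions as a programmer might first
   write them. C evaluates them left to right, so b/d appears as an
   actual operation only in the second one and cannot be shared. */
void eval_direct(double a, double b, double c, double d,
                 double *e, double *f)
{
    assert(d != 0.0);
    *e = b * c / d;    /* evaluated as (b * c) / d */
    *f = b / d * a;    /* evaluated as (b / d) * a */
}

/* Hypothetical helper: the common subexpression b/d is extracted by
   hand, replacing two divides with one divide and two multiplies. */
void eval_extracted(double a, double b, double c, double d,
                    double *e, double *f)
{
    double t;

    assert(d != 0.0);
    t = b / d;         /* computed once */
    *e = c * t;
    *f = a * t;
}
```

The value of e is computed in a different order in the second version, so for general inputs it may differ from the first version in the least significant bits.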
