AMD Athlon™ Processor x86 Code Optimization

22007E/0 — November 1999

 

no faster than three iterations in 10 cycles, or 6/10 floating-point adds per cycle, or 1.4 times as fast as the original loop.

Deriving Loop Control For Partially Unrolled Loops

A frequently used loop construct is a counting loop. In a typical case, the loop counter starts at some lower bound lo, increases by some fixed, positive increment inc for each iteration of the loop, and may not exceed some upper bound hi. The following example shows how to partially unroll such a loop by an unrolling factor of fac, and how to derive the loop control for the partially unrolled version of the loop.
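The loop control used in Example 2 follows from a single condition. A short derivation, assuming integer lo, hi, and inc with inc > 0 and an integer unrolling factor fac ≥ 1: one pass of the unrolled loop writes x[k], x[k+inc], and so on up to x[k+(fac-1)*inc], so a pass may start only if that last element is still within the upper bound:

$$
k + (\mathit{fac}-1)\cdot\mathit{inc} \le \mathit{hi}
\quad\Longleftrightarrow\quad
k \le \mathit{hi} - (\mathit{fac}-1)\cdot\mathit{inc}
$$

For hi ≥ lo, the rolled loop executes N iterations,

$$
N = \left\lfloor \frac{\mathit{hi}-\mathit{lo}}{\mathit{inc}} \right\rfloor + 1 ,
$$

and the unrolled loop makes ⌊N/fac⌋ passes (the pass starting at k = lo + j·fac·inc is allowed exactly when (j+1)·fac ≤ N). The end-case loop resumes at the k where the unrolled loop stopped and performs the iterations that are left, so together the two loops cover exactly the original N iterations:

$$
\underbrace{\mathit{fac}\left\lfloor \frac{N}{\mathit{fac}} \right\rfloor}_{\text{unrolled loop}}
+ \underbrace{(N \bmod \mathit{fac})}_{\text{end cases}} = N
$$

When hi < lo, neither loop body executes.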

 

Example 1 (rolled loop):

    for (k = lo; k <= hi; k += inc) {
        x[k] = ...;
    }

 

Example 2 (partially unrolled loop):

    for (k = lo; k <= (hi - (fac-1)*inc); k += fac*inc) {
        x[k] = ...;
        x[k+inc] = ...;
        ...
        x[k+(fac-1)*inc] = ...;
    }

    /* handle end cases: the fewer than fac iterations left over */
    for (; k <= hi; k += inc) {
        x[k] = ...;
    }
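To check the loop control end to end, here is a minimal runnable C sketch with the unrolling factor fixed at fac = 4. The helper element_value() and the functions rolled() and unrolled() are hypothetical stand-ins for the elided loop body and are not taken from this guide; each function returns how many elements it wrote so that the two versions can be compared.

```c
#include <assert.h>

/* Unrolling factor (fac in the text), fixed at 4 for this sketch.
   The unrolled body below is written out by hand for FAC == 4. */
#define FAC 4

/* Hypothetical stand-in for the elided loop body "x[k] = ...". */
static int element_value(int k)
{
    return 2 * k + 1;
}

/* Rolled loop (Example 1). x must have at least hi + 1 elements.
   Returns the number of elements written. */
static int rolled(int *x, int lo, int hi, int inc)
{
    int count = 0;
    assert(inc > 0);
    for (int k = lo; k <= hi; k += inc) {
        x[k] = element_value(k);
        count++;
    }
    return count;
}

/* Partially unrolled loop (Example 2) with fac = FAC = 4.
   Returns the number of elements written. */
static int unrolled(int *x, int lo, int hi, int inc)
{
    int count = 0;
    int k; /* declared outside the loops so the end-case loop can resume */
    assert(inc > 0);
    /* Each pass writes FAC elements; the bound guarantees that the
       last one, x[k + (FAC-1)*inc], does not go past hi. */
    for (k = lo; k <= hi - (FAC - 1) * inc; k += FAC * inc) {
        x[k]           = element_value(k);
        x[k + inc]     = element_value(k + inc);
        x[k + 2 * inc] = element_value(k + 2 * inc);
        x[k + 3 * inc] = element_value(k + 3 * inc);
        count += FAC;
    }
    /* Handle end cases: continue from where the unrolled loop stopped,
       covering the fewer than FAC iterations that are left. */
    for (; k <= hi; k += inc) {
        x[k] = element_value(k);
        count++;
    }
    return count;
}
```

For example, with lo = 3, hi = 40, and inc = 3, both functions write the 13 elements x[3], x[6], up to x[39]: the unrolled loop makes three passes (12 elements) and the end-case loop writes x[39]. The sketch uses signed int for the counter and bounds on purpose; with an unsigned counter, hi - (fac-1)*inc would wrap around whenever hi < (fac-1)*inc, and the unrolled loop would then write past the end of the array.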


Unrolling Loops
