25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

2.3 Unrolling Small Loops

Optimization

Completely unroll loops that have a small fixed loop count and a small loop body.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

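Many compilers do not aggressively unroll loops. Manually unrolling loops can benefit performance, especially if the loop body is small, because the loop overhead (updating the counter, comparing it against the limit, and branching) then accounts for a significant share of the work done in each iteration.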

Example

Avoid a small loop like this:

// 3D-transform: Multiply vector V by 4x4 transform matrix M.
for (i = 0; i < 4; i++) {
   r[i] = 0;
   for (j = 0; j < 4; j++) {
      r[i] += m[j][i] * v[j];
   }
}

Instead, replace it with its completely unrolled equivalent, as shown here:

r[0] = m[0][0] * v[0] + m[1][0] * v[1] + m[2][0] * v[2] + m[3][0] * v[3];
r[1] = m[0][1] * v[0] + m[1][1] * v[1] + m[2][1] * v[2] + m[3][1] * v[3];
r[2] = m[0][2] * v[0] + m[1][2] * v[1] + m[2][2] * v[2] + m[3][2] * v[3];
r[3] = m[0][3] * v[0] + m[1][3] * v[1] + m[2][3] * v[2] + m[3][3] * v[3];
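
The two forms can be checked against each other with a small self-contained sketch such as the one below. The function names, the float element type, and the sample matrix are assumptions made for illustration; the fragments above do not declare the types of r, m, or v. The sample values are small integers so that both forms produce exactly the same floating-point results and can be compared with ==.

```c
#include <assert.h>

/* Loop form: r = V * M, where column i of m produces r[i].
   The float type is an assumption; the guide does not state it. */
void transform_loop(float r[4], float m[4][4], const float v[4])
{
    int i, j;

    for (i = 0; i < 4; i++) {
        r[i] = 0.0f;
        for (j = 0; j < 4; j++) {
            r[i] += m[j][i] * v[j];
        }
    }
}

/* Completely unrolled form: no loop counters, compares, or branches. */
void transform_unrolled(float r[4], float m[4][4], const float v[4])
{
    r[0] = m[0][0] * v[0] + m[1][0] * v[1] + m[2][0] * v[2] + m[3][0] * v[3];
    r[1] = m[0][1] * v[0] + m[1][1] * v[1] + m[2][1] * v[2] + m[3][1] * v[3];
    r[2] = m[0][2] * v[0] + m[1][2] * v[1] + m[2][2] * v[2] + m[3][2] * v[3];
    r[3] = m[0][3] * v[0] + m[1][3] * v[1] + m[2][3] * v[2] + m[3][3] * v[3];
}

/* Hypothetical helper: asserts that both forms agree on one sample input.
   The matrix is an identity with a translation in its last row. */
void check_unrolled_matches_loop(void)
{
    float m[4][4] = {
        {  1.0f,  0.0f,  0.0f, 0.0f },
        {  0.0f,  1.0f,  0.0f, 0.0f },
        {  0.0f,  0.0f,  1.0f, 0.0f },
        { 10.0f, 20.0f, 30.0f, 1.0f }
    };
    float v[4] = { 1.0f, 2.0f, 3.0f, 1.0f };
    float r_loop[4];
    float r_unrolled[4];
    int i;

    transform_loop(r_loop, m, v);
    transform_unrolled(r_unrolled, m, v);

    for (i = 0; i < 4; i++) {
        assert(r_loop[i] == r_unrolled[i]);
    }
}
```

Calling check_unrolled_matches_loop() from a test program is one way to confirm that a hand-unrolled routine still computes what the original loop did. Whether the unrolled form is actually faster depends on the compiler and its optimization settings, so it is worth measuring both.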

Related Information

For information on loop unrolling at the assembly-language level, see “Loop Unrolling” on page 145.

Chapter 2    C and C++ Source-Level Optimizations    13
