Chapter 2 C and C++ Source-Level Optimizations 13
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
2.3 Unrolling Small Loops
Optimization
Completely unroll loops that have a small fixed loop count and a small loop body.
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Rationale
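Many compilers do not unroll loops aggressively. Unrolling a loop by hand can improve performance,
especially when the loop body is small, because the loop overhead (the counter update, compare, and
branch executed on every iteration) is then a significant fraction of the total work.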
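To make the overhead concrete, the following C sketch tallies the loop-control operations that a 4x4 nested loop performs at the source level. The names loop_tally and tally_4x4 are hypothetical, and the counts describe the source code as written, not the instructions a particular compiler emits.

```c
/* Hypothetical tally of source-level loop-control operations. */
struct loop_tally {
    int compares;    /* evaluations of i < 4 or j < 4 */
    int increments;  /* executions of i++ or j++ */
    int body_runs;   /* executions of the innermost statement */
};

/* Walks a 4x4 nested loop with its control steps written out explicitly,
   counting each compare and increment along the way. */
struct loop_tally tally_4x4(void)
{
    struct loop_tally t = {0, 0, 0};
    int i, j;

    i = 0;
    for (;;) {
        t.compares++;               /* outer test: i < 4 */
        if (!(i < 4)) break;
        j = 0;
        for (;;) {
            t.compares++;           /* inner test: j < 4 */
            if (!(j < 4)) break;
            t.body_runs++;          /* stands in for the multiply-add */
            j++;
            t.increments++;
        }
        i++;
        t.increments++;
    }
    return t;
}
```

Calling tally_4x4() yields 25 compares and 20 increments for only 16 executions of the innermost statement, so the loop control is comparable in size to the useful work.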
Example
Avoid a small loop like this:
// 3D-transform: Multiply vector V by 4x4 transform matrix M.
for (i = 0; i < 4; i++) {
    r[i] = 0;
    for (j = 0; j < 4; j++) {
        r[i] += m[j][i] * v[j];
    }
}
Instead, replace it with its completely unrolled equivalent, as shown here:
r[0] = m[0][0] * v[0] + m[1][0] * v[1] + m[2][0] * v[2] + m[3][0] * v[3];
r[1] = m[0][1] * v[0] + m[1][1] * v[1] + m[2][1] * v[2] + m[3][1] * v[3];
r[2] = m[0][2] * v[0] + m[1][2] * v[1] + m[2][2] * v[2] + m[3][2] * v[3];
r[3] = m[0][3] * v[0] + m[1][3] * v[1] + m[2][3] * v[2] + m[3][3] * v[3];
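The two forms can be checked against each other with a small C sketch such as the one below. The element type float and the names transform_loop, transform_unrolled, and transforms_agree are assumptions made for illustration; the fragments above do not declare their variables. The sample values are small integers so that every product and sum is exactly representable and the comparison does not depend on floating-point rounding.

```c
/* Rolled form: r[i] is the sum over j of m[j][i] * v[j]. */
static void transform_loop(float r[4], float m[4][4], float v[4])
{
    int i, j;

    for (i = 0; i < 4; i++) {
        r[i] = 0;
        for (j = 0; j < 4; j++) {
            r[i] += m[j][i] * v[j];
        }
    }
}

/* Completely unrolled form of the same computation. */
static void transform_unrolled(float r[4], float m[4][4], float v[4])
{
    r[0] = m[0][0] * v[0] + m[1][0] * v[1] + m[2][0] * v[2] + m[3][0] * v[3];
    r[1] = m[0][1] * v[0] + m[1][1] * v[1] + m[2][1] * v[2] + m[3][1] * v[3];
    r[2] = m[0][2] * v[0] + m[1][2] * v[1] + m[2][2] * v[2] + m[3][2] * v[3];
    r[3] = m[0][3] * v[0] + m[1][3] * v[1] + m[2][3] * v[2] + m[3][3] * v[3];
}

/* Returns 1 if both forms give identical results for the sample data,
   otherwise 0. The sample matrix and vector are illustrative only. */
int transforms_agree(void)
{
    float m[4][4] = {
        { 1,  2,  3,  4},
        { 5,  6,  7,  8},
        { 9, 10, 11, 12},
        {13, 14, 15, 16}
    };
    float v[4] = {1, 2, 3, 4};
    float a[4], b[4];
    int i;

    transform_loop(a, m, v);
    transform_unrolled(b, m, v);
    for (i = 0; i < 4; i++) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    return 1;
}
```

For the sample data, both functions produce r = {90, 100, 110, 120}, so transforms_agree() returns 1.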
Related Information
For information on loop unrolling at the assembly-language level, see “Loop Unrolling” on page 145.