25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

7.2 Loop Unrolling

Optimization

Use loop unrolling where appropriate to increase instruction-level parallelism:

If all of these conditions are true:

• The loop is in a frequently executed piece of code.
• The number of loop iterations is known at compile time.
• The loop body includes fewer than 10 instructions.

Then use: Complete loop unrolling

If all of these conditions are true:

• Spare registers are available (for example, when operating in 64-bit mode, where additional registers are available).
• The loop body is small, so that loop overhead is significant.
• The number of loop iterations is likely greater than 10.

Then use: Partial loop unrolling

Application

This optimization applies to:

• 32-bit software

• 64-bit software

Loop Unrolling

Loop unrolling is a technique that duplicates the body of a loop one or more times in order to increase the number of instructions relative to the branch and allow operations from different loop iterations to execute in parallel.
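As a minimal sketch of this idea in C, consider summing an array. The function names `sum_rolled` and `sum_unrolled2` and the choice of two accumulators are illustrative assumptions, not taken from this guide. The second function duplicates the loop body once, so there is one loop branch per two elements, and the two accumulators give the processor independent additions from adjacent iterations.

```c
#include <stddef.h>

/* Rolled form: one add and one loop branch per element. */
double sum_rolled(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Body duplicated once (hypothetical unroll factor of 2): one loop
 * branch per two elements. s0 and s1 form independent dependency
 * chains. This reorders the floating-point additions, so the result
 * can differ in the last bits from the rolled form. */
double sum_unrolled2(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)              /* one element left over when n is odd */
        s0 += a[i];
    return s0 + s1;
}
```

Whether the unrolled form is actually faster depends on the compiler and the target processor, so it is worth measuring both versions.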

There are two types of loop unrolling:

• Complete loop unrolling

• Partial loop unrolling
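The difference between the two types can be sketched in C as follows. The function names `add3_complete` and `add_partial4`, the fixed length of 3, and the unroll factor of 4 are assumptions chosen for illustration only. In the complete case the iteration count is known at compile time and the loop disappears. In the partial case a loop remains, with a cleanup loop for the elements left over when the count is not a multiple of the unroll factor.

```c
#include <stddef.h>

/* Complete unrolling: the trip count (3) is known at compile time,
 * so the loop and its branch are removed entirely. */
void add3_complete(double *a, const double *b)
{
    a[0] += b[0];
    a[1] += b[1];
    a[2] += b[2];
}

/* Partial unrolling by 4 (assumed factor): the trip count n is known
 * only at run time, so a loop remains, with one loop branch per four
 * elements instead of one per element. */
void add_partial4(double *a, const double *b, size_t n)
{
    size_t i = 0;
    for (; i + 3 < n; i += 4) {
        a[i]     += b[i];
        a[i + 1] += b[i + 1];
        a[i + 2] += b[i + 2];
        a[i + 3] += b[i + 3];
    }
    for (; i < n; i++)      /* cleanup loop for the last n % 4 elements */
        a[i] += b[i];
}
```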

Chapter 7

Scheduling Optimizations
