NOTE: A subroutine (but not a function) is always expected to have side effects. If you apply this directive to a subroutine call, the optimizer assumes that the call has no effect on program results and can eliminate the call to improve performance.

Indeterminate iteration counts

If the compiler finds that a loop's iteration count cannot be determined at run time before the loop starts to execute, it will not parallelize the loop. The reason for this precaution is that the runtime code must know the iteration count in order to determine how many iterations to distribute to the executing processors.

The following conditions can prevent a runtime count:

The loop is a DO-forever construct.

An EXIT statement appears in the loop (see the sketch after this list).

The loop contains a conditional GO TO statement that exits from the loop.

The loop modifies either the loop-control or loop-limit variable.

The loop is a DO WHILE construct and the condition being tested is defined within the loop.
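
For example, in the following sketch the EXIT depends on data computed inside the loop (the array X and the bound N are hypothetical names used only for illustration), so the number of iterations cannot be known before the loop begins and the compiler leaves the loop serial:

DO I = 1, N
   IF (X(I) < 0.0) EXIT    ! iteration count depends on the data in X
   X(I) = SQRT(X(I))
END DO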

Data dependences

When a loop is parallelized, its iterations are executed independently on different processors, and the order of execution differs from the serial order in which they would execute on a single processor. This difference is not a problem if the iterations can occur in any order with no effect on the results. Consider the following loop:

DO I = 1, 5
   A(I) = A(I) * B(I)
END DO

In this example, the array A will always end up with the same data regardless of whether the order of execution is 1-2-3-4-5, 5-4-3-2-1, 3-1-4-5-2, or any other order. The independence of each iteration from the others makes the loop an eligible candidate for parallel execution.

Such is not the case in the following:

DO I = 2, 5
   A(I) = A(I-1) * B(I)
END DO

In this loop, the order of execution does matter. The data used in iteration I depends on the data produced in the previous iteration (I-1). The array A would end up with very different data if the order of execution were anything other than 2-3-4-5. The data dependence in this loop thus makes it ineligible for parallelization.
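
Writing out the iterations in serial order makes the dependence explicit; each assignment reads the value that the previous assignment just wrote:

A(2) = A(1) * B(2)
A(3) = A(2) * B(3)
A(4) = A(3) * B(4)
A(5) = A(4) * B(5)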

Not all data dependences inhibit parallelization. The following paragraphs discuss some of the exceptions.

Nested loops and matrices

Some nested loops that operate on matrices may have a data dependence in the inner loop only, allowing the outer loop to be parallelized. Consider the following:

DO I = 1, 10
   DO J = 2, 100
      A(J,I) = A(J-1,I) + 1
   END DO
END DO

The data dependence in this nested loop occurs in the inner (J) loop: each row access of A(J,I) depends upon the preceding row (J-1) having been assigned in the previous iteration. If the iterations of the J loop were to execute in any other order than the one in which they would execute on a single processor, the matrix would be assigned different values. The inner loop, therefore, must not be parallelized.
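
The outer (I) loop, however, is a candidate: each outer iteration reads and writes only column I of A, so the columns can be computed concurrently while each inner J loop still runs serially within its own column. As a minimal sketch, assuming OpenMP directive syntax (this compiler may provide its own directive or parallelize the outer loop automatically), the outer loop could be marked for parallel execution like this:

!$OMP PARALLEL DO PRIVATE(J)
DO I = 1, 10
   DO J = 2, 100
      A(J,I) = A(J-1,I) + 1
   END DO
END DO
!$OMP END PARALLEL DO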
