But no such data dependence appears in the outer loop: each column access is independent of every other column access. Consequently, the compiler can safely distribute entire columns of the matrix to execute on different processors; the data assignments will be the same regardless of the order in which the columns are executed, so long as the rows execute in serial order.

Assumed dependences

When analyzing a loop, the compiler may err on the safe side and assume that what looks like a data dependence really is one and so not parallelize the loop. Consider the following:

DO I = 101, 200
  A(I) = A(I-K)
END DO

The compiler will assume that a data dependence exists in this loop because it appears that data that has been defined in a previous iteration is being used in a later iteration. On this assumption, the compiler will not parallelize the loop.

However, if the value of K is 100, the dependence is assumed rather than real because A(I-K) is defined outside the loop: the loop reads only A(1) through A(100) and writes only A(101) through A(200), so no iteration uses a value that another iteration defines. If in fact this is the case, the programmer can insert one of the following directives immediately before the loop, forcing the compiler to ignore any assumed dependences when analyzing the loop for parallelization:

DIR$ IVDEP

FPP$ NODEPCHK

VD$ NODEPCHK

For more information about these directives, see “Compatibility directives” (page 128).
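
The difference between an assumed and a real dependence can be checked by running the loop in two different iteration orders and comparing the results. The following is a minimal sketch, not part of the compiler or its libraries; the module and procedure names (assumed_dep_demo, shift_forward, shift_backward, order_independent) are hypothetical and chosen for illustration only. It assumes a 200-element array and a value of K between 0 and 100, so that A(I-K) stays within bounds.

```fortran
module assumed_dep_demo
  implicit none
contains

  ! Run the loop from the text with I ascending from 101 to 200.
  ! If K is known to be 100, one of the directives listed above
  ! could be placed immediately before this DO statement.
  subroutine shift_forward(a, k)
    real, intent(inout) :: a(200)
    integer, intent(in) :: k
    integer :: i
    do i = 101, 200
      a(i) = a(i-k)
    end do
  end subroutine shift_forward

  ! Same loop body, but with I descending from 200 to 101.
  subroutine shift_backward(a, k)
    real, intent(inout) :: a(200)
    integer, intent(in) :: k
    integer :: i
    do i = 200, 101, -1
      a(i) = a(i-k)
    end do
  end subroutine shift_backward

  ! Return .TRUE. when both iteration orders give the same array,
  ! that is, when no iteration depends on another for this K.
  function order_independent(k) result(same)
    integer, intent(in) :: k
    logical :: same
    real :: fwd(200), bwd(200)
    integer :: i
    do i = 1, 200
      fwd(i) = real(i)
    end do
    bwd = fwd
    call shift_forward(fwd, k)
    call shift_backward(bwd, k)
    same = all(fwd == bwd)
  end function order_independent

end module assumed_dep_demo
```

With this sketch, order_independent(100) is true, because every value read comes from A(1) through A(100), which the loop never writes. order_independent(1) is false, because each iteration reads the element written by the previous one. A directive that suppresses dependence checking is safe only in the first case.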

Vectorization

When vectorization is enabled, the optimizer replaces eligible loops with calls to specially tuned routines in the math library. When you compile with the +Ovectorize option, the optimizer vectorizes wherever it determines that it is safe and feasible to do so. However, you can use directives to limit vectorization. As an alternative to the optimizer’s automatic vectorization, you can make explicit calls to the Basic Linear Algebra Subroutine (BLAS) library to perform common vector and matrix operations.

Using the +Ovectorize option

To enable vectorization, you must compile the program at optimization level 3 or higher and specify the +Ovectorize option, as in the following example command line:

f90 +O3 +Ovectorize prog.f90

When vectorization is enabled, the optimizer uses a pattern-matching algorithm to identify program loops as eligible for vectorization. If the optimizer can also determine that:

Vectorization will produce the same results as the original loop

There are no other optimizations that will yield better performance

the optimizer replaces the loop by a call to one of the math library routines listed in Table 31.

Table 31 Vector routines called by +Ovectorize

Vector routine    Description

daxpy             Add a scalar multiple of a vector to a vector, using double-precision operands.

ddot              Compute the dot product of two double-precision vectors.

memcpy            See the memory(1) man page.

memmove           See the memory(1) man page.

memset            See the memory(1) man page.
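
To make the daxpy and ddot descriptions concrete, the following sketch writes both operations as ordinary loops. It is illustrative only: the names vector_patterns, axpy_loop, and dot_loop are hypothetical, and whether the optimizer's pattern matcher recognizes these exact loops is an assumption, not something this guide states. The comments show the corresponding explicit BLAS calls with unit strides, which require linking against a BLAS library.

```fortran
module vector_patterns
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! double precision kind
contains

  ! y = y + a*x: add a scalar multiple of a vector to a vector.
  ! Equivalent explicit BLAS call: call daxpy(n, a, x, 1, y, 1)
  subroutine axpy_loop(n, a, x, y)
    integer, intent(in) :: n
    real(dp), intent(in) :: a, x(n)
    real(dp), intent(inout) :: y(n)
    integer :: i
    do i = 1, n
      y(i) = y(i) + a * x(i)
    end do
  end subroutine axpy_loop

  ! s = sum of x(i)*y(i): dot product of two vectors.
  ! Equivalent explicit BLAS call: s = ddot(n, x, 1, y, 1)
  function dot_loop(n, x, y) result(s)
    integer, intent(in) :: n
    real(dp), intent(in) :: x(n), y(n)
    real(dp) :: s
    integer :: i
    s = 0.0_dp
    do i = 1, n
      s = s + x(i) * y(i)
    end do
  end function dot_loop

end module vector_patterns
```

Both loops use double-precision operands and unit stride, and each iteration touches only element I of its arrays, which keeps them close to the operations the table describes.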

 

 
