But no such data dependence appears in the outer loop: each column access is independent of every other column access. Consequently, the compiler can safely distribute entire columns of the matrix to execute on different processors; the data assignments will be the same regardless of the order in which the columns are executed, so long as the rows execute in serial order.
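The independence of the columns can be checked with a small experiment. The following is a minimal sketch, assuming a hypothetical loop nest in which each row depends on the previous row of the same column. The names a, b, n, and m and the recurrence itself are illustrative assumptions, not the guide's own example. It visits the columns in ascending and then descending order, keeping the rows in serial order, and compares the results.

```fortran
program column_order
  implicit none
  integer, parameter :: n = 6, m = 4   ! hypothetical sizes: n rows, m columns
  real :: a(n, m), b(n, m)
  integer :: i, j

  ! Give both copies the same starting values.
  do j = 1, m
    do i = 1, n
      a(i, j) = real(i + 10 * j)
    end do
  end do
  b = a

  ! Columns visited in ascending order; rows in serial order.
  do j = 1, m
    do i = 2, n
      a(i, j) = a(i - 1, j) + a(i, j)   ! uses the previous row of column j only
    end do
  end do

  ! Columns visited in descending order; rows still in serial order.
  do j = m, 1, -1
    do i = 2, n
      b(i, j) = b(i - 1, j) + b(i, j)
    end do
  end do

  if (all(a == b)) then
    print *, 'column order does not affect the result'
  else
    print *, 'results differ'
    stop 1
  end if
end program column_order
```

Because no statement reads or writes a column other than j, any assignment of columns to processors gives the same values. Reversing the inner loop, by contrast, would change the result.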
Assumed dependences
When analyzing a loop, the compiler may err on the safe side and assume that what looks like a data dependence really is one, and so not parallelize the loop. Consider the following:
DO I = 101, 200
A(I) =
END DO
The compiler will assume that a data dependence exists in this loop because it appears that data that has been defined in a previous iteration is being used in a later iteration. On this assumption, the compiler will not parallelize the loop.
However, if the value of k is 100, the dependence is assumed rather than real because
• DIR$ IVDEP
• FPP$ NODEPCHK
• VD$ NODEPCHK
For more information about these directives, see “Compatibility directives” (page 128).
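To see why an offset of 100 removes the dependence, consider the following minimal sketch. It assumes a loop body of the form a(i) = a(i - k), which is a guess at the kind of statement under discussion rather than the guide's exact code, and it assumes the !DIR$ comment prefix for the directive. With k equal to 100, the loop reads only elements 1 through 100 and writes only elements 101 through 200, so the iteration order cannot matter.

```fortran
program assumed_dependence
  implicit none
  real :: a(200), b(200)
  integer :: i, k

  k = 100                      ! assumed offset; in real code unknown at compile time

  do i = 1, 200
    a(i) = real(i)
  end do
  b = a

  ! Forward order. The directive spelling is an assumption; a compiler that
  ! does not recognize it treats the line as an ordinary comment.
!DIR$ IVDEP
  do i = 101, 200
    a(i) = a(i - k)            ! reads a(1:100), writes a(101:200)
  end do

  ! Reverse order on the second copy.
  do i = 200, 101, -1
    b(i) = b(i - k)
  end do

  if (all(a == b)) then
    print *, 'iteration order does not affect the result'
  else
    print *, 'results differ'
    stop 1
  end if
end program assumed_dependence
```

If k were smaller than the trip count (say 1), the two loops would produce different values, and telling the compiler to ignore the dependence would be unsafe.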
Vectorization
When vectorization is enabled, the optimizer replaces eligible loops with calls to specially tuned routines in the math library. When you compile with the +Ovectorize option, the optimizer vectorizes wherever it determines that it is safe and feasible to do so. However, you can use directives to limit vectorization. As an alternative to the optimizer’s automatic vectorization, you can make explicit calls to the Basic Linear Algebra Subprograms (BLAS) library to perform common vector and matrix operations.
Using the +Ovectorize option
To enable vectorization, you must compile the program at optimization level 3 or higher and specify the +Ovectorize option, as in the following example command line:
f90 +O3 +Ovectorize prog.f90
When vectorization is enabled, the optimizer uses a
• Vectorization will produce the same results as the original loop
• There are no other optimizations that will yield better performance
the optimizer replaces the loop by a call to one of the math library routines listed in Table 31.
Table 31 Vector routines called by +Ovectorize
Vector routine | Description
daxpy | Add a scalar multiple of a vector to a vector, using
ddot | Compute the dot product of two
memcpy | See the memory(1) man page.
memmove | See the memory(1) man page.
memset | See the memory(1) man page.
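The operations these routines perform can be written as ordinary loops. The following is a rough sketch of the loop shapes that correspond to a daxpy-style update, a ddot-style reduction, a fill, and a copy. The array names, sizes, and values are hypothetical, and whether the optimizer actually replaces these particular loops is an assumption, since that depends on its own analysis.

```fortran
program vector_patterns
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 5          ! hypothetical vector length
  real(dp) :: x(n), y(n), z(n), w(n), alpha, dot
  integer :: i

  alpha = 2.0_dp
  do i = 1, n
    x(i) = real(i, dp)
    y(i) = 1.0_dp
  end do

  ! daxpy-style pattern: add a scalar multiple of x to y.
  do i = 1, n
    y(i) = alpha * x(i) + y(i)
  end do

  ! ddot-style pattern: accumulate the sum of elementwise products.
  dot = 0.0_dp
  do i = 1, n
    dot = dot + x(i) * y(i)
  end do

  ! memset-style pattern: fill an array with zeros.
  do i = 1, n
    z(i) = 0.0_dp
  end do

  ! memcpy-style pattern: copy one array to another.
  do i = 1, n
    w(i) = x(i)
  end do

  print *, 'y   =', y
  print *, 'dot =', dot

  ! Self-check: y(i) = 2*i + 1, so dot = sum of i*(2*i + 1) for i = 1..5 = 125.
  if (abs(dot - 125.0_dp) > 1.0e-12_dp) stop 1
  if (any(z /= 0.0_dp) .or. any(w /= x)) stop 1
end program vector_patterns
```

When calling the BLAS library explicitly instead, the standard interfaces for the first two patterns are daxpy(n, alpha, x, incx, y, incy) and ddot(n, x, incx, y, incy), with incx and incy set to 1 for contiguous vectors.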