
Parallel processing

endif

c$omp end parallel

call omp_set_nested(.false.)

...

With MLIB_NUMBER_OF_THREADS set to 1, the code would run two-way parallel: one OpenMP thread for

C = αAB + βC

and another for

F = αDE + βF

Setting MLIB_NUMBER_OF_THREADS to 2 would allow nested parallelism and run the code four-way parallel: each of the two OpenMP threads calls a subprogram that can itself use two threads.

If a parallel VECLIB subprogram is called from a parallelized loop or region, VECLIB automatically avoids oversubscribing the CPUs. The number of threads spawned by each call to a parallelized VECLIB subroutine in a nested parallel region is limited by:

•MLIB_NUMBER_OF_THREADS

•The number of threads still available in the system

•A fixed upper bound of four threads

Specifically:

MIN (MLIB_NUMBER_OF_THREADS, threads still available, 4)
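The limit above can be sketched as a small Fortran helper. This is an illustration only: the module veclib_thread_limit and the function nested_thread_limit are hypothetical names, not part of VECLIB, and the caller is assumed to supply the number of threads still available in the system. The sketch is written in free-form Fortran for brevity.

```fortran
module veclib_thread_limit
  implicit none
contains
  ! Hypothetical helper, not a VECLIB routine: returns the upper bound on
  ! the threads one nested call may spawn, following
  ! MIN (MLIB_NUMBER_OF_THREADS, threads still available, 4).
  pure integer function nested_thread_limit(mlib_threads, threads_available) &
      result(limit)
    integer, intent(in) :: mlib_threads       ! value of MLIB_NUMBER_OF_THREADS
    integer, intent(in) :: threads_available  ! threads still free (assumed known)
    integer, parameter  :: nested_cap = 4     ! fixed cap stated in the text
    limit = min(mlib_threads, threads_available, nested_cap)
  end function nested_thread_limit
end module veclib_thread_limit
```

For example, nested_thread_limit(8, 6) evaluates to 4 because of the fixed cap, nested_thread_limit(2, 6) evaluates to 2, and nested_thread_limit(8, 1) evaluates to 1 when only one thread remains available.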

Message passing-based nested parallelism

Nested parallelism can be achieved when calling VECLIB parallelized subprograms from an MPI process. (See “Parallelized subprograms in VECLIB” on page 1104.) Consider the following code:

...

call mpi_init (ierr)

call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)

if (myid.eq.0) then

call dgemm('n', 'n', m, m, m, alpha, a, lda, b, ldb, beta, c, ldc)

else

call dgemm('n', 'n', m, m, m, alpha, d, ldd, e, lde, beta, f, ldf)

endif

...

Chapter 1 Introduction to VECLIB 21