Parallel processing

endif
c$omp end parallel
call omp_set_nested(.false.)


With MLIB_NUMBER_OF_THREADS set to 1, the code would run two-way parallel: one OpenMP thread for

C = αAB + βC

and another for

F = αDE + βF

Setting MLIB_NUMBER_OF_THREADS to 2 would allow nested parallelism and run the code four-way parallel: each of the two OpenMP threads would call a VECLIB subprogram that itself runs on two threads.

If a parallel VECLIB subprogram is called from a parallelized loop or region, VECLIB automatically avoids oversubscribing the CPUs. The number of threads spawned by each call to a parallelized VECLIB subroutine in a nested parallel region is limited by:

• MLIB_NUMBER_OF_THREADS

• The number of threads still available in the system

and will never be larger than four. Specifically:

MIN (MLIB_NUMBER_OF_THREADS, threads still available, 4)
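The limit MIN (MLIB_NUMBER_OF_THREADS, threads still available, 4) can be illustrated with a minimal sketch. The module and function names below are hypothetical and are not part of VECLIB. Both arguments are values the caller supplies; the sketch only reproduces the arithmetic and does not query the library or the system.

```fortran
module nested_limit_sketch
  implicit none
  ! Upper bound on threads per nested VECLIB call, as stated in the text.
  integer, parameter :: nested_cap = 4
contains
  ! Hypothetical helper: returns
  ! MIN(MLIB_NUMBER_OF_THREADS, threads still available, 4).
  ! mlib_threads      : assumed value of MLIB_NUMBER_OF_THREADS
  ! threads_available : assumed number of threads still free in the system
  pure integer function nested_thread_limit(mlib_threads, threads_available)
    integer, intent(in) :: mlib_threads, threads_available
    nested_thread_limit = min(mlib_threads, threads_available, nested_cap)
  end function nested_thread_limit
end module nested_limit_sketch
```

For example, with MLIB_NUMBER_OF_THREADS set to 2 and three threads still available, the function returns 2. With MLIB_NUMBER_OF_THREADS set to 8 and six threads still available, it returns 4, because the cap of four applies.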

Message passing-based nested parallelism

Nested parallelism can be achieved when calling VECLIB parallelized subprograms from an MPI process. (See “Parallelized subprograms in VECLIB” on page 1104.) Consider the following code:


call mpi_init(ierr)
call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)
if (myid.eq.0) then
  call dgemm('n', 'n', m, m, m, alpha, a, lda, b, ldb, beta, c, ldc)
else
  call dgemm('n', 'n', m, m, m, alpha, d, ldd, e, lde, beta, f, ldf)



