Parallel processing

Assume the application was started on two MPI processes. With MLIB_NUMBER_OF_THREADS set to 1, the code would run two-way parallel: one MPI process for

C = αAB + βC

and another for

F = αDE + βF

Setting MLIB_NUMBER_OF_THREADS to 2 would allow nested parallelism and run the code four-way parallel: two MPI processes, each using two threads.

Default CPS library stack is too small for MLIB

In libcps, the HP Compiler Parallel Support library, a CPS thread has a default stack size of 8 MB. For performance reasons, several subprograms in HP MLIB use the stack for temporary arrays, and these arrays can exceed the default size. With the default CPS stack size, these routines overwrite neighboring stacks, resulting in errors that are difficult to diagnose.

The solution is to change the CPS thread stack size attribute to a value that is large enough to accommodate all the MLIB subprograms the thread may encounter. Currently, 8 MB × (the number of threads) should be sufficient for all MLIB subprograms.

The environment variable CPS_STACK_SIZE expects values in kilobytes. For programs that execute on two threads, 2 × 8 MB = 16 MB = 16384 KB, so setting the stack size as follows would be sufficient:

For C shell:

% setenv CPS_STACK_SIZE 16384

For Korn shell:

$ export CPS_STACK_SIZE=16384
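
The arithmetic behind the value 16384 can be sketched in C. This is only an illustration of the 8 MB per thread guideline stated above. The macro MLIB_STACK_MB_PER_THREAD and the function cps_stack_size_kb are hypothetical names introduced here and are not part of HP MLIB or libcps.

```c
#include <assert.h>

/* Guideline from the text: 8 MB of stack per thread.
   (Hypothetical macro name, for illustration only.) */
#define MLIB_STACK_MB_PER_THREAD 8UL

/* Hypothetical helper: returns a value suitable for CPS_STACK_SIZE,
   which is expressed in kilobytes, for the given number of threads. */
unsigned long cps_stack_size_kb(unsigned long threads)
{
    assert(threads > 0);  /* at least one thread is required */
    /* MB per thread * threads gives MB; * 1024 converts MB to KB. */
    return MLIB_STACK_MB_PER_THREAD * threads * 1024UL;
}
```

For example, cps_stack_size_kb(2) yields 16384, the value used in the shell commands above, and cps_stack_size_kb(4) yields 32768.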

Default Pthread library stack is too small for MLIB

The stack allocated for each new thread created by a direct call to pthread_create might not be large enough for HP MLIB. To improve performance, several subprograms in HP MLIB store temporary work arrays on the stack. If the stack size is not large enough, these routines overwrite neighboring stacks, resulting in errors that are difficult to diagnose.
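
One common way to request a larger stack for a thread is the standard POSIX call pthread_attr_setstacksize, applied to the attribute object passed to pthread_create. The following is a minimal sketch under stated assumptions, not the procedure prescribed by this guide: the function names worker and run_with_stack are hypothetical, the worker only touches a large automatic array as a stand-in for an MLIB call, and the stack size to request is left to the caller.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: stands in for a thread that would call an MLIB
   subprogram. It touches a 4 MB automatic (stack) array. */
static void *worker(void *arg)
{
    volatile char scratch[4 * 1024 * 1024];
    scratch[0] = 1;
    scratch[sizeof scratch - 1] = 1;
    *(int *)arg = scratch[0] + scratch[sizeof scratch - 1];
    return NULL;
}

/* Hypothetical helper: creates one thread with the requested stack size
   in bytes and waits for it. Returns 0 on success, or the error code
   reported by the failing pthread call. */
int run_with_stack(size_t stack_bytes, int *result)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    assert(result != NULL);

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Request the larger stack before the thread is created. */
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, result);

    /* The attribute object is no longer needed once creation was attempted. */
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;

    return pthread_join(tid, NULL);
}
```

A caller might pass 16 MB, for example run_with_stack((size_t)16 * 1024 * 1024, &r), and check that the return value is 0. The appropriate size for a real application depends on the MLIB subprograms the thread calls.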
