2.1.5 MP_TASK_AFFINITY

Setting MP_TASK_AFFINITY to SNI tells the Parallel Operating Environment (POE) to bind each task to the MCM containing the HPS adapter it will use, so that the adapter, CPU, and memory used by any task are all local to the same MCM. To prevent multiple tasks from sharing the same CPU, do not set MP_TASK_AFFINITY to SNI if more than four tasks share any HPS adapter. In that case, set MP_TASK_AFFINITY to MCM instead, which allows each MPI task to use CPUs and memory from the same MCM, even if the adapter is on a remote MCM. If MP_TASK_AFFINITY is set to either MCM or SNI, MEMORY_AFFINITY should also be set to MCM.
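The rule above can be written down as a small helper. This is only a sketch: the function name and the constant are hypothetical, and it assumes you want SNI whenever the four-task limit allows it, which the text permits but does not require.

```python
# Threshold taken from the text: more than four tasks on any one
# HPS adapter rules out MP_TASK_AFFINITY=SNI.
MAX_TASKS_PER_ADAPTER_FOR_SNI = 4


def task_affinity_env(tasks_per_adapter):
    """Return the affinity-related environment settings suggested by the text.

    tasks_per_adapter is the largest number of tasks sharing any single
    HPS adapter (an input the caller must work out for the job layout).
    """
    if tasks_per_adapter < 1:
        raise ValueError("tasks_per_adapter must be at least 1")
    if tasks_per_adapter <= MAX_TASKS_PER_ADAPTER_FOR_SNI:
        affinity = "SNI"  # adapter, CPU, and memory on the same MCM
    else:
        affinity = "MCM"  # CPU and memory local; adapter may be remote
    # MEMORY_AFFINITY=MCM is recommended for both SNI and MCM.
    return {"MP_TASK_AFFINITY": affinity, "MEMORY_AFFINITY": "MCM"}
```

The returned dictionary is meant to be merged into the environment of the job before it is launched.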

2.1.6 MP_CSS_INTERRUPT

The MP_CSS_INTERRUPT variable allows you to control interrupts triggered by packet arrivals. Setting this variable to no implies that the application should run in polling mode. This setting is appropriate for applications that have mostly synchronous communication. Even applications that make heavy use of MPI_ISEND/MPI_IRECV should be considered synchronous unless there is significant computation between the ISEND/IRECV postings and the MPI_WAITALL. The default value for MP_CSS_INTERRUPT is no.

For applications with an asynchronous communication pattern (one that uses non-blocking MPI calls and overlaps them with significant computation), it might be more appropriate to set this variable to yes. Setting MP_CSS_INTERRUPT to yes can cause your application to be interrupted when new packets arrive, which could be helpful if a receiving MPI task is likely to be in the middle of a long numerical computation when data from a remote blocking send arrives.
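As a minimal sketch of applying this choice, the helper below builds an environment with MP_CSS_INTERRUPT set. The function and parameter names are hypothetical, and deciding whether a task really spends long stretches computing while messages arrive is left to the caller.

```python
import os


def css_interrupt_env(long_compute_while_messages_arrive, base_env=None):
    """Return a copy of the environment with MP_CSS_INTERRUPT set.

    Pass True when receivers are likely to be deep in computation when
    data arrives (interrupt mode); pass False for mostly synchronous
    communication (polling mode, which is also the default).
    """
    # Copy so the caller's environment (or os.environ) is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env["MP_CSS_INTERRUPT"] = "yes" if long_compute_while_messages_arrive else "no"
    return env
```

The result can be passed as the `env` argument of `subprocess.run` when launching the job.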

2.2 MPI-IO

MPI-IO is most effective when an application takes advantage of file views and collective operations to read or write a file in which the data for each task is dispersed across the file. To simplify, we focus on reads, but writes are similar.

An example is reading a matrix with application-wide scope from a single file, with each task needing a different fragment of that matrix. To bring in the fragment needed for each task, several disjoint chunks must be read. If every task were to do a POSIX read of each chunk, the GPFS file system would handle it correctly. However, because each read() is independent, there is little opportunity to apply an effective overall strategy.
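To make the "several disjoint chunks" concrete, the sketch below assumes a layout the text does not specify: the matrix is stored row-major as a whole, and each task owns a block of columns. The function names, the column-block decomposition, and the 8-byte item size are all illustrative assumptions.

```python
def fragment_chunks(n_rows, n_cols, col_start, col_count, item_size=8):
    """List the (byte_offset, byte_length) chunks for a block of columns.

    With row-major storage, a task that owns columns
    [col_start, col_start + col_count) needs one chunk per matrix row.
    """
    if col_start < 0 or col_count < 1 or col_start + col_count > n_cols:
        raise ValueError("column block lies outside the matrix")
    length = col_count * item_size
    return [((row * n_cols + col_start) * item_size, length)
            for row in range(n_rows)]


def read_fragment_independently(path, chunks):
    """Read each chunk with its own seek and read, as one task would.

    Every read is issued in isolation, so nothing below this call can
    see the task's full access pattern, let alone that of other tasks.
    """
    parts = []
    with open(path, "rb") as f:
        for offset, length in chunks:
            f.seek(offset)
            parts.append(f.read(length))
    return b"".join(parts)
```

For a 3 x 8 matrix of 8-byte items, a task owning columns 2 and 3 needs three separate 16-byte chunks, one per row, and issues three independent reads to get them.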

When the same set of reads is done with collective MPI-IO, every task specifies all the chunks it needs in one MPI-IO call. Because the call is collective, the requirements of all the tasks are known at one time. As a result, MPI can use a broad strategy for doing the I/O.
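The text does not describe which strategy the MPI library actually applies. Purely as an illustration of what becomes possible once all requests are known together, the sketch below merges every task's chunks into the smallest set of contiguous byte ranges. The function name and the request format are hypothetical.

```python
def coalesce_requests(requests_by_task):
    """Merge all tasks' (offset, length) chunks into contiguous ranges.

    requests_by_task maps a task id to that task's list of chunks.
    Returns a sorted list of (offset, length) ranges that together
    cover exactly the bytes that were requested.
    """
    all_chunks = sorted(chunk
                        for task_chunks in requests_by_task.values()
                        for chunk in task_chunks)
    merged = []
    for offset, length in all_chunks:
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # Touches or overlaps the previous range: extend that range.
            start, prev_len = merged[-1]
            merged[-1] = (start, max(prev_len, offset + length - start))
        else:
            merged.append((offset, length))
    return merged
```

For example, if task 0 asks for `[(0, 16), (64, 16)]` and task 1 asks for `[(16, 16), (80, 16)]`, the four small requests collapse into two larger ranges, `[(0, 32), (64, 32)]`. No single task could arrive at that plan from its own independent reads.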

When MPI-IO is used but each call to read or write a file is local or specifies only a single chunk of data, there is much less chance for MPI-IO to do anything more than a simple POSIX read() would do. Also, when the file is organized by task rather than globally, there is less MPI-IO can do to help. This is the case when each task's fragment of the matrix is stored contiguously in the file rather than having the matrix organized as a whole.

pshpstuningguidewp040105.doc

Page 7
