2.0 Tunables and settings for switch software

To optimize the HPS, you can set shell variables for Parallel Environment MPI-based workloads and for IP-based workloads. This section reviews the shell variables that are most often used for performance tuning. For a complete list of tunables and their usage, see the documentation listed in section 7 of this paper.

2.1 MPI tunables for Parallel Environment

The following sections list the most common MPI tunables for applications that use the HPS. Along with each tunable is a description of the variable, what it is used for, and how to set it appropriately.

2.1.1 MP_EAGER_LIMIT

The MP_EAGER_LIMIT variable tells the MPI transport protocol to use "eager" mode for messages whose size is less than or equal to the specified value. In "eager" mode, the sender sends the message without knowing whether the destination task has actually posted the matching receive. For messages larger than MP_EAGER_LIMIT, a rendezvous must be used to confirm that the matching receive has been posted.
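The size rule described above can be sketched as a small shell function. This is only an illustration of the comparison, not the transport's actual code; the function name choose_protocol and the sample sizes are hypothetical.

```shell
# Hypothetical helper: report which mode a message of a given size would use.
# $1 = message size in bytes, $2 = eager limit in bytes
choose_protocol() {
    size=$1
    limit=$2
    if [ "$size" -le "$limit" ]; then
        # At or below the limit: send without waiting for the receiver.
        echo "eager"
    else
        # Above the limit: confirm the matching receive is posted first.
        echo "rendezvous"
    fi
}

choose_protocol 4096 4096    # size equal to the limit
choose_protocol 4097 4096    # size one byte over the limit
```

Note that a message exactly equal to the limit still qualifies for "eager" mode, because the comparison is "less than or equal to".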

The sending task does not have to wait for an okay from the receiver before sending the data, so the effective start-up cost for a small message is lower in "eager" mode. As a result, messages that do not exceed MP_EAGER_LIMIT are typically faster, especially if the corresponding receive has already been posted. If the receive has not been posted, the transport incurs an extra copy cost on the target, because the data is staged through the early-arrival buffers. However, the overall time to send a small message might still be less in "eager" mode. Well-designed MPI applications often try to post each MPI_RECV before the message is expected, but because the tasks of a parallel job are not in lock step, most applications have occasional early arrivals.

The maximum message size for the "eager" protocol is currently 65536 bytes, although the default value is lower. An application for which a significant fraction of the MPI messages are smaller than 65536 bytes might see a performance benefit from increasing MP_EAGER_LIMIT. If MP_EAGER_LIMIT is increased above the default value, it might also be necessary to increase MP_BUFFER_MEM, which determines the amount of memory available for early-arrival buffers. Higher "eager" limits or larger task counts either demand more buffer memory or reduce the number of unlimited "eager" messages that can be outstanding, and can therefore also affect performance.
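As a minimal sketch, the two variables might be exported in the shell before the job is launched. The MP_EAGER_LIMIT value is the 65536-byte maximum stated above. The MP_BUFFER_MEM value is purely an assumed placeholder (64 MB written in bytes), not a recommendation. The accepted value formats and ranges should be checked in the Parallel Environment documentation.

```shell
# Illustrative values only; appropriate settings depend on the application.
export MP_EAGER_LIMIT=65536       # maximum "eager" message size stated above
export MP_BUFFER_MEM=67108864     # assumed placeholder: 64 MB of early-arrival buffer memory

# Show the values that a job started from this shell would inherit.
echo "MP_EAGER_LIMIT=$MP_EAGER_LIMIT"
echo "MP_BUFFER_MEM=$MP_BUFFER_MEM"
```

Raising the two values together reflects the trade-off described above: a higher "eager" limit without more buffer memory leaves room for fewer outstanding early arrivals.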

2.1.2 MP_POLLING_INTERVAL and MP_RETRANSMIT_INTERVAL

The MP_POLLING_INTERVAL and MP_RETRANSMIT_INTERVAL variables control how often the protocol code checks whether data that was previously sent is assumed to be lost and needs to be retransmitted. When the values are larger, this checking is done less often. There are two different environment variables because the check can be done by an MPI/LAPI service

pshpstuningguidewp040105.doc
