Scali MPI Connect Release 4.4 Users Guide 47
Chapter 5 Tuning SMC to your application
Scali MPI Connect allows the user to exercise control over the communication mechanisms
through adjustment of the thresholds that steer which mechanism to use for a particular
message. This is one technique that can be used to improve performance of parallel
applications on a cluster.
Forcing size parameters to mpimon is usually not necessary. This is only a means of
optimising SMC to a particular application, based on knowledge of communication patterns. For
unsafe MPI programs it may be necessary to adjust buffering to allow the program to complete.

5.1 Tuning communication resources

The communication resources allocated by Scali MPI Connect are shared among the MPI
processes in the node.
Communication buffer adaptation: If the communication behaviour of the application is
known, explicitly providing buffer size settings to mpimon that match the requirements of
the application will in most cases improve performance.
Example: An application sending only 900-byte messages.
Set channel_inline_threshold to 964 (64 added for alignment) and increase the channel-size
significantly (32-128 k).
Set eager_size to 1k and eager_count high (16 or more).
If all messages can be buffered, the transporter-{size, count} can be set to low values to
reduce shared memory consumption.
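The arithmetic behind the 900-byte example can be sketched as follows. This is only an illustration: the function name suggest_settings is hypothetical, and rounding eager_size up to the next power of two is an assumption chosen because it reproduces the 1k value from the example; it is not a documented SMC rule.

```python
def suggest_settings(message_bytes, alignment_pad=64):
    """Derive illustrative buffer settings for a fixed message size.

    Hypothetical helper, not part of SMC or mpimon.
    """
    # Inline threshold: message size plus the 64 bytes added for alignment.
    inline_threshold = message_bytes + alignment_pad
    # Assumption: round the eager buffer up to the next power of two.
    eager_size = 1 << (message_bytes - 1).bit_length()
    return {
        "channel_inline_threshold": inline_threshold,
        "eager_size": eager_size,
        "eager_count": 16,  # "16 or more" in the example
    }


settings = suggest_settings(900)
print(settings)
```

For 900-byte messages this gives a channel_inline_threshold of 964 and an eager_size of 1024, matching the values in the example.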
How do I control shared memory usage?
By adjusting the SMC buffer sizes.
How do I calculate shared memory usage?
The buffer space required by a communication channel is approximately:
chunk-size = (2 * channel-entry-size * channel-entry-count)
           + (transporter-size * transporter-count)
           + (eager-size * eager-count)
           + 4096 (give or take a few bytes)
Total-usage = chunk-size * no-of-processes
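The estimate above can be evaluated with a few lines of Python. This is a minimal sketch: the function names and the sample buffer values are hypothetical and chosen only for illustration, and the Python parameter names are the hyphenated names from the formula written with underscores.

```python
def chunk_size(channel_entry_size, channel_entry_count,
               transporter_size, transporter_count,
               eager_size, eager_count, overhead=4096):
    """Approximate buffer space (bytes) needed by one communication channel."""
    return (2 * channel_entry_size * channel_entry_count
            + transporter_size * transporter_count
            + eager_size * eager_count
            + overhead)  # fixed overhead, give or take a few bytes


def total_usage(chunk, no_of_processes):
    """Approximate total shared memory usage (bytes)."""
    return chunk * no_of_processes


# Hypothetical sample values, not SMC defaults.
chunk = chunk_size(channel_entry_size=1024, channel_entry_count=32,
                   transporter_size=65536, transporter_count=4,
                   eager_size=1024, eager_count=16)
print(chunk, total_usage(chunk, 8))
```

With these sample values one channel needs about 348160 bytes, and 8 processes need about 2785280 bytes in total.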
5.1.1 Automatic buffer management
The pool-size is a limit for the total amount of shared memory. The automatic buffer size
computation is based on full connectivity, i.e. every process communicating with every other.
Given a total pool of memory dedicated to communication, each communication channel will be
restricted to use a partition of only (P = number of processes):
chunk = inter_pool_size / P
The automatic approach is to downsize all buffers associated with a communication channel
until it fits in its part of the pool. The automatic chunk size is calculated to wrap a complete
communication channel.
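The downsizing idea can be sketched as follows. The guide does not describe how SMC actually shrinks the buffers, so halving every buffer size per round is purely an assumption made for this illustration; the function names and sample values are hypothetical as well.

```python
def channel_bytes(b):
    """Approximate space (bytes) for one channel, per the chunk-size estimate."""
    return (2 * b["channel_entry_size"] * b["channel_entry_count"]
            + b["transporter_size"] * b["transporter_count"]
            + b["eager_size"] * b["eager_count"]
            + 4096)


def fit_to_pool(buffers, inter_pool_size, processes):
    """Shrink buffer sizes until one channel fits in its share of the pool."""
    chunk = inter_pool_size // processes  # chunk = inter_pool_size / P
    sized = dict(buffers)
    while channel_bytes(sized) > chunk:
        shrunk = False
        # Assumption: halve all size fields each round; the real policy may differ.
        for key in ("channel_entry_size", "transporter_size", "eager_size"):
            if sized[key] > 1:
                sized[key] //= 2
                shrunk = True
        if not shrunk:
            raise ValueError("pool too small for even the smallest buffers")
    return chunk, sized


# Hypothetical sample values: a 1 MiB pool shared by 8 processes.
requested = {"channel_entry_size": 1024, "channel_entry_count": 32,
             "transporter_size": 65536, "transporter_count": 4,
             "eager_size": 1024, "eager_count": 16}
chunk, sized = fit_to_pool(requested, 1024 * 1024, 8)
print(chunk, sized, channel_bytes(sized))
```

In this sample each channel may use 131072 bytes, and two rounds of halving bring the channel from 348160 bytes down to 90112 bytes.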