Chapter 5

Tuning SMC to your application

Scali MPI Connect allows the user to exercise control over the communication mechanisms through adjustment of the thresholds that steer which mechanism to use for a particular message. This is one technique that can be used to improve performance of parallel applications on a cluster.

Forcing size parameters to mpimon is usually not necessary. This is only a means of optimising SMC to a particular application, based on knowledge of communication patterns. For unsafe MPI programs it may be necessary to adjust buffering to allow the program to complete.

5.1 Tuning communication resources

The communication resources allocated by Scali MPI Connect are shared among the MPI processes in the node.

Communication buffer adaptation: If the communication behaviour of the application is known, explicitly passing buffer size settings to mpimon that match the application's requirements will in most cases improve performance.

Example: An application that sends only 900-byte messages.

Set channel_inline_threshold to 964 (64 added for alignment) and increase the channel-size significantly (32-128 k).

Set eager_size to 1k and eager_count high (16 or more).

If all messages can be buffered, the transporter-{size, count} can be set to low values to reduce shared memory consumption.
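The arithmetic behind the 900-byte example can be sketched in Python. This is only an illustration: the 64-byte alignment allowance comes from the example above, while rounding the eager buffer up to the next multiple of 1 KiB is an assumption chosen to reproduce the "1k" figure, and the function names are hypothetical, not part of SMC or mpimon.

```python
ALIGNMENT_ALLOWANCE = 64  # bytes added for alignment, as in the example above
KIB = 1024


def inline_threshold_for(message_bytes: int) -> int:
    """Inline threshold that just admits messages of the given size."""
    return message_bytes + ALIGNMENT_ALLOWANCE


def eager_size_for(message_bytes: int) -> int:
    """Smallest multiple of 1 KiB that holds the message (assumed rounding rule)."""
    return -(-message_bytes // KIB) * KIB  # ceiling division, then scale back up


print(inline_threshold_for(900))  # 964
print(eager_size_for(900))        # 1024
```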

How do I control shared memory usage? By adjusting the SMC buffer sizes.

How do I calculate shared memory usage?

The buffer space required by a communication channel is approximately:

chunk-size = (2 * channel-entry-size * channel-entry-count)
           + (transporter-size * transporter-count)
           + (eager-size * eager-count)
           + 4096 (give or take a few bytes)

Total-usage = chunk-size * no-of-processes
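As a worked illustration, the estimate above can be written as a small Python helper. The field names mirror the parameters in the formula. The numeric values in the usage lines are hypothetical and are not SMC defaults, and the 4096-byte term is the approximate overhead quoted above.

```python
from dataclasses import dataclass

FIXED_OVERHEAD = 4096  # approximate per-channel overhead from the formula above


@dataclass
class ChannelBuffers:
    channel_entry_size: int
    channel_entry_count: int
    transporter_size: int
    transporter_count: int
    eager_size: int
    eager_count: int


def chunk_size(b: ChannelBuffers) -> int:
    """Approximate buffer space (bytes) required by one communication channel."""
    return (2 * b.channel_entry_size * b.channel_entry_count
            + b.transporter_size * b.transporter_count
            + b.eager_size * b.eager_count
            + FIXED_OVERHEAD)


def total_usage(b: ChannelBuffers, no_of_processes: int) -> int:
    """Approximate total usage: chunk size times the number of processes."""
    return chunk_size(b) * no_of_processes


# Hypothetical settings, chosen only to make the arithmetic easy to follow.
example = ChannelBuffers(
    channel_entry_size=1024, channel_entry_count=32,
    transporter_size=65536, transporter_count=4,
    eager_size=1024, eager_count=16,
)
print(chunk_size(example))      # 348160
print(total_usage(example, 8))  # 2785280
```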

5.1.1 Automatic buffer management

The pool-size is a limit on the total amount of shared memory. The automatic buffer size computation is based on full connectivity, i.e. every process communicating with every other. Given a total pool of memory dedicated to communication, each communication channel is restricted to a partition of only (P = number of processes):

chunk = inter_pool_size / P

The automatic approach is to downsize all buffers associated with a communication channel until it fits in its part of the pool. The automatic chunk size is calculated to wrap a complete communication channel.
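The idea can be sketched in Python as follows. The chunk budget follows the formula above. The downsizing rule (halving every buffer until the channel fits) is purely an assumption for illustration, since the text does not say how SMC scales the buffers, and the function names and byte values are hypothetical. Each entry in `sizes` is the total number of bytes for one buffer kind (size times count).

```python
FIXED_OVERHEAD = 4096  # approximate per-channel overhead, as in the usage formula


def chunk_budget(inter_pool_size: int, num_processes: int) -> int:
    """Share of the pool available to one communication channel."""
    return inter_pool_size // num_processes


def channel_bytes(sizes: dict) -> int:
    """Space needed by one channel; the channel buffer is counted twice."""
    return (2 * sizes["channel"] + sizes["transporter"]
            + sizes["eager"] + FIXED_OVERHEAD)


def downsize_to_fit(sizes: dict, budget: int) -> dict:
    """Halve all buffers until the channel fits (assumed rule, not SMC's)."""
    if budget < FIXED_OVERHEAD:
        raise ValueError("budget is smaller than the fixed overhead")
    fitted = dict(sizes)
    while channel_bytes(fitted) > budget:
        fitted = {name: size // 2 for name, size in fitted.items()}
    return fitted


budget = chunk_budget(1048576, 8)  # 1 MiB pool shared by 8 processes
requested = {"channel": 65536, "transporter": 262144, "eager": 16384}
fitted = downsize_to_fit(requested, budget)
print(budget)                 # 131072
print(fitted)                 # {'channel': 16384, 'transporter': 65536, 'eager': 4096}
print(channel_bytes(fitted))  # 106496
```

In this sketch the requested buffers need 413696 bytes, so two halving steps are required before the channel fits within its 131072-byte share.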

Scali MPI Connect Release 4.4 Users Guide
