Section: 5.4 Collective operations

5.3.2 Memory consumption increase after warm-up

Remember that group operations (MPI_Comm_{create, dup, split}) may involve creating new communication buffers. If this is a problem, decreasing chunk_size may help.
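As a plain-MPI sketch (not specific to Scali MPI Connect), the following shows one such group operation, MPI_Comm_split, together with the matching MPI_Comm_free; releasing communicators that are no longer needed lets the implementation reclaim any buffers it allocated for them:

/* comm_split_demo.c - sketch of communicator creation and release.  */
/* Each MPI_Comm_split creates a new communicator, for which the     */
/* implementation may allocate new communication buffers.            */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    MPI_Comm half;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Split the ranks into two groups (even and odd). */
    MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &half);

    /* ... collective work on 'half' ... */

    /* Release the communicator so its resources can be reclaimed. */
    MPI_Comm_free(&half);
    MPI_Finalize();
    return 0;
}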

5.4 Collective operations

A collective communication is a communication operation in which a group of processes works together to distribute or gather a set of one or more values. Scali MPI Connect uses a number of different approaches to implement collective operations, and through environment variables the user can control which algorithm the application uses.
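As a minimal illustration in standard MPI C (a sketch, independent of any Scali-specific settings), the following program performs one collective operation, MPI_Allreduce, in which every process contributes a value and every process receives the global sum:

/* allreduce_demo.c - minimal sketch of a collective operation. */
/* Every rank contributes its rank number; after MPI_Allreduce, */
/* all ranks hold the global sum in 'total'.                    */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, total;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Collective: all ranks in the communicator take part. */
    MPI_Allreduce(&rank, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("rank %d: sum of all ranks = %d\n", rank, total);
    MPI_Finalize();
    return 0;
}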

Consider the Integer Sort (IS) benchmark in NPB (the NAS Parallel Benchmarks). When running sixteen processes on 8 nodes over Gigabit Ethernet (mpimon -net smp,tcp bin/is.A.16.scampi -- r1 2 r2 2 r3 2 r4 2 r5 2 r6 2 r7 2 r8 2), the resulting performance is:

Mop/s total     = 34.05
Mop/s/process   =  2.13

Extracting the MPI profile of the run can be done as follows:

user% export SCAMPI_TRACE="-f arg;timing"
user% mpimon bin/is.A.16.scampi -- $ALL2 > trace.out

And running the output through scanalyze yields the following:

 

 

MPI Call          <128   128-1k   1-8k   8-32k   32-256k   256k-1M    >1M
MPI_Send          0.00     0.00   0.00    0.00      0.00      0.00   0.00
MPI_Irecv         0.00     0.00   0.00    0.00      0.00      0.00   0.00
MPI_Wait          0.69     0.00   0.00    0.00      0.00      0.00   0.00
MPI_Alltoall      0.14     0.00   0.00    0.00      0.00      0.00   0.00
MPI_Alltoallv    11.20     0.00   0.00    0.00      0.00      0.00   0.00
MPI_Reduce        1.04     0.00   0.00    0.00      0.00      0.00   0.00
MPI_Allreduce     0.00     0.00  15.63    0.00      0.00      0.00   0.00
MPI_Comm_size     0.00     0.00   0.00    0.00      0.00      0.00   0.00
MPI_Comm_rank     0.00     0.00   0.00    0.00      0.00      0.00   0.00
MPI_Keyval_free   0.00     0.00   0.00    0.00      0.00      0.00   0.00

MPI_Alltoallv consumes a high fraction of the total execution time. The communication time is the sum over all of the algorithms used, so the total timing may depend on more than one type of communication. If one operation, or a few, dominates the time consumption, these are promising candidates for tuning and optimization.

Note: The run-time-selectable algorithms and their valid values may vary between Scali MPI Connect release versions. To find out which algorithms are selectable at run time, and their valid values, set the environment variable SCAMPI_ALGORITHM and run an example application:

# SCAMPI_ALGORITHM=1 mpimon /opt/scali/examples/bin/hello -- localhost

This produces a listing of the different implementations of particular collective MPI calls. For each collective operation, a listing consisting of a number and a short description of the algorithm is produced, e.g., for MPI_Alltoallv() the following:

SCAMPI_ALLTOALLV_ALGORITHM alternatives
  0  pair0
  1  pair1
  2  pair2
  3  pair3
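To try one of the listed alternatives, set the corresponding variable when launching the application. For example, forcing algorithm 2 (pair2) for MPI_Alltoallv() in the IS run above (assuming $ALL2 still holds the node specification used earlier):

user% SCAMPI_ALLTOALLV_ALGORITHM=2 mpimon bin/is.A.16.scampi -- $ALL2

Whether a given alternative actually improves performance must be verified by re-running the benchmark and comparing the reported Mop/s figures.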
