5.3.2 Memory consumption increase after warm-up
Remember that group operations (MPI_Comm_{create, dup, split}) may involve the creation of new communication buffers. If this becomes a problem, decreasing chunk_size may help.
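As an illustration, the following minimal C sketch creates and later releases such a communicator; the split into even and odd ranks is only a placeholder for an application's real grouping:

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Comm subcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Creating a communicator (here by splitting MPI_COMM_WORLD into even
     * and odd ranks) may allocate new communication buffers. */
    MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &subcomm);

    /* ... communication on subcomm ... */

    /* Releasing the communicator when it is no longer needed returns
     * its resources. */
    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}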
5.4 Collective operations
A collective communication is an operation in which a group of processes cooperates to distribute or gather a set of one or more values. Scali MPI Connect uses a number of different algorithms to implement collective operations, and through environment variables the user can control which algorithm the application uses.
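As a minimal illustration (not taken from the benchmark discussed below), the following C program performs one collective operation, MPI_Allreduce, in which every process contributes a value and every process receives the result:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Every process contributes one value; every process receives the
     * global sum.  Which algorithm is used underneath is chosen by the
     * library and can be steered with environment variables. */
    MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of ranks 0..%d = %d\n", size - 1, sum);

    MPI_Finalize();
    return 0;
}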
Consider the Integer Sort (IS) benchmark from the NAS Parallel Benchmarks (NPB). Running it with ten processes on 5 nodes over Gigabit Ethernet (mpimon bin/is.A.16.scampi -- r1 2 r2 2 r3 2 r4 2 r5 2) gives the following performance:
Mop/s total     = 34.05
Mop/s/process   = 2.13
Extracting the MPI profile of the run can be done as follows:
user% export SCAMPI_TIMING=...
user% mpimon bin/is.A.16.scampi -- r1 2 r2 2 r3 2 r4 2 r5 2 > trace.out
Running the output through scanalyze yields the following:
MPI Call             <128     ...     ...     ...     ...     ...     >1M
MPI_Send             0.00    0.00    0.00    0.00    0.00    0.00    0.00
MPI_Irecv            0.00    0.00    0.00    0.00    0.00    0.00    0.00
MPI_Wait             0.69    0.00    0.00    0.00    0.00    0.00    0.00
MPI_Alltoall         0.14    0.00    0.00    0.00    0.00    0.00    0.00
MPI_Alltoallv       11.20    0.00    0.00    0.00    0.00    0.00    0.00
MPI_Reduce           1.04    0.00    0.00    0.00    0.00    0.00    0.00
MPI_Allreduce        0.00    0.00   15.63    0.00    0.00    0.00    0.00
MPI_Comm_size        0.00    0.00    0.00    0.00    0.00    0.00    0.00
MPI_Comm_rank        0.00    0.00    0.00    0.00    0.00    0.00    0.00
MPI_Keyval_free      0.00    0.00    0.00    0.00    0.00    0.00    0.00
MPI_Alltoallv accounts for a high fraction of the total execution time. The communication time is the sum over all the algorithms used, and the total timing may depend on more than one type of communication; but when one operation, or a few, dominates the time consumption, those operations are promising candidates for tuning and optimization.
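As a cross-check of such a profile, a dominant call can also be timed directly in the application with MPI_Wtime. The following sketch times MPI_Alltoallv; the exchange pattern (one integer to and from every peer) is a hypothetical placeholder for the application's real counts and displacements:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Hypothetical exchange: one int to and from every peer.  Replace the
     * counts and displacements with the application's real ones. */
    int *sendbuf = malloc(size * sizeof(int));
    int *recvbuf = malloc(size * sizeof(int));
    int *counts  = malloc(size * sizeof(int));
    int *displs  = malloc(size * sizeof(int));
    for (i = 0; i < size; i++) {
        sendbuf[i] = rank;
        counts[i]  = 1;
        displs[i]  = i;
    }

    t0 = MPI_Wtime();
    MPI_Alltoallv(sendbuf, counts, displs, MPI_INT,
                  recvbuf, counts, displs, MPI_INT, MPI_COMM_WORLD);
    t1 = MPI_Wtime();

    printf("rank %d: MPI_Alltoallv took %.6f seconds\n", rank, t1 - t0);

    free(sendbuf); free(recvbuf); free(counts); free(displs);
    MPI_Finalize();
    return 0;
}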
Note: The run-time selectable algorithms and their valid values may vary between Scali MPI Connect releases. For information on which algorithms are selectable at run time and their valid values, set the environment variable SCAMPI_ALGORITHM and run an example application:
# SCAMPI_ALGORITHM=1 mpimon /opt/scali/examples/bin/hello
This produces a listing of the different implementations of the collective MPI calls. For each collective operation a list is printed consisting of a number and a short description of each algorithm, e.g., for MPI_Alltoallv() the following:
SCAMPI_ALLTOALLV_ALGORITHM alternatives
  0 pair0
  1 pair1
  2 pair2
  3 pair3
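To try one of these alternatives, set the corresponding environment variable when launching the application, in the same way as SCAMPI_ALGORITHM above. For example, a rerun of the IS benchmark with algorithm 2 selected for MPI_Alltoallv (the choice of value 2 here is arbitrary):

user% SCAMPI_ALLTOALLV_ALGORITHM=2 mpimon bin/is.A.16.scampi -- r1 2 r2 2 r3 2 r4 2 r5 2

Comparing the reported Mop/s figures for the different values indicates which algorithm suits the application and the interconnect best.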