Escali 4.4 manual Finding the best algorithm

Models: 4.4


Section: 5.4 Collective operations

 

4  pair4
5  pipe0
6  pipe1
   def
7  safe
8  smp

Looping through these alternatives shows how much the performance of IS varies:

algorithm 0: Mop/s total = 95.60
algorithm 1: Mop/s total = 78.37
algorithm 2: Mop/s total = 34.44
algorithm 3: Mop/s total = 61.77
algorithm 4: Mop/s total = 41.00
algorithm 5: Mop/s total = 49.14
algorithm 6: Mop/s total = 85.17
algorithm 7: Mop/s total = 60.22
algorithm 8: Mop/s total = 48.61

For this particular combination of Alltoallv algorithm and application (IS), the performance varies significantly: algorithm 0 comes close to doubling the performance of the default.
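To quantify the spread, a listing like the one above can be ranked with standard tools. The following is a minimal sketch, not part of Scali MPI Connect: the function name `rank_results` is hypothetical, and it assumes the results have been collected one per line in the form `algorithm N: Mop/s total = X`. It prints the algorithms from fastest to slowest, followed by the ratio between the best and the worst figure.

```shell
# Hypothetical helper (not a Scali tool): rank lines of the form
#   algorithm N: Mop/s total = X
# read from stdin, fastest first, then print the best/worst ratio.
rank_results() {
  awk -F'=' '/Mop\/s total/ {
      split($1, w, /[ :]+/)   # w[2] holds the algorithm number
      v = $2 + 0              # numeric Mop/s value
      printf "%s %.2f\n", w[2], v
    }' |
  LC_ALL=C sort -k2,2nr |
  awk 'NR == 1 { best = $2 }
       { print; worst = $2 }
       END { if (NR > 0 && worst > 0) printf "best/worst = %.2f\n", best / worst }'
}

# Usage: paste the listing into a here-document, e.g.
#   rank_results <<'EOF'
#   algorithm 0: Mop/s total = 95.60
#   algorithm 2: Mop/s total = 34.44
#   EOF
```

Sorting with `LC_ALL=C` keeps the decimal point interpretation independent of the user's locale.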

5.4.1 Finding the best algorithm

Consider the image processing example from Chapter 4, which contains four collective operations. The algorithm used by each of these can be tuned according to the following pattern:

user% for a in <range>; do
> SCAMPI_<MPI-function>_ALGORITHM=$a \
> mpimon <application> -- <nodes> > <application>.out.$a
> done

For example, the alternative algorithms for MPI_Reduce with two processes can be tried out as follows (assuming the Bourne Again Shell, bash):

user% for a in 0 1 2 3 4 5 6 7 8; do
> SCAMPI_REDUCE_ALGORITHM=$a \
> mpimon ./kollektive-8 ./uf256-8.pgm -- r1 r2
> done
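Since each of the collective operations is tuned the same way, the loop can be wrapped in a small function and reused per operation. This is a hedged sketch: `sweep_algorithm` is a hypothetical name, and it assumes only that the variable is named `SCAMPI_<MPI-function>_ALGORITHM` as in the pattern above. It passes the setting through `env`, which, like the `VAR=$a command` prefix form, sets the variable for that one command only, and it writes each run to `<prefix>.out.<a>`.

```shell
# Hypothetical helper (not a Scali tool): run a command once per algorithm
# number with SCAMPI_<FUNC>_ALGORITHM set for that run only.
#   $1    MPI function name as used in the variable, e.g. REDUCE
#   $2    space-separated list of algorithm numbers, e.g. "0 1 2"
#   $3    output prefix; run <a> is saved to <prefix>.out.<a>
#   rest  the command to run, e.g. mpimon ./app -- r1 r2
sweep_algorithm() {
  local func=$1 range=$2 prefix=$3 a
  shift 3
  for a in $range; do   # unquoted on purpose: split the list into words
    env "SCAMPI_${func}_ALGORITHM=$a" "$@" > "$prefix.out.$a"
  done
}

# Example, equivalent to the MPI_Reduce loop above but saving each run:
#   sweep_algorithm REDUCE "0 1 2 3 4 5 6 7 8" kollektive \
#     mpimon ./kollektive-8 ./uf256-8.pgm -- r1 r2
```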

Provided the application reports the timing of the relevant parts of the code, the best choice can then be made. Note, however, that when several collective operations are used in the same program, the chosen algorithms may interfere with one another. Also, the performance of each implementation depends on the interconnect.
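When the application prints a throughput figure, as IS does with its `Mop/s total` line, the comparison over the `<application>.out.<a>` files can be automated. The sketch below is hypothetical: `best_algorithm` is not part of Scali MPI Connect, and it assumes each output file contains one line with `Mop/s total`, then `=`, then a number. An application that reports elapsed times instead would need the comparison reversed, since there smaller is better.

```shell
# Hypothetical helper (not a Scali tool): scan <prefix>.out.<a> files and
# print the algorithm number <a> with the highest "Mop/s total" figure,
# followed by that figure.
best_algorithm() {
  local prefix=$1 best_a= best_v= f a v
  for f in "$prefix".out.*; do
    [ -e "$f" ] || continue            # the glob matched no files
    a=${f##*.}                         # algorithm number from the file suffix
    v=$(awk -F'=' '/Mop\/s total/ { gsub(/[ \t]/, "", $2); print $2; exit }' "$f")
    [ -n "$v" ] || continue            # skip runs that reported no result
    # Compare in awk, because the shell test command only compares integers.
    if [ -z "$best_v" ] || awk -v x="$v" -v y="$best_v" 'BEGIN { exit !(x + 0 > y + 0) }'; then
      best_a=$a best_v=$v
    fi
  done
  [ -n "$best_a" ] && echo "$best_a $best_v"
}
```

Keep in mind the caveats above: a choice that wins in isolation may not stay best once the other collective operations are tuned, and it may change with the interconnect.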

Scali MPI Connect Release 4.4 Users Guide
