
5.3.2 Memory consumption increase after warm-up

Remember that group operations (MPI_Comm_{create, dup, split}) may involve creating new communication buffers. If this is a problem, decreasing chunk_size may help.
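As an illustration, the standard MPI way to let the library reclaim such buffers is to release communicators that are no longer needed with MPI_Comm_free(). The following is a minimal sketch in generic C/MPI, not specific to Scali MPI Connect; the communicator name 'half' is illustrative only:

  #include <mpi.h>

  int main(int argc, char *argv[])
  {
      int rank;
      MPI_Comm half;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* Splitting a communicator may allocate new communication
         buffers for the communicator it creates. */
      MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &half);

      /* ... communicate within 'half' ... */

      /* Release the communicator so its buffers can be reclaimed. */
      MPI_Comm_free(&half);

      MPI_Finalize();
      return 0;
  }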

5.4 Collective operations

A collective communication is an operation in which a group of processes works together to distribute or gather a set of one or more values. Scali MPI Connect uses a number of different approaches to implement collective operations, and through environment variables the user can control which algorithm the application uses.

Consider the Integer Sort (IS) benchmark in NPB (NAS Parallel Benchmarks). When running sixteen processes on 8 nodes over Gigabit Ethernet (mpimon -net smp,tcp bin/is.A.16.scampi -- r1 2 r2 2 r3 2 r4 2 r5 2 r6 2 r7 2 r8 2) the resulting performance is:

Mop/s total    = 34.05
Mop/s/process  =  2.13

Extracting the MPI profile of the run can be done as follows:

user% export SCAMPI_TRACE="-f arg;timing"
user% mpimon bin/is.A.16.scampi -- $ALL2 > trace.out

Running the output through scanalyze yields the following:

 

 

MPI Call           <128   128-1k    1-8k   8-32k   32-256k   256k-1M    >1M
MPI_Send           0.00     0.00    0.00    0.00      0.00      0.00   0.00
MPI_Irecv          0.00     0.00    0.00    0.00      0.00      0.00   0.00
MPI_Wait           0.69     0.00    0.00    0.00      0.00      0.00   0.00
MPI_Alltoall       0.14     0.00    0.00    0.00      0.00      0.00   0.00
MPI_Alltoallv     11.20     0.00    0.00    0.00      0.00      0.00   0.00
MPI_Reduce         1.04     0.00    0.00    0.00      0.00      0.00   0.00
MPI_Allreduce      0.00     0.00   15.63    0.00      0.00      0.00   0.00
MPI_Comm_size      0.00     0.00    0.00    0.00      0.00      0.00   0.00
MPI_Comm_rank      0.00     0.00    0.00    0.00      0.00      0.00   0.00
MPI_Keyval_free    0.00     0.00    0.00    0.00      0.00      0.00   0.00

MPI_Alltoallv accounts for a large fraction of the total execution time. The communication time is the sum over all the algorithms used, and the total timing may depend on more than one type of communication. If one operation, or a few, dominates the time consumption, those operations are promising candidates for tuning and optimization.

Note: The run-time-selectable algorithms and their valid values may vary between Scali MPI Connect release versions. To find out which algorithms are selectable at run time, and their valid values, set the environment variable SCAMPI_ALGORITHM and run an example application:

# SCAMPI_ALGORITHM=1 mpimon /opt/scali/examples/bin/hello -- localhost

This will produce a listing of the different implementations of each collective MPI call. For each collective operation, a listing consisting of a number and a short description of the algorithm is produced, e.g., for MPI_Alltoallv() the following:

SCAMPI_ALLTOALLV_ALGORITHM alternatives
  0  pair0
  1  pair1
  2  pair2
  3  pair3
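Once the alternatives are known, a specific algorithm can be selected for a run by setting the corresponding environment variable to one of the listed values before starting the application, for example (the choice of value 2, pair2, is arbitrary and for illustration only):

user% SCAMPI_ALLTOALLV_ALGORITHM=2 mpimon bin/is.A.16.scampi -- $ALL2

Re-running the benchmark with each alternative and comparing the resulting timings indicates which algorithm performs best for the application at hand.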
