
3.9.5 Unsafe MPI programs

Because of different buffering behavior, some programs may run with MPICH but not with SMC. Unsafe MPI programs may require resources that SMC does not always guarantee, and deadlock may occur (because SMC uses spin locks, a deadlock may appear to be a livelock). For more on writing portable MPI programs, see for example MPI: The Complete Reference: Volume 1, The MPI Core [2].

A typical example that will not work with SMC (for long messages):

    while (...) {
        MPI_Send(buf, cnt, dtype, partner, tag, comm);
        MPI_Recv(buf, cnt, dtype, MPI_ANY_SOURCE, MPI_ANY_TAG, comm, sts);
        doStuff();
    }

This code tries to use the same buffer for both sending and receiving. Such logic is found, for example, where processes form a ring and communicate with their neighbours. Unfortunately, written this way the code deadlocks. To make it work, either replace MPI_Send() with MPI_Isend() followed by MPI_Wait(), or replace the whole construction with MPI_Sendrecv() or MPI_Sendrecv_replace().
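As an illustration, a minimal sketch of one of the safe variants mentioned above, using MPI_Sendrecv_replace() with the same variable names as in the fragment (sts is assumed to be a pointer to an MPI_Status):

    while (...) {
        /* Exchange in a single call: the outgoing message is taken from buf
           and the incoming message overwrites buf, so no second buffer is
           needed and the exchange cannot deadlock on buffering. */
        MPI_Sendrecv_replace(buf, cnt, dtype,
                             partner, tag,                 /* destination and send tag */
                             MPI_ANY_SOURCE, MPI_ANY_TAG,  /* source and receive tag */
                             comm, sts);
        doStuff();
    }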

3.9.6 Name space pollution

The SMC library is written in C and all of its C names are prefixed with scampi_. Depending on the compiler used, problems may arise if the application's own C code uses the same scampi_ prefix. In addition, there are a few global variables that may cause problems. All of these functions and variables are listed in the include files mpi.h and mpif.h. Normally, these files are installed in /opt/scali/include.

Since SMC does not bind its OS routines to specific libraries, it is good programming practice to avoid using OS functions or standard C library functions as application function names. Naming routines or global variables send, recv, open, close, yield, internal_error, failure, service, or other OS-reserved names may result in unpredictable and undesirable behavior; see the hypothetical example below.
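As a hypothetical illustration (not taken from the guide), an application routine named send() clashes with the socket call of the same name, and depending on link order it may be picked up in place of the system routine:

    /* BAD: clashes with the OS routine send(2); which one is actually
       called depends on link order, so behavior is unpredictable. */
    int send(int value)
    {
        return value * 2;
    }

    /* Better: use an application-specific prefix. */
    int myapp_send(int value)
    {
        return value * 2;
    }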

3.10 Error and warning messages

3.10.1 User interface errors and warnings

User interface errors usually result from problems where the setup of the environment causes difficulties for mpimon when starting an MPI program. mpimon will not start before the environment is properly defined. These problems are usually easy to fix, for example by giving mpimon the correct location of the necessary executable, and the error message gives a straightforward indication of what to do. Therefore, only particularly troublesome user interface errors are listed here.

Using the -verbose option enables mpimon to print more detailed warnings.
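For example, assuming the basic mpimon invocation described earlier in this guide, with a hypothetical program and node names:

    mpimon -verbose ./hello -- node1 node2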

3.10.2 Fatal errors

When a fatal error occurs, SMC prints an error message before calling MPI_Abort() to shut down all MPI-processes.
