<proc>: all (default), none, or MPI-process number(s).

-part <part>    Use nodes from partition <part>.
-q              Keep quiet, no mpimon printout.
-t              Test mode, no MPI program is started.
<params>        Parameters not recognized are passed on to mpimon.
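
For example, the -t option can be combined with the other options to verify a setup without actually launching the application. A minimal sketch, in which the partition name compute and the program name ./mytest are placeholders, and which assumes mpirun accepts the common -np option for the process count:

user% mpirun -t -part compute -np 4 ./mytest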

3.4 Suspending and resuming jobs

From time to time it is convenient to be able to suspend regular jobs running on a cluster in order to allow a critical, perhaps real-time, job to use the cluster. When Scali MPI Connect is used to run parallel applications, a job can be suspended to yield the cluster to another job by sending a SIGUSR1 or SIGTSTP signal to the mpimon process representing the job.

Assuming that the process identifier for this mpimon is <PID>, the user interface for this is:

user% kill -USR1 <PID>

or

user% kill -TSTP <PID>

Similarly, the suspended job can be resumed by sending it a SIGUSR2 or SIGCONT signal:

user% kill -USR2 <PID>

or

user% kill -CONT <PID>
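
As a minimal end-to-end sketch, assuming the job was started in the background from a Bourne-compatible shell and that the standard pgrep utility is available to look up the mpimon process identifier (the program name ./mytest and the node names are placeholders):

user% mpimon ./mytest -- node1 node2 &
user% kill -TSTP $(pgrep -n -u $USER mpimon)
user% kill -CONT $(pgrep -n -u $USER mpimon)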

3.5 Running with dynamic interconnect failover capabilities

If a runtime failure occurs on a high speed interconnect, ScaMPI has the ability to fail over to a secondary network device and continue running. This high availability feature is part of the Scali MPI Connect/HA product, which requires a separately priced license. Once this license is installed, you may enable the failover functionality by setting the environment variable SCAMPI_FAILOVER_MODE to 1, or by using the mpimon command line argument -failover_mode.
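
For example, to enable failover for a run, a minimal sketch (assuming a Bourne-compatible shell, with the program name and node list as placeholders):

user% export SCAMPI_FAILOVER_MODE=1
user% mpimon ./mytest -- node1 node2

or, equivalently, using the command line argument:

user% mpimon -failover_mode ./mytest -- node1 node2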

Currently, the Scali MPI Infiniband (ib0) and Myrinet (gm0) drivers, as well as all DAT-based drivers, are supported; SCI is not supported. Note also that the combination of failover and TFDR (section 3.6) is not supported in this version of Scali MPI Connect.

Some failures will not result in an explicit error value being propagated to Scali MPI. Scali MPI handles this by treating a lack of progress within a specified time as a failure. You may alter this time by setting the environment variable SCAMPI_FAILOVER_TIMEOUT to the desired number of seconds.
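
For instance, to treat 30 seconds without progress as an interconnect failure (the value 30 is an arbitrary example; a Bourne-compatible shell is assumed):

user% export SCAMPI_FAILOVER_TIMEOUT=30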

Failures will be logged using the standard syslog mechanism.

3.6 Running with TCP error detection - TFDR

Errors may occur when data is transferred from the network card to memory. When the TCP stack is offloaded to hardware, such errors may slip through as actual data errors. Using the wrapper script tfdrmpimon (Transmission Failure Detection and Retransmit), Scali MPI handles such errors by adding an extra checksum and retransmitting the data if an error occurs. This high availability feature is part of the Scali MPI Connect/HA product, which requires a separate license.
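
Since tfdrmpimon is described as a wrapper around mpimon, a typical invocation should mirror an ordinary mpimon run. A sketch, with the program name and node list as placeholders:

user% tfdrmpimon ./mytest -- node1 node2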

whereFor timing Using SMCs built-in CPU-usage functionality This produces the following reportScali MPI Connect Release 4.4 Users Guide Tuning communication resources Automatic buffer managementHow to optimize MPI performance BenchmarkingCaching the application program on the nodes First iteration is very slowCollective operations Memory consumption increase after warm-upFinding the best algorithm Programs in the ScaMPItst package Appendix aImage contrast enhancement Scali MPI Connect Release 4.4 Users Guide File format OriginalWhy does not my program start to run? When things do not work troubleshootingAppendix B Why does my program terminate abnormally? Why can I not start mpid?General problems Per node installation of Scali MPI Connect Appendix CInstall Scali MPI Connect for TCP/IP Install Scali MPI Connect for Direct EthernetInstall Scali MPI Connect for Myrinet ExampleInstall Scali MPI Connect for Infiniband Install Scali MPI Connect for SCIInstall and configure SCI management software License optionsTroubleshooting Network providers Uninstalling SMCScali kernel drivers Troubleshooting 3rdparty DAT providers Troubleshooting the GM providerScali MPI Connect Release 4.4 Users Guide Bracket expansion Appendix D Bracket expansion and groupingGrouping Scali MPI Connect Release 4.4 Users Guide Appendix E Related documentationScali MPI Connect Release 4.4 Users Guide List of figures Scali MPI Connect Release 4.4 Users Guide Index Transporter protocolSSP