PAR Technologies V5 manual Multicasts

If an external queuing system is used, the environment variable PSI_NODES_SORT should be set to "none" so that ParaStation does not sort any predefined node list.
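As a minimal sketch, a job script for an external queuing system might set the variables as follows. The NODEFILE placeholder, its path, and the commented launch line are assumptions for illustration. PSI_HOSTFILE is the ParaStation variable this guide mentions for PBS PRO; using it by hand for other queuing systems is also an assumption.

```shell
#!/bin/sh
# Job-script fragment for an external queuing system (illustrative sketch).
# NODEFILE is a hypothetical placeholder for the node list file that
# your queuing system provides; replace the default path with a real one.
NODEFILE="${NODEFILE:-/path/to/nodefile}"

export PSI_NODES_SORT=none        # keep the node order chosen by the queuing system
export PSI_HOSTFILE="$NODEFILE"   # hand the predefined node list to ParaStation

echo "PSI_NODES_SORT=$PSI_NODES_SORT PSI_HOSTFILE=$PSI_HOSTFILE"
# mpirun -np 4 ./my_app           # placeholder launch line; adjust to your job
```

For the queuing systems covered in the following subsections, ParaStation sets these variables itself, so the fragment is only relevant where no automatic integration exists.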

ParaStation includes its own queuing facility. For more details, refer to Section 5.4, “Using the ParaStation5 queuing facility” and the ParaStation5 User's Guide.

5.15.1. Integration with PBS PRO

Parallel jobs started by PBS PRO using the ParaStation mpirun command are recognized automatically. Because PBS PRO defines the environment variable PBS_NODEFILE, ParaStation automatically sets PSI_HOSTFILE to the value of PBS_NODEFILE. The environment variable PSI_NODES_SORT is set to "none", so the predefined node list is not sorted. The tasks are spawned in the given order on the predefined list of nodes.

Therefore, ParaStation will use the (unsorted) hostfile supplied by PBS PRO to start up the parallel task.
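The effect of this automatic recognition can be pictured with the following sketch. ParaStation performs the mapping internally. The function apply_pbs_mapping is a hypothetical illustration and not part of ParaStation, and the node file path is an example value only.

```shell
#!/bin/sh
# Illustration only: mimics the variable mapping described above.
# apply_pbs_mapping is a hypothetical helper, not a ParaStation command.
apply_pbs_mapping() {
  if [ -n "${PBS_NODEFILE:-}" ]; then
    PSI_HOSTFILE="$PBS_NODEFILE"   # use the node file supplied by PBS PRO
    PSI_NODES_SORT="none"          # keep the order given in that file
    export PSI_HOSTFILE PSI_NODES_SORT
  fi
}

# Example value; inside a real PBS PRO job this variable is already set.
PBS_NODEFILE="${PBS_NODEFILE:-/tmp/example.nodefile}"
apply_pbs_mapping
echo "PSI_HOSTFILE=$PSI_HOSTFILE PSI_NODES_SORT=$PSI_NODES_SORT"
```

In a real job script none of this is needed; calling the ParaStation mpirun command inside the PBS PRO job is sufficient.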

5.15.2. Integration with OpenPBS

Refer to Section 5.15.1, “Integration with PBS PRO”.

5.15.3. Integration with Torque

Refer to Section 5.15.1, “Integration with PBS PRO”.

5.15.4. Integration with LSF

Similar to Section 5.15.1, “Integration with PBS PRO”, ParaStation also recognizes the variable LSB_HOSTS, provided by LSF. This variable holds a list of nodes for the parallel task. It is copied to the ParaStation variable PSI_HOSTS and is therefore used for starting up the task. The environment variable PSI_NODES_SORT is set to "none", so the predefined node list is not sorted. The tasks are spawned in the given order on the predefined list of nodes.
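To make "spawned in the given order" concrete, the sketch below prints the placement implied by an unsorted node list. It assumes that LSB_HOSTS is a space-separated list and that each entry receives one task. The helper show_placement and the node names are hypothetical and not part of ParaStation or LSF.

```shell
#!/bin/sh
# Illustration only: print the rank-to-node order implied by an unsorted list.
# show_placement is a hypothetical helper, not a ParaStation command.
show_placement() {
  rank=0
  for host in $1; do               # relies on word splitting of the list
    echo "rank $rank -> $host"
    rank=$((rank + 1))
  done
}

# Example value; under LSF this variable is set by the batch system.
LSB_HOSTS="node03 node01 node01 node02"
PSI_HOSTS="$LSB_HOSTS"             # the copy described in the text
show_placement "$PSI_HOSTS"
```

With the example list, rank 0 lands on node03 and rank 3 on node02; the list is neither sorted nor deduplicated.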

5.15.5. Integration with LoadLeveler

ParaStation recognizes the variable LOADL_PROCESSOR_LIST, provided by IBM LoadLeveler. This variable holds a list of nodes for the parallel task. It is copied to the ParaStation variable PSI_HOSTS and is therefore used for starting up the task. The environment variable PSI_NODES_SORT is set to "none", so the predefined node list is not sorted. The tasks are spawned in the given order on the predefined list of nodes.
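When a job does not start on the expected nodes, it can help to check which of the batch-system variables named in this section are actually present in the job environment. The following diagnostic sketch does only that. The helper report_batch_vars is hypothetical, and the order in which it prints the variables does not imply any precedence within ParaStation.

```shell
#!/bin/sh
# Diagnostic sketch: list which batch-system node variables are set.
# report_batch_vars is a hypothetical helper, not a ParaStation command.
report_batch_vars() {
  for name in PBS_NODEFILE LSB_HOSTS LOADL_PROCESSOR_LIST; do
    eval "value=\${$name:-}"       # indirect lookup; names are fixed above
    if [ -n "$value" ]; then
      echo "$name=$value"
    fi
  done
}

report_batch_vars
```

Run outside any batch job, the function typically prints nothing; inside a LoadLeveler job it should show LOADL_PROCESSOR_LIST with the node list.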

5.16. Multicasts

This version of ParaStation uses the ParaStation RDP protocol to exchange status information between the psid(8) daemons. Therefore, multicast functionality is no longer required. It is still possible to use multicasts, if requested.

To enable Multicast message exchange, edit parastation.conf and uncomment the

ParaStation5 Administrator's Guide
