PAR Technologies V5 manual Controlling ParaStation5 communication paths



In order to run applications linked with one of those MPI libraries, ParaStation5 provides dedicated mpirun commands. The processes for these types of parallel tasks are spawned obeying all restrictions described in Section 5.3, “Controlling process placement”. Of course, the data transfer will be based on the communication channels supported by the particular MPI library. For MPICH using ch_p4 (TCP), ParaStation5 provides an alternative; see Section 5.7, “ParaStation5 TCP bypass”.

The command mpirun-ipath-ps for running programs linked with InfiniPath MPI is part of the psipath package. For details on how to obtain this package, please contact <support@par-tec.com>.

For more information refer to mpirun_chp4(8), mpirun_chgm(8), mpirun-ipath-ps(8), mpirun_openib(8) and mpirun_elan(8).

Using the ParaStation5 command mpiexec, any parallel application supporting the PMI protocol, which is part of the MPI2 standard, may be run using the ParaStation process environment. Therefore, many other MPI2 compatible MPI libraries are now supported by ParaStation5.

It is also possible to run serial applications, i.e. applications not parallelized with MPI, within ParaStation. ParaStation distinguishes between serial tasks, which allocate a dedicated CPU within the resource management system, and administrative tasks, which do not allocate a CPU. To execute a serial program, run mpiexec -n 1. To run an administrative task, use pssh or mpiexec -A -n 1.
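As a rough illustration of the difference between the two task types, the following shell sketch assembles the corresponding command lines. The helper name ps_cmdline and the programs ./my_serial_prog and hostname are placeholders invented for this example; only the mpiexec -n 1 and mpiexec -A -n 1 forms come from the text. The helper merely prints the command, so it can be inspected before anything is started on a cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of ParaStation): print the mpiexec command
# line for a serial or an administrative task. It only echoes the command;
# it does not run it.
ps_cmdline() {
    kind=$1
    shift
    case "$kind" in
        serial) echo "mpiexec -n 1 $*" ;;     # serial task: allocates a CPU
        admin)  echo "mpiexec -A -n 1 $*" ;;  # administrative task: no CPU
        *)      echo "usage: ps_cmdline serial|admin program [args...]" >&2
                return 1 ;;
    esac
}

# Placeholder programs, chosen only for illustration.
ps_cmdline serial ./my_serial_prog
ps_cmdline admin hostname
```

Printing instead of executing keeps the sketch usable on machines without ParaStation; on a real installation the printed line would be run directly.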

For more details on how to start up serial and parallel jobs, refer to mpiexec(8), pssh(8) and the ParaStation5 User's Guide.

5.7. ParaStation5 TCP bypass

ParaStation5 offers a feature called "TCP bypass", enabling applications based on TCP to use the efficient p4sock protocol. The data will be redirected within the kernel to the p4sock protocol. No modifications to the application are necessary!

To automatically configure the TCP bypass during ParaStation startup, insert a line like

Env PS_TCP FirstAddress-LastAddress

in the p4sock section of the configuration file parastation.conf, where FirstAddress and LastAddress are the first and last IP addresses for which the bypass should be configured.
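To make the expected form of this line concrete, the sketch below prints an Env PS_TCP entry for a given address pair. The helper names is_ipv4 and ps_tcp_line and the example addresses are assumptions made for this illustration and are not part of ParaStation. The check only verifies that each argument looks like a dotted IPv4 address; it does not verify that the first address precedes the last.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 has four dot-separated decimal
# fields, each in the range 0..255.
is_ipv4() {
    echo "$1" | awk -F. '
        NF != 4 { exit 1 }
        { for (i = 1; i <= 4; i++)
              if ($i !~ /^[0-9]+$/ || $i + 0 > 255) exit 1 }'
}

# Hypothetical helper: print an "Env PS_TCP" line for parastation.conf.
ps_tcp_line() {
    first=$1
    last=$2
    if is_ipv4 "$first" && is_ipv4 "$last"; then
        echo "Env PS_TCP ${first}-${last}"
    else
        echo "invalid IPv4 address: $first or $last" >&2
        return 1
    fi
}

# Example range; these addresses are placeholders.
ps_tcp_line 192.168.10.1 192.168.10.128
```

The printed line would then be copied by hand into the p4sock section; the sketch deliberately does not edit parastation.conf itself.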

To enable the bypass for a pair of processes, the library libp4tcp.so, located in the directory /opt/parastation/lib64, must be pre-loaded by both processes using:

export LD_PRELOAD=/opt/parastation/lib64/libp4tcp.so

For parallel and serial tasks launched by ParaStation, this environment variable is exported to all processes by default. Please refer to ps_environment(5).
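For a process started by hand rather than through ParaStation, it can be convenient to preload the library for a single command instead of exporting LD_PRELOAD for the whole shell session. The following is a minimal sketch under that assumption. The helper names p4tcp_preload_value and with_p4tcp and the override variable P4TCP_LIB are invented for this example; the library path is the one given in the text.

```shell
#!/bin/sh
# Library path as given in this section. P4TCP_LIB is a hypothetical
# override variable for installations that use a different location.
P4TCP_LIB=${P4TCP_LIB:-/opt/parastation/lib64/libp4tcp.so}

# Hypothetical helper: print the LD_PRELOAD value that places libp4tcp.so
# in front of an existing preload list (passed as $1, may be empty).
p4tcp_preload_value() {
    if [ -n "$1" ]; then
        echo "$P4TCP_LIB:$1"
    else
        echo "$P4TCP_LIB"
    fi
}

# Hypothetical helper: run one command with the bypass library preloaded,
# leaving the calling shell's environment unchanged.
with_p4tcp() {
    env LD_PRELOAD="$(p4tcp_preload_value "${LD_PRELOAD:-}")" "$@"
}

# Usage (not executed here; ./my_tcp_server is a placeholder):
#   with_p4tcp ./my_tcp_server
```

Since the bypass applies to a pair of processes, the peer process would need to be started the same way.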

It's not recommended to insert libp4tcp.so in the global preload configuration file /etc/ld.so.preload, as this may hang connections to daemon processes started up before the bypass was configured.
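A quick way to check whether a node carries such a global entry is sketched below. The function name preload_mentions_p4tcp is an assumption made for this example. It performs a plain text match, so it will also report lines that are commented out.

```shell
#!/bin/sh
# Hypothetical check: succeed if the given preload file mentions
# libp4tcp.so. The file defaults to /etc/ld.so.preload; another path can
# be passed, for example when testing.
preload_mentions_p4tcp() {
    file=${1:-/etc/ld.so.preload}
    [ -r "$file" ] && grep -q 'libp4tcp\.so' "$file"
}

if preload_mentions_p4tcp; then
    echo "warning: libp4tcp.so is listed in /etc/ld.so.preload" >&2
fi
```

A missing or unreadable file is treated the same as a file without the entry, so the check stays silent on nodes that have no global preload configuration.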

See also p4tcp(8).

5.8. Controlling ParaStation5 communication paths

ParaStation uses different communication paths; see Section 5.1, “ParaStation5 pscom communication library” for details. To restrict the paths that are used, ParaStation recognizes a number of environment variables.

ParaStation5 Administrator's Guide
