PAR Technologies V5 manual Problem pssh fails, Problem psid does not startup, reports port in use

Page 35

Warning issued on task startup

This typically happens, if the frontend or head node is included as compute node and also acts as gateway for the compute nodes. The "external" address of the frontend is not known to the compute nodes.

Use the PSP_NETWORK environment variable to re-direct all traffic to the cluster-internal network. See ps_environment(5) and Section 5.8, “ControllingParaStation5 communication paths” for details.

6.7. Warning issued on task startup

While starting up a parallel task, the message

execClient: chdir(/usr/tmp/username/./.): No such file or \ directory

Will use user's home directory

---------------------------------------

is displayed.

The current directory /usr/tmp/username does not exist on one or more of the remote nodes. The user's home directory, defined by the environment variable HOME will be used instead.

Make sure that the directory /usr/tmp/username is accessible on each node or change your current directory to a globally accessible directory.

6.8. Problem: pssh fails

Problem: users other than root cannot run commands on remote nodes using the pssh command.

$ pssh -n 0 date

PSI: dospawn: spawn to node 0 failed: Permission denied

By default, only root may spawn processes which are not consuming CPUs. The command pssh uses this way to run a process on a remote node.

To allow other users to spawn this type of processes, add the user or group to the adminuser or admingroup list within ParaStation using the command

psiadmin> set adminuser +username psiadmin> set admingroup +group

or add appropriate adminuser and/or admingroup entries to the ParaStation configuration file parastation.conf.

6.9. Problem: psid does not startup, reports port in use

Problem: the psid terminates after startup reporting that the port 886 is in use.

By default, the psid uses the port 886 (UDP) for the RDP protocol (inter-daemon communication). If this port is already in use, the daemon refuses to start-up and terminates immediately.

ParaStation5 Administrator's Guide

31

Image 35
Contents Administrators Guide Info@par-tec.com ParaStation5 Administrators GuideTable of Contents Problem different groups of nodes are seen as up or down History of ParaStation IntroductionAbout this document Kernel modules Technical overviewRuntime daemon LibrariesLicense Hardware InstallationPrerequisites Kernel version Directory structureSoftware Mpi2, mpi2-intel, mpi2-pgi, mpi2-psc Installation via RPM packagesGetting the ParaStation5 RPM packages ManFile Version Installing the RPMsCompiling the ParaStation5 packages from source ParaStation entries Installing the documentationEtc/init.d/xinetd reload # rpm -Uv psmpi2.5.0.0-1.i586.rpm Installing MPIFurther steps # rpm -Uv psdoc-5.0.0-1.noarch.rpmUninstalling ParaStation5 ParaStation5 Administrators Guide Define Number of nodes ConfigurationConfiguration of the ParaStation system Copy templateHostname id HWType runJob starter accounter Enable optimized network drivers# /opt/parastation/bin/testconfig Testing the installation # /opt/parastation/bin/testnodes -np nodes # /opt/parastation/bin/psiadmin -s -c listParaStation5 pscom communication library Insight ParaStation5# cat /proc/sys/ps4/state/connections # echo 10 /proc/sys/ps4/state/ResendTimeoutDirectory /proc/sys/ps4/state Directory /proc/sys/ps4/local Controlling process placementExporting environment variables for a task Using the ParaStation5 queuing facilityUsing non-ParaStationapplications Export LDPRELOAD=/opt/parastation/lib64/libp4tcp.so Controlling ParaStation5 communication pathsExport PSPLIB=/opt/parastation/lib64/libpscomopenib.so Authentication within ParaStation5Pspshm or Pspsharedmem PSPP4S or PSPP4SOCKNodes and CPUs Homogeneous user ID spaceSingle system view Parallel shell toolPSIRARGPRE0=/some/path/env2tok Integration with AFSIntegrating external queuing systems Tok2envMulticasts Route add -net 224.0.0.0 netmask 240.0.0.0 dev ethX Using ParaStation accountingCopying files in parallel # UseMCastSpawning processes belonging to all groups Using ParaStation process pinningUsing memory binding Changing the default ports for psid8Port Problem node shown as down TroubleshootingProblem psiadmin returns error Problem cannot start process on frontend Problem cannot start parallel taskProblem bad performance Problem different groups of nodes are seen as up or downProblem psid does not startup, reports port in use Problem pssh failsProblem processes cannot access files on remote nodes Reference Pages ParaStation5 Administrators Guide Parameters InstallDir inst-dir , InstallationDir inst-dirParastation.conf DescriptionStatusscript SetupscriptStartscript StopscriptElan P4sockOpenib MvapiNrOfNodes num Accounter$GENERATE 1-96 node$0,2 $0 Node node17 16 HWType ethernet p4sock starter yes runJobs noMCastGroup group-num SelectTime timeDeadInterval num LogLevel numMemLock size Core sizeCPUTime time DataSize sizeProc CPUmap map Processes maxprocsRdpResendTimeout ms RdpTimeout msStatusTimeout ms RdpClosedTimeout msSee also ErrorsParaStation5 Administrators Guide Options PsiadminSynopsis Extended description Standard ErrorStandard Input Standard OutputAllproc cnt count ExitAll Load Count hw hwDown HardwareQuit RdpSummary max max Maxproc nodes Accounters nodesUser nodes Group nodesNodesSort nodes Master nodesFreeOnSuspend nodes HandleOldBins nodesRlrss nodes Cpumap nodesRdpResendTimeout nodes RdpTimeout nodesStatusTimeout nodes RdpClosedTimeout nodesRestart nodes Resolve nodesPsiddebug mask nodes Selecttime time nodesPattern Name Description HandleOldBins 0 1 nodes Rdpmaxretrans val nodes RdpResendTimeout ms nodes RdpTimeout ms nodesStatusTimeout ms nodes RdpClosedTimeout ms nodesVerbose FilesQuiet NormalPsid Logfile=file Configfile=fileDebug=level ? , --usage Show a help message TestconfigFilename NumParaStation5 Administrators Guide Map TestnodesNp num Cnt numParaStation5 Administrators Guide Testpse -npnum TestpseParaStation5 Administrators Guide ?,--help P4statSock NetParaStation5 Administrators Guide Delete P4tcpAdd ParaStation5 Administrators Guide Pattern Description PsaccounterVar/account/yyyymmdd Accounting files, one per day DumpcoreCoredir=dir ?, --helpPsaccview Ls,--ltotsum Lj,--ljobsLu,--ltotuser Lg,--ltotgroupEnd CpuweightAqtime CputimeInitialization file Mlisten ParaStation5 Administrators Guide Appendix A. Quick Installation Guide Testing # /opt/parastation/bin/psiadmin psiadmin add# chkconfig -a /etc/init.d/parastation Appendix B. ParaStation license Page Page Page Changes to the runtime environment Building and installing ParaStation5 packages# psiadmin -s Appendix C. Upgrading ParaStation4 to ParaStation5Page ARP GlossarySee ParaStation Logger To share a common address space within a node ParaStation5 Administrators Guide