ParaStation5 Administrator's Guide

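6.2. Problem: node shown as "down" ................................................ 29
6.3. Problem: cannot start parallel task .......................................... 30
6.4. Problem: bad performance ..................................................... 30
6.5. Problem: different groups of nodes are seen as up or down .................... 30
6.6. Problem: cannot start process on front end ................................... 30
6.7. Warning issued on task startup ............................................... 31
6.8. Problem: pssh fails .......................................................... 31
6.9. Problem: psid does not startup, reports port in use .......................... 31
6.10. Problem: processes cannot access files on remote nodes ...................... 32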

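I. Reference Pages ................................................................ 33
parastation.conf .................................................................. 35
psiadmin .......................................................................... 47
psid .............................................................................. 63
test_config ....................................................................... 65
test_nodes ........................................................................ 67
test_pse .......................................................................... 69
p4stat ............................................................................ 71
p4tcp ............................................................................. 73
psaccounter ....................................................................... 75
psaccview ......................................................................... 77
mlisten ........................................................................... 81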

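A. Quick Installation Guide ....................................................... 83
B. ParaStation license ............................................................ 85
C. Upgrading ParaStation4 to ParaStation5 ......................................... 89
C.1. Building and installing ParaStation5 packages ................................ 89
C.2. Changes to the runtime environment ........................................... 89
Glossary .......................................................................... 91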
