PAR Technologies V5 Troubleshooting, Problem psiadmin returns error, Problem node shown as down


Chapter 6. Troubleshooting

This chapter provides hints for solving common problems seen while installing or using ParaStation5. Further help is available from <support@par-tec.com>.

6.1. Problem: psiadmin returns error

When starting up the ParaStation admin command psiadmin, an error is reported:

# psiadmin

PSC: PSC_startDaemon: connect() fails: Connection refused

Reason: the local ParaStation daemon could not be contacted. Verify that the psid(8) daemon is up and running. Check whether the daemon is known to xinetd(8):

# netstat -ant | grep 888

tcp        0      0 *:888        *:*        LISTEN

If no "listening" socket is reported, check that the ParaStation daemon is configured within the xinetd(8) configuration. Check the file /etc/xinetd.d/psidstarter.
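The listening-socket check described above can be scripted. The following is a minimal sketch, assuming netstat output in the layout shown above, with the local address in the fourth column and the socket state in the sixth. The function name psid_port_listening is illustrative and not part of ParaStation.

```shell
#!/bin/sh
# Hypothetical helper, not part of ParaStation: reads "netstat -ant" style
# output on stdin and succeeds if a TCP socket is listening on the given port.
psid_port_listening() {
    port="${1:-888}"
    awk -v port="$port" '
        # Column 4 is the local address, column 6 the socket state.
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, parts, ":")
            # Compare only the last colon-separated field (the port),
            # so that 8888 does not match 888.
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage: netstat -ant | psid_port_listening 888 || echo "no listener on port 888"
```

Unlike a plain grep for 888, this compares the port field exactly, so unrelated sockets such as port 8888 are not reported as a match.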

If this is ok, reload xinetd:

# kill -HUP <pid of xinetd>

If everything seems to be ok up to now, check for recent entries in the log file /var/log/messages. Be aware that the log destination can be changed using the LogDestination parameter in the config file parastation.conf. Look for lines like

Mar 24 17:19:12 pan psid[7361]: Starting ParaStation DAEMON

Mar 24 17:19:12 pan psid[7361]: Protocol Version 329

Mar 24 17:19:12 pan psid[7361]: (c) Cluster Competence Center GmbH

These lines indicate a normal startup of the psid. Other messages may indicate problems found by the psid, e.g. errors within the configuration file.

If the error message

Mar 24 17:19:12 pan psid[7361]: too many nodes.

is found, verify that the number of announced nodes is equal to or greater than the number of defined nodes. See NrOfNodes within the config file parastation.conf.
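The log inspection described above can be automated along the following lines. This is a minimal sketch, assuming the syslog line layout of the examples in this section. The function name psid_log_status and its three result strings are illustrative and not part of ParaStation.

```shell
#!/bin/sh
# Hypothetical helper, not part of ParaStation: summarizes psid entries
# found in syslog-style text read from stdin.
psid_log_status() {
    awk '
        /psid\[[0-9]+\]: too many nodes/              { toomany = 1 }
        /psid\[[0-9]+\]: Starting ParaStation DAEMON/ { started = 1 }
        END {
            # The configuration error takes precedence over a startup line.
            if (toomany)      print "too-many-nodes"
            else if (started) print "started"
            else              print "no-psid-entries"
        }
    '
}

# Usage: psid_log_status < /var/log/messages
```

If LogDestination has been changed in parastation.conf, feed the corresponding log file to the function instead of /var/log/messages.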

6.2. Problem: node shown as "down"

The node may currently be unavailable (shut down or crashed), or the network connection to this node may be down.

Try to ping this node. If ok, try to start up ParaStation. From another node, "add" this node:

psiadmin> add nodeid
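The two steps above (ping, then add) can be combined in a small script. This is a sketch only. It assumes that psiadmin accepts a single command through a -c option and that ping accepts -c 1 to send one probe. The function name revive_node and the PING and PSIADMIN variables are illustrative and not part of ParaStation.

```shell
#!/bin/sh
# Hypothetical helper, not part of ParaStation. PING and PSIADMIN may be
# overridden, e.g. PSIADMIN=/opt/parastation/bin/psiadmin.
PING="${PING:-ping}"
PSIADMIN="${PSIADMIN:-psiadmin}"

revive_node() {
    host="$1"
    nodeid="$2"
    # Step 1: send one probe; give up if the node does not answer.
    if ! $PING -c 1 "$host" >/dev/null 2>&1; then
        echo "unreachable"
        return 1
    fi
    # Step 2 (assumption: -c passes one command to psiadmin): add the node.
    if $PSIADMIN -c "add $nodeid" >/dev/null 2>&1; then
        echo "added"
    else
        echo "add-failed"
        return 2
    fi
}

# Usage, run from another node: revive_node node17 16
```

A result of "unreachable" points to a node or network problem rather than a ParaStation problem, so psiadmin is not invoked in that case.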

ParaStation5 Administrator's Guide
