PAR Technologies V5 manual Resolve nodes, Restart nodes


hwstart [hw {hw | all}] [nodes]

Start the declared hardware on the selected nodes.

If a specific hardware type is given, ParaStation tries to start it on the selected nodes regardless of whether the parastation.conf configuration file declares this hardware for these nodes. If hw all is specified, or the hw option is omitted entirely, only the hardware types declared in the configuration file are started.

Starting or stopping a specific communication hardware affects only ParaStation's handling of that hardware. For example, stopping the ethernet hardware should not affect normal IP traffic running over that device.

hwstop [hw {hw | all}] [nodes]

Stop the declared hardware on the selected nodes.

If hw all is specified, or the hw option is omitted entirely, all hardware currently running on the selected nodes is stopped.

Starting or stopping a specific communication hardware affects only ParaStation's handling of that hardware. For example, stopping the ethernet hardware should not affect normal IP traffic running over that device.
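The selection rules for hwstart and hwstop can be summarized as a small model. This is a sketch under stated assumptions, not ParaStation code: the helper names hwstart_targets and hwstop_targets are hypothetical, declared stands for the hardware types listed for a node in parastation.conf, and running stands for the types currently started on that node.

```python
from typing import Optional, Set


def hwstart_targets(hw: Optional[str], declared: Set[str]) -> Set[str]:
    """Hypothetical model: hardware types hwstart tries to start on one node.

    hw is the value given after the 'hw' keyword (None if omitted);
    declared is assumed to be the set of types parastation.conf lists
    for this node.
    """
    if hw is None or hw == "all":
        # No hw option, or 'hw all': only the types declared in the config file.
        return set(declared)
    # A specific type is tried even if the config file does not declare it.
    return {hw}


def hwstop_targets(hw: Optional[str], running: Set[str]) -> Set[str]:
    """Hypothetical model: hardware types hwstop stops on one node.

    running is assumed to be the set of types currently started there.
    """
    if hw is None or hw == "all":
        # No hw option, or 'hw all': every hardware type currently running.
        return set(running)
    # Otherwise only the named type is addressed.
    return {hw}


declared = {"ethernet", "p4sock"}
print(sorted(hwstart_targets(None, declared)))      # ['ethernet', 'p4sock']
print(sorted(hwstart_targets("openib", declared)))  # ['openib']
```

The asymmetry the model captures is that hwstart with all falls back to the configuration file, while hwstop with all acts on whatever is actually running.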

resolve [nodes]

Resolves a list of ParaStation IDs to node names. nodes selects one or more ranges of nodes. It is either of the form s1[-e1]{,si[-ei]}*, where the s and e are positive numbers representing ParaStation IDs, or 'all'. Each comma-separated part of nodes denotes a range of nodes; if a range's '-e' part is missing, it represents a single node. In principle, nodes may contain an unlimited number of ranges.

If the value of nodes is 'all', all nodes of the ParaStation cluster are selected. If nodes is empty, the node range preselected via the 'range' command is used. The default preselected node range contains all nodes of the ParaStation cluster.

As an extension, nodes may also be a hostname that resolves to a valid ParaStation ID.
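The nodes syntax described above can be illustrated with a small parser. This is a hypothetical sketch, not psiadmin's implementation: parse_nodes is an illustrative name, it assumes ParaStation IDs are 0-based and smaller than num_nodes, it rejects ranges whose start exceeds their end, and it takes the hostname lookup as a caller-supplied function (resolve_host), since the actual resolution mechanism is not described here.

```python
import re
from typing import Callable, List, Optional

# One or more comma-separated ranges "s[-e]", i.e. the s1[-e1]{,si[-ei]}* form.
RANGE_LIST = re.compile(r"\d+(?:-\d+)?(?:,\d+(?:-\d+)?)*")


def parse_nodes(spec: str, num_nodes: int,
                preselected: Optional[List[int]] = None,
                resolve_host: Optional[Callable[[str], Optional[int]]] = None
                ) -> List[int]:
    """Expand a nodes argument into ParaStation IDs (0-based IDs assumed)."""
    spec = spec.strip()
    if not spec:
        # Empty: use the range preselected via 'range' (default: whole cluster).
        if preselected is not None:
            return list(preselected)
        return list(range(num_nodes))
    if spec == "all":
        return list(range(num_nodes))
    if RANGE_LIST.fullmatch(spec):
        ids: List[int] = []
        for part in spec.split(","):
            start, _, end = part.partition("-")
            s = int(start)
            e = int(end) if end else s  # missing '-e' part: a single node
            if s > e or e >= num_nodes:
                raise ValueError(f"invalid range {part!r}")
            ids.extend(range(s, e + 1))
        return ids
    # Extension: a hostname resolving to a valid ID (lookup is an assumption).
    node_id = resolve_host(spec) if resolve_host is not None else None
    if node_id is None or not 0 <= node_id < num_nodes:
        raise ValueError(f"cannot resolve {spec!r}")
    return [node_id]
```

With this sketch, parse_nodes("0-2,5", 8) yields [0, 1, 2, 5], and with a lookup table such as {"node17": 16}, parse_nodes("node17", 32, resolve_host={"node17": 16}.get) yields [16].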

reset [hw] [nodes]

Reset the ParaStation daemon on all selected node(s). As a consequence, all processes using the selected node(s) are killed!

If the hw option is given, the communication hardware is additionally brought into a known state. Executing reset hw is the same as using restart.

restart [nodes]

Restart the ParaStation system on all selected node(s). This includes re-initialization of the communication hardware. On the selected node(s), the ParaStation daemon processes are forced to reinitialize the ParaStation cluster. As a consequence, all processes using the selected node(s) are killed!

This is the same as using reset hw.

set {adminuser [+|-] {name | any} |
     admingroup [+|-] {name | any} |
     user [+|-] {name | any} |
     group [+|-] {name | any} |
     maxproc {num | any} |
     selecttime time |
     psiddebug mask |
     rdpdebug mask |
     mcastdebug mask |
     freeOnSuspend {0 | 1} |
     handleOldBins {0 | 1} |
     nodesSort {PROC | LOAD_1 | LOAD_5 | LOAD_15 | PROC+LOAD | NONE} |
     overbook {0 | 1} |
     starter {0 | 1} |
     runjobs {0 | 1} |
     rdpmaxretrans val |
     exclusive bool |
     pinprocs bool |
     bindmem bool |
     cpumap map |
     supplementaryGroups bool |
     statusBroadcasts num |
     rdpTimeout ms |
     deadLimit num |
     statusTimeout ms |
     rdpClosedTimeout ms |
     rdpResendTimeout ms |
     rdpMaxACKPend num} [nodes]

Modify various parameters of the ParaStation system.
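The set synopsis also fixes the admissible values for several parameters. The following sketch checks a value against a subset of them. check_set is a hypothetical helper, not part of psiadmin; parameter names are matched exactly as spelled in the synopsis (whether psiadmin itself is case-sensitive is not assumed), and treating time, ms, num and val arguments as non-negative integers is an assumption.

```python
from typing import Set

# Allowed keywords for nodesSort, taken from the synopsis.
NODES_SORT: Set[str] = {"PROC", "LOAD_1", "LOAD_5", "LOAD_15", "PROC+LOAD", "NONE"}

# Parameters the synopsis restricts to {0 | 1}.
FLAGS: Set[str] = {"freeOnSuspend", "handleOldBins", "overbook", "starter", "runjobs"}

# Parameters taking time/ms/num/val; assumed here to be non-negative integers.
NUMERIC: Set[str] = {"selecttime", "rdpmaxretrans", "statusBroadcasts",
                     "rdpTimeout", "deadLimit", "statusTimeout",
                     "rdpClosedTimeout", "rdpResendTimeout", "rdpMaxACKPend"}


def check_set(param: str, value: str) -> bool:
    """Hypothetical check of one 'set param value' pair against the synopsis."""
    if param in FLAGS:
        return value in {"0", "1"}
    if param == "nodesSort":
        return value in NODES_SORT
    if param == "maxproc":
        # maxproc {num | any}
        return value == "any" or value.isdigit()
    if param in NUMERIC:
        return value.isdigit()
    # adminuser, cpumap, the bool parameters etc. are not modelled in this sketch.
    raise KeyError(f"parameter not covered by this sketch: {param}")
```

For instance, check_set("nodesSort", "LOAD_5") is accepted, while check_set("overbook", "yes") is rejected because the synopsis only allows 0 or 1.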


ParaStation5 Administrator's Guide
