PAR Technologies V5 manual RdpMaxRetrans number, StatusBroadcasts number, RdpTimeout ms


rdpMaxRetrans number

Set the maximum number of retransmissions within the RDP facility. If more than this number of retransmissions would have been necessary to deliver a packet to the remote destination, the connection is declared to be down.

See also psiadmin(1).
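
The retransmission limit can be pictured with a small sketch. This is not the daemon's code: the class name `Connection`, its methods, and the choice to reset the counter on an acknowledgement are assumptions made purely for illustration.

```python
class Connection:
    """Hypothetical model of one RDP connection (illustration only)."""

    def __init__(self, rdp_max_retrans: int):
        self.rdp_max_retrans = rdp_max_retrans
        self.retransmissions = 0  # retransmissions of the current packet
        self.up = True

    def on_resend_due(self) -> bool:
        """Called when an unacknowledged packet is due for another resend.

        Returns True if the packet is resent. Returns False, and marks the
        connection as down, if one more retransmission would exceed the limit.
        """
        if self.retransmissions >= self.rdp_max_retrans:
            self.up = False
            return False
        self.retransmissions += 1
        return True

    def on_ack(self) -> None:
        # Assumption: an acknowledgement resets the retransmission counter.
        self.retransmissions = 0
```

With `rdp_max_retrans` set to 2 in this model, two resends are attempted; the third request marks the connection as down.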

statusBroadcasts number

Set the maximum number of status broadcasts per round. This limits the number of status broadcasts per status iteration. Too many broadcasts might exhaust the message buffers within RDP on huge clusters.

If more than this number of broadcasts are triggered during one status iteration, all further broadcasts in that iteration are ignored. The corresponding counter is reset at the start of the next status iteration.

A value of 0 completely suppresses the sending of status broadcasts. In this case, information on dead nodes is propagated only by sending ACTIVENODES messages upon receipt of too many wrong LOAD messages.

Only relevant if MCast is *not* used.

See also psiadmin(1).
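
The per-iteration limit behaves like a simple budget. The sketch below is only an illustration of that rule; the class `BroadcastBudget` and its method names are hypothetical and do not come from the daemon.

```python
class BroadcastBudget:
    """Hypothetical per-iteration budget for status broadcasts."""

    def __init__(self, status_broadcasts: int):
        self.limit = status_broadcasts
        self.sent = 0

    def start_iteration(self) -> None:
        # The counter is reset at the start of each status iteration.
        self.sent = 0

    def try_broadcast(self) -> bool:
        """Return True if a broadcast may be sent, False if it is ignored."""
        if self.sent >= self.limit:
            return False  # budget used up (always the case for a limit of 0)
        self.sent += 1
        return True
```

A limit of 0 never allows a broadcast in this model, which matches the suppression described above.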

rdpTimeout ms

The timeout of the timer registered by RDP, in milliseconds. Each time this timer elapses, handleTimeoutRDP() is called to handle all necessary resend activity. This parameter steers the load introduced by RDP. Within the daemon, all timeout timers have a lower limit of 100 msec; thus, the minimal value here is 100, too.
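
As a rough illustration of how this value relates to load, the sketch below checks a value against the 100 msec lower limit and computes how often the timer fires per second. The function names are hypothetical; how the daemon itself reacts to a value below the limit is not stated here, so the sketch simply rejects it.

```python
MIN_TIMER_MS = 100  # lower limit for all timeout timers, as stated above


def is_valid_timeout(ms: int) -> bool:
    """Return True if ms respects the documented lower limit."""
    return ms >= MIN_TIMER_MS


def timer_calls_per_second(ms: int) -> float:
    """Number of times per second a timer with period ms elapses."""
    if not is_valid_timeout(ms):
        # Assumption for this sketch only: too-small values are rejected.
        raise ValueError(f"timeout must be at least {MIN_TIMER_MS} ms")
    return 1000.0 / ms
```

At the minimal value of 100, the timer elapses ten times per second; larger values reduce the call rate proportionally.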

deadLimit number

Dead-limit of the RDP status module. After this number of consecutively missing RDP-pings the master declares the node to be dead.

Only relevant if MCast is *not* used.

statusTimeout ms

Timeout of the RDP status module. After this number of milliseconds, an RDP-ping is sent to the master daemon. Additionally, the master daemon checks for received ping messages. Within the daemon, all timeout timers have a lower limit of 100 msec; thus, the minimal value here is 100, too.

Only relevant if MCast is *not* used.
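
Taken together, deadLimit and statusTimeout determine how quickly a silent node is declared dead. The sketch below is a simplified model, not the daemon's logic: the function names are hypothetical, and it assumes the master performs exactly one check per statusTimeout period.

```python
from typing import Sequence


def trailing_misses(ping_seen: Sequence[bool]) -> int:
    """Count consecutively missing pings at the end of the check history."""
    count = 0
    for seen in reversed(ping_seen):
        if seen:
            break
        count += 1
    return count


def declared_dead(ping_seen: Sequence[bool], dead_limit: int) -> bool:
    """True once dead_limit consecutive checks found no ping."""
    return trailing_misses(ping_seen) >= dead_limit


def approx_detection_delay_ms(dead_limit: int, status_timeout_ms: int) -> int:
    # Assumption: one check per statusTimeout period, so this is only
    # an approximation of the time until a node is declared dead.
    return dead_limit * status_timeout_ms
```

Under these assumptions, lowering either parameter shortens the detection delay at the cost of more status traffic or a higher chance of declaring a slow node dead.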

rdpClosedTimeout ms

The closed timeout within the RDP facility, in milliseconds. After an RDP connection is closed, all messages from the corresponding partner are ignored for the duration of this timeout. Thus, reconnection is avoided during this period. This helps to handle packets still on the wire when a connection is closed.
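
The ignore window can be expressed as a single time comparison. The helper below is a hypothetical sketch; its name and arguments are not taken from the daemon.

```python
from typing import Optional


def ignore_message(now_ms: int, closed_at_ms: Optional[int],
                   rdp_closed_timeout_ms: int) -> bool:
    """True if a message from the partner falls into the closed window.

    closed_at_ms is the time the connection was closed, or None if the
    connection has not been closed.
    """
    if closed_at_ms is None:
        return False
    return now_ms - closed_at_ms < rdp_closed_timeout_ms
```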

rdpResendTimeout ms

The resend timeout within the RDP facility, in milliseconds. If a message is pending and not yet acknowledged, it is retransmitted to the remote host after this timeout.
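
A minimal sketch of the resend decision follows. It assumes, for illustration only, that pending messages are tracked as a mapping from a sequence number to the time of the last transmission; the function name and data layout are hypothetical.

```python
from typing import Dict, List


def due_for_resend(pending: Dict[int, int], now_ms: int,
                   rdp_resend_timeout_ms: int) -> List[int]:
    """Return the sequence numbers of unacknowledged messages to resend.

    pending maps a sequence number to the time (in ms) the message was
    last sent. A message is due once the resend timeout has passed.
    """
    return sorted(
        seq for seq, sent_at_ms in pending.items()
        if now_ms - sent_at_ms >= rdp_resend_timeout_ms
    )
```

In this model, a message sent at 1000 ms with a resend timeout of 300 ms becomes due at 1300 ms, provided no acknowledgement has removed it from the pending set.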

rdpMaxACKPend number

The maximum number of pending ACKs within the RDP facility. If this number of packets is received from a remote node consecutively without any retransmission, an explicit ACK is sent. Otherwise the


ParaStation5 Administrator's Guide
