PAR Technologies V5 manual HandleOldBins 0 1 nodes


Pattern   Name           Description
0x0001    RDP_LOG_CONN   Non-critical errors on connection loss
0x0002    RDP_LOG_INIT   Info from initialization (IP, FE, NFTS etc.)
0x0004    RDP_LOG_INTR   Interrupted syscalls
0x0008    RDP_LOG_DROP   Message dropping and resequencing
0x0010    RDP_LOG_CNTR   Control messages and state changes
0x0020    RDP_LOG_EXTD   Extended reliable error messages (on Linux)
0x0040    RDP_LOG_COMM   Sending and receiving of data (produces a huge amount of output)
0x0080    RDP_LOG_ACKS   Resending and acknowledging (produces a huge amount of output)

Table 3. RDP debug flags

mcastdebug mask [nodes]

Set the debugging mask of the MCast protocol within the ParaStation daemon psid(8) to mask on the selected node(s).

Unless you want to debug the MCast protocol (i.e., the protocol the daemons use to exchange alive messages, or pings), this parameter is of little use.
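Because a debug mask is a plain bit field, several flags are enabled at once by OR-ing their patterns, and an existing mask can be decoded by testing each bit. The following Python sketch illustrates this with the MCast flag values from the table below; the helpers build_mask and describe_mask are hypothetical and not part of ParaStation. The RDP flags in Table 3 combine in the same way.

```python
from __future__ import annotations

# MCast debug bit patterns, copied from the mcastdebug flag table.
# build_mask and describe_mask are hypothetical helpers, not a ParaStation API.
MCAST_FLAGS = {
    0x0001: "Info from initialization (IP etc.)",
    0x0002: "Interrupted syscalls",
    0x0004: "T_CLOSE and new pings",
    0x0008: "Every 5th missing ping",
    0x0010: "Every missing ping",
    0x0020: "Every received ping",
    0x0040: "Every sent ping",
}


def build_mask(*patterns: int) -> int:
    """Combine bit patterns by bit-wise disjunction (OR)."""
    mask = 0
    for pattern in patterns:
        mask |= pattern
    return mask


def describe_mask(mask: int, flags: dict[int, str]) -> list[str]:
    """Return the descriptions of all flags whose bit is set in mask."""
    return [desc for bit, desc in sorted(flags.items()) if mask & bit]


# Log initialization info plus every 5th missing ping.
mask = build_mask(0x0001, 0x0008)
print(f"0x{mask:04x}")  # 0x0009
print(describe_mask(mask, MCAST_FLAGS))
```

Decoding with `mask & bit` works for any combination because each pattern occupies a distinct bit, so no two flags can mask each other.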

Mask is the bit-wise disjunction of the following bit patterns:

Pattern   Name             Description
0x0001    MCAST_LOG_INIT   Info from initialization (IP etc.)
0x0002    MCAST_LOG_INTR   Interrupted syscalls
0x0004    MCAST_LOG_CONN   T_CLOSE and new pings
0x0008    MCAST_LOG_5MIS   Every 5th missing ping
0x0010    MCAST_LOG_MSNG   Every missing ping
0x0020    MCAST_LOG_RCVD   Every received ping
0x0040    MCAST_LOG_SENT   Every sent ping

Table 4. Multicast debug flags

freeOnSuspend [ 0 | 1 ] [nodes]

Switch the freeOnSuspend flag on or off on the selected nodes.

The freeOnSuspend flag controls how the resource management treats suspended jobs. There are two possible approaches: either the resources used by a suspended job are freed for other jobs (this is done if the flag is set to 1), or they are kept occupied so that they remain reserved exclusively for the job until it continues to run (this is the behavior as long as the flag has the value 0).

Since the master node does all the resource management within the cluster, only the value set on the master node actually determines the behavior.

This flag can be set persistently via the freeOnSuspend option within the ParaStation configuration file parastation.conf(5).
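The effect of the flag on resource accounting can be illustrated with a small model. The following Python sketch rests on simplifying assumptions that the manual does not state: a node's resources are reduced to a CPU count, and a job is described only by the number of CPUs it holds and whether it is suspended. Job and free_cpus are hypothetical names. Since only the master node's value matters, the function takes that single value.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record: CPUs held and suspension state."""
    cpus: int
    suspended: bool


def free_cpus(total_cpus: int, jobs: list[Job], free_on_suspend: bool) -> int:
    """CPUs available to new jobs under a simplified freeOnSuspend model.

    free_on_suspend stands for the value set on the master node; values
    set on other nodes do not influence the resource management.
    """
    used = 0
    for job in jobs:
        if job.suspended and free_on_suspend:
            continue  # flag = 1: a suspended job's CPUs are freed for other jobs
        used += job.cpus  # running job, or suspended job kept reserved (flag = 0)
    return total_cpus - used


jobs = [Job(cpus=2, suspended=False), Job(cpus=4, suspended=True)]
print(free_cpus(8, jobs, free_on_suspend=False))  # 2
print(free_cpus(8, jobs, free_on_suspend=True))   # 6
```

With the flag at 0 the suspended job's four CPUs stay reserved, so only two of eight are free; with the flag at 1 they are released and six are free.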

handleOldBins [ 0 | 1 ] [nodes]

Switch the compatibility flag for applications linked against version 4.0.x of ParaStation on or off on the selected nodes.

ParaStation5 Administrator's Guide
