PAR Technologies V5 manual

to store it to a given address. The rest of the job is done by this controller
without placing further load on the CPU.

This relieves the CPU of work that is not its primary task and thus leaves
more processing power for the actual application.

Forwarder

See ParaStation Forwarder.

Logger

See ParaStation Logger.

Master Node

The evaluation of temporary node lists while spawning new tasks is performed
by only one particular psid(8) within the cluster. The node running this
daemon is called the master node.

The master node is selected dynamically within the cluster and may change
if the current master node is no longer available. The election is based
on the node IDs; refer to parastation.conf(5).
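
The election described above can be illustrated with a minimal sketch. The guide only states that the election is based on node IDs; the rule used here (the lowest ID among the nodes currently up wins) and the names `elect_master` and `node_is_up` are assumptions made for illustration, not the documented behaviour of psid(8).

```python
from typing import Dict


def elect_master(node_is_up: Dict[int, bool]) -> int:
    """Pick a master node from the nodes that are currently up.

    Assumption (not stated in the guide): the available node with the
    lowest node ID becomes the master.
    """
    candidates = [node_id for node_id, up in node_is_up.items() if up]
    if not candidates:
        raise RuntimeError("no node available to act as master")
    return min(candidates)


# Hypothetical three-node cluster: node 0 is master until it goes down.
nodes = {0: True, 1: True, 2: True}
first_master = elect_master(nodes)
nodes[0] = False
second_master = elect_master(nodes)
```

Re-running the selection whenever the node status changes mirrors the statement that the master may change once the current master is no longer available.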

Network Interface Card

The physical device which connects a computer to a network. Examples
are Ethernet cards (nowadays often integrated on the mainboard) and
Myrinet cards.

NIC

See Network Interface Card.

Non-Uniform memory access (NUMA)

Non-Uniform memory access describes the fact that for some
multiprocessor designs the access time to memory depends on the
location of that memory. Within these designs, the memory is typically closely
attached to a CPU. CPUs can access memory attached to other CPUs
through additional logic, which induces additional latency. Therefore the access
time for different memory addresses may vary.
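
A toy model may help to make this concrete. The function name `access_latency_ns` and both latency figures below are arbitrary placeholders chosen for illustration, not measurements of any real hardware; the only point is that an access to memory attached to another CPU pays an extra cost.

```python
def access_latency_ns(cpu_node: int, memory_node: int,
                      local_ns: float = 100.0,
                      remote_extra_ns: float = 50.0) -> float:
    """Toy NUMA model: remote memory costs more than local memory.

    local_ns and remote_extra_ns are made-up placeholder values.
    """
    if cpu_node == memory_node:
        # Memory is attached to the accessing CPU.
        return local_ns
    # Memory is attached to another CPU: add the cost of the extra logic.
    return local_ns + remote_extra_ns


local = access_latency_ns(cpu_node=0, memory_node=0)
remote = access_latency_ns(cpu_node=0, memory_node=1)
```

In this model the same CPU sees two different access times depending on where the addressed memory lives, which is the effect the definition describes.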

Parallel Task

A set of processes distributed within the cluster that together form an instance
of a parallel application. E.g. an MPI program running on several nodes
of a cluster can only act as a whole but consists of individual processes
on each node. ParaStation knows about their relationship and can handle
them as a distributed parallel task running on the cluster.

Sometimes also referred to as a job.

ParaStation Logger

The counterpart to the ParaStation Forwarder. This process receives all
output collected by the forwarder processes and sends it to the final
destination, stdout or stderr. Furthermore, input to the ParaStation task is
forwarded to a specific process.

The first process of the task to be started usually converts to the logger
process after spawning all the other processes of the parallel task.

ParaStation Forwarder

Collects output written by ParaStation controlled processes to stdout or
stderr and sends it to the ParaStation Logger.

Furthermore, the forwarder controls the process and sends information
about its exit status to the local daemon.
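
The division of labour between forwarder and logger can be sketched as follows. This is only an illustration of the data flow described in the two entries; the classes `Message`, `Logger` and `Forwarder`, the `rank` field and the in-process method call standing in for the network transport are all assumptions, not the actual ParaStation implementation.

```python
import io
from dataclasses import dataclass


@dataclass
class Message:
    rank: int    # hypothetical identifier of the producing process
    stream: str  # "stdout" or "stderr"
    data: str


class Logger:
    """Receives output from all forwarders and writes it to its final destination."""

    def __init__(self, out, err):
        self.sinks = {"stdout": out, "stderr": err}

    def receive(self, msg: Message) -> None:
        self.sinks[msg.stream].write(msg.data)


class Forwarder:
    """Collects the output of one controlled process and passes it to the logger."""

    def __init__(self, rank: int, logger: Logger):
        self.rank = rank
        self.logger = logger

    def collect(self, stream: str, data: str) -> None:
        # A direct call replaces the real transport in this sketch.
        self.logger.receive(Message(self.rank, stream, data))


# Two forwarders feed one logger; StringIO objects stand in for stdout/stderr.
out, err = io.StringIO(), io.StringIO()
logger = Logger(out, err)
Forwarder(0, logger).collect("stdout", "hello from rank 0\n")
Forwarder(1, logger).collect("stderr", "warning from rank 1\n")
```

Passing `sys.stdout` and `sys.stderr` instead of the `StringIO` objects would print the collected output on the terminal of the process acting as logger.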

PMI

Process Manager Interface: a protocol that standardizes the startup of the
tasks of a parallel job. Implemented in mpd and the ParaStation5 psid.

Process

The atomic part of a Parallel Task. A process is first of all a standard Unix
process. Since ParaStation knows about its membership in a parallel task,
it can be handled in a special way if an event takes place on some other
node (e.g. another process of the task dies unexpectedly, a signal is sent
to the task, etc.).


ParaStation5 Administrator's Guide
