PAR Technologies V5 manual Homogeneous user ID space, Single system view, Parallel shell tool



etc/passwd. Common authentication schemes such as NIS are therefore not required, which limits user management to the frontend nodes.

Authentication of users is restricted to login or frontend nodes and is outside of the scope of ParaStation.

5.10. Homogeneous user ID space

As explained in the previous section, ParaStation uses only user and group IDs for starting up remote processes. Therefore, all processes will have identical user and group IDs on all nodes.

A homogeneous user ID space is stretched across the entire cluster.

5.11. Single system view

The ParaStation administration tool collects and displays information from all or a selected subset of nodes in the cluster. Actions can be initiated on each node and will be automatically and transparently forwarded to the destination node(s), if necessary. From a management perspective, all the nodes are seen as a homogeneous system. Thus, the administrator will have a single system view of the cluster.

5.12. Parallel shell tool

ParaStation provides a parallel shell tool called psh, which runs commands on all or selected nodes of the cluster in parallel. The output of the individual commands is consolidated so that common parts and differences between nodes are easy to see.

psh may also be used to copy files to all nodes of the cluster in parallel.

This command is not intended for running interactive commands in parallel. It is meant to run a single task in parallel on all or a subset of nodes and to prepare the output so that the user can read it easily.
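The idea of presenting common parts and differences can be illustrated with a minimal sketch. It is not psh's actual implementation, and the function name `group_by_output`, the node names, and the sample outputs are hypothetical. It only shows one plausible way to collapse identical per-node output into groups so that outliers stand out.

```python
from collections import defaultdict


def group_by_output(results):
    """Group nodes by the output they produced.

    results: dict mapping node name -> command output (str).
    Returns a list of (sorted node names, output) tuples,
    largest group first so the common case precedes the outliers.
    """
    groups = defaultdict(list)
    for node, output in results.items():
        groups[output].append(node)
    return sorted(
        ((sorted(nodes), out) for out, nodes in groups.items()),
        key=lambda item: (-len(item[0]), item[0]),
    )


if __name__ == "__main__":
    # Hypothetical per-node outputs of one command run on three nodes.
    results = {
        "node01": "2.6.9-42",
        "node02": "2.6.9-42",
        "node03": "2.6.9-34",
    }
    for nodes, out in group_by_output(results):
        print(f"{', '.join(nodes)}: {out}")
```

Grouping by exact output is the simplest possible choice. A real tool may compare output line by line instead.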

5.13. Nodes and CPUs

Although ParaStation by default tries to use a dedicated CPU per compute process, there is currently no way to bind a process to a particular CPU. There is therefore no guarantee that each process will use its own CPU. However, due to the nature of parallel tasks, the operating system scheduler will typically place each process on its own CPU.

Care must be taken if the hardware is able to simulate virtual CPUs, e.g. Intel Xeon CPUs using Hyperthreading. The ParaStation daemon detects virtual CPUs and uses all the virtual CPUs found for placing processes. Detecting virtual CPUs requires that the kernel module cpuid is loaded prior to starting the ParaStation daemon. Use

# psiadmin -c "s hw"
Node    CPUs    Available Hardware
   0    4/ 2    ethernet p4sock
   1    4/ 2    ethernet p4sock

to show the number of virtual and physical CPUs per node.
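For scripted checks, this output can be parsed into per-node CPU counts. The following is a minimal sketch, not part of ParaStation. It assumes a column layout of node ID, a CPU field of the form "virtual/ physical", and a list of hardware types. Reading the first number as virtual and the second as physical is an inference from the description above. The helper name `parse_hw` and the embedded sample text are illustrative only.

```python
import re

# Sample text in the assumed layout: node, virtual/physical CPUs, hardware.
SAMPLE = """\
Node    CPUs    Available Hardware
   0    4/ 2    ethernet p4sock
   1    4/ 2    ethernet p4sock
"""

# One data row: node ID, "virtual/ physical", then hardware names.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s*/\s*(\d+)\s+(.*)$")


def parse_hw(text):
    """Return {node_id: (virtual, physical, [hardware types])}."""
    nodes = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header line or anything not shaped like a data row
        node, virt, phys, hardware = match.groups()
        nodes[int(node)] = (int(virt), int(phys), hardware.split())
    return nodes


if __name__ == "__main__":
    for node, (virt, phys, hw) in sorted(parse_hw(SAMPLE).items()):
        print(f"node {node}: {virt} virtual, {phys} physical, hw={hw}")
```

In practice the text would come from running the psiadmin command shown above and capturing its standard output. The sketch works on a fixed sample so that it stays self-contained.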

It is possible to spawn more processes on a node than it has physical or virtual CPUs ("overbooking"). See the ParaStation5 User's Guide for details.

ParaStation5 Administrator's Guide
