PAR Technologies V5 manual Copying files in parallel, Using ParaStation accounting, # UseMCast


# UseMCast

statement.

If multicast is enabled, the ParaStation daemons exchange status information using multicast messages. A Linux kernel supporting multicast is therefore required on all nodes of the cluster. This is usually not a problem, since the standard kernels of all common distributions are compiled with multicast support. If a customized kernel is used, multicast support must be enabled in the kernel configuration. To learn more about multicast, see the Multicast over TCP/IP HOWTO.

In addition, the network hardware has to support multicast packets. Since all modern Ethernet switches support multicast and the nodes of a cluster typically live in a private subnet, this should not be a problem. If the cluster nodes are connected through a gateway, the gateway has to be configured so that multicast packets sent by any node reach all other nodes of the cluster.

Using a gateway in order to link parts of a cluster is not a recommended configuration.

On nodes with more than one Ethernet interface, typically frontend or head nodes, or on systems where the default route does not point to the private cluster subnet, a proper route for the multicast traffic must be set up. This is done with the command

route add -net 224.0.0.0 netmask 240.0.0.0 dev ethX

where ethX should be replaced by the actual name of the interface connecting to all other nodes. To enable this route at system startup, a corresponding entry has to be added to /etc/route.conf or /etc/sysconfig/networks/routes, depending on the Linux distribution in use.
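The route above covers every destination whose first octet has the top four bits 1110, that is, the range 224.0.0.0 to 239.255.255.255 (224.0.0.0/4 in CIDR notation). The following is a minimal sketch of that netmask arithmetic. The helper name is_multicast is hypothetical and not part of ParaStation; it only illustrates which addresses the route matches.

```shell
#!/bin/sh
# is_multicast: hypothetical helper, not a ParaStation tool.
# Succeeds if the IPv4 address given as $1 is matched by the route
# "224.0.0.0 netmask 240.0.0.0".
is_multicast() {
    first_octet=${1%%.*}            # text before the first dot
    case $first_octet in
        ''|*[!0-9]*) return 2 ;;    # first octet is not a plain number
    esac
    # Netmask 240.0.0.0 compares only the top four bits of the first octet:
    # (octet AND 240) must equal 224.
    [ $(( first_octet & 240 )) -eq 224 ]
}

for addr in 224.0.0.1 239.255.255.255 192.168.1.10; do
    if is_multicast "$addr"; then
        echo "$addr: covered by the multicast route"
    else
        echo "$addr: not covered"
    fi
done
```

The check looks at the first octet only, because the remaining three octets of the netmask are zero and therefore do not take part in the comparison.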

5.17. Copying files in parallel

To copy large files to many or all nodes of a cluster at once, pscp is very handy. It overlaps storing data to disk with transferring data over the network and therefore scales very well with the number of nodes. Files of arbitrary size may be copied; even archives containing large lists of files may be created and unpacked on the fly.

pscp uses the ParaStation pscom library for data transfers, which automatically selects the most effective communication channel available. If required, the communication layer may be controlled using environment variables; refer to ps_environment(7) for details. The client process on each node is spawned using the ParaStation process management.

As pscp uses administrative ParaStation tasks to spawn the client processes, the user must be a member of the adminuser list, or the user's group must be a member of the admingroup list. By default, only root is a member of the adminuser list and is therefore allowed to use pscp. Refer to ParaStation5 User's Guide and psiadmin(8) for details.

For more details refer to ParaStation5 User's Guide and pscp(8).

5.18. Using ParaStation accounting

ParaStation may write accounting information about each finished job run on the cluster to /var/account/yyyymmdd, where yyyymmdd denotes the current accounting file in the form year, month and day.
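As an illustration of the yyyymmdd naming, the following sketch builds the path of the accounting file for a given day. The helper name acct_file_for is hypothetical and not part of ParaStation. It assumes GNU date (for the -d option) and assumes that the file name follows the local date of the node.

```shell
#!/bin/sh
# acct_file_for: hypothetical helper, not a ParaStation tool.
# Prints the accounting file path for the date given as $1,
# or for today if no argument is given.
acct_file_for() {
    if [ $# -gt 0 ]; then
        # GNU date: parse the given date and format it as yyyymmdd
        stamp=$(date -d "$1" +%Y%m%d) || return 1
    else
        stamp=$(date +%Y%m%d)
    fi
    printf '/var/account/%s\n' "$stamp"
}

acct_file_for               # path of today's accounting file
acct_file_for 2008-03-05    # path for a specific day
```

Such a helper may be convenient in scripts that archive or inspect the accounting file of a particular day.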

To enable accounting, the special hardware accounter must be set within the ParaStation configuration file for at least one node. On each node configured this way, an accounting daemon collects the information for all jobs within the cluster and stores it in the accounting file.


ParaStation5 Administrator's Guide
