PAR Technologies V5 manual Introduction, History of ParaStation


Chapter 1. Introduction

1.1. What is ParaStation

ParaStation is an integrated cluster management and communication solution. It combines features found only in ParaStation with common techniques widely used in high performance computing to deliver an integrated, easy-to-use and reliable compute cluster environment.

Version 5 of ParaStation supports various communication technologies as the interconnect network. It comes with an optimized communication protocol for Ethernet that enables Gigabit Ethernet to play a new role in the market of high throughput, low latency communication. Besides InfiniBand and Myrinet, it also supports the upcoming 10G Ethernet networks.

Like previous versions, ParaStation5 includes an integrated cluster administration and management environment. Using communicating daemon processes on each cluster node, effective resource management and a single point of administration are implemented. This results in a single system image view of the cluster.

From the user's point of view, this cluster management makes the cluster easier and more effective to use. Important features such as load balancing, job control and input/output management, common on classical supercomputers but rarely found on compute clusters, are implemented by ParaStation and are thus now available on clusters as well.

1.2. The history of ParaStation

The foundations of the ParaStation software were laid in 1995, when the ParaStation communication hardware and software system was presented. It was developed at the chair of Professor Tichy at the computer science department of Karlsruhe University.

When in 1998 ParaStation2 was presented, it was a pure software project. The communication platform used then was Myrinet, a Gigabit interconnect developed by Myricom. The development of ParaStation2 still took place at the University of Karlsruhe.

ParaStation became commercial in 1999 when ParTec AG was founded. This spin-off from the University of Karlsruhe now owns all rights and patents connected with the ParaStation software. ParTec promotes the further development and improvement of the software. This includes support for a broader range of processor types, communication interconnects and operating systems.

Version 3 of the ParaStation software for Myrinet is a rewrite from scratch, now fully under the responsibility of ParTec. All the know-how and experience gained from the former versions of the software were incorporated into this version. It was presented in 2001 and was a major breakthrough with respect to throughput, latency and stability of the software. Nevertheless, it is constantly being enhanced with regard to performance, stability and usability.

In 2002 the ParaStation FE software was presented, opening the ParaStation software environment to Ethernet communication hardware. This first step towards independence from the underlying communication hardware brought the convenient ParaStation management facility to Beowulf clusters for the first time. Furthermore, the suboptimal communication performance for large packets obtained with the MPICH/P4 implementation of the MPI message passing interface, the de facto standard on Beowulf clusters, was improved to the limits that may be expected from the physical circumstances.

With ParaStation4, presented in 2003, the software became truly independent of the communication platform. With this version, even Gigabit Ethernet became a serious alternative as a cluster interconnect, thanks to the throughput and latency that could be achieved.

ParaStation5 Administrator's Guide

