Chapter 2. Technical overview
This chapter gives a brief technical overview of ParaStation5 and explains the software modules that constitute it.
2.1. Runtime daemon
In order to enable ParaStation5 on a cluster, the ParaStation daemon psid(8) has to be installed on each cluster node. This daemon process implements various functions:
• Install and configure local communication devices and protocols, e.g. load the p4sock kernel module and set up proper routing information, if not already done at system startup.
• Queue parallel and serial tasks until the requested resources are available.
• Distribute processes onto the available cluster nodes.
• Start up and monitor processes on cluster nodes, and terminate and clean up processes upon request.
• Monitor the availability of other cluster nodes and send “I'm alive” messages.
• Handle input/output and signal forwarding.
• Service management commands from the administration tools.
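The queueing and distribution duties listed above can be illustrated with a minimal sketch. It is not the psid(8) implementation: the class TaskQueue, its methods, the strict first-in, first-out policy and the node names are all hypothetical and serve only to show the idea of holding a task back until enough process slots are free.

```python
from collections import deque


class TaskQueue:
    """Toy FIFO queue: a task starts once enough process slots are free."""

    def __init__(self, slots_per_node):
        # Hypothetical input: mapping of node name -> free process slots.
        self.free = dict(slots_per_node)
        self.pending = deque()

    def submit(self, name, nprocs):
        """Queue a task needing nprocs processes, then try to start tasks."""
        self.pending.append((name, nprocs))
        return self.schedule()

    def release(self, placement):
        """Give back the slots of a finished task, then try to start tasks."""
        for node in placement:
            self.free[node] += 1
        return self.schedule()

    def schedule(self):
        """Start queued tasks in FIFO order while the head of the queue fits."""
        started = {}
        while self.pending and self.pending[0][1] <= sum(self.free.values()):
            name, nprocs = self.pending.popleft()
            placement = []
            # Fill nodes in name order until all processes are placed.
            for node in sorted(self.free):
                while self.free[node] > 0 and len(placement) < nprocs:
                    self.free[node] -= 1
                    placement.append(node)
            started[name] = placement
        return started


queue = TaskQueue({"node01": 2, "node02": 2})
first = queue.submit("a", 3)            # fits: three of four slots are used
second = queue.submit("b", 2)           # must wait: only one slot is free
after_release = queue.release(first["a"])  # task "a" ends, task "b" starts
```

A real resource manager would typically also honour priorities, node selection rules and per-user limits; the sketch deliberately leaves these out.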
The daemon processes periodically send information about application processes, system load and other state to all other nodes within the cluster. Each daemon is thus able to monitor every other node. If alive messages from a node are absent, the daemon initiates proper actions, e.g. terminates a parallel task or marks this node as "no longer available". Likewise, if a previously unavailable node responds again, it is marked as "available" and will be used for upcoming parallel tasks. No intervention of the system administrator is required.
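The availability tracking described above can be sketched as follows. This is an illustration only, not the protocol used by psid(8): the class NodeMonitor, the timeout value and the explicit time arguments are assumptions made to keep the example self-contained.

```python
class NodeMonitor:
    """Toy tracker for node availability based on alive messages."""

    def __init__(self, nodes, timeout):
        self.timeout = timeout  # hypothetical: seconds of silence tolerated
        self.last_seen = {node: 0.0 for node in nodes}
        self.available = {node: True for node in nodes}

    def alive(self, node, now):
        """Record an alive message; a silent node that responds is usable again."""
        self.last_seen[node] = now
        self.available[node] = True

    def check(self, now):
        """Mark nodes whose alive messages are overdue; return the newly lost ones."""
        lost = []
        for node, seen in self.last_seen.items():
            if self.available[node] and now - seen > self.timeout:
                self.available[node] = False
                lost.append(node)
        return lost


monitor = NodeMonitor(["node01", "node02"], timeout=5.0)
monitor.alive("node01", now=1.0)
monitor.alive("node02", now=1.0)
monitor.alive("node01", now=6.0)   # node02 stays silent
lost = monitor.check(now=8.0)      # node02 exceeded the timeout
monitor.alive("node02", now=10.0)  # node02 responds again
```

The list returned by check() is the point where a daemon could terminate the parallel tasks that were using the lost nodes.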
2.2. Libraries
In addition, a couple of libraries providing communication and management functionality must be installed. All libraries are provided as static versions, which are linked to the application at compile time, or as shared (dynamic) versions, which are loaded at run time.
ParaStation5 comes with its own version of MPI, based on MPICH2. The MPI library provides standard MPICH2-compatible MPI functions. For communication purposes, it supports several communication paths in parallel, e.g. local communication using shared memory, TCP or p4sock, Ethernet using p4sock and TCP, InfiniBand using verbs, Myrinet using GM, or 10G Ethernet using DAPL. Thus, ParaStation5 is able to spawn parallel tasks across nodes connected by different communication networks. ParaStation will also make use of redundant interconnects if a failure is encountered during startup of a parallel task.
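How a path might be chosen for a pair of nodes, and how a redundant interconnect can take over after a startup failure, is sketched below. The function choose_path, the path names and the preference order are hypothetical and are not taken from ParaStation5; the real selection logic and its configuration may differ.

```python
# Hypothetical preference order, fastest first; not taken from ParaStation5.
PREFERENCE = ["shm", "infiniband", "p4sock", "tcp"]


def choose_path(a, b, interconnects, failed=frozenset()):
    """Pick a communication path between nodes a and b.

    interconnects maps a node name to the set of paths it supports;
    failed holds (node, path) pairs that did not come up during startup.
    Returns None if the two nodes share no usable path.
    """
    if a == b:
        return "shm"  # processes on the same node can share memory
    for path in PREFERENCE[1:]:
        usable = all(
            path in interconnects[n] and (n, path) not in failed
            for n in (a, b)
        )
        if usable:
            return path
    return None


interconnects = {
    "node01": {"infiniband", "tcp"},
    "node02": {"infiniband", "tcp"},
    "node03": {"tcp"},
}
fast = choose_path("node01", "node02", interconnects)
mixed = choose_path("node01", "node03", interconnects)
fallback = choose_path(
    "node01", "node02", interconnects, failed={("node02", "infiniband")}
)
```

Because the path is chosen per node pair, a single parallel task can span nodes that are attached to different networks.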
Different versions of the ParaStation MPI library are available, depending on the hardware architecture and compiler in use. For IA32, versions for the GNU, Intel and Portland Group compilers are available. For x86_64, versions for the GNU, Intel, Portland Group and PathScale EKO compiler suites are available. The versions support all available languages and language options for the selected compiler, e.g. Fortran, Fortran 90, C or C++. The different versions of the MPI library can be installed in parallel, so it is possible to compile and run applications using different compilers on the same node.
2.3. Kernel modules
Besides libraries enabling efficient communication and task management, ParaStation5 also provides a set of kernel modules:
ParaStation5 Administrator's Guide | 3 |