1.4.2 Serial Applications

You can build and run serial applications under the HP XC development environment. A serial application is a command or application that does not use any form of parallelism.

Full details and examples of how to build, run, debug, and troubleshoot serial applications are provided in “Building Serial Applications”.

1.5 Run-Time Environment

This section describes LSF, SLURM, and HP-MPI, and how these components work together to provide the HP XC run-time environment. LSF handles job scheduling and workload management, while SLURM provides efficient, scalable resource management of the compute nodes.

HP XC also supports an alternative environment that runs standard LSF without any interaction with the SLURM resource manager.

1.5.1 SLURM

The Simple Linux Utility for Resource Management (SLURM) is a resource management system that is integrated into the HP XC system. SLURM is suitable for use on both large and small Linux clusters. It was developed by Lawrence Livermore National Laboratory and Linux NetworX. As a resource manager, SLURM allocates exclusive or unrestricted access to resources (application and compute nodes) so that users can perform work, and it provides a framework to start, execute, and monitor work (normally a parallel job) on the set of allocated nodes.

A SLURM system consists of two daemons, one configuration file, and a set of commands and APIs. The central controller daemon, slurmctld, maintains the global state and directs operations. A slurmd daemon runs on each compute node and responds to job-related requests, such as launching, signalling, and terminating jobs. End users and system software (such as LSF) communicate with SLURM through commands or APIs, for example to allocate resources, launch parallel jobs on allocated resources, and terminate running jobs.
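As a rough illustration of the command interface described above, the following Python sketch builds command lines for two standard SLURM commands: srun (launch a parallel job) and scancel (terminate a job). The helper names srun_command, scancel_command, and run_if_available are hypothetical and are not part of SLURM or HP XC. The sketch assumes only that srun accepts -n for the number of tasks and that scancel takes a job ID.

```python
import shutil
import subprocess


def srun_command(ntasks, program):
    """Build an argv list that launches `program` as `ntasks` parallel tasks."""
    if ntasks < 1:
        raise ValueError("ntasks must be at least 1")
    return ["srun", "-n", str(ntasks), *program]


def scancel_command(job_id):
    """Build an argv list that asks SLURM to terminate job `job_id`."""
    return ["scancel", str(job_id)]


def run_if_available(argv):
    """Run argv only if its command is on PATH; otherwise return None.

    This guard lets the sketch be tried on a machine without SLURM.
    """
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, capture_output=True, text=True, check=False)
```

For example, srun_command(4, ["hostname"]) yields the argument list for "srun -n 4 hostname". Passing that list to run_if_available executes it only on a system where srun is installed.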

SLURM groups compute nodes (the nodes where jobs are run) together into “partitions”. The HP XC system can have one or several partitions. When HP XC is installed, a single partition of compute nodes is created by default for LSF batch jobs. The system administrator has the option of creating additional partitions. For example, another partition could be created for interactive jobs.
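One way to inspect the partitions on a system is the SLURM sinfo command. The Python sketch below parses output of the form produced by sinfo -h -o "%P %D", where %P is the partition name (sinfo appends an asterisk to the default partition) and %D is a node count. The function name parse_partitions is hypothetical. The partition names in the usage note are illustrative assumptions, not names taken from an actual HP XC installation.

```python
def parse_partitions(sinfo_text):
    """Return ({partition_name: node_count}, default_partition_or_None).

    Expects lines of the form "<partition> <nodes>". Lines that do not
    have exactly two fields are skipped. Counts are summed in case a
    partition appears on more than one line.
    """
    counts = {}
    default = None
    for line in sinfo_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        name, nodes = fields
        if name.endswith("*"):
            # A trailing asterisk marks the default partition.
            name = name[:-1]
            default = name
        counts[name] = counts.get(name, 0) + int(nodes)
    return counts, default
```

Given the illustrative text "lsf* 8" followed by "interactive 2" on the next line, the function returns the node counts for both partitions and reports lsf as the default.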

1.5.2 Load Sharing Facility (LSF)

The Load Sharing Facility (LSF) from Platform Computing, Inc. is a batch system resource manager that has been integrated with SLURM for use on the HP XC system. LSF for SLURM is included with the HP XC System Software, and is an integral part of the HP XC environment. LSF interacts with SLURM to obtain and allocate available resources, and to launch and control all the jobs submitted to LSF. LSF accepts, queues, schedules, dispatches, and controls all the batch jobs that users submit, according to policies and configurations established by the HP XC site administrator. On an HP XC system, LSF for SLURM is installed and runs on one HP XC node, known as the LSF execution host.
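As a minimal sketch of job submission, the Python helper below assembles a command line for bsub, the standard LSF submission command. It uses only the bsub options -n (number of processors), -I (interactive job), and -q (queue name). The helper name bsub_command is hypothetical, and wrapping the job in srun in the usage note is an assumption about how a job submitted through LSF might be launched on SLURM-allocated nodes. Site policies and queue names will differ.

```python
def bsub_command(nprocs, job, interactive=False, queue=None):
    """Build an argv list that submits `job` to LSF on `nprocs` processors."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    argv = ["bsub", "-n", str(nprocs)]
    if interactive:
        argv.append("-I")      # run as an interactive batch job
    if queue is not None:
        argv += ["-q", queue]  # submit to a named queue
    return argv + list(job)
```

For example, bsub_command(4, ["srun", "hostname"], interactive=True) produces the argument list for "bsub -n 4 -I srun hostname".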

A complete description of LSF is provided in Chapter 10 “Using LSF”. In addition, for your convenience, the HP XC Documentation CD contains LSF manuals from Platform Computing.

1.5.3 Standard LSF

Standard LSF is also available on the HP XC system. Information on using standard LSF is provided in the LSF manuals from Platform Computing. For your convenience, the HP XC Documentation CD contains these manuals.
