7 Using LSF

The Load Sharing Facility (LSF) from Platform Computing Corporation is a batch system resource manager used on the HP XC system. LSF is included with HP XC, and is an integral part of the HP XC environment. On an HP XC system, a job is submitted to LSF, which places the job in a queue and allows it to run when the necessary resources become available. In addition to launching jobs, LSF provides extensive job management and information capabilities. LSF schedules, launches, controls, and tracks jobs that are submitted to it according to the policies established by the HP XC site administrator.

This chapter describes the functionality of LSF in an HP XC system, and discusses how to use some basic LSF commands to submit jobs, manage jobs, and access job information. The following topics are discussed:

Introduction to LSF in the HP XC Environment (Section 7.1)

Determining the LSF execution host (Section 7.2)

Determining available LSF resources (Section 7.3)

Submitting jobs to LSF (Section 7.4)

Getting information about LSF jobs (Section 7.5)

Working interactively within an LSF-HPC allocation (Section 7.6)

LSF Equivalents of SLURM options (Section 7.7)

For full information about LSF, refer to the standard LSF documentation set, which is described in the Related Information section of this manual. LSF manpages are also available online on the HP XC system.

7.1 Introduction to LSF in the HP XC Environment

This section introduces you to LSF in the HP XC environment. It provides an overview of how LSF works, and discusses some of the features of LSF on an HP XC system and how it differs from standard LSF. This section also contains an important discussion of how LSF and SLURM work together to provide the HP XC job management environment. A description of SLURM is provided in Chapter 6.

7.1.1 Overview of LSF

LSF is a batch system resource manager. In the HP XC environment, LSF manages just one resource: the total number of HP XC processors designated for batch processing. The HP XC system is based on dedicating processors to jobs, and LSF is implemented to use these processors as efficiently as possible.

As jobs are submitted to LSF, LSF places the jobs in queues and determines an overall priority for launching the jobs. When the required number of HP XC processors becomes available to launch the next job, LSF reserves them and launches the job on these processors. When a job completes, LSF returns job output, job information, and any errors.
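The queue-and-dispatch cycle described above can be pictured with a small sketch. The following Python is purely illustrative and is not part of LSF: the ToyScheduler class, its method names, and the "lower number runs first" priority convention are all hypothetical, and the priorities that a real LSF installation computes depend on policies set by the site administrator. The sketch only shows the general idea of holding jobs in a priority queue and launching the next job once enough processors are free.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                      # hypothetical: lower value is dispatched first
    seq: int                           # submission order, used to break priority ties
    name: str = field(compare=False)
    procs: int = field(compare=False)  # number of processors the job needs


class ToyScheduler:
    """Toy model of a batch queue over a single pool of processors."""

    def __init__(self, total_procs):
        self.free = total_procs  # processors not reserved by any job
        self.pending = []        # heap of waiting jobs, ordered by (priority, seq)
        self.running = {}        # job name -> processors reserved for it
        self._seq = 0

    def submit(self, name, procs, priority=0):
        """Queue a job, then try to launch whatever is at the head of the queue."""
        heapq.heappush(self.pending, Job(priority, self._seq, name, procs))
        self._seq += 1
        return self.dispatch()

    def dispatch(self):
        """Launch the next job repeatedly while enough processors are free."""
        started = []
        while self.pending and self.pending[0].procs <= self.free:
            job = heapq.heappop(self.pending)
            self.free -= job.procs           # reserve the processors
            self.running[job.name] = job.procs
            started.append(job.name)
        return started

    def finish(self, name):
        """Release a completed job's processors and dispatch again."""
        self.free += self.running.pop(name)
        return self.dispatch()


sched = ToyScheduler(total_procs=8)
print(sched.submit("a", procs=6))  # ['a']  (2 processors remain free)
print(sched.submit("b", procs=4))  # []     (b waits for 4 processors)
print(sched.submit("c", procs=2))  # []     (c waits behind b in this toy model)
print(sched.finish("a"))           # ['b', 'c']
```

In this toy model a job that would fit is still held behind the job at the head of the queue. That is a simplification chosen to keep the sketch short, not a statement about how LSF orders or backfills jobs.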

A standard LSF installation on an HP XC system would consist of LSF daemons running on every node and providing activity and resource information for each node. LSF-HPC for SLURM on an HP XC system consists of one node running LSF-HPC daemons, and these daemons communicate with SLURM for resource information about the other nodes. LSF-HPC consolidates this resource information into one "virtual" node. Thus LSF-HPC integrated with
