HP XC System 3.x Software Introduction to LSF-HPC in the HP XC Environment, Overview of LSF-HPC



job management and information capabilities. LSF-HPC schedules, launches, controls, and tracks jobs that are submitted to it according to the policies established by the HP XC site administrator.

This section describes the functionality of LSF-HPC in an HP XC system, and discusses how to use some basic LSF commands to submit jobs, manage jobs, and access job information. The following topics are discussed:

Introduction to LSF-HPC in the HP XC Environment (page 68)

Determining the LSF Execution Host (page 75)

Determining Available LSF-HPC System Resources (page 75)

Submitting Jobs (page 77)

Getting Information About Jobs (page 80)

Translating SLURM and LSF-HPC JOBIDs (page 83)

Working Interactively Within an LSF-HPC Allocation (page 84)

LSF-HPC Equivalents of SLURM srun Options (page 86)

Introduction to LSF-HPC in the HP XC Environment

This section introduces LSF-HPC in the HP XC environment. It provides an overview of how LSF-HPC works, and discusses some of the features of LSF-HPC on an HP XC system and how it differs from standard LSF. This section also contains an important discussion of how LSF-HPC and SLURM work together to provide the HP XC job management environment. A description of SLURM is provided in Chapter 8, "Using SLURM".

Overview of LSF-HPC

LSF-HPC was integrated with SLURM for the HP XC system to merge the scalable and efficient resource management of SLURM with the extensive scheduling capabilities of LSF. In this integration, SLURM manages the compute resources while LSF-HPC performs the job management. SLURM extends the parallel capabilities of LSF with its own fast parallel launcher (which is integrated with HP-MPI), full parallel I/O and signal support, and parallel job accounting capabilities. Because SLURM manages the compute resources of the HP XC system, the LSF daemons run on only one HP XC node and can present the HP XC system as a single LSF host.

LSF-HPC interacts with SLURM to obtain resource information about the HP XC system. This information is consolidated, and key values such as the total number of cores and the maximum memory available on all nodes become the characteristics of the single HP XC “LSF Execution Host”. Additional resource information from SLURM, such as pre-configured node “features”, is noted and processed during scheduling through the external SLURM scheduler for LSF-HPC.
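The consolidation described above can be pictured with a small Python sketch. It is an illustration only, not the LSF-HPC implementation: the NodeInfo and ExecutionHost types, the node names, and the sample values are all hypothetical. The sketch also assumes that "the maximum memory available on all nodes" means the largest amount that every node can supply, that is, the minimum across nodes; confirm this reading against the output of your own system before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical per-node record standing in for what SLURM reports."""
    name: str
    cores: int
    memory_mb: int


@dataclass(frozen=True)
class ExecutionHost:
    """Hypothetical summary of the single LSF execution host."""
    total_cores: int
    max_memory_mb: int


def consolidate(nodes):
    """Fold per-node data into one host-level summary."""
    if not nodes:
        raise ValueError("at least one node is required")
    return ExecutionHost(
        # Cores are pooled: the single host advertises the sum.
        total_cores=sum(n.cores for n in nodes),
        # Assumption: memory "available on all nodes" is the amount every
        # node can supply, so the minimum across nodes is used here.
        max_memory_mb=min(n.memory_mb for n in nodes),
    )


# Made-up sample data for three compute nodes.
nodes = [
    NodeInfo("n1", cores=4, memory_mb=8192),
    NodeInfo("n2", cores=4, memory_mb=4096),
    NodeInfo("n3", cores=2, memory_mb=4096),
]
host = consolidate(nodes)
print(host)
```

The point of the sketch is that a job submitter sees one host with pooled cores, while per-node details such as features stay with SLURM and are consulted only at scheduling time.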

Integrating LSF-HPC with SLURM on HP XC systems provides you with a parallel launch command to distribute and manage parallel tasks efficiently. The SLURM srun command offers considerable flexibility in specifying resource requirements across an HP XC system; for example, you can:

Request contiguous nodes

Execute only one task per node

Request nodes with specific features

This flexibility is preserved in LSF-HPC through the external SLURM scheduler; this is discussed in more detail in the section titled "HP XC Compute Node Resource Support".
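As a rough illustration of how the three requests listed above might be passed through the external SLURM scheduler, the following Python sketch builds bsub command lines without executing them. The helper functions are hypothetical. The -ext "SLURM[...]" form and the nodes, contiguous, and constraint keys are assumptions drawn from the HP XC documentation as recalled here, so verify them against the section referenced above and the bsub manpage. The feature name dualcore is made up.

```python
import shlex


def slurm_ext(nodes=None, contiguous=False, constraint=None):
    """Build the external scheduler string, for example SLURM[nodes=4]."""
    parts = []
    if nodes is not None:
        parts.append(f"nodes={nodes}")
    if contiguous:
        parts.append("contiguous=yes")
    if constraint:
        parts.append(f"constraint={constraint}")
    return "SLURM[" + ";".join(parts) + "]"


def bsub_command(cores, job, **ext_options):
    """Return an argument list for bsub; nothing is executed here."""
    argv = ["bsub", "-n", str(cores)]
    if ext_options:
        argv += ["-ext", slurm_ext(**ext_options)]
    return argv + list(job)


job = ["srun", "hostname"]

# One task per node: ask for as many nodes as cores (assumed usage).
one_per_node = bsub_command(4, job, nodes=4)

# Contiguous nodes.
contig = bsub_command(4, job, nodes=4, contiguous=True)

# Nodes with a specific feature; "dualcore" is a hypothetical feature name.
featured = bsub_command(4, job, constraint="dualcore")

for argv in (one_per_node, contig, featured):
    # shlex.join quotes the bracketed string so a shell passes it intact.
    print(shlex.join(argv))
```

Building an argument list rather than a single string keeps the bracketed option intact however it is later passed to a shell or to a process launcher.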

In an HP XC system, only one node runs LSF-HPC, but all the nodes are configured as LSF-HPC Client Hosts: every node is able to access LSF-HPC. You can submit jobs from any node in the HP XC system.

The differences described in HP XC System Software documentation take precedence over descriptions in the LSF documentation from Platform Computing Corporation. See "Differences Between LSF-HPC and Standard LSF" and the lsf_diff(1) manpage for more information on the subtle differences between standard LSF and LSF-HPC.
