To illustrate how the external scheduler is used to launch an application, consider the following command line, which launches an application on ten nodes with one task per node:

$ bsub -n 10 -ext "SLURM[nodes=10]" srun my_app

The following command line launches the same application, also on ten nodes, but stipulates that node n16 should not be used:

$ bsub -n 10 -ext "SLURM[nodes=10;exclude=n16]" srun my_app

7.1.3 Notes on LSF-HPC

The following are noteworthy items for users of LSF-HPC on HP XC systems:

You must run jobs as a non-root user such as lsfadmin or any other local user; do not run jobs as the root user.

A SLURM partition named lsf is used to manage LSF jobs. You can view information about this partition with the sinfo command.
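
For example, a command such as the following reports the state of the lsf partition (the node names and counts shown here are illustrative, not from an actual system):

$ sinfo -p lsf
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
lsf          up   infinite     10   idle n[1-10]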

LSF daemons run on only one node in the HP XC system. As a result, the lshosts and bhosts commands list only one host, which represents all the resources of the HP XC system. The total number of CPUs reported for that host should equal the total number of CPUs in the nodes assigned to the SLURM lsf partition.
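
For example, on a hypothetical system whose lsf partition contains ten two-processor nodes, the bhosts command might report a single host with 20 CPU slots (the host name and values shown are illustrative):

$ bhosts
HOST_NAME           STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
lsfhost.localdomain ok         -   20      0    0      0      0    0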

When a job is submitted and the resources are available, LSF-HPC creates a properly sized SLURM allocation and adds several standard LSF environment variables to the environment in which the job is to be run. The following two environment variables are also added:

SLURM_JOBID
This environment variable is created so that subsequent srun commands make use of the SLURM allocation created by LSF-HPC for the job. This variable can be used by a job script to query information about the SLURM allocation, as shown here:

$ squeue --jobs $SLURM_JOBID

SLURM_NPROCS
This environment variable passes along the total number of tasks requested with the bsub -n command to all subsequent srun commands. User scripts can override this value with the srun -n command, but the new value must be less than or equal to the original number of requested tasks.
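
For example, a job script along the following lines (the script name and application shown here are illustrative) could use these variables to report on its allocation before launching the parallel work:

#!/bin/sh
# Display the SLURM allocation that LSF-HPC created for this job.
squeue --jobs $SLURM_JOBID
# Launch the application; srun inherits SLURM_NPROCS, so the task
# count requested with bsub -n need not be repeated here.
srun my_app

Such a script could then be submitted with a command such as bsub -n 10 -ext "SLURM[nodes=10]" ./my_script.sh.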

LSF-HPC dispatches all jobs locally. The default installation of LSF-HPC for SLURM on the HP XC system provides a job starter script that is configured for use by all LSF-HPC queues. This job starter script adjusts the LSB_HOSTS and LSB_MCPU_HOSTS environment variables to the correct resource values in the allocation. Then, the job starter script uses the srun command to launch the user task on the first node in the allocation.
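
The following is a minimal conceptual sketch of such a job starter script; it is not the HP-supplied script, which also rewrites LSB_HOSTS and LSB_MCPU_HOSTS before launching the job:

#!/bin/sh
# Conceptual sketch only. The installed job starter script first
# adjusts LSB_HOSTS and LSB_MCPU_HOSTS to match the SLURM allocation.
# "$@" is the user job as submitted with bsub; srun places it on a
# single node (the first) of the allocation.
exec srun -N 1 -n 1 "$@"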

If this job starter script is not configured for a queue, user jobs begin execution locally on the LSF-HPC execution host. In this case, it is recommended that the user job use one or more srun commands to make use of the resources allocated to the job. Work done on the LSF-HPC execution host competes for CPU time with the LSF-HPC daemons and can affect the overall performance of LSF-HPC on the HP XC system.

The bqueues -l command displays the full queue configuration, including whether or not a job starter script has been configured. See the Platform LSF documentation or the bqueues(1) manpage for more information on the use of this command.
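
For example, the following command displays the full configuration of the default normal queue; when a job starter script is configured, it appears as a JOB_STARTER entry in the output:

$ bqueues -l normal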

For example, consider an LSF-HPC configuration in which node n20 is the LSF-HPC execution host and nodes n[1-10] are in the SLURM lsf partition. The default normal
