“Translating SLURM and LSF JOBIDs” describes the relationship between the SLURM_JOBID and the LSF JOBID.

SLURM_NPROCS This environment variable passes along the total number of tasks requested with the bsub -n command to all subsequent srun commands. User scripts can override this value with the srun -n command, but the new value must be less than or equal to the original number of requested tasks.
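The constraint above can be sketched as a simple check; a minimal illustration, with SLURM_NPROCS and NTASKS values chosen for the example rather than taken from a real allocation:

```shell
# Sketch: how SLURM_NPROCS constrains an srun -n override (values are illustrative)
SLURM_NPROCS=4        # as set by LSF for a "bsub -n 4" submission
NTASKS=2              # a script's "srun -n 2" override
if [ "$NTASKS" -le "$SLURM_NPROCS" ]; then
    echo "override accepted: srun -n $NTASKS"
else
    echo "override rejected: exceeds the $SLURM_NPROCS requested tasks"
fi
```

Requesting more tasks than the original allocation would fail the check, so the override is only ever a subset of the slots LSF already reserved.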

LSF regards the entire HP XC system as a “SLURM machine.” LSF gathers resource information from SLURM and creates SLURM allocations for each job. As a consequence, every LSF job has a corresponding SLURM JOBID.

For a parallel job, LSF allocates multiple nodes for the job, but LSF always runs the batch script (or user command) on the first node. The batch script or the user command must start its tasks in parallel. The srun command is the SLURM “parallel launcher” command. HP-MPI uses the srun command through the mpirun -srun option.
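A minimal batch-script sketch of this pattern follows; the script name myjob.sh and the program ./a.out are hypothetical, and the submission line is shown commented out because it requires a running LSF:

```shell
# Sketch: LSF runs this script on the first allocated node; srun (or
# mpirun -srun) fans the tasks out across the whole allocation.
cat > myjob.sh <<'EOF'
#!/bin/sh
srun hostname            # SLURM parallel launcher: one task per allocated slot
# mpirun -srun ./a.out   # HP-MPI through the SLURM launcher (./a.out is hypothetical)
EOF
chmod +x myjob.sh
# bsub -n4 -I ./myjob.sh   # submission command (requires LSF; shown for illustration)
```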

Example 10-1 Examples of LSF Job Launch

The following individual examples are run on a 4-node cluster with 2 cores per node:

[lsfadmin@n16 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE  NODELIST
lsf       up     infinite       4  idle   n[13-16]

This command line requests 4 cores, but runs the hostname command only on the first node:

[lsfadmin@n16 ~]$ bsub -n4 -I hostname
Job <110> is submitted to default queue <interactive>.
<<Waiting for dispatch...>>
<<Starting on lsfhost.localdomain>>
n13

The following command line requests 4 cores and uses the srun command to run the hostname command on all four cores:

[lsfadmin@n16 ~]$ bsub -n4 -I srun hostname
Job <111> is submitted to default queue <interactive>.
<<Waiting for dispatch...>>
<<Starting on lsfhost.localdomain>>
n13
n13
n14
n14

The following command line requests 4 cores across all 4 nodes and runs the hostname command on each node:

[lsfadmin@n16 ~]$ bsub -n4 -I -ext "SLURM[nodes=4]" srun hostname
Job <112> is submitted to default queue <interactive>.
<<Waiting for dispatch...>>
<<Starting on lsfhost.localdomain>>
n13
n14
n15
n16

It is possible to set up your SSH keys to avoid password prompting, so that you can use SSH-based parallel launchers such as the pdsh and mpirun commands. Use the LSB_HOSTS environment variable to pass the list of allocated nodes to the launcher.
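One way to feed LSB_HOSTS to an SSH-based launcher is to collapse it into a comma-separated node list; a sketch, where the LSB_HOSTS value is illustrative (LSF sets one entry per allocated slot, so node names repeat):

```shell
# Sketch: convert LSB_HOSTS (one entry per slot) into a deduplicated
# comma-separated node list suitable for pdsh -w
LSB_HOSTS="n13 n13 n14 n14"    # illustrative value, as LSF might set it
HOSTLIST=$(echo "$LSB_HOSTS" | tr ' ' '\n' | sort -u | paste -sd, -)
echo "$HOSTLIST"
# pdsh -w "$HOSTLIST" hostname   # would run hostname once on each allocated node
```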

10.2 Overview of LSF Integrated with SLURM 87