
“Translating SLURM and LSF JOBIDs” describes the relationship between the SLURM_JOBID and the LSF JOBID.
SLURM_NPROCS   This environment variable passes along the total number of tasks requested with the bsub -n option.
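A job can confirm this value by printing the variable itself. The following interactive session is a sketch (dispatch messages are omitted and the output is illustrative); the single quotes ensure that $SLURM_NPROCS is expanded on the execution node rather than by the submitting shell:

[lsfadmin@n16 ~]$ bsub -n4 -I sh -c 'echo $SLURM_NPROCS'
...
4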
LSF regards the entire HP XC system as a “SLURM machine.” LSF gathers resource information from SLURM and creates SLURM allocations for each job. As a consequence, every LSF job has a corresponding SLURM JOBID.
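One way to observe this correspondence is to print both identifiers from inside a job. The session below is a sketch, and the job IDs in the output are hypothetical; LSB_JOBID is the LSF job ID and SLURM_JOBID is the matching SLURM job ID, both set in the job's environment:

[lsfadmin@n16 ~]$ bsub -I sh -c 'echo LSF: $LSB_JOBID SLURM: $SLURM_JOBID'
LSF: 109 SLURM: 53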
For a parallel job, LSF allocates multiple nodes, but it always runs the batch script (or user command) on the first node of the allocation. It is up to the batch script or user command to start its tasks in parallel; the srun command is the SLURM “parallel launcher” command.
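The same pattern applies to batch jobs. The following is a minimal sketch (the script and output file names are hypothetical): LSF runs the script on the first allocated node, and the script's srun call launches the tasks across the allocation.

[lsfadmin@n16 ~]$ cat myscript.sh
#!/bin/sh
# LSF runs this script on the first node of the allocation;
# srun starts one hostname task per allocated core.
srun hostname
[lsfadmin@n16 ~]$ bsub -n4 -o output.%J ./myscript.sh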
Example 10-1 Examples of LSF Job Launch
The following individual examples are run on a system whose lsf partition contains four idle nodes, as this sinfo output shows:
[lsfadmin@n16 ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf       up    infinite      4  idle n[13-16]
The following command line requests 4 cores but runs the hostname command only on the first node:
[lsfadmin@n16 ~]$ bsub -n4 -I hostname
Job <110> is submitted to default queue <interactive>.
<<Waiting for dispatch...>>
<<Starting on lsfhost.localdomain>>
n13
The following command line requests 4 cores and uses the srun command to run the hostname command on all four cores:
[lsfadmin@n16 ~]$ bsub -n4 -I srun hostname
Job <111> is submitted to default queue <interactive>.
<<Waiting for dispatch...>>
<<Starting on lsfhost.localdomain>>
n13
n13
n14
n14
In this case, SLURM satisfied the 4-core request with two cores on each of two nodes, so two tasks ran on n13 and two on n14.
The following command line requests 4 cores across all 4 nodes and runs the hostname command on each node:
[lsfadmin@n16 ~]$ bsub -n4 -ext "SLURM[nodes=4]" -I srun hostname
<<Waiting for dispatch...>>
<<Starting on lsfhost.localdomain>>
n13
n14
n15
n16
It is possible to set up your SSH keys so that you can use ssh to access the nodes of your allocation without being prompted for a password.
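One common approach, assuming your home directory is shared across the nodes of the system, is sketched below; press Enter to accept the defaults and leave the passphrase empty:

[lsfadmin@n16 ~]$ ssh-keygen -t rsa
[lsfadmin@n16 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[lsfadmin@n16 ~]$ chmod 600 ~/.ssh/authorized_keys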