Figure 7-1: How LSF-HPC and SLURM Launch and Manage a Job

[Figure: the user (1) logs in to login node n16 and (2) submits the job with bsub. On the LSF execution host, lsfhost.localdomain, the allocation is labeled SLURM_JOBID=53 and SLURM_NPROCS=4 (3), and job_starter.sh starts myscript through srun (4). myscript runs on compute node n1 (5); its hostname tasks run on compute nodes n1 through n4 (6) and return the node names n1, n2, n3, and n4 (7).]

1. A user logs in to login node n16.

2. The user executes the following LSF bsub command on login node n16:

$ bsub -n4 -ext "SLURM[nodes=4]" -o output.out ./myscript

This bsub command requests four CPUs (the -n4 option) spread across four nodes (the -ext "SLURM[nodes=4]" option), and the job is launched on those CPUs. The script that runs the job, myscript, is shown here:

#!/bin/sh
hostname
srun hostname
mpirun -srun ./hellompi

3. LSF-HPC schedules the job and monitors the state of the resources (compute nodes) in the SLURM lsf partition. When the LSF-HPC scheduler determines that the required resources are available, it allocates those resources in SLURM and obtains a SLURM job identifier (jobID) that corresponds to the allocation.

In this example, four processors spread over four nodes (n1, n2, n3, and n4) are allocated for myscript, and SLURM job ID 53 is assigned to the allocation.
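
The figure for this example labels the allocation with SLURM_JOBID=53 and SLURM_NPROCS=4. As a minimal sketch, assuming these two values reach the job as environment variables, a script could report them before doing any work. The function name show_allocation is hypothetical and is not part of LSF-HPC or SLURM.

```shell
#!/bin/sh
# Hypothetical helper, not part of LSF-HPC or SLURM: print the allocation
# values that the job is assumed to inherit through its environment.
show_allocation() {
    # Fall back to "unset" when the script runs outside an allocation.
    echo "SLURM_JOBID=${SLURM_JOBID:-unset} SLURM_NPROCS=${SLURM_NPROCS:-unset}"
}

show_allocation
```

Called at the top of a script such as myscript, this would make it easy to match a line in output.out to the SLURM allocation it ran under. Outside an allocation, both values are reported as unset.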
