Be sure to unset the SLURM_JOBID variable when you are finished with the allocation, so that a stale SLURM_JOBID value does not interfere with future jobs:

$ unset SLURM_JOBID

The following examples illustrate launching interactive MPI jobs. They use the hellompi job script introduced in Section (page 48).

Example 9-8 Launching an Interactive MPI Job

$ mpirun -srun --jobid=150 hellompi
Hello! I'm rank 0 of 4 on n1
Hello! I'm rank 1 of 4 on n2
Hello! I'm rank 2 of 4 on n3
Hello! I'm rank 3 of 4 on n4

Example 9-9 Launching an Interactive MPI Job on All Cores in the Allocation

This example assumes 2 cores per node.

$ mpirun -srun --jobid=150 -n8 hellompi
Hello! I'm rank 0 of 8 on n1
Hello! I'm rank 1 of 8 on n1
Hello! I'm rank 2 of 8 on n2
Hello! I'm rank 3 of 8 on n2
Hello! I'm rank 4 of 8 on n3
Hello! I'm rank 5 of 8 on n3
Hello! I'm rank 6 of 8 on n4
Hello! I'm rank 7 of 8 on n4

Alternatively, you can use the following:

$ export SLURM_JOBID=150
$ export SLURM_NPROCS=8
$ mpirun -srun hellompi
Hello! I'm rank 0 of 8 on n1
Hello! I'm rank 1 of 8 on n1
Hello! I'm rank 2 of 8 on n2
Hello! I'm rank 3 of 8 on n2
Hello! I'm rank 4 of 8 on n3
Hello! I'm rank 5 of 8 on n3
Hello! I'm rank 6 of 8 on n4
Hello! I'm rank 7 of 8 on n4
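
Because exported variables persist in the shell until they are unset, one way to avoid leaving them behind is to set them only for the command that needs them. The following is a minimal sketch, not part of the product: run_in_allocation is a hypothetical helper name, and the job ID and process count are the values assumed in the examples above. It runs the command in a subshell so that the exports never reach the calling shell.

```shell
# Hypothetical helper (not shipped with HP XC, LSF-HPC, or SLURM).
# Usage: run_in_allocation JOBID NPROCS COMMAND [ARGS...]
# Runs COMMAND with SLURM_JOBID and SLURM_NPROCS exported only inside
# a subshell, so the calling shell's environment is left unchanged.
run_in_allocation() {
    (
        SLURM_JOBID=$1
        SLURM_NPROCS=$2
        export SLURM_JOBID SLURM_NPROCS
        shift 2
        # Run the remaining arguments as the command line.
        "$@"
    )
}

# Example invocation (commented out; it needs the allocation assumed above):
# run_in_allocation 150 8 mpirun -srun hellompi
```

With this approach there is nothing to unset afterward, because the variables exist only for the lifetime of the launched command.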

Use ssh to launch a TotalView debugger session, assuming that TotalView is installed and licensed and that ssh X forwarding is properly configured:

$ export SLURM_JOBID=150
$ export SLURM_NPROCS=4

$ mpirun -tv -srun additional parameters as needed

After you finish with this interactive allocation, exit the /bin/bash process in the first terminal; this ends the LSF job.

Note

If you exported any variables, such as SLURM_JOBID and SLURM_NPROCS, be sure to unset them as follows before submitting any further jobs from the second terminal:

$ unset SLURM_JOBID

$ unset SLURM_NPROCS
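
To confirm that no stale values remain before you submit further jobs, you can test whether the variables are still set. The following is a minimal sketch; check_slurm_env_clean is a hypothetical helper name, not a command provided by the system. It uses the standard shell expansion ${VAR+x}, which is non-empty only when VAR is set.

```shell
# Hypothetical helper (not shipped with HP XC, LSF-HPC, or SLURM).
# Returns 0 if SLURM_JOBID and SLURM_NPROCS are both unset;
# otherwise prints a warning for each one that is set and returns 1.
check_slurm_env_clean() {
    rc=0
    # ${VAR+x} expands to "x" when VAR is set, even if its value is empty.
    if [ -n "${SLURM_JOBID+x}" ]; then
        echo "SLURM_JOBID is still set to '$SLURM_JOBID'" >&2
        rc=1
    fi
    if [ -n "${SLURM_NPROCS+x}" ]; then
        echo "SLURM_NPROCS is still set to '$SLURM_NPROCS'" >&2
        rc=1
    fi
    return "$rc"
}
```

Calling check_slurm_env_clean after the unset commands should produce no output and return a zero exit status.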

You do not need to launch the /bin/bash shell to be able to interact with any compute node resources; any running job will suffice. This is excellent for checking on long-running jobs. For example, if you had submitted a CPU-intensive job, you could execute the uptime command on all nodes in the allocation to
