6.4.1.1 srun Roles

srun options allow you to submit a job by:

Specifying the parallel environment for your job, such as the number of nodes to use, partition, distribution of processes among nodes, and maximum time.

Controlling the behavior of your parallel job as it runs, such as by redirecting or labeling its output, sending it signals, or specifying its reporting verbosity.
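
As a rough illustration of the two roles above, the following sketch combines options that define the parallel environment (-N for nodes, -n for processes, -p for the partition, -t for the time limit in minutes) with options that control the running job (-l to label each output line with its task number, -v for more verbose reporting). The partition name debug and the node, process, and time values are assumptions chosen for illustration; substitute values that are valid on your system. When srun is not installed, the sketch only prints the command instead of running it.

```shell
# Fall back to a dry run (print the command) when srun is not available.
if command -v srun >/dev/null 2>&1; then
    SRUN="srun"
else
    SRUN="echo srun"
fi

# Environment options: 2 nodes, 4 processes, partition "debug" (hypothetical),
# 10-minute time limit.
# Control options: -l labels each output line with its task number,
# -v increases reporting verbosity.
$SRUN -N 2 -n 4 -p debug -t 10 -l -v hostname
```

Keeping the environment options and the control options in separate groups on the command line makes it easier to see which part of the request describes the resources and which part affects only how the job reports back.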

6.4.1.2 srun Modes

Because srun performs several different roles, it has five distinct ways, or modes, in which it can be used:

Simple Mode

In simple mode, srun submits your job to the local SLURM job controller, blocks if necessary until the needed resources are free, and then initiates all processes on the specified nodes. Many control options can change the details of this general pattern.

The simplest way to use the srun command is to distribute the execution of a serial program (such as a Linux utility) across a specified number or range of compute nodes. For example:

$ srun -N 8 cp ~/data1 /var/tmp/data1

This command copies (cp) the file data1 from your common home directory into local disk space on each of eight compute nodes. This is similar to running simple programs in parallel.

Batch Mode

srun can also submit complex scripts directly to the job queue(s) managed by SLURM for later execution, when the needed resources become available and no higher-priority jobs are pending. For example:

$ srun -N 16 -b myscript.sh

This command uses the srun -b option to place myscript.sh into the batch queue to run later on 16 nodes. Scripts, in turn, normally contain either MPI programs or other simple invocations of srun itself (as shown above). The srun -b option supports basic, local batch service.
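
A batch script of the kind described above might look like the following sketch. The script contents are an assumption made for illustration: they reuse the copy command from the simple mode example and add a labeled hostname step. The commands below only create the script and check its syntax; submitting it is left to the srun -b command shown above.

```shell
# Create a hypothetical batch script in the current directory.
cat > myscript.sh <<'EOF'
#!/bin/sh
# Copy the input file onto local disk on each of the 16 nodes.
srun -N 16 cp ~/data1 /var/tmp/data1
# Print the name of each node, labeling every line with its task number.
srun -N 16 -l hostname
EOF
chmod +x myscript.sh

# Check the script for shell syntax errors without executing it.
sh -n myscript.sh && echo "myscript.sh: syntax OK"
```

Checking the script with sh -n before submission is a cheap way to catch typing mistakes that would otherwise surface only after the job has waited in the queue.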

Allocate Mode

When you need to combine the job complexity of scripts with the immediacy of interactive execution, you can use allocate mode. For example:

$ srun -A -N 4 myscript.sh

This command uses the srun -A option to allocate the specified resources (four nodes in the above example) and spawn a subshell with access to those resources. The subshell immediately starts to execute the specified script (myscript.sh in the above example), which can run multiple jobs using simple srun commands. This is similar to allocating resources by setting environment variables at the beginning of a script, and then using them for scripted tasks. No job queues are involved.
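
A script intended for allocate mode might run several jobs one after another on the nodes that are already allocated, as in the following sketch. The script contents are hypothetical, and the sketch assumes that SLURM exports the job ID to the subshell as SLURM_JOBID; if that variable is not set on your system, the script prints "unknown" instead. The commands below only create the script and check its syntax.

```shell
# Create a hypothetical script for use with allocate mode.
cat > myscript.sh <<'EOF'
#!/bin/sh
# Report which allocation the subshell is using. SLURM_JOBID is assumed
# to be set by SLURM; "unknown" is printed if it is not.
echo "Running in allocation ${SLURM_JOBID:-unknown}"
# Each srun below runs as a separate job on the already allocated nodes.
srun -N 4 -l hostname
srun -N 2 -l uname -r
EOF
chmod +x myscript.sh

# Check the script for shell syntax errors without executing it.
sh -n myscript.sh && echo "myscript.sh: syntax OK"
```

Because the resources are held for the lifetime of the subshell, the second srun command does not wait for a new allocation.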

Attach

You can monitor or intervene in an already running srun job, either batch (started with -b) or interactive (allocated, started with -A), by executing srun again and attaching (-a) to that job. For example:

$ srun -a 6543 -j
