6.4.5 srun Control Options

srun control options determine how a SLURM job manages its nodes and other resources, what its working features (such as job name) are, and how it gives you help. Separate "constraint" options and I/O options are available and are described in other sections of this chapter. The following types of control options are available:

Node management

Working features

Resource control

Help options

6.4.5.1 Node Management Options

-k (--no-kill)

The -k option prevents automatic termination of this job if any node allocated to it fails. The job assumes responsibility for handling such node failures internally. (SLURM’s default is to terminate a job if any of its allocated nodes fails.)

-m dist (--distribution=dist)

The -m option tells SLURM how to distribute tasks among nodes for this job. The choices for dist are block or cyclic.

block

Assigns tasks in order to each CPU on one node before assigning any to the next node. This is the default if the number of tasks exceeds the number of nodes requested.

cyclic

Assigns tasks "round robin" across all allocated nodes (task 1 goes to the first node, task 2 goes to the second node, and so on). This is the default if the number of nodes requested equals or exceeds the number of tasks.
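The difference between the two distributions can be sketched in a few lines of Python. This is only an illustrative model of the rules described above, not SLURM's scheduler code: the function names, the zero-based task and node numbering, and the assumption that every node has the same number of CPUs are all hypothetical.

```python
def default_distribution(num_tasks, num_nodes):
    """Pick the default described above: block if tasks outnumber nodes, else cyclic."""
    return "block" if num_tasks > num_nodes else "cyclic"


def assign_tasks(num_tasks, num_nodes, cpus_per_node, dist=None):
    """Return a list whose entry i is the (zero-based) node index for task i.

    Simplified model: every node is assumed to have cpus_per_node CPUs.
    """
    if dist is None:
        dist = default_distribution(num_tasks, num_nodes)
    if dist == "block":
        # Fill every CPU on one node before moving on to the next node.
        if num_tasks > num_nodes * cpus_per_node:
            raise ValueError("this simplified model allows one task per CPU only")
        return [task // cpus_per_node for task in range(num_tasks)]
    if dist == "cyclic":
        # Round robin: consecutive tasks land on consecutive nodes.
        return [task % num_nodes for task in range(num_tasks)]
    raise ValueError("dist must be 'block' or 'cyclic'")
```

For four tasks on two nodes with two CPUs each, assign_tasks(4, 2, 2, "block") places tasks 0 and 1 on node 0 and tasks 2 and 3 on node 1, while "cyclic" alternates the tasks between the two nodes.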

-r n (--relative=n)

The -r option offsets the first job step to node n of this job’s allocated node set (where the first node is 0). The -r option is incompatible with the "constraint" options -w and -x, and it is ignored when you run a job without a prior node allocation. The default for n is 0.
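As a rough illustration of the zero-based offset and of the incompatibility with -w and -x, the following Python sketch picks the starting node for a job step. The function name, the node names, and the bounds check are assumptions made for this example; the text above does not say how srun itself reports an out-of-range offset.

```python
def first_step_node(allocated_nodes, relative=None, nodelist=None, exclude=None):
    """Return the allocated node at which the first job step would start.

    relative models -r, nodelist models -w, and exclude models -x.
    """
    if relative is not None and (nodelist or exclude):
        # Mirrors the rule that -r cannot be combined with -w or -x.
        raise ValueError("-r is incompatible with -w and -x")
    offset = 0 if relative is None else relative  # the default for n is 0
    if not 0 <= offset < len(allocated_nodes):
        # Assumed behavior for this sketch only.
        raise IndexError("offset is outside the allocated node set")
    return allocated_nodes[offset]
```

With the hypothetical allocation ["n0", "n1", "n2"], a relative value of 2 selects "n2", and omitting it selects "n0".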

-s (--share)

The -s option allows this job to share nodes with other running jobs. Sharing nodes often starts the job sooner and boosts system utilization, but it can also lower application performance.

6.4.5.2 Working Features Options

-D path (--chdir=path)

The -D option causes each remote process to change its default directory to path (by using CHDIR) before it begins execution. Without -D, the current working directory of srun becomes the default directory for each process.

-d level (--slurmd-debug=level)

The -d option specifies level as the level at which daemon SLURMD reports debug information and deposits it in this job’s stderr location. Here, level can be any integer between 0 (quiet, reports only errors, the default) and 5 (extremely verbose messages).
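The options covered so far can be combined on one srun command line. The Python sketch below assembles such an argument list without running it; the helper name and its keyword parameters are hypothetical, and only the long option names documented in this section are used. Other SLURM releases may name or handle these options differently.

```python
def build_srun_args(command, no_kill=False, distribution=None, relative=None,
                    share=False, chdir=None, slurmd_debug=None):
    """Assemble an srun argument list from the control options in this section."""
    args = ["srun"]
    if no_kill:
        args.append("--no-kill")                      # -k
    if distribution is not None:
        if distribution not in ("block", "cyclic"):
            raise ValueError("distribution must be 'block' or 'cyclic'")
        args.append(f"--distribution={distribution}")  # -m dist
    if relative is not None:
        args.append(f"--relative={relative}")          # -r n
    if share:
        args.append("--share")                         # -s
    if chdir is not None:
        args.append(f"--chdir={chdir}")                # -D path
    if slurmd_debug is not None:
        if not 0 <= slurmd_debug <= 5:
            raise ValueError("debug level must be an integer from 0 to 5")
        args.append(f"--slurmd-debug={slurmd_debug}")  # -d level
    # The program to launch and its own arguments come last.
    return args + list(command)
```

For example, build_srun_args(["hostname"], no_kill=True, distribution="cyclic") yields a list that could be passed to subprocess.run on a system where srun is installed.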

HP XC System 2.x Software manual Srun Control Options