Each partition's node limits supersede those specified by -N. Jobs that request more nodes than the partition allows never leave the PENDING state. To use a specific partition, use the srun -p option. Combinations of -n and -N control how job processes are distributed among nodes according to the following srun policies:

-n/-N combinations

srun infers your intended number of processes per node if you specify both the number of processes and the number of nodes for your job. Thus -n16 -N8 normally results in running 2 processes per node. But see the next policy for exceptions.
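A concrete sketch of the even-split arithmetic (the srun invocation shown in the comment is hypothetical and requires a SLURM cluster; my_app is a placeholder name):

```shell
# Hypothetical invocation:
#   srun -n16 -N8 ./my_app
# srun divides the 16 requested processes evenly over the 8 nodes.
tasks=16
nodes=8
echo "$((tasks / nodes)) processes per node"
```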

Minimum interpretation

srun interprets all node requests as minimum node requests (-N16 means "at least 16 nodes"). If some nodes lack enough CPUs to cover the process count specified by -n, srun will automatically allocate more nodes (than mentioned with -N) to meet the need. For example, if not all nodes have 2 working CPUs, then -n32 -N16 together will allocate more than 16 nodes so that all processes are supported. The actual number of nodes assigned (not the number requested) is stored in the environment variable SLURM_NNODES.
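A job script can read the actual allocation from SLURM_NNODES; a minimal sketch (the fallback value of 1 is only so the snippet runs outside a SLURM job):

```shell
#!/bin/sh
# SLURM_NNODES is set by SLURM inside an allocation; it reflects the
# number of nodes actually assigned, which may exceed the -N request.
nnodes=${SLURM_NNODES:-1}   # hypothetical fallback for illustration
echo "allocated $nnodes node(s)"
```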

CPU overcommitment

By default, srun never allocates more than one process per CPU. If you intend to assign multiple processes per CPU, you must invoke the srun -O option along with -n and -N. Thus, -n16 -N4 -O together allow 2 processes per CPU on the 4 allocated 2-CPU nodes.
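The arithmetic behind that example, assuming 2-CPU nodes (the srun line in the comment is hypothetical; my_app is a placeholder):

```shell
# Hypothetical: srun -n16 -N4 -O ./my_app
# 4 nodes x 2 CPUs = 8 CPUs; 16 processes / 8 CPUs = 2 processes per CPU,
# which srun permits only because -O enables overcommitment.
tasks=16
nodes=4
cpus_per_node=2
echo "$((tasks / (nodes * cpus_per_node))) processes per CPU"
```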

Inconsistent allocation

srun rejects inconsistent -n/-N combinations as errors. For example, -n15 -N16 requests the impossible assignment of 15 processes to 16 nodes.

-c cpt (--cpus-per-task=cpt)

The -c cpt option assigns cpt CPUs per process for this job (default is one CPU per process). This option supports multithreaded programs that require more than a single CPU per process for best performance.

For multithreaded programs where the density of CPUs is more important than a specific node count, use both -n and -c on the srun execute line rather than -N. The options -n16 and -c2 result in whatever node allocation is needed to yield the requested 2 CPUs per process. This is the reverse of CPU overcommitment (see the -N and -O options).
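A sketch of the node arithmetic, again assuming 2-CPU nodes (the srun line in the comment is hypothetical; threaded_app is a placeholder):

```shell
# Hypothetical: srun -n16 -c2 ./threaded_app
# 16 processes x 2 CPUs each = 32 CPUs; on 2-CPU nodes that implies
# a 16-node allocation, regardless of any -N value.
tasks=16
cpus_per_task=2
cpus_per_node=2
echo "$((tasks * cpus_per_task / cpus_per_node)) nodes required"
```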

-p part (--partition=part)

The -p part option requests nodes only from the part partition. The default partition is assigned by the system administrator.

-t minutes (--time=minutes)

The -t minutes option allocates a total number of minutes for this job to run (default is the current partition’s time limit). If the number of minutes exceeds the partition’s time limit, then the job never leaves the PENDING state. When the time limit has been reached, SLURM sends each job process SIGTERM followed (after a pause specified by SLURM’s KillWait configuration parameter) by SIGKILL.
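Because SIGTERM arrives before SIGKILL, a job script can trap it to save partial results during the KillWait pause. A minimal sketch; the handler name and messages are made up, and the self-kill merely simulates SLURM's signal:

```shell
#!/bin/sh
# On reaching the -t limit, SLURM sends SIGTERM, waits KillWait
# seconds, then sends SIGKILL. Trapping SIGTERM gives the script a
# chance to clean up before the hard kill.
cleanup() {
    echo "caught SIGTERM, saving partial results"
}
trap cleanup TERM
kill -TERM $$    # simulate SLURM's signal for illustration
echo "continuing after cleanup"
```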

-T nthreads (--threads=nthreads)

The -T nthreads option requests that srun allocate nthreads threads to initiate and control the parallel tasks in this job. The default is the smaller of 10 and the number of nodes actually allocated (SLURM_NNODES).
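The default rule, sketched in shell (the 4-node fallback value is hypothetical, used only so the snippet runs outside a SLURM job):

```shell
#!/bin/sh
# Default for srun -T: the smaller of 10 and the number of nodes
# actually allocated (SLURM_NNODES).
nnodes=${SLURM_NNODES:-4}   # hypothetical 4-node allocation
if [ "$nnodes" -lt 10 ]; then
    threads=$nnodes
else
    threads=10
fi
echo "default -T value: $threads"
```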

6-6 Using SLURM

HP XC System 2.x Software manual