9 Using SLURM
HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource management and job scheduling.
This chapter addresses the following topics:
•“Introduction to SLURM” (page 81)
•“SLURM Utilities” (page 81)
•“Launching Jobs with the srun Command” (page 81)
•“Monitoring Jobs with the squeue Command” (page 82)
•“Terminating Jobs with the scancel Command” (page 83)
•“Getting System Information with the sinfo Command” (page 83)
•“Job Accounting” (page 84)
•“Fault Tolerance” (page 84)
•“Security” (page 84)
9.1 Introduction to SLURM
SLURM is a reliable, efficient, open source,
The SLURM Reference Manual is available on the HP XC Documentation
https://computing.llnl.gov/linux/slurm/documentation.html.
SLURM manpages are also available online on the HP XC system.
As a system resource manager, SLURM has the following key functions:
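•Allocate exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work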
•Provide a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes
•Arbitrate conflicting requests for resources by managing a queue of pending work
“How LSF and SLURM Interact” describes the interaction between SLURM and LSF.
9.2 SLURM Utilities
You interact with SLURM through its command line utilities. The basic utilities are listed here:
•srun
•squeue
•scancel
•sinfo
•scontrol
For more information on any of these utilities, see the SLURM Reference Manual or the corresponding manpage.
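As a quick orientation, the following Python sketch maps each utility to its purpose and checks whether it is on the current PATH. The purposes for srun, squeue, scancel, and sinfo follow this chapter's section titles; the scontrol description is a general summary and is not taken from this chapter. The names SLURM_UTILITIES and find_utilities are hypothetical helpers introduced only for illustration.

```python
import shutil

# Purposes for the first four entries follow this chapter's section titles.
# The scontrol entry is a general summary (an assumption, not from this chapter).
SLURM_UTILITIES = {
    "srun": "launch jobs",
    "squeue": "monitor jobs",
    "scancel": "terminate jobs",
    "sinfo": "get system information",
    "scontrol": "view or modify SLURM configuration and state",
}


def find_utilities(names=SLURM_UTILITIES):
    """Return {name: full path or None} for each utility, looked up on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_utilities().items():
        status = path if path else "not found on PATH"
        print(f"{name:9s} {SLURM_UTILITIES[name]:45s} {status}")
```

On a system where SLURM is not installed, every entry is reported as not found, which is a simple way to confirm that you are logged in to a node where the utilities are available.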
9.3 Launching Jobs with the srun Command
The srun command submits and controls jobs that run under SLURM management. Use it to submit interactive and batch jobs for execution, allocate resources, and initiate job steps.
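A minimal sketch of launching a job with srun from a script is shown here. It assumes only the standard srun options -n (number of tasks) and -N (number of nodes); the helper names build_srun_command and launch, and the choice of hostname as the program to run, are illustrative and do not come from this guide. Site policy on an HP XC system may require jobs to be submitted in a different way, so treat this as an outline rather than a recommended procedure.

```python
import shlex
import shutil
import subprocess


def build_srun_command(program, ntasks=1, nodes=None):
    """Build an srun argument list that runs `program` as `ntasks` tasks.

    `program` is a list such as ["hostname"]; `nodes` is optional.
    """
    cmd = ["srun", "-n", str(ntasks)]   # -n: number of tasks to launch
    if nodes is not None:
        cmd += ["-N", str(nodes)]       # -N: number of nodes to allocate
    cmd += list(program)
    return cmd


def launch(program, ntasks=1, nodes=None):
    """Run the command through srun if it is available; otherwise only print it."""
    cmd = build_srun_command(program, ntasks, nodes)
    if shutil.which("srun") is None:
        print("srun not found; would run:", shlex.join(cmd))
        return None
    # check=False: the caller inspects returncode instead of catching an exception.
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    result = launch(["hostname"], ntasks=4)
    if result is not None:
        print(result.stdout, end="")
```

Keeping the command construction separate from execution makes it possible to inspect the exact srun invocation before anything is submitted.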