6 Using SLURM
6.1 Introduction
HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource management and job scheduling. SLURM is a reliable, efficient, open source,
The SLURM Reference Manual is available on the HP XC Documentation
As a system resource manager, SLURM has the following key functions:
• Allocate exclusive and/or
• Provide a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes
• Arbitrate conflicting requests for resources by managing a queue of pending work
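The relationship between allocation and the pending queue can be pictured with a small toy model. The Python sketch below is illustrative only and is not SLURM's scheduling algorithm: the class name ToyResourceManager, its methods, and the strict priority-order policy are all assumptions made for this example. It grants nodes exclusively to one job at a time and holds any job that does not fit in a queue until nodes are released.

```python
import heapq
from itertools import count


class ToyResourceManager:
    """Toy model (not SLURM itself): exclusive node allocation plus a pending queue."""

    def __init__(self, nodes):
        self.free = set(nodes)   # nodes not allocated to any job
        self.running = {}        # job name -> set of nodes allocated to it
        self._pending = []       # heap of (-priority, sequence, name, n_nodes)
        self._seq = count()      # tie-breaker: earlier submissions go first

    def submit(self, name, n_nodes, priority=0):
        """Queue a job, then start whatever the free nodes allow."""
        heapq.heappush(self._pending, (-priority, next(self._seq), name, n_nodes))
        self._schedule()

    def finish(self, name):
        """Release a finished job's nodes and try to start pending work."""
        self.free |= self.running.pop(name)
        self._schedule()

    def pending(self):
        """Return the names of waiting jobs in the order they would start."""
        return [entry[2] for entry in sorted(self._pending)]

    def _schedule(self):
        # Assumed policy: strict priority order, stopping at the first job
        # that does not fit in the currently free nodes.
        while self._pending and self._pending[0][3] <= len(self.free):
            _, _, name, n_nodes = heapq.heappop(self._pending)
            self.running[name] = {self.free.pop() for _ in range(n_nodes)}
```

In this model, a job that asks for more nodes than are free waits in the queue, and it starts as soon as a running job finishes and releases enough nodes. A job that requests more nodes than the system has would block the queue indefinitely, which a real resource manager has to handle.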
Section 1.4.3 describes the interaction between SLURM and LSF.
6.2 SLURM Commands
Users interact with SLURM through its command line utilities. SLURM has the following basic commands: srun, scancel, squeue, sinfo, and scontrol, which can run on any node in the HP XC system. These commands are summarized in the table below and described in the following sections.
Command | Function
srun | Submits jobs to run under SLURM management. srun is used to submit a job for execution, allocate resources, attach to an existing allocation, or initiate job steps. srun can:
 | • Submit a batch job and then terminate
 | • Submit an interactive job and then persist to shepherd the job as it runs
 | • Allocate resources to a shell and then spawn that shell for use in running subordinate jobs
squeue | Displays the queue of running and waiting jobs (or "job steps"), including the JobID (used for scancel) and the nodes assigned to each running job. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
scancel | Cancels a pending or running job or job step. It can also be used to send a specified signal to all processes on all nodes associated with a job. Only job owners or administrators can cancel jobs.
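As a rough illustration of how squeue and scancel fit together, the following Python sketch builds the command lines for listing and canceling jobs. The helper names (squeue_cmd, scancel_cmd, run_if_available) are hypothetical and exist only for this example. The -u (user), -t (job state), and -s (signal) options are assumed to behave as described in the standard SLURM man pages, so check them against the version installed on your system.

```python
import shutil
import subprocess


def squeue_cmd(user=None, states=None):
    """Build an squeue argument list, optionally filtered by user and job state."""
    cmd = ["squeue"]
    if user:
        cmd += ["-u", user]                # assumed: -u restricts output to one user
    if states:
        cmd += ["-t", ",".join(states)]    # assumed: -t takes a comma-separated state list
    return cmd


def scancel_cmd(job_id, signal=None):
    """Build an scancel argument list for a job (or job step) ID.

    Without a signal the job is canceled; with one, the named signal is
    sent to the job's processes instead.
    """
    cmd = ["scancel"]
    if signal:
        cmd += ["-s", signal]              # assumed: -s names the signal to send
    cmd.append(str(job_id))
    return cmd


def run_if_available(cmd):
    """Run cmd and return its standard output, or None if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout
```

A typical sequence would be to call run_if_available(squeue_cmd(user="alice")) to find a JobID, then pass that JobID to scancel_cmd. The user name here is a placeholder. Because only job owners or administrators can cancel jobs, the scancel call fails for jobs that belong to other users.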