Example 9-2 Displaying Queued Jobs by Their JobIDs
$ squeue
JOBID  PARTITION  NAME  USER  ST  TIME_USED  NODES  NODELIST(REASON)
12345  debug      job1  jody  R   0:21       4
12346  debug      job2  jody  PD  0:00       8
The squeue command can report on jobs in the job queue according to their state. Possible states are pending, running, completing, completed, failed, timeout, and node_fail. Example 9-3 reports on failed jobs in the queue.
Example 9-3 Reporting on Failed Jobs in the Queue
$ squeue
JOBID  PARTITION  NAME      USER  ST  TIME  NODES  NODELIST(REASON)
59     amt1       hostname  root  F   0:00  0
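The state-based reporting described above can also be approximated by filtering the default squeue listing. The following is a minimal sketch, not part of SLURM: filter_by_state is a hypothetical helper, and it assumes the default column order shown in these examples, where the state code (ST) is the fifth column.

```shell
#!/bin/sh
# Hypothetical helper (not a SLURM command): read squeue-style output on
# standard input and print the header line plus every row whose ST column
# matches the state code given as the first argument.
# Assumption: ST is the fifth whitespace-separated field, as in the
# listings in this section.
filter_by_state() {
    awk -v code="$1" 'NR == 1 || $5 == code'
}

# Typical use on a system where squeue is available (not run here):
#   squeue | filter_by_state PD
```

Because the helper only reads text, it can be tried on saved squeue output before it is used against a live queue.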
9.5 Terminating Jobs with the scancel Command
The scancel command cancels a pending or running job or job step. It can also be used to send a specified signal to all processes on all nodes associated with a job. Only job owners or administrators can cancel jobs.
Example 9-4 cancels job 415.
Example 9-4 Canceling a Job by Its JobID
$ scancel 415
Example 9-5 cancels all pending jobs.
Example 9-5 Canceling All Pending Jobs
$ scancel
Example 9-6
$ scancel
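As noted at the start of this section, scancel can send a signal to the processes of a job instead of canceling it. The sketch below wraps that use in a small function. The function name signal_job and the SCANCEL override variable are hypothetical; the --signal option is assumed to be supported by the installed scancel, so check the scancel(1) manpage on your system.

```shell
#!/bin/sh
# Hypothetical wrapper: send a named signal to all processes of a job.
# Arguments: signal name (for example USR1), then the job ID.
# SCANCEL may be set to another program (for example SCANCEL=echo) to
# preview the command line without contacting SLURM.
signal_job() {
    sig="$1"
    jobid="$2"
    "${SCANCEL:-scancel}" --signal="$sig" "$jobid"
}

# Dry run that only prints the arguments scancel would receive:
#   SCANCEL=echo signal_job USR1 415
```

Routing the call through a variable keeps the wrapper testable on machines where SLURM is not installed.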
9.6 Getting System Information with the sinfo Command
The sinfo command reports the state of partitions and nodes managed by SLURM, and it has a wide variety of filtering, sorting, and formatting options. Its output summarizes partition and node (not job) information, such as partition names, nodes per partition, and cores per node.
Example 9-7 Using the sinfo Command (No Options)
$ sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
lsf        up     infinite   3      down*  n[0,5,8]
lsf        up     infinite   14     idle
The node STATE codes in these examples may be followed by an asterisk (*); this indicates that the reported nodes are not responding. See the sinfo(1) manpage for a complete listing and description of STATE codes.
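The asterisk convention makes it straightforward to pick non-responding nodes out of a sinfo listing. The following is a minimal sketch under the assumption that the output uses the default column order shown in Example 9-7, with STATE in the fifth column and NODELIST in the sixth; unresponsive_nodes is a hypothetical helper, not a SLURM command.

```shell
#!/bin/sh
# Hypothetical helper: read sinfo-style output on standard input and print
# the partition, state, and node list of every row whose STATE code ends
# in an asterisk (nodes that are not responding).
# Assumption: default column order, STATE = field 5, NODELIST = field 6.
unresponsive_nodes() {
    awk 'NR > 1 && $5 ~ /\*$/ { print $1, $5, $6 }'
}

# Typical use on a system where sinfo is available (not run here):
#   sinfo | unresponsive_nodes
```

If sinfo is run with custom formatting options, the field numbers in the awk expression would need to be adjusted to match.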