The squeue command can report on jobs in the job queue according to their state. Possible states are pending, running, completing, completed, failed, timeout, and node_fail.
Example: Reporting on Failed Jobs in the Queue
$ squeue
JOBID PARTITION NAME     USER ST TIME NODES NODELIST
59    amt1      hostname root F  0:00 0
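The ST column in the listing above can also be checked from a script. The following is a minimal sketch, assuming the default squeue column layout in which ST is the fifth field; count_jobs_in_state is a hypothetical helper name, not part of SLURM.

```shell
# Sketch: count jobs in a given state from squeue-style output read on stdin.
# Assumes the default column layout, where ST is the fifth column.
# count_jobs_in_state is a hypothetical helper, not a SLURM command.
count_jobs_in_state() {
    # Skip the header line, then count rows whose ST field matches $1.
    awk -v st="$1" 'NR > 1 && $5 == st { n++ } END { print n + 0 }'
}

# Typical use on a system with SLURM installed (not run here):
#   squeue | count_jobs_in_state F
```

Reading from standard input keeps the helper independent of any particular squeue option, so it can be applied to saved output as well as to a live queue.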
Terminating Jobs with the scancel Command
The scancel command cancels a pending or running job or job step. It can also be used to send a specified signal to all processes on all nodes associated with a job. Only job owners or administrators can cancel jobs.
Example
$ scancel 415
Example
$ scancel --state=PENDING
Example
$ scancel
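Because scancel acts immediately, it can help to print the command before running it. The following is a minimal sketch of that idea for the signal-sending case, assuming the scancel --signal option; build_signal_cmd is a hypothetical helper name, and the job IDs in the usage line are placeholders.

```shell
# Sketch: print (rather than run) the scancel command that would send a
# signal to one or more jobs. build_signal_cmd is a hypothetical helper.
build_signal_cmd() {
    # Require a signal name and at least one job ID.
    if [ "$#" -lt 2 ]; then
        echo "usage: build_signal_cmd SIGNAL JOBID..." >&2
        return 1
    fi
    sig="$1"
    shift
    echo "scancel --signal=$sig $*"
}

# Review the printed command, then run it yourself (placeholder job IDs):
#   build_signal_cmd TERM 421 422
```

The helper only builds the command line; whether the signal is delivered still depends on your being the job owner or an administrator.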
Getting System Information with the sinfo Command
The sinfo command reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options. sinfo displays a summary of available partition and node (not job) information, such as partition names, nodes/partition, and cores/node.
Example: Using the sinfo Command (No Options)
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf       up    infinite  3     down* n[0,5,8]
lsf       up    infinite  14    idle
The node STATE codes in these examples may be followed by an asterisk (*); this indicates that the reported node is not responding. See the sinfo(1) manpage for a complete listing and description of STATE codes.
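The asterisk convention can be used to pick out nodes that are not responding. The following is a minimal sketch, assuming the default six-column sinfo layout (PARTITION, AVAIL, TIMELIMIT, NODES, STATE, NODELIST); unresponsive_nodes is a hypothetical helper name.

```shell
# Sketch: print the NODELIST of every row whose STATE ends in an asterisk,
# reading sinfo-style output on stdin. Assumes STATE is the fifth column
# and NODELIST the sixth. unresponsive_nodes is a hypothetical helper.
unresponsive_nodes() {
    # Skip the header line, then match a trailing "*" in the STATE field.
    awk 'NR > 1 && $5 ~ /\*$/ { print $6 }'
}

# Typical use on a system with SLURM installed (not run here):
#   sinfo | unresponsive_nodes
```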
Example
$ sinfo
REASON          NODELIST
Memory errors   n[0,5]
Not Responding  n8
Job Accounting
HP XC System Software provides an extension to SLURM for job accounting. The sacct command displays job accounting data in a variety of forms for your analysis. Job accounting data is stored in a log file; the sacct command filters that log file to report on your jobs, job steps, status, and errors. See your system administrator if job accounting is not configured on your system.
By default, only the superuser is allowed to access the job accounting data. To grant all system users read access to this data, the superuser must change the permission of the jobacct.log file, as follows: