Example 9-2 Displaying Queued Jobs by Their JobIDs

$ squeue --jobs 12345,12346
JOBID PARTITION NAME USER ST TIME_USED NODES NODELIST(REASON)
12345 debug     job1 jody R  0:21      4     n[9-12]
12346 debug     job2 jody PD 0:00      8

The squeue command can report on jobs in the job queue according to their state. The possible states are pending, running, completing, completed, failed, timeout, and node_fail. Example 9-3 uses the squeue command to report on failed jobs.
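The ST column of squeue output abbreviates these states with short codes. The following sketch maps each code back to its state name. The function name state_name is hypothetical and not part of SLURM. Only R, PD, and F appear in the examples in this chapter; the remaining codes are assumed from the squeue(1) manpage and should be checked against your installation.

```shell
# Hypothetical helper (not part of SLURM): translate the short code in
# the ST column of squeue output into the corresponding state name.
# Codes other than R, PD, and F are assumptions; see squeue(1).
state_name() {
    case "$1" in
        PD) echo "PENDING" ;;
        R)  echo "RUNNING" ;;
        CG) echo "COMPLETING" ;;
        CD) echo "COMPLETED" ;;
        F)  echo "FAILED" ;;
        TO) echo "TIMEOUT" ;;
        NF) echo "NODE_FAIL" ;;
        *)  echo "UNKNOWN" ;;   # any code this sketch does not cover
    esac
}
```

A name returned by this helper could then be passed to the --state option, in the same way that Example 9-3 passes FAILED.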

Example 9-3 Reporting on Failed Jobs in the Queue

$ squeue --state=FAILED
JOBID PARTITION NAME     USER ST TIME NODES NODELIST(REASON)
59    amt1      hostname root F  0:00 0

9.5 Terminating Jobs with the scancel Command

The scancel command cancels a pending or running job or job step. It can also be used to send a specified signal to all processes on all nodes associated with a job. Only job owners or administrators can cancel jobs.

Example 9-4 terminates job #415 and all its job steps.

Example 9-4 Terminating a Job by Its JobID

$ scancel 415

Example 9-5 cancels all pending jobs.

Example 9-5 Cancelling All Pending Jobs

$ scancel --state=PENDING

Example 9-6 sends the TERM signal to terminate job steps 421.2 and 421.3.

Example 9-6 Sending a Signal to a Job

$ scancel --signal=TERM 421.2 421.3
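Because scancel acts immediately, it can be useful to preview a command before running it. The following is a minimal sketch of such a dry run. The function name preview_scancel is hypothetical and not part of SLURM, and the ID check assumes that job and job step IDs have the forms shown in the examples above (415 or 421.2). The function only prints the command; it never calls scancel.

```shell
# Hypothetical helper (not part of SLURM): print the scancel command
# that would send a signal to the given job or job step IDs.
# Usage: preview_scancel SIGNAL ID [ID ...]
preview_scancel() {
    sig="$1"
    shift
    for id in "$@"; do
        case "$id" in
            # Reject empty IDs, characters other than digits and dots,
            # a leading or trailing dot, and more than one dot.
            ''|*[!0-9.]*|.*|*.|*.*.*)
                echo "invalid job or job step ID: $id" >&2
                return 1 ;;
        esac
    done
    # Nothing is cancelled here; the command line is only displayed.
    echo "scancel --signal=$sig $*"
}
```

Calling preview_scancel TERM 421.2 421.3 prints the command line used in Example 9-6, which can then be reviewed and run by hand.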

9.6 Getting System Information with the sinfo Command

The sinfo command reports the state of partitions and nodes managed by SLURM. It offers a wide variety of filtering, sorting, and formatting options, and displays a summary of available partition and node (not job) information, such as partition names, the number of nodes per partition, and the number of cores per node.

Example 9-7 Using the sinfo Command (No Options)

$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf       up    infinite  3     down* n[0,5,8]
lsf       up    infinite  14    idle  n[1-4,6-7,9-16]

The node STATE codes in these examples may be followed by an asterisk (*), which indicates that the reported node is not responding. See the sinfo(1) manpage for a complete listing and description of STATE codes.
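The asterisk makes non-responding nodes easy to pick out with a text filter. The following sketch reads sinfo output on standard input and prints the NODELIST of each row whose STATE ends in an asterisk. The function name unresponsive_nodes is hypothetical, and the field positions assume the six-column layout shown in Example 9-7; a different output format would need different field numbers.

```shell
# Hypothetical helper (not part of SLURM): list the nodes that sinfo
# reports as not responding.
# Assumes the column layout of Example 9-7:
#   PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
# Intended use: sinfo | unresponsive_nodes
unresponsive_nodes() {
    # Skip the header line, then keep rows whose fifth field (STATE)
    # ends with "*" and print the sixth field (NODELIST).
    awk 'NR > 1 && $5 ~ /\*$/ { print $6 }'
}
```

Applied to the output in Example 9-7, this filter would print n[0,5,8].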
