HP XC System 3.x Software Getting Information About Jobs, Getting Job Allocation Information

Page 80

Getting Information About Jobs

There are several ways you can get information about a specific job after it has been submitted to LSF-HPC. This section briefly describes some of the commands that are available under LSF-HPC to gather information about a job. This section is not intended as complete information about this topic. It is intended only to give you an idea of the commands that are commonly used, and to describe any differences there may be in the way these commands operate in the HP XC environment. Refer to the LSF manpages for full information about the commands described in this section.

The following LSF commands are described in this section:

bjobs "Examining the Status of a Job"

bhist "Viewing the Historical Information for a Job"

Getting Job Allocation Information

Before a job runs, LSF-HPC allocates SLURM compute nodes based on job resource requirements. After LSF-HPC allocates nodes for a job, it attaches allocation information to the job. You can view job allocation information through the bjobs -land bhist -lcommands. Refer to the LSF manpages for details about using these commands.

A job allocation information string looks like the following:

slurm_id=slurm_jobid;ncpus=slurm_nprocs;slurm_alloc=node_list

This allocation string has the following values:

slurm_id

SLURM_JOBID environment variable. This is SLURM allocation ID (Associates LSF-HPC

 

job with SLURM allocated resources.)

ncpus

SLURM_NPROCS environment variable. This the actual number of allocated cores. Under

 

node-level allocation scheduling, this number may be bigger than what the job requests.)

slurm_alloc

Allocated node list (comma separated).

When LSF-HPC starts a job, LSF-HPC sets the SLURM_JOBID and SLURM_NPROCS environment variables.

Job Allocation Information for a Running Job

The following is an example of the output obtained using the bjobs -lcommand to obtain job allocation information about a running job:

$ bjobs -l 24

Job <24>, User <lsfadmin>, Project <default>, Status <RUN>, Queue <normal>, Interactive pseudo-terminal shell mode, Extsched <SLURM[nodes=4]>, Command </bin/bash>

date and time stamp: Submitted from host <n2>, CWD <$HOME>,

4 Processors Requested, Requested Resources <type=any>; date and time stamp: Started on 4 Hosts/Processors <4*lsfhost.localdomain>; date and time stamp: slurm_id=22;ncpus=8;slurm_alloc=n[5-8];

SCHEDULING PARAMETERS:

 

 

 

 

 

 

 

 

 

 

r15s

r1m

r15m

ut

pg

io

ls

it

tmp

swp

mem

loadSched

-

-

-

-

-

-

-

-

-

-

-

loadStop

-

-

-

-

-

-

-

-

-

-

-

EXTERNAL MESSAGES:

 

 

 

 

 

 

 

 

 

MSG_ID

FROM

 

POST_TIME

 

MESSAGE

 

 

ATTACHMENT

0

-

 

 

-

 

 

-

 

 

-

 

1

lsfadmin

date and time

stamp

SLURM[nodes=4]

N

In particular, note the node and job allocation information provided in the above output:

date and time stamp: Started on 4 Hosts/Processors <4*lsfhost.localdomain>;

date and time stamp: slurm_id=22;ncpus=8;slurm_alloc=n[5-8];

80 Using LSF

Image 80
Contents HP XC System Software Users Guide Page Table of Contents Submitting Jobs Configuring Your Environment with ModulefilesDeveloping Applications Tuning Applications Using SlurmUsing LSF Debugging ApplicationsGlossary 109 Index 115 Advanced TopicsExamples List of Figures Page List of Tables Determining the Node PlatformPage List of Examples Submitting a Job ScriptPage About This Document Intended AudienceDocument Organization This document is organized as followsHP XC Information Supplementary Information $ man lsfcommandnameFor More Information Related Information Manpages$ man discover $ man 8 discover $ man -k keywordRelated Linux Web Sites Related MPI Web SitesRelated Compiler Web Sites Additional PublicationsTypographic Conventions HP Encourages Your CommentsEnvironment Variable User inputOverview of the User Environment System ArchitectureHP XC System Software Operating SystemNode Specialization Storage and I/OFile System SAN StorageLocal Storage File System LayoutNetwork Address Translation NAT Determining System Configuration InformationSystem Interconnect Network Modules CommandsUser Environment Application Development Environment Run-Time EnvironmentParallel Applications Serial ApplicationsHow LSF-HPC and Slurm Interact Load Sharing Facility LSF-HPCStandard LSF Mpirun command Components, Tools, Compilers, Libraries, and DebuggersUsing the System LVS Login RoutingUsing the Secure Shell to Log Logging In to the SystemGetting Information About Resources IntroductionGetting Information About Queues Performing Other Common User Tasks Getting System Help and Information $ man sinfoConfiguring Your Environment with Modulefiles Overview of ModulesSupplied Modulefiles Modulefiles Automatically Loaded on the System Viewing Available ModulefilesViewing Loaded Modulefiles Loading a ModulefileAutomatically Loading a Modulefile at Login Unloading a ModulefileModulefile Conflicts Loading a Modulefile for the Current SessionCreating a Modulefile Viewing Modulefile-Specific Help$ module load modules $ man modulefile $ module help totalviewPage Compilers Developing ApplicationsApplication Development Environment Overview Examining Nodes and Partitions Before Running Jobs Interrupting a JobMPI Compiler Partition Avail Timelimit Nodes State NodelistSetting Debugging Options Developing Serial ApplicationsSerial Application Build Environment Building Serial ApplicationsDeveloping Parallel Applications Parallel Application Build EnvironmentModulefiles OpenMPPthreads Quadrics ShmemMPI Library Intel Fortran and C/C++CompilersBuilding Parallel Applications Examples of Compiling and Linking HP-MPI Applications Developing LibrariesDesigning Libraries for the CP4000 Platform To build a 64-bit application, you might enter Linkcommand 32-bit -L/opt/mypackage/lib/i686 -lmystuffLinkcommand 64-bit -L/opt/mypackage/lib/x8664 -lmystuff ExtSLURMslurm-arguments Submitting JobsOverview of Job Submission Submitting a Serial Job Using Standard LSF Submitting a Serial Job Using LSF-HPCSubmitting a Serial Job with the LSF bsub Command $ bsub hostnameSubmitting a Serial Job Through Slurm only Submitting a Non-MPI Parallel Job $ bsub -n4 -I srun hostnameBsub -nnum-procsbsub-optionsmpijob Mpirun mpirun--options-srunsrun-optionsmpi-jobnameBsub -nnum-procs bsub-optionsscript-name Submitting a Batch Job or Job Script$ bsub -n4 -I mpirun -srun ./helloworld $ cat myscript.sh #!/bin/sh Srun hostname mpirun -srun hellompi$ bsub -I -n4 Myscript.sh $ bsub -n4 -ext SLURMnodes=4 -I ./myscript.sh$ cat ./envscript.sh #!/bin/sh name=`hostname` Running Preexecution Programs$ bsub -n4 -I ./myscript.sh Opt/hptc/bin/srun Mypreexec Debugging Applications Debugging Serial ApplicationsDebugging Parallel Applications TotalViewSetting Up TotalView Using TotalView with SlurmSSH and TotalView Module load mpimodule load totalviewDebugging an Application Using TotalView with LSF-HPCSetting TotalView Preferences Sourcefile initfdte.f was not found, using assembler mode Debugging Running ApplicationsDirectories in File ⇒ Search Path $ mpirun -srun -n2 Psimple$ squeue $ scancel --user usernameExiting TotalView Page Building a Program Intel Trace Collector and HP-MPI Tuning ApplicationsUsing the Intel Trace Collector and Intel Trace Analyzer Running a Program Intel Trace Collector and HP-MPI Visualizing Data Intel Trace Analyzer and HP-MPILibs CldflagsUsing the Intel Trace Collector and Intel Trace Analyzer Page Using Slurm Launching Jobs with the srun CommandSrun Squeue Scancel Sinfo Scontrol Introduction to SlurmMonitoring Jobs with the squeue Command Using the srun Command with HP-MPIUsing the srun Command with LSF-HPC Srun Roles and ModesJob Accounting Terminating Jobs with the scancel CommandGetting System Information with the sinfo Command # chmod a+r /hptccluster/slurm/job/jobacct.log Fault ToleranceSecurity Using LSF-HPC Using LSFUsing Standard LSF on an HP XC System Introduction to LSF-HPC in the HP XC Environment Overview of LSF-HPCDifferences Between LSF-HPC and Standard LSF HostnameResources Hostname Status JL/U MAX Njobs RUN Ssusp Ususp RSVUnknown Unknown Job Terminology$ ssh n15 lshosts HP XCCompute Node Resource Support SLURMnodelist =nodelist if specified$ bsub -n 10 -I srun hostname $ bsub -n 10 -ext SLURMnodes=10 -I srun hostname$ bsub -n 10 -ext SLURMnodes=10exclude=n16 -I srun hostname $ bsub -n 10 -ext SLURMconstraint=dualcore -I srun hostnameHow LSF-HPC and Slurm Launch and Manage a Job $ bsub -n4 -ext SLURMnodes=4 -o output.out ./myscriptJob Startup and Job Control #!/bin/sh hostname srun hostname Mpirun -srun ./hellompiDetermining the LSF Execution Host Determining Available LSF-HPC System ResourcesGetting the Status of LSF-HPC Getting Information About LSF Execution Host NodeGetting Host Load Information Examining LSF-HPC System QueuesGetting Information About the lsf Partition SLINUX6 $ sinfo -p lsf -lNe Summary of the LSF bsub Command Format $ sinfo -p lsf LSF-SLURM External Scheduler For information about running scriptsBsub -n num-procs-ext SLURMslurm-arguments \ Bsub-options srun srun-optionsjobname job-optionsSubmitting a Job from a Non-HP XC Host Starting on lsfhost.localdomain n6Waiting for dispatch ... Starting on lsfhost.localdomain n1 Type=SLINUX64Getting Information About Jobs Getting Job Allocation InformationSlurmid=slurmjobidncpus=slurmnprocsslurmalloc=nodelist $ bjobs -lExamining the Status of a Job $ bhist -lTime stamp $ bjobsViewing the Historical Information for a Job $ bhistSummary of time in seconds spent Various States Jobid User Jobname Pend Psusp RUN Ususp Ssusp Unkwn TotalTranslating Slurm and LSF-HPC JOBIDs Working Interactively Within an LSF-HPC Allocation $ bsub -I -n4 -ext SLURMnodes=4 /bin/bash$ bjobs -l 124 grep slurm $ srun --jobid=150 hostname$ unset Slurmjobid Alternatively, you can use the following$ export SLURMJOBID=150 $ export SLURMNPROCS=4 $ unset Slurmjobid $ unset SlurmnprocsJob 125 is submitted to the default queue normal LSF-HPC Equivalents of Slurm srun Options$ srun --jobid=250 uptime $ bsub -n4 -ext SLURMnodes=4 -o %J.out sleepBsub -iinputfile Mpi=mpitype Quit-on-interrupt Page Advanced Topics Enabling Remote Execution with OpenSSHRunning an X Terminal Session from a Remote Node Determining IP Address of Your Local MachineLogging in to HP XC System Running an X terminal Session Using SlurmRunning an X terminal Session Using LSF-HPC Using the GNU Parallel Make Capability $ bsub -n4 -Ip srun -n1 xterm -display$ srun -n4 hostname n46 $ srun -n2 hostname n46$ cd subdir srun -n1 -N1 $MAKE -j4 Example Procedure $ make PREFIX=’srun -n1 -N1 MAKEJ=-j4Local Disks on Compute Nodes Performance ConsiderationsModified Makefile is invoked as follows $ make PREFIX=srun -n1 -N1 MAKEJ=-j4Communication Between Nodes Shared File ViewPrivate File View Fp = fopen myfile, a+Page Appendix a Examples Building and Running a Serial ApplicationLaunching a Serial Interactive Shell Through LSF-HPC Examine the LSF execution host informationRunning LSF-HPC Jobs with a Slurm Allocation Request Launching a Parallel Interactive Shell Through LSF-HPC Example 2. Four cores on Two Specific NodesR15s r1m r15m It tmp swp mem LoadSched LoadStop SLURMnodes=2124 Lsfad Examine the the running jobs information$ hostname n16 $ srun hostname n5 $ bjobs Submitting a Simple Job Script with LSF-HPC Examine the the finished jobs informationShow the environment Display the scriptRun some commands from the pseudo-terminal Submitting an Interactive Job with LSF-HPCSubmit the job Show the job allocationSubmitting an HP-MPI Job with LSF-HPC Exit the pseudo-terminalView the interactive jobs View the finished jobsLsfhost.localdomai View the running jobView the finished job $ bsub -n 8 -R ALPHA5 SLINUX64 \ -ext SLURMnodes=4-4 myjob Using a Resource Requirements String in an LSF-HPC CommandStates by date and time 108 Glossary To the queue First-come See Fcfs First-served Global storageAs local storage Are not appropriate for replicationSingle command Login requests and directs them to a node with a login roleLinux Virtual See LVS Server Load file LSF master hostNetwork See NIS Information Services Remotely. PXE booting is configured at the Bios levelNotably to install and remove software packages Slurm backupSymmetric See SMP Multiprocessing Power available per unit of spaceSsh 114 Index Index PGI Utilities, 63 Slurm commands