HP XC System 3.x Software Getting Host Load Information, Examining LSF-HPC System Queues, SLINUX6


The following example shows the output from the lshosts command:

$ lshosts
HOST_NAME    type     model     cpuf ncpus maxmem maxswp server RESOURCES
lsfhost.loc  SLINUX6  Itanium2  16.0    12  3456M      -    Yes (slurm)
n7           UNKNOWN  UNKNOWN_   1.0     -      -      -     No ()
n8           UNKNOWN  UNKNOWN_   1.0     -      -      -     No ()
n2           UNKNOWN  UNKNOWN_   1.0     -      -      -     No ()

Of note in the lshosts output:

The HOST_NAME column displays the name of the LSF execution host, lsfhost.localdomain, and any other HP XC nodes that have been granted a floating client license because LSF commands were executed on them. LSF-HPC does not know about these floating client hosts, so they are listed with UNKNOWN types and models.

The type column displays the type of resource. This value is SLINUX64 for all HP XC systems; it appears truncated as SLINUX6 in the example output.

The ncpus column displays the total core count (usable cores) of all available compute nodes in the lsf partition.

The maxmem column displays the minimum maxmem over all available compute nodes in the lsf partition.

The maxtmp column (not shown) displays the minimum maxtmp over all available compute nodes in the lsf partition. Use the lshosts -l command to display this column.
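The column notes above can be applied mechanically when scripting around lshosts. The following is a minimal sketch, not part of the HP XC software: the function names lsf_server_hosts, lsf_floating_clients, and sample_lshosts are hypothetical, and the field positions assume the nine-column layout of the example output (type in field 2, server in field 8). The sample data is copied from that example so the sketch runs without an LSF installation.

```shell
#!/bin/sh
# Print the names of hosts whose "server" column (field 8) is "Yes".
# Assumption: input uses the nine-column lshosts layout shown in the example.
lsf_server_hosts() {
    awk 'NR > 1 && $8 == "Yes" { print $1 }'
}

# Print the names of hosts listed with type UNKNOWN (field 2), which the
# example output shows for floating client hosts.
lsf_floating_clients() {
    awk 'NR > 1 && $2 == "UNKNOWN" { print $1 }'
}

# Sample input copied from the lshosts example output.
sample_lshosts() {
    cat <<'EOF'
HOST_NAME    type     model     cpuf ncpus maxmem maxswp server RESOURCES
lsfhost.loc  SLINUX6  Itanium2  16.0    12  3456M      -    Yes (slurm)
n7           UNKNOWN  UNKNOWN_   1.0     -      -      -     No ()
n8           UNKNOWN  UNKNOWN_   1.0     -      -      -     No ()
n2           UNKNOWN  UNKNOWN_   1.0     -      -      -     No ()
EOF
}

echo "Execution host:"
sample_lshosts | lsf_server_hosts
echo "Floating clients:"
sample_lshosts | lsf_floating_clients
```

On a live system, the sample function would be replaced by the real command, for example lshosts | lsf_server_hosts.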

Getting Host Load Information

The LSF lsload command displays load information for LSF execution hosts.

$ lsload
HOST_NAME    status  r15s  r1m  r15m  ut  pg  ls  it  tmp  swp  mem
lsfhost.loc  ok         -    -     -   -   -   4   -    -    -    -

In the previous example output, the LSF execution host (lsfhost.localdomain) is listed under the HOST_NAME column. The status is listed as ok, indicating that it can accept remote jobs. The ls column shows the number of current login users on this host.
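The status and ls columns described above can be pulled out of lsload output with a short filter. This is an illustrative sketch only: the function names lsload_ok_logins and sample_lsload are hypothetical, and the field positions (status in field 2, ls in field 8) assume the twelve-column layout of the example output. The sample data is copied from that example.

```shell
#!/bin/sh
# For each host whose status (field 2) is "ok", print the host name and
# the value of the ls column (field 8, current login users).
# Assumption: input uses the twelve-column lsload layout shown in the example.
lsload_ok_logins() {
    awk 'NR > 1 && $2 == "ok" { print $1, $8 }'
}

# Sample input copied from the lsload example output.
sample_lsload() {
    cat <<'EOF'
HOST_NAME    status  r15s  r1m  r15m  ut  pg  ls  it  tmp  swp  mem
lsfhost.loc  ok         -    -     -   -   -   4   -    -    -    -
EOF
}

sample_lsload | lsload_ok_logins
```

With the sample data this prints lsfhost.loc 4. On a live system, pipe the real command instead: lsload | lsload_ok_logins.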

See the OUTPUT section of the lsload manpage for further information about the output of this example. In addition, refer to the Platform Computing Corporation LSF documentation and the lsload(1) manpage for more information about the features of this command.

For individual compute node load information, see the discussion on metrics in shownode(1).

Examining LSF-HPC System Queues

All jobs on the HP XC system that are submitted to LSF-HPC are placed into an LSF-HPC job queue. HP recommends that you examine the status and availability of LSF-HPC system queues before launching a job so that you can select the most appropriate queue for your job.

You can easily verify the status, limits, and configurations of LSF-HPC queues with the bqueues command. This command is fully described in Platform Computing Corporation's LSF documentation and manpages.

See bsub(1) for more information on submitting jobs to specific queues.

For more information on the bqueues command, see bqueues(1).
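As a rough illustration of checking queue availability before submitting a job, the following sketch filters bqueues summary output for queues that are currently accepting and dispatching jobs. It rests on assumptions that are not stated in this section: that the summary output has a header line followed by one line per queue, that the STATUS value is in the third field, and that an available queue is reported as Open:Active. Check bqueues(1) on your system before relying on it. The function name open_active_queues is hypothetical.

```shell
#!/bin/sh
# Print the names of queues whose STATUS field (assumed to be field 3)
# is "Open:Active". Reads bqueues summary output on standard input.
# Assumption: first line is a header; one queue per following line.
open_active_queues() {
    awk 'NR > 1 && $3 == "Open:Active" { print $1 }'
}

# Intended use on a system with LSF-HPC installed:
#   bqueues | open_active_queues
```

A queue name selected this way can then be given to bsub with its -q option; see bsub(1).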

Getting Information About the lsf Partition

Information about the SLURM lsf compute node partition can be viewed with the SLURM sinfo command. A partition is one or more compute nodes that have been grouped together. A SLURM lsf partition is created when the HP XC system is installed. This partition contains the resources that will be managed by LSF-HPC and available for jobs submitted to LSF-HPC.

The sinfo command reports the state of the lsf partition and all other partitions on the system. It displays a summary of available partition and node information (such as partition names, nodes per partition, and cores per node), and it has a wide variety of filtering, sorting, and formatting options.

The following example shows the use of the sinfo command to obtain lsf partition information:
