Preemption

LSF-HPC uses the SLURM "node share" feature to facilitate preemption. When a low-priority job is preempted, its processes are suspended on the allocated nodes, and LSF-HPC places the high-priority job on the same nodes. After the high-priority job completes, LSF-HPC resumes the suspended low-priority jobs.
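
For example, assuming the administrator has configured a preemptive high-priority queue (the queue name priority below is illustrative, not an HP XC default), preemption can be observed as follows:

$ bsub -q priority -n8 -I srun hostname
$ bjobs -u all

While the high-priority job runs, the preempted low-priority job is reported in the SSUSP state by bjobs; it returns to the RUN state after the high-priority job finishes.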

Determining the LSF Execution Host

The lsid command displays the name of the HP XC system and the name of the LSF execution host, along with some general LSF-HPC information.

$ lsid
Platform LSF HPC 6.1 for SLURM, date and time stamp
Copyright 1992-2005 Platform Computing Corporation
My cluster name is hptclsf
My master name is lsfhost.localdomain

In this example, hptclsf is the LSF cluster name (the cluster where the user is logged in and which contains the compute nodes), and lsfhost.localdomain is the virtual IP name of the node where LSF-HPC is installed and runs (the LSF execution host).

Determining Available LSF-HPC System Resources

Before launching an application, it is useful to know what system resources are available for your use. This section describes how to obtain information about system resources such as the number of available cores, LSF execution host node information, and LSF-HPC system queues.

Getting the Status of LSF-HPC

The bhosts command displays LSF-HPC resource usage information and is useful for examining the status of the system's cores. It provides a summary of the jobs on the system and information about the current state of LSF-HPC; for example, you can use it to determine whether LSF-HPC is ready to start accepting batch jobs.

LSF-HPC daemons run on only one node in the HP XC system, so the bhosts command will list one host, which represents all the resources of the HP XC system. The total number of cores for that host should be equal to the total number of cores assigned to the SLURM lsf partition.
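
You can cross-check this core count against the SLURM lsf partition itself; the following sketch assumes the default partition name lsf used elsewhere in this guide:

$ sinfo -p lsf
$ sinfo -p lsf -lNe

The first command summarizes the partition; the second lists each node and its CPU count, whose total should match the MAX column reported by bhosts.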

By default, this command returns the host name, host status, and job state statistics.

The following example shows the output from the bhosts command:

$ bhosts
HOST_NAME            STATUS    JL/U    MAX   NJOBS   RUN   SSUSP   USUSP   RSV
lsfhost.localdomain  ok        -       16    0       0     0       0       0

Of note in the bhosts output:

The HOST_NAME column displays the name of the LSF execution host.

The MAX column displays the total core count (usable cores) of all available compute nodes in the lsf partition.

The STATUS column shows the state of LSF-HPC and displays a status of either ok or closed.

The NJOBS column displays the number of jobs. Note that in LSF terminology, a parallel job with 10 tasks counts as 10 jobs.
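
If you need more detail than this default summary, the long format adds per-host load and threshold information (a brief sketch; the host name is the execution host shown previously, and the full output is omitted here):

$ bhosts -l lsfhost.localdomain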

Getting Information About LSF Execution Host Node

The lshosts command displays resource information about the LSF-HPC cluster. This command is useful for verifying machine-specific information.

LSF-HPC daemons run on only one node in the HP XC system, so the lshosts command will list one host, which represents all the resources assigned to it by the HP XC system. The total number of cores for that host should be equal to the total number of cores assigned to the SLURM lsf partition.

By default, lshosts returns the following information: host name, host type, host model, core factor, number of cores, total memory, total swap space, server information, and static resources.
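
For example, you might run lshosts with no arguments to see the default summary, or add -l for a longer listing with additional static resource detail (output varies by system and is omitted here):

$ lshosts
$ lshosts -l lsfhost.localdomain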
