HP XC System 3.x Software manual Using Standard LSF on an HP XC System, Using LSF-HPC

9 Using LSF

The Load Sharing Facility (LSF) from Platform Computing Corporation is a batch system resource manager used on the HP XC system. LSF is an integral part of the HP XC environment.

On an HP XC system, a job is submitted to LSF, which places the job in a queue and allows it to run when the necessary resources become available. In addition to launching jobs, LSF provides extensive job management and information capabilities. LSF schedules, launches, controls, and tracks jobs that are submitted to it according to the policies established by the HP XC site administrator.

Two types of LSF are available for installation on the HP XC system:

Standard LSF

This product is the popular batch system produced by Platform Computing Corporation that has become an industry standard.

For full information about LSF, refer to the standard LSF documentation set, which is described in the "Related Information" section of this manual. LSF manpages are also available online on the HP XC system.

LSF-HPC integrated with SLURM (LSF-HPC)

This product is the High Performance LSF product from Platform Computing Corporation and has been integrated with SLURM to take advantage of SLURM's scalable, efficient resource management and parallel job support.

The intended primary use of the HP XC system determined which of these LSF products was installed.

Using LSF-HPC (page 67) describes the extended parallel support capabilities available to the user when LSF-HPC is installed on the HP XC system.

Either Standard LSF or LSF-HPC is installed on the HP XC system, never both. The system administrator installs one or the other during system installation.

This chapter addresses the following topics:

Using Standard LSF on an HP XC System (page 67)

Using LSF-HPC (page 67)

Using Standard LSF on an HP XC System

Information about using Standard LSF is provided in the LSF documentation. This documentation is on the HP XC installation disk, and the LSF manpages are available online on the system.

Standard LSF is installed and configured on all nodes of the HP XC system by default. Nodes without the compute role are closed with '0' job slots available for use.
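The closed, zero-slot nodes described above can be picked out of bhosts output with a small filter. The following is a minimal sketch, assuming the default bhosts column order (HOST_NAME, STATUS, JL/U, MAX, and so on, where MAX is the job slot limit); closed_hosts is a hypothetical helper name and is not part of LSF.

```shell
#!/bin/sh
# Sketch: list hosts that are closed and have a MAX (job slot limit) of 0.
# Assumes the default bhosts column order:
#   HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
# closed_hosts is a hypothetical helper; it reads bhosts-style text on stdin.
closed_hosts() {
    # NR > 1 skips the header line; $2 is STATUS, $4 is MAX, $1 is the host name.
    awk 'NR > 1 && $2 ~ /^closed/ && $4 == "0" { print $1 }'
}
```

On a system where LSF is set up, the filter would be used as `bhosts | closed_hosts`. Matching on the prefix "closed" is a deliberate choice so that longer status strings beginning with that word are also caught.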

The LSF environment is set up automatically for the user on login; LSF commands and their manpages are readily accessible:

The bhosts command is useful for viewing LSF batch host information.

The lshosts command provides static resource information.

The lsload command provides dynamic resource information.

The bsub command is used to submit jobs to LSF.

The bjobs command provides information on batch jobs.
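As a quick check that the LSF environment was set up at login, the following sketch reports which of the commands listed above are found in the current session. The function name check_lsf_commands is hypothetical; only the five command names come from this section.

```shell
#!/bin/sh
# Sketch: report which of the LSF commands named in this section are
# available in the current session. check_lsf_commands is a hypothetical
# helper name, not part of LSF.
check_lsf_commands() {
    for cmd in bhosts lshosts lsload bsub bjobs; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd: found"
        else
            echo "$cmd: not found"
        fi
    done
}

check_lsf_commands
```

If every command is reported as found, a simple job can be submitted with, for example, `bsub hostname` and then inspected with `bjobs`.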

For more information on using Standard LSF on the HP XC system, see the Platform LSF documentation available on the HP XC documentation disk.

Using LSF-HPC

On an HP XC system, a job is submitted to LSF-HPC, which places the job in a queue and allows it to run when the necessary resources become available. In addition to launching jobs, LSF-HPC provides extensive
