
An LSF-HPC node allocation (compute nodes) is created by the -n num-procs parameter, which specifies the number of cores the job requests. The num-procs parameter may be expressed as minprocs[,maxprocs], where minprocs specifies the minimum number of cores and the optional value maxprocs specifies the maximum number of cores. Refer to "Submitting a Non-MPI Parallel Job" for information about running jobs. Refer to "Submitting a Batch Job or Job Script" for information about running scripts.
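For example, a job script could be submitted to a four-core allocation and run interactively as follows (myscript.sh is a placeholder for your own script):

$ bsub -n4 -I ./myscript.sh    # myscript.sh: placeholder for your own job script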

bsub -n num-procs [bsub-options] srun [srun-options] jobname [job-arguments]

This is the bsub command format to submit a parallel job to an LSF-HPC node allocation (compute nodes). An LSF-HPC node allocation is created by the -n num-procs parameter, which specifies the minimum number of cores the job requests. The num-procs parameter may be expressed as minprocs[,maxprocs], where minprocs specifies the minimum number of cores and the optional value maxprocs specifies the maximum number of cores. An srun command is required to run jobs on an LSF-HPC node allocation. Refer to "Submitting a Non-MPI Parallel Job".

bsub -n num-procs [bsub-options] mpirun [mpirun-options] \
-srun [srun-options] mpi-jobname [job-options]

This is the bsub command format to submit an HP-MPI job. The -srun option is required. Refer to "Submitting a Parallel Job That Uses the HP-MPI Message Passing Interface".
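For example, an HP-MPI program can be launched on four cores as follows (./hello_world is a placeholder for your own HP-MPI executable):

$ bsub -n4 -I mpirun -srun ./hello_world    # hello_world: placeholder MPI program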

bsub -n num-procs -ext "SLURM[slurm-arguments]" \
[bsub-options] [srun [srun-options]] jobname [job-options]

This is the bsub command format to submit a parallel job to an LSF-HPC node allocation (compute nodes) using the external scheduler option. The external scheduler option provides additional capabilities at the job level and queue level by allowing the inclusion of several SLURM options in the LSF-HPC command line. Refer to "LSF-SLURM External Scheduler".

LSF-SLURM External Scheduler

An important option that can be included when submitting parallel jobs with LSF-HPC is the external scheduler option. It provides application-specific external scheduling capabilities and enables the inclusion of several SLURM options in the LSF command line. For example, this option can be used to run one task per node when a resource-intensive job needs sole access to the full resources of each node. If a job needs particular resources found only on a specific set of nodes, this option can be used to submit the job to those specific nodes. Several arguments are available for use with the external scheduler; refer to the list in this section.
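For example, the following command requests 10 cores distributed across 10 nodes, so that each node runs one task; srun hostname stands in for the actual application:

$ bsub -n 10 -ext "SLURM[nodes=10]" -I srun hostname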

The format for the external scheduler is:

-ext"SLURM[slurm-arguments]"

The slurm-arguments can consist of one or more of the following srun options, separated by semicolons:

SLURM Arguments             Function

nodes=min[-max]             Minimum and maximum number of nodes allocated to the job. The job
                            allocation will contain at least the minimum number of nodes.

mincpus=<ncpus>             Specify the minimum number of cores per node. The default value is 1.

mem=<value>                 Specify a minimum amount of real memory on each node.

tmp=<value>                 Specify a minimum amount of temporary disk space on each node.

constraint=<value>          Specify a list of constraints. The list may include multiple features
                            separated by "&" or "|". "&" represents AND-ed; "|" represents OR-ed.

nodelist=<list of nodes>    Request a specific list of nodes. The job allocation will contain at
                            least these nodes. The list may be specified as a comma-separated
                            list of nodes or a range of nodes.

exclude=<list of nodes>     Request that a specific list of hosts not be included in the resources
                            allocated to this job. The list may be specified as a comma-separated
                            list of nodes or a range of nodes.

contiguous=yes              Request a mandatory contiguous range of nodes.
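For example, the following commands illustrate the exclude and constraint arguments (the node name n16 and the dualcore feature are sample values; substitute node names and features defined on your system):

$ bsub -n 10 -ext "SLURM[nodes=10;exclude=n16]" -I srun hostname
$ bsub -n 10 -ext "SLURM[constraint=dualcore]" -I srun hostname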

When this option is added to an LSF command line, it looks like the following:

bsub -n num-procs -ext "SLURM[slurm-arguments]" [bsub-options] [srun \
[srun-options]] jobname [job-options]
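For example, the following command submits a job script to four cores on four nodes and writes the job output to a file (./myscript.sh and output.out are placeholder names):

$ bsub -n4 -ext "SLURM[nodes=4]" -o output.out ./myscript.sh    # placeholder script and output file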
