HP XC System 3.x Software manual


exclude=list-of-nodes

contiguous=yes

The srun(1) manpage provides details on these options and their arguments.

The following are interactive examples showing how these options can be used on an HP XC system.

To launch the hostname command on 10 cores in parallel:

$ bsub -n 10 -I srun hostname

To launch the hostname command on 10 nodes in parallel:

$ bsub -n 10 -ext "SLURM[nodes=10]" -I srun hostname

To launch the hostname command on 10 nodes in parallel, but avoiding node n16:

$ bsub -n 10 -ext "SLURM[nodes=10;exclude=n16]" -I srun hostname

To launch the hostname command on 10 cores on nodes with a dualcore SLURM feature assigned to them:

$ bsub -n 10 -ext "SLURM[constraint=dualcore]" -I srun hostname

To launch the hostname command once on nodes n1 through n10 (n[1-10]):

$ bsub -n 10 -ext "SLURM[nodelist=n[1-10]]" srun hostname

To determine the external SLURM scheduler options that apply to jobs submitted to the LSF dualcore queue:

$ bqueues -l dualcore | grep SLURM

MANDATORY_EXTSCHED: SLURM[constraint=dualcore]
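The examples above combine several options inside one SLURM[...] string, separated by semicolons. The following sketch shows one way a script might assemble that string. The slurm_ext function is a hypothetical helper introduced only for illustration; it is not part of LSF-HPC or SLURM. The script prints the bsub command rather than submitting it, so it can be tried on a host without LSF-HPC.

```shell
#!/bin/sh
# Sketch only: slurm_ext is a hypothetical helper, not an LSF-HPC or SLURM command.
# It joins its option=value arguments with ';' and wraps them as SLURM[...].
slurm_ext() {
    joined=""
    for opt in "$@"; do
        if [ -z "$joined" ]; then
            joined="$opt"
        else
            joined="$joined;$opt"
        fi
    done
    printf 'SLURM[%s]\n' "$joined"
}

# Dry run: print the bsub command instead of submitting the job.
ext=$(slurm_ext nodes=10 exclude=n16)
echo "bsub -n 10 -ext \"$ext\" -I srun hostname"
```

Quoting the string, as the examples above do, keeps the shell from interpreting the square brackets and the semicolon.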

Notes on LSF-HPC

The following are noteworthy items for users of LSF-HPC on HP XC systems:

A SLURM partition named lsf is used to manage LSF-HPC jobs. You can view information about this partition with the sinfo command.

LSF-HPC daemons run on only one node in the HP XC system. As a result, the lshosts and bhosts commands list only one host, which represents all the resources of the HP XC system.

The total number of cores listed by the lshosts and bhosts commands for that host should be equal to the total number of cores assigned to the SLURM lsf partition.

When a job is submitted and the resources are available, LSF-HPC creates a properly sized SLURM allocation and adds several standard LSF environment variables to the environment in which the job is to be run. The following two environment variables are also added:

SLURM_JOBID This environment variable is created so that subsequent srun commands make use of the SLURM allocation created by LSF-HPC for the job. This variable can be used by a job script to query information about the SLURM allocation, as shown here:

$ squeue --jobs $SLURM_JOBID

"Translating SLURM and LSF-HPC JOBIDs" describes the relationship between the SLURM_JOBID and the LSF-HPC JOBID.

SLURM_NPROCS This environment variable passes along the total number of tasks requested with the bsub -n command to all subsequent srun commands. User scripts can override this value with the srun -n command, but the new value must be less than or equal to the original number of requested tasks.

Use the bjobs -l and bhist -l LSF commands to see the components of the actual SLURM allocation command.

Use the bkill command to kill jobs.
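The SLURM_NPROCS rule described above (a value passed to srun -n must not exceed the number of tasks originally requested) can be enforced inside a job script. The following is a minimal sketch, not a prescribed method. The pick_ntasks function is a hypothetical helper, and both the fallback to 1 when SLURM_NPROCS is unset and the choice to cap the value rather than fail are assumptions made for illustration. The script prints the srun command instead of running it.

```shell
#!/bin/sh
# Sketch only: pick_ntasks is a hypothetical helper for use inside a job script.
# It prints the task count to pass to "srun -n": the requested value if it
# does not exceed SLURM_NPROCS, otherwise SLURM_NPROCS itself.
pick_ntasks() {
    requested=$1
    limit=${SLURM_NPROCS:-1}   # assumption: fall back to 1 outside an LSF-HPC job
    if [ "$requested" -le "$limit" ]; then
        echo "$requested"
    else
        echo "$limit"
    fi
}

# Inside a job submitted with "bsub -n 10", SLURM_NPROCS would be 10.
ntasks=$(pick_ntasks 4)
echo "srun -n $ntasks hostname"
```

Capping the value silently is only one option; a script could instead print an error and exit when the requested count is too large.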
