5 Submitting Jobs

This chapter describes how to submit jobs on the HP XC system; it addresses the following topics:

Overview of Job Submission

Submitting a Serial Job Using Standard LSF

Submitting a Serial Job Using LSF-HPC

Submitting a Non-MPI Parallel Job

Submitting a Parallel Job That Uses the HP-MPI Message Passing Interface

Submitting a Batch Job or Job Script

Running Preexecution Programs

Overview of Job Submission

On an HP XC system, a job is submitted to LSF-HPC, which places the job in a queue and allows it to run when the necessary resources become available. The LSF bsub command is the primary method for submitting jobs on the HP XC system.
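For example, the following command (a minimal illustration using options described later in this chapter) submits the hostname command; the -I option makes the job interactive, so its output is displayed on your terminal:

$ bsub -I hostname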

"Summary of the LSF bsub Command Format" provides a summary of the LSF bsub command format.

The format of the bsub command depends on the type of the job, as listed here:

Serial job; that is, a job that runs on a single core

Non-MPI parallel job

HP-MPI parallel job

Batch job or job script

The remaining sections describe how to submit a job for each of these job types.
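As a preview, the general command formats for these job types are summarized here; this outline is reconstructed from the examples in the sections that follow, which provide the authoritative syntax:

Serial job:
$ bsub [bsub-options] jobname

Non-MPI parallel job:
$ bsub -n num-procs [bsub-options] srun [srun-options] jobname

HP-MPI parallel job:
$ bsub -n num-procs [bsub-options] mpirun [mpirun-options] -srun mpi-jobname

Batch job or job script:
$ bsub -n num-procs [bsub-options] script-name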

The examples in this section submit the hostname command or a variation of a "hello, world" program. Most examples are run as interactive jobs to display the output.

The examples in this chapter are run on an HP XC system configuration in which lsfhost.localdomain is the virtual IP name of the LSF execution host and nodes n[1-16] are compute nodes in the lsf partition. Each node contains two cores, providing 32 cores for use by LSF-HPC jobs.
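On such a configuration, you can confirm which nodes belong to the lsf partition with the SLURM sinfo command (shown here as an illustration; see "Getting Information About the lsf Partition" for details):

$ sinfo -p lsf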

Some examples, particularly those that launch parallel jobs, employ the LSF-SLURM External Scheduler to pass SLURM options that specify, for example, the minimum number of nodes required for the job or the specific nodes on which the job must run. The format of this option is shown here:

-ext"SLURM[slurm-arguments]"

The slurm-arguments can consist of one or more srun allocation options (in long format).
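For example, the following command (a representative sketch) uses the external scheduler option to request four cores distributed one per node across four nodes, and runs the hostname command interactively on that allocation:

$ bsub -n4 -ext "SLURM[nodes=4]" -I srun hostname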

Refer to "LSF-SLURM External Scheduler" for additional information about using the LSF-SLURM external scheduler. The Platform Computing LSF documentation provides more information on general external scheduler support. Also see the lsf_diff(1) manpage for information on the specific srun options available in the external LSF-SLURM scheduler.

Note

The SLURM external scheduler is a plug-in developed by Platform Computing Corporation for LSF-HPC; it is not actually part of SLURM. This plug-in communicates with SLURM to gather resource information and request allocations of nodes, but it is integrated with the LSF-HPC scheduler.
