
All HP XC nodes are dynamically configured as “LSF Floating Client Hosts” so that you can execute LSF commands from any HP XC node. When you execute an LSF command from an HP XC node, an entry in the output of the lshosts command confirms that the node is licensed to run LSF commands.

In the following example, node n15 is configured as an LSF Client Host, not the LSF execution host. This is evident in the output when the lshosts command is run on that node: the values for type and model are UNKNOWN, and the value for server is No.

$ lshosts
HOST_NAME      type     model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
lsfhost.loc    SLINUX6  Opteron8  60.0      8   2007M       -  Yes     (slurm)

$ ssh n15 lshosts
HOST_NAME      type     model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
lsfhost.loc    SLINUX6  Opteron8  60.0      8   2007M       -  Yes     (slurm)
n15            UNKNOWN  UNKNOWN_   1.0      -       -       -  No      ()

LSF-HPC-enforced job-level run-time limits are not supported.

LSF-HPC does not support parallel or SLURM-based interactive jobs in PTY mode (bsub -Is and bsub -Ip). However, after LSF dispatches a user job on the HP XC system, you can use the srun or ssh command to access the job's allocated resources directly. For more information, see "Working Interactively Within an LSF-HPC Allocation".
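
As a minimal sketch of that pattern, you can obtain an interactive allocation and then launch work on its nodes with srun; the SLURM job ID shown here (150) is illustrative and is reported when the allocation is created:

$ bsub -I -n4 -ext "SLURM[nodes=4]" /bin/bash   # request a four-node allocation with an interactive shell
$ srun --jobid=150 hostname                     # run directly on the allocated nodes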

LSF-HPC does not support user-account mapping and system-account mapping.

LSF-HPC does not support chunk jobs. If a job is submitted to a chunk queue, the job remains pending.

LSF-HPC does not support topology-aware advanced reservation scheduling.

Job Terminology

The following terms are used to describe jobs submitted to LSF-HPC:

Batch job
    A job submitted to LSF or SLURM that runs without any I/O connection back to the terminal from which the job was submitted. This job may run immediately, or it may run sometime in the future, depending on resource availability and batch-system scheduling policies.
    Batch job submissions typically provide instructions on I/O management, such as files from which to read input and filenames in which to collect output.
    By default, LSF jobs are batch jobs. The output is e-mailed to the user, which requires that e-mail be set up properly. SLURM batch jobs are submitted with the srun -b command. By default, the output is written to $CWD/slurm-SLURMjobID.out from the node on which the batch job was launched.
    Use Ctrl-C at any time to terminate the job.
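
As a minimal illustration of the two batch styles (the script name myjob.sh is a placeholder):

$ bsub ./myjob.sh       # LSF batch job; output is e-mailed to the user by default
$ srun -b ./myjob.sh    # SLURM batch job; output is written to ./slurm-<SLURMjobID>.out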

Interactive batch job
    A job submitted to LSF or SLURM that maintains I/O connections with the terminal from which the job was submitted. The job is also subject to resource availability and scheduling policies, so it may pause before starting. Once running, the job displays its output on the terminal, and the user can provide input if the job allows it.
    By default, SLURM jobs are interactive. Interactive LSF jobs are submitted with the bsub -I command.
    Use Ctrl-C at any time to terminate the job.
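
For example (a minimal sketch; hostname stands in for any command):

$ srun hostname       # SLURM jobs are interactive by default
$ bsub -I hostname    # an interactive LSF job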

Serial job
    A job that requests only one slot and does not specify any of the following constraints:
        mem
        tmp
        mincpus
        nodes
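
For illustration (./a.out is a placeholder; the -ext "SLURM[...]" option is the LSF-SLURM external scheduler syntax used elsewhere in this guide):

$ bsub ./a.out                               # a serial job: one slot, no constraints
$ bsub -n1 -ext "SLURM[mincpus=2]" ./a.out   # not a serial job: the mincpus constraint is specified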
