HP XC System 3.x Software manual: Using the GNU Parallel Make Capability

Determine the address of your monitor's display server, as shown at the beginning of "Running an X Terminal Session from a Remote Node". You can start an X terminal session using this address information in a bsub command with the appropriate options. For example:

$ bsub -n4 -Ip srun -n1 xterm -display 14.26.206.134:0.0
Job <159> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>

The options used in this command are:

-n4                  allocate 4 cores
-Ip                  interact with the X terminal session
srun -n1             run the job on 1 core
xterm                the job is an X terminal session
-display <address>   monitor's display server address

A remote X terminal session appears on your monitor. The X terminal session job is launched from node n47, which is the LSF execution host node. You can view this job using LSF-HPC and SLURM commands. For example:

$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf       up    infinite      2 alloc n[46,48]

$ squeue
JOBID PARTITION     NAME     USER ST TIME NODES NODELIST
  117       lsf hptclsf@ username  R 0:25     2 n[46,48]

$ bjobs
JOBID USER    STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME   SUBMIT_TIME
119   lsfadmi RUN  norma n48       4*n47     *8.136:0.0 date and time

You can now run some jobs from the X terminal session that you started and make use of the full allocation within the LSF-HPC node allocation. For example:

$ srun -n4 hostname
n46
n48
n46
n48
$ srun -n2 hostname
n46
n48

Exiting from the X terminal session ends the LSF-HPC job.

Using the GNU Parallel Make Capability

By default, the make command invokes the GNU make program. GNU make can build independent targets concurrently. For example, if building a program requires compiling 10 source files, and the compilations can be done independently, make can manage multiple compilations at once; the number of jobs is user selectable. More precisely, each target's rules are run normally (sequentially within the rule). Typically the rule for an object file target is a single compilation line, so it is common to talk about concurrent compilations, though GNU make is more general.

On non-cluster platforms or command nodes, matching concurrency to the number of cores often works well. It also often works well to specify a few more jobs than cores so that one job can proceed while another is waiting for I/O. On an HP XC system, there is the potential to use compute nodes to do compilations, and there are a variety of ways to make this happen.
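The sizing advice for non-cluster platforms and command nodes can be sketched in shell as follows. This is only an illustration: EXTRA_JOBS is a hypothetical variable introduced here (it is not part of GNU make or HP XC), and the script assumes that getconf can report the number of online processors, as it does on Linux systems. The script prints the make command instead of running it.

```shell
#!/bin/sh
# Sketch: pick a make job count a little above the core count, so that
# one job can proceed while another waits for I/O.
# EXTRA_JOBS is a hypothetical tuning knob; it defaults to 2 here.
EXTRA_JOBS=${EXTRA_JOBS:-2}

# Number of processors currently online (assumes a Linux-style getconf).
cores=$(getconf _NPROCESSORS_ONLN)
jobs=$((cores + EXTRA_JOBS))

# Print the command rather than executing it.
echo "make -j${jobs}"
```

Setting EXTRA_JOBS=0 before running the script matches the job count to the number of cores exactly.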

One way is to prefix the actual compilation line in the rule with an srun command. So, instead of executing cc foo.c -o foo.o, it would execute srun cc foo.c -o foo.o. With concurrency, multiple command nodes would have multiple srun commands instead of multiple cc commands. For projects that recursively run make on subdirectories, the recursive make can be run on the compute nodes. For example:

$ cd subdir; srun $(MAKE)...
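One possible way to arrange the srun prefix is to keep it in a makefile variable, so that the same rule runs locally when the variable is empty. The shell sketch below writes such a makefile to a temporary directory and prints it. The variable name PREFIX and the file names foo.c and foo.o are illustrative assumptions, and the -c option is added here so that the rule yields an object file. Nothing is compiled and srun is not invoked.

```shell
#!/bin/sh
# Sketch: generate a small makefile whose compile rule carries an
# optional command prefix. PREFIX is an illustrative variable name;
# when it is empty, cc runs on the local node.
workdir=$(mktemp -d)

{
  printf 'PREFIX =\n'
  printf 'foo.o: foo.c\n'
  # Recipe lines in a makefile must begin with a tab character.
  printf '\t$(PREFIX) cc -c foo.c -o foo.o\n'
} > "$workdir/Makefile"

# Show the generated makefile.
cat "$workdir/Makefile"
```

With a makefile of this shape, the prefix could be supplied at invocation time, for example make -j4 PREFIX='srun -n1 -N1', so that each of the four concurrent compilations is launched through its own srun command.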
