HP XC System 3.x Software Examining Nodes and Partitions Before Running Jobs, Interrupting a Job

Page 38

Table 4-1, “Compiler Commands” displays the compiler commands for Standard Linux, Intel, and PGI compilers for the C, C++, and Fortran languages.

Table 4-1 Compiler Commands

Type

 

Compilers

 

Notes

 

C

C++

Fortran

 

Standard Linux

gcc

gcc++

g77

All HP XC platforms.

 

 

 

 

The HP XC System Software supplies these

 

 

 

 

compilers by default.

Intel

icc

icc

ifort

Version 9.0 compilers

 

 

 

 

For use on the Intel-based 64–bit platform.

Intel

icc

icc

ifort

Version 8.0 compilers

 

 

 

 

For use on the Intel-based 64–bit platform.

Intel

ecc

ecc

efc

Version 7.1 compilers

 

 

 

 

For use on the Intel-based 64–bit platform.

 

 

 

 

These compilers can be used but Intel may

 

 

 

 

not support them much longer.

PGI

pgcc

pgCC

pgf95, pgf77

For use on the CP4000 platform

Pathscale

pathcc

pathCC

pathf95, pathf90

For use on the CP4000 platform

 

 

 

 

pathf90 provided for backward

 

 

 

 

compatibility.

MPI Compiler

The HP XC System Software includes MPI. The MPI library on the HP XC system is HP MPI Version 2.1.

Examining Nodes and Partitions Before Running Jobs

Before launching an application, you can determine the availability and status of the system's nodes and partitions. Node and partition information is useful to have before launching a job so that you can launch the job to properly match the resources that are available on the system.

When invoked with no options, the SLURM sinfo command returns information about node availability and partitions, along with other information:

$ sinfo

 

 

 

 

 

 

PARTITION AVAIL TIMELIMIT

NODES

STATE

NODELIST

lsf

up

infinite

 

4

down*

n[12-15]

slurm*

up

infinite

2

idle

n[10-11]

The previous sinfo output shows that there are two partitions on the system:

one for LSF jobs

one for SLURM jobs

The asterisk in the PARTITION column indicates the default partition. An asterisk in the STATE column indicates nodes that are currently not responding.

Refer to Chapter 8. Using SLURM for information about using the sinfo command. The SLURM sinfo manpage also provides detailed information about the sinfo command.

Interrupting a Job

A job launched by the srun command can be interrupted by sending a signal to the command by issuing one or more Ctrl/C key sequences. Signals sent to the srun command are automatically forwarded to the tasks that it is controlling.

38 Developing Applications

Image 38
Contents HP XC System Software Users Guide Page Table of Contents Submitting Jobs Configuring Your Environment with ModulefilesDeveloping Applications Using LSF Tuning ApplicationsUsing Slurm Debugging ApplicationsGlossary 109 Index 115 Advanced TopicsExamples List of Figures Page List of Tables Determining the Node PlatformPage List of Examples Submitting a Job ScriptPage Document Organization About This DocumentIntended Audience This document is organized as followsHP XC Information Supplementary Information $ man lsfcommandnameFor More Information $ man discover $ man 8 discover Related InformationManpages $ man -k keywordRelated Compiler Web Sites Related Linux Web SitesRelated MPI Web Sites Additional PublicationsEnvironment Variable Typographic ConventionsHP Encourages Your Comments User inputHP XC System Software Overview of the User EnvironmentSystem Architecture Operating SystemNode Specialization Storage and I/OLocal Storage File SystemSAN Storage File System LayoutNetwork Address Translation NAT Determining System Configuration InformationSystem Interconnect Network Modules CommandsUser Environment Parallel Applications Application Development EnvironmentRun-Time Environment Serial ApplicationsHow LSF-HPC and Slurm Interact Load Sharing Facility LSF-HPCStandard LSF Mpirun command Components, Tools, Compilers, Libraries, and DebuggersUsing the Secure Shell to Log Using the SystemLVS Login Routing Logging In to the SystemGetting Information About Resources IntroductionGetting Information About Queues Performing Other Common User Tasks Getting System Help and Information $ man sinfoConfiguring Your Environment with Modulefiles Overview of ModulesSupplied Modulefiles Viewing Loaded Modulefiles Modulefiles Automatically Loaded on the SystemViewing Available Modulefiles Loading a ModulefileModulefile Conflicts Automatically Loading a Modulefile at LoginUnloading a Modulefile Loading a Modulefile for the Current Session$ module load modules $ man modulefile Creating a ModulefileViewing Modulefile-Specific Help $ module help totalviewPage Compilers Developing ApplicationsApplication Development Environment Overview MPI Compiler Examining Nodes and Partitions Before Running JobsInterrupting a Job Partition Avail Timelimit Nodes State NodelistSerial Application Build Environment Setting Debugging OptionsDeveloping Serial Applications Building Serial ApplicationsModulefiles Developing Parallel ApplicationsParallel Application Build Environment OpenMPMPI Library PthreadsQuadrics Shmem Intel Fortran and C/C++CompilersBuilding Parallel Applications Examples of Compiling and Linking HP-MPI Applications Developing LibrariesDesigning Libraries for the CP4000 Platform To build a 64-bit application, you might enter Linkcommand 32-bit -L/opt/mypackage/lib/i686 -lmystuffLinkcommand 64-bit -L/opt/mypackage/lib/x8664 -lmystuff ExtSLURMslurm-arguments Submitting JobsOverview of Job Submission Submitting a Serial Job with the LSF bsub Command Submitting a Serial Job Using Standard LSFSubmitting a Serial Job Using LSF-HPC $ bsub hostnameSubmitting a Serial Job Through Slurm only Bsub -nnum-procsbsub-optionsmpijob Submitting a Non-MPI Parallel Job$ bsub -n4 -I srun hostname Mpirun mpirun--options-srunsrun-optionsmpi-jobnameBsub -nnum-procs bsub-optionsscript-name Submitting a Batch Job or Job Script$ bsub -n4 -I mpirun -srun ./helloworld $ bsub -I -n4 Myscript.sh $ cat myscript.sh #!/bin/shSrun hostname mpirun -srun hellompi $ bsub -n4 -ext SLURMnodes=4 -I ./myscript.sh$ cat ./envscript.sh #!/bin/sh name=`hostname` Running Preexecution Programs$ bsub -n4 -I ./myscript.sh Opt/hptc/bin/srun Mypreexec Debugging Parallel Applications Debugging ApplicationsDebugging Serial Applications TotalViewSSH and TotalView Setting Up TotalViewUsing TotalView with Slurm Module load mpimodule load totalviewDebugging an Application Using TotalView with LSF-HPCSetting TotalView Preferences Directories in File ⇒ Search Path Sourcefile initfdte.f was not found, using assembler modeDebugging Running Applications $ mpirun -srun -n2 Psimple$ squeue $ scancel --user usernameExiting TotalView Page Building a Program Intel Trace Collector and HP-MPI Tuning ApplicationsUsing the Intel Trace Collector and Intel Trace Analyzer Libs Running a Program Intel Trace Collector and HP-MPIVisualizing Data Intel Trace Analyzer and HP-MPI CldflagsUsing the Intel Trace Collector and Intel Trace Analyzer Page Srun Squeue Scancel Sinfo Scontrol Using SlurmLaunching Jobs with the srun Command Introduction to SlurmUsing the srun Command with LSF-HPC Monitoring Jobs with the squeue CommandUsing the srun Command with HP-MPI Srun Roles and ModesJob Accounting Terminating Jobs with the scancel CommandGetting System Information with the sinfo Command # chmod a+r /hptccluster/slurm/job/jobacct.log Fault ToleranceSecurity Using LSF-HPC Using LSFUsing Standard LSF on an HP XC System Introduction to LSF-HPC in the HP XC Environment Overview of LSF-HPCResources Differences Between LSF-HPC and Standard LSFHostname Hostname Status JL/U MAX Njobs RUN Ssusp Ususp RSVUnknown Unknown Job Terminology$ ssh n15 lshosts HP XCCompute Node Resource Support SLURMnodelist =nodelist if specified$ bsub -n 10 -ext SLURMnodes=10exclude=n16 -I srun hostname $ bsub -n 10 -I srun hostname$ bsub -n 10 -ext SLURMnodes=10 -I srun hostname $ bsub -n 10 -ext SLURMconstraint=dualcore -I srun hostnameHow LSF-HPC and Slurm Launch and Manage a Job $ bsub -n4 -ext SLURMnodes=4 -o output.out ./myscriptJob Startup and Job Control #!/bin/sh hostname srun hostname Mpirun -srun ./hellompiGetting the Status of LSF-HPC Determining the LSF Execution HostDetermining Available LSF-HPC System Resources Getting Information About LSF Execution Host NodeGetting Information About the lsf Partition Getting Host Load InformationExamining LSF-HPC System Queues SLINUX6$ sinfo -p lsf -lNe Summary of the LSF bsub Command Format$ sinfo -p lsf Bsub -n num-procs-ext SLURMslurm-arguments \ LSF-SLURM External SchedulerFor information about running scripts Bsub-options srun srun-optionsjobname job-optionsWaiting for dispatch ... Starting on lsfhost.localdomain n1 Submitting a Job from a Non-HP XC HostStarting on lsfhost.localdomain n6 Type=SLINUX64Slurmid=slurmjobidncpus=slurmnprocsslurmalloc=nodelist Getting Information About JobsGetting Job Allocation Information $ bjobs -lTime stamp Examining the Status of a Job$ bhist -l $ bjobsSummary of time in seconds spent Various States Viewing the Historical Information for a Job$ bhist Jobid User Jobname Pend Psusp RUN Ususp Ssusp Unkwn TotalTranslating Slurm and LSF-HPC JOBIDs $ bjobs -l 124 grep slurm Working Interactively Within an LSF-HPC Allocation$ bsub -I -n4 -ext SLURMnodes=4 /bin/bash $ srun --jobid=150 hostname$ export SLURMJOBID=150 $ export SLURMNPROCS=4 $ unset SlurmjobidAlternatively, you can use the following $ unset Slurmjobid $ unset Slurmnprocs$ srun --jobid=250 uptime Job 125 is submitted to the default queue normalLSF-HPC Equivalents of Slurm srun Options $ bsub -n4 -ext SLURMnodes=4 -o %J.out sleepBsub -iinputfile Mpi=mpitype Quit-on-interrupt Page Running an X Terminal Session from a Remote Node Advanced TopicsEnabling Remote Execution with OpenSSH Determining IP Address of Your Local MachineLogging in to HP XC System Running an X terminal Session Using SlurmRunning an X terminal Session Using LSF-HPC $ srun -n4 hostname n46 Using the GNU Parallel Make Capability$ bsub -n4 -Ip srun -n1 xterm -display $ srun -n2 hostname n46$ cd subdir srun -n1 -N1 $MAKE -j4 Example Procedure $ make PREFIX=’srun -n1 -N1 MAKEJ=-j4Modified Makefile is invoked as follows Local Disks on Compute NodesPerformance Considerations $ make PREFIX=srun -n1 -N1 MAKEJ=-j4Private File View Communication Between NodesShared File View Fp = fopen myfile, a+Page Launching a Serial Interactive Shell Through LSF-HPC Appendix a ExamplesBuilding and Running a Serial Application Examine the LSF execution host informationRunning LSF-HPC Jobs with a Slurm Allocation Request R15s r1m r15m It tmp swp mem LoadSched LoadStop Launching a Parallel Interactive Shell Through LSF-HPCExample 2. Four cores on Two Specific Nodes SLURMnodes=2124 Lsfad Examine the the running jobs information$ hostname n16 $ srun hostname n5 $ bjobs Show the environment Submitting a Simple Job Script with LSF-HPCExamine the the finished jobs information Display the scriptSubmit the job Run some commands from the pseudo-terminalSubmitting an Interactive Job with LSF-HPC Show the job allocationView the interactive jobs Submitting an HP-MPI Job with LSF-HPCExit the pseudo-terminal View the finished jobsLsfhost.localdomai View the running jobView the finished job $ bsub -n 8 -R ALPHA5 SLINUX64 \ -ext SLURMnodes=4-4 myjob Using a Resource Requirements String in an LSF-HPC CommandStates by date and time 108 Glossary As local storage To the queueFirst-come See Fcfs First-served Global storage Are not appropriate for replicationLinux Virtual See LVS Server Load file Single commandLogin requests and directs them to a node with a login role LSF master hostNotably to install and remove software packages Network See NIS Information ServicesRemotely. PXE booting is configured at the Bios level Slurm backupSymmetric See SMP Multiprocessing Power available per unit of spaceSsh 114 Index Index PGI Utilities, 63 Slurm commands