HP XC System 3.x Software Components, Tools, Compilers, Libraries, and Debuggers, Mpirun command

SLURM

Allocates nodes for jobs as determined by LSF-HPC. It CONTROLS task/rank distribution within the allocated nodes. SLURM also starts the executables on each host as requested by the HP-MPI mpirun command.

HP-MPI

Determines HOW the job runs. It is part of the application, so it performs communication. HP-MPI can also pinpoint the processor on which each rank runs.

HP-MPI

HP-MPI is a high-performance implementation of the Message Passing Interface (MPI) standard and is included with the HP XC system. HP-MPI uses SLURM to launch jobs on an HP XC system; however, HP-MPI itself manages the global MPI exchange so that all processes can communicate with each other.

See the HP-MPI documentation for more information.
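
The division of labor described above (LSF-HPC schedules the job, SLURM launches the tasks, HP-MPI handles communication) shows up in the shape of the submission command. The following is a minimal sketch, assuming the `bsub -n4 -I mpirun -srun ./helloworld` form used in this guide's job-submission examples. The helper name `build_submit_cmd` and its arguments are hypothetical and serve only to show how the pieces fit together. The script prints the command rather than running it, so it can be tried on a machine without LSF-HPC.

```shell
#!/bin/sh
# Hypothetical helper (not part of HP XC): assemble the submission line
# for an HP-MPI program.
#   $1 = number of processors requested from LSF-HPC (bsub -n)
#   $2 = path to the MPI executable
build_submit_cmd() {
    nprocs=$1
    program=$2
    # bsub   : LSF-HPC decides when and where the job runs
    # -I     : interactive job, output returns to the terminal
    # mpirun : HP-MPI launcher
    # -srun  : ask HP-MPI to start the ranks through SLURM's srun
    echo "bsub -n${nprocs} -I mpirun -srun ${program}"
}

# Print the command for a four-processor run of ./helloworld.
build_submit_cmd 4 ./helloworld
```

On an HP XC login node, the printed line could then be entered at the prompt. The options actually available depend on the installed LSF-HPC and HP-MPI versions.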

Components, Tools, Compilers, Libraries, and Debuggers

This section provides a brief overview of some of the common tools, compilers, libraries, and debuggers available for use on HP XC.

An HP XC system is integrated with several open source software components. HP XC incorporates the Linux operating system and its standard commands and tools, and does not diminish the Linux ABI in any way. In addition, HP XC incorporates LSF-HPC and SLURM to launch and manage jobs, and includes HP-MPI for high performance, parallel, message-passing applications.

You can use most standard open source compilers and tools on an HP XC system; however, you must purchase them separately. Several open source and commercially available software packages have been tested with the HP XC software. The list below is a sample of what is available on HP XC and is not intended to be complete. Some of the listed packages are included in the HPC Linux distribution and can therefore be used as part of the HP XC development environment. The tested software packages include, but are not limited to, the following:

Intel Fortran 95, C, C++ Compiler Versions 7.1, 8.0, and 9.0, including OpenMP, for Itanium® (includes idb debugger)

gcc version 3.2.3 (included in the HP XC distribution)

g77 version 3.2.3 (included in the HP XC distribution)

Portland Group PGI Fortran90, C, C++ Version 5.1, including OpenMP, for CP4000

Quadrics SHMEM, as part of QsNet II user libraries, on Itanium systems connected with the Quadrics QsNet II switch (included in the HP XC distribution)

Etnus TotalView debugger Version 6.4

gdb (part of the HP XC Linux distribution)

Intel MKL V6.1 on Itanium

AMD Math Core Library Version 2.0 on CP4000

valgrind 2.0.0 (http://valgrind.kde.org) in 32-bit mode only

oprofile 0.7.1 (http://oprofile.sourceforge.net)

PAPI 3.2 (http://icl.cs.utk.edu/papi)

Intel Visual Analyzer/Tracer (formerly Pallas Vampir and Vampirtrace performance analyzer) on Itanium

GNU make, including distributed parallel make (included in the HP XC distribution)

Other standard tools and libraries are available and can most likely be used on HP XC as they would on any other standard Linux system. Software that is not described in HP XC documentation may not have been tested with HP XC and may not function in a standard manner.
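
Because the set of installed packages varies from system to system, it can help to check which of the tools listed above are on your `PATH` before you rely on them. The following is a minimal sketch using only the POSIX `command -v` built-in. The function name `check_tool` is illustrative, and the tool names are the command names commonly associated with the packages in the list; they are assumptions about a given installation. Tools provided through modulefiles may not appear until the corresponding modulefile is loaded.

```shell
#!/bin/sh
# Illustrative helper: report whether a command is found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

# Check a few of the tools named in the list above.
for tool in gcc g77 gdb make valgrind; do
    check_tool "$tool"
done
```

A "not found" result only means that the command is not on the current `PATH`; the package may still be installed elsewhere on the system.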
