June Product Version
Hewlett-Packard Company Palo Alto, California
Page
Contents
Developing Applications
5.4
Debugging Applications
Tuning Applications
Using Slurm
Using LSF
Using HP-MPI
Using HP Mlib
6.4
Glossary Index Examples
Advanced Topics
Examples
Figures
Tables
Page
Document Organization
About This Document
Intended Audience
Linux Administration Handbook
HP XC Information
QuickSpecs for HP XC System Software
HP XC Program Development Environment
For More Information
Supplementary Information
HP Message Passing Interface
HP Mathematical Library
Manpages
Http//supermon.sourceforge.net
Http//systemimager.org
Http//sourceforge.net/projects/modules
Http//www-unix.mcs.anl.gov/mpi
Related Information
Http//linuxvirtualserver.org
Bold text
# cd /opt/hptc/config/sbin
Typographical Conventions
HP Encourages Your Comments
Ctrl/x
Page
Overview of the User Environment
System Architecture
Operating System
Node Specialization
Storage and I/O
File System
SAN Storage
Local Storage
System Interconnect Network
File System Layout
Commands
User Environment
Network Address Translation NAT
1 LVS
Serial Applications
Application Development Environment
Parallel Applications
Run-Time Environment
Slurm
Load Sharing Facility LSF-HPC
How LSF-HPC and Slurm Interact
Components, Tools, Compilers, Libraries, and Debuggers
HP-MPI
8Overview of the User Environment
Using the System
Configuring Your Environment with Modulefiles
LVS Login Routing
Using ssh to Log
2Using the System
Modulefile Sets the HP XC User Environment
Supplied Modulefiles
Supplied Modulefiles
Modulefiles Automatically Loaded on the System
Viewing Available Modulefiles
Viewing Loaded Modulefiles
Loading a Modulefile
Automatically Loading a Modulefile at Login
Unloading a Modulefile
Modulefile Conflicts
Loading a Modulefile for the Current Session
Creating a Modulefile
Viewing Modulefile-Specific Help
$ module load modules $ man modulefile
$ module help totalview
Launching and Managing Jobs Quick Start
Introduction
Getting Information About Queues
Getting Information About Resources
Getting Information About the System’s Partitions
Launching Jobs
Submitting a Serial Job
Example 2-1 Submitting a Serial Job
Using Slurm Options with the LSF External Scheduler
Submitting a Non-MPI Parallel Job
Example 2-2 Submitting a Non-MPI Parallel Job
$ bsub -n4 -I srun hostname
$ bsub -n4 Mpirun -srun ./helloworld
Submitting an MPI Job
Example 2-4 Running an MPI Job with LSF
$ bsub -I -n4 myjobscript.sh
Submitting a Batch Job or Job Script
Example 2-6 Submitting a Job Script
Performing Other Common User Tasks
Getting System Help and Information
$ man sinfo
$ man -k keyword
Page
Developing Applications
Overview
Using Compilers
Standard Linux Compilers
Intel Compilers
PGI Compilers
Setting Debugging Options
Checking Nodes and Partitions Before Running Jobs
Interrupting a Job
Developing Serial Applications
Using Mlib in Serial Applications
Developing Parallel Applications
Serial Application Build Environment
Building Serial Applications
Parallel Application Build Environment
Modulefiles
HP-MPI
OpenMP
Quadrics Shmem
Mlib Math Library
MPI Library
$ mpicc object1.o ... -pthread -o myapp.exe
Intel Fortran and C/C++Compilers
PGI Fortran and C/C++ Compilers
GNU C and C++ Compilers
GNU Parallel Make
Reserved Symbols and Names
Building Parallel Applications
Compiling and Linking Non-MPI Applications
Compiling and Linking HP-MPI Applications
$ mpicc -c -g foo.c
Developing Libraries
Designing Libraries for XC4000
Using the GNU Parallel Make Capability
Advanced Topics
Example 3-1 Directory Structure
Example 3-2 Recommended Directory Structure
$ cd subdir srun $MAKE
$ cd subdir srun -n1 -N1 $MAKE -j4
Example Procedure
$ make PREFIX=’srun -n1 -N1 MAKEJ=’-j4’
$ make PREFIX=’srun -n1 -N1’ MAKEJ=’-j4’
Local Disks on Compute Nodes
3 I/O Performance Considerations
Shared File View
Private File View
Communication Between Nodes
Page
Debugging Applications
Debugging Serial Applications
Debugging Parallel Applications
TotalView
SSH and TotalView
Setting Up TotalView
Debugging with TotalView
Using TotalView with Slurm
Using TotalView with LSF-HPC
$ srun -Nx-A $ mpirun -tv -srun application
$ bsub -nx-ext SLURMnodes=x \ -Is /usr/bin/xterm
Starting TotalView for the First Time
$ totalview
TotalView Preferences Window
Preferences window, click on the Launch Strings tab
Debugging Applications
$ mpirun -tv -srun -n2 ./Psimple
Debugging an Application
$ mpicc -g -o Psimple simple.c -lm
TotalView Process Window Example
Debugging Running Applications
$ mpirun -srun -n2 Psimple
$ squeue
$ scancel --user username
Exiting TotalView
Page
Tuning Applications
Using the Intel Trace Collector/Analyzer
Building a Program Intel Trace Collector and HP-MPI
Example
Example Running the vtjacobic Example Program
Running a Program Intel Trace Collector and HP-MPI
Visualizing Data Intel Trace Analyzer and HP-MPI
Using Slurm
Slurm Commands
Command Function
Introduction
Accessing the Slurm Manpages
Launching Jobs with the srun Command
Srun Roles and Modes
Example 6-1 Simple Launch of a Serial Program
Srun Roles
Srun Modes
Srun Signal Handling
Srun Run-Mode Options
Batch
Allocate
Srun Resource-Allocation Options
Cpt --cpus-per-task=cpt
Part --partition=part
Minutes --time=minutes
Nthreads --threads=nthreads
Srun Control Options
Srun I/O Options
Mode --output=mode
Mode --input=mode
Mode --error=mode
Label
Srun Constraint Options
Clist --constraint=clist
Contiguous=yesno
Mem=size
Mincpus=n
Vmem=size
Monitoring Jobs with the squeue Command
Using srun with HP-MPI
Using srun with LSF
Srun Environment Variables
Killing Jobs with the scancel Command
Getting System Information with the sinfo Command
Fault Tolerance
Job Accounting
Security
$ sinfo -R
Overview of LSF
Using LSF
Introduction to LSF in the HP XC Environment
Topology Support
Nodelist= list-of-nodes Exclude= list-of-nodes
$ bsub -n 10 -ext SLURMnodes=10 srun myapp
$ bsub -n 10 -ext SLURMnodes=10exclude=n16 srun myapp
$ bsub -Is hostname
How LSF and Slurm Launch and Manage a Job
$ bqueues -l normal grep Jobstarter
How LSF-HPC and Slurm Launch and Manage a Job
$ bsub -n4 -ext SLURMnodes=4 -o output.out ./myscript
Differences Between LSF on HP XC and Standard LSF
Determining Execution Host
Determining Available System Resources
Getting Status of LSF
Job Startup and Job Control
Getting Information About LSF-HPC Execution Host Node
Getting Host Load Information
Submitting Jobs
Checking LSF System Queues
Getting Information About the lsf Partition
$ sinfo -p lsf
Summary of the LSF bsub Command Format
Bsub bsuboptions jobname joboptions
LSF-SLURM External Scheduler
Slurm Arguments Function
Starting on lsfhost.localdomain n6
Submitting a Serial Job
Submitting a Job in Parallel
Submitting an HP-MPI Job
Example 7-5 Submitting an Interactive Serial Job
$ bsub -n4 -I mpirun -srun ./helloworld
Submitting a Batch Job or Job Script
Example 7-6 Submitting an HP-MPI Job
Examples
Example 7-8 Submitting a Batch Job Script
$ bsub -n4 -I ./myscript.sh
$ bsub -n4 -ext SLURMnodes=4 -I ./myscript.sh
Type=SLINUX64
Submitting a Job from a Non-HP XC Host
$ bsub -n4 -I ./myscript.sh -n8 -O
Getting Information About Jobs
Getting Job Allocation Information
Job Allocation Information for a Running Job
$ bsub -R type=SLINUX64 -n4 -I srun hostname
Job Allocation Information for a Finished Job
Example 7-13 Using the bjobs Command Short Output
Checking Status of a Job
Example 7-14 Using the bjobs Command Long Output
Example 7-15 Using the bhist Command Short Output
Output Provided by the bhist Command
Viewing a Job’s Historical Information
Submitting an Interactive Job to Launch the xterm Program
Example 7-16 Using the bhist Command Long Output
Working Interactively Within an LSF-HPC Allocation
Example 7-17 View Your Environment
Example 7-18 View Your Allocation in Slurm
Example 7-19 View Your Running Job in LSF
Example 7-20 View Job Details in LSF
Submitting an Interactive Job to Launch a Shell
Example 7-21 Running Jobs from an xterm Window
Example 7-22 Submitting an Interactive Shell Program
$ hostname $ srun hostname $ srun -n2 hostname
LSF Equivalents of Slurm srun Options
LSF Equivalents of Slurm srun Options
$ srun hostname n1 $ exit
Srun Option Description LSF Equivalent
24Using LSF
Control the parallel job
Page
Using HP-MPI
Setting Environment Variables
HP-MPI Directory Structure
Compiling and Running Applications
Building and Running an Example Application
Example Application helloworld
Building and Running helloworld
$ mpicc -o helloworld $MPIROOT/help/helloworld.c
$ $MPIROOT/bin/mpirun -srun -n4 helloworld
Launching MPI Jobs
HP-MPI options allowed with -srun
Creating Subshells and Launching Jobsteps
System Interconnect Selection
$ mpirun -srun -n6 -O -N2 -m cyclic ./a.out host1 rank1
$ mpirun -srun -n4 -N2 -O -m cyclic ./a.out host1 rank1
Using LSF and HP-MPI
$ mpirun -subnet 192.168.1.1 -prot -srun -n4 ./a.out
$ /usr/sbin/ifconfig -a
System Interconnect Support
MPI Versioning
Example 8-5 Allocating 12 Processors on 6 Nodes
Allowing Windows to Use Exclusive Locks
32-Bit Builds on XC4000
Truncated Messages
Mpirun Command Options
$ mpirun -TCP -srun -N8 ./a.out
Environment Variables
$MPIROOT/bin/mpirun -v -prot -np 2 /path/to/program.x
Export MPIPHYSICALMEMORY=1048576
Export MPIPINPERCENTAGE=30
Export MPIPAGEALIGNMEM=1
Export MPIMAXWINDOW=10
Mpich Object Compatibility
Export MPIUSELIBELAN=0
Export MPIUSELIBELANUSE=5
$MPIROOT/bin/mpirun.mpich -np 2 ./prog.x
HP-MPI Documentation and Manpages
At http//docs.hp.com
HP-MPI Manpage Categories
Category Manpages Description
Additional Information, Known Problems, and Work-arounds
Using HP Mlib
Intel Compiler Notes
Version 8 Fortran Compiler
Version 7 Fortran Compiler
HP Mlib for the HP XC6000 Platform
Mlib and Module Files
Platform Support
Library Support
Compiling and Linking
Using Intel Compilers with HP Mlib
Modulefiles and Mlib
HP Mlib for the HP XC4000 Platform
Licensing
Mlib Manpages
Linking SuperLUDIST
$ pgcc options file
6Using HP Mlib
Advanced Topics
Enabling Remote Execution with OpenSSH
Running an X Terminal Session from a Remote Node
Determining IP Address of Your Local Machine
Running an X terminal Session Using Slurm
Logging in to HP XC System
$ hostname
$ host mymachine
$ bjobs
Running an X terminal Session Using LSF
$ bsub -n4 -Ip srun -n1 xterm -display
Page
Launching a Serial Interactive Shell Through LSF
Examples
Building and Running a Serial Application
Running LSF Jobs with a Slurm Allocation Request
Example 1. Two Processors on Any Two Nodes
View the job
Launching a Parallel Interactive Shell Through LSF
Example 2. Four Processors on Two Specific Nodes
Check the running job’s information
$ bsub -Is -n4 -ext SLURMnodes=4 /bin/bash
Submitting a Simple Job Script with LSF
Check the finished job’s information
Show the environment
Display the script
Submitting an Interactive Job with LSF
Show the job allocation
Show the Slurm job ID
$ bsub -n8 -Ip /bin/sh
Submitting an HP-MPI Job with LSF
$ bjobs -l
View the running job
View the finished job
Using a Resource Requirements String in an LSF Command
$ bsub -n 8 -R ALPHA5 SLINUX64 \ -ext SLURMnodes=4-4 myjob
Page
Glossary
Extensible firmware interface
External network node
Fairshare
First come first served
Image server
Integrated Lights Out
Interconnect
Internet address
Network Information Services
LSF master host
Management Processor
Master host
Root Administration Switch
Parallel application
Resource manager role
Role
Symmetric multiprocessing
Glossary-6
Index
Index-1
Gdb, 4-1 GNU
Index-2
Modulefile
Index-3
Resource manager, 7-1role Serial applications
Index-4