HP XC System Software User's Guide
Table of Contents
Configuring Your Environment with Modulefiles
Compiling and Linking Serial Applications
Debugging Applications
Srun Roles and Srun Modes
List of Figures
List of Tables
List of Examples
About This Document
Intended Audience
New and Changed Information in This Edition
Typographic Conventions
HP XC and Related HP Products Information
Related Information
HP XC System Software User's Guide
https://computing.llnl.gov/linux/slurm/documentation.html
http://systemimager.org
Manpages
http://www-unix.mcs.anl.gov/mpi
$ man discover
$ man 8 discover
HP Encourages Your Comments
$ man -k keyword
Only to the administrator of the HP XC system
Node Specialization
Determining the Node Platform
The /proc/cpuinfo file is dynamic
Storage and I/O
File System
SAN Storage
Local Storage
File System Layout
System Interconnect Network
HP XC System Interconnects
Determining System Configuration Information
User Environment
Network Address Translation (NAT)
LVS
Application Development Environment
Commands
Parallel Applications
Run-Time Environment
Serial Applications
Slurm
Load Sharing Facility (LSF)
Requested by the HP-MPI mpirun command
How LSF and Slurm Interact
HP-MPI
Nodes for the job
Components, Tools, Compilers, Libraries, and Debuggers
Using the System
LVS Login Routing
Using the Secure Shell to Log In
Logging In to the System
Introduction
Getting Information About Queues
Getting Information About Resources
$ bqueues
Performing Other Common User Tasks
Getting System Help and Information
Determining the LSF Cluster Name and the LSF Execution Host
$ lsid
$ man sinfo
Configuring Your Environment with Modulefiles
Overview of Modules
Supplied Modulefiles
Supplied Modulefiles
HP-MPI
Modulefiles Automatically Loaded on the System
Viewing Available Modulefiles
Viewing Loaded Modulefiles
Loading a Modulefile
Viewing Modulefile-Specific Help
Unloading a Modulefile
Modulefile Conflicts
Creating a Modulefile
$ module load modules
$ man modulefile
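The modulefile tasks listed above map onto a handful of module subcommands; the following is a minimal illustrative session (the mpi modulefile name is taken from the entries above, and output is omitted):
$ module avail          # view the modulefiles available on the system
$ module list           # view the modulefiles currently loaded
$ module load mpi       # load a modulefile
$ module help mpi       # view modulefile-specific help
$ module unload mpi     # unload a modulefile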
Developing Applications
Application Development Environment Overview
Compilers
Compiler Commands
MPI Compiler
Setting Debugging Options
Examining Nodes and Partitions Before Running Jobs
Interrupting a Job
Developing Serial Applications
Developing Parallel Applications
Serial Application Build Environment
Building Serial Applications
Parallel Application Build Environment
Modulefiles
HP-MPI
OpenMP
Pthreads
MPI Library
Intel Fortran and C/C++ Compilers
PGI Fortran and C/C++ Compilers
GNU C and C++ Compilers
Building Parallel Applications
Developing Libraries
To compile and link a C application using the mpicc command
Designing Libraries for the CP4000 Platform
Link command (64-bit): -L/opt/mypackage/lib/x86_64 -lmystuff
Link command (32-bit): -L/opt/mypackage/lib/i686 -lmystuff
To build a 64-bit application, you might enter
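The command itself does not appear in this listing; purely as a hedged sketch (the application and source file names are hypothetical, not the manual's), a 64-bit build that combines the mpicc compiler command with the 64-bit link line above might resemble:
$ mpicc -o myapp myapp.c -L/opt/mypackage/lib/x86_64 -lmystuff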
Submitting a Serial Job Using LSF
Submitting a Serial Job with the LSF bsub Command
Submitting Jobs
Overview of Job Submission
Example 5-2 Submitting a Serial Job Using LSF
Submitting a Serial Job Through Slurm Only
Example 5-1 Submitting a Job from the Standard Input
$ bsub -I srun hostname
Following is the command line used to compile this program
Submitting a Parallel Job
Submitting a Non-MPI Parallel Job
To submit a parallel job
Example 5-5 Submitting a Non-MPI Parallel Job
$ bsub -n4 -I srun hostname
Example 5-7 Submitting an MPI Job
$ bsub -n4 -I mpirun -srun ./helloworld
Arguments for the Slurm External Scheduler
nodelist=list of nodes
$ bsub -n 10 -ext "SLURM[nodelist=n[1-10]]" srun hostname
Submitting a Batch Job or Job Script
Example 5-14 Submitting a Job Script
$ bsub -n 10 -ext "SLURM[constraint=dualcore]" -I srun hostname
$ bqueues -l dualcore | grep SLURM
$ bsub -n4 -ext "SLURM[nodes=4]" -I ./myscript.sh
$ bsub -n4 -I ./myscript.sh
Using a Script to Submit Multiple Jobs
Using a Makefile to Submit Multiple Jobs
Use the squeue command to acquire information on the jobs
Submitting Multiple MPI Jobs Across the Same Set of Nodes
$ cat mymake
The following command line makes the program and executes it
$ tail 113.out
$ cat 117.out
Submitting a Job from a Host Other Than an HP XC Host
Running Preexecution Programs
type=SLINUX64
$ bsub -R "type=SLINUX64" -n4 -I srun hostname
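A job script ties these submission commands together; the listing below is an illustrative sketch only (the script contents are assumptions, not the manual's Example 5-14, and reuse the helloworld program named above):
$ cat myscript.sh
#!/bin/sh
# run a serial command, then an HP-MPI program, inside the allocation
srun hostname
mpirun -srun ./helloworld
$ bsub -I -n4 ./myscript.sh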
Debugging Applications
Debugging Serial Applications
Debugging Parallel Applications
TotalView
Setting Up TotalView
Debugging with TotalView
SSH and TotalView
$ module load mpi
$ module load totalview
Using TotalView with Slurm
Using TotalView with LSF
Setting TotalView Preferences
$ srun -Nx -A
$ mpirun -tv -srun application
Debugging an Application
$ mpirun -tv -srun -n2 ./Psimple
$ scancel --user username
Debugging Running Applications
Exiting TotalView
Run the application
Monitoring Node Activity
Xtools Utilities
Running Performance Health Tests
Where
You can list the available tests with the ovp -l command
$ ovp -l
$ ovp --verify=perfhealth/cpuusage
$ ovp --verbose --verify=perfhealth/cpuusage
HOME_DIRECTORY/ovp_n16_mmddyy.log
HOME_DIRECTORY/ovp_n16_mmddyy_r1.log
Using the Intel Trace Collector and Intel Trace Analyzer
Tuning Applications
Building a Program Intel Trace Collector and HP-MPI
Running a Program Intel Trace Collector and HP-MPI
Example 8-1 The vtjacobic Example Program
For more information, see the following Web site
Example 8-2 C Example Running the vtjacobic Example Program
Intel Trace Collector and Analyzer with HP-MPI on HP XC
Installation Kit
HP-MPI and the Intel Trace Collector
Running a Program
Following is a Fortran example called vtjacobif
Visualizing Data Intel Trace Analyzer and HP-MPI
Running a Program Across Nodes Using LSF
# bsub -n4 -I mpirun.mpich -np 2 ./vtjacobic
Using Slurm
Launching Jobs with the srun Command
Introduction to Slurm
Slurm Utilities
Monitoring Jobs with the squeue Command
Using the srun Command with HP-MPI
Using the srun Command with LSF
Srun Roles and Modes
Getting System Information with the sinfo Command
Terminating Jobs with the scancel Command
Example 9-5 cancels all pending jobs
Fault Tolerance
Job Accounting
Security
$ sinfo -R
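Taken together, the Slurm utilities listed in this chapter handle job launch, monitoring, reporting, and termination; the sequence below is a minimal illustrative sketch using commonly available options, not a reproduction of the chapter's examples:
$ srun -n4 hostname         # launch four tasks of the hostname command
$ squeue                    # monitor pending and running jobs
$ sinfo                     # report partition and node state
$ scancel --user username   # cancel all jobs that belong to a user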
Using LSF
Information for LSF
Overview of LSF Integrated with Slurm
$ squeue --jobs $SLURM_JOBID
[lsfadmin@n16 ~]$ bsub -n4 -I srun hostname
Example 10-1 Examples of LSF Job Launch
Lshosts and lsload commands display for each of these items
Differences Between LSF and LSF Integrated with Slurm
Job Terminology
Connection back to the terminal from which the job was submitted. This job may run immediately, or it may run
$ ssh n15 lshosts
Batch jobs are submitted with the srun -b command. By default, the output is written to
are submitted with the bsub -I command
Batch system scheduling policies
Using LSF Integrated with Slurm in the HP XC Environment
Useful Commands
Submitting Jobs
Job Startup and Job Control
How LSF and Slurm Launch and Manage a Job
LSF-SLURM External Scheduler
LSF with Slurm Job Launch Exit Codes
User logs in to login node n16
How LSF and Slurm Launch and Manage a Job
Determining the LSF Execution Host
Determining Available System Resources
Examining System Core Status
Following example shows the output from the lshosts command
Getting Information About the LSF Execution Host Node
Getting Information About Jobs
Getting Host Load Information
Examining System Queues
Getting Information About the lsf Partition
Getting Job Allocation Information
This allocation string has the following values
LSF job with Slurm allocated resources
Than what the job requests
Examining the Status of a Job
Example 10-5 Using the bjobs Command Short Output
Example 10-4 Job Allocation Information for a Finished Job
Example 10-6 Using the bjobs Command Long Output
Example 10-7 Using the bhist Command Short Output
Output Provided by the bhist Command
Viewing the Historical Information for a Job
Use the bjobs command to view the Slurm Jobid
Example 10-8 Using the bhist Command Long Output
Translating Slurm and LSF JOBIDs
Working Interactively Within an Allocation
$ sacct -j
$ bsub -I -n4 -ext "SLURM[nodes=4]" /bin/bash
$ bjobs -l 124 | grep slurm
This example assumes 2 cores per node
Example 10-9 Launching an Interactive MPI Job
Alternatively, you can use the following
LSF Equivalents of Slurm srun Options
Table 10-3 describes the srun options and lists their LSF equivalents
LSF Equivalents of Slurm srun Options
$ unset SLURM_JOBID
$ unset SLURM_NPROCS
Requests a specific list of nodes.
Slurm
Suppress informational message
Advanced Topics
Enabling Remote Execution with OpenSSH
Running an X Terminal Session from a Remote Node
Determining IP Address of Your Local Machine
Running an X terminal Session Using LSF
Running an X terminal Session Using Slurm
Logging in to HP XC System
Using the GNU Parallel Make Capability
The options used in this command are: Allocate 4 cores
Run the job on 1 core
Monitors display server address
$ cd subdir; srun $MAKE
$ cd subdir; srun -n1 -N1 $MAKE -j4
Example Procedure
$ make PREFIX='srun -n1 -N1' MAKEJ=-j4
Modified Makefile is invoked as follows
Local Disks on Compute Nodes
$ make PREFIX='srun -n1 -N1' MAKEJ=-j4
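The PREFIX and MAKEJ variables shown above imply a Makefile written to accept them; the fragment below is a hedged sketch under that assumption (target and subdirectory names are hypothetical), in which each sub-make is dispatched to one core through srun and runs its own rules with make -j4:
all: subdir1 subdir2
subdir1:
	cd subdir1; $(PREFIX) $(MAKE) $(MAKEJ)
subdir2:
	cd subdir2; $(PREFIX) $(MAKE) $(MAKEJ)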
Using MPICH on the HP XC System
11.5 I/O Performance Considerations
Communication Between Nodes
Shared File View
Using MPICH with LSF Allocation
Using MPICH with Slurm Allocation
The bsub command launches the wrapper script
Examples
Building and Running a Serial Application
Launching a Serial Interactive Shell Through LSF
Examine the LSF execution host information
Running LSF Jobs with a Slurm Allocation Request
Launching a Parallel Interactive Shell Through LSF
Example 2. Four Cores on Two Specific Nodes
Examine the running jobs information
$ hostname
n16
$ srun hostname
n5
$ bjobs
Submitting a Simple Job Script with LSF
Examine the finished jobs information
Show the environment
Display the script
Submitting an Interactive Job with LSF
Run the job
Submit the job
Show the job allocation
Run some commands from the pseudo-terminal
Show the Slurm job ID
Exit the pseudo-terminal
View the interactive jobs
View the node state
Submitting an HP-MPI Job with LSF
View the running job
View the finished job
Using a Resource Requirements String in an LSF Command
$ bsub -n 8 -R "ALPHA5 SLINUX64" \
-ext "SLURM[nodes=4-4]" myjob
Glossary
Network Availability set
FCFS
IPMI
LVS
PXE
SVA
Index
LVS
PGI