HP XC System Software XC User Guide
Table of Contents
Configuring Your Environment with Modulefiles
Compiling and Linking Serial Applications
Debugging Applications
Srun Roles
Srun Modes
List of Figures
List of Tables
List of Examples
About This Document
Intended Audience
New and Changed Information in This Edition
Typographic Conventions
HP XC and Related HP Products Information
Related Information
HP XC System Software User's Guide
https://computing.llnl.gov/linux/slurm/documentation.html
http://systemimager.org
Manpages
http://www-unix.mcs.anl.gov/mpi
$ man -k keyword
HP Encourages Your Comments
$ man discover
$ man 8 discover
Only to the administrator of the HP XC system
Node Specialization
Determining the Node Platform
The /proc/cpuinfo file is dynamic
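For example, you can inspect the file directly on a node; this is standard Linux, and the output varies by platform:
$ cat /proc/cpuinfo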
Storage and I/O
File System
SAN Storage
Local Storage
HP XC System Interconnects
System Interconnect Network
File System Layout
Determining System Configuration Information
User Environment
Network Address Translation (NAT)
LVS
Parallel Applications
Commands
Application Development Environment
Run-Time Environment
Serial Applications
Slurm
Load Sharing Facility (LSF)
Requested by the HP-MPI mpirun command
How LSF and Slurm Interact
HP-MPI
Nodes for the job
Components, Tools, Compilers, Libraries, and Debuggers
Using the System
LVS Login Routing
Using the Secure Shell to Log In
Logging In to the System
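As a minimal sketch, a login with ssh might look like the following (the user name is a placeholder; n16 is the login node used in examples later in this guide):
$ ssh username@n16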
Introduction
Getting Information About Queues
Getting Information About Resources
$ bqueues
Performing Other Common User Tasks
Getting System Help and Information
Determining the LSF Cluster Name and the LSF Execution Host
$ lsid
$ man sinfo
Configuring Your Environment with Modulefiles
Overview of Modules
Supplied Modulefiles
Supplied Modulefiles
HP-MPI
Modulefiles Automatically Loaded on the System
Viewing Available Modulefiles
Viewing Loaded Modulefiles
Loading a Modulefile
Modulefile Conflicts
Unloading a Modulefile
Viewing Modulefile-Specific Help
Creating a Modulefile
$ module load modules
$ man modulefile
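As a quick sketch of the commands these sections describe (the mpi modulefile is only an example; available modulefile names vary by system):
$ module avail
$ module list
$ module load mpi
$ module unload mpi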
Developing Applications
Application Development Environment Overview
MPI Compiler
Compiler Commands
Compilers
Setting Debugging Options
Examining Nodes and Partitions Before Running Jobs
Interrupting a Job
Developing Serial Applications
Developing Parallel Applications
Serial Application Build Environment
Building Serial Applications
Parallel Application Build Environment
Modulefiles
HP-MPI
OpenMP
Pthreads
MPI Library
Intel Fortran and C/C++ Compilers
PGI Fortran and C/C++ Compilers
GNU C and C++ Compilers
Building Parallel Applications
Designing Libraries for the CP4000 Platform
To compile and link a C application using the mpicc command
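A minimal sketch of such a command line (the source and output file names are placeholders):
$ mpicc -o myapp myapp.c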
Developing Libraries
To build a 64-bit application, you might enter
linkcommand 32-bit -L/opt/mypackage/lib/i686 -lmystuff
linkcommand 64-bit -L/opt/mypackage/lib/x86_64 -lmystuff
Developing Applications
Submitting a Serial Job Using LSF
Submitting a Serial Job with the LSF bsub Command
Submitting Jobs
Overview of Job Submission
Example 5-2 Submitting a Serial Job Using LSF
Submitting a Serial Job Through Slurm Only
Example 5-1 Submitting a Job from the Standard Input
$ bsub -I srun hostname
Following is the command line used to compile this program
Submitting a Parallel Job
Submitting a Non-MPI Parallel Job
To submit a parallel job
Example 5-5 Submitting a Non-MPI Parallel Job
$ bsub -n4 -I srun hostname
Example 5-7 Submitting an MPI Job
$ bsub -n4 -I mpirun -srun ./helloworld
Arguments for the Slurm External Scheduler
nodelist=list_of_nodes
$ bsub -n 10 -ext "SLURM[nodelist=n[1-10]]" srun hostname
Submitting a Batch Job or Job Script
Example 5-14 Submitting a Job Script
$ bsub -n 10 -ext "SLURM[constraint=dualcore]" -I srun hostname
$ bqueues -l dualcore | grep SLURM
$ bsub -n4 -ext "SLURM[nodes=4]" -I ./myscript.sh
$ bsub -n4 -I ./myscript.sh
Using a Script to Submit Multiple Jobs
Using a Makefile to Submit Multiple Jobs
Use the squeue command to acquire information on the jobs
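For example, a quick check of the submitted jobs might look like this (the user name is a placeholder; output depends on the system):
$ squeue
$ squeue --user username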
Submitting Multiple MPI Jobs Across the Same Set of Nodes
$ tail 113.out
The following command line makes the program and executes it
$ cat mymake
$ cat 117.out
Submitting a Job from a Host Other Than an HP XC Host
Running Preexecution Programs
type=SLINUX64
$ bsub -R "type=SLINUX64" -n4 -I srun hostname
Debugging Applications
Debugging Serial Applications
Debugging Parallel Applications
TotalView
Setting Up TotalView
Debugging with TotalView
SSH and TotalView
$ module load mpi
$ module load totalview
Using TotalView with Slurm
Using TotalView with LSF
Setting TotalView Preferences
$ srun -Nx -A
$ mpirun -tv -srun application
Debugging an Application
$ mpirun -tv -srun -n2 ./Psimple
$ scancel --user username
Debugging Running Applications
Exiting TotalView
Run the application
Monitoring Node Activity
Xtools Utilities
Running Performance Health Tests
Where
You can list the available tests with the ovp -l command
$ ovp -l
$ ovp --verify=perfhealth/cpuusage
$ ovp --verbose --verify=perfhealth/cpuusage
$HOME/ovp_n16_mmddyy.log
$HOME/ovp_n16_mmddyy_r1.log
Building a Program Intel Trace Collector and HP-MPI
Tuning Applications
Using the Intel Trace Collector and Intel Trace Analyzer
Running a Program Intel Trace Collector and HP-MPI
Example 8-1 The vtjacobic Example Program
For more information, see the following Web site
Example 8-2 C Example Running the vtjacobic Example Program
HP-MPI and the Intel Trace Collector
Installation Kit
Intel Trace Collector and Analyzer with HP-MPI on HP XC
Running a Program
Following is a Fortran example called vtjacobif
# bsub -n4 -I mpirun.mpich -np 2 ./vtjacobic
Running a Program Across Nodes Using LSF
Visualizing Data Intel Trace Analyzer and HP-MPI
Using Slurm
Launching Jobs with the srun Command
Introduction to Slurm
Slurm Utilities
Monitoring Jobs with the squeue Command
Using the srun Command with HP-MPI
Using the srun Command with LSF
Srun Roles and Modes
Example 9-5 cancels all pending jobs
Terminating Jobs with the scancel Command
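A hedged sketch of cancelling all of your pending jobs, consistent with the scancel usage shown elsewhere in this guide (the user name is a placeholder):
$ scancel --state=PENDING --user username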
Getting System Information with the sinfo Command
Fault Tolerance
Job Accounting
Security
$ sinfo -R
Using LSF
Information for LSF
Overview of LSF Integrated with Slurm
$ squeue --jobs $SLURM_JOBID
[lsfadmin@n16 ~]$ bsub -n4 -I srun hostname
Example 10-1 Examples of LSF Job Launch
The lshosts and lsload commands display for each of these items
Differences Between LSF and LSF Integrated with Slurm
Job Terminology
Connection back to the terminal from which the job was submitted. This job may run immediately, or it may run
$ ssh n15 lshosts
Batch jobs are submitted with the srun -b command. By default, the output is written to
are submitted with the bsub -I command
Batch system scheduling policies
Using LSF Integrated with Slurm in the HP XC Environment
Useful Commands
Submitting Jobs
Job Startup and Job Control
LSF with Slurm Job Launch Exit Codes
LSF-SLURM External Scheduler
How LSF and Slurm Launch and Manage a Job
User logs in to login node n16
How LSF and Slurm Launch and Manage a Job
Determining the LSF Execution Host
Determining Available System Resources
Getting Information About the LSF Execution Host Node
Following example shows the output from the lshosts command
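A minimal sketch of invoking it (the output depends on the cluster configuration):
$ lshosts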
Examining System Core Status
Getting Information About Jobs
Getting Host Load Information
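A brief sketch of the command typically used here (the output depends on the cluster configuration):
$ lsload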
Examining System Queues
Getting Information About the lsf Partition
Getting Job Allocation Information
This allocation string has the following values
LSF job with Slurm allocated resources
Than what the job requests
Example 10-4 Job Allocation Information for a Finished Job
Example 10-5 Using the bjobs Command Short Output
Examining the Status of a Job
Example 10-6 Using the bjobs Command Long Output
Example 10-7 Using the bhist Command Short Output
Output Provided by the bhist Command
Viewing the Historical Information for a Job
Translating Slurm and LSF JOBIDs
Example 10-8 Using the bhist Command Long Output
Use the bjobs command to view the Slurm Jobid
Working Interactively Within an Allocation
$ sacct -j
$ bsub -I -n4 -ext "SLURM[nodes=4]" /bin/bash
$ bjobs -l 124 | grep slurm
Alternatively, you can use the following
Example 10-9 Launching an Interactive MPI Job
This example assumes 2 cores per node
LSF Equivalents of Slurm srun Options
Table 10-3 describes the srun options and lists their LSF equivalents
LSF Equivalents of Slurm srun Options
$ unset SLURM_JOBID
$ unset SLURM_NPROCS
Requests a specific list of nodes.
Slurm
Suppress informational message
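As an illustration of one such equivalence, a minimal sketch (the process count and program name are placeholders):
$ srun -n16 ./myapp
$ bsub -n 16 -I srun ./myapp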
Advanced Topics
Enabling Remote Execution with OpenSSH
Running an X Terminal Session from a Remote Node
Determining IP Address of Your Local Machine
Logging In to the HP XC System
Running an X Terminal Session Using Slurm
Running an X Terminal Session Using LSF
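A hedged sketch of launching an X terminal on a remote node (the display address, your workstation's IP address plus display number, is a placeholder):
$ srun -n1 -N1 xterm -display 192.0.2.10:0
$ bsub -n1 -I srun xterm -display 192.0.2.10:0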
Using the GNU Parallel Make Capability
Options used in this command are:
Allocate 4 cores
Run the job on 1 core
Monitors display server address
$ cd subdir; srun $MAKE
$ cd subdir; srun -n1 -N1 $MAKE -j4
Example Procedure
$ make PREFIX='srun -n1 -N1' MAKEJ=-j4
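This invocation assumes the Makefile's targets prepend the PREFIX variable to their sub-make commands; a minimal sketch of such a target follows (the directory and target names are placeholders, and the recipe line must be tab-indented in a real Makefile):
all:
        cd subdir; $(PREFIX) $(MAKE) $(MAKEJ)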
Local Disks on Compute Nodes
Modified Makefile is invoked as follows
Using MPICH on the HP XC System
I/O Performance Considerations
Communication Between Nodes
Shared File View
The bsub command launches the wrapper script
Using MPICH with Slurm Allocation
Using MPICH with LSF Allocation
Examples
Building and Running a Serial Application
Launching a Serial Interactive Shell Through LSF
Examine the LSF execution host information
Running LSF Jobs with a Slurm Allocation Request
Launching a Parallel Interactive Shell Through LSF
Example 2. Four Cores on Two Specific Nodes
Examine the running jobs information
$ hostname
n16
$ srun hostname
n5
$ bjobs
Submitting a Simple Job Script with LSF
Examine the finished jobs information
Show the environment
Display the script
Submitting an Interactive Job with LSF
Run the job
Submit the job
Show the job allocation
Run some commands from the pseudo-terminal
Show the Slurm job ID
Exit the pseudo-terminal
View the interactive jobs
View the running job
Submitting an HP-MPI Job with LSF
View the node state
$ bsub -n 8 -R "ALPHA5 SLINUX64" \
-ext "SLURM[nodes=4-4]" myjob
Using a Resource Requirements String in an LSF Command
View the finished job
Glossary
Network Availability set
FCFS
IPMI
LVS
PXE
SVA
Index
LVS
PGI