HP XC System Software XC User Guide
Version 4.0
HP Part Number:
A-XCUSR-40a
Published: February 2009
Table of Contents
About This Document
1 Overview of the User Environment
2 Using the System
3 Configuring Your Environment with Modulefiles
4 Developing Applications
5 Submitting Jobs
6 Debugging Applications
7 Monitoring Node Activity
8 Tuning Applications
9 Using SLURM
10 Using LSF
11 Advanced Topics
A Examples
Glossary
Index
List of Figures
List of Tables
List of Examples
About This Document
Intended Audience
New and Changed Information in This Edition
Typographic Conventions
HP XC and Related HP Products Information
Related Information
•http://www.platform.com
—Administering Platform LSF
—Administration Primer
—Platform LSF Reference
—Quick Reference Card
•http://www.balabit.com/products/syslog_ng (syslog-ng and syslog documentation)
•http://systemimager.org
•http://linuxvirtualserver.org
Manpages
HP Encourages Your Comments
1.1.1 HP XC System Software
1.1.4 Node Specialization
1.1.5 Storage and I/O
1.1.6 File System
1.1.7 System Interconnect Network
1.1.8 Network Address Translation (NAT)
1.2 Determining System Configuration Information
1.3 User Environment
1.3.1 LVS
1.3.2 Modules
1.3.3 Commands
1.4 Application Development Environment
1.4.1 Parallel Applications
1.4.2 Serial Applications
1.5 Run-Time Environment
1.5.1 SLURM
1.5.2 Load Sharing Facility (LSF)
1.5.3 Standard LSF
1.5.4 How LSF and SLURM Interact
1.5.5 HP-MPI
1.6 Components, Tools, Compilers, Libraries, and Debuggers
Page
2 Using the System
2.1 Logging In to the System
2.1.1 LVS Login Routing
2.1.2 Using the Secure Shell to Log In
2.2 Overview of Launching and Managing Jobs
2.2.1 Introduction
2.2.2 Getting Information About Queues
2.2.3 Getting Information About Resources
2.2.4 Getting Information About System Partitions
2.2.5 Launching Jobs
2.3 Performing Other Common User Tasks
2.3.1 Determining the LSF Cluster Name and the LSF Execution Host
2.4 Getting System Help and Information
3 Configuring Your Environment with Modulefiles
3.1 Overview of Modules
3.2 Supplied Modulefiles
Table 3-1 Supplied Modulefiles
3.3 Modulefiles Automatically Loaded on the System
3.4 Viewing Available Modulefiles
3.5 Viewing Loaded Modulefiles
3.6 Loading a Modulefile
3.6.1 Loading a Modulefile for the Current Session
3.7 Unloading a Modulefile
3.8 Viewing Modulefile-Specific Help
3.9 Modulefile Conflicts
3.10 Creating a Modulefile
4 Developing Applications
4.1 Application Development Environment Overview
4.2 Compilers
4.2.1 MPI Compiler
4.3 Examining Nodes and Partitions Before Running Jobs
4.4 Interrupting a Job
4.5 Setting Debugging Options
4.6 Developing Serial Applications
4.6.1 Serial Application Build Environment
4.7 Developing Parallel Applications
4.7.1 Parallel Application Build Environment
4.7.1.1 Modulefiles
4.7.1.2 HP-MPI
4.7.1.3 OpenMP
4.7.1.4 Pthreads
4.7.1.6 MPI Library
4.7.2 Building Parallel Applications
4.8 Developing Libraries
4.8.1 Designing Libraries for the CP4000 Platform
5 Submitting Jobs
5.1 Overview of Job Submission
5.2 Submitting a Serial Job Using LSF
5.2.1 Submitting a Serial Job with the LSF bsub Command
5.2.2 Submitting a Serial Job Through SLURM Only
5.3 Submitting a Parallel Job
5.3.1 Submitting a Non-MPI Parallel Job
5.3.2 Submitting a Parallel Job That Uses the HP-MPI Message Passing Interface
5.3.3 Submitting a Parallel Job Using the SLURM External Scheduler
Example 5-10 shows one way to submit a parallel job to run one task per node.
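The command itself is elided in this extract; as a hedged sketch (the node count, the srun workload, and the interactive -I flag are all illustrative), a one-task-per-node submission on an HP XC system combines LSF's bsub command with the LSF-SLURM external scheduler option described in Section 10.7:

```shell
# Build the submission command line; numbers and the workload (hostname)
# are illustrative. On an HP XC system you would run $CMD directly.
NODES=4
CMD="bsub -I -n $NODES -ext \"SLURM[nodes=$NODES]\" srun hostname"
echo "$CMD"
```

Because the external scheduler request pins the allocation to the same number of nodes as tasks, each node runs exactly one task.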
5.4 Submitting a Batch Job or Job Script
Example 5-16 Submitting a Batch Job Script That Uses a Subset of the Allocation
Example 5-17 Submitting a Batch Job Script That Uses the srun --overcommit Option
SLURM Reference Manual
5.5 Submitting Multiple MPI Jobs Across the Same Set of Nodes
5.5.1 Using a Script to Submit Multiple Jobs
5.5.2 Using a Makefile to Submit Multiple Jobs
The following command line makes the program and executes it. In the example, one line attempts to submit a program that does not exist.
5.6 Submitting a Job from a Host Other Than an HP XC Host
5.7 Running Preexecution Programs
6 Debugging Applications
6.1 Debugging Serial Applications
6.2 Debugging Parallel Applications
The parallel debuggers discussed in this chapter are TotalView and DDT.
6.2.1 Debugging with TotalView
6.2.1.3 Using TotalView with SLURM
These commands allocate nodes and run TotalView to debug the named program.
6.2.1.4 Using TotalView with LSF
You may need to verify the full path name of the xterm and mpirun commands:
6.2.1.6 Debugging an Application
This section describes how to use TotalView to debug an application.
1. Compile the application to be debugged, using the -g option to enable debugging information. For example:
2. Run the application in TotalView:
6.2.1.7 Debugging Running Applications
Compile a long-running application as in “Debugging an Application”.
7 Monitoring Node Activity
7.1The Xtools Utilities
7.2 Running Performance Health Tests
8 Tuning Applications
8.1 Using the Intel Trace Collector and Intel Trace Analyzer
8.1.1 Building a Program — Intel Trace Collector and HP-MPI
8.1.2 Running a Program – Intel Trace Collector and HP-MPI
8.2 The Intel Trace Collector and Analyzer with HP-MPI on HP XC
8.2.1 Installation Kit
8.2.2 HP-MPI and the Intel Trace Collector
Running a Program
The following is a C example called vtjacobic:
The following is a Fortran example called vtjacobif:
8.3 Visualizing Data – Intel Trace Analyzer and HP-MPI
9 Using SLURM
9.1 Introduction to SLURM
9.2 SLURM Utilities
9.3 Launching Jobs with the srun Command
9.3.1 The srun Roles and Modes
9.4 Monitoring Jobs with the squeue Command
9.5 Terminating Jobs with the scancel Command
9.6 Getting System Information with the sinfo Command
9.7 Job Accounting
9.8 Fault Tolerance
9.9 Security
10 Using LSF
10.1 Information for LSF
10.2 Overview of LSF Integrated with SLURM
This chapter describes the relationship between the SLURM JOBID and the LSF JOBID.
Example 10-1 Examples of LSF Job Launch
10.3 Differences Between LSF and LSF Integrated with SLURM
10.4 Job Terminology
Batch jobs are scheduled according to batch system scheduling policies. Batch job submissions typically provide instructions on I/O management, such as files from which to read input and filenames to collect output; batch jobs are submitted with the srun -b command.
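As a hedged sketch of such a submission (the script name, file names, and contents are all illustrative; the srun -b line is shown as a comment because it requires a SLURM cluster), a batch script's I/O instructions might look like:

```shell
# Create an illustrative batch script. On an HP XC system it would be
# submitted with:  srun -b ./myjob.sh
cat > myjob.sh <<'EOF'
#!/bin/sh
# Read input from one file and collect output in another.
sort < input.txt > output.txt
EOF
chmod +x myjob.sh
printf 'b\na\n' > input.txt
./myjob.sh            # run locally here only to illustrate the I/O redirection
cat output.txt
```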
10.5 Using LSF Integrated with SLURM in the HP XC Environment
10.5.1 Useful Commands
10.5.2 Job Startup and Job Control
10.5.3 Preemption
10.6 Submitting Jobs
10.7 LSF-SLURM External Scheduler
10.8 How LSF and SLURM Launch and Manage a Job
Figure 10-1 How LSF and SLURM Launch and Manage a Job
1. A user logs in to login node n16 and executes the following LSF command on the login node:
10.9 Determining the LSF Execution Host
10.10 Determining Available System Resources
10.10.1 Examining System Core Status
10.10.2 Getting Information About the LSF Execution Host Node
10.10.3 Getting Host Load Information
10.10.4 Examining System Queues
10.10.5 Getting Information About the lsf Partition
10.11 Getting Information About Jobs
10.11.1 Getting Job Allocation Information
10.11.2 Examining the Status of a Job
10.11.3 Viewing the Historical Information for a Job
10.12 Translating SLURM and LSF JOBIDs
10.13 Working Interactively Within an Allocation
Section 5.3.2
Example 10-9 Launching an Interactive MPI Job
Example 10-10 uses the -n8 option to launch on all cores in the allocation; this example assumes two cores per node.
Example 10-10 Launching an Interactive MPI Job on All Cores in the Allocation
10.14 LSF Equivalents of SLURM srun Options
Table 10-3 LSF Equivalents of SLURM srun Options
11 Advanced Topics
11.1 Enabling Remote Execution with OpenSSH
11.2 Running an X Terminal Session from a Remote Node
Step 1. Determining IP Address of Your Local Machine
Step 2. Logging in to HP XC System
Step 3. Running an X terminal Session Using SLURM
Step 4. Running an X terminal Session Using LSF
11.3 Using the GNU Parallel Make Capability
One way is to prefix the actual compilation line in the rule with an srun command; instead of executing the compiler directly, make then executes it through srun. With concurrency, multiple compute nodes run multiple srun-launched commands instead of multiple local compiler commands.
11.3.1 Example Procedure
11.3.2 Example Procedure
11.3.3 Example Procedure
11.4 Local Disks on Compute Nodes
11.5 I/O Performance Considerations
11.5.1 Shared File View
11.5.2 Private File View
11.6 Communication Between Nodes
11.7 Using MPICH on the HP XC System
11.7.1 Using MPICH with SLURM Allocation
11.7.2 Using MPICH with LSF Allocation
A Examples
A.1 Building and Running a Serial Application
A.2 Launching a Serial Interactive Shell Through LSF
Examine the LSF execution host information:
Examine the partition information:
A.3 Running LSF Jobs with a SLURM Allocation Request
A.3.1 Example 1. Two Cores on Any Two Nodes
A.3.2 Example 2. Four Cores on Two Specific Nodes
A.4 Launching a Parallel Interactive Shell Through LSF
Note the output from the bjobs command:
Examine the running job's information:
Examine the finished job's information:
A.5 Submitting a Simple Job Script with LSF
Show the environment:
Display the script:
Run the job:
A.6 Submitting an Interactive Job with LSF
Submit the job:
Show the job allocation:
Show the SLURM job ID:
Run some commands from the pseudo-terminal:
Exit the pseudo-terminal:
View the interactive jobs:
View the finished jobs:
A.7 Submitting an HP-MPI Job with LSF
View the running job:
View the finished job:
A.8 Using a Resource Requirements String in an LSF Command
Glossary
Index