HP XC System Software User's Guide
HP XC System Software User's Guide
Version 3.0
Part number: 5991-4847
Published January 2006
Table of Contents
Developing Applications
Configuring Your Environment with Modulefiles
Submitting Jobs
Using Slurm
Tuning Applications
Using LSF
Debugging Applications
Examples
Advanced Topics
Glossary
Index
List of Figures
Determining the Node Platform
List of Tables
Submitting a Job Script
List of Examples
Intended Audience
About This Document
Document Organization
This document is organized as follows
HP XC Information
For More Information
$ man lsfcommandname
Supplementary Information
Manpages
Related Information
$ man discover
$ man 8 discover
$ man -k keyword
Related MPI Web Sites
Related Linux Web Sites
Related Compiler Web Sites
Additional Publications
HP Encourages Your Comments
Typographic Conventions
Environment Variable
User input
System Architecture
Overview of the User Environment
HP XC System Software
Operating System
Storage and I/O
Node Specialization
SAN Storage
File System
Local Storage
File System Layout
System Interconnect Network
Determining System Configuration Information
Network Address Translation (NAT)
User Environment
Commands
Modules
Run-Time Environment
Application Development Environment
Parallel Applications
Serial Applications
Standard LSF
Load Sharing Facility (LSF-HPC)
How LSF-HPC and Slurm Interact
Components, Tools, Compilers, Libraries, and Debuggers
Mpirun command
LVS Login Routing
Using the System
Using the Secure Shell to Log In
Logging In to the System
Getting Information About Queues
Introduction
Getting Information About Resources
Performing Other Common User Tasks
$ man sinfo
Getting System Help and Information
Overview of Modules
Configuring Your Environment with Modulefiles
Supplied Modulefiles
Viewing Available Modulefiles
Modulefiles Automatically Loaded on the System
Viewing Loaded Modulefiles
Loading a Modulefile
Unloading a Modulefile
Automatically Loading a Modulefile at Login
Modulefile Conflicts
Loading a Modulefile for the Current Session
Viewing Modulefile-Specific Help
Creating a Modulefile
$ module load modules
$ man modulefile
$ module help totalview
Application Development Environment Overview
Developing Applications
Compilers
Interrupting a Job
Examining Nodes and Partitions Before Running Jobs
MPI Compiler
Partition Avail Timelimit Nodes State Nodelist
Developing Serial Applications
Setting Debugging Options
Serial Application Build Environment
Building Serial Applications
Parallel Application Build Environment
Developing Parallel Applications
Modulefiles
OpenMP
Quadrics Shmem
Pthreads
MPI Library
Intel Fortran and C/C++ Compilers
Building Parallel Applications
Designing Libraries for the CP4000 Platform
Developing Libraries
Examples of Compiling and Linking HP-MPI Applications
linkcommand 64-bit -L/opt/mypackage/lib/x86_64 -lmystuff
linkcommand 32-bit -L/opt/mypackage/lib/i686 -lmystuff
To build a 64-bit application, you might enter
Overview of Job Submission
Submitting Jobs
-ext "SLURM[slurm-arguments]"
Submitting a Serial Job Using LSF-HPC
Submitting a Serial Job Using Standard LSF
Submitting a Serial Job with the LSF bsub Command
$ bsub hostname
Submitting a Serial Job Through Slurm only
$ bsub -n4 -I srun hostname
Submitting a Non-MPI Parallel Job
bsub -n num-procs [bsub-options] mpijob
mpirun [mpirun-options] -srun [srun-options] mpi-jobname
$ bsub -n4 -I mpirun -srun ./helloworld
Submitting a Batch Job or Job Script
bsub -n num-procs [bsub-options] script-name
srun hostname
mpirun -srun hellompi
$ cat myscript.sh
#!/bin/sh
$ bsub -I -n4 myscript.sh
$ bsub -n4 -ext "SLURM[nodes=4]" -I ./myscript.sh
$ bsub -n4 -I ./myscript.sh
Running Preexecution Programs
$ cat ./envscript.sh
#!/bin/sh
name=`hostname`
/opt/hptc/bin/srun mypreexec
Debugging Serial Applications
Debugging Applications
Debugging Parallel Applications
TotalView
Using TotalView with Slurm
Setting Up TotalView
SSH and TotalView
$ module load mpi
$ module load totalview
Setting TotalView Preferences
Using TotalView with LSF-HPC
Debugging an Application
Debugging Running Applications
Sourcefile initfdte.f was not found, using assembler mode
Directories in File ⇒ Search Path
$ mpirun -srun -n2 Psimple
Exiting TotalView
$ scancel --user username
$ squeue
Using the Intel Trace Collector and Intel Trace Analyzer
Tuning Applications
Building a Program Intel Trace Collector and HP-MPI
Visualizing Data Intel Trace Analyzer and HP-MPI
Running a Program Intel Trace Collector and HP-MPI
Libs
Cldflags
Using the Intel Trace Collector and Intel Trace Analyzer
Launching Jobs with the srun Command
Using Slurm
srun, squeue, scancel, sinfo, scontrol
Introduction to Slurm
Using the srun Command with HP-MPI
Monitoring Jobs with the squeue Command
Using the srun Command with LSF-HPC
Srun Roles and Modes
Getting System Information with the sinfo Command
Terminating Jobs with the scancel Command
Job Accounting
Security
Fault Tolerance
# chmod a+r /hptc_cluster/slurm/job/jobacct.log
Using Standard LSF on an HP XC System
Using LSF
Using LSF-HPC
Overview of LSF-HPC
Introduction to LSF-HPC in the HP XC Environment
Hostname
Differences Between LSF-HPC and Standard LSF
Resources
Hostname Status JL/U MAX Njobs RUN Ssusp Ususp RSV
$ ssh n15 lshosts
Job Terminology
Unknown Unknown
SLURMnodelist =nodelist if specified
HP XC Compute Node Resource Support
$ bsub -n 10 -ext "SLURM[nodes=10]" -I srun hostname
$ bsub -n 10 -I srun hostname
$ bsub -n 10 -ext "SLURM[nodes=10;exclude=n16]" -I srun hostname
$ bsub -n 10 -ext "SLURM[constraint=dualcore]" -I srun hostname
$ bsub -n4 -ext "SLURM[nodes=4]" -o output.out ./myscript
How LSF-HPC and Slurm Launch and Manage a Job
#!/bin/sh
hostname
srun hostname
mpirun -srun ./hellompi
Job Startup and Job Control
Determining Available LSF-HPC System Resources
Determining the LSF Execution Host
Getting the Status of LSF-HPC
Getting Information About LSF Execution Host Node
Examining LSF-HPC System Queues
Getting Host Load Information
Getting Information About the lsf Partition
SLINUX6
$ sinfo -p lsf
Summary of the LSF bsub Command Format
$ sinfo -p lsf -lNe
For information about running scripts
LSF-SLURM External Scheduler
bsub -n num-procs [-ext "SLURM[slurm-arguments]"] \
[bsub-options] [srun [srun-options]] jobname [job-options]
Starting on lsfhost.localdomain n6
Submitting a Job from a Non-HP XC Host
Waiting for dispatch ... Starting on lsfhost.localdomain n1
Type=SLINUX64
Getting Job Allocation Information
Getting Information About Jobs
slurm_id=slurm_jobid;ncpus=slurm_nprocs;slurm_alloc=node_list
$ bjobs -l
$ bhist -l
Examining the Status of a Job
Time stamp
$ bjobs
$ bhist
Viewing the Historical Information for a Job
Summary of time in seconds spent Various States
Jobid User Jobname Pend Psusp RUN Ususp Ssusp Unkwn Total
Translating Slurm and LSF-HPC JOBIDs
$ bsub -I -n4 -ext "SLURM[nodes=4]" /bin/bash
Working Interactively Within an LSF-HPC Allocation
$ bjobs -l 124 | grep slurm
$ srun --jobid=150 hostname
Alternatively, you can use the following
$ unset SLURM_JOBID
$ export SLURM_JOBID=150
$ export SLURM_NPROCS=4
$ unset SLURM_JOBID
$ unset SLURM_NPROCS
LSF-HPC Equivalents of Slurm srun Options
Job 125 is submitted to the default queue normal
$ srun --jobid=250 uptime
$ bsub -n4 -ext "SLURM[nodes=4]" -o %J.out sleep
bsub -i inputfile
Mpi=mpitype
Quit-on-interrupt
Enabling Remote Execution with OpenSSH
Advanced Topics
Running an X Terminal Session from a Remote Node
Determining IP Address of Your Local Machine
Running an X Terminal Session Using LSF-HPC
Running an X Terminal Session Using Slurm
Logging in to HP XC System
$ bsub -n4 -Ip srun -n1 xterm -display
Using the GNU Parallel Make Capability
$ srun -n4 hostname n46
$ srun -n2 hostname n46
$ cd subdir srun -n1 -N1 $MAKE -j4
$ make PREFIX='srun -n1 -N1' MAKEJ=-j4
Example Procedure
Performance Considerations
Local Disks on Compute Nodes
Modified Makefile is invoked as follows
$ make PREFIX='srun -n1 -N1' MAKEJ=-j4
Shared File View
Communication Between Nodes
Private File View
fp = fopen("myfile", "a+");
Building and Running a Serial Application
Appendix A Examples
Launching a Serial Interactive Shell Through LSF-HPC
Examine the LSF execution host information
Running LSF-HPC Jobs with a Slurm Allocation Request
Example 2. Four Cores on Two Specific Nodes
Launching a Parallel Interactive Shell Through LSF-HPC
R15s r1m r15m It tmp swp mem LoadSched LoadStop
SLURM[nodes=2]
$ hostname
n16
$ srun hostname
n5
$ bjobs
Examine the running jobs information
124 Lsfad
Examine the finished jobs information
Submitting a Simple Job Script with LSF-HPC
Show the environment
Display the script
Submitting an Interactive Job with LSF-HPC
Run some commands from the pseudo-terminal
Submit the job
Show the job allocation
Exit the pseudo-terminal
Submitting an HP-MPI Job with LSF-HPC
View the interactive jobs
View the finished jobs
View the finished job
View the running job
Lsfhost.localdomai
States by date and time
Using a Resource Requirements String in an LSF-HPC Command
$ bsub -n 8 -R "ALPHA5 || SLINUX64" \
-ext "SLURM[nodes=4-4]" myjob
Glossary
First-come, first-served: See FCFS.
Global storage
To the queue
As local storage
Are not appropriate for replication
Login requests and directs them to a node with a login role
Single command
Linux Virtual Server: See LVS.
Load file
LSF master host
Remotely. PXE booting is configured at the BIOS level
Network Information Services: See NIS.
Notably to install and remove software packages
Slurm backup
Ssh
Power available per unit of space
Symmetric multiprocessing: See SMP.
Index
PGI
Utilities
Slurm commands