HP XC System 3.x Software manual About This Document, Intended Audience, Document Organization

About This Document

This document describes how to use the features and functions of the HP XC System Software and how the HP XC user and programming environments differ from standard Linux® system environments. It focuses on building and running applications in the HP XC environment, and it explains the underlying mechanisms of the HP XC programming environment so that application developers can take full advantage of HP XC features and functions.

An HP XC system is integrated with several open source software components. Some of these components provide underlying technology, and their deployment is transparent. Others require user-level documentation specific to HP XC systems; this document includes that information where it is needed.

HP relies on the documentation provided by the open source developers to supply the information you need to use their products. For links to open source software documentation for products that are integrated with your HP XC system, see “Supplementary Information” (page 15).

Documentation for third-party hardware and software components that are supported on the HP XC system is supplied by the third-party vendors. However, information about the operation of third-party software is included in this document if the functionality of the third-party component differs from standard behavior when used in the XC environment. In this case, HP XC documentation supersedes information supplied by the third-party vendor. For links to related third-party Web sites, see “Supplementary Information” (page 15).

Standard Linux® administrative tasks or the functions provided by standard Linux tools and commands are documented in commercially available Linux reference manuals and on various Web sites. For more information about obtaining documentation for standard Linux administrative tasks and associated topics, see the list of Web sites and additional publications provided in “Related Information” (page 16).

Intended Audience

This document is intended for experienced Linux users who run applications developed by others, and for experienced system or application developers who develop, build, and run application code on an HP XC system.

This document assumes that the user understands, and has experience with, multiprocessor systems and the Message Passing Interface (MPI), and is familiar with HP XC architecture and concepts.

Document Organization

This document is organized as follows:

Chapter 1: Overview of the User Environment (page 19) provides an overview of the HP XC user, programming, and run-time environment.

Chapter 2: Using the System (page 27) describes how to perform common user tasks on the HP XC system.

Chapter 3: Configuring Your Environment with Modulefiles (page 31) discusses loading, unloading, and creating modulefiles.

Chapter 4: Developing Applications (page 37) describes how to build and run applications on the HP XC system.

Chapter 5: Submitting Jobs (page 45) describes the methods by which jobs are submitted to the HP XC system.

Chapter 6: Debugging Applications (page 53) describes how to debug applications on the HP XC system.

Chapter 7: Tuning Applications (page 59) describes how to tune applications for the HP XC system.

Chapter 8: Using SLURM (page 63) describes how to use SLURM on the HP XC system.

Chapter 9: Using LSF (page 67) describes how to use LSF® on the HP XC system.
