HP XC System 3.x Software manual To the queue, First-come See Fcfs First-served Global storage

Page 110

FCFS

First-come, first-served. An LSF job-scheduling policy that specifies that jobs are dispatched

 

according to their order in a queue, which is determined by job priority, not by order of submission

 

to the queue.

first-come,

See FCFS.

first-served

 

G

 

global storage

Storage within the HP XC system that is available to all of the nodes in the system. Also known

 

as local storage.

golden client

The node from which a standard file system image is created. The golden image is distributed

 

by the image server. In a standard HP XC installation, the head node acts as the image server

 

and golden client.

golden image

A collection of files, created from the golden client file system that are distributed to one or more

 

client systems. Specific files on the golden client may be excluded from the golden image if they

 

are not appropriate for replication.

golden master

The collection of directories and files that represents all of the software and configuration data

 

of an HP XC system. The software for any and all nodes of an HP XC system can be produced

 

solely by the use of this collection of directories and files.

H

 

head node

The single node that is the basis for software installation, system configuration, and administrative

 

functions in an HP XC system. There may be another node that can provide a failover function

 

for the head node, but HP XC system has only one head node at any one time.

host name

The name given to a computer. Lowercase and uppercase letters (a–z and A–Z), numbers (0–9),

 

periods, and dashes are permitted in host names. Valid host names contain from 2 to 63

 

characters, with the first character being a letter.

I

 

I/O node

A node that has more storage available than the majority of server nodes in an HP XC system.

 

This storage is frequently externally connected storage, for example, SAN attached storage.

 

When configured properly, an I/O server node makes the additional storage available as global

 

storage within the HP XC system.

iLO

Integrated Lights Out. A self-contained hardware technology available on CP3000 cluster platform

 

hardware models that enables remote management of any node within a system.

image server

A node specifically designated to hold images that will be distributed to one or more client

 

systems. In a standard HP XC installation, the head node acts as the image server and golden

 

client.

Integrated Lights

See iLO.

Out

 

interconnect

A hardware component that provides high-speed connectivity between the nodes in the HP XC

 

system. It is used for message passing and remote memory access capabilities for parallel

 

applications.

interconnect

The private network within the HP XC system that is used primarily for user file access and for

network

communications within applications.

Internet address

A unique 32-bit number that identifies a host's connection to an Internet network. An Internet

 

address is commonly represented as a network number and a host number and takes a form

 

similar to the following: 192.0.2.0.

IPMI

Intelligent Platform Management Interface. A self-contained hardware technology available on

 

HP ProLiant DL145 servers that enables remote management of any node within a system.

110 Glossary

Image 110
Contents HP XC System Software Users Guide Page Table of Contents Submitting Jobs Configuring Your Environment with ModulefilesDeveloping Applications Using LSF Tuning ApplicationsUsing Slurm Debugging ApplicationsGlossary 109 Index 115 Advanced TopicsExamples List of Figures Page List of Tables Determining the Node PlatformPage List of Examples Submitting a Job ScriptPage Document Organization About This DocumentIntended Audience This document is organized as followsHP XC Information Supplementary Information $ man lsfcommandnameFor More Information $ man discover $ man 8 discover Related InformationManpages $ man -k keywordRelated Compiler Web Sites Related Linux Web SitesRelated MPI Web Sites Additional PublicationsEnvironment Variable Typographic ConventionsHP Encourages Your Comments User inputHP XC System Software Overview of the User EnvironmentSystem Architecture Operating SystemNode Specialization Storage and I/OLocal Storage File SystemSAN Storage File System LayoutNetwork Address Translation NAT Determining System Configuration InformationSystem Interconnect Network Modules CommandsUser Environment Parallel Applications Application Development EnvironmentRun-Time Environment Serial ApplicationsHow LSF-HPC and Slurm Interact Load Sharing Facility LSF-HPCStandard LSF Mpirun command Components, Tools, Compilers, Libraries, and DebuggersUsing the Secure Shell to Log Using the SystemLVS Login Routing Logging In to the SystemGetting Information About Resources IntroductionGetting Information About Queues Performing Other Common User Tasks Getting System Help and Information $ man sinfoConfiguring Your Environment with Modulefiles Overview of ModulesSupplied Modulefiles Viewing Loaded Modulefiles Modulefiles Automatically Loaded on the SystemViewing Available Modulefiles Loading a ModulefileModulefile Conflicts Automatically Loading a Modulefile at LoginUnloading a Modulefile Loading a Modulefile for the Current Session$ module load modules $ man modulefile Creating a ModulefileViewing Modulefile-Specific Help $ module help totalviewPage Compilers Developing ApplicationsApplication Development Environment Overview MPI Compiler Examining Nodes and Partitions Before Running JobsInterrupting a Job Partition Avail Timelimit Nodes State NodelistSerial Application Build Environment Setting Debugging OptionsDeveloping Serial Applications Building Serial ApplicationsModulefiles Developing Parallel ApplicationsParallel Application Build Environment OpenMPMPI Library PthreadsQuadrics Shmem Intel Fortran and C/C++CompilersBuilding Parallel Applications Examples of Compiling and Linking HP-MPI Applications Developing LibrariesDesigning Libraries for the CP4000 Platform To build a 64-bit application, you might enter Linkcommand 32-bit -L/opt/mypackage/lib/i686 -lmystuffLinkcommand 64-bit -L/opt/mypackage/lib/x8664 -lmystuff ExtSLURMslurm-arguments Submitting JobsOverview of Job Submission Submitting a Serial Job with the LSF bsub Command Submitting a Serial Job Using Standard LSFSubmitting a Serial Job Using LSF-HPC $ bsub hostnameSubmitting a Serial Job Through Slurm only Bsub -nnum-procsbsub-optionsmpijob Submitting a Non-MPI Parallel Job$ bsub -n4 -I srun hostname Mpirun mpirun--options-srunsrun-optionsmpi-jobnameBsub -nnum-procs bsub-optionsscript-name Submitting a Batch Job or Job Script$ bsub -n4 -I mpirun -srun ./helloworld $ bsub -I -n4 Myscript.sh $ cat myscript.sh #!/bin/shSrun hostname mpirun -srun hellompi $ bsub -n4 -ext SLURMnodes=4 -I ./myscript.sh$ cat ./envscript.sh #!/bin/sh name=`hostname` Running Preexecution Programs$ bsub -n4 -I ./myscript.sh Opt/hptc/bin/srun Mypreexec Debugging Parallel Applications Debugging ApplicationsDebugging Serial Applications TotalViewSSH and TotalView Setting Up TotalViewUsing TotalView with Slurm Module load mpimodule load totalviewDebugging an Application Using TotalView with LSF-HPCSetting TotalView Preferences Directories in File ⇒ Search Path Sourcefile initfdte.f was not found, using assembler modeDebugging Running Applications $ mpirun -srun -n2 Psimple$ squeue $ scancel --user usernameExiting TotalView Page Building a Program Intel Trace Collector and HP-MPI Tuning ApplicationsUsing the Intel Trace Collector and Intel Trace Analyzer Libs Running a Program Intel Trace Collector and HP-MPIVisualizing Data Intel Trace Analyzer and HP-MPI CldflagsUsing the Intel Trace Collector and Intel Trace Analyzer Page Srun Squeue Scancel Sinfo Scontrol Using SlurmLaunching Jobs with the srun Command Introduction to SlurmUsing the srun Command with LSF-HPC Monitoring Jobs with the squeue CommandUsing the srun Command with HP-MPI Srun Roles and ModesJob Accounting Terminating Jobs with the scancel CommandGetting System Information with the sinfo Command # chmod a+r /hptccluster/slurm/job/jobacct.log Fault ToleranceSecurity Using LSF-HPC Using LSFUsing Standard LSF on an HP XC System Introduction to LSF-HPC in the HP XC Environment Overview of LSF-HPCResources Differences Between LSF-HPC and Standard LSFHostname Hostname Status JL/U MAX Njobs RUN Ssusp Ususp RSVUnknown Unknown Job Terminology$ ssh n15 lshosts HP XCCompute Node Resource Support SLURMnodelist =nodelist if specified$ bsub -n 10 -ext SLURMnodes=10exclude=n16 -I srun hostname $ bsub -n 10 -I srun hostname$ bsub -n 10 -ext SLURMnodes=10 -I srun hostname $ bsub -n 10 -ext SLURMconstraint=dualcore -I srun hostnameHow LSF-HPC and Slurm Launch and Manage a Job $ bsub -n4 -ext SLURMnodes=4 -o output.out ./myscriptJob Startup and Job Control #!/bin/sh hostname srun hostname Mpirun -srun ./hellompiGetting the Status of LSF-HPC Determining the LSF Execution HostDetermining Available LSF-HPC System Resources Getting Information About LSF Execution Host NodeGetting Information About the lsf Partition Getting Host Load InformationExamining LSF-HPC System Queues SLINUX6$ sinfo -p lsf -lNe Summary of the LSF bsub Command Format$ sinfo -p lsf Bsub -n num-procs-ext SLURMslurm-arguments \ LSF-SLURM External SchedulerFor information about running scripts Bsub-options srun srun-optionsjobname job-optionsWaiting for dispatch ... Starting on lsfhost.localdomain n1 Submitting a Job from a Non-HP XC HostStarting on lsfhost.localdomain n6 Type=SLINUX64Slurmid=slurmjobidncpus=slurmnprocsslurmalloc=nodelist Getting Information About JobsGetting Job Allocation Information $ bjobs -lTime stamp Examining the Status of a Job$ bhist -l $ bjobsSummary of time in seconds spent Various States Viewing the Historical Information for a Job$ bhist Jobid User Jobname Pend Psusp RUN Ususp Ssusp Unkwn TotalTranslating Slurm and LSF-HPC JOBIDs $ bjobs -l 124 grep slurm Working Interactively Within an LSF-HPC Allocation$ bsub -I -n4 -ext SLURMnodes=4 /bin/bash $ srun --jobid=150 hostname$ export SLURMJOBID=150 $ export SLURMNPROCS=4 $ unset SlurmjobidAlternatively, you can use the following $ unset Slurmjobid $ unset Slurmnprocs$ srun --jobid=250 uptime Job 125 is submitted to the default queue normalLSF-HPC Equivalents of Slurm srun Options $ bsub -n4 -ext SLURMnodes=4 -o %J.out sleepBsub -iinputfile Mpi=mpitype Quit-on-interrupt Page Running an X Terminal Session from a Remote Node Advanced TopicsEnabling Remote Execution with OpenSSH Determining IP Address of Your Local MachineLogging in to HP XC System Running an X terminal Session Using SlurmRunning an X terminal Session Using LSF-HPC $ srun -n4 hostname n46 Using the GNU Parallel Make Capability$ bsub -n4 -Ip srun -n1 xterm -display $ srun -n2 hostname n46$ cd subdir srun -n1 -N1 $MAKE -j4 Example Procedure $ make PREFIX=’srun -n1 -N1 MAKEJ=-j4Modified Makefile is invoked as follows Local Disks on Compute NodesPerformance Considerations $ make PREFIX=srun -n1 -N1 MAKEJ=-j4Private File View Communication Between NodesShared File View Fp = fopen myfile, a+Page Launching a Serial Interactive Shell Through LSF-HPC Appendix a ExamplesBuilding and Running a Serial Application Examine the LSF execution host informationRunning LSF-HPC Jobs with a Slurm Allocation Request R15s r1m r15m It tmp swp mem LoadSched LoadStop Launching a Parallel Interactive Shell Through LSF-HPCExample 2. Four cores on Two Specific Nodes SLURMnodes=2124 Lsfad Examine the the running jobs information$ hostname n16 $ srun hostname n5 $ bjobs Show the environment Submitting a Simple Job Script with LSF-HPCExamine the the finished jobs information Display the scriptSubmit the job Run some commands from the pseudo-terminalSubmitting an Interactive Job with LSF-HPC Show the job allocationView the interactive jobs Submitting an HP-MPI Job with LSF-HPCExit the pseudo-terminal View the finished jobsLsfhost.localdomai View the running jobView the finished job $ bsub -n 8 -R ALPHA5 SLINUX64 \ -ext SLURMnodes=4-4 myjob Using a Resource Requirements String in an LSF-HPC CommandStates by date and time 108 Glossary As local storage To the queueFirst-come See Fcfs First-served Global storage Are not appropriate for replicationLinux Virtual See LVS Server Load file Single commandLogin requests and directs them to a node with a login role LSF master hostNotably to install and remove software packages Network See NIS Information ServicesRemotely. PXE booting is configured at the Bios level Slurm backupSymmetric See SMP Multiprocessing Power available per unit of spaceSsh 114 Index Index PGI Utilities, 63 Slurm commands