9.7 Job Accounting
9.8 Fault Tolerance
9.9 Security
10 Using LSF
10.1 Information for LSF
10.2 Overview of LSF Integrated with SLURM
10.3 Differences Between LSF and LSF Integrated with SLURM
10.4 Job Terminology
10.5 Using LSF Integrated with SLURM in the HP XC Environment
10.5.1 Useful Commands
10.5.2 Job Startup and Job Control
10.5.3 Preemption
10.6 Submitting Jobs
10.7 LSF-SLURM External Scheduler
10.8 How LSF and SLURM Launch and Manage a Job
10.9 Determining the LSF Execution Host
10.10 Determining Available System Resources
10.10.1 Examining System Core Status
10.10.2 Getting Information About the LSF Execution Host Node
10.10.3 Getting Host Load Information
10.10.4 Examining System Queues
10.10.5 Getting Information About the lsf Partition
10.11 Getting Information About Jobs
10.11.1 Getting Job Allocation Information
10.11.2 Examining the Status of a Job
10.11.3 Viewing the Historical Information for a Job
10.12 Translating SLURM and LSF JOBIDs
10.13 Working Interactively Within an Allocation
10.14 LSF Equivalents of SLURM srun Options
11 Advanced Topics
11.1 Enabling Remote Execution with OpenSSH
11.2 Running an X Terminal Session from a Remote Node
11.3 Using the GNU Parallel Make Capability
11.3.1 Example Procedure 1
11.3.2 Example Procedure 2
11.3.3 Example Procedure 3
11.4 Local Disks on Compute Nodes
11.5 I/O Performance Considerations
11.5.1 Shared File View
11.5.2 Private File View
11.6 Communication Between Nodes
11.7 Using MPICH on the HP XC System
11.7.1 Using MPICH with SLURM Allocation
11.7.2 Using MPICH with LSF Allocation
A Examples
A.1 Building and Running a Serial Application
A.2 Launching a Serial Interactive Shell Through LSF
A.3 Running LSF Jobs with a SLURM Allocation Request
A.3.1 Example 1. Two Cores on Any Two Nodes
A.3.2 Example 2. Four Cores on Two Specific Nodes