9.7 Job Accounting .... 84
9.8 Fault Tolerance .... 84
9.9 Security .... 84

10 Using LSF .... 85
10.1 Information for LSF .... 85
10.2 Overview of LSF Integrated with SLURM .... 86
10.3 Differences Between LSF and LSF Integrated with SLURM .... 88
10.4 Job Terminology .... 89
10.5 Using LSF Integrated with SLURM in the HP XC Environment .... 91
10.5.1 Useful Commands .... 91
10.5.2 Job Startup and Job Control .... 91
10.5.3 Preemption .... 91
10.6 Submitting Jobs .... 91
10.7 LSF-SLURM External Scheduler .... 92
10.8 How LSF and SLURM Launch and Manage a Job .... 92
10.9 Determining the LSF Execution Host .... 94
10.10 Determining Available System Resources .... 94
10.10.1 Examining System Core Status .... 95
10.10.2 Getting Information About the LSF Execution Host Node .... 95
10.10.3 Getting Host Load Information .... 96
10.10.4 Examining System Queues .... 96
10.10.5 Getting Information About the lsf Partition .... 96
10.11 Getting Information About Jobs .... 96
10.11.1 Getting Job Allocation Information .... 97
10.11.2 Examining the Status of a Job .... 98
10.11.3 Viewing the Historical Information for a Job .... 99
10.12 Translating SLURM and LSF JOBIDs .... 100
10.13 Working Interactively Within an Allocation .... 101
10.14 LSF Equivalents of SLURM srun Options .... 103

11 Advanced Topics .... 107
11.1 Enabling Remote Execution with OpenSSH .... 107
11.2 Running an X Terminal Session from a Remote Node .... 107
11.3 Using the GNU Parallel Make Capability .... 109
11.3.1 Example Procedure 1 .... 111
11.3.2 Example Procedure 2 .... 111
11.3.3 Example Procedure 3 .... 112
11.4 Local Disks on Compute Nodes .... 112
11.5 I/O Performance Considerations .... 113
11.5.1 Shared File View .... 113
11.5.2 Private File View .... 113
11.6 Communication Between Nodes .... 113
11.7 Using MPICH on the HP XC System .... 113
11.7.1 Using MPICH with SLURM Allocation .... 114
11.7.2 Using MPICH with LSF Allocation .... 114

A Examples .... 115
A.1 Building and Running a Serial Application .... 115
A.2 Launching a Serial Interactive Shell Through LSF .... 115
A.3 Running LSF Jobs with a SLURM Allocation Request .... 116
A.3.1 Example 1. Two Cores on Any Two Nodes .... 116
A.3.2 Example 2. Four Cores on Two Specific Nodes .... 117
