5.2 Submitting a Serial Job Using LSF ..... 49
5.2.1 Submitting a Serial Job with the LSF bsub Command ..... 49
5.2.2 Submitting a Serial Job Through SLURM Only ..... 50
5.3 Submitting a Parallel Job ..... 51
5.3.1 Submitting a Non-MPI Parallel Job ..... 51
5.3.2 Submitting a Parallel Job That Uses the HP-MPI Message Passing Interface ..... 52
5.3.3 Submitting a Parallel Job Using the SLURM External Scheduler ..... 53
5.4 Submitting a Batch Job or Job Script ..... 56
5.5 Submitting Multiple MPI Jobs Across the Same Set of Nodes ..... 58
5.5.1 Using a Script to Submit Multiple Jobs ..... 58
5.5.2 Using a Makefile to Submit Multiple Jobs ..... 58
5.6 Submitting a Job from a Host Other Than an HP XC Host ..... 61
5.7 Running Preexecution Programs ..... 61

6 Debugging Applications ..... 63
6.1 Debugging Serial Applications ..... 63
6.2 Debugging Parallel Applications ..... 63
6.2.1 Debugging with TotalView ..... 64
6.2.1.1 SSH and TotalView ..... 64
6.2.1.2 Setting Up TotalView ..... 64
6.2.1.3 Using TotalView with SLURM ..... 65
6.2.1.4 Using TotalView with LSF ..... 65
6.2.1.5 Setting TotalView Preferences ..... 65
6.2.1.6 Debugging an Application ..... 66
6.2.1.7 Debugging Running Applications ..... 67
6.2.1.8 Exiting TotalView ..... 67

7 Monitoring Node Activity ..... 69
7.1 The Xtools Utilities ..... 69
7.2 Running Performance Health Tests ..... 70

8 Tuning Applications ..... 75
8.1 Using the Intel Trace Collector and Intel Trace Analyzer ..... 75
8.1.1 Building a Program – Intel Trace Collector and HP-MPI ..... 75
8.1.2 Running a Program – Intel Trace Collector and HP-MPI ..... 76
8.2 The Intel Trace Collector and Analyzer with HP-MPI on HP XC ..... 77
8.2.1 Installation Kit ..... 77
8.2.2 HP-MPI and the Intel Trace Collector ..... 77
8.3 Visualizing Data – Intel Trace Analyzer and HP-MPI ..... 79

9 Using SLURM ..... 81
9.1 Introduction to SLURM ..... 81
9.2 SLURM Utilities ..... 81
9.3 Launching Jobs with the srun Command ..... 81
9.3.1 The srun Roles and Modes ..... 82
9.3.1.1 The srun Roles ..... 82
9.3.1.2 The srun Modes ..... 82
9.3.2 Using the srun Command with HP-MPI ..... 82
9.3.3 Using the srun Command with LSF ..... 82
9.4 Monitoring Jobs with the squeue Command ..... 82
9.5 Terminating Jobs with the scancel Command ..... 83
9.6 Getting System Information with the sinfo Command ..... 83
