Compaq SC RMS Parallel Programs Under RMS

3 Parallel Programs Under RMS

3.1 Introduction

RMS provides users with tools for running parallel programs and monitoring their
execution, as described in Chapter 5 (RMS Commands). Users can determine what
resources are available to them and request allocation of the CPUs and memory required
to run their programs. This chapter describes the structure of parallel programs under
RMS and how they are run.

A parallel program consists of a controlling process, prun, and a number of application
processes distributed over one or more nodes. Each process may have multiple threads
running on one or more CPUs. prun can run on any node in the system, but it normally
runs in a login partition or on an interactive node.

In a system with SMP nodes, RMS can allocate CPUs so as to use all of the CPUs on the
minimum number of nodes (a block distribution); alternatively, it can allocate a specified
number of CPUs on each node (a cyclic distribution). This flexibility allows users to
choose between the competing benefits of increased CPU count and memory size on each
node (generally good for multithreaded applications) and increased numbers of nodes
(generally best for applications requiring increased total memory size, memory
bandwidth and I/O bandwidth).
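The difference between the two distributions comes down to simple arithmetic, which the following Python sketch illustrates. It models only the resulting CPU counts per node and is not how RMS itself performs allocation; the function names and parameters (`block_distribution`, `cyclic_distribution`, `cpus_per_node`, `cpus_on_each_node`) are hypothetical and chosen for this illustration.

```python
def block_distribution(total_cpus, cpus_per_node):
    """Use every CPU on as few nodes as possible (illustrative model only)."""
    full_nodes, remainder = divmod(total_cpus, cpus_per_node)
    layout = [cpus_per_node] * full_nodes
    if remainder:
        layout.append(remainder)  # the last node is only partly used
    return layout


def cyclic_distribution(total_cpus, cpus_on_each_node, cpus_per_node):
    """Use a specified number of CPUs on each node (illustrative model only)."""
    if not 1 <= cpus_on_each_node <= cpus_per_node:
        raise ValueError("cannot use more CPUs than a node has")
    nodes, remainder = divmod(total_cpus, cpus_on_each_node)
    layout = [cpus_on_each_node] * nodes
    if remainder:
        layout.append(remainder)
    return layout


if __name__ == "__main__":
    # Assumed example: 8 CPUs requested on nodes that have 4 CPUs each.
    print("block: ", block_distribution(8, 4))      # two fully used nodes
    print("cyclic:", cyclic_distribution(8, 2, 4))  # four nodes, 2 CPUs each
```

In this assumed example the block layout occupies two nodes, while the cyclic layout spreads the same eight CPUs over four nodes, giving each process a larger share of per-node memory and I/O bandwidth.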

Parallel programs can be written so that they will run with varying numbers of CPUs
and varying numbers of CPUs per node. They can, for example, query the number of
processors allocated and determine their data distributions and communication
patterns accordingly (see Appendix C (RMS Kernel Module) for details).
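As a minimal sketch of what adapting a data distribution to the allocation might look like, the Python function below splits a one-dimensional data set as evenly as possible among the processes. The process count and the process's own index are passed in as plain arguments; how a program obtains them under RMS is not shown here, and the names `partition`, `nprocs`, and `rank` are hypothetical.

```python
def partition(n_items, nprocs, rank):
    """Return the half-open range [start, stop) of items owned by one process.

    Items are divided as evenly as possible; when n_items is not a multiple
    of nprocs, the lowest-numbered processes each take one extra item.
    """
    if nprocs < 1 or not 0 <= rank < nprocs:
        raise ValueError("rank must satisfy 0 <= rank < nprocs")
    base, extra = divmod(n_items, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    # Assumed example: 10 items shared among 4 processes.
    for rank in range(4):
        print(rank, partition(10, 4, rank))
```

Because the ranges are computed from the process count at run time, the same program produces a valid decomposition whether it is started on 4 CPUs or 64.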

