Compaq SC RMS 7.4 What Happens When a Request is Received

Time Limit

Jobs are normally run to completion or until they are preempted by a higher priority request. Each partition may have a time limit associated with it which restricts the amount of time the Partition Manager may allow for a parallel job. On expiry of this time limit, the job is sent a SIGXCPU signal. A period of grace is allowed following this signal for the job to clean up and exit. After this period, the job is killed and the resource deallocated. The duration of the grace period is specified in the attributes table (see Section 10.2.3) and can be set using rcontrol.
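
A job can use the grace period only if it notices the SIGXCPU signal. The following Python sketch shows one way a program might do this: the handler only sets a flag, and the main loop checks the flag after each unit of work and calls a cleanup routine before exiting. The class name GracefulJob, the work_units list and the cleanup callback are hypothetical names chosen for illustration and are not part of RMS.

```python
import signal


class GracefulJob:
    """Runs work units and stops cleanly once SIGXCPU has been seen."""

    def __init__(self):
        self.expired = False

    def handle(self, signum, frame):
        # Keep the handler minimal: just record that the limit expired.
        self.expired = True

    def install(self):
        # Must be called from the main thread of the program.
        signal.signal(signal.SIGXCPU, self.handle)

    def run(self, work_units, cleanup):
        """Run callables in order; return how many completed.

        After each unit, check whether the time limit expired. If so,
        call cleanup(done) (a hypothetical hook, e.g. to write a
        checkpoint) and stop, so the job exits within the grace period.
        """
        done = 0
        for unit in work_units:
            unit()
            done += 1
            if self.expired:
                cleanup(done)
                break
        return done
```

A program would call install() once at startup and then run(). Each work unit should be short compared with the grace period, because the flag is only examined between units.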

Memory Size

The Partition Manager can enforce memory limits that restrict the size of a job. The default memory limits are designed to prevent memory starvation (a node having free CPUs but no memory) and to control whether parallel jobs page or not.

7.4 What Happens When a Request is Received

A user's request for resources, made through the RMS commands prun or allocate, specifies the following parameters:

cpus          The total number of CPUs to be allocated.

nodes         The number of nodes across which the CPUs are to be allocated.
              This parameter is optional.

base node     The identifier of the first node to be allocated. This parameter
              is optional.

hwbcast       A contiguous range of nodes. This parameter is optional. When a
              contiguous range of nodes is allocated to a job, messages can be
              broadcast in hardware. This offers advantages of speed over a
              software implementation if the job makes use of broadcast
              operations.

memory        The amount of memory required per CPU. This parameter is optional
              (set through the environment variable RMS_MEMLIMIT), but jobs
              with low memory requirements may be scheduled sooner if they
              make these requirements explicit.

time limit    The length of time for which the CPUs are required. This
              parameter is optional (set through the environment variable
              RMS_TIMELIMIT).

samecpus      The same set of CPUs on each node. This parameter is optional.
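
The memory and time limit parameters are passed through environment variables rather than command arguments. As an illustration, the Python sketch below builds an environment in which RMS_MEMLIMIT and RMS_TIMELIMIT are set only when a value is supplied. The function name rms_request_env is hypothetical, and the values are passed through as given: this section does not state the units these variables expect, so consult the prun reference page before choosing them.

```python
import os


def rms_request_env(mem_limit=None, time_limit=None, base_env=None):
    """Return a copy of the environment with the optional RMS limits set.

    mem_limit  -> RMS_MEMLIMIT  (memory required per CPU; units assumed
                                 to be whatever prun expects)
    time_limit -> RMS_TIMELIMIT (time the CPUs are required; units
                                 likewise an assumption)
    A limit left as None is not set, so the request omits that parameter.
    """
    env = dict(os.environ if base_env is None else base_env)
    if mem_limit is not None:
        env["RMS_MEMLIMIT"] = str(mem_limit)
    if time_limit is not None:
        env["RMS_TIMELIMIT"] = str(time_limit)
    return env
```

The returned dictionary can be supplied as the env argument of subprocess.run when launching prun or allocate. The command arguments themselves are left out here because this section does not describe them.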
