The Partition Manager
The Partition Manager makes new scheduling decisions periodically and in response to
incoming resource requests (see Chapter 7 (RMS Scheduling) for details). These
decisions may result in jobs being suspended or resumed. Such scheduling operations,
together with those performed as jobs are killed, are carried out by the Partition
Manager sending scheduling or signal delivery requests to the rmsds.
The Partition Manager is connected to its rmsds by a tree of sockets. Commands are
routed down this tree; they complete when an acknowledgement is returned. For
example, jobs are only marked as finished when the Partition Manager has confirmed
that all of their processes have exited.
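
The routing described above can be illustrated with a minimal Python sketch. It is not the RMS implementation: the TreeNode class, its deliver method, and the node names are all hypothetical, and real socket I/O is replaced by direct method calls. It only shows the idea that a command sent down a tree completes when every daemon beneath it has acknowledged.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TreeNode:
    """One daemon in a hypothetical command tree (illustrative only)."""
    name: str
    children: list[TreeNode] = field(default_factory=list)
    alive: bool = True

    def deliver(self, command: str) -> bool:
        """Forward the command to every child and return True (an
        acknowledgement) only if this node and its whole subtree responded."""
        if not self.alive:
            return False  # a crashed node never acknowledges
        # Build the full list first so every child receives the command,
        # even if an earlier sibling failed to acknowledge.
        acks = [child.deliver(command) for child in self.children]
        return all(acks)


def build_tree() -> TreeNode:
    """Build a small example tree; the shape and names are assumptions."""
    left = TreeNode("node0", [TreeNode("node1"), TreeNode("node2")])
    right = TreeNode("node4", [TreeNode("node3")])
    return TreeNode("root", [left, right])
```

In this sketch a single unresponsive node anywhere in the tree prevents the acknowledgement from reaching the root, which mirrors why a job is only marked as finished once every process has been confirmed to have exited.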
If the tree of sockets is broken by a node crash, the Partition Manager marks the node’s
partition as blocked and generates an event. The node can then be rebooted or
configured out of the machine. If the node is rebooted, the rmsds reconnect and the
Partition Manager continues as before. If the node cannot be rebooted, then the partition
must be halted, the node configured out, and the partition restarted. Jobs that spanned
the failing node are cleaned up at this point. The other jobs continue to run unless
explicitly killed. Scheduling and signal delivery operations are suspended while the
partition is blocked.
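
The recovery paths above can be summarised as a small state machine. The following Python sketch is an illustration under stated assumptions, not RMS code: the Partition class, the PartitionState names, and the method names are hypothetical, and the halt, configure-out, and restart steps are collapsed into one call.

```python
from enum import Enum


class PartitionState(Enum):
    RUNNING = "running"
    BLOCKED = "blocked"
    HALTED = "halted"


class Partition:
    """Hypothetical model of a partition's reaction to a node crash."""

    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.down = set()        # nodes currently unreachable
        self.jobs = {}           # job id -> set of nodes the job spans
        self.events = []         # generated events, kept as plain strings
        self.state = PartitionState.RUNNING

    def can_schedule(self):
        # Scheduling and signal delivery are suspended while blocked.
        return self.state is PartitionState.RUNNING

    def node_crashed(self, node):
        self.down.add(node)
        self.state = PartitionState.BLOCKED
        self.events.append(f"partition blocked: {node} unreachable")

    def node_rebooted(self, node):
        # The daemon reconnects; resume once no node is missing.
        self.down.discard(node)
        if not self.down:
            self.state = PartitionState.RUNNING

    def configure_out_and_restart(self, node):
        """Halt, drop the node, clean up jobs that spanned it, restart."""
        self.state = PartitionState.HALTED
        self.nodes.discard(node)
        self.down.discard(node)
        cleaned = sorted(j for j, span in self.jobs.items() if node in span)
        for job in cleaned:
            del self.jobs[job]   # jobs on other nodes are left untouched
        self.state = PartitionState.RUNNING
        return cleaned
```

Only jobs whose node set includes the failed node are removed in this sketch; every other job stays in the table, matching the rule that the other jobs run on unless explicitly killed.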
4.4.1 Partition Startup
The Partition Manager is started by the rmsmhd daemon, running on the rmshost node,
on instruction from rcontrol (see Page 5-20). Once the partition is running, a startup
script /opt/rms/etc/pstartup is executed. This script performs site-specific and
OS-specific actions depending upon the partition type.
4.4.2 Interaction with the Database
The Partition Manager makes updates to the partitions table (see Section 10.2.16)
when it starts and as CPUs are allocated and freed.
The Partition Manager creates an entry in the resources table (see Section 10.2.18)
each time a user makes a request for resources to run a job. This entry is updated each
time CPUs are allocated or deallocated. The Partition Manager adds an entry to the
jobs table (see Section 10.2.10) as each job starts, updating it if the job is suspended or
resumed and when the job completes.
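
As a rough illustration of this lifecycle, the Python sketch below records a request and a job in an in-memory SQLite database. Only the table names resources and jobs come from the text; every column name, status value, and function name is a hypothetical stand-in, and the real schemas are those given in Sections 10.2.18 and 10.2.10.

```python
import sqlite3


def open_db():
    """Create an in-memory database with simplified, assumed schemas."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE resources (
            name TEXT PRIMARY KEY, username TEXT, ncpus INTEGER, status TEXT);
        CREATE TABLE jobs (
            name TEXT PRIMARY KEY, resource TEXT, status TEXT);
        """
    )
    return conn


def request_resource(conn, name, username):
    # One entry is created per user request for resources.
    conn.execute(
        "INSERT INTO resources VALUES (?, ?, 0, 'queued')", (name, username))


def allocate_cpus(conn, name, ncpus):
    # The entry is updated each time CPUs are allocated or deallocated.
    status = "allocated" if ncpus > 0 else "queued"
    conn.execute(
        "UPDATE resources SET ncpus = ?, status = ? WHERE name = ?",
        (ncpus, status, name))


def start_job(conn, job, resource):
    # One entry is added as each job starts.
    conn.execute("INSERT INTO jobs VALUES (?, ?, 'running')", (job, resource))


def set_job_status(conn, job, status):
    # Updated on suspend, resume, and completion.
    conn.execute("UPDATE jobs SET status = ? WHERE name = ?", (status, job))
```

A typical sequence in this sketch would be request_resource, allocate_cpus, start_job, and then one set_job_status call for each suspend, resume, or completion.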
The Partition Manager creates an entry in the accounting statistics (acctstats) table
(see Section 10.2.2) when CPUs are allocated. The entry is updated periodically until the
request completes.
The Partition Manager consults the users table (see Section 10.2.24), the projects
table (see Section 10.2.17) and the access_controls table (see Section 10.2.1) to verify
users’ access permissions and usage limits.