pam

Parallel Application Manager – job starter for MPI applications

Synopsis

HP-UX vendor MPI syntax

bsub pam -mpi mpirun [mpirun_options] mpi_app [argument ...]

SGI vendor MPI syntax

bsub pam [-n num_tasks] -mpi -auto_place mpi_app [argument ...]

Generic PJL framework syntax

bsub pam [-t] [-v] [-n num_tasks] -g [num_args] pjl_wrapper [pjl_options] mpi_app [argument ...]

pam [-h] [-V]
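Scripts that submit jobs programmatically can assemble the generic PJL form above as an argument list, which avoids shell-quoting problems. The following is a minimal Python sketch, not part of LSF. The helper name `build_generic_pjl_cmd`, its parameters, and the example names `mympirun.sh` and `./my_app` are hypothetical. The helper only mirrors the token order shown in the synopsis and assumes that `-n` and `-g` take their values as separate arguments. It does not interpret what `-t`, `-v`, or `num_args` mean.

```python
from typing import List, Optional, Sequence


def build_generic_pjl_cmd(
    pjl_wrapper: str,
    mpi_app: str,
    app_args: Sequence[str] = (),
    pjl_options: Sequence[str] = (),
    num_tasks: Optional[int] = None,
    num_args: Optional[int] = None,
    t_flag: bool = False,
    v_flag: bool = False,
) -> List[str]:
    """Hypothetical helper: build argv in the order of the generic PJL synopsis.

    bsub pam [-t] [-v] [-n num_tasks] -g [num_args] pjl_wrapper [pjl_options] mpi_app [argument ...]
    """
    cmd = ["bsub", "pam"]
    if t_flag:
        cmd.append("-t")
    if v_flag:
        cmd.append("-v")
    if num_tasks is not None:
        # Assumption: the task count is passed as a separate argument after -n.
        cmd += ["-n", str(num_tasks)]
    cmd.append("-g")
    if num_args is not None:
        # The optional num_args immediately follows -g (separate argument assumed).
        cmd.append(str(num_args))
    cmd.append(pjl_wrapper)
    cmd += list(pjl_options)
    cmd.append(mpi_app)
    cmd += list(app_args)
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it; hand the list to
    # subprocess.run(...) only on a host where bsub is installed.
    print(build_generic_pjl_cmd("mympirun.sh", "./my_app", app_args=["input.dat"], num_tasks=4))
```

Building a list instead of a single command string means application arguments containing spaces reach `bsub` unchanged.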

Description

The Parallel Application Manager (PAM) is the point of control for parallel jobs in Platform LSF. PAM is fully integrated with Platform LSF and interfaces the user application with LSF, acting as the supervisor of a parallel LSF job.

MPI jobs started by pam can only be submitted through the LSF Batch system. PAM cannot be used interactively to start parallel jobs. sbatchd starts PAM on the first execution host.

For all parallel application processes (tasks), PAM:

Uses a vendor MPI library or an MPI Parallel Job Launcher (PJL), for example mpirun or poe, to start a parallel job on a specified set of hosts in an LSF cluster.

Contacts RES on each execution host allocated to the parallel job.

Queries RES periodically to collect resource usage for each parallel task, passes control signals through RES to all process groups and individual running tasks, and cleans up tasks as needed.

Passes job-level resource usage and process IDs (PIDs and PGIDs) to sbatchd for enforcement.

Collects resource usage information and exit status when the tasks terminate.
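The supervisory duties listed above amount to fanning control signals out to the tasks and folding per-task data into job-level data. The Python sketch below models only that bookkeeping and is not PAM's implementation. `TaskUsage`, `JobUsage`, `aggregate`, `forward_signal`, and `finished_statuses` are hypothetical names; the `send` callback stands in for whatever RES request actually delivers a signal; and summing CPU time and memory across tasks is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


@dataclass
class TaskUsage:
    """One task's usage, as a hypothetical periodic RES query might report it."""
    host: str
    pid: int
    pgid: int
    cpu_seconds: float
    mem_kb: int
    exit_status: Optional[int] = None  # None while the task is still running


@dataclass
class JobUsage:
    """Job-level totals of the kind passed on to sbatchd (shape is an assumption)."""
    cpu_seconds: float = 0.0
    mem_kb: int = 0
    # PIDs and PGIDs are only unique per host, so key them by (host, id).
    pids: Set[Tuple[str, int]] = field(default_factory=set)
    pgids: Set[Tuple[str, int]] = field(default_factory=set)


def aggregate(tasks: Iterable[TaskUsage]) -> JobUsage:
    """Fold per-task samples into one job-level record (summing is an assumption)."""
    job = JobUsage()
    for t in tasks:
        job.cpu_seconds += t.cpu_seconds
        job.mem_kb += t.mem_kb
        job.pids.add((t.host, t.pid))
        job.pgids.add((t.host, t.pgid))
    return job


def forward_signal(
    tasks: Iterable[TaskUsage], sig: int, send: Callable[[str, int, int], None]
) -> int:
    """Deliver sig once per distinct (host, pgid) process group; return the group count."""
    seen: Set[Tuple[str, int]] = set()
    for t in tasks:
        key = (t.host, t.pgid)
        if key not in seen:
            send(t.host, t.pgid, sig)
            seen.add(key)
    return len(seen)


def finished_statuses(tasks: Iterable[TaskUsage]) -> Dict[Tuple[str, int], int]:
    """Collect the exit status of every task that has already terminated."""
    return {(t.host, t.pid): t.exit_status for t in tasks if t.exit_status is not None}


if __name__ == "__main__":
    tasks = [
        TaskUsage("hostA", 101, 100, 1.5, 2048, exit_status=0),
        TaskUsage("hostA", 102, 100, 2.0, 1024, exit_status=0),
        TaskUsage("hostB", 101, 100, 0.5, 512),  # still running
    ]
    print(aggregate(tasks))
    forward_signal(tasks, 15, lambda host, pgid, sig: print("signal", sig, "->", host, pgid))
    print(finished_statuses(tasks))
```

Keying PIDs by host matters because two tasks on different hosts can share the same numeric PID, as the two `101` entries in the demo show.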

Task startup for vendor MPI jobs

The pam command starts a vendor MPI job on a specified set of hosts in an LSF cluster. Using pam to start an MPI job requires the underlying MPI system to be LSF-aware; that is, it must use a vendor MPI implementation that supports LSF (SGI IRIX vendor MPI or HP-UX vendor MPI).

PAM uses the vendor MPI library to spawn the child processes needed for the parallel tasks that make up your MPI application. It starts these tasks on the hosts allocated by LSF. The allocation specifies the number of execution hosts needed and the number of child processes needed on each host.
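Inside an LSF batch job, the allocation is also visible to the job through the `LSB_HOSTS` environment variable, which lists one host name per allocated slot (for example `hostA hostA hostB`), and through the compact `LSB_MCPU_HOSTS` form (`hostA 2 hostB 1`). The minimal Python sketch below recovers the two quantities described above, the execution hosts and the number of tasks per host. It assumes the job was dispatched by LSF so that one of these variables is set. The function name `tasks_per_host` is hypothetical, and this is not how PAM itself reads the allocation.

```python
import os
from collections import Counter
from typing import Dict, Mapping, Optional


def tasks_per_host(env: Optional[Mapping[str, str]] = None) -> Dict[str, int]:
    """Hypothetical helper: return {host: task_count} in allocation order.

    Reads LSB_HOSTS (one host name per slot) and falls back to
    LSB_MCPU_HOSTS (host/count pairs) when LSB_HOSTS is unset or empty.
    """
    if env is None:
        env = os.environ
    hosts = env.get("LSB_HOSTS")
    if hosts:
        # Counter preserves first-seen order, so hosts keep the order LSF listed them.
        return dict(Counter(hosts.split()))
    tokens = env.get("LSB_MCPU_HOSTS", "").split()
    if len(tokens) % 2:
        raise ValueError("LSB_MCPU_HOSTS must contain host/count pairs")
    counts: Dict[str, int] = {}
    for host, n in zip(tokens[0::2], tokens[1::2]):
        counts[host] = counts.get(host, 0) + int(n)
    return counts


if __name__ == "__main__":
    alloc = tasks_per_host()
    print(f"{len(alloc)} execution host(s), {sum(alloc.values())} task(s): {alloc}")
```

Outside an LSF job neither variable is set, so the helper returns an empty mapping instead of raising.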

Platform LSF Command Reference