Alternatively, log on to this node and run psiadmin, which also starts the ParaStation daemon psid. See Section 6.1, “Problem: psiadmin returns error” for more details.
Check the logfile /var/log/messages on this node for error messages. Verify that all nodes have an identical configuration (/etc/parastation.conf).
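Searching the log file by hand can be tedious on a busy node. The following minimal sketch filters the lines that mention the daemon. It assumes that psid's syslog entries contain the word "psid" (for example as the syslog tag `psid[PID]:`). The function names `psid_messages` and `scan_logfile` are hypothetical helpers, not part of ParaStation.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Assumption: psid log entries contain the standalone word "psid",
# e.g. "node1 psid[1234]: ...". Adapt the pattern to your syslog format.
PATTERN = re.compile(r"\bpsid\b")

def psid_messages(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention the ParaStation daemon psid."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]

def scan_logfile(path: str = "/var/log/messages") -> List[str]:
    """Read a syslog file (usually requires root) and return psid lines."""
    with Path(path).open(errors="replace") as f:
        return psid_messages(f)
```

On the affected node, calling `scan_logfile()` as root and printing the result shows only the daemon's messages. Look for error messages among them.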
6.3. Problem: cannot start parallel task
Problem: a parallel task cannot be launched and the following error is reported:
PSI: PSI_createPartition: Resource temporarily unavailable
Check whether enough nodes are available and which parallel tasks are already active. Also check for user or group restrictions on the nodes.
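On Linux, "Resource temporarily unavailable" is the C library's text for EAGAIN. This suggests that the partition request could not be satisfied at that moment. The toy model below illustrates how each check above reduces the number of usable process slots. Nodes that are down, CPUs that are already busy, and user restrictions all count against the request. The `Node` fields and helper functions are hypothetical and are not part of the ParaStation API. The model only handles user restrictions, although group restrictions would work the same way. It also ignores real scheduling policies.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

@dataclass
class Node:
    """Hypothetical snapshot of one node's state (not a ParaStation API)."""
    name: str
    up: bool
    cpus: int
    running_procs: int
    # Users allowed on this node; None means no user restriction.
    allowed_users: Optional[FrozenSet[str]] = None

def usable_slots(node: Node, user: str) -> int:
    """Free process slots on `node` that `user` is allowed to use."""
    if not node.up:
        return 0
    if node.allowed_users is not None and user not in node.allowed_users:
        return 0
    return max(node.cpus - node.running_procs, 0)

def can_create_partition(nodes: List[Node], user: str, nprocs: int) -> bool:
    """True if the cluster has at least `nprocs` usable slots for `user`."""
    return sum(usable_slots(n, user) for n in nodes) >= nprocs
```

In this model, a fully loaded node, a node that is down, and a node restricted to other users all contribute zero slots. The request then fails even though the cluster has spare hardware.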
If the error
PSI: dospawn: spawn to node 1 failed.
PSE: Could not spawn './mpi_latency' process 1, error = Bad file descriptor.
is reported, check whether the current directory containing the program mpi_latency is accessible on all nodes, and verify that the program is executable on all nodes.
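The following minimal sketch shows these two checks, performed locally. It is meant to be run on every node, for example over ssh. The function `check_program` is a hypothetical helper. It relies only on the standard `os.path` and `os.access` calls, which evaluate permissions for the user running the script. Run it as the user who starts the job.

```python
import os
from typing import List

def check_program(cwd: str, program: str) -> List[str]:
    """Return the problems found for `program` in directory `cwd`.

    An empty list means the directory is accessible and the program is
    executable for the user running this check.
    """
    problems: List[str] = []
    if not os.path.isdir(cwd):
        problems.append(f"directory {cwd} does not exist")
        return problems
    # Reading and entering the directory requires read and execute permission.
    if not os.access(cwd, os.R_OK | os.X_OK):
        problems.append(f"directory {cwd} is not accessible")
        return problems
    path = os.path.join(cwd, program)
    if not os.path.isfile(path):
        problems.append(f"{path} not found")
    elif not os.access(path, os.X_OK):
        problems.append(f"{path} is not executable")
    return problems
```

For the example above, a node that reports a non-empty result for `check_program(os.getcwd(), "mpi_latency")` is a likely cause of the failed spawn. A common case is a directory that is not mounted on that node.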
6.4. Problem: bad performance
Verify that the proper interconnect and/or transport is used: check the environment variables controlling the transport (see Section 5.8, “Controlling ParaStation5 communication paths” and ps_environment(5)).
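To see which transport settings a job actually inherits, it can help to print the relevant variables from within the job's environment. This sketch assumes that the transport-related variables share the `PSP_` prefix. The authoritative names are listed in ps_environment(5). The helper `transport_settings` is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

# Assumption: transport-related variables start with "PSP_";
# consult ps_environment(5) for the actual variable names.
PREFIX = "PSP_"

def transport_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the variables starting with PREFIX, ordered by name."""
    if env is None:
        env = os.environ
    return {k: env[k] for k in sorted(env) if k.startswith(PREFIX)}
```

Calling `print(transport_settings())` in the shell or script that launches the job lists the settings in effect there. An unexpected or missing entry points to the wrong transport being selected.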
Watch the protocol counters, especially those indicating timeouts, retries, errors or other abnormal conditions. For p4sock, check recv_net_data and recv_user; see Section 5.2, “ParaStation5 protocol p4sock”.
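Counters are most informative when compared before and after a run. The sketch below assumes that two counter snapshots have been saved as text with one `name value` pair per line. That format is an assumption, so obtain the counters as described in Section 5.2 and adapt the parser. The sketch reports which counters increased and by how much. The functions `parse_counters` and `changed` are hypothetical helpers.

```python
from typing import Dict, Iterable

def parse_counters(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'name value' lines into a dict; other lines are skipped."""
    counters: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters

def changed(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Map each counter that increased between snapshots to its increase."""
    return {
        name: after[name] - before.get(name, 0)
        for name in sorted(after)
        if after[name] > before.get(name, 0)
    }
```

Growing data counters are expected during a run. A growing timeout, retry or error counter is the kind of bad condition worth investigating.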
Look for a crystal ball!
Or contact
6.5. Problem: different groups of nodes are seen as up or down
Problem: depending on the node on which psiadmin is run, different groups of nodes are seen as "up" or "down".
Check that the configuration is identical on each node, e.g. compare the configuration file /etc/parastation.conf across all nodes.
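With many nodes, comparing the files pairwise is error-prone. A minimal sketch is shown below. It assumes that each node's /etc/parastation.conf has already been copied to the local machine, for example with scp. It hashes each file and lists the nodes whose configuration differs from the most common one. Treating the majority version as the reference is an assumption. The comparison is byte-exact, so whitespace differences are flagged too. The helper `find_deviating_nodes` is hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List

def find_deviating_nodes(configs: Dict[str, bytes]) -> List[str]:
    """Return nodes whose configuration differs from the most common one.

    `configs` maps a node name to the raw contents of that node's
    /etc/parastation.conf, collected beforehand.
    """
    if not configs:
        return []
    digests = {node: hashlib.sha256(data).hexdigest() for node, data in configs.items()}
    # The most frequent digest is taken as the reference configuration.
    reference, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(node for node, d in digests.items() if d != reference)
```

For each node reported, a line-by-line `diff` against a reference copy then shows what actually differs.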
6.6. Problem: cannot start process on frontend
Problem: starting a job is aborted with the error message:
Connecting client 139.27.166.22:44784 (rank 6) failed : Network is unreachable
PSIlogger: Child with rank 12 exited with status 1.
30 | ParaStation5 Administrator's Guide |