Appendix B

Troubleshooting

This appendix offers initial suggestions for what to do when something goes wrong with applications running with SMC. When problems occur, first check the list of common errors and their solutions; an updated list of SMC-related Frequently Asked Questions (FAQ) is posted in the Support section of the Scali website (http://www.scali.com). If you cannot find a solution there, please read this appendix before contacting support@scali.com.

Problems and fixes reported to Scali will eventually be included in the appropriate sections of this manual. Please send relevant remarks by e-mail to support@scali.com.

Many problems originate from using the wrong application code, from stopped daemons that Scali MPI Connect relies on, or from an incomplete specification of network drivers. Some typical problems and their solutions are described below. Troubleshooting the DAT functionality is described in C-11.

B-1 When things do not work - troubleshooting

This section is intended as a starting point for software and hardware debugging. The main focus is on locating and repairing faulty hardware and software setups, but the section can also help you get started after installing a new system. For a description of the Scali Manage GUI, see the Scali System Guide.

B-1.1 Why does my program not start to run?

mpimon: command not found.

• Include /opt/scali/bin in the PATH environment variable.

mpimon can’t find mpisubmon.

• Set MPI_HOME=/opt/scali or use the -execpath option.

The application has problems loading libraries (libsca*).

• Update LD_LIBRARY_PATH to include /opt/scali/lib.
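The three fixes above can be applied together in a shell start-up file. The following is a minimal sketch, assuming a Bourne-style shell (sh, bash) and the default /opt/scali installation prefix used in this appendix. SCALI_PREFIX is a hypothetical helper variable introduced only for this example; it is not read by SMC.

```shell
# Assumption: SMC is installed under /opt/scali. Change SCALI_PREFIX
# (a helper variable for this sketch only) if your prefix differs.
SCALI_PREFIX=/opt/scali

# Lets mpimon locate mpisubmon.
export MPI_HOME="$SCALI_PREFIX"

# Lets the shell locate mpimon.
export PATH="$SCALI_PREFIX/bin:$PATH"

# Lets the dynamic loader locate libsca*; keep any existing entries.
export LD_LIBRARY_PATH="$SCALI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Report whether the shell can now locate mpimon.
if command -v mpimon >/dev/null 2>&1; then
    echo "mpimon found at $(command -v mpimon)"
else
    echo "mpimon still not found; check that $SCALI_PREFIX/bin exists"
fi
```

Users of csh-style shells would need the equivalent setenv commands instead.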

Incompatible MPI versions.

mpid, mpimon, mpisubmon and the libraries all have version variables that are checked at start-up. To ensure that these are correct, try the following:

1. Set the environment variable MPI_HOME correctly.

2. Restart mpid if a new version of ScaMPI has been installed without restarting mpid.

3. Reinstall SMC if a new version of SMC was not cleanly installed on all nodes.
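Before restarting mpid or reinstalling, it can help to confirm that the mpimon found on PATH actually belongs to the installation that MPI_HOME points to, since a mismatch between the two is one plausible way to mix versions. The sketch below is only a local sanity check under that assumption. check_scali_paths is a hypothetical helper written for this example, not an SMC tool, and it does not read the version variables themselves.

```shell
# check_scali_paths: hypothetical helper, not part of SMC.
# Succeeds only if the mpimon on PATH is $MPI_HOME/bin/mpimon.
check_scali_paths() {
    home="${MPI_HOME:-}"
    if [ -z "$home" ]; then
        echo "MPI_HOME is not set"
        return 1
    fi
    found=$(command -v mpimon 2>/dev/null) || found=""
    case "$found" in
        "$home"/bin/mpimon)
            echo "mpimon matches MPI_HOME: $found"
            return 0 ;;
        "")
            echo "mpimon is not on PATH"
            return 1 ;;
        *)
            echo "mpimon ($found) is outside MPI_HOME ($home)"
            return 1 ;;
    esac
}

check_scali_paths || echo "Fix MPI_HOME or PATH before restarting mpid"
```

The check covers the local node only; run it on each node if the installation may differ between nodes.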

Set working directory failed

• SMC assumes a homogeneous file structure. If you start mpimon from a directory that is not available on all nodes, you must set SCAMPI_WORKING_DIRECTORY to point to a directory that is available on all nodes.
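As a minimal sketch of this setting, assuming a Bourne-style shell: the path /tmp below is only an illustrative choice, not a recommendation from this manual. Any directory that exists on every node, such as a shared file system mount, would serve.

```shell
# Assumption: /tmp exists on every node. Substitute any directory that is
# available cluster-wide.
SCAMPI_WORKING_DIRECTORY=/tmp
export SCAMPI_WORKING_DIRECTORY

# This verifies the local node only; the directory must exist on the
# other nodes as well.
if [ -d "$SCAMPI_WORKING_DIRECTORY" ]; then
    echo "local check passed: $SCAMPI_WORKING_DIRECTORY exists"
else
    echo "local check failed: $SCAMPI_WORKING_DIRECTORY is missing"
fi
```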

ScaMPI uses the wrong interface for TCP/IP on a frontend with more than one interface

• Set SCAMPI_NODENAME to the hostname of the correct interface.

MPI_Wtime gives strange values

• SMC uses a hardware-supported high-precision timer for MPI_Wtime. This timer can be disabled by setting SCAMPI_DISABLE_HPT=1.
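SCAMPI_NODENAME and SCAMPI_DISABLE_HPT can also be set for a single run instead of being exported for the whole session, which keeps the override from affecting later jobs. The sketch below uses the standard env utility for this. run_with_scampi_overrides is a hypothetical helper written for this example, and frontend-eth1 is a placeholder hostname. The two settings are independent, so drop whichever one you do not need.

```shell
# run_with_scampi_overrides: hypothetical helper, not part of SMC.
# Usage: run_with_scampi_overrides <interface-hostname> <command> [args...]
run_with_scampi_overrides() {
    nodename="$1"
    shift
    # The variables are visible to the launched command only.
    env SCAMPI_NODENAME="$nodename" SCAMPI_DISABLE_HPT=1 "$@"
}

# Demonstration with a stand-in command that prints what it sees;
# replace it with your usual mpimon command line.
run_with_scampi_overrides frontend-eth1 \
    sh -c 'echo "SCAMPI_NODENAME=$SCAMPI_NODENAME SCAMPI_DISABLE_HPT=$SCAMPI_DISABLE_HPT"'
```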

Scali MPI Connect Release 4.4 Users Guide
