On the HMC GUI, select Service Applications
5.7 errpt command
On AIX 5L, the errpt command lists a summary of system error messages. Some of the HPS subsystem errors are collected by errpt. To find out whether you have hardware errors, you can either run the errpt command on each node or run the dsh command from the CSM management server:
dsh errpt | grep " 0223" | grep sysplanar0
(The value 0223 is the month and day, in mmdd form.)
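The pipeline above matches the date and resource strings anywhere on a line. The following is a slightly stricter sketch that compares the fields directly, so a " 0223" inside a description is not picked up. It assumes the default errpt summary layout (IDENTIFIER, TIMESTAMP in mmddhhmmyy form, T, C, RESOURCE_NAME, DESCRIPTION) and assumes that dsh prefixes each line with the node name and a colon. The function name errpt_filter is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: filter errpt summary output (optionally prefixed by dsh
# with "node:") for one month/day and one resource name.
# Assumed field layout after the optional dsh prefix:
#   IDENTIFIER TIMESTAMP(mmddhhmmyy) T C RESOURCE_NAME DESCRIPTION...
# Usage: errpt_filter MMDD RESOURCE_NAME   (reads standard input)
errpt_filter() {
    awk -v d="$1" -v r="$2" '
        {
            line = $0
            # dsh prefixes each line with "node:"; drop that field if present
            # and re-split so the errpt fields start at $1.
            if ($1 ~ /:$/) { $1 = ""; $0 = $0 }
            # Match the first four digits of TIMESTAMP (mmdd) and the exact
            # RESOURCE_NAME, then print the original, unmodified line.
            if (substr($2, 1, 4) == d && $5 == r) print line
        }'
}
```

Run the function on the management server and pipe the dsh output into it, for example dsh errpt | errpt_filter 0223 sysplanar0. Only dsh needs to run remotely, so the function does not have to exist on the nodes.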
You can also look at /var/adm/sni/sni_errpt_capture on the LPAR that is reporting the error.
If you see any errors from sni in the errpt listing, check the sni logs for more specific information. The HPS logs are found in a set of directories under the /var/adm/sni directory.
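Listing the files under /var/adm/sni that changed recently can help you decide which sni logs to read first. The following hypothetical helper, recent_sni_logs, uses only standard find options. The one-day default window is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under an sni log directory
# that were modified within the last N days (default: 1 day).
recent_sni_logs() {
    dir=${1:-/var/adm/sni}   # HPS log directory named in the text
    days=${2:-1}             # look-back window in days (arbitrary default)
    find "$dir" -type f -mtime -"$days" -print
}

# Example, on the LPAR that is reporting the error:
#   recent_sni_logs /var/adm/sni 1
```

The function exists only in the local shell. To run the same check through dsh, pass the underlying command instead, for example dsh 'find /var/adm/sni -type f -mtime -1'.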
5.8 HMC error logging
The HMC records errors in the /var/hsc/log directory. Here is an example of a command to check for cyclic redundancy check (CRC) errors in the FNM_Recover.log:
grep
In general, if Service Focal Point is working properly, you should not need to check the following logs:
• FNM_Comm.log
• FNM_Ice.log
• FNM_Init.log
• FNM_Route.log
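When you do need to look through these logs, a per-file count of lines that mention CRC can show where to start. This is a sketch only. It assumes a shell in which the FNM logs are readable, either on the HMC itself or on a copy of /var/hsc/log, and a grep that supports -H (GNU grep does). It also assumes that CRC errors are logged with the literal text "CRC" in some letter case. The function name crc_counts is hypothetical, and its pattern is not necessarily the one used in the grep command above.

```shell
#!/bin/sh
# Hypothetical helper: print "file:count" for every FNM_*.log in a directory,
# where count is the number of lines containing "crc" in any letter case.
# Assumes CRC errors are logged with the literal text CRC.
crc_counts() {
    dir=${1:-/var/hsc/log}   # HMC log directory named in the text
    # -H: always prefix the file name, even if the glob matches one file
    # -c: print a count of matching lines instead of the lines themselves
    # -i: ignore case
    grep -Hci 'crc' "$dir"/FNM_*.log
}
```

A nonzero count for a log file indicates which log to open first. A count of 0 for every file suggests that CRC errors are not being recorded in the FNM logs.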
Another debug command you can run on the HMC is lsswtopol:
lsswtopol
If the lsswtopol command calls out links as “service required,” but these links do not show up in Service Focal Point, contact IBM service.
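When lsswtopol reports many links, filtering its output for the "service required" text makes the comparison with Service Focal Point easier. The following sketch assumes that this text appears literally on each flagged line, in any letter case. The exact output format and required options of lsswtopol depend on the HMC level and are not shown here. The helper name service_required_links is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read saved lsswtopol output on stdin, print the lines
# that contain "service required" (case-insensitive), then a count.
service_required_links() {
    awk '
        tolower($0) ~ /service required/ { print; n++ }
        END { printf "%d link(s) flagged as service required\n", n }'
}

# Example: save the output first, then filter it.
#   lsswtopol <options> > /tmp/lsswtopol.out   # options depend on your HMC level
#   service_required_links < /tmp/lsswtopol.out
```

Each line that the filter prints should have a matching entry in Service Focal Point.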
5.9 Multiple versions of MPI libraries
One common problem on clustered systems is having different MPI library levels on various nodes. This can occur when a node is down for service while an upgrade is made, or when there are multiple versions of the libraries for each node and the links are broken. To check the library levels across a large system, use the following dsh commands:
• For LAPI libraries: dsh sum /opt/rsct/lapi/lib/liblapi_r.a (or run with MP_INFOLEVEL=2)
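On a large cluster, scanning dsh sum output by eye for a mismatched checksum is error-prone. The following sketch groups the output by checksum so that any node with a different copy of a library stands out. It assumes that each line has the form node: checksum blocks path, which is how dsh prefixes the output of the standard sum command. The function name group_by_checksum is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "dsh sum <file>" output on stdin, where each line
# looks like "node: checksum blocks path", and print one line per distinct
# checksum/block-count pair followed by the nodes that reported it.
group_by_checksum() {
    awk '
        {
            node = $1
            sub(/:$/, "", node)          # strip the dsh "node:" colon
            key = $2 " " $3              # checksum and block count
            if (!(key in nodes)) {       # first time this value is seen
                order[++k] = key
                nodes[key] = node
            } else {
                nodes[key] = nodes[key] "," node
            }
        }
        END { for (i = 1; i <= k; i++) print order[i] ": " nodes[order[i]] }'
}

# Example:
#   dsh sum /opt/rsct/lapi/lib/liblapi_r.a | group_by_checksum
```

If the function prints more than one line, at least one node has a different copy of the library. The same filter works for any other library path that you check with dsh sum.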