MAC WOF (2F870): Bit: 1
[. . .]
5.12.4 Packets dropped in the switch hardware
If a packet is dropped within the switch hardware itself (for example, when traversing the link between two switch chips), evidence of the packet drop is on the HMC, where the switch Federation Network Manager (FNM) runs. You can run /opt/hsc/bin/fnm.snap to create a snap archive in /var/hsc/log (for
The FNM code handles errors associated with packet drops in the switch. To run the fnm.snap command (/opt/hsc/bin/fnm.snap), you must have root access or set up the appropriate authentication. In the snap data, check the FNM_Recov.* logs for switch errors. If a particular type of error reaches a threshold in the hardware, reporting for that error type might be disabled, so later packet loss of that type might not be reported. Therefore, when you are investigating packet loss, it is a good idea to restart the FNM code so that error reporting is reset.
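As a small convenience, the following Python sketch lists every FNM_Recov.* file in the snap data so you can review them in order. It assumes you have already unpacked the snap archive from /var/hsc/log into a local directory; the unpacking step, the directory you pass in, and the helper name find_fnm_recov_logs are hypothetical and are not part of the fnm.snap tooling. Because the layout inside the archive is not described here, the search is recursive.

```python
import glob
import os


def find_fnm_recov_logs(snap_dir):
    """Return the sorted paths of all FNM_Recov.* files under snap_dir.

    Hypothetical helper: snap_dir is assumed to be a directory into which
    the fnm.snap archive has already been unpacked. "**" with recursive=True
    matches snap_dir itself and every subdirectory below it.
    """
    pattern = os.path.join(snap_dir, "**", "FNM_Recov.*")
    return sorted(glob.glob(pattern, recursive=True))
```

For example, find_fnm_recov_logs("/tmp/fnm_snap") (a hypothetical path) returns the matching log files, which you can then inspect for switch errors.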
5.13 MP_INFOLEVEL
You can get additional information from an MPI job by setting the MP_INFOLEVEL environment variable to 2. In addition, if you set the MP_LABELIO variable to yes, each line of task output is prefixed with the number of the task that produced it, so you can see the information for each task. Here is an example of the output using these settings:
INFO: Contacting LoadLeveler to set and query information for interactive job
INFO: LoadLeveler step ID is test_mach1.customer.com.2507.0
INFO: Host test_mach1.customer.com requested for task 0
INFO: Host test_mach2.customer.com requested for task 1
INFO: Host test_mach1.customer.com allocated for task 0
INFO: Host address 10.10.10.1 allocated for task 0
INFO: Using sn1 for MPI euidevice for task 0
INFO: Host test_mach2.customer.com allocated for task 1
INFO: Host address 10.10.10.2 allocated for task 1
INFO: Using sn1 for MPI euidevice for task 1
1:INFO: Executing program:
0:INFO: Executing program:
1:LAPI version #7.9 2004/11/05 1.144 src/rsct/lapi/lapi.c, lapi, rsct_rir2, rir20446a 32bit(us) library compiled on Wed Nov 10 06:44:38 2004
1:LAPI is using lightweight lock.
1:Bulk Transfer is enabled.
1:Shared memory not used on this node due to sole task running.
1:The LAPI lock is used for the job
0:INFO:
library compiled on Wed Nov 10 06:44:38 2004
0:LAPI is using lightweight lock.
0:Bulk Transfer is enabled.
0:Shared memory not used on this node due to sole task running.
0:The LAPI lock is used for the job
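The settings and output above can be tied together in a small Python sketch: it launches a program under poe with MP_INFOLEVEL=2 and MP_LABELIO=yes, then splits the labeled output by task and extracts the host allocated to each task. It assumes poe is on PATH; the helper names (poe_debug_env, run_with_poe, group_by_task, host_allocations) are hypothetical, and MP_PROCS is used here only to set the task count. The regular expressions match the message text shown in the example output, which may differ between POE releases.

```python
import os
import re
import subprocess
from collections import defaultdict

# With MP_LABELIO=yes, each line written by a task starts with "<task>:".
LABEL_RE = re.compile(r"^(\d+):(.*)$")
# Matches messages such as "INFO: Host test_mach1.customer.com allocated for task 0"
# but not "Host address ... allocated", because "address" is followed by the IP.
ALLOC_RE = re.compile(r"Host (\S+) allocated for task (\d+)")


def poe_debug_env(base=None):
    """Return a copy of the environment with MP_INFOLEVEL=2 and MP_LABELIO=yes."""
    env = dict(os.environ if base is None else base)
    env["MP_INFOLEVEL"] = "2"
    env["MP_LABELIO"] = "yes"
    return env


def run_with_poe(program, procs=2):
    """Run program under poe (assumed to be on PATH) and return its output lines.

    stderr is merged into stdout so POE messages and labeled task output
    end up in a single list.
    """
    env = poe_debug_env()
    env["MP_PROCS"] = str(procs)
    result = subprocess.run(["poe", program], env=env,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            text=True)
    return result.stdout.splitlines()


def group_by_task(lines):
    """Map task number -> list of its lines (label stripped); unlabeled lines go under None."""
    groups = defaultdict(list)
    for line in lines:
        m = LABEL_RE.match(line)
        if m:
            groups[int(m.group(1))].append(m.group(2))
        else:
            groups[None].append(line)
    return dict(groups)


def host_allocations(lines):
    """Return {task: host name} from 'Host ... allocated for task N' messages."""
    alloc = {}
    for line in lines:
        m = ALLOC_RE.search(line)
        if m:
            alloc[int(m.group(2))] = m.group(1)
    return alloc
```

For the run shown above, host_allocations applied to the output would map task 0 to test_mach1.customer.com and task 1 to test_mach2.customer.com, and group_by_task lets you compare the LAPI messages of each task side by side (for example, to confirm that both tasks report "Bulk Transfer is enabled.").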
pshpstuningguidewp040105.doc | Page 28 |