IBM pSeries manual Lapidebugcommtimeout, Lapidebugperf

Page 29

5.14 LAPI_DEBUG_COMM_TIMEOUT

If the LAPI protocol experiences communication timeouts, set the environment variable LAPI_DEBUG_COMM_TIMEOUT to PAUSE. This causes the application to issue a pause() call when it encounters a timeout, which suspends the application instead of terminating it.
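As a minimal sketch, the variable can be exported in the shell that launches the job, in the same way the guide exports LAPI_DEBUG_PERF. The echo line is only a sanity check added for illustration; how the job itself is started is not shown because it depends on the site setup.

```shell
# Ask LAPI to call pause() on a communication timeout instead of exiting.
export LAPI_DEBUG_COMM_TIMEOUT=PAUSE

# Illustrative check: show the value that child processes will inherit.
echo "LAPI_DEBUG_COMM_TIMEOUT=${LAPI_DEBUG_COMM_TIMEOUT}"
```

The variable has to be set before the job is launched so that every task inherits it.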

5.15 LAPI_DEBUG_PERF

The LAPI_DEBUG_PERF flag is not supported and should not be used in production. However, it can provide useful information about packet loss. If you suspect packet drops are reducing performance, set the LAPI_DEBUG_PERF flag to yes (export LAPI_DEBUG_PERF=yes). The following additional information is sent to standard error in the job output:

_retransmit_pkt_cnt
Tot_retrans_pkt_cnt
LAPI Tot_retrans_pkt_cnt
Shared Tot_retrans_pkt_cnt

Be aware that some retransmissions in the initialization stage are normal.

Here is a simple Perl script (count_drops) that counts the number of lost packets. Run it against the standard error output of a LAPI job that was run with LAPI_DEBUG_PERF set to yes; the script reads that output on standard input.

=============================================

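#!/usr/bin/perl
# count_drops: sum the LAPI_DEBUG_PERF counters read from standard input.
$retrans = 0;
$dup = 0;
$ftbl_drop = 0;
$diff = 0;

while ( <STDIN> ) {
    # Duplicate packet count.
    if ( /(.*)Shared Tot_dup_pkt_cnt(.*)= (.*)/ ) {
        $dup += $3;
    }
    # Retransmitted packet count.
    if ( /(.*)Shared Tot_retrans_pkt_cnt(.*)= (.*)/ ) {
        $retrans += $3;
    }
    # Packets counted under _drop_due_to_usr_ftbl_not_setup.
    if ( /(.*)_drop_due_to_usr_ftbl_not_setup(.*)= (.*)/ ) {
        $ftbl_drop += $3;
    }
}

# Lost packets: retransmissions not accounted for by the other two counters.
$diff = $retrans - $dup - $ftbl_drop;
print "retrans=$retrans dup=$dup ftbl_drop=$ftbl_drop lost=$diff\n";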

==============================================
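One possible way to capture the job's standard error for count_drops is sketched below. The helper name run_with_lapi_perf and the file name job.err are hypothetical and chosen only for illustration; the launch command is a placeholder for however the LAPI job is normally started.

```shell
# Hypothetical helper: run a command with LAPI_DEBUG_PERF=yes set only for
# that command, and save its standard error to the file named first.
run_with_lapi_perf() {
    errfile=$1
    shift
    env LAPI_DEBUG_PERF=yes "$@" 2> "$errfile"
}

# Example use (placeholder job command, not run here):
#   run_with_lapi_perf job.err ./my_lapi_job
#   ./count_drops < job.err
```

Setting the flag through env keeps it out of the calling shell, which fits the advice that LAPI_DEBUG_PERF should not be left on for production runs.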

pshpstuningguidewp040105.doc
