IBM pSeries manual Large page sizing

Page 13

statistics in 5-second intervals, with the first set of statistics being the statistics since the node or LPAR was last booted.

vmstat 5

The pi and po columns of the page group are the numbers of 4KB pages read from and written to the paging device between consecutive samplings. If po is high, thrashing may be taking place. In that case, run the svmon command to see the system-wide virtual segment allocation.
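As a rough illustration of watching po, the sketch below filters vmstat-style output and reports samples whose page-out count exceeds a threshold. The helper name flag_high_po and the threshold are hypothetical, and the column positions (pi in column 6, po in column 7) are an assumption based on the usual AIX 5L layout; check the header on your system before relying on them.

```shell
# Flag vmstat samples whose page-out count exceeds a threshold.
# Assumption: columns are "r b avm fre re pi po ...", so pi is $6 and po is $7.
# flag_high_po is a hypothetical helper name, not part of AIX.
flag_high_po() {
    threshold="$1"
    awk -v limit="$threshold" '
        # Data rows start with a number; header rows do not.
        ($1 ~ /^[0-9]+$/) && (($7 + 0) > (limit + 0)) {
            printf "high po: pi=%s po=%s\n", $6, $7
        }
    '
}

# Usage on a live system (the threshold of 100 is an arbitrary example):
#   vmstat 5 | flag_high_po 100
```

Keep in mind that the first row vmstat reports covers the period since boot, so a match on that row does not describe current activity.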

3.4 Large page sizing

Some HPC applications that use Technical Large Pages (TLPs) can benefit from a 5 - 20% increase in performance. There are two reasons why TLPs boost performance:

Because the hardware prefetch streams cross fewer page boundaries, they are more efficient.

Because missing the translation lookaside buffer is less likely, there is a better chance of using a fast path for address translation.

TLPs must be configured by the root user and require a system reboot as described below. The operating system limits the memory that can be configured as TLPs to about 80% of the total physical storage on the system. The application can choose to use small pages only, large pages only, or both.
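To get a feel for the 80% limit, the following sketch estimates the upper bound for a given amount of physical memory. The function names are hypothetical, the 16MB large page size is an assumption (it is not stated in this section), and because the limit is only "about 80%", the result is an estimate rather than the exact value the operating system enforces.

```shell
# Estimate the upper bound on memory that can be set aside as TLPs.
# The 80% figure comes from the text; the 16MB page size is an assumption.
# Both function names are hypothetical helpers.
tlp_limit_mb() {
    total_mb="$1"
    echo $(( total_mb * 80 / 100 ))
}

tlp_max_pages() {
    total_mb="$1"
    page_mb="${2:-16}"          # assumed large page size in MB
    limit_mb=$(tlp_limit_mb "$total_mb")
    echo $(( limit_mb / page_mb ))
}

# Example: a node with 32GB (32768MB) of physical memory.
echo "TLP limit: $(tlp_limit_mb 32768) MB, about $(tlp_max_pages 32768) large pages"
```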

Using both small and large pages is known as advisory mode, which is recommended for high performance computing applications.

You can enable the application for TLPs by using the loader flag, by means of the ldedit command, or by using the environment variable at run time. The ldedit command enables the application for TLPs in the advisory mode:

ldedit -b lpdata <executable path name>

You can use -b nolpdata to turn TLPs off. The -b lpdata loader flag on the ld command does the same thing.

Setting the LDR_CNTRL environment variable enables TLPs in the advisory mode for all processes spawned from a shell process and their children. Here is an example:

export LDR_CNTRL=LARGE_PAGE_DATA=Y
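The inheritance behavior can be observed with ordinary shell mechanics. The sketch below is only a demonstration of how an exported variable reaches child processes; show_ldr_cntrl is a hypothetical helper, and on a system other than AIX the variable has no effect on page sizes.

```shell
# Show which child processes see LDR_CNTRL.
# show_ldr_cntrl is a hypothetical helper: it starts a child shell
# that prints the value it inherited, or "unset".
show_ldr_cntrl() {
    sh -c 'echo "${LDR_CNTRL:-unset}"'
}

# 1. Exported: every child started from this shell inherits the setting.
export LDR_CNTRL=LARGE_PAGE_DATA=Y
exported_view=$(show_ldr_cntrl)

# 2. Removed again: children started afterwards no longer see it.
unset LDR_CNTRL
unset_view=$(show_ldr_cntrl)

# 3. Set for a single command: only that child and its descendants see it.
scoped_view=$(LDR_CNTRL=LARGE_PAGE_DATA=Y sh -c 'echo "${LDR_CNTRL:-unset}"')
after_scoped=$(show_ldr_cntrl)

echo "exported=$exported_view unset=$unset_view scoped=$scoped_view after=$after_scoped"
```

The single-command form limits the setting to one process tree, but every descendant of that command still inherits it.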

Setting the environment variable has a side effect for MPI jobs spawned by the MPI daemons from the shell process, because it also enables the daemons for TLPs. This takes away about 512MB of physical memory from an application. TLPs by their nature are pinned in memory (they cannot be paged out). In addition, TLPs are mapped into the process address space with segment granularity (256MB) even if the process uses only a few bytes in that segment. As a result, each of the two MPI daemons gets 256MB of pinned memory. For that reason, you should avoid using the LDR_CNTRL environment variable with MPI jobs.
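The 512MB figure follows from rounding each daemon's large page usage up to whole 256MB segments. The sketch below reproduces that arithmetic under a simplified model in which a process's large page data is counted in whole segments; pinned_mb is a hypothetical helper, not an AIX command.

```shell
SEGMENT_MB=256   # segment granularity stated in the text

# Pinned memory, in MB, for a process that touches the given number
# of bytes of large page data (rounded up to whole segments).
# pinned_mb is a hypothetical helper used only for this illustration.
pinned_mb() {
    bytes="$1"
    segment_bytes=$(( SEGMENT_MB * 1024 * 1024 ))
    segments=$(( (bytes + segment_bytes - 1) / segment_bytes ))
    echo $(( segments * SEGMENT_MB ))
}

# Two MPI daemons, each using only a few bytes of large page data:
daemon_mb=$(pinned_mb 64)
daemons_total_mb=$(( 2 * daemon_mb ))
echo "per daemon: ${daemon_mb} MB, total: ${daemons_total_mb} MB"
```

With these inputs the script reports 256MB per daemon and 512MB in total, matching the figures in the text.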

Using TLPs boosts the performance of the MPI protocol stack. Some of the TLPs are reserved by the HPS adapter code at boot time and are not available to an application as long as the HPS

pshpstuningguidewp040105.doc
