statistics in
vmstat 5
The pi and po columns of the page group report the number of 4KB pages read from and written to the paging device between consecutive samplings. A consistently high po value can indicate that thrashing is taking place. In that case, it is a good idea to run the svmon command to see how the system's memory is being used.
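A monitoring sequence along these lines might be used (a sketch assuming an AIX system; svmon -G prints the global memory report and svmon -P the per-process report, with -t limiting it to the top consumers):

```shell
# Sample virtual-memory statistics every 5 seconds; watch the pi/po
# columns of the page group for 4KB pages paged in and out.
vmstat 5

# If po stays high, look at memory usage: global summary first,
# then the largest per-process consumers.
svmon -G
svmon -P -t 10    # top 10 processes by real memory usage
```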
3.4 Large page sizing
Some HPC applications that use Technical Large Pages (TLPs) can gain 5% to 20% in performance. There are two reasons why TLPs boost performance:
•Hardware prefetch streams cross fewer page boundaries, so they are more efficient.
•Translation lookaside buffer (TLB) misses are less likely, so there is a better chance of using a fast path for address translation.
TLPs must be configured by the root user and require a system reboot, as described below. The operating system limits the maximum number of TLPs to about 80% of the total physical memory on the system. An application can choose to use small pages only, large pages only, or both. Using both small and large pages is also known as advisory mode, which is the mode recommended for high performance computing applications.
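The root-user configuration could be sketched as follows (an illustrative example, not the document's own procedure: lgpg_regions and lgpg_size are the AIX vmo tunables for the number and size of large page regions, 16MB is the large page size on POWER4/POWER5 systems, and the region count of 256 is an arbitrary placeholder to size to your application):

```shell
# Reserve 256 large pages of 16MB each (4GB total); requires root.
# The -r option stages the change for the next boot.
vmo -r -o lgpg_regions=256 -o lgpg_size=16777216

# Optionally allow pinned shared memory if the application needs it.
vmo -p -o v_pinshm=1

# Rebuild the boot image and reboot for the -r change to take effect.
bosboot -ad /dev/ipldevice
shutdown -Fr
```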
You can enable an application for TLPs with the loader flag at link time, with the ldedit command on an existing executable, or with an environment variable at run time. The ldedit command enables the application for TLPs in the advisory mode:
ldedit -blpdata <executable>
You can use the -blpdata linker flag when building the application to achieve the same effect.
Setting the LDR_CNTRL environment variable enables TLPs in the advisory mode for all processes spawned from a shell process and their children. Here is an example:
export LDR_CNTRL=LARGE_PAGE_DATA=Y
Setting the environment variable has a side effect for MPI jobs spawned by the MPI daemons from the shell process, because it also enables the daemons themselves for TLPs. TLPs are by their nature pinned in memory (they cannot be paged out), and they are mapped into the process address space with segment granularity (256MB), even if the process uses only a few bytes in that segment. As a result, each of the two MPI daemons gets 256MB of pinned memory, taking about 512MB of physical memory away from the application. For that reason, you should avoid using the LDR_CNTRL environment variable with MPI jobs.
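For non-MPI runs, one way to limit the variable's reach is to set it on the application's command line instead of exporting it in the shell (a generic POSIX-shell sketch; the sh -c command stands in for the application's launch command):

```shell
# Scope the variable to a single command: it applies to that command
# and its children, but not to the surrounding shell.
LDR_CNTRL=LARGE_PAGE_DATA=Y sh -c 'echo "child sees: $LDR_CNTRL"'
echo "parent sees: ${LDR_CNTRL:-unset}"
```

In a real run you would replace the sh -c command with the application's own invocation; the surrounding shell, and anything else it spawns, remains unaffected.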
Using TLPs boosts the performance of the MPI protocol stack. Some of the TLPs are reserved by the HPS adapter code at boot time and are not available to an application as long as the HPS
pshpstuningguidewp040105.doc | Page 13 |