To help with spotting bottlenecks, statistics provided by the numastat tool are also available in the
/sys/devices/system/node/<node number>/numastat file. High values in the numa_miss and
other_node fields signal a likely NUMA issue. If you find that a process is allocated memory
that does not reside on its local node (the node that holds the processors that
run the application), try to migrate or bind the process to that node or work with NUMA affinity.
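As a quick, illustrative check (node numbers and the application name below are placeholders for your environment, and the numactl package is assumed to be installed), you can inspect the per-node counters and bind a process to a specific node:
# cat /sys/devices/system/node/node0/numastat
# numastat
# numactl --cpunodebind=0 --membind=0 ./application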
4.5 Tuning the vm subsystem
Tuning the memory subsystem is a difficult task that requires constant monitoring to ensure
that changes do not negatively affect other subsystems in the server. If you do choose to
modify the virtual memory parameters (in /proc/sys/vm), we recommend that you change
only one parameter at a time and monitor how the server performs.
Remember that most applications under Linux do not write directly to the disk, but to the file
system cache maintained by the virtual memory manager, which eventually flushes the data
out. When using an IBM ServeRAID controller or an IBM TotalStorage disk subsystem, you
should try to decrease the number of flushes, effectively increasing the size of the I/O stream
caused by each flush. A high-performance disk controller can handle a large I/O stream more
efficiently than many small ones.
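Before and after any change it can be useful to watch how much dirty data is actually queued in the page cache. One simple, non-intrusive way to do this on a standard /proc file system is:
# grep -E 'Dirty|Writeback' /proc/meminfo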

4.5.1 Setting kernel swap and pdflush behavior

With the introduction of the improved virtual memory subsystem in the Linux kernel 2.6,
administrators now have a simple interface to fine-tune the swapping behavior of the kernel.
- The parameter stored in /proc/sys/vm/swappiness can be used to define how
aggressively memory pages are swapped to disk. An introduction to the Linux virtual
memory manager and the general use of swap space in Linux is given in “Page frame
reclaiming” on page 14. It states that Linux moves memory pages that have not been
accessed for some time to the swap space even if there is enough free memory available.
By changing the percentage in /proc/sys/vm/swappiness you can control that behavior,
depending on the system configuration. If swapping is not desired,
/proc/sys/vm/swappiness should be set to a low value. Systems with memory constraints
that run batch jobs (processes that sleep for a long time) might benefit from an aggressive
swapping behavior. To change the swapping behavior, use either echo or sysctl as shown in
Example 4-7.
Example 4-7 Changing swappiness behavior
# sysctl -w vm.swappiness=100
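The same change can be made by writing the value directly to the proc file with echo, for example:
# echo 100 > /proc/sys/vm/swappiness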
- Especially for fast disk subsystems, it can also be desirable to cause large flushes of dirty
memory pages. The value stored in /proc/sys/vm/dirty_background_ratio defines the
percentage of main memory at which the pdflush daemon starts writing data out to disk. If
larger, less frequent flushes are desired, increase the default value of 10% to a higher
value. As in the previous example, the value can be changed as shown
in Example 4-8.
Example 4-8 Increasing the pdflush wake-up threshold
# sysctl -w vm.dirty_background_ratio=25
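Note that values set with echo or sysctl -w are lost at reboot. To make a setting persistent, you would typically add the corresponding entry to /etc/sysctl.conf (shown here for the dirty_background_ratio value used above) and reload it:
# echo "vm.dirty_background_ratio = 25" >> /etc/sysctl.conf
# sysctl -p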
- Another related setting in the virtual memory subsystem is the ratio at which dirty pages
created by application disk writes are flushed out to disk. As explained in 1.3.1,
“Virtual file system”, writes to the file system are not written instantly, but rather go to
the page cache and are flushed out to the disk subsystem at a later stage. Using the