IBM REDP-4285-00 4.5.3 HugeTLBfs

4285ch04.fm Draft Document for Review May 4, 2007 11:35 am

112 Linux Performance and Tuning Guidelines

/dev/sdd2 swap swap sw,pri=1 0 0

Swap partitions are used from the highest priority to the lowest (where 32767 is the highest
and 0 is the lowest). Giving the same priority to the first three disks causes the data to be
written to all three disks; the system does not wait until the first swap partition is full before it
starts to write on the next partition. The system uses the first three partitions in parallel and
performance generally improves.

The fourth partition is used only if additional swap space is needed after the first three are
completely full. It is also possible to give all partitions the same priority to stripe the data
over all partitions, but if one drive is slower than the others, performance decreases. As a
general rule, swap partitions should be on the fastest drives available.
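To check which swap areas share a priority, and are therefore used in parallel, the active priorities can be read back from /proc/swaps. The following is a minimal sketch, not part of the original guide: the function name swap_by_priority is hypothetical, and it assumes the usual /proc/swaps column layout (device in column 1, priority in column 5).

```shell
#!/bin/sh
# List swap areas as "priority device", highest priority first.
# $1: file in /proc/swaps format (assumption: defaults to /proc/swaps).
swap_by_priority() {
    # Column 1 is the device, column 5 the priority; NR > 1 skips the header.
    awk 'NR > 1 { print $5, $1 }' "${1:-/proc/swaps}" | sort -k1,1nr -k2,2
}
```

Calling swap_by_priority without arguments reads the live system. Devices that appear with the same priority value at the top of the list are the ones the kernel fills in parallel.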

4.5.3 HugeTLBfs

This memory management feature is valuable for applications that use a large virtual address
space. It is especially useful for database applications.

The CPU’s Translation Lookaside Buffer (TLB) is a small cache used for storing
virtual-to-physical mapping information. By using the TLB, a translation can be performed without
referencing the in-memory page table entry that maps the virtual address. However, to keep
translations as fast as possible, the TLB is typically quite small. It is not uncommon for large
memory applications to exceed the mapping capacity of the TLB.

The HugeTLBfs feature permits an application to use a much larger page size than normal,
so that a single TLB entry can map a correspondingly larger address space. A HugeTLB
entry can vary in size. For example, in an Itanium® 2 system, a huge page might be 1000
times larger than a normal page. This enables the TLB to map 1000 times the virtual address
space of a normal process without incurring a TLB cache miss. For simplicity, this feature is
exposed to applications by means of a file system interface.
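The effect of page size on TLB coverage can be worked through with simple arithmetic: the address space a TLB can map without a miss is the number of entries multiplied by the page size. The sketch below is illustrative only. The entry count (128) and the page sizes (4 kB and 4 MB) are assumptions chosen to give a ratio close to 1000; they are not figures from this guide or from any specific processor.

```shell
#!/bin/sh
# TLB reach in kB = number of TLB entries * page size in kB.
# Both arguments are illustrative assumptions, not measured values.
tlb_reach_kb() {
    echo $(( $1 * $2 ))
}

entries=128                                   # hypothetical TLB size
normal=$(tlb_reach_kb "$entries" 4)           # 4 kB pages (assumed)
huge=$(tlb_reach_kb "$entries" 4096)          # 4 MB pages (assumed)
echo "normal pages: ${normal} kB, huge pages: ${huge} kB"
echo "ratio: $(( huge / normal ))"
```

With these assumed values, the same 128 entries cover 512 kB with normal pages and 524288 kB (512 MB) with huge pages.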

To allocate huge pages, set the number of huge pages in /proc/sys/vm/nr_hugepages by
using the sysctl command:

sysctl -w vm.nr_hugepages=512
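Memory reserved for huge pages is set aside in advance, so it helps to know how much RAM a given page count represents before applying it. The following sketch multiplies a page count by the Hugepagesize value that the kernel reports in /proc/meminfo. The function name hugepage_reserved_kb is hypothetical and not part of any standard tool.

```shell
#!/bin/sh
# Memory reserved by N huge pages, in kB: N * Hugepagesize.
# $1: number of huge pages; $2: meminfo-format file (default /proc/meminfo).
hugepage_reserved_kb() {
    size_kb=$(awk '/^Hugepagesize:/ { print $2 }' "${2:-/proc/meminfo}")
    # Fail clearly if the kernel does not report a huge page size.
    [ -n "$size_kb" ] || { echo "Hugepagesize not found" >&2; return 1; }
    echo $(( $1 * size_kb ))
}
```

For example, on a system that reports a huge page size of 2048 kB, hugepage_reserved_kb 512 prints 1048576, that is, 1 GB. Note that a value set with sysctl -w does not persist across a reboot unless it is also added to the sysctl configuration file.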

If your application uses huge pages through the mmap() system call, you must mount a file
system of type hugetlbfs, like this:

mount -t hugetlbfs none /mnt/hugepages
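The mount point directory (here /mnt/hugepages) must exist before the mount command is run. To confirm afterwards that a hugetlbfs file system is available, the mount table can be inspected. The sketch below assumes the standard /proc/mounts field order; the function name hugetlbfs_mounts is hypothetical.

```shell
#!/bin/sh
# Print the mount point of every hugetlbfs file system.
# $1: file in /proc/mounts format (default /proc/mounts).
hugetlbfs_mounts() {
    # Fields: device, mount point, file system type, options, dump, pass.
    awk '$3 == "hugetlbfs" { print $2 }' "${1:-/proc/mounts}"
}
```

An empty result means that no hugetlbfs file system is mounted, so an application that maps files from it cannot obtain huge pages this way.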

The /proc/meminfo file provides information about huge pages, as shown in Example 4-12.

Example 4-12 Hugepage information in /proc/meminfo

[root@lnxsu4 ~]# cat /proc/meminfo
MemTotal: 4037420 kB
MemFree: 386664 kB
Buffers: 60596 kB
Cached: 238264 kB
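The huge page counters in /proc/meminfo can also be extracted on their own rather than read from the full listing. The following is a minimal sketch that relies on the HugePages_Total, HugePages_Free, and Hugepagesize fields reported by the kernel. The function name hugepage_summary and its output format are hypothetical.

```shell
#!/bin/sh
# Summarize huge page usage from meminfo-format text.
# $1: file to read (default /proc/meminfo).
hugepage_summary() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { nfree = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        END {
            # Pages in use are those allocated but no longer free.
            printf "total=%d free=%d used=%d size_kb=%d\n",
                   total, nfree, total - nfree, size_kb
        }
    ' "${1:-/proc/meminfo}"
}
```

If the reported total is lower than the value written to vm.nr_hugepages, the kernel probably could not find enough contiguous free memory to satisfy the request.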

Important: Although there are good tools to tune the memory subsystem, frequent page
outs should be avoided as much as possible. Swap space is not a replacement for
RAM because it is stored on physical drives that have a significantly slower access time
than memory. Frequent paging out (or swapping out) is almost never good behavior.
Before trying to improve the swap process, ensure that your server has enough
memory and that there is no memory leak.