

General tuning techniques

Using direct I/O, unbuffered, or zero copy I/O

Traditional I/O paths include the page cache, a DRAM cache of data stored on the disk. The IO Accelerator is fast enough that this and other traditional optimizations, such as I/O merging and reordering, are actually detrimental to performance. I/O merging and reordering are eliminated naturally by the IO Accelerator, but the page cache must be bypassed at the application level.

Direct I/O bypasses the page cache. This allows the same memory regions written by the application to be DMA-transferred or copied directly to the IO Accelerator rather than having to be copied into kernel-owned memory first, thus eliminating a copy.

Bypassing the page cache provides the following benefits:

• Less complex write path

• Lower overall CPU utilization

• Less memory bandwidth usage

In most cases, direct I/O is beneficial for IO Accelerator-based systems, but you should benchmark your application to be sure of this. Some applications that do not use AIO, threads, or multiple processes to create multiple outstanding requests might benefit from the page cache instead.

Many I/O-intensive applications have tunable parameters that control how they interact with the low-level I/O subsystem, including options to enable direct I/O.
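
For example, MySQL's InnoDB storage engine exposes such a parameter; a minimal my.cnf sketch is shown below (MySQL is used purely as an illustration and is not covered by this guide):

[mysqld]
innodb_flush_method = O_DIRECT

Consult your application's documentation for its equivalent setting.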

For other applications, the application provider must enable direct I/O, or you must modify the application source to enable it and recompile.

For a more in-depth look at how to write C code using direct I/O, see "Programming using direct I/O."
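
As a preview, the following minimal C sketch shows the essential steps on Linux. The file path is a placeholder, and O_DIRECT requires the buffer address, transfer size, and file offset to be suitably aligned, typically to the device sector size:

#define _GNU_SOURCE           /* exposes O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    void *buf;
    /* O_DIRECT requires aligned buffers; 4096 covers common sector sizes. */
    if (posix_memalign(&buf, 4096, 4096) != 0)
        return 1;
    memset(buf, 'x', 4096);

    /* Placeholder path; on a real system this would be a file on the
       IO Accelerator. O_DIRECT bypasses the page cache entirely. */
    int fd = open("/mnt/fio/testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0)
        return 1;

    ssize_t n = write(fd, buf, 4096);  /* transfer size must be aligned too */

    close(fd);
    free(buf);
    return (n == 4096) ? 0 : 1;
}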

dd support

More recent versions of dd support the oflag=direct and iflag=direct parameters. These enable direct I/O for the file being written to or the file being read from, respectively. Use the oflag=direct parameter when writing to an IO Accelerator and the iflag=direct parameter when reading from an IO Accelerator.
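
For example, assuming the IO Accelerator appears as /dev/fioX (substitute your actual device name):

$ dd if=/dev/zero of=/dev/fioX bs=1M count=1000 oflag=direct
$ dd if=/dev/fioX of=/dev/null bs=1M count=1000 iflag=direct

The first command writes 1,000 MiB to the device with direct I/O; the second reads it back the same way.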

IOzone benchmark

IOzone supports the -I option to enable direct I/O.
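
A minimal sketch follows; the file path is a placeholder for a file on an IO Accelerator-backed filesystem:

$ iozone -I -i 0 -i 1 -r 4k -s 1g -f /mnt/fio/iozone.tmp

Here -i 0 and -i 1 select the write and read tests, -r sets the record size, and -s sets the file size; -I makes IOzone open the file with O_DIRECT.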

fio benchmark

fio uses the direct=1 setting in the job file, or --direct=1 on the command line, to enable direct I/O.
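
For example, a minimal job file (the device path is a placeholder):

[direct-read]
filename=/dev/fioX
rw=read
bs=4k
size=1g
direct=1

or, equivalently, on the command line:

$ fio --name=direct-read --filename=/dev/fioX --rw=read --bs=4k --size=1g --direct=1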

Multiple outstanding IOs

The IO Accelerator is more like a storage controller than a single disk. Like other storage controllers, it performs best when multiple requests are outstanding. Unlike other storage solutions that rely on legacy
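
A simple way to generate multiple outstanding requests from a single process is fio's asynchronous I/O engine; a minimal sketch follows, assuming the libaio engine is available (the device path and queue depth are illustrative):

$ fio --name=parallel-randread --filename=/dev/fioX --rw=randread --bs=4k --direct=1 --ioengine=libaio --iodepth=32

Raising iodepth keeps more requests in flight, which is exactly the condition under which the IO Accelerator performs best.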

