HP c-Class Performance Tuning manual Setting Windows driver affinity

Windows driver affinity

Setting Windows driver affinity

On a multiprocessor system, the operating system routes an I/O request through as efficient a path as its programming permits. Often this path is not the optimal performance path, primarily due to system architecture. A user who is aware of the particular hardware layout of a system can maximize driver performance by specifying the routing of its I/O. The HP IO Accelerator Windows® driver provides a mechanism to specify an affinity of its I/O to a particular processor or set of processors. This helps the operating system route the requests through a more efficient path.

Many multiprocessor systems employ a uniform method of routing request threads. These threads receive a relatively equal amount of processor time from any processor assigned by the operating system. More recently, some systems have been developed with the NUMA architecture, which couples a subset of total physical memory with a node containing one or more processors. One advantage of the NUMA architecture is improved throughput for operations that can be handled using a particular processor and its locally associated memory. The disadvantage of NUMA becomes apparent when an operation must take place between a particular processor and memory that is physically associated with a processor on a different node.

To overcome this disadvantage, the operating system permits applications to programmatically specify the affinity of an operation, interrupt, or thread with a particular processor or set of processors. For the Windows Server® 2008 operating system, the IO Accelerator driver has been updated to accept user-specified values that represent processor masks. These masks enable interrupts and worker threads to be associated with the processors bitmapped by the masks.
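A processor mask can be illustrated with a few lines of Python. This is a minimal sketch, assuming the usual Windows convention that bit n of the mask stands for logical processor n; the guide itself says only that processors are "bitmapped" by the mask. The helper names affinity_mask and processors_in_mask are hypothetical and are not part of any HP utility.

```python
def affinity_mask(processors):
    """Return an integer with one bit set per processor index.

    Assumption: bit n corresponds to logical processor n.
    """
    mask = 0
    for cpu in processors:
        if cpu < 0:
            raise ValueError("processor index must be non-negative")
        mask |= 1 << cpu
    return mask


def processors_in_mask(mask):
    """Return the sorted processor indices whose bits are set in mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


print(hex(affinity_mask([0, 1])))  # 0x3
print(processors_in_mask(0x0C))    # [2, 3]
```

Under this convention, a mask of 0x3 selects processors 0 and 1, and a mask of 0xC selects processors 2 and 3.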

To implement one of these values:

1. Open the Windows® Registry editor.

2. In the Windows® Registry, locate the following key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\fiodrive\Parameters

3. Create a Windows® Registry tag of type REG_DWORD, and then name it SetWorkerAffinity?

The ? is replaced by a decimal value equal to the bus number of the IO Accelerator card for which the affinity is being set. This bus number is the same decimal value that appears in the fct? display after an IO Accelerator utility such as fio-status has run.

To make the tag effective in the driver, reboot the system or enable the driver, assuming defaults for other fiodrive tags.
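The registry steps above can also be scripted. The following is a rough sketch using Python's standard winreg module, not an HP-supplied tool: the function names tag_name and set_worker_affinity are hypothetical, the script must run on Windows with administrator rights, and the driver still needs a reboot or re-enable afterward, as described above.

```python
# Subkey of HKEY_LOCAL_MACHINE named in step 2.
KEY_PATH = r"SYSTEM\CurrentControlSet\Services\fiodrive\Parameters"


def tag_name(bus_number):
    """Build the tag name: SetWorkerAffinity followed by the decimal bus number."""
    return f"SetWorkerAffinity{int(bus_number)}"


def set_worker_affinity(bus_number, mask):
    """Create or overwrite the REG_DWORD tag for one card (Windows only)."""
    if not 0 <= mask < 2**32:
        raise ValueError("a REG_DWORD value must fit in 32 bits")
    import winreg  # standard library, present only on Windows

    with winreg.CreateKeyEx(
        winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, tag_name(bus_number), 0, winreg.REG_DWORD, mask)


# Show the name that would be written for a card on bus 2; nothing is modified here.
print(KEY_PATH, tag_name(2))
```

Calling set_worker_affinity(2, 0x3) would then write the tag for the card on bus 2. The mask value 0x3 is illustrative only; choose a value that matches your own hardware layout.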

Example

For example, suppose your system employs a NUMA architecture with four modular nodes, each of which contains 2 GB of local memory and a dual-core CPU. From the specifications, the machine contains eight PCIe buses, numbered 2, 11, 36, 37, 82, 86, 169, and 170 (decimal). You further determine that the buses are directly associated with specific nodes, as follows:

Buses 2 and 11: Node 0 (processors 0 and 1)
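As a sketch of how this mapping turns into registry tags, the snippet below derives a tag name and mask for each bus on node 0. It covers only the buses 2 and 11 mapping shown above, and it assumes that bit n of the mask selects processor n; the table names and the tags_for helper are hypothetical.

```python
# Mapping taken from the example: buses 2 and 11 attach to node 0,
# which holds processors 0 and 1. Other nodes are not filled in here.
NODE_PROCESSORS = {0: [0, 1]}
BUS_TO_NODE = {2: 0, 11: 0}


def tags_for(bus_to_node, node_processors):
    """Return {tag name: mask} with one SetWorkerAffinity tag per bus."""
    tags = {}
    for bus, node in sorted(bus_to_node.items()):
        mask = 0
        for cpu in node_processors[node]:
            mask |= 1 << cpu  # assumption: bit n selects processor n
        tags[f"SetWorkerAffinity{bus}"] = mask
    return tags


for name, value in tags_for(BUS_TO_NODE, NODE_PROCESSORS).items():
    print(f"{name} = 0x{value:08X}")
```

Under these assumptions, both cards on node 0 receive the same mask, 0x00000003, so that their interrupts and worker threads stay on the processors local to that node.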
