HP c-Class Performance Tuning manual Pre-conditioning


protocols and have long pipelines, the IO Accelerator does not suffer from major latency increases as the number of outstanding I/Os increases.

The primary methods for generating outstanding I/Os are:

Using multiple threads

Using multiple processes

Using AIO

For small-packet IOPS-geared applications, having multiple threads or outstanding AIO requests generally yields a significant performance improvement over a single thread. For larger block size bandwidth-oriented applications, having multiple outstanding I/Os is less important.
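As a minimal sketch of the first method (multiple threads), the following Python example keeps several block reads in flight at once. The scratch file, block size, and thread count are assumptions chosen for illustration and do not come from this guide. The example uses ordinary buffered reads on a temporary file rather than an IO Accelerator device, and os.pread is available only on Unix-like systems.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096   # assumed small block size for an IOPS-style workload
NUM_BLOCKS = 256    # assumed scratch file size, in blocks
NUM_THREADS = 8     # assumed number of reads kept in flight


def read_block(fd, index):
    """Read one block at the given block index with a positional read."""
    return os.pread(fd, BLOCK_SIZE, index * BLOCK_SIZE)


def read_all_blocks(path, num_threads=NUM_THREADS):
    """Issue block reads from several threads so multiple I/Os are outstanding."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            blocks = list(pool.map(lambda i: read_block(fd, i), range(NUM_BLOCKS)))
    finally:
        os.close(fd)
    return blocks


def make_scratch_file():
    """Create a temporary file filled with random data and return its path."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(BLOCK_SIZE * NUM_BLOCKS))
        return f.name


if __name__ == "__main__":
    path = make_scratch_file()
    try:
        print(len(read_all_blocks(path)), "blocks read")
    finally:
        os.unlink(path)
```

Positional reads let all threads share one file descriptor without seeking. To compare against the single-threaded case, call read_all_blocks with num_threads=1.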

Pre-conditioning

Unlike traditional storage, the characteristics of writes issued to a solid state storage device can affect the performance of both future write and read operations. Some of the more interesting characteristics to consider are the size of individual writes (the block size or record size), the order in which writes are performed, and the block size used to read the data back. Providing the details for this is outside the scope of this document. The most common pre-conditioning issues are addressed here.

The fio-format command reinitializes the IO Accelerator to an empty state. This removes all data and eliminates all history of data writes on the drive. Deleting this history might initially produce higher performance results for both reads and writes. Before recording results, ensure that the application and benchmark have had time to stabilize at a steady performance level.

Read performance can be artificially boosted when reads are performed from previously unwritten sectors.

After fio-format is complete, any sector that is read before data is written to it returns all binary zeros (0x0). The device returns this data at an accelerated rate compared to data read from a sector that has previously been written. This is the same behavior that filesystems and operating systems exhibit when handling sparse files. The read performance achieved from these uninitialized sectors is not indicative of real-world IO Accelerator read performance and should be disregarded. The published numbers from HP disregard this performance acceleration.
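The sparse-file analogy can be illustrated with a short Python sketch. It extends an empty temporary file without writing any data and then reads the unwritten region back. The sector size and sector count are assumptions for illustration, and the example shows filesystem behavior only, not the behavior of the IO Accelerator itself.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumed sector size, for illustration only


def read_unwritten_region(num_sectors=8):
    """Extend an empty file without writing data, then read the region back."""
    fd, path = tempfile.mkstemp()
    try:
        # Set the file length; no data blocks are written by this call.
        os.ftruncate(fd, SECTOR_SIZE * num_sectors)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, SECTOR_SIZE * num_sectors)
    finally:
        os.close(fd)
        os.unlink(path)
    return data


if __name__ == "__main__":
    data = read_unwritten_region()
    print("all zeros:", data == bytes(len(data)))
```

A benchmark that reads such a region measures how quickly zeros can be returned, not how quickly stored data can be fetched.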

To avoid measuring invalid read performance, ensure that you write data to each sector that will be used in benchmarking. In Linux, the entire device can be easily written to using the dd command:

CAUTION: The dd command destroys all data on the drive.

$ dd if=/dev/zero of=/dev/fioX bs=10M oflag=direct

Under Windows® operating systems, when a raw block test is being used, that same test can generally be used to write data to the device before testing. Testing that runs on top of a filesystem must first populate its data files, so it is not affected by this artificial performance boost.
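For tests that run on top of a filesystem, the population step might look like the following Python sketch. The function name, the fill pattern, and the 1 MiB block size are assumptions for illustration. The function overwrites whatever already exists at the given path, so point it only at a scratch file.

```python
import os
import tempfile


def populate(path, total_bytes, block_size=1024 * 1024):
    """Write total_bytes of non-zero data to path in block_size chunks."""
    block = b"\xa5" * block_size  # arbitrary non-zero fill pattern
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        # Push buffered data to the storage device before benchmarking.
        f.flush()
        os.fsync(f.fileno())
    return written


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print(populate(path, 4 * 1024 * 1024), "bytes written")
    finally:
        os.unlink(path)
```

Every byte of the test file is written before any read benchmark starts, so no read lands on a previously unwritten region.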

If an application writes in a smaller block size than it uses to read the data back, the read bandwidth might be constrained to the maximum bandwidth achievable at a block size equivalent to the original write block size.

For example, if an application performs random 512-byte writes and then reads the data back in 4 KiB blocks, the performance might be limited to that of issuing 512-byte reads directly. In this case the IO Accelerator is IOPS limited rather than bandwidth limited.
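The arithmetic behind this example can be sketched as follows. The model is a simplification assumed for illustration: when data was written in blocks smaller than the read size, read bandwidth is taken to be the device IOPS multiplied by the write block size. The figure of 100,000 IOPS is hypothetical and is not a published IO Accelerator number.

```python
def constrained_read_bandwidth(write_block, read_block, device_iops):
    """Estimate read bandwidth in bytes/s under a simple IOPS-limited model.

    Assumption: each operation moves at most the smaller of the original
    write block size and the read block size.
    """
    effective_block = min(write_block, read_block)
    return device_iops * effective_block


if __name__ == "__main__":
    iops = 100_000  # hypothetical device IOPS
    small_writes = constrained_read_bandwidth(512, 4096, iops)
    matched_writes = constrained_read_bandwidth(4096, 4096, iops)
    print(small_writes, "bytes/s after 512-byte writes")
    print(matched_writes, "bytes/s after 4 KiB writes")
```

Under these assumptions, 4 KiB reads of data written in 512-byte records reach one eighth of the bandwidth available when the data was written in 4 KiB blocks.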

The most common ways to reset a device state are:
