HP IO Accelerator Version 3.2.3 Linux User Guide


Monitoring IO Accelerator health

NAND flash and component failure

The IO Accelerator is a highly fault-tolerant storage subsystem that provides many levels of protection against component failure and the lossy nature of solid state storage. However, as in all storage subsystems, component failures might occur.

By proactively monitoring device age and health, you can ensure reliable performance over the intended product life.

Health metrics

The IO Accelerator driver manages logical erase block (LEB) retirement using predetermined retirement thresholds. The HP IO Accelerator Management Tool and the fio-status utility show a health indicator that starts at 100 and counts down to 0. As certain thresholds are crossed, the following actions are taken.

At the 10% healthy threshold, a one-time warning is issued. For more information, see "Health monitoring techniques."

At 0%, the device is considered unhealthy and enters write-reduced mode, which somewhat prolongs its lifespan so that data can be safely migrated off. In this state, the IO Accelerator behaves normally except for reduced write performance.
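The thresholds above can be summarized as a small lookup. The sketch below is a hypothetical POSIX shell function, `health_state`, that takes the health percentage as an argument (for example, a value you read from the Management Tool or fio-status output) and prints the stage it corresponds to. It does not query the device, and it assumes the 10% warning applies at or below 10. It does not model read-only or failure mode, because those occur at some point after 0% rather than at a fixed percentage.

```shell
#!/bin/sh
# Hypothetical helper: map the 0-100 health indicator to the stages described
# in this section. The percentage is passed in by hand; nothing is read from
# the device.
health_state() {
    pct=$1
    if [ "$pct" -gt 10 ]; then
        echo "healthy"
    elif [ "$pct" -gt 0 ]; then
        echo "warning"        # 10% threshold crossed: one-time warning issued
    else
        echo "write-reduced"  # 0%: unhealthy; read-only mode follows later
    fi
}

# Example: health_state 8   # prints "warning"
```

A script like this is only a convenience for interpreting the number; the driver itself decides when to issue the warning and change modes.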

Soon after the 0% threshold is reached, the device enters read-only mode, and any attempt to write to the IO Accelerator causes an error. Some file systems might require special mount options, in addition to specifying a read-only mount, before they can mount a read-only block device.

For example, under Linux, ext3 requires the -o ro,noload mount options. The noload option tells the file system not to replay the journal.
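As a minimal sketch, the mount options for a read-only device can be chosen per file system type. The function `ro_mount_opts` below is hypothetical; only the ext3 case (ro,noload) comes from this guide, and the plain ro fallback for other file systems is an assumption you should check against that file system's documentation. The device name /dev/fioa and mount point /mnt/fioa in the usage line are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print the -o options for mounting a file system that
# lives on a read-only block device.
ro_mount_opts() {
    case "$1" in
        ext3) echo "ro,noload" ;;  # noload: do not replay the journal
        *)    echo "ro" ;;         # assumed default for other file systems
    esac
}

# Usage (requires root; /dev/fioa and /mnt/fioa are placeholder names):
# mount -t ext3 -o "$(ro_mount_opts ext3)" /dev/fioa /mnt/fioa
```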

Consider read-only mode a final opportunity to migrate data off the device, because device failure becomes more likely with continued use.

The IO Accelerator device might enter failure mode. In this case, the device is offline and inaccessible. This can be caused by an internal catastrophic failure, improper firmware upgrade procedures, or device wearout.

