Monitoring IO Accelerator health

NAND flash and component failure

The IO Accelerator is a highly fault-tolerant storage subsystem that provides many levels of protection against component failure and the lossy nature of solid state storage. However, as in all storage subsystems, component failures might occur.

When enough data blocks have been retired due to errors, the NAND flash media is considered worn out. By selecting NAND flash media appropriate for the hosted application and proactively monitoring device age and health, you can ensure reliable performance over the intended product life.

Health metrics

The IO Accelerator driver manages block retirement using predetermined retirement thresholds. The fio-status utility shows a health indicator that starts at 100 and counts down to 0. As the indicator crosses certain thresholds, the following actions are taken.

At the 10% health threshold, a one-time warning is issued. For methods of capturing this alarm event, see "Health monitoring techniques."

At 0%, the device is considered unhealthy and enters write-reduced mode, which somewhat prolongs its lifespan so that data can be safely migrated. In this state, the IO Accelerator behaves normally except for reduced write performance.

At some point after the 0% threshold, the device enters read-only mode. Any attempt to write to the IO Accelerator causes an error. Some file systems might require special mount options to mount a read-only block device, beyond specifying that the mount should be read-only.

Treat read-only mode as a final opportunity to migrate data off the device, because device failure is more likely with continued use.

The IO Accelerator might also enter failure mode, in which the device is offline and inaccessible. Failure mode can be caused by an internal catastrophic failure, an improper firmware upgrade procedure, or device wear-out.
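The threshold behavior described above can be sketched as a small classification function. This is a minimal Python sketch, not part of any HP or Fusion-io tooling: `HealthState` and `classify_health` are hypothetical names, the 10% warning threshold is treated as inclusive (an assumption), and read-only and failure modes are left out because the text gives no numeric trigger for them.

```python
from enum import Enum


class HealthState(Enum):
    """Hypothetical states derived from the health indicator (not a driver API)."""
    HEALTHY = "healthy"
    WARNING = "warning"              # at or below the warn threshold (assumed inclusive)
    WRITE_REDUCED = "write-reduced"  # health indicator has reached 0%


def classify_health(health_pct: float, warn_at: float = 10.0) -> HealthState:
    """Map a 0-100 health percentage to the states described in the text.

    Read-only and failure modes are not modeled: the text gives no
    numeric threshold for entering them.
    """
    if not 0.0 <= health_pct <= 100.0:
        raise ValueError(f"health percentage out of range: {health_pct}")
    if health_pct <= 0.0:
        return HealthState.WRITE_REDUCED
    if health_pct <= warn_at:
        return HealthState.WARNING
    return HealthState.HEALTHY
```

For example, `classify_health(55.0)` returns `HealthState.HEALTHY` and `classify_health(8.5)` returns `HealthState.WARNING`.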

Health monitoring techniques

Output from the fio-status utility shows the health percentage and drive state. In the following sample output, these items appear on the Media status line.

Found 1 ioDrive in this system
Fusion-io driver version: 2.2.3 build 240

Adapter: ioDrive
    Fusion-io ioDrive 160GB, Product Number:FS1-002-161-ES
    ...
    Media status: Healthy; Reserves: 100.00%, warn at 10.00%; Data: 99.12%
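To act on the health percentage in a script, the Media status line can be parsed from captured fio-status output. The following Python sketch assumes the line format shown in the sample above (other driver versions might format it differently) and assumes that the Reserves value is the health percentage the text describes. `MediaStatus` and `parse_media_status` are hypothetical names; the Data field is captured as-is without interpretation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern for the "Media status" line as it appears in the sample output.
# Assumption: other driver versions may format this line differently.
_MEDIA_STATUS_RE = re.compile(
    r"Media status:\s*(?P<state>[^;]+);\s*"
    r"Reserves:\s*(?P<reserves>[\d.]+)%,\s*warn at\s*(?P<warn_at>[\d.]+)%;\s*"
    r"Data:\s*(?P<data>[\d.]+)%"
)


@dataclass
class MediaStatus:
    state: str            # e.g. "Healthy"
    reserves_pct: float   # assumed to be the health percentage
    warn_at_pct: float    # warning threshold reported by fio-status
    data_pct: float       # "Data" field, stored without interpretation


def parse_media_status(output: str) -> Optional[MediaStatus]:
    """Return the first Media status line found in fio-status output, or None."""
    match = _MEDIA_STATUS_RE.search(output)
    if match is None:
        return None
    return MediaStatus(
        state=match.group("state").strip(),
        reserves_pct=float(match.group("reserves")),
        warn_at_pct=float(match.group("warn_at")),
        data_pct=float(match.group("data")),
    )


if __name__ == "__main__":
    sample = "Media status: Healthy; Reserves: 100.00%, warn at 10.00%; Data: 99.12%"
    status = parse_media_status(sample)
    print(status)
    # Flag the device once reserves drop to or below the reported warn threshold.
    if status is not None and status.reserves_pct <= status.warn_at_pct:
        print("WARNING: reserves at or below warn threshold; plan data migration")
```

In practice, the text could be captured with `subprocess.run(["fio-status"], capture_output=True, text=True).stdout`, assuming fio-status is on the PATH and run with sufficient privileges. Comparing `reserves_pct` against `warn_at_pct` mirrors the 10% warning threshold described under Health metrics.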
