Predictive Failure Analysis PFA, Disk scrubbing

DDM, then approximately half of the 146 GB DDM would be wasted since that space is not needed. The problem here is that the failed 73 GB DDM will be replaced with a new 73 GB DDM. So the DS6000 microcode will most likely migrate the data on the 146 GB DDM onto the recently replaced 73 GB DDM. When this process completes, the 73 GB DDM will rejoin the array and the 146 GB will become the spare again. Another example would be if we fail a 10k RPM DDM onto a 15k RPM DDM. While this means that the data has now moved to a faster DDM, the replacement DDM will be the same as the failed DDM. This means the spare will now be a 10k RPM DDM. This could result in a 15k RPM DDM being spared onto a 10k RPM DDM. This is not desirable. Again a smart failback of the spare will be performed once a suitable replacement DDM has been made available.

Hot plugable DDMs

Replacement of a failed drive does not affect the operation of the DS6000 because the drives are fully hot plugable. Due to the fact that each disk plugs into a switch, there is no loop break associated with the removal or replacement of a disk. In addition, there is no potentially disruptive loop initialization process.

3.3.4 Predictive Failure Analysis (PFA)

The drives used in the DS6000 incorporate Predictive Failure Analysis (PFA) and can anticipate certain forms of failures by keeping internal statistics of read and write errors. If the error rates exceed predetermined threshold values, the drive will be nominated for replacement. Because the drive has not yet failed, data can be copied directly to a spare drive. This avoids using RAID-5 or RAID-10 recovery to reconstruct all of the data onto the spare drive. The DS6000 will alert the user and can also use call home e-mail notification.

3.3.5 Disk scrubbing

The DS6000 will periodically read all sectors on a disk. This is designed to occur without any interference to application performance. If ECC-correctable bad bits are identified, the bits are corrected immediately by the DS6000. This reduces the possibility of multiple bad bits accumulating in a sector beyond the ability of ECC to correct them. If a sector contains data that is beyond ECC's ability to correct, then RAID is used to regenerate the data and write a new copy onto a spare sector on the disk. The scrubbing process applies to both array members and spare DDMs.

3.3.6 Disk path redundancy

Each DDM in the DS6000 is attached to two 22 port SAN switches. These switches are built into the RAID or SBOD controller cards. Figure 3-5 on page 56 depicts the redundancy features of the DS6000 switched disk architecture. Each disk has two separate connections to the midplane. This allows it to be simultaneously attached to both switches. If either a RAID or SBOD controller card is removed from an enclosure, the switch that is included in that controller is also removed. However, the remaining controller retains the ability to communicate with all the disks via the remaining switch.

Figure 3-5also shows the connection paths to the expansion enclosures. To the left and right you can see paths from the switches and Fibre Channel chipset that travel to the device adapter ports at top left and top right. These ports are depicted in Figure 2-2 on page 24. From each controller we have two paths to each expansion enclosure. This means that we can easily survive the loss of a single path (which would mean the loss of one out of four paths) due to the failure of, for instance, a cable or an optical port. We can also survive the loss of an entire RAID controller or SBOD controller (which would remove two out of four

Chapter 3. RAS 55

IBM DS6000 Series manual Predictive Failure Analysis PFA, Disk scrubbing, Disk path redundancy

Models: DS6000 Series

Hot plugable DDMs

3.3.4 Predictive Failure Analysis (PFA)

3.3.5 Disk scrubbing

3.3.6 Disk path redundancy