If two array sites are used to make a RAID-10 array and the array sites contain spares, then six DDMs are used to make two RAID-0 arrays which are mirrored. If spares do not exist on the array sites then eight DDMs are used to make two RAID-0 arrays which are mirrored.
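The DDM counts above can be checked with a little arithmetic. The following is a minimal sketch, not DS6000 code: it assumes four DDMs per array site and, when the sites contain spares, one spare per array site (which is what the drop from eight DDMs to six implies). The function name `raid10_layout` and its parameters are hypothetical.

```python
DDMS_PER_ARRAY_SITE = 4  # assumption: each array site holds four DDMs


def raid10_layout(array_sites: int = 2, spares_per_site: int = 0) -> dict:
    """Split the DDMs of the given array sites into two mirrored RAID-0 arrays."""
    total = array_sites * DDMS_PER_ARRAY_SITE
    spares = array_sites * spares_per_site
    raid10_ddms = total - spares
    if raid10_ddms % 2 != 0:
        raise ValueError("a mirrored pair needs an even number of DDMs")
    return {
        "total_ddms": total,
        "spares": spares,
        "raid10_ddms": raid10_ddms,
        # each RAID-0 array gets half of the remaining DDMs
        "ddms_per_raid0": raid10_ddms // 2,
    }


if __name__ == "__main__":
    print(raid10_layout(spares_per_site=1))  # array sites that contain spares
    print(raid10_layout(spares_per_site=0))  # array sites without spares
```

Under these assumptions, two array sites with spares give two three-DDM RAID-0 arrays, and two array sites without spares give two four-DDM RAID-0 arrays.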

Drive failure

When a disk drive module (DDM) fails in a RAID-10 array, the controller starts an operation to reconstruct the data from the failed drive onto one of the spare drives. The spare is chosen by a smart algorithm that considers the location of the available spares and the size and location of the failed DDM. Remember, a RAID-10 array is effectively a RAID-0 array that is mirrored. Thus, when a drive fails in one of the RAID-0 arrays, the failed drive can be rebuilt by reading the data from the equivalent drive in the other RAID-0 array.
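The rebuild step can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the controller's actual algorithm: each drive is modeled as a Python list of blocks, drive `i` in one RAID-0 array is assumed to mirror drive `i` in the other, and the spare has already been chosen. The function name `rebuild_from_mirror` is hypothetical.

```python
def rebuild_from_mirror(side_a, side_b, failed_side, failed_index, spare):
    """Rebuild a failed drive onto a spare by copying its mirror partner.

    side_a and side_b are the two RAID-0 arrays, each a list of drives;
    a drive is modeled as a list of blocks.
    """
    if failed_side not in ("a", "b"):
        raise ValueError("failed_side must be 'a' or 'b'")
    sides = {"a": side_a, "b": side_b}
    good_side = "b" if failed_side == "a" else "a"
    # The equivalent drive in the other RAID-0 array holds the same data.
    source = sides[good_side][failed_index]
    # Copy the blocks onto the spare (a real rebuild proceeds block by block).
    spare[:] = source
    # The spare now takes the failed drive's place as an array member.
    sides[failed_side][failed_index] = spare
    return spare


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]
    b = [[1, 2], [3, 4], [5, 6]]
    a[1] = None  # drive 1 of array A has failed
    rebuild_from_mirror(a, b, "a", 1, spare=[])
    print(a == b)
```

The point of the sketch is that no parity calculation is involved: the rebuild is a straight copy from the surviving mirror copy.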

While this data reconstruction is going on, the controller can still service read and write requests to the array from the hosts. There may be some degradation in performance while the sparing operation is in progress because some controller and switched network resources are being used to do the reconstruction. Due to the switched architecture of the DS6000, this effect will be minimal. Read requests for data on the failed drive should not be affected because they can all be directed to the good RAID-0 array.

Write operations will not be affected. Performance of the RAID-10 array returns to normal when the data reconstruction onto the spare device completes. The time taken for sparing can vary, depending on the size of the failed DDM and on the workload on the array and the controller.

3.3.3 Spare creation

There are four array sites in each enclosure of the DS6000. The first and third array sites created on each loop are used to supply spares. This normally means that two spares will be created in the server enclosure and two spares in the first expansion enclosure. Spares are created as the array sites are created, which occurs when the DDMs are installed. After four spares have been created for the entire storage unit, no more spares are normally needed.
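The spare creation rule can be sketched as follows. This is an illustration only: it assumes each supplying array site contributes one spare, that the server enclosure sits on one loop and the first expansion enclosure on the other, and that creation stops at four spares. The site names and the function `sites_supplying_spares` are hypothetical.

```python
from collections import defaultdict

MAX_SPARES = 4  # from the text: four spares normally suffice for the storage unit


def sites_supplying_spares(creation_order):
    """Return the array sites that supply a spare.

    creation_order is a list of (site_name, loop_id) tuples in the order
    in which the array sites are created. The first and third array site
    created on each loop supply a spare until MAX_SPARES is reached.
    """
    created_on_loop = defaultdict(int)
    suppliers = []
    for site, loop in creation_order:
        created_on_loop[loop] += 1
        is_first_or_third = created_on_loop[loop] in (1, 3)
        if is_first_or_third and len(suppliers) < MAX_SPARES:
            suppliers.append(site)
    return suppliers


if __name__ == "__main__":
    # Assumed layout: server enclosure on loop 0, first expansion enclosure on loop 1.
    server = [(f"S{i}", 0) for i in range(1, 5)]
    expansion = [(f"S{i}", 1) for i in range(5, 9)]
    print(sites_supplying_spares(server + expansion))
```

With this layout, two sites in the server enclosure and two in the first expansion enclosure supply spares, and array sites created later supply none.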

On the ESS 800, the spare creation policy was to have four spare DDMs on each SSA (Serial Storage Architecture) loop for each DDM type. This meant that a single SSA loop could hold 12 spare DDMs if you chose to populate it with three different DDM types. With the DS6000, the intention is to avoid this. Where DDMs of different sizes but the same RPM exist in the complex, the spares are taken from the array sites with the larger DDMs. This means that in most cases the DS6000 continues to have only four spares for the entire complex, regardless of DDM size intermix.
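The difference between the two policies comes down to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the DDM capacities and RPM values in the usage example are sample inputs rather than a statement of supported configurations.

```python
def ess800_spares_per_loop(ddm_types_on_loop: int, spares_per_type: int = 4) -> int:
    """ESS 800 policy: four spares per DDM type on each SSA loop."""
    return ddm_types_on_loop * spares_per_type


def spare_capacity_per_rpm(ddms):
    """DS6000 policy, simplified: within each RPM class, the larger DDMs supply spares.

    ddms is an iterable of (capacity_gb, rpm) tuples. Returns a dict mapping
    each RPM value to the capacity of the DDMs that should supply spares.
    """
    largest = {}
    for capacity_gb, rpm in ddms:
        largest[rpm] = max(largest.get(rpm, 0), capacity_gb)
    return largest


if __name__ == "__main__":
    print(ess800_spares_per_loop(3))
    print(spare_capacity_per_rpm([(73, 15000), (146, 15000)]))
```

Three DDM types on one ESS 800 loop give 3 x 4 = 12 spares, whereas the simplified DS6000 rule picks a single spare capacity for each RPM class.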

Floating spares

The DS6000 implements a smart floating technique for spare DDMs. A spare floats in the following sense: when a DDM fails and the data it contained is rebuilt onto a spare, and the failed disk is then replaced, the replacement disk becomes the spare. The data is not copied back to the original position that the failed DDM occupied. The DS6000 microcode may choose to allow the hot spare to remain where it has been moved, but it may instead choose to move the spare to a more optimal position. This is done to better balance the spares across the DA pairs and enclosures. It may be preferable that a DDM that is currently in use as an array member be converted to a spare. In this case the data on that DDM is migrated in the background onto an existing spare. This process does not fail the disk that is being migrated, though it does reduce the number of available spares in the DS6000 until the migration process is complete.
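The basic floating behavior (rebuild onto a spare, then let the replacement disk become the new spare) can be sketched as a small state model. This is a simplification that leaves out the microcode's optional rebalancing of spares; the slot names and the functions `handle_failure` and `handle_replacement` are hypothetical.

```python
def handle_failure(members, spares, failed_slot):
    """Rebuild a failed array member onto a spare; return the spare's slot."""
    if failed_slot not in members:
        raise ValueError(f"{failed_slot} is not an array member")
    if not spares:
        raise RuntimeError("no spare available")
    spare_slot = spares.pop(0)   # spare selection is simplified here
    members.remove(failed_slot)
    members.add(spare_slot)      # the former spare is now an array member
    return spare_slot


def handle_replacement(spares, replaced_slot):
    """The replacement disk becomes a spare; no data is copied back."""
    spares.append(replaced_slot)


if __name__ == "__main__":
    members = {"E1-D1", "E1-D2"}
    spares = ["E1-D16"]
    handle_failure(members, spares, "E1-D2")
    handle_replacement(spares, "E1-D2")
    print(sorted(members), spares)
```

After the failure and replacement, the spare has moved to the slot where the failed DDM used to be, and the array keeps running on the former spare.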

A smart process may be used to ensure that the larger or higher RPM DDMs act as spares. This is preferable because if we were to rebuild the contents of a 73 GB DDM onto a 146 GB

