What is data deduplication?

Data deduplication is the ability of an appliance or software application to compare blocks of data being written to the backup device with data blocks already stored on that device. When a duplicate block is found, a pointer to the original block is stored instead of writing the data again, removing, or “deduplicating,” the redundant blocks. Data deduplication is done at the block or chunk level, not at the file level.

This greatly reduces the volume of data stored.
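The block-level comparison described above can be sketched in a few lines of Python. This is an illustrative model only, not any vendor's implementation; the 4 KiB block size and the SHA-256 fingerprint are assumptions chosen for the example:

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products vary


def dedup_store(data: bytes):
    """Split data into fixed-size blocks and store each unique block once.

    Returns the block store and a list of pointers (block fingerprints)
    that reconstruct the original stream in order.
    """
    store = {}     # fingerprint -> block bytes, stored only once
    pointers = []  # per-block references in write order
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:   # new block: write it to the store
            store[digest] = block
        pointers.append(digest)   # duplicate block: keep a pointer only
    return store, pointers


def restore(store, pointers) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(store[d] for d in pointers)
```

For example, a 16 KiB stream containing the same 4 KiB block three times is kept as just two unique blocks plus four pointers, which is where the space saving comes from.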

Data deduplication is often used in conjunction with other forms of data reduction, such as conventional data compression, to further reduce the data volume stored.
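As a rough illustration of combining the two techniques, the sketch below deduplicates a list of blocks and then applies conventional compression (zlib here, as a stand-in for whatever a given product uses) to each unique block:

```python
import hashlib
import zlib


def dedup_and_compress(blocks):
    """Deduplicate equal-size blocks, then compress each unique block.

    Returns the compressed store plus the raw and stored byte counts,
    so the combined reduction is easy to see.
    """
    stored = {}  # fingerprint -> compressed unique block
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        if digest not in stored:
            stored[digest] = zlib.compress(block)
    raw_bytes = sum(len(b) for b in blocks)
    stored_bytes = sum(len(c) for c in stored.values())
    return stored, raw_bytes, stored_bytes
```

Deduplication removes whole repeated blocks, while compression shrinks the unique blocks that remain, so the two reductions multiply rather than overlap.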

The best approach to data deduplication depends on the size of your organization and its backup needs.

Deduplication for enterprises: Object-level differencing, or accelerated deduplication, is a good choice for enterprise customers because it focuses on performance and scalability. It delivers the fastest restores, as well as the fastest possible backup, by deduplicating data after it has been written to disk. You can scale up performance simply by adding extra nodes.

Deduplication for midsize businesses and remote enterprise sites: Hash-based chunking, or dynamic deduplication, is a good choice for small and midsize businesses, or for large enterprises with remote sites, because it focuses on compatibility and cost. It delivers a low-cost, small-footprint, format-independent solution.
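A minimal sketch of the chunking step behind hash-based deduplication is shown below. It uses a toy rolling hash to pick content-defined chunk boundaries; the window size, boundary mask, and hash function are illustrative assumptions, not HP's actual algorithm:

```python
WINDOW = 16    # assumed minimum chunk length before a cut is allowed
MASK = 0x3FF   # assumed boundary mask; smaller mask -> larger chunks


def chunk_boundaries(data: bytes):
    """Cut data into variable-size chunks at content-defined boundaries.

    A boundary is declared where a toy hash of the bytes seen so far
    matches a fixed pattern. Because boundaries depend on content, an
    insertion early in a file shifts chunk edges only locally, so most
    later chunks still hash to the same fingerprints and deduplicate.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF  # toy hash for illustration
        if i - start >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])  # boundary found: cut here
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # final partial chunk
    return chunks
```

Each chunk would then be fingerprinted and looked up in the store exactly as in the fixed-block case; the difference is that the chunk edges track the content instead of fixed offsets.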

A detailed description of deduplication techniques can be found in the “Understanding the HP Data Deduplication Strategy” HP white paper at http://h71028.www7.hp.com/ERC/downloads/4AA1-9796ENW.pdf

Figure 1 illustrates the basic deduplication concept.

Figure 1: Deduplication Concept
