Figure 2: Disk Space requirements

                                Data stored      Data stored with
                                normally         deduplication

1st daily full backup           500 GB           500 GB
1st daily incremental backup    50 GB            5 GB
2nd daily incremental backup    50 GB            5 GB
3rd daily incremental backup    50 GB            5 GB
4th daily incremental backup    50 GB            5 GB
5th daily incremental backup    50 GB            5 GB
2nd weekly full backup          500 GB           25 GB
3rd weekly full backup          500 GB           25 GB
...                             ...              ...
25th weekly full backup         500 GB           25 GB

Total                           12,750 GB        1,125 GB

This example uses a system containing 500 GB of backup data, which equates to 500 GB of storage for the first traditional full backup. If 10% of the files change between backups, a traditional incremental backup sends about 10% of the size of the full backup, or about 50 GB, to the backup device. However, because data deduplication operates at the block level rather than the file level, only about 1% of the data has actually changed. This means only 5 GB of block-level changes, or 5 GB of data stored with deduplication.

Over time, the savings multiply. When the next full backup is stored, it does not consume 500 GB; with deduplication the equivalent full backup is only 25 GB. A backup system with data deduplication enabled would use roughly the same amount of storage in six months that traditional backups would need for little more than one week of data. Over a six-month period, data deduplication would provide an effective saving in storage capacity of about 11:1 (12,750 GB versus 1,125 GB in Figure 2). It also makes it possible to restore from further back in time without having to go to physical tape for the data.

The key thing to remember is that the deduplication ratio depends primarily on two things:

What percentage of the data changes between backups (the percentage of data that changes within the percentage of files that change)

How long the backups stored on disk are retained

For example, a 0.5% daily change in the data in 10% of the files would yield a 50:1 deduplication ratio over one year of daily full backups. The daily change rate is difficult to predict for complex systems, especially for applications such as Exchange, SQL, and Oracle, so benchmarking is strongly advised.
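The arithmetic behind the Figure 2 totals can be reproduced with a short calculation. The following is a minimal sketch in Python, not part of any product tooling. The function name and its parameters are hypothetical, and it mirrors the rows listed in Figure 2, which count the weekly full backups plus only the first week's five incremental backups.

```python
def figure2_totals(weekly_fulls=25, full_gb=500, incr_gb=50,
                   dedup_incr_gb=5, dedup_full_gb=25, incrementals=5):
    """Return (traditional_gb, dedup_gb) for the schedule in Figure 2.

    All names and defaults are illustrative assumptions taken from the
    figure; they do not describe a real API.
    """
    # Traditional storage: every full backup is stored whole,
    # plus the listed incremental backups.
    traditional = weekly_fulls * full_gb + incrementals * incr_gb

    # Deduplicated storage: only the first full backup is stored whole;
    # incrementals and later fulls add only their changed blocks.
    dedup = (full_gb
             + incrementals * dedup_incr_gb
             + (weekly_fulls - 1) * dedup_full_gb)
    return traditional, dedup


traditional, dedup = figure2_totals()
print(traditional, dedup, round(traditional / dedup, 1))  # 12750 1125 11.3

# A shorter retention period gives a much lower ratio.
short_trad, short_dedup = figure2_totals(weekly_fulls=2)
print(short_trad, short_dedup, round(short_trad / short_dedup, 1))  # 1250 550 2.3
```

Changing `weekly_fulls` shows the effect of retention: under these assumptions, the longer the backups are kept on disk, the higher the effective ratio becomes.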

As already indicated, both the retention period and the change rate of the backup data determine the approximate deduplication ratio. Figure 3 shows the approximate space saving for a given daily change rate and backup policy.
