Figure 2: Disk Space requirements

| Backup | Data stored normally | Data stored with deduplication |
|---|---|---|
| 1st daily full backup | 500 GB | 500 GB |
| 1st daily incremental backup | 50 GB | 5 GB |
| 2nd daily incremental backup | 50 GB | 5 GB |
| 3rd daily incremental backup | 50 GB | 5 GB |
| 4th daily incremental backup | 50 GB | 5 GB |
| 5th daily incremental backup | 50 GB | 5 GB |
| 2nd weekly full backup | 500 GB | 25 GB |
| 3rd weekly full backup | 500 GB | 25 GB |
| 25th weekly full backup | 500 GB | 25 GB |
| Total | 12,750 GB | 1,125 GB |

This example uses a system containing 500 GB of backup data, which equates to 500 GB of storage for the first traditional full backup. If 10% of the files change between backups, a traditional incremental backup sends about 10% of the size of the full backup, or about 50 GB, to the backup device. However, because data deduplication operates at the block level rather than the file level, only about 1% of the data has actually changed. This means only 5 GB of block-level changes, or 5 GB of data stored with deduplication.

Over time, the savings multiply. When the next full backup is stored, it does not take 500 GB; with deduplication the equivalent full backup is only 25 GB. A backup system with data deduplication enabled would use the same amount of storage in six months that would typically be required to store only one week of traditional backup data. Over a six-month period, data deduplication would provide an 11:1 effective savings in storage capacity (12,750 GB versus 1,125 GB in Figure 2). It also makes it possible to restore from further back in time without having to go to physical tape for the data.

The key thing to remember here is that the deduplication ratio depends primarily on two things:
• What percentage of the data changes between backups (the percentage of data that changes within the percentage of files that change)
• How long the backups stored on disk are retained
For example, a 0.5% daily change in the data in 10% of the files would yield a 50:1 deduplication ratio over one year of daily full backups. The daily change rate is difficult to predict for complex systems, especially for applications such as Exchange, SQL, and Oracle, so benchmarking is strongly advised.
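The arithmetic behind the Figure 2 totals can be checked with a short sketch. The per-backup sizes come straight from the figure; the constant and function names are illustrative only, and the sketch assumes (as the printed totals imply) that Figure 2 counts 25 weekly full backups but only the five incremental backups of the first week.

```python
# Sizes in GB, taken from Figure 2.
FULL_GB = 500        # one traditional full backup
INCR_GB = 50         # one traditional incremental (10% of files changed)
DEDUP_INCR_GB = 5    # the same incremental with deduplication (1% of blocks changed)
DEDUP_FULL_GB = 25   # a later full backup with deduplication

# Assumption: the totals count 25 weekly fulls and only the first
# week's 5 incrementals, which is what the figure's totals add up to.
WEEKLY_FULLS = 25
INCREMENTALS = 5


def figure2_totals():
    """Return (normal_gb, dedup_gb) for the backups listed in Figure 2."""
    normal = WEEKLY_FULLS * FULL_GB + INCREMENTALS * INCR_GB
    # The first full backup is stored in full; later fulls only add new blocks.
    dedup = (
        FULL_GB
        + INCREMENTALS * DEDUP_INCR_GB
        + (WEEKLY_FULLS - 1) * DEDUP_FULL_GB
    )
    return normal, dedup


normal_gb, dedup_gb = figure2_totals()
ratio = normal_gb / dedup_gb
print(f"normal: {normal_gb} GB, deduplicated: {dedup_gb} GB, ratio: {ratio:.1f}:1")
```

Under these assumptions the sketch gives 12,750 GB and 1,125 GB, a ratio of about 11.3:1, which matches the 11:1 figure quoted above. Real ratios depend on the measured change rate and the retention period, so this is a way to check the example, not a sizing tool.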
As already indicated, the approximate deduplication ratio depends on the backup data retention period and the backup data change rate. Figure 3 shows the approximate space savings for a given daily change rate and backup policy.