■The client nbostpxy process moves the data to the PureDisk plug-in.
■The PureDisk plug-in retrieves a list of fingerprints from the last full backup for the client from the NetBackup Deduplication Engine. The list is used as a cache so the plug-in does not have to request each fingerprint from the engine.
■The PureDisk plug-in performs file fingerprinting calculations.
■The PureDisk plug-in sends only unique data segments to the PureDisk storage pool.
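The caching behaviour in the steps above can be sketched roughly as follows. This is a minimal illustration, not the plug-in's implementation. `HypotheticalDedupEngine`, `fingerprints_of_last_full`, `has`, `store`, and `full_backup` are hypothetical names. SHA-256 is an illustrative hash choice, because this section does not name the algorithm the plug-in uses. The sketch shows why one bulk fetch of the last full backup's fingerprints saves a per-segment round trip to the engine.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    # Illustrative hash choice; the section does not name the plug-in's algorithm.
    return hashlib.sha256(segment).hexdigest()


class HypotheticalDedupEngine:
    """Hypothetical stand-in for the NetBackup Deduplication Engine."""

    def __init__(self) -> None:
        self.segments: dict[str, bytes] = {}      # fingerprint -> stored segment
        self.last_full: dict[str, set[str]] = {}  # client -> last full backup's fingerprints
        self.lookups = 0                          # per-fingerprint queries received

    def fingerprints_of_last_full(self, client: str) -> set[str]:
        return set(self.last_full.get(client, set()))

    def has(self, fp: str) -> bool:
        self.lookups += 1
        return fp in self.segments

    def store(self, fp: str, segment: bytes) -> None:
        self.segments[fp] = segment


def full_backup(engine: HypotheticalDedupEngine, client: str, segments: list[bytes]) -> int:
    """Send only unique segments to the engine; return how many were sent."""
    # One bulk request replaces a per-fingerprint request for every known segment.
    cache = engine.fingerprints_of_last_full(client)
    sent = 0
    for seg in segments:
        fp = fingerprint(seg)
        if fp in cache:
            continue                 # known from the last full backup: nothing to send
        if not engine.has(fp):       # cache miss: ask the engine
            engine.store(fp, seg)    # unique segment: send it
            sent += 1
        cache.add(fp)
    # Record this backup's fingerprints as the client's new "last full" list.
    engine.last_full[client] = {fingerprint(s) for s in segments}
    return sent


engine = HypotheticalDedupEngine()
print(full_backup(engine, "client1", [b"alpha", b"beta", b"alpha"]))  # 2
print(full_backup(engine, "client1", [b"alpha", b"beta", b"gamma"]))  # 1
print(engine.lookups)  # 3: only "gamma" needed a query on the second run
```

On the second run, the cache answers for "alpha" and "beta", so only the new segment causes a query to the engine.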
About deduplication fingerprinting
The NetBackup Deduplication Engine uses a unique identifier to identify each file and each file segment that is backed up. The engine identifies files inside the backup images and then processes the files.
The process is known as fingerprinting.
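A fingerprint works as a unique identifier because it is computed from the segment's contents. The following is a minimal sketch of that idea. SHA-256 and the sample data are assumptions for illustration, because this section does not say which hash the NetBackup Deduplication Engine uses.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    """Return a fingerprint computed over every byte of the segment.

    SHA-256 is an illustrative choice, not necessarily the engine's algorithm.
    """
    h = hashlib.sha256()
    h.update(segment)  # every byte of the segment contributes to the hash
    return h.hexdigest()


a = fingerprint(b"quarterly-report, page 1")
b = fingerprint(b"quarterly-report, page 1")
c = fingerprint(b"quarterly-report, page 2")
print(a == b)  # True: identical segments get identical fingerprints
print(a == c)  # False: a one-byte change yields a different fingerprint
```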
For the first deduplicated backup, the following is the process:
■The PureDisk plug-in reads the backup image and separates the image into files.
■The plug-in separates files into segments.
■For each segment, the plug-in calculates the hash key (or fingerprint) that identifies each data segment. To create a hash, every byte of data in the segment is read and added to the hash.
■The plug-in compares its calculated fingerprints to the fingerprints that the NetBackup Deduplication Engine stores on the media server. Two segments that have the same fingerprint are duplicates of each other.
■The plug-in sends unique segments to the deduplication engine to be stored. A unique segment is one for which a matching fingerprint does not exist in the engine already.
The first backup may have a 0% deduplication rate, although that is unlikely. A 0% rate means that every file segment in the backup data is unique.
■The NetBackup Deduplication Engine saves the fingerprint information for that backup.
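The first-backup steps above can be sketched roughly as follows. This is a hedged illustration under several assumptions that the guide does not state:

- The backup image has already been separated into files, passed in as a dict.
- Segments are fixed-size, with a tiny hypothetical `SEGMENT_SIZE` chosen for readability. The plug-in's real segment size and segmentation method are not described here.
- SHA-256 is used as the hash.
- A plain dict stands in for the engine's fingerprint store.
- The deduplication rate is simplified to the percentage of segments that were not sent.

`deduplicate_image` is a hypothetical name.

```python
import hashlib

SEGMENT_SIZE = 4  # hypothetical, tiny so the example is readable


def split_into_segments(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    # Fixed-size segmentation is an assumption made for this sketch.
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(segment: bytes) -> str:
    # Illustrative hash choice; reads every byte of the segment.
    return hashlib.sha256(segment).hexdigest()


def deduplicate_image(files: dict[str, bytes], engine_store: dict[str, bytes]):
    """Deduplicate one backup image that is already separated into files.

    engine_store maps fingerprint -> segment and stands in for the engine.
    Returns (fingerprints saved for this backup, dedup rate in percent).
    """
    backup_fingerprints: list[str] = []
    total = duplicates = 0
    for contents in files.values():
        for seg in split_into_segments(contents):
            fp = fingerprint(seg)
            total += 1
            if fp in engine_store:
                duplicates += 1          # matching fingerprint: duplicate, not sent
            else:
                engine_store[fp] = seg   # no match yet: unique segment, sent and stored
            backup_fingerprints.append(fp)
    rate = 100.0 * duplicates / total if total else 0.0
    return backup_fingerprints, rate


store: dict[str, bytes] = {}
fps, rate = deduplicate_image({"a.txt": b"AAAABBBB", "b.txt": b"AAAACCCC"}, store)
print(len(fps), len(store), rate)  # 4 3 25.0
```

Even on a first backup, the rate is usually above 0% because segments can repeat within the same image. In the example, "AAAA" appears in both files, so only three of the four segments are stored.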
For subsequent backups, the following is the process:
■The PureDisk plug-in retrieves a list of fingerprints from the last full backup for the client from the NetBackup Deduplication Engine. The list is used as a cache so the plug-in does not have to request each fingerprint from the engine.