How netapp dedupe works


How does NetApp dedupe work?

When you run 'sis on' against a volume, the behaviour of that volume changes, and the change takes place in two phases:

TWO-PHASE PROCESS:

PHASE 1 -> SIS enabled: Pre-process: Before the block is written to the array: Collecting fingerprints

Note: This applies only to new blocks. For existing data blocks that were written before SIS was enabled, you need to run a scan on the existing data to pull their fingerprints into the catalogue.

PHASE 2 -> SIS start: Post-process: After the block is written to the array: Sorting, comparing and deduping

PHASE 1:

The moment SIS is enabled:

Every time SIS notices a block write request coming in, the sis process makes a call to Data ONTAP to get a copy of the fingerprint for that block so that it can store this fingerprint in its catalogue file.

Note: This request interrupts the write stream and results in a 7% performance penalty for all writes into any volume with sis enabled.
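
The collection step can be pictured with a small toy model. This is only a sketch under stated assumptions: `ToyVolume`, its methods and the 4096-byte block size are hypothetical names chosen for illustration, and CRC32 merely stands in for whatever checksum the file system already computes. It is not how Data ONTAP is implemented.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


class ToyVolume:
    """Hypothetical stand-in for a volume; not a NetApp API."""

    def __init__(self):
        self.blocks = {}        # block number -> block data
        self.sis_enabled = False
        self.catalogue = []     # (fingerprint, block number) entries

    def sis_on(self):
        # Phase 1 begins: fingerprints are collected from now on.
        self.sis_enabled = True

    def write_block(self, block_no, data):
        self.blocks[block_no] = data
        if self.sis_enabled:
            # CRC32 is an assumed stand-in for the existing block checksum.
            self.catalogue.append((zlib.crc32(data), block_no))

    def scan_existing(self):
        # Pull in fingerprints for blocks written before SIS was enabled.
        known = {block_no for _, block_no in self.catalogue}
        for block_no, data in self.blocks.items():
            if block_no not in known:
                self.catalogue.append((zlib.crc32(data), block_no))


vol = ToyVolume()
vol.write_block(0, b"A" * BLOCK_SIZE)   # written before SIS: no fingerprint
vol.sis_on()
vol.write_block(1, b"B" * BLOCK_SIZE)   # fingerprint collected at write time
fingerprinted_before_scan = [block_no for _, block_no in vol.catalogue]
vol.scan_existing()
fingerprinted_after_scan = sorted(block_no for _, block_no in vol.catalogue)
```

In this model only block 1 is in the catalogue until the scan runs, which mirrors the earlier note about data written before SIS was enabled.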

PHASE 2:

At some point you'll want to dedupe the volume by running the 'sis start' command, either manually or automatically via a schedule:

SIS goes through the process of comparing fingerprints from the fingerprint catalogue file, validating the data, and deduping the blocks that pass the validation phase.

Note: In the end, all we are really doing is adjusting some inode metadata to say "hey, remember that data that used to be here? Well, it's over there now."

IMPORTANT: Nothing about the basic data structure of the WAFL file system has changed, except you are traversing a different path in the file structure to get to your desired data block. That's why NetApp dedupe *usually* has no perceivable impact on read performance - all we've done is redirect some block pointers. Accessing your data might go a little faster, a little slower, or more likely not change at all - it all depends on the pattern of the file system data structure and the pattern of requests coming from the application.

What is a fingerprint?

A fingerprint is a small digital representation of a larger data object. Basically, it is a checksum generated by WAFL for each BLOCK for the purpose of consistency checking (this generally involves the creation of a hash).

Is the fingerprint generated by SIS?

No. Each time a WAFL block is created, a checksum is generated for the purpose of consistency checking. NetApp Deduplication (SIS) simply "borrows" a copy of this checksum and stores it in a catalogue as the fingerprint.
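
To make the idea of a fingerprint concrete, here is a minimal sketch. The function names, the 4096-byte block size and the choice of CRC32 are assumptions made for illustration; the actual WAFL checksum is not specified here. The deliberately weak `weak_fingerprint` is included only to show that two different blocks can share a fingerprint, so a match on its own is just a candidate.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def fingerprint(block: bytes) -> int:
    # CRC32 as an assumed stand-in for the per-block checksum.
    return zlib.crc32(block)


def weak_fingerprint(block: bytes) -> int:
    # Deliberately weak toy checksum: byte order does not affect the sum.
    return sum(block) % 65536


pad = bytes(BLOCK_SIZE - 2)
block_a = b"ab" + pad
block_b = b"ab" + pad   # same content as block_a
block_c = b"ba" + pad   # different content, same byte sum

identical_match = fingerprint(block_a) == fingerprint(block_b)
weak_collision = weak_fingerprint(block_a) == weak_fingerprint(block_c)
really_equal = block_a == block_c
```

Identical blocks always produce identical fingerprints, while `block_a` and `block_c` collide under the weak checksum even though their contents differ.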

What happens during post-process dedupe?

A. The fingerprint catalogue is sorted and searched for identical fingerprints.

B. When a fingerprint "match" is made, the associated data blocks are retrieved and compared byte-for-byte.

C. Assuming successful validation, the inode pointer metadata of the duplicate block is redirected to the original block.

D. The duplicate block is marked as "Free" and returned to the system, eligible for re-use.
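
Steps A to D can be sketched as a short routine. This is a simplified illustration, not NetApp's code: the function name, the dictionaries used for blocks and pointers, and the use of CRC32 as the fingerprint are all assumptions made for the example.

```python
import zlib
from itertools import groupby


def post_process_dedupe(blocks, pointers, catalogue):
    """blocks: block number -> bytes; pointers: (file, offset) -> block number;
    catalogue: list of (fingerprint, block number). Returns freed block numbers."""
    freed = []
    # A. Sort the catalogue so identical fingerprints sit next to each other.
    for _, group in groupby(sorted(catalogue), key=lambda entry: entry[0]):
        kept = []  # blocks with this fingerprint that remain in place
        for _, block_no in group:
            # B. Validate each fingerprint match byte-for-byte.
            original = next(
                (k for k in kept if blocks[k] == blocks[block_no]), None
            )
            if original is None:
                kept.append(block_no)
                continue
            # C. Redirect every pointer from the duplicate to the original.
            for key in list(pointers):
                if pointers[key] == block_no:
                    pointers[key] = original
            # D. Mark the duplicate block as free.
            del blocks[block_no]
            freed.append(block_no)
    return freed


data = {0: b"A" * 4096, 1: b"B" * 4096, 2: b"A" * 4096}
pointers = {("file1", 0): 0, ("file1", 1): 1, ("file2", 0): 2}
catalogue = [(zlib.crc32(d), n) for n, d in data.items()]
freed = post_process_dedupe(data, pointers, catalogue)
file2_contents = data[pointers[("file2", 0)]]
```

After the run, block 2 has been freed and `file2` reads its data through block 0. The bytes returned are unchanged; only the pointer followed to reach them differs.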

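When to use QSM vs. VSM on dedupe volumes?

Use QSM (Qtree SnapMirror) when you only want to dedupe the destination volume, and use VSM (Volume SnapMirror) when you want to dedupe both the source and destination volumes automatically, and save bandwidth during SnapMirror transfers. VSM replicates at the block level, so the destination inherits the deduplication already done on the source.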

Courtesy: Dr. Dedupe, NetApp.

Prepared by:

[email protected]