An Evaluation of Using Deduplication in Swappers

Weiyan Wang, Chen Zeng

Motivation

Deduplication detects duplicate pages in storage
  NetApp, Data Domain: a billion-dollar business

We explore another direction: use deduplication in swappers

Our experimental results indicate that using deduplication in swappers is beneficial

What is a swapper?

A mechanism to expand usable address spaces
  Swap out: move a page from memory to the swap area
  Swap in: move a page from the swap area to memory

Swap area is on disk

[Figure: a pte and page P1, shown as free and as used]

Why is deduplication useful?

Writes to disk are slow
  Disk accesses are much slower than memory accesses!

When duplicate pages exist:
  Do we really need to swap out all of them?
  If a duplicate of a page already appears in the swap area, we can save one I/O.

[Figure: pages P1, P2, and P3, and a single copy of P1]

Architecture

Swap out a page
  1. Compute the checksum
  2. Look up the checksum in the dedup cache
     YES (found): skip the pageout
     NO (not found): page out, then add the checksum to the dedup cache
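The swap-out flow above can be sketched in user space as follows. This is only an illustrative model, not the kernel patch: the hash is FNV-1a standing in for the truncated SHA-1 checksum, the dedup cache is a small open-addressed table standing in for the radix tree, and the names `PAGE_BYTES`, `CACHE_SLOTS`, `dedup_cache`, and `swap_out` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096
#define CACHE_SLOTS 1024   /* hypothetical capacity of the toy cache */

/* Toy dedup cache: a table of checksums (stand-in for the radix tree). */
struct dedup_cache {
    uint32_t key[CACHE_SLOTS];
    int used[CACHE_SLOTS];
    size_t pageouts;       /* number of simulated disk writes */
};

/* FNV-1a, used here only as a stand-in for the truncated SHA-1 checksum. */
static uint32_t page_checksum(const unsigned char *page)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < PAGE_BYTES; i++) {
        h ^= page[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns the slot holding sum, or the free slot where it would be stored. */
static size_t find_slot(const struct dedup_cache *c, uint32_t sum)
{
    size_t i = sum % CACHE_SLOTS;
    while (c->used[i] && c->key[i] != sum)
        i = (i + 1) % CACHE_SLOTS;
    return i;
}

/* Returns 1 if the page was written out, 0 if the pageout was skipped. */
static int swap_out(struct dedup_cache *c, const unsigned char *page)
{
    uint32_t sum = page_checksum(page);
    size_t slot = find_slot(c, sum);

    if (c->used[slot])
        return 0;                          /* YES: duplicate, skip pageout */

    assert(c->pageouts + 1 < CACHE_SLOTS); /* keep one slot free so probing ends */
    c->pageouts++;                         /* NO: simulated pageout */
    c->used[slot] = 1;                     /* then add to the dedup cache */
    c->key[slot] = sum;
    return 1;
}
```

Swapping out two pages with identical contents through this model performs one simulated pageout; the second call finds the checksum in the cache and returns 0.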

Computing Checksum

SHA-1 checksum (160 bits)
  Collision probability of one in 2^80

Only use the first 32 bits (one in 2^16)
  Related to the implementation of the dedup cache

Only store the checksum

We assume two pages are identical if their checksums are equal
  Trade consistency for performance
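As a small illustration of the truncation step, the sketch below packs the first four bytes of a 20-byte SHA-1 digest into a 32-bit key. The function name `checksum_key` and the big-endian byte order are assumptions made for this example; the slides do not say how the first 32 bits are extracted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHA1_DIGEST_BYTES 20   /* 160 bits */

/* Pack the first four digest bytes into a 32-bit key.
 * Big-endian packing is an assumption of this sketch. */
static uint32_t checksum_key(const uint8_t *digest, size_t len)
{
    assert(len >= 4);
    return ((uint32_t)digest[0] << 24) | ((uint32_t)digest[1] << 16) |
           ((uint32_t)digest[2] << 8)  |  (uint32_t)digest[3];
}
```

Any two pages whose digests share these four bytes map to the same key and are therefore treated as identical, which is the consistency-for-performance trade mentioned above.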

Dedup Cache

Dedup cache: a radix tree
  Checksum -> dedup_entry_t
  A trie with O(|key|) lookup and update overhead
  The kernel already has a well-written implementation

Key in the radix tree is 32 bits
  We only keep the first 32 bits of a checksum as the key
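To make the O(|key|) behaviour concrete, here is a minimal user-space trie keyed by a 32-bit value. It is a sketch under assumptions, not the kernel's radix tree: the 256-way fan-out, the four fixed levels, and the names `trie_node`, `trie_insert`, and `trie_lookup` are all chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FANOUT 256   /* one key byte per level (an assumption of this sketch) */
#define LEVELS 4     /* 4 bytes = 32-bit key */

struct trie_node {
    void *slot[FANOUT];   /* child nodes, or stored values on the last level */
};

static struct trie_node *trie_new(void)
{
    return calloc(1, sizeof(struct trie_node));
}

/* Insert value under key; returns 0 on success, -1 if allocation fails. */
static int trie_insert(struct trie_node *root, uint32_t key, void *value)
{
    struct trie_node *n = root;
    assert(root != NULL);
    for (int level = 0; level < LEVELS - 1; level++) {
        unsigned idx = (key >> (24 - 8 * level)) & 0xff;
        if (!n->slot[idx]) {
            n->slot[idx] = trie_new();
            if (!n->slot[idx])
                return -1;
        }
        n = n->slot[idx];
    }
    n->slot[key & 0xff] = value;
    return 0;
}

/* Look up key; the walk visits at most LEVELS nodes regardless of tree size. */
static void *trie_lookup(const struct trie_node *root, uint32_t key)
{
    const struct trie_node *n = root;
    assert(root != NULL);
    for (int level = 0; level < LEVELS - 1; level++) {
        n = n->slot[(key >> (24 - 8 * level)) & 0xff];
        if (!n)
            return NULL;
    }
    return n->slot[key & 0xff];
}
```

The cost of a lookup depends only on the key length (four steps here), not on how many checksums are stored, which is the property the slide relies on.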

Entries in Dedup Cache

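The index of a page in the swap area
The number of duplicate pages for a given checksum
A lock for consistency

typedef struct {
    swp_entry_t base;   /* index of the page in the swap area */
    atomic_t count;     /* number of duplicate pages with this checksum */
    spinlock_t lock;    /* lock for consistency */
} dedup_entry_t;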

Changes to Linux Kernel

Swap cache
  swp_entry_t -> page
  Avoids repeatedly swapping in the same page
  Needed when a swapped-out page is shared by multiple processes

Example
  Processes A and B share page P
  P is swapped out; the PTEs in A and B are updated
  A wants to access P
  B wants to access P

Will the dedup cache grow infinitely?

Swap counter for each swp_entry_t
  Number of references to it in memory
  counter++ when
    one more PTE contains the swp_entry_t
    it is in the swap cache
    it is in the dedup cache
  counter-- when a page is swapped in
  Remove the swp_entry_t from the dedup cache and the swap cache when counter = 2, i.e. when only the two caches still reference it
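The counting rule can be modelled with a few lines of user-space C. This is a sketch of the bookkeeping only; the struct `swap_slot`, its helper functions, and the choice to reset the counter to 0 after both caches drop their references are assumptions made for this example.

```c
#include <assert.h>

/* Toy model of the per-entry swap counter. */
struct swap_slot {
    int counter;          /* PTE references + swap cache + dedup cache */
    int in_swap_cache;
    int in_dedup_cache;
};

/* First swap-out: one PTE, the swap cache, and the dedup cache hold the entry. */
static void slot_first_swap_out(struct swap_slot *s)
{
    s->in_swap_cache = 1;
    s->in_dedup_cache = 1;
    s->counter = 3;
}

/* One more PTE now contains the swap entry. */
static void slot_add_pte(struct swap_slot *s)
{
    assert(s->counter >= 3);
    s->counter++;
}

/* A page is swapped in; returns 1 if the entry left both caches. */
static int slot_swap_in(struct swap_slot *s)
{
    assert(s->counter > 2);
    s->counter--;
    if (s->counter == 2) {        /* only the two caches still reference it */
        s->in_swap_cache = 0;
        s->in_dedup_cache = 0;
        s->counter = 0;           /* assumption: both cache references dropped */
        return 1;
    }
    return 0;
}
```

With two PTEs the counter starts at 4; the first swap-in brings it to 3, and the second brings it to 2, at which point the entry is removed from both caches, so the dedup cache does not grow without bound.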

Reference Counters

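[Figure: reference counts for a swap-area entry referenced by processes A and B, the swap cache, and the dedup cache; the counts shown are (4) and (2)]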

Changes to Swap Cache

The swap cache maintains the mapping from a swap entry to a page

We change it to map a swap entry to a list of pages with the same contents

Why do we need a list?

Possible Inconsistency

Swap out page P1 to swap entry E1
Swap out page P2, a duplicate of P1
  The mapping E1 -> P2 cannot be added to the swap cache

Swap in P1: the mapping is deleted
Swap in P2: Oops!

Swap cache: E1 -> P1

Our Solution

Swap out page P1 to swap entry E1
Swap out page P2, a duplicate of P1
  The mapping E1 -> P2 is added to the list

Swap in P1: only P1 is deleted
Swap in P2: delete E1 -> P2

Swap cache states: E1 -> P1, then E1 -> P1, P2, then E1 -> P2
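A minimal sketch of the list-based mapping, assuming pages are identified by small integers and each entry holds a bounded array instead of a kernel list. The names `swap_cache_entry`, `entry_add_page`, `entry_remove_page`, and the bound `MAX_DUPS` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DUPS 8   /* hypothetical bound on duplicates per swap entry */

/* One swap cache entry: a swap entry mapped to all pages sharing its contents. */
struct swap_cache_entry {
    int page_id[MAX_DUPS];
    size_t npages;
};

/* Swap-out: record that page_id is backed by this swap entry. */
static void entry_add_page(struct swap_cache_entry *e, int page_id)
{
    assert(e->npages < MAX_DUPS);
    e->page_id[e->npages] = page_id;
    e->npages++;
}

/* Swap-in: drop only page_id from the list; returns 1 if it was present. */
static int entry_remove_page(struct swap_cache_entry *e, int page_id)
{
    for (size_t i = 0; i < e->npages; i++) {
        if (e->page_id[i] == page_id) {
            e->npages--;
            e->page_id[i] = e->page_id[e->npages];  /* fill the gap */
            return 1;
        }
    }
    return 0;
}
```

Adding P1 and P2 to E1 and then removing P1 leaves E1 -> P2 in place, so the later swap-in of P2 still finds its mapping.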

Experimental Evaluation

We run our experiments on VMware with Linux 2.6.26

Our testing program: sequentially access an array
  Each element is of size 4 KB
  We change the percentage of duplicate pages in that array
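A test program of this shape might look like the sketch below. The slides do not show the actual program, so everything here is an assumption: duplicates are produced by giving pages identical contents, unique pages are made distinct by stamping their index into the first bytes, and the names `fill_pages` and `touch_pages` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096   /* each array element is one 4 KB page */

/* Fill npages pages so that dup_percent of them are identical.
 * Returns the number of duplicate pages written. */
static size_t fill_pages(unsigned char *buf, size_t npages, unsigned dup_percent)
{
    assert(dup_percent <= 100);
    size_t ndup = npages * dup_percent / 100;

    memset(buf, 0xAB, npages * PAGE_BYTES);          /* identical base contents */
    for (size_t i = ndup; i < npages; i++)
        memcpy(buf + i * PAGE_BYTES, &i, sizeof i);  /* stamp index: unique page */
    return ndup;
}

/* Sequentially read one byte per page. Under memory pressure each read
 * may force a swap-in, which is what the experiment times. */
static unsigned long touch_pages(const unsigned char *buf, size_t npages)
{
    const volatile unsigned char *p = buf;
    unsigned long sum = 0;
    for (size_t i = 0; i < npages; i++)
        sum += p[i * PAGE_BYTES];
    return sum;
}
```

To exercise the swapper, the array would need to be larger than the memory available to the process, and `touch_pages` would be timed for different values of `dup_percent`.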

All of the pages are duplicates

Deduplication significantly reduces the access time

No Duplicate Pages

However, deduplication also incurs a significant overhead

Overheads in Deduplication

Major overheads:
  Calculating checksums: 35 us
    Whenever a page is swapped in or swapped out, we calculate its checksum.
  Maintaining the reference counter
    The explicit locking it requires imposes significant overhead: an average of 65 us in our experiments

Conclusion

Deduplication is a double-edged sword in swappers
  When many duplicate pages are present, deduplication reduces the access time by orders of magnitude
  When few duplicate pages are present, the overhead is non-negligible