An Evaluation of Using Deduplication in Swappers
Weiyan Wang, Chen Zeng
Motivation
Deduplication detects duplicate pages in storage
  NetApp, Data Domain: a billion-dollar business
We explore another direction: use deduplication in swappers
Our experimental results indicate that using deduplication in swappers is beneficial
What is a swapper?
A mechanism to expand usable address spaces
  Swap out: move a page from memory to the swap area
  Swap in: move a page from the swap area back to memory
The swap area is on disk
[Figure: diagram with a PTE and page P1, labeled "Free P1" and "Used P1"]
Why is deduplication useful?
Writes to disk are slow
  Disk accesses are much slower than memory accesses
When duplicate pages exist:
  Do we really need to swap out all of them?
  If a duplicate page already appears in the swap area, we can save one I/O.
[Figure: diagram with pages labeled P1, P2, P3 and a single P1]
Architecture
Swap out a page
Compute checksum
Look up in the dedup cache
  YES: skip pageout
  NO: pageout, then add to the dedup cache
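The flow above can be sketched in user-space C. This is only an illustration under stated assumptions, not the authors' kernel code: the FNV-1a checksum stands in for the truncated SHA-1 the slides use, and the fixed-size table `struct dedup_cache`, `CACHE_SLOTS`, and `swap_out` are hypothetical names standing in for the real dedup cache.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE   4096
#define CACHE_SLOTS 1024   /* hypothetical capacity of the stand-in cache */

/* Stand-in checksum (32-bit FNV-1a); the slides use truncated SHA-1. */
static uint32_t page_checksum(const uint8_t *page)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical dedup cache: an open-addressing table of checksums. */
struct dedup_cache {
    uint32_t      keys[CACHE_SLOTS];
    bool          used[CACHE_SLOTS];
    unsigned long pageouts;          /* pages actually written to swap */
};

/* Returns true if the page had to be written, false if pageout was skipped. */
static bool swap_out(struct dedup_cache *c, const uint8_t *page)
{
    uint32_t sum  = page_checksum(page);
    size_t   slot = sum % CACHE_SLOTS;

    for (size_t probes = 0; probes < CACHE_SLOTS; probes++) {
        if (!c->used[slot])
            break;                   /* empty slot: checksum not cached */
        if (c->keys[slot] == sum)
            return false;            /* YES branch: skip pageout */
        slot = (slot + 1) % CACHE_SLOTS;
    }

    /* NO branch: page out, then remember the checksum if there is room. */
    c->pageouts++;
    if (!c->used[slot]) {
        c->used[slot] = true;
        c->keys[slot] = sum;
    }
    return true;
}
```

Swapping out the same contents twice reaches the YES branch the second time, so only one write is counted.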
Computing Checksum
SHA-1 checksum (160 bits)
  Collision probability of one in 2^80
Only use the first 32 bits (one in 2^16)
  Related to the implementation of the dedup cache
Only store the checksum
  We assume two pages are identical if their checksums are equal
  Trade consistency for performance
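A minimal sketch of the truncation step, assuming the 20-byte SHA-1 digest has already been computed elsewhere. The helper names `dedup_key` and `same_page` are hypothetical, and taking the first four bytes in big-endian order is an assumption; the slides only say "the first 32 bits".

```c
#include <stdint.h>

#define SHA1_DIGEST_BYTES 20   /* 160 bits */

/* Build the 32-bit key from the first four digest bytes.
 * Big-endian byte order is an assumption made for this sketch. */
static uint32_t dedup_key(const uint8_t digest[SHA1_DIGEST_BYTES])
{
    return ((uint32_t)digest[0] << 24) |
           ((uint32_t)digest[1] << 16) |
           ((uint32_t)digest[2] << 8)  |
            (uint32_t)digest[3];
}

/* Pages are treated as identical when their keys match.
 * There is no byte-by-byte comparison, so a key collision
 * makes two different pages look like duplicates. */
static int same_page(const uint8_t a[SHA1_DIGEST_BYTES],
                     const uint8_t b[SHA1_DIGEST_BYTES])
{
    return dedup_key(a) == dedup_key(b);
}
```

Two digests that agree in their first 32 bits and differ afterwards compare as equal here, which is the consistency being traded for performance.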
Dedup Cache
Dedup cache: a radix tree
  Checksum -> dedup_entry_t
  A trie with O(|key|) lookup and update overhead
  Well written in the kernel
Key in the radix tree is 32 bits
  We only keep the first 32 bits of a checksum as the key
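To illustrate the O(|key|) behavior, here is a user-space trie over 32-bit keys. It is a sketch, not the kernel's radix tree: the 8-bit fanout, `struct trie_node`, `trie_insert`, and `trie_lookup` are all choices made for this example.

```c
#include <stdint.h>
#include <stdlib.h>

#define FANOUT_BITS 8                     /* assumption: 8 key bits per level */
#define FANOUT      (1u << FANOUT_BITS)
#define LEVELS      (32 / FANOUT_BITS)    /* 4 levels for a 32-bit key */

struct trie_node {
    void *slot[FANOUT];  /* child nodes, or stored values at the last level */
};

/* Extract the key bits consumed at a given level, most significant first. */
static unsigned digit(uint32_t key, int level)
{
    int shift = 32 - FANOUT_BITS * (level + 1);
    return (key >> shift) & (FANOUT - 1);
}

/* Insert or overwrite; returns 0 on success, -1 if allocation fails. */
static int trie_insert(struct trie_node *root, uint32_t key, void *value)
{
    struct trie_node *n = root;
    for (int level = 0; level < LEVELS - 1; level++) {
        unsigned d = digit(key, level);
        if (!n->slot[d]) {
            n->slot[d] = calloc(1, sizeof(struct trie_node));
            if (!n->slot[d])
                return -1;
        }
        n = n->slot[d];
    }
    n->slot[digit(key, LEVELS - 1)] = value;
    return 0;
}

/* Walks exactly LEVELS steps regardless of how many keys are stored. */
static void *trie_lookup(const struct trie_node *root, uint32_t key)
{
    const struct trie_node *n = root;
    for (int level = 0; level < LEVELS - 1; level++) {
        n = n->slot[digit(key, level)];
        if (!n)
            return NULL;
    }
    return n->slot[digit(key, LEVELS - 1)];
}
```

The number of steps depends only on the key length, which is why fixing the key at 32 bits bounds the cost of every lookup and update.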
Entries in Dedup Cache
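The index of a page in the swap area
The number of duplicate pages for a given checksum
A lock for consistency

typedef struct {
    swp_entry_t base;   /* index of the page in the swap area */
    atomic_t count;     /* number of duplicate pages with this checksum */
    spinlock_t lock;    /* lock for consistency */
} dedup_entry_t;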
Changes to Linux Kernel
Swap cache
  swp_entry_t -> page
  Avoids repeatedly swapping in the same page
  Matters when a swapped-out page is shared by multiple processes
Example
  Processes A and B share the page P
  P is swapped out; the PTEs in A and B are updated
  A wants to access P
  B wants to access P
Will the dedup cache grow infinitely?
Swap counter for each swp_entry_t
  Number of references in memory
counter++ when
  One more PTE contains the swp_entry_t
  It is in the swap cache
  It is in the dedup cache
counter-- when a page is swapped in
Remove the swp_entry_t from the dedup cache and the swap cache when counter = 2 (only the two cache references remain)
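The counting rule can be sketched as a small single-threaded model. `struct swap_slot` and the three helpers are hypothetical names, and resetting the counter to 0 once both caches drop the entry is one reading of the slide, not something it states. The real kernel code would also need the locking mentioned under overheads.

```c
#include <stdbool.h>

/* Hypothetical model of one swap entry and its reference counter. */
struct swap_slot {
    int  counter;          /* PTE references + swap cache + dedup cache */
    bool in_swap_cache;
    bool in_dedup_cache;
};

/* First page-out: one PTE holds the entry, and both caches reference it. */
static void slot_first_pageout(struct swap_slot *s)
{
    s->in_swap_cache  = true;
    s->in_dedup_cache = true;
    s->counter        = 3;
}

/* One more PTE now contains this swap entry. */
static void slot_add_pte(struct swap_slot *s)
{
    s->counter++;
}

/* A process swaps the page back in. Returns true when the entry was
 * dropped from both caches because only the cache references were left. */
static bool slot_swap_in(struct swap_slot *s)
{
    s->counter--;
    if (s->counter == 2) {
        s->in_swap_cache  = false;
        s->in_dedup_cache = false;
        s->counter        = 0;   /* assumption: the entry is now free */
        return true;
    }
    return false;
}
```

With two PTEs the counter starts at 4; after both processes swap the page in it reaches 2 and the entry leaves both caches, so the dedup cache does not grow without bound.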
Reference Counters
[Figure: processes A and B, the swap cache, and the dedup cache referencing a page in the swap area; counter values (4) and (2)]
Changes to Swap Cache
The swap cache maintains the mapping from a swap entry to a page
We change that mapping to go from a swap entry to a list of pages with the same contents
Why do we need a list?
Possible Inconsistency
Swap out page P1 to swap entry E1
Swap out page P2, a duplicate of P1
  The mapping E1 -> P2 cannot be added to the swap cache
Swap in P1: the mapping is deleted
Swap in P2: Oops!
Swap Cache
E1 -> P1
Our Solution
Swap out page P1 to swap entry E1
Swap out page P2, a duplicate of P1
  The mapping E1 -> P2 is added to the list
Swap in P1: only P1 is deleted
Swap in P2: E1 -> P2 is deleted
Swap Cache (successive states)
  E1 -> P1
  E1 -> P1, P2
  E1 -> P2
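A minimal sketch of a swap cache entry that maps one swap entry to a list of pages. The fixed-size array, `MAX_PAGES`, and the helper names are assumptions made for illustration; the slides do not say how the list is implemented.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 8   /* hypothetical cap on duplicates per swap entry */

/* One swap cache entry: swap entry E -> list of pages with equal contents. */
struct swap_cache_entry {
    const void *pages[MAX_PAGES];
    size_t      npages;
};

/* Swap-out of a page (or of a duplicate): append it to the list. */
static bool entry_add(struct swap_cache_entry *e, const void *page)
{
    if (e->npages == MAX_PAGES)
        return false;
    e->pages[e->npages++] = page;
    return true;
}

/* Swap-in of one page: delete only that page from the list. */
static bool entry_remove(struct swap_cache_entry *e, const void *page)
{
    for (size_t i = 0; i < e->npages; i++) {
        if (e->pages[i] == page) {
            e->pages[i] = e->pages[--e->npages];  /* fill the hole */
            return true;
        }
    }
    return false;
}

/* The whole mapping can be dropped once no page is left. */
static bool entry_empty(const struct swap_cache_entry *e)
{
    return e->npages == 0;
}
```

Swapping in P1 removes only P1, so the later swap-in of P2 still finds its mapping.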
Experimental Evaluation
We run our experiments on VMware with Linux 2.6.26
Our testing program sequentially accesses an array
  Each element is 4 KB in size
  We vary the percentage of duplicate pages in that array
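A test program of this shape might look like the sketch below. It is an assumption about the setup, not the authors' program: `fill_array`, `touch_all`, the 0xAB fill byte, and the per-element tag are invented for illustration, and the timing code is omitted because the slides do not describe it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELEM_SIZE 4096   /* each array element is one 4 KB page */

/* Fill n elements so that the first dup_percent of them have identical
 * contents and every remaining element is unique. */
static void fill_array(uint8_t *buf, size_t n, unsigned dup_percent)
{
    size_t ndup = n * dup_percent / 100;

    for (size_t i = 0; i < n; i++) {
        uint8_t *elem = buf + i * ELEM_SIZE;
        memset(elem, 0xAB, ELEM_SIZE);
        if (i >= ndup) {
            /* Stamp a per-element tag so this page differs from all others. */
            uint64_t tag = (uint64_t)i + 1;
            memcpy(elem, &tag, sizeof tag);
        }
    }
}

/* Sequentially touch one byte per element, forcing each page to be
 * resident (and swapped in if needed) when memory is scarce. */
static uint64_t touch_all(const uint8_t *buf, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i * ELEM_SIZE + ELEM_SIZE - 1];
    return sum;
}
```

To exercise the swapper, the array would have to be larger than available memory, with `touch_all` timed for each setting of `dup_percent`.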
All of the pages are duplicates
Deduplication significantly reduces the access time
No duplicate pages
However, deduplication also incurs a significant overhead
Overheads in Deduplication
Major overheads:
Calculating checksums: 35 µs
  We always calculate the checksum when a page is swapped in or swapped out
Maintaining the reference counter
  The explicitly required locks impose a significant overhead: 65 µs on average in our experiments
Conclusion
Deduplication is a double-edged sword in swappers
  When many duplicate pages are present, deduplication reduces the access time by orders of magnitude
  When few duplicate pages are present, the overhead is non-negligible