Redundancy in Network Traffic: Findings and Implications
REDUNDANCY IN NETWORK TRAFFIC: FINDINGS AND IMPLICATIONS
Ashok Anand, Ramachandran Ramjee, Chitra Muthukrishnan, Aditya Akella
Microsoft Research Lab, India, and University of Wisconsin, Madison
2
Redundancy in network traffic
  Popular objects, partial content matches, headers
Redundancy elimination (RE) for improving network efficiency
  Application-layer object caching
    Web proxy caches
  Recent protocol independent RE approaches
    WAN optimizers, de-duplication, WAN backups, etc.
3
Protocol independent RE
  Message granularity: packet or object chunk
    Different RE systems operate at different granularities
  [Figure label: WAN link]
4
RE applications
  Enterprises and data centers
    Accelerate WAN performance
  As a primitive in network architecture
    Packet Caches [Sigcomm 2008]
    Ditto [Mobicom 2008]
5
Protocol independent RE in enterprises
  [Figure labels: Enterprises, WAN optimizer, ISP, WAN optimizer, Data centers]
  Globalized enterprise dilemma
    Centralized servers
      Simple management
      Hit on performance
    Distributed servers
      Direct requests to closest servers
      Complex management
  RE gives the benefits of both worlds
    Deployed in network middleboxes
    Accelerates WAN traffic while keeping management simple
  RE for accelerating WAN backup applications
6
Recent proposals for protocol independent RE
  [Figure labels: ISP, Enterprises, Web content, University]
  RE deployment on ISP access links to improve capacity
    Reduce load on ISP access links
    Improve effective capacity
  Packet caches [Sigcomm 2008]
    RE on all routers
  Ditto [Mobicom 2008]
    Use RE on nodes in wireless mesh networks to improve throughput
7
Understanding protocol independent RE systems
  Currently little insight into these RE systems
    How far are these RE techniques from optimal? Are there better schemes?
    When is network RE most effective?
    Do end-to-end RE approaches offer performance close to network RE?
    What fundamental redundancy patterns drive the design and bound the effectiveness?
  Important for effective design of current systems as well as future architectures, e.g., Ditto, packet caches
8
Large scale trace-driven study
  First comprehensive study
    Traces from multiple vantage points
    Focus on packet-level redundancy elimination
  Performance comparison of different RE algorithms
    Average bandwidth savings
    Bandwidth savings in peak and 95th percentile utilization
    Impact on burstiness
  Origins of redundancy
    Intra-user vs. inter-user
    Different protocols
  Patterns of redundancy
    Distribution of match lengths
    Hit distribution
    Temporal locality of matches
9
Data sets
  Enterprise packet traces (3 TB) with payload
    11 enterprises
      Small (10-50 IPs)
      Medium (50-100 IPs)
      Large (100+ IPs)
    2 weeks
    Protocol composition
      HTTP (20-55%); Spring et al. (64%)
      File sharing (25-70%)
        Centralization of servers
  UW Madison packet traces (1.6 TB) with payload
    10000 IPs; trace collected at campus border router
    Outgoing /24, web server traffic
    2 different periods of 2 days each
    Protocol composition
      Incoming: HTTP 60%
      Outgoing: HTTP 36%
10
Evaluation methodology
  Emulate memory-bound (500 MB - 4 GB) WAN optimizer
    Entire cache resides in DRAM (packet-level RE)
  Emulate only redundancy elimination
    WAN optimizers also perform other optimizations
  Deployment across both ends of access links
    Enterprise to data center
    All traffic from University to one ISP
  Replay packet trace
    Compute bandwidth savings as (saved bytes / total bytes)
    Includes packet headers in total bytes
    Includes overhead of shim headers used for encoding
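
The savings metric above can be sketched as follows. This is a minimal illustration, not the study's tooling: the per-packet tuple layout, the function name and the 12-byte shim size are assumptions made for the example.

```python
from typing import Iterable, Tuple

# Assumed per-match shim size; the slides do not give the real value.
SHIM_BYTES = 12


def bandwidth_savings(packets: Iterable[Tuple[int, int, int, int]],
                      shim_bytes: int = SHIM_BYTES) -> float:
    """Return saved bytes / total bytes over a replayed trace.

    Each packet is (header_bytes, payload_bytes, matched_bytes, num_matches):
    matched_bytes are payload bytes replaced by references to the packet
    store, and every match costs one shim header of shim_bytes.
    """
    total = 0
    saved = 0
    for header, payload, matched, num_matches in packets:
        total += header + payload                    # headers count toward the total
        saved += matched - num_matches * shim_bytes  # shim overhead reduces savings
    return saved / total if total else 0.0
```

For example, a trace with one unmatched 1500-byte packet and one packet in which a single 1400-byte region matched gives (1400 - 12) / 3000, or about 46% savings.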
11
Large scale trace-driven study
  Performance comparison of different RE algorithms
  Origins of redundancy
  Patterns of redundancy
    Distribution of match lengths
    Hit distribution
12
Redundancy elimination algorithms
  Redundancy suppression across different packets (uses history)
    MODP (Spring et al.)
    MAXP (new algorithm)
  Data compression only within packets (no history)
    GZIP and other variants
13
MODP
  Spring et al. [Sigcomm 2000]
  Compute fingerprints
    Slide a window over the packet payload and compute a Rabin fingerprint for each window
    Value sampling: sample those fingerprints whose value is 0 mod p
  Look up fingerprints in the fingerprint table
  [Figure labels: Packet payload, Window, Rabin fingerprinting, Fingerprint table, Packet store, Payload-1, Payload-2]
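
The MODP selection step can be sketched roughly as below. A Rabin-Karp style rolling polynomial hash stands in for true Rabin fingerprinting, and the constants BASE and MODULUS as well as the function names are assumptions made for this illustration.

```python
from typing import Iterator, List, Tuple

BASE = 257               # assumed polynomial base
MODULUS = (1 << 61) - 1  # assumed large prime modulus


def window_fingerprints(payload: bytes, w: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, fingerprint) for every w-byte window of the payload."""
    if len(payload) < w:
        return
    top = pow(BASE, w - 1, MODULUS)  # weight of the byte leaving the window
    fp = 0
    for b in payload[:w]:
        fp = (fp * BASE + b) % MODULUS
    yield 0, fp
    for i in range(w, len(payload)):
        # Drop the oldest byte, shift, and add the newest byte.
        fp = ((fp - payload[i - w] * top) * BASE + payload[i]) % MODULUS
        yield i - w + 1, fp


def modp_select(payload: bytes, w: int, p: int) -> List[Tuple[int, int]]:
    """Value sampling: keep only fingerprints whose value is 0 mod p."""
    return [(off, fp) for off, fp in window_fingerprints(payload, w) if fp % p == 0]
```

Because the fingerprint depends only on the window contents, the same byte region in two different packets yields the same sampled fingerprint, which is what makes the fingerprint table lookup work.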
14
MAXP
  Similar to MODP; only the selection criterion changes
  MODP
    Sample those fingerprints whose value is 0 mod p
    No fingerprint to represent the shaded region
  MAXP
    Choose fingerprints that are local maxima (or minima) over a p-byte region
    Gives uniform selection of fingerprints
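
A minimal sketch of local-maximum selection follows. It assumes "p bytes region" means p positions on either side of a candidate and uses a naive scan for clarity; the function name and that interpretation are assumptions, not details taken from the slides.

```python
from typing import List, Sequence


def maxp_select(values: Sequence[int], p: int) -> List[int]:
    """Return positions whose value is the strict maximum within p positions
    on each side (fewer at the edges of the sequence)."""
    selected = []
    n = len(values)
    for i in range(n):
        lo, hi = max(0, i - p), min(n, i + p + 1)
        v = values[i]
        if all(v > values[j] for j in range(lo, hi) if j != i):
            selected.append(i)
    return selected
```

For instance, `maxp_select([1, 5, 2, 3, 9, 4, 0, 7, 1], 2)` returns `[1, 4, 7]`. With strict maxima, two selected positions can never be within p of each other, so the chosen positions cannot cluster the way value-sampled fingerprints can.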
15
Optimal
  Approximate upper bound on optimal
    Store every fingerprint in a Bloom filter
    Identify a fingerprint match if the Bloom filter contains the fingerprint
  Low false positive rate for the Bloom filter: 0.1%
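
The Bloom filter bookkeeping can be illustrated with the sketch below. The class, the SHA-256 double hashing and the 8-byte fingerprint encoding are assumptions chosen for the example; the sizing helper uses the standard Bloom filter formulas, under which a 0.1% false positive rate needs about 14.4 bits per stored fingerprint and 10 hash functions.

```python
import hashlib
import math
from typing import Iterable, List, Tuple


def bloom_parameters(n: int, fp_rate: float) -> Tuple[int, int]:
    """Standard sizing: bits m and hash count k for n items at fp_rate."""
    m = math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes) -> List[int]:
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def count_matches(fingerprints: Iterable[int], bloom: BloomFilter) -> int:
    """Count fingerprints already present; insert the ones that are new.
    Fingerprints are assumed to be non-negative and to fit in 64 bits."""
    matches = 0
    for fp in fingerprints:
        key = fp.to_bytes(8, "big")
        if key in bloom:
            matches += 1
        else:
            bloom.add(key)
    return matches
```

A Bloom filter never misses a fingerprint it has stored, so the count can only overestimate matches, which fits the slide's framing of the result as an approximate upper bound.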
16
Comparison of MODP, MAXP and optimal
  MAXP outperforms MODP by 5-10% in most cases
    Uniform sampling approach of MAXP
    MODP loses due to non-uniform clustering of fingerprints
  New RE algorithm that performs better than classical MODP
[Figure: bandwidth savings (%) vs. fingerprint sampling period (p) for MODP, MAXP and Optimal]
17
[Figure: bandwidth savings (%) for the Small, Medium, Large, Univ/24 and Univ-out traces; series: GZIP, (10 ms)->GZIP, MAXP, MAXP->(10 ms)->GZIP]
Comparison of different RE algorithms
  GZIP offers 3-15% benefit
    (10 ms buffering) -> GZIP increases the benefit by up to 5%
  MAXP significantly outperforms GZIP, offering 15-60% bandwidth savings
    MAXP -> (10 ms) -> GZIP further enhances the benefit by up to 8%
  A combination of RE algorithms can be used to enhance the bandwidth savings
  ("->" means "followed by")
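
Why buffering helps a within-packet compressor can be seen in the small sketch below. It uses zlib (DEFLATE, the algorithm underlying GZIP) from the Python standard library; the function names and the idea of passing pre-grouped batches to represent 10 ms of buffered packets are assumptions made for the illustration.

```python
import zlib
from typing import Sequence


def per_packet_savings(payloads: Sequence[bytes]) -> float:
    """Compress every payload on its own: no history across packets."""
    original = sum(len(p) for p in payloads)
    compressed = sum(len(zlib.compress(p)) for p in payloads)
    return 1 - compressed / original


def buffered_savings(batches: Sequence[Sequence[bytes]]) -> float:
    """Compress each batch (e.g. the packets of one 10 ms window) as a unit,
    so repeats across packets inside the batch can also be removed."""
    original = sum(len(p) for batch in batches for p in batch)
    compressed = sum(len(zlib.compress(b"".join(batch))) for batch in batches)
    return 1 - compressed / original
```

When the same payload appears several times inside one batch, the buffered variant compresses the repeats away, while the per-packet variant pays for each copy separately.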
18
Large scale trace-driven study
  Performance study of different RE algorithms
  Origins of redundancy
  Patterns of redundancy
    Distribution of match lengths
    Match distribution
19
Origins of redundancy
  [Figure labels: Enterprise, Middlebox, Data Centers, Middlebox; Flow-1, Flow-2, Flow-3]
  Different users accessing the same content, or the same content being accessed repeatedly by the same user?
  Middlebox deployments can eliminate bytes shared across users
    How much sharing across users occurs in practice?
  INTER-USER: sharing across users
    (a) INTER-SRC (b) INTER-DEST (c) INTER-NODE
  INTRA-USER: redundancy within the same user
    (a) INTRA-FLOW (b) INTER-FLOW
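
One way to attribute a match to these categories is sketched below. The slides name the categories without defining them, so the mapping used here is an assumption: same 5-tuple is intra-flow, same IP pair but different ports is inter-flow, same source with a different destination is inter-dest, same destination with a different source is inter-src, and everything else is inter-node.

```python
from typing import NamedTuple


class PacketKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int


def classify_match(current: PacketKey, stored: PacketKey) -> str:
    """Label a match between the current packet and the stored packet it hit."""
    if current == stored:
        return "intraflow"          # same 5-tuple
    same_src = current.src_ip == stored.src_ip
    same_dst = current.dst_ip == stored.dst_ip
    if same_src and same_dst:
        return "interflow"          # same user pair, different flow
    if same_src:
        return "interdst"           # same source, different destination
    if same_dst:
        return "intersrc"           # same destination, different source
    return "internode"              # both endpoints differ
```

Summing matched bytes per label over a trace would give a breakdown of savings into intra-user (intraflow, interflow) and inter-user (intersrc, interdst, internode) shares.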
20
Study of composition of redundancy
  For Uout/24, 90% of savings is across destinations
  For Uin/Uout, 30-40% of savings is due to intra-user redundancy
  For enterprises, 75-90% of savings is due to intra-user redundancy
[Figure: contribution to savings (%) for the UOut/24, UOut, UIn, Large, Medium and Small traces, broken down into inter-user (intersrc, internode, interdst) and intra-user (interflow, intraflow) components]
21
Implication: End-to-end RE as a promising alternative
  [Figure labels: Enterprise, Middlebox, Data Centers, Middlebox]
  End-to-end RE as a compelling design choice
    Similar savings
    Deployment requires just a software upgrade
  Middleboxes are expensive
  Middleboxes may violate end-to-end semantics
22
Large scale trace-driven study
  Performance study of different RE algorithms
  End-to-end RE versus network RE
  Patterns of redundancy
    Distribution of match lengths
    Hit distribution
23
Match length analysis
  Do most of the savings come from full packet matches?
    If so, a simple technique of indexing full packets would be good enough
  For partial packet matches, what should the minimum window size be?
24
Match length analysis for enterprise
  70% of the matches are less than 150 bytes and contribute 20% of savings
  10% of the matches are full packet matches and contribute 50% of savings
  Need to index small chunks of size <= 150 bytes for maximum benefit
[Figure: match length distribution and contribution to savings (percentage) across bins of match length in bytes: <=150, 150-300, 300-450, 450-600, 600-750, 750-900, 900-1050, 1050-1200, 1200-1350, 1350-1500]
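
A breakdown of this kind could be computed along the following lines. This is a sketch under assumptions: match lengths are given in bytes (at least 1), the bins are the 150-byte bins shown on the axis, and the function name is made up for the example.

```python
from typing import List, Sequence, Tuple


def match_length_histogram(lengths: Sequence[int], bin_width: int = 150,
                           max_len: int = 1500) -> List[Tuple[float, float]]:
    """For each bin return (% of matches, % of saved bytes).

    Bin 0 holds lengths 1..150, bin 1 holds 151..300, and so on; anything
    longer than max_len is placed in the last bin.
    """
    nbins = max_len // bin_width
    counts = [0] * nbins
    saved = [0] * nbins
    for n in lengths:
        b = min((n - 1) // bin_width, nbins - 1)
        counts[b] += 1
        saved[b] += n          # a match of n bytes saves n bytes
    total_count = sum(counts)
    total_saved = sum(saved)
    if total_count == 0:
        return [(0.0, 0.0)] * nbins
    return [(100 * c / total_count, 100 * s / total_saved)
            for c, s in zip(counts, saved)]
```

With seven 100-byte matches and three 1400-byte matches, the first bin holds 70% of the matches but only about 14% of the saved bytes, the same kind of skew the slide reports.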
25
Hit distribution
  Contributors of redundancy
    Few pieces of content repeated multiple times
      Small packet store would be sufficient
    Many pieces of content repeated few times
      Large packet store needed
26
Zipf-like distribution for chunk matches
  Chunk ranking
    Unique chunk matches sorted by their hit counts
    Straight line shows the Zipfian distribution
  Similar to web page access frequency
  How much do popular chunks contribute to savings?
27
Savings due to hit distribution
80% of savings come from 20% of chunks
Need to index 80% of chunks for the remaining 20% of savings
Diminishing returns for cache size
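
The share of savings owed to the most popular chunks can be estimated with a sketch like this one. It assumes equal-sized chunks, so that savings are proportional to hit counts; the function name is hypothetical.

```python
from typing import Sequence


def top_chunk_share(hit_counts: Sequence[int], top_fraction: float) -> float:
    """Fraction of all hits that come from the top_fraction most-hit chunks."""
    ranked = sorted(hit_counts, reverse=True)
    k = max(1, round(top_fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)
```

For example, `top_chunk_share([80, 5, 5, 5, 5], 0.2)` returns `0.8`: one chunk out of five accounts for 80% of the hits.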
28
Savings vs. cache size
Small packet caches (250 MB) provide significant percentage of savings
Diminishing returns for increasing packet cache size after 250 MB
[Figure: savings (%) vs. cache size (MB), 0 to 1500 MB, for Small, Medium and Large enterprises]
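
The effect of cache size on savings can be explored with a toy replay like the one below. It is a simplification under stated assumptions: chunks match only when identical, the store evicts in FIFO order, and capacity is counted in chunk bytes; none of these details are taken from the slides.

```python
from collections import OrderedDict
from typing import Iterable


def savings_with_cache(chunks: Iterable[bytes], capacity_bytes: int) -> float:
    """Replay chunks through a FIFO store and return saved bytes / total bytes."""
    store: "OrderedDict[bytes, None]" = OrderedDict()
    used = saved = total = 0
    for chunk in chunks:
        total += len(chunk)
        if chunk in store:
            saved += len(chunk)       # hit: the chunk need not be sent again
            continue
        store[chunk] = None
        used += len(chunk)
        while used > capacity_bytes:  # evict oldest entries until it fits
            oldest, _ = store.popitem(last=False)
            used -= len(oldest)
    return saved / total if total else 0.0
```

Running the same chunk sequence with increasing `capacity_bytes` traces out a savings-versus-cache-size curve; once the popular chunks fit, extra capacity adds little.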
29
Conclusion
  First comprehensive study of protocol independent RE systems
  Key results
    15-60% savings using protocol independent RE
    A new RE algorithm, which performs 5-10% better than the Spring et al. approach
    Zipfian distribution of chunk hits; small caches are sufficient to extract most of the redundancy
    End-to-end RE solutions are promising alternatives to memory-bound WAN optimizers for enterprises
30
Thank you!
Questions?
31
Backup slides
32
Peak and 95th percentile savings
[Figure: savings (%) vs. time (seconds, logarithmic axis from 1 to 100000) for mean, median, 95th percentile and peak utilization]
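
One plausible way to compute savings in mean, median, 95th percentile and peak utilization is sketched below. It assumes per-interval byte counts before and after RE and defines the savings in a metric M as 1 - M(encoded) / M(original); this definition, the nearest-rank percentile and the function names are assumptions, not details stated on the slide.

```python
import math
import statistics
from typing import Dict, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def utilization_savings(original: Sequence[float],
                        encoded: Sequence[float]) -> Dict[str, float]:
    """Savings in each utilization metric over equal-length time intervals."""
    metrics = {
        "mean": statistics.mean,
        "median": statistics.median,
        "95%tile": lambda v: percentile(v, 95),
        "peak": max,
    }
    return {name: 1 - f(encoded) / f(original) for name, f in metrics.items()}
```

If the busiest intervals carry little redundant traffic, the peak and 95th percentile savings come out lower than the mean savings even when most intervals compress well.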
33
Effect on burstiness
  Wavelet-based multi-resolution analysis
    Energy plot: higher energy means more burstiness
    Compared with uniform compression
  Results
    Enterprise
      No reduction in burstiness
      Peak savings lower than average savings
    University
      Reduction in burstiness
      Positive correlation of link utilization with redundancy
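
A multi-resolution energy profile of a traffic series can be sketched with a Haar wavelet decomposition, as below. The slides do not say which wavelet was used, so Haar and the per-scale mean of squared detail coefficients are assumptions made for the illustration.

```python
import math
from typing import List, Sequence


def haar_energy(series: Sequence[float]) -> List[float]:
    """Return the mean squared Haar detail coefficient at each time scale,
    from the finest scale (adjacent samples) to the coarsest."""
    energies = []
    approx = list(series)
    while len(approx) >= 2:
        n = len(approx) - len(approx) % 2   # ignore a trailing odd sample
        detail = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in range(0, n, 2)]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in range(0, n, 2)]
        energies.append(sum(d * d for d in detail) / len(detail))
    return energies
```

A perfectly smooth series has zero energy at every scale. Uniform compression scales every sample by the same factor c and therefore scales every energy by c squared, which gives a baseline against which the energy profile of the RE-encoded series can be compared.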
34
Redundancy across protocols

Large enterprise
  Protocol        Volume (%)   Redundancy (%)
  HTTP            16.8         29.5
  SMB             45.46        21.4
  LDAP            4.85         44.33
  Src code ctrl   17.96        50.32

University
  Protocol        Volume (%)   Redundancy (%)
  HTTP            58           12.49
  DNS             0.22         21.39
  RTSP            3.38         2
  FTP             0.04         16.93