1 Schuehler
Oral Qualifying Examination
David V. Schuehler
• Papers reviewed:
– Packet Classification on Multiple Fields
  • Gupta and McKeown
– Scalable Packet Classification
  • Baboescu and Varghese
– What Packets May Come: Automata for Network Monitoring
  • Bhargavan, Chandra, McCann and Gunter
– Protocol Boosters
  • Feldmeier, McAuley, Smith, Bakin, Marcus and Raleigh
2 Schuehler
Services Provided by Packet Classifiers
• Packet Filtering
• Policy Routing
• Accounting & Billing
• Traffic Rate Limiting
• Traffic Shaping
3 Schuehler
Network Monitoring
• Troubleshoot problems
• Analyze performance
• Validate correctness of operations
• Data gathering
• Network tuning
4 Schuehler
Heterogeneous Internet
• Fiber Optic
• Copper
• Wireless
• Satellite
5 Schuehler
First Paper
• Packet Classification on Multiple Fields
– Pankaj Gupta and Nick McKeown
– Computer Systems Laboratory
– Stanford University
• Published in SIGCOMM 1999
– August, 1999
– Cambridge, MA
6 Schuehler
Challenge
• Develop a high performance packet classifier
• Exploit structure and redundancy found in existing classifier rule sets
7 Schuehler
Analysis of 793 Classifiers from 101 ISPs
• 41,505 total rules
• Small rule sets
– 99% contained < 1000 rules, mean of 50 rules
• Filter on maximum of 8 fields
– src/dst addr, src/dst port, TOS, protocol, flags
• Small number of protocols filtered
• 10% contain ranges
• 14% contain non-contiguous mask
– Ex. 137.98.217.0/8.22.160.80
• Duplication found in rule field specifications
• 8% of rules were redundant
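The non-contiguous mask in the example above can be checked mechanically. The following is a minimal sketch, assuming the usual value/mask semantics (an address matches when it agrees with the value on every bit set in the mask); the helper names are hypothetical and not taken from the paper.

```python
import ipaddress


def to_int(dotted: str) -> int:
    """Convert a dotted-quad string to a 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))


def is_contiguous(mask: int) -> bool:
    """True when the mask is a run of 1 bits followed only by 0 bits."""
    inverted = ~mask & 0xFFFFFFFF
    # A contiguous mask inverts to 0...01...1, so adding 1 clears every bit.
    return (inverted & (inverted + 1)) == 0


def matches(addr: str, value: str, mask: str) -> bool:
    """Assumed semantics: addr and value agree on every bit set in mask."""
    m = to_int(mask)
    return (to_int(addr) & m) == (to_int(value) & m)


print(is_contiguous(to_int("255.255.255.0")))   # ordinary prefix mask
print(is_contiguous(to_int("8.22.160.80")))     # mask from the slide
print(matches("9.2.128.1", "137.98.217.0", "8.22.160.80"))
```

A prefix-only lookup structure cannot express the second mask, which is one reason such rules complicate classifier design.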
8 Schuehler
Structure of Classifiers
• Small amount of rule intersection in existing classifiers
• For 1734 rules in 4 dimensions, found 4316 overlapping regions – worst case is 10^13
9 Schuehler
Recursive Flow Classification (RFC)
• Perform mapping from packet header fields to classification ID in multiple phases
• Each phase consists of multiple parallel lookups
• Each lookup is a reduction in bit length
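The phased mapping described above can be illustrated in miniature. This is a sketch under simplifying assumptions, not the paper's implementation: two 4-bit fields, a hypothetical three-rule set, one reduction phase per field, and a single final phase. Each phase-0 table reduces a field value to a shorter equivalence-class ID, and the final table maps the pair of IDs to the best matching rule.

```python
from itertools import product

# Hypothetical rule set: ((lo, hi) for field 0, (lo, hi) for field 1).
# Earlier rules have higher priority; fields are 4 bits wide.
RULES = [((0, 3), (5, 5)), ((2, 9), (0, 15)), ((0, 15), (4, 7))]
WIDTH = 4


def phase0(field):
    """Map every value of one field to an equivalence-class ID.

    Two values share a class when they match exactly the same rules.
    """
    table, class_ids = [], {}
    for value in range(1 << WIDTH):
        matched = frozenset(i for i, r in enumerate(RULES)
                            if r[field][0] <= value <= r[field][1])
        table.append(class_ids.setdefault(matched, len(class_ids)))
    return table, list(class_ids)  # list index equals class ID


def phase1(sets_a, sets_b):
    """Map each pair of class IDs to the highest-priority rule matching both."""
    return {(a, b): min(sa & sb, default=None)
            for (a, sa), (b, sb) in product(enumerate(sets_a),
                                            enumerate(sets_b))}


T0, SETS0 = phase0(0)
T1, SETS1 = phase0(1)
FINAL = phase1(SETS0, SETS1)


def classify(f0, f1):
    """Two independent phase-0 lookups, then one phase-1 lookup."""
    return FINAL[(T0[f0], T1[f1])]


print(classify(1, 5), len(SETS0), len(SETS1))
```

All rule matching happens at preprocessing time; classification itself is three table reads, which is the time side of the memory and time tradeoff.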
10 Schuehler
Packet Classification using RFC
11 Schuehler
RFC Performance Tuning
• Number of phases
– Time (# of lookups)
• Reduction tree selected
– Space (memory utilization)
• Tuning operation
– Select number of phases
– Combine chunks with most correlation
– Combine as many chunks as possible
• Tree A is optimal
12 Schuehler
Memory – Time Tradeoff
2 Phases: < 10 GBytes
3 Phases: < 2.5 MBytes
4 Phases: < 1.1 MBytes
13 Schuehler
Rule Preprocessing Time
14 Schuehler
Software Performance
• 333 MHz Pentium-II (Windows NT)
• Worst case time double that of average
• Average time for 100,000 classifications
15 Schuehler
Adjacency Groups
• Combine rules which contain differences in one dimension, but are otherwise identical
• Lose knowledge of which rule packet matched
• Additional preprocessing work required
• Reduces the total number of rules
• Handles 15,000 rules in 3.85 MB
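The merging step can be sketched as follows. This is an illustration under stated assumptions rather than the paper's procedure: each field is a set of accepted values, rules are merged only when they also share the same action (an assumption here), and rule priority is ignored. The rule set and the name `merge_adjacent` are hypothetical.

```python
def merge_adjacent(rules, dim):
    """Merge rules that agree on the action and on every field except `dim`."""
    groups = {}
    for fields, action in rules:
        key = (action, fields[:dim] + fields[dim + 1:])
        groups.setdefault(key, []).append(fields[dim])
    merged = []
    for (action, rest), specs in groups.items():
        combined = frozenset().union(*specs)  # OR of the differing field
        merged.append((rest[:dim] + (combined,) + rest[dim:], action))
    return merged


# Hypothetical rules: (protocol set, port set) plus an action.
RULES = [
    ((frozenset({"tcp"}), frozenset({80})), "permit"),
    ((frozenset({"tcp"}), frozenset({443})), "permit"),
    ((frozenset({"udp"}), frozenset({53})), "permit"),
    ((frozenset({"tcp"}), frozenset({22})), "deny"),
]

merged = merge_adjacent(RULES, dim=1)
print(len(RULES), "->", len(merged))
```

After merging, a packet to port 80 or 443 hits the same combined rule, so the classifier can no longer report which original rule matched.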
16 Schuehler
Summary
• Exploit structure & redundancy in rules
• Recursive Flow Classification (RFC)
– 1 million packets/sec in S/W
– 30 million packets/sec in H/W
• Supports < 6000 rules, < 15,000 with Adj Grp
• Utilizing knowledge of rule set to reduce complexity
• Combine rules (adjacency groups) to reduce the number of chunk equivalence classes
• Hardware performance optimistic
• Problems with small number of phases and large rule sets
17 Schuehler
Second Paper
• Scalable Packet Classification
– Florin Baboescu & George Varghese
– Dept. of Computer Science & Engineering
– University of California, San Diego
• Published in SIGCOMM 2001
– August, 2001
– San Diego, CA
18 Schuehler
Challenge
• Develop a high performance packet classifier that supports large rule sets (100,000 rules)
• Exploit structure and redundancy found in existing classifier rule sets
• Extend Bell Labs/Lucent Bit Vector search algorithm
19 Schuehler
Lucent Bit Vector
• Point location in multi-dimensional space
• Parallel lookups for each dimension
• Bit vector generated for each field (dimension)
• Take intersection of result vectors
• Search is linear with respect to number of rules
• Scales to 10,000 rules
20 Schuehler
Lucent Bit Vector (continued)
Max 2n+1 intervals for n rules
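The bit vector scheme and the interval bound can be sketched together. This is a minimal illustration, assuming inclusive integer ranges and a hypothetical three-rule, two-field rule set. Each field is cut into elementary intervals at the range endpoints (at most 2n+1 of them), a bit vector of matching rules is stored per interval, and a lookup ANDs one vector per field.

```python
from bisect import bisect_right

# Hypothetical rule set: one inclusive (lo, hi) range per field.
# Rule 0 has the highest priority.
RULES = [((10, 20), (0, 5)), ((15, 30), (3, 9)), ((0, 40), (0, 9))]


def build_dimension(field):
    """Project rules onto one field; precompute a bit vector per interval."""
    bounds = sorted({edge for r in RULES
                     for edge in (r[field][0], r[field][1] + 1)})
    # n rules give at most 2n boundaries, hence at most 2n + 1 intervals.
    probes = [bounds[0] - 1] + bounds  # one representative value per interval
    vectors = [sum(1 << i for i, r in enumerate(RULES)
                   if r[field][0] <= p <= r[field][1]) for p in probes]
    return bounds, vectors


DIMS = [build_dimension(f) for f in range(2)]


def classify(packet):
    """AND the per-field bit vectors; the lowest set bit is the best rule."""
    result = (1 << len(RULES)) - 1
    for value, (bounds, vectors) in zip(packet, DIMS):
        result &= vectors[bisect_right(bounds, value)]
    if result == 0:
        return None
    return (result & -result).bit_length() - 1


print(classify((16, 4)), classify((25, 4)), classify((50, 4)))
```

Python integers hide the cost, but each AND touches one bit per rule, which is the linear term the slide refers to.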
21 Schuehler
Aggregate Bit Vector
• Rule Aggregation
– Bit vectors are large (scale with # of rules)
– Bit vectors are sparsely populated
– Packets match at most 4 rules
– Large rule sets created by combining smaller disjoint rule sets
• Rule Rearrangement
– Rearrange rules to improve aggregation
– Reduce false matches
– Must compute lowest cost for all matches
22 Schuehler
Aggregation Example
23 Schuehler
Rearrangement Example
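• Aggregation size = 2
• Packet from source X to destination Y

Before Rearrangement: 30 false matches
(each pair of adjacent rules sets the aggregate bit in both fields, but no single rule in the pair matches both)
Rule  Field1  Field2
F1    X       A1
F2    A1      Y
F3    X       A2
F4    A2      Y
F5    X       A3
F6    A3      Y
F7    X       A4
…     …       …
F60   A30     Y
F61   X       Y

After Rearrangement: no false matches
(rules with the same Field1 value are grouped, so the aggregates intersect only where a rule really matches)
Rule  Field1  Field2
F1    X       A1
F2    X       A2
F3    X       A3
…     …       …
F30   X       A30
F31   X       Y
F32   A1      Y
F33   A2      Y
…     …       …
F61   A30     Y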
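The false match counts in this example can be reproduced with a short script. This is a sketch, assuming exact-match fields instead of prefixes and a false match defined as an aggregate group whose bit is set in both fields while no rule in the group matches both. Function names are hypothetical.

```python
def bit_vector(rules, field, value):
    """One bit per rule, set when the rule's field equals the packet value."""
    return [r[field] == value for r in rules]


def aggregate(bits, size):
    """One aggregate bit per group of `size` rule bits (OR of the group)."""
    return [any(bits[i:i + size]) for i in range(0, len(bits), size)]


def false_matches(rules, packet, size=2):
    """Count groups flagged by both aggregates that hold no real match."""
    v1 = bit_vector(rules, 0, packet[0])
    v2 = bit_vector(rules, 1, packet[1])
    count = 0
    for g, (x, y) in enumerate(zip(aggregate(v1, size), aggregate(v2, size))):
        if x and y:
            chunk = range(g * size, min((g + 1) * size, len(rules)))
            if not any(v1[i] and v2[i] for i in chunk):
                count += 1
    return count


names = [f"A{i}" for i in range(1, 31)]
# X rules and Y rules alternate, with (X, Y) last.
interleaved = [r for a in names for r in (("X", a), (a, "Y"))] + [("X", "Y")]
# All X rules first, then (X, Y), then the Y rules.
grouped = [("X", a) for a in names] + [("X", "Y")] + [(a, "Y") for a in names]

print(false_matches(interleaved, ("X", "Y")))
print(false_matches(grouped, ("X", "Y")))
```

The interleaved order yields the 30 false matches quoted on the slide and the grouped order yields none, since only the group containing the (X, Y) rule is flagged in both fields.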
24 Schuehler
Results
Worst case memory access for 4 databases with 5 fields (A=32)
Improvement: 27% - 54% unsorted, 40% - 75% sorted
25 Schuehler
Synthetic Database Results
26 Schuehler
Multiple Levels of Aggregation
• Comparison of one & two levels of aggregation
• Zero length prefixes are injected
• 60% improvement for large rule set
• Number of memory accesses required
27 Schuehler
Summary
• Add aggregation & rearrangement to Lucent Bit Vector algorithm
• Order of magnitude faster than BV scheme
• Suitable for large rule sets (100,000 rules)
• Multiple levels of aggregation reduce memory operations for large databases
• Wide memory widths improve efficiency
28 Schuehler
Third Paper
• What Packets May Come: Automata for Network Monitoring
– Karthikeyan Bhargavan & Carl A. Gunter
– University of Pennsylvania
– Satish Chandra & Peter J. McCann
– Bell Laboratories
• Published in POPL 2001
– Principles of Programming Languages
– January, 2001
– London, UK
29 Schuehler
Challenge
• Formulate an external network protocol monitor as a language recognition problem
• Given a language specification of input & output sequences, develop a second specification that corresponds to the sequences observed externally
30 Schuehler
Complications
• Observed traffic could differ from traffic observed by target
• Protocol specifications are often vague
• Implementations of protocols vary
• Observed language could be significantly different from language that target device processes
31 Schuehler
Basic Monitor
Figure labels: iq, oq (tokens observed at monitor M); id, od (tokens observed at target S)
• Sequence at M iqa iqb iqc iqe oqd
• Sequence at S ida idb odd
32 Schuehler
Admissibility
Given string at S: i1 i2 o1 i3 o2 i4 i5
Queue sizes: input = 3, output = 2
A: iq1 id1 iq2 id2 od1 oq1 iq3 id3 od2 oq2 iq4 id4 iq5 id5
B: iq1 iq2 id1 id2 od1 oq1 iq3 id3 od2 oq2 iq4 id4 iq5 id5
C: iq1 iq2 iq3 id1 id2 od1 oq1 id3 od2 oq2 iq4 id4 iq5 id5
D: iq1 iq2 iq3 id1 id2 od1 id3 od2 oq1 oq2 iq4 id4 iq5 id5
E: iq1 iq2 iq3 id1 id2 od1 iq4 iq5 id3 od2 oq1 oq2 id4 id5
F: iq1 iq2 iq3 iq4 id1 id2 od1 iq5 id3 od2 oq1 oq2 id4 id5
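A checker for traces like these can be sketched as follows. The queue model is an assumption, since the slide does not spell it out: iq adds to the input queue, id removes from it, od adds to the output queue, oq removes from it, and a trace is rejected if either queue exceeds its size or if the id/od tokens do not spell the given string at S. Whether a boundary case counts as overflow depends on that model.

```python
def admissible(trace, s_string, in_size, out_size):
    """Check queue occupancy bounds and the projection of the trace onto S."""
    in_q = out_q = 0
    seen_at_s = []
    for token in trace:
        kind, label = token[:2], token[2:]
        if kind == "iq":        # input arrives, seen by the monitor
            in_q += 1
        elif kind == "id":      # input delivered to S
            in_q -= 1
            seen_at_s.append("i" + label)
        elif kind == "od":      # output produced by S
            out_q += 1
            seen_at_s.append("o" + label)
        elif kind == "oq":      # output leaves, seen by the monitor
            out_q -= 1
        if not (0 <= in_q <= in_size and 0 <= out_q <= out_size):
            return False
    return seen_at_s == s_string


S = "i1 i2 o1 i3 o2 i4 i5".split()
A = "iq1 id1 iq2 id2 od1 oq1 iq3 id3 od2 oq2 iq4 id4 iq5 id5".split()
E = "iq1 iq2 iq3 id1 id2 od1 iq4 iq5 id3 od2 oq1 oq2 id4 id5".split()

print(admissible(A, S, in_size=3, out_size=2))
print(admissible(E, S, in_size=3, out_size=2))
```

The sketch does not verify FIFO ordering inside the queues; it only tracks counts.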
33 Schuehler
Elimination of Output Buffer
• CU is the maximum number of input symbols without an intervening output symbol
• M(S, m, n) => M(S, m+CU*n, 0)
• Example: m = 2, n = 2, CU = 2
iq1 iq2 od1 id1 iq3 id2 iq4 od2 id3 iq5 id4 iq6 oq1 oq2
Move iq and oq tokens as far left as possible:
iq1 iq2 iq3 iq4 iq5 iq6 od1 oq1 id1 id2 od2 oq2 id3 id4
Maximum input buffer size = 6 (2 + 2 * 2)
34 Schuehler
Dealing with Packet Loss
• CL: the number of dropped tokens between two id tokens must be less than CL
• LM(S,m,n) => LM(S,m+CU*CL*n,0)
• Example iq1 il1 iq2 iq3 id2 od1 il3 oq1
• Tokens at M iq1 iq2 iq3 oq1
• Tokens at S id2 od1
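The two views in this example are plain projections of the full event trace. A minimal sketch, assuming the token prefixes iq/oq are visible at M, id/od at S, and il (lost input) at neither; `max_drop_run` is a hypothetical helper that counts the longest run of il tokens between id tokens.

```python
def view(trace, kinds):
    """Keep only the tokens whose two-letter prefix is in `kinds`."""
    return [t for t in trace if t[:2] in kinds]


def max_drop_run(trace):
    """Longest run of lost inputs (il) with no delivered input (id) between."""
    longest = run = 0
    for t in trace:
        if t[:2] == "il":
            run += 1
            longest = max(longest, run)
        elif t[:2] == "id":
            run = 0
    return longest


trace = "iq1 il1 iq2 iq3 id2 od1 il3 oq1".split()
at_m = view(trace, {"iq", "oq"})
at_s = view(trace, {"id", "od"})
print(at_m)   # ['iq1', 'iq2', 'iq3', 'oq1']
print(at_s)   # ['id2', 'od1']
print(max_drop_run(trace))
```

The monitor sees three inputs while the target sees only one, which is why the loss bound CL enters the buffer size.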
35 Schuehler
Brute Force Search
• g is a function that checks S on a sequence of tokens and indicates whether it is in LS
• F(g,T) is a function that tells us whether trace T corresponds to proper execution with respect to S
• Construct all possible token sequences at S based on tokens observed at M
• Iterate through each sequence checking for an admissible string
• If found, observed string is in LS
• Otherwise, failure
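The search loop above can be sketched in a simplified form. Assumptions made here, none of which come from the paper: no loss, tokens are written i1, o1 and so on, and the buffer limits are collapsed into a single hypothetical `max_lag` (output k was produced at S after at least seen[k] - max_lag and at most seen[k] inputs, where seen[k] is the number of inputs M had observed when it saw output k). The spec `g` is a made-up example.

```python
def candidates(trace_m, max_lag):
    """Yield every S-side string consistent with the tokens observed at M."""
    inputs = [t for t in trace_m if t.startswith("i")]
    outputs = [t for t in trace_m if t.startswith("o")]
    seen, count = [], 0
    for token in trace_m:
        if token.startswith("i"):
            count += 1
        else:
            seen.append(count)

    def place(k, prev):
        # Choose how many inputs precede output k, keeping outputs in order.
        if k == len(outputs):
            yield []
            return
        for n_in in range(max(prev, seen[k] - max_lag), seen[k] + 1):
            for rest in place(k + 1, n_in):
                yield [n_in] + rest

    for placement in place(0, 0):
        string, k = [], 0
        for n_in in range(len(inputs) + 1):
            while k < len(outputs) and placement[k] == n_in:
                string.append(outputs[k])
                k += 1
            if n_in < len(inputs):
                string.append(inputs[n_in])
        yield string


def g(string):
    """Hypothetical spec: every output follows exactly two inputs."""
    pending = 0
    for token in string:
        if token.startswith("i"):
            pending += 1
        else:
            if pending != 2:
                return False
            pending = 0
    return True


def F(g, trace_m, max_lag):
    """True when some candidate string at S is accepted by g."""
    return any(g(s) for s in candidates(trace_m, max_lag))


print(F(g, "i1 i2 i3 o1 i4 o2".split(), max_lag=2))
print(F(g, "i1 o1 i2".split(), max_lag=2))
```

The number of candidates grows quickly with trace length and lag, which motivates the property-based optimizations.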
36 Schuehler
No Data Loss Optimizations (CL= 1)
• P1: Counting Properties
– Every output must consume between cmin & cmax inputs
• P2: Independent Inputs and Outputs
– Validate input and output sequences separately
• P3: Periodic Outputs
– Output is produced every P inputs
• P4: Deterministic Placement of Outputs
– One position for output after sequence of inputs
• P5: Contiguously Enabled Commutative Outputs
– Output is valid for a contiguous range of inputs
• P6: Output-checkpointed Automata
– For each output, there is at most one next state
• P7: Finite State Machines
– If g is FSM, BFS has polynomial bound in # of states & size of buffers (|T| * B^2)
37 Schuehler
Optimizations with Data Loss
• P1*: Counting Properties
– Buffer limit becomes m + cmax * CL * (n + 1)
• P2o: Independent Output Properties
– Same as no loss case
• P8: Insert-closed Commutative Outputs
– If string is accepted, so is string with arbitrary inputs added
• P7*: Finite State Machines
– Still bounded, but must consider 2^B lossy substrings
• P9: Deterministic Stateless Transducers
– Stateless automata where all inputs are distinct
• P10: Output-checkpointed Stateful Transducers
– Unique state after consuming odx
• P6*: Output-checkpointed Automata
– Check maximum of 2^(B+CU*CL) strings against g at output
38 Schuehler
Complexities
P1: Counting (P6, P7)
P2: Independent In & Outputs (P5)
P2o: Independent Outputs (P2, P8)
P3: Periodic Outputs (P4)
P4: Deterministic Placement (P5)
P5: Commutative Outputs (ALL)
P6: Checkpointed Automata (ALL)
P7: Finite State Machines (ALL)
P8: Commutative Outputs (P5)
P9: Finite State Machines (P7, P10)
P10: Stateless Transducers (P4, P6)
(implies)
39 Schuehler
Monitoring TCP
• Property 1 describes counting property
– Monitors ACKs generated for at least every other message
• Property 2 describes independent inputs & outputs
– Monitors non-decreasing sequence numbers
• Property 3 describes periodic outputs (no loss)
– Monitors ACKs generated for contiguously received set of segments
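The first two properties can be expressed as small checkers. This is a sketch with a hypothetical event encoding ("seg" for a received segment, "ack" for a generated ACK) and plain integer sequence numbers; it ignores 32-bit sequence number wraparound, which a real TCP monitor would have to handle.

```python
def acks_every_other(events, cmax=2):
    """Counting property: never more than `cmax` segments without an ACK."""
    pending = 0
    for event in events:
        if event == "seg":
            pending += 1
            if pending > cmax:
                return False
        elif event == "ack":
            pending = 0
    return True


def non_decreasing(seq_numbers):
    """Independent check on one direction: sequence numbers never go back."""
    return all(a <= b for a, b in zip(seq_numbers, seq_numbers[1:]))


print(acks_every_other(["seg", "seg", "ack", "seg", "ack"]))
print(acks_every_other(["seg", "seg", "seg", "ack"]))
print(non_decreasing([100, 100, 1560, 3020]))
```

Because the second check looks at one stream only, it needs no knowledge of how inputs and outputs interleave at the endpoint.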
40 Schuehler
Summary
• External monitor developed as a language recognition problem
• Problem unbounded with respect to space & time
• Properties defined to limit complexity
• Impressive goal to attempt monitoring of complex protocols with finite automata
• Disappointed at TCP monitoring examples
• Does not account for loss of output events
• Monitor should be placed close to endpoint
41 Schuehler
Fourth Paper
• Protocol Boosters
– D.C. Feldmeier, A.J. McAuley, J.M. Smith, D.S. Bakin, W.S. Marcus, T.M. Raleigh
– Bellcore and University of Pennsylvania
• Published in IEEE JSAC
– Journal on Selected Areas in Communications
– April, 1998
42 Schuehler
Challenge
• Develop a new methodology for protocol design
• Support localized customization in heterogeneous networks
• Provide for rapid protocol evolution
43 Schuehler
Current Limitations with IP Internet
• Protocols evolve slowly with respect to advances in networking technology
– IPv6
– Multicast
– Short duration connections (HTTP)
• Sacrifice efficiency in order to support a large heterogeneous network
– Satellite communication
– Wi-Fi wireless Ethernet
– ATM
44 Schuehler
Protocol Booster
• Software or hardware module that transparently improves protocol performance
45 Schuehler
One-Element Protocol Boosters
• UDP checksum generation
– Generate UDP checksum within network
• TCP ACK compression
– Compress multiple ACKs on slow link
• TCP congestion control
– Generate duplicate ACKs to reduce window size
• TCP ARQ booster
– Caches packets and performs retransmission
ARQ (automatic repeat request)
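As an illustration of the first booster, the checksum it would have to generate can be computed as below. This is a sketch of the standard IPv4 UDP checksum (16-bit one's complement sum over a pseudo-header plus the UDP segment), not code from the paper; the function names and the sample addresses and ports are hypothetical.

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum, padding odd-length data with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum(src: str, dst: str, segment: bytes) -> int:
    """Checksum for a UDP header + payload whose checksum field is zero."""
    pseudo = (ipaddress.IPv4Address(src).packed
              + ipaddress.IPv4Address(dst).packed
              + struct.pack("!BBH", 0, 17, len(segment)))  # 17 = UDP
    checksum = ~ones_complement_sum(pseudo + segment) & 0xFFFF
    return checksum or 0xFFFF               # 0 on the wire means "no checksum"


payload = b"hello"
header = struct.pack("!HHHH", 1234, 53, 8 + len(payload), 0)
print(hex(udp_checksum("192.0.2.1", "192.0.2.2", header + payload)))
```

A booster inserted in the network would fill this value in for senders that transmit UDP with the checksum disabled, leaving the end systems unchanged.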
46 Schuehler
Two-Element Protocol Boosters
• Forward error correction coding
– Add parity and correction bits
– Regenerates missing data
• Jitter elimination for real-time communication
– Match packet arrival rate at other end
– Eliminates jitter by increasing latency
• TCP Selective ARQ
– Cache packets, add sequence numbers
– Generate NACK for missing packet
– Retransmit packet on receipt of NACK
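The pairing of two elements can be illustrated with a toy forward error correction booster. This sketch swaps in a single XOR parity packet, which is simpler than whatever code the paper's booster used, and assumes equal-length packets and at most one loss per group. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(packets):
    """Sending element: append one XOR parity packet to the group."""
    return packets + [reduce(xor_bytes, packets)]


def recover(received):
    """Receiving element: rebuild at most one lost packet (None), drop parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity packet cannot repair multiple losses")
    if missing:
        present = [p for p in received if p is not None]
        received = list(received)
        received[missing[0]] = reduce(xor_bytes, present)
    return received[:-1]


group = [b"abcd", b"efgh", b"ijkl"]
sent = add_parity(group)
damaged = sent.copy()
damaged[1] = None                      # simulate one packet lost in transit
print(recover(damaged) == group)
```

The end systems see only the original packets; the parity packet exists solely between the two booster elements.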
47 Schuehler
Fast Evolution
• No standards body
• Developed by small team
• Contained insertion into network
• Free market supports competition and collaboration
• Proprietary boosters offer competitive advantage
48 Schuehler
Targeted Improvements
• Quick fix applied to individual network segments
• Rapid deployment
• Isolated boosters
• Targeted trouble spots
• Doesn’t affect other areas of the network
49 Schuehler
Comparisons to Other Approaches
• Link Layer Adaptation
– Only operates at link layer
• Protocol Conversion
– Conversion changes message syntax
• Protocol Termination
– Loses end-to-end properties
• Special Purpose End-to-End Protocols
– Cannot account for changes in network
50 Schuehler
Example Implementation
• Protocol boosters added to Linux & NetBSD systems
• Forward error correction booster implemented
• UDP data traffic
• Random and bursty error models used
• Booster successfully reduced effective packet loss
51 Schuehler
Summary
• Targeted improvements
• Help solve problems with heterogeneous Internet
• Boosters can be nested
• Booster should be invisible to end systems
• Placement important
• Rapid development & deployment
• Two element boosters need to be paired
52 Schuehler
Topics Covered
• Packet Classification
– Examine data set for structure
– Develop targeted solutions
– Optimize for lookups
• Network Monitoring
– Automata for validating aspects of a protocol’s behavior based on monitored traffic
• Protocol Boosters
– Improve IP protocol performance through a heterogeneous network
53 Schuehler
Final Thoughts
• An enhanced packet classifier can be considered a one-element protocol booster
• Both packet classification papers take a divide-and-conquer approach, performing multiple lookups in parallel
• Classifiers could be combined with a protocol booster to determine which packets to process
• An automata-based monitor could validate properties of a protocol booster
54 Schuehler
Questions