Packet Classification # 3
Ozgur Ozturk
CSE 581: Internet Technology
Winter 2002
Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02
Introduction
Importance
  Identify the context of packets
  Apply necessary actions
  Differentiated services
Memory and time efficiency
  Must handle thousands of rules
  Must run at wire speed (no queuing)
Paper List
T. Lakshman, D. Stiliadis, "High-Speed Policy-based Packet Forwarding Using Efficient Multi-dimensional Range Matching" [Bit-Parallelism]
  http://www.bell-labs.com/user/stiliadi/filter/paper.html
F. Baboescu, G. Varghese, "Scalable Packet Classification" [ABV: Aggregated Bit Vector]
M. Buddhikot, S. Suri, M. Waldvogel, "Space Decomposition Techniques for Fast Layer-4 Switching" [Space Decomposition]
V. Srinivasan, G. Varghese, S. Suri, M. Waldvogel, "Fast and Scalable Layer Four Switching" [Paper4]
Bit-Parallelism Paper: Introduction
Presents packet classification schemes that
  are traffic-independent and use worst-case performance as the metric
  handle a few thousand rules at rates of millions of packets per second
  use range matches on more than 4 packet header fields
Bit-Parallelism Paper
Requirement for Real-Time Operation
Traditional router architectures
  Flow-cache architectures to classify packets
  Packets of identified flows are expected to arrive in the near future
Current backbone routers
  Number of active flows is extremely high: OC-3 links, 256K flows
  Caches implemented as hash tables, which scale well to that size
Bit-Parallelism Paper
Requirement for Real-Time Operation 2 - Hash-Table Problems
A good hash function is non-trivial
  100 to 200 bits of header must be randomly distributed over no more than 20 to 24 bits of hash index
  Header value distribution is unknown
Performance of cache-based schemes is heavily traffic dependent
Malicious users can exploit the limitations of hashing algorithms and caching techniques
Packet queuing delays are acceptable after classification
Bit-Parallelism Paper Packet Classification Constraints
Scale to large routers with Gigabit links
Process at wire speed
  75% of packets are smaller than the typical TCP packet size (552 bytes)
  Nearly half are 40 to 44 bytes (TCP ACK)
Rules on several fields, specifying ranges, exact matches and prefixes
  Two prefix fields in some cases
Allow arbitrary priorities for policies, to distinguish between multiple matches
Optimize for lookups, sacrifice update performance
lookup rate/update rate 107.
Bit-Parallelism Paper Packet Classification Constraints-2
Memory access time is the dominant factor in worst-case lookup execution time
Amenable to hardware implementation
Time vs. space
Bit-Parallelism Paper General Packet Classification
Decomposable search to perform multi-dimensional search for packet filtering
  A k-dimensional query is decomposed into a set of 1-dimensional queries on 1-dimensional intervals
  Exploit parallelism where possible
  Seek a poly-logarithmic solution
Packet header fields correspond to the k dimensions
Filters correspond to overlapping regions in the k-dimensional space
Bit-Parallelism Paper Efficiency of Proposed Algorithms
1st Algorithm
  Memory: k*n^2 + O(n) bits per dimension
  Time: log(2n)+1
  Memory accesses: n/w
2nd Algorithm
  Memory reduced to O(n log n) bits
  Time increases by a constant
  Can be optimized for a given time and memory budget
  Exploits on-chip memory in a traffic-independent manner to speed up the worst case
Notation
Rule r_m in k dimensions: r_m = (e_{1,m}, e_{2,m}, ..., e_{k,m}), where each e_{i,m} is a range
Bit-Parallelism Paper Algorithm demo on 2-D/Preprocessing 1
Bit-Parallelism Paper Algorithm demo on 2-D/Preprocessing 2
Max 2n+1 intervals for n rules
Bit-Parallelism Paper Algorithm demo on 2-D/Preprocessing 3
Sets of rules formed corresponding to each region
Bit-Parallelism Paper Algorithm demo on 2-D/Online 1
Packet P1 = (x*, y*) to be classified
  Find the intervals that x* and y* belong to
    Binary search: log(2n+1)+1 comparisons per dimension
  Create the intersection of all the sets
    Conjunction of the corresponding bit vectors
  Take the highest-priority entry in the resultant bit vector
Bit-Parallelism Paper Algorithm demo on 2-D/Online 2
Maximum set cardinality = O(n)
Intersection step examines all rules at least once
  Time complexity = O(n)
With bit-level parallelism
  The bitmaps representing the sets are stored in a (2n+1)*n array B_j[i, 1..n] (the set R_{i,j} is stored for each dimension j)
  k*n/w memory accesses
Different processing elements for each dimension in a hardware implementation
  Prototype
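The preprocessing and online steps of the demo can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes fields are non-negative integers, rules are lists of inclusive (lo, hi) ranges given highest priority first, and the names `preprocess` and `classify` are made up for this example. Python integers stand in for the n-bit bitmaps, so the word-by-word AND (n/w memory accesses) is hidden inside a single `&`.

```python
from bisect import bisect_right

def preprocess(rules, dim):
    """Build the intervals and per-interval bitmaps for one dimension.

    Returns (starts, bitmaps): starts[i] is the left endpoint of interval i,
    and bit m of bitmaps[i] is set when rule m covers that interval.
    """
    points = {0}
    for rule in rules:
        lo, hi = rule[dim]
        points.add(lo)
        points.add(hi + 1)          # first value past the end of the range
    starts = sorted(points)         # at most 2n+1 interval start points
    bitmaps = []
    for s in starts:
        bits = 0
        for m, rule in enumerate(rules):
            lo, hi = rule[dim]
            if lo <= s <= hi:
                bits |= 1 << m
        bitmaps.append(bits)
    return starts, bitmaps

def classify(packet, tables):
    """Return the index of the highest-priority matching rule, or None."""
    result = -1                     # all ones: every rule is still a candidate
    for value, (starts, bitmaps) in zip(packet, tables):
        i = bisect_right(starts, value) - 1   # binary search for the interval
        result &= bitmaps[i]                  # conjunction of bit vectors
    if result == 0:
        return None
    return (result & -result).bit_length() - 1   # lowest set bit

# Three 2-D rules (x range, y range), highest priority first.
rules = [((0, 3), (2, 5)), ((2, 7), (0, 3)), ((0, 7), (0, 7))]
tables = [preprocess(rules, d) for d in range(2)]
print(classify((2, 3), tables), classify((5, 1), tables), classify((9, 9), tables))
```

In hardware, each dimension's binary search would run on its own processing element, and only the final AND combines the results.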
Bit-Parallelism Paper- Algorithm 2 Packet Class. based on Inc. Reads
Algorithm utilizes incremental reads to reduce required memory
Allows time-space optimization and increases localization for off-chip SDRAM and wide on-chip memory implementations
Consider a specific dimension j
  Assume a maximum of 2n+1 non-overlapping intervals
  Each interval corresponds to an n-bit bitmap, with the positions of the 1s indicating the filter rules that overlap this interval
  Bitmaps of adjacent intervals differ in only one bit
  A single bitmap and 2n pointers of size log n to the differing bits can be used to reconstruct any bitmap
Bit-Parallelism Paper- Algorithm 2 Packet Class. based on Inc. Reads 2
Reduces space requirement from O(n^2) to O(n log n)
Further generalization
  (2n+1)/l bitmaps instead of 1
  (2n+1)/2l pointers needed
  Choose l by need
With l = 2n+1
  Memory reduced to O(n log n)
  Memory accesses increase from n/w to 2n log n / w
Trade-off decision according to on-chip/off-chip memory ratio
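The incremental-read idea can be illustrated as follows. This is a sketch under the slide's assumption that adjacent bitmaps differ in exactly one bit; `compress`, `reconstruct`, the parameter `l`, and the sample bitmaps are hypothetical. For simplicity it always walks forward from the nearest stored bitmap at or before the target interval.

```python
def compress(bitmaps, l):
    """Keep one full bitmap every l intervals plus one bit index per boundary."""
    flips = []
    for prev, cur in zip(bitmaps, bitmaps[1:]):
        diff = prev ^ cur
        # Assumption from the slide: adjacent bitmaps differ in exactly one bit.
        assert diff != 0 and diff & (diff - 1) == 0
        flips.append(diff.bit_length() - 1)   # a "pointer" of log n bits
    anchors = bitmaps[::l]
    return anchors, flips

def reconstruct(anchors, flips, l, i):
    """Rebuild the bitmap of interval i from the nearest stored bitmap."""
    a = i // l
    bits = anchors[a]
    for j in range(a * l, i):       # at most l-1 incremental reads
        bits ^= 1 << flips[j]
    return bits

# One dimension, three rules with ranges [2,5], [4,9], [7,8]: 2n+1 = 7 intervals.
bitmaps = [0b000, 0b001, 0b011, 0b010, 0b110, 0b010, 0b000]
anchors, flips = compress(bitmaps, l=3)
print(anchors, flips)
print(all(reconstruct(anchors, flips, 3, i) == b for i, b in enumerate(bitmaps)))
```

A smaller `l` stores more full bitmaps and needs fewer pointer reads per lookup; a larger `l` does the opposite, which is the time-space trade-off mentioned above.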
Bit-Parallelism Paper- Algorithm 2 Special Case: 2-D Classification
Necessary for best-effort traffic aggregation in the Internet backbone
Determine next hop and resource allocations based on destination and source addresses only
Longest prefix match lookups
  Restrict source prefix ranges to powers of 2 in order to reduce space
  Space requirement O(n) with trie implementation
Virtual intervals
  Map intervals of prefix lengths to both dimensions, sorted by length
  "Virtual intervals" allow a worst-case lookup time of O(ls + log n), where ls is the number of possible prefix lengths
Multicast group identification requires only two additional memory accesses
Bit-Parallelism Paper- Algorithm 2 Conclusions
Packet classification, or filtering, is a useful primitive in connectionless networks to provide differentiated service and policy-based routing
  More recently, security and active processing
Two multi-dimensional range matching algorithms allow millions of packets per second to be processed on a set of thousands of filter rules
  Robust and predictable worst-case performance
Efficient 2-D algorithm for backbone routers with hundreds of thousands of routing entries
Algorithms demonstrate that there may be no need to restrict filtering to edge routers
Paper4
Layer Four Switching
A traditional router performs lookups based on the destination address
Layer four switching provides increased flexibility: it gives a router the capability to distinguish between kinds of traffic and treat them differently
  Block traffic from a dangerous site
  Provide QoS for certain traffic
  Give preferential treatment to certain traffic (say, a database flow)
Difficulties
  Needs layer four header information, which may not always be available
  Any modification of the layer four header may cause problems
  Do not know how to get header information when it is encrypted
Some variants of L4S
  Firewall
  Reservation protocols such as RSVP
  Routing based on traffic type, say web traffic
Paper4
The Best Matching Filter Problem
A packet P has k distinct header fields for lookup: H[1], …, H[k]
The filter database of a Layer 4 router consists of a finite set of filters F1, F2, …, FN; each filter Fi has an associated directive act_i
Match: each field of P matches the corresponding field of F
Cost: used to determine an unambiguous match (say, the order of the filters)
An address range can always be converted into a sequence of prefixes, so prefix matching can be used

A filter database

Dest  Src  DP   SP   Flags
M     *    25   *    *
M     *    53   *    UDP
M     S    53   *    *
M     *    23   *    *
T1    T0   123  123  UDP
*     Net  *    *    *
Net   *    *    *    TCP-ACK
*     *    *    *    *

A packet example: (M, S, UDP, 53, 125)
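As a baseline, the best matching filter can be found by a linear scan of the database on this slide. The sketch below assumes that cost is simply the position of the filter in the list and that the symbolic values (M, S, T0, T1, Net) are compared as exact tokens, with `*` as a wildcard; real prefix containment (for example, whether M lies inside Net) is not modelled. The function name `best_match` and the field order (Dest, Src, DP, SP, Flags) are choices made for this example.

```python
# Filters F1..F8 as (Dest, Src, DP, SP, Flags); "*" matches anything.
FILTERS = [
    ("M",   "*",   "25",  "*",   "*"),
    ("M",   "*",   "53",  "*",   "UDP"),
    ("M",   "S",   "53",  "*",   "*"),
    ("M",   "*",   "23",  "*",   "*"),
    ("T1",  "T0",  "123", "123", "UDP"),
    ("*",   "Net", "*",   "*",   "*"),
    ("Net", "*",   "*",   "*",   "TCP-ACK"),
    ("*",   "*",   "*",   "*",   "*"),
]

def best_match(packet):
    """Return the 1-based number of the first (least-cost) matching filter."""
    for number, rule in enumerate(FILTERS, start=1):
        if all(p == "*" or p == v for p, v in zip(rule, packet)):
            return number
    return None

# The slide's packet (M, S, UDP, 53, 125), reordered to (Dest, Src, DP, SP, Flags).
print(best_match(("M", "S", "53", "125", "UDP")))
```

The scan takes O(N) time per packet, which is the cost the trie-based and cross-producting schemes on the following slides try to avoid.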
Paper4
Set Pruning Trees (1)
• Build a trie on the destination prefixes in the database
• Each valid prefix in the destination trie points to a trie containing some source prefixes.
• A single filter may fit under multiple destination prefixes, and is then copied into multiple source tries.
• Memory space: O(N^2)
• Time complexity: O(N)
Set Pruning Trees (2)
Filter  Destination  Source
F1      0*           10*
F2      0*           01*
F3      0*           1*
F4      00*          1*
F5      00*          11*
F6      10*          1*
F7      *            00*

[Figure: Dest-Trie and Src-Tries for the filters above]
E.g.: Looking for: (001, 001)
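A small sketch of set-pruning trees for filters F1 to F7 may make the copying concrete. It is an illustration only: tries are nested dictionaries keyed by "0"/"1", cost is taken to be the filter number, and the helper names (`walk`, `build`, `lookup`) are hypothetical.

```python
# (destination prefix, source prefix) for F1..F7; "" stands for "*".
FILTERS = [("0", "10"), ("0", "01"), ("0", "1"), ("00", "1"),
           ("00", "11"), ("10", "1"), ("", "00")]

def walk(trie, key):
    """Yield every node on the path that key follows from the root."""
    node = trie
    yield node
    for bit in key:
        node = node.get(bit)
        if node is None:
            return
        yield node

def build(filters):
    dest_trie = {}
    for d in {d for d, _ in filters}:
        node = dest_trie
        for bit in d:
            node = node.setdefault(bit, {})
        src_trie = {}
        # Copy in every filter whose destination is a prefix of d.
        for cost, (fd, fs) in enumerate(filters, start=1):
            if d.startswith(fd):
                snode = src_trie
                for bit in fs:
                    snode = snode.setdefault(bit, {})
                snode["best"] = min(snode.get("best", cost), cost)
        node["src"] = src_trie
    return dest_trie

def lookup(dest_trie, dest, src):
    src_trie = None
    for node in walk(dest_trie, dest):
        if "src" in node:
            src_trie = node["src"]      # longest valid destination prefix
    if src_trie is None:
        return None
    costs = [n["best"] for n in walk(src_trie, src) if "best" in n]
    return min(costs) if costs else None

trie = build(FILTERS)
print(lookup(trie, "001", "001"))   # the slide's example packet
```

For (001, 001) the longest destination match is 00*, and its source trie already holds a copy of F7, so one source-trie search is enough.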
Avoid the Memory Blowup (1)
Avoid the copying by having each destination prefix D point to a source trie that stores only the filters whose destination field is exactly D
When searching, may need to go back to the destination trie multiple times
Time complexity: O(W^2)
Space complexity: O(NW)
Avoid the Memory Blowup (2)
Filters F1 to F7 as in Set Pruning Trees (2)
[Figure: Dest-Trie and Src-Tries; each filter is stored only in the source trie of its own destination prefix]
E.g.: Looking for: (001, 001)
Memory requirement = O(NW)
Lookup worst case = O(W^2)
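The backtracking search can be sketched as below, again with nested dictionaries as tries, cost equal to the filter number, and hypothetical helper names. Each filter is stored exactly once, and the lookup searches the source trie of every destination prefix that matches the packet.

```python
# (destination prefix, source prefix) for F1..F7; "" stands for "*".
FILTERS = [("0", "10"), ("0", "01"), ("0", "1"), ("00", "1"),
           ("00", "11"), ("10", "1"), ("", "00")]

def walk(trie, key):
    """Yield every node on the path that key follows from the root."""
    node = trie
    yield node
    for bit in key:
        node = node.get(bit)
        if node is None:
            return
        yield node

def build(filters):
    dest_trie = {}
    for cost, (d, s) in enumerate(filters, start=1):
        node = dest_trie
        for bit in d:
            node = node.setdefault(bit, {})
        snode = node.setdefault("src", {})   # source trie for exactly this prefix
        for bit in s:
            snode = snode.setdefault(bit, {})
        snode["best"] = min(snode.get("best", cost), cost)
    return dest_trie

def lookup(dest_trie, dest, src):
    best = None
    for dnode in walk(dest_trie, dest):          # up to W destination prefixes
        if "src" not in dnode:
            continue
        for snode in walk(dnode["src"], src):    # up to W steps in each
            cost = snode.get("best")
            if cost is not None and (best is None or cost < best):
                best = cost
    return best

trie = build(FILTERS)
print(lookup(trie, "001", "001"))
```

For (001, 001) the source tries of 00* and 0* yield nothing, and the match F7 is only found in the source trie of *, which shows why the worst case grows to O(W^2).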
Improving Search Time: Basic Grid-of-Tries (1)
Basic idea
  Use pre-computation and switch pointers (in the lower-level tries) to speed up the search in a later source trie based on the search in an earlier source trie (remember the previous search result)
Role of the switch pointer
  Allows us to increase the length of the matching source prefix without having to restart at the root of the next ancestor source trie
Stored filter: node (D,S) stores the least-cost filter whose dest field is a prefix of D and src field is a prefix of S
Time complexity: 2W
Space complexity: O(NW)
Improving Search Time: Basic Grid-of-Tries (2)
Filters F1 to F7 as in Set Pruning Trees (2)
[Figure: Dest-Trie and Src-Tries with switch pointers added; nodes x and y are marked]
E.g.: Looking for: (001, 001)
Further Improvement & Extension
Use some faster scheme for destination address matching
  Time complexity from O(W) to O(log W)
Use multi-bit tries for source address matching
  Time complexity from O(W) to O(W/k)
Extend grid-of-tries to handle protocol and port fields
  3 GOT copies, for TCP, UDP and OTHER respectively
  4 hash tables for the 4 port combinations: both unspecified, destination only, source only, both specified
Cross-Producting (1)
How-to
  Slice the filter database into columns, the i-th column storing all distinct prefixes in field i
  Make a cross-product table of all k columns
  Pre-compute the least-cost filter that matches each cross-product entry
  When a packet comes in, do best prefix matching for each field separately
  With the matching results, find the corresponding entry in the cross-product table
Discussion
  Very fast (for matching)
  Problem: memory explosion, N^k
  Solution: on-demand cross-producting
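The how-to steps can be sketched on a small two-field database. The example reuses the binary destination/source prefixes of filters F1 to F7 from the trie slides rather than the five-field database, takes cost to be the filter number, and uses hypothetical names (`best_prefix`, `build_table`, `lookup`). The whole table is precomputed here; on-demand cross-producting would instead fill entries lazily.

```python
from itertools import product

# (destination prefix, source prefix) for F1..F7; "" stands for "*".
FILTERS = [("0", "10"), ("0", "01"), ("0", "1"), ("00", "1"),
           ("00", "11"), ("10", "1"), ("", "00")]

def best_prefix(column, key):
    """Longest prefix in the column that matches key, or None."""
    matches = [p for p in column if key.startswith(p)]
    return max(matches, key=len) if matches else None

def build_table(filters):
    # One column per field, holding the distinct prefixes of that field.
    columns = [sorted({f[i] for f in filters}) for i in range(len(filters[0]))]
    table = {}
    for entry in product(*columns):
        for cost, f in enumerate(filters, start=1):
            # A filter matches the entry when each of its prefixes is a
            # prefix of the corresponding entry component.
            if all(e.startswith(p) for e, p in zip(entry, f)):
                table[entry] = cost      # first hit is the least-cost filter
                break
    return columns, table

def lookup(columns, table, packet):
    entry = tuple(best_prefix(c, v) for c, v in zip(columns, packet))
    return table.get(entry)

columns, table = build_table(FILTERS)
print(len(columns[0]) * len(columns[1]), lookup(columns, table, ("001", "001")))
```

With k fields of up to N distinct prefixes each, the product has up to N^k entries, which is the memory explosion noted in the discussion.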
Cross-Producting (2)
DestPrefix      M, T1, Net, Default
SrcPrefix       S, T0, Net, Default
DestPortPrefix  25, 53, 23, 123, Default
SrcPortPrefix   123, Default
FlagsPrefixes   UDP, TCP-ACK, Default

Num  CrossProduct                                      Matching Filter
1    M, S, 25, 123, UDP                                F1
2    M, S, 25, 123, TCP-ACK                            F1
3    M, S, 25, 123, default                            F1
4    M, S, 25, default, UDP                            F1
5    M, S, 25, default, TCP-ACK                        F1
6    M, S, 25, default, default                        F1
…    …                                                 …
479  default, default, default, default, TCP-ACK       F8
480  default, default, default, default, default       F8

E.g. Looking for: (M,S,UDP,25,57)
Conclusions
Grid-of-tries (GOT) solution
  Scalable (linear) storage and fast lookups for destination-source filters
  More general filters: high lookup cost
Cross-producting solution
  Higher variance, but faster on average (for lookup), because of the need for caching
Hybrid scheme combines flexibility with efficiency
ABV: "Scalable Packet Classification", F. Baboescu, G. Varghese
GOAL
  Packet classification that is
    scalable (in rules, up to 100,000)
    wire speed
Past Work
  Linear time search
  Linear amount of TCAMs
  Lucent scheme: worst case doesn't scale
SOLUTION
Aggregated Bit Vector
  Improvement on the Lucent bit vector
  Rule aggregation
  Rule rearrangement
Rule Aggregation
  Bit vectors are sparse, i.e., few rules match
  Some compression scheme
SOLUTION continued
Rule Rearrangement
  Overlap is rare
  Place rules with common values together
  Sort out rule ordering later
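Rule aggregation can be illustrated with a single level of aggregation. In this sketch each n-bit vector gets a summary with one bit per chunk of `a` bits, and full chunks are fetched only where the ANDed summaries are non-zero. The names `aggregate` and `abv_match`, the chunk size, and the sample vectors are assumptions for illustration; summaries would normally be precomputed rather than built per lookup.

```python
def aggregate(bits, n, a):
    """One summary bit per a-bit chunk: set when the chunk is non-zero."""
    mask = (1 << a) - 1
    agg = 0
    for j in range((n + a - 1) // a):
        if (bits >> (j * a)) & mask:
            agg |= 1 << j
    return agg

def abv_match(vectors, n, a):
    """Return (lowest rule set in every vector or None, chunk positions read)."""
    mask = (1 << a) - 1
    summary = -1
    for v in vectors:
        summary &= aggregate(v, n, a)
    chunks_read = 0
    for j in range((n + a - 1) // a):
        if not (summary >> j) & 1:
            continue                 # some vector is all zero here: skip it
        chunks_read += 1
        chunk = mask
        for v in vectors:
            chunk &= (v >> (j * a)) & mask
        if chunk:                    # otherwise it was a false match
            return j * a + (chunk & -chunk).bit_length() - 1, chunks_read
    return None, chunks_read

# 16 rules, chunks of 4 bits; the two fields match rules {1, 9, 14} and {2, 9, 15}.
v1 = (1 << 1) | (1 << 9) | (1 << 14)
v2 = (1 << 2) | (1 << 9) | (1 << 15)
print(abv_match([v1, v2], n=16, a=4))
```

Chunk 0 is a false match here: both summaries have its bit set, yet no single rule matches both fields. Placing rules with common values together, as rule rearrangement does, makes such cases less likely.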
Comparing ABV w/ BV of Lucent
Results
At least an order of magnitude faster than BV
Scales well for memory accesses
Paper # 3
"Space Decomposition Techniques for Fast Layer-4 Switching", M. Buddhikot, S. Suri, M. Waldvogel
New scheme, based on space decomposition, whose search time is comparable to the best existing schemes, but which also offers fast worst-case filter update time
Three key ideas
  Innovative data structure based on quadtrees for a hierarchical representation of the recursively decomposed search space
  Fractional cascading and precomputation to improve packet classification time
  Prefix partitioning to improve update time
Space Decomposition Evaluation
Depending on the actual requirements of the system this algorithm is deployed in, a single parameter can be used to trade off search time for update time
Amenable to fast software and hardware implementation
For N two-dimensional filters specified using prefixes of up to W bits in length, the Area-based Quadtree (AQT) data structure requires O(N) space, O(αW) search time, and O(α N^(1/α)) update time, where α is the tunable parameter
Both the average and worst-case search times and memory consumption are comparable to or better than other schemes known in the literature