Routing Lookups and Packet Classification: Theory and Practice
Transcript of Routing Lookups and Packet Classification: Theory and Practice
Pankaj Gupta, Department of Computer Science, Stanford University
[email protected]
http://www.stanford.edu/~pankaj
August 18, 2000
Hot Interconnects 8: High Performance Switching and Routing
2
Tutorial Outline
• Introduction: what this tutorial is about
• Routing lookups: background, lookup schemes
• Packet classification: background, classification schemes
• Implementation choices for given design requirements
3
Request to you
• Please ask lots of questions! But I may not be able to answer all of them right now
• I am here to learn, so please share your experiences, thoughts and opinions freely
4
What is this tutorial about?
5
Internet: Mesh of Routers
The Internet Core
Edge Router
Campus Area Network
6
RFC 1812: Requirements for IPv4 Routers
• Must perform an IP datagram forwarding decision (called forwarding)
• Must send the datagram out the appropriate interface (called switching)
Optionally: a router MAY choose to perform special processing on incoming packets
7
Examples of special processing
• Filtering packets for security reasons
• Delivering packets according to a pre-agreed delay guarantee
• Treating high-priority packets preferentially
• Maintaining statistics on the number of packets sent by various routers
8
Special Processing Requires Identification of Flows
• All packets of a flow obey a pre-defined rule and are processed similarly by the router
• E.g. a flow = (src-IP-address, dst-IP-address), or a flow = (dst-IP-prefix, protocol) etc.
• Router needs to identify the flow of every incoming packet and then perform appropriate special processing
9
Flow-aware vs Flow-unaware Routers
• Flow-aware router: keeps track of flows and performs similar processing on packets in a flow
• Flow-unaware router (packet-by-packet router): treats each incoming packet individually
10
What this tutorial is about:
• Algorithms and techniques that an IP router uses to decide where to forward the packets next (routing lookup)
• Algorithms and techniques that a flow-aware router uses to classify packets into flows (packet classification)
11
Routing Lookups
12
Routing Lookups: Outline
• Background and problem definition
• Lookup schemes
• Comparative evaluation
13
Lookup in an IP Router
Unicast destination address based lookup
Dstn Addr → [Forwarding Table: Dstn-prefix → Next Hop] → Next Hop
Next Hop Computation
Forwarding Engine
Incoming Packet
HEADER
14
Packet-by-packet Router
ForwardingDecision
Forwarding
Table
Interconnect
Linecard
Linecard
Linecard
Linecard
Routing processor
ForwardingDecision
Forwarding
Table
15
Switching
Routing Control
Datapath:per-packet processing
Routing lookup
Packet-by-packet Router: Basic Architectural
Components
Scheduling
16
ATM and MPLS Switches: Direct Lookup
(Port, vci/label)
Address
Memory
Data
(Port, vci/label)
17
IPv4 Addresses
• 32-bit addresses
• Dotted-quad notation: e.g. 12.33.32.1
• Can be represented as integers on the IP number line [0, 2^32 - 1]: a.b.c.d denotes the integer a*2^24 + b*2^16 + c*2^8 + d
IP Number Line: 0.0.0.0 … 255.255.255.255
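This mapping onto the IP number line can be sketched in a few lines of Python (an illustrative helper, not from the slides):

```python
def ip_to_int(dotted):
    """Map a dotted-quad IPv4 address to its point on the IP number line."""
    a, b, c, d = (int(x) for x in dotted.split("."))
    # a*2^24 + b*2^16 + c*2^8 + d, written with shifts
    return (a << 24) | (b << 16) | (c << 8) | d
```

For example, `ip_to_int("0.0.0.0")` is 0 and `ip_to_int("255.255.255.255")` is 2^32 - 1, the two ends of the number line.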
18
Class-based Addressing
[Number line: class boundaries at 0.0.0.0, 128.0.0.0, 192.0.0.0, …]
Class | Range | MS bits | netid | hostid
A | 0.0.0.0 - 127.255.255.255 | 0 | bits 1-7 | bits 8-31
B | 128.0.0.0 - 191.255.255.255 | 10 | bits 2-15 | bits 16-31
C | 192.0.0.0 - 223.255.255.255 | 110 | bits 3-23 | bits 24-31
D (multicast) | 224.0.0.0 - 239.255.255.255 | 1110 | - | -
E (reserved) | 240.0.0.0 - 255.255.255.255 | 1111 | - | -
19
Lookups with Class-based Addresses
Exact match on the netid (class A, B, or C): e.g. netid 23 → Port 1, netid 186.21 → Port 2, netid 192.33.32 → Port 3; the packet 192.33.32.1 matches the class C entry 192.33.32 in the netid → port# table
20
Problems with Class-based Addressing
• Fixed netid-hostid boundaries too inflexible: rapid depletion of address space
• Exponential growth in size of routing tables
21
Exponential Growth in Routing Table Sizes
[Chart: number of BGP routes advertised over time]
22
Classless Addressing (and CIDR)
• Eliminated class boundaries
• Introduced the notion of a variable-length prefix, between 0 and 32 bits long
• Prefixes represented by P/l: e.g., 122/8, 212.128/13, 34.43.32/22, 10.32.32.2/32 etc.
• An l-bit prefix represents an aggregation of 2^(32-l) IP addresses
23
CIDR: Hierarchical Route Aggregation
Backbone
Router
R1R2
R3R4
ISP, P ISP, Q192.2.0/22 200.11.0/22
Site, S
192.2.1/24
Site, T
192.2.2/24 192.2.0/22 200.11.0/22
192.2.1/24 192.2.2/24
192.2.0/22, R2
Backbone routing table
IP Number Line
24
Size of the Routing Table
Source: http://www.telstra.net/ops/bgptable.html
[Chart: number of active BGP prefixes vs. date]
25
Classless Addressing
A BC
0.0.0.0
Class-based:
255.255.255.255
Classless:
0.0.0.0255.255.255.255
23/8 191/8
191.23/16191.128.192/18
191.23.14/23
26
Backbone
Router
R1
R2R3
ISP, P
192.2.0/22, R2
Backbone routing table
Non-aggregatable Prefixes:
(1) Multi-homed Networks
192.2.0/22192.2.2/24
R4
192.2.2/24, R3
27
Backbone
Router
R1R2 R3 R4
ISP, P ISP, Q192.2.0/22 200.11.0/22
Site, S
192.2.1/24
Site, T
192.2.2/24 192.2.0/22 200.11.0/22
192.2.1/24 192.2.2/24
Non-aggregatable Prefixes:
(2) Change of Provider
Backbone routing table
192.2.0/22, R2
192.2.2/24, R3
IP Number Line
28
Routing Lookups with CIDR
192.2.0/22, R2
192.2.2/24, R3 192.2.0/22 200.11.0/22
192.2.2/24
200.11.0/22, R4
200.11.0.33192.2.0.1 192.2.2.100
Find the most specific route, or the longest matching prefix among all the prefixes matching the destination address of an incoming packet
29
Longest Prefix Match is Harder than Exact Match
• The destination address of an arriving packet does not carry with it the information to determine the length of the longest matching prefix
• Hence, one needs to search among the space of all prefix lengths, as well as the space of all prefixes of a given length
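A brute-force illustration of this two-dimensional search in Python: scan the lengths from most to least specific, and within each length do an exact match among all prefixes of that length. The 5-bit example table is the one used later in the tutorial; the dict-based layout is an assumption for illustration.

```python
def longest_prefix_match(table, addr, w):
    """table: dict mapping (prefix_value, prefix_length) -> next hop.
    Probe every length, longest first; within one length the dict gives
    an exact match among all prefixes of that length."""
    for plen in range(w, -1, -1):
        entry = table.get((addr >> (w - plen), plen))
        if entry is not None:
            return entry
    return None

# 5-bit example table: P1=111*, P2=10*, P3=1010*, P4=10101
table = {(0b111, 3): "H1", (0b10, 2): "H2",
         (0b1010, 4): "H3", (0b10101, 5): "H4"}
```

For instance, `longest_prefix_match(table, 0b10111, 5)` probes lengths 5, 4, 3 without success and stops at length 2 with P2's next hop "H2".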
30
Metrics for Lookup Algorithms
• Speed
• Storage requirements
• Low update time
• Ability to handle large routing tables
• Flexibility in implementation
• Low preprocessing time
31
[Chart: single-fiber capacity (Gb/s) vs. year, 1980-2005; doubling every year]
Maximum Bandwidth per Installed Fiber
Source: Lucent
32
Maximum Bandwidth per Router Port, and Lookup Performance Required
Year | Line | Line rate (Gbps) | 40B pkts (Mpps) | 84B pkts (Mpps) | 354B pkts (Mpps)
1997-98 | OC3 | 0.155 | 0.48 | 0.23 | 0.054
1998-99 | OC12 | 0.622 | 1.94 | 0.92 | 0.22
1999-00 | OC48 | 2.5 | 7.81 | 3.72 | 0.88
2000-01 | OC192 | 10.0 | 31.25 | 14.88 | 3.53
2002-03 | OC768 | 40.0 | 125 | 59.52 | 14.12
- | 1GE | 1.0 | 3.13 | 1.49 | 0.35
33
Size of Routing Table?
• Currently, 85K entries
• At 25K per year, 230-256K prefixes for the next 5 years
• Decreasing costs of transmission may increase the rate of routing table growth
• At 50K per year, need 350-400K prefixes for the next 5 years
34
Routing Update Rate?
• Currently a peak of a few hundred BGP updates per second
• Hence, 1K per second is a must
• 5-10K updates/second seems to be safe
• BGP limitations may be a bottleneck first
• Updates should be atomic, and should interfere little with normal lookups
35
Routing Lookups: Outline
• Background and problem definition
• Lookup schemes
• Comparative evaluation
36
Example Forwarding Table (5-bit Prefixes)
Prefix Next-hop
P1 111* H1
P2 10* H2
P3 1010* H3
P4 10101 H4
37
Linear Search
• Keep prefixes in a linked list
• O(N) storage, O(N) lookup time, O(1) update complexity
• Improve average time by keeping the linked list sorted in order of prefix lengths
38
Caching Addresses
CPU BufferMemory
LineCard
DMA
MAC
LocalBuffer
Memory
LineCard
DMA
MAC
LocalBuffer
Memory
LineCard
DMA
MAC
LocalBuffer
Memory
Fast Path
Slow Path
39
Caching Addresses
Advantages
Increased average lookup performance
Disadvantages
Decreased locality in backbone traffic
Cache size
Cache management overhead
Hardware implementation difficult
40
Radix Trie
P1 111* H1
P2 10* H2
P3 1010*
H3
P4 10101
H4
P2
P3
P4
P1
A
B
C
G
D
F
H
E
1
0
0
1 1
1
1
Lookup 10111
Add P5=1110*
I
0
P5
next-hop-ptr (if prefix)
left-ptr right-ptr
Trie node
41
Radix Trie
• W-bit prefixes: O(W) lookup, O(NW) storage and O(W) update complexity
Advantages
Simplicity
Extensible to wider fields
Disadvantages
Worst-case lookup slow
Wastage of storage space in chains
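The 1-bit trie walk sketched above can be made concrete (a minimal Python reconstruction of the radix-trie idea, using the tutorial's 5-bit table; node layout is an assumption):

```python
class TrieNode:
    """Node of a 1-bit (binary/radix) trie: left = 0, right = 1."""
    def __init__(self):
        self.left = None
        self.right = None
        self.next_hop = None   # set only if a prefix ends here

def trie_insert(root, prefix, plen, next_hop):
    node = root
    for i in range(plen - 1, -1, -1):          # walk bits, MSB first
        if (prefix >> i) & 1:
            node.right = node.right or TrieNode()
            node = node.right
        else:
            node.left = node.left or TrieNode()
            node = node.left
    node.next_hop = next_hop

def trie_lookup(root, addr, w):
    node, best = root, None
    for i in range(w - 1, -1, -1):
        if node.next_hop is not None:          # remember longest match so far
            best = node.next_hop
        node = node.right if (addr >> i) & 1 else node.left
        if node is None:
            return best
    return node.next_hop or best

# build the 5-bit example table: P1=111*, P2=10*, P3=1010*, P4=10101
root = TrieNode()
for p, l, h in [(0b111, 3, "H1"), (0b10, 2, "H2"),
                (0b1010, 4, "H3"), (0b10101, 5, "H4")]:
    trie_insert(root, p, l, h)
```

The lookup of 10111 from the slide walks 1→0→1, falls off the trie, and returns the last remembered match P2.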
42
Leaf-pushed Binary Trie
A
B
C
G
D
E
1
0
0
1
1
left-ptr or next-hop
Trie node
right-ptr or next-hop
P2
P4P3
P2
P1P1 111* H1
P2 10* H2
P3 1010*
H3
P4 10101
H4
43
PATRICIA
2A
B C
E
10
1
Patricia tree internal node
3
P3
P2
P4
P110
0F G
D5
bit-position
left-ptr right-ptr
Lookup 10111
P1 111* H1
P2 10* H2
P3 1010*
H3
P4 10101
H4
44
PATRICIA
• W-bit prefixes: O(W^2) lookup, O(N) storage and O(W) update complexity
Advantages
Decreased storage
Extensible to wider fields
Disadvantages
Worst-case lookup slow
Backtracking makes implementation complex
45
Path-compressed Tree
1, , 2A
B C10
10,P2,4
P4
P1
1
0
E
D1010,P3,5
bit-position
left-ptr right-ptr
variable-length bitstring
next-hop (if prefix present)
Path-compressed tree node structure
Lookup 10111
P1 111* H1
P2 10* H2
P3 1010*
H3
P4 10101
H4
46
Path-compressed Tree
• W-bit prefixes: O(W) lookup, O(N) storage and O(W) update complexity
Advantages
Decreased storage
Disadvantages
Worst case lookup slow
47
Early Lookup Schemes
• BSD Unix [sklower91]: Patricia, expected lookup time = 1.44 log N
• Dynamic prefix trie [doeringer96]: Patricia variant, complex insertion/deletion; 40K entries consumed 2MB at 0.3-0.5 Mpps
48
Multi-bit Tries
Binary trie: depth = W, degree = 2, stride = 1 bit
Multi-ary trie: depth = W/k, degree = 2^k, stride = k bits
49
Prefix Expansion with Multi-bit Tries
If stride = k bits, prefix lengths that are not a multiple of k need to be expanded
Prefix Expanded prefixes
0* 00*, 01*
11* 11*
E.g., k = 2:
Maximum number of expanded prefixes corresponding to one non-expanded prefix = 2^(k-1)
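Prefix expansion can be sketched as follows (illustrative Python, not from the tutorial): a prefix whose length is not a multiple of k is padded out to the next multiple of k in every possible way.

```python
def expand_prefix(prefix_bits, k):
    """Expand a prefix (bit-string) to the next multiple of k bits.
    A prefix of length l becomes 2^pad prefixes, pad = ceil(l/k)*k - l."""
    target = -(-len(prefix_bits) // k) * k   # round length up to a multiple of k
    pad = target - len(prefix_bits)
    return [prefix_bits + format(i, "0{}b".format(pad)) if pad else prefix_bits
            for i in range(2 ** pad)]
```

With k = 2, `expand_prefix("0", 2)` yields the slide's example ["00", "01"], while "11" is already aligned and stays as ["11"]. The worst case, a prefix one bit past a stride boundary, yields 2^(k-1) expanded prefixes.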
50
Four-ary Trie (k=2)
P2
P3 P12
A
B
F11
next-hop-ptr (if prefix)
ptr00 ptr01
A four-ary trie node
P11
10
P42
H11
P41
10
10
1110
D
C
E
G
ptr10 ptr11
Lookup 10111
P1 111* H1
P2 10* H2
P3 1010*
H3
P4 10101
H4
51
Compressed Trie (k=8)
L16
L24
L8
Only 4 memory accesses!
L32
8-8-8-8 split
52
Prefix Expansion Increases Storage Consumption
• Replication of next-hop ptr
• Greater number of unused (null) pointers in a node
Time ~ W/k; Storage ~ (NW/k) * 2^(k-1)
53
Generalization: Different Strides at Each Trie Level
• 16-8-8 split• 4-10-10-8 split• 24-8 split• 21-3-8 split
54
Choice of Strides: Controlled Prefix Expansion [Sri98]
Given a forwarding table and a desired maximum number of memory accesses in the worst case (i.e., maximum tree depth D):
A dynamic programming algorithm computes the optimal sequence of strides that minimizes the storage requirement; runs in O(W^2 D) time
Advantages
Optimal storage under these constraints
Disadvantages
Updates lead to sub-optimality anyway
Hardware implementation difficult
55
Further Generalization: Different Stride at Each
Node [Sri98]
Given a forwarding table and a desired number of memory accesses in the worst case (i.e., maximum tree depth, D)
A dynamic programming algorithm to compute the optimal stride at each node that minimizes the storage requirement: runs in O(NW^2 D) time
56
Stride Optimization : Implementation Results
               | Two levels     | Three levels
Fixed-stride   | 49 MB, 1 ms    | 1.8 MB, 1 ms
Varying-stride | 1.6 MB, 130 ms | 0.57 MB, 871 ms
(38816 prefixes, 300 MHz P-II)
57
Lulea Algorithm [lulea98]
16-8-8 split
L16
L24
L32
58
Lulea Algorithm
1 0 0 0 1 0 1 1 1 0 0 0 1 1
1 1
16-8-8 split
59
Lulea Algorithm
10001010 11100010 10000010 10110100 11000000
R1, 0 R5, 0R2, 3 R3, 7 R4, 9
0 13
Codeword array
Base index array
0 1
0 321 4
P1 P2 P3 P4Pointer array
60
Lulea Algorithm
33K entries: 160KB, average 2Mpps
Advantages
Extremely small data structure – can fit in L1/L2 cache
Disadvantages
Scalability to larger tables?
Incremental updates not supported
61
Binary Search on Trie Levels [wald98]
P
62
Prefix-length
Hashtable ptr
8
12
16
22
Binary Search on Trie Levels
10
10.1, 10.2
10.1.10, 10.1.32, 10.2.64
Example prefixes10/8
10.1/16
10.1.10/22
10.1.32/22
10.2.64/22
Example addresses10.1.10.4
10.2.3.9
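The scheme can be sketched in Python over the example prefixes above, represented as bit-strings. Markers left along each prefix's binary-search path, each carrying a precomputed best-matching prefix (bmp), are what keep the backtrack-free binary search correct; this is my reconstruction of the [wald98] idea, not the authors' code.

```python
def lpm_linear(prefixes, bits):
    """Build-time helper: longest prefix in the table matching a bit-string."""
    best, blen = None, -1
    for p, nh in prefixes.items():
        if len(p) > blen and bits.startswith(p):
            best, blen = nh, len(p)
    return best

def build(prefixes):
    """prefixes: dict {bit-string: next hop}; one hash table per distinct length."""
    lengths = sorted({len(p) for p in prefixes})
    tables = {l: {} for l in lengths}
    for p, nh in prefixes.items():
        tables[len(p)][p] = nh
    # leave markers on each prefix's binary-search path, carrying its bmp
    for p in prefixes:
        lo, hi = 0, len(lengths) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            m = lengths[mid]
            if m < len(p):
                tables[m].setdefault(p[:m], lpm_linear(prefixes, p[:m]))
                lo = mid + 1
            elif m > len(p):
                hi = mid - 1
            else:
                break
    return lengths, tables

def search(lengths, tables, addr_bits):
    lo, hi, best = 0, len(lengths) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        key = addr_bits[:lengths[mid]]
        if key in tables[lengths[mid]]:
            if tables[lengths[mid]][key] is not None:
                best = tables[lengths[mid]][key]
            lo = mid + 1          # hit (prefix or marker): try longer lengths
        else:
            hi = mid - 1          # miss: only shorter prefixes can match
    return best

def ipbits(dotted):
    return "".join(format(int(x), "08b") for x in dotted.split("."))

# the example prefixes from the slide, as bit-strings
prefixes = {
    ipbits("10.0.0.0")[:8]: "10/8",
    ipbits("10.1.0.0")[:16]: "10.1/16",
    ipbits("10.1.10.0")[:22]: "10.1.10/22",
    ipbits("10.1.32.0")[:22]: "10.1.32/22",
    ipbits("10.2.64.0")[:22]: "10.2.64/22",
}
lengths, tables = build(prefixes)
```

For the example address 10.2.3.9 the search probes length 16, hits the marker left by 10.2.64/22, records its bmp 10/8, fails at length 22, and returns 10/8, as on the slide.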
63
Binary Search on Trie Levels
33K entries: 1.4 MB, 1.2-2.2 Mpps
Advantages
Scales nicely to IPv6
Disadvantages
Multiple hashed memory accesses
Incremental updates complex
64
Binary Search on Prefix Intervals [lampson98]
Prefix Interval
P1 * 0000-1111
P2 00* 0000-0011
P3 1* 1000-1111
P4 1101 1101-1101
P5 001* 0010-0011
[Number line 0000-1111, partitioned into elementary intervals I1-I6 by the prefix endpoints; each interval maps to the longest covering prefix: I1→P2, I2→P5, I3→P1, I4→P3, I5→P4, I6→P3]
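The prefix-to-interval construction and the search can be sketched as follows (illustrative Python over the 4-bit example above; it assumes a default prefix like P1 = * covers the whole line):

```python
import bisect

def build_intervals(prefixes, w):
    """prefixes: list of (bit-string, name); '' denotes the * prefix.
    Returns sorted interval start points and the answer for each interval."""
    ivals, points = [], {0}
    for bits, name in prefixes:
        lo = (int(bits, 2) << (w - len(bits))) if bits else 0
        hi = lo + (1 << (w - len(bits))) - 1
        ivals.append((lo, hi, len(bits), name))
        points.update((lo, hi + 1))
    points.discard(1 << w)
    starts = sorted(points)
    # each elementary interval inherits the longest prefix covering it
    answers = [max((iv for iv in ivals if iv[0] <= s <= iv[1]),
                   key=lambda iv: iv[2])[3] for s in starts]
    return starts, answers

def interval_lookup(starts, answers, addr):
    """Binary search for the elementary interval containing the address."""
    return answers[bisect.bisect_right(starts, addr) - 1]

# 4-bit example from the slide: P1=*, P2=00*, P3=1*, P4=1101, P5=001*
prefixes = [("", "P1"), ("00", "P2"), ("1", "P3"), ("1101", "P4"), ("001", "P5")]
starts, answers = build_intervals(prefixes, 4)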
65
Alphabetic Tree
[Figure: binary search tree over the interval endpoints 0001, 0011, 0111, 1100, 1101; the leaves are the elementary intervals I1-I6 of the number line]
66
Multiway Search on Intervals
38K entries: 0.95 MB, 2.1 Mpps
Advantages
Space is O(N)
Disadvantages
Incremental updates complex
67
Depth-constrained Near-optimal Alphabetic Tree
• Redraw the binary search tree based on probability of access of routing table entries:
– Minimize average lookup time
– But keep worst-case lookup time bounded
40% improvement in lookup time with a small relaxation in worst-case lookup time.
68
Routing Lookups in Hardware [gupta98]
[Chart: number of prefixes vs. prefix length]
April 11, 2000
MAE-EAST routing table (source: www.merit.edu)
69
Routing Lookups in Hardware
142.19.6.14
Prefixes up to 24-bits
1 Next Hop
24
Next Hop
142.19.6
2^24 = 16M entries
142.19.6
70
Routing Lookups in Hardware
Prefixes up to 24-bits
1 Next Hop
128.3.72
24 0 Pointer
8
Prefixes above 24-bits
Next Hop
Next Hop
Next Hop
offset
base
128.3.72.14
128.3.72
14
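A small model of this two-table scheme (often called DIR-24-8) in Python, using dicts where the hardware uses flat 2^24- and 256-entry memories; in this sketch prefixes must be inserted in non-decreasing length order so that longer prefixes overwrite shorter ones (my reconstruction, not the paper's code).

```python
class Dir24_8:
    """tbl24: one entry per 24-bit address prefix, either ("hop", next_hop)
    or ("ptr", block) pointing into blocks, a list of 256-entry tables
    indexed by the low 8 address bits."""

    def __init__(self):
        self.tbl24 = {}      # sparse stand-in for the 2^24-entry array
        self.blocks = []

    def add(self, prefix, plen, nh):
        if plen <= 24:
            base = prefix << (24 - plen)
            for i in range(base, base + (1 << (24 - plen))):
                self.tbl24[i] = ("hop", nh)
        else:
            hi = prefix >> (plen - 24)
            entry = self.tbl24.get(hi)
            if entry is None or entry[0] == "hop":
                # allocate a 256-entry block, defaulting to the old next hop
                default = entry[1] if entry else None
                self.tbl24[hi] = ("ptr", len(self.blocks))
                self.blocks.append([default] * 256)
            block = self.blocks[self.tbl24[hi][1]]
            low = (prefix & ((1 << (plen - 24)) - 1)) << (32 - plen)
            for i in range(low, low + (1 << (32 - plen))):
                block[i] = nh

    def lookup(self, addr):
        entry = self.tbl24.get(addr >> 8)
        if entry is None:
            return None
        if entry[0] == "hop":
            return entry[1]                          # one memory access
        return self.blocks[entry[1]][addr & 0xFF]    # second memory access

def ip(s):
    a, b, c, d = map(int, s.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

# hypothetical example routes, inserted shortest first
t = Dir24_8()
t.add(10, 8, "A")                      # 10/8
t.add(ip("128.3.72.0") >> 8, 24, "B")  # 128.3.72/24
t.add(ip("128.3.72.14"), 32, "C")      # 128.3.72.14/32
```

A lookup for 128.3.72.14 follows the pointer entry and indexes the second table with the low byte 14, as in the figure; any other address under 128.3.72/24 falls back to the block's default hop.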
71
Routing Lookups in Hardware
Prefixes up to n bits: 2^n entries
0
n + m
n
i j Prefixeslonger than
n+m bits
Next Hop
2^m
i entries
72
Routing Lookups in Hardware
Various compression schemes can be employed to decrease the storage requirements: e.g. employ carefully chosen variable length strides, bitmap compression etc.
Advantages
20 Mpps with 50ns DRAM, or 66 Mpps with e-DRAM
Easy to implement in hardware
Disadvantages
Large memory required (9-33 MB)
Depends on prefix-length distribution
73
Content-addressable Memory (CAM)
• Fully associative memory
• Exact match operation in a single clock cycle: parallel compare
74
Lookups with Ternary-CAM
Memory array Priority
encoder
Next-hopmemory
P32
P31
P8
DestinationAddress
Next-hop
TCAM RAM
01
2
3
M
0
1
0
0
1
75
Advantages
Fast: 15-20 ns
Disadvantages
Expensive (and low density): 0.25 MB at 50 MHz costs $30-$75
High power: 5-8 W
Updates slow
Lookups with TCAM
76
Updates with TCAM
P32
P31
P8
01
2
3
M
Issue: how to manage the free space [HotI'00]
Empty space
77
Routing Lookups: Outline
• Background and problem definition
• Lookup schemes
• Comparative evaluation
78
Performance Comparison: Complexity
Algorithm | Lookup | Storage | Update
Binary trie | W | NW | W
Patricia | W^2 | N | W
Path-compressed trie | W | N | W
Multi-ary trie | W/k | N*2^k | -
LC trie | W | N | -
Lulea | - | - | -
Binary search on trie levels | log W | N log W | -
Binary search on intervals | log(2N) | N | -
TCAM | 1 | N | W
79
Performance Comparison
Algorithm | Lookup (ns) | Storage (KB)
Patricia (BSD) | 2500 | 3262
Multi-way fixed-stride optimal trie (3 levels) | 298 | 1930
Multi-way fixed-stride optimal trie (5 levels) | 428 | 660
LC trie | - | 700
Lulea | 409 | 160
Binary search on trie levels | 650 | 1600
6-way search on intervals | 490 | 950
Lookups with direct access | 15-60 | 9,000-33,000
TCAM | 15-20 | 512
80
Routing Lookups: References
• [lulea98] A. Brodnik, S. Carlsson, M. Degermark, S. Pink. “Small Forwarding Tables for Fast Routing Lookups”, Sigcomm 1997, pp 3-14.
• [gupta98] P. Gupta, S. Lin, N.McKeown. “Routing lookups in hardware at memory access speeds”, Infocom 1998, pp 1241-1248, vol. 3.
• P. Gupta, B. Prabhakar, S. Boyd. “Near-optimal routing lookups with bounded worst case performance,” Proc. Infocom, March 2000
• [lampson98] B. Lampson, V. Srinivasan, G. Varghese. “ IP lookups using multiway and multicolumn search”, Infocom 1998, pp 1248-56, vol. 3.
81
Routing lookups : References (contd)
• [wald98] M. Waldvogel, G. Varghese, J. Turner, B. Plattner. “Scalable high speed IP routing lookups”, Sigcomm 1997, pp 25-36.
• [LC-trie] S. Nilsson, G. Karlsson. “Fast address lookup for Internet routers”, IFIP Intl Conf on Broadband Communications, Stuttgart, Germany, April 1-3, 1998.
• [sri98] V. Srinivasan, G.Varghese. “Fast IP lookups using controlled prefix expansion”, Sigmetrics, June 1998
• TCAM vendors: netlogicmicro.com, laratech.com, mosaid.com, sibercore.com
82
Packet Classification
83
Packet Classification: Outline
• Background and problem definition
• Classification schemes• Comparative evaluation
84
Flow-aware vs Flow-unaware Routers (recap)
• Flow-aware router: keeps track of flows and performs similar processing on packets in a flow
• Flow-unaware router (packet-by-packet router): treats each incoming packet individually
85
Why Flow-aware Router?
Routers require additional mechanisms: admission control, resource reservation, per-flow queueing, fair scheduling etc.
ISPs want to provide differentiated services
capability to distinguish and isolate traffic belonging to different flows based on negotiated service agreements
classification
Rules or policies
86
Need for Differentiated Services
ISP1
NAP
E1E2
ISP2
ISP3Z
X
Y
Service ExampleTraffic Shaping
Ensure that ISP3 does not inject more than 50Mbps of total traffic on interface X, of which no more than 10Mbps is email traffic
Packet Filtering
Deny all traffic from ISP2 (on interface X) destined to E2
Policy Routing
Send all voice-over-IP traffic arriving from E1 (on interface Y) and destined to E2 via a separate ATM network
87
More Value added Services
• Differentiated services
– Regard traffic from Autonomous System #33 as `platinum grade'
• Accounting and Billing
– Treat all video traffic as highest priority and perform accounting for this type of traffic
• Committed Access Rate (rate limiting)
– Rate limit WWW traffic from sub-interface #739 to 10 Mbps
88
Multi-field Packet Classification
Given a classifier with N rules, find the action associated with the highest priority rule matching an incoming packet.
Example: packet (5.168.3.32, 152.133.171.71, …, TCP)
Field 1 Field 2 … Field k
Action
Rule 1 5.3.90/21 2.13.8.11/32
… UDP A1
Rule 2 5.168.3/24 152.133/16 … TCP A2
… … … … … …
Rule N 5.168/16 152/8 … ANY AN
89
Packet Header Fields for Classification
L2-DA | L2-SA | L3-SA | L3-DA | L3-PROT | L4-SP | L4-DP | L4-PROT | PAYLOAD
Transport layer header Network layer header MAC header
Direction of transmission of packet
DA = Destination AddressSA = Source AddressPROT = ProtocolSP = Source portDP = Destination port
L2 = layer 2 (e.g., Ethernet)L3 = layer 3 (e.g., IP)L4 = layer 4 (e.g., TCP)
90
Special processing
Control
Datapath:per-packet processing
Routing lookup
Flow-aware Router: Basic Architectural Components
Routing, resource reservation, admission control, SLAs
Packet classification
Switching
Scheduling
91
Packet Classification
Action
--------
---- ----
--------
Predicate Action
Classifier (policy database)
Packet Classification
Forwarding Engine
Incoming Packet
HEADER
92
Packet Classification: Problem Definition
Given a classifier C with N rules, Rj, 1 ≤ j ≤ N, where Rj consists of three entities:
1) A regular expression Rj[i], 1 ≤ i ≤ d, on each of the d header fields,
2) A number, pri(Rj), indicating the priority of the rule in the classifier, and
3) An action, referred to as action(Rj).
For an incoming packet P with the header considered as a d-tuple of points (P1, P2, …, Pd), the d-dimensional packet classification problem is to find the rule Rm with the highest priority among all the rules Rj matching the d-tuple; i.e., pri(Rm) > pri(Rj) for all j ≠ m, 1 ≤ j ≤ N, such that Pi matches Rj[i], 1 ≤ i ≤ d. We call rule Rm the best matching rule for packet P.
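The definition translates directly into a linear scan, which is the baseline every scheme below tries to improve on (illustrative Python; the field encoding — bit-string prefixes for the addresses, a port range, and a protocol — is an assumption, as are the example rules):

```python
def matches(rule, pkt):
    """rule: (DA prefix bits, SA prefix bits, (port lo, port hi), proto)."""
    da, sa, (lo, hi), proto = rule
    return (pkt[0].startswith(da) and pkt[1].startswith(sa)
            and lo <= pkt[2] <= hi and proto in ("*", pkt[3]))

def classify(rules, pkt):
    """rules: list of (rule, action) in decreasing priority order.
    Return the action of the best (highest-priority) matching rule."""
    for rule, action in rules:
        if matches(rule, pkt):
            return action
    return None

# hypothetical 4-field rules, highest priority first
rules = [
    (("01", "111", (0, 65535), "*"), "A1"),
    (("", "", (80, 80), "tcp"), "A2"),
    (("", "", (0, 65535), "*"), "deny"),
]
```

This is O(N) memory accesses per packet; the schemes that follow trade preprocessing and storage to beat it.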
93
Example 4D classifier
Rule | L3-DA | L3-SA | L4-DP | L4-PROT | Action
R1 | 152.163.190.69/255.255.255.255 | 152.163.80.11/255.255.255.255 | * | * | Deny
R2 | 152.168.3/255.255.255 | 152.163.200.157/255.255.255.255 | eq www | udp | Deny
R3 | 152.168.3/255.255.255 | 152.163.200.157/255.255.255.255 | range 20-21 | udp | Permit
R4 | 152.168.3/255.255.255 | 152.163.200.157/255.255.255.255 | eq www | tcp | Deny
R5 | * | * | * | * | Deny
94
Example Classification Results
Pkt Hdr | L3-DA | L3-SA | L4-DP | L4-PROT | Rule, Action
P1 | 152.163.190.69 | 152.163.80.11 | www | tcp | R1, Deny
P2 | 152.168.3.21 | 152.163.200.157 | www | udp | R2, Deny
95
Classification is a Generalization of Lookup
• Classifier = routing table
• One dimension (destination address)
• Rule = routing table entry
• Regular expression = prefix
• Action = (next-hop-address, port)
• Priority = prefix-length
96
Metrics for Classification Algorithms
• Speed
• Storage requirements
• Low update time
• Ability to handle large classifiers
• Flexibility in implementation
• Low preprocessing time
• Scalability in the number of header fields
• Flexibility in rule specification
97
Size of Classifier?
• Microflow recognition: 128K-1M flows in a metro/edge router
• Firewall applications: 8-16K
• Wildcarded filters: 16-128K
• Depends heavily on where your box will be deployed
98
Packet Classification: Outline
• Background and problem definition
• Classification schemes• Comparative evaluation
99
Example Classifier
Rule | Destination Address | Source Address
R1 0* 10*
R2 0* 01*
R3 0* 1*
R4 00* 1*
R5 00* 11*
R6 10* 1*
R7 * 00*
100
Set-pruning Tries [Tsuchiya, Sri98]
Dimension DA
Rule
DA SA
R1 0* 10*
R2 0* 01*
R3 0* 1*
R4 00* 1*
R5 00* 11*
R6 10* 1*
R7 * 00*
R7 Dimension SAR2 R1 R5 R7 R2 R1
R3
R7
R6
R7
R4
O(N^2) memory
101
Grid-of-Tries [Sri98]
Dimension DA
Dimension SAR5 R2 R1
R3R6
R7
R4
O(NW) memory, O(W^2) lookup
Rule
DA SA
R1 0* 10*
R2 0* 01*
R3 0* 1*
R4 00* 1*
R5 00* 11*
R6 10* 1*
R7 * 00*
102
Grid-of-Tries [Sri98]
Dimension DA
Dimension SAR5 R2 R1
R3R6
R7
R4
O(NW) memory, O(2W) lookup
Rule
DA SA
R1 0* 10*
R2 0* 01*
R3 0* 1*
R4 00* 1*
R5 00* 11*
R6 10* 1*
R7 * 00*
103
Grid-of-Tries
Advantages
Good solution for two dimensions
Disadvantages
Static solution
Not easily extensible to more than two dimensions
20K entries: 2MB, 9 memory accesses (with expansion)
104
R5
Geometric Interpretation in 2D
R4
R3
R2R1
R7
P2
Dimension #1
Dimension #2
R6
P1
e.g. (128.16.46.23, *)e.g. (144.24/16, 64/24)
105
Bitmap-intersection [Lak98]
[Figure: rules R1-R4 as rectangles; each elementary interval in each dimension stores a bitmap of the rules it overlaps, and classification ANDs the bitmaps of the matching intervals]
106
Bitmap-intersection
Advantages
Good solution for multiple dimensions, for small classifiers
Disadvantages
Static solution
Large memory bandwidth (scales linearly in N)
Large amount of memory (scales quadratically in N)
Hardware-optimized
512 rules: 1Mpps with single FPGA (33MHz) and five 1Mb SRAM chips
107
2D classification [Lak98]
R4 R3 R2R1
R5
R6
Prefixes
Ranges
P1
R7
Prefixes of length 4
Prefixes of length 3
108
2D Classification [Lak98]: Preprocessing
• Store the prefixes in a trie
• With each prefix, store the set of intervals that form a rectangle with that prefix as the other side
• Store the intervals as a set of non-overlapping disjoint intervals
109
2D Classification [Lak98]: Lookup
• For each prefix length:
– Find the prefix matching the incoming point, and the set of non-overlapping intervals associated with the prefix
– Search for the non-overlapping interval that contains the point
• Repeat for all prefix lengths
110
2D Classification [Lak98]: Complexity
• Lookups: O(W log N) with N two-dimensional rules
– O(W + log N) using fractional cascading
• Space: O(N) • Static data structure
111
Crossproducting [Sri98]
R4 R3R2
R1
54
3
2
1
6
21 7 8 94 5 63
P1P2
(1,3)
(8,4)
112
Crossproducting
Advantages
Fast accessesSuitable for multiple fields
Disadvantages
Large amount of memory
Need caching for bigger classifiers (> 50 rules)
50 rules: 1.5 MB; need caching (on-demand crossproducting) for bigger classifiers
Need: d 1-D lookups + 1 memory access; O(N^d) space
113
Space-time Tradeoff
Point location among N non-overlapping regions in d dimensions:
either O(log N) time with O(N^d) space, or O(log^(d-1) N) time with O(N) space
Need help: exploit structure in real-life classifiers.
114
Recursive Flow Classification [Gupta99]
• Difficult to achieve both high classification rate and reasonable storage in the worst case
• Real classifiers exhibit structure and redundancy
• A practical scheme could exploit this structure and redundancy
Observations:
115
RFC: Classifier Dataset
• 793 classifiers from 101 ISP and enterprise networks with a total of 41505 rules.
• 40 classifiers: more than 100 rules. Biggest classifier had 1733 rules.
• Maximum of 4 fields per rule: source IP address, destination IP address, protocol and destination port number.
116
Structure of the Classifiers
R1
R2
R34 regions
117
Structure of the Classifiers
R1
R2
R3
{R1, R2}
{R2, R3}
{R1, R2, R3}
7 regions
dataset: 1733-rule classifier = 4316 distinct regions (worst case is ~10^13!)
118
Recursive Flow Classification
One-step: 2^S = 2^128 → 2^T = 2^12
Multi-step: 2^S = 2^128 → 2^64 → 2^32 → 2^T = 2^12
119
Chunking of a Packet
Source L3 Address
Destination L3 Address
L4 protocol and flags
Source L4 port
Destination L4 port
Type of Service
Packet Header
Chunk #0
Chunk #7
120
Packet Flow
Phase 0 → Phase 1 → Phase 2 → Phase 3
[Figure: header chunks are combined phase by phase, reducing 128 → 64 → 32 → 16 bits; the final index selects the action]
121
Choice of Reduction Tree
[Figure: two reduction trees over chunks 0-5]
Number of phases P = 3: 10 memory accesses
Number of phases P = 4: 11 memory accesses
122
RFC: Storage Requirements
[Chart: memory in Mbytes vs. number of rules]
123
RFC: Classification Time
• Pipelined hardware: 30 Mpps (worst-case OC192) using two 4Mb SRAMs and two 64Mb SDRAMs at 125 MHz.
• Software: (3 phases) 1 Mpps in the worst case and 1.4-1.7 Mpps in the average case (average-case OC48). [Performance measured using the Intel VTune simulator on a Windows NT platform]
124
RFC: Pros and Cons
Advantages
Exploits structure of real-life classifiers
Suitable for multiple fields
Supports non-contiguous masks
Fast accesses
Disadvantages
Depends on structure of classifiers
Large pre-processing time
Incremental updates slow
Large worst-case storage requirements
125
Hierarchical Intelligent Cuttings (HiCuts)
[Gupta99]
• No single good solution for all cases
– But real classifiers have structure
• Perhaps an algorithm can exploit this structure
– A heuristic hybrid scheme …
Observations:
126
HiCuts: Basic Idea
{R1, R2, R3, …, Rn}
Decision Tree
{R1, R3,R4} {R1, R2,R5} {R8, Rn}
Binth: BinThreshold = Maximum Subset Size = 3
127
Heuristics to Exploit Classifier Structure
• Picking a suitable dimension to cut across:
– Minimize the maximum number of rules in any one partition, OR
– Maximize the entropy of the distribution of rules across the partitions, OR
– Maximize the number of distinct specifications in one dimension
• Picking a suitable number of partitions (cuts) to make:
– Affects the space consumed and the classification time; tuned by a parameter, spfac
128
HiCuts:Number of Memory Accesses
Binth = 8, spfac = 4
[Chart: number of memory accesses vs. number of rules (log scale), compared with crossproducting]
129
HiCuts: Storage Requirements
Binth = 8, spfac = 4
[Chart: space in KBytes (log scale) vs. number of rules (log scale)]
130
Incremental Update Time
Binth = 8, spfac = 4, 333 MHz P-II running Linux
[Chart: update time in seconds (log scale) vs. number of rules (log scale)]
131
HiCuts: Pros and Cons
Advantages
Exploits structure of real-life classifiers
Adapts data structure
Suitable for multiple fields
Supports incremental updates
Disadvantages
Depends on structure of classifiers
Large pre-processing time
Large worst-case storage requirements
132
Tuple Space Search [Suri99]
Decompose the classification problem into a number of exact match problems, then use hashing
Rule | Specification | Tuple
R1 | (01*, 111*) | [2,3]
R2 | (11*, 010*) | [2,3]
R3 | (1*, *) | [1,0]
Use one hash table for each tuple, search all hash tables sequentially
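The basic (sequential) tuple space search can be sketched in Python over the example above; treating a lower priority number as higher priority is an assumption for illustration:

```python
def build_tuples(rules):
    """rules: list of (da_prefix, sa_prefix, priority, action), prefixes as
    bit-strings. Groups rules into one exact-match hash table per tuple of
    prefix lengths."""
    tables = {}
    for da, sa, pri, act in rules:
        tables.setdefault((len(da), len(sa)), {})[(da, sa)] = (pri, act)
    return tables

def tss_classify(tables, da_bits, sa_bits):
    best = None
    for (ld, ls), table in tables.items():            # probe every tuple
        hit = table.get((da_bits[:ld], sa_bits[:ls])) # exact match via hashing
        if hit is not None and (best is None or hit[0] < best[0]):
            best = hit
    return best[1] if best else None

# example rules from the slide; priority 1 is highest (assumption)
rules = [("01", "111", 1, "A1"), ("11", "010", 2, "A2"), ("1", "", 3, "A3")]
tables = build_tuples(rules)
```

R1 and R2 share the tuple [2,3] and live in one hash table, R3 sits alone in [1,0]; a packet is hashed once per tuple, not once per rule.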
133
Improved TSS via Precomputation
• Extension of “binary search on trie levels”
• If [2,3,3] succeeds, no need to search shorter tuples, e.g., [1,2,1] (use the precomputed best match)
• If [2,3,3] fails, no need to search longer tuples, e.g., [4,5,6]
• Search the tuple space intelligently (decision tree on tuple space)
134
TSS: Pros and Cons
Advantages
Suitable for multiple fields
Supports incremental updates
Fast classification and updates on average
Disadvantages
Large pre-processing time
Multiple hashed-memory accesses
135
Area-based Quad Tree [Buddhikot99]
00 01 1110
00 01 1110R2
R1
R3R5
R4
R1,R2
R5
R3,R4
Crossing Filter Set
Lookup: two 1-D longest prefix match operations at every node in the path from the root to a leaf
O(N) space; O(W log N) lookup time; O(W + log N) using fractional cascading
P1
136
AQT: Efficient Updates
new
old
Partition prefixes into groups and do pre-computation per group instead of per interval
O(aW) search and O(a * N^(1/a)) updates
137
2-D Classification Using FIS Tree [Feldmann00]
R2
R1
R3R5
R4
P1
x-FIS tree
l levels; O(l * n^(1+1/l)) space; (l+1) 1-D lookups
138
FIS Tree: Experimental Study
Number of rules | Levels in FIS tree | Storage space | Number of memory accesses
4-60K | 2 | < 5 MB | < 15
~10^6 | 3 | < 100 MB | < 18
Rulesets constructed using netflow data from AT&T Worldnet. Experiments done using static 2-D FIS trees.
139
Ternary CAMs
Advantages
Suitable for multiple fields
Fast: 16-20 ns (50-66 Mpps)
Simple to understand
Disadvantages
Inflexible: range-to-prefix blowup
Density: largest available in 2000 is 32K x 128 (but can be cascaded)
Management software and on-chip logic: non-trivial complexity
Power: 5-8 W
Incremental updates: slow
DRAM-based CAMs: higher density, but soft errors are a problem
Cost: $30-$160 for 1Mb
140
Range-to-prefix Blowup
Rule | Range | Maximal Prefixes
R1 | [3,11] | 0011, 01**, 10**
R2 | [2,7] | 001*, 01**
R3 | [4,11] | 01**, 10**
R4 | [4,7] | 01**
R5 | [1,14] | 0001, 001*, 01**, 10**, 110*, 1110
Maximum memory blowup = factor of (2W-2)^d
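The blowup comes from splitting an arbitrary range into maximal aligned power-of-two blocks, each of which is one prefix. A sketch of the conversion (illustrative Python, using the slide's 4-bit convention of padding with '*'):

```python
def range_to_prefixes(lo, hi, w):
    """Split [lo, hi] on a w-bit field into the minimal covering set of prefixes."""
    out = []
    while lo <= hi:
        size = (lo & -lo) if lo else (1 << w)   # largest block aligned at lo
        while size > hi - lo + 1:               # shrink until it fits in [lo, hi]
            size //= 2
        plen = w - (size.bit_length() - 1)
        out.append(format(lo >> (w - plen), "0{}b".format(plen))
                   + "*" * (w - plen) if plen else "*" * w)
        lo += size
    return out
```

For example, `range_to_prefixes(3, 11, 4)` reproduces the table's row for [3,11]: 0011, 01**, 10**, and [1,14] yields the worst-case six prefixes.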
141
Packet Classification: References
• [Lak98] T.V. Lakshman. D. Stiliadis. “High speed policy based packet forwarding using efficient multi-dimensional range matching”, Sigcomm 1998, pp 191-202
• [Sri98] V. Srinivasan, S. Suri, G. Varghese and M. Waldvogel. “Fast and scalable layer 4 switching”, Sigcomm 1998, pp 203-214
• [Suri99] V. Srinivasan, G. Varghese, S. Suri. “Fast packet classification using tuple space search”, Sigcomm 1999, pp 135-146
• [Gupta99] P. Gupta, N. McKeown, “Packet classification using hierarchical intelligent cuttings,” Hot Interconnects VII, 1999
142
Packet Classification: References (contd.)
• [Gupta99] P. Gupta, N. McKeown, “Packet classification on multiple fields,” Sigcomm 1999, pp 147-160
• [Buddhikot99] M. M. Buddhikot, S. Suri, and M. Waldvogel, “Space decomposition techniques for fast layer-4 switching,” Protocols for High Speed Networks, vol. 66, no. 6, pp 277-283, 1999
• [Feldmann00] A. Feldmann and S. Muthukrishnan, “Tradeoffs for packet classification,” Infocom 2000
• T. Woo, “A modular approach to packet classification: algorithms and results, “ Infocom 2000
143
Special Instances of Classification
• Multicast
– PIM SM: longest prefix matching on the source and group address; try (S,G), then (*,G), then (*,*,RP); check the incoming interface
– DVMRP: incoming-interface check followed by (S,G) lookup
• IPv6: 128-bit destination address field
144
Implementation Choices Given Design Requirements
Disclaimer: These are my opinions
145
Design Requirement LU1
2.5 Gbps, 100K routes
a) 2-4 TCAMs
b) On-chip logic with one external SDRAM chip (using multibit tries)
c) On-chip e-DRAM
Requirements:
Choices:
146
Design Requirement LU2
10 Gbps, 256K routes
a) 4-8 TCAMs
b) On-chip logic with 2-4 external SDRAM chips (using multibit tries)
c) On-chip e-DRAM
Requirements:
Choices:
147
Design Requirement PC1
10 Gbps classification up to L4; 16-64K comparatively static 128-bit entries
a) 1-4 TCAMs
b) On-chip logic with 2 external SDRAM and 2 SRAM chips (using RFC)
c) Off-chip SRAMs (using HiCuts)
Requirements:
Choices:
148
Your Design Here
Requirements:
Choices:
149
Lookup/Classification Chip Vendors
• Switch-on
• Fastchip
• Agere
• Solidum
• Siliconaccess
• TCAM vendors: Netlogic, Lara, Sibercore, Mosaid, Klsi etc.
150
Summary
• Both problems are well studied by now, but increasing line rates and database sizes continue to present interesting opportunities
• Still need a high-speed (~OC192) dynamic, generic, multi-field classification algorithm for large number of (up to a million) rules