Routing Lookups and Packet Classification: Theory and Practice


Transcript of Routing Lookups and Packet Classification: Theory and Practice

Routing Lookups and Packet Classification: Theory and Practice

Pankaj Gupta
Department of Computer Science
Stanford University
pankaj@stanford.edu

http://www.stanford.edu/~pankaj

August 18, 2000

Hot Interconnects 8: High Performance Switching and Routing

2

Tutorial Outline

• Introduction
– What this tutorial is about

• Routing lookups
– Background, lookup schemes

• Packet classification
– Background, classification schemes

• Implementation choices for given design requirements

3

Request to you

• Please ask lots of questions!
– But I may not be able to answer all of them right now

• I am here to learn, so please share your experiences, thoughts and opinions freely

4

What is this tutorial about?

5

Internet: Mesh of Routers

The Internet Core

Edge Router

Campus Area Network

6

RFC 1812: Requirements for IPv4 Routers

• Must perform an IP datagram forwarding decision (called forwarding)

• Must send the datagram out the appropriate interface (called switching)

Optionally: a router MAY choose to perform special processing on incoming packets

7

Examples of special processing

• Filtering packets for security reasons
• Delivering packets according to a pre-agreed delay guarantee
• Treating high priority packets preferentially
• Maintaining statistics on the number of packets sent by various routers

8

Special Processing Requires Identification of Flows

• All packets of a flow obey a pre-defined rule and are processed similarly by the router

• E.g. a flow = (src-IP-address, dst-IP-address), or a flow = (dst-IP-prefix, protocol) etc.

• Router needs to identify the flow of every incoming packet and then perform appropriate special processing

9

Flow-aware vs Flow-unaware Routers

• Flow-aware router: keeps track of flows and performs similar processing on packets in a flow

• Flow-unaware router (packet-by-packet router): treats each incoming packet individually

10

What this tutorial is about:

• Algorithms and techniques that an IP router uses to decide where to forward the packets next (routing lookup)

• Algorithms and techniques that a flow-aware router uses to classify packets into flows (packet classification)

11

Routing Lookups

12

Routing Lookups: Outline

• Background and problem definition

• Lookup schemes
• Comparative evaluation

13

Lookup in an IP Router

Unicast destination address based lookup

[Diagram: the incoming packet's header carries the destination address; the forwarding engine performs a next-hop computation against a forwarding table of (destination-prefix, next-hop) entries and returns the next hop.]

14

Packet-by-packet Router

[Diagram: several linecards connected by an interconnect; each linecard makes its own forwarding decision against a local copy of the forwarding table, while a routing processor maintains the table.]

15

Packet-by-packet Router: Basic Architectural Components

[Diagram: the control plane runs routing protocols; the per-packet datapath performs routing lookup, switching, and scheduling.]

16

ATM and MPLS Switches: Direct Lookup

[Diagram: the incoming (port, VCI/label) is used directly as a memory address; the memory word read out is the outgoing (port, VCI/label).]

17

IPv4 Addresses

• 32-bit addresses
• Dotted quad notation: e.g. 12.33.32.1
• Can be represented as integers on the IP number line [0, 2^32 - 1]: a.b.c.d denotes the integer a*2^24 + b*2^16 + c*2^8 + d

[Number line: from 0.0.0.0 to 255.255.255.255]
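The dotted-quad-to-integer mapping above is easy to check in code; a minimal sketch (the function name is illustrative):

```python
def ip_to_int(addr: str) -> int:
    # a.b.c.d -> a*2^24 + b*2^16 + c*2^8 + d
    a, b, c, d = (int(x) for x in addr.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

print(ip_to_int("12.33.32.1"))  # 203497473
```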

18

Class-based Addressing

[Number line: class A spans from 0.0.0.0 up to 128.0.0.0; class B begins at 128.0.0.0, class C at 192.0.0.0, then classes D and E.]

Class          Range                        MS bits  netid      hostid
A              0.0.0.0 - 127.255.255.255    0        bits 1-7   bits 8-31
B              128.0.0.0 - 191.255.255.255  10       bits 2-15  bits 16-31
C              192.0.0.0 - 223.255.255.255  110      bits 3-23  bits 24-31
D (multicast)  224.0.0.0 - 239.255.255.255  1110     -          -
E (reserved)   240.0.0.0 - 255.255.255.255  11110    -          -

19

Lookups with Class-based Addresses

[Diagram: the class is inferred from the address's leading bits; the netid is extracted (e.g. 186.21 for a class B address, 192.33.32 for a class C address) and an exact match in a (netid, port#) table returns the output port, e.g. 192.33.32.1 -> netid 192.33.32 -> Port 3.]

20

Problems with Class-based Addressing

• Fixed netid-hostid boundaries too inflexible: rapid depletion of address space

• Exponential growth in size of routing tables

21

Exponential Growth in Routing Table Sizes

[Graph: number of BGP routes advertised over time, showing exponential growth.]

22

Classless Addressing (and CIDR)

• Eliminated class boundaries
• Introduced the notion of a variable-length prefix, between 0 and 32 bits long
• Prefixes represented by P/l: e.g., 122/8, 212.128/13, 34.43.32/22, 10.32.32.2/32, etc.
• An l-bit prefix represents an aggregation of 2^(32-l) IP addresses

23

CIDR:Hierarchical Route Aggregation

[Diagram: sites S (192.2.1/24) and T (192.2.2/24) connect through ISP P (192.2.0/22), while ISP Q holds 200.11.0/22; the backbone routing table needs only the aggregates 192.2.0/22 -> R2 and 200.11.0/22, shown on the IP number line.]

24

Size of the Routing Table

Source: http://www.telstra.net/ops/bgptable.html

[Graph: number of active BGP prefixes vs. date.]

25

Classless Addressing

[Diagram: class-based addressing divides the number line [0.0.0.0, 255.255.255.255] at fixed class boundaries A, B, C; classless addressing allows arbitrary, possibly nested prefixes such as 23/8, 191/8, 191.23/16, 191.128.192/18, and 191.23.14/23.]

26

Non-aggregatable Prefixes: (1) Multi-homed Networks

[Diagram: a network with prefix 192.2.2/24 inside ISP P's 192.2.0/22 is multi-homed to routers R2 and R3; the backbone routing table must carry both 192.2.0/22 -> R2 and the more-specific 192.2.2/24 -> R3.]

27

Non-aggregatable Prefixes: (2) Change of Provider

[Diagram: site T (192.2.2/24) changes provider from ISP P (192.2.0/22) to ISP Q (200.11.0/22) but keeps its prefix; the backbone routing table must now carry 192.2.0/22 -> R2 and 192.2.2/24 -> R3 separately, shown on the IP number line.]

28

Routing Lookups with CIDR

[Diagram: the table holds 192.2.0/22 -> R2, 192.2.2/24 -> R3, and 200.11.0/22 -> R4; address 192.2.2.100 matches both 192.2.0/22 and 192.2.2/24, while 192.2.0.1 matches only 192.2.0/22 and 200.11.0.33 matches only 200.11.0/22.]

Find the most specific route, or the longest matching prefix among all the prefixes matching the destination address of an incoming packet

29

Longest Prefix Match is Harder than Exact Match

• The destination address of an arriving packet does not carry with it the information to determine the length of the longest matching prefix

• Hence, one needs to search among the space of all prefix lengths; as well as the space of all prefixes of a given length

30

Metrics for Lookup Algorithms

• Speed
• Storage requirements
• Low update time
• Ability to handle large routing tables
• Flexibility in implementation
• Low preprocessing time

31

Maximum Bandwidth per Installed Fiber

[Graph: single-fiber capacity (Gb/s) vs. year, 1980-2005, on a log scale from 0.01 to 100,000; capacity doubles every year. Source: Lucent.]

32

Maximum Bandwidth per Router Port, and Lookup Performance Required

Year     Line   Linerate (Gbps)  40B (Mpps)  84B (Mpps)  354B (Mpps)
1997-98  OC3    0.155            0.48        0.23        0.054
1998-99  OC12   0.622            1.94        0.92        0.22
1999-00  OC48   2.5              7.81        3.72        0.88
2000-01  OC192  10.0             31.25       14.88       3.53
2002-03  OC768  40.0             125         59.52       14.12
         1GE    1.0              3.13        1.49        0.35
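The Mpps columns follow directly from the linerate divided by the packet size in bits; a one-line sketch:

```python
def mpps(linerate_gbps: float, pkt_bytes: int) -> float:
    # packets per second = bits per second / bits per packet
    return linerate_gbps * 1e9 / (8 * pkt_bytes) / 1e6

print(round(mpps(0.155, 40), 2))  # OC3 with 40-byte packets -> 0.48
```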

33

Size of Routing Table?

• Currently, 85K entries
• At 25K per year, 230-256K prefixes for the next 5 years
• Decreasing costs of transmission may increase the rate of routing table growth
• At 50K per year, need 350-400K prefixes for the next 5 years

34

Routing Update Rate?

• Currently a peak of a few hundred BGP updates per second
• Hence, 1K per second is a must
• 5-10K updates/second seems to be safe
• BGP limitations may be a bottleneck first
• Updates should be atomic, and should interfere little with normal lookups

35

Routing Lookups: Outline

• Background and problem definition

• Lookup schemes
• Comparative evaluation

36

Example Forwarding Table (5-bit Prefixes)

Prefix Next-hop

P1 111* H1

P2 10* H2

P3 1010* H3

P4 10101 H4

37

Linear Search

• Keep prefixes in a linked list
• O(N) storage, O(N) lookup time, O(1) update complexity
• Improve average time by keeping the linked list sorted in order of prefix lengths
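A minimal sketch of linear-search longest prefix match over the example 5-bit table, with the list sorted by decreasing prefix length so that the first match is the longest:

```python
# Example 5-bit forwarding table, longest prefixes first.
TABLE = [("10101", "H4"), ("1010", "H3"), ("111", "H1"), ("10", "H2")]

def lookup(addr):
    # O(N) scan; the first matching prefix is the longest match.
    for prefix, nexthop in TABLE:
        if addr.startswith(prefix):
            return nexthop
    return None

print(lookup("10111"))  # matches P2 = 10* -> H2
```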

38

Caching Addresses

[Diagram: a CPU with buffer memory on the slow path; linecards with DMA, MAC, and local buffer memory on the fast path. Cached destination addresses are resolved on the fast path; cache misses go to the CPU.]

39

Caching Addresses

Advantages
• Increased average lookup performance

Disadvantages
• Decreased locality in backbone traffic
• Cache size
• Cache management overhead
• Hardware implementation difficult

40

Radix Trie

P1 = 111* (H1), P2 = 10* (H2), P3 = 1010* (H3), P4 = 10101 (H4)

[Diagram: a binary trie over these prefixes; each trie node holds a next-hop pointer (if a prefix ends there) plus left and right child pointers. Lookup of 10111 walks bits 1-0-1-1-1, remembering the last prefix passed (P2). Adding P5 = 1110* creates a single new node off the P1 path.]

41

Radix Trie

• W-bit prefixes: O(W) lookup, O(NW) storage, and O(W) update complexity

Advantages
• Simplicity
• Extensible to wider fields

Disadvantages
• Worst-case lookup slow
• Wastage of storage space in long non-branching chains
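The binary trie above can be sketched in a few lines; this is a toy illustration of the structure, not an optimized implementation:

```python
class Node:
    __slots__ = ("left", "right", "nexthop")
    def __init__(self):
        self.left = self.right = self.nexthop = None

def insert(root, prefix, nexthop):
    # Walk/create one node per prefix bit; mark the final node.
    node = root
    for bit in prefix:
        attr = "left" if bit == "0" else "right"
        if getattr(node, attr) is None:
            setattr(node, attr, Node())
        node = getattr(node, attr)
    node.nexthop = nexthop

def lookup(root, addr):
    # Walk the trie, remembering the longest prefix seen so far.
    best, node = None, root
    for bit in addr:
        node = node.left if bit == "0" else node.right
        if node is None:
            break
        if node.nexthop is not None:
            best = node.nexthop
    return best

root = Node()
for prefix, hop in [("111", "H1"), ("10", "H2"), ("1010", "H3"), ("10101", "H4")]:
    insert(root, prefix, hop)
print(lookup(root, "10111"))  # -> H2
```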

42

Leaf-pushed Binary Trie

[Diagram: leaf-pushed binary trie for the same table (P1 111* H1, P2 10* H2, P3 1010* H3, P4 10101 H4); next hops are pushed down to the leaves, so each node field is either a child pointer or a next hop, never both.]

43

PATRICIA

[Diagram: Patricia tree for the same table (P1 111* H1, P2 10* H2, P3 1010* H3, P4 10101 H4); each internal node stores the bit position to test next (2, 3, 5), and leaves store the prefixes. Skipped bits must be verified at the end; a mismatch forces backtracking, as in the lookup of 10111.]

44

PATRICIA

• W-bit prefixes: O(W^2) lookup, O(N) storage, and O(W) update complexity

Advantages
• Decreased storage
• Extensible to wider fields

Disadvantages
• Worst-case lookup slow
• Backtracking makes implementation complex

45

Path-compressed Tree

[Diagram: path-compressed tree for the same table (P1 111* H1, P2 10* H2, P3 1010* H3, P4 10101 H4); each node stores a bit position, a variable-length bitstring for the skipped path, a next hop (if a prefix is present), and left/right pointers, e.g. (10, P2, 4) and (1010, P3, 5). Lookup of 10111 shown.]

46

Path-compressed Tree

• W-bit prefixes: O(W) lookup, O(N) storage, and O(W) update complexity

Advantages
• Decreased storage

Disadvantages
• Worst-case lookup slow

47

Early Lookup Schemes

• BSD Unix [sklower91]: Patricia, expected lookup time = 1.44 log N

• Dynamic prefix trie [doeringer96]: Patricia variant, complex insertion/deletion; 40K entries consumed 2 MB at 0.3-0.5 Mpps

48

Multi-bit Tries

Binary trie: depth = W, degree = 2, stride = 1 bit

Multi-ary trie: depth = W/k, degree = 2^k, stride = k bits

49

Prefix Expansion with Multi-bit Tries

If the stride is k bits, prefix lengths that are not a multiple of k need to be expanded.

E.g., k = 2:

Prefix  Expanded prefixes
0*      00*, 01*
11*     11*

Maximum number of expanded prefixes corresponding to one non-expanded prefix = 2^(k-1)
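Prefix expansion is mechanical: pad the prefix length up to the next multiple of k by enumerating all completions of the missing bits. A sketch (function name illustrative):

```python
def expand(prefix: str, k: int) -> list:
    # Number of bits needed to reach the next multiple of k.
    pad = (-len(prefix)) % k
    if pad == 0:
        return [prefix]
    # Enumerate all 2^pad completions of the missing bits.
    return [prefix + format(i, f"0{pad}b") for i in range(2 ** pad)]

print(expand("0", 2))   # ['00', '01']
print(expand("11", 2))  # ['11']
```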

50

Four-ary Trie (k=2)

[Diagram: four-ary trie (stride k = 2) for the same table after expansion; each node holds four child pointers (ptr00, ptr01, ptr10, ptr11) and a next-hop pointer per entry. P1 expands to P11 and P12; P4 expands to P41 and P42. Lookup of 10111 consumes two bits per step.]

51

Compressed Trie (k=8)

[Diagram: 8-8-8-8 split of the 32-bit address across four trie levels L8, L16, L24, L32. Only 4 memory accesses!]

52

Prefix Expansion Increases Storage Consumption

• Replication of the next-hop pointer
• Greater number of unused (null) pointers in a node

Time ~ W/k
Storage ~ (NW/k) * 2^(k-1)

53

Generalization: Different Strides at Each Trie Level

• 16-8-8 split• 4-10-10-8 split• 24-8 split• 21-3-8 split

54

Choice of Strides: Controlled Prefix Expansion [Sri98]

Given a forwarding table and a desired worst-case number of memory accesses (i.e., maximum tree depth D), a dynamic programming algorithm computes the optimal sequence of strides that minimizes the storage requirements; it runs in O(W^2 D) time.

Advantages
• Optimal storage under these constraints

Disadvantages
• Updates lead to sub-optimality anyway
• Hardware implementation difficult

55

Further Generalization: Different Stride at Each

Node [Sri98]

Given a forwarding table and a desired number of memory accesses in the worst case (i.e., maximum tree depth, D)

A dynamic programming algorithm to compute the optimal stride at each node that minimizes the storage requirements: runs in O(N W^2 D) time

56

Stride Optimization : Implementation Results

                Two levels       Three levels
Fixed-stride    49 MB, 1 ms      1.8 MB, 1 ms
Varying-stride  1.6 MB, 130 ms   0.57 MB, 871 ms

(38816 prefixes, 300 MHz P-II)

57

Lulea Algorithm [lulea98]

[Diagram: the trie is split 16-8-8 into levels L16, L24, and L32.]

58

Lulea Algorithm

[Diagram: one level of the 16-8-8 split represented as the bitmap 1000101110001111, one bit per trie entry.]

59

Lulea Algorithm

[Diagram: the bitmap is divided into 8-bit chunks (10001010, 11100010, 10000010, 10110100, 11000000) stored in a codeword array with offsets (R1,0; R2,3; R3,7; R4,9; R5,0); a base index array (0, 13) and a pointer array (P1-P4) then locate the next-hop pointers without storing nulls.]

60

Lulea Algorithm

33K entries: 160 KB, average 2 Mpps

Advantages
• Extremely small data structure: can fit in L1/L2 cache

Disadvantages
• Scalability to larger tables?
• Incremental updates not supported

61

Binary Search on Trie Levels [wald98]

[Diagram: one hash table per prefix length (8, 12, 16, 22), searched by binary search over the lengths, with markers left by longer prefixes to guide the search.

Example prefixes: 10/8, 10.1/16, 10.1.10/22, 10.1.32/22, 10.2.64/22.
Hash table contents: 10 at length 8; 10.1, 10.2 at length 16; 10.1.10, 10.1.32, 10.2.64 at length 22.
Example addresses: 10.1.10.4, 10.2.3.9.]

63

Binary Search on Trie Levels

33K entries: 1.4 MB, 1.2-2.2 Mpps

Advantages
• Scales nicely to IPv6

Disadvantages
• Multiple hashed memory accesses
• Incremental updates complex
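A toy sketch of binary search on prefix lengths: one hash table per length, probed by binary search. In the full scheme [wald98], precomputed markers ensure a miss/hit correctly steers the search; this sketch omits markers, which is safe only for tiny tables like this one:

```python
# One hash table per prefix length; keys are the first l address bits.
TABLES = {
    8:  {"00001010": "10/8"},
    16: {"0000101000000001": "10.1/16"},
}

def addr_bits(addr):
    return "".join(format(int(x), "08b") for x in addr.split("."))

def lookup(addr):
    bits, best = addr_bits(addr), None
    lengths = sorted(TABLES)
    lo, hi = 0, len(lengths) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        hit = TABLES[lengths[mid]].get(bits[:lengths[mid]])
        if hit is not None:
            best = hit
            lo = mid + 1   # a hit means: try longer prefix lengths
        else:
            hi = mid - 1   # a miss means: only shorter lengths can match
    return best

print(lookup("10.1.10.4"))  # -> '10.1/16'
```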

64

Binary Search on Prefix Intervals [lampson98]

Prefix  Interval
P1 *    0000-1111
P2 00*  0000-0011
P3 1*   1000-1111
P4 1101 1101-1101
P5 001* 0010-0011

[Number line from 0000 to 1111: the prefix endpoints partition the line into intervals I1-I6, each associated with its deepest covering prefix.]

65

Alphabetic Tree

[Diagram: a binary search tree over the interval endpoints (0001, 0011, 0111, 1100, 1101); comparing the address against the endpoints locates its interval I1-I6 on the same number line.]

66

Multiway Search on Intervals

38K entries: 0.95 MB, 2.1 Mpps

Advantages
• Space is O(N)

Disadvantages
• Incremental updates complex
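Once the intervals and their deepest covering prefixes are precomputed, a lookup is one binary search. A sketch over the example 4-bit table, assuming the interval-to-prefix mapping derived from it (0000-0001 -> P2, 0010-0011 -> P5, 0100-0111 -> P1, 1000-1100 -> P3, 1101 -> P4, 1110-1111 -> P3):

```python
from bisect import bisect_right

# Interval start points (inclusive) and the prefix owning each interval.
STARTS = [0b0000, 0b0010, 0b0100, 0b1000, 0b1101, 0b1110]
OWNER  = ["P2",   "P5",   "P1",   "P3",   "P4",   "P3"]

def lookup(addr: int) -> str:
    # Rightmost interval start <= addr gives the containing interval.
    return OWNER[bisect_right(STARTS, addr) - 1]

print(lookup(0b0101))  # inside 0100-0111 -> 'P1'
print(lookup(0b1101))  # -> 'P4'
```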

67

Depth-constrained Near-optimal Alphabetic Tree

• Redraw the binary search tree based on the probability of access of routing table entries:
– Minimize average lookup time
– But keep worst-case lookup time bounded

40% improvement in lookup time with a small relaxation in worst-case lookup time.

68

Routing Lookups in Hardware [gupta98]

[Histogram: number of prefixes vs. prefix length, MAE-EAST routing table, April 11, 2000 (source: www.merit.edu); the vast majority of prefixes are 24 bits or shorter.]

69

Routing Lookups in Hardware

[Diagram: for prefixes up to 24 bits, the top 24 bits of the address (e.g. 142.19.6 from 142.19.6.14) directly index a 2^24 = 16M-entry table; each entry holds a flag bit and a next hop.]

Routing Lookups in Hardware

[Diagram: if the first table's flag bit is 1, the entry holds the next hop directly. If it is 0, the entry holds a pointer (base) into a second table for prefixes longer than 24 bits; the remaining 8 bits of the address (e.g. 14 from 128.3.72.14) are the offset into a 2^8-entry block of next hops.]
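A sketch of this two-table (24 + 8) direct-lookup scheme, assuming routes are inserted in order of increasing prefix length so longer prefixes overwrite shorter ones; Python dicts and lists stand in for the DRAM arrays, and the names (tbl24, add_route) are illustrative:

```python
tbl24 = {}  # top-24-bit index -> ("hop", next_hop) or ("ptr", block)

def ip(s):
    a, b, c, d = map(int, s.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def add_route(prefix_bits, next_hop):
    if len(prefix_bits) <= 24:
        pad = 24 - len(prefix_bits)
        base = int(prefix_bits, 2) << pad
        for i in range(1 << pad):          # fill every covered /24 slot
            tbl24[base + i] = ("hop", next_hop)
    else:
        top = int(prefix_bits[:24], 2)
        entry = tbl24.get(top)
        if entry is None or entry[0] == "hop":
            # Allocate a 256-entry second-level block, seeded with the
            # shorter covering prefix's next hop (if any).
            tbl24[top] = ("ptr", [entry[1] if entry else None] * 256)
        block = tbl24[top][1]
        pad = 32 - len(prefix_bits)
        lo = int(prefix_bits[24:], 2) << pad
        for i in range(1 << pad):
            block[lo + i] = next_hop

def lookup(addr):
    kind, val = tbl24.get(addr >> 8, ("hop", None))
    return val if kind == "hop" else val[addr & 0xFF]

add_route(format(ip("142.19.6.0") >> 8, "024b"), "H1")    # 142.19.6/24
add_route(format(ip("142.19.6.128") >> 7, "025b"), "H2")  # 142.19.6.128/25
print(lookup(ip("142.19.6.14")), lookup(ip("142.19.6.200")))  # H1 H2
```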

71

Routing Lookups in Hardware

[Diagram: general n + m split: a 2^n-entry first table covers prefixes up to n bits; entries i, j for ranges with longer prefixes point into 2^m-entry second-level blocks covering prefixes up to n + m bits, whose entries hold the next hop.]

Routing Lookups in Hardware

Various compression schemes can be employed to decrease the storage requirements: e.g. employ carefully chosen variable length strides, bitmap compression etc.

Advantages
• 20 Mpps with 50 ns DRAM, or 66 Mpps with e-DRAM
• Easy to implement in hardware

Disadvantages
• Large memory required (9-33 MB)
• Depends on prefix-length distribution

73

Content-addressable Memory (CAM)

• Fully associative memory
• Exact match operation in a single clock cycle: parallel compare

74

Lookups with Ternary-CAM

[Diagram: a TCAM memory array holds prefixes P32 (longest) down to P8 in decreasing-length order; the destination address is compared against all entries in parallel, a priority encoder selects the first (longest) match, and the matching index reads the next hop out of an associated RAM.]

75

Lookups with TCAM

Advantages
• Fast: 15-20 ns

Disadvantages
• Expensive (and low density): 0.25 MB at 50 MHz costs $30-$75
• High power: 5-8 W
• Updates slow
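A software model of what the TCAM does in one cycle: each entry is a (value, mask) pair that matches when the masked key equals the value, and the priority encoder is simulated by returning the first match in decreasing-prefix-length order. This uses the example 5-bit table:

```python
# (value, mask, next_hop), stored longest-prefix-first.
ENTRIES = [
    (0b10101, 0b11111, "H4"),  # P4 = 10101
    (0b10100, 0b11110, "H3"),  # P3 = 1010*
    (0b11100, 0b11100, "H1"),  # P1 = 111*
    (0b10000, 0b11000, "H2"),  # P2 = 10*
]

def tcam_lookup(key: int):
    # All entries are compared "in parallel" in hardware; the priority
    # encoder picks the first match, simulated here by scan order.
    for value, mask, next_hop in ENTRIES:
        if key & mask == value:
            return next_hop
    return None

print(tcam_lookup(0b10111))  # -> 'H2'
```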

76

Updates with TCAM

[Diagram: TCAM entries sorted by decreasing prefix length (P32 ... P8) with empty space interspersed; inserting a prefix must preserve the length ordering.]

Issue: how to manage the free space [Hoti'00]

77

Routing Lookups: Outline

• Background and problem definition

• Lookup schemes
• Comparative evaluation

78

Performance Comparison: Complexity

Algorithm                     Lookup    Storage  Update
Binary trie                   W         NW       W
Patricia                      W^2       N        W
Path-compressed trie          W         N        W
Multi-ary trie                W/k       N*2^k    -
LC trie                       W         N        -
Lulea                         -         -        -
Binary search on trie levels  logW      NlogW    -
Binary search on intervals    log(2N)   N        -
TCAM                          1         N        W

79

Performance Comparison

Algorithm                                       Lookup (ns)  Storage (KB)
Patricia (BSD)                                  2500         3262
Multi-way fixed-stride optimal trie (3 levels)  298          1930
Multi-way fixed-stride optimal trie (5 levels)  428          660
LC trie                                         -            700
Lulea                                           409          160
Binary search on trie levels                    650          1600
6-way search on intervals                       490          950
Lookups with direct access                      15-60        9000-33000
TCAM                                            15-20        512

80

Routing Lookups: References

• [lulea98] A. Brodnik, S. Carlsson, M. Degermark, S. Pink. “Small Forwarding Tables for Fast Routing Lookups”, Sigcomm 1997, pp 3-14.

• [gupta98] P. Gupta, S. Lin, N. McKeown. “Routing lookups in hardware at memory access speeds”, Infocom 1998, pp 1241-1248, vol. 3.

• P. Gupta, B. Prabhakar, S. Boyd. “Near-optimal routing lookups with bounded worst case performance,” Proc. Infocom, March 2000

• [lampson98] B. Lampson, V. Srinivasan, G. Varghese. “ IP lookups using multiway and multicolumn search”, Infocom 1998, pp 1248-56, vol. 3.

81

Routing lookups : References (contd)

• [wald98] M. Waldvogel, G. Varghese, J. Turner, B. Plattner. “Scalable high speed IP routing lookups”, Sigcomm 1997, pp 25-36.

• [LC-trie] S. Nilsson, G. Karlsson. “Fast address lookup for Internet routers”, IFIP Intl Conf on Broadband Communications, Stuttgart, Germany, April 1-3, 1998.

• [sri98] V. Srinivasan, G.Varghese. “Fast IP lookups using controlled prefix expansion”, Sigmetrics, June 1998

• TCAM vendors: netlogicmicro.com, laratech.com, mosaid.com, sibercore.com

82

Packet Classification

83

Packet Classification: Outline

• Background and problem definition

• Classification schemes
• Comparative evaluation

84

Flow-aware vs Flow-unaware Routers (recap)

• Flow-aware router: keeps track of flows and performs similar processing on packets in a flow

• Flow-unaware router (packet-by-packet router): treats each incoming packet individually

85

Why Flow-aware Router?

ISPs want to provide differentiated services: the capability to distinguish and isolate traffic belonging to different flows, based on negotiated service agreements (rules or policies). Identifying which flow a packet belongs to is classification.

Routers then require additional mechanisms: admission control, resource reservation, per-flow queueing, fair scheduling, etc.

86

Need for Differentiated Services

[Diagram: ISP1, ISP2, and ISP3 meet at a NAP; enterprise networks E1 and E2 attach to ISP1; interfaces X, Y, Z marked.]

Service: Example
• Traffic shaping: ensure that ISP3 does not inject more than 50 Mbps of total traffic on interface X, of which no more than 10 Mbps is email traffic
• Packet filtering: deny all traffic from ISP2 (on interface X) destined to E2
• Policy routing: send all voice-over-IP traffic arriving from E1 (on interface Y) and destined to E2 via a separate ATM network

87

More Value added Services

• Differentiated services
– Regard traffic from Autonomous System #33 as `platinum grade'

• Accounting and billing
– Treat all video traffic as highest priority and perform accounting for this type of traffic

• Committed Access Rate (rate limiting)
– Rate-limit WWW traffic from sub-interface #739 to 10 Mbps

88

Multi-field Packet Classification

Given a classifier with N rules, find the action associated with the highest-priority rule matching an incoming packet.

Example: packet (5.168.3.32, 152.133.171.71, ..., TCP)

        Field 1     Field 2       ...  Field k  Action
Rule 1  5.3.90/21   2.13.8.11/32  ...  UDP      A1
Rule 2  5.168.3/24  152.133/16    ...  TCP      A2
...     ...         ...           ...  ...      ...
Rule N  5.168/16    152/8         ...  ANY      AN

(This packet matches both Rule 2 and Rule N; Rule 2, the higher-priority match, wins.)

89

Packet Header Fields for Classification

[Packet layout, in order of transmission: MAC header (L2-DA, L2-SA), network-layer header (L3-SA, L3-DA, L3-PROT), transport-layer header (L4-SP, L4-DP, L4-PROT), then payload.]

DA = destination address, SA = source address, PROT = protocol, SP = source port, DP = destination port
L2 = layer 2 (e.g., Ethernet), L3 = layer 3 (e.g., IP), L4 = layer 4 (e.g., TCP)

90

Flow-aware Router: Basic Architectural Components

[Diagram: the control plane runs routing, resource reservation, admission control, and SLAs; the per-packet datapath performs routing lookup, packet classification, special processing, switching, and scheduling.]

91

Packet Classification

[Diagram: the incoming packet's header is matched by the forwarding engine against a classifier (policy database) of (predicate, action) rules; the matching rule's action is applied.]

92

Packet Classification: Problem Definition

Given a classifier C with N rules Rj, 1 <= j <= N, where Rj consists of three entities:

1) A regular expression Rj[i], 1 <= i <= d, on each of the d header fields,

2) A number pri(Rj), indicating the priority of the rule in the classifier, and

3) An action, referred to as action(Rj).

For an incoming packet P with the header considered as a d-tuple of points (P1, P2, ..., Pd), the d-dimensional packet classification problem is to find the rule Rm with the highest priority among all the rules Rj matching the d-tuple; i.e., pri(Rm) > pri(Rj) for all j != m, 1 <= j <= N, such that Pi matches Rj[i], 1 <= i <= d. We call rule Rm the best matching rule for packet P.
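The definition above can be rendered directly as a (slow, O(Nd)) reference implementation. Here each field predicate is a (prefix, length, width) triple on integers, a toy stand-in for the general "regular expressions" of the definition; all names are illustrative:

```python
def field_matches(value, pred):
    prefix, plen, width = pred
    # A zero-length prefix matches everything (wildcard).
    return (value >> (width - plen)) == (prefix >> (width - plen)) if plen else True

def classify(packet, rules):
    # rules: list of (priority, [pred per field], action)
    best = None
    for pri, preds, action in rules:
        if all(field_matches(v, p) for v, p in zip(packet, preds)):
            if best is None or pri > best[0]:
                best = (pri, action)
    return best[1] if best else None

rules = [
    (2, [(0b10100000, 3, 8), (0b11000000, 2, 8)], "A2"),  # (101*, 11*)
    (1, [(0b10000000, 1, 8), (0b00000000, 0, 8)], "A1"),  # (1*, *)
]
print(classify((0b10110000, 0b11010101), rules))  # both match; A2 wins
```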

93

Example 4D classifier

Rule  L3-DA                               L3-SA                                L4-DP        L4-PROT  Action
R1    152.163.190.69/255.255.255.255      152.163.80.11/255.255.255.255        *            *        Deny
R2    152.168.3/255.255.255               152.163.200.157/255.255.255.255      eq www       udp      Deny
R3    152.168.3/255.255.255               152.163.200.157/255.255.255.255      range 20-21  udp      Permit
R4    152.168.3/255.255.255               152.163.200.157/255.255.255.255      eq www       tcp      Deny
R5    *                                   *                                    *            *        Deny

94

Example Classification Results

Pkt  L3-DA           L3-SA            L4-DP  L4-PROT  Rule, Action
P1   152.163.190.69  152.163.80.11    www    tcp      R1, Deny
P2   152.168.3.21    152.163.200.157  www    udp      R2, Deny

95

Classification is a Generalization of Lookup

• Classifier = routing table
• One dimension (destination address)
• Rule = routing table entry
• Regular expression = prefix
• Action = (next-hop-address, port)
• Priority = prefix-length

96

Metrics for Classification Algorithms

• Speed
• Storage requirements
• Low update time
• Ability to handle large classifiers
• Flexibility in implementation
• Low preprocessing time
• Scalability in the number of header fields
• Flexibility in rule specification

97

Size of Classifier?

• Microflow recognition: 128K-1M flows in a metro/edge router
• Firewall applications: 8-16K rules
• Wildcarded filters: 16-128K rules
• Depends heavily on where your box will be deployed

98

Packet Classification: Outline

• Background and problem definition

• Classification schemes
• Comparative evaluation

99

Example Classifier

Rule  Destination Address  Source Address
R1    0*                   10*
R2    0*                   01*
R3    0*                   1*
R4    00*                  1*
R5    00*                  11*
R6    10*                  1*
R7    *                    00*

100

Set-pruning Tries [Tsuchiya, Sri98]

[Diagram: a trie on the destination-address (DA) dimension; each DA prefix node stores its own source-address (SA) trie containing every rule whose DA prefix matches that node, so rules are replicated across SA tries (e.g. R3's and R7's entries are copied into 00*'s trie).]

O(N^2) memory

101

Grid-of-Tries [Sri98]

[Diagram: each DA-trie node stores only its own rules' SA trie, with no replication; a lookup must descend the SA trie of every matching DA prefix.]

O(NW) memory, O(W^2) lookup

102

Grid-of-Tries [Sri98]

[Diagram: the same structure augmented with switch pointers that jump from one SA trie into the next-longer DA prefix's SA trie without restarting from its root.]

O(NW) memory, O(2W) lookup

103

Grid-of-Tries

Advantages
• Good solution for two dimensions

Disadvantages
• Static solution
• Not easily extensible to more than two dimensions

20K entries: 2 MB, 9 memory accesses (with expansion)

104

Geometric Interpretation in 2D

[Diagram: each rule is a rectangle in the (dimension 1, dimension 2) plane, e.g. (128.16.46.23, *) is a line and (144.24/16, 64/24) is a rectangle; packets P1 and P2 are points, and classification is point location among the overlapping rectangles R1-R7.]

105

Bitmap-intersection [Lak98]

[Diagram: in each dimension, the rule projections partition the axis into intervals; each interval stores an N-bit bitmap of the rules that overlap it (e.g. 1100, 1011 for rules R1-R4). A lookup finds the packet's interval in each dimension and ANDs the bitmaps; the first set bit is the highest-priority matching rule.]

106

Bitmap-intersection

Advantages
• Good solution for multiple dimensions, for small classifiers

Disadvantages
• Static solution
• Large memory bandwidth (scales linearly in N)
• Large amount of memory (scales quadratically in N)
• Hardware-optimized

512 rules: 1 Mpps with a single FPGA (33 MHz) and five 1 Mb SRAM chips

107

2D classification [Lak98]

[Diagram: rules R1-R7 drawn as rectangles with prefixes on one axis and ranges on the other; rules with the same prefix length (e.g. lengths 3 and 4) are grouped, and P1 is a query point.]

2D Classification [Lak98]: Preprocessing

• Store the prefixes in a trie
• With each prefix, store the set of intervals that form a rectangle with that prefix as the other side
• Store the intervals as a set of non-overlapping disjoint intervals

109

2D Classification [Lak98]: Lookup

• For each prefix length:
– Find the prefix matching the incoming point, and the set of non-overlapping intervals associated with that prefix
– Search for the non-overlapping interval that contains the point

• Repeat for all prefix lengths

110

2D Classification [Lak98]: Complexity

• Lookups: O(W log N) with N two-dimensional rules
– O(W + log N) using fractional cascading

• Space: O(N)
• Static data structure

111

Crossproducting [Sri98]

[Diagram: the rule projections cut dimension 1 into intervals 1-9 and dimension 2 into intervals 1-6; each crossproduct cell, e.g. (1,3) for P1 or (8,4) for P2, is precomputed to store its best matching rule.]

112

Crossproducting

Advantages
• Fast accesses
• Suitable for multiple fields

Disadvantages
• Large amount of memory
• Need caching for bigger classifiers (> 50 rules)

50 rules: 1.5 MB; need caching (on-demand crossproducting) for bigger classifiers

Need: d 1-D lookups + 1 memory access; O(N^d) space
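A small sketch of crossproducting for two range fields: cut each dimension at every rule boundary, precompute the best matching rule for every (interval, interval) cell, then classify with d one-dimensional searches plus one table access. The rule format and names here are illustrative:

```python
from bisect import bisect_right
from itertools import product

# Each rule: ((lo1, hi1), (lo2, hi2), priority); higher priority wins.
RULES = [((0, 7), (4, 15), 1), ((4, 15), (0, 7), 2)]

def build(rules):
    # Cut points per dimension: every range start, and one past every end.
    cuts = [sorted({0} | {v for r in rules for v in (r[d][0], r[d][1] + 1)})
            for d in range(2)]
    table = {}
    for cell in product(range(len(cuts[0])), range(len(cuts[1]))):
        point = (cuts[0][cell[0]], cuts[1][cell[1]])  # cell's lower corner
        matches = [r for r in rules
                   if all(r[d][0] <= point[d] <= r[d][1] for d in range(2))]
        if matches:
            table[cell] = max(matches, key=lambda r: r[2])
    return cuts, table

def classify(cuts, table, p):
    # One binary search per dimension, then one table access.
    cell = tuple(bisect_right(cuts[d], p[d]) - 1 for d in range(2))
    return table.get(cell)

cuts, table = build(RULES)
print(classify(cuts, table, (5, 5))[2])  # both rules match; priority 2 wins
```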

113

Space-time Tradeoff

Point location among N non-overlapping regions in d dimensions requires:
• either O(log N) time with O(N^d) space, or
• O(log^(d-1) N) time with O(N) space

Need help: exploit the structure in real-life classifiers.

114

Recursive Flow Classification [Gupta99]

Observations:

• Difficult to achieve both high classification rate and reasonable storage in the worst case
• Real classifiers exhibit structure and redundancy
• A practical scheme could exploit this structure and redundancy

115

RFC: Classifier Dataset

• 793 classifiers from 101 ISP and enterprise networks with a total of 41505 rules.

• 40 classifiers: more than 100 rules. Biggest classifier had 1733 rules.

• Maximum of 4 fields per rule: source IP address, destination IP address, protocol and destination port number.

116

Structure of the Classifiers

[Diagram: rules R1, R2, R3 drawn as non-overlapping rectangles: 4 distinct regions.]

117

Structure of the Classifiers

[Diagram: the same rules R1, R2, R3 drawn overlapping, creating regions {R1,R2}, {R2,R3}, and {R1,R2,R3}: 7 distinct regions.]

Dataset: a 1733-rule classifier produced only 4316 distinct regions (the worst case is ~10^13!)

118

Recursive Flow Classification

One-step: map the 2^S = 2^128 possible headers directly to the 2^T = 2^12 possible actions, which is infeasible.

Multi-step: reduce in phases, 2^128 -> 2^64 -> 2^32 -> 2^12.

119

Chunking of a Packet

[Diagram: the packet header fields (source L3 address, destination L3 address, L4 protocol and flags, source L4 port, destination L4 port, type of service) are split into 16-bit chunks, Chunk #0 through Chunk #7.]

120

Packet Flow

[Diagram: Phase 0 indexes a table with each 16-bit header chunk; later phases combine the previous phase's results through further index tables, reducing 128 bits to 64, 32, then 16 over Phases 0-3, until the final 14-bit result selects the action.]

121

Choice of Reduction Tree

[Diagram: two reduction trees over chunks 0-5. A tree with P = 3 phases needs 10 memory accesses; a tree with P = 4 phases needs 11 memory accesses.]

122

RFC: Storage Requirements

[Graph: RFC memory consumption (MB) vs. number of rules.]

123

RFC: Classification Time

• Pipelined hardware: 30 Mpps (worst case OC192) using two 4 Mb SRAMs and two 64 Mb SDRAMs at 125 MHz.

• Software (3 phases): 1 Mpps in the worst case and 1.4-1.7 Mpps in the average case (average case OC48). [Performance measured using the Intel VTune simulator on a Windows NT platform.]

124

RFC: Pros and Cons

Advantages
• Exploits structure of real-life classifiers
• Suitable for multiple fields
• Supports non-contiguous masks
• Fast accesses

Disadvantages
• Depends on structure of classifiers
• Large pre-processing time
• Incremental updates slow
• Large worst-case storage requirements

125

Hierarchical Intelligent Cuttings (HiCuts)

[Gupta99]

Observations:

• No single good solution for all cases, but real classifiers have structure
• Perhaps an algorithm can exploit this structure: a heuristic hybrid scheme

126

HiCuts: Basic Idea

[Diagram: a decision tree recursively cuts the rule set {R1, R2, R3, ..., Rn} along geometric partitions until each leaf holds at most binth rules, e.g. leaves {R1, R3, R4}, {R1, R2, R5}, {R8, Rn}.]

binth (bin threshold) = maximum subset size = 3

127

Heuristics to Exploit Classifier Structure

• Picking a suitable dimension to cut across:
– Minimize the maximum number of rules in any one partition, or
– Maximize the entropy of the distribution of rules across the partitions, or
– Maximize the number of different specifications in one dimension

• Picking the suitable number of partitions (cuts) to be made:
– Affects the space consumed and the classification time; tuned by a parameter, spfac

128

HiCuts:Number of Memory Accesses

[Graph: HiCuts worst-case number of memory accesses vs. number of rules (log scale), binth = 8, spfac = 4, compared against crossproducting.]

129

HiCuts: Storage Requirements

[Graph: HiCuts storage in KB (log scale) vs. number of rules (log scale), binth = 8, spfac = 4.]

130

Incremental Update Time

[Graph: HiCuts incremental update time in seconds (log scale) vs. number of rules (log scale); binth = 8, spfac = 4, 333 MHz P-II running Linux.]

131

HiCuts: Pros and Cons

Advantages
• Exploits structure of real-life classifiers
• Adapts the data structure
• Suitable for multiple fields
• Supports incremental updates

Disadvantages
• Depends on structure of classifiers
• Large pre-processing time
• Large worst-case storage requirements

132

Tuple Space Search [Suri99]

Decompose the classification problem into a number of exact-match problems, then use hashing.

Rule  Prefixes      Tuple
R1    (01*, 111*)   [2,3]
R2    (11*, 010*)   [2,3]
R3    (1*, *)       [1,0]

Use one hash table for each tuple; search all hash tables sequentially.
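A sketch of the basic scheme for the three rules above: rules with the same combination of prefix lengths (a tuple) share one exact-match hash table, keyed by the concatenated prefix bits, and a lookup probes every table:

```python
# Rules as (field1 prefix, field2 prefix, name); '' is a wildcard.
RULES = [("01", "111", "R1"), ("11", "010", "R2"), ("1", "", "R3")]

# One hash table per tuple (len1, len2).
tables = {}
for f1, f2, name in RULES:
    tables.setdefault((len(f1), len(f2)), {})[(f1, f2)] = name

def classify(a1: str, a2: str):
    # Probe every tuple's table with the address bits masked to the
    # tuple's lengths; exact match in each table.
    hits = []
    for (l1, l2), tbl in tables.items():
        rule = tbl.get((a1[:l1], a2[:l2]))
        if rule:
            hits.append(rule)
    return hits

print(classify("0110", "1110"))  # -> ['R1']
```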

133

Improved TSS via Precomputation

• Extension of "binary search on trie levels"
• If [2,3,3] succeeds, no need to search, e.g., [4,5,6]
• If [2,3,3] fails, no need to search, e.g., [1,2,1]
• Search the tuple space intelligently (a decision tree on the tuple space)

134

TSS: Pros and Cons

Advantages
• Suitable for multiple fields
• Supports incremental updates
• Fast classification and updates on average

Disadvantages
• Large pre-processing time
• Multiple hashed-memory accesses

135

Area-based Quad Tree [Buddhikot99]

[Diagram: the 2D space is recursively divided into quadrants 00, 01, 10, 11; each quad-tree node stores its crossing filter set, the rules that completely span the node in one dimension (e.g. {R1,R2}, {R5}, {R3,R4}); P1 is a query point.]

Lookup: two 1-D longest prefix match operations at every node on the path from the root to a leaf.

O(N) space; O(W log N) lookup time; O(W + log N) using fractional cascading.

136

AQT: Efficient Updates

[Diagram: partition prefixes into groups and do the pre-computation per group instead of per interval, so an update touches one group rather than every interval.]

O(aW) search and O(a N^(1/a)) updates, for a tunable parameter a.

137

2-D Classification Using FIS Tree [Feldmann00]

[Diagram: rules R1-R5 and query point P1; an FIS tree built over the x-axis projections of the rules.]

l levels; O(l * n^(1+1/l)) space; (l+1) 1-D lookups.

138

FIS Tree: Experimental Study

Number of rules  Levels in FIS tree  Storage space  Number of memory accesses
4-60K            2                   < 5 MB         < 15
~10^6            3                   < 100 MB       < 18

Rulesets constructed using netflow data from AT&T Worldnet. Experiments done using static 2-D FIS trees.

139

Ternary CAMs

Advantages
• Suitable for multiple fields
• Fast: 16-20 ns (50-66 Mpps)
• Simple to understand

Disadvantages
• Inflexible: range-to-prefix blowup
• Density: largest available in 2000 is 32K x 128 (but can be cascaded)
• Management software and on-chip logic: non-trivial complexity
• Power: 5-8 W
• Incremental updates: slow
• DRAM-based CAMs: higher density, but soft errors are a problem
• Cost: $30-$160 for 1 Mb

140

Range-to-prefix Blowup

Rule  Range   Maximal prefixes
R1    [3,11]  0011, 01**, 10**
R2    [2,7]   001*, 01**
R3    [4,11]  01**, 10**
R4    [4,7]   01**
R5    [1,14]  0001, 001*, 01**, 10**, 110*, 1110

Maximum memory blowup = factor of (2W-2)^d
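The maximal-prefix decomposition of a range can be computed greedily: repeatedly peel off the largest aligned power-of-two block that starts at the low end and stays inside the range. A sketch (function name illustrative; assumes 0 < lo <= hi):

```python
def range_to_prefixes(lo: int, hi: int, width: int) -> list:
    out = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo...
        size = lo & -lo
        # ...shrunk until it fits inside [lo, hi].
        while lo + size - 1 > hi:
            size >>= 1
        plen = width - size.bit_length() + 1
        out.append(format(lo >> (width - plen), f"0{plen}b")
                   + "*" * (width - plen))
        lo += size
    return out

print(range_to_prefixes(1, 14, 4))
# ['0001', '001*', '01**', '10**', '110*', '1110']
```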

141

Packet Classification: References

• [Lak98] T.V. Lakshman. D. Stiliadis. “High speed policy based packet forwarding using efficient multi-dimensional range matching”, Sigcomm 1998, pp 191-202

• [Sri98] V. Srinivasan, S. Suri, G. Varghese and M. Waldvogel. “Fast and scalable layer 4 switching”, Sigcomm 1998, pp 203-214

• [Suri99] V. Srinivasan, G. Varghese, S. Suri. “Fast packet classification using tuple space search”, Sigcomm 1999, pp 135-146

• [Gupta99] P. Gupta, N. McKeown, “Packet classification using hierarchical intelligent cuttings,” Hot Interconnects VII, 1999

142

Packet Classification: References (contd.)

• [Gupta99] P. Gupta, N. McKeown, “Packet classification on multiple fields,” Sigcomm 1999, pp 147-160

• [Buddhikot99] M. M. Buddhikot, S. Suri, and M. Waldvogel, “Space decomposition techniques for fast layer-4 switching,” Protocols for High Speed Networks, vol. 66, no. 6, pp 277-283, 1999

• [Feldmann00] A. Feldmann and S. Muthukrishnan, “Tradeoffs for packet classification,” Infocom 2000

• T. Woo, “A modular approach to packet classification: algorithms and results, “ Infocom 2000

143

Special Instances of Classification

• Multicast
– PIM-SM:
   • Longest prefix matching on the source and group address
   • Try (S,G), followed by (*,G), followed by (*,*,RP)
   • Check incoming interface
– DVMRP:
   • Incoming interface check followed by (S,G) lookup

• IPv6
– 128-bit destination address field

144

Implementation Choices Given Design Requirements

Disclaimer: These are my opinions

145

Design Requirement LU1

Requirements: 2.5 Gbps, 100K routes

Choices:
a) 2-4 TCAMs
b) On-chip logic with one external SDRAM chip (using multibit tries)
c) On-chip e-DRAM

146

Design Requirement LU2

Requirements: 10 Gbps, 256K routes

Choices:
a) 4-8 TCAMs
b) On-chip logic with 2-4 external SDRAM chips (using multibit tries)
c) On-chip e-DRAM

147

Design Requirement PC1

Requirements: 10 Gbps classification up to L4; 16-64K comparatively static 128-bit entries

Choices:
a) 1-4 TCAMs
b) On-chip logic with 2 external SDRAM and 2 SRAM chips (using RFC)
c) Off-chip SRAMs (using HiCuts)

148

Your Design Here

Requirements:

Choices:

149

Lookup/Classification Chip Vendors

• Switch-on
• Fastchip
• Agere
• Solidum
• Siliconaccess
• TCAM vendors: Netlogic, Lara, Sibercore, Mosaid, Klsi, etc.

150

Summary

• Both problems are well studied by now but increasing linerates and database sizes continue to present interesting opportunities

• Still need a high-speed (~OC192) dynamic, generic, multi-field classification algorithm for large number of (up to a million) rules

151

Thanks! I will appreciate direct feedback at pankaj@stanford.edu