Post on 13-Nov-2014
Layered Interval Codes for TCAM-based Classification
Anat Bremler Barr, Danny Hendler, David Hay, Boris Farber
04/08/23
Overview
• Introduction and Problem Statement
• Related Work
• Our Solution – Multi Layered Interval Code
− Layering
− Bit Allocation
− Encoding
− Software Simulation
• Results and Comparative Analysis
• Conclusions
Packet Classification Scheme
Rules Table
Src IP        Protocol  Src port  Dest IP      Dest port  Action
123.25.34.43  TCP       80        255.2.3.4    80         Allow
13.24.35.46   TCP       >1023     255.2.127.4  5556       Deny
16.32.223.14  UDP       50-20     255.2.3.4    50-70      Allow
22.2.3.4      TCP       21        255.2.3.4    21         Limit
255.2.3.4     ICMP      12-809    255.2.3.4    17-190     Log
[Figure labels: Extract Packet Header Data, Routing Table]
Ternary Content-Addressable Memory (TCAM) Technology
• Associative memory: parallel comparisons against all entries
• Fixed-width entries: 144 bits, of which 36 are spare bits
• Ternary digits: 0 / 1 / X (don’t care)
• Only first match is returned
TCAM:
0011101101010XX00X01001111XXXX
11X00X00001110X0X101000110XXXX
10XX010100X0XX0100011010X01000
001110XXXXXXXXXXXXXXXXXXXXXXX
...
1110XX010X01X0010101010X0XXXXX
Search key: 0011101101010000010100111110110
Match: 1
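The lookup semantics listed above (ternary digits, all entries compared, only the first match returned) can be sketched in software. This is a minimal illustration, not a model of the hardware: the function names and the 4-bit entries are made up for the example, and the sequential loop only stands in for the parallel comparison.

```python
def ternary_match(entry: str, key: str) -> bool:
    """True if every non-X digit of the entry equals the key bit at that position."""
    return len(entry) == len(key) and all(e in ("X", k) for e, k in zip(entry, key))


def tcam_lookup(entries, key):
    """Return the index of the first matching entry, or None if nothing matches.

    A real TCAM compares all entries in parallel; the loop is only a stand-in.
    """
    for index, entry in enumerate(entries):
        if ternary_match(entry, key):
            return index
    return None


# Hypothetical 4-bit entries, listed in priority order.
entries = ["001X", "0XXX", "1X10"]
print(tcam_lookup(entries, "0011"))  # both entry 0 and entry 1 match; the first wins
print(tcam_lookup(entries, "1111"))  # no entry matches
```

Because only the first match is reported, the order of the entries acts as the rule priority.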
Problem: TCAMs Are Not Well Suited for Range Representation
[Figure: a rule field can be matched against a key field by exact match (e.g. 001110110110110000000), by prefix match (e.g. 001*****************), or by range match (e.g. rule >1024, key 2012)]
A TCAM representation of a range rule takes many entries
Research goal: develop an efficient algorithm for constructing a compact representation of range rules (source/dest fields)
Encoding Schemes – How the Rules Are Encoded
• Database Dependent Scheme: an encoding scheme in which rules are coded differently depending on the database in which they occur.
• Database Independent Scheme: an encoding scheme in which rules are coded without regard to the database in which they occur.
Prefix Expansion (Database Independent)
Representing [1,6]
TCAM entries: 001, 01*, 10*, 110
Prefix expansion is inefficient
• A range over W bits may expand to 2W-2 entries
• For 2 range fields, a rule may expand to (2W-2)^2 entries
• Expansion factor of up to 6 on real-world databases
[Figure: binary tree over the 3-bit values 000 to 111, with the range [1,6] marked]
(Srinivasan 98)
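The expansion of [1,6] can be reproduced with a short sketch. The function name `range_to_prefixes` is hypothetical; the code repeatedly takes the largest aligned power-of-two block that starts at the low end of the range and still fits inside it, which is one common way to compute a prefix cover and not necessarily the procedure of the cited work.

```python
def range_to_prefixes(lo: int, hi: int, width: int):
    """Cover the closed range [lo, hi] of width-bit values with prefix entries."""
    prefixes = []
    while lo <= hi:
        # Largest block size to which lo is aligned (the whole space if lo == 0).
        size = lo & -lo if lo else 1 << width
        # Shrink the block until it fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        free = size.bit_length() - 1      # number of trailing don't-care bits
        fixed = width - free              # number of fixed leading bits
        head = format(lo >> free, f"0{fixed}b") if fixed else ""
        prefixes.append(head + "*" * free)
        lo += size
    return prefixes


print(range_to_prefixes(1, 6, 3))          # the [1,6] example over 3 bits
print(len(range_to_prefixes(1, 14, 4)))    # worst case for W = 4: 2W-2 entries
```

For W = 3 the range [1,6] is itself a worst case, since 2W-2 = 4 entries are needed.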
SRGE (Database Independent)
• Small ranges occur frequently and can be encoded efficiently using Gray code.
• Gray code is a binary numeral system in which two successive values differ in only one digit.
• The Gray code encoding scheme decreases the overall database expansion (number of entries).
(Bremler/Hendler 06)
Dec Gray Binary
0 000 000
1 001 001
2 011 010
3 010 011
4 110 100
5 111 101
6 101 110
7 100 111
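The table can be regenerated with the standard binary-reflected Gray code formula `n ^ (n >> 1)`. The sketch below also checks a small illustrative case chosen for this example (it does not come from the slides): under Gray code the range [1,2] is covered by the single ternary entry 0*1, while under plain binary the same pattern covers 1 and 3 instead.

```python
def to_gray(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def matches(pattern: str, bits: str) -> bool:
    """Ternary match: '*' matches either bit value."""
    return all(p in ("*", b) for p, b in zip(pattern, bits))


def covered(pattern: str, width: int = 3, encode=to_gray):
    """Values whose encoded form is matched by the ternary pattern."""
    return [n for n in range(2 ** width)
            if matches(pattern, format(encode(n), f"0{width}b"))]


for n in range(8):
    print(n, format(to_gray(n), "03b"), format(n, "03b"))   # Dec, Gray, Binary

print(covered("0*1"))                      # values covered under Gray code
print(covered("0*1", encode=lambda n: n))  # values covered under plain binary
```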
Basic Dependent Encoding
• Use the unused (extra) bits of the TCAM entry
• Divide the extra bits among the ranges
− A single extra bit is assigned to each selected range
− If range r is assigned bit i, then extra bit i is set to one and all the rest to don't care
• Cons: the solution does not scale beyond the number of extra bits!
• 36 bits versus 300+ ranges
(Liu 02)
Range  Encode  Search
R1     1****   10000
R2     *1***   01000
R3     **1**   00100
R4     ***1*   00010
R5     ****1   00001
Region Splitting (Database Dependent)
• Ranges in different regions are disjoint
• The region number is encoded by log(r) bits, where r is the number of regions. Each range within a region is encoded by a separate bit.
• Rule and range are encoded as: Region# | #range at region
• Total required bits: log(r) + the maximum number of ranges in a region
• A range that lies in multiple regions causes rule expansion
• Cons: looks at the geometry of the ranges, not at the number of entries needed to encode each range
(Liu 02)
[Figure: ranges R1 to R5 laid out over regions Reg1, Reg2, Reg3; region codes 01, 11, 10]
Range  Encoding
R1     011xxx, 111xxx
R2     01x1xx, 11x1xx
R3     11xx1x, 101xxx
R4     01xx1x
R5     01xxx1, 11xx1x
Database-Dependent Encoding (DRES)
• Key idea: allocate an extra bit to each commonly occurring range (all the rest are encoded by a database independent scheme)
• For a rule with such a range, set the assigned extra bit to 1 and set all other extra bits to X
• Cons: limited by the number of ranges (bits) that can be encoded; inefficient use of bits
Example: Source-port ≥ 1024
[Figure: TCAM entries 1 to 4, each extended with extra bits set to X; the entry 11010010101XXXXXXXXXXXXXXXXXX has its extra bit set to 1]
(Che 07)
Layered Interval Proposal (LIC)
[Figure: ranges a, b, c, d]
• N disjoint ranges can be encoded using log(N) bits
• One-to-one mapping between range and index
How can we utilize this key observation to achieve minimal database expansion?
LIC Overview
• Divide into levels (regions)
− Layer: represents a set of disjoint ranges.
− The ranges in layer $i$ can be encoded by $\log(l_i)$ bits, where $l_i$ is the size of layer $i$
− Encode range and search key by $\sum_{i=1}^{L} \log(l_i)$ bits
[Figure: ranges a to f split into layers L1, L2, L3; each code is the concatenation of the fields #range at L1, #range at L2, #range at L3, with example codes 1****, *01**, *10**, ***01, ***10, ***11]
The technique was mentioned in [Lunteren and Engbersen 2003], but no algorithm was given. We can extend the technique further to reduce space complexity.
General Problem
• Given a set of ranges (intervals),
• how can we efficiently divide them into sets of non-intersecting ranges,
• such that the resulting size of the encoding of the sets is minimal?
Interval Graph Definitions
• Interval Graph: the intersection graph of a set of intervals on the real line.
− It has one vertex for each interval in the set, and
− It has an edge between every pair of vertices corresponding to intervals that intersect.
• Interval Graph Coloring: an assignment of colors to the vertices of the graph such that no two adjacent vertices have the same color
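Both definitions can be made concrete with a small sketch. The helper names and the sample intervals are assumptions for illustration. The coloring shown is the classical greedy pass in order of left endpoint, which uses the fewest colors on an interval graph; note that using the fewest colors is not the same objective as minimizing the encoding size.

```python
def intersects(a, b):
    """Closed intervals (lo, hi) intersect if neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def interval_graph(intervals):
    """Edge set of the intersection graph; vertices are indices into `intervals`."""
    n = len(intervals)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if intersects(intervals[i], intervals[j])}


def color_intervals(intervals):
    """Greedy coloring in order of left endpoint: smallest color unused by neighbors."""
    colors = [None] * len(intervals)
    for i in sorted(range(len(intervals)), key=lambda i: intervals[i]):
        used = {colors[j] for j in range(len(intervals))
                if colors[j] is not None and intersects(intervals[i], intervals[j])}
        colors[i] = next(c for c in range(len(intervals)) if c not in used)
    return colors


# Hypothetical sample: five short intervals and two longer ones covering them.
intervals = [(0, 5), (6, 8), (9, 11), (12, 14), (15, 17), (6, 11), (12, 17)]
print(sorted(interval_graph(intervals)))
print(color_intervals(intervals))
```

Each color class is a set of pairwise disjoint intervals, which is exactly what a layer requires.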
Interval Graphs
Minimum Layered Interval Code(MLIC)
• Find a coloring of the vertices of the interval graph that minimizes the encoding size of the resulting color classes:
$$\min \sum_{i=1}^{|C|} \log_2(n_i + 1)$$
where $n_i$ is the number of vertices colored by color $i$
• Interval graph property: vertices with the same color correspond to non-intersecting intervals
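The objective can be evaluated for any candidate coloring once the sizes of its color classes are known. The sketch below assumes that each logarithm is rounded up to a whole number of bits, which the formula itself does not state; the function name and the sample class sizes are hypothetical.

```python
def encoding_size(class_sizes):
    """Sum over color classes of ceil(log2(n_i + 1)) bits.

    For a non-negative integer n, ceil(log2(n + 1)) equals n.bit_length(),
    which avoids floating-point logarithms. The +1 accounts for one extra
    code value per class beyond the n_i ranges.
    """
    return sum(n.bit_length() for n in class_sizes)


# Seven intervals colored with two colors (classes of 5 and 2 intervals),
# versus the same seven intervals spread over more, smaller classes.
print(encoding_size([5, 2]))
print(encoding_size([3, 2, 2]))
print(encoding_size([1] * 7))   # one interval per class: one bit each
```

Fewer, larger classes tend to be cheaper because the cost of a class grows only logarithmically with its size.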
Budgeted Minimum Layered Interval Code (BMLIC)
• Let G be a weighted interval graph.
• Find a subset of vertices V' to be colored such that the weight of the colored vertices is maximal and the encoding size is bounded from above by BitsNumber:
$$\text{Maximize } \sum_{v \in V'} w(v) \quad \text{such that} \quad \sum_{i=1}^{|C|} \log_2(n_i + 1) \le \text{BitsNumber}$$
where $n_i$ is the number of vertices colored by color $i$
Hardness of Problem
• Using ideas similar to those used for the chromatic sum problem on interval graphs, we proved that the MLIC and BMLIC problems are NP-Hard.
• We based our proof on a reduction from Circular Arc Coloring, which is known to be NP-Hard.
The Algorithm Stages
1. Layering Stage, in which the intervals are partitioned into layers, each a set of disjoint intervals.
2. Bits Allocation Stage, in which each layer is allocated a certain number of bits.
3. Encoding Stage, in which we obtain a search key for each entry in the domain.
Layering Stage
• The most important and challenging component of the algorithm.
• The layering divides the ranges into non-intersecting sets:
[Figure: a set of ranges partitioned into layers L1 to L5]
Maximum Size Independent Sets (MSIS)
• We find each maximum size independent set as a maximum size set of unweighted ranges (Bar-Noy et al. 98)
Iteratively find maximum size independent sets on the interval graph
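The iteration can be sketched as follows. The inner step uses the classical earliest-right-endpoint greedy rule, which yields a maximum size set of pairwise disjoint intervals; whether the authors used exactly this procedure is an assumption, as are the function names. The sample data are the seven intervals of the worked example in these slides, treated as closed intervals.

```python
def max_size_independent_set(ranges):
    """ranges: dict name -> (lo, hi). Greedy by earliest right endpoint."""
    chosen, last_end = [], None
    for name, (lo, hi) in sorted(ranges.items(), key=lambda item: item[1][1]):
        if last_end is None or lo > last_end:   # disjoint from everything chosen so far
            chosen.append(name)
            last_end = hi
    return chosen


def msis_layers(ranges):
    """Repeatedly peel off a maximum size independent set until no range is left."""
    remaining = dict(ranges)
    layers = []
    while remaining:
        layer = max_size_independent_set(remaining)
        layers.append(layer)
        for name in layer:
            del remaining[name]
    return layers


ranges = {"r1": (0, 5), "r2": (6, 8), "r3": (9, 11), "r4": (12, 14),
          "r5": (15, 17), "r6": (6, 11), "r7": (12, 17)}
print(msis_layers(ranges))
```

Each pass removes at least one range, so the loop always terminates.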
Maximum Size Colorable Sets (MSCS)
• Based on: Nicoloso et al. 99
• We proved that this is a 2-approximation algorithm for the problem!
• Conclusion: we proposed the best known approximation
Iteratively find maximum size i-colorable sets on the interval graph, for each i from 1 to the chromatic number of G
Weight Definition
• We define the weight of a range to be the number of redundant TCAM entries required to encode the range.
• Calculated as: (number of entries for encoding the range - 1) * number of rules in which the range occurs
• We strive to keep the number of entries for each range as small as possible.
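As a worked instance of the weight formula, write $e(r)$ for the number of entries needed to encode range $r$ and $m(r)$ for the number of rules in which it occurs (both symbols are introduced here only for illustration). Taking the range [1,6] over 3 bits, which needs 4 entries under prefix expansion, and assuming hypothetically that it occurs in 10 rules:

```latex
w(r) = \bigl(e(r) - 1\bigr) \cdot m(r), \qquad
w([1,6]) = (4 - 1) \cdot 10 = 30 .
```

A range that already fits in a single entry has weight 0, so it gains nothing from being given a layered code.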
Maximum Weight Independent Sets (MWIS)
• Same as MSIS, except that we iteratively find maximum weight independent sets
• We find each maximum weight independent set using the Weighted Interval Scheduling algorithm (Bar-Noy et al. 98)
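A maximum weight independent set of intervals can be found with the textbook weighted interval scheduling dynamic program. The sketch below is that textbook version under assumed names, not necessarily the exact variant the authors implemented. The sample data are the intervals and weights of the worked example in these slides.

```python
from bisect import bisect_left


def max_weight_independent_set(ranges):
    """ranges: dict name -> (lo, hi, weight), closed intervals. Returns chosen names."""
    items = sorted(ranges.items(), key=lambda item: item[1][1])   # by right endpoint
    ends = [hi for _, (_, hi, _) in items]
    best = [0] * (len(items) + 1)     # best[i]: max weight using only the first i items
    take, prev = [], []
    for i, (_, (lo, _, weight)) in enumerate(items):
        p = bisect_left(ends, lo)     # count of items ending strictly before lo
        prev.append(p)
        with_item = best[p] + weight
        take.append(with_item > best[i])
        best[i + 1] = max(with_item, best[i])
    chosen, i = [], len(items)
    while i > 0:                      # walk the decisions backwards
        if take[i - 1]:
            chosen.append(items[i - 1][0])
            i = prev[i - 1]
        else:
            i -= 1
    return sorted(chosen)


def mwis_layers(ranges):
    """Repeatedly peel off a maximum weight independent set."""
    remaining, layers = dict(ranges), []
    while remaining:
        layer = max_weight_independent_set(remaining)
        layers.append(layer)
        for name in layer:
            del remaining[name]
    return layers


ranges = {"r1": (0, 5, 10), "r2": (6, 8, 10), "r3": (9, 11, 1), "r4": (12, 14, 1),
          "r5": (15, 17, 1), "r6": (6, 11, 5), "r7": (12, 17, 10)}
print(mwis_layers(ranges))
```

On this input the first layer prefers the heavy range r7 over the two light ranges r4 and r5, so the weighted layering differs from the unweighted one.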
Maximum Weight Colorable Sets (MWCS)
• Same as MSCS, except that we find maximum weight k-colorable sets.
• Finding a maximum weight k-colorable set on an interval graph can be done in polynomial time.
• Done by transforming the graph into an acyclic network and solving a minimum cost maximum flow problem (augmenting paths, negative cycles, etc.)
Example
Interval  Endpoints  Weight
r1        [0,5]      10
r2        [6,8]      10
r3        [9,11]     1
r4        [12,14]    1
r5        [15,17]    1
r6        [6,11]     5
r7        [12,17]    10
MSIS, MSCS:
S1 = {r1, r2, r3, r4, r5}
S2 = {r6, r7}
Note the similarity of the two methods
Bits Allocation Stage
• Given the layering from the first phase and the bit allocations from previous iterations,
• the algorithm finds the layer that will cover the maximum additional weight if additional bits are allocated to it
Best Fit per bits
Maximal Cover for Next Bit
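For every level $i$ we calculate the following value (covered weight per added bit):
$$\max_{k \in [1, b]} \; \frac{1}{k} \sum_{j = 2^{assign[i]}}^{2^{assign[i] + k} - 1} w(L[i, j])$$
where $assign[i]$ is the number of bits already assigned to level $i$, $b$ is the number of bits, and $L[i, j]$ is the j'th range at level $i$
We choose the level with the maximal value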
Bits Auction Algorithm
Sort the ranges in every level in decreasing order of number of TCAM entries (the weight)
While there are free bits:
1. Choose the level with the maximal cover for the next bits (k from 1 to b, see previous slide)
2. Encode the next ranges of that level
3. Decrease the free bits
Each iteration is an auction in which the layers compete for the next bit
Bits per layer l      1  2  3  4
Num covered ranges    1  3  7  15
With l bits a layer covers 2^l - 1 ranges, so n ranges need log2(n+1) bits; the code 0 is reserved for the 0 element (not in layer)
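The auction loop can be sketched as below. This is one reading of the slides, with assumed details: each layer is given as a list of range weights, the gain of giving k more bits to a layer is the newly covered weight divided by k, ties go to the first layer found, and the loop stops early once no layer can cover any additional weight. The name `bits_auction` and the sample weights are hypothetical.

```python
def bits_auction(layers, free_bits):
    """layers: list of lists of range weights. Returns the bits assigned per layer."""
    layers = [sorted(weights, reverse=True) for weights in layers]
    assigned = [0] * len(layers)
    while free_bits > 0:
        best_gain, best_layer, best_k = 0.0, None, 0
        for i, weights in enumerate(layers):
            for k in range(1, free_bits + 1):
                first = 2 ** assigned[i] - 1        # 0-based index of first uncovered range
                last = 2 ** (assigned[i] + k) - 1   # exclusive end after adding k bits
                gain = sum(weights[first:last]) / k
                if gain > best_gain:
                    best_gain, best_layer, best_k = gain, i, k
        if best_layer is None:                      # nothing left to cover
            break
        assigned[best_layer] += best_k
        free_bits -= best_k
    return assigned


# Two layers: five ranges with weights 10, 10, 1, 1, 1 and two with weights 5, 10.
layers = [[10, 10, 1, 1, 1], [5, 10]]
print(bits_auction(layers, 3))
print(bits_auction(layers, 5))
```

With 3 free bits the first layer wins 2 bits at once, because covering its three heaviest ranges gives a better weight per bit than any single-bit bid.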
Encoding Stage
• Given the layers and the number of bits each layer received in the bits allocation phase, we associate each range with its LIC code
Encoding Stage
For all layers
  For each range r, let j be the index of range r inside its layer
    If j is smaller than the number of ranges encoded in the bit allocation phase
      code(r) = *****bin(j)*****
      search-key(r) = 00000bin(j)00000
      The *s and 0s are the bits of the other layers, to which r doesn't belong
    Else
      r is encoded by some independent scheme such as Gray code or binary
Example
• Range r from level 5 whose index is the 2nd largest: if the bit auction gave level 5 3 bits, the encoding is as follows
Code: ********************010************************
Search Key: 00000000000000000100000000000000000000
(010 = the level 5 bits)
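The encoding step can be sketched as follows. The `range_code` function follows the rule above (the range's own layer field holds bin(j), every other layer field is don't care). The `search_key` function for an arbitrary field value, which fills in the index of the containing range in every layer and 0 where there is none, is an assumption about how a lookup combines the layers. All names, layer widths and sample layers are hypothetical.

```python
def to_bits(n, width):
    """Binary string of n, zero-padded to `width` digits (empty for width 0)."""
    return format(n, "b").zfill(width) if width else ""


def range_code(layer_bits, layer, index):
    """Ternary code of the range with 1-based `index` in `layer`.

    The caller must ensure index < 2 ** layer_bits[layer].
    """
    fields = ["*" * bits for bits in layer_bits]
    fields[layer] = to_bits(index, layer_bits[layer])
    return "".join(fields)


def search_key(layers, layer_bits, value):
    """Per layer: index of the encoded range containing `value`, or 0 if none."""
    fields = []
    for ranges, bits in zip(layers, layer_bits):
        index = 0
        for j, (lo, hi) in enumerate(ranges, start=1):
            if lo <= value <= hi and j < 2 ** bits:   # only ranges that received a code
                index = j
        fields.append(to_bits(index, bits))
    return "".join(fields)


def matches(code, key):
    return all(c in ("*", k) for c, k in zip(code, key))


# The slide's case: index 2 in the fifth layer, which holds 3 bits (other widths made up).
print(range_code([2, 2, 2, 2, 3, 2], 4, 2))

layers = [[(0, 5), (6, 8), (9, 11), (12, 14), (15, 17)], [(6, 11), (12, 17)]]
key = search_key(layers, [3, 2], 7)
print(key, matches(range_code([3, 2], 0, 2), key), matches(range_code([3, 2], 1, 2), key))
```

Ranges whose index does not fit into the bits of their layer get no layered code here; as the slide says, they fall back to an independent scheme.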
Experiment Database
• DIPRE (real life) database
• 120 files
• 223,000 Rules
• 273 different Ranges
• Analysis of rule entries (number of appearances, encoding size)
Algorithm Software Implementation Engine
• Implemented in Java, using Eclipse
• 50 classes
• 3 algorithm components: LIC, Bitmap, Prefix Expansion
• Exporter for data calculations
• 2 encoding schemes: Gray code and Binary
Our Measurements
• We calculated the redundancy factor for all range rules (i.e. rules that have more than one range field)
• Redundancy is the average number of redundant TCAM entries required to encode the range rules of a database.
• Performed with 2 encoding schemes: Gray code and Binary
Redundancy Factors
[Chart: redundancy (%) on a logarithmic axis from 1 to 1000 versus free bits (0 to 40), for Prefix Expansion, DRES, Regions Partition, MWCS and MSIS]
Redundancy Factors
[Chart: redundancy (%) from 0 to 9 versus free bits (20 to 90), for MWIS, MWCS, MSIS and MSCS]
Number of Bits Required to Encode the Whole Database (Encoding Problem)
Algorithm Bits
DRES 235
Region Partition 235
LIC MSIS 85
LIC MSCS 86
Best LIC MSIS 85
Results 0-36 Bits
Algorithm Expansion Redundancy
Prefix Expansion 2.68 6.63
Region Partition 1.64 2.51
DIPRE 1.2 -
SRGE 1.03 1.2
DRES 1.025 0.09
MSCS 1.0088 0.034
MSIS(SRGE) 1.0061 0.024
Weighted Phenomenon
• The unweighted layering does better than the weighted layering after 36 bits in our database
• Weighted layering is better when the number of bits is unbounded
• MWCS: best results up to 36 bits
• MSIS: best results for 70-90 bits
Conclusions
• Our scheme produces the best results for any number of bits
• For any number of bits, it is about 50%-70% better than DRES
• LIC-SRGE (Gray code scheme) achieves superior results; MSIS layering outperforms all other layering algorithms.
Innovation & Novelty
• Theoretical
− MLIC & BMLIC are NP-Hard
− 2-approximation algorithm for MLIC & BMLIC
− Using interval graphs in the IP lookup domain
• Practical
− Best results so far
− Efficient and fast layering algorithms
Discussion
• LIC is not optimal for the following problem instance, where the ranges form a balanced binary tree of depth log(n) (each level i from 1 to log(n) has $2^i$ ranges)
• LIC creates layers that are mapped to the tree levels
• The LIC encoding size will be the sum of the logs of the tree level sizes, which is on the order of $\log^2(n)$:
log(8+1)+log(4+1)+log(2+1)+log(1+1) gives 10 bits (each term rounded up: 4+3+2+1)
• Optimal will use the balanced binary tree properties, so that all nodes can be encoded as a function of log(n) (next slide)
• Optimal encodes using 2*4 = 8 bits
Optimizations & Future Work
• LIC is off by a $\log(n)$ factor from optimal in this instance
• Each range will be encoded by a bit string of size 2*log(n)
• The recursive formula is the concatenation of the previous level's bits with the range state (odd/even), padded with *s
• Example for a range from level 2: 01 01 ****
− 01: previous level bits (first level)
− 01: current range data (01 - even, 10 - odd, 00 - not included)
− ****: don't cares
• LIC's encoding size is on the order of $\log^2(n)$, while the optimum takes on the order of $\log(n)$
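The gap can be checked numerically for trees of different depths. The sketch assumes a tree whose levels hold 1, 2, 4, ... ranges (matching the 8+4+2+1 example), one LIC layer per level with each logarithm rounded up, and 2 bits per level for the tree-based code; the function names are hypothetical.

```python
def lic_tree_bits(depth):
    """LIC size when each tree level is a layer: sum of ceil(log2(size + 1)).

    For a positive integer s, ceil(log2(s + 1)) equals s.bit_length().
    """
    return sum((2 ** level).bit_length() for level in range(depth))


def tree_code_bits(depth):
    """Tree-based code: two bits (even / odd / not included) per level."""
    return 2 * depth


for depth in (4, 6, 8, 10):
    print(depth, lic_tree_bits(depth), tree_code_bits(depth))
```

For depth d the LIC size is 1 + 2 + ... + d = d(d+1)/2 bits, so it grows quadratically in the depth while the tree-based code grows linearly.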
THANK YOU