Thesis presentation

Layered Interval Codes for TCAM-based Classification

Anat Bremler Barr, Danny Hendler, David Hay, Boris Farber

04/08/23

Overview

• Introduction and Problem Statement

• Related Work

• Our Solution – Multi Layered Interval Code

− Layering

− Bit Allocation

− Encoding

− Software Simulation

• Results and Comparative Analysis

• Conclusions

Packet Classification Scheme

Routing Table

Src IP         Protocol   Src port   Dest IP       Dest port   Action
123.25.34.43   TCP        80         255.2.3.4     80          Allow
13.24.35.46    TCP        >1023      255.2.127.4   5556        Deny
16.32.223.14   UDP        50-20      255.2.3.4     50-70       Allow
22.2.3.4       TCP        21         255.2.3.4     21          Limit
255.2.3.4      ICMP       12-809     255.2.3.4     17-190

Extract Packet Header Data

Rules Table

Ternary Content-Addressable Memory (TCAM) Technology

• Associative memory: parallel comparisons against all entries

• Fixed-width entries: 144 bits, with 36 spare bits

• Ternary digits: 0 / 1 / X (don’t care)

• Only first match is returned

0011101101010XX00X01001111XXXX

11X00X00001110X0X101000110XXXX

10XX010100X0XX0100011010X01000

001110XXXXXXXXXXXXXXXXXXXXXXX

1110XX010X01X0010101010X0XXXXX

0011101101010000010100111110110

Search key
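The lookup behaviour described on this slide can be illustrated with a minimal software sketch. A real TCAM compares all entries in parallel in hardware; the loop below only imitates the "first match wins" result. The function names and the short 5-digit entries are hypothetical, chosen for illustration.

```python
from typing import Optional, Sequence


def ternary_match(entry: str, key: str) -> bool:
    """True if every entry digit equals the key bit or is a don't care (X)."""
    return len(entry) == len(key) and all(
        e in ("X", k) for e, k in zip(entry, key)
    )


def tcam_lookup(entries: Sequence[str], key: str) -> Optional[int]:
    """Return the index of the first matching entry, or None if none match."""
    for index, entry in enumerate(entries):
        if ternary_match(entry, key):
            return index
    return None


# Hypothetical 5-digit entries, listed in priority order.
entries = ["0011X", "00XXX", "1XXXX"]
print(tcam_lookup(entries, "00110"))  # 0 (entry 1 also matches, but entry 0 comes first)
print(tcam_lookup(entries, "00010"))  # 1
print(tcam_lookup(entries, "01000"))  # None
```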

Problem: TCAMs Are Not Well Suited for Range Representation

[Figure: table with columns match type, rule field value, and matching key field, showing an exact match and a prefix match on bit strings, and the range rule >1024 matched by the key 2012]

TCAM range rule representation takes many entries

Research goal: develop an efficient algorithm for constructing an efficient representation of range rules (source/dest fields)

Encoding Schemes – How the Rules Are Encoded

• Database-dependent scheme: an encoding scheme in which rules are encoded differently depending on the database in which they occur.

• Database-independent scheme: an encoding scheme in which rules are encoded without regard to the database in which they occur.

Prefix Expansion (Database Independent)

Representing [1,6]

TCAM entries:

001, 01*, 10*, 110

Prefix expansion is inefficient

• A range over W bits may expand to $2W-2$ entries

• For 2 range fields, a rule may expand to $(2W-2)^2$ entries

• Expansion factor of up to 6 on real-world databases

[Figure: the 3-bit values 000 through 111]

(Srinivasan 98)
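The expansion of [1,6] above can be reproduced with a short sketch. It uses the standard greedy approach of repeatedly taking the largest aligned power-of-two block that still fits in the range. The function name `prefix_expand` is hypothetical, not taken from the thesis.

```python
from typing import List


def prefix_expand(lo: int, hi: int, width: int) -> List[str]:
    """Cover the closed range [lo, hi] over `width` bits with prefixes."""
    prefixes = []
    while lo <= hi:
        # Largest block aligned at lo (lowest set bit of lo, or the whole space for lo == 0).
        size = lo & -lo if lo else 1 << width
        # Shrink the block until it fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        free_bits = size.bit_length() - 1
        fixed_bits = width - free_bits
        head = format(lo >> free_bits, "0{}b".format(fixed_bits)) if fixed_bits else ""
        prefixes.append(head + "*" * free_bits)
        lo += size
    return prefixes


print(prefix_expand(1, 6, 3))         # ['001', '01*', '10*', '110']
print(len(prefix_expand(1, 14, 4)))   # 6, i.e. 2W-2 for W = 4
```

The second call shows the worst case mentioned on the slide: the range [1, 2^W - 2] needs 2W-2 prefixes.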

SRGE (Database Independent)

• Small ranges occur frequently and can be encoded efficiently using Gray code.

• Gray code is a binary numeral system in which two successive values differ in only one digit.

• Gray code encoding scheme decreases the overall database expansion (number of entries)

(Bremler/Hendler 06)

Dec Gray Binary

0 000 000

1 001 001

2 011 010

3 010 011

4 110 100

5 111 101

6 101 110

7 100 111
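The table above can be regenerated with the usual binary-reflected Gray code formula. The sketch below also shows, on one small range, why Gray code helps: the two values of [1,2] differ in a single Gray digit and so fit in one ternary entry, while their binary codes differ in two digits. This only illustrates the idea and is not the SRGE algorithm itself; `merge_pair` is a hypothetical helper.

```python
from typing import Optional


def to_gray(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def merge_pair(a: str, b: str) -> Optional[str]:
    """Merge two codes that differ in exactly one digit into one ternary entry."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) != 1:
        return None
    i = diff[0]
    return a[:i] + "*" + a[i + 1:]


WIDTH = 3
gray = [format(to_gray(n), "0{}b".format(WIDTH)) for n in range(2 ** WIDTH)]
binary = [format(n, "0{}b".format(WIDTH)) for n in range(2 ** WIDTH)]

print(gray)                              # matches the Gray column of the table
print(merge_pair(gray[1], gray[2]))      # 0*1
print(merge_pair(binary[1], binary[2]))  # None
```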

Basic Dependent Encoding

• Use the unused bits of the TCAM entry (extra bits)

• Divide the extra bits among the ranges

− A single extra bit is assigned to each selected range

− If range r is assigned to bit i, then the i’th extra bit is set to one and all the rest to don’t care

• Cons: the solution does not scale beyond the number of extra bits

• 36 bits versus 300+ ranges

(Liu 02)

Range   Rule encoding   Search-key encoding
R1      1****           10000
R2      *1***           01000
R3      **1**           00100
R4      ***1*           00010
R5      ****1           00001
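A minimal sketch of the one-bit-per-range idea follows. The five port ranges are made up for illustration, and the function names are hypothetical. Bit i of a rule's extra bits is 1 for its own range and don't care elsewhere; bit i of the search key's extra bits is 1 exactly when the key value falls in range i.

```python
from typing import List, Tuple


def rule_extra_bits(range_index: int, num_bits: int) -> str:
    """Extra bits stored in the TCAM entry of a rule that uses range `range_index`."""
    return "".join("1" if i == range_index else "*" for i in range(num_bits))


def key_extra_bits(value: int, ranges: List[Tuple[int, int]]) -> str:
    """Extra bits appended to the search key: bit i is 1 iff value lies in range i."""
    return "".join("1" if lo <= value <= hi else "0" for lo, hi in ranges)


def matches(rule_bits: str, key_bits: str) -> bool:
    """Ternary comparison of the extra bits only."""
    return all(r in ("*", k) for r, k in zip(rule_bits, key_bits))


# Hypothetical ranges R1..R5 (closed intervals over port numbers).
ranges = [(1024, 65535), (0, 1023), (80, 80), (20, 21), (6000, 6063)]

print(rule_extra_bits(0, 5))                # 1****
print(key_extra_bits(2012, ranges))         # 10000
print(key_extra_bits(80, ranges))           # 01100 (80 lies in both R2 and R3)
print(matches(rule_extra_bits(2, 5), key_extra_bits(80, ranges)))  # True
```

Because each range consumes one bit, the number of ranges handled this way cannot exceed the number of extra bits, which is the scaling limit the slide points out.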

Region Splitting (Database Dependent)

• Ranges in different regions are disjoint

• The region number is encoded by log(r) bits, where r is the number of regions. Each range within a region is encoded by a separate bit.

• Encode rule and range by the concatenation [Region#][#range at region]

• Total required bits: log(r) + maximum number of ranges in a region

• A range that falls in multiple regions is expanded into multiple entries

• Cons: considers the geometry of the ranges, not the number of entries needed to encode each range

(Liu 02)

Region codes: Reg1 = 01, Reg2 = 11, Reg3 = 10

R1   011xxx, 111xxx
R2   01x1xx, 11x1xx
R3   11xx1x, 101xxx
R4   01xx1x
R5   01xxx1, 11xx1x

Database-Dependent Encoding (DRES)

• Key idea: allocate an extra bit to each commonly occurring range (all other ranges are encoded by a database-independent scheme)

• For a rule with such a range, set the assigned extra bit to 1 and set all other extra bits to X

• Cons: limited by the number of ranges (bits) that can be encoded; inefficient use of bits

Example

Source-port ≥ 1024

0011101101010XX00X01001111XXXX

11X00X00001110X0X101000110XXXX

10XX010100X0XX0100011010X01000

001110XXXXXXXXXXXXXXXXXXXXXXX

1110XX010X01X0010101010X0XXXXX

XXXXXXXXXX

11010010101XXXXXXXXXXXXXXXXXX 1

(Che 07)

Layered Interval Proposal (LIC)

• n disjoint ranges can be encoded using log(n) bits

• One-to-one mapping between range and index

How can we utilize this key observation to achieve minimal database expansion?

LIC Overview

• Divide into levels (regions)

− Layer – represents a set of disjoint ranges

− Ranges in the same layer can be encoded by a logarithmic number of bits

− Encode range and search key by concatenating the per-layer fields:

[#range at L1] [#range at L2] [#range at L3]

The technique was mentioned in [Lunteren and Engbersen 2003], but no algorithm was given. We can extend the technique further to reduce space complexity.

General Problem

• Given a set of ranges (intervals).

• How can we efficiently divide them into sets of non-intersecting ranges?

• The sets should be encoded such that the resulting encoding size is minimal.

Interval Graph Definitions

• Interval graph – the intersection graph of a set of intervals on the real line.

− It has one vertex for each interval in the set, and

− It has an edge between every pair of vertices corresponding to intervals that intersect.

• Interval graph coloring – an assignment of colors to the vertices of the graph such that no two adjacent vertices have the same color

Interval Graphs

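Minimum Layered Interval Code (MLIC)

• Find a coloring of the vertices of the interval graph that minimizes the encoding size of the resulting color classes.

• Interval graph property: two vertices are neighbors exactly when their intervals intersect, so each color class is a set of disjoint intervals.

Minimize $\sum_{i=1}^{C} \lceil \log_2(n_i + 1) \rceil$

where C is the number of colors and $n_i$ is the number of vertices colored with color i.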
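The encoding size of a given layering can be computed directly. The sketch below assumes each layer of n ranges needs ⌈log2(n + 1)⌉ bits, with one code value reserved for "not in this layer"; this matches the 10-bit count in the Discussion slide. The function name `lic_code_size` is hypothetical.

```python
from typing import Iterable


def lic_code_size(layer_sizes: Iterable[int]) -> int:
    """Sum over layers of ceil(log2(n + 1)) bits.

    For a non-negative integer n, ceil(log2(n + 1)) equals n.bit_length(),
    which avoids floating-point rounding.
    """
    return sum(n.bit_length() for n in layer_sizes)


# Two layers with 5 and 2 ranges (the MSIS example later in the deck).
print(lic_code_size([5, 2]))        # 3 + 2 = 5
# Tree levels with 8, 4, 2 and 1 ranges (the Discussion slide).
print(lic_code_size([8, 4, 2, 1]))  # 4 + 3 + 2 + 1 = 10
```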

Budgeted Minimum Layered Interval Code (BMLIC)

• Let G be a weighted interval graph.

• Find a subset of vertices V’ to be colored such that the weight of the colored vertices is maximal and the encoding size is bounded from above by BitsNumber.

Maximize $\sum_{v \in V'} w(v)$ such that $\sum_{i=1}^{C} \lceil \log_2(n_i + 1) \rceil \le \mathrm{BitsNumber}$

where $n_i$ is the number of vertices colored with color i.

Hardness of Problem

• Using ideas similar to those for the chromatic sum problem on interval graphs, we proved that the MLIC and BMLIC problems are NP-Hard.

• We based our proof on a reduction from Circular Arc Coloring, which is known to be NP-Hard.

The Algorithm Stages

1. Layering stage, in which the intervals are partitioned into layers, each a set of disjoint intervals.

2. Bits allocation stage, in which each layer is allocated a certain number of bits.

3. Encoding stage, in which we obtain a search key for each entry in the domain.

Layering Stage

• The most important and challenging component in the algorithm.

• The Layering divides ranges into non-intersecting sets:

Ranges

Maximum Size Independent Sets (MSIS)

• We find each maximum-size independent set as a maximum-size set of unweighted, non-intersecting ranges (Bar-Noy et al. ’98)

Iteratively find maximum-size independent sets on the interval graph

Maximal Size Colorable Sets (MSCS)

• Based on: Nicoloso et al. ’99

• We proved that this is a 2-approximation algorithm for the problem

• Conclusion: we proposed the best known approximation

Iteratively find maximum-size i-colorable sets on the interval graph, for each i from 1 to the chromatic number of G

Weight Definition

• We define the weight of a range to be the number of redundant TCAM entries required to encode the range.

• Calculated as: (number of entries for encoding the range − 1) * number of rules in which the range occurs

• We strive to make the number of entries for each range as small as possible.
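As a small worked instance of the weight formula, here is a sketch with made-up numbers; the function name `range_weight` is hypothetical.

```python
def range_weight(num_entries: int, num_rules: int) -> int:
    """Redundant TCAM entries caused by one range across all rules that use it."""
    return (num_entries - 1) * num_rules


# A range that expands to 6 entries and appears in 10 rules (made-up numbers).
print(range_weight(6, 10))  # 50
# A range that already fits in a single entry adds no redundancy.
print(range_weight(1, 10))  # 0
```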

Maximum Weight Independent Sets (MWIS)

• Same as MSIS, except that we iteratively find maximum-weight independent sets

• We find each maximum-weight independent set using the Weighted Interval Scheduling algorithm (Bar-Noy et al. ’98)

Maximum Weight Colorable Sets (MWCS)

• Same as MSCS, except that we find maximum-weight k-colorable sets.

• Finding a maximum-weight k-colorable set on an interval graph can be done in polynomial time.

• Done by transforming the graph into an acyclic network and solving a minimum-cost maximum-flow problem (augmenting paths, negative cycles, etc.)

Example

Interval   Endpoints   Weight
r1         [0,5]       10
r2         [6,8]       10
r3         [9,11]      1
r4         [12,14]     1
r5         [15,17]     1
r6         [6,11]      5
r7         [12,17]     10

MSIS, MSCS

S1 = {(r1,r2,r3,r4,r5)}

S2 = {(r6,r7)}

Note the similarity of the two methods
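The MSIS layering of these seven intervals can be reproduced with a short sketch. It uses the classic earliest-right-endpoint greedy rule, which gives a maximum-size set of disjoint intervals; this is a standard stand-in and not necessarily the exact procedure used in the thesis. Function names are hypothetical.

```python
from typing import Dict, List, Tuple


def max_independent_set(intervals: Dict[str, Tuple[int, int]]) -> List[str]:
    """Greedy by right endpoint: a maximum-size set of disjoint closed intervals."""
    chosen, last_end = [], None
    for name, (lo, hi) in sorted(intervals.items(), key=lambda kv: kv[1][1]):
        if last_end is None or lo > last_end:
            chosen.append(name)
            last_end = hi
    return chosen


def msis_layers(intervals: Dict[str, Tuple[int, int]]) -> List[List[str]]:
    """Repeatedly peel off a maximum-size independent set; each set is one layer."""
    remaining = dict(intervals)
    layers = []
    while remaining:
        layer = max_independent_set(remaining)
        layers.append(layer)
        for name in layer:
            del remaining[name]
    return layers


intervals = {
    "r1": (0, 5), "r2": (6, 8), "r3": (9, 11), "r4": (12, 14),
    "r5": (15, 17), "r6": (6, 11), "r7": (12, 17),
}
print(msis_layers(intervals))
# [['r1', 'r2', 'r3', 'r4', 'r5'], ['r6', 'r7']]
```

The two layers agree with S1 and S2 on the slide.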

Bits Allocation Stage

• Given the layering from the first phase and the bit allocations from previous iterations, the algorithm finds the layer that will cover the maximum additional weight if additional bits are allocated to it

Best Fit per bits

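Maximal Cover for Next Bit

For every level i we calculate the following value (weight of the newly covered ranges per additional bit):

$\max_{k \in [1, b]} \frac{1}{k} \sum_{j = 2^{assign_i}}^{2^{assign_i + k} - 1} w(L[i, j])$

where $assign_i$ is the number of bits already assigned to level i, b is the number of bits, and $L[i, j]$ is the j’th range at level i.

We choose the level with the maximal value.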

Bits Auction Algorithm

Sort the ranges in every level in decreasing order of weight (number of TCAM entries)

While there are free bits:

1. Choose the level with the maximal cover for the next bits (k from 1 to b; see previous slide)

2. Encode the next ranges of that level

3. Decrease the number of free bits

Each iteration is an auction in which the layers compete for the next bit.

Bits per layer (l)     1   2   3   4
Num covered ranges     1   3   7   15

A layer with l bits covers $2^l - 1$ ranges; the remaining code value encodes the 0 element (not in layer).
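The auction loop can be sketched as follows. This is one reading of the slides and not the thesis implementation. It assumes a layer holding l bits covers its 2^l - 1 heaviest ranges, that a bid for k more bits is scored by the newly covered weight divided by k, and that the auction stops early once no bid covers any additional weight. The name `bit_auction` and the layer weights are for illustration (taken from the MSIS example).

```python
from typing import List


def bit_auction(layers: List[List[int]], free_bits: int) -> List[int]:
    """Return the number of bits assigned to each layer."""
    layers = [sorted(weights, reverse=True) for weights in layers]
    assigned = [0] * len(layers)
    while free_bits > 0:
        best = None  # (gain per bit, layer index, k)
        for i, weights in enumerate(layers):
            covered = 2 ** assigned[i] - 1  # ranges already covered in layer i
            for k in range(1, free_bits + 1):
                newly_covered = weights[covered: 2 ** (assigned[i] + k) - 1]
                gain = sum(newly_covered) / k
                if best is None or gain > best[0]:
                    best = (gain, i, k)
        if best is None or best[0] == 0:
            break  # nothing left to cover
        _, i, k = best
        assigned[i] += k
        free_bits -= k
    return assigned


# Layer weights from the MSIS example: S1 = r1..r5, S2 = r6, r7.
layers = [[10, 10, 1, 1, 1], [5, 10]]
print(bit_auction(layers, 3))   # [2, 1]
print(bit_auction(layers, 5))   # [3, 2]
print(bit_auction(layers, 10))  # [3, 2] (extra bits stay unused)
```

With 3 free bits the first layer wins two bits at once (covering weight 21, or 10.5 per bit) and the second layer wins the last bit.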

Encoding Stage

• Given the layers and the number of bits each received in the bits allocation phase, we associate each range with its LIC code

Encoding Stage

For all layers
    For each range r, let j be the index of range r inside its layer
        If j is smaller than the number of ranges encoded in the bit allocation phase
            code(r) = *****bin(j)*****
            search-key(r) = 00000bin(j)00000
            (the *s and 0s are the bits of the other layers, to which r doesn’t belong)
        Else
            r is encoded by some independent scheme such as Gray code or binary

Example

• A range r from level 5 is the 2nd largest in its layer (index 2). If the bit auction gave level 5 three bits, the encoding is as follows:

Code ********************010************************

Search Key 00000000000000000100000000000000000000

Level 5 bits
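The encoding step can be sketched in a few lines. The sketch assumes 1-based range indices within a layer, with index 0 reserved for "not in layer", and a hypothetical bit allocation for six layers. The function name `lic_encode` is not from the thesis.

```python
from typing import List, Optional, Tuple


def lic_encode(layer_bits: List[int], layer: int, index: int) -> Optional[Tuple[str, str]]:
    """Return (code, search_key) for the range with 1-based `index` in `layer`.

    Returns None when the index is not covered by the layer's bits; such a
    range would fall back to an independent scheme (Gray code or binary).
    """
    if index < 1 or index > 2 ** layer_bits[layer] - 1:
        return None
    code, key = [], []
    for other, width in enumerate(layer_bits):
        if other == layer:
            field = format(index, "0{}b".format(width))
            code.append(field)
            key.append(field)
        else:
            code.append("*" * width)  # rule side: don't care for other layers
            key.append("0" * width)   # key side: zero for other layers
    return "".join(code), "".join(key)


# Hypothetical allocation: level 5 (list position 4) received 3 bits.
layer_bits = [2, 1, 1, 1, 3, 2]
print(lic_encode(layer_bits, 4, 2))  # ('*****010**', '0000001000')
print(lic_encode(layer_bits, 1, 2))  # None (1 bit covers only index 1)
```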

Experiment Database

• DIPRE (real life) database

• 120 files

• 223,000 Rules

• 273 different Ranges

• Analysis of rules entries (number appearances , encoding size)

Algorithm Software Implementation Engine

• Implemented in Java, Eclipse

• 50 classes

• 3 algorithm components: LIC, Bitmap, Prefix Expansion

• Exporter for data calculations

• 2 encoding schemes: Gray code and Binary

Our Measurements

• We calculated the redundancy factor for all range rules (i.e., rules that have more than one range field)

• Redundancy is the average number of redundant TCAM entries required to encode the range rules of the database.

• Performed with 2 encoding schemes: Gray code and Binary

Redundancy Factors

[Figure: redundancy factor versus number of free bits (0 to 40) for Prefix Expansion, DRES, Regions Partition, and MWCS]

Redundancy Factors

[Figure: redundancy factor versus number of free bits (20 to 90) for MWIS, MWCS, MSIS, and MSCS]

Number of Bits Required to Encode the Whole Database (Encoding Problem)

Algorithm Bits

DRES 235

Region Partition 235

LIC MSIS 85

LIC MSCS 86

Best LIC MSIS 85

Results 0-36 Bits

Algorithm Expansion Redundancy

Prefix Expansion 2.68 6.63

Region Partition 1.64 2.51

DIPRE 1.2 -

SRGE 1.03 1.2

DRES 1.025 0.09

MSCS 1.0088 0.034

MSIS(SRGE) 1.0061 0.024

Weighted Phenomenon

• The unweighted layering does better than the weighted layering after 36 bits in our database

• Weighted layering is better when the number of bits is unbounded

• MWCS – best results up to 36 bits

• MSIS – best results for 70-90 bits

Conclusions

• Our scheme produces the best results for any number of bits

• For any number of bits, it is better than DRES by about 50%-70%

• LIC-SRGE (Gray code scheme) achieves superior results; MSIS layering outperforms all other layering algorithms.

Innovation & Novelty

• Theoretical

− MLIC & BMLIC are NP-Hard

− 2-approximation algorithm for MLIC & BMLIC

− Using interval graphs in the IP lookup domain

• Practical

− Best results so far

− Efficient and fast layering algorithms

Discussion

• LIC is not optimal for the following problem instance, where the ranges form a balanced binary tree (each level i from 1 to $\log(n)$ has $2^i$ ranges)

• LIC creates layers that are mapped to tree levels

• The LIC encoding size will be the sum of the logs of the tree-level sizes:

$\lceil \log_2(8+1) \rceil + \lceil \log_2(4+1) \rceil + \lceil \log_2(2+1) \rceil + \lceil \log_2(1+1) \rceil = 4 + 3 + 2 + 1 = 10$ bits

• Optimal will use balanced binary tree properties, such that all nodes can be encoded by a function of $\log(n)$ (next slide)

• Optimal encodes using $2\log(n)$ = 2 * 4 = 8 bits

Optimizations & Future Work

• LIC is off by a $\log(n)$ factor from optimal in this instance

• Each range will be encoded by a bit string of size 2*log(n)

• The recursive formula is the concatenation of the previous level’s bits with the range state (odd/even), padded with *s

• Example for a range from level 2:

01 01****

− First 01: previous level bits (first level)

− Second 01: current range data (01 – even, 10 – odd, 00 – not included)

− ****: don’t cares

• LIC’s encoding size is $\log^2(n)$, while the optimal takes $\log(n)$

THANK YOU
