Thesis presentation

Layered Interval Codes for TCAM-based Classification Anat Bremler Barr, Danny Hendler, David Hay , Boris Farber 06/26/22


Page 1: Thesis presentation

Layered Interval Codes for TCAM-based Classification

Anat Bremler Barr, Danny Hendler, David Hay, Boris Farber

04/08/23

Page 2: Thesis presentation

Overview

• Introduction and Problem Statement

• Related Work

• Our Solution – Multi Layered Interval Code

− Layering

− Bit Allocation

− Encoding

− Software Simulation

• Results and Comparative Analysis

• Conclusions

Page 3: Thesis presentation

Packet Classification Scheme

Page 4: Thesis presentation

Routing Table

Extract Packet Header Data

Rules Table:

Src IP         Protocol   Src port   Dest IP       Dest port   Action
123.25.34.43   TCP        80         255.2.3.4     80          Allow
13.24.35.46    TCP        >1023      255.2.127.4   5556        Deny
16.32.223.14   UDP        50-20      255.2.3.4     50-70       Allow
22.2.3.4       TCP        21         255.2.3.4     21          Limit
255.2.3.4      ICMP       12-809     255.2.3.4     17-190      Log

Page 5: Thesis presentation

Ternary Content-Addressable Memory (TCAM) Technology

• Associative memory: parallel comparisons against all entries

• Fixed-width entries: 144 bits, 36 spare bits!

• Ternary digits: 0 / 1 / X (don’t care)

• Only first match is returned

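[Figure: a TCAM holding ternary entries; the search key is compared against all entries and the index of the first match (1) is returned]

TCAM entries:

0011101101010XX00X01001111XXXX
11X00X00001110X0X101000110XXXX
10XX010100X0XX0100011010X01000
001110XXXXXXXXXXXXXXXXXXXXXXX
...
1110XX010X01X0010101010X0XXXXX

Search key: 0011101101010000010100111110110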
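The lookup semantics on this slide can be emulated in software. The following is a minimal sketch, assuming entries and keys are strings of equal length; the function names and the short 5-digit entries are hypothetical, and a real TCAM performs all comparisons in parallel rather than in a loop.

```python
def ternary_match(entry, key):
    """True if every digit of the entry is X (don't care) or equals the key digit."""
    return len(entry) == len(key) and all(e in ("X", k) for e, k in zip(entry, key))


def tcam_lookup(entries, key):
    """Return the index of the first matching entry, or None if nothing matches."""
    for index, entry in enumerate(entries):
        if ternary_match(entry, key):
            return index
    return None


# Hypothetical 5-digit entries, listed in priority order.
entries = ["0011X", "00XXX", "1XXXX"]
print(tcam_lookup(entries, "00110"))  # two entries match; only the first is reported
print(tcam_lookup(entries, "00010"))
print(tcam_lookup(entries, "01000"))
```

Because only the first match is returned, the order of the entries acts as the rule priority.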

Page 6: Thesis presentation

Problem: TCAMs Are Not Well Suited for Range Representation

[Figure: table of match types (exact, prefix, range), giving for each a rule field value and a matching key field. The key 001110110110110000000 is shown with the exact and prefix values (the prefix value ends in don't-care digits, *), and the range >1024 with the key 2012.]

The TCAM representation of a range rule takes many entries

Research goal: develop an efficient algorithm for constructing a compact representation of range rules (source/dest fields)

Page 7: Thesis presentation

Encoding Schemes – How the Rules Are Encoded

• Database-dependent scheme: an encoding scheme in which the rules are encoded differently as a function of the database in which they occur.

• Database-independent scheme: an encoding scheme in which the rules are encoded without any consideration of their occurrence in the database.

Page 8: Thesis presentation

Prefix Expansion (Database Independent)

Representing [1,6]

[Figure: binary tree over the 3-bit values 000 to 111, with the range [1,6] marked on the leaves]

TCAM entries:

001, 01*, 10*, 110

Prefix expansion is inefficient:

• A range over W bits may expand to 2W-2 entries

• For 2 range fields, a rule may expand to (2W-2)^2 entries

• Expansion factor of up to 6 on real-world databases!

(Srinivasan 98)
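The expansion of a range into prefixes can be sketched as below. This is an illustrative implementation of the standard greedy split into aligned power-of-two blocks, not code taken from the thesis; the function name and its parameters are assumptions.

```python
def range_to_prefixes(lo, hi, width):
    """Split the closed range [lo, hi] over `width` bits into ternary prefixes."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = lo & -lo if lo else 1 << width
        # ... shrunk until it no longer overshoots hi.
        while size > hi - lo + 1:
            size >>= 1
        free_bits = size.bit_length() - 1   # trailing don't-care digits
        fixed_bits = width - free_bits      # leading fixed digits
        head = format(lo >> free_bits, f"0{fixed_bits}b") if fixed_bits else ""
        prefixes.append(head + "*" * free_bits)
        lo += size
    return prefixes


print(range_to_prefixes(1, 6, 3))        # the slide's example, [1,6] over 3 bits
print(len(range_to_prefixes(1, 14, 4)))  # worst case for W = 4
```

For W = 3 the range [1,6] is a worst case with 2W-2 = 4 entries, and for W = 4 the range [1,14] needs 2W-2 = 6.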

Page 9: Thesis presentation

SRGE (Database Independent)

• Small ranges occur frequently and can be encoded efficiently by using Gray code.

• Gray code is a binary numeral system in which two successive values differ in only one digit.

• The Gray code encoding scheme decreases the overall database expansion (number of entries).

(Bremler/Hendler 06)

Dec   Gray   Binary
0     000    000
1     001    001
2     011    010
3     010    011
4     110    100
5     111    101
6     101    110
7     100    111
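The Gray column of the table can be reproduced with the standard binary-reflected Gray code formula. This is a small sketch; the function name is an assumption.

```python
def to_gray(value):
    """Binary-reflected Gray code: successive values differ in exactly one bit."""
    return value ^ (value >> 1)


for dec in range(8):
    # Columns: Dec, Gray, Binary (as in the table above)
    print(dec, format(to_gray(dec), "03b"), format(dec, "03b"))
```

As an example of the benefit for small ranges: the range [1,2] has binary codes 001 and 010, which need two TCAM entries, while its Gray codes 001 and 011 are covered by the single ternary entry 0*1.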

Page 10: Thesis presentation

Basic Dependent Encoding

• Use the unused bits of the TCAM entry (extra bits)

• Assign the extra bits to ranges

− A single extra bit is assigned to each selected range

− If range r is assigned to bit i, then extra bit i is set to one and all the rest to don't care

• Cons: the solution does not scale beyond the number of extra bits!

• 36 bits versus 300+ ranges

(Liu 02)

[Figure: five ranges R1 to R5 with their encodings]

Range   Encode   Search
R1      1****    10000
R2      *1***    01000
R3      **1**    00100
R4      ***1*    00010
R5      ****1    00001
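The one-bit-per-range idea can be sketched as follows. It assumes that the search key sets the extra bit of every selected range that contains the key's field value; the function names and the example port ranges are hypothetical.

```python
def rule_extra_bits(bit_index, num_bits):
    """Extra bits of a rule whose range owns bit `bit_index`: that bit is 1, the rest are *."""
    return "".join("1" if i == bit_index else "*" for i in range(num_bits))


def key_extra_bits(value, ranges):
    """Extra bits of a search key: bit i is 1 exactly when `value` lies in range i."""
    return "".join("1" if lo <= value <= hi else "0" for lo, hi in ranges)


# Hypothetical selected ranges, one extra bit each.
ranges = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]
print(rule_extra_bits(1, len(ranges)))   # rule using the second range
print(key_extra_bits(15, ranges))        # key whose field value is 15
```

Each selected range costs a full bit, which is why 36 spare bits cannot cover 300+ ranges.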

Page 11: Thesis presentation

Region Splitting (Database Dependent)

• Ranges in different regions are disjoint

• The region number is encoded by log(r) bits, where r is the number of regions. Each range within a region is encoded by a separate bit.

• Rules and ranges are encoded in the layout: Region# | #range at region

• Total required bits: log(r) + maximum number of ranges in a region

• A range that spans multiple regions is expanded into one entry per region

• Cons: looks at the geometry, not at the number of entries needed to encode each range

(Liu 02)

[Figure: ranges R1 to R5 over three regions Reg1, Reg2, Reg3, with region codes 01, 11, 10]

Range   TCAM entries
R1      011xxx, 111xxx
R2      01x1xx, 11x1xx
R3      11xx1x, 101xxx
R4      01xx1x
R5      01xxx1, 11xx1x

Page 12: Thesis presentation

Database-Dependent Encoding (DRES)

• Key idea: allocate an extra bit to each commonly occurring range (all other ranges are encoded by a database-independent scheme)

• An entry that uses the range sets its assigned extra bit to 1 and all other extra bits to X

• Cons: limited by the number of ranges (bits) that can be encoded; inefficient use of bits

Example: Source-port ≥ 1024

[Figure: the TCAM entries 1 to 4 from the earlier slide, extended with extra bits that are all X, and the string 11010010101XXXXXXXXXXXXXXXXXX shown with extra bit 1 for the range Source-port ≥ 1024]

(Che 07)

Page 13: Thesis presentation

Layered Interval Proposal (LIC)

[Figure: four ranges a, b, c and d]

• n disjoint ranges can be encoded using log(n) bits

• One-to-one mapping between range and index

How can we utilize this key observation to achieve minimal database expansion?

Page 14: Thesis presentation

LIC Overview

• Divide the ranges into levels (layers)

− A layer represents a set of disjoint ranges

− The $l_i$ ranges in the same layer can be encoded by $\log(l_i)$ bits

− Encode each range and search key by $\sum_{i=1}^{L} \log(l_i)$ bits

[Figure: ranges a to f in three layers L1, L2, L3. The code layout is: #range at L1, #range at L2, #range at L3. The codes shown are 1**** (L1), *01** and *10** (L2), ***01, ***10 and ***11 (L3).]

The technique was mentioned in [Lunteren and Engbersen 2003], but no algorithm was given. We can extend the technique further to reduce space complexity.

Page 15: Thesis presentation

General Problem

• Given a set of ranges (intervals).

• How can we efficiently divide them into sets of non-intersecting ranges?

• The sets should be chosen so that the resulting encoding size is minimal.

Page 16: Thesis presentation

Interval Graph Definitions

• Interval graph: the intersection graph of a set of intervals on the real line

− It has one vertex for each interval in the set

− It has an edge between every pair of vertices whose intervals intersect

• Interval graph coloring: an assignment of colors to the vertices of the graph such that no two adjacent vertices have the same color
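The two definitions translate directly into code. The sketch below assumes closed intervals given as (lo, hi) pairs keyed by a name; the function names and the example intervals are hypothetical.

```python
from itertools import combinations


def intersects(a, b):
    """Closed intervals (lo, hi) intersect when neither one ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def interval_graph(intervals):
    """Adjacency sets: one vertex per interval, one edge per intersecting pair."""
    edges = {name: set() for name in intervals}
    for u, v in combinations(intervals, 2):
        if intersects(intervals[u], intervals[v]):
            edges[u].add(v)
            edges[v].add(u)
    return edges


def is_proper_coloring(edges, color):
    """A coloring is proper when no two adjacent vertices share a color."""
    return all(color[u] != color[v] for u in edges for v in edges[u])


intervals = {"a": (0, 5), "b": (3, 8), "c": (9, 12)}
graph = interval_graph(intervals)
print(graph)
print(is_proper_coloring(graph, {"a": 1, "b": 2, "c": 1}))
```

In a proper coloring every color class is a set of pairwise disjoint intervals.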

Page 17: Thesis presentation

Interval Graphs

Page 18: Thesis presentation

Minimum Layered Interval Code(MLIC)

• Find a coloring of the vertices of the interval graph that minimizes the encoding size of the resulting color classes.

• Interval graph property: neighbors (intersecting intervals) receive different colors, so each color class is a set of disjoint ranges

$\min \sum_{i=1}^{|C|} \log_2(n_i + 1)$

where $n_i$ is the number of vertices colored with color $i$
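The objective can be evaluated for a given coloring as sketched below. One assumption is added here that the slide's formula does not show: each term is rounded up to a whole number of bits. The function names and the example layering are hypothetical.

```python
def layer_bits(num_ranges):
    """ceil(log2(num_ranges + 1)): one code per range plus the reserved all-zero code."""
    return num_ranges.bit_length()


def encoding_size(layers):
    """Total number of bits used by a layering (a list of color classes)."""
    return sum(layer_bits(len(layer)) for layer in layers)


# Hypothetical layering with 1, 2 and 3 ranges per layer.
layers = [["a"], ["b", "c"], ["d", "e", "f"]]
print(encoding_size(layers))
```

For integers, `n.bit_length()` equals the rounded-up value of log2(n + 1), which avoids floating-point logarithms.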

Page 19: Thesis presentation

Budgeted Minimum Layered Interval Code (BMLIC)

• Let G be a weighted interval graph.

• Find a subset of vertices V' to color such that the total weight of the colored vertices is maximal and the encoding size is bounded from above by BitsNumber.

Maximize $\sum_{v \in V'} w(v)$ such that $\sum_{i=1}^{|C|} \log_2(n_i + 1) \le \mathrm{BitsNumber}$

where $n_i$ is the number of vertices colored with color $i$

Page 20: Thesis presentation

Hardness of Problem

• Using ideas similar to those for the chromatic sum problem on interval graphs, we proved that the MLIC and BMLIC problems are NP-Hard.

• We based our proof on a reduction from Circular Arc Coloring, which is known to be NP-Hard.

Page 21: Thesis presentation

The Algorithm Stages

1. Layering stage, in which the intervals are partitioned into sets of pairwise disjoint intervals.

2. Bits allocation stage, in which each layer is allocated a certain number of bits.

3. Encoding stage, in which we obtain a search key for each entry in the domain.

Page 22: Thesis presentation

Layering Stage

• The most important and challenging component of the algorithm.

• The layering divides the ranges into non-intersecting sets:

[Figure: a set of ranges divided into layers L1 to L5]

Page 23: Thesis presentation

Maximum Size Independent Sets (MSIS)

• Each maximum-size independent set is found as a maximum-size set of pairwise disjoint, unweighted ranges (Bar-Noy et al. 98)

Iteratively find maximum-size independent sets on the interval graph
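A minimal sketch of this iterative layering is shown below. It uses the classic earliest-finishing-time greedy, which finds a maximum-size set of disjoint intervals; this is a stand-in and not necessarily the procedure of the cited work. The function names are assumptions, and the data are the intervals r1 to r7 of the example slide.

```python
def max_independent_set(intervals):
    """Greedy by right endpoint: a maximum-size set of pairwise disjoint closed intervals."""
    chosen, last_end = [], None
    for name, lo, hi in sorted(intervals, key=lambda iv: iv[2]):
        if last_end is None or lo > last_end:
            chosen.append((name, lo, hi))
            last_end = hi
    return chosen


def msis_layers(intervals):
    """Repeatedly peel off a maximum-size independent set; each set becomes one layer."""
    remaining, layers = list(intervals), []
    while remaining:
        layer = max_independent_set(remaining)
        layers.append([name for name, _, _ in layer])
        picked = set(layers[-1])
        remaining = [iv for iv in remaining if iv[0] not in picked]
    return layers


ranges = [("r1", 0, 5), ("r2", 6, 8), ("r3", 9, 11), ("r4", 12, 14),
          ("r5", 15, 17), ("r6", 6, 11), ("r7", 12, 17)]
print(msis_layers(ranges))
```

On these seven intervals the sketch produces two layers, {r1, ..., r5} and {r6, r7}.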

Page 24: Thesis presentation

Maximum Size Colorable Sets (MSCS)

• Based on: Nicoloso et al. 99

• We proved that this is a 2-approximation algorithm for the problem!

• Conclusion: we proposed the best known approximation

Iteratively find maximum-size i-colorable sets on the interval graph, for each i from 1 to the chromatic number of G

Page 25: Thesis presentation

Weight Definition

• We define the weight of a range as the number of redundant TCAM entries required to encode the range.

• Calculated as: (number of entries for encoding the range - 1) * number of rules in which the range occurs

• We strive to make the number of entries for one range as small as possible.
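The weight formula is simple arithmetic, sketched below with hypothetical numbers; the function and parameter names are assumptions.

```python
def range_weight(num_entries, num_rules):
    """Redundant TCAM entries caused by a range: (entries - 1) per rule that uses it."""
    return (num_entries - 1) * num_rules


# Hypothetical range that expands to 4 entries and occurs in 5 rules.
print(range_weight(4, 5))
# A range that already fits in a single entry causes no redundancy.
print(range_weight(1, 5))
```

A range with weight 0 gains nothing from being encoded with extra bits, so such ranges are natural candidates to leave to the database-independent scheme.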

Page 26: Thesis presentation

Maximum Weight Independent Sets (MWIS)

• Same as MSIS, except that we iteratively find maximum-weight independent sets

• Each maximum-weight independent set is found as a maximum-weight set of pairwise disjoint ranges, using the Weighted Interval Scheduling algorithm (Bar-Noy et al. 98)
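Weighted interval scheduling can be solved with the textbook dynamic program over intervals sorted by right endpoint. The sketch below is that textbook version, offered as an illustration rather than as the thesis implementation; the function name is an assumption, and the data are the intervals and weights of the example slide.

```python
from bisect import bisect_left


def max_weight_independent_set(intervals):
    """intervals: (name, lo, hi, weight) with closed integer endpoints.
    Returns (total weight, names) of a maximum-weight set of disjoint intervals."""
    ivs = sorted(intervals, key=lambda iv: iv[2])
    ends = [iv[2] for iv in ivs]
    best = [(0, [])]  # best[i]: optimum using only the first i sorted intervals
    for i, (name, lo, hi, weight) in enumerate(ivs):
        # p = how many of the earlier intervals end strictly before this one starts
        p = bisect_left(ends, lo, 0, i)
        take = (best[p][0] + weight, best[p][1] + [name])
        skip = best[i]
        best.append(take if take[0] > skip[0] else skip)
    return best[-1]


ranges = [("r1", 0, 5, 10), ("r2", 6, 8, 10), ("r3", 9, 11, 1), ("r4", 12, 14, 1),
          ("r5", 15, 17, 1), ("r6", 6, 11, 5), ("r7", 12, 17, 10)]
print(max_weight_independent_set(ranges))
```

With these weights the heavy range r7 is preferred over the two light ranges r4 and r5, which the unweighted layering would pick instead.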

Page 27: Thesis presentation

Maximum Weight Colorable Sets (MWCS)

• Same as MSCS, except that we find maximum-weight k-colorable sets.

• Finding a maximum-weight k-colorable set on an interval graph can be done in polynomial time.

• Done by transforming the graph into an acyclic network and solving a minimum-cost maximum-flow problem (augmenting paths, negative cycles, etc.)

Page 28: Thesis presentation

Example

Interval   Endpoints   Weight
r1         [0,5]       10
r2         [6,8]       10
r3         [9,11]      1
r4         [12,14]     1
r5         [15,17]     1
r6         [6,11]      5
r7         [12,17]     10

MSIS, MSCS:

S1 = {r1, r2, r3, r4, r5}

S2 = {r6, r7}

Note the similarity of the two methods

Page 29: Thesis presentation

Bits Allocation Stage

• Given the layering from the first phase and the bit allocations from previous iterations, the algorithm finds the layer that will cover the maximum additional weight if additional bits are allocated to it

Best fit per bit

Page 30: Thesis presentation

Maximal Cover for Next Bit

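For each level i we calculate the following value (weight of the newly covered ranges per bit):

$\max_{k \in [1, b]} \frac{1}{k} \sum_{j = 2^{assign[i]}}^{2^{assign[i]+k} - 1} w(L[i, j])$

where assign[i] is the number of bits already assigned to level i, b is the number of bits, and L[i, j] is the j'th range at level i.

We choose the level with the maximal value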

Page 31: Thesis presentation

Bits Auction Algorithm

Sort the ranges in each level in decreasing order of weight (number of TCAM entries)

While there are free bits:

1. Choose the level with the maximal cover for the next bits (k from 1 to b; see previous slide)

2. Encode the next ranges of that level

3. Decrease the number of free bits

Each iteration is an auction in which the layers compete for the next bits

Bits per layer (l)     1   2   3   4
Num covered ranges     1   3   7   15

With l bits a layer covers 2^l - 1 ranges (n ranges need log2(n+1) bits), because the code 0 encodes the "not in layer" element
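The auction loop can be sketched as follows. This is an interpretation of the slides, assuming that a layer holding m bits covers its 2^m - 1 heaviest ranges and that a bid is the newly covered weight divided by the number of requested bits; the function name and the example weights are illustrative.

```python
def allocate_bits(layers, free_bits):
    """layers: per-layer lists of range weights, each sorted in decreasing order.
    Returns the number of bits assigned to each layer."""
    assign = [0] * len(layers)
    while free_bits > 0:
        best = None  # (weight per bit, layer index, number of bits k)
        for i, weights in enumerate(layers):
            for k in range(1, free_bits + 1):
                # Ranges with 1-based index 2^assign[i] .. 2^(assign[i]+k) - 1 become covered.
                newly = weights[2 ** assign[i] - 1: 2 ** (assign[i] + k) - 1]
                value = sum(newly) / k
                if best is None or value > best[0]:
                    best = (value, i, k)
        if best is None or best[0] <= 0:
            break  # no layer gains anything from more bits
        _, i, k = best
        assign[i] += k
        free_bits -= k
    return assign


# Weights of two hypothetical layers, sorted in decreasing order.
print(allocate_bits([[10, 10, 1, 1, 1], [10, 5]], 3))
```

In this run the first layer wins two bits at once (21 weight units for 2 bits beats 10 for 1 bit), and the second layer wins the remaining bit.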

Page 32: Thesis presentation

Encoding Stage

• Given the layers, each with the number of bits it received in the bits allocation phase, we associate each range with its LIC code

Page 33: Thesis presentation

Encoding Stage

For all layers
  For each range r, let j be the index of range r inside the layer
    If j is smaller than the number of encoded ranges from the bit allocation phase
      code(r) = *****bin(j)*****
      search-key(r) = 00000bin(j)00000
      (the *s and 0s are the bits of the other layers, to which r does not belong)
    Else
      r is encoded by some independent scheme, such as Gray code or binary

Page 34: Thesis presentation

Example

• Range r from level 5, whose index is 2 (the 2nd largest): if the bits auction gave level 5 three bits, the encoding is as follows

Code ********************010************************

Search Key 00000000000000000100000000000000000000

Level 5 bits
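The encoding stage can be sketched in code as below. It assumes 1-based range indices inside a layer, with index 0 reserved for "in no encoded range of this layer", and it builds the search key for a field value by concatenating each layer's index. The function names and the example layers are hypothetical.

```python
def range_code(layer, j, bits):
    """Ternary LIC code of the j-th range (1-based) of `layer`; bits[i] = bits of layer i."""
    if not 1 <= j <= 2 ** bits[layer] - 1:
        raise ValueError("range not covered by LIC; use an independent scheme instead")
    parts = ["*" * b for b in bits]                  # other layers: don't care
    parts[layer] = format(j, f"0{bits[layer]}b")     # own layer: bin(j)
    return "".join(parts)


def search_key(value, layers, bits):
    """Concatenate, layer by layer, the index of the encoded range containing `value`."""
    parts = []
    for ranges, b in zip(layers, bits):
        j = 0  # 0 means: value lies in no encoded range of this layer
        for index, (lo, hi) in enumerate(ranges[: 2 ** b - 1], start=1):
            if lo <= value <= hi:
                j = index
        parts.append(format(j, f"0{b}b") if b else "")
    return "".join(parts)


layers = [[(0, 5), (6, 8)], [(6, 11)]]   # two layers of disjoint ranges
bits = [2, 1]                            # bits per layer from the allocation stage
print(range_code(0, 2, bits), range_code(1, 1, bits))
print(search_key(7, layers, bits))
```

The key for value 7 matches the codes of both ranges that contain it, one per layer, which is what lets a single entry represent a rule with that range.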

Page 35: Thesis presentation

Experiment Database

• DIPRE (real life) database

• 120 files

• 223,000 Rules

• 273 different Ranges

• Analysis of rule entries (number of appearances, encoding size)

Page 36: Thesis presentation

Algorithm Software Implementation Engine

• Implemented in Java, Eclipse

• 50 classes

• 3 algorithm components: LIC, Bitmap, Prefix Expansion

• Exporter for data calculations

• 2 encoding schemes: Gray code and Binary

Page 37: Thesis presentation

Our Measurements

• We calculated the redundancy factor for all range rules (i.e. rules that have more than one range field)

• Redundancy is the average number of redundant TCAM entries required to encode the range rules of the database.

• Performed with 2 encoding schemes: Gray code and Binary

Page 38: Thesis presentation

Redundancy Factors

[Figure: redundancy (%) on a logarithmic scale (1 to 1000) versus free bits (0 to 40), for Prefix Expansion, DRES, Regions Partition, MWCS and MSIS]

Page 39: Thesis presentation

Redundancy Factors

[Figure: redundancy (%) (0 to 9) versus free bits (20 to 90), for MWIS, MWCS, MSIS and MSCS]

Page 40: Thesis presentation

Number of Bits Required to Encode Whole Database (Encoding Problem)

Algorithm          Bits
DRES               235
Region Partition   235
LIC MSIS           85
LIC MSCS           86
Best (LIC MSIS)    85

Page 41: Thesis presentation

Results 0-36 Bits

Algorithm          Expansion   Redundancy
Prefix Expansion   2.68        6.63
Region Partition   1.64        2.51
DIPRE              1.2         -
SRGE               1.03        1.2
DRES               1.025       0.09
MSCS               1.0088      0.034
MSIS(SRGE)         1.0061      0.024

Page 42: Thesis presentation

Weighted Phenomenon

• The unweighted layering does better than the weighted layering beyond 36 bits in our database

• Weighted layering is better when the number of bits is unbounded

• MWCS: best results up to 36 bits

• MSIS: best results for 70-90 bits

Page 43: Thesis presentation

Conclusions

• Our scheme produces the best results for any number of bits

• For any number of bits, it is about 50%-70% better than DRES

• LIC-SRGE (the Gray code scheme) achieves superior results; MSIS layering outperforms all other layering algorithms.

Page 44: Thesis presentation

Innovation & Novelty

• Theoretical

− MLIC & BMLIC are NP-Hard

− 2-approximation algorithm for MLIC & BMLIC

− Using Interval Graphs in the IP Lookup domain

• Practical

− Best results so far

− Efficient and fast layering algorithms

Page 45: Thesis presentation

Discussion

• LIC is not optimal for the following problem instance, where the ranges form a balanced binary tree (each level i from 1 to log(n) has $2^i$ ranges)

• LIC creates layers that are mapped to tree levels

• The LIC encoding size is the sum of the logs of the tree level sizes:

log(8+1) + log(4+1) + log(2+1) + log(1+1), which is 4 + 3 + 2 + 1 = 10 bits when each term is rounded up

• The optimal encoding uses the properties of a balanced binary tree, so that all nodes can be encoded with a number of bits that is a function of log(n) (next slide)

• The optimal encoding uses 2*4 = 8 bits
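The gap between the two encodings grows with the depth of the tree, as the following sketch shows. It assumes tree levels with 1, 2, 4, ... ranges, per-layer bit counts rounded up, and 2 bits per level for the tree-based encoding; the function names are illustrative.

```python
def lic_bits(depth):
    """One layer per tree level; a level with 2^level ranges needs ceil(log2(2^level + 1)) bits."""
    return sum((2 ** level).bit_length() for level in range(depth))


def tree_bits(depth):
    """Tree-based encoding: 2 bits per level."""
    return 2 * depth


for depth in (4, 8, 16):
    print(depth, lic_bits(depth), tree_bits(depth))
```

For depth 4 this reproduces the 10 bits versus 8 bits of the slide; the LIC total grows quadratically in the depth (1 + 2 + ... + depth), while the tree-based total grows linearly.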

Page 46: Thesis presentation

Optimizations & Future Work

• LIC is off by a $\log(n)$ factor from optimal in this instance

• Each range is encoded by a bit string of size 2*log(n)

• The recursive formula is the concatenation of the previous level's bits with the range state (odd/even), filled with *s

• Example for a range from level 2:

01 01****

− 01: previous level bits (first level)

− 01: current range data (01 - even, 10 - odd, 00 - not included)

− ****: don't cares

• LIC's encoding size is $\log^2(n)$, while the optimum takes $\log(n)$

Page 47: Thesis presentation

THANK YOU