22 - Boundary Cutting for Packet Classification


Transcript of 22 - Boundary Cutting for Packet Classification

Page 1: 22 - Boundary Cutting for Packet Classification

IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 22, NO. 2, APRIL 2014 443

Boundary Cutting for Packet Classification

Hyesook Lim, Senior Member, IEEE, Nara Lee, Geumdan Jin, Jungwon Lee, Youngju Choi, Student Member, IEEE,

and Changhoon Yim, Member, IEEE

Abstract—Decision-tree-based packet classification algorithms such as HiCuts, HyperCuts, and EffiCuts show excellent search performance by exploiting the geometrical representation of rules in a classifier and searching for a geometric subspace to which each input packet belongs. However, decision tree algorithms involve complicated heuristics for determining the field and number of cuts. Moreover, fixed interval-based cutting not relating to the actual space that each rule covers is ineffective and results in a huge storage requirement. A new efficient packet classification algorithm using boundary cutting is proposed in this paper. The proposed algorithm finds out the space that each rule covers and performs the cutting according to the space boundary. Hence, the cutting in the proposed algorithm is deterministic rather than involving the complicated heuristics, and it is more effective in providing improved search performance and more efficient in memory requirement. For rule sets with 1000–100 000 rules, simulation results show that the proposed boundary cutting algorithm provides packet classification through 10–23 on-chip memory accesses and 1–4 off-chip memory accesses on average.

Index Terms—Binary search, boundary cutting, decision tree algorithms, HiCuts, packet classification.

I. INTRODUCTION

PACKET classification is an essential function in Internet routers that provides value-added services such as network security and quality of service (QoS) [1]. A packet classifier should compare multiple header fields of the received packet against a set of predefined rules and return the identity of the highest-priority rule that matches the packet header.

The use of application-specific integrated circuits (ASICs) with off-chip ternary content addressable memories (TCAMs) has been the best solution for wire-speed packet forwarding [2], [3]. However, TCAMs have some limitations. TCAMs consume about 150 times more power per bit than static

Manuscript received July 10, 2010; revised May 31, 2011; October 21, 2011; March 17, 2012; and January 03, 2013; accepted February 26, 2013; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor M. Kodialam. Date of publication April 11, 2013; date of current version April 14, 2014. This work was supported by the Mid-Career Researcher Program (2012-005945) under the National Research Foundation of Korea (NRF) grants funded by the Ministry of Education, Science and Technology and by the ITRC support program (NIPA-2013-H0301-13-1002) under the NIPA funded by the Ministry of Knowledge Economy (MKE), Korea. The work of C. Yim was supported by the Basic Science Research Program (2011-0009426) under the NRF funded by the Ministry of Education, Science and Technology.

H. Lim, N. Lee, G. Jin, J. Lee, and Y. Choi are with Ewha Womans University, Seoul 120-750, Korea (e-mail: [email protected]).

N. Lee, G. Jin, and Y. Choi were with Ewha Womans University, Seoul 120-750, Korea.

C. Yim is with Konkuk University, Seoul 143-701, Korea.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNET.2013.2254124

random access memories (SRAMs). TCAMs consume around 30%–40% of total line card power [4], [5]. When line cards are stacked together, TCAMs impose a high cost on cooling systems. TCAMs also cost about 30 times more per bit of storage than double-data-rate SRAMs. Moreover, for a W-bit port range field, a single rule may require up to 2(W−1) TCAM entries because of range-to-prefix expansion, making the exploration of algorithmic alternatives necessary.

Many algorithms and architectures have been proposed over the years in an effort to identify an effective packet classification solution [6]–[35]. Use of a high bandwidth and a small on-chip memory, while the rule database for packet classification resides in the slower and higher-capacity off-chip memory through proper partitioning, is desirable [36], [37]. Performance metrics for packet classification algorithms primarily include the processing speed, as packet classification should be carried out at wire speed for every incoming packet. Processing speed is evaluated using the number of off-chip memory accesses required to determine the class of a packet, because off-chip access is the slowest operation in packet classification. The amount of memory required to store the packet classification table should also be considered.

Most traditional applications require the highest priority

matching. However, the multimatch classification concept is becoming an important research item because of the increasing need for network security, such as network intrusion detection systems (NIDS) and worm detection, or in new application programs such as load balancing and packet-level accounting [7]. In NIDS, a packet may match multiple rule headers, in which case the related rule options for all of the matching rule headers need to be identified to enable later verification. In accounting, multiple counters may need to be updated for a given packet, making multimatch classification necessary for the identification of the relevant counters for each packet [4].

Our study analyzed various decision-tree-based packet classification algorithms. If a decision tree is properly partitioned so that the internal tree nodes are stored in an on-chip memory and a large rule database is stored in an off-chip memory, the decision tree algorithm can provide very high-speed search performance. Moreover, decision tree algorithms naturally enable both the highest-priority match and the multimatch packet classification. Earlier decision tree algorithms such as HiCuts [8] and HyperCuts [9] select the field and number of cuts based on a locally optimized decision, which compromises between the search speed and the memory requirement. This process requires a fair amount of preprocessing, which involves complicated heuristics related to each given rule set. The computation required for the preprocessing consumes much memory and construction time, making it difficult for those algorithms to be extended to large rule sets because of memory problems in building the decision trees. Moreover, the cutting is based on a fixed interval, which

1063-6692 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


does not consider the actual space that each rule covers; hence it is ineffective.

In this paper, we propose a new efficient packet classification algorithm based on boundary cutting (BC). Cutting in the proposed algorithm is based on the disjoint space covered by each rule. Hence, the packet classification table using the proposed algorithm is deterministically built and does not require the complicated heuristics used by earlier decision tree algorithms. The proposed algorithm has two main advantages. First, the boundary cutting of the proposed algorithm is more effective than that of earlier algorithms since it is based on rule boundaries rather than fixed intervals. Hence, the amount of required memory is significantly reduced. Second, although BC loses the indexing ability at internal nodes, the binary search at internal nodes provides good search performance.

The organization of the paper is as follows. Section II provides an overview of the earlier decision tree algorithms. Section III describes the proposed BC algorithm. Section IV describes the refined structure of the BC algorithm, termed selective BC. Section V shows the data structure of each decision tree algorithm. Section VI shows the performance evaluation results using three different types of rule sets with 1000–100 000 rules each, acquired from a publicly available database. Section VII offers a brief conclusion.

II. RELATED WORKS

Packet classification can be formally defined as follows [10]: A packet P matches rule Rj, for 1 ≤ j ≤ N, if all the header fields Pi, for 1 ≤ i ≤ d, of the packet match the corresponding fields in Rj, where N is the number of rules and d is the number of fields. If a packet matches multiple rules, the rule with the highest priority is returned for the single-best-match packet classification problem, and the list of matching rules is returned for the multimatch packet classification problem. Rule sets are generally composed of five fields. The first two fields are related to source and destination prefixes and require a prefix match operation. The next two fields are related to source and destination port ranges (or numbers), which require a range match. The last field is related to protocol type and requires an exact match.
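The five-field match semantics above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation; the rule encoding and field names are our own.

```python
# Illustrative sketch of five-field packet classification:
# prefix match on two fields, range match on two, exact match on one.
# Rules are kept in decreasing priority order, as the paper assumes.

W = 32  # maximum prefix length (IPv4)

def prefix_match(value, prefix, length):
    """A length-bit prefix matches if the top `length` bits agree."""
    if length == 0:          # length 0 means a wildcard prefix
        return True
    shift = W - length
    return (value >> shift) == (prefix >> shift)

def matches(packet, rule):
    src, dst, sport, dport, proto = packet
    return (prefix_match(src, rule["src"], rule["src_len"])
            and prefix_match(dst, rule["dst"], rule["dst_len"])
            and rule["sport"][0] <= sport <= rule["sport"][1]   # range match
            and rule["dport"][0] <= dport <= rule["dport"][1]   # range match
            and rule["proto"] in (None, proto))                 # exact/wildcard

def classify(packet, rules):
    """Return the best (highest-priority) match and the multimatch list."""
    hits = [i for i, r in enumerate(rules) if matches(packet, r)]
    return (hits[0] if hits else None), hits

# Hypothetical example: a specific rule followed by a default rule.
r0 = {"src": 0x0A000000, "src_len": 8, "dst": 0, "dst_len": 0,
      "sport": (0, 65535), "dport": (80, 80), "proto": 6}
r1 = {"src": 0, "src_len": 0, "dst": 0, "dst_len": 0,
      "sport": (0, 65535), "dport": (0, 65535), "proto": None}
print(classify((0x0A010203, 0, 1234, 80, 6), [r0, r1]))  # -> (0, [0, 1])
```

The single-best-match problem returns only the first element; the multimatch problem returns the whole list.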

A. HiCuts

Each rule defines a five-dimensional hypercube in a five-dimensional space, and each packet header defines a point in the space. The HiCuts algorithm [8] recursively cuts the space into subspaces using one dimension per step. Each subspace ends up with fewer overlapped rule hypercubes, which allows for a linear search. In the construction of a decision tree of the HiCuts algorithm, a large number of cuts consumes more storage, and a small number of cuts causes slower search performance. It is challenging to balance the storage requirement and the search speed. The HiCuts algorithm uses two parameters, a space factor (spfac) and a threshold (binth), in tuning the heuristics, which trade off the depth of the decision tree against the memory amount. The field in which a cut may be executed is chosen to minimize the maximum number of rules included in any subspace. The spfac, a function of space measure, is used to determine the number of cuts for the chosen field. The binth is a predetermined number of rules included in the leaf nodes of the decision tree for a linear search.

For simplicity, considering a two-dimensional (2-D) plane composed of the first two prefix fields, a rule can be represented by an area. Fig. 1 shows the areas that each rule covers in a prefix plane for a given 2-D example rule set. As shown, a rule with prefix lengths (x, y) in the F1 and F2 fields covers the area of 2^(W−x) × 2^(W−y), where W is the maximum length of the field (W is 32 in IPv4).

An example decision tree of the HiCuts algorithm for the given rule set is illustrated in Fig. 2. In this example, the spfac and binth are set as 2 and 3, respectively. The number of cuts to be made in each internal node is determined by considering spfac and the number of rules included in the subspace covered by the internal node. In Fig. 2, each internal node (oval node) has a cut field and the number of cuts, and each leaf node (rectangular node) has the rules included in the subspace corresponding to the leaf. For the first cut, eight cuts in the F1 field are selected. Cutting should be repeated for nodes having more rules than the binth. The subspaces corresponding to the cuts made by the decision tree shown in Fig. 2 are indicated by dotted lines in Fig. 1. For example, at the root node, the F1 field (x-axis in Fig. 1) is split into eight partitions. The first partition is split further by two cuts in the F2 field (y-axis in Fig. 1), and so on. Rules covering (or belonging to) a partition are stored in the corresponding leaf node.

Starting from the root node, the search procedure examines a number of bits of the header field that is defined at each internal node until a leaf node is reached. For example, for a given input (000110, 111100, 19, 23, TCP), the first three bits of the F1 field are examined since the root node has eight cuts. Since they are 000, the search follows the first edge of the decision tree. Since the internal node at the second level has two cuts in the F2 field, the first bit of the F2 field is examined and determined to be one; hence, the search follows the second edge. Three rules are stored in the corresponding leaf node, and the given input is compared with them in order and finds a match. For the multimatch classification problem, the other rules are also compared and the list of matching rules is identified. For the best-match classification, the other rules in the leaf node need not be compared if rules are stored in decreasing order of priority and a match is found. Note that when the port range fields are used for cutting, determining the child pointer to follow does not rely on simple indexing since the interval specified by a port range is not always a power of two.
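The bit-indexing search just described can be sketched as follows. The node encoding, the tree shape, and the always-true rule predicate ("Rx") are hypothetical stand-ins, not the exact tree of Fig. 2; the predicate stands in for the full five-field comparison at a leaf.

```python
# Sketch of a HiCuts-style lookup: at each internal node, a power-of-two
# number of cuts lets the next bits of the cut field index a child
# directly; at a leaf, rules are scanned linearly in priority order.

W = 6  # field width used in the paper's 2-D examples

def search(node, header):
    consumed = {f: 0 for f in header}       # bits already used per field
    while node["type"] == "internal":
        f, ncuts = node["field"], node["cuts"]
        bits = ncuts.bit_length() - 1       # 8 cuts -> index on 3 bits
        shift = W - consumed[f] - bits
        idx = (header[f] >> shift) & (ncuts - 1)
        consumed[f] += bits
        node = node["children"][idx]
    for rule in node["rules"]:              # linear search at the leaf
        if rule["pred"](header):
            return rule["name"]
    return None

# Tiny illustrative tree: 8 cuts in F1 at the root, then 2 cuts in F2.
leaf_empty = {"type": "leaf", "rules": []}
tree = {
    "type": "internal", "field": "F1", "cuts": 8,
    "children": [
        {"type": "internal", "field": "F2", "cuts": 2,
         "children": [
             leaf_empty,
             {"type": "leaf",
              "rules": [{"name": "Rx", "pred": lambda h: True}]},
         ]},
    ] + [leaf_empty] * 7,
}
print(search(tree, {"F1": 0b000110, "F2": 0b111100}))  # -> Rx
```

For the paper's input (000110, 111100, …), the first three F1 bits (000) select the first child, and the first F2 bit (1) selects the second edge, mirroring the walkthrough above.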

B. HyperCuts

While the HiCuts algorithm only considers one field at a time for selecting the cut dimension, the HyperCuts algorithm [9] considers multiple fields at a time. For the same example set, the decision tree of the HyperCuts algorithm is shown in Fig. 3. The spfac and binth are set as 1.5 and 3, respectively. As shown at the root node, the F1 and F2 fields are used simultaneously for cutting. Note that the edges of the root node represent the bit combinations 00, 10, 01, and 11, respectively, which is one bit of the first field followed by one bit of the second field. For the same input, the search follows the third edge of the root node


Fig. 1. Rules in a prefix plane.

Fig. 2. Decision tree of the HiCuts algorithm.

Fig. 3. Decision tree of the HyperCuts algorithm.

and compares the input to the rule stored there. The search then follows the first edge and compares the input to the rules stored at a leaf. Compared to the HiCuts algorithm, the decision tree of the HyperCuts algorithm generally has a smaller depth (not shown in the figures), as multiple fields are used at the same time in a single internal node.

Since the HyperCuts algorithm generally incurs considerable memory overhead compared to the HiCuts algorithm, several techniques have been proposed to refine the algorithm, such as space compaction and pushing common rule subsets upward [9]. For example, two rules at the second level


of the HyperCuts tree in Fig. 3 were pushed upward to the parent internal node because those rules are included in every child of the internal node. These refinements effectively reduce the number of replicated rules, but require complicated processing. Moreover, as will be shown through the simulation, if upward pushing occurs at high levels, search performance is significantly degraded, since rules stored at the high levels of a decision tree are compared in the early search phase, which involves many inputs. For example, if we applied upward pushing to the HiCuts tree in Fig. 2, two rules would be located at the root node, and those rules would have to be compared to every input. This results in search performance degradation.

There is a concern in implementing the HiCuts or HyperCuts algorithm. If the predetermined binth is improperly small, the algorithms will try to split a space even though there is no rule boundary inside. If there is no rule boundary inside the space covered by a node, the number of rules included in the node cannot be reduced further by more cutting. It has been suggested to use an optimization called rule overlapping. If every rule included in a leaf covers the entire space corresponding to the leaf, which means that there is no rule boundary inside the space covered by the leaf, all rules except the highest-priority rule are removed from the leaf by the rule overlapping optimization. By doing this, the number of rules included in the leaf becomes less than the binth. However, no rule should be ignored for multimatch classification. Hence, the HiCuts and HyperCuts algorithms require a stopping condition other than the binth to be applied to the multimatch classification problem. As will be shown in the simulation section, we find a proper value of binth that avoids this problem instead of devising another stopping condition.

Recently, a new decision tree algorithm, EffiCuts, was proposed [11]. To solve the issues of previous decision tree algorithms, EffiCuts employs several new ideas, such as tree separation and the equi-dense cut. Tree separation separates small rules from large rules and builds multiple decision trees so that fine cutting for small rules does not cause the replication of large rules. Here, a small (large) rule means a rule covering a small (large) subspace. While HiCuts and HyperCuts employ equal-sized cuts, as shown in Figs. 2 and 3, the equi-dense cut employs unequal-sized cuts based on the rule density in each subspace in order to distribute rules evenly among the children.

III. PROPOSED ALGORITHM

The HiCuts and HyperCuts algorithms perform cutting based on a fixed interval, and hence the partitioning is ineffective in reducing the number of rules belonging to a subspace. Moreover, when the number of cuts in a field is being determined, complicated preprocessing must be done to balance the required memory size and the search performance. In this study, we propose a deterministic cutting algorithm based on each rule boundary, termed the boundary cutting (BC) algorithm.

A. Building a BC Decision Tree

When the cutting of a prefix plane according to rule boundaries is performed, both the starting and the ending boundaries of each rule can be used for cutting, but cutting by either one is sufficient, since decision tree algorithms generally search for the subspace to which an input packet belongs, and the headers of the given input are compared in all fields to the rules belonging to that subspace (represented by a leaf node of the decision tree).

For example, regarding the rules shown in Fig. 1, by padding zeros to reach the maximum length [20] (assumed to be six in this case), the start boundaries in the F1 field of the rules can be derived as 000000, 000100, 001000, 010000, 011100, 100000, and 110000. Hence, the entire line of the F1 field can be split into seven intervals—[0, 3], [4, 7], [8, 15], [16, 27], [28, 31], [32, 47], and [48, 63]—versus the eight regular intervals in the HiCuts algorithm. The same processing can be repeated for the F2 field. The starting boundaries in the F1 and F2 fields are noted in Fig. 1.

Fig. 4. Decision tree of the boundary cutting algorithm.

The decision tree of the proposed algorithm that makes cuts according to the starting rule boundaries is shown in Fig. 4. Here, binth is also set as three. At the root node, the F1 field is used to split the entire line into seven disjoint intervals; hence, the root node has seven binary search entries. At level 2, the F2 field is used to further split each subspace that has more rules than binth.

Comparison of the proposed decision tree to the HiCuts decision tree shows that the BC algorithm does not cause unnecessary cutting. Clearly, no two leaves have the same set of rules. Since cutting does not occur in the absence of a boundary, unnecessary cutting is avoided. Hence, rule replication is reduced in the BC algorithm. Additionally, the depth of the decision tree is always less than or equal to six, including a leaf, since each field is used once at most.
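The interval derivation described above is mechanical: sort the start boundaries and split the line at each one. A short sketch (the function name is ours) reproduces the seven intervals from the listed boundaries.

```python
# Derive BC intervals from rule start boundaries on a W-bit field line.

def boundary_intervals(start_boundaries, W=6):
    """Split the line [0, 2^W - 1] at each start boundary; each interval
    runs from one boundary up to (but excluding) the next."""
    bounds = sorted(set(start_boundaries))
    assert bounds[0] == 0                    # the line always starts at 0
    ends = bounds[1:] + [1 << W]
    return [(lo, hi - 1) for lo, hi in zip(bounds, ends)]

starts = [0b000000, 0b000100, 0b001000, 0b010000,
          0b011100, 0b100000, 0b110000]
print(boundary_intervals(starts))
# -> [(0, 3), (4, 7), (8, 15), (16, 27), (28, 31), (32, 47), (48, 63)]
```

Note that the number of intervals equals the number of distinct start boundaries, whereas HiCuts always produces a power-of-two number of equal-sized intervals.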

B. Searching in the BC Algorithm

The cuts at each internal node of the BC decision tree do not have fixed intervals. Hence, at each internal node of the tree, a binary search is required to determine the proper edge to follow for a given input. However, it will be shown in Section VI that the BC algorithm provides better search performance than the


HiCuts algorithm despite the memory accesses for the binary search at the internal nodes.

During the binary search, the pointer to the child node is remembered when the input matches the entry value or when the input is larger than the entry value [20]. Consider an input packet with headers (000110, 111100, 19, 23, TCP), for example; since F1 is used at the root node, a binary search using the F1 header of the given input is performed. The header 000110 is compared to the middle entry of the root node, which is 010000. Since the input is smaller, the search proceeds to the smaller half and compares the input to the entry 000100. Since the input is larger, the child pointer (the second edge) is remembered, and the search proceeds to the larger half. The input is compared to 001000 and found to be smaller, but there is no entry left in the smaller half. Hence, the search follows the remembered pointer, the second edge. At the second level, by performing a binary search, the last edge is selected for the F2 header 111100. A linear search, the same as that in the HiCuts or HyperCuts algorithm, is performed on the rules stored in the leaf node. The binary search at each internal node takes O(log k) time to find a suitable edge to follow among k entries.
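The remembered-pointer binary search described above can be sketched directly; it is equivalent to returning the index of the largest boundary value not exceeding the key. The function name is ours.

```python
# Binary search at a BC internal node: remember the edge whenever the
# key matches or exceeds the entry, and follow the remembered edge at
# the end. Runs in O(log k) comparisons for k sorted boundary entries.

def select_edge(entries, key):
    lo, hi, remembered = 0, len(entries) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if key >= entries[mid]:
            remembered = mid          # match or larger: remember this edge
            lo = mid + 1              # continue in the larger half
        else:
            hi = mid - 1              # continue in the smaller half
    return remembered                 # index of largest entry <= key

root = [0b000000, 0b000100, 0b001000, 0b010000,
        0b011100, 0b100000, 0b110000]
print(select_edge(root, 0b000110))    # -> 1
```

For the header 000110, the comparisons visit 010000, then 000100 (remembered), then 001000, and the remembered second edge (index 1, 0-based) is followed, matching the walkthrough above.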

IV. SELECTIVE BC ALGORITHM

This section proposes a refined structure for the BC algorithm. The decision tree algorithms, including the BC algorithm, use binth to determine whether a subspace should become an internal node or a leaf node. In other words, if the number of rules included in a subspace is more than binth, the subspace becomes an internal node; otherwise, it becomes a leaf node. In the BC algorithm, if a subspace becomes an internal node, every starting boundary of the rules included in the subspace is used for cutting, as shown in Fig. 4. We propose a refined structure that uses the binth to select or deselect the boundary of a rule at an internal node. In other words, the refined structure activates a rule boundary only when the number of rules included in a partition exceeds the binth.

For example, there are three internal nodes in the second level of the decision tree shown in Fig. 4. For the first internal node, the middle boundary, which is 100000, separates the area covered by the internal node into two partitions, which are [0, 31] and [32, 63] in the F2 interval. Since the first partition has only two rules, there is no need to use the other boundary, 011000, included in [0, 31]; and since the second partition has three rules (which does not exceed binth), there is no need to use the other boundary, 111000, included in [32, 63]. Hence, the internal node results in only two partitions. Similarly, in the second internal node, the middle boundary (110000) separates the area covered by the internal node into two partitions, each of which has three rules; hence, the other boundaries do not have to be used. The same thing happens in the last internal node.

The refined structure of the BC algorithm, called selective BC (SBC), is shown in Fig. 5. At the root node, the middle boundary of each partition is recursively considered to determine whether it should be activated or deactivated, but each partition has more rules than binth; hence, all boundaries are used.

Fig. 5. Decision tree of the selective boundary cutting algorithm.

Upon comparison of the SBC and the BC trees, the number of leaves is reduced from 16 to 10; consequently, the number of stored rules is reduced from 34 to 27.
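The paper describes SBC's boundary selection textually; the recursion below is one plausible reading of it. The helper count_rules and the per-partition rule counts in the example are our own assumptions for illustration.

```python
# Sketch of SBC boundary selection: activate the middle candidate
# boundary of a space, then recurse into each resulting partition only
# while its rule count still exceeds binth.

def select_boundaries(bounds, count_rules, binth, space):
    """bounds: sorted candidate boundaries strictly inside `space`;
    count_rules(lo, hi): rules intersecting [lo, hi] (assumed helper)."""
    active = []

    def recurse(i, j, lo, hi):
        # stop when no candidate boundary remains in this partition, or
        # when the partition already holds binth rules or fewer
        if i > j or count_rules(lo, hi) <= binth:
            return
        m = (i + j) // 2
        b = bounds[m]
        active.append(b)               # activate the middle boundary
        recurse(i, m - 1, lo, b - 1)   # left partition
        recurse(m + 1, j, b, hi)       # right partition

    recurse(0, len(bounds) - 1, space[0], space[1])
    return sorted(active)

# First second-level node of Fig. 4 (binth = 3): candidates 011000,
# 100000, 111000 inside [0, 63]; the partition counts follow the text,
# the node's own count of 5 is an assumption.
counts = {(0, 63): 5, (0, 31): 2, (32, 63): 3}
active = select_boundaries([0b011000, 0b100000, 0b111000],
                           lambda lo, hi: counts[(lo, hi)], 3, (0, 63))
print(active == [0b100000])  # only the middle boundary 100000 survives
```

As in the text, only 100000 is activated for this node: both resulting partitions hold binth rules or fewer, so 011000 and 111000 stay inactive.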

V. DATA STRUCTURE

There are two different ways of storing rules in decision tree algorithms. The first way separates a rule table from the decision tree. In this case, each rule is stored only once in the rule table, while each leaf node of the decision tree has pointers into the rule table for the rules included in the leaf. The number of rule pointers that each leaf must hold equals the binth. In searching for the best matching rule for a given packet, or for the list of all matching rules, after a leaf node in the decision tree is reached and the number of rules included in the leaf is identified, extra memory accesses are required to access the rule table.

The other way involves storing rules within leaf nodes. In this case, search performance is better since the extra access to the rule table is avoided, but extra memory overhead is incurred due to rule replication. In the simulation in this paper, it is assumed that rules are stored in leaf nodes, since the search performance is more important than the required memory.

A leaf node includes multiple rule entries. Table I shows the data structure of a rule entry stored in a leaf node of the decision tree algorithms. The data structure of the rule entry is the same for all decision tree algorithms except for the last field, which is the next rule pointer. The number of bits allocated for the next rule pointer field is ⌈log2 N⌉, where N is the total number of stored rules. As will be shown in Section VI, rule sets with about 100 000 rules cause tremendous rule replication; hence, N becomes an extremely large number. The next rule pointer shown in Table I is roughly estimated as 30 bits, which can address up to about 10^9 entries.

Table II shows the data structure of the internal nodes of each

decision tree. For BC and selective boundary cutting (SBC), N represents the total number of boundary entries at internal nodes. The number-of-entries field is used to specify the entries that are involved in the binary search at an internal node. Each internal node of the BC and SBC algorithms requires 3.625 and 3.5 B, respectively, to represent node information such as the field selection and the number of boundary entries. Each boundary


448 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 22, NO. 2, APRIL 2014

TABLE I
DATA STRUCTURES OF STORING A RULE AT LEAF NODES

entry of the internal node has a 32-bit boundary value and a child pointer. Note that each boundary entry has a child pointer for the BC and SBC algorithms.

For the HiCuts and HyperCuts algorithms, the corresponding count represents the

number of internal nodes. The internal nodes of every decision tree algorithm have bits that are used to represent a field selection (multiple fields for HyperCuts). Since an internal node can have both internal nodes and leaf nodes as its children, separately storing internal nodes from leaf nodes and avoiding storing many child pointers require the presence of a map that represents whether each child is an internal node or a leaf node and two pointers that point to the first internal child and the first leaf child.
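Concretely, the per-node binary search over sorted boundary entries described above might look like the following sketch; the `BoundaryEntry` layout and the `find_child` function are illustrative assumptions, not the paper's actual implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical BC/SBC internal node content: a sorted array of boundary
// values, each owning a child pointer (here simplified to an index).
struct BoundaryEntry {
    uint32_t boundary;  // 32-bit start of the interval this entry covers
    int      child;     // child reached for header values in [boundary, next)
};

// Binary search for the highest boundary <= value; returns that entry's
// child index, or -1 if the value precedes every boundary.
int find_child(const std::vector<BoundaryEntry>& entries, uint32_t value) {
    int lo = 0, hi = static_cast<int>(entries.size()) - 1, best = -1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (entries[mid].boundary <= value) {
            best = entries[mid].child;  // candidate interval; try further right
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return best;
}
```

Because every boundary entry carries its own child pointer, locating the enclosing interval and following the tree edge is a single binary search, matching the number-of-entries field described for Table II.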

VI. SIMULATION RESULTS

Simulations using C++ have been extensively performed for rule sets created by Classbench [38]. Three different types of rule sets—access control list (ACL), firewall (FW), and Internet protocol chain (IPC)—are generated with sizes of approximately 1000, 5000, 10 000, 50 000, and 100 000 rules each. Rule sets are named using the set type followed by the size, such as ACL1K, which means an ACL type set with about 1000 rules.

The performance of each decision tree is highly dependent on

the rule set characteristics, especially the number of rules with a wildcard in the prefix fields, since prefix fields are mostly used for cutting because of the variety of distinct values. The numbers and rates of those rules with a wildcard in each field are shown in Table III, where the number of rules included in each rule set is also given. ACL type sets have the smallest rate of wildcards in the prefix fields, but a high rate in the port number field; the source port number is always a wildcard. IPC type sets have a high rate of wildcards in the port number fields and in the protocol type field. Compared to the other types, FW type sets have a much higher rate of wildcards in the prefix fields, especially in the destination prefix field.

A. Setting binth

The HiCuts and HyperCuts algorithms do not have a cut-stop condition other than binth. Setting binth to an arbitrarily small number will cause an infinite loop of unnecessary cutting if

there is no rule boundary inside the subspace covered by an internal node. The optimization of rule overlapping, which removes all rules except the highest-priority rule in such nodes, can solve this problem. However, no rule should be ignored for multimatch packet classification.

To solve this issue and reduce the unpredictability that arises

from using different binth values, we performed the simulation in two steps. The first step is to determine the proper binth value. We first constructed the BC algorithm setting binth to an arbitrarily small number and then identified jumbo leaves, that is, leaves that have more rules than binth. The jumbo leaves exist because the number of rules cannot be reduced further by more cutting, since there is no rule boundary inside the subspace covered by them. Among the numbers of rules included in the jumbo leaves, the largest number is identified, and we name this number the lower bound binth, meaning that binth should not be smaller than this number to avoid unnecessary cutting. For example, in the prefix plane shown in Fig. 1, several areas contain no internal rule boundary; in the largest of these areas, two other rules are also involved, so the largest rule count is three, and hence the lower bound binth is three.
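The first step above can be sketched as follows; the function name and its input, the rule counts collected from the jumbo leaves, are illustrative assumptions:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch of the lower-bound-binth determination: after building the BC
// tree with a deliberately small binth, collect the rule count of each
// jumbo leaf (a leaf that still exceeds binth because no rule boundary
// remains inside its subspace). The lower bound binth is the largest
// such count, since no amount of further cutting can go below it.
int lower_bound_binth(const std::vector<int>& jumbo_leaf_rule_counts) {
    if (jumbo_leaf_rule_counts.empty()) return 0;  // no jumbo leaves at all
    return *std::max_element(jumbo_leaf_rule_counts.begin(),
                             jumbo_leaf_rule_counts.end());
}
```

For the Fig. 1 example, jumbo leaves with one, one, and three rules would give a lower bound binth of three.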

B. Decision Tree Characteristics

In the second step, we constructed decision trees of the BC, SBC, HiCuts, and HyperCuts algorithms, setting binth to the identified lower bound value. Cutting is recursively performed and stopped if the number of rules included in a subspace is less than or equal to the lower bound binth. The performances of the HiCuts and HyperCuts algorithms depend on spfac as well. In our simulation, spfac was set to 1.5 for 1K sets and 2 for all other sets.

The optimization of rule overlapping was not included in our

simulation to allow the decision trees to simultaneously support the multimatch and best-match classifications. Node merging optimization was implemented for the HiCuts tree. Node merging, space compaction, and pushing common rules upward optimizations were implemented for the HyperCuts tree.

Regarding the optimization of pushing common rules upward, the HyperCuts study [9] states that all common rules associated with every child of an internal node should be pushed upward. There can be two different ways of pushing upward: during the building or after the building of a decision tree. If we consider the pushing upward optimization during the building of a decision tree, it is implemented as follows. Once one or more fields are selected as cut fields at an internal node, rules having wildcards in the selected fields are located at the internal node (by means of the pushing upward) without splitting into child nodes. In this case, rule replication is greatly reduced, but search performance is significantly degraded since many rules are located at internal nodes and those rules are linearly compared to inputs arriving at the node. The search performance is not tolerable for large rule sets or rule sets with many wildcard rules or short-length prefixes.

On the other hand, if we consider pushing upward after the

completion of tree building, it is implemented as follows. A decision tree is built without performing the optimization, and hence each leaf node has a number of rules less than or equal to binth. From the bottom of the tree, each subtree is examined to find common rules in every child, and those rules are pushed


LIM et al.: BOUNDARY CUTTING FOR PACKET CLASSIFICATION 449

TABLE II
DATA STRUCTURES OF INTERNAL NODES IN EACH DECISION TREE

TABLE III
CHARACTERISTICS OF RULE SETS IN THE NUMBER AND RATE OF WILDCARDS

upward into the root of the subtree. This processing is repeated until the root of the decision tree is visited. In this case, the total number of rules located at internal nodes along a path from the root to a leaf node is guaranteed to be at most binth. Hence, the search performance is not degraded much by the optimization, but the rule replication becomes significant.

For the HyperCuts simulation, we implemented two different versions: HyperCuts_1 and HyperCuts_2. The two implementations are the same in all other respects except for the optimization of pushing upward. In HyperCuts_1, the optimization was implemented during the building of the decision tree. In HyperCuts_2, the optimization was implemented after the building of the decision tree.

The time to construct each decision tree varies widely depending on the tree and rule set types. The times to build the HiCuts and HyperCuts_2 trees are longer than those of the other trees because they create many internal nodes and a high number of replicated rules. The other trees show similar building times. FW types require the most time because their sets have many rules with wildcards (Table III) or short-length prefixes, and these rules are replicated many times.

The characteristics of each decision tree for the identified

lower bound binth are shown in Table IV, which lists the constructed tree depth and the rule replication factor, that is, the total number of stored rules divided by the number of original rules. For large rule sets such as ACL100K, IPC50K, IPC100K, FW50K, and FW100K, the decision trees of HyperCuts_2 could not be built because of a memory shortage problem due to the high rule replication; this is represented by na (not available) in the corresponding fields. As shown, the depth of the BC and the SBC is 4–6 including a leaf since each field is used once at most. The HiCuts depth is the worst. The depth of the HyperCuts_2 is between those of the HyperCuts_1 and the HiCuts.

Comparing the rule replication factors of the BC and the

HiCuts, the BC is sometimes better and sometimes worse. The reason is that all of the boundaries are used in the BC for cutting if a node is determined to be an internal node. The rule replication factor of the SBC algorithm is always smaller than that of the BC and much better than that of the HiCuts. Because of the optimization of pushing upward in the HyperCuts_1, its rule replication factor is the smallest except for the IPC sets. Comparing the HyperCuts_1 and the HyperCuts_2, the rule replication factor is much increased by performing the pushing upward optimization after the building of the tree.

The number of nodes affected by the optimization is shown

in Tables V and VI for HiCuts and HyperCuts, respectively.



TABLE IV
CHARACTERISTICS OF EACH DECISION TREE

TABLE V
OPTIMIZATION CHARACTERISTICS (HICUTS ALGORITHM)

TABLE VI
OPTIMIZATION CHARACTERISTICS (HYPERCUTS ALGORITHM)

The node merging and region compaction optimizations reduce tree depth and rule replication. Node merging optimization is not required in the BC and SBC algorithms since leaf nodes having the same rule subset are not created. The optimization of region compaction is not required either because it is naturally provided in the BC and SBC algorithms. The pushing upward optimization could be applied to the HiCuts, BC, and SBC algorithms. However, it must be reconsidered since it can cause search performance degradation. As shown in Table VI, the maximum number of rules pushed upward at an internal node in the HyperCuts_1 can exceed 1000 in large rule sets. This results in significant search performance degradation.
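The node-merging idea can be sketched as follows: child leaves holding an identical rule list are collapsed into one shared leaf, so sibling child pointers alias a single node. The types and function here are illustrative assumptions, not the paper's implementation:

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Rule identifiers, kept in priority order, as stored at one child leaf.
using RuleList = std::vector<int>;

// Given the rule lists of a node's children, return a child -> merged-leaf
// mapping plus the number of distinct leaves actually kept after merging.
std::pair<std::vector<int>, int>
merge_identical_leaves(const std::vector<RuleList>& children) {
    std::map<RuleList, int> seen;            // rule list -> merged leaf index
    std::vector<int> remap(children.size());
    int kept = 0;
    for (std::size_t i = 0; i < children.size(); ++i) {
        auto it = seen.find(children[i]);
        if (it == seen.end()) {
            seen.emplace(children[i], kept);
            remap[i] = kept++;               // first occurrence: keep the leaf
        } else {
            remap[i] = it->second;           // duplicate: alias the kept leaf
        }
    }
    return {remap, kept};
}
```

In BC and SBC such duplicate leaves never arise in the first place, which is why the pass is only needed for the interval-based cutting of HiCuts and HyperCuts.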

Comparing the HyperCuts_1 and the HyperCuts_2, we observed several interesting facts. In the HyperCuts_2, more internal nodes have rules and the total number of rules located at internal nodes is much larger, but the maximum number of rules located at a single internal node is much smaller than in the HyperCuts_1.
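The post-build pushing-upward pass described above can be sketched recursively; the `Node` layout is an illustrative assumption:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch of the post-build pushing-upward pass: rules present in every
// child of a subtree root are removed from the children and stored at the
// root. Applied bottom-up, the total number of rules on any root-to-leaf
// path stays bounded by binth.
struct Node {
    std::vector<int>  rules;     // rules stored at this node
    std::vector<Node> children;  // empty for a leaf
};

void push_common_rules_up(Node& n) {
    if (n.children.empty()) return;                      // leaf: nothing to do
    for (Node& c : n.children) push_common_rules_up(c);  // bottom-up order
    // Intersect the children's rule lists to find the common rules.
    std::vector<int> common = n.children.front().rules;
    for (const Node& c : n.children) {
        std::vector<int> next;
        for (int r : common)
            if (std::find(c.rules.begin(), c.rules.end(), r) != c.rules.end())
                next.push_back(r);
        common.swap(next);
    }
    // Remove the common rules from every child and hoist them to this node.
    for (Node& c : n.children)
        c.rules.erase(std::remove_if(c.rules.begin(), c.rules.end(),
                          [&](int r) {
                              return std::find(common.begin(), common.end(),
                                               r) != common.end();
                          }),
                      c.rules.end());
    n.rules.insert(n.rules.end(), common.begin(), common.end());
}
```

Under this sketch, a root with leaf children {1, 2, 3} and {2, 3, 4} ends up holding {2, 3}, with the children reduced to {1} and {4}, which mirrors why HyperCuts_2 keeps the per-node maximum small while spreading rules over many internal nodes.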

C. Required Memory

Table VII shows the estimated memory required to store the internal nodes of each algorithm. In this table, the number of internal nodes is listed, along with the total number of boundary entries at the internal nodes of



TABLE VII
REQUIRED MEMORY FOR STORING INTERNAL NODES OF EACH DECISION TREE

TABLE VIII
REQUIRED MEMORY FOR STORING RULES IN EACH DECISION TREE

the BC and SBC algorithms. The required memory is calculated from the per-node and per-entry sizes given in Table II: for BC and SBC nodes, the node-information bytes (3.625 and 3.5 B, respectively) plus the boundary-entry bytes, converted to megabytes. The required memory for the internal nodes of HiCuts and HyperCuts is calculated in the same way from their node sizes in Table II.

The BC and SBC algorithms require a reasonable amount of

memory, less than 4 MB except for large sets such as IPC100K, FW50K, and FW100K; hence, they can be implemented using a cache or embedded memory. The memory requirement for internal HiCuts nodes is much higher, up to 50× that of the SBC algorithm. Hence, it is not practical to implement the internal nodes of large rule sets in an on-chip memory or a fast cache for HiCuts. While the HyperCuts_1 requires less memory than the BC or the SBC, mainly due to the pushing upward optimization, the HyperCuts_2 requires much more memory.

Table VIII lists the estimated memory required to store rules

in each decision tree. The memory amount is estimated as the width of a rule entry shown in Table I multiplied by the total number of stored rules, converted to megabytes. The required memory for storing rules in decision tree algorithms is generally reasonable for small sets with up to 10K rules. However, the memory overhead is significantly increased for large

sets. The HyperCuts_1 requires the least memory for all of the ACL sets, but requires more memory than the SBC for IPC sets. The memory requirement for the HyperCuts_2 is more than that of the HyperCuts_1 in all cases because of the large rule replication. The HiCuts always requires more memory than the SBC (up to 8×). Building any of the decision trees for a large number of FW type rules is not practical in terms of the required memory.

Table IX lists the total memory amount (bytes per rule) required by each decision tree for storing a rule. The number is the required memory amount for storing internal nodes shown in Table VII plus the required memory amount for storing rules shown in Table VIII, divided by the number of rules.

The memory amount for decision tree algorithms depends on

rule number and type. The ACL and IPC types require several kilobytes per rule, while the FW type requires several hundred kilobytes per rule for large sets. The HyperCuts_1 requires the smallest memory overhead except for the IPC50K and IPC100K sets.

As shown, all of the decision tree algorithms require considerable memory overhead, especially for large sets. The memory overhead can be controlled by increasing the binth and reducing the spfac at the expense of degraded search performance. Determining the balance between memory overhead and search performance is not a simple task.



TABLE IX
TOTAL REQUIRED MEMORY AMOUNT PER RULE (kB)

Fig. 6. Worst-case number of memory accesses at internal nodes. (a) ACL. (b) IPC. (c) FW.

D. Search Performance

Input traces have also been generated from Classbench [38] to evaluate search performance. The number of inputs is about 3× the number of rules. The search performance was evaluated assuming the highest-priority

Fig. 7. Average number of memory accesses at internal nodes. (a) ACL. (b) IPC. (c) FW.

match packet classification. Rules are stored in descending order of priority, and the search is immediately stopped when a match is found. For HyperCuts, rules stored in the internal nodes by the pushing upward optimization are also compared in descending order of priority. If a match is found, the



Fig. 8. Worst-case number of memory accesses at leaf nodes. (a) ACL. (b) IPC. (c) FW.

current best match is remembered. When the search continues at a child node, the priority of the current best match is always compared to the priority of the rule under comparison, to avoid unnecessary rule comparisons in cases where the rule's priority is lower than that of the current best match.

Figs. 6 and 7 compare the worst-case and the average-case

search performances in terms of the number of memory accesses required at internal nodes. For the HyperCuts_1 and the HyperCuts_2, the rule comparisons at internal nodes caused by the pushing upward optimization were not included in these graphs because they differ from the usual operation performed at the internal nodes; they are accounted for in the number of memory accesses at the leaf nodes in later graphs.

These graphs show that the number of memory accesses required at internal nodes of the BC or SBC algorithms is much

Fig. 9. Average number of memory accesses at leaf nodes. (a) ACL. (b) IPC. (c) FW.

better than that of the HiCuts and slightly worse than that of the HyperCuts, even though they perform binary searches at each internal node. The HiCuts provides the worst performance because of its large depths. The worst-case number is up to 76 for the HiCuts, while it is up to 29 for the proposed algorithms. The average number is also better in the BC and SBC algorithms than in the HiCuts. The HyperCuts_1 provides the best performance in the number of memory accesses at the internal nodes since the depth is greatly reduced by the pushing upward optimization.

Figs. 8 and 9 show the worst-case and average-case search

performances at leaf nodes. The worst-case number of memory accesses at the leaf nodes is closely related to binth. Because of the rule comparison at internal nodes caused by the pushing upward optimization, the HyperCuts_1 shows the worst performance,



Fig. 10. Total number of memory accesses in the worst case. (a) ACL. (b) IPC. (c) FW.

which is not tolerable for high-speed packet classification. The HyperCuts_2 shows much better performance than the HyperCuts_1, but slightly worse performance than the other algorithms. In the average search performance, the BC algorithm provides the best performance for all cases since every rule boundary is used; hence, leaves with the smallest number of rules are generated, requiring only 1.0–2.8 accesses. The SBC algorithm also shows excellent search performance at leaf nodes, approximately 1.4–4.0 accesses. Assuming that the internal nodes in our proposed algorithms are stored in an on-chip memory or a fast cache, the packet classification of our proposed decision tree algorithms requires 1.0–4.0 off-chip memory accesses on average and 4–47 in the worst case for the sets with up to 100 000 rules. To the best of our knowledge, this is the best among the known packet classification algorithms.
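The priority-pruned leaf search described earlier in this section can be sketched as follows; the `Rule` type and match predicate are illustrative assumptions:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of the best-match search at a leaf: rules are stored in
// descending priority order, so the scan stops at the first rule that
// matches, and the scan also stops as soon as the remaining rules
// cannot beat the current best match.
struct Rule {
    int priority;               // larger value = higher priority
    bool (*matches)(uint32_t);  // stand-in for the 5-field comparison
};

// Returns the best priority found, or best_so_far if nothing better.
int search_leaf(const std::vector<Rule>& rules, uint32_t key, int best_so_far) {
    for (const Rule& r : rules) {      // descending priority order
        if (r.priority <= best_so_far)
            return best_so_far;        // no remaining rule can win
        if (r.matches(key))
            return r.priority;         // first match is the best in this leaf
    }
    return best_so_far;
}
```

Carrying `best_so_far` from node to node is what avoids the unnecessary rule comparisons mentioned above when the search descends into child nodes.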

Fig. 11. Total number of memory accesses in average. (a) ACL. (b) IPC. (c) FW.

Figs. 10 and 11 show the total number of memory accesses, including the internal node accesses and the leaf node accesses, in the worst case and in the average case, respectively, assuming that both node types are stored in off-chip memories. The HyperCuts_1 provides excellent search performance at internal nodes but the worst search performance at leaf nodes, while the HiCuts provides the worst search performance at internal nodes but excellent search performance at leaf nodes. The proposed algorithms and the HyperCuts_2 algorithm provide reasonable search performance as a whole, but the memory requirement for the HyperCuts_2 is not tolerable, as shown in Table IX.

VII. CONCLUSION

In this paper, decision tree algorithms have been studied, and a new decision tree algorithm is proposed. Throughout the



extensive simulation using Classbench databases [38] for the previous decision tree algorithms, HiCuts and HyperCuts, we discovered that the performance of decision tree algorithms is highly dependent on the rule set characteristics, especially the number of rules with a wildcard or a short-length prefix. For example, the HiCuts algorithm can provide high-speed search performance, but the memory overhead for large sets or sets with many wildcard rules makes its use impractical. We also discovered that the HyperCuts algorithm either does not provide high-speed search performance or requires a huge amount of memory, depending on how the pushing upward optimization is implemented.

While the cutting in the earlier decision tree algorithms is

based on a regular interval, the cutting in the proposed algorithm is based on rule boundaries; hence, the cutting in our proposed algorithm is deterministic and very effective. Furthermore, to avoid rule replication caused by unnecessary cutting, a refined structure of the proposed algorithm has been proposed. The proposed algorithms consume far less memory than the earlier decision tree algorithms, up to several kilobytes per rule except for FW50K and FW100K. The proposed algorithms achieve packet classification with 10–23 on-chip memory accesses and 1.0–4.0 off-chip memory accesses on average.

New network applications have recently demanded a multimatch packet classification [4] in which all matching results along with the highest-priority matching rule must be returned. It is necessary to explore efficient algorithms to solve both classification problems. The decision tree algorithms, including those proposed in this paper, naturally enable both the highest-priority match and the multimatch packet classification.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper, especially related to the implementation of HyperCuts.

REFERENCES

[1] H. J. Chao, “Next generation routers,” Proc. IEEE, vol. 90, no. 9, pp. 1518–1588, Sep. 2002.

[2] A. X. Liu, C. R. Meiners, and E. Torng, “TCAM razor: A systematic approach towards minimizing packet classifiers in TCAMs,” IEEE/ACM Trans. Netw., vol. 18, no. 2, pp. 490–500, Apr. 2010.

[3] C. R. Meiners, A. X. Liu, and E. Torng, “Topological transformation approaches to TCAM-based packet classification,” IEEE/ACM Trans. Netw., vol. 19, no. 1, pp. 237–250, Feb. 2011.

[4] F. Yu and T. V. Lakshman, “Efficient multimatch packet classification and lookup with TCAM,” IEEE Micro, vol. 25, no. 1, pp. 50–59, Jan.–Feb. 2005.

[5] F. Yu, T. V. Lakshman, M. A. Motoyama, and R. H. Katz, “Efficient multimatch packet classification for network security applications,” IEEE J. Sel. Areas Commun., vol. 24, no. 10, pp. 1805–1816, Oct. 2006.

[6] H. Yu and R. Mahapatra, “A memory-efficient hashing by multi-predicate Bloom filters for packet classification,” in Proc. IEEE INFOCOM, 2008, pp. 2467–2475.

[7] H. Song and J. W. Lockwood, “Efficient packet classification for network intrusion detection using FPGA,” in Proc. ACM SIGDA FPGA, 2005, pp. 238–245.

[8] P. Gupta and N. McKeown, “Classification using hierarchical intelligent cuttings,” IEEE Micro, vol. 20, no. 1, pp. 34–41, Jan.–Feb. 2000.

[9] S. Singh, F. Baboescu, G. Varghese, and J. Wang, “Packet classification using multidimensional cutting,” in Proc. ACM SIGCOMM, 2003, pp. 213–224.

[10] P. Gupta and N. McKeown, “Algorithms for packet classification,” IEEE Netw., vol. 15, no. 2, pp. 24–32, Mar.–Apr. 2001.

[11] B. Vamanan, G. Voskuilen, and T. N. Vijaykumar, “EffiCuts: Optimizing packet classification for memory and throughput,” in Proc. ACM SIGCOMM, 2010, pp. 207–218.

[12] H. Song, M. Kodialam, F. Hao, and T. V. Lakshman, “Efficient trie braiding in scalable virtual routers,” IEEE/ACM Trans. Netw., vol. 20, no. 5, pp. 1489–1500, Oct. 2012.

[13] J. Treurniet, “A network activity classification schema and its application to scan detection,” IEEE/ACM Trans. Netw., vol. 19, no. 5, pp. 1396–1404, Oct. 2011.

[14] L. Choi, H. Kim, S. Ki, and M. H. Kim, “Scalable packet classification through rulebase partitioning using the maximum entropy hashing,” IEEE/ACM Trans. Netw., vol. 17, no. 6, pp. 1926–1935, Dec. 2009.

[15] F. Baboescu and G. Varghese, “Scalable packet classification,” IEEE/ACM Trans. Netw., vol. 13, no. 1, pp. 2–14, Feb. 2005.

[16] P. C. Wang, C. L. Lee, C. T. Chan, and H. Y. Chang, “Performance improvement of two-dimensional packet classification by filter rephrasing,” IEEE/ACM Trans. Netw., vol. 15, no. 4, pp. 906–917, Aug. 2007.

[17] P. C. Wang, C. L. Lee, C. T. Chan, and H. Y. Chang, “O(log W) multidimensional packet classification,” IEEE/ACM Trans. Netw., vol. 15, no. 2, pp. 462–472, Apr. 2007.

[18] X. Sun, S. K. Sahni, and Y. Q. Zhao, “Packet classification consuming small amount of memory,” IEEE/ACM Trans. Netw., vol. 13, no. 5, pp. 1135–1145, Oct. 2005.

[19] W. Lu and S. Sahni, “Succinct representation of static packet classifiers,” IEEE/ACM Trans. Netw., vol. 17, no. 3, pp. 803–816, Jun. 2009.

[20] B. Lampson, B. Srinivasan, and G. Varghese, “IP lookups using multiway and multicolumn search,” IEEE/ACM Trans. Netw., vol. 7, no. 3, pp. 324–334, Jun. 1999.

[21] D. E. Taylor, “Survey and taxonomy of packet classification techniques,” Comput. Surveys, vol. 37, no. 3, pp. 238–275, Sep. 2005.

[22] V. Srinivasan, G. Varghese, S. Suri, and M. Waldvogel, “Fast and scalable layer four switching,” in Proc. ACM SIGCOMM, 1998, pp. 191–202.

[23] T. V. Lakshman and D. Stiliadis, “High-speed policy-based packet forwarding using efficient multi-dimensional range matching,” in Proc. ACM SIGCOMM, Oct. 1998, pp. 203–214.

[24] M. M. Buddhikot, S. Suri, and M. Waldvogel, “Space decomposition techniques for fast layer-4 switching,” in Proc. Protocols High Speed Netw., Aug. 1999, pp. 25–41.

[25] F. Baboescu and G. Varghese, “Scalable packet classification,” in Proc. ACM SIGCOMM, Aug. 2001, pp. 199–210.

[26] F. Baboescu, S. Singh, and G. Varghese, “Packet classification for core routers: Is there an alternative to CAMs?,” in Proc. IEEE INFOCOM, 2003, pp. 53–63.

[27] S. Dharmapurikar, H. Song, J. Turner, and J. Lockwood, “Fast packet classification using Bloom filters,” in Proc. ACM/IEEE ANCS, 2006, pp. 61–70.

[28] P. Wang, C. Chan, C. Lee, and H. Chang, “Scalable packet classification for enabling Internet differentiated services,” IEEE Trans. Multimedia, vol. 8, no. 6, pp. 1239–1249, Dec. 2006.

[29] H. Lim, M. Kang, and C. Yim, “Two-dimensional packet classification algorithm using a quad-tree,” Comput. Commun., vol. 30, no. 6, pp. 1396–1405, Mar. 2007.

[30] H. Lim, H. Chu, and C. Yim, “Hierarchical binary search tree for packet classification,” IEEE Commun. Lett., vol. 11, no. 8, pp. 689–691, Aug. 2007.

[31] H. Lim and J. Mun, “High-speed packet classification using binary search on length,” in Proc. IEEE/ACM ANCS, Orlando, FL, USA, Dec. 2007, pp. 137–144.

[32] A. G. Alagu Priya and H. Lim, “Hierarchical packet classification using a Bloom filter and rule-priority tries,” Comput. Commun., vol. 33, no. 10, pp. 1215–1226, Jun. 2010.

[33] H. Lim and S. Y. Kim, “Tuple pruning using Bloom filters for packet classification,” IEEE Micro, vol. 30, no. 3, pp. 48–58, May–Jun. 2010.

[34] H. Song, J. Turner, and S. Dharmapurikar, “Packet classification using coarse-grained tuple spaces,” in Proc. ACM/IEEE ANCS, 2006, pp. 41–50.

[35] V. Srinivasan, S. Suri, and G. Varghese, “Packet classification using tuple space search,” in Proc. ACM SIGCOMM, 1999, pp. 135–146.



[36] H. Song, S. Dharmapurikar, J. Turner, and J. Lockwood, “Fast hash table lookup using extended Bloom filter: An aid to network processing,” in Proc. ACM SIGCOMM, 2005, pp. 181–192.

[37] P. Panda, N. Dutt, and A. Nicolau, “On-chip vs. off-chip memory: The data partitioning problem in embedded processor-based systems,” Trans. Design Autom. Electron. Syst., vol. 5, no. 3, pp. 682–704, Jul. 2000.

[38] D. E. Taylor and J. S. Turner, “ClassBench: A packet classification benchmark,” IEEE/ACM Trans. Netw., vol. 15, no. 3, pp. 499–511, Jun. 2007.

Hyesook Lim (M’91–SM’13) received the B.S. and M.S. degrees in control and instrumentation engineering from Seoul National University, Seoul, Korea, in 1986 and 1991, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Texas at Austin, Austin, TX, USA, in 1996.

From 1996 to 2000, she was employed as a Member of Technical Staff with Bell Labs, Lucent Technologies, Murray Hill, NJ, USA. From 2000 to 2002, she was a Hardware Engineer for Cisco Systems, San Jose, CA, USA. She is currently a Professor with the Department of Electronics Engineering, Ewha Womans University, Seoul, Korea, where she does research on packet forwarding algorithms such as IP address lookup, packet classification, and deep packet inspection, and research on hardware implementation of various network protocols such as TCP/IP and Mobile IPv6. She is currently working as the Associate Vice President of the Office of Faculty and Academic Affairs at Ewha Womans University.

Nara Lee received the B.S. and M.S. degrees in electronics engineering from Ewha Womans University, Seoul, Korea, in 2009 and 2012, respectively.

She is currently a Research Engineer with the IP Technical Team, DTV SoC Department, SIC Lab, LG Electronics, Inc., Seoul, Korea. Her research interests include various network algorithms such as IP address lookup, packet classification, Web caching, and Bloom filter application to various distributed algorithms.

Geumdan Jin received the B.S. and M.S. degrees in electronics engineering from Ewha Womans University, Seoul, Korea, in 2008 and 2010, respectively.

She is currently working for the TMS Service Development Team, Hyundai Motor Company, Seoul, Korea. Her research interests are layer-2 and layer-3 architectures, fast packet classification, and wireless networks.

Jungwon Lee received the B.S. degree in mechatronics engineering from Korea Polytechnic University, Gyeonggi-do, Korea, in 2011, and the M.S. degree in electronics engineering from Ewha Womans University, Seoul, Korea, in 2013, and is currently pursuing the Ph.D. degree in electronics engineering at Ewha Womans University.

Her research interests include address lookup and packet classification algorithms and packet forwarding technology in content-centric networks.

Youngju Choi (S’12) received the B.S. and M.S. degrees in electronics engineering from Ewha Womans University, Seoul, Korea, in 2011 and 2013, respectively.

She works for the Technology IT Team, Hyundai Consulting and Information Company, Ltd., Seoul, Korea. Her research interests include various network algorithms and TCAM architectures.

Changhoon Yim (M’91) received the B.S. degree in control and instrumentation engineering from Seoul National University, Seoul, Korea, in 1986, the M.S. degree in electrical engineering from Korea Advanced Institute of Science and Technology, Seoul, Korea, in 1988, and the Ph.D. degree in electrical and computer engineering from the University of Texas at Austin, Austin, TX, USA, in 1996.

He was a Research Engineer working on HDTV with the Korean Broadcasting System, Seoul, Korea, from 1988 to 1991. From 1996 to 1999, he was a Member of the Technical Staff with the HDTV and Multimedia Division, Sarnoff Corporation, Princeton, NJ, USA. From 1999 to 2000, he worked with Bell Labs, Lucent Technologies, Murray Hill, NJ, USA. From 2000 to 2002, he was a Software Engineer with the KLA-Tencor Corporation, Milpitas, CA, USA. From 2002 to 2003, he was a Principal Engineer with Samsung Electronics, Suwon, Korea. He is currently a Professor with the Department of Internet and Multimedia Engineering, Konkuk University, Seoul, Korea. His research interests include multimedia communication, digital image processing, packet classification, and IP address lookup.