TCAM architecture for IP lookup using prefix properties ...


In modern IP routers, Internet Protocol (IP) lookup forms a bottleneck in packet forwarding because the lookup speed cannot catch up with the increase in link bandwidth. Ternary content-addressable memories (TCAMs) have emerged as viable devices for designing high-throughput forwarding engines on routers. Called ternary because they store don’t-care states in addition to 0s and 1s, TCAMs search the data (IP address) in a single clock cycle. Because of this property, TCAMs are particularly attractive for packet forwarding and classification. Despite these advantages, large TCAM arrays have high power consumption and lack scalable design schemes, which limit their use.

Today’s high-density TCAMs consume 12 to 15 W per chip when the entire memory is enabled. To support the superlinearly increasing number of IP prefixes in core routers, vendors use up to eight TCAM chips. Filtering and packet classification would also require additional chips. The high power consumption of using many chips increases cooling costs and also limits the router design to fewer ports.1

Recently, researchers have proposed a few approaches to reducing power consumption in TCAMs,1,2 including routing-table compaction.3,4 Liu presents a novel technique to eliminate redundancies in the routing table.3 However, this technique takes excessive time for updates because it is based on the Espresso-II minimization algorithm,5 whose complexity increases exponentially with the number of prefixes in a routing table. Thus, our work’s main objective is a TCAM-based router architecture that consumes less power and is suitable for the incremental updating that modern IP routers need. Additionally, the approach we will describe minimizes the memory size required for storing the prefixes.

We propose a two-level pipelined architecture that reduces power consumption through memory compaction and the selective enablement of only a portion of the TCAM array. We also introduce the idea of prefix aggregation and prefix expansion to reduce the number of routing-table entries in TCAMs for IP lookup.

TCAM ARCHITECTURE FOR IP LOOKUP USING PREFIX PROPERTIES

Ravikumar V.C.
Rabi N. Mahapatra
Texas A&M University

TERNARY CONTENT-ADDRESSABLE MEMORY (TCAM) IS A POPULAR HARDWARE DEVICE FOR FAST ROUTING LOOKUP. THOUGH USEFUL, TCAM ARRAYS HAVE HIGH POWER CONSUMPTION AND HEAT DISSIPATION, PROBLEMS ALLEVIATED BY REDUCING THE NUMBER OF ROUTING-TABLE ENTRIES. THE AUTHORS PRESENT AN APPROACH THAT EXPLOITS TWO PROPERTIES OF PREFIXES TO COMPACT THE ROUTING TABLE.

Published by the IEEE Computer Society 0272-1732/04/$20.00 2004 IEEE

Today, TCAM vendors provide mechanisms to enable a chunk of TCAM much smaller than the entire TCAM array. Exploiting this technology reduces power consumption during lookup in the TCAM-based router. We also discuss an efficient incremental update scheme for the routing of prefixes and provide empirical equations for estimating memory requirements and proportional power consumption for the proposed architecture. Finally, we use traces from the bbnplanet, attcanada, utah, and is routers to validate the proposed TCAM architecture’s effectiveness.

Background and related work

We can categorize approaches to performing IP lookup as either software or hardware based. Researchers have proposed various software solutions for routing lookup.6,7 However, these software approaches are too slow for use in complex packet processing at every router, mainly because they require too many memory accesses: Independent of their design or implementation techniques, software approaches take at least four to six memory accesses for a single lookup operation. Today’s packet processing requires speeds of about 40 Gbps; software approaches achieve no more than 10 Gbps.8

Hardware approaches typically use dedicated hardware for routing lookup.9,10 More popular techniques use commercially available content-addressable memory (CAM). CAM storage architectures have gained in popularity because their search time is O(1); that is, it is bounded by a single memory access. Binary CAMs allow only fixed-length comparisons and are therefore unsuitable for longest-prefix matching. The TCAM solves the longest-prefix problem and is by far the fastest hardware device for routing. In contrast to TCAMs, ASICs that use tries (digital trees for storing strings, in this case the prefixes) require four to six memory accesses for a single route lookup and thus have higher latencies.1 Also, TCAM-based routing-table updates have been faster than their trie-based counterparts.

The number of routing-table entries is increasing superlinearly.8 Today, routing tables have approximately 125,000 entries and will contain about 500,000 entries by 2005, so the need for optimal storage is also very important. Yet CAM vendors claim to handle a maximum of only 8,000 to 128,000 prefixes, taking allocators and deallocators into account.11 The gap between the projected numbers of routing-table entries and the low capacity of commercial products has given rise to work on optimizing the TCAM storage space by using the properties of lookup tables.3,4

However, even though TCAMs can store large numbers of prefixes, they consume large amounts of power, which limits their usefulness. Recently, Panigrahy and Sharma introduced a paged-TCAM architecture to reduce power consumption in TCAM routers.2 Their scheme partitions prefixes into eight groups of equal size; each group resides on a separate TCAM chip. A lookup operation can then select and enable only one of the eight chips to find a match for an incoming IP address. In addition, the approach introduces a paging scheme to enable only a set of pages within a TCAM. However, this approach achieves only marginal power savings at the cost of additional memory and lookup delay.

Other work describes two architectures, bit selection and trie-based, which use a paging scheme as the basis for a power-efficient TCAM.1 The bit selection scheme extracts the 16 most significant bits of the IP address and uses a hash function to enable the lookup of a page in the TCAM chip. The approach assumes the prefix length to be from 16 to 24 bits. Prefixes outside this range receive special handling; the lookup searches for them separately. However, the number of such prefixes in today’s routers is very large (65,068 for the bbnplanet router), and so this approach will result in significant power consumption.

IEEE MICRO, MARCH–APRIL 2004


In addition, the partitioning scheme creates a trie structure for the routing table prefixes and then traverses the trie to create partitions by grouping prefixes having the same subprefix. The subprefixes go into an index TCAM, which further indexes into static RAM to identify and enable the page in the data TCAM that stores the prefixes. The index TCAM is quite large for smaller page sizes and is a key factor in power consumption. The three-level architecture, though pipelined, introduces considerable delay.

It is important to note that both approaches1,2 store the entire routing table, which is unnecessary overhead in terms of memory and power. Although the existing approaches reduce power either by compacting the routing table or by selecting a portion of the TCAM, our approach reduces power by combining the two approaches and exploiting the observable properties of prefixes.

Prefix properties for compaction

The two purposes for delving into the prefix properties are

• to further compact the routing table beyond the compaction provided by existing techniques, and

• to derive an upper bound on minimization time to prevent it from becoming a bottleneck in the routing lookup.

Previous attempts to reduce the routing-table size using prefix properties used prefix overlapping and minimization techniques.3 Prefix overlapping achieves compaction by storing only one of many overlapping prefixes that have the same next hop. Two prefixes overlap if one is an enclosure of the other. Let Pi be a prefix, with |Pi| denoting the length of prefix Pi. Then Pi ∈ P is called an enclosure of Pj if Pj ∈ P, j ≠ i, and |Pi| < |Pj|, such that Pi is a subprefix of Pj. If Pi and Pj have the same next hop, then Pi can represent both of them. Thus, a set of overlapping prefixes {P1, P2, P3, …, Pn} with |P1| < |P2| < |P3| < … < |Pn| and the same next hop is replaceable with the single entry P1. If an update deletes entry P1, entry P2 should represent the set of overlapping prefixes {P2, …, Pn}. When an update adds a new entry Pi such that |Pi| < |P1| < |P2| < … < |Pn|, then Pi replaces the existing entry P1. However, if a new entry Pi arrives such that |Pi| > |P1|, then the routing table should not change, except for an update of the set of overlapping prefixes.

Prefix minimization logically reduces two or more prefixes to a minimal set, if these prefixes have the same next hop. The logic minimization is an NP-complete problem, solvable using the Espresso-II algorithm. Espresso-II can minimize prefixes {P1, P2, P3, …, Pn} to {P′1, P′2, P′3, …, P′m}, such that m ≤ n.

Although the prefix overlapping and minimization techniques together compact about 30 to 45 percent of the routing table, these techniques have an overhead when it comes to prefixes that require fast updates. The time taken for prefix overlapping is bounded and independent of the router’s size. However, the logic minimization algorithm using Espresso-II has a runtime that increases exponentially with input size. Based on Liu’s techniques,3 the Espresso-II input can be as high as 15,146 prefixes for the attcanada router. Thus, the runtime for such a large amount of input data can be several minutes and is very expensive for incremental updates. We propose tackling this problem by establishing an upper bound, independent of router size, on the input to the minimization algorithm. We introduce another property, called prefix aggregation, to help us fix the upper bound. Based on this property, prefix minimization is time bounded.

Prefix aggregation

Figure 1 presents a portion of the routing-table traces from the bbnplanet router. It represents all the prefixes in the routing table, starting with 129.66 and having prefix length l, for 16 < l ≤ 24. The number 129.66 is the largest common subprefix (LCS) for the set of those prefixes. The LCS for any prefix Pi is the subprefix Si = LCS(Pi) whose length is the largest multiple of eight such that |Si| < |Pi|. Prefixes having different LCSs are least likely to have the same next hop. According to this observation, applying minimization to a set of prefixes having the same LCS should achieve a compaction nearly equal to that achieved by applying minimization to the entire lookup table.

So without sacrificing much compaction, the use of prefix aggregation can provide an upper bound on the number of possible prefixes for input to the minimization algorithm. This input set is the largest set of prefixes such that each prefix in the set has the same LCS. These prefixes, which have LCS value S, can have prefix lengths from |S| to |S| + 8. This range contains a maximum of 512 prefixes; however, not all these combinations are possible. For example, the assignment 120.255.0.0/12 precludes the combinations 120.255.0.0/13 through 120.255.0.0/16. So to calculate the maximum possible number of prefixes having the same LCS, we assign all combinations of 120.x.0.0/16 prefixes so that no assignments are possible for prefixes of length less than 16. Thus, a maximum of 256 prefixes can share the same LCS.
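The grouping step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the helper names `to_bits`, `lcs`, and `aggregate` are hypothetical, and the LCS length is taken to be the largest multiple of eight that is strictly smaller than the prefix length.

```python
import ipaddress
from collections import defaultdict

def to_bits(cidr):
    """Return the prefix as a bit string, e.g. '129.66.0.0/16' -> 16 bits."""
    net = ipaddress.ip_network(cidr)
    return format(int(net.network_address), "032b")[:net.prefixlen]

def lcs(bits):
    """Largest common subprefix: cut the prefix at the largest multiple
    of eight that is strictly smaller than its length (assumed reading)."""
    return bits[: 8 * ((len(bits) - 1) // 8)]

def aggregate(cidrs):
    """Group prefixes by LCS; each group is a separate minimization input."""
    groups = defaultdict(list)
    for cidr in cidrs:
        groups[lcs(to_bits(cidr))].append(cidr)
    return dict(groups)

if __name__ == "__main__":
    # Two prefixes from Figure 1 plus one hypothetical prefix (10.1.0.0/16).
    for key, members in aggregate(
        ["129.66.6.0/24", "129.66.64.0/18", "10.1.0.0/16"]
    ).items():
        print(key, members)
```

Every prefix of Figure 1 falls into the single group keyed by the 16-bit pattern for 129.66, so only that group would be handed to the minimizer.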

Prefix expansion

Researchers have presented the property of prefix expansion for IP lookup using software approaches.12 We adopt this property to further compact the routing table and can represent the prefix expansion property as follows. If Pi represents a prefix such that |Pi| is not a multiple of eight, then the prefix expansion property expands Pi to Pi · X^m, where X is a don’t-care and m = 8 − (|Pi| mod 8). The operator “·” represents the concatenation operation. For example, the prefix 100000010100 expands to 100000010100XXXX using the prefix expansion property.

The Espresso-II algorithm will provide more compaction if all Pi are of the same length. However, the prefixes corresponding to the same LCS are not necessarily all the same length. Hence, to increase the compaction, we expand prefixes to a size that is the nearest multiple of eight.

Just as the number of inputs governs Espresso-II’s runtime, so does the input’s bit length. We have already limited the input set to the set of prefixes with the same LCS. We next observe that, in terms of Espresso-II’s calculations, the LCS is redundant. So it would be useful to further compact the prefixes by eliminating the LCS from each prefix and using the remainder, which is the least-significant 8 bits after prefix expansion.
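As a minimal sketch of these two steps, the following Python pads a prefix with don't-care symbols according to m = 8 − (|Pi| mod 8) and then keeps only the final 8 positions as the minimizer input. The function names `expand` and `strip_lcs` are hypothetical, and prefixes are assumed to be non-empty bit strings.

```python
def expand(prefix_bits):
    """Pad a prefix with 'X' (don't-care) up to the next multiple of 8 bits."""
    remainder = len(prefix_bits) % 8
    if remainder == 0:
        return prefix_bits  # already a multiple of eight, nothing to add
    return prefix_bits + "X" * (8 - remainder)

def strip_lcs(prefix_bits):
    """Expand the prefix, then drop the LCS and keep the last 8 positions."""
    return expand(prefix_bits)[-8:]

if __name__ == "__main__":
    print(expand("100000010100"))            # the example from the text
    print(strip_lcs("1000000101000010001"))  # 129.66.32.0/19 from Figure 1
```

With this reduction, every minimizer input is an 8-symbol ternary string, regardless of the original prefix length.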

Routing-table compaction

During router initialization, our approach compacts the routing table using the property of prefix overlapping and also removes redundant entries (prefix minimization). We then use the property of prefix aggregation to form sets of prefixes based on their LCS values. Each of these sets goes through prefix expansion before serving as input to Espresso-II.

Prefix overlapping. Prefix overlapping applies to all the prefix entries in the routing table with the same next hop. The routing table then stores only one of the many overlapping prefixes. Figure 2 is the result of applying prefix overlapping to the trace in Figure 1. In this case, prefix overlapping reduces the table by seven entries.
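The seven-entry reduction can be reproduced with a short Python sketch using the standard `ipaddress` module. The function name `remove_overlaps` is hypothetical, and the sketch follows the definition of an enclosure given earlier: it drops a prefix whenever a shorter kept prefix with the same next hop encloses it, and it does not model the incremental update rules.

```python
import ipaddress

HOP = "4.0.6.142"
# The 18 entries of the Figure 1 trace, all with the same next hop.
FIGURE_1_TRACE = [(p, HOP) for p in (
    "129.66.6.0/24", "129.66.8.0/24", "129.66.12.0/24", "129.66.20.0/24",
    "129.66.21.0/24", "129.66.30.0/23", "129.66.31.0/24", "129.66.32.0/19",
    "129.66.34.0/24", "129.66.47.0/24", "129.66.48.0/24", "129.66.64.0/18",
    "129.66.88.0/24", "129.66.95.0/24", "129.66.111.0/24", "129.66.128.0/22",
    "129.66.132.0/24", "129.66.172.0/24",
)]

def remove_overlaps(routes):
    """Keep only prefixes that no shorter kept prefix with the same
    next hop encloses. Shorter prefixes are visited first."""
    nets = [(ipaddress.ip_network(p), hop) for p, hop in routes]
    kept = []
    for net, hop in sorted(nets, key=lambda r: r[0].prefixlen):
        enclosed = any(h == hop and net.subnet_of(k) for k, h in kept)
        if not enclosed:
            kept.append((net, hop))
    return [(str(net), hop) for net, hop in kept]

if __name__ == "__main__":
    kept = remove_overlaps(FIGURE_1_TRACE)
    print(len(FIGURE_1_TRACE) - len(kept), "entries removed")
    for prefix, hop in kept:
        print(prefix, hop)
```

The 11 surviving prefixes correspond to the 11 ternary entries listed in Figure 2.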

Prefix              Next hop
129.66.6.0/24       4.0.6.142
129.66.8.0/24       4.0.6.142
129.66.12.0/24      4.0.6.142
129.66.20.0/24      4.0.6.142
129.66.21.0/24      4.0.6.142
129.66.30.0/23      4.0.6.142
129.66.31.0/24      4.0.6.142
129.66.32.0/19      4.0.6.142
129.66.34.0/24      4.0.6.142
129.66.47.0/24      4.0.6.142
129.66.48.0/24      4.0.6.142
129.66.64.0/18      4.0.6.142
129.66.88.0/24      4.0.6.142
129.66.95.0/24      4.0.6.142
129.66.111.0/24     4.0.6.142
129.66.128.0/22     4.0.6.142
129.66.132.0/24     4.0.6.142
129.66.172.0/24     4.0.6.142

Figure 1. Sample trace of routing table from bbnplanet.

Prefix                              Next hop
100000010100001000000110XXXXXXXX    4.0.6.142
100000010100001000001000XXXXXXXX    4.0.6.142
100000010100001000001100XXXXXXXX    4.0.6.142
100000010100001000010100XXXXXXXX    4.0.6.142
100000010100001000010101XXXXXXXX    4.0.6.142
10000001010000100001111XXXXXXXXX    4.0.6.142
1000000101000010001XXXXXXXXXXXXX    4.0.6.142
100000010100001001XXXXXXXXXXXXXX    4.0.6.142
1000000101000010100000XXXXXXXXXX    4.0.6.142
100000010100001010000100XXXXXXXX    4.0.6.142
100000010100001010101100XXXXXXXX    4.0.6.142

Figure 2. Result of applying prefix overlapping to Figure 1 trace.

Prefix minimization and prefix expansion. Figure 3 shows that prefix minimization and prefix expansion reduce the number of prefixes by nine. Figure 4 shows that prefix minimization without prefix expansion reduces the number of prefixes by eight.
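To make the merging idea concrete, here is a deliberately simplified Python stand-in for the minimizer. It is not Espresso-II: it only combines two ternary strings of equal length that differ in exactly one specified bit, in the style of a Quine-McCluskey merge, and the names `merge_pair` and `minimize_greedy` are hypothetical. The inputs are the third-octet patterns of five /24 prefixes from Figure 2; two of the merges it finds (00001X00 and 0001010X) also appear in Figure 4.

```python
def merge_pair(a, b):
    """Merge two ternary strings that differ in exactly one position where
    both carry a specified bit; return None if they cannot be merged."""
    if len(a) != len(b):
        return None
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) != 1:
        return None
    i = diff[0]
    if "X" in (a[i], b[i]):
        return None
    return a[:i] + "X" + a[i + 1:]

def minimize_greedy(entries):
    """Repeatedly merge the first mergeable pair until none remains.
    A greedy pass like this is not guaranteed to find a minimum cover."""
    entries = list(dict.fromkeys(entries))  # drop duplicates, keep order
    merged = True
    while merged:
        merged = False
        for i in range(len(entries)):
            for j in range(i + 1, len(entries)):
                m = merge_pair(entries[i], entries[j])
                if m is not None:
                    entries = [e for k, e in enumerate(entries)
                               if k not in (i, j)]
                    if m not in entries:
                        entries.append(m)
                    merged = True
                    break
            if merged:
                break
    return entries

if __name__ == "__main__":
    octets = ["00000110", "00001000", "00001100", "00010100", "00010101"]
    print(minimize_greedy(octets))
```

Because the merge requires equal-length inputs, expanding all prefixes to a common length first gives the minimizer more candidate pairs, which is the motivation for prefix expansion.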

Proposed architecture

The prefix properties reduce the IP lookup table’s length (vertically) as described earlier. Here, we propose an architectural technique that reduces the IP lookup table laterally. This technique adopts the two-level routing lookup architecture in Figure 5.

Level 1 consists of a decoder and a reconfigurable page-enable block (PEB). The PEB is a combinational circuit consisting of a driver network to drive the output port, which connects directly to TCAM pages through page-enable lines. The size of the PEB’s output port (in bits) equals the number of pages in the TCAM. The PEB is also reconfigurable to accommodate changes during a page overflow. Level 2 is a 256-segment TCAM array. In addition to the assigned pages, the TCAM array contains many empty pages; an implementation will use these pages for memory management.

For each incoming IP address, the first octet (A) goes to the decoder-PEB, which enables only one segment in the TCAM array. The decoder-PEB then sends page-enable signals to each TCAM page. The TCAM then compares the enabled pages with the next three incoming octets (B, C, and D) of the IP address.
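The lookup path just described can be modeled in software as follows. This is a behavioral sketch only, under several assumptions that the text does not state: the class name `TwoLevelLookup` is hypothetical, every stored prefix is at least 8 bits long, each entry is a 24-symbol ternary pattern over octets B, C, and D, and entries with fewer don't-care symbols take priority when several match.

```python
class TwoLevelLookup:
    """Software model of the decoder-PEB (level 1) and TCAM array (level 2)."""

    def __init__(self):
        # first octet (0..255) -> list of (24-symbol ternary pattern, next hop)
        self.segments = {}

    def insert(self, first_octet, pattern, next_hop):
        if len(pattern) != 24 or set(pattern) - set("01X"):
            raise ValueError("pattern must be 24 symbols drawn from 0, 1, X")
        segment = self.segments.setdefault(first_octet, [])
        segment.append((pattern, next_hop))
        # Assumed priority rule: most specific (fewest X) entries first.
        segment.sort(key=lambda entry: entry[0].count("X"))

    def lookup(self, address):
        a, b, c, d = (int(part) for part in address.split("."))
        key = format((b << 16) | (c << 8) | d, "024b")
        # Level 1: octet A selects the single enabled segment.
        enabled = self.segments.get(a, [])
        # Level 2: ternary compare of octets B, C, D against enabled entries.
        for pattern, next_hop in enabled:
            if all(p in ("X", k) for p, k in zip(pattern, key)):
                return next_hop
        return None

if __name__ == "__main__":
    table = TwoLevelLookup()
    # 129.66.32.0/19 from Figure 1, stored without its first octet.
    table.insert(129, "01000010" + "001XXXXX" + "XXXXXXXX", "4.0.6.142")
    print(table.lookup("129.66.40.1"))
    print(table.lookup("129.67.40.1"))
```

Only the entries of one segment are ever compared, which is the software analogue of enabling a single segment of the TCAM array.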

The overhead of the decoder-PEB hardware is insignificant compared to that of the TCAM array. Avoiding the storage of the first octet in the TCAM array also reduces the total TCAM storage space by 25 percent. These savings are significant, especially when routing tables are as large as 125,000 entries. Also, the architecture enables only one segment of the TCAM array at any time, significantly reducing power consumption.

The lookup latency is the TCAM read access time plus small gate delays because of the decoder-PEB hardware. But gate delays should be less than the memory access time, so the TCAM’s access time would limit lookup throughput. To increase the lookup throughput, the architecture could select more than one segment in the TCAM and simultaneously perform multiple IP lookups. This is possible by duplicating the decoder-PEB logic in level 1 and resolving additional IP prefixes in the TCAM array. Similarly, concurrent lookup and update are also feasible using such an architecture. However, the more TCAM segments the lookup enables, the more power the TCAM consumes, so a designer must trade off power consumption and throughput.

Prefix entries                      Next hop
1000000101000010X0101100XXXXXXXX    4.0.6.142
10000001010000100XX1010XXXXXXXXX    4.0.6.142
10000001010000100XX01X00XXXXXXXX    4.0.6.142
100000010100001010000X00XXXXXXXX    4.0.6.142
10000001010000100XX00110XXXXXXXX    4.0.6.142
10000001010000100XX1111XXXXXXXXX    4.0.6.142
1000000101000010100000XXXXXXXXXX    4.0.6.142
10000001010000100X1XXXXXXXXXXXXX    4.0.6.142
100000010100001001XXXXXXXXXXXXXX    4.0.6.142

Figure 3. Result of applying prefix minimization and prefix expansion to Figure 1 trace.

Prefix entries                      Next hop
100000010100001010101100XXXXXXXX    4.0.6.142
100000010100001000011111XXXXXXXX    4.0.6.142
100000010100001000001X00XXXXXXXX    4.0.6.142
10000001010000100001010XXXXXXXXX    4.0.6.142
100000010100001010000100XXXXXXXX    4.0.6.142
100000010100001000000110XXXXXXXX    4.0.6.142
10000001010000100001111XXXXXXXXX    4.0.6.142
1000000101000010100000XXXXXXXXXX    4.0.6.142
1000000101000010001XXXXXXXXXXXXX    4.0.6.142
100000010100001001XXXXXXXXXXXXXX    4.0.6.142

Figure 4. Result of applying prefix minimization without prefix expansion to Figure 1 trace.

Empirical model for memory and power

We now present empirical formulas to compute the total memory requirement and power consumption of our proposed architecture. Let rows_i represent the number of entries in the ith TCAM segment, and tagbits the TCAM word length (24 bits). The proposed architecture’s minimum memory requirement (for 32-bit entries) is

$$\sum_{i=1}^{256} \left( \mathit{rows}_i \times \mathit{tagbits} / 32 \right) \qquad (1)$$

We base our empirical model for power estimation on the number of entries enabled during the lookup process. Based on this model, the power consumption (for 32-bit entries) in a TCAM with its ith segment enabled is

$$P(\mathit{rows}, \mathit{tagbits}) = k \times \left( \mathit{rows}_i \times \mathit{tagbits} / 32 \right) \qquad (2)$$

where k is the power constant. The largest segment in the TCAM array determines the maximum power consumption for the lookup process.
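Equations 1 and 2 translate directly into code. The sketch below is illustrative only: the function names are hypothetical, the power constant k defaults to 1 so that results are in relative units, and the segment sizes in the example are invented for demonstration rather than taken from any router trace.

```python
TAGBITS = 24  # TCAM word length in the proposed architecture

def memory_requirement(rows, tagbits=TAGBITS):
    """Equation 1: total memory in 32-bit-equivalent entries."""
    return sum(r * tagbits / 32 for r in rows)

def segment_power(rows_i, k=1.0, tagbits=TAGBITS):
    """Equation 2: power when the segment holding rows_i entries is enabled."""
    return k * (rows_i * tagbits / 32)

def worst_case_power(rows, k=1.0, tagbits=TAGBITS):
    """The largest segment determines the maximum power per lookup."""
    return max(segment_power(r, k, tagbits) for r in rows)

if __name__ == "__main__":
    rows = [0] * 256   # one entry count per segment
    rows[129] = 400    # hypothetical segment sizes
    rows[4] = 1000
    print(memory_requirement(rows))  # 32-bit-equivalent entries
    print(worst_case_power(rows))    # relative units, k = 1
```

With these hypothetical sizes, the 1,400 stored 24-bit entries cost as much memory as 1,050 conventional 32-bit entries, which reflects the 25 percent saving from not storing the first octet.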

Fast incremental update

Approximately 100 to 1,000 updates per second take place in core routers today.13 Thus, the update operation should be incremental and fast to avoid becoming a bottleneck in the search operation.

Figure 5. TCAM architecture for routing lookup. (Diagram labels: Level 1 with decoder and reconfigurable PEB; page-enable lines PE1 … PEi+n; Level 2 TCAM array with segments and pages, one enabled segment; 24-bit input from octets B, C, and D.)

A router receives information about route changes (routing updates) from its nearby routers. During each routing update (an insertion or deletion), we apply compaction techniques to avoid any redundant storage in the TCAM. However, it is possible that more than one prefix in the TCAM will need updating, a situation called TCAM update.14 In Liu’s technique, for a routing update, compaction would require the minimization of about 10,000 prefixes at a time. Minimizing such a large set of prefixes introduces two types of delays: The first delay comes from Espresso-II’s computation time. The second delay comes from TCAM entry updates. If M is the size of the prefix set to be minimized, then it takes O(M) time for a TCAM insert operation. This is because the TCAM must scan the minimized set of prefixes and update only those prefixes that have changed. Deleting an entry from the TCAM is even costlier, taking O(M^2) time. In our technique, because the maximum value of M is 256 (rather than 10,000), the insertion and deletion steps are fast and truly incremental.
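One way to picture a bounded update is the following Python sketch. It is an assumption-laden illustration rather than the authors' procedure: the names `tcam_delta` and `apply_update` are hypothetical, groups are keyed by an arbitrary `group_key` (for example an LCS and next hop pair), and the `minimize` argument is a placeholder for the real minimizer, defaulting to plain sorting so that the example stays self-contained.

```python
def tcam_delta(old_entries, new_entries):
    """Compare old and new minimized sets; only changed entries are touched."""
    old_set, new_set = set(old_entries), set(new_entries)
    to_delete = sorted(old_set - new_set)
    to_insert = sorted(new_set - old_set)
    return to_delete, to_insert

def apply_update(raw, tcam, group_key, remainder, minimize=sorted, delete=False):
    """Insert or delete one 8-symbol remainder and re-minimize only its group.

    raw  : group_key -> set of unminimized remainders
    tcam : group_key -> list of entries currently stored in the TCAM
    """
    group = raw.setdefault(group_key, set())
    if delete:
        group.discard(remainder)
    else:
        group.add(remainder)
    new_entries = list(minimize(group))  # work is bounded by the group size
    delta = tcam_delta(tcam.get(group_key, []), new_entries)
    tcam[group_key] = new_entries
    return delta

if __name__ == "__main__":
    raw, tcam = {}, {}
    key = ("1000000101000010", "4.0.6.142")  # LCS for 129.66 and its next hop
    print(apply_update(raw, tcam, key, "00000110"))
    print(apply_update(raw, tcam, key, "00001000"))
    print(apply_update(raw, tcam, key, "00000110", delete=True))
```

Because each routing update touches a single group, the cost of both the minimization and the comparison depends on the group size and not on the size of the routing table.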

For any routing update, our approach restricts TCAM updates to a segment. So it is possible to update several segments simultaneously by modifying the decoder-PEB unit. Our technique can achieve a higher number of updates per second than what a single TCAM chip supports by using several TCAM chips and placing each on a separate bus.

In our architecture, it is possible for the segment size to grow during update, possibly leading to an overflow. Rebuilding the entire routing table would take care of the overflow. However, a rebuild is time consuming and thus unsuitable for fast lookup. Providing empty pages in the TCAM array will help manage the overflow. Reconfiguring the PEB on the fly will permit an empty page to become part of the overflowing segment. This reconfiguration takes microseconds.
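The overflow handling can be modeled with a small bookkeeping class. This sketch is hypothetical throughout: the class name `PageEnableBlock`, the fixed page size, and the first-free allocation policy are assumptions made for illustration, and the enable mask is simply an integer with one bit per TCAM page.

```python
class PageEnableBlock:
    """Bookkeeping model of a reconfigurable PEB with a pool of empty pages."""

    def __init__(self, num_pages, page_size):
        self.page_size = page_size
        self.free_pages = list(range(num_pages))  # empty pages in the array
        self.pages = {}  # segment -> list of page indices assigned to it
        self.used = {}   # segment -> number of entries stored

    def add_entry(self, segment):
        """Account for one new entry; grab an empty page on overflow."""
        pages = self.pages.setdefault(segment, [])
        used = self.used.get(segment, 0)
        if used == len(pages) * self.page_size:  # segment is full
            if not self.free_pages:
                raise MemoryError("no empty page left; a rebuild is needed")
            pages.append(self.free_pages.pop(0))  # reconfigure the PEB
        self.used[segment] = used + 1

    def enable_mask(self, segment):
        """Bit i is set when page i must be enabled for this segment."""
        mask = 0
        for page in self.pages.get(segment, []):
            mask |= 1 << page
        return mask

if __name__ == "__main__":
    peb = PageEnableBlock(num_pages=4, page_size=2)
    for _ in range(3):
        peb.add_entry(129)  # the third entry overflows into a second page
    print(bin(peb.enable_mask(129)))
```

Reassigning a page only changes the mapping held in the block, which is consistent with the claim that reconfiguration is far cheaper than rebuilding the routing table.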

Experimental setup and results

We simulated our approach using a chipset containing two 1.6-GHz AMD Athlons to compute the prefix minimizations. The prefix traces for minimization came from four routers (is, utah, bbnplanet, and attcanada), and we evaluated memory compaction, minimization time, and worst-case power consumption. For memory compaction and minimization time, we compared our approach to Liu’s.3 We also compared our worst-case power consumption to the results of Liu;3 Zane, Narlikar, and Basu;1 and Panigrahy and Sharma.2

Memory compaction

Table 1 indicates the compaction achieved by prefix overlapping. For example, the case of attcanada shows a 29-percent compaction.

Table 2 shows the compaction from our prefix minimization approach and compares it to Liu’s results. The second column shows the number of prefixes remaining if we were to minimize as many prefixes as possible in the given routing table. Our approach and Liu’s group the prefixes before minimization; columns three and four show the number of prefixes remaining after minimization, for each approach. Although neither approach eliminates all the prefixes possible, each significantly compacts the routing tables.

However, for large routing tables, we could not determine the prefix minimization of Liu’s approach, even after running the simulation for two days. Also, our approach compacts the smaller routing tables of is and utah more than Liu’s approach, because our approach uses prefix expansion. In fact, our approach achieves about 33 percent compaction or more for the attcanada, bbnplanet, and is routers, as column five of Table 2 shows.

Table 1. Compaction using prefix overlapping.

            No. of prefixes
Router      Before compaction    After compaction    Compaction (percentage)
is          801                  648                 19.1
utah        145                  133                 8.3
attcanada   112,412              79,769              29.0
bbnplanet   124,538              92,774              25.5

Table 2. Compaction using prefix minimization.

            No. of prefixes
            After minimizing     After prefix        After prefix         Compaction of
            all possible         minimization        minimization         our approach
Router      prefixes             (our approach)      (Liu’s approach)     (percentage)
is          434                  505                 562                  36.9
utah        123                  123                 132                  15.2
attcanada   Hangs*               70,368              Hangs*               37.4
bbnplanet   Hangs*               83,468              Hangs*               33.0

* The minimization processes do not complete within two days.

We also determined the compaction of both approaches when they include prefix overlapping followed by prefix minimization. Column 2 of Table 3 shows the number of entries remaining after our approach minimizes every prefix possible; we compute this number using equation 1. Column 3 shows the results that Liu reported earlier. Column 4 indicates our approach’s total routing table compaction, and column 5 shows the percentage improvement in compaction over Liu’s results. Our approach offers significant improvements because it exploits the compaction opportunities inherent in our architecture and the prefix properties.

Minimization time

Table 4 shows the runtime for minimizing the prefixes using Espresso-II in both approaches. The results indicate the maximum possible time that the 1.6-GHz Athlon, dual-processor chipset takes to minimize the prefixes when the neighboring router requests the addition of a new prefix. The worst-case minimization time for Liu’s approach was 1,098.47 seconds for the attcanada router (15,146 prefixes), which is very expensive for incremental updates. Our approach, on the other hand, took a worst-case time of 0.006 seconds for bbnplanet. This value is not only small and practical for real-life updates, but also bounded because at any point of time the number of inputs to Espresso-II will never exceed 256. Espresso-II’s runtime also increases with the number of bits in the prefixes under consideration. Our approach uses eight bits as opposed to the 32 bits for Liu’s approach, further reducing runtime.

Table 3. Compaction using prefix minimization and overlapping.

            No. of prefixes                         Our approach’s
            Our              Liu’s                  compaction         Improvement
Router      approach         approach               (percentage)       (percentage)
is          348              460                    56.6               24.7
utah        87               123                    40.0               62.1
attcanada   43,378           54,476                 61.4               15.2
bbnplanet   53,625           69,646                 56.9               22.6

Table 4. Timing analysis for minimization.

            Our approach                            Liu’s approach
            Time             No. of prefixes        Time               No. of prefixes
Router      (seconds)        updated                (seconds)          updated
is          0.003            34                     0.159              373
utah        0.002            8                      0.008              44
attcanada   0.005            143                    1,098.47           15,146
bbnplanet   0.006            171                    63.04              7,580

Worst-case power consumption

TCAM power consumption is proportional to the number of bits enabled in the TCAM during the search operation.1 Table 5 shows the worst-case power consumption (32-bit equivalent) for several approaches. Liu’s approach enables the entire TCAM (N prefixes) during every lookup operation, making it very costly. Panigrahy and Sharma’s approach enables a large number of entries (16,000)2 and 8,000 group IDs. In the bitmap technique, although the number of entries enabled during search (Cmax) is very small, the entries with prefixes of less than 16 and greater than 24 bits are always enabled during every lookup operation. The number of these entries, Pb, is very large (65,068) for the bbnplanet router. The partition technique uses an index TCAM and a data TCAM. It enables only a page of size N/b in the data TCAM, where b is the number of pages in the TCAM. For every lookup, however, this technique searches the entire index TCAM of size Sd, which is large if the page size is small. When the page size is large, the index TCAM has few entries; however, it is the size of the page enabled in the data TCAM that makes the main contribution to power consumption.

Our approach enables only a segment (Si) of the TCAM, which is quite small compared to the entire TCAM. For example, in the bbnplanet router, the enabled segment has about 5,600 prefixes. Because most segments in our TCAM are very small, the average power consumption is quite small. Note also that the word length of the TCAM in our architecture is 24 bits, as opposed to the 32 bits in conventional TCAMs.

Table 6 shows the worst-case power consumption per lookup in terms of the number of bits enabled; we calculate this value using equation 2. Our approach saves significant power compared to approaches that enable entire TCAMs.
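As a worked check of this bit-count comparison, the short Python function below computes the relative savings from the number of enabled 24-bit entries and the total number of 32-bit entries. The function name `savings_percent` is hypothetical, and the rounding to whole percentages is an assumption; with the Table 6 entry counts for the is, utah, and bbnplanet routers it reproduces the published savings of 67, 90, and 94 percent.

```python
def savings_percent(enabled_rows, total_rows, enabled_width=24, total_width=32):
    """Percentage of TCAM bits that stay disabled during a worst-case lookup."""
    enabled_bits = enabled_rows * enabled_width
    total_bits = total_rows * total_width
    return round(100 * (1 - enabled_bits / total_bits))

if __name__ == "__main__":
    # (enabled entries, total entries) as listed in Table 6
    for router, enabled, total in (
        ("is", 205, 460),
        ("utah", 17, 123),
        ("bbnplanet", 5596, 69646),
    ):
        print(router, savings_percent(enabled, total))
```

Part of the saving comes from enabling fewer entries and part from the narrower 24-bit word, so both factors appear in the ratio.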

Table 5. Worst-case bounds on the number of enabled entries.

Approach                 Worst-case no. of enabled entries
Liu                      O(N)
Panigrahy and Sharma     O(16,000 + p × 6/32)
Bitmap                   O(Cmax + Pb)
Partition                O(N/b + Sd)
Proposed                 O(Si)

Table 6. Relative power consumption per lookup.

            No. of              Total no. of bits       Savings
Router      enabled bits        in the TCAM             (percentage)
is          205 × 24            460 × 32                67
utah        17 × 24             123 × 32                90
attcanada   4,582 × 24          544,476 × 32            99
bbnplanet   5,596 × 24          69,646 × 32             94

The proposed architecture enables only a small portion of the TCAM array during the lookup operation and achieves significant power savings. Results indicate that such an architecture holds promise in supporting storage- and energy-efficient router designs. We plan to further reduce the update time, and thereby increase the throughput of incremental updates, by using a new minimization approach (other than Espresso-II) that requires less minimization time.

References

1. F. Zane, G. Narlikar, and A. Basu, “CoolCAMs: Power-Efficient TCAMs for Forwarding Engines,” Proc. IEEE Infocom 2003, IEEE Press, 2003, pp. 42-52.

2. R. Panigrahy and S. Sharma, “Reducing TCAM Power Consumption and Increasing Throughput,” Proc. 10th Symp. High-Performance Interconnects (HOTI 02), IEEE CS Press, 2002, pp. 107-112.

3. H. Liu, “Routing Table Compaction in Ternary CAM,” IEEE Micro, vol. 22, no. 1, Jan.-Feb. 2002, pp. 58-64.

4. M.J. Akhbarizadeh and M. Nourani, “An IP Packet Forwarding Technique Based on Partitioned Lookup Table,” Proc. IEEE Int’l Conf. Comm. (ICC 02), IEEE Press, 2002, pp. 2263-2267.

5. R.K. Brayton et al., Logic Minimization Algorithms for VLSI Synthesis, Kluwer Academic Publishers, 1984.

6. S. Nilsson and G. Karlsson, “IP-Address Lookup Using LC-Tries,” IEEE J. Selected Areas Comm., vol. 17, no. 6, June 1999, pp. 1083-1092.

7. M. Waldvogel et al., “Scalable High-Speed IP Routing Table Lookups,” Computer Comm. Rev., vol. 27, no. 4, Oct. 1997, pp. 25-36.

8. P. Gupta, Algorithms for Routing Lookups and Packet Classification, doctoral dissertation, Dept. Computer Science, Stanford Univ., 2000.

9. P. Gupta, S. Lin, and N. McKeown, “Routing Lookups in Hardware at Memory Access Speeds,” Proc. IEEE Infocom 1998, IEEE Press, 1998, pp. 1240-1247.

10. M. Kobayashi, T. Murase, and A. Kuriyama, “A Longest Prefix Match Search Engine for Multigigabit IP Processing,” Proc. IEEE Int’l Conf. Comm. (ICC 00), IEEE Press, 2000, pp. 1360-1364.

11. S. Sikka and G. Varghese, “Memory-Efficient State Lookups with Fast Updates,” Proc. ACM Special Interest Group on Data Comm. (SIGCOMM 00), ACM Press, 2000, pp. 335-347.

12. V. Srinivasan and G. Varghese, “Fast Address Lookups Using Controlled Prefix Expansion,” ACM Trans. Computer Systems, vol. 17, no. 1, Oct. 1999, pp. 1-40.

13. C. Labovitz, G.R. Malan, and F. Jahanian, “Internet Routing Instability,” IEEE/ACM Trans. on Networking, vol. 6, no. 5, Oct. 1998, pp. 515-528.

14. D. Shah and P. Gupta, “Fast Updating Algorithms for TCAMs,” IEEE Micro, vol. 21, no. 1, Jan.-Feb. 2001, pp. 36-47.

Ravikumar V.C. is a system designer at Hewlett-Packard. His research interests include networking, operating systems, and embedded systems. He has an MS degree in computer science from Texas A&M University.

Rabi N. Mahapatra is an associate professor with the Department of Computer Science at Texas A&M University. His research interests include embedded-system codesign, system on chips, VLSI design, and system architectures. Mahapatra has a PhD from the Indian Institute of Technology, Kharagpur. He is a senior member of the IEEE Computer Society.

Direct questions and comments to Rabi N. Mahapatra, Dept. of Computer Science, Texas A&M Univ., College Station, TX 77845; [email protected].

For further information on this or any other computing topic, visit our Digital Library at http://www.computer.org/publications/dlib.


