1904 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED...

A Memory-Efficient Bit-Split Parallel String Matching Using Pattern Dividing for Intrusion Detection Systems

Hyun Jin Kim, Hong-Sik Kim, and Sungho Kang, Member, IEEE

Abstract—For low-cost hardware-based intrusion detection systems, this paper proposes a memory-efficient parallel string matching scheme. To reduce the number of state transitions, the finite state machine (FSM) tiles in a string matcher adopt bit-level input symbols. Long target patterns are divided into subpatterns with a fixed length, and deterministic finite automata are built from the subpatterns. Pattern dividing mitigates the variety of target pattern lengths, so memory in homogeneous string matchers can be used efficiently. To identify each original long pattern that has been divided, a two-stage sequential matching scheme is proposed for detecting successive matches with its subpatterns. Experimental results show that, for the Snort and ClamAV rule sets, total memory requirements decrease on average by 47.8 to 62.8 percent in comparison with several existing bit-split string matching methods.

Index Terms—Computer network security, finite state machines, site security monitoring, string matching.


1 INTRODUCTION

INTRUSION detection systems (IDSs) are designed to detect various hazardous contents in networks and to alert users to their presence. Most IDSs adopt a rule set that contains information about target patterns from hazardous packet payloads and the actions to take against those patterns [1]. As shown in [2] and [3], most of the patterns to be identified are described as strings; therefore, the string matching engine remains an essential component. A string matcher is a processing unit that detects mapped patterns in packet payloads, and a string matching engine can contain multiple string matchers for parallel string matching.

Because software-based string matching engines are slow, hardware-based engines, which offer a high degree of parallelism, are preferred for high-performance IDSs. In particular, as shown in [4], [5], and [6], memory-based string matching engines allow on-the-fly updates of memory contents, which provides high reconfigurability. However, there are several well-known challenges: high throughput, regularity, scalability, and low memory requirements. In memory-based string matching engines, string matching based on deterministic finite automata (DFAs) is frequently adopted because the transitions between states are deterministic for each input symbol; state transitions can be performed in a fixed number of cycles, so the throughput remains constant. In addition, because each state has a fixed number of outgoing transitions, regularity is guaranteed in a DFA-based string matching engine, and scalability can be supported by the homogeneity of the multiple string matchers onto which the DFAs are mapped. Because of the deterministic transitions, however, memory requirements are proportional to both the number of states and the number of transitions per state. Since the total cost of a string matching engine is directly related to its memory requirements, the target pattern information should be compressed. In the traditional Aho-Corasick algorithm [7] and in the bit-split DFA-based string matching of [8] and [9], common prefixes of target patterns are shared in a DFA.

Lin et al. proposed a pattern matching algorithm that reduced total memory requirements by sharing common infixes of target patterns [4]. For pattern identification, each state must contain its own match vector, a set of bits in which each bit indicates whether a particular pattern is matched in that state. Even though the information on shared common infixes was stored in the match vectors, the number of shared common infixes was limited by the size of the match vectors. In addition, throughput could decrease due to the modified state transition mechanism. In [10], the memory requirements for match vectors were reduced by relabeling states and eliminating the match vectors of nonoutput states. By sharing common infixes of target patterns [4] or by relabeling states and eliminating the match vectors of nonoutput states [10], the memory used for match vectors could be reduced.

However, the variety of target pattern lengths is another serious obstacle to achieving regularity and scalability at low hardware cost. Each pattern consists of multiple character codes, and the number of character codes is defined as the pattern length. The distribution of pattern lengths can differ from one rule set to another, and the variation of pattern lengths within each rule set is irregular. If target patterns are mapped onto multiple homogeneous string matchers without considering their different lengths, memory usage cannot be balanced.

1904 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 22, NO. 11, NOVEMBER 2011

H.J. Kim is with the Memory Division of Samsung Electronics, Computer Systems and Reliable SOC LAB., Department of Electrical and Electronic Engineering, Yonsei University, Seodaemoon-Gu Shinchon-Dong 134, Seoul 120-749, Korea. E-mail: [email protected].

H.-S. Kim and S. Kang are with the Computer Systems Reliable SOC LAB., Department of Electrical and Electronic Engineering, Yonsei University, Seodaemoon-Gu Shinchon-Dong 134, Seoul 120-749, Korea. E-mail: {hongsik, shkang}@yonsei.ac.kr.

Manuscript received 5 Aug. 2009; revised 28 Mar. 2010; accepted 21 Nov. 2010; published online 10 Mar. 2011. Recommended for acceptance by Y. Liu. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPDS-2009-08-0351. Digital Object Identifier no. 10.1109/TPDS.2011.85.

1045-9219/11/$26.00 © 2011 IEEE Published by the IEEE Computer Society

To reduce the memory requirements of DFA-based string matching engines, this paper proposes a memory-efficient parallel string matching scheme that uses pattern dividing, together with a hardware architecture for pattern identification. Long target patterns are divided into subpatterns with a fixed length, which mitigates the variety of target pattern lengths. Balancing memory usage across string matchers reduces the unused memory area in homogeneous string matchers. Moreover, compared with string matching on the original long target patterns, the number of shared common states increases because the subpatterns are both shorter and more numerous. For each string matcher, DFAs are built with bit-level input symbols (bit splitting) to reduce the number of state transitions from each state. To identify the original long target patterns, successive matches with subpatterns are detected by the proposed two-stage sequential string matching engine. Experimental results show that, for the selected Snort and ClamAV rule sets, memory requirements decrease on average by 47.8 to 62.8 percent compared with several existing bit-split string matching approaches in [8], [9], [10], and [4].
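The claim that dividing increases state sharing can be illustrated with a minimal sketch. It counts the nodes of a byte-level trie (one node per state, ignoring failing pointers and bit splitting) for two of the example patterns used in Section 3.2, before and after dividing them with f = 3. For simplicity, the sketch puts all pieces into one trie, whereas the proposed engine maps quotient-vector subpatterns and remnants to different matchers, and it ignores the cost of the second-stage quotient index matcher; the function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def trie_nodes(patterns: Iterable[bytes]) -> int:
    """Number of trie nodes, including the root (initial state)."""
    root: Dict[int, dict] = {}
    count = 1
    for pattern in patterns:
        node = root
        for byte in pattern:
            if byte not in node:
                node[byte] = {}
                count += 1
            node = node[byte]
    return count


def divided_pieces(patterns: Iterable[bytes], f: int) -> Set[bytes]:
    """Unique length-f pieces (plus any shorter remnants) of all patterns."""
    pieces: Set[bytes] = set()
    for pattern in patterns:
        for start in range(0, len(pattern), f):
            pieces.add(pattern[start:start + f])
    return pieces


targets = [b"BN\x10\x00\x02\x00", b"BN\x20\x00\x02\x00"]
print(trie_nodes(targets))                     # 11 states for the whole patterns
print(trie_nodes(divided_pieces(targets, 3)))  # 8 states: b"\x00\x02\x00" is shared
```

The undivided patterns share only the prefix “BN,” while after dividing, the common infix becomes an identical subpattern that is stored once.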

2 BACKGROUND INFORMATION

2.1 Divided Patterns

A target pattern and the set of its k subpatterns obtained by dividing it are denoted as $P_i$ and $Q_i = \{SP_{i1}, SP_{i2}, \ldots, SP_{ik}\}$, respectively, where the subscript i is the index of the target pattern. The set of k subpatterns $Q_i$ is called the quotient vector of $P_i$. The fixed length of the k subpatterns is denoted as f. If a target pattern is shorter than f, it does not need to be divided and is defined as a short pattern. The remnant pattern $R_i$ is the suffix (residual subpattern) of $P_i$ that follows the quotient vector of $P_i$. A match with a subpattern is encoded with a quotient index, which is the unique index of that subpattern match. The remnant pattern $R_i$ must be matched sequentially after the quotient vector $Q_i$ has been matched.

2.2 Memory-Based Bit-Split DFA

According to the definition in [11], a DFA is an FSM in which every pair of a state and an input symbol has exactly one transition to a next state. A DFA can be represented as a five-tuple $(Q, \Sigma, \delta, q_0, F)$: a finite set of states $Q$, a finite set of input symbols $\Sigma$, a transition function $\delta: Q \times \Sigma \to Q$, an initial state $q_0 \in Q$, and a set of output states $F \subseteq Q$. The identification index of a target pattern is an individual keyword used to distinguish a match with that target pattern. The memory requirements of a DFA are proportional to both $|Q|$ and $|\Sigma|$.
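To see why bit-level input symbols help, consider a simplified memory model (our assumption, not the paper's cost model): a flat next-state table with one $\lceil \log_2 |Q| \rceil$-bit pointer for every (state, symbol) pair, with match vectors ignored. For illustration only, the sketch also assumes that a byte-level DFA and each of four 2-bit tiles have 256 states; in practice, the state counts of the tiles depend on the rule set.

```python
import math


def next_state_table_bits(num_states: int, bits_per_symbol: int) -> int:
    """Bits of a flat next-state table: |Q| * |Sigma| pointers of ceil(log2 |Q|) bits."""
    alphabet_size = 2 ** bits_per_symbol  # |Sigma|
    pointer_bits = max(1, math.ceil(math.log2(num_states)))
    return num_states * alphabet_size * pointer_bits


byte_level = next_state_table_bits(256, 8)     # one DFA over 8-bit symbols
bit_split = 4 * next_state_table_bits(256, 2)  # four tiles over 2-bit symbols
print(byte_level, bit_split)  # 524288 32768
```

Under these assumptions, shrinking $|\Sigma|$ from 256 to 4 outweighs the cost of using four tiles, which is the intuition behind bit splitting.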

2.3 Pattern Identification

For each target pattern, a unique identification index must be provided to distinguish its match from other pattern matches. If multiple target patterns are mapped onto a DFA, a target pattern may be a substring of other target patterns. For example, assume that four target patterns {“abc,” “abcd,” “ac,” “bcd”}, with lengths ranging from 2 to 4, are mapped onto a DFA. The fourth target pattern is a suffix of the second: whenever the second target pattern is matched, the fourth is also matched, but not vice versa. Let a target pattern $P_i$ be a suffix of another target pattern $P_j$, with $P_i \neq P_j$. If different identification indexes are provided for the matches with $P_i$ and $P_j$, $P_i$ is explicitly identified when $P_j$ is matched. If only the identification index for $P_j$ is provided, $P_i$ is implicitly identified. In implicit identification, only the target pattern with the longest prefix from the initial state has its own identification index; therefore, users can detect matches with $P_i$ only after analyzing the identification for $P_j$ with extra effort. Now assume that only the k subpatterns of a quotient vector, $Q_i = \{SP_{i1}, SP_{i2}, \ldots, SP_{ik}\}$, are mapped onto a DFA. In this case, the number of output states for the subpatterns is equal to or smaller than k. In addition, a target pattern $P_i$ can be identified after its quotient vector $Q_i$ and its remnant pattern $R_i$ are matched in order.
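The difference between explicit and implicit identification can be reproduced with a brute-force reference model (not a DFA) on the four example patterns. The function identify is hypothetical: in explicit mode it reports every pattern ending at a position, and in implicit mode only the longest one, which is the pattern that owns the identification index of the state a DFA would reach.

```python
from typing import Dict, List


def identify(text: str, patterns: List[str], explicit: bool = True) -> Dict[int, List[str]]:
    """Map each 1-based end position to the patterns reported there."""
    report: Dict[int, List[str]] = {}
    for end in range(1, len(text) + 1):
        # patterns whose last character is at this position
        hits = [p for p in patterns if text.endswith(p, 0, end)]
        if not hits:
            continue
        if explicit:
            report[end] = sorted(hits, key=len, reverse=True)
        else:
            # implicit: only the longest match keeps its own index
            report[end] = [max(hits, key=len)]
    return report


patterns = ["abc", "abcd", "ac", "bcd"]
print(identify("abcd", patterns))                  # {3: ['abc'], 4: ['abcd', 'bcd']}
print(identify("abcd", patterns, explicit=False))  # {3: ['abc'], 4: ['abcd']}
```

In the implicit case, the match with “bcd” at position 4 must be inferred from the fact that it is a suffix of “abcd.”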

3 PROPOSED STRING MATCHING SCHEME

3.1 Architecture of FSM Tiles

Multiple string matchers are adopted for parallel string matching. In a string matcher, several homogeneous FSM tiles each take n bits as input every cycle. In each state of an FSM tile, the pattern identification information is stored as a partial match vector (PMV), whose ith bit indicates whether the ith pattern is matched in that state. A pattern match is identified with a full match vector (FMV), which is obtained by a bitwise AND of the PMVs of all FSM tiles.
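The tile-and-AND mechanism can be sketched in software. This is a minimal behavioral model, assuming 8/n tiles per byte that take contiguous bit groups (tile 0 takes the LSBs), one Aho-Corasick DFA per tile built over the bit slices of every pattern, and PMVs that include the outputs inherited along failing pointers. The class and function names are hypothetical, and memory layout, PMV width limits, and the specific tile types of Fig. 1 are not modeled.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple


def slice_bits(data: bytes, tile: int, n_bits: int) -> List[int]:
    """n-bit slice number `tile` of every byte (tile 0 = least significant bits)."""
    mask = (1 << n_bits) - 1
    return [(b >> (tile * n_bits)) & mask for b in data]


class BitSplitTile:
    """One FSM tile: an Aho-Corasick DFA over n-bit input symbols."""

    def __init__(self, slices: Sequence[Sequence[int]], n_bits: int) -> None:
        alphabet = 1 << n_bits
        goto: List[Dict[int, int]] = [{}]
        self.pmv: List[int] = [0]  # partial match vector of each state
        for index, symbols in enumerate(slices):
            state = 0
            for sym in symbols:
                if sym not in goto[state]:
                    goto.append({})
                    self.pmv.append(0)
                    goto[state][sym] = len(goto) - 1
                state = goto[state][sym]
            self.pmv[state] |= 1 << index  # bit `index` stands for pattern `index`
        # Complete the next-state table using failing pointers (breadth-first).
        self.delta = [[0] * alphabet for _ in goto]
        fail = [0] * len(goto)
        queue = deque()
        for sym in range(alphabet):
            child = goto[0].get(sym, 0)
            self.delta[0][sym] = child
            if child:
                queue.append(child)
        while queue:
            state = queue.popleft()
            self.pmv[state] |= self.pmv[fail[state]]  # inherit suffix matches
            for sym in range(alphabet):
                child = goto[state].get(sym)
                if child is None:
                    self.delta[state][sym] = self.delta[fail[state]][sym]
                else:
                    fail[child] = self.delta[fail[state]][sym]
                    self.delta[state][sym] = child
                    queue.append(child)


def bit_split_match(patterns: List[bytes], text: bytes, n_bits: int = 2) -> List[Tuple[int, int]]:
    """Return (end position, pattern index) pairs found by ANDing PMVs into an FMV."""
    tiles = [BitSplitTile([slice_bits(p, t, n_bits) for p in patterns], n_bits)
             for t in range(8 // n_bits)]
    states = [0] * len(tiles)
    hits = []
    for pos, byte in enumerate(text):
        fmv = (1 << len(patterns)) - 1
        for t, tile in enumerate(tiles):
            sym = (byte >> (t * n_bits)) & ((1 << n_bits) - 1)
            states[t] = tile.delta[states[t]][sym]
            fmv &= tile.pmv[states[t]]  # full match vector = AND of all PMVs
        hits.extend((pos, i) for i in range(len(patterns)) if (fmv >> i) & 1)
    return hits


print(bit_split_match([b"he", b"she", b"his", b"hers"], b"ushers"))
# [(3, 0), (3, 1), (5, 3)]
```

Because a byte is fully determined by its bit slices, a pattern ends at a position exactly when every tile reports its slice there, so the AND yields no false positives.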

The different types of FSM tiles shown in Fig. 1 can be adopted; the number in angle brackets gives the field width. In the FSM tile in Fig. 1a, every state has its own PMV. Unlike the tiles in [8] and [9], the FSM memory that stores next-state pointers can be separated from the PMV table. As shown in Fig. 1b, if several states do not need PMVs, no memory is allocated for them, and only the remaining PMVs are stored in a PMV table. The stored PMVs are called nonzero PMVs, and the PMVs that are eliminated are called zero PMVs. When many PMVs can be shared among multiple states, the FSM tile type in Fig. 1c, which adopts a separate small PMV table, is beneficial. A pattern match index (PMI) in each state points to a unique PMV for that state, so the memory for storing repeated PMVs is eliminated. For example, assume that four target patterns {“ab,” “abb,” “abab,” “a”} are mapped onto an FSM tile whose one-bit input is the least significant bit (LSB). The fourth pattern “a” is a prefix of the other patterns and is also an infix of the third pattern “abab.” In this case, the two output states for pattern “a” can share the same PMV. As another example, target patterns of the same length can share the same PMV: assume that an FSM tile takes the two LSBs as its input symbols. Then the matches with patterns “ab” and “cd” indicate an identical PMV in the FSM tile.
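The memory effect of the three tile types in Fig. 1 can be compared with a rough sketch. It assumes (our simplification) that type (a) stores a p-bit PMV for every state, type (b) stores p bits only for states with nonzero PMVs (with output states relabeled to the lowest addresses, which is our reading of Fig. 3b), and type (c) stores a $\lceil \log_2 u \rceil$-bit PMI per state plus a table of u unique PMVs, with entry 0 reserved for the zero PMV. Next-state pointers are ignored, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def build_pmv_table(state_pmvs: List[int]) -> Tuple[List[int], List[int]]:
    """Deduplicate PMVs; return (unique PMV table, PMI of every state)."""
    table = [0]  # entry 0: the zero PMV (assumption)
    position = {0: 0}
    pmis = []
    for pmv in state_pmvs:
        if pmv not in position:
            position[pmv] = len(table)
            table.append(pmv)
        pmis.append(position[pmv])
    return table, pmis


def pmv_memory_bits(state_pmvs: List[int], p: int) -> Tuple[int, int, int]:
    """Approximate PMV storage in bits for tile types (a), (b), and (c)."""
    type_a = len(state_pmvs) * p
    type_b = sum(1 for pmv in state_pmvs if pmv) * p
    table, _ = build_pmv_table(state_pmvs)
    pmi_bits = max(1, math.ceil(math.log2(len(table))))
    type_c = len(state_pmvs) * pmi_bits + len(table) * p
    return type_a, type_b, type_c


# Eight states with 32-bit PMVs; several output states share the same PMV.
pmvs = [0, 0b01, 0, 0b01, 0b10, 0, 0, 0b01]
print(pmv_memory_bits(pmvs, 32))  # (256, 128, 112)
```

Under this model, type (c) gains the most when many states share few distinct PMVs; when most PMVs are distinct, the PMI field can make it more expensive than type (b).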

3.2 Divided Pattern Matching

To explain divided pattern matching with an example, assume that “00,” “|05 00|,” “BN|10 00 02 00|,” “BN|20 00 02 00|,” and “getclients” form a set of target patterns, where the two-digit values between pipe symbols are hexadecimal byte values. The length of the subpatterns in the quotient vector is fixed at 3. All divided patterns are ordered as shown in Fig. 2, where the binary code values are given in the right column. Assume that the LSB of each character is used as the input of an FSM tile.
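The LSB inputs for this example can be computed directly. The hypothetical helper below divides each pattern with f = 3 (treating patterns shorter than f as short patterns and assuming an empty remnant when the length is a multiple of f) and returns the LSB of every character; it does not reproduce the ordering or indexing of Fig. 2. Note that b"BN\x10", b"BN\x20", and b"\x00\x02\x00" all map to the LSB string "000", so a single one-bit tile cannot distinguish them, which is why the PMVs of all FSM tiles must be combined.

```python
from typing import List, Tuple

F = 3
PATTERNS = [b"00", b"\x05\x00", b"BN\x10\x00\x02\x00", b"BN\x20\x00\x02\x00", b"getclients"]


def lsb_string(chunk: bytes) -> str:
    """Concatenate the least significant bit of every byte."""
    return "".join(str(byte & 1) for byte in chunk)


def lsb_view(pattern: bytes, f: int) -> Tuple[List[str], str]:
    """LSB strings of the quotient-vector subpatterns and of the remnant (or short pattern)."""
    if len(pattern) < f:
        return [], lsb_string(pattern)  # short pattern
    k = len(pattern) // f
    quotient = [lsb_string(pattern[j * f:(j + 1) * f]) for j in range(k)]
    return quotient, lsb_string(pattern[k * f:])


for pat in PATTERNS:
    print(pat, lsb_view(pat, F))
```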

Different types of FSM tiles are adopted in the string matchers for quotient vector matching, remnant pattern matching, and short pattern matching. Quotient vector matching adopts all possible states for subpatterns of the same length, and only the output states need nonzero PMVs, so the FSM tile architecture in Fig. 1b is adopted. The numbers of output states and possible states are determined by the subpattern length. If many output states are shared, the number of mapped subpatterns can exceed the number of output states; in this case, the maximum number of mapped target patterns equals the number of bits in a PMV.
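The dependence of the state counts on the subpattern length can be quantified under a simple assumption: if the quotient vector tile reserves every possible state, it holds a complete trie of depth f over $2^n$ input symbols, that is, $\sum_{d=0}^{f} 2^{nd}$ states, of which $2^{nf}$ are at depth f. The helper below is hypothetical; for n = 1 and f = 3 it yields the eight possible output states shown in Fig. 3a.

```python
from typing import Tuple


def quotient_tile_states(f: int, n_bits: int) -> Tuple[int, int]:
    """(all possible states, output states) of a complete trie of depth f.

    Assumes every path of f n-bit symbols is reserved, so there are
    sum(2**(n*d) for d in 0..f) states and 2**(n*f) of them at depth f.
    """
    alphabet = 2 ** n_bits
    total = sum(alphabet ** depth for depth in range(f + 1))
    outputs = alphabet ** f
    return total, outputs


for f in range(2, 6):
    print(f, quotient_tile_states(f, 1))
```

The exponential growth in f is consistent with the rapid rise of first-stage memory reported in Section 4.2.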

Fig. 3a shows the DFA for the quotient vector matching of Fig. 2, where the eight double-circled states are the possible output states; failing pointers are omitted for clarity. Subpatterns $SP_{32}$ and $SP_{42}$ are identical, so only the identification index of $SP_{32}$ is shown in the nonzero PMV table. The DFA is implemented in an FSM tile with one input bit in Fig. 3b, where the starting address of the nonzero PMV table is the same as the starting address of the FSM tile. Remnant patterns and short patterns are shorter than the fixed subpattern length of the quotient vector. In this case, any state except the initial state can be an output state; therefore, the FSM tile architecture in Fig. 1a is adopted.

Fig. 1. FSM tile architectures: (a) FSM tile where all states have their own PMVs. (b) FSM tile that stores only nonzero PMVs. (c) FSM tile that adopts PMI and separate PMV table.

Fig. 2. Example of subpatterns for the divided pattern matching.

Fig. 3. FSM tile contents for the quotient vector matching: (a) DFA for the quotient vector matching. (b) FSM tile contents of (a).

Fig. 4. FSM tile contents for the remnant pattern and short pattern matching: (a) DFA for the remnant pattern and short pattern matching. (b) FSM tile contents for the remnant pattern matching. (c) FSM tile contents for the short pattern matching.

The DFA for remnant pattern and short pattern matching is shown in Fig. 4a, where all states except the initial state can be output states. The FSM tile contents for remnant pattern matching and short pattern matching are shown in Figs. 4b and 4c, respectively. In Fig. 4b, R1 can be matched in three output states, and P1 and P2 in two; in Fig. 4c, on the other hand, only two output states are used for P1 and P2. In the FSM tile for remnant pattern matching, when a state is at the maximum depth from the initial state, its next state is the initial state, which means that the initial state is always reached periodically.

3.3 Sequential Matching with Divided Patterns

A match with a divided target pattern consists of successive matches with its quotient vector and its remnant pattern. If a target pattern is divided with a fixed length f, the sequential matches with the subpatterns in its quotient vector must be tracked at f different starting points (phases). Because the starting points of the sequential matches can differ, the points at which the target pattern is matched can vary. Fortunately, sequential matching for the quotient vector can be performed with the FSM architecture in Fig. 1c plus additional registers: state pointers and PMVs are held for f cycles and updated every f cycles. Because remnant patterns have various lengths, the output states of an FSM for remnant patterns can be reached at any cycle. Therefore, the number of string matchers with identical contents is multiplied by the fixed length f.
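The phase-dependent sequential matching can be modeled in software. The sketch below is a behavioral model under our own assumptions, not the paper's hardware: stage 1 looks up the length-f window ending at each position and emits its quotient index; stage 2 checks, for each pattern, whether its quotient indexes occur at positions spaced f apart (so each phase modulo f is handled independently) and then compares the remnant with the following bytes directly instead of using remnant pattern matchers. Short patterns are out of scope, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def two_stage_match(patterns: List[bytes], text: bytes, f: int) -> List[Tuple[int, int]]:
    """Return sorted (end position, pattern index) matches of divided patterns."""
    qindex: Dict[bytes, int] = {}  # unique subpattern -> quotient index
    divided = []
    for pattern in patterns:
        if len(pattern) < f:
            raise ValueError("short patterns are handled by a separate matcher")
        k = len(pattern) // f
        subs = [pattern[j * f:(j + 1) * f] for j in range(k)]
        for sp in subs:
            qindex.setdefault(sp, len(qindex))
        divided.append(([qindex[sp] for sp in subs], pattern[k * f:]))

    # Stage 1: quotient index of the length-f window ending at each position.
    stage1: List[Optional[int]] = [
        qindex.get(text[end - f + 1:end + 1]) if end >= f - 1 else None
        for end in range(len(text))
    ]

    # Stage 2: successive quotient indexes spaced f apart, then the remnant.
    hits = []
    for pid, (qseq, remnant) in enumerate(divided):
        k = len(qseq)
        for end in range(len(text)):
            first = end - (k - 1) * f  # end position of the first subpattern
            if first < f - 1:
                continue
            sampled = [stage1[first + j * f] for j in range(k)]
            if sampled == qseq and text[end + 1:end + 1 + len(remnant)] == remnant:
                hits.append((end + len(remnant), pid))
    return sorted(hits)


print(two_stage_match([b"getclients", b"BN\x10\x00\x02\x00"],
                      b"xxgetclientsyyBN\x10\x00\x02\x00", 3))
# [(11, 0), (19, 1)]
```

The two matches start at offsets 2 and 14, which fall in different phases modulo 3, illustrating why f copies of the phase-sampled logic are needed in hardware.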

3.4 String Matching Engine Architecture

Based on the sequential matching described above, the architecture of the proposed string matching engine is illustrated in Fig. 5, where f is the fixed length of the subpatterns in the quotient vector. The number of remnant pattern matchers varies with f.

A one-byte character code from a payload is input to the quotient vector matcher. The quotient vector matcher consists of v string matchers, and the width of each FMV equals p, the number of bits in a PMV of an FSM tile. In the quotient vector matcher, at most one bit across all the temporary match vectors becomes true, because all subpatterns have the same length f and only one subpattern can be matched in the quotient vector matcher per cycle. Therefore, the temporary match vectors are encoded by a $v \times p : \lceil \log_2 (v \times p) \rceil$ binary encoder, whose output serves as the quotient index.
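The encoding step can be sketched as follows. The function names are hypothetical, and v = 4 and p = 32 are illustrative values only: the temporary match vectors of the v string matchers are concatenated into one $v \times p$-bit vector, and the position of its single set bit becomes the quotient index.

```python
import math
from typing import List, Optional


def encoder_output_width(v: int, p: int) -> int:
    """Output width of a (v*p)-input binary encoder: ceil(log2(v*p)) bits."""
    return max(1, math.ceil(math.log2(v * p)))


def quotient_index(temp_match_vectors: List[int], p: int) -> Optional[int]:
    """Encode v one-hot p-bit temporary match vectors into a single quotient index."""
    combined = 0
    for matcher, vector in enumerate(temp_match_vectors):
        combined |= (vector & ((1 << p) - 1)) << (matcher * p)
    if combined == 0:
        return None  # no subpattern ended in this cycle
    if combined & (combined - 1):
        raise ValueError("more than one subpattern matched in one cycle")
    return combined.bit_length() - 1


print(encoder_output_width(4, 32))           # 7
print(quotient_index([0, 0b100, 0, 0], 32))  # 34 (matcher 1, bit 2)
```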

In the quotient index matcher, an FSM tile in each string matcher takes n bits of the quotient index as its input symbol. The index match vector is the FMV of a string matcher in the quotient index matcher. If the number of bits in a PMV of a string matcher in the quotient index matcher is q, the width of an index match vector is $f \times q$. Unlike [8] and [9], the n input bits of an FSM tile can be selected alternately from the most significant bits (MSBs) and the LSBs of the byte input. When target patterns are sorted lexicographically, the LSB values change more frequently than the MSB values [12]. Therefore, simply by pairing the rarely changing MSBs with the LSBs, the numbers of states used in the FSM tiles can be better balanced.

Similarly, the remnant pattern match vector is generated, which requires f remnant pattern matchers. In this case, the FMVs of the remnant pattern matchers, r, are synchronized with the quotient index match vector using delay registers.

3.5 Pattern Partitioning and Mapping

Fig. 5. An example of the proposed string matching engine architecture.

After all target patterns are divided with a fixed length f, the unique patterns for the quotient vector matcher, the quotient index matcher, the remnant pattern matcher, and the short pattern matcher are determined. For each matcher type, the patterns are partitioned into multiple subsets for homogeneous string matchers. The pattern partitioning is given in Algorithm 1, where procedure PP partitions patterns T for string matcher parameter M. First, the patterns are sorted lexicographically to increase the number of shared common prefixes. Then, procedure PM, which performs pattern mapping, is called to obtain the FSM tile contents for one string matcher and to return the unmapped patterns. The obtained FSM tile contents are stored in vec_fsms. Pattern mapping is repeated until no unmapped patterns remain.

Algorithm 1. Pattern_Partitioning

procedure PP(patterns T, string matcher M)
    t ← Sort(T)
    while t ≠ ∅ do
        fsms, t ← PM(t, M)
        vec_fsms = vec_fsms + fsms
    end while
    return vec_fsms
end procedure

Algorithm 2 gives the pattern mapping for one string matcher, given sorted patterns ST and string matcher parameter M. The for loop checks whether all FSM tiles can be built, starting from the maximum number of mapped patterns, $p(M)$. The variable $\lambda$ denotes the number of patterns mapped in each iteration. With the first $\lambda$ patterns of the sorted patterns, mapped_t, procedure Build_tries is called to obtain the tries for the FSM tiles of a string matcher. If the largest number of states among the tries exceeds the maximum number of states in an FSM tile, $s(M)$, the next iteration continues with $\lambda$ reduced by one; otherwise, the loop exits with the mappable patterns mapped_t, after procedure Remove deletes them from t so that t holds the unmapped patterns. After the for loop, procedure Add_failing_pointer adds failing pointers from each state to its longest suffix state [7]. Finally, the PMV contents for the mapped patterns mapped_t are obtained by calling procedure Set_PMVs.

Algorithm 2. Pattern_Mapping

procedure PM(sorted patterns ST, string matcher M)
    t ← ST
    for λ = p(M) to 1 do
        mapped_t ← front(λ, t)
        tries ← Build_tries(mapped_t)
        if max(Num_states(tries)) > s(M) then
            continue
        else
            Remove(mapped_t, t)
            break
        end if
    end for
    dfas ← Add_failing_pointer(tries, mapped_t)
    fsms ← Set_PMVs(dfas, mapped_t)
    return fsms and t
end procedure
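A compact Python rendering of Algorithms 1 and 2 may help readers experiment with the partitioning. It is a sketch under stated assumptions: each FSM tile sees one contiguous n-bit slice of every byte (n = 2 by default), Build_tries is modeled by counting trie nodes per tile, the hypothetical parameters p_max and s_max stand in for p(M) and s(M), and Add_failing_pointer and Set_PMVs are omitted because they do not change how patterns are partitioned. Unlike the pseudocode, the sketch raises an error when even a single pattern does not fit in a tile.

```python
from typing import List, Tuple


def tile_trie_states(patterns: List[bytes], tile: int, n_bits: int) -> int:
    """Trie nodes (including the root) built from bit slice `tile` of each pattern."""
    mask = (1 << n_bits) - 1
    root: dict = {}
    count = 1
    for pattern in patterns:
        node = root
        for byte in pattern:
            sym = (byte >> (tile * n_bits)) & mask
            if sym not in node:
                node[sym] = {}
                count += 1
            node = node[sym]
    return count


def pattern_mapping(t: List[bytes], p_max: int, s_max: int,
                    n_bits: int = 2) -> Tuple[List[bytes], List[bytes]]:
    """PM: map as many leading patterns as fit; return (mapped_t, unmapped t)."""
    for lam in range(min(p_max, len(t)), 0, -1):  # lambda = p(M) down to 1
        mapped_t = t[:lam]                         # front(lambda, t)
        largest = max(tile_trie_states(mapped_t, tile, n_bits)
                      for tile in range(8 // n_bits))
        if largest <= s_max:                       # every tile fits in s(M) states
            return mapped_t, t[lam:]               # Remove(mapped_t, t)
    raise ValueError("a single pattern exceeds the FSM tile capacity")


def pattern_partitioning(patterns: List[bytes], p_max: int, s_max: int,
                         n_bits: int = 2) -> List[List[bytes]]:
    """PP: sort lexicographically, then call PM until no pattern is left."""
    t = sorted(patterns)
    vec_fsms = []
    while t:
        mapped_t, t = pattern_mapping(t, p_max, s_max, n_bits)
        vec_fsms.append(mapped_t)  # stands in for the FSM tile contents
    return vec_fsms


print(pattern_partitioning([b"xyz", b"abd", b"abc"], p_max=2, s_max=100))
# [[b'abc', b'abd'], [b'xyz']]
```

With a tight state budget (for example, s_max = 4), each of these three-byte patterns ends up in its own string matcher, because two patterns already need five states in the first tile.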

Because the hardware resource parameters $p(M)$ and $s(M)$ bound its work, the pattern mapping algorithm has constant time complexity, $O(1)$. Therefore, the time complexity of partitioning all patterns is $O(T)$, where T denotes the number of patterns. The time complexity of pattern sorting, on the other hand, is $O(T \log_2 T)$. However, because of the large constant factor of pattern mapping, pattern sorting does not dominate unless T is sufficiently large.
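The complexity argument can be written out as follows, assuming (as the text implies) that one call to PM costs at most a constant $C_{PM}$ determined by $p(M)$, $s(M)$, and the maximum pattern length, and that sorting costs about $c_s T \log_2 T$ for some constant $c_s$; both constants and N, the number of calls to PM, are our notation.

$$
\mathrm{Cost}(T) \approx \underbrace{c_s\, T \log_2 T}_{\text{sorting}} + \underbrace{N \cdot C_{PM}}_{\text{mapping}}, \qquad N \le T
$$

Each call to PM maps at least one pattern, so $N \le T$ and the mapping term is $O(T)$, giving $O(T \log_2 T)$ overall. Sorting becomes the larger term only where $c_s T \log_2 T > N \cdot C_{PM}$, so a large $C_{PM}$ pushes this crossover toward large T.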

4 EXPERIMENTAL RESULTS

4.1 Experimental Environments

To evaluate the proposed algorithm, four rule sets from the Snort v2.8 rules [2] were adopted, from which the exact strings in the content fields were parsed. In addition, a set of all target patterns, total, was extracted from all Snort rule sets. The patterns of Clam AntiVirus (ClamAV) [3], excluding regular expressions, were also adopted. Table 1 lists several characteristics of the extracted target pattern sets.

Three memory-based string matching approaches, [8], [10], and [4], were evaluated for apples-to-apples comparisons. For the Snort rule sets, the number of states in an FSM tile must be greater than 122 because of the maximum target pattern length. Following the design analysis in [8], the maximum number of states in an FSM of a string matcher was set to 128 or 256 in the evaluations of the previous approaches; for the ClamAV rule set, however, only 256 states per FSM tile were adopted because its maximum target pattern length is 210. For the five Snort rule sets, the number of bits in a PMV was varied among 16, 32, 64, and 96; for the ClamAV rule set, 8, 16, 32, and 64 were used. Based on the analysis in [8], the number of input bits in an FSM of each string matcher was set to 2. For the extended bit-split string matching in [10], the number of unique nonzero PMVs in an FSM tile was set equal to the number of bits in a PMV.

In the proposed string matching engine, the numbers of states in the FSMs of the quotient vector matcher, the remnant pattern matcher, and the short pattern matcher were predetermined by f, the fixed subpattern length of the quotient vector. In the quotient index matcher, the maximum number of states was set to $\lceil 128/f \rceil$ or $\lceil 256/f \rceil$ for the Snort rule sets; for the ClamAV rule set, only $\lceil 256/f \rceil$ was adopted because of its long target patterns. In the experiments, the number of bits in a PMV for the quotient vector matcher, the remnant pattern matcher, and the short pattern matcher was set arbitrarily to 128. The number of bits in a PMV for the quotient index matcher was varied among 16, 32, 64, and 96 for the Snort rule sets and among 8, 16, 32, and 64 for the ClamAV rule set. Unless otherwise noted, the number of unique PMVs in a PMV table was set to half the number of bits in a PMV. Considering the parameter values mentioned above, the optimal number of input bits for the quotient index matcher was set to 2.
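For reference, the quotient index matcher state limits can be tabulated for the values of f examined in Section 4.2 (2 to 5). The snippet only evaluates $\lceil 128/f \rceil$ and $\lceil 256/f \rceil$ with integer arithmetic; the helper name is hypothetical.

```python
def state_limit(budget: int, f: int) -> int:
    """ceil(budget / f) using integer arithmetic only."""
    return -(-budget // f)


for f in range(2, 6):
    print(f, state_limit(128, f), state_limit(256, f))
# 2 64 128
# 3 43 86
# 4 32 64
# 5 26 52
```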


TABLE 1
Characteristics of Target Pattern Rule Sets

4.2 Parameter Sensitivity

To determine several design parameter values, their sensitivity in terms of total memory requirements was studied. First, experiments were performed by varying f from 2 to 5. Figs. 6a and 6b summarize the total memory requirements as f varies for the total and spyware-put rule sets in Table 1; the number of bits in a PMV and the maximum number of states in an FSM tile of the quotient index matcher were also varied. Memory requirements were minimized when f was 4 for the total rule set and 3 for the spyware-put rule set. For the total rule set, two parameter sets (number of bits in a PMV, number of states in an FSM) of the quotient index matcher minimized the total memory requirements: (16, $\lceil 128/f \rceil$) and (32, $\lceil 256/f \rceil$). For the spyware-put rule set, the two minimizing parameter sets were (32, $\lceil 128/f \rceil$) and (64, $\lceil 256/f \rceil$). Compared with the spyware-put rule set, the total rule set has a longer average target pattern length, so its optimal number of bits in a PMV was smaller and its optimal f was larger. For the other Snort rule sets, the total memory requirements were minimized when the quotient index matcher used 32 bits in a PMV and $\lceil 128/f \rceil$ states in an FSM. For the ClamAV rule set, the total memory requirements were minimized when f was 4, with 16 bits in a PMV and $\lceil 256/f \rceil$ states in an FSM. Because of the large average target pattern length of the ClamAV rule set, the optimal number of bits in a PMV for the quotient index matcher was small.

Fig. 7 breaks down the total memory requirements by matcher as f varies. Fig. 7a shows the breakdown for the total rule set when the quotient index matcher used $\lceil 256/f \rceil$ states in an FSM and 32 bits in a PMV. Fig. 7b shows the breakdown for the ClamAV rule set when the quotient index matcher used $\lceil 256/f \rceil$ states in an FSM and 16 bits in a PMV. As f increased, the proportion of memory required by the quotient vector matcher, remnant pattern matcher, and short pattern matcher increased exponentially, whereas the memory requirements of the quotient index matcher decreased gently. Therefore, there was a threshold value of f that minimized the total memory requirements.

In addition, with f = 3, experiments were performed by varying the number of unique PMVs in a PMV table of the quotient index matcher. Two additional cases were considered: the number of unique PMVs was set equal to the number of bits in a PMV, and to one quarter of it. For the five Snort rule sets, setting the number of unique PMVs to half the number of bits in a PMV reduced the total memory requirements on average by 30.9 percent and 34.3 percent compared with these two cases. Therefore, a moderate number of unique PMVs, half the number of bits in a PMV, minimizes the total memory requirements of the proposed string matching engine.

4.3 Comparison to Existing Bit-Split String Matching Approaches

Fig. 6. Total memory requirements obtained by varying the fixed length f: (a) Total memory requirements for the total rule set. (b) Total memory requirements for the spyware-put rule set.

Fig. 7. Factors of total memory requirements by varying the fixed length f: (a) Factors of total memory requirements for total rule set. (b) Factors of total memory requirements for ClamAV rule set.

TABLE 2
Total Memory Requirements for Target Pattern Rule Sets

Table 2 lists the minimized memory requirements and the number of string matchers for the four matcher types. For all six rule sets, the quotient index matcher adopted more string matchers and required more memory than each of the other three matchers. In particular, the minimum target pattern length of the ClamAV rule set was 10, so no string matchers were adopted for the short pattern matcher. The normalized total memory requirements of a rule set were obtained by dividing its total memory requirements by the total number of bytes in its unique target patterns. For the Snort rule sets, the normalized memory requirements ranged from 6.1 to 8.6 bytes/char; for the ClamAV rule set, because of its large average target pattern length, they were up to 13.5 bytes/char.
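The normalization used in Table 2 amounts to a single division. The sketch below is illustrative; the function name is hypothetical, and the memory figure and patterns are made-up inputs, not results from the paper.

```python
from typing import Iterable


def normalized_memory(total_memory_bytes: float, patterns: Iterable[bytes]) -> float:
    """Total memory divided by the total bytes of the unique target patterns (bytes/char)."""
    unique_bytes = sum(len(p) for p in set(patterns))
    return total_memory_bytes / unique_bytes


# Hypothetical inputs: the duplicate pattern is counted once (3 + 4 = 7 bytes).
print(normalized_memory(70.0, [b"abc", b"abc", b"defg"]))  # 10.0
```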

Fig. 8 summarizes the comparison with the existing bit-split string matching approaches in [8], [10], and [4], which are denoted as bit_split, ex_bit_split, and infix, respectively. Over all adopted rule sets, the total memory requirements decreased on average by 62.8, 53.5, and 47.8 percent compared with bit_split, ex_bit_split, and infix, respectively.

4.4 Hardware Implementation Issues

To discuss several hardware implementation issues, an example of the proposed string matching engine was implemented for the Snort backdoor rule set. On the XC4VLX160-12ff1513 device of the Xilinx Virtex-4 family [13], distributed RAMs and logic were implemented with lookup tables (LUTs). In this field-programmable gate array (FPGA) device, the minimum depth of a distributed RAM is 16, so memory requirements could not be reduced further when f, the fixed subpattern length of the quotient vector, was smaller than 4. Therefore, the total memory requirements were minimized when f was 4. As shown in Fig. 5, the proposed string matching engine contains additional logic and registers because of the two-stage string matching; the encoder and delay registers must also be implemented with LUTs.
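The effect of the 16-entry minimum depth can be illustrated with a simplified model of distributed RAM in which each LUT provides a 16-entry, 1-bit memory, so a table occupies whole multiples of 16 entries. Which table depth scales with f is our assumption: here it is taken to be the $2^f$ possible output states of a one-input-bit quotient vector tile, with a hypothetical 32-bit PMV width; multiplexing logic for deeper RAMs is ignored.

```python
import math


def allocated_ram_bits(depth: int, width: int, lut_depth: int = 16) -> int:
    """Bits occupied when every LUT-based RAM column holds lut_depth entries."""
    luts_per_bit = math.ceil(depth / lut_depth)
    return luts_per_bit * lut_depth * width


for f in range(2, 6):
    depth = 2 ** f  # assumed table depth for subpattern length f
    print(f, depth * 32, allocated_ram_bits(depth, 32))
# 2 128 512
# 3 256 512
# 4 512 512
# 5 1024 1024
```

Below f = 4, the allocated size stays at the 16-entry floor, so a smaller f brings no saving in this model.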

The proposed hardware architecture adopts deterministic state transitions. In the two-stage string matching with the quotient vector matcher and the quotient index matcher, the latency for obtaining the unique identification indexes was fixed (e.g., 10 cycles in our implementation); nevertheless, one character could be input in every cycle.

The existing string matching approaches in [8], [10], and [4] were also implemented. The hardware descriptions were synthesized using Xilinx Synthesis Technology (XST) [14], and power consumption was estimated with the Xilinx XPower Analyzer [15]. After the memory contents were configured, randomly generated data were input to estimate power consumption based on logic simulation. Table 3 compares the implementation results with those of the existing bit-split string matching approaches. The proposed string matching engine achieved the highest maximum operating frequency, over 300 MHz, owing to the low complexity of the first-stage FSM implementation. Moreover, despite the high operating frequency, its total logic power consumption was the lowest. The number of slices used decreased by 50.9, 42.5, and 32.4 percent compared with bit_split, ex_bit_split, and infix, respectively. In a typical FPGA device, a memory bit can be configured with a single bit in an LUT, whereas logic functions and flip-flops are configured using multiple bits in an LUT. Therefore, in spite of the increased number of flip-flops, the implementation results above show that the proposed string matching engine can reduce the total hardware cost significantly.

5 CONCLUSION

The proposed DFA-based parallel string matching scheme minimizes total memory requirements. Dividing long target patterns into subpatterns with a fixed length mitigates the problem of varied pattern lengths, and the memory-efficient bit-split FSM architectures further reduce the total memory requirements. Given the reduced memory requirements for real rule sets, we conclude that the proposed string matching scheme is useful for reducing the total memory requirements of parallel string matching engines.

REFERENCES

[1] P.-C. Lin, Y.-D. Lin, T.-H. Lee, and Y.-C. Lai, “Using String Matching for Deep Packet Inspection,” IEEE Computer, vol. 41, no. 4, pp. 23-28, Apr. 2008.

[2] Snort, Ver. 2.8, Network Intrusion Detection System, http://www.snort.org, 2011.

[3] Clam AntiVirus, Ver. 0.95.3, http://www.clamav.net, 2011.

[4] C.-H. Lin, Y.-T. Tai, and S.-C. Chang, “Optimization of Pattern Matching Algorithm for Memory Based Architecture,” Proc. Third ACM/IEEE Symp. Architecture for Networking and Comm. Systems, pp. 11-16, 2007.

[5] S. Kumar, S. Dharmapurikar, F. Yu, P. Crowley, and J. Turner, “Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection,” Proc. Conf. Applications, Technologies, Architectures, and Protocols for Computer Comm., pp. 339-350, 2006.


Fig. 8. Summary of comparisons with existing bit-split string matching approaches in terms of normalized memory requirements.

TABLE 3
Comparison of Implementation Results for the Snort Backdoor Rule Set

[6] F. Yu, Z. Chen, Y. Diao, T.V. Lakshman, and R.H. Katz, “Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection,” Proc. Second ACM/IEEE Symp. Architecture for Networking and Comm. Systems, pp. 93-102, 2006.

[7] A.V. Aho and M.J. Corasick, “Efficient String Matching: An Aid to Bibliographic Search,” Comm. ACM, vol. 18, no. 6, pp. 333-340, 1975.

[8] L. Tan and T. Sherwood, “A High Throughput String Matching Architecture for Intrusion Detection and Prevention,” Proc. 32nd IEEE/ACM Int’l Symp. Computer Architecture, pp. 112-122, 2005.

[9] L. Tan, B. Brotherton, and T. Sherwood, “Bit-Split String-Matching Engines for Intrusion Detection and Prevention,” ACM Trans. Architecture and Code Optimization, vol. 3, no. 1, pp. 3-34, Mar. 2006.

[10] P. Piyachon and Y. Luo, “Compact State Machines for High Performance Pattern Matching,” Proc. 44th Ann. ACM/IEEE Design Automation Conf., pp. 493-496, 2007.

[11] Deterministic Finite-State Machine, http://en.wikipedia.org/wiki/Deterministic_finite_state_machine, 2011.

[12] H. Kim, H. Hong, H.-S. Kim, and S. Kang, “A Memory-Efficient Parallel String Matching for Intrusion Detection Systems,” IEEE Comm. Letters, vol. 13, no. 12, pp. 1004-1006, Dec. 2009.

[13] Virtex-4 FPGA User Guide, http://www.xilinx.com/support/documentation/user_guides/ug070.pdf, 2011.

[14] Xilinx Synthesis Technology, Xilinx ISE 9.1, http://www.xilinx.com/itp/xilinx10/books/docs/xst/xst.pdf, 2011.

[15] Xilinx XPower Analyzer, http://www.xilinx.com, 2011.

Hyun Jin Kim received the BS, MS, and PhD degrees in electrical and electronic engineering from Yonsei University, Seoul, Korea, in 1997, 1999, and 2010, respectively. From 2002 to 2004, he worked in the R&D center of Samsung Electro-Mechanics. Currently, he is working in the Memory Division of Samsung Electronics as a senior engineer. His interests include parallel string matching, reconfigurable computing, interconnection networks, microarchitecture, and compilers.

Hong-Sik Kim received the BS, MS, and PhD degrees in electrical and electronic engineering from Yonsei University in 1997, 1999, and 2004, respectively. He was a postdoctoral fellow at Virginia Tech in 2005 and a senior engineer at the System LSI Group in Samsung Electronics in 2006. Currently, he is a research professor at Yonsei University, Seoul, Korea. His current research interests include VLSI design, VLSI CAD, design for testability, and reliable network-on-chip design.

Sungho Kang received the BS degree from Seoul National University, Korea, and the MS and PhD degrees in electrical and computer engineering from the University of Texas at Austin. He was a postdoctoral fellow with the University of Texas at Austin, a research scientist with the Schlumberger Laboratory for Computer Science, Schlumberger Inc., and a senior staff engineer with the Semiconductor Systems Design Technology, Motorola Inc.

Since 1994, he has been a professor with the Department of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea. His current research interests include VLSI design, VLSI CAD, VLSI testing, and design for testability. He is a member of the IEEE.
