An Improved Algorithm to Accelerate Regular Expression Evaluation

Author: Michela Becchi, Patrick Crowley

Publisher: 3rd ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2007

Presenter: Ching Hsuan Shih

Date: 2014/02/26

Outline

I. Introduction

II. Motivation

III. The Proposal

IV. Reducing the Alphabet

V. Encoding

VI. Experimental Evaluation

I. Introduction

• Signature-based deep packet inspection has taken root as a dominant security mechanism in networking devices and computer systems.

• Regular expressions are more expressive than simple string patterns and are therefore able to describe a wider variety of payload signatures.

• There has been a considerable amount of recent work on implementing regular expressions, particularly with representations based on deterministic finite automata (DFA).

I. Introduction (Cont.)

• DFAs have attractive properties that explain the attention they have received.

• They have predictable and acceptable memory bandwidth requirements.

• For any given regular expression, a DFA with a minimum number of states can be found [3].

I. Introduction (Cont.)

• DFAs corresponding to large sets of regular expressions containing complex patterns can be prohibitively large in terms of numbers of states and transitions.

• Yu et al. [15] have proposed segregating rules into multiple groups and evaluating the corresponding DFAs concurrently.

• Delayed Input DFA (D2FA) [9] replaces redundant transitions common to a pair of states with a single default transition.

I. Introduction (Cont.)

• D2FA has three weaknesses:

• It requires a user-provided parameter value which can only be determined experimentally for a given rule-set.

• It creates a data-structure whose worst-case paths may be traversed for each input character processed.

• It requires multiple passes over large support data structures during the construction phase.

• We propose an improved, simplified algorithm for building default transitions that addresses the problems above.

II. Motivation

In this section, we describe the D2FA approach [9].

• The basic goal of the D2FA is to reduce the amount of memory needed to represent all the state transitions in a DFA.
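As a rough illustration of the memory/lookup trade-off, the sketch below stores a three-state automaton with default transitions and walks it over an input string. The toy automaton, the `labeled`/`default` dictionaries and the function names are assumptions made for this example; they do not come from [9].

```python
# Hypothetical toy D2FA over the alphabet {a, b}.
# The equivalent full DFA needs 3 states x 2 symbols = 6 stored transitions:
#   0: a->1, b->0    1: a->1, b->2    2: a->1, b->0
# Here only 3 labeled transitions are stored, plus 2 default transitions.
labeled = {0: {"a": 1, "b": 0}, 1: {"b": 2}, 2: {}}
default = {1: 0, 2: 0}  # state 0 is a root and keeps all its transitions

def d2fa_step(state, ch):
    """Consume one character; return (next_state, states_visited)."""
    visits = 1
    # A default transition consumes no input; follow it until some state
    # on the default path has a labeled transition for ch.
    while ch not in labeled[state]:
        state = default[state]
        visits += 1
    return labeled[state][ch], visits

def run(text, state=0):
    """Process text; return (final_state, total_states_visited)."""
    total = 0
    for ch in text:
        state, visits = d2fa_step(state, ch)
        total += visits
    return state, total

final_state, visited = run("abba")
```

The compressed automaton reaches the same final state as the full DFA, at the cost of one extra state visit whenever a default transition is followed.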

II. Motivation (Cont.)

• During string matching, the D2FA is traversed as in the Aho-Corasick algorithm [1], treating default transitions as failure pointers.

• The heuristic proposed in [9] to build a D2FA can be explored systematically as a maximum spanning tree problem on an undirected graph.

• This maximum spanning tree problem can be solved with Kruskal’s algorithm [5].
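A minimal sketch of the spanning-tree view, assuming the weight of the edge between two states is the number of input symbols on which both states move to the same next state. The toy DFA and all function names are illustrative, and the sketch omits the diameter bound and any weight threshold used in [9].

```python
from itertools import combinations

# Hypothetical toy DFA: state -> {symbol: next_state}
dfa = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 2}, 2: {"a": 1, "b": 0}}

def common(dfa, u, v):
    """Number of symbols on which states u and v share the same next state."""
    return sum(1 for c in dfa[u] if dfa[u][c] == dfa[v].get(c))

def max_spanning_forest(dfa):
    """Kruskal's algorithm, taking the heaviest edges first."""
    parent = {s: s for s in dfa}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(
        ((common(dfa, u, v), u, v) for u, v in combinations(sorted(dfa), 2)),
        reverse=True,
    )
    chosen = []
    for weight, u, v in edges:
        if weight == 0:
            break  # an edge with no shared transitions saves nothing
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # adding the edge does not close a cycle
            parent[root_u] = root_v
            chosen.append((u, v, weight))
    return chosen

forest = max_spanning_forest(dfa)
saved = sum(weight for _, _, weight in forest)
```

The total weight of the chosen edges equals the number of labeled transitions that default transitions along those edges can eliminate.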

II. Motivation (Cont.)

• After the operation of Kruskal’s algorithm, the root of each tree can be selected.

• The node with the smallest maximum distance to any other vertex of the same tree is chosen.

• Direct all default transitions towards the root of the default transition tree.

• To limit the maximum default path length, a heuristic is proposed that determines a maximum spanning tree forest with bounded diameter.
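The root selection and orientation steps above can be sketched as follows, assuming a single connected tree given as an undirected edge list. The function names and the five-node example are hypothetical, and the diameter-bounding heuristic itself is not implemented here.

```python
from collections import deque

def bfs(adj, src):
    """Return (distance, parent) maps for a breadth-first search from src."""
    dist, parent = {src: 0}, {}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return dist, parent

def orient_towards_root(tree_edges):
    """Pick the tree's center as root and point every other node at its parent."""
    adj = {}
    for u, v in tree_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    # Root = node whose largest distance to any other node is smallest.
    root = min(sorted(adj), key=lambda s: max(bfs(adj, s)[0].values()))
    # The BFS parent of each node is the target of its default transition.
    _, default = bfs(adj, root)
    return root, default

# Path-shaped tree 0 - 1 - 2 - 3 - 4
root, default = orient_towards_root([(0, 1), (1, 2), (2, 3), (3, 4)])
```

Choosing the center halves the longest default path of this example from 4 hops (root at an end) to 2.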

II. Motivation (Cont.)

III. The Proposal

• We now take advantage of a simple fact:

• DFA traversal always starts at a single initial state s0.

• We propose a more general compression algorithm which leads to a traversal time bound independent of the maximum default transition path length.

III. The Proposal (Cont.)

• Definition: For each state s, we define its depth as the minimum number of states visited when moving from s0 to s in the DFA.
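Under this definition, depth is a shortest-path distance from s0, so a breadth-first search computes it. The sketch below assumes that s0 itself has depth 0 and that the DFA is given as a dictionary from state to `{symbol: next_state}`; both conventions and the toy DFA are assumptions of this example.

```python
from collections import deque

def depths(dfa, s0=0):
    """Depth of every state reachable from s0 (s0 has depth 0 by assumption)."""
    depth = {s0: 0}
    queue = deque([s0])
    while queue:
        s = queue.popleft()
        for t in dfa[s].values():
            if t not in depth:  # first visit in BFS order is the shortest path
                depth[t] = depth[s] + 1
                queue.append(t)
    return depth

# Hypothetical toy DFA over {a, b}
dfa = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 2}, 2: {"a": 1, "b": 0}}
state_depth = depths(dfa)
```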

III. The Proposal (Cont.)

• Lemma: For any string of length N, a 2N time bound is guaranteed on every D2FA having only “backwards” default transitions, i.e. default transitions directed towards states of smaller depth.

• A string of length N implies N labeled transitions to be followed, and the number of default transitions taken is always at least one less than the number of labeled transitions taken, i.e. at most N-1.

• For a string of length N, the total number of state traversals therefore cannot be higher than 2N-1.
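One way to see where the bound comes from is a depth-counting argument. The derivation below is a sketch that assumes s0 has depth 0, that a labeled transition raises the depth by at most one, and that a backwards default transition lowers it by at least one; the symbols D and d are introduced here for illustration.

```latex
% D = number of default transitions taken while processing N characters
% d = depth of the current state just before the N-th labeled transition
\begin{align*}
  d &\le \underbrace{(N-1)}_{\text{labeled transitions so far}} - D
     && \text{(each labeled step adds at most 1, each default step removes at least 1)} \\
  d &\ge 0
     && \text{(depth is never negative)} \\
  \Rightarrow\quad D &\le N-1 \\
  \Rightarrow\quad \text{state traversals} &= N + D \;\le\; 2N-1
\end{align*}
```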

III. The Proposal (Cont.)

3.1 Problem Formulation

• The problem can now be formulated as an instance of maximum spanning tree on a directed graph.

III. The Proposal (Cont.)

3.2 An example

III. The Proposal (Cont.)

3.3 Algorithm

• The whole problem is reduced to having each state select, among the states with lower depth, the one with the largest number of outgoing transitions in common with it.
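The selection rule above can be sketched in a few lines. This is an illustrative reading of the rule rather than the authors' implementation: it assumes every state is reachable from s0, breaks ties by iteration order, and uses a hypothetical toy DFA and function names.

```python
from collections import deque

def depths(dfa, s0):
    """Breadth-first depth of each state, with s0 at depth 0 (assumption)."""
    depth = {s0: 0}
    queue = deque([s0])
    while queue:
        s = queue.popleft()
        for t in dfa[s].values():
            if t not in depth:
                depth[t] = depth[s] + 1
                queue.append(t)
    return depth

def build_defaults(dfa, s0=0):
    """Return (labeled, default) for a D2FA with backwards-only defaults."""
    depth = depths(dfa, s0)
    default = {}
    for s in dfa:
        best, best_common = None, 0
        for t in dfa:
            if depth[t] < depth[s]:  # only strictly shallower candidates
                shared = sum(1 for c in dfa[s] if dfa[s][c] == dfa[t].get(c))
                if shared > best_common:
                    best, best_common = t, shared
        if best is not None:
            default[s] = best
    # Keep only the transitions that differ from the default target's.
    labeled = {
        s: {c: n for c, n in dfa[s].items()
            if s not in default or dfa[default[s]].get(c) != n}
        for s in dfa
    }
    return labeled, default

# Hypothetical toy DFA over {a, b}
dfa = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 2}, 2: {"a": 1, "b": 0}}
labeled, default = build_defaults(dfa)
```

Each state makes its choice independently of the others, which is why a single pass over the states suffices in this sketch.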

III. The Proposal (Cont.)

IV. Reducing the Alphabet

• The basic idea is the following: in an alphabet Σ, two symbols ci and cj fall into the same class if they are treated the same way in all DFA states.

• In other words, given the transition function δ(states, Σ)→states, δ(s,ci) = δ(s,cj) for each state s belonging to the DFA.

• In practical scenarios (ASCII alphabet), the table that maps each input character to its class will contain 256 entries, each at most 1 byte wide (for heavily compressed alphabets, 5-6 bits per character may suffice).
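A minimal sketch of the class computation, assuming a complete DFA stored as a dictionary from state to `{symbol: next_state}`: two symbols get the same class number exactly when their columns of the transition table are identical. The function name and the toy DFA are illustrative.

```python
def reduce_alphabet(dfa, alphabet):
    """Map each symbol to a class id; symbols with equal columns share a class."""
    states = sorted(dfa)
    class_of_column = {}
    table = {}
    for c in alphabet:
        # The "column" of c: its next state in every DFA state.
        column = tuple(dfa[s][c] for s in states)
        table[c] = class_of_column.setdefault(column, len(class_of_column))
    return table, len(class_of_column)

# Hypothetical DFA in which 'b' and 'c' behave identically in every state
dfa = {0: {"a": 1, "b": 0, "c": 0}, 1: {"a": 1, "b": 0, "c": 0}}
table, num_classes = reduce_alphabet(dfa, "abc")
```

At match time each input character is first translated through `table`, so the automaton only needs one transition per class instead of one per character.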

V. Encoding

5.1 Bitmaps

• A scheme [18] consists of associating a bitmap as large as the alphabet size to each DFA state.

• Bits corresponding to uncompressed labeled transitions present in the current state can be set to 1; the remaining bits are set to 0.

• State identifiers can be simply represented through their base address in memory.

• The length of the necessary bitmaps can substantially decrease after alphabet reduction.
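The bitmap idea can be sketched as below. The compact target list indexed by counting the 1 bits below the symbol's position is a common companion technique and is an assumption here, not something the slides state; symbols are taken to be small integers (for example class ids after alphabet reduction), and the function names are hypothetical.

```python
def encode_state(transitions):
    """Encode {symbol: next_state} as (bitmap, compact list of next states)."""
    bitmap = 0
    targets = []
    for sym in sorted(transitions):
        bitmap |= 1 << sym  # bit sym is 1 if a labeled transition exists
        targets.append(transitions[sym])
    return bitmap, targets

def lookup(bitmap, targets, sym):
    """Return the next state for sym, or None if the default must be followed."""
    if not (bitmap >> sym) & 1:
        return None
    # Position in the compact list = number of 1 bits below bit sym.
    index = bin(bitmap & ((1 << sym) - 1)).count("1")
    return targets[index]

# State with labeled transitions on symbols 1 and 4 only
bitmap, targets = encode_state({1: 7, 4: 9})
```

With a reduced alphabet the bitmap shrinks from 256 bits to one bit per symbol class.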

V. Encoding (Cont.)

5.2 Content addressing

• A technique [16] consists in representing state identifiers with content labels, which are stored in memory as next state transitions.

• A state content label contains several fields:

• A state discriminator

• The list of characters for which a labeled transition is defined

• An identifier for the default transition state

• The size of a content label depends on the number of labeled transitions defined for the corresponding state.
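To make the three fields concrete, here is a loose sketch of a content label as a small record. The field widths (1 byte for the discriminator, 1 byte per character, 4 bytes for the default identifier), the class name and the method names are all assumptions chosen for illustration; the layout used in [16] may differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class ContentLabel:
    discriminator: int            # tells apart states with identical content
    chars: Tuple[str, ...]        # characters that have a labeled transition
    default_id: Optional[int]     # identifier of the default transition state

    def has_labeled(self, ch: str) -> bool:
        """Decide from the label alone whether ch has a labeled transition."""
        return ch in self.chars

    def size_bytes(self) -> int:
        """Label size under the assumed field widths; grows with len(chars)."""
        default_width = 0 if self.default_id is None else 4
        return 1 + len(self.chars) + default_width

label = ContentLabel(discriminator=0, chars=("a", "b"), default_id=3)
```

Because the label lists the characters with labeled transitions, a matcher can tell without reading the state from memory whether the default transition has to be taken.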

VI. Experimental Evaluation

VI. Experimental Evaluation (Cont.)

VI. Experimental Evaluation (Cont.)

VI. Experimental Evaluation (Cont.)