StriD2FA: Scalable Regular Expression Matching for Deep Packet Inspection

Author: Xiaofei Wang, Junchen Jiang, Yi Tang, Bin Liu, and Xiaojun Wang
Publisher: 2011 IEEE International Conference on Communications
Presenter: Ching-Hsuan Shih
Date: 2014/06/11
Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan R.O.C.


Outline
• Introduction
• Related Work
• System Design Principles and Challenges
• Building StriD2FAs from Regex
• Optimization of False Positive
• Evaluation

National Cheng Kung University CSIE Computer & Internet Architecture Lab


Introduction (1/2)

Signature-based deep packet inspection has taken root as a dominant security mechanism in networking devices and computer systems.

Regular expressions are more expressive than simple patterns of strings and therefore able to describe a wider variety of payload signatures.


Introduction (2/2)

A novel length-based matching (LBM) scheme is presented for accelerating regex matching. LBM uses a DFA-like matcher called Stride-DFA (StriD2FA).
• LBM can cause false positives.


Related Work


Dharmapurikar et al. presented a scheme [7] that can process multiple characters per clock cycle using Bloom filters.

A recent method [4] introduces sampling techniques to accelerate regex matching, but it does not support all kinds of regexes.


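System Design Principles and Challenges (1/5)

A. Converting input stream into stride lengths (SL) stream

A stride length is the distance between adjacent occurrences of a chosen tag character, capped at the window size w. In this manner, any SL sent to a StriD2FA must be in a finite alphabet set Σ = {1, …, w}.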
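A minimal Python sketch of the conversion may help here. It assumes, as the worked example on the later slides suggests, that counting starts at the first tag in the stream, that an SL is emitted whenever the tag is seen or w bytes pass without one, and that a trailing partial stride is dropped. The name `stride_lengths` is hypothetical and not taken from the paper.

```python
def stride_lengths(data: str, tag: str, w: int) -> list[int]:
    """Convert a byte stream into a stride-length (SL) stream.

    Assumptions (illustrative only): counting starts at the first tag,
    and a trailing stride that reaches neither a tag nor the window
    size w is dropped.
    """
    start = data.find(tag)
    if start < 0:
        return []               # no tag in the stream: nothing to emit
    out = []
    count = 0
    for ch in data[start + 1:]:
        count += 1
        if ch == tag or count == w:
            out.append(count)   # every emitted SL lies in {1, ..., w}
            count = 0
    return out


print(stride_lengths("abcababbabccacabc", "a", 3))  # [3, 2, 3, 3, 1, 2]
```

Because the counter is reset as soon as it reaches w, no emitted value can fall outside Σ = {1, …, w}.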


System Design Principles and Challenges (2/5)

B. An Example of StriD2FA

Suppose the regex rule is “.*abba.{2}caca”. Here ‘a’ is chosen as the tag and the window size is 3.
i. Fa(.*abba) = (1 | 2 | 3)+ 3
ii. Fa(.{2}caca) = 3 1 2 | 1 3 2 | 2 2 2 | 1 1 2 2
iii. Finally, the regex Fa(.*abba.{2}caca) = (1 | 2 | 3)+ 3 (3 1 2 | 1 3 2 | 2 2 2 | 1 1 2 2), where the alphabet set is {1, 2, 3}.
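The four alternatives of Fa(.{2}caca) can be checked by brute force. The sketch below is only an illustration, not the paper's construction: it substitutes either the tag 'a' or a placeholder non-tag byte 'x' for each of the two wildcard positions, prepends the final 'a' of "abba" as the starting tag, and converts each resulting string to its SL sequence. The helper `stride_lengths` is a hypothetical name and assumes counting starts at the first tag.

```python
from itertools import product


def stride_lengths(data: str, tag: str, w: int) -> list[int]:
    """Hypothetical helper: SL stream of data, counted from the first tag."""
    start = data.find(tag)
    if start < 0:
        return []
    out, count = [], 0
    for ch in data[start + 1:]:
        count += 1
        if ch == tag or count == w:
            out.append(count)
            count = 0
    return out


# The leading 'a' is the last character of "abba"; 'x' stands for any
# non-tag byte matched by the wildcard '.'.
alternatives = set()
for x, y in product("ax", repeat=2):
    alternatives.add(tuple(stride_lengths("a" + x + y + "caca", "a", 3)))

for alt in sorted(alternatives):
    print(*alt)
# 1 1 2 2
# 1 3 2
# 2 2 2
# 3 1 2
```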


System Design Principles and Challenges (3/5)

Given a byte stream T = “abcababbabccacabc”, it is first converted into the SL stream Fa(T) = 3 2 3 3 1 2. If the SL stream is matched by the StriD2FA, the input stream is then sent to the verification module, which makes an accurate match using a traditional method (e.g., the reversed DFA in [4]).
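The two-stage flow can be sketched in Python as follows. This is an illustration under assumptions, not the paper's implementation: Python's `re` module stands in for both the StriD2FA (run over the SL stream written as digits) and the verification module (the slides name a reversed DFA from [4] instead), and `stride_lengths` and `lbm_match` are hypothetical names.

```python
import re

# Original rule, used here as the stand-in verification step.
RULE = re.compile(r".*abba.{2}caca", re.DOTALL)
# SL-level rule from the example, written over the digits 1..3.
SL_RULE = re.compile(r"[123]+3(312|132|222|1122)")


def stride_lengths(data: str, tag: str, w: int) -> list[int]:
    """Hypothetical helper: SL stream of data, counted from the first tag."""
    start = data.find(tag)
    if start < 0:
        return []
    out, count = [], 0
    for ch in data[start + 1:]:
        count += 1
        if ch == tag or count == w:
            out.append(count)
            count = 0
    return out


def lbm_match(data: str) -> tuple[bool, bool]:
    """Return (candidate, confirmed) for the example rule, tag 'a', w = 3."""
    sl = "".join(str(n) for n in stride_lengths(data, "a", 3))
    candidate = SL_RULE.match(sl) is not None          # length-based pre-filter
    confirmed = candidate and RULE.search(data) is not None  # exact check
    return candidate, confirmed


print(lbm_match("abcababbabccacabc"))  # (True, True)
print(lbm_match("abcabaxxabccacabc"))  # (True, False): a false positive
```

The second input has its tags at the same positions as T but contains "axxa" instead of "abba", so the SL stream is identical and only the verification step rejects it.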


System Design Principles and Challenges (4/5)

C. Benefits of LBM

Increased speed: According to the statistics in Section VI, the average SLs of some characters are larger than 100.
Small memory consumption:
• Firstly, the number of states is generally smaller than in a traditional DFA (e.g., the StriD2FA has 5 fewer states than the traditional DFA in Figure 2).
• Secondly, the fanout of each state is bounded by the window size.


System Design Principles and Challenges (5/5)

D. Challenges

Regex converting: In Section IV, a formal method to efficiently construct a StriD2FA from any regex is described.
False positive rate


Building StriD2FAs from Regex (1/2)

1. Compile the regex to a standard DFA.
2. Restructure the DFA by classifying all the transitions.
• All labels on the transitions are removed, and each transition is marked according to whether its character is the tag (a solid transition if so, a dashed transition otherwise).


Building StriD2FAs from Regex (2/2)

3. Transform the restructured DFA into a non-deterministic StriD2FA using the depth-first search (DFS) algorithm.
• If a solid transition (pointing to state q’) is reachable in L steps, where L ≤ w, add a transition labeled L from q to q’.
• Otherwise (i.e., there is an all-dashed-transition path of length w to state q’), add a transition labeled w from q to q’.
4. Determinize to obtain the final StriD2FA (similar to determinization for a traditional DFA).
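Steps 3 and 4 can be sketched in Python as below. This is a rough illustration under several assumptions, not the paper's algorithm: the DFA for ".*abba" is written by hand over three symbol classes ('a', 'b', and 'o' for any other byte), the search expands paths level by level rather than by recursive DFS (it visits the same bounded paths), the start state is taken to be the DFA state reached after the first tag, and accepting-state handling is omitted. The names `build_stride_nfa` and `determinize` are hypothetical.

```python
def build_stride_nfa(dfa, tag, w):
    """dfa: {state: {char: next_state}} -> {state: {label: set_of_states}}."""
    nfa = {}
    for q in dfa:
        edges = {}
        frontier = {q}  # states reached from q by all-dashed paths
        for length in range(1, w + 1):
            reached = set()
            for p in frontier:
                for ch, r in dfa[p].items():
                    if ch == tag:   # solid transition: emit label `length`
                        edges.setdefault(length, set()).add(r)
                    else:           # dashed transition: keep walking
                        reached.add(r)
            frontier = reached
        if frontier:                # all-dashed path of length w
            edges.setdefault(w, set()).update(frontier)
        nfa[q] = edges
    return nfa


def determinize(nfa, start, w):
    """Subset construction over the SL alphabet {1, ..., w}."""
    table = {}
    todo = [frozenset([start])]
    while todo:
        subset = todo.pop()
        if subset in table:
            continue
        table[subset] = {}
        for label in range(1, w + 1):
            target = frozenset(r for q in subset for r in nfa[q].get(label, ()))
            if target:
                table[subset][label] = target
                todo.append(target)
    return table


# Hand-written DFA for ".*abba"; state 4 is the accepting state.
DFA = {
    0: {"a": 1, "b": 0, "o": 0},
    1: {"a": 1, "b": 2, "o": 0},
    2: {"a": 1, "b": 3, "o": 0},
    3: {"a": 4, "b": 0, "o": 0},
    4: {"a": 1, "b": 2, "o": 0},
}
nfa = build_stride_nfa(DFA, tag="a", w=3)
table = determinize(nfa, start=1, w=3)
print(sorted(nfa[1][3]))  # [0, 1, 4]
```

From state 1 (an 'a' has just been read), the label 3 leads to state 4 via "bba", but also to states 0 and 1, which is why the intermediate automaton is non-deterministic and step 4 is needed. Each row of `table` has at most w entries, in line with the fanout bound.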


Optimization of False Positive

Choosing “frequent” characters in a rule as tags can greatly reduce the false positive rate.

The frequency Freq(c, r) of a character c in a regex r is the number of occurrences of c in r divided by the sum of the lengths of all fixed substrings in r.
• Freq(c, r) = (number of occurrences of c in r) / Σ_{s ∈ Sc,r} |s|, where Sc,r is the set of fixed substrings in regex rule r and |s| is the length of string s.

The reasons for using Freq(c, r) to select tags:
• It is simple to calculate.
• With a higher Freq(c, r), the possibility of a false positive is lower because a larger part of the regex rule is checked by the chosen set of tags.
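As a small worked example, the sketch below computes Freq(c, r) for the rule ".*abba.{2}caca" used earlier in the slides. It assumes the fixed substrings are supplied by hand ("abba" and "caca") and that occurrences are counted inside those substrings only. The function name `freq` is hypothetical.

```python
def freq(c: str, fixed_substrings: list[str]) -> float:
    """Occurrences of c divided by the total length of the fixed substrings."""
    total = sum(len(s) for s in fixed_substrings)
    occurrences = sum(s.count(c) for s in fixed_substrings)
    return occurrences / total if total else 0.0


fixed = ["abba", "caca"]  # fixed substrings of ".*abba.{2}caca" (given by hand)
scores = {c: freq(c, fixed) for c in sorted(set("".join(fixed)))}
best = max(scores, key=scores.get)

print(scores)  # {'a': 0.5, 'b': 0.25, 'c': 0.25}
print(best)    # a
```

Under these assumptions 'a' has the highest frequency, which agrees with the tag chosen in the example rule.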


Evaluation (1/2)


Evaluation (2/2)
