Post on 18-Jan-2016
Advanced Regular Expression Matching for Line-Rate Deep Packet Inspection
Sailesh Kumar, Jon Turner, Michela Becchi, Patrick Crowley,
George Varghese
2 - Sailesh Kumar - 04/21/23 2 - Jon Turner - 04/21/23
Motivation
Network security applications scan packet content to detect viruses, worms, etc.
» typically use signatures common to suspicious packets
» regular expressions provide powerful, general way to describe signatures
So what’s new?
» reg-ex matching well-understood for >30 years
» reg-exes in network applications are different
– union of thousands of component patterns
– state explosion from interacting “repeat patterns”
» tight performance constraints
– wire speed processing at 10 Gb/s rates (and up)
– limited memory space
Regular Expression Refresher
Sample regular expressions
» a.*b matches ab, aab, abb, accdb, ...
» a(ab|c)+[^d] matches aabc, aca, acabb, ...
Combined pattern: (a.*b)|(a(ab|c)+[^d])
[Figure: NFA (nondeterministic finite automaton) for the combined pattern, and the equivalent DFA shown as a transition table over inputs a, b, c, d, with the subset of NFA states that each DFA state represents]
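The sample matches on this slide can be checked mechanically. The sketch below uses Python's standard `re` module and assumes "matches" means the whole string is accepted; the helper name `all_match` is made up for illustration.

```python
import re

# Sample patterns and example strings copied from the slide.
SAMPLES = {
    r"a.*b": ["ab", "aab", "abb", "accdb"],
    r"a(ab|c)+[^d]": ["aabc", "aca", "acabb"],
}

# The union pattern accepts anything either component accepts.
UNION = r"(a.*b)|(a(ab|c)+[^d])"


def all_match(pattern, strings):
    """Return True if every string is matched in full by the pattern."""
    return all(re.fullmatch(pattern, s) is not None for s in strings)


if __name__ == "__main__":
    for pattern, strings in SAMPLES.items():
        print(pattern, all_match(pattern, strings))
    everything = [s for strings in SAMPLES.values() for s in strings]
    print(UNION, all_match(UNION, everything))
```

A backtracking engine such as `re` is only a convenient way to confirm the examples; it is not the automaton-based matching the rest of the talk is about.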
Challenges for Intrusion Detection
Hundreds to thousands of patterns
» many fairly simple, but not all
» significant number include “repeats” with infinite or bounded iteration
Large space requirements
» DFA formed by combining patterns may require many more states than NFA
» for ASCII inputs, tabular representation of DFAs can be very large
Demanding real-time requirements
» 1 or 2 off-chip memory accesses per input character
Must maintain state across many (>100K) flows
» constrains affordable per-flow context
Three-Way Tradeoff
Memory space
» on-chip vs. off-chip
» pattern matching automata and flow state
Parallelism
» hardware solutions allow substantial parallelism
» in NPs, parallelism more limited
» more parallelism reduces automata space, increases flow state
[Figure: three-way tradeoff among throughput, space, and parallelism]
Problems Addressed
Reducing space used by DFAs
» typical tabular DFA is highly redundant
– states share many common successors
» reduce redundancy using default transitions
– trades off space for throughput
Making it compact and fast
» choose default transitions for amortized performance
» use content-addressing to skip over default transitions
Coping with state space explosion
» process flows that stay in shallow states separately from flows that “go deep”
– fast-path/slow-path processing
Delayed Input Finite Automata (D2FA)
In tabular DFA representation
» for ASCII characters, 256 transitions per state
» 50+ distinct transitions per state in real world datasets
» need storage for 50+ edges
But, many states share similar sets of edges
Note that states 1 and 3 have common transitions for symbols a, b, d. Can we exploit this redundancy to reduce space?
[Figure: five-state DFA for the three patterns a+, b+c, c*d+ over symbols a, b, c, d, with 4 transitions per state]
Default Transitions
If δ(s1,a) = δ(s2,a) and δ(s1,b) = δ(s2,b),
» can replace explicit transitions δ(s1,a), δ(s1,b) with default transition from s1 to s2 (or could go other way)
» when parsing input, follow default transition when no outgoing transition defined on input character
» no input consumed when following default transition
[Figure: the example DFA before and after shared transitions are replaced by a default transition]
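The lookup rule described on this slide can be sketched in a few lines of Python. The three-state automaton, the `labeled` and `default` tables, and the function names are hypothetical and chosen only to show the mechanics; they are not the DFA from the figure.

```python
# Hypothetical 3-state D2FA over the alphabet {a, b, c}.
# labeled[s] holds only the transitions stored explicitly for state s;
# default[s] is the state to fall back to when s has no entry for a symbol.
labeled = {
    0: {"a": 1, "b": 2, "c": 0},   # root: fully specified, no default
    1: {"b": 0},                   # differs from state 0 only on 'b'
    2: {"c": 1},                   # differs from state 0 only on 'c'
}
default = {1: 0, 2: 0}


def step(state, symbol):
    """Consume one symbol; return (next_state, memory_accesses)."""
    accesses = 1
    while symbol not in labeled[state]:
        state = default[state]     # no input consumed on a default transition
        accesses += 1
    return labeled[state][symbol], accesses


def run(start, text):
    """Feed text through the automaton; return (final_state, total_accesses)."""
    state, total = start, 0
    for ch in text:
        state, n = step(state, ch)
        total += n
    return state, total


if __name__ == "__main__":
    print(run(0, "abca"))   # every symbol found directly
    print(run(0, "ba"))     # 'a' in state 2 needs one default hop
```

Counting accesses per symbol makes the space/throughput tradeoff visible: each default hop saves stored transitions but costs one extra state fetch.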
Selecting Default Transitions
[Figure: the example DFA; an alternate (and better) choice of default transitions; the space reduction graph, whose edge weights (2 or 3) give the potential savings; and its max wt spanning tree, with tree edges directed towards the chosen root. Edge-count label as extracted: “209edges”]
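A minimal sketch of the two steps named in the figure, building the space reduction graph and taking a maximum weight spanning tree. The four-state DFA is hypothetical, and the edge weight used here is simply the number of symbols on which two states agree, which is an assumption about how "potential savings" is measured.

```python
from itertools import combinations

# Hypothetical 4-state DFA over {a, b, c, d}: dfa[state][symbol] -> next state.
dfa = {
    1: {"a": 1, "b": 2, "c": 3, "d": 4},
    2: {"a": 1, "b": 2, "c": 3, "d": 2},
    3: {"a": 1, "b": 3, "c": 3, "d": 4},
    4: {"a": 4, "b": 2, "c": 3, "d": 4},
}


def space_reduction_graph(dfa):
    """Edge {u, v} gets weight = number of symbols on which u and v agree."""
    edges = []
    for u, v in combinations(sorted(dfa), 2):
        w = sum(1 for sym in dfa[u] if dfa[u][sym] == dfa[v][sym])
        if w > 0:
            edges.append((w, u, v))
    return edges


def max_weight_spanning_tree(states, edges):
    """Kruskal's algorithm, taking the heaviest edges first."""
    parent = {s: s for s in states}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:                        # skip edges that would close a cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


if __name__ == "__main__":
    tree = max_weight_spanning_tree(dfa, space_reduction_graph(dfa))
    print(tree, sum(w for w, _, _ in tree))
```

Choosing a root and directing each tree edge towards it then gives one default transition per non-root state.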
Trading off Time and Space
Sort edges in space-reduction graph by length
For each edge, add to “forest” so long as does not create cycle or create tree with excessive diameter
Choose root for each tree at “most central node”
Direct default transitions towards roots
sorted edge list: {1,2} {4,5} {1,5} {2,4} {1,4} {2,5} {1,3} {3,5} {3,4} {2,3}
[Figure: space reduction graph with edge weights 2 and 3, and the D2FA that results with diameter bound 2]
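The forest-building step can be sketched as follows, using the sorted edge list from the slide as input. The function names are illustrative, the diameter is measured in edges, and root selection is left out; this is one straightforward way to apply the rule, not necessarily the authors' implementation.

```python
from collections import deque


def tree_diameter(adj, start):
    """Diameter (in edges) of the tree containing start, via two BFS passes."""
    def farthest(src):
        dist = {src: 0}
        queue = deque([src])
        last = src
        while queue:
            node = queue.popleft()
            last = node                  # BFS dequeues in nondecreasing distance
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return last, dist[last]

    end, _ = farthest(start)
    _, diameter = farthest(end)
    return diameter


def connected(adj, u, v):
    """True if u and v are already in the same tree."""
    seen, queue = {u}, deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def bounded_forest(states, sorted_edges, bound):
    """Add edges in order unless they close a cycle or exceed the diameter bound."""
    adj = {s: set() for s in states}
    chosen = []
    for u, v in sorted_edges:
        if connected(adj, u, v):
            continue                     # would create a cycle
        adj[u].add(v)
        adj[v].add(u)                    # tentatively add the edge
        if tree_diameter(adj, u) > bound:
            adj[u].discard(v)
            adj[v].discard(u)            # tree too deep: undo
        else:
            chosen.append((u, v))
    return chosen


# Sorted edge list as given on the slide.
EDGES = [(1, 2), (4, 5), (1, 5), (2, 4), (1, 4), (2, 5),
         (1, 3), (3, 5), (3, 4), (2, 3)]

if __name__ == "__main__":
    print(bounded_forest(range(1, 6), EDGES, bound=2))
```

With bound 2 the sketch keeps {1,2}, {4,5} and {1,3}, leaving two trees; a looser bound lets {1,5} join them into one.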
Sample Results
Sample data set of 612 regular expressions
Original DFA has 11.3K states, 2.3M transitions
Transitions in D2FA
» with no depth bound, 0.75% of original
» with depth bound of 5, 1.07%
» with depth bound of 2, 2.54%
» with depth bound of 1, 20.70%
Depth bound of d implies d+1 memory accesses per input character
Representing D2FA
95% of states have ≤2 outgoing transitions
Represent states with few transitions using list
Represent others with vector (for direct access)
[Figure: list and vector state representations]
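One way to picture the two representations is the sketch below. The threshold of 2, the 256-entry vector, the use of integer byte values as symbols, and the function names are all assumptions made for illustration.

```python
def encode_state(transitions, alphabet_size=256, list_limit=2):
    """Use a short list for sparse states, a direct-indexed vector otherwise.

    transitions maps a symbol (integer byte value) to a next state.
    """
    if len(transitions) <= list_limit:
        return ("list", sorted(transitions.items()))
    vector = [None] * alphabet_size
    for sym, nxt in transitions.items():
        vector[sym] = nxt
    return ("vector", vector)


def lookup(encoded, sym):
    """Return the labeled next state, or None if the default must be followed."""
    kind, data = encoded
    if kind == "list":
        for s, nxt in data:              # at most list_limit comparisons
            if s == sym:
                return nxt
        return None
    return data[sym]                     # single indexed access


if __name__ == "__main__":
    sparse = encode_state({ord("a"): 1})
    dense = encode_state({ord("a"): 1, ord("b"): 2, ord("c"): 3})
    print(sparse[0], lookup(sparse, ord("a")))
    print(dense[0], lookup(dense, ord("b")))
```

The list form costs a couple of comparisons but stores only what is present; the vector form spends space to keep lookup a single indexed read.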
Changing Performance Criteria
Real objective is bounded time per packet
» amortized complexity, not worst-case
» earn “credit” for every normal transition
» “spend” a credit for each default transition
» choose default transitions to guarantee never in debt
Simple way to ensure ≥0 credits
» label states according to distance from start state
» restrict default transitions to go from larger labels to smaller
» bonus – simpler computation
– perform breadth-first search
– at each node, select best edge allowed for default transition
≤2 memory accesses per character
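[Figure: example states labeled with their distance from the start state (0, 1, 1, 1, 2), with default transitions going from larger labels to smaller]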
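The depth-labeling rule can be sketched as below. The four-state DFA is hypothetical, and "best edge" is taken to mean the shallower state sharing the most transitions, with at least two shared so that the default actually saves space; both are assumptions. Since a labeled transition raises the depth label by at most 1 and every default transition lowers it by at least 1, default hops can never outnumber consumed characters.

```python
from collections import deque

# Hypothetical DFA over {a, b, c}; state 0 is the start state.
dfa = {
    0: {"a": 1, "b": 2, "c": 0},
    1: {"a": 1, "b": 3, "c": 0},
    2: {"a": 1, "b": 2, "c": 0},
    3: {"a": 1, "b": 2, "c": 3},
}


def depths(dfa, start):
    """Breadth-first distance of each reachable state from the start state."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in dfa[s].values():
            if t not in depth:
                depth[t] = depth[s] + 1
                queue.append(t)
    return depth


def choose_defaults(dfa, start):
    """For each state, pick the strictly shallower state sharing most transitions."""
    depth = depths(dfa, start)
    default = {}
    for s in depth:
        best, best_shared = None, 1      # require at least 2 shared transitions
        for t in depth:
            if depth[t] < depth[s]:
                shared = sum(1 for c in dfa[s] if dfa[s][c] == dfa[t][c])
                if shared > best_shared:
                    best, best_shared = t, shared
        if best is not None:
            default[s] = best
    return depth, default


if __name__ == "__main__":
    print(choose_defaults(dfa, 0))
```

No spanning tree is needed here, which is the "simpler computation" bonus: one breadth-first pass plus a local choice at each state.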
How Well Does It Work?
On a typical set of patterns
» number of transitions reduced to 1% of original
» depth-bounded D2FA with bound of 1 requires 20%
Can extend to reduce number of accesses
» default transitions from depth d states to depth ≤d–k
» at most (k+1)/k memory accesses per input character
– so for k=3, 1.33 accesses per char
» number of transitions, usage relative to original
– for k=2, 1.8%
– for k=3, 5.5%
– for k=4, 11.6%
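The (k+1)/k bound follows from a short credit argument. As a sketch, assume depth is the breadth-first distance from the start state, matching starts at depth 0, n input characters are consumed, and m default transitions are followed (n and m are symbols introduced here). Each labeled transition raises the depth by at most 1, and each default transition lowers it by at least k, so:

```latex
\begin{aligned}
0 \le \text{depth}_{\text{final}} &\le n - k\,m
  \;\Longrightarrow\; m \le \frac{n}{k},\\
\text{accesses} = n + m &\le n\left(1 + \frac{1}{k}\right) = \frac{k+1}{k}\,n,
\qquad k = 3:\ \tfrac{4}{3} \approx 1.33 .
\end{aligned}
```

Setting k=1 recovers the bound of 2 accesses per character for the basic depth-labeled scheme.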
Content Addressing
For nodes with default transitions,
» store selected “content” with predecessors
» predecessors use content to skip over default transitions
Potential for collisions
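[Figure: state R, with U (explicit transitions on a, b) and V (explicit transitions on c, d) on a default-transition path to R; predecessors X, Y, Z hold the content labels f/R, g/R,ab and h/R,ab,cd]
For label R,ab: if next input ∉ {a,b} goto R, else goto hash(R,ab)=U
For label R,ab,cd: if next input ∉ {a,b,c,d} goto R, else if next input ∉ {c,d} goto hash(R,ab)=U, else goto hash(R,abcd)=V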
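The decision a predecessor makes from a content label can be sketched as follows. The dictionary `TABLE` stands in for the hashed state memory, and the label encoding (a root plus a list of symbol groups, nearest the root first) is an assumption made for illustration.

```python
# Stand-in for hashed state memory: (root, content) -> state stored there.
TABLE = {("R", "ab"): "U", ("R", "abcd"): "V"}


def resolve(root, groups, symbol):
    """Pick the state to fetch for symbol, skipping useless default hops.

    groups lists, from nearest the root outward, the symbols on which each
    state along the default path has explicit transitions.
    """
    for i in range(len(groups), 0, -1):      # deepest state first
        if symbol in groups[i - 1]:
            content = "".join(groups[:i])
            return TABLE[(root, content)]
    return root    # no state on the default path handles this symbol


if __name__ == "__main__":
    for ch in "acx":
        print(ch, resolve("R", ["ab", "cd"], ch))
```

The deepest group is tested first because a state's own explicit transition takes precedence over anything reached through its default transition.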
Collisions in Content Addressing
Addressing conflicts must be resolved
» in example, X and Y must go to different next states U and V, but would normally both use hash(R,ab)
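[Figure: X (label g/R,ab) and Y (label h/R,ab) lead to different states U and V, each with explicit transitions on a and b and a default transition to R]
Solution 1, use hash(R,ba) to reach V
Solution 2, add discriminator bits to both hashes
[Figure: relabeled predecessors, h/R,ba for solution 1; h/R,ab101 and g/R,ab011 for solution 2]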
Selecting Content Addresses
For each state
» list possible content addresses
» compute hash for each
Construct bipartite graph
» states at left
» storage locations at right
» edges from states to possible storage locations
Construct perfect matching
» easy to do when enough choices (and usually, there are)
» add discriminator bits to get more choices
» or, add extra storage locations
[Figure: bipartite graph of states and storage locations, with edges for the candidate content addresses ab0, ab1, ba0, ba1]
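The matching step can be sketched with a standard augmenting-path algorithm (often attributed to Kuhn). The candidate slot numbers are hypothetical stand-ins for hash values, and the function name is illustrative.

```python
def match_states_to_slots(candidates):
    """Assign each state a distinct storage slot using augmenting paths.

    candidates: dict state -> list of slots its content addresses hash to.
    Returns dict state -> slot if every state can be placed, else None.
    """
    owner = {}   # slot -> state currently assigned to it

    def try_place(state, visited):
        for slot in candidates[state]:
            if slot in visited:
                continue
            visited.add(slot)
            # Take a free slot, or evict the owner if it can move elsewhere.
            if slot not in owner or try_place(owner[slot], visited):
                owner[slot] = state
                return True
        return False

    for state in candidates:
        if not try_place(state, set()):
            return None      # needs more choices or more storage locations
    return {state: slot for slot, state in owner.items()}


if __name__ == "__main__":
    # U could live in slot 5 or 9; V only hashes to slot 5.
    print(match_states_to_slots({"U": [5, 9], "V": [5]}))
    print(match_states_to_slots({"U": [5], "V": [5]}))
```

When the matching fails, the slide's two remedies both amount to adding edges or right-hand vertices to the graph: discriminator bits give each state more candidate slots, and extra storage gives more slots overall.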
Coping with State Explosion
Large pattern sets can produce DFAs with too many states
» even after conversion to D2FA, space can be impractically large
» one solution: partition patterns and form several DFAs or D2FAs
– greatly reduces number of states
– but requires processing each packet multiple times
Observation:
» well-behaved flows rarely visit states far from start state
Fast-path/slow-path
» fast path for “shallow states”
» slow path handles suspect flows
Example pattern set: (ab.*c)|(ac.*b)|(ba.*a)
[Figure: NFA for the pattern set, and 1 of the 3 per-pattern DFAs (total 12 states)]
resulting DFA has 20 states, but state count nearly doubles with each additional pattern
Sample Fast Path Construction
Start with k DFAs for slow path
Construct vector-DFA that tracks states of smaller DFAs
» cut off when past target depth
» or, cut off based on probability of good flow reaching given state
[Figure: DFAs for ab.*c, ac.*b and ba.*a, and the fast path DFA built from them: its transition table over inputs a, b, c, the state vector of component-DFA states behind each fast path state, and each state's depth]
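The depth-based cutoff can be sketched as a breadth-first product construction. The four-state component DFAs produced by `search_dfa` are a hypothetical reading of the three patterns over the alphabet {a, b, c}, and `None` is used here to mark "hand the flow to the slow path"; neither is taken from the slide's table.

```python
from collections import deque

ALPHABET = "abc"


def search_dfa(first, second, last):
    """Hypothetical 4-state DFA for the pattern: first second .* last."""
    def delta(state, ch):
        if state == 0:
            return 1 if ch == first else 0
        if state == 1:
            if ch == second:
                return 2
            return 1 if ch == first else 0
        if state == 2:
            return 3 if ch == last else 2
        return 3                             # accepting state is absorbing
    return delta


def build_fast_path(deltas, max_depth):
    """BFS over state vectors; vectors deeper than max_depth are cut off.

    Returns (table, depth). table[vec][ch] is the next state vector, or
    None when the flow should be handed to the slow path.
    """
    start = tuple(0 for _ in deltas)
    depth = {start: 0}
    table = {}
    queue = deque([start])
    while queue:
        vec = queue.popleft()
        table[vec] = {}
        for ch in ALPHABET:
            nxt = tuple(d(s, ch) for d, s in zip(deltas, vec))
            if nxt not in depth:
                if depth[vec] + 1 > max_depth:
                    table[vec][ch] = None    # past target depth
                    continue
                depth[nxt] = depth[vec] + 1
                queue.append(nxt)
            table[vec][ch] = nxt
    return table, depth


if __name__ == "__main__":
    deltas = [search_dfa("a", "b", "c"),
              search_dfa("a", "c", "b"),
              search_dfa("b", "a", "a")]
    table, depth = build_fast_path(deltas, max_depth=1)
    for vec in table:
        print(vec, depth[vec], table[vec])
```

Raising `max_depth` grows the fast path table and sends fewer flows to the slow path; the probability-based cutoff mentioned above would replace the depth test with an estimate of how likely a good flow is to reach each vector.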
Fast Path/Slow Path Operation
Flows processed by fast path as long as stay in shallow states
Slow path flows processed by multiple DFAs
» takes more time per packet
» keep more state between packets
Return to fast path after enough time in shallow states
Mitigating DoS attack
» attacker can interfere with good flows in slow path by sending lots of slow path traffic
» per flow queues in slow path can help, but not complete solution
» adjust priority of flows based on time spent in slow path
[Figure: fast path and slow path, each with its own state memory]
Simulation of DoS Mitigation
Constant attack traffic – adjust time spent in deep states
[Figure: three panels plotted against time (seconds) under no overload, moderate overload, and extreme overload: throughput of good flows with no DoS mitigation; slow path load relative to the slow path's threshold; throughput of good flows with DoS mitigation]
Summary
Reducing space needed for reg-ex matching
» D2FAs use default transitions joining similar states
» constraining default transitions to go to shallower states ensures good amortized performance
» content addressing for skipping over default transitions
Coping with state explosion
» slow path processes packets through k small DFAs
» fast path processes packets using DFA on shallow states
» requires DoS mitigation to deal with attacks on slow path
Other issues
» bounded iteration causes excessive growth in state table
» requires systematic use of counters
– state vector containing control state plus counter values
– state machine transitions depend on & manipulate counters