1
String Matching of Bit Parallel Suffix Automata
2
Suffix Automata
Based on the Directed Acyclic Word Graph (DAWG), whose states group equivalent factors (those with the same endpos set), so equivalent suffixes are easy to compare
Nondeterministic suffix automaton → deterministic suffix automaton by subset construction
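A minimal Python sketch of the subset construction named above. It assumes the usual nondeterministic suffix automaton of a string x: states 0..m, a transition i → i+1 on x[i], every state initial, and state m final. Each subset it produces is the endpos set of the string read so far. The names `suffix_dfa` and `run` are illustrative, not from the slides.

```python
from collections import deque


def suffix_dfa(x):
    """Determinize the nondeterministic suffix automaton of x (subset construction).

    NFA: states 0..m, transition i --x[i]--> i+1, every state initial, state m final.
    Returns (start, delta, states, finals); delta maps (subset, char) -> subset.
    """
    m = len(x)
    start = frozenset(range(m + 1))          # all NFA states are active at the start
    delta = {}
    states, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for c in sorted(set(x)):
            # Apply the NFA move i -> i+1 on c to every state in the subset.
            t = frozenset(i + 1 for i in s if i < m and x[i] == c)
            if not t:
                continue                     # empty subset is the dead state: leave undefined
            delta[(s, c)] = t
            if t not in states:
                states.add(t)
                queue.append(t)
    finals = {s for s in states if m in s}   # subsets containing NFA state m accept
    return start, delta, states, finals


def run(start, delta, u):
    """Subset reached after reading u, or None if u is not a factor of x."""
    s = start
    for c in u:
        s = delta.get((s, c))
        if s is None:
            return None
    return s


start, delta, states, finals = suffix_dfa("baabbaa")
print(len(states))                        # 9
print(sorted(run(start, delta, "aa")))    # [3, 7], i.e. endpos(aa)
```

For x = baabbaa the reachable subsets are exactly the endpos sets of its factors, which is why the determinized automaton can stand in for the DAWG.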
3
Suffix Automata Search
Also called Backward DAWG Matching (BDM)
Build an automaton recognizing the factors x of the pattern p, read backward (the factors of p^r)
endpos(x): the set of all pattern positions where an occurrence of x ends. Example: for the pattern baabbaa, endpos(aa) = {3, 7}
Safe shift when the string read no longer occurs in the pattern
The window moves over the text from left to right
When it fails to match a factor, shift the window
Window size = pattern length
4
BDM Algorithm
Build the automaton
Reaching a final state means a prefix of p has been recognized
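The BDM loop described above (read the window right to left, remember last, shift safely) can be sketched in Python as below. This is a minimal sketch, not the slides' own pseudocode. It assumes the automaton of p^r is built by the same subset construction and that window positions are 0-based. The function names are hypothetical.

```python
from collections import deque


def reverse_suffix_dfa(p):
    """Deterministic suffix automaton of p^r, built by subset construction.

    Returns (start, delta, finals) where delta maps subset -> {char: subset}.
    """
    x = p[::-1]
    m = len(x)
    start = frozenset(range(m + 1))
    delta, queue = {start: {}}, deque([start])
    while queue:
        s = queue.popleft()
        for c in set(x):
            t = frozenset(i + 1 for i in s if i < m and x[i] == c)
            if t:
                delta[s][c] = t
                if t not in delta:
                    delta[t] = {}
                    queue.append(t)
    finals = {s for s in delta if m in s}    # the string read is a suffix of p^r
    return start, delta, finals


def bdm_search(p, text):
    """Backward DAWG Matching; returns 0-based start positions of p in text."""
    m, n = len(p), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    start, delta, finals = reverse_suffix_dfa(p)
    occurrences, pos = [], 0
    while pos <= n - m:                      # window is text[pos:pos + m]
        state, j, last = start, m, m
        # Read the window right to left while the string read is a factor of p^r.
        while j > 0 and text[pos + j - 1] in delta[state]:
            state = delta[state][text[pos + j - 1]]
            j -= 1
            if state in finals:              # text[pos + j:pos + m] is a prefix of p
                if j > 0:
                    last = j                 # longest prefix seen so far starts at offset j
                else:
                    occurrences.append(pos)  # the whole window matched
        pos += last                          # safe shift
    return occurrences


print(bdm_search("aabbaab", "abbabaabbaab"))  # [5]
```

Shifting by last is safe because any occurrence starting inside the window would have made the automaton pass through a final state at that offset.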
5
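Suffix Automata Search Example
1. Build the deterministic suffix automaton of the reversed pattern p^r
2. Use endpos(x) to find a factor
3. If no factor is found, do a safe shift
In the steps below, p = aabbaab (so p^r = baabbaa) and T = abbabaabbaab; brackets mark the current window, and the part after the space inside them has already been read, right to left.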
6
1. T = [abbaba a]bbaab: a is a factor of p^r and a reversed prefix of p, so last = 6
[Figure: deterministic suffix automaton of p^r = baabbaa; states are labelled by NFA-state subsets {0,...,7}, {1,4,5}, {2,3,6,7}, {2,6}, {3,7}, {4}, {5}, {6}, {7}]
7
2. T = [abbab aa]bbaab: aa is a factor of p^r and a reversed prefix of p, so last = 5
8
3. T = [abba baa]bbaab: aab is a factor of p^r
9
4. T = [abb abaa]bbaab: we fail to recognize the next a, so we shift the window by last = 5 and search again at T = abbab[aabbaab], with last = 7
10
5. T = abbab[aabbaa b]: b is a factor of p^r
11
6. T = abbab[aabba ab]: ba is a factor of p^r
12
7. T = abbab[aabb aab]: baa is a factor of p^r and a reversed prefix of p, so last = 4
13
8. T = abbab[aab baab]: baab is a factor of p^r
14
9. T = abbab[aa bbaab]: baabb is a factor of p^r
15
10. T = abbab[a abbaab]: baabba is a factor of p^r
16
11. T = abbab[aabbaab]: we recognize the word aabbaab and report an occurrence.
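To replay the eleven steps above, the sketch below records (window start, string read so far, last) after each character that BDM accepts, taking p = aabbaab and T = abbabaabbaab from the walkthrough. As a labelled shortcut, the factor test uses Python substring search on p^r in place of the automaton; it answers the same yes/no question at each step. Positions are 0-based, so the second window starts at 5. `bdm_trace` is a hypothetical name.

```python
def bdm_trace(p, text):
    """Replay BDM on text, recording (window start, string read, last) per accepted char.

    Substring tests on p^r stand in for the suffix automaton: both answer
    "is the string read so far a factor of p^r?".
    """
    if not p:
        raise ValueError("pattern must be non-empty")
    pr, m, n = p[::-1], len(p), len(text)
    steps, occurrences, pos = [], [], 0
    while pos <= n - m:
        j, last, read = m, m, ""
        while j > 0 and read + text[pos + j - 1] in pr:
            read += text[pos + j - 1]
            j -= 1
            if pr.endswith(read):              # read is a reversed prefix of p
                if j > 0:
                    last = j
                else:
                    occurrences.append(pos)    # whole window recognized
            steps.append((pos, read, last))
        pos += last                            # safe shift by last
    return steps, occurrences


steps, occurrences = bdm_trace("aabbaab", "abbabaabbaab")
for step in steps:
    print(step)
print(occurrences)   # [5]
```

The first three records match steps 1 to 3, the failure on the next a is step 4, and the remaining seven records match steps 5 to 11.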
17
BNDM Algorithm
Backward Nondeterministic DAWG Matching (BNDM)
Handles classes of characters, multiple patterns, and errors
Uses bit parallelism, combining Shift-Or and BDM
20% to 25% faster than BDM, 10% to 40% faster than BM
Update function
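A minimal Python sketch of BNDM, assuming a convention in which bit m-1-i of D stands for pattern position i, so the top bit means a prefix of p has been read. The update D = (D << 1) & B[c] is split into an AND before the prefix test and a shift after it. Python integers serve as the bit vector, so this shows the logic rather than the speed; `bndm_search` is a hypothetical name.

```python
def bndm_search(p, text):
    """Backward Nondeterministic DAWG Matching; returns 0-based match starts.

    Bit m-1-i of D stands for pattern position i, so the top bit means
    "the string read so far is a prefix of p".
    """
    m, n = len(p), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask, top = (1 << m) - 1, 1 << (m - 1)
    B = {}
    for i, c in enumerate(p):
        B[c] = B.get(c, 0) | (1 << (m - 1 - i))
    occurrences, pos = [], 0
    while pos <= n - m:
        j, last, D = m, m, mask               # D = 1^m: every position is active
        while D:
            D &= B.get(text[pos + j - 1], 0)  # keep positions that match the new char
            j -= 1
            if D & top:                       # a prefix of p has been read
                if j > 0:
                    last = j
                else:
                    occurrences.append(pos)   # the whole window matched
            D = (D << 1) & mask               # position i becomes i-1; position 0 drops out
        pos += last
    return occurrences


print(bndm_search("aabbaab", "abbabaabbaab"))  # [5]
```

After m characters only position 0 can survive, so the final shift empties D and the inner loop never reads outside the window.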
18
BNDM Algorithm
19
BNDM Example
20
BNDM Example
21
BNDM Further Improvement
Handle long patterns: partition the pattern p into subpatterns p_i, build an array of D and B, and process each part with the basic algorithm; if p_i is found, then process p_{i+1} …
Handle classes: modify only the B table, so that the ith bit is set for all characters belonging to the ith position of the pattern
Multiple patterns, two methods:
Interleave the patterns and shift r bits on each D update
Just concatenate them, shift 1 bit, but modify the update to D = (D << 1) & (1^{m-1}0)^r, where r is the number of patterns
Approximate matching: use Wu's method
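The concatenation method for multiple patterns can be sketched as follows, assuming all r patterns share the same length m (the sketch rejects other input). Bit r·m-1-i of D stands for position i of the concatenated patterns. After each shift, the mask (1^{m-1}0)^r clears the low bit of every m-bit block, so a bit leaving one pattern's first position cannot leak into the previous pattern's last position. `bndm_multi` is a hypothetical name.

```python
def bndm_multi(patterns, text):
    """Report (start, pattern index) for every occurrence, scanning like BNDM.

    Assumes all patterns have the same length m; D holds r*m bits, one
    m-bit block per pattern, built from the concatenated patterns.
    """
    r = len(patterns)
    m = len(patterns[0]) if patterns else 0
    if m == 0 or any(len(q) != m for q in patterns):
        raise ValueError("need non-empty patterns of equal length")
    width = r * m
    B = {}
    for i, c in enumerate("".join(patterns)):
        B[c] = B.get(c, 0) | (1 << (width - 1 - i))
    block = ((1 << m) - 1) & ~1                            # 1^{m-1}0 for one block
    shift_mask = sum(block << (k * m) for k in range(r))   # (1^{m-1}0)^r
    tops = [1 << ((r - k) * m - 1) for k in range(r)]      # prefix bit of pattern k
    occurrences, pos, n = [], 0, len(text)
    while pos <= n - m:
        j, last, D = m, m, (1 << width) - 1
        while D:
            D &= B.get(text[pos + j - 1], 0)
            j -= 1
            hits = [k for k in range(r) if D & tops[k]]
            if hits:
                if j > 0:
                    last = j                  # some pattern prefix starts here
                else:
                    occurrences.extend((pos, k) for k in hits)
            D = (D << 1) & shift_mask         # D = (D << 1) & (1^{m-1}0)^r
        pos += last
    return occurrences


print(bndm_multi(["aabbaab", "baabbaa"], "abbabaabbaab"))  # [(4, 1), (5, 0)]
```

Because last is updated whenever any pattern's prefix is recognized, the shift stays safe for all r patterns at once.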
22
Performance Comparison
[Table: search times, in hundredths of a second per megabyte]
23
References
Gonzalo Navarro and Mathieu Raffinot. A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching. In M. Farach (editor), Proc. CPM'98, LNCS 1448, pages 14-33, 1998.
Gonzalo Navarro and Mathieu Raffinot. Fast and Flexible String Matching by Combining Bit-Parallelism and Suffix Automata. 1998.
24
Reverse Pattern?