Approximate Point Set Pattern Matching on Sequences and Planes
Tomoaki Suga, Shinichi Shimozono*
Kyushu Inst. of Tech., Fukuoka, Japan
Point Set Pattern Matching
Text: A set of points in, e.g., a plane
Pattern: A small set of points
Task: Find an occurrence of the pattern as a subset
[Figure: point sets labeled TEXT and PATTERN]
Approximate Point Set Matching in Practice: Examples
Analysis of 2D electrophoresis images: a set of spots on a gel media plane
Searching digital music scores by melody: ringer melodies, Internet contents, online “Kara-Oke”
Literature
Exact matching in d dimensions: geometric algorithm by P. J. de Rezende & D. T. Lee, '95
Translation, scaling, and rotation in O(nm^d)
Allowing local distortions: heuristics and hardness by Akutsu et al., '99 … NP-hard even in 1D matching
Approximate matching of point sequences: no skips, O(nm) time by V. Makinen, '01
Allowing substitution in O(nm^3) time
Extension to 2-dimensional matching is NP-hard
Our Results
Approximate point set pattern matching in 1D: the pattern matches as a subset; extends Makinen et al.
Simple fast algorithm for the O(nm^2) task, under a reasonable assumption on sequences in practice
Algorithm that guarantees O(nm) time: linear in the text size, by average-constant-time minimum queries
Four-Russians speed-up: observation connected to string matching
2D approximate point set pattern matching with a polynomial-time algorithm
1D Matching as a Target
As a basis of practical problems: the axes of 2D electrophoresis images are independent
Points in higher dimensions that have a primary axis (sort order) … e.g., 3D structure of proteins
Musical score search: pitch error (tone deafness) is usually fatal
Exact matching in rhythm/timing is impractical, but timing is indispensable for distinguishing melodies
Point Set Matching in 1D
Text and Pattern: Strictly increasing sequences of Integers
An Occurrence of the Pattern: A Subsequence of the Text
$T = (t_1, \ldots, t_m)$
$P = (p_1, \ldots, p_n)$
$T' = (t_{l(1)}, \ldots, t_{l(n)})$
Edit Distance for Point Set Approximate Matching
Distance between two same size sequences:
$d(P, Q) = \sum_{i=2}^{n} \left| (p_i - p_{i-1}) - (q_i - q_{i-1}) \right|$
[Figure: two point sequences P and Q of the same size]
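The distance on this slide compares consecutive gaps rather than absolute positions, so shifting one sequence does not change it. A minimal Python sketch, assuming the formula reads as the sum over i = 2..n of |(p_i - p_{i-1}) - (q_i - q_{i-1})|; the name gap_distance is hypothetical and not from the slides:

```python
def gap_distance(P, Q):
    """Sum of absolute differences between consecutive gaps of P and Q.

    Assumes P and Q are strictly increasing sequences of the same size,
    as on the slide. The function name is illustrative only.
    """
    if len(P) != len(Q):
        raise ValueError("sequences must have the same size")
    return sum(
        abs((P[i] - P[i - 1]) - (Q[i] - Q[i - 1]))
        for i in range(1, len(P))
    )


if __name__ == "__main__":
    print(gap_distance([0, 2, 5], [0, 3, 5]))  # gaps (2, 3) vs (3, 2)
    print(gap_distance([0, 2, 5], [1, 3, 6]))  # a pure shift
```

Since only gaps enter the sum, an occurrence that is a translated copy of the pattern has distance 0.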
Approximate Matching and Recurrence
D(i,j) = Distance between First i Points of Pattern and best Occurrence of it in Text ending at j
$D(i, j) = \min_{i-1 \le k < j} \left\{ D(i-1, k) + \left| (p_i - p_{i-1}) - (t_j - t_k) \right| \right\}$
D(i-1, k): distance for the prefix sequences shorter by one point
|(p_i - p_{i-1}) - (t_j - t_k)|: difference between the last two gaps
D(n,m) can be obtained by tabular computation … in O(nm^2) time
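The tabular computation can be sketched directly from the recurrence. This is a sketch under assumptions the slide does not state: 0-based indices, a first row of zeros (a single pattern point matches any text point at no cost), and infinity for cells where the text prefix is too short. The name naive_table is hypothetical.

```python
import math


def naive_table(P, T):
    """Fill D[i][j] by the recurrence, trying every k for each cell.

    D[i][j] is the distance between the first i+1 pattern points and the
    best occurrence ending at text index j (0-based). Three nested loops
    give O(n * m^2) time for pattern size n and text size m.
    """
    n, m = len(P), len(T)
    D = [[math.inf] * m for _ in range(n)]
    for j in range(m):
        D[0][j] = 0  # assumed base case: one point matches anywhere
    for i in range(1, n):
        gap = P[i] - P[i - 1]
        for j in range(i, m):
            for k in range(i - 1, j):
                cost = D[i - 1][k] + abs(gap - (T[j] - T[k]))
                if cost < D[i][j]:
                    D[i][j] = cost
    return D


if __name__ == "__main__":
    table = naive_table([0, 2, 5], [1, 3, 4, 6, 9])
    print(table[-1])       # distances of best occurrences ending at each j
    print(min(table[-1]))  # best occurrence overall
```

The minimum of the last row gives the best occurrence anywhere in the text; the entry in column j restricts the occurrence to end at text point j.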
“Finite Resolution” Assumption on a Class of Sequences
The ratio of distances between two contiguous points is bounded
Examples: spots observed as stains on a small gel media plane; 450 ticks per second in typical MIDI sequences
The modified algorithm runs in O(nm) time if the sequences have finite resolution
The third (innermost) iteration can be finished in constant time …
[Figure: DP table with Pattern and Text axes]
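A Row Can Be Divided into a “Positive” Part and a “Negative” Part
“Negative” part: $(p_i - p_{i-1}) - (t_j - t_k) < 0$
Values in the “Negative” part always decrease (their absolute values increase) as k moves left, so only the “ex-minimum” can be a candidate
Only a constant number of “Positive” cells exist if the sequences have finite resolution … O(nm) time
[Figure: rows i-1 and i of the table at column j; moving k to the left makes (t_j - t_k) larger, moving k to the right makes it smaller, down to ≥ 0]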
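One way to read this slide: in the “Negative” part the cost equals (D(i-1,k) - t_k) + t_j - gap, so a running minimum of D(i-1,k) - t_k is enough there, and only the few “Positive” cells need a direct scan. The Python sketch below follows that reading; it is an interpretation, not the authors' code, and the name finite_resolution_table and the 0-based base case are assumptions.

```python
import math


def finite_resolution_table(P, T):
    """DP table using a running minimum for the negative part.

    For each row, `b` is the first text index still in the positive part.
    It only moves right as j grows because T is strictly increasing.
    Cells k < b are summarized by neg_best; cells b..j-1 are scanned.
    """
    n, m = len(P), len(T)
    prev = [0] * m  # assumed base case: one point matches anywhere
    table = [prev]
    for i in range(1, n):
        gap = P[i] - P[i - 1]
        cur = [math.inf] * m
        b = i - 1
        neg_best = math.inf  # min of prev[k] - T[k] over negative cells
        for j in range(i, m):
            # Cells whose gap difference turned negative join the summary.
            while b < j and T[j] - T[b] > gap:
                neg_best = min(neg_best, prev[b] - T[b])
                b += 1
            best = neg_best + T[j] - gap
            # Positive cells: few of them under finite resolution.
            for k in range(b, j):
                best = min(best, prev[k] + gap - (T[j] - T[k]))
            cur[j] = best
        table.append(cur)
        prev = cur
    return table


if __name__ == "__main__":
    print(finite_resolution_table([0, 2, 5], [1, 3, 4, 6, 9])[-1])
```

The inner scan is what the finite resolution assumption bounds: the number of text points within one pattern gap of t_j. Without that assumption a row can still cost quadratic time.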
Guaranteed O(nm)-time Algorithm
Uses a “deque” simulating the right-most path of the Cartesian tree [Gabow et al., 1985]
Maintains to-be-minimum indices in the “Positive” part
The minimum is available in amortized constant time
Constant time on average for one iteration … O(nm) time
Deque operations at column j: remove indices that have turned negative; pop all larger ones; push the latest index
[Figure: deque over the indices k (← k →) with the minimum (Min.) marked]
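The deque operations named on this slide match the standard sliding-window minimum technique, which is swapped in here as an illustration. In the “Positive” part the cost equals (D(i-1,k) + t_k) + gap - t_j, so the deque keeps indices with increasing D(i-1,k) + t_k. This Python sketch is an interpretation under those assumptions; deque_table and the base case are hypothetical, not the authors' implementation.

```python
import math
from collections import deque


def deque_table(P, T):
    """DP table with O(1) amortized work per cell.

    Negative part: running minimum of prev[k] - T[k].
    Positive part: monotone deque over prev[k] + T[k].
    """
    n, m = len(P), len(T)
    prev = [0] * m  # assumed base case: one point matches anywhere
    table = [prev]
    for i in range(1, n):
        gap = P[i] - P[i - 1]
        cur = [math.inf] * m
        b = i - 1            # first index still in the positive part
        neg_best = math.inf
        window = deque()     # candidate indices, keys increasing from the front
        for j in range(i, m):
            k_new = j - 1
            key = prev[k_new] + T[k_new]
            # Pop all larger ones, then push the latest index.
            while window and prev[window[-1]] + T[window[-1]] >= key:
                window.pop()
            window.append(k_new)
            # Fold indices that turned negative into the running minimum.
            while b < j and T[j] - T[b] > gap:
                neg_best = min(neg_best, prev[b] - T[b])
                b += 1
            # Remove from the deque the indices that turned negative.
            while window and window[0] < b:
                window.popleft()
            best = neg_best + T[j] - gap
            if window:
                k = window[0]
                best = min(best, prev[k] + T[k] + gap - T[j])
            cur[j] = best
        table.append(cur)
        prev = cur
    return table


if __name__ == "__main__":
    print(deque_table([0, 2, 5], [1, 3, 4, 6, 9])[-1])
```

Each index is pushed and popped at most once per row, which gives the amortized constant time per cell without relying on the finite resolution assumption.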
Computational Results on Real/Synthesized* MIDI Sequences
The simple algorithm expecting “Finite Resolution” is faster than the guaranteed O(nm)-time algorithm
Pattern size = 11; time (sec.) for filling up the table

Text size   Naive DP   Fin. Res.   Cartesian
3086        1.12       0.01        0.01
*18328      197        0.03        0.05
*37741      883        0.05        0.09
*386801     ---        0.58        0.94

(* synthesized sequences)
Environment: Solaris 9 x86 / Intel Pentium 4 800MHz
Four-Russians Speed-up for Point Sequences with Finite Resolution
Idea from Arlazarov et al.: fill tabular cells with pre-computed values
O(nm/log n + n log n) time in the unit-cost RAM model
As one would expect, the finite resolution assumption makes point sequences behave like strings
Approximate Point Set Pattern Matching on the Plane: Hardness Results
Akutsu et al. ('99), allowing local distortions: NP-hard, even in 1D matching
V. Makinen & E. Ukkonen ('01), an extension of 1D: NP-hard; deciding the order of points in matching is hard
Q. Is there any non-trivial 2D approximate point set matching computable in polynomial time?
Extending 1D Definition to Approximate Matching on the Plane
Regard a set as sequences with two orders
Divide recursively by axis-parallel lines
[Figure: point sets P and Q divided by axis-parallel lines]
Recurrence for Edit Distance
Divide P and Q into two arbitrary parts by either a horizontal or a vertical line
If $|P[i,j]| = |Q[k,l]| = 1$ then $d(i,j;k,l) = 0$
If $|P[i,j]| \ne |Q[k,l]|$ then $d(i,j;k,l) = \infty$
Otherwise $d(i,j;k,l)$ is the minimum of two alternatives, marked with superscripts R and T, each adding to a $d(\cdot,\cdot;\cdot,\cdot)$ term a difference term of the form $\left| (p - p) - (q - q) \right|$
[Equation: the sub-indices of the terms inside the min are not recoverable from the transcript]
How Pattern Matching Proceeds
The "x" points of the pattern should be aligned on the "o" points of the text, by cutting and moving the bounding box
Polynomial-time Algorithm for 2D Approximate Point Set Matching
Finds the best partition/direction by DP-like recursion
Results are stored in a cache for quadruples [i, j; k, l] … O(n^2 m^2) space
O(n^2 m^4) time with pattern size n and text size m
Remarks & Future Work
Consider scaling in 1D: tempo must be considered in musical sequence search
Looking for more applications: 1D approximate matching applied to secondary structure search of proteins