Approximate Point Set Pattern Matching on Sequences and Planes

Tomoaki Suga, Shinichi Shimozono*

Kyushu Inst. of Tech., Fukuoka, Japan



Point Set Pattern Matching

Text: A set of points in, e.g., a plane

Pattern: A small set of points

Task: Find an occurrence of the pattern as a subset

[Figure: a TEXT point set and a smaller PATTERN point set]


Approximate Point Set Matching in Practice: Example

Analysis of 2D electrophoresis images: a set of spots on a gel media plane

Searching digital music scores by melody: ringtone melodies, Internet content, online karaoke


Literature

Exact matching in d dimensions: geometric algorithm by P. J. de Rezende & D. T. Lee, '95

Translation, scaling, and rotation in O(nm^d)

Allowing local distortions: heuristic and hardness by Akutsu et al., '99 … NP-hard even in 1D matching

Approximate matching of point sequences: no skips, O(nm) time, by V. Makinen, '01

Allowing substitution in O(nm^3) time

Extension to 2-dimensional matching is NP-hard


Our Results

Approximate point set pattern matching in 1D: the pattern matches as a subset; extends Makinen et al.

Simple, fast algorithm for the O(nm^2) task: relies on a reasonable assumption about sequences in practice

Algorithm that guarantees O(nm) time: linear in the text size, using an average constant-time minimum query

Four-Russian speed-up: an observation connected to string matching

2D approximate point set pattern matching: with a polynomial-time algorithm


1D Matching As a Target

As a basis of practical problems: axes of 2D electrophoresis images are independent

Points in higher dimensions that have a primary axis (sort order) … e.g., 3D structure of proteins

Musical score search: pitch error (tone deafness) is usually fatal

Exact matching in rhythm/timing is impractical, but timing is indispensable to distinguish melodies


Point Set Matching in 1D

Text and Pattern: Strictly increasing sequences of integers

\[ T = (t_1, \ldots, t_m), \qquad P = (p_1, \ldots, p_n) \]

An Occurrence of the Pattern: A subsequence of the Text

\[ T' = (t_{l(1)}, \ldots, t_{l(n)}) \]


Edit Distance for Point Set Approximate Matching

Distance between two sequences of the same size:

\[ d(P, Q) = \sum_{i=2}^{n} \left| (p_i - p_{i-1}) - (q_i - q_{i-1}) \right| \]

[Figure: two point sequences P and Q]
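As a small illustration of this distance, the following Python sketch sums the absolute differences between corresponding gaps of two equal-size sequences. The function name `gap_distance` is hypothetical, and the absolute value around each term is an assumption based on the later split into "positive" and "negative" parts.

```python
from typing import Sequence


def gap_distance(P: Sequence[int], Q: Sequence[int]) -> int:
    """Sum over i >= 2 of |(p_i - p_{i-1}) - (q_i - q_{i-1})|.

    Assumes both sequences are strictly increasing and of the same size.
    """
    if len(P) != len(Q):
        raise ValueError("sequences must have the same size")
    return sum(
        abs((P[i] - P[i - 1]) - (Q[i] - Q[i - 1]))
        for i in range(1, len(P))
    )
```

Because only gaps are compared, shifting either sequence by a constant does not change the distance.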


Approximate Matching and Recurrence

D(i,j) = distance between the first i points of the pattern and its best occurrence in the text ending at j

\[ D(i, j) = \min_{i-1 \le k < j} \left\{ D(i-1, k) + \left| (p_i - p_{i-1}) - (t_j - t_k) \right| \right\} \]

D(i-1, k): distance between the prefix sequences shorter by one point

|(p_i - p_{i-1}) - (t_j - t_k)|: difference of the last two distances (gaps)

D(n,m) can be obtained by tabular computation … in O(nm^2) time
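The tabular computation can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the names `naive_table` and `best_match_cost` are hypothetical, the base case D(1, j) = 0 is an assumption (the slide does not state one), and the best match is read off as the minimum of the last row.

```python
from typing import List, Sequence

INF = float("inf")


def naive_table(P: Sequence[int], T: Sequence[int]) -> List[List[float]]:
    """Fill D[i][j] (1-based) with three nested loops: O(n m^2) time."""
    n, m = len(P), len(T)
    if n == 0:
        raise ValueError("pattern must contain at least one point")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        D[1][j] = 0  # assumption: a single point matches anywhere at no cost
    for i in range(2, n + 1):
        gap_p = P[i - 1] - P[i - 2]
        for j in range(i, m + 1):
            best = INF
            for k in range(i - 1, j):  # i-1 <= k < j
                gap_t = T[j - 1] - T[k - 1]
                best = min(best, D[i - 1][k] + abs(gap_p - gap_t))
            D[i][j] = best
    return D


def best_match_cost(P: Sequence[int], T: Sequence[int]) -> float:
    """Cost of the best occurrence of P as a subsequence of T."""
    D = naive_table(P, T)
    return min(D[len(P)][len(P):], default=INF)
```

The innermost loop over k is the part that the later speed-ups aim to replace with a constant-time step.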


“Finite Resolution” Assumption on a Class of Sequences

The ratio between distances of contiguous points is bounded

Examples: spots observed as stains on a small gel media plane; 450 ticks per second in typical MIDI sequences

The modified algorithm runs in O(nm) time if sequences have finite resolution

The third (innermost) loop can be finished in constant time…

[Figure: pattern and text point sequences]
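One possible reading of the finite resolution assumption is that the largest gap between contiguous points is at most a constant times the smallest gap. The helper below computes that ratio for a sequence; the name `gap_ratio` and this exact formulation are assumptions for illustration, not a definition taken from the slides.

```python
from typing import Sequence


def gap_ratio(points: Sequence[int]) -> float:
    """Largest gap divided by smallest gap of a strictly increasing sequence."""
    gaps = [b - a for a, b in zip(points, points[1:])]
    if not gaps:
        raise ValueError("need at least two points")
    if min(gaps) <= 0:
        raise ValueError("sequence must be strictly increasing")
    return max(gaps) / min(gaps)
```

Under this reading, a sequence quantized to a fixed tick length with a bounded longest rest would have a small, bounded ratio.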


A Row can be Divided into “Positive” Part & “Negative” Part

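Values in the “Negative” part always decrease as j grows (their absolute values increase), so only the “ex-minimum” can be a candidate

Only a constant number of “Positive” cells exist if sequences have finite resolution … O(nm) time

Negative part: cells k with

\[ (p_i - p_{i-1}) - (t_j - t_k) < 0 \]

[Figure: rows i-1 and i of the table up to column j; along k, (t_j - t_k) goes from large (left) to small, staying ≥ 0]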
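The positive/negative split suggests the following simple procedure, sketched in Python under stated assumptions: the name `finite_resolution_table` is hypothetical, the base case D(1, j) = 0 is assumed, and cells with a zero difference are counted as positive. For the negative cells the cost equals (D(i-1, k) - t_k) + t_j - (p_i - p_{i-1}), so a single running minimum suffices; the positive cells are scanned directly.

```python
from typing import List, Sequence

INF = float("inf")


def finite_resolution_table(P: Sequence[int], T: Sequence[int]) -> List[List[float]]:
    """Fill D[i][j] (1-based), scanning only the positive cells of each row."""
    n, m = len(P), len(T)
    if n == 0:
        raise ValueError("pattern must contain at least one point")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        D[1][j] = 0  # assumption: a single point matches anywhere at no cost
    for i in range(2, n + 1):
        g = P[i - 1] - P[i - 2]
        prev = D[i - 1]
        lo = i - 1       # smallest k that is still in the positive part
        neg_min = INF    # min of prev[k] - t_k over the negative part
        for j in range(i, m + 1):
            tj = T[j - 1]
            # Cells whose text gap now exceeds g move to the negative part.
            while lo < j and tj - T[lo - 1] > g:
                neg_min = min(neg_min, prev[lo] - T[lo - 1])
                lo += 1
            best = neg_min + tj - g          # best negative candidate
            for k in range(lo, j):           # few cells under finite resolution
                best = min(best, prev[k] + g - (tj - T[k - 1]))
            D[i][j] = best
    return D
```

The inner `for` loop is not bounded in general; it is the finite resolution assumption that keeps the number of positive cells constant.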


Guaranteed O(nm)-time Algorithm

Using “deque” simulating the right-most path of the Cartesian Tree [Gabow, et al., 1985]

Maintains to-be-minimum indices in “Positive” part

Min is available in amortized constant time

Constant time on average for one iteration … O(nm) time

Remove an index once it has turned negative

Pop all larger ones, then push the latest index

[Figure: deque over indices k up to j, with the minimum (Min.) marked]
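The deque steps can be sketched in Python with `collections.deque` used as a monotone queue. This is an illustrative sketch rather than the authors' implementation: the name `deque_table` is hypothetical, the base case D(1, j) = 0 is assumed, and the deque is keyed on D(i-1, k) + t_k, which is the part of the positive-cell cost that does not depend on j.

```python
from collections import deque
from typing import Deque, List, Sequence

INF = float("inf")


def deque_table(P: Sequence[int], T: Sequence[int]) -> List[List[float]]:
    """Fill D[i][j] (1-based) in O(nm) time using a monotone deque per row."""
    n, m = len(P), len(T)
    if n == 0:
        raise ValueError("pattern must contain at least one point")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        D[1][j] = 0  # assumption: a single point matches anywhere at no cost
    for i in range(2, n + 1):
        g = P[i - 1] - P[i - 2]
        prev = D[i - 1]
        lo = i - 1       # smallest k that is still in the positive part
        neg_min = INF    # min of prev[k] - t_k over the negative part
        window: Deque[int] = deque()  # positive indices, keys increasing
        for j in range(i, m + 1):
            tj = T[j - 1]
            k = j - 1    # the latest index becomes a candidate
            key = prev[k] + T[k - 1]
            # Pop all larger ones, then push the latest index.
            while window and prev[window[-1]] + T[window[-1] - 1] >= key:
                window.pop()
            window.append(k)
            # Indices that turned negative feed the running minimum.
            while lo < j and tj - T[lo - 1] > g:
                neg_min = min(neg_min, prev[lo] - T[lo - 1])
                lo += 1
            while window and window[0] < lo:
                window.popleft()
            best = neg_min + tj - g
            if window:
                front = window[0]  # minimum key among positive cells
                best = min(best, prev[front] + T[front - 1] + g - tj)
            D[i][j] = best
    return D
```

Each index is pushed and popped at most once per row, which is where the amortized constant time per cell comes from.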


Computational Results on Real/Synthesized* MIDI Sequences

The simple algorithm that relies on “Finite Resolution” is faster in practice than the guaranteed O(nm)-time algorithm

Pattern size = 11; time (sec.) for filling up the table

Text size    Naïve DP    Fin. Res.    Cartesian
3086         1.12        0.01         0.01
*18328       197         0.03         0.05
*37741       883         0.05         0.09
*386801      ---         0.58         0.94

* synthesized sequence

Solaris 9 x86/Intel Pentium 4 800MHz


Four-Russian Speed-up for Point Sequences with Finite Resolution

Idea from Arlazarov et al.: Filling tabular cells by pre-computed values

O(nm/log n + n log n) time with unit-cost RAM model

As one might expect, the finite resolution assumption makes point sequences behave like strings


Approximate Point Set Pattern Matching on the Plane: Hardness Results

Akutsu et al. (’99), allowing local distortions: NP-hard, even in 1D matching

V. Makinen & E. Ukkonen ('01), an extension of 1D: NP-hard; deciding the order of points in matching is hard

Q. Is there any non-trivial 2D approximate point set matching computable in polynomial time?


Extending 1D Definition to Approximate Matching on the Plane

Regard a set as sequences with two orders

Divide recursively by axis-parallel lines

[Figure: point sets P and Q divided by axis-parallel lines]


Recurrence for Edit Distance

Divide P and Q into two arbitrary parts, by either a horizontal or a vertical line

If |P[i, j]| = |Q[k, l]| = 1 then d(i, j; k, l) = 0

If |P[i, j]| ≠ |Q[k, l]| then d(i, j; k, l) = ∞

Otherwise d(i, j; k, l) is the minimum of two cases, labeled R and T on the slide; each adds to a distance on smaller parts a term comparing a difference of p's with a difference of q's

[Equation: full recurrence for d(i, j; k, l) with its R and T cases; the indices are not recoverable here]


How Pattern Matching Proceeds

Pattern points (×) should be aligned onto text points (○) by cutting and moving the bounding box


Polynomial-time Algorithm for 2D Approximate Point Set Matching

Finds the best partition/direction by DP-like recursion

Results are stored in a cache for quadruples [i, j; k, l] … O(n^2 m^2) space

O(n^2 m^4) time with pattern size n and text size m


Remarks & Future Works

Consider scaling in 1D: tempo must be considered in musical sequence search

Looking for more applications: 1D approximate matching applied to secondary structure search of proteins
