Permuted Scaled Matching

Post on 22-Feb-2016


description

Permuted Scaled Matching. Ayelet Butman, Noa Lewenstein, Ian Munro. Scaled matching. Input: text T = t1,…,tn and pattern P = p1,…,pm. Scaling: P[i] = p1…p1 p2…p2 … pm…pm, with each character repeated i times. Output: all text locations j where ∃ i s.t. P[i] matches at j.

Transcript of Permuted Scaled Matching

Permuted Scaled Matching

Ayelet Butman, Noa Lewenstein, Ian Munro

Scaled matching

Input: Text T = t1,…,tn

Pattern P = p1,…,pm

Scaling: P[i] = p1…p1 p2…p2 … pm…pm, with each character repeated i times

Output: All text locations j where ∃ i s.t. P[i] matches at j.
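The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `scale` and `scaled_matches` are hypothetical, the pattern is assumed non-empty, and the search simply tries every scale k, which is far from the efficient algorithms in the literature.

```python
def scale(pattern: str, k: int) -> str:
    """Return the k-scaling of pattern: every character repeated k times."""
    return "".join(ch * k for ch in pattern)


def scaled_matches(text: str, pattern: str) -> list[tuple[int, int]]:
    """Return sorted (j, k) pairs such that the k-scaling of pattern occurs at text[j:].

    Assumes pattern is non-empty. Brute force: one substring search per scale k.
    """
    n, m = len(text), len(pattern)
    hits = []
    for k in range(1, n // m + 1):
        scaled = scale(pattern, k)
        start = text.find(scaled)
        while start != -1:
            hits.append((start, k))
            start = text.find(scaled, start + 1)
    return sorted(hits)


print(scale("abc", 2))                    # aabbcc
print(scaled_matches("xaabbccx", "abc"))  # [(1, 2)]
```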

Scaled matching

[Figure: a pattern P and a scaled copy of it aligned with an occurrence in the text T.]

Permutation matching

Input: Text T = t1,…,tn

Pattern P = p1,…,pm

Permutation (of pattern): pπ(1) pπ(2) … pπ(m), where π is a permutation on [m].

Output: All text-locations j where a pattern permutation occurs.

[Figure: a permutation of the pattern P aligned with an occurrence in the text T.]

Permutation matching

• Easy to solve in O(n) time (linear size alphabets).

• The pattern matching version of Jumbled Indexing.
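One common way to reach linear time for permutation matching is a sliding window over character counts. The sketch below is an assumption about how this could be done, not the slides' own procedure; `permutation_matches` is a hypothetical name, and the linear bound relies on constant-time dictionary operations.

```python
from collections import Counter


def permutation_matches(text: str, pattern: str) -> list[int]:
    """Return every start j such that text[j:j+m] is a permutation of pattern."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # diff[c] = (count of c in the current window) - (count of c in pattern)
    diff = Counter(text[:m])
    diff.subtract(pattern)
    nonzero = sum(1 for v in diff.values() if v != 0)
    hits = [0] if nonzero == 0 else []

    def bump(ch: str, delta: int) -> None:
        """Change diff[ch] by delta and keep the nonzero counter in sync."""
        nonlocal nonzero
        before = diff[ch]
        after = before + delta
        diff[ch] = after
        nonzero += (after != 0) - (before != 0)

    for j in range(1, n - m + 1):
        bump(text[j - 1], -1)        # character leaving the window
        bump(text[j + m - 1], +1)    # character entering the window
        if nonzero == 0:
            hits.append(j)
    return hits


print(permutation_matches("cbabcacab", "abc"))  # [0, 2, 3, 6]
```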

Scaled permutation matching

• Match: First Permutation and then Scaling.

[Figure: the pattern P is first permuted and then scaled to match a substring of the text T.]

• B-Eres-Landau [04]: Scaled Permutation Matching in O(n) time.

• Open: Can one do the reverse efficiently, i.e. scaling first and then permutation?

• Hard?

How can we solve? First - Naïve algorithm

Permuted scaled matching

Input: Text T=t1,…,tn

Pattern P=p1,…,pm

Output: All text locations j where a permuted scaled match exists.

Permuted scaled matching

[Figure: the pattern P is scaled and the scaled copy is then permuted to match a substring of the text T.]

Naïve algorithm

[Figure: the pattern P is compared against windows of the text T, first for scale k=1 and then for k=2.]

Naïve algorithm

1. Construct a table R of size (n+1)×|Σ| such that R(i,j) = #σj(T[0, i]) for i ≥ 0 and R(−1, j) = 0.

2. For every 0 ≤ i < j ≤ n−1 such that j − i + 1 = km for some natural number k ≥ 1 do:

a. Let r(l) = (R(j,l) − R(i−1,l)) / #σl(P).

b. If r(l) = k for each l, 0 ≤ l ≤ |Σ| − 1, then announce that i is a k-scaled appearance.
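The two steps above can be sketched as follows. This is a minimal illustration, assuming a non-empty pattern; the function name `naive_permuted_scaled` is hypothetical, the row for location −1 is stored at list index 0, and the ratio test r(l) = k is written as a multiplication to avoid fractions.

```python
from collections import Counter


def naive_permuted_scaled(text: str, pattern: str) -> list[tuple[int, int, int]]:
    """Return (i, j, k) triples such that text[i..j] is a permuted k-scaling of pattern."""
    n, m = len(text), len(pattern)
    freq = Counter(pattern)
    sigma = sorted(freq)
    # R[i + 1][l] = occurrences of sigma[l] in text[0..i]; R[0] is the zero row for i = -1.
    R = [[0] * len(sigma)]
    for ch in text:
        row = R[-1][:]
        if ch in freq:
            row[sigma.index(ch)] += 1
        R.append(row)
    hits = []
    for i in range(n):
        for k in range(1, (n - i) // m + 1):
            j = i + k * m - 1
            # The window text[i..j] has length k*m, so matching every pattern
            # character count against k * freq also rules out foreign characters.
            if all(R[j + 1][l] - R[i][l] == k * freq[c] for l, c in enumerate(sigma)):
                hits.append((i, j, k))
    return hits


print(naive_permuted_scaled("bacaacba", "aacb"))
# [(0, 3, 1), (0, 7, 2), (3, 6, 1), (4, 7, 1)]
```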

Naïve algorithm

[Figure: animation that fills the prefix-count table R column by column for locations −1 to 12 of the example text T, then tests windows of T against the pattern P (#a=2, #b=#c=1), first with k=1 and then with k=2.]

Naïve algorithm

The running time is [formula lost in extraction], where [definition lost in extraction].

Better?

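Properties

• Mod-Equivalency: i and j are mod-equivalent if, for every character σ (with frequency c in P):

#σ in T[0,i] mod c = #σ in T[0,j] mod c

• Equal-Quotients: i and j have equal quotients for characters a and b if:

⌊#a in T[0,i] / #a in P⌋ − ⌊#b in T[0,i] / #b in P⌋ = ⌊#a in T[0,j] / #a in P⌋ − ⌊#b in T[0,j] / #b in P⌋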

Mod-equivalent

[Figure: prefix-count table for an example text T with pattern P (#a=2, #b=#c=1); pairs of locations are compared character by character.]

First pair: 3 mod 2 = 1 mod 2 (#a=2), 3 mod 1 = 1 mod 1 (#b=1), 6 mod 1 = 2 mod 1 (#c=1), so the two locations are mod-equivalent.

Second pair: 3 mod 2 ≠ 0 mod 2 (#a=2), so the two locations are not mod-equivalent.

Third pair, on a modified text: 5 mod 2 = 1 mod 2, 3 mod 1 = 1 mod 1, 4 mod 1 = 2 mod 1, so the two locations are mod-equivalent.

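Equal-quotients

• Equal-Quotients: i and j have equal quotients for characters a and b if:

⌊#a in T[0,i] / #a in P⌋ − ⌊#b in T[0,i] / #b in P⌋ = ⌊#a in T[0,j] / #a in P⌋ − ⌊#b in T[0,j] / #b in P⌋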

Equal-quotients

[Figure: prefix-count tables for three example texts; pairs of locations are tested for the equal-quotients property.]

First example (#a=2, #b=#c=1): ⌊5/2⌋ − ⌊3/1⌋ = ⌊1/2⌋ − ⌊1/1⌋ for a and b, and ⌊3/1⌋ − ⌊4/1⌋ = ⌊1/1⌋ − ⌊2/1⌋ for b and c, so the property holds.

Second example (#a=2, #b=#c=1): ⌊3/2⌋ − ⌊3/1⌋ ≠ ⌊1/2⌋ − ⌊1/1⌋, so the property fails.

Third example (text and pattern over a and b only, #a=#b=3): ⌊10/3⌋ − ⌊6/3⌋ = ⌊3/3⌋ − ⌊1/3⌋, so the property holds.

Theorem

T[i+1, j] is a permuted k-scaling of P for some k iff

1. Locations i and j of T are mod-equivalent

2. Locations i and j of T satisfy the equal-quotients property for each pair of characters
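The theorem can be checked mechanically on small inputs. The sketch below makes several assumptions: all function names are hypothetical, every text character is assumed to occur in the pattern, prefix counts for location −1 are stored at list index 0, and the equal-quotients formula is the floor-difference form suggested by the worked examples on the slides.

```python
from collections import Counter


def prefix_counts(text, sigma):
    """rows[i + 1][l] = occurrences of sigma[l] in text[0..i]; rows[0] is location -1."""
    rows = [[0] * len(sigma)]
    for ch in text:
        row = rows[-1][:]
        row[sigma.index(ch)] += 1
        rows.append(row)
    return rows


def mod_equivalent(ci, cj, freq):
    """Counts at two locations agree modulo each character's frequency in P."""
    return all(a % c == b % c for a, b, c in zip(ci, cj, freq))


def equal_quotients(ci, cj, freq):
    """Quotient differences agree; adjacent pairs suffice, other pairs follow by chaining."""
    qi = [a // c for a, c in zip(ci, freq)]
    qj = [b // c for b, c in zip(cj, freq)]
    return all(qi[l] - qi[l + 1] == qj[l] - qj[l + 1] for l in range(len(freq) - 1))


def is_permuted_scaling(sub, pattern):
    """Direct check: sub has exactly k copies of every pattern character for some k >= 1."""
    k, rem = divmod(len(sub), len(pattern))
    if rem or k == 0:
        return False
    need, have = Counter(pattern), Counter(sub)
    return all(have[c] == k * need[c] for c in need)


def theorem_holds(text, pattern):
    """Compare the two properties with the direct check for every pair of locations."""
    sigma = sorted(set(pattern))
    freq = [pattern.count(c) for c in sigma]
    rows = prefix_counts(text, sigma)
    for i in range(-1, len(text)):
        for j in range(i + 1, len(text)):
            by_props = (mod_equivalent(rows[i + 1], rows[j + 1], freq)
                        and equal_quotients(rows[i + 1], rows[j + 1], freq))
            if by_props != is_permuted_scaling(text[i + 1:j + 1], pattern):
                return False
    return True


print(theorem_holds("bacaacbacca", "aacb"))
```

The comparison uses the substring T[i+1, j] because the prefix counts at i and j differ exactly by the characters in that range.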

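[Figure: layout of the vector stored for each location. The first entries (a, b, c, d, e, f) hold the mod-equivalence values; the remaining entries (a−b, b−c, c−d, d−e, e−f) hold the equal-quotients differences of adjacent characters. A small example over a, b, c with a text T and pattern P follows.]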

Putting it together

[Figure: the vectors (mod-equivalence part followed by equal-quotients part) are built into a table, each tagged with its location i, radix-sorted, and grouped into equivalence classes by their prefix of length 2|Σ|−1.]

Putting it all together

Algorithm:

1. Build a table R of size n×2|Σ|+1.

2. For every 0 ≤ i ≤ n−1:

o For every 0 ≤ j ≤ |Σ|−1: R(i,j) = #σj(T[0, i]) mod #σj(P)

o For every |Σ| ≤ j ≤ 2|Σ|−1: R(i,j) = the equal-quotients entry for a pair of adjacent characters σ, σ′, that is ⌊#σ(T[0, i]) / #σ(P)⌋ − ⌊#σ′(T[0, i]) / #σ′(P)⌋

3. Each vector is associated with its location i.

4. Sort the vectors using Radix sort.

5. Group the vectors into equivalence classes according to their prefix of length 2|Σ|−1.

6. For each equivalence class containing locations i1, i2, …, il, announce appearances T[i + 1, j] for each i, j ∈ {i1, i2, …, il} s.t. i < j.
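The steps above can be sketched as follows. Several choices here are assumptions rather than the slides' method: a dictionary keyed by the vector replaces the radix sort and grouping, the empty prefix (location −1) is included so that matches starting at position 0 are found, every text character is assumed to occur in the pattern, and the name `permuted_scaled_all` is hypothetical.

```python
from collections import defaultdict


def permuted_scaled_all(text: str, pattern: str) -> list[tuple[int, int]]:
    """Return all (start, end) such that text[start..end] is a permuted scaling of pattern.

    Assumes every character of text also occurs in pattern (KeyError otherwise).
    """
    sigma = sorted(set(pattern))
    index = {c: l for l, c in enumerate(sigma)}
    freq = [pattern.count(c) for c in sigma]
    counts = [0] * len(sigma)
    classes = defaultdict(list)

    def signature():
        mods = [x % c for x, c in zip(counts, freq)]
        quots = [x // c for x, c in zip(counts, freq)]
        diffs = [quots[l] - quots[l + 1] for l in range(len(sigma) - 1)]
        return tuple(mods + diffs)            # vector of length 2|Σ| - 1

    classes[signature()].append(-1)           # location -1: the empty prefix
    for i, ch in enumerate(text):
        counts[index[ch]] += 1
        classes[signature()].append(i)

    hits = []
    for locs in classes.values():             # each list is already increasing
        for a in range(len(locs)):
            for b in range(a + 1, len(locs)):
                hits.append((locs[a] + 1, locs[b]))
    return sorted(hits)


print(permuted_scaled_all("bacaacba", "aacb"))
# [(0, 3), (0, 7), (3, 6), (4, 7)]
```

Recomputing the whole signature at each location costs O(|Σ|) per step, which matches the O(n|Σ|) term of the stated bound; the pair enumeration at the end accounts for the occ term.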

Theorem

• The running time of the permuted scaled matching algorithm is:

O(n|Σ|+occ).

Output representation

• The output of the algorithm, whose size we denoted occ, may be as large as O(n²/m).

• Example:

o Text a^n.

o Pattern a^m.
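For the unary example the number of appearances can be counted directly: every substring of a^n whose length is a positive multiple of m is a scaled appearance of a^m. The short sketch below, with the hypothetical name `count_unary_occurrences`, sums these up so the quadratic growth can be seen on concrete numbers.

```python
def count_unary_occurrences(n: int, m: int) -> int:
    """Count substrings of a^n whose length is k*m for some k >= 1."""
    # For each scale k there are n - k*m + 1 start positions.
    return sum(n - k * m + 1 for k in range(1, n // m + 1))


print(count_unary_occurrences(6, 2))      # 5 + 3 + 1 = 9
print(count_unary_occurrences(1000, 10))  # 49600, close to n*n / (2*m) = 50000
```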

Output representation

• To reduce the large number of appearances, set the output to the shortest match at each text location i.

[Figure: example text T and pattern P with the shortest match at a location highlighted.]

Claim

• Let i < j < h be three text locations.

• Assume T[i, j] is a permuted scaled appearance of P.

• Then T[i, h] is a permuted scaled appearance of P iff T[j + 1, h] is a permuted scaled appearance of P.

[Figure: the locations i, j and h marked on an example text T, with the pattern P.]

Putting it all together

Algorithm:

1. Build a table R of size n×2|Σ|+1.

2. For every 0 ≤ i ≤ n−1:

o For every 0 ≤ j ≤ |Σ|−1: R(i,j) = #σj(T[0, i]) mod #σj(P)

o For every |Σ| ≤ j ≤ 2|Σ|−1: R(i,j) = the equal-quotients entry for a pair of adjacent characters σ, σ′, that is ⌊#σ(T[0, i]) / #σ(P)⌋ − ⌊#σ′(T[0, i]) / #σ′(P)⌋

3. Each vector is associated with its location i.

4. Sort the vectors using Radix sort.

5. Group the vectors into equivalence classes according to their prefix of length 2|Σ|−1.

6. For each entry q′ containing the linked list i1, i2, …, il, announce the appearance T[i_r + 1, i_{r+1}] for each pair of consecutive locations i_r, i_{r+1} in the list.
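Reporting only consecutive locations of each class can be sketched with a single pass that remembers the last location seen for every vector. As before, this is an illustration under assumptions: a dictionary stands in for the radix sort and linked lists, location −1 is included, every text character is assumed to occur in the pattern, and `permuted_scaled_shortest` is a hypothetical name.

```python
def permuted_scaled_shortest(text: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) for the shortest permuted scaled match at each start that has one.

    Assumes every character of text also occurs in pattern (KeyError otherwise).
    """
    sigma = sorted(set(pattern))
    index = {c: l for l, c in enumerate(sigma)}
    freq = [pattern.count(c) for c in sigma]
    counts = [0] * len(sigma)

    def signature():
        mods = [x % c for x, c in zip(counts, freq)]
        quots = [x // c for x, c in zip(counts, freq)]
        diffs = [quots[l] - quots[l + 1] for l in range(len(sigma) - 1)]
        return tuple(mods + diffs)

    last_seen = {signature(): -1}             # location -1: the empty prefix
    hits = []
    for i, ch in enumerate(text):
        counts[index[ch]] += 1
        key = signature()
        if key in last_seen:
            # previous and current location are consecutive members of one class
            hits.append((last_seen[key] + 1, i))
        last_seen[key] = i
    return hits


print(permuted_scaled_shortest("bacaacba", "aacb"))
# [(0, 3), (3, 6), (4, 7)]
```

Each location produces at most one reported match, so the output size is at most n and the occ term disappears from the running time.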

Running Time

• Permuted Scaled Matching: The running time is:

O(n|Σ|).

For efficiency• Need to generate the vectors quickly.

• Need to compare vectors quickly.

Idea: hash

• Need hash on vectors that can be modified quickly if vector changes very little.

• Use: hash – similar to Karp-Rabin
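A hash with this property can be sketched as a polynomial fingerprint in the spirit of Karp-Rabin: the fingerprint of a vector v is the sum of v[k]·B^k modulo a prime, so changing one coordinate costs one multiplication and one addition. The class name `VectorFingerprint`, the base, and the modulus below are illustrative assumptions; the slides do not specify them, and equal fingerprints only suggest equal vectors because collisions are possible.

```python
class VectorFingerprint:
    """Polynomial fingerprint of an integer vector with O(1) single-coordinate updates."""

    def __init__(self, length: int, base: int = 1_000_003, modulus: int = (1 << 61) - 1):
        self.modulus = modulus
        # weights[k] = base**k mod modulus, precomputed once
        self.weights = [pow(base, k, modulus) for k in range(length)]
        self.values = [0] * length
        self.digest = 0

    def add(self, k: int, delta: int) -> None:
        """Change coordinate k by delta (which may be negative) and update the digest."""
        self.values[k] += delta
        self.digest = (self.digest + delta * self.weights[k]) % self.modulus

    def recompute(self) -> int:
        """Digest computed from scratch, used here only to sanity-check the updates."""
        return sum(v * w for v, w in zip(self.values, self.weights)) % self.modulus


fp = VectorFingerprint(5)
fp.add(0, 1)     # a mod-equivalence entry goes up by one
fp.add(3, -1)    # an equal-quotients entry goes down by one
print(fp.digest == fp.recompute())  # True
```

Since a step from one location to the next touches only a few coordinates of the vector, the digest can be maintained in constant time per text character instead of O(|Σ|).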

[Figure: moving from location i to location i+1 causes at most 1 change in the mod-equivalence part of the vector and at most 2 changes in the equal-quotients part; illustrated on an example text T and pattern P.]

• The running time can be improved to

o Deterministic O(n log |Σ|)

o Randomized O(n)