Permuted Scaled Matching

Permuted Scaled Matching Ayelet Butman Noa Lewenstein Ian Munro


Page 1: Permuted Scaled Matching

Permuted Scaled Matching

Ayelet Butman, Noa Lewenstein, Ian Munro

Page 2: Permuted Scaled Matching

Scaled matching

Input: Text T = t1,…,tn

Pattern P = p1,…,pm

Scaling: P[i] = p1…p1 p2…p2 … pm…pm, with each character repeated i times

Output: All text locations j where ∃ i s.t. P[i] matches at j.
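The scaling definition can be made concrete with a small brute-force sketch. The function names `scale` and `scaled_matches` are hypothetical and not from the slides; this only illustrates the definition and is not an efficient algorithm.

```python
def scale(pattern: str, i: int) -> str:
    """Return P[i]: every character of the pattern repeated i times."""
    return "".join(ch * i for ch in pattern)


def scaled_matches(text: str, pattern: str) -> list[tuple[int, int]]:
    """Brute force: all (j, i) such that P[i] occurs in text starting at j."""
    found = []
    if not pattern:
        return found
    i = 1
    while i * len(pattern) <= len(text):
        scaled = scale(pattern, i)
        start = text.find(scaled)
        while start != -1:
            found.append((start, i))
            start = text.find(scaled, start + 1)
        i += 1
    return found
```

For instance, `scale("aabc", 2)` is `"aaaabbcc"`, and that string is what `scaled_matches` searches for at scale 2.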

Page 3: Permuted Scaled Matching

Scaled matching

[Figure: a pattern P and a text T in which P scaled by 2 (bb cc aa aa) occurs.]

Page 4: Permuted Scaled Matching

Permutation matching

Input: Text T = t1,…,tn

Pattern P = p1,…,pm

Permutation (of pattern): pπ(1) pπ(2) … pπ(m), where π is a permutation on [m].

Output: All text locations j where a pattern permutation occurs.

Page 5: Permuted Scaled Matching

Permutation matching

[Figure: an example text T with an occurrence of a permutation of the pattern P highlighted.]

Page 7: Permuted Scaled Matching

Permutation matching

• Easy to solve in O(n) time (linear size alphabets).

• The pattern matching version of Jumbled Indexing.
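One common way to get the linear bound is a sliding window over character counts. The sketch below assumes hashable characters and uses a `Counter`, so the bound is expected time rather than worst case; the name `permutation_matches` is hypothetical.

```python
from collections import Counter


def permutation_matches(text: str, pattern: str) -> list[int]:
    """Start positions j where text[j:j+m] is a permutation of pattern."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # diff[c] = (count of c in the current window) - (count of c in pattern)
    diff = Counter(text[:m])
    diff.subtract(pattern)
    nonzero = sum(1 for v in diff.values() if v != 0)

    def bump(ch: str, delta: int) -> None:
        nonlocal nonzero
        before = diff[ch]
        diff[ch] = before + delta
        if before == 0:
            nonzero += 1
        elif before + delta == 0:
            nonzero -= 1

    hits = [0] if nonzero == 0 else []
    for j in range(1, n - m + 1):
        bump(text[j - 1], -1)       # character leaving the window
        bump(text[j + m - 1], +1)   # character entering the window
        if nonzero == 0:
            hits.append(j)
    return hits
```

Each shift touches two counters, so the whole scan does a constant amount of work per text position.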

Page 8: Permuted Scaled Matching

Scaled permutation matching

• Match: First Permutation and then Scaling.

Page 9: Permuted Scaled Matching

Scaled permutation matching

[Figure: a pattern P and a text T containing a scaled permutation match, i.e. a permutation of P that is then scaled.]

Page 10: Permuted Scaled Matching

Scaled permutation matching

• Match: First Permutation and then Scaling.

• B-Eres-Landau [04]: Scaled Permutation Matching in O(n) time.

• Open: Can one do the reverse efficiently, i.e. scaling first and then permutation?

• Hard?

How can we solve it? First: the naïve algorithm.

Page 11: Permuted Scaled Matching

Permuted scaled matching

Input: Text T=t1,…,tn

Pattern P=p1,…,pm

Output: All text locations j at which a permuted scaled match exists, i.e. where some substring starting at j is a permutation of P scaled by some k ≥ 1.

Page 12: Permuted Scaled Matching

Permuted scaled matching

[Figure: a pattern P, a text T, and P scaled by 2 (bb cc aa aa); a permutation of the scaled pattern occurs in T.]

Page 13: Permuted Scaled Matching

Naïve algorithm

[Figure: a 13-character text T over {a, b, c} and a 4-character pattern P (two a's, one b, one c); windows of T are compared with P scaled by k = 1 and then by k = 2.]

Page 16: Permuted Scaled Matching

Naïve algorithm

1. Construct a table R of size (n+1)×|Σ| such that R(i,j) = #σj(T[0, i]) for i ≥ 0 and R(−1, j) = 0.

2. For every 0 ≤ i < j ≤ n−1 such that j−i+1 = km for some natural number k ≥ 1 do:

a. Let r(l) = (R(j,l) − R(i−1,l)) / #σl(P).

b. If r(l) = k for each l, 0 ≤ l ≤ |Σ|−1, then announce that i is a k-scaled appearance.
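The two steps above can be sketched as follows. This is an illustrative version under a few assumptions: the name `naive_permuted_scaled` is hypothetical, the alphabet is taken to be the characters of T and P, and the ratio test r(l) = k is written as a multiplication so that characters absent from P do not cause a division by zero. Row `prefix[0]` plays the role of R(−1, ·).

```python
from collections import Counter


def naive_permuted_scaled(text: str, pattern: str) -> list[tuple[int, int, int]]:
    """Return (i, j, k) for every window T[i..j] that is a permuted k-scaling of P."""
    n, m = len(text), len(pattern)
    hits = []
    if m == 0:
        return hits
    need = Counter(pattern)
    alphabet = sorted(set(text) | set(pattern))

    # Step 1: prefix[i + 1][c] = number of occurrences of c in T[0..i].
    prefix = [dict.fromkeys(alphabet, 0)]
    for ch in text:
        row = dict(prefix[-1])
        row[ch] += 1
        prefix.append(row)

    # Step 2: try every window whose length is a multiple k*m of the pattern length.
    for i in range(n):
        for k in range(1, (n - i) // m + 1):
            j = i + k * m - 1
            if all(prefix[j + 1][c] - prefix[i][c] == k * need[c] for c in alphabet):
                hits.append((i, j, k))
    return hits
```

On the toy input T = `abab`, P = `ab` this reports the three windows of length 2 and the single window of length 4.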

Page 17: Permuted Scaled Matching

Naïve algorithm

[Figure: step-by-step construction of the prefix-count table R for the example text T (locations −1 to 12, columns a, b, c), starting from the row (0, 0, 0) at location −1. With P having #a = 2, #b = #c = 1, windows are then tested first with k = 1 and then with k = 2: a window is announced when all three ratios r(l) equal k and rejected when one of them differs.]

Page 37: Permuted Scaled Matching

Naïve algorithm

The running time is

where .

Page 38: Permuted Scaled Matching

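Better?

Properties

• Mod-Equivalency: i and j are Mod-Equivalent if for every character σ (with frequency c in P):

#σ in T[0,i] mod c = #σ in T[0,j] mod c

• Equal-Quotients: i and j have equal-quotients for characters a & b (with frequencies ca and cb in P) if:

⌊#a in T[0,i] / ca⌋ − ⌊#b in T[0,i] / cb⌋ = ⌊#a in T[0,j] / ca⌋ − ⌊#b in T[0,j] / cb⌋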

Page 39: Permuted Scaled Matching

Mod-equivalent

• Mod-Equivalency: i and j are Mod-Equivalent if for every

character σ (with frequency c in P):

#σ in T[0,i] mod c = #σ in T[0,j] mod c

Page 40: Permuted Scaled Matching

Mod-equivalent

[Figure: prefix-count table for a 13-character example text T (locations −1 to 12) over {a, b, c}, with P having #a = 2, #b = #c = 1.]

Locations 3 and 11 are mod-equivalent. Their prefix counts (#a, #b, #c) are (1, 1, 2) and (3, 3, 6), and:

• a (#a = 2): 3 mod 2 = 1 mod 2
• b (#b = 1): 3 mod 1 = 1 mod 1
• c (#c = 1): 6 mod 1 = 2 mod 1

Locations 2 and 10 are not mod-equivalent: for a (#a = 2), 3 mod 2 ≠ 0 mod 2.

For a second example text, whose prefix counts at locations 3 and 11 are (1, 1, 2) and (5, 3, 4), the two locations are again mod-equivalent: 5 mod 2 = 1 mod 2, 3 mod 1 = 1 mod 1, 4 mod 1 = 2 mod 1.

Page 52: Permuted Scaled Matching

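Equal-quotients

• Equal-Quotients: i and j have equal-quotients for characters a & b (with frequencies ca and cb in P) if:

⌊#a in T[0,i] / ca⌋ − ⌊#b in T[0,i] / cb⌋ = ⌊#a in T[0,j] / ca⌋ − ⌊#b in T[0,j] / cb⌋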

Page 53: Permuted Scaled Matching

Equal-quotients

[Figure: prefix-count tables for two 13-character example texts (locations −1 to 12) over {a, b, c}, with P having #a = 2, #b = #c = 1.]

For the text whose prefix counts (#a, #b, #c) at locations 3 and 11 are (1, 1, 2) and (5, 3, 4), the two locations have equal quotients:

• a & b: ⌊5/2⌋ − ⌊3/1⌋ = ⌊1/2⌋ − ⌊1/1⌋
• b & c: ⌊3/1⌋ − ⌊4/1⌋ = ⌊1/1⌋ − ⌊2/1⌋

For the text whose prefix counts at locations 3 and 11 are (1, 1, 2) and (3, 3, 6), the two locations do not have equal quotients for a & b:

⌊3/2⌋ − ⌊3/1⌋ ≠ ⌊1/2⌋ − ⌊1/1⌋

[Figure: a 16-character text over {a, b} (locations −1 to 15) and a pattern with #a = #b = 3. The prefix counts (#a, #b) are (3, 1) at location 3 and (10, 6) at location 15, and the two locations have equal quotients: ⌊10/3⌋ − ⌊6/3⌋ = ⌊3/3⌋ − ⌊1/3⌋]

Page 63: Permuted Scaled Matching

Theorem

T[i + 1, j] is a permuted k-scaling of P for some k iff

1. Locations i and j of T are mod-equivalent

2. Locations i and j of T satisfy the equal-quotients property for each pair of characters
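A small sketch can make the two conditions tangible. It reads "location i" as the prefix T[0, i], so the window in question is T[i+1, j]. It assumes the text only uses characters of P, and it tests equal quotients on adjacent character pairs only, which implies the property for every pair. The names `location_vector` and `is_permuted_scaling`, and the two example texts (chosen to reproduce the prefix counts (1, 1, 2), (5, 3, 4) and (3, 3, 6) at locations 3 and 11), are illustrative assumptions.

```python
from collections import Counter


def location_vector(text: str, pattern: str, i: int) -> tuple:
    """Remainders and adjacent quotient differences of the counts in T[0..i]."""
    need = Counter(pattern)
    have = Counter(text[: i + 1])          # i = -1 gives the empty prefix
    chars = sorted(need)
    rem = tuple(have[c] % need[c] for c in chars)
    quo = [have[c] // need[c] for c in chars]
    quo_diff = tuple(quo[t] - quo[t + 1] for t in range(len(chars) - 1))
    return rem, quo_diff


def is_permuted_scaling(text: str, pattern: str, i: int, j: int) -> bool:
    """Direct check: is T[i+1..j] a permutation of P scaled by some k >= 1?"""
    length, m = j - i, len(pattern)
    if length <= 0 or length % m:
        return False
    k = length // m
    need = Counter(pattern)
    window = Counter(text[i + 1 : j + 1])
    return all(window[c] == k * need[c] for c in need)


pattern = "aabc"
good = "ccbaacbacabab"   # illustrative text: locations 3 and 11 satisfy both properties
bad = "ccbaacbaccbcb"    # illustrative text: mod-equivalent at 3 and 11, quotients differ

same = location_vector(good, pattern, 3) == location_vector(good, pattern, 11)
differ = location_vector(bad, pattern, 3) != location_vector(bad, pattern, 11)
```

Here `same` and `differ` are both true, and `is_permuted_scaling` agrees: T[4, 11] is a 2-scaled permutation of P in the first text but not in the second.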

Page 64: Permuted Scaled Matching

[Figure: the vectors kept for locations i and j. Each vector has a Mod-Equivalent part with one entry per character (a, b, c, d, e, f) and an Equal-quotients part with one entry per adjacent pair (a−b, b−c, c−d, d−e, e−f).]

Page 66: Permuted Scaled Matching

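[Figure: an example text T and pattern P over {a, b, c}; the vectors at locations 2 and 8, with Mod-Equivalent entries a, b, c and Equal-quotients entries a−b, b−c, are identical.]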

Page 67: Permuted Scaled Matching

Putting it together

Page 68: Permuted Scaled Matching

[Figure: the vector layout for locations i and j, with a Mod-Equivalent part (a, b, c, d, e, f) and an Equal-quotients part (a−b, b−c, c−d, d−e, e−f).]

Build a table R of size n×(2|Σ|+1)

Page 69: Permuted Scaled Matching

Each vector is associated with its location i

Page 71: Permuted Scaled Matching

Sort the vectors using Radix sort

Page 72: Permuted Scaled Matching

Group the vectors into equivalence classes according to their prefix of length 2|Σ|−1.

Page 73: Permuted Scaled Matching

For each equivalence class containing locations i1, i2, …, il, announce appearances T[i + 1, j] for each i, j ∈ {i1, i2, …, il} s.t. i < j.

Page 74: Permuted Scaled Matching

Putting it all together

Algorithm:

1. Build a table R of size n×(2|Σ|+1).

2. For every 0 ≤ i ≤ n−1:

For every 0 ≤ j ≤ |Σ|−1: R(i,j) = #σj(T[0, i]) mod #σj(P)

For every |Σ| ≤ j ≤ 2|Σ|−1:

Page 75: Permuted Scaled Matching

Putting it together

3. Each vector is associated with its location i.

4. Sort the vectors using Radix sort.

5. Group the vectors into equivalence classes according to their prefix of length 2|Σ|−1.

6. For each equivalence class containing locations i1, i2, …, il, announce appearances T[i + 1, j] for each i, j ∈ {i1, i2, …, il} s.t. i < j.
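The steps can be sketched as below, with one deliberate substitution: a dictionary keyed by the vector stands in for the radix sort and grouping of steps 4 and 5. The name `permuted_scaled_all` is hypothetical. The `segment` counter is an added assumption for texts containing characters that do not occur in P: no match can span such a character, so locations on either side of it are kept in separate classes.

```python
from collections import Counter, defaultdict
from itertools import combinations


def permuted_scaled_all(text: str, pattern: str) -> list[tuple[int, int]]:
    """All (start, end) such that T[start..end] is a permuted scaling of P."""
    need = Counter(pattern)
    chars = sorted(need)
    have = dict.fromkeys(chars, 0)

    def vector() -> tuple:
        rem = tuple(have[c] % need[c] for c in chars)
        quo = [have[c] // need[c] for c in chars]
        return rem + tuple(quo[t] - quo[t + 1] for t in range(len(chars) - 1))

    classes = defaultdict(list)   # (segment, vector) -> locations in increasing order
    segment = 0
    classes[(segment, vector())].append(-1)
    for i, ch in enumerate(text):
        if ch in need:
            have[ch] += 1
        else:
            segment += 1          # character outside P: start a fresh segment
        classes[(segment, vector())].append(i)

    hits = []
    for locations in classes.values():
        hits.extend((i + 1, j) for i, j in combinations(locations, 2))
    return sorted(hits)
```

Building the vectors this way costs O(|Σ|) per location; the pair enumeration at the end is what contributes the occ term.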

Page 76: Permuted Scaled Matching

Theorem

• The running time of the permuted scaled matching algorithm is:

O(n|Σ|+occ).

Page 77: Permuted Scaled Matching

Output representation

• The output of the algorithm, whose size we denoted occ, may be as large as O(n²/m).

• Example:
o Text a^n.
o Pattern a^m.
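For this example the count can be written down directly: every substring of a^n whose length is a positive multiple of m is an appearance, and there are n − km + 1 substrings of length km. A short sketch (the function name is hypothetical):

```python
def count_appearances_unary(n: int, m: int) -> int:
    """Number of substrings of a^n whose length is a positive multiple of m."""
    return sum(n - k * m + 1 for k in range(1, n // m + 1))
```

For n = 1000 and m = 10 this gives 49600, close to n²/(2m) = 50000, which is consistent with the O(n²/m) bound.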

Page 78: Permuted Scaled Matching

Output representation

• To reduce the large number of appearances, set the output to the shortest match at each text location i.

[Figure: an example text T and pattern P, with the shortest match at a text location marked.]

Page 80: Permuted Scaled Matching

Claim

• Let i < j < h be three text locations.
• Assume T[i, j] is a permuted scaled appearance of P.
• Then T[i, h] is a permuted scaled appearance of P iff T[j + 1, h] is a permuted scaled appearance of P.

[Figure: an example text T and pattern P illustrating the claim with two adjacent appearances.]

Page 83: Permuted Scaled Matching

Putting it all together

Algorithm:

1. Build a table R of size n×(2|Σ|+1).

2. For every 0 ≤ i ≤ n−1:

o For every 0 ≤ j ≤ |Σ|−1: R(i,j) = #σj(T[0, i]) mod #σj(P)

o For every |Σ| ≤ j ≤ 2|Σ|−1:

Page 84: Permuted Scaled Matching

Putting it together

3. Each vector is associated with its location i.

4. Sort the vectors using Radix sort.

5. Group the vectors into equivalence classes according to their prefix of length 2|Σ|−1.

6. For each entry q’ containing the linked list i1, i2, …, il, announce the appearance T[i_r + 1, i_(r+1)] for each pair of consecutive locations i_r, i_(r+1) in the list.
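With the shortest-match output, only consecutive locations of each class are reported, so a single left-to-right pass that remembers the most recent location per vector is enough. The sketch below again swaps the radix sort for a dictionary; the name `shortest_permuted_scaled` and the `segment` handling of characters outside P are assumptions for illustration.

```python
from collections import Counter


def shortest_permuted_scaled(text: str, pattern: str) -> dict[int, int]:
    """Map each start position to the end of its shortest permuted scaled match."""
    need = Counter(pattern)
    chars = sorted(need)
    have = dict.fromkeys(chars, 0)
    segment = 0

    def key() -> tuple:
        rem = tuple(have[c] % need[c] for c in chars)
        quo = [have[c] // need[c] for c in chars]
        diff = tuple(quo[t] - quo[t + 1] for t in range(len(chars) - 1))
        return (segment, rem, diff)

    last_seen = {key(): -1}       # vector -> most recent location carrying it
    shortest = {}
    for j, ch in enumerate(text):
        if ch in need:
            have[ch] += 1
        else:
            segment += 1          # no match may span a character outside P
        k = key()
        if k in last_seen:
            shortest[last_seen[k] + 1] = j
        last_seen[k] = j
    return shortest
```

The result has at most one entry per text position, so the output size no longer dominates the running time.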

Page 85: Permuted Scaled Matching

Running Time

• Permuted Scaled Matching: The running time is:

O(n|Σ|).

Page 86: Permuted Scaled Matching

For efficiency

• Need to generate the vectors quickly.

• Need to compare vectors quickly.

Idea: hash

Page 87: Permuted Scaled Matching

• Need hash on vectors that can be modified quickly if vector changes very little.

• Use: hash – similar to Karp-Rabin
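One way such a hash could work is a polynomial fingerprint, h(v) = Σ v[t]·B^t mod p, which can be corrected in constant time whenever a single entry of the vector changes. The sketch below is an assumption about the flavour of hashing meant here, not the authors' construction: the modulus, the random base, the name `fingerprint_stream`, and the restriction to texts that only use characters of P are all illustrative choices.

```python
import random
from collections import Counter

MOD = (1 << 61) - 1  # a Mersenne prime, chosen here for illustration


def fingerprint_stream(text: str, pattern: str, seed: int = 0) -> list[int]:
    """fingerprints[i + 1] is the hash of the vector at location i (index 0 is location -1)."""
    need = Counter(pattern)
    chars = sorted(need)
    index = {c: t for t, c in enumerate(chars)}
    s = len(chars)
    base = random.Random(seed).randrange(2, MOD - 1)
    power = [pow(base, t, MOD) for t in range(2 * s - 1)]
    # vec[0..s-1]: remainders; vec[s..2s-2]: adjacent quotient differences
    vec = [0] * (2 * s - 1)
    h = 0

    def add(pos: int, delta: int) -> None:
        nonlocal h
        vec[pos] += delta
        h = (h + delta * power[pos]) % MOD

    fingerprints = [h]
    for ch in text:
        t = index[ch]                 # assumes ch occurs in the pattern
        if vec[t] + 1 < need[ch]:
            add(t, 1)                 # only the remainder changes
        else:
            add(t, -vec[t])           # remainder wraps to 0, quotient grows by 1
            if t < s - 1:
                add(s + t, 1)         # difference q_t - q_(t+1)
            if t > 0:
                add(s + t - 1, -1)    # difference q_(t-1) - q_t
        fingerprints.append(h)
    return fingerprints
```

Equal vectors always receive equal fingerprints; distinct vectors may collide, but only with small probability over the choice of base, which is why a scheme of this kind would yield a randomized rather than deterministic bound.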

Page 88: Permuted Scaled Matching

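[Figure: the vectors at locations i and i+1. Moving from i to i+1 causes at most 1 change in the Mod-Equivalent part (a, b, c, d, e, f) and at most 2 changes in the Equal-quotients part (a−b, b−c, c−d, d−e, e−f).]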

Page 89: Permuted Scaled Matching

[Figure: an example text T and pattern P over {a, b, c}, showing the vector (Mod-Equivalent entries a, b, c and Equal-quotients entries a−b, b−c) being updated from one location to the next.]

Page 91: Permuted Scaled Matching

• The running time can be improved to:

o Deterministic O(n log |Σ|)

o Randomized O(n)

Page 92: Permuted Scaled Matching