Sorting suffixes of two-pattern strings
![Page 1: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/1.jpg)
Sorting suffixes of two-pattern strings
F. Franek & W.F. Smyth
Algorithms Research Group
Computing and Software
McMaster University
Hamilton, Ontario
Canada
PSC04, Praha, Czech Republic, August-September 2004
Slide 1
![Page 2: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/2.jpg)
In 2003 several very different linear-time (recursive) algorithms to sort suffixes of strings appeared. All work in four basic steps:
1. Split all suffixes into two sets
2. Sort the first set of suffixes by recursion (recursive reduction of the problem)
3. Sort the second set of suffixes based on the order of the first set
4. Merge both sorted sets together
Slide 2
![Page 3: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/3.jpg)
Our question: will two-pattern strings exhibit a “natural” tendency to reduce the problem in a recursive fashion?
Two-pattern strings were introduced by us as a generalization of Sturmian (and hence Fibonacci) strings.
Let p, q be binary strings. σ = [p,q,i,j]_λ is an expansion of scope λ if |p|, |q| ≤ λ and i, j are non-negative integers with i ≠ j. We require p and q to be dissimilar enough to be efficiently recognizable (see the paper for the details).
Slide 3
![Page 4: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/4.jpg)
Slide 4
σ(a) = p^i q, σ(b) = p^j q, σ(x[1..n]) = σ(x[1])σ(x[2..n])
σ1∘σ2(x) = σ1(σ2(x))
x is a two-pattern string of scope λ iff there is a sequence σ1, σ2, ..., σn of expansions of scope λ such that x = σ1∘σ2∘…∘σn(a)
The "nice" properties of two-pattern strings (see a series of papers by Franek, Smyth and others):
• can be recognized in linear time
![Page 5: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/5.jpg)
Slide 5
• when recognized, the canonical expansion sequence is computed
• repetitions and near repetitions can be effectively computed in linear time using a recursive approach
• generalize finite fragments of the Fibonacci string and Sturmian strings
• can easily be generated and represented in recursive fashion
• exhibit rich yet comprehensible recursive structure
![Page 6: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/6.jpg)
Slide 6
• they occur relatively frequently among binary strings
An illustration of a very simple two-pattern string, used later to illustrate the workings of the algorithm:
[a,b,2,3] applied to a: a → aab
[ba,ab,1,2] applied to aab: aab → baab baab babaab
baabbaabbabaab is a two-pattern string of scope 2
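The two expansion steps above can be reproduced with a short sketch. The function names `apply_expansion` and `two_pattern_string` are illustrative only, and the sketch does not check the scope bound or the dissimilarity requirement on p and q.

```python
def apply_expansion(p, q, i, j, x):
    """Apply sigma = [p, q, i, j] letter by letter: a -> p^i q, b -> p^j q."""
    image = {"a": p * i + q, "b": p * j + q}
    return "".join(image[c] for c in x)


def two_pattern_string(expansions):
    """Compute sigma_1 o sigma_2 o ... o sigma_n (a).

    The last expansion in the list is applied first, as in the
    composition sigma_1(sigma_2(...)).
    """
    x = "a"
    for p, q, i, j in reversed(expansions):
        x = apply_expansion(p, q, i, j, x)
    return x


x = apply_expansion("a", "b", 2, 3, "a")   # first step of the illustration
y = apply_expansion("ba", "ab", 1, 2, x)   # second step
print(x, y)  # aab baabbaabbabaab
```

Calling `two_pattern_string([("ba", "ab", 1, 2), ("a", "b", 2, 3)])` yields the same string in one call.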
![Page 7: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/7.jpg)
Slide 7
Now we can rephrase our question:
Given an expansion σ and knowing the order of suffixes of a two-pattern string x, can we efficiently determine the order of suffixes of σ(x)?
The answer is yes and in the following we describe the algorithm.
So let x be a two-pattern string of scope λ, let σ = [p,q,i,j]_λ be an expansion, and let y = σ(x). Let ρ1 < ρ2 < … < ρ_{|x|} be the sorted suffixes of x.
We assume that q ≺ p, since then x1 < x2 iff σ(x1) < σ(x2); otherwise we work with complements and reverse the resulting order of suffixes while taking complements of the suffixes.
![Page 8: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/8.jpg)
Slide 8
First we assign all suffixes of y into various buckets:
A_{δ,k} = {δ p^k q σ(ρ) : ρ is a proper suffix of x or ρ = ε}, where δ is a suffix of p and 0 < k < i
![Page 9: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/9.jpg)
Slide 9
A_{δ,i} = {δ p^i q σ(ρ) : ρ is a proper suffix of x or ρ = ε}, where δ is a suffix of p and also a suffix of q
or
A_{δ,i} = {δ p^i q σ(ρ) : bρ is a proper suffix of x, ρ can be ε}, where δ is a suffix of p and not a suffix of q
![Page 10: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/10.jpg)
Slide 10
A_{δ,k} = {δ p^k q σ(ρ) : bρ is a proper suffix of x, ρ can be ε}, where δ is a suffix of p and i < k < j
B_δ = {δ q σ(ρ) : ρ is a proper nontrivial suffix of x}, where δ is a suffix of p
![Page 11: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/11.jpg)
Slide 11
C_δ = {δ p^i q σ(ρ) : aρ is a proper suffix of x, ρ can be ε}, where δ is a suffix of q but not of p
D_δ = {δ p^j q σ(ρ) : bρ is a proper suffix of x, ρ can be ε}, where δ is a suffix of q
![Page 12: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/12.jpg)
Slide 12
E = {δ : δ is a nontrivial suffix of p or q}
• All suffixes are covered by A-E!
• The order of suffixes within buckets A-D is determined by ρ!
• A-D buckets are order invariant!
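As a rough illustration of how every suffix of y = σ(x) lands in one of the buckets A-E, the sketch below walks through y block by block and names the bucket of each starting position. It classifies by position inside the image of each letter rather than by testing the set definitions directly, and it treats E as the suffixes that are not followed by any σ(ρ); both choices, the function name `assign_buckets`, and the bucket-name strings are assumptions made for this example.

```python
from collections import defaultdict


def assign_buckets(x, p, q, i, j):
    """Map bucket names to the 1-based start positions of suffixes of y = sigma(x)."""
    exponent = {"a": i, "b": j}
    buckets = defaultdict(list)
    start = 1  # 1-based position in y where the image of x[t] begins
    for t, letter in enumerate(x):
        e = exponent[letter]
        rho = x[t + 1:]  # the part of x whose image follows this block
        # Start positions inside the run p^e: the suffix reads delta p^k q sigma(rho).
        for m in range(e):
            for r in range(len(p)):
                delta, k = p[r:], e - m - 1
                if k > 0:
                    name = f"A_{delta},{k}"
                elif rho:
                    name = f"B_{delta}"
                else:
                    name = "E"  # delta q at the very end of y
                buckets[name].append(start + m * len(p) + r)
        # Start positions inside q: delta is a suffix of q, followed by the next block.
        for r in range(len(q)):
            delta = q[r:]
            if not rho:
                name = "E"
            elif rho[0] == "b":
                name = f"D_{delta}"
            elif p.endswith(delta):
                name = f"A_{delta},{i}" if i > 0 else f"B_{delta}"
            else:
                name = f"C_{delta}"
            buckets[name].append(start + e * len(p) + r)
        start += e * len(p) + len(q)
    return dict(buckets)


print(assign_buckets("aab", "ba", "ab", 1, 2))
```

For x = aab and σ = [ba,ab,1,2] this assigns each of the 14 suffix positions of y to exactly one bucket.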
![Page 13: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/13.jpg)
Slide 13
So, if we can determine the order of buckets, we can determine the order of all suffixes in buckets A-D. Merging in the suffixes from E is easy (brute force requires only ≤ 4λ²|y| steps).
The main result is based on the fact that the order of buckets A-D can be efficiently determined using 5 cases:
(C1) δ1 ≺ δ2
(C2) δ2 ≺ δ1
(C3) δ1 is a proper prefix of δ2
(C4) δ2 is a proper prefix of δ1
(C5) δ1 = δ2 = δ
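The five cases can be told apart with a few string tests. The sketch below assumes that the relation in (C1) and (C2) means "lexicographically smaller, with neither string a prefix of the other", which is what makes the five cases mutually exclusive; the function name `classify` is hypothetical.

```python
def classify(delta1, delta2):
    """Return which of the cases C1-C5 applies to the pair (delta1, delta2)."""
    if delta1 == delta2:
        return "C5"
    if delta2.startswith(delta1):
        return "C3"  # delta1 is a proper prefix of delta2
    if delta1.startswith(delta2):
        return "C4"  # delta2 is a proper prefix of delta1
    # Neither is a prefix of the other, so the first mismatch decides.
    return "C1" if delta1 < delta2 else "C2"


for pair in [("a", "ba"), ("ba", "a"), ("a", "ab"), ("ba", "b"), ("ab", "ab")]:
    print(pair, classify(*pair))
```

Each test inspects at most min(|δ1|, |δ2|) ≤ λ characters, so the case is found in O(λ) steps.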
![Page 14: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/14.jpg)
Slide 14
(C1) A_{δ1,k1} < A_{δ2,k2}
(C2) A_{δ2,k2} < A_{δ1,k1}
(C3) δ2 = δ1μ
(a) if μ ≺ p, then A_{δ2,k2} < A_{δ1,k1}
(b) otherwise A_{δ1,k1} < A_{δ2,k2}
(C4) δ1 = δ2μ
(a) if μ ≺ p, then A_{δ1,k1} < A_{δ2,k2}
(b) otherwise A_{δ2,k2} < A_{δ1,k1}
(C5) (a) if k1 < k2, then A_{δ,k1} < A_{δ,k2}
(b) if k1 = k2, then A_{δ,k1} = A_{δ,k2}
(c) if k1 > k2, then A_{δ,k2} < A_{δ,k1}
![Page 15: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/15.jpg)
Slide 15
(C1) A_{δ1,k} < B_{δ2}
(C2) B_{δ2} < A_{δ1,k}
(C3) δ2 = δ1μ
(a) if μ ≺ p, then B_{δ2} < A_{δ1,k}
(b) otherwise A_{δ1,k} < B_{δ2}
(C4) δ1 = δ2μ
(a) if μp^k q ≺ pq, then A_{δ1,k} < B_{δ2}
(b) otherwise B_{δ2} < A_{δ1,k}
(C5) B_δ < A_{δ,k}
No bucket comparison requires more than 3λ steps.
![Page 16: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/16.jpg)
Slide 16
Similarly A~C, A~D, B~B, B~C, B~D, C~C, and C~D. One more example:
(C1) B_{δ1} < B_{δ2}
(C2) B_{δ2} < B_{δ1}
(C3) δ2 = δ1μ
(a) if μqp ≺ qp, then B_{δ2} < B_{δ1}
(b) otherwise B_{δ1} < B_{δ2}
(C4) δ1 = δ2μ
(a) if μqp ≺ qp, then B_{δ1} < B_{δ2}
(b) otherwise B_{δ2} < B_{δ1}
(C5) B_{δ1} = B_{δ2}
![Page 17: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/17.jpg)
The high-level logic of the algorithm:
1. Create names (A,δ) for every suffix δ of p. (This requires at most λ steps. Each name will eventually be replaced by a sequence of buckets.)
2. Sort the names according to the comparisons of the four A buckets (according to (C1)-(C4)). (This requires at most 2λ³ steps as we are sorting λ names and each comparison requires at most 2λ steps.)
3. Replace every name (A,δ) by a sequence of names (A,δ,k), 0 < k < j. Let us call the resulting sequence BUCKETS. (Now we have the names of A buckets in the proper order. This requires at most |y| steps as the size of BUCKETS is ≤ |y|.)
Slide 17
![Page 18: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/18.jpg)
Slide 18
4. Create names (B,δ) for every suffix δ of p. (This requires at most λ steps.)
5. Merge into BUCKETS all names (B,δ) according to comparisons. (This requires at most 3λ²|BUCKETS| steps, as we are merging in λ names and each comparison requires ≤ 3λ steps.)
![Page 19: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/19.jpg)
Slide 19
6. Create names (C,δ) for every suffix δ of q that is not a suffix of p. (This requires at most λ² steps.)
7. Merge into BUCKETS all names (C,δ) according to comparisons. (This requires at most 3λ²|BUCKETS| steps.)
8. Create names (D,δ) for every suffix δ of q. (This requires at most λ steps.)
9. Merge into BUCKETS all names (D,δ) according to comparisons. (Now we have all required bucket names, except E, in proper order. This requires at most 3λ²|BUCKETS| steps.)
![Page 20: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/20.jpg)
Slide 20
10. Traverse BUCKETS and replace each name by a sequence of suffixes according to the sequence of suffixes of x. Let us call this sequence SUFFIXES. (Now we have all suffixes from buckets A-D in proper order. This requires at most |y| steps.)
11. Merge into SUFFIXES the suffixes from the bucket E. (This requires at most 4λ²|y| steps.)
Done in less than (2λ³+14λ²+3λ+2)|y| steps!
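The stated bound can be checked against the per-step costs quoted in steps 1-11. Using |BUCKETS| ≤ |y|, the costs add up to 2λ³ + λ² + 3λ + (13λ² + 2)|y|, which does not exceed (2λ³+14λ²+3λ+2)|y| whenever |y| ≥ 1. The sketch below only re-adds the numbers given on the slides (reading the cost of step 6 as λ²); the function names are illustrative.

```python
def step_costs(lam, y_len):
    """Per-step upper bounds for steps 1-11, with |BUCKETS| replaced by |y|."""
    buckets = y_len  # the slides state |BUCKETS| <= |y|
    return [
        lam,                    # 1. create names (A, delta)
        2 * lam**3,             # 2. sort the A names
        y_len,                  # 3. expand names into BUCKETS
        lam,                    # 4. create names (B, delta)
        3 * lam**2 * buckets,   # 5. merge the B names
        lam**2,                 # 6. create names (C, delta)
        3 * lam**2 * buckets,   # 7. merge the C names
        lam,                    # 8. create names (D, delta)
        3 * lam**2 * buckets,   # 9. merge the D names
        y_len,                  # 10. expand BUCKETS into SUFFIXES
        4 * lam**2 * y_len,     # 11. merge in bucket E
    ]


def stated_bound(lam, y_len):
    """The bound quoted on the slide: (2*lam^3 + 14*lam^2 + 3*lam + 2) * |y|."""
    return (2 * lam**3 + 14 * lam**2 + 3 * lam + 2) * y_len


print(sum(step_costs(2, 14)), stated_bound(2, 14))
```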
![Page 21: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/21.jpg)
Slide 21
The algorithm works in 2(2λ³+14λ²+3λ+2)n steps, where n is the size of the input string.
An example:
x = aab$ (positions 1 2 3)
y = baabbaabbabaab$ (positions 1 2 3 4 5 6 7 8 9 10 11 12 13 14)
σ = [ba,ab,1,2]
ordered suffixes of x: 1 2 3
ordered suffixes of y: 12 2 6 13 10 3 7 14 11 1 5 9 4 8
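The two suffix orders in this example can be confirmed with a brute-force sort, which is useful as a reference when testing the bucket-based method. The sketch omits the sentinel $ and relies on Python's string comparison ranking a proper prefix before any of its extensions, which has the same effect as a sentinel smaller than both letters; `sorted_suffixes` is an illustrative name.

```python
def sorted_suffixes(s):
    """Return the 1-based start positions of the suffixes of s in lexicographic order."""
    return sorted(range(1, len(s) + 1), key=lambda pos: s[pos - 1:])


print(sorted_suffixes("aab"))             # [1, 2, 3]
print(sorted_suffixes("baabbaabbabaab"))  # [12, 2, 6, 13, 10, 3, 7, 14, 11, 1, 5, 9, 4, 8]
```

This baseline takes quadratic time in the worst case, so it serves only as a check, not as a substitute for the algorithm described here.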
![Page 22: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/22.jpg)
Slide 22
A_{ba,1} = {babaab σ(ρ) : bρ proper suffix of x, ρ can be ε} = {babaab} = {9}
A_{a,1} = {abaab σ(ρ) : bρ proper suffix of x, ρ can be ε} = {abaab} = {10}
B_{ba} = {baab σ(ρ) : ρ proper suffix of x} = {baab σ(ab), baab σ(b)} = {baabbaabbabaab, baabbabaab} = {1,5}
B_{a} = {aab σ(ρ) : ρ proper suffix of x} = {aab σ(ab), aab σ(b)} = {aabbaabbabaab, aabbabaab} = {2,6}
C_{ab} = {abbaab σ(ρ) : aρ proper suffix of x} = {abbaab σ(b)} = {abbaabbabaab} = {3}
C_{b} = {bbaab σ(ρ) : aρ proper suffix of x} = {bbaab σ(b)} = {bbaabbabaab} = {4}
![Page 23: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/23.jpg)
Slide 23
D_{ab} = {abbabaab σ(ρ) : bρ proper suffix of x, ρ can be ε} = {abbabaab} = {7}
D_{b} = {bbabaab σ(ρ) : bρ proper suffix of x, ρ can be ε} = {bbabaab} = {8}
E = {baab, aab, ab, b} = {11, 12, 13, 14}
A_{ba,1} > A_{a,1} (by C2)
A_{ba,1} > B_{ba} (by C5)
A_{ba,1} > B_{a} (by C2)
A_{ba,1} > C_{ab} (by C2)
A_{ba,1} < C_{b} (by C4a)
![Page 24: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/24.jpg)
Slide 24
A_{ba,1} > D_{ab} (by C2)
A_{ba,1} < D_{b} (by C4a)
A_{a,1} < B_{ba} (by C1)
A_{a,1} > B_{a} (by C5)
A_{a,1} < C_{ab} (by C3b)
A_{a,1} < C_{b} (by C1)
A_{a,1} < D_{ab} (by C3b)
A_{a,1} < D_{b} (by C1)
![Page 25: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/25.jpg)
Slide 25
B_{ba} > B_{a} (by C2)
B_{ba} > C_{ab} (by C2)
B_{ba} < C_{b} (by C4a)
B_{ba} > D_{ab} (by C2)
B_{ba} < D_{b} (by C4a)
B_{a} < C_{ab} (by C3b)
B_{a} < C_{b} (by C1)
B_{a} < D_{ab} (by C3b)
B_{a} < D_{b} (by C1)
![Page 26: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/26.jpg)
Slide 26
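C_{ab} < C_{b} (by C1)
C_{ab} < D_{ab} (by C5)
C_{ab} < D_{b} (by C1)
C_{b} > D_{ab} (by C2)
C_{b} < D_{b} (by C5)
D_{ab} < D_{b} (by C1)
Resulting order of buckets A-D and of their suffixes:
B_{a} < A_{a,1} < C_{ab} < D_{ab} < B_{ba} < A_{ba,1} < C_{b} < D_{b}
2 6 | 10 | 3 | 7 | 1 5 | 9 | 4 | 8
Suffixes from E, in sorted order, to be merged in: 12 13 14 11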
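The last two steps of the algorithm (expanding the ordered bucket names into suffixes, then merging in bucket E) can be sketched for this example as follows. The bucket contents and bucket order are copied from the slides, with each bucket's members listed in the order induced by ρ. The merge compares whole suffixes by brute force instead of using the bounded comparison of the slides, and the names `assemble`, `buckets` and `bucket_order` are illustrative.

```python
y = "baabbaabbabaab"

# Bucket contents from the worked example, members ordered by their rho.
buckets = {
    "B_a": [2, 6], "A_a,1": [10], "C_ab": [3], "D_ab": [7],
    "B_ba": [1, 5], "A_ba,1": [9], "C_b": [4], "D_b": [8],
}
bucket_order = ["B_a", "A_a,1", "C_ab", "D_ab", "B_ba", "A_ba,1", "C_b", "D_b"]
bucket_e = [11, 12, 13, 14]


def assemble(y, buckets, bucket_order, bucket_e):
    """Concatenate buckets A-D in order, then insert each suffix of E in place."""
    suffixes = [pos for name in bucket_order for pos in buckets[name]]
    for pos in bucket_e:
        target = y[pos - 1:]
        k = 0
        # Linear scan for the first suffix that is not smaller than the target.
        while k < len(suffixes) and y[suffixes[k] - 1:] < target:
            k += 1
        suffixes.insert(k, pos)
    return suffixes


print(assemble(y, buckets, bucket_order, bucket_e))
# [12, 2, 6, 13, 10, 3, 7, 14, 11, 1, 5, 9, 4, 8]
```

The result agrees with the ordered suffixes of y listed at the start of the example.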
![Page 27: Sorting suffixes of two-pattern strings](https://reader036.fdocuments.in/reader036/viewer/2022062408/568143a5550346895db02819/html5/thumbnails/27.jpg)
Slide 27
www.cas.mcmaster.ca/~franek