Faster 2-Dimensional Scaled Matching

Amihood Amir and Eran Chencinski


Transcript of Faster 2-Dimensional Scaled Matching

Page 1: Faster 2-Dimensional  Scaled Matching

Faster 2-Dimensional Scaled Matching

Amihood Amir and Eran Chencinski

Page 2: Faster 2-Dimensional  Scaled Matching

Real Scaling

Given an n x n text T and an m x m pattern P, find all occurrences of P in T, scaled to any real scale.

Best known algorithm [Amir et al.]:
  Time: O(nm^3 + n^2 m log m)
  Space: O(nm^3 + n^2)

Our algorithm:
  Time: O(n^2 m)
  Space: O(n^2)

Page 3: Faster 2-Dimensional  Scaled Matching

Scaling – Geometric Definition

Page 4: Faster 2-Dimensional  Scaled Matching

Scaling – Algebraic Definition

Rounding function ||x||: rounds the real number x to an integer.

Page 5: Faster 2-Dimensional  Scaled Matching

Scaling – Algebraic Definition

Given a pattern P of size m x m and a scale r:
- The first row is scaled to ||1*r|| rows
- The first 2 rows are scaled to ||2*r|| rows
- ...
- The first m rows are scaled to ||m*r|| rows

Similarly for the columns.
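The row-by-row definition above can be turned into a small sketch. The rounding formula itself is not preserved in this transcript, so the code assumes ||x|| means "round to nearest, halves up"; `round_half_up`, `scaled_boundaries`, and `scale_pattern` are illustrative names, not taken from the slides.

```python
import math

def round_half_up(x):
    # ASSUMPTION: ||x|| rounds to the nearest integer, with .5 rounding up.
    return math.floor(x + 0.5)

def scaled_boundaries(m, r):
    # boundaries[k] = ||k*r||: how many scaled rows the first k rows occupy.
    return [round_half_up(k * r) for k in range(m + 1)]

def scale_pattern(pattern, r):
    """Scale a square pattern (list of lists) by a real factor r >= 1."""
    m = len(pattern)
    b = scaled_boundaries(m, r)
    scaled = []
    for i in range(m):
        # Column j is repeated b[j+1] - b[j] times.
        row = []
        for j in range(m):
            row.extend([pattern[i][j]] * (b[j + 1] - b[j]))
        # Row i is repeated b[i+1] - b[i] times.
        for _ in range(b[i + 1] - b[i]):
            scaled.append(list(row))
    return scaled

# A 2 x 2 pattern scaled by 1.5: ||1.5|| = 2 and ||3|| = 3,
# so the first row/column is doubled and the second is kept once.
for line in scale_pattern([["a", "b"], ["c", "d"]], 1.5):
    print("".join(line))
# prints:
# aab
# aab
# ccd
```

Note that rows are not scaled independently: row i receives ||i*r|| - ||(i-1)*r|| copies, so different rows can be repeated a different number of times under the same scale.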

Page 6: Faster 2-Dimensional  Scaled Matching

Scaling – Algebraic Definition

Rounding Function:

Inverse Rounding Function: suppose we know that K rows were scaled to L rows:
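The formula that followed on the slide is not preserved in this transcript. As a hedged sketch of what an inverse rounding function could compute: if ||x|| is assumed to round to the nearest integer with halves up, then ||K*r|| = L holds exactly when L - 1/2 <= K*r < L + 1/2, which gives a half-open interval of feasible scales. The name `scale_interval` is hypothetical.

```python
from fractions import Fraction

def scale_interval(k, l):
    """Return (lo, hi) such that floor(k*r + 1/2) == l iff lo <= r < hi.

    ASSUMPTION: ||x|| is round-half-up; the slides' exact formula is unknown.
    Exact rationals are used to avoid floating-point edge cases.
    """
    lo = Fraction(2 * l - 1, 2 * k)   # (l - 1/2) / k
    hi = Fraction(2 * l + 1, 2 * k)   # (l + 1/2) / k
    return lo, hi

# If 2 rows were scaled to 3 rows, the scale lies in [5/4, 7/4).
lo, hi = scale_interval(2, 3)
print(lo, hi)  # prints: 5/4 7/4
```

Each observation "K rows became L rows" narrows the set of possible scales, so intersecting such intervals is one natural way to rule scales out.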

Page 7: Faster 2-Dimensional  Scaled Matching

Subrow/column Repetition Query

Query time: O(1), preprocessing time: O(n^2)
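The slide names the query without defining it. Under one simple reading (an assumption, not necessarily the authors' definition), the query asks whether k consecutive cells of a row, starting at column j, all hold the same symbol. That reading can be answered in O(1) after O(n^2) preprocessing with a run-length table; `preprocess_row_runs` and `is_subrow_repetition` are illustrative names.

```python
def preprocess_row_runs(text):
    """For every cell, store the length of the run of equal symbols
    starting there and extending to the right. O(n^2) for an n x n text."""
    runs = []
    for row in text:
        run = [1] * len(row)
        # Scan right to left so run[j + 1] is ready when run[j] is computed.
        for j in range(len(row) - 2, -1, -1):
            if row[j] == row[j + 1]:
                run[j] = run[j + 1] + 1
        runs.append(run)
    return runs

def is_subrow_repetition(runs, i, j, k):
    """O(1): do cells (i, j) .. (i, j + k - 1) all hold the same symbol?"""
    return runs[i][j] >= k

text = ["aaab", "abbb"]
runs = preprocess_row_runs(text)
print(is_subrow_repetition(runs, 0, 0, 3))  # prints: True
print(is_subrow_repetition(runs, 0, 0, 4))  # prints: False
```

The subcolumn version is symmetric: run the same preprocessing on the transposed text.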

Page 8: Faster 2-Dimensional  Scaled Matching

Algorithm Layout

The algorithm consists of 4 stages:
1. Scale Elimination
2. Candidate Consistency
3. Candidate Verification
4. Occurrence Recognition

Each stage takes O(n^2 m) time and O(n^2) space

Page 9: Faster 2-Dimensional  Scaled Matching

Scale Elimination Stage

Pivot

Page 10: Faster 2-Dimensional  Scaled Matching

Scale Elimination Stage

(i,j)

Page 11: Faster 2-Dimensional  Scaled Matching

Scale Elimination Stage

(i,j)

O(m) time for each location, O(n^2 m) total, O(n^2) space

Page 12: Faster 2-Dimensional  Scaled Matching

Candidate Consistency Stage

Page 13: Faster 2-Dimensional  Scaled Matching

Candidate Consistency Stage

Case (a)    Case (b)

Page 14: Faster 2-Dimensional  Scaled Matching

Witness Table Construction

For each suffix: O(m^2) time and O(m) space

Page 15: Faster 2-Dimensional  Scaled Matching

Pre-Dueling Step

For each candidate c in T:
    For each suffix s of P:
        Compare c’s borders with the witness table borders of suffix s

If the borders are not the same, c is eliminated.

Can be done in O(m) time for each candidate.

Page 16: Faster 2-Dimensional  Scaled Matching

Performing a Duel
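The slide's figure is not preserved here. For orientation only, the sketch below shows the classical one-dimensional, unscaled duel from the string-matching literature, not the 2-dimensional scaled duel of this talk. Two overlapping candidates a < b are compared at a single text position chosen by a witness, and at least one of them is eliminated. The `witness` helper recomputes the witness by a linear scan; a real algorithm would look it up in a precomputed witness table so that the duel takes O(1) time. All names are illustrative.

```python
def witness(pattern, d):
    """Smallest w with pattern[w] != pattern[w + d], or None if the
    pattern agrees with itself when shifted by d."""
    for w in range(len(pattern) - d):
        if pattern[w] != pattern[w + d]:
            return w
    return None

def duel(text, pattern, a, b):
    """Duel between candidate start positions a < b (b - a < len(pattern)).
    Returns the set of eliminated candidates (empty if they are consistent)."""
    d = b - a
    w = witness(pattern, d)
    if w is None:
        return set()          # both candidates may survive together
    c = text[b + w]           # aligned with pattern[w] for b, pattern[w + d] for a
    eliminated = set()
    if c != pattern[w]:
        eliminated.add(b)
    if c != pattern[w + d]:
        eliminated.add(a)
    return eliminated

print(duel("abaab", "ab", 2, 3))  # prints: {2}
```

Because pattern[w] and pattern[w + d] differ, the text symbol can match at most one of them, which is why a duel with a witness always removes at least one candidate.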

Page 17: Faster 2-Dimensional  Scaled Matching

The Dueling Order

Each candidate performs at most O(m) successful duels

Page 18: Faster 2-Dimensional  Scaled Matching

Candidate Consistency Stage

Witness table construction: O(m^3) time, O(m^2) space
Pre-dueling step: O(n^2 m) time, O(m^2) space
Number of duels: at most O(n) unsuccessful, at most O(n^2 m) successful, where each duel takes O(1) time
Total: O(n^2 m) time, O(n^2) space

Page 19: Faster 2-Dimensional  Scaled Matching

Candidate Verification Stage

Page 20: Faster 2-Dimensional  Scaled Matching

Candidate Verification Stage

For each location, find the maximal containing interval.

Can be solved in O(n) time per row using the solution to the Maximal Interval Problem.

Page 21: Faster 2-Dimensional  Scaled Matching

Candidate Verification Stage

Once we find the largest interval we:
- Verify each row in O(m) time, using subcolumn repetition queries
- Save the longest matching length
- For each candidate, run a Range Minimum Query on the lengths

The pattern appears iff pattern size >= RMQ
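A Range Minimum Query structure with O(1) queries can be sketched with a sparse table. This is a swapped-in, simpler technique: its preprocessing is O(n log n) per array rather than the O(n) per row that the slides cite, and the slides do not say which RMQ structure the authors use. `build_sparse_table` and `range_min` are illustrative names.

```python
def build_sparse_table(values):
    """table[j][i] = min(values[i : i + 2**j]). O(n log n) time and space."""
    n = len(values)
    table = [list(values)]
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        # A block of length 2**j is the union of two blocks of length 2**(j-1).
        table.append([min(prev[i], prev[i + half])
                      for i in range(n - (1 << j) + 1)])
        j += 1
    return table

def range_min(table, lo, hi):
    """Minimum of values[lo..hi] (inclusive) in O(1): cover the range
    with two possibly overlapping power-of-two blocks."""
    k = (hi - lo + 1).bit_length() - 1
    return min(table[k][lo], table[k][hi - (1 << k) + 1])

lengths = [5, 2, 4, 7, 1, 3]          # e.g. saved matching lengths
table = build_sparse_table(lengths)
print(range_min(table, 0, 2))  # prints: 2
print(range_min(table, 2, 3))  # prints: 4
print(range_min(table, 3, 5))  # prints: 1
```

Overlapping blocks are harmless here because taking a minimum twice over the same element does not change the result.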

Page 22: Faster 2-Dimensional  Scaled Matching

Candidate Verification Stage

Finding largest intervals: O(n) time per row, O(n^2) total
Verifying columns: O(nm) time per row, O(n^2 m) total
RMQ:
  Preprocessing: O(n) time per row, O(n^2) total
  Querying: O(1) time per candidate, O(n^2) total
Total: O(n^2 m) time, O(n^2) space

Page 23: Faster 2-Dimensional  Scaled Matching

Occurrence Recognition Stage

Recall: Scale elimination stage returned

At most O(m) steps per candidate

Total: O(n^2 m) time

Page 24: Faster 2-Dimensional  Scaled Matching

Conclusions

The algorithm consists of 4 stages:
1. Scale Elimination
2. Candidate Consistency
3. Candidate Verification
4. Occurrence Recognition

Each stage takes O(n^2 m) time and O(n^2) space