Algorithms for Two Versions of LCS


Page 1: Algorithms for Two Versions of LCS

Algorithms for Two Versions of LCS Problem for Indeterminate Strings

Page 2: Algorithms for Two Versions of LCS

Goal of this paper…

• Study the classic LCS and the Constrained LCS (CLCS) problems for indeterminate strings
• Present efficient algorithms to solve them

5-9 Nov 2007 IWOCA 2007 2

Page 3: Algorithms for Two Versions of LCS

Longest Common Subsequence

• Given two sequences:
  - X = CAAGCTAAGCTAC
  - Y = TCAAGTAGAAC
• Common Subsequence: a subsequence common to both X and Y.
• LCS: a common subsequence of the highest length


Page 4: Algorithms for Two Versions of LCS

LCS-Example

     1  2  3  4  5  6  7  8  9 10 11
X =  C  A  A  G  C  T  A  A  G  C  T
Y =  C  C  G  T  A  T
     1  2  3  4  5  6

A common subseq: CCT
Length = 3


Page 5: Algorithms for Two Versions of LCS

LCS-Example

     1  2  3  4  5  6  7  8  9 10 11 12
X =  C  A  A  G  C  T  A  A  G  C  G  T
Y =  C  C  G  T  A  T
     1  2  3  4  5  6

A longest common subseq: CCTAT
Length = 5


Page 6: Algorithms for Two Versions of LCS

LCS-Example

     1  2  3  4  5  6  7  8  9 10 11 12
X =  C  A  A  G  C  T  A  A  G  C  G  T
Y =  C  C  G  T  A  T
     1  2  3  4  5  6

A longest common subseq: CCTAT
Length = 5

Another LCS: CGTAT
Length = 5
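The example above can be checked with the textbook dynamic-programming solution for LCS. This is a minimal sketch; the function names `lcs_length` and `is_subsequence` are illustrative and do not come from the slides.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence of x and y (classic DP)."""
    n, m = len(x), len(y)
    # L[i][j] = LCS length of the prefixes x[:i] and y[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]


def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting letters."""
    it = iter(t)
    return all(ch in it for ch in s)


X = "CAAGCTAAGCGT"
Y = "CCGTAT"
print(lcs_length(X, Y))  # 5
print(all(is_subsequence(s, X) and is_subsequence(s, Y) for s in ("CCTAT", "CGTAT")))  # True
```

Both CCTAT and CGTAT are common subsequences of length 5, so an LCS need not be unique.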


Page 7: Algorithms for Two Versions of LCS

CLCS: A Relatively New Variant

     1 2 3 4 5 6
X =  T C C A C A
Y =  A C C A A G
Z =  A C
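A minimal sketch of CLCS on the strings above, assuming the usual definition (a longest common subsequence of X and Y that also contains Z as a subsequence). The cubic DP below is the standard textbook formulation, not necessarily the one used in the talk, and `clcs_length` is an illustrative name.

```python
from typing import Optional


def clcs_length(x: str, y: str, z: str) -> Optional[int]:
    """Length of a longest common subsequence of x and y that contains z
    as a subsequence, or None if no such common subsequence exists."""
    n, m, r = len(x), len(y), len(z)
    NEG = float("-inf")  # marks "no valid subsequence"
    # L[k][i][j]: best length for prefixes x[:i], y[:j] with z[:k] embedded
    L = [[[NEG] * (m + 1) for _ in range(n + 1)] for _ in range(r + 1)]
    for k in range(r + 1):
        for i in range(n + 1):
            for j in range(m + 1):
                if i == 0 or j == 0:
                    L[k][i][j] = 0 if k == 0 else NEG
                    continue
                best = max(L[k][i - 1][j], L[k][i][j - 1])
                if x[i - 1] == y[j - 1]:
                    # use the matching letter without advancing in z
                    best = max(best, L[k][i - 1][j - 1] + 1)
                    # or use it as the k-th letter of z
                    if k > 0 and x[i - 1] == z[k - 1]:
                        best = max(best, L[k - 1][i - 1][j - 1] + 1)
                L[k][i][j] = best
    result = L[r][n][m]
    return None if result == NEG else int(result)


print(clcs_length("TCCACA", "ACCAAG", ""))    # 4 (plain LCS, e.g. CCAA)
print(clcs_length("TCCACA", "ACCAAG", "AC"))  # 3 (ACA)
```

With the constraint Z = AC the answer drops from 4 to 3, because the only way to place A before C in both strings forces the match to start at X[4] and Y[1].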


Page 8: Algorithms for Two Versions of LCS

Different Setting…

• We study LCS and CLCS for indeterminate strings (i-strings)
• We call the two problems ILCS and CILCS respectively


Page 9: Algorithms for Two Versions of LCS

i-strings…

• Let Σ = {A, C, G, T}
• Then we can get 2^4 - 1 = 15 non-empty sets of letters.
• At each position of an i-string we have one of those sets.
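The count of 15 can be confirmed by enumerating the subsets directly. A small sketch; the names `SIGMA` and `subsets` are illustrative.

```python
from itertools import combinations

SIGMA = "ACGT"

# every non-empty subset of the alphabet, grouped by size 1..4
subsets = [
    frozenset(combo)
    for size in range(1, len(SIGMA) + 1)
    for combo in combinations(SIGMA, size)
]

print(len(subsets))          # 15
print(2 ** len(SIGMA) - 1)   # 15
```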


Page 10: Algorithms for Two Versions of LCS

i-strings: the non-empty subsets of Σ

{A}  {C}  {G}  {T}
{A,C}  {A,G}  {A,T}  {C,G}  {C,T}  {G,T}
{A,C,G}  {A,C,T}  {A,G,T}  {C,G,T}
{A,C,G,T}


Page 11: Algorithms for Two Versions of LCS

i-strings

[Figure: an example i-string X over positions 1 to 7; some positions hold a set of letters rather than a single letter.]

Page 12: Algorithms for Two Versions of LCS

i-strings: Equality/Match

[Figure: example i-strings X (positions 1 to 7) and Y, with the windows X[1..3] and X[3..5] marked.]

• X[3] = Y[1]. WHY? Because X[3] ∩ Y[1] = {A} ≠ Ø
• Y = X[1..3]
• Y = X[3..5]
• Y = X[4..6]
• Interestingly, X[1..3] ≠ X[3..5]!!!

Page 13: Algorithms for Two Versions of LCS

i-strings: Equality/Match

[Figure: example i-strings X (positions 1 to 7) and Y.]

• X[3] =d Y[1]. WHY? Because X[3] ∩ Y[1] = {A} ≠ Ø
• Y =d X[1..3]
• Y =d X[3..5]
• Y =d X[4..6]
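Indeterminate equality can be sketched with ordinary sets: two positions match when their sets intersect, and two i-strings match when they have equal length and match position by position. The i-strings `U`, `V`, `W` below are hypothetical (the slide's own X and Y are not reproduced here), chosen to show that the relation is not transitive.

```python
from typing import List, Set


def match(a: Set[str], b: Set[str]) -> bool:
    """Two positions match if their letter sets share at least one letter."""
    return not a.isdisjoint(b)


def i_match(u: List[Set[str]], v: List[Set[str]]) -> bool:
    """Two i-strings match if they have equal length and match position-wise."""
    return len(u) == len(v) and all(match(a, b) for a, b in zip(u, v))


# Hypothetical i-strings, for illustration only.
U = [{"A"}, {"C"}]
V = [{"A", "T"}, {"C", "G"}]
W = [{"T"}, {"G"}]

print(i_match(U, V))  # True
print(i_match(V, W))  # True
print(i_match(U, W))  # False: matching is not transitive
```

This is the same effect as on the slide: two windows can both match Y without matching each other.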

Page 14: Algorithms for Two Versions of LCS

ILCS

[Figure: example i-strings X and Y over positions 1 to 7, illustrating ILCS.]


Page 15: Algorithms for Two Versions of LCS

CILCS

[Figure: example i-strings X and Y over positions 1 to 7 together with a constraint string Z, illustrating CILCS.]


Page 16: Algorithms for Two Versions of LCS

CILCS

[Figure: the same example i-strings X, Y and constraint string Z, continued.]


Page 17: Algorithms for Two Versions of LCS

Motivation…

• Motivations for LCS and CLCS are well-known.
• But, why indeterminate strings?
• Indeterminate strings are ubiquitous in biological motifs
• And, both LCS and CLCS get their motivation from bioinformatics


Page 18: Algorithms for Two Versions of LCS

Naive Algorithms

• Using the existing LCS and CLCS algorithms we can solve ILCS and CILCS easily.


Page 19: Algorithms for Two Versions of LCS

Naive ILCS Algorithm

• We use the basic and well-known O(n^2) DP solution (Wagner & Fischer) to LCS:


Page 20: Algorithms for Two Versions of LCS

Naive ILCS Algorithm

• We use the basic and well-known O(n^2) DP solution (Wagner & Fischer) to LCS:

[Equation: the LCS recurrence, with the equality test = replaced by =d.]


Page 21: Algorithms for Two Versions of LCS

Naive ILCS Algorithm…

• We assume a sorted order among the letters in the sets of the i-strings
• Then, the intersection of two sets can be computed in O(|Σ|) time.
• So the total running time is O(|Σ|n^2)
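The naive approach can be sketched as the usual LCS DP with the equality test swapped for a set-intersection test. The sketch assumes each position is given as a sorted string of letters, so the intersection test is a linear merge scan costing O(|Σ|); the names `sets_intersect` and `naive_ilcs_length` and the sample i-strings are illustrative, not taken from the slides.

```python
from typing import Sequence


def sets_intersect(a: str, b: str) -> bool:
    """Merge scan over two sorted letter strings; O(len(a) + len(b))."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            return True
        if a[i] < b[j]:
            i += 1
        else:
            j += 1
    return False


def naive_ilcs_length(x: Sequence[str], y: Sequence[str]) -> int:
    """LCS DP where positions match if their letter sets intersect."""
    n, m = len(x), len(y)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if sets_intersect(x[i - 1], y[j - 1]):  # O(|Sigma|) per cell
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]


# Hypothetical i-strings: each position is a sorted string of letters.
x = ["A", "CG", "T"]
y = ["AC", "G", "AT"]
print(naive_ilcs_length(x, y))  # 3
```

Each of the n^2 cells pays for one intersection test, which is where the |Σ| factor in O(|Σ|n^2) comes from.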


Page 22: Algorithms for Two Versions of LCS

Our Goal

• Our goal is to get a better running time than O(|Σ|n^2).


Page 23: Algorithms for Two Versions of LCS

Our Strategy

• We want to facilitate an O(1) time evaluation for =d, i.e. indeterminate equality
• To achieve that we do some preprocessing on the input i-strings
• Then we employ existing LCS algorithms


Page 24: Algorithms for Two Versions of LCS

Preprocessing 1 for ILCS

• We compute the following table:

• With the above table, the indeterminate equality can be evaluated in O(1).
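One way to picture such a table, assuming it records for every pair of positions (i, j) whether X[i] and Y[j] share a letter. The sketch encodes each position as a bitmask over Σ so that one intersection test is a single AND; this encoding, the names `encode` and `build_match_table`, and the sample i-strings are assumptions for illustration and not necessarily the construction presented in the talk.

```python
from typing import List, Sequence

SIGMA = "ACGT"
BIT = {ch: 1 << k for k, ch in enumerate(SIGMA)}  # one bit per letter


def encode(position: str) -> int:
    """Bitmask of the letters allowed at one i-string position."""
    mask = 0
    for ch in position:
        mask |= BIT[ch]
    return mask


def build_match_table(x: Sequence[str], y: Sequence[str]) -> List[List[bool]]:
    """table[i][j] is True iff position i of x and position j of y intersect."""
    mx = [encode(p) for p in x]
    my = [encode(p) for p in y]
    return [[(a & b) != 0 for b in my] for a in mx]


# Hypothetical i-strings over positions 1 to 4.
x = ["A", "G", "ACT", "A"]
y = ["AT", "C", "AC", "GT"]
table = build_match_table(x, y)
print(table[0])  # [True, False, True, False]
print(table[2])  # [True, True, True, True]
```

Once the table exists, the DP only needs a lookup such as `table[i][j]` per cell, which is the O(1) evaluation the slide refers to.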


Page 25: Algorithms for Two Versions of LCS

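Computation of Table

[Figure: example i-strings X and Y over positions 1 to 4, each shown with a 4 × 4 binary table over the letters A, C, G, T and the positions 1 to 4.]

1 0 1 1   1
0 0 1 0   2
0 1 0 0   3
0 0 1 0   4

1 0 1 0   1
0 1 1 0   2
0 0 0 1   3
1 0 0 1   4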


Page 26: Algorithms for Two Versions of LCS

Computation of Table

1 0 1 1
0 0 1 0
0 1 0 0
0 0 1 0

1 0 1 0
0 1 1 0
0 0 0 1
1 0 0 1


Page 27: Algorithms for Two Versions of LCS
Page 28: Algorithms for Two Versions of LCS

Complete Algorithm

• With Table I, we can evaluate =d in O(1).
• So, the DP requires O(n^2)!
• But how much time does it take to compute Table I?
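Given a precomputed match table, the DP itself is a plain O(n^2) loop. A minimal sketch, assuming the table is a list of rows of booleans where entry (i, j) says whether X[i] =d Y[j]; the function name `ilcs_length_from_table` is illustrative, and the table in the usage example is built the slow way, purely to have some input.

```python
from typing import List, Sequence


def ilcs_length_from_table(table: Sequence[Sequence[bool]]) -> int:
    """ILCS length when table[i][j] already answers the match test in O(1)."""
    n = len(table)
    m = len(table[0]) if n else 0
    prev = [0] * (m + 1)  # DP row for the previous position of X
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            if table[i - 1][j - 1]:          # O(1) lookup, no set intersection
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[m]


# Hypothetical i-strings; the table is built naively here for brevity.
x = [{"A"}, {"C", "G"}, {"T"}]
y = [{"A", "C"}, {"G"}, {"A", "T"}]
table: List[List[bool]] = [[not a.isdisjoint(b) for b in y] for a in x]
print(ilcs_length_from_table(table))  # 3
```

The DP no longer depends on |Σ|, so the overall cost is governed by how the table is computed, which is the question the slide raises.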


Page 29: Algorithms for Two Versions of LCS
Page 30: Algorithms for Two Versions of LCS
Page 31: Algorithms for Two Versions of LCS

Thank You