Rolf Backofen Danny Hermelin Gad M. Landau Oren Weimann

Local Alignment of RNA Sequences with Arbitrary Scoring Schemes

Page 1: Rolf Backofen  Danny Hermelin  Gad M. Landau Oren Weimann

Rolf Backofen, Danny Hermelin, Gad M. Landau, Oren Weimann

Page 2:

RNA sequences

[Figure: RNA sequences drawn with their base pairs (C-G, C-G, G-C, U-A, A-U, C-G) and unpaired bases.]

Page 5:

Alignment of Strings

Global Alignment: $O(nm)$

S1 = U C A C C G _ A _ G
S2 = U C G C G G U A U G
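
The $O(nm)$ bound for global alignment can be illustrated with the textbook dynamic program. The sketch below assumes unit costs for substitutions, insertions, and deletions (the slide does not state a scoring scheme), and the function name `edit_distance` is only illustrative.

```python
def edit_distance(s1: str, s2: str) -> int:
    """Unit-cost global alignment of two strings in O(nm) time.

    Assumption: every substitution, insertion, and deletion costs 1.
    """
    n, m = len(s1), len(s2)
    # prev[j] holds the distance between s1[:i-1] and s2[:j].
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            substitute = prev[j - 1] + (s1[i - 1] != s2[j - 1])
            delete = prev[j] + 1      # s1[i-1] aligned to a gap
            insert = cur[j - 1] + 1   # s2[j-1] aligned to a gap
            cur[j] = min(substitute, delete, insert)
        prev = cur
    return prev[m]


# The two strings from the slide: two substitutions and two gaps.
print(edit_distance("UCACCGAG", "UCGCGGUAUG"))  # 4
```

Keeping only two rows of the table is a space optimisation; recovering the alignment itself would require the full table or a divide-and-conquer variant.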

Page 6:

Alignment of RNA sequences

A A G G C C C U G A U

A G A C C G U UA U

Page 8:

Alignment of RNA sequences

RNA Global Alignment via tree edit distance:

A A G G C C C U G A U

A G A C C G U U U

$O(m^2 n^2) = O(n^4)$ [SZ 1989]

$O(m^2 n \lg n) = O(n^3 \log n)$ [K 1998]

$O(m^2 n (1 + \lg \frac{n}{m})) = O(n^3)$ [DMRW 2006]

Theorem: All these algorithms compute the edit distance between any two arcs, provided we match these arcs.

Page 9:

The Alignment graph

[Figure: the alignment graph, a grid whose columns are labelled U C A C C G A G and whose rows are labelled U C G C G G U A U G.]

Theorem: There is a one to one correspondence between all paths in the alignment graph and all alignments of substrings of R1 and R2.

Page 13:

The Alignment graph

[Figure: the alignment graph, a grid whose columns are labelled U C A C C G A G and whose rows are labelled U C G C G G U A U G.]

Theorem: There is a one to one correspondence between all paths in the alignment graph and all alignments of substrings of R1 and R2 in which all arcs are deleted.

Page 15:

The Alignment graph

[Figure: the alignment graph, a grid whose columns are labelled U C A C C G A G and whose rows are labelled U C G C G G U A U G.]

Theorem: There is a one to one correspondence between HEAVIEST paths in the alignment graph and OPTIMAL alignments of substrings of R1 and R2.

Page 16:

The Local Alignment algorithms

We use the alignment graph to compute the local similarity between two RNA sequences according to two well-known metrics:

• Smith-Waterman: the highest-scoring alignment between any pair of substrings of the input RNAs.

• Its normalized version.

Page 17:

Standard Local Similarity (Smith-Waterman)

The score is computed via a dynamic program:

$Score(i,j) = \max \{\, Score(i',j') + \text{weight of the incoming edge from } (i',j'),\ 0 \,\}$

[Figure: the alignment graph, a grid whose columns are labelled U C A C C G A G and whose rows are labelled U C G C G G U A U G.]

Time complexity: $O(mn)$ + one run of a global algorithm = $O(m^2 n (1 + \lg \frac{n}{m}))$
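
The recurrence above can be sketched as a single pass over the grid. This is a minimal illustration, not the authors' implementation: it assumes that the scores for matched arc pairs have already been produced by a global algorithm and are handed in as extra edges. The `heavy_edges` format `(i0, j0, i1, j1, w)` and the default `match`, `mismatch`, and `gap` weights are hypothetical choices made for the example.

```python
def local_score(r1, r2, heavy_edges=(), match=1, mismatch=-1, gap=-1):
    """Score(i,j) = max(0, Score(i',j') + weight of incoming edge).

    heavy_edges: iterable of (i0, j0, i1, j1, w), an assumed format for
    precomputed edges from vertex (i0, j0) to vertex (i1, j1).
    """
    n, m = len(r1), len(r2)
    incoming = {}
    for i0, j0, i1, j1, w in heavy_edges:
        if not (0 <= i0 < i1 <= n and 0 <= j0 < j1 <= m):
            raise ValueError("heavy edges must point forward in the grid")
        incoming.setdefault((i1, j1), []).append((i0, j0, w))

    score = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(n + 1):
        for j in range(m + 1):
            s = 0  # a path may start at any vertex
            if i > 0:
                s = max(s, score[i - 1][j] + gap)
            if j > 0:
                s = max(s, score[i][j - 1] + gap)
            if i > 0 and j > 0:
                w = match if r1[i - 1] == r2[j - 1] else mismatch
                s = max(s, score[i - 1][j - 1] + w)
            # Extra edges end here; their tails were filled in earlier rows.
            for i0, j0, w in incoming.get((i, j), []):
                s = max(s, score[i0][j0] + w)
            score[i][j] = s
            best = max(best, s)
    return best


print(local_score("ACGU", "ACGU"))  # 4
print(local_score("GAAAC", "GUUUC", heavy_edges=[(0, 0, 5, 5, 10)]))  # 10
```

Apart from the extra edges, the loop costs $O(mn)$, which matches the first term of the stated time bound.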

Page 18:

Normalized Local Similarity

The weakness of the Smith-Waterman approach [AP 2001]:

Solution: look for the substrings $R'_1, R'_2$ (with their arcs) that maximize $\frac{ED(R'_1, R'_2)}{|R'_1| + |R'_2|}$, with $ED(R'_1, R'_2)$ constrained by some given value.

Page 19:

Normalized Local Similarity

Again, dynamic program:

[Figure: the alignment graph, a grid whose columns are labelled U C A C C G A G and whose rows are labelled U C G C G G U A U G.]

Define Length(k,i,j) to be the length of the shortest path that ends at vertex (i,j) and has weight equal to k.

• The best k/Length(k,i,j) over all i,j,k is the normalized score.

Page 20:

Normalized Local Similarity

Again, dynamic program:

Define Length(k,i,j) to be the length of the shortest path that ends at vertex (i,j) and has weight equal to k.

For every k,i,j compute

$Length(k,i,j) = \min \{\, Length(k-w,i',j') + (i-i') + (j-j') \,\}$, where $w$ is the weight of the incoming edge from $(i',j')$.

[Figure: an edge of weight w from (i',j') to (i,j) extends a path of length Length(k-w,i',j') to a path of length Length(k,i,j).]

Time complexity: $O(n^2 m)$ + one run of a global algorithm = $O(n^2 m) + O(m^2 n (1 + \lg \frac{n}{m})) = O(n^2 m)$
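
The Length(k,i,j) table can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the length of an edge from (i',j') to (i,j) is taken to be (i-i') + (j-j'), edge weights are integers, precomputed edges for matched arcs arrive in a hypothetical `heavy_edges` format `(i0, j0, i1, j1, w)`, and `min_weight` is a made-up parameter standing in for the "given value" that bounds the score.

```python
def normalized_local_score(r1, r2, heavy_edges=(), match=1, mismatch=-1,
                           gap=-1, min_weight=1):
    """Best k / Length(k,i,j) over all i, j and all k >= min_weight."""
    n, m = len(r1), len(r2)
    incoming = {}
    for i0, j0, i1, j1, w in heavy_edges:
        if not (0 <= i0 < i1 <= n and 0 <= j0 < j1 <= m):
            raise ValueError("heavy edges must point forward in the grid")
        incoming.setdefault((i1, j1), []).append((i0, j0, w))

    # length[i][j][k] = Length(k,i,j); the empty path gives Length(0,i,j) = 0.
    length = [[{0: 0} for _ in range(m + 1)] for _ in range(n + 1)]
    best = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            edges = list(incoming.get((i, j), []))
            if i > 0:
                edges.append((i - 1, j, gap))
            if j > 0:
                edges.append((i, j - 1, gap))
            if i > 0 and j > 0:
                w = match if r1[i - 1] == r2[j - 1] else mismatch
                edges.append((i - 1, j - 1, w))
            cell = length[i][j]
            for i0, j0, w in edges:
                step = (i - i0) + (j - j0)  # characters spanned by the edge
                for k0, l0 in length[i0][j0].items():
                    k = k0 + w
                    # Prefixes of non-positive weight never help the ratio.
                    if k > 0 and l0 + step < cell.get(k, float("inf")):
                        cell[k] = l0 + step
            for k, path_len in cell.items():
                if path_len > 0 and k >= min_weight:
                    best = max(best, k / path_len)
    return best


print(normalized_local_score("AACAA", "AAGAA", min_weight=3))  # 0.3
```

Storing each cell as a dictionary keyed by k keeps the sketch short; a dense array indexed by k, as the Length(k,i,j) notation suggests, would be the natural choice when the range of weights is small.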

Page 21:

Open Problems

• Arc deletion

• Improve global tree edit distance

[Figure: the alignment graph, a grid whose columns are labelled U C A C C G A G and whose rows are labelled U C G C G G U A U G.]
