Regular Expression Constrained Sequence Alignment
![Page 1: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/1.jpg)
Regular Expression Constrained Sequence Alignment
Abdullah N. Arslan
Assistant Professor
Computer Science Department
![Page 2: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/2.jpg)
Outline
• Sequence alignment
  • Common framework
  • DP solution
  • Why constrained?
• RE constrained sequence alignment
  • Algorithm
• Concluding Remarks
![Page 3: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/3.jpg)
Alignment Matrix
![Page 4: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/4.jpg)
Edit Graph
![Page 5: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/5.jpg)
Dynamic Programming Solution
$H_{i,j}$: maximum score achieved at $(i, j)$,
where $H_{i,j} = 0$ whenever $i = 0$ or $j = 0$.
$H_{n,m}$ is computed in $O(nm)$ time and $O(m)$ space.
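The recurrence on this slide does not survive in the transcript. The sketch below therefore assumes the common three-way maximum over a diagonal, a vertical, and a horizontal move, with illustrative `match`, `mismatch`, and `gap` scores that are not taken from the slides. It keeps the boundary condition stated above ($H_{i,j} = 0$ on the first row and column) and stores only two rows, which is where the $O(m)$ space bound comes from.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return H[n][m] using two rows: O(nm) time, O(m) space.

    The scoring parameters are illustrative assumptions.
    """
    n, m = len(a), len(b)
    prev = [0] * (m + 1)              # H[0][j] = 0, as stated on the slide
    for i in range(1, n + 1):
        cur = [0] * (m + 1)           # H[i][0] = 0
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                         prev[j] + gap,     # a[i-1] against a gap
                         cur[j - 1] + gap)  # b[j-1] against a gap
        prev = cur
    return prev[m]
```

Only the score is returned; recovering the alignment itself in linear space would need an additional technique such as divide and conquer.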
![Page 6: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/6.jpg)
DP Solution: Local Alignment
$H_{i,j}$: similarity score achieved at $(i, j)$,
where $H_{i,j} = 0$ whenever $i = 0$ or $j = 0$.
$\max H_{i,j}$ is computed in $O(nm)$ time and $O(m)$ space.
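As a rough illustration of the local variant, the following sketch assumes the usual Smith-Waterman form, in which every cell may also restart at score 0 and the answer is the maximum over all cells. The scoring values are placeholders, not values from the slides.

```python
def local_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return max over all (i, j) of H[i][j] in O(nm) time and O(m) space.

    The scoring parameters are illustrative assumptions.
    """
    m = len(b)
    prev = [0] * (m + 1)              # H[0][j] = 0
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (m + 1)           # H[i][0] = 0
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0,                 # start a new local alignment here
                         prev[j - 1] + s,
                         prev[j] + gap,
                         cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```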
![Page 7: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/7.jpg)
Dynamic Programming Formulation
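Affine gap penalties
Penalty for a gap of length $k$ is a gap-opening penalty plus $(k-1)$ times a gap-extension penalty,
where $H_{i,j} = F_{i,j} = E_{i,j} = 0$ when $i = 0$ or $j = 0$.
$\max H_{i,j}$ is computed in $O(nm)$ time and $O(m)$ space.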
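The three-matrix recurrence is missing from the transcript, so this sketch assumes the standard Gotoh formulation: `E` tracks alignments ending in a horizontal gap, `F` those ending in a vertical gap, and `H` the best local score. The names `gap_open` and `gap_extend` and their values are assumptions; a gap of length $k$ costs `gap_open + (k - 1) * gap_extend`.

```python
def local_affine_score(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Local alignment score with affine gaps, O(nm) time and O(m) space.

    All scoring parameters are illustrative assumptions.
    """
    m = len(b)
    prev_h = [0] * (m + 1)            # H[0][j] = 0
    prev_f = [0] * (m + 1)            # F[0][j] = 0
    best = 0
    for i in range(1, len(a) + 1):
        cur_h = [0] * (m + 1)         # H[i][0] = 0
        cur_f = [0] * (m + 1)         # F[i][0] = 0
        e = 0                         # E[i][0] = 0
        for j in range(1, m + 1):
            # open a new gap from H, or extend the gap already running
            e = max(cur_h[j - 1] - gap_open, e - gap_extend)
            cur_f[j] = max(prev_h[j] - gap_open, prev_f[j] - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur_h[j] = max(0, prev_h[j - 1] + s, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best
```

With these defaults a gap of length 2 costs 3 + 1 = 4, so bridging two extra residues pays off only when the matches on both sides are worth more than that.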
![Page 8: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/8.jpg)
The Definition of the Constrained LCS Problem
• The constrained LCS (CLCS) problem
  • Given strings S1, S2, and P
  • Find a longest common subsequence of S1 and S2 that has P as a subsequence
• Motivation: computing the homology of two biological sequences that have a specific part in common
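One way to solve CLCS is a three-dimensional dynamic program over prefixes of S1, S2, and P. The sketch below is an assumed formulation in the spirit of the $O(nmr)$ algorithms, not code from any of the cited papers. `L[i][j][k]` holds the best length for `s1[:i]` and `s2[:j]` when the common subsequence must contain `p[:k]`, with negative infinity marking infeasible combinations.

```python
def constrained_lcs_length(s1, s2, p):
    """Length of a longest common subsequence of s1 and s2 that contains p
    as a subsequence, or None if no such common subsequence exists."""
    NEG = float("-inf")
    n, m, r = len(s1), len(s2), len(p)
    L = [[[NEG] * (r + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(r + 1):
                if i == 0 or j == 0:
                    # an empty prefix can only satisfy the empty constraint
                    L[i][j][k] = 0 if k == 0 else NEG
                    continue
                best = max(L[i - 1][j][k], L[i][j - 1][k])
                if s1[i - 1] == s2[j - 1]:
                    # use the matching symbol without advancing in p
                    best = max(best, L[i - 1][j - 1][k] + 1)
                    if k > 0 and s1[i - 1] == p[k - 1]:
                        # use the matching symbol as p[k-1]
                        best = max(best, L[i - 1][j - 1][k - 1] + 1)
                L[i][j][k] = best
    result = L[n][m][r]
    return None if result == NEG else result
```

With `p` empty this reduces to the ordinary LCS length.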
![Page 9: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/9.jpg)
Constrained Sequence Alignment Problems
• Constrained LCS
  • Tsai 2003: $O(n^2 m^2 r)$ time
  • Chin et al. 2004; Arslan and Egecioglu 2004: $O(nmr)$ time
• Edit-distance constrained sequence alignment
  • Arslan and Egecioglu 2004: $O(dnmr)$
• Regular-expression constrained sequence alignment
  • Motivation: Comet and Henry, 2002; PROSITE patterns
  • This paper
![Page 10: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/10.jpg)
PROSITE patterns as constraints
• PROSITE patterns are regular expressions with no Kleene closure
  • PROSITE database
  • e.g. [GA]-X(4)-G-K-[ST]: ATP/GTP-binding site motif A (P-loop) (PS00017)
• Comet and Henry reward alignments
• Regular expression constrained sequence alignment
  • Find a maximal alignment that includes a given RE
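To see why such a pattern is a regular expression without Kleene closure, it can be rewritten mechanically into ordinary regex syntax. The helper below is a hypothetical sketch that handles only the elements appearing in the example (single residues, bracketed alternatives, `X` wildcards, and a fixed repeat count in parentheses), not the full PROSITE syntax.

```python
import re

def prosite_to_regex(pattern):
    """Translate a small subset of PROSITE syntax into a Python regex.

    Supported elements (an assumption of this sketch): a single residue,
    a bracketed set such as [ST], the wildcard x/X, and a count such as (4).
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|[A-Za-z])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        residue, count = m.groups()
        if residue in ("x", "X"):
            residue = "."             # wildcard: any residue
        parts.append(residue + ("{%s}" % count if count else ""))
    return "".join(parts)

regex = prosite_to_regex("[GA]-X(4)-G-K-[ST]")   # "[GA].{4}GK[ST]"
```

Every repeat count is fixed, so the pattern describes a finite set of strings of bounded length.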
![Page 11: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/11.jpg)
Example: For [GA]-X(4)-G-K-[ST]
![Page 12: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/12.jpg)
Using Edit Graph: e.g. A(C+G)*(S+T)
![Page 13: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/13.jpg)
Automata for A(C+G)*(S+T)
![Page 14: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/14.jpg)
Some Details of Automata Construction
• Build an NFA N equivalent to the given RE R
• Construct from N a new N×N automaton
  • Moves on edit operations (or equivalently on alignment columns)
  • States have weights
  • Interested in the weights of the final states after the alignment is complete
![Page 15: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/15.jpg)
Weighted Automaton
• Initial weights are $-\infty$
• Weight of (q0,q0) is initially 0
• Update new maximum scores at reachable states
• Weights remain $-\infty$ at unreachable states
• What are the maximum weights at the final states?
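The weight updates can be sketched as a dynamic program in which every cell of the alignment matrix keeps a table of reachable states and their best scores. This is an illustrative simplification, not the paper's exact construction. It assumes a hand-written deterministic automaton for A(C+G)*(S+T), global scoring with made-up `match`, `mismatch`, and `gap` values, and a constraint that requires a contiguous run of alignment columns in which the symbols of each sequence both spell a string matching the RE. The markers `"PRE"` and `"POST"` (names invented here) cover columns before and after that run, and states missing from a table implicitly have weight $-\infty$.

```python
NEG = float("-inf")

# Assumed DFA for A(C+G)*(S+T): 0 = start, 1 = after A, 2 = accepting.
DELTA = {(0, "A"): 1, (1, "C"): 1, (1, "G"): 1, (1, "S"): 2, (1, "T"): 2}
START, FINAL = 0, {2}

def relax(weights, state, value):
    """Keep the maximum weight seen for a state."""
    if value > weights.get(state, NEG):
        weights[state] = value

def step(src, dst, a, b, score):
    """Move weights across one alignment column (a, b); None marks a gap."""
    for state, w in src.items():
        if state in ("PRE", "POST"):
            relax(dst, state, w + score)
        else:
            p, q = state
            p2 = p if a is None else DELTA.get((p, a))
            q2 = q if b is None else DELTA.get((q, b))
            if p2 is not None and q2 is not None:
                relax(dst, (p2, q2), w + score)

def close(weights):
    """Zero-cost moves: enter the constrained run, or leave it from a final pair."""
    if "PRE" in weights:
        relax(weights, (START, START), weights["PRE"])
    for state, w in list(weights.items()):
        if isinstance(state, tuple) and state[0] in FINAL and state[1] in FINAL:
            relax(weights, "POST", w)

def re_constrained_score(s1, s2, match=1, mismatch=-1, gap=-1):
    """Best global score over alignments satisfying the constraint, else None."""
    n, m = len(s1), len(s2)
    cells = [[{} for _ in range(m + 1)] for _ in range(n + 1)]
    cells[0][0]["PRE"] = 0            # (q0, q0) gets weight 0 via close()
    for i in range(n + 1):
        for j in range(m + 1):
            cur = cells[i][j]
            if i > 0 and j > 0:
                s = match if s1[i - 1] == s2[j - 1] else mismatch
                step(cells[i - 1][j - 1], cur, s1[i - 1], s2[j - 1], s)
            if i > 0:
                step(cells[i - 1][j], cur, s1[i - 1], None, gap)
            if j > 0:
                step(cells[i][j - 1], cur, None, s2[j - 1], gap)
            close(cur)
    return cells[n][m].get("POST")
```

Each cell stores at most one weight per pair of automaton states, so the work per cell is bounded by the number of state pairs rather than by the number of alignments.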
![Page 16: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/16.jpg)
Computations on Automata
![Page 17: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/17.jpg)
Complexity
• Simulate automata based on DP solution
  • Each step requires examining the transition functions
  • Maintain a list of active (reachable) states
  • Update state weights as alignments are formed
  • Automaton $M_{i,j}$ has the optimum weights
![Page 18: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/18.jpg)
Generalizations: Local Alignment & Affine gaps
![Page 19: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/19.jpg)
CONCLUSION
• Introduced the regular expression constrained sequence alignment problem
• Presented an algorithm for the problem
• Future work: generalization of the problem for
  • Multiple sequence alignment
  • Multiple regular expressions as a constraint
![Page 20: Regular Expression Constrained Sequence Alignment](https://reader030.fdocuments.in/reader030/viewer/2022020418/568167ee550346895ddd5f26/html5/thumbnails/20.jpg)
Thank You