Dynamic Programming -class4- · Dynamic Programming Recall: the Change Problem Other problems:...
Dynamic Programming Part I: Examples
Bioinfo I (Institut Pasteur de Montevideo) Dynamic Programming -class4- July 25th, 2011 1 / 77
Dynamic Programming
Recall: the Change Problem
Other problems: Manhattan Tourist Problem, LCS Problem
Finally: Sequence alignments
Manhattan Tourist Problem (MTP)
Imagine seeking a path from source to sink, traveling only eastward and southward, that passes the largest number of attractions (*) in the Manhattan grid.
Manhattan Tourist Problem: Formulation
Goal: Find the longest path in a weighted grid.
Input: A weighted grid G with two distinct vertices, one labeled "source" and the other labeled "sink".
Output: A longest path in G from "source" to "sink".
MTP: An example
MTP: Greedy Algorithm Is Not Optimal
MTP: Simple Recursive Program
    MT(n, m)
        if n = 0 and m = 0
            return 0
        x ← −∞, y ← −∞
        if n > 0
            x ← MT(n − 1, m) + length of the edge from (n − 1, m) to (n, m)
        if m > 0
            y ← MT(n, m − 1) + length of the edge from (n, m − 1) to (n, m)
        return max{x, y}
What’s wrong with this approach?
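One way to probe the question above is to count how often the plain recursion is invoked. The sketch below is only an illustration: `count_naive_calls` and `count_subproblems` are hypothetical helper names, edge weights are ignored, and it assumes that border vertices recurse along the border until they reach (0, 0).

```python
def count_naive_calls(n, m):
    """Count the calls the plain recursion MT(n, m) makes (edge weights ignored)."""
    calls = 1  # the current call
    if n > 0:
        calls += count_naive_calls(n - 1, m)  # call for the vertex above
    if m > 0:
        calls += count_naive_calls(n, m - 1)  # call for the vertex to the left
    return calls


def count_subproblems(n, m):
    """Number of distinct vertices (i, j) with 0 <= i <= n and 0 <= j <= m."""
    return (n + 1) * (m + 1)


if __name__ == "__main__":
    # Compare the number of recursive calls with the number of distinct vertices.
    for size in (2, 4, 8):
        print(size, count_naive_calls(size, size), count_subproblems(size, size))
```

The call count grows much faster than the number of vertices because the same vertex is reached through many different paths and its score is recomputed each time. Storing each score once removes that repetition.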
MTP: Dynamic Programming
Calculate the optimal path score for each vertex in the graph.
Each vertex's score is the maximum, over its predecessor vertices, of the predecessor's score plus the weight of the edge connecting the two.
MTP: Recurrence
The score for a point (i, j) is computed by the recurrence relation:

$$
s_{i,j} = \max \begin{cases}
s_{i-1,j} + \text{weight of the edge between } (i-1,j) \text{ and } (i,j) \\
s_{i,j-1} + \text{weight of the edge between } (i,j-1) \text{ and } (i,j)
\end{cases}
$$

The running time is O(n × m) for an n by m grid (n = number of rows, m = number of columns).
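The recurrence can be sketched as a table fill. This is a minimal illustration, not the course's reference code: the function name `manhattan_tourist` and the layout of the weight tables (each edge weight stored at the vertex the edge points into) are assumptions, and the small weight tables are made-up example data.

```python
def manhattan_tourist(down, right):
    """Score of a longest path from (0, 0) to (n, m) in a grid.

    down[i][j]  = weight of the edge (i-1, j) -> (i, j), used for i >= 1
    right[i][j] = weight of the edge (i, j-1) -> (i, j), used for j >= 1
    Both tables are (n+1) x (m+1); entries with no such edge are ignored.
    """
    n, m = len(down) - 1, len(down[0]) - 1
    s = [[0] * (m + 1) for _ in range(n + 1)]
    # First column: only reachable by moving south.
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] + down[i][0]
    # First row: only reachable by moving east.
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + right[0][j]
    # Interior: best of arriving from above or from the left.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j] + down[i][j],
                          s[i][j - 1] + right[i][j])
    return s[n][m]


# Made-up weights for a grid with 3 x 3 vertices (n = m = 2).
down = [[0, 0, 0],
        [1, 0, 2],
        [4, 6, 5]]
right = [[0, 3, 2],
         [0, 0, 7],
         [0, 3, 3]]

if __name__ == "__main__":
    print(manhattan_tourist(down, right))  # prints 15
```

Each of the (n + 1)(m + 1) entries is computed once from at most two neighbours, which is where the n × m running time comes from.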
Manhattan is not a perfect Grid
Represented as a DAG: Directed Acyclic Graph
The score at point B is given by:
$$
s_{i,j} = \max \begin{cases}
s_{i-1,j} + \text{weight of the edge between } (i-1,j) \text{ and } (i,j) \\
s_{i,j-1} + \text{weight of the edge between } (i,j-1) \text{ and } (i,j)
\end{cases}
$$
Manhattan Is Not A Perfect Grid
The score for a point x is given by the recurrence relation:

$$
s_x = \max_{y \in \text{Predecessors}(x)} \{\, s_y + \text{weight of the edge } (y, x) \,\}
$$

Predecessors(x): the set of vertices that have edges leading to x.

The running time for a graph G(V, E) (V is the set of all vertices and E is the set of all edges) is O(E), since each edge is evaluated once.
Longest Path in DAG Problem
Goal: Find a longest path between two vertices in a weighted DAG
Input: A weighted DAG G with source and sink vertices
Output: A longest path in G from source to sink
Longest Path in DAG: Dynamic Programming
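Suppose vertex v has indegree 3 and predecessors {u1, u2, u3}.
The longest path to v from the source is:

$$
s_v = \max \begin{cases}
s_{u_1} + \text{weight of edge from } u_1 \text{ to } v \\
s_{u_2} + \text{weight of edge from } u_2 \text{ to } v \\
s_{u_3} + \text{weight of edge from } u_3 \text{ to } v
\end{cases}
$$

In general:

$$
s_v = \max_{u} \{\, s_u + \text{weight of edge from } u \text{ to } v \,\}
$$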
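The general recurrence can be tried on a small DAG. The sketch below is one possible realization, not the slides' algorithm: it evaluates the recurrence top-down with memoization, so each vertex's predecessors are computed on demand. The function name `longest_path_score`, the edge-dictionary format and the example graph are assumptions made for illustration.

```python
from functools import lru_cache


def longest_path_score(edges, source, sink):
    """Longest-path score from source to sink in a weighted DAG.

    edges maps (u, v) -> weight of the directed edge u -> v.
    Returns -inf if sink cannot be reached from source.
    """
    # Group incoming edges by target vertex: v -> [(u, weight), ...]
    preds = {}
    for (u, v), weight in edges.items():
        preds.setdefault(v, []).append((u, weight))

    @lru_cache(maxsize=None)
    def s(v):
        if v == source:
            return 0
        # s_v = max over predecessors u of s_u + weight(u, v)
        return max((s(u) + weight for u, weight in preds.get(v, [])),
                   default=float("-inf"))

    return s(sink)


# Made-up example graph.
example_edges = {
    ("s", "a"): 2,
    ("s", "b"): 5,
    ("a", "b"): 4,
    ("a", "t"): 7,
    ("b", "t"): 1,
}

if __name__ == "__main__":
    print(longest_path_score(example_edges, "s", "t"))  # prints 9
```

Memoization ensures each vertex score is computed once, so every edge is looked at once. The recursion only terminates because the graph has no directed cycles.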
Traveling in the Grid
The only hitch is that one must decide on the order in which to visit the vertices.
By the time the vertex x is analyzed, the values s_y for all its predecessors y should already be computed; otherwise we are in trouble.
We need to traverse the vertices in some order.
Try to find such an order for a directed cycle.
???
Topological ordering
2 different topological orderings of the DAG
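A topological ordering can be computed by repeatedly removing vertices that have no remaining incoming edges (Kahn's algorithm). The sketch below is an illustration under assumptions: the function name `topological_order`, the input format and the two small example graphs are not from the slides. It returns `None` when the graph contains a directed cycle, in which case no valid visiting order exists.

```python
from collections import deque


def topological_order(vertices, edges):
    """Return one topological ordering of the graph, or None if it has a directed cycle."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    # Vertices with no incoming edges can be visited right away.
    ready = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        # Visiting u "removes" its outgoing edges.
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    # If some vertex was never released, it lies on or behind a cycle.
    return order if len(order) == len(vertices) else None


dag_vertices = ["a", "b", "c", "d"]
dag_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
cycle_vertices = ["x", "y", "z"]
cycle_edges = [("x", "y"), ("y", "z"), ("z", "x")]

if __name__ == "__main__":
    print(topological_order(dag_vertices, dag_edges))      # ['a', 'b', 'c', 'd']
    print(topological_order(cycle_vertices, cycle_edges))  # None
```

For the example DAG, `['a', 'c', 'b', 'd']` would be an equally valid ordering. A DAG generally has several, and any of them is a safe order for filling in the scores.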
Traversing the Manhattan Grid
3 different strategies:
a) Column by column
b) Row by row
c) Along diagonals
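The three strategies can be written as vertex orderings and checked against the requirement that every predecessor is visited first. This is a hedged sketch: the function names are hypothetical, vertices are assumed to be indexed (i, j) with 0 ≤ i ≤ n and 0 ≤ j ≤ m, and the predecessors of (i, j) are assumed to be (i−1, j), (i, j−1) and (i−1, j−1).

```python
def column_by_column(n, m):
    """Visit all vertices of column 0, then column 1, and so on."""
    return [(i, j) for j in range(m + 1) for i in range(n + 1)]


def row_by_row(n, m):
    """Visit all vertices of row 0, then row 1, and so on."""
    return [(i, j) for i in range(n + 1) for j in range(m + 1)]


def along_diagonals(n, m):
    """Visit vertices by increasing i + j (one anti-diagonal at a time)."""
    return [(i, d - i)
            for d in range(n + m + 1)
            for i in range(max(0, d - m), min(n, d) + 1)]


def is_valid_order(order):
    """True if every vertex appears after all of its grid predecessors."""
    position = {vertex: k for k, vertex in enumerate(order)}
    for (i, j) in order:
        for pred in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
            if pred in position and position[pred] > position[(i, j)]:
                return False
    return True


if __name__ == "__main__":
    for strategy in (column_by_column, row_by_row, along_diagonals):
        print(strategy.__name__, is_valid_order(strategy(3, 2)))
```

All three orderings pass the check, so any of them can drive the table fill. An order that starts from the sink, such as the reverse of the row-by-row order, fails it.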
Pseudo-code MTP: Dynamic Programming
    ManhattanTourist(w↓, w→, w↘, n, m)
        s_{0,0} ← 0
        for i ← 1 to n
            s_{i,0} ← s_{i−1,0} + w↓_{i,0}
        for j ← 1 to m
            s_{0,j} ← s_{0,j−1} + w→_{0,j}
        for i ← 1 to n
            for j ← 1 to m
                s_{i,j} ← max{ s_{i−1,j} + w↓_{i,j},  s_{i,j−1} + w→_{i,j},  s_{i−1,j−1} + w↘_{i,j} }
        return s_{n,m}
Dynamic Programming Part II: Edit Distance & Alignments
Aligning DNA Sequences
LCS Alignment without mismatches
LCS: Longest Common Subsequence
Given two sequences

$$
v = v_1 v_2 \ldots v_n \quad \text{and} \quad w = w_1 w_2 \ldots w_m
$$

the LCS of v and w is a sequence of positions in v,

$$
1 \le i_1 < i_2 < \cdots < i_t \le n
$$

and a sequence of positions in w,

$$
1 \le j_1 < j_2 < \cdots < j_t \le m
$$

such that the $i_k$-th letter of v equals the $j_k$-th letter of w for every k = 1, ..., t, and t is maximal.
LCS: Example
Every common subsequence is a path in 2-D grid
LCS Problem as Manhattan Tourist Problem
Edit Graph for LCS Problem
Every path is a common subsequence.
Every diagonal edge adds an extra element to the common subsequence.
LCS Problem: Find a path with the maximum number of diagonal edges.
Computing LCS
Let $v_i$ = prefix of v of length i: $v_1 \ldots v_i$
and $w_j$ = prefix of w of length j: $w_1 \ldots w_j$
Every Path in the Grid Corresponds to an Alignment
Aligning Sequences without Insertions and Deletions: Hamming Distance
Given two DNA sequences v and w:
The Hamming distance d_H(v, w) = 8 is large, but the sequences are very similar.
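A position-by-position comparison is all the Hamming distance needs. In the sketch below, the function name `hamming_distance` is hypothetical, and the two sequences are an assumption: the slide's own sequences were shown in a figure that is not preserved here, so a pair of length 8 that differs at every position is used instead.

```python
def hamming_distance(v, w):
    """Number of positions at which two equal-length sequences differ."""
    if len(v) != len(w):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(v, w))


# Assumed example pair (not taken from the slide text).
v = "ATATATAT"
w = "TATATATA"

if __name__ == "__main__":
    print(hamming_distance(v, w))  # prints 8
```

The two strings are the same pattern shifted by one position, yet every column mismatches, which is why the distance is maximal even though the sequences look alike.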
Aligning Sequences with Insertions and Deletions
By shifting one sequence over one position:
The edit distance: d_L(v, w) = 2
Hamming distance neglects insertions and deletions in DNA.
Levenshtein or edit distance
Definition
The Levenshtein distance or edit distance d_L between two sequences X and Y is the minimum number of edit operations of type
Replacement,
Insertion, or
Deletion,
that one needs to transform sequence X into sequence Y:

$$
d_L(X, Y) = \min \{\, R(X, Y) + I(X, Y) + D(X, Y) \,\}
$$

Here R, I and D count the replacements, insertions and deletions used by one transformation of X into Y, and the minimum is taken over all such transformations.
Using M for match, an edit transcript is a string over the alphabet {I, D, R, M} that describes a transformation of X to Y.
Levenshtein or edit distance
Example: Given two strings
X = YESTERDAY
Y = EASTERS

Here is a minimum edit transcript for the above example:

    edit transcript: D M I M M M M R D D
    X:               Y E - S T E R D A Y
    Y:               - E A S T E R S - -

The edit distance d_L(X, Y) of X, Y is 5.
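The value 5 can be checked with the standard dynamic programming table for edit distance. This is a minimal sketch, not code from the slides: the function name `edit_distance` is hypothetical, and unit cost is assumed for each replacement, insertion and deletion.

```python
def edit_distance(x, y):
    """Levenshtein distance between x and y with unit-cost R, I and D operations."""
    n, m = len(x), len(y)
    # d[i][j] = edit distance between the prefixes x[:i] and y[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i letters of x[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all j letters of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,             # deletion
                          d[i][j - 1] + 1,             # insertion
                          d[i - 1][j - 1] + mismatch)  # match or replacement
    return d[n][m]


if __name__ == "__main__":
    print(edit_distance("YESTERDAY", "EASTERS"))  # prints 5
```

The table has (n + 1)(m + 1) entries and each is filled in constant time, so the computation mirrors the grid traversal used for the Manhattan Tourist Problem, with min in place of max.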
Edit Distance vs Hamming Distance
Hamming distance always compares the i-th letter of v with the i-th letter of w.
Hamming distance: d(v, w) = 8
Computing Hamming distance is a trivial task.

Edit distance may compare the i-th letter of v with the j-th letter of w.
Edit distance: d(v, w) = 2
Computing edit distance is a non-trivial task.
Edit Distance: Example
What is the edit distance? 5?
Edit Distance: Example
Can it be done in 3 steps?
The Alignment Grid
Every alignment path runs from source to sink.
Alignment as a path in the Edit Graph
Alignments in Edit Graph
Every path in the edit graph corresponds to an alignment.
Alignment: Dynamic Programming
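$$
s_{i,j} = \max \begin{cases}
s_{i-1,j-1} + 1 & \text{if } v_i = w_j \quad (\searrow) \\
s_{i-1,j} & (\downarrow) \\
s_{i,j-1} & (\rightarrow)
\end{cases}
$$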
Dynamic Programming Example
Initialize the 1st row and 1st column to be all zeroes.
Or, to be more precise, initialize the 0th row and 0th column to be all zeroes.
Dynamic Programming Example
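$$
s_{i,j} = \max \begin{cases}
s_{i-1,j-1} + 1 & \text{value from NW} + 1, \text{ if } v_i = w_j \quad (\nwarrow) \\
s_{i-1,j} & \text{value from N} \quad (\uparrow) \\
s_{i,j-1} & \text{value from W} \quad (\leftarrow)
\end{cases}
$$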
Alignment: Backtracking
Arrows indicate where the score originated from:
↖ (from NW), ↑ (from N), ← (from W)
Backtracking Example
Find a match in row and column 2.
i = 2, j = 2, 5 is a match (T).
j = 2, i = 4, 5, 7 is a match (T).
Since $v_i = w_j$, $s_{i,j} = s_{i-1,j-1} + 1$:

$s_{2,2} = [s_{1,1} = 1] + 1$
$s_{2,5} = [s_{1,4} = 1] + 1$
$s_{4,2} = [s_{3,1} = 1] + 1$
$s_{5,2} = [s_{4,1} = 1] + 1$
$s_{7,2} = [s_{6,1} = 1] + 1$
Dynamic Programming Example
Continuing with the dynamic programming algorithm gives this result.
Alignment: Dynamic Programming
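$$
s_{i,j} = \max \begin{cases}
s_{i-1,j-1} + 1 & \text{if } v_i = w_j \quad (\searrow) \\
s_{i-1,j} & (\downarrow) \\
s_{i,j-1} & (\rightarrow)
\end{cases}
$$

This recurrence corresponds to the Manhattan Tourist problem (three incoming edges into a vertex) with all horizontal and vertical edges weighted by zero.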
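The correspondence can be made concrete by building the edge weights of the edit graph and handing them to a three-edge Manhattan Tourist routine. The sketch below is an illustration under assumptions: the names `manhattan_tourist_3` and `lcs_length_via_mtp` are hypothetical, weights are stored at the vertex an edge points into, a missing diagonal edge is modelled by weight −∞, and the example strings are read off the gapped alignment shown later in these slides.

```python
NEG_INF = float("-inf")


def manhattan_tourist_3(down, right, diag):
    """Longest-path score in a grid with south, east and diagonal edges.

    Each table is (n+1) x (m+1); entry [i][j] is the weight of the edge
    entering vertex (i, j) from above, from the left, or from the upper left.
    """
    n, m = len(down) - 1, len(down[0]) - 1
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] + down[i][0]
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + right[0][j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j] + down[i][j],
                          s[i][j - 1] + right[i][j],
                          s[i - 1][j - 1] + diag[i][j])
    return s[n][m]


def lcs_length_via_mtp(v, w):
    """LCS length computed as a Manhattan Tourist instance on the edit graph."""
    n, m = len(v), len(w)
    # Horizontal and vertical edges all weigh zero.
    zero = [[0] * (m + 1) for _ in range(n + 1)]
    # A diagonal edge of weight 1 exists only where the letters match.
    diag = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if v[i - 1] == w[j - 1]:
                diag[i][j] = 1
    return manhattan_tourist_3(zero, zero, diag)


if __name__ == "__main__":
    print(lcs_length_via_mtp("ATCGTAC", "ATGTTAT"))  # prints 5
```

Because only matching positions contribute weight, the longest path score equals the number of diagonal match edges on the best path, which is the LCS length.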
LCS Algorithm

    LCS(v, w)
      for i ← 0 to n
        s_{i,0} ← 0
      for j ← 1 to m
        s_{0,j} ← 0
      for i ← 1 to n
        for j ← 1 to m
          s_{i,j} ← max { s_{i−1,j−1} + 1 (if v_i = w_j),  s_{i−1,j},  s_{i,j−1} }
          b_{i,j} ← "↑" if s_{i,j} = s_{i−1,j}
                    "←" if s_{i,j} = s_{i,j−1}
                    "↖" otherwise (s_{i,j} = s_{i−1,j−1} + 1)
      return (s_{n,m}, b)
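The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: the function name `lcs_table`, the string labels `"diag"`, `"up"`, `"left"` for the arrows, and the choice to prefer the diagonal arrow on ties are all assumptions made here.

```python
def lcs_table(v, w):
    """Fill the LCS score matrix s and the backtracking matrix b.

    s[i][j] is the LCS length of the prefixes v[:i] and w[:j].
    b[i][j] records which neighbour produced s[i][j].
    """
    n, m = len(v), len(w)
    # Row 0 and column 0 stay at zero: an empty prefix has an empty LCS.
    s = [[0] * (m + 1) for _ in range(n + 1)]
    b = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # v[i - 1] is v_i in the 1-based notation of the slides.
            # The diagonal edge is only usable on a match.
            diag = s[i - 1][j - 1] + 1 if v[i - 1] == w[j - 1] else -1
            s[i][j] = max(diag, s[i - 1][j], s[i][j - 1])
            if s[i][j] == diag:
                b[i][j] = "diag"   # assumed tie-break: diagonal first
            elif s[i][j] == s[i - 1][j]:
                b[i][j] = "up"
            else:
                b[i][j] = "left"
    return s, b


if __name__ == "__main__":
    s, b = lcs_table("ATCGTAC", "ATGTTAT")
    print(s[-1][-1])  # LCS length of the two example sequences
```

Any tie-breaking order yields a longest common subsequence; different orders may only change which of several equally long subsequences is recovered.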
![Page 59: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/59.jpg)
Now What?
- LCS(v, w) created the alignment grid.
- Now we need a way to read the best alignment of v and w.
- Follow the arrows backwards from the sink.
![Page 60: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/60.jpg)
Printing LCS: Backtracking
    PrintLCS(b, v, i, j)
      if i = 0 or j = 0
        return
      if b_{i,j} = "↖"
        PrintLCS(b, v, i−1, j−1)
        print v_i
      else
        if b_{i,j} = "↑"
          PrintLCS(b, v, i−1, j)
        else
          PrintLCS(b, v, i, j−1)
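As a rough Python counterpart of PrintLCS, the sketch below walks the arrows back from the sink and returns the subsequence as a string instead of printing it. It uses a loop rather than recursion to avoid Python's recursion limit on long sequences. The names `lcs_table` and `backtrack_lcs` and the arrow labels are assumptions for this illustration.

```python
def lcs_table(v, w):
    """Build the score matrix s and arrow matrix b (labels are illustrative)."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    b = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = s[i - 1][j - 1] + 1 if v[i - 1] == w[j - 1] else -1
            s[i][j] = max(diag, s[i - 1][j], s[i][j - 1])
            if s[i][j] == diag:
                b[i][j] = "diag"
            elif s[i][j] == s[i - 1][j]:
                b[i][j] = "up"
            else:
                b[i][j] = "left"
    return s, b


def backtrack_lcs(b, v, i, j):
    """Follow the arrows from cell (i, j) back to the border of the grid."""
    out = []
    while i > 0 and j > 0:
        if b[i][j] == "diag":
            out.append(v[i - 1])   # a diagonal arrow marks a matched symbol
            i, j = i - 1, j - 1
        elif b[i][j] == "up":
            i -= 1
        else:
            j -= 1
    # Symbols were collected from the sink backwards, so reverse them.
    return "".join(reversed(out))


if __name__ == "__main__":
    v, w = "ATCGTAC", "ATGTTAT"
    _, b = lcs_table(v, w)
    print(backtrack_lcs(b, v, len(v), len(w)))
```

The traceback takes at most n + m steps, since every step decreases i, j, or both.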
![Page 61: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/61.jpg)
Now What?
Alignment:

    A T C G - T A C -
    A T - G T T A - T
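To show how a gapped alignment like this one can be read off the grid, here is a hedged Python sketch that traces back using the score matrix alone, without a separate arrow matrix. The function name `lcs_alignment` and the tie-breaking rule are assumptions. Because several optimal alignments can exist, the rows it prints need not be identical to the ones on the slide, although they contain the same number of matches.

```python
def lcs_alignment(v, w):
    """Return two gapped rows aligning v and w with a maximal number of matches."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if v[i - 1] == w[j - 1]:
                s[i][j] = s[i - 1][j - 1] + 1
            else:
                s[i][j] = max(s[i - 1][j], s[i][j - 1])

    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and v[i - 1] == w[j - 1]:
            # Diagonal step: both symbols are aligned to each other.
            top.append(v[i - 1])
            bottom.append(w[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or s[i - 1][j] >= s[i][j - 1]):
            # Vertical step: v_i is aligned to a gap.
            top.append(v[i - 1])
            bottom.append("-")
            i -= 1
        else:
            # Horizontal step: w_j is aligned to a gap.
            top.append("-")
            bottom.append(w[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    for row in lcs_alignment("ATCGTAC", "ATGTTAT"):
        print(" ".join(row))
```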
![Page 63: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/63.jpg)
LCS Runtime
It takes O(nm) time to fill in the n × m dynamic programming matrix. Why O(nm)? The pseudocode nests one "for" loop inside another, so it visits each of the n × m cells once and does a constant amount of work per cell.
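The count can be written out explicitly. As a sketch, assume each cell update costs some constant c (a symbol introduced here for illustration) and that n, m ≥ 1:

$$
T(n,m) \;=\; \underbrace{(n+1) + m}_{\text{initialization}} \;+\; \sum_{i=1}^{n}\sum_{j=1}^{m} c \;=\; O(n+m) + c\,nm \;=\; O(nm).
$$

Backtracking adds at most n + m further steps, so it does not change the bound.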
![Page 64: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/64.jpg)
Dynamic Programming Part II: Sequence Alignment
![Page 65: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/65.jpg)
Outline
- Dot plots
- Global Alignment
- Scoring Matrices
- Local Alignment
- Alignment with Affine Gap Penalties
![Page 66: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/66.jpg)
Types of sequence alignments
- Dot plots
- Number of sequences
  - pairwise: compares two sequences
  - multiple: compares several sequences
- Portion of aligned sequence
  - global: aligns the sequences over their entire length
  - local: finds the substrings with the best similarity scores
- Algorithms
  - Optimal methods: Needleman-Wunsch, Smith-Waterman
  - Heuristics: FASTA, BLAST
![Page 67: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/67.jpg)
Dot plot
→ The simplest way to visualize the similarity between two protein/DNA sequences is to use a similarity matrix.

- Identification of insertions/deletions
- Identification of direct repeats or inversions
- Steps to create a dot plot
  - 2D matrix
  - One sequence on the top
  - One sequence on the left
  - For each matrix cell, compare the two symbols and draw a dot if they match
![Page 69: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/69.jpg)
Dot matrix sequence comparison
A dot matrix analysis is primarily a method for comparing two sequences. An (n × m) matrix relating two sequences of length n and m respectively is produced by placing a dot at each cell for which the corresponding symbols match. Here is an example for the two sequences:

IMISSMISSISSIPPI and MYMISSISAHIPPIE:
![Page 70: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/70.jpg)
Dot matrix sequence comparison
Definition. Let $S = s_1 s_2 \ldots s_n$ and $T = t_1 \ldots t_m$ be two strings of length n and m respectively. Let M be an n × m matrix. Then M is a (simple) dot plot if, for all i, j with 1 ≤ i ≤ n and 1 ≤ j ≤ m:

$$
M[i,j] =
\begin{cases}
1 & \text{if } s_i = t_j \\
0 & \text{otherwise.}
\end{cases}
$$

Note: The longest common substring of S and T is then the longest run of consecutive 1s along a diagonal of the matrix. However, rather than writing the digit 1, we draw a dot.
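The definition translates almost directly into code. The following Python sketch builds the matrix M for the two example strings and renders it as text. The function names `dot_plot` and `render`, and the choice of `*` and `.` as plot characters, are assumptions for illustration.

```python
def dot_plot(s, t):
    """Return the n x m matrix M with M[i][j] = 1 if s[i] == t[j], else 0."""
    return [[1 if a == b else 0 for b in t] for a in s]


def render(matrix, s, t):
    """Draw the matrix with s down the left side and t across the top."""
    lines = ["  " + " ".join(t)]
    for symbol, row in zip(s, matrix):
        lines.append(symbol + " " + " ".join("*" if cell else "." for cell in row))
    return "\n".join(lines)


if __name__ == "__main__":
    S = "IMISSMISSISSIPPI"
    T = "MYMISSISAHIPPIE"
    print(render(dot_plot(S, T), S, T))
```

Note that Python indices are 0-based, so `M[i][j]` here corresponds to M[i+1, j+1] in the definition.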
![Page 71: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/71.jpg)
Dot matrix sequence comparison
Some of the properties of a dot plot are:

- the visualization is easy to understand
- it is easy to find common substrings; they appear as contiguous dots along a diagonal
- it is easy to find reversed substrings
- it is easy to discover displacements
- it is easy to find repeats
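The second property can be made concrete: scanning the plot for the longest run of contiguous dots along a diagonal yields the longest common substring. The sketch below does this without storing the full matrix, keeping only the run lengths of the previous row. The function name `longest_diagonal_run` is an assumption, and ties are resolved in favour of the first run found.

```python
def longest_diagonal_run(s, t):
    """Return the longest common substring of s and t via diagonal runs."""
    best_len, best_end = 0, 0
    prev = [0] * (len(t) + 1)          # run lengths ending in the previous row
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                # A dot extends the run arriving from the upper-left cell.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return s[best_end - best_len:best_end]


if __name__ == "__main__":
    print(longest_diagonal_run("IMISSMISSISSIPPI", "MYMISSISAHIPPIE"))
```

This scan is still quadratic in the sequence lengths, like building the dot plot itself.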
![Page 72: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/72.jpg)
Dot plots
- Sequence lengths: n and m
- Computing the plot takes O(nm) comparisons

Example panels: DNA, Protein
![Page 73: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/73.jpg)
Noise in Dot plot
Human LDL receptor compared to itself

The low density lipoprotein (LDL) receptor is a cell surface protein that plays a central role in the metabolism of cholesterol in humans and animals. Mutations affecting its structure and function give rise to one of the most prevalent human genetic diseases, familial hypercholesterolemia.
![Page 74: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/74.jpg)
Reducing the noise
To reduce the noise, a window size w and a stringency s are used: a dot is drawn at point (x, y) only if, among the next w positions starting at x in one sequence and y in the other, at least s characters are equal.

Example: Phage P22

Panels: w = 1, s = 1; w = 11, s = 7; w = 23, s = 15
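The window and stringency filter can be sketched as follows. This is one possible reading of the rule, under the assumption that only windows fitting entirely inside both sequences are evaluated, so the filtered plot has w − 1 fewer rows and columns than the simple one. The function and parameter names are illustrative.

```python
def filtered_dot_plot(seq_a, seq_b, window=1, stringency=1):
    """Dot at (x, y) if at least `stringency` of the `window` aligned
    positions seq_a[x + k], seq_b[y + k] (k = 0 .. window - 1) match."""
    rows = max(len(seq_a) - window + 1, 0)   # assumed: skip partial windows
    cols = max(len(seq_b) - window + 1, 0)
    plot = []
    for x in range(rows):
        row = []
        for y in range(cols):
            matches = sum(
                1 for k in range(window) if seq_a[x + k] == seq_b[y + k]
            )
            row.append(1 if matches >= stringency else 0)
        plot.append(row)
    return plot


if __name__ == "__main__":
    for row in filtered_dot_plot("ACGT", "ACCT", window=3, stringency=2):
        print(" ".join("*" if cell else "." for cell in row))
```

With `window=1` and `stringency=1` this reduces to the simple dot plot. The straightforward version shown here costs O(nmw); sliding the window along each diagonal and updating the count incrementally would bring that back to O(nm).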
![Page 75: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/75.jpg)
Random similarity in Dot plots
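- When comparing DNA, there is a 1/4 probability of a random match at any position (assuming the four bases are equally frequent).
- When comparing protein sequences, there is a 1/20 probability of a random match (assuming the twenty amino acids are equally frequent).
- Hence, if coding DNA regions are analyzed: translate first, then align!
- You can always go back to DNA after alignment.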
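To see how these per-position probabilities interact with a window and stringency, the sketch below computes the chance that a window produces a dot purely at random. It assumes positions match independently with a fixed probability p (1/4 for DNA, 1/20 for protein), which is a simplification of real sequences. The function name `window_hit_probability` is an assumption.

```python
from math import comb


def window_hit_probability(w, s, p):
    """P(at least s matches among w positions), each matching with prob. p.

    This is the upper tail of a Binomial(w, p) distribution.
    """
    return sum(comb(w, k) * p**k * (1 - p) ** (w - k) for k in range(s, w + 1))


if __name__ == "__main__":
    for label, p in (("DNA", 1 / 4), ("protein", 1 / 20)):
        for w, s in ((1, 1), (11, 7), (23, 15)):
            prob = window_hit_probability(w, s, p)
            print(f"{label:8s} w={w:2d} s={s:2d}  P(random dot) = {prob:.3g}")
```

For the same window and stringency the random-dot probability is lower for proteins than for DNA, which is the quantitative reason behind "translate first, then align".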
![Page 76: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/76.jpg)
w=1
![Page 77: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/77.jpg)
w=3
![Page 78: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/78.jpg)
w=3, stringency 2
![Page 79: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/79.jpg)
DNA sequence
Panels: simple dot plot (w = 1); w = 23, stringency = 16
![Page 80: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/80.jpg)
Protein sequence
Panels: simple dot plot (w = 1); w = 23, stringency = 6
![Page 81: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/81.jpg)
Insertion, Deletion & Inversion
![Page 82: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/82.jpg)
Repeats
ABCDEFGEFGHIJKLMNO

Panels: tandem duplication compared to the non-duplicated sequence; tandem duplication compared to itself
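When a sequence is compared to itself, a repeat shows up as a run of dots on a diagonal parallel to the main one. The sketch below scans those off-diagonals for the example string. The function name `self_repeats`, the `min_len` threshold, and the `(start, offset, length)` result format are assumptions made for this illustration.

```python
def self_repeats(seq, min_len=2):
    """Find runs of dots off the main diagonal of a self dot plot.

    Returns (start, offset, length) tuples: seq[start:start + length]
    occurs again `offset` positions further along the sequence.
    """
    n = len(seq)
    found = []
    for offset in range(1, n):           # diagonal j = i + offset
        run = 0
        for i in range(n - offset):
            if seq[i] == seq[i + offset]:
                run += 1
            else:
                if run >= min_len:
                    found.append((i - run, offset, run))
                run = 0
        if run >= min_len:                # run that reaches the plot border
            found.append((n - offset - run, offset, run))
    return found


if __name__ == "__main__":
    for start, offset, length in self_repeats("ABCDEFGEFGHIJKLMNO"):
        print(start, offset, length)
```

A run whose length equals its offset indicates that the two copies are directly adjacent, which is the signature of a tandem duplication. Only the upper triangle is scanned because a self dot plot is symmetric.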
![Page 83: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/83.jpg)
Palindromic repeat (intra chain)
5’ GGCGG 3’
![Page 84: Dynamic Programming -class4- · Dynamic Programming Recall: theChange Problem Otherproblems: ManhattanTouristProblem,LCSProblem Finally: Sequencealignments BioinfoI (InstitutPasteurdeMontevideo)](https://reader033.fdocuments.in/reader033/viewer/2022060502/5f1c2f069adbbc36e95f06cc/html5/thumbnails/84.jpg)
Limitations of dot plots
- No score to quantify identical or similar strings
- Runtime is quadratic; more efficient algorithms to identify identical substrings exist (e.g. based on suffix trees)