Intro to Alignment Algorithms: Global and...
Transcript of Intro to Alignment Algorithms: Global and...
![Page 1: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/1.jpg)
Intro to Alignment Algorithms:
Global and Local
Algorithmic Functions of Computational Biology
Professor Istrail
![Page 2: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/2.jpg)
Sequence ComparisonBiomolecular sequences □ DNA sequences (string over 4 letter alphabet {A, C,
G, T}) □ RNA sequences (string over 4 letter alphabet
{ACGU}) □ Protein sequences (string over 20 letter alphabet
{Amino Acids})
Sequence similarity helps in the discovery of genes, and the prediction of structure and function of proteins.
Algorithmic Functions of Computational Biology
Professor Istrail
![Page 3: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/3.jpg)
The Basic Similarity Analysis Algorithm
Global Similarity
• Scoring Schemes
• Edit Graphs
• Alignment = Path in the Edit Graph
• The Principle of Optimality
• The Dynamic Programming Algorithm
• The Traceback
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 4: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/4.jpg)
The Sequence Alignment ProblemInput. : two sequences over the same alphabet and a scoring scheme Output: an alignment of the two sequences of maximum score
Example: □ GCGCATTTGAGCGA □ TGCGTTAGGGTGACCA A possible alignment: - GCGCATTTGAGCGA - - TGCG - - TTAGGGTGACC
match mismatch
indel
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 5: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/5.jpg)
CSCI2820 - Class 4
Mismatch, Deletion, Insertion
5
mismatch
deletion (in
template)insertion
(in template)
TCAGGGGGCTATTAGTCCTCCGATAA
TCAGGGGGCTATTAGTCC-CCGATAATCAGGGGG-CTATTAGTCCCCCCGATAA
![Page 6: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/6.jpg)
m
n
yyyYxxxX
...21
...21
=
=Consider two sequences
Over the alphabet
},,, TGCA{=Σ
ji yx , belong to Σ
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 7: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/7.jpg)
Scoring Schemes
δUnit-score A C G T
ACGT
11
11
-
-
0000
000
0 00
00
000
0
00000
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 8: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/8.jpg)
Alignment
ACG | | | AGG
δScore = (A,A) (C,G) (G,G)+ +
= 1 + 0 + 1 = 2
Unit-cost
A | A
A is aligned with A
C | G
C is aligned with G
G is aligned with GG | G
Algorithmic Functions of Computational Biology –
Professor Istrail
δ δ
![Page 9: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/9.jpg)
Gaps
ACATGGAAT ACAGGAAAT
ACAT GG - AAT ACA - GG AAAT
OPTIMAL ALIGNMENTS
SCORE 7 8
AAAGGG GGGAAA
SCORE 0 3
- - - AAAGGG GGGAAA - - -
“-” is the gap symbol
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 10: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/10.jpg)
δ
δδ (x,y) = the score for aligning x with y
(-,y) = the score for aligning - with y
(x,-) = the score for aligning x with -
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 11: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/11.jpg)
A-CG - G ATCGTG
Alignment
Score
δ δδ δ δδ(A,A) + (G,G) +(C,C) +(-,T) + (-,T ) + (G,G)
THE SUM OF THE SCORES OF THE PAIRWISE ALIGNED SYMBOLS
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 12: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/12.jpg)
Margaret Dayhoff & PAM Similarity Matrices
ARTEMIS Summer 2008
Professor Istrail
![Page 13: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/13.jpg)
Dr. Margaret Oakley Dayhoff The Mother & Father of Bioinformatics
ARTEMIS Summer 2008
Professor Istrail
![Page 14: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/14.jpg)
Scoring SchemeDayhoff PAM scoring matrices
...δ
PTIPLSRLFDNAMLRAHRLHQ SAIENQRLFNIAVSRVQHLHL
Partial alignment for Monkey and Trout somatotropin proteins
- A R N D C Q E G H I L K M F P S T W Y V
-8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 3 -3 0 0 -3 -1 0 1 -3 -1 -3 -2 -2 -4 1 1 1 -7 -4 0A
R N D
64
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 15: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/15.jpg)
Scoring Functions
Scoring function = a sum of a terms each for a pair of aligned residues, and for each gap
The meaning = log of the relative likelihood that the sequences are related, compared to being unrelated
Identities and conservative substitutions are Positive terms
Non-conservative substitutions are Negative terms
Mutations= Substitutions, Insertions, Deletions
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 16: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/16.jpg)
CSCI2820 - Class 4
Global alignment problem
• Input: Sequences X and Y of length m and n respectively and a similarity matrix
• Output: An optimal global alignment of X and Y – Global alignments require all bases in both
sequences are aligned
16
![Page 17: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/17.jpg)
CSCI2820 - Class 4
Local alignment problem
• Input: Sequences X and Y of length m and n respectively and a similarity matrix
• Output: An optimal local alignment of X and Y – Local alignments do not require using all
bases in either sequence in the alignment
• Applicable when looking for subsequences of similarity
17
![Page 18: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/18.jpg)
The Edit Graph
Suppose that we want to align AGT with AT
We are going to construct a graph where alignments between the two sequences correspond to paths between the begin and and end nodes of the graph.
This is the Edit Graph
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 19: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/19.jpg)
0 1 2 30
1
2
AGT has length 3 AT has length 2
The Edit graph has (3+1)*(2+1) nodes
The sequence AGT
The sequence AT
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 20: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/20.jpg)
0 1 2 30
1
2
A G T
A
T
AGT indexes the columns, and AT indexes the rows of this “table”
Begin
End
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 21: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/21.jpg)
0 1 2 30
1
2
A G T
A
T
Begin
EndThe Graph is directed. The nodes (i,j) will hold values.
Algorithmic Functions of Computational Biology –
Professor Istrail
0 10
1
2
A G
A
T
![Page 22: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/22.jpg)
T0 1 2 30
1
2
A G
A
T
Begin
End
Algorithmic Functions of Computational Biology
Professor Istrail
![Page 23: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/23.jpg)
T
A
T
0 1 2 30
1
2
A GA -
- A
A A
- A
- A-
A
- A
G -
A -
A -
G -
G -
T -
T -
T -
- T
- T
- T
- T
A T
G T
T T
G A
T A
Directed edges get as labels pairs of aligned letters.
Begin
End
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 24: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/24.jpg)
Alignment = Path in the Edit GraphT
A
T
0 1 2 30
1
2
A GA -
- A
A A
- A
- A-
A
- A
G -
A -
A -
G -
G -
T -
T -
T -
- T
- T
- T
- T
A T
G T
T T
G A
T A
AGT A-T
Begin
End
Every path from Begin to End corresponds to an alignment
Every alignment corresponds to a path between Begin and End
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 25: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/25.jpg)
The Principle of Optimality
The optimal answer to a problem is expressed in terms of optimal answer for its sub-problems
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 26: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/26.jpg)
Dynamic Programming
Part 1: Compute first the optimal alignment score
Part 2: Construct optimal alignment
We are looking for the optimal alignment = maximal score path in the Edit Graph from the Begin vertex to the End vertex
Given: Two sequences X and Y Find: An optimal alignment of X with Y
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 27: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/27.jpg)
The DP Matrix S(i,j)0 1 2 3
0
1
2
A G T
A
TS(2,1)
S(1,0)
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 28: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/28.jpg)
The DP MatrixMatrix S =[S(i,j)]
S(i,j) = The score of the maximal cost path from the Begin Vertex and the vertex (i,j)
(i,j)
(i,j-1)
(i-1,j)
(i-1,j-1) The optimal path to (i,j) must pass through one of
the vertices
(i-1,j-1)
(i-1,j)
(i,j-1)
Opt
imal
Pat
h to
(i,j
)Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 29: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/29.jpg)
Opt path
(i,j)
(i,j-1)
(i-1,j)
(i-1,j-1)
Optimal path to (i-1,j) + (- , yj)
- xi
yj -
S(i-1,j) + δ
δ
(- , yj)
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 30: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/30.jpg)
Optimal path
(i,j)
(i-1,j)
(i,j-1)
(i-1,j-1)
Optimal path to (i-1,j-1) + (xi,yj)δ
δS(i-1,j-1) + (xi , yj)
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 31: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/31.jpg)
Optimal path
(i,j)
(i,j-1)
(I-1,j)
(i-1,j-1)
Optimal path to (i,j-1) + (xi,-)δ
δS(i,j-1) + (xi, -)
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 32: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/32.jpg)
The Basic ALGORITHM
S(i,j) =
S(i-1, j-1) + (xi, yj)
S(i-1, j) + (xi, -)
S(i, j-1) + (-, yj)
MAX
δ
δ
δ
Algorithmic Functions of Computational Biology –
Professor Istrail
![Page 33: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/33.jpg)
T
A
T
0 1 2 30
1
2
A GA -
- A
A A
- A
- A-
A
- A
G -
A -
A -
G -
G -
T -
T -
T -
- T
- T
- T
- T
A T
G T
T T
G A
T A
0
0
0
0
1
1
0
1
1
0
1
2AGT A - TOptimal Alignment
Optimal Alignment and TracbackAlgorithmic Functions of Computational Biology –
Professor Istrail
![Page 34: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/34.jpg)
S(i,j) =
S(i-1, j-1) + (xi, yj),
S(i-1, j) + (xi, -),
S(i, j-1) + (-, yj)
MAX
δ
δ
δ
0, We add this
The Basic ALGORITHM: Local SimilarityAlgorithmic Functions of Computational Biology –
Professor Istrail
![Page 35: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/35.jpg)
CSCI2820 - Class 4
Protein global alignment
35
X = hlsek Y = nlsak
• X and Y represent a protein subsequence from the BRCA2 (early onset) protein in human and chimpanzee
• Global alignments are used when the two sequences being compared represent a similar biological sequence
![Page 36: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/36.jpg)
CSCI2820 - Class 4
Margaret Dayhoff’s PAM 100 similarity matrix (partial)
36
A N E H L K S *
A 4 -1 0 -3 -3 -3 1 -9
N -1 5 1 2 -4 1 1 -9
E 0 1 5 -1 -5 -1 -1 -9
H -3 2 -1 7 -3 -2 -2 -9
L -3 -4 -5 -3 6 -4 -4 -9
K -3 1 -1 -2 -4 5 -1 -9
S 1 1 -1 -2 -4 -1 4 -9
* -9 -9 -9 -9 -9 -9 -9 1
![Page 37: Intro to Alignment Algorithms: Global and Localcs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.… · Intro to Alignment Algorithms: Global and Local Algorithmic](https://reader037.fdocuments.in/reader037/viewer/2022090605/605a804a8800b070841854b6/html5/thumbnails/37.jpg)
CSCI2820 - Class 4 37
h l s e k
n
l
s
a
k
XY
-9 2 -7 -16 -25 -34
-9 -18 -27 -36 -45
-27 -16 -1 12 3 -6
-36 -25 -10 3 12 3
-18 -7 8 -1 -10 -19
0
-45 -34 -19 -6 3 17