Michael Schroeder
Biotechnology Center, TU Dresden
Global and Local Alignments
Contents
§ Why compare and align sequences?
§ How to judge an alignment?
  § Z-score, E-value, P-value, structure and function
§ How to compare and align sequences?
  § Levenshtein distance, scoring schemes, longest common subsequence, global and local alignment, substitution matrix
§ How to compute an alignment?
  § Dynamic programming
§ How to compute an alignment fast?
  § BLAST
§ How to align many sequences?
  § Multiple sequence alignment, phylogenetic trees
§ Alignments and structure
  § How to predict protein structure from protein sequence
Local Alignment
§ Needleman-Wunsch computes the globally best alignment
§ Finding domains / exons needs the best locally matching regions instead
§ Maximise local alignments by ignoring terminal gaps
§ How to maximise locally?
Local Alignment
§ Global alignment
  § Path in the distance matrix $d$ from $d_{0,0}$ to $d_{m,n}$
§ Local alignment
  § Path in $d$ from any $d_{k,l}$ to any $d_{o,p}$ such that
    $d_{o,p} - d_{k,l} \ge d_{o,p} - d_{i,j}$ for any $i \le m$ and $j \le n$ with $o \ge k$, $p \ge l$, $o \ge i$ and $p \ge j$.
  § A path must exist from $(o,p)$ to $(k,l)$ and from $(o,p)$ to $(i,j)$ in $d_b$
§ How to chop off the right side?
§ How to chop off the left side?
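One way to see what "chopping off" both sides means: the local alignment score equals the best global alignment score over all pairs of substrings of a and b (empty substrings allowed, so the score is never below 0). The brute-force sketch below checks exactly that; it is far too slow for real use and only illustrates the idea. The toy scoring (+1 match, -1 mismatch, gap score -1) and the names `global_score`, `local_by_chopping` and `unit` are hypothetical choices for illustration.

```python
def global_score(a, b, ds, sg):
    """Needleman-Wunsch score of the complete strings a and b (linear gap score sg)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i * sg
    for j in range(1, n + 1):
        d[0][j] = j * sg
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = max(d[i - 1][j] + sg,
                          d[i][j - 1] + sg,
                          d[i - 1][j - 1] + ds(a[i - 1], b[j - 1]))
    return d[m][n]


def local_by_chopping(a, b, ds, sg):
    """Best global score over all substring pairs a[k:o], b[l:p]:
    a global alignment with both ends chopped off optimally."""
    best = 0  # two empty substrings score 0
    for k in range(len(a) + 1):
        for o in range(k, len(a) + 1):
            for l in range(len(b) + 1):
                for p in range(l, len(b) + 1):
                    best = max(best, global_score(a[k:o], b[l:p], ds, sg))
    return best


def unit(x, y):
    # hypothetical toy substitution score: +1 for a match, -1 for a mismatch
    return 1 if x == y else -1


print(local_by_chopping("pedro", "petrella", unit, -1))  # 2
```

The result 2 comes from the shared prefix "pe"; stretching it to "pedr"/"petr" gains one match but pays one mismatch.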
Needleman-Wunsch Algorithm with Substitution Matrix
Global alignment of string a to b
5 Global Alignment with Needleman-Wunsch and Substitution Matrix
Log-Odds Ratio: $\log_2 \dfrac{P(x,y)}{P(x)\,P(y)}$
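The log-odds ratio compares how often the pair (x, y) is observed with how often it would occur by chance, P(x)·P(y). A minimal sketch of the arithmetic follows; the probabilities are made up for illustration and do not come from any real substitution matrix.

```python
import math


def log_odds(p_xy: float, p_x: float, p_y: float) -> float:
    """log2 of the observed pair probability over the chance expectation P(x) * P(y)."""
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical frequencies, for illustration only:
print(log_odds(0.02, 0.1, 0.1))   # about 1.0: pair seen twice as often as by chance
print(log_odds(0.005, 0.1, 0.1))  # about -1.0: pair seen half as often as by chance
```

A positive entry rewards a substitution that is more frequent than chance, a negative entry penalises one that is rarer, and 0 means no evidence either way.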
Needle
Let $a = a_1 \dots a_m$ and $b = b_1 \dots b_n$ be strings. Then
$$\mathit{needle}_{a,b} = \mathit{needle}_{a,b}(m,n)$$
is the global alignment score of $a$ and $b$ with substitution matrix, where
$$
\mathit{needle}_{a,b}(i,j) =
\begin{cases}
i\,s_g & \text{if } j = 0,\\
j\,s_g & \text{if } i = 0,\\
\max\begin{cases}
\mathit{needle}_{a,b}(i-1,j) + s_g\\
\mathit{needle}_{a,b}(i,j-1) + s_g\\
\mathit{needle}_{a,b}(i-1,j-1) + d_s(a_i,b_j)
\end{cases} & \text{otherwise,}
\end{cases}
$$
for $0 \le i \le m$ and $0 \le j \le n$, substitution matrix $d_s(a_i,b_j)$, and gap penalty $s_g < 0$.
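The recursive definition can be transcribed almost literally into code; memoising the recursion makes every pair (i, j) be evaluated only once, which is exactly what dynamic programming exploits. A minimal sketch, assuming the substitution matrix is passed as a function `ds(x, y)` and using a hypothetical toy score `unit` (+1 match, -1 mismatch). Python strings are 0-based, so $a_i$ is `a[i - 1]`.

```python
from functools import lru_cache


def needle(a: str, b: str, ds, sg: int) -> int:
    """Global alignment score as a direct transcription of the recursion."""
    assert sg < 0, "gap penalty s_g must be negative"

    @lru_cache(maxsize=None)  # memoise: each (i, j) is computed once
    def rec(i: int, j: int) -> int:
        if j == 0:
            return i * sg          # a_1..a_i aligned against gaps
        if i == 0:
            return j * sg          # b_1..b_j aligned against gaps
        return max(rec(i - 1, j) + sg,                          # a_i against a gap
                   rec(i, j - 1) + sg,                          # b_j against a gap
                   rec(i - 1, j - 1) + ds(a[i - 1], b[j - 1]))  # a_i against b_j

    return rec(len(a), len(b))


def unit(x, y):
    # hypothetical toy substitution score
    return 1 if x == y else -1


print(needle("pedro", "petrella", unit, -1))  # -2
```

The recursion depth grows with m + n, so for long sequences the iterative table-filling version is the practical choice.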
Smith-Waterman Algorithm
Local alignment of string a to b
6 Local Alignment with Smith-Waterman Algorithm
Let $a = a_1 \dots a_m$ and $b = b_1 \dots b_n$ be strings. Then
$$\mathit{water}_{a,b} = \max_{1 \le i \le m,\ 1 \le j \le n} \{\mathit{water}_{a,b}(i,j)\}$$
is the local alignment score of $a$ and $b$, where
$$
\mathit{water}_{a,b}(i,j) =
\begin{cases}
0 & \text{if } \min(i,j) = 0,\\
\max\begin{cases}
0\\
\mathit{water}_{a,b}(i-1,j) + s_g\\
\mathit{water}_{a,b}(i,j-1) + s_g\\
\mathit{water}_{a,b}(i-1,j-1) + d_s(a_i,b_j)
\end{cases} & \text{otherwise,}
\end{cases}
$$
and $0 \le i \le m$ and $0 \le j \le n$, substitution matrix $d_s(a_i,b_j)$, and gap penalty $s_g < 0$.
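The score alone does not say which substrings align. A common way to recover them, sketched here under the same hypothetical toy scoring (+1 match, -1 mismatch, gap score -1): fill the matrix, remember the cell holding the maximum, and walk back from it until a cell with score 0 is reached, since a 0 marks where the local alignment starts. The function name `water_traceback` is hypothetical, and when several predecessors tie, this sketch simply prefers the diagonal.

```python
def water_traceback(a, b, ds, sg):
    """Smith-Waterman score plus one optimal local alignment (gaps shown as '-')."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = max(0,
                          d[i - 1][j] + sg,
                          d[i][j - 1] + sg,
                          d[i - 1][j - 1] + ds(a[i - 1], b[j - 1]))
            if d[i][j] > best:  # first cell reaching the maximum wins
                best, bi, bj = d[i][j], i, j
    top, bottom = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and d[i][j] > 0:  # a 0 cell ends the local alignment
        if d[i][j] == d[i - 1][j - 1] + ds(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + sg:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


def unit(x, y):
    # hypothetical toy substitution score
    return 1 if x == y else -1


print(water_traceback("pedro", "petrella", unit, -1))  # (2, 'pe', 'pe')
```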
Local Alignment with Dynamic Programming
(Figure: dynamic-programming matrix for the local alignment of "pedro" (rows, index i) against "petrella" (columns, index j).)
Global Alignment with Substitution Matrix and Dynamic Programming
needle(a, b, ds):
    let d be a matrix of size (m+1) × (n+1)
    for 0 ≤ i ≤ m: d[i,0] = i * sg
    for 1 ≤ j ≤ n: d[0,j] = j * sg
    for 1 ≤ i ≤ m:
        for 1 ≤ j ≤ n:
            d[i,j] = max(d[i-1,j] + sg,
                         d[i,j-1] + sg,
                         d[i-1,j-1] + ds[ai,bj])
    return d[m,n]
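A runnable version of the pseudocode above, as a minimal sketch. The substitution matrix is assumed to be a dictionary keyed by letter pairs, so `ds[a[i - 1], b[j - 1]]` mirrors `ds[ai,bj]`; the toy matrix `TOY` (+1 on the diagonal, -1 elsewhere) and the gap score -1 are hypothetical stand-ins for a real matrix such as a log-odds table.

```python
import string


def needle(a: str, b: str, ds: dict, sg: int) -> int:
    """Needleman-Wunsch global alignment score with substitution matrix ds."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]  # (m+1) x (n+1) matrix
    for i in range(m + 1):
        d[i][0] = i * sg                       # prefix of a against gaps only
    for j in range(1, n + 1):
        d[0][j] = j * sg                       # prefix of b against gaps only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = max(d[i - 1][j] + sg,                      # a_i against a gap
                          d[i][j - 1] + sg,                      # b_j against a gap
                          d[i - 1][j - 1] + ds[a[i - 1], b[j - 1]])  # a_i against b_j
    return d[m][n]


# Hypothetical toy substitution matrix over lowercase letters.
TOY = {(x, y): (1 if x == y else -1)
       for x in string.ascii_lowercase for y in string.ascii_lowercase}

print(needle("pedro", "petrella", TOY, -1))  # -2
```

Filling the table row by row takes O(mn) time and space; only the last cell is returned, so the alignment itself would need a traceback.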
Local Alignment with Dynamic Programming
water(a, b, ds):
    let d be a matrix of size (m+1) × (n+1)
    best = 0
    for 0 ≤ i ≤ m: d[i,0] = 0
    for 1 ≤ j ≤ n: d[0,j] = 0
    for 1 ≤ i ≤ m:
        for 1 ≤ j ≤ n:
            d[i,j] = max(0,
                         d[i-1,j] + sg,
                         d[i,j-1] + sg,
                         d[i-1,j-1] + ds[ai,bj])
            if d[i,j] > best: best = d[i,j]
    return best
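A runnable sketch of the local-alignment pseudocode, under the same assumptions as before: a dictionary substitution matrix indexed by letter pairs, and a hypothetical toy matrix `TOY` with gap score -1. The running maximum starts at 0 because every cell is at least 0, which also makes the score of an empty string 0.

```python
import string


def water(a: str, b: str, ds: dict, sg: int) -> int:
    """Smith-Waterman local alignment score with substitution matrix ds."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]  # first row and column stay 0
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = max(0,                                     # start afresh here
                          d[i - 1][j] + sg,
                          d[i][j - 1] + sg,
                          d[i - 1][j - 1] + ds[a[i - 1], b[j - 1]])
            best = max(best, d[i][j])          # the local score can end anywhere
    return best


# Hypothetical toy substitution matrix over lowercase letters.
TOY = {(x, y): (1 if x == y else -1)
       for x in string.ascii_lowercase for y in string.ascii_lowercase}

print(water("pedro", "petrella", TOY, -1))  # 2
```

Compared with the global version, only two things change: the extra 0 in the maximum (chopping off the left side) and returning the best cell instead of d[m][n] (chopping off the right side).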
Affine Gap Penalties
§ So far: 5 gaps of size 1 are as good as 1 gap of size 5
§ But: often whole substrings are deleted or inserted
§ Gap score for a gap of length $l$: $s_g = s_o + l\,s_e$
  § $s_o$ is the gap opening score
  § $s_e$ is the gap extension score
§ Gap penalty vs. match/mismatch score
  § High gap penalty: shorter, lower-scoring alignments with fewer gaps
  § Low gap penalty: longer, higher-scoring alignments with more gaps
§ Gap opening vs. gap extension
  § Opening influences the number of gaps
  § Extension influences the length of gaps
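A quick arithmetic check of the first point, using the gap score $s_g = s_o + l\,s_e$. The parameter values ($s_o = -3$, $s_e = -1$) are hypothetical; setting $s_o = 0$ recovers the linear scheme used so far, where five single gaps and one gap of length 5 cost the same.

```python
def gap_score(length: int, so: int, se: int) -> int:
    """Score of one gap of the given length: s_o + length * s_e."""
    return so + length * se


# Linear scheme (no opening score): both variants cost the same.
print(5 * gap_score(1, 0, -1), gap_score(5, 0, -1))    # -5 -5
# Affine scheme with hypothetical s_o = -3, s_e = -1: one long gap is cheaper.
print(5 * gap_score(1, -3, -1), gap_score(5, -3, -1))  # -20 -8
```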
Needleman-Wunsch with Substitution Matrix and Affine Gap Penalties
Global alignment of string a to b
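7 Needleman-Wunsch Algorithm with substitution matrix and affine gap penalties
Let $a = a_1 \dots a_m$ and $b = b_1 \dots b_n$ be strings. Then
$$\mathit{needle}_{a,b} = \mathit{needle}_{a,b}(m,n)$$
is the global alignment score of $a$ and $b$ with substitution matrix and affine gap penalties, where
$$
\mathit{needle}_{a,b}(i,j) =
\begin{cases}
0 & \text{if } i = j = 0,\\
s_o + i\,s_e & \text{if } j = 0,\\
s_o + j\,s_e & \text{if } i = 0,\\
\max\begin{cases}
d_{del}(i,j)\\
d_{ins}(i,j)\\
\mathit{needle}_{a,b}(i-1,j-1) + d_s(a_i,b_j)
\end{cases} & \text{otherwise,}
\end{cases}
$$
and $0 \le i \le m$, $0 \le j \le n$, substitution matrix $d_s(a_i,b_j)$, gap opening penalty $s_o < 0$, gap extension penalty $s_e < 0$, and affine gap penalty matrices
$$
d_{del}(i,j) =
\begin{cases}
2s_o + j\,s_e & \text{if } i = 0 \text{ and } j > 0,\\
\max\begin{cases}
d_{del}(i-1,j) + s_e\\
\mathit{needle}_{a,b}(i-1,j) + s_o + s_e
\end{cases} & \text{otherwise,}
\end{cases}
$$
$$
d_{ins}(i,j) =
\begin{cases}
2s_o + i\,s_e & \text{if } j = 0 \text{ and } i > 0,\\
\max\begin{cases}
d_{ins}(i,j-1) + s_e\\
\mathit{needle}_{a,b}(i,j-1) + s_o + s_e
\end{cases} & \text{otherwise.}
\end{cases}
$$
Since $d_{del}(i,j)$ and $d_{ins}(i,j)$ already contain $\mathit{needle}_{a,b}(i-1,j)$ and $\mathit{needle}_{a,b}(i,j-1)$ plus the gap score, they enter the maximum for $\mathit{needle}_{a,b}(i,j)$ directly.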
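The three recurrences translate into three tables filled in the same loop, in the style of Gotoh's algorithm. This is a minimal sketch under stated assumptions: the cell score takes d_del(i, j) and d_ins(i, j) directly (each already includes needle(i−1, j) or needle(i, j−1) plus the gap score), the substitution matrix is a dictionary keyed by letter pairs, and the toy matrix `TOY` and the values $s_o = -2$, $s_e = -1$ are hypothetical.

```python
import string


def needle_affine(a: str, b: str, ds: dict, so: int, se: int) -> int:
    """Global alignment score with substitution matrix and affine gap score s_o + l*s_e."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]  # needle(i, j)
    P = [[0] * (n + 1) for _ in range(m + 1)]  # d_del(i, j): a_i ends against a gap
    Q = [[0] * (n + 1) for _ in range(m + 1)]  # d_ins(i, j): b_j ends against a gap
    for i in range(1, m + 1):
        D[i][0] = so + i * se
        Q[i][0] = 2 * so + i * se   # worse than D[i][0], as in the definition
    for j in range(1, n + 1):
        D[0][j] = so + j * se
        P[0][j] = 2 * so + j * se
    # P[i][0] and Q[0][j] are never read, so they keep their placeholder 0.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            P[i][j] = max(P[i - 1][j] + se, D[i - 1][j] + so + se)  # extend or open
            Q[i][j] = max(Q[i][j - 1] + se, D[i][j - 1] + so + se)  # extend or open
            D[i][j] = max(P[i][j], Q[i][j],
                          D[i - 1][j - 1] + ds[a[i - 1], b[j - 1]])
    return D[m][n]


# Hypothetical toy substitution matrix over lowercase letters.
TOY = {(x, y): (1 if x == y else -1)
       for x in string.ascii_lowercase for y in string.ascii_lowercase}

print(needle_affine("aaaa", "aa", TOY, -2, -1))  # -2
```

Here one gap of length 2 (score $-2 - 2 = -4$) plus two matches gives -2, whereas two separate gaps of length 1 would give $2 \cdot (-3) + 2 = -4$.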
Global Alignment with Dynamic Programming
(Figure: dynamic-programming matrix for the global alignment of "pedro" (rows, index i) against "apied" (columns, index j).)