Bioinformatics - Lecture 3...2007/10/16  · Chapter 3. All in the family - Sequence alignment...


Chapter 3. All in the family - Sequence alignment

Bioinformatics - Lecture 3

Louis Wehenkel

Department of Electrical Engineering and Computer Science, University of Liège

Montefiore - Liège - October 16, 2007

Find slides: http://montefiore.ulg.ac.be/~lwh/IBIOINFO/

Louis Wehenkel GBIO0009 - Bioinformatique (1/14)


◮ On sequence alignment

◮ Main topics

◮ Needleman-Wunsch algorithm

◮ Smith-Waterman algorithm


Sequence alignment - an introduction

Objective:

◮ Explain the multiple needs for efficient sequence alignment

◮ Explain the main ideas behind different kinds of sequence alignment problems

  ◮ Global versus local alignments
  ◮ Pairwise versus multiple alignments

◮ Explain in detail the dynamic programming principle used by the Needleman-Wunsch and Smith-Waterman algorithms


Example

◮ Global alignment of VIVALASVEGAS and VIVADAVIS (two sequences of amino acids)

◮ A =

     V   I   V   A   L   A   S   V   E   G   A   S
     V   I   V   A   D   A   −   V   −   −   I   S
     1   1   1   1  −1   1  −1   1  −1  −1  −1   1

◮ Global score: 2

◮ Is this the best possible alignment?
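The global score above is simply the sum of the per-column scores. A minimal sketch of that sum, assuming the scoring used on this slide (+1 for identical symbols, −1 for a mismatch or a gap) and using the ASCII character "-" as a stand-in for the gap symbol; the function names are illustrative only:

```python
GAP = "-"  # assumption: ASCII hyphen stands in for the gap symbol


def column_score(a: str, b: str) -> int:
    """+1 if both symbols are identical letters, -1 otherwise (mismatch or gap)."""
    return 1 if a == b and a != GAP else -1


def alignment_score(top: str, bottom: str) -> int:
    """Sum the column scores of an alignment given as two equal-length rows."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    return sum(column_score(a, b) for a, b in zip(top, bottom))


# The alignment shown on the slide.
print(alignment_score("VIVALASVEGAS", "VIVADA-V--IS"))  # 2
```

This only evaluates a given alignment; it does not tell whether a better one exists, which is the question the rest of the lecture addresses.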


Applications

◮ Prediction of function: extrapolate from one organism to another the functions of genes having similar sequences

◮ Database searching: finding all proteins similar to a given protein

◮ Gene finding: by comparing the whole genomes of several organisms

◮ Sequence divergence: study variation within a population or between different species

◮ Sequence assembly: building up a genome from small pieces of overlapping DNA


Main topics

◮ Sequence similarity: homology, orthology and evolution, paralogy and gene duplication, protein domains.

◮ Substitution matrices: take into account biological properties.

◮ Sequence alignment: global vs local.

◮ Statistical analysis of alignments: compute the alignment score distribution over a number of permutations of one of the two sequences.

◮ BLAST: fast approximate local alignment for aligning very large sequences (e.g. full genomes).

◮ Multiple sequence alignment: find regions of homology
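The statistical analysis mentioned in the list above can be sketched as follows: shuffle one of the two sequences many times, score each shuffled version against the other sequence, and compare the observed score with the resulting distribution. This is only an illustration under stated assumptions: the scoring is +1 match, −1 mismatch, −1 gap; the global score is computed with the dynamic programming recurrence presented later in this lecture; the function names, the number of permutations and the add-one correction in the empirical p-value are choices made for this sketch, not taken from the slides.

```python
import random


def nw_score(s: str, t: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score, keeping only one table row in memory."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [i * gap]
        for j, b in enumerate(t, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def permutation_scores(s: str, t: str, n_perm: int = 200, seed: int = 0) -> list:
    """Scores of s against n_perm random permutations of t."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    letters = list(t)
    scores = []
    for _ in range(n_perm):
        rng.shuffle(letters)
        scores.append(nw_score(s, "".join(letters)))
    return scores


def empirical_p_value(observed: int, scores: list) -> float:
    """Fraction of permuted scores >= observed, with an add-one correction (assumption)."""
    return (1 + sum(x >= observed for x in scores)) / (1 + len(scores))


s, t = "VIVALASVEGAS", "VIVADAVIS"
observed = nw_score(s, t)
p = empirical_p_value(observed, permutation_scores(s, t))
print(observed, p)
```

A small p-value suggests that the observed score is unlikely to arise from sequences with the same composition but random order.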


Exact global alignment of two strings

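◮ Consider two strings $s(1:n)$ and $t(1:m)$ over the alphabet $\mathcal{A}$.

◮ A global alignment $A$ of $s(1:n)$ and $t(1:m)$ is a table

\[
A = \begin{bmatrix}
a_1(s) & \cdots & a_{k-1}(s) & a_k(s) \\
a_1(t) & \cdots & a_{k-1}(t) & a_k(t)
\end{bmatrix}
\]

such that

  ◮ $a_i(s), a_i(t) \in \mathcal{A} \cup \{-\}$, $\forall i = 1, \ldots, k$,
  ◮ if $a_i(s) = -$ then $a_i(t) \neq -$,
  ◮ if $a_i(t) = -$ then $a_i(s) \neq -$,
  ◮ $s(1:n)$ is the subsequence of $a_1(s) \ldots a_{k-1}(s)\, a_k(s)$ formed by its non-gap symbols,
  ◮ $t(1:m)$ is the subsequence of $a_1(t) \ldots a_{k-1}(t)\, a_k(t)$ formed by its non-gap symbols.

◮ Given a score function $\sigma(\cdot, \cdot) : (\mathcal{A} \cup \{-\}) \times (\mathcal{A} \cup \{-\}) \to \mathbb{R}$:

  ◮ The score of $A$ is $\sigma(A) = \sum_{i=1}^{k} \sigma(a_i(s), a_i(t))$.
  ◮ An optimal global alignment is a global alignment of maximal score.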
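The conditions in the definition above can be checked mechanically. A minimal sketch, assuming an alignment is given as two strings (one per row) and that the ASCII character "-" stands for the gap symbol; the function name is hypothetical:

```python
GAP = "-"  # assumption: ASCII hyphen stands in for the gap symbol


def is_global_alignment(top: str, bottom: str, s: str, t: str) -> bool:
    """Check that (top, bottom) is a global alignment of s and t."""
    # Both rows must have the same number k of columns.
    if len(top) != len(bottom):
        return False
    # No column may contain two gaps.
    if any(a == GAP and b == GAP for a, b in zip(top, bottom)):
        return False
    # Removing the gaps from each row must give back s and t.
    return top.replace(GAP, "") == s and bottom.replace(GAP, "") == t


print(is_global_alignment("VIVALASVEGAS", "VIVADA-V--IS", "VIVALASVEGAS", "VIVADAVIS"))  # True
print(is_global_alignment("VIVA-", "VIVA-", "VIVA", "VIVA"))  # False: a column with two gaps
```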


Dynamic programming principle (1)

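◮ Let us try to decompose the problem of finding an optimal alignment of $s$ and $t$ into problems of optimally aligning substrings of $s$ and $t$.

◮ Let us denote an optimal alignment of $s(1:n)$ and $t(1:m)$ by

\[
A^*(n,m) = \begin{bmatrix}
a^*_1(s) & \cdots & a^*_{k-1}(s) & a^*_k(s) \\
a^*_1(t) & \cdots & a^*_{k-1}(t) & a^*_k(t)
\end{bmatrix}
\]

◮ NB: $k \in [\max\{n,m\},\, n+m]$.

◮ Then one of the following must hold true:

  ◮ $A^*(n,m) = \begin{bmatrix} a^*_1(s) & \cdots & a^*_{k-1}(s) & - \\ a^*_1(t) & \cdots & a^*_{k-1}(t) & t_m \end{bmatrix}$, or

  ◮ $A^*(n,m) = \begin{bmatrix} a^*_1(s) & \cdots & a^*_{k-1}(s) & s_n \\ a^*_1(t) & \cdots & a^*_{k-1}(t) & - \end{bmatrix}$, or

  ◮ $A^*(n,m) = \begin{bmatrix} a^*_1(s) & \cdots & a^*_{k-1}(s) & s_n \\ a^*_1(t) & \cdots & a^*_{k-1}(t) & t_m \end{bmatrix}$.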


Dynamic programming principle (2)

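Furthermore,

◮ if $A^*(n,m) = \begin{bmatrix} a^*_1(s) & \cdots & a^*_{k-1}(s) & - \\ a^*_1(t) & \cdots & a^*_{k-1}(t) & t_m \end{bmatrix}$, then

\[
\begin{bmatrix} a^*_1(s) & \cdots & a^*_{k-1}(s) \\ a^*_1(t) & \cdots & a^*_{k-1}(t) \end{bmatrix} = A^*(n, m-1)
\quad \text{and} \quad
\sigma(A^*(n,m)) = \sigma(A^*(n,m-1)) + \sigma(-, t_m)
\]

◮ if $A^*(n,m) = \begin{bmatrix} a^*_1(s) & \cdots & a^*_{k-1}(s) & s_n \\ a^*_1(t) & \cdots & a^*_{k-1}(t) & - \end{bmatrix}$, then

\[
\begin{bmatrix} a^*_1(s) & \cdots & a^*_{k-1}(s) \\ a^*_1(t) & \cdots & a^*_{k-1}(t) \end{bmatrix} = A^*(n-1, m)
\quad \text{and} \quad
\sigma(A^*(n,m)) = \sigma(A^*(n-1,m)) + \sigma(s_n, -)
\]

◮ if $A^*(n,m) = \begin{bmatrix} a^*_1(s) & \cdots & a^*_{k-1}(s) & s_n \\ a^*_1(t) & \cdots & a^*_{k-1}(t) & t_m \end{bmatrix}$, then

\[
\begin{bmatrix} a^*_1(s) & \cdots & a^*_{k-1}(s) \\ a^*_1(t) & \cdots & a^*_{k-1}(t) \end{bmatrix} = A^*(n-1, m-1)
\quad \text{and} \quad
\sigma(A^*(n,m)) = \sigma(A^*(n-1,m-1)) + \sigma(s_n, t_m)
\]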
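Since one of the three cases must hold and $A^*(n,m)$ has maximal score, the three identities above can be combined into a single recurrence. As a sketch, writing $M(i,j) = \sigma(A^*(i,j))$ for the optimal score of aligning the prefixes $s(1:i)$ and $t(1:j)$, and applying the same argument to every pair of prefixes with $i, j \geq 1$:

```latex
\[
M(i,j) = \max \left\{
\begin{array}{l}
M(i-1,\, j-1) + \sigma(s_i, t_j) \\[2pt]
M(i-1,\, j) + \sigma(s_i, -) \\[2pt]
M(i,\, j-1) + \sigma(-, t_j)
\end{array}
\right.
\]
```

The three candidates correspond, in order, to a last column $(s_i, t_j)$, $(s_i, -)$ and $(-, t_j)$.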


Dynamic programming principle (3)

Consequently,

◮ if we can compute

  ◮ $A^*(n, m-1)$, $A^*(n-1, m)$, $A^*(n-1, m-1)$ and
  ◮ $\sigma(A^*(n, m-1))$, $\sigma(A^*(n-1, m))$, $\sigma(A^*(n-1, m-1))$,

◮ we can easily derive $A^*(n,m)$ and $\sigma(A^*(n,m))$.

◮ Note that the base cases are obtained easily by considering that one of the strings is empty:

  ◮ $A^*(i, 0) = \begin{bmatrix} s_1 & \cdots & s_i \\ - & \cdots & - \end{bmatrix}$, $\forall i = 1, \ldots, n$

  ◮ $A^*(0, j) = \begin{bmatrix} - & \cdots & - \\ t_1 & \cdots & t_j \end{bmatrix}$, $\forall j = 1, \ldots, m$

  ◮ $A^*(0, 0) = [\ ]$ and $\sigma(A^*(0, 0)) = 0$.
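The base cases and the decomposition above translate directly into a table-filling procedure. A minimal sketch, assuming a score function `sigma` passed as a Python callable and the ASCII character "-" for the gap symbol; `fill_table` and `unit_sigma` are hypothetical names, and `unit_sigma` is the +1/−1 scoring used in the example of this lecture:

```python
from typing import Callable, List

GAP = "-"  # assumption: ASCII hyphen stands in for the gap symbol


def unit_sigma(a: str, b: str) -> int:
    """+1 for identical symbols, -1 otherwise (a gap never equals a letter)."""
    return 1 if a == b else -1


def fill_table(s: str, t: str, sigma: Callable[[str, str], int] = unit_sigma) -> List[List[int]]:
    """Return M with M[i][j] = optimal score for aligning s[:i] and t[:j]."""
    n, m = len(s), len(t)
    M = [[0] * (m + 1) for _ in range(n + 1)]  # M[0][0] = 0: empty alignment
    # Base cases: one of the two strings is empty, so every column holds a gap.
    for i in range(1, n + 1):
        M[i][0] = M[i - 1][0] + sigma(s[i - 1], GAP)
    for j in range(1, m + 1):
        M[0][j] = M[0][j - 1] + sigma(GAP, t[j - 1])
    # General case: best of the three possible last columns.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = max(
                M[i - 1][j - 1] + sigma(s[i - 1], t[j - 1]),  # (s_i, t_j)
                M[i - 1][j] + sigma(s[i - 1], GAP),           # (s_i, -)
                M[i][j - 1] + sigma(GAP, t[j - 1]),           # (-, t_j)
            )
    return M


M = fill_table("VIVALASVEGAS", "VIVADAVIS")
print(M[12][9])  # 2
```

Each cell depends only on its three neighbours (North-West, North, West), so the table can be filled row by row in time proportional to n × m.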


Example (1: problem statement)

◮ Let us consider the following two amino-acid (AA) sequences

  ◮ VIVALASVEGAS ($n = 12$)
  ◮ VIVADAVIS ($m = 9$),

◮ together with the per-symbol similarity matrix

  ◮ $\sigma(a, b) = 1$ if $a = b$
  ◮ $\sigma(a, b) = -1$ if $a \neq b$ (in particular, aligning a symbol with a gap scores $-1$),

◮ and let us construct the complete table of scores $M(i,j) = \sigma(A^*(i,j))$, for $i = 0, \ldots, 12$ and $j = 0, \ldots, 9$.


Example (2: computation)


M(i, j) =

         −    V    I    V    A    D    A    V    I    S     i

  −      0   -1   -2   -3   -4   -5   -6   -7   -8   -9     0
  V     -1   ↘1   →0   -1  →-2  →-3  →-4   -5  →-6  →-7     1
  I     -2   ↓0   ↘2   →1   →0  →-1  →-2  →-3  →-4  →-5     2
  V     -3   -1   ↓1   ↘3    2    1    0   -1   -2   -3     3
  A     -4  ↓-2   ↓0    2   ↘4    3    2    1    0   -1     4
  L     -5  ↓-3  ↓-1    1    3   ↘3    2    1    0   -1     5
  A     -6  ↓-4  ↓-2    0    2    2   ↘4    3    2    1     6
  S     -7  ↓-5  ↓-3   -1    1    1   ↓3    3    2    3     7
  V     -8   -6  ↓-4   -2    0    0    2   ↘4    3    2     8
  E     -9  ↓-7  ↓-5   -3   -1   -1    1   ↓3    3    2     9
  G    -10  ↓-8  ↓-6   -4   -2   -2    0   ↓2    2    2    10
  A    -11  ↓-9  ↓-7   -5   -3   -3   -1    1   ↘1    1    11
  S    -12 ↓-10  ↓-8   -6   -4   -4   -2    0    0   ↘2    12

  j      0    1    2    3    4    5    6    7    8    9
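The arrows in the table record which neighbouring cell a value was derived from. The sketch below, under the scoring of this example (+1 match, −1 mismatch, −1 gap), recomputes the table and keeps, for each cell, the set of all moves that reach the maximum; the names `scores_and_moves`, "NW", "N" and "W" are choices made for this illustration. It shows that some cells have more than one maximizing predecessor.

```python
def scores_and_moves(s: str, t: str):
    """Return (M, P): P[i][j] is the set of moves ("NW", "N", "W") that achieve M[i][j]."""
    n, m = len(s), len(t)
    M = [[0] * (m + 1) for _ in range(n + 1)]
    P = [[set() for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):       # first column: only a move from the North is possible
        M[i][0] = -i
        P[i][0] = {"N"}
    for j in range(1, m + 1):       # first row: only a move from the West is possible
        M[0][j] = -j
        P[0][j] = {"W"}
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = {
                "NW": M[i - 1][j - 1] + (1 if s[i - 1] == t[j - 1] else -1),
                "N": M[i - 1][j] - 1,
                "W": M[i][j - 1] - 1,
            }
            best = max(candidates.values())
            M[i][j] = best
            P[i][j] = {move for move, value in candidates.items() if value == best}
    return M, P


M, P = scores_and_moves("VIVALASVEGAS", "VIVADAVIS")
print(M[1][1], sorted(P[1][1]))  # 1 ['NW']
print(M[1][3], sorted(P[1][3]))  # -1 ['NW', 'W']
```

When a cell has several maximizing predecessors, several optimal alignments may exist; any consistent choice leads to an alignment with the same optimal score.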


Example (3: trace back)


◮ The bottom-right cell of the table gives the score of the optimal alignment.

◮ When filling the table, we have kept track of the predecessor of each cell (North-West, North, West).

◮ To trace back, we move backwards along this path, starting from the bottom-right cell, and do the following (the alignment is thus built from right to left):

  ◮ If we move NW: we output a pair composed of the corresponding characters of s and t
  ◮ If we move N: we output a pair composed of the corresponding character of s and −
  ◮ If we move W: we output a pair composed of − and the corresponding character of t

◮ In our example this produces:

A = [ V I V A L A S V E G A S ]
    [ V I V A D A − V − − I S ]
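The fill and trace-back steps can be put together as in the following sketch. It assumes the scoring of the example (+1 match, −1 mismatch, −1 gap), uses the ASCII character "-" for the gap symbol, and breaks ties by preferring NW, then N, then W; this tie-breaking rule and the function name are assumptions of the sketch, not something stated on the slides.

```python
GAP = "-"  # assumption: ASCII hyphen stands in for the gap symbol


def needleman_wunsch(s: str, t: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, top_row, bottom_row) of one optimal global alignment."""
    n, m = len(s), len(t)
    M = [[0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]  # predecessor of each cell
    for i in range(1, n + 1):
        M[i][0], move[i][0] = i * gap, "N"
    for j in range(1, m + 1):
        M[0][j], move[0][j] = j * gap, "W"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (M[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch), "NW"),
                (M[i - 1][j] + gap, "N"),
                (M[i][j - 1] + gap, "W"),
            ]
            # max() returns the first maximal item, hence the NW > N > W preference.
            M[i][j], move[i][j] = max(candidates, key=lambda c: c[0])
    # Trace back from the bottom-right cell; columns are produced right to left.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if move[i][j] == "NW":
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif move[i][j] == "N":
            top.append(s[i - 1]); bottom.append(GAP); i -= 1
        else:  # "W"
            top.append(GAP); bottom.append(t[j - 1]); j -= 1
    return M[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("VIVALASVEGAS", "VIVADAVIS")
print(score)   # 2
print(top)     # VIVALASVEGAS
print(bottom)  # VIVADA-V--IS
```

With a different tie-breaking rule the trace back may return another alignment, but always with the same optimal score.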


Exact local alignment of two strings

◮ Definition: a local alignment of two strings, s and t, is a global alignment of the substrings s(i : j) and t(k : l) for some choice of (i, j) and (k, l). The optimal local alignment is given by the choice of (i, j) and (k, l) that maximizes the alignment score.

◮ Smith-Waterman algorithm: obtained by making two modifications to the Needleman-Wunsch algorithm:

  ◮ each time a cell would obtain a negative value, replace this value by 0
  ◮ trace back from the highest value in the table to the first zero element on the trace back path

◮ See book for an example.
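The two modifications can be sketched as follows, again assuming the +1/−1/−1 scoring of the global example and the ASCII character "-" for the gap symbol. The function name, the choice of the first highest cell in row order when several cells share the maximum, and the NW, N, W preference during trace back are assumptions of this sketch.

```python
GAP = "-"  # assumption: ASCII hyphen stands in for the gap symbol


def smith_waterman(s: str, t: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, top_row, bottom_row) of one optimal local alignment."""
    n, m = len(s), len(t)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay at 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # Modification 1: negative values are replaced by 0.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Modification 2: trace back from the highest value until a zero is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
        if H[i][j] == diag:
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(s[i - 1]); bottom.append(GAP); i -= 1
        else:
            top.append(GAP); bottom.append(t[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("VIVALASVEGAS", "VIVADAVIS"))  # (4, 'VIVA', 'VIVA')
```

On the two sequences of the global example, the best local score is 4, reached for instance by the common prefix VIVA.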


Homework 3

Personal Homework for Chapter 3 (deadline: October 22, 2007)

◮ Do the http://www.computational-genomics.net/casestudies/eyelessdemo.html

◮ Compute by hand an optimal global and local alignment of the sequence 'BIOINFO' and a random permutation of it.
