Intro Sequence comparisons Visualization Alignments Scoring


Intro Sequence comparisons Visualization Alignments Scoring Algorithms

Last time

• Introduction
• What is Bioinformatics?
• Databases in Bioinformatics


Today: Sequence comparisons

• Visualisation
• Different objectives
• Pairwise alignments


Sequence comparisons: Goals

• What are the similarities?
• Local similarities: domains and motifs
• What is variable?
• Identify positions: basis for evolutionary studies
• Understand structural similarities
• Determine ancestry


Homology

• Definition: Homology = common ancestry
• Principle: Similarity ⇒ homology
• Quote: "These sequences are somewhat homologous". Bad! Similarity ≠ homology: two sequences either share an ancestor or they do not.
• Correct: "These sequences are somewhat similar".


Important questions

• When are two sequences significantly similar?

• How do we evaluate similarity?


Data

• DNA: genes, genomes, non-coding DNA, etc.
• Codons
• RNA
• Peptides


Idea of dotplots

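   Q V A S K I N T N E
S  . . . • . . . . . .
V  . • . . . . . . . .
A  . . • . . . . . . .
T  . . . . . . . • . .
K  . . . . • . . . . .
I  . . . . . • . . . .
Y  . . . . . . . . . .
M  . . . . . . . . . .
N  . . . . . . • . • .
E  . . . . . . . . . •

Put a dot where the residues are identical, then filter out randomness.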
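The dotplot idea can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the filter (keep a dot only if it lies in a diagonal run of `window` consecutive dots) is one simple assumption about what "filter out randomness" could mean, not necessarily the filter used for the figures in these slides.

```python
def dotplot(horizontal, vertical):
    """Grid with one row per vertical residue; True marks identical residues."""
    return [[h == v for h in horizontal] for v in vertical]


def filter_diagonals(grid, window=2):
    """Keep only dots that lie in a diagonal run of `window` consecutive dots."""
    rows, cols = len(grid), len(grid[0])
    kept = [[False] * cols for _ in range(rows)]
    for i in range(rows - window + 1):
        for j in range(cols - window + 1):
            if all(grid[i + k][j + k] for k in range(window)):
                for k in range(window):
                    kept[i + k][j + k] = True
    return kept


def render(grid, horizontal, vertical):
    """Format the grid as text: '•' for a dot, '.' for an empty cell."""
    lines = ["  " + " ".join(horizontal)]
    for residue, row in zip(vertical, grid):
        cells = " ".join("•" if dot else "." for dot in row)
        lines.append(residue + " " + cells)
    return "\n".join(lines)


# The two fragments shown on the slide.
horizontal, vertical = "QVASKINTNE", "SVATKIYMNE"
raw = dotplot(horizontal, vertical)
print(render(raw, horizontal, vertical))
print()
print(render(filter_diagonals(raw), horizontal, vertical))
```

With `window=2` the isolated dots disappear and only the diagonal stretches VA, KI and NE remain. Larger windows filter harder but can also remove short true matches.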


Dotplots in practice

PttMAP20 (horizontal) vs. OsMAP20 (vertical)

[Figure: dotplot; horizontal axis from 0 to 100, vertical axis from 0 to 150]


What happened here?

s1: A B C D
s2: A C B D


Genomic dotplot

Many inversions around origin and termini of replication.


Visualizing with alignment

OsMAP20   69 SQTSPKRSSPKHEQPLSYFRLHTEERAIKRAGFNYQVASKINTNEIIRR
             S+ +PK  + ++ +P   F+LHT +RA+KRA FNY VA+KI  NE  +R
PttMAP20  43 SKVAPKPFAKENTKPQE-FKLHTGQRALKRAMFNYSVATKIYMNEQQKR

OsMAP20  118 FEEKLSKVIEEREIKMMRKEMVHKAQLMPAFDKPFHPQRSTRPLTVPKE
               E++ K+IEE E++ MRKEMV +AQLMP FD+PF PQRS+RPLTVP+E
PttMAP20  91 QIERIQKIIEEEEVRTMRKEMVPRAQLMPYFDRPFFPQRSSRPLTVPRE

OsMAP20  167 PSF
             PSF
PttMAP20 140 PSF


OsMAP20   69 SQTSPKRSSPKHEQPLSYFRLHTEERAIKRAGFNYQVASKINTNEIIRRF 118
             |:.:||..:.::.:| ..|:|||.:||:|||.|||.||:||..||..:|.
PttMAP20  43 SKVAPKPFAKENTKP-QEFKLHTGQRALKRAMFNYSVATKIYMNEQQKRQ 91

OsMAP20  119 EEKLSKVIEEREIKMMRKEMVHKAQLMPAFDKPFHPQRSTRPLTVPKEPS 168
             .|::.|:|||.|::.||||||.:|||||.||:||.||||:||||||:|||
PttMAP20  92 IERIQKIIEEEEVRTMRKEMVPRAQLMPYFDRPFFPQRSSRPLTVPREPS 141

OsMAP20  169 F--LRLKC--CI 176
             |  :..||  ||
PttMAP20 142 FHMVNSKCWSCI 153


Alignments

• Def: A pairwise alignment is a pairing of symbols between two sequences.
• Global alignment: Involves whole sequences.
• Local alignment: Involves parts of sequences.
• Semiglobal or ends-free alignment: Ignore "overhang" in similar sequences with different lengths.


Global vs local

Local alignment:

OsMAP20   69 SQTSPKRSSPKHEQPLSYFRLHTEERAIKRAGFNYQVASKINTNEIIRRF 118
             |:.:||..:.::.:| ..|:|||.:||:|||.|||.||:||..||..:|.
PttMAP20  43 SKVAPKPFAKENTKP-QEFKLHTGQRALKRAMFNYSVATKIYMNEQQKRQ 91

OsMAP20  119 EEKLSKVIEEREIKMMRKEMVHKAQLMPAFDKPFHPQRSTRPLTVPKEPS 168
             .|::.|:|||.|::.||||||.:|||||.||:||.||||:||||||:|||
PttMAP20  92 IERIQKIIEEEEVRTMRKEMVPRAQLMPYFDRPFFPQRSSRPLTVPREPS 141

OsMAP20  169 F--LRLKC--CI 176
             |  :..||  ||
PttMAP20 142 FHMVNSKCWSCI 153

Global alignment:

OsMAP20    1 MEK--TRKATSPKSSMTSSTGPKSPVRNGGSPPHKKSTSEFRGRKNESQI 48
             |||  |:.|.......:|.:.|.|....|.:....|..
PttMAP20   1 MEKAHTKSALKKLVKASSQSAPWSNAARGMAKDDLKDP------------ 38

OsMAP20   49 FRKGGQDSITLDESKRRSPTSQTSPKRSSPKHEQPLSYFRLHTEERAIKR 98
                      ..|:||       .:||..:.::.:| ..|:|||.:||:||
PttMAP20  39 ---------LYDKSK-------VAPKPFAKENTKP-QEFKLHTGQRALKR 71

OsMAP20   99 AGFNYQVASKINTNEIIRRFEEKLSKVIEEREIKMMRKEMVHKAQLMPAF 148
             |.|||.||:||..||..:|..|::.|:|||.|::.||||||.:|||||.|
PttMAP20  72 AMFNYSVATKIYMNEQQKRQIERIQKIIEEEEVRTMRKEMVPRAQLMPYF 121

OsMAP20  149 DKPFHPQRSTRPLTVPKEPSF--LRLKC--CIGGEFHRHFCYNA------ 188
             |:||.||||:||||||:||||  :..||  ||..:...::..:|
PttMAP20 122 DRPFFPQRSSRPLTVPREPSFHMVNSKCWSCIPEDELYYYFEHAHPHDHA 171

OsMAP20  189 -KAIK 192
              |.:|
PttMAP20 172 WKPVK 176


More terminology

• Insertion
• Deletion
• Indel: when we don't know whether it was an insertion or a deletion
• Gap: indel in an alignment
• Indel character: usually "–"

  1 MEK--TRKATSPKSSMTSSTGPKSPVRNGGSPPHKKSTSEFRGRKNESQI 48
    |||  |:.|.......:|.:.|.|....|.:....|..
  1 MEKAHTKSALKKLVKASSQSAPWSNAARGMAKDDLKDP------------ 38

 49 FRKGGQDSITLDESKRRSPTSQTSPKRSSPKHEQPLSYFRLHTEERAIKR 98
             ..|:||       .:||..:.::.:| ..|:|||.:||:||
 39 ---------LYDKSK-------VAPKPFAKENTKP-QEFKLHTGQRALKR 71


Choosing alignment?

OsMAP20   69 SQTSPKRSSPKHEQPLSYFRLHTEERAIKRAGFNYQVASKINTNEIIRR
             S+ +PK  + ++ +P   F+LHT +RA+KRA FNY VA+KI  NE  +R
PttMAP20  43 SKVAPKPFAKENTKPQE-FKLHTGQRALKRAMFNYSVATKIYMNEQQKR

OsMAP20  118 FEEKLSKVIEEREIKMMRKEMVHKAQLMPAFDKPFHPQRSTRPLTVPKE
               E++ K+IEE E++ MRKEMV +AQLMP FD+PF PQRS+RPLTVP+E
PttMAP20  91 QIERIQKIIEEEEVRTMRKEMVPRAQLMPYFDRPFFPQRSSRPLTVPRE

OsMAP20  167 PSF
             PSF
PttMAP20 140 PSF


OsMAP20   69 SQTSPKRSSPKHEQPLSYFRLHTEERAIKRAGFNYQVASKINTNEIIRRF 118
             |:.:||..:.::.:| ..|:|||.:||:|||.|||.||:||..||..:|.
PttMAP20  43 SKVAPKPFAKENTKP-QEFKLHTGQRALKRAMFNYSVATKIYMNEQQKRQ 91

OsMAP20  119 EEKLSKVIEEREIKMMRKEMVHKAQLMPAFDKPFHPQRSTRPLTVPKEPS 168
             .|::.|:|||.|::.||||||.:|||||.||:||.||||:||||||:|||
PttMAP20  92 IERIQKIIEEEEVRTMRKEMVPRAQLMPYFDRPFFPQRSSRPLTVPREPS 141

OsMAP20  169 F--LRLKC--CI 176
             |  :..||  ||
PttMAP20 142 FHMVNSKCWSCI 153


Principle: Identity

• Def: The identity in an alignment is the fraction of identical paired symbols.
• Early selection criterion: Choose alignment with highest identity

Here: 62/112 ≈ 55% identity

OsMAP20   69 SQTSPKRSSPKHEQPLSYFRLHTEERAIKRAGFNYQVASKINTNEIIRRF 118
             |:.:||..:.::.:| ..|:|||.:||:|||.|||.||:||..||..:|.
PttMAP20  43 SKVAPKPFAKENTKP-QEFKLHTGQRALKRAMFNYSVATKIYMNEQQKRQ 91

OsMAP20  119 EEKLSKVIEEREIKMMRKEMVHKAQLMPAFDKPFHPQRSTRPLTVPKEPS 168
             .|::.|:|||.|::.||||||.:|||||.||:||.||||:||||||:|||
PttMAP20  92 IERIQKIIEEEEVRTMRKEMVPRAQLMPYFDRPFFPQRSSRPLTVPREPS 141

OsMAP20  169 F--LRLKC--CI 176
             |  :..||  ||
PttMAP20 142 FHMVNSKCWSCI 153
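The 62/112 figure can be checked with a short Python sketch. The function name `identity` is chosen here for illustration. It assumes, as the slide's number does, that the denominator is the full alignment length including gap columns; other conventions for the denominator exist.

```python
def identity(row_x, row_y):
    """Return (identical pairs, alignment length) for two aligned rows."""
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have the same length")
    identical = sum(a == b and a != "-" for a, b in zip(row_x, row_y))
    return identical, len(row_x)


# The three blocks of the local alignment above, concatenated.
os_row = (
    "SQTSPKRSSPKHEQPLSYFRLHTEERAIKRAGFNYQVASKINTNEIIRRF"
    "EEKLSKVIEEREIKMMRKEMVHKAQLMPAFDKPFHPQRSTRPLTVPKEPS"
    "F--LRLKC--CI"
)
ptt_row = (
    "SKVAPKPFAKENTKP-QEFKLHTGQRALKRAMFNYSVATKIYMNEQQKRQ"
    "IERIQKIIEEEEVRTMRKEMVPRAQLMPYFDRPFFPQRSSRPLTVPREPS"
    "FHMVNSKCWSCI"
)

identical, length = identity(os_row, ptt_row)
print(f"{identical}/{length} = {identical / length:.0%}")  # prints 62/112 = 55%
```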


Scoring an alignment

• Identity loses info on similarity
• Better: assign a score to every pair of symbols, $s(x, y) = c$. Example: for DNA

s    A   T   G   C
A    2  -1   1  -1
T   -1   2  -1   1
G    1  -1   2  -1
C   -1   1  -1   2

• Indel scores: $s(x, -) = s(-, x) \stackrel{?}{=} -1$


Scoring an alignment

• Alignment x, y from sequences x and y (the aligned rows may contain gaps). E.g.: x = AAGTT, y = AATT, alignment is

x  AAGTT
y  AA-TT

• Alignment score is

$$S(x, y) = \sum_{i=1}^{|x|} s(x_i, y_i)$$

• Here:

$$S(x, y) = s(A, A) + s(A, A) + s(G, -) + s(T, T) + s(T, T)$$
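As a small worked example, the sum can be evaluated in Python with the example DNA scores from the earlier slide and an indel score of −1. The names `DNA_SCORES`, `INDEL_SCORE`, `s` and `alignment_score` are chosen here for illustration, and the indel score is the value the slides propose with a question mark, not a fixed rule.

```python
GAP = "-"
INDEL_SCORE = -1  # the value proposed (with a question mark) on the slide

# Example DNA scores from the slide: rows and columns in the order A, T, G, C.
DNA_SCORES = {
    "A": {"A": 2, "T": -1, "G": 1, "C": -1},
    "T": {"A": -1, "T": 2, "G": -1, "C": 1},
    "G": {"A": 1, "T": -1, "G": 2, "C": -1},
    "C": {"A": -1, "T": 1, "G": -1, "C": 2},
}


def s(a, b):
    """Score for one alignment column."""
    if a == GAP or b == GAP:
        return INDEL_SCORE
    return DNA_SCORES[a][b]


def alignment_score(row_x, row_y):
    """Sum the column scores of two aligned rows of equal length."""
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have the same length")
    return sum(s(a, b) for a, b in zip(row_x, row_y))


print(alignment_score("AAGTT", "AA-TT"))  # 2 + 2 - 1 + 2 + 2 = 7
```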


How do we choose an alignment?

• Want to choose best global alignment
• Many alignments
• Given $x = x_1 x_2 \cdots x_m$ and $y = y_1 y_2 \cdots y_n$, find the alignment x, y that maximizes the score S(x, y).
• Idea: Find best way of ending alignment
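To get a feeling for "many alignments", the same "how can the alignment end?" idea can be used to count them. The sketch below is an illustration, not part of the slides: it assumes an alignment is any sequence of columns without an all-gap column, and it counts alignments that differ only in the order of adjacent gaps as different. The function name is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m, n):
    """Number of global alignments of sequences with lengths m and n."""
    if m == 0 or n == 0:
        # Only one way: pair every remaining symbol with a gap.
        return 1
    return (
        count_alignments(m - 1, n - 1)  # last column pairs two symbols
        + count_alignments(m - 1, n)    # last column pairs a symbol of x with a gap
        + count_alignments(m, n - 1)    # last column pairs a gap with a symbol of y
    )


for length in (1, 2, 5, 10, 20):
    print(length, count_alignments(length, length))
```

The count grows exponentially with sequence length, so trying every alignment is not feasible for real sequences.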


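How to end alignment: alternatives

One of:

$$\begin{array}{l|l} x_1 \cdots x_{m-1} & x_m \\ y_1 \cdots y_{n-1} & y_n \end{array} \qquad M_{m-1,n-1} + s(x_m, y_n)$$

or

$$\begin{array}{l|l} x_1 \cdots x_{m-1} & x_m \\ y_1 \cdots y_n & - \end{array} \qquad M_{m-1,n} + s(x_m, -)$$

or

$$\begin{array}{l|l} x_1 \cdots x_m & - \\ y_1 \cdots y_{n-1} & y_n \end{array} \qquad M_{m,n-1} + s(-, y_n)$$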


A recursion for max alignment score

Note: for global alignment

$$M_{0,0} = 0$$

$$M_{m,n} = \max \begin{cases} M_{m-1,n-1} + s(x_m, y_n) & m > 0,\ n > 0 \\ M_{m-1,n} + s(x_m, -) & m > 0,\ n \ge 0 \\ M_{m,n-1} + s(-, y_n) & m \ge 0,\ n > 0 \end{cases}$$

We get:

$$M_{m,n} = \max_{x, y} S(x, y)$$


Computing $M_{m,n}$

• Keep $M_{i,j}$ in a table
• Table + Recursion = Dynamic Programming
• Needleman-Wunsch algorithm
• mn elements in table ⇒ Time complexity is ∼ mn.
• When filling the table, note which alternative gave the maximum.
• Backtracking for retrieving the alignment.
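A minimal Python sketch of the table fill and the backtracking is given below, under stated assumptions. It uses the example DNA scores and an indel score of −1 from the scoring slides, written as a compact function `s` that is meant to agree with that table. Instead of storing the chosen alternative while filling the table, it recomputes the alternatives during backtracking and prefers the diagonal on ties. This is an illustration, not a reference implementation.

```python
def s(a, b):
    """Example scores: identity 2, A/G or T/C 1, other pairs and indels -1."""
    if a == "-" or b == "-":
        return -1
    if a == b:
        return 2
    return 1 if {a, b} in ({"A", "G"}, {"T", "C"}) else -1


def needleman_wunsch(x, y):
    """Return (best score, aligned x, aligned y) for a global alignment."""
    m, n = len(x), len(y)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    # First column and first row: only gap columns are possible.
    for i in range(1, m + 1):
        M[i][0] = M[i - 1][0] + s(x[i - 1], "-")
    for j in range(1, n + 1):
        M[0][j] = M[0][j - 1] + s("-", y[j - 1])
    # Fill the rest of the table with the three-way recursion.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(
                M[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                M[i - 1][j] + s(x[i - 1], "-"),
                M[i][j - 1] + s("-", y[j - 1]),
            )
    # Backtrack from the bottom-right cell to the top-left cell.
    row_x, row_y = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and M[i][j] == M[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1])
            row_y.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and M[i][j] == M[i - 1][j] + s(x[i - 1], "-"):
            row_x.append(x[i - 1])
            row_y.append("-")
            i -= 1
        else:
            row_x.append("-")
            row_y.append(y[j - 1])
            j -= 1
    return M[m][n], "".join(reversed(row_x)), "".join(reversed(row_y))


score, aligned_x, aligned_y = needleman_wunsch("AAGTT", "AATT")
print(score)
print(aligned_x)
print(aligned_y)
```

For the AAGTT and AATT example this recovers the alignment shown earlier, with score 7.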


DP and backtracking

From Eddy, Nature Biotech, 2004


DP for local alignments

• Smith-Waterman algorithm
• Allow "restarting" from zero.

$$M_{0,0} = 0$$

$$M_{m,n} = \max \begin{cases} M_{m-1,n-1} + s(x_m, y_n) & m > 0,\ n > 0 \\ M_{m-1,n} + s(x_m, -) & m > 0,\ n \ge 0 \\ M_{m,n-1} + s(-, y_n) & m \ge 0,\ n > 0 \\ 0 & \leftarrow \text{Here!} \end{cases}$$
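The local variant can be sketched in Python as follows, again under assumptions: the example DNA scores and an indel score of −1 from the scoring slides, a diagonal-first tie-break, and backtracking that starts at the highest-scoring cell and stops at the first zero. The function names are chosen for illustration.

```python
def s(a, b):
    """Example scores: identity 2, A/G or T/C 1, other pairs and indels -1."""
    if a == "-" or b == "-":
        return -1
    if a == b:
        return 2
    return 1 if {a, b} in ({"A", "G"}, {"T", "C"}) else -1


def smith_waterman(x, y):
    """Return (best local score, aligned part of x, aligned part of y)."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay at zero because of the "restart" alternative.
    M = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(
                0,  # restart: begin a new local alignment here
                M[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                M[i - 1][j] + s(x[i - 1], "-"),
                M[i][j - 1] + s("-", y[j - 1]),
            )
            if M[i][j] > best:
                best, best_i, best_j = M[i][j], i, j
    # Backtrack from the best cell until a cell with score zero is reached.
    row_x, row_y = [], []
    i, j = best_i, best_j
    while M[i][j] > 0:
        if M[i][j] == M[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1])
            row_y.append(y[j - 1])
            i, j = i - 1, j - 1
        elif M[i][j] == M[i - 1][j] + s(x[i - 1], "-"):
            row_x.append(x[i - 1])
            row_y.append("-")
            i -= 1
        else:
            row_x.append("-")
            row_y.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(row_x)), "".join(reversed(row_y))


print(smith_waterman("AAGTT", "AATT"))
print(smith_waterman("AAAA", "CCCC"))
```

When no pair of symbols scores above zero, as in the second call, the best local alignment is empty and has score 0. Several local alignments can share the best score, so the rows returned depend on the tie-break.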