Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember...
Introduction to Bioinformatics for Computer Scientists
Lecture 5
Course Beers or Coffee
● Who wants to volunteer for organizing this?
Plan for next lectures
● Today: Multiple Sequence Alignment
● Lecture 6: Introduction to phylogenetics
● Lecture 7: Phylogenetic search algorithms
● Lecture 8 (Kassian): The phylogenetic Maximum Likelihood Model
● Lecture 8 (Kassian): The phylogenetic Maximum Likelihood Model (continued)
Insertions, Deletions & Substitutions
[Figure: sequences evolving along a tree, time running from top to bottom]
Time ↓
ATTGCG
CTTGCG   ATTGCG
AT-GCG   ATTGCG   CTTGCG   CTTGCAAG
A → C: substitution
VOID → AA: insertion
T → VOID: deletion
We call this an “indel” (from insertion-deletion). The indel length here is 2; longer indel lengths are not uncommon!
Aligned data:
AT-GC--G
ATTGC--G
CTTGC--G
CTTGCAAG
Compute which characters share a common evolutionary history!
This is also called: inferring homology
Multiple Sequence Alignment
● So far:
  ● Comparing two sequences
  ● Mapping a sequence/read to a reference genome
● What do we do when we want to compare more than two sequences at a time?
● Multiple Sequence Alignment (MSA)
● Open question: how do we assess the quality/accuracy of MSA algorithms?
→ nice review paper: “Who watches the watchmen?” http://arxiv.org/abs/1211.2160
Why do we need MSAs?
● Input for phylogenetic reconstruction
● Discover important (conserved) parts of a protein family
  ● Protein family → group of evolutionarily related genes/proteins in different species with similar function/structure
  ● Family has a different meaning than in taxonomy!
MSA
● Generalization of pair-wise sequence alignment problem
● Given n orthologous sequences s1,...,sn of different lengths, insert gaps “-” such that:
  ● All sequences have the same length
  ● Some criterion is optimized
  ● Corresponding (homologous) characters in si and sj are aligned to each other (in the same alignment column/site)
  ● Columns/sites that entirely consist of gaps are not allowed
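The structural conditions of this definition (equal row lengths, no all-gap columns, gaps only inserted into the input sequences) can be checked mechanically. The following is a minimal Python sketch; the function name is_valid_msa is made up for illustration, and the optimization criterion and the homology condition are deliberately not checked, because they cannot be verified from the strings alone.

```python
def is_valid_msa(rows, originals, gap="-"):
    """Check the structural MSA conditions (not the optimality criterion)."""
    # All aligned rows must have the same length.
    if len({len(r) for r in rows}) != 1:
        return False
    # Removing the gaps must give back the input sequences, in the same order.
    if [r.replace(gap, "") for r in rows] != list(originals):
        return False
    # No column/site may consist of gaps only.
    return all(any(ch != gap for ch in col) for col in zip(*rows))


# The aligned data from the insertions/deletions example.
rows = ["AT-GC--G", "ATTGC--G", "CTTGC--G", "CTTGCAAG"]
seqs = ["ATGCG", "ATTGCG", "CTTGCG", "CTTGCAAG"]
print(is_valid_msa(rows, seqs))
```

Appending a gap to every row would create an all-gap column, so the check would then fail.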
MSA Terminology
s1 M Q P I L L L
s2 M L R - L L -
s3 M K - I L L L
s4 M P P V L I L
Alignment site/Alignment column
Orthologous sequences: sequences in different species that have evolved from the same ancestral gene
→ sequences that share a common evolutionary history
Homologous characters: characters that share a common evolutionary history
Orthology
[Figure: gene lineages drawn inside a species tree, with two speciation events and one gene duplication; gene pairs are labelled orthologous, paralogous, and homologous]
Homology
● High sequence similarity does not automatically imply homology
  ● Same sequence (gene function) can have evolved independently twice → convergent evolution
  ● For short sequences: similar by chance
[Figure: one diagram with a single parent and two offspring, another with two parents and one offspring each]
Convergent Evolution
Orthology Assignment
● Numerous methods available
● Will not be covered here
● Let's assume that we have a set of n orthologous sequences s1,...,sn and see how we can align them
Alignment Criteria
● How do we define alignment quality?
● There are different criteria
  ● The SP (sum of pairs) measure
  ● Real data benchmarks
  ● Evolutionary measures
  ● Simulations
The SP measure
● SP: sum-of-pairs score
● Score each MSA site and then add up the scores over all sites
● Penalize mismatches and gaps
● Favor matches
● The per-site score is defined as the sum of all pairwise scores between characters of a site
SP an example
● SP-score(I, -, I, V) = p(I,-) + p(I,I) + p(I,V) + p(-,I) + p(-,V) + p(I,V)
● Where p() is the penalty function and p(-,-) := 0
● Given an MSA with n sequences and m sites we can thus compute the overall score as:

sp = 0;
for(i = 0; i < m; i++)
  sp += SPscore(sites[i]);
An example
s1 A A G A A - A
s2 A T - A A T G
s3 C T G - G - G
Using the edit distance for p() the score is: 2 + 2 + 2 + 2 + 2 + 2 + 2 = 14
Note that we can also compute this as the sum of pair-wise edit distances between the aligned sequences: e(s1,s2) + e(s1,s3) + e(s2,s3) = 4 + 5 + 5 = 14
Keep in mind that p(-,-) := 0
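The numbers in this example can be reproduced with a few lines of Python. This is only a sketch: the function names are chosen for illustration, and p() is taken to be the unit-cost edit distance (0 for a match or for two gaps, 1 for a mismatch or a character against a gap), as stated above.

```python
from itertools import combinations


def edit_penalty(a, b):
    """p(a, b): 0 for identical characters (including two gaps), 1 otherwise."""
    return 0 if a == b else 1


def sp_site(site, p=edit_penalty):
    """Sum of all pairwise penalties between the characters of one site."""
    return sum(p(a, b) for a, b in combinations(site, 2))


def sp_score(msa, p=edit_penalty):
    """Add up the per-site scores over all sites (columns) of the MSA."""
    return sum(sp_site(site, p) for site in zip(*msa))


msa = ["AAGAA-A", "AT-AATG", "CTG-G-G"]
per_site = [sp_site(site) for site in zip(*msa)]
total = sp_score(msa)
# Same value, computed as the sum over all pairs of aligned sequences.
pairwise = sum(sp_score([x, y]) for x, y in combinations(msa, 2))
print(per_site, total, pairwise)
```

Both ways of summing visit exactly the same character pairs, once by column and once by sequence pair, which is why they agree.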
The SP measure
● Note that this is only one way to quantify the quality of an alignment
● One can build an alignment algorithm that optimizes the SP measure
● However, alignments (MSAs) with a worse SP score (larger, when p() is a penalty) may better represent the true evolutionary history of the characters!
How can we extend pair-wise alignment to triple-wise alignment?
● Any ideas?
● What is the time and space complexity?
SP-based optimization
● We can extend the dynamic programming approach for pair-wise sequence alignment to n sequences to calculate an SP-optimal MSA
● Assume that all n sequences have equal length m
● Storing the dynamic programming matrix requires O(m^n) space
● And the lower bound for time is also O(m^n) because all m^n entries need to be computed → consider an example with n := 3
● As you can imagine computing the SP-optimal MSA is NP-complete
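For n := 3 the dynamic programming table becomes a cube, and each cell has 7 predecessors (every way of consuming a character from at least one of the three sequences). The Python sketch below computes the SP-optimal cost for three sequences. The function name and the unit-cost penalty are assumptions made for illustration, and the function returns only the cost and the number of table cells, not the alignment itself.

```python
from itertools import combinations, product


def sp_optimal_cost_3(a, b, c):
    """Minimal SP cost of aligning three sequences, plus the table size."""
    seqs = (a, b, c)
    # 7 column types: each sequence contributes a character (1) or a gap (0).
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    table = {}
    # product() yields cells in lexicographic order, so every predecessor
    # of a cell has already been filled in when the cell is reached.
    for cell in product(*(range(len(s) + 1) for s in seqs)):
        if not any(cell):
            table[cell] = 0
            continue
        best = float("inf")
        for m in moves:
            prev = tuple(x - d for x, d in zip(cell, m))
            if min(prev) < 0:
                continue
            column = [s[x - 1] if d else "-" for s, x, d in zip(seqs, cell, m)]
            cost = sum(0 if u == v else 1 for u, v in combinations(column, 2))
            best = min(best, table[prev] + cost)
        table[cell] = best
    return table[tuple(len(s) for s in seqs)], len(table)


cost, cells = sp_optimal_cost_3("AAGAAA", "ATAATG", "CTGGG")
print(cost, cells)
```

The table holds (|a|+1)(|b|+1)(|c|+1) cells, here 7 * 7 * 6 = 294. With n sequences the same construction needs on the order of m^n cells and 2^n - 1 predecessors per cell.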
SP-based MSA
● NP-complete
● Not granted that SP is the correct (biologically most plausible) criterion!
● Depends on -arbitrary- choice of scoring function p()
● We need heuristics!
● We will have a look at some basic heuristics in the following ...
Star Alignment Heuristics
● Pick a center sequence sc
● Align all remaining sequences to sc using a pairwise sequence alignment algorithm
● “Once a gap, always a gap” strategy
→ gaps inserted into sc can not be removed again
● sc can be picked by computing all O(n^2) [more precisely: n(n-1)/2] optimal pair-wise alignments and selecting the sequence that has the largest similarity to all other sequences
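Picking the center sequence can be sketched as follows. As an assumption for this example, "largest similarity to all other sequences" is replaced by "smallest sum of edit distances to all other sequences"; any pair-wise alignment score could be plugged in instead. The names edit_distance and pick_center are illustrative.

```python
from itertools import combinations


def edit_distance(a, b):
    """Unit-cost edit distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # gap in b
                           cur[j - 1] + 1,           # gap in a
                           prev[j - 1] + (x != y)))  # match or mismatch
        prev = cur
    return prev[-1]


def pick_center(seqs):
    """Index of the sequence with the smallest total distance to the others."""
    totals = [0] * len(seqs)
    for i, j in combinations(range(len(seqs)), 2):   # n(n-1)/2 pairs
        d = edit_distance(seqs[i], seqs[j])
        totals[i] += d
        totals[j] += d
    center = min(range(len(seqs)), key=lambda i: totals[i])
    return center, totals


center, totals = pick_center(["ACGT", "ACGA", "ACGT", "TCGT"])
print(center, totals)
```

Ties are broken in favor of the sequence that appears first in the input.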
Star Alignment
s1: ATTGCCATT ← center sequence
s2: ATGGCCATT
s3: ATCCAATTTT
s4: ATCTTCTT
s5: ACTGACC
[Figure: star with s1 in the center and s2, s3, s4, s5 attached to it]
Pair-wise alignments to the center sequence:
s1: ATTGCCATT
s2: ATGGCCATT

s1: ATTGCCATT--
s3: ATC-CAATTTT

s1: ATTGCCATT
s4: ATCTTC-TT

s1: ATTGCCATT
s5: ACTGACC--
Gaps inserted (“Once a gap, always a gap”)
The Star Alignment
s1: ATTGCCATT--
s2: ATGGCCATT--
s3: ATC-CAATTTT
s4: ATCTTC-TT--
s5: ACTGACC----
Another Example
s1:ATTGCCATT
s2:ATGGCCATT
s3:ATCCAATTTT
s4:ATCTTCTT
s5:ATTGCCGATT
Pairwise alignment step
s1:ATTGCCATT
s2:ATGGCCATT

s1:ATTGCCATT--
s3:AT-CCAATTTT

s1:ATTGCCATT
s4:ATCTTC-TT

s1:ATTGCC-ATT
s5:ATTGCCGATT
Merging step
s1:ATTGCCATT
s2:ATGGCCATT

s1:ATTGCCATT--
s2:ATGGCCATT--
s3:AT-CCAATTTT

s1:ATTGCCATT--
s2:ATGGCCATT--
s3:AT-CCAATTTT
s4:ATCTTC-TT--

s1:ATTGCC-ATT--
s2:ATGGCC-ATT--
s3:AT-CCA-ATTTT
s4:ATCTTC--TT--
s5:ATTGCCGATT--
Shift right!
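The merging step can be sketched in Python as follows. The function names are made up for illustration. The input is one pair-wise alignment per non-center sequence, each given as (aligned center, aligned other sequence). For every position between two center characters, the widest gap seen in any pair-wise alignment is kept, and the other rows are padded accordingly ("once a gap, always a gap"). Where two sequences insert characters at the same position, this sketch simply stacks them left-aligned in the same columns, which is one possible choice.

```python
def gap_slots(center_aln):
    """slots[k] = number of gap columns in front of the k-th center character
    (the last entry counts the gaps after the final center character)."""
    slots = [0]
    for ch in center_aln:
        if ch == "-":
            slots[-1] += 1
        else:
            slots.append(0)
    return slots


def merge_star(pairs):
    """Merge pair-wise alignments that all share the same center sequence."""
    profiles = [gap_slots(center_aln) for center_aln, _ in pairs]
    n_slots = len(profiles[0])
    widest = [max(p[k] for p in profiles) for k in range(n_slots)]

    def expand(row, slots):
        out, pos = [], 0
        for k in range(n_slots):
            out.append(row[pos:pos + slots[k]])       # columns inserted by this pair
            out.append("-" * (widest[k] - slots[k]))  # padding for gaps opened elsewhere
            pos += slots[k]
            if k < n_slots - 1:                       # column of the k-th center character
                out.append(row[pos])
                pos += 1
        return "".join(out)

    center_row = expand(pairs[0][0], profiles[0])
    return [center_row] + [expand(other, p) for (_, other), p in zip(pairs, profiles)]


pairs = [
    ("ATTGCCATT", "ATGGCCATT"),
    ("ATTGCCATT--", "AT-CCAATTTT"),
    ("ATTGCCATT", "ATCTTC-TT"),
    ("ATTGCC-ATT", "ATTGCCGATT"),
]
print("\n".join(merge_star(pairs)))
```

Applied to the four pair-wise alignments of this example, the sketch reproduces the five-row alignment shown at the end of the merging step.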
Star Alignment Heuristics
● Produces an MSA whose SP score is < 2 * optimum
● Proof omitted
● Reference: D. Gusfield “Efficient methods for multiple sequence alignment with guaranteed error bounds”, Bulletin of Mathematical Biology, 1993.
Tree Alignment
● If an evolutionary tree for the sequences is available
● Find an assignment of sequences to the inner nodes such that the sum over the similarity scores on all branches is maximized
p(a,b) := 1 if a = b
p(a,b) := 0 if a ≠ b
p(a,-) := -1
[Figure: tree with leaves CAT, GT, CTG, CG and inner nodes CT, CG]
Pair-wise alignments along the branches:
CAT / C-T
CG / CG
CT / GT
C-G / CTG
CT / CG
Branch scores: 1, 1, 1, 1, 2
Overall score: 6 → maximize this score
This problem is NP-hard because we don't have the ancestral states
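Once the inner nodes are labelled, scoring a tree alignment reduces to one pair-wise alignment per branch. The Python sketch below recomputes the branch scores of the example with a score-only global alignment, using the p() defined above (match 1, mismatch 0, gap -1). The function name and the branch list are written out by hand for illustration. The hard part of the problem, finding the best inner-node sequences, is not attempted here.

```python
def similarity(a, b, match=1, mismatch=0, gap=-1):
    """Optimal global alignment score of a and b (score only, no traceback)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [i * gap]
        for j, y in enumerate(b, 1):
            cur.append(max(prev[j] + gap,       # x aligned to a gap
                           cur[j - 1] + gap,    # y aligned to a gap
                           prev[j - 1] + (match if x == y else mismatch)))
        prev = cur
    return prev[-1]


# Branches of the example tree: leaves CAT, GT, CTG, CG; inner nodes CT and CG.
branches = [("CAT", "CT"), ("GT", "CT"), ("CT", "CG"), ("CG", "CTG"), ("CG", "CG")]
scores = [similarity(u, v) for u, v in branches]
print(scores, sum(scores))
```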
Tree-Based Alignment
● Chicken-and-egg problem
→ we need an MSA to build a tree
→ we need a tree to compute an MSA
→ if the alignment is wrong, the tree might be wrong
→ if the tree is wrong, the MSA might be wrong
● One idea
→ simultaneous inference of tree & alignment
→ very hard problem: trying to solve two generally NP-hard problems simultaneously
Practical approaches
[Figure: n × n matrix with rows and columns s1 ... sn]
Build a pair-wise distance matrix
Computation of the pair-wise distance matrix using pair-wise alignment scores can be time- and memory-intensive due to O(n^2) complexity
One may use approximate distance methods based on k-mers (remember last lecture!)
[Figure: guide tree with a root and leaves s1, s6, s44, s23, s33, sn]
Guide tree
Post-order traversal to build an alignment bottom-up
● Pair-wise sequence alignment
● Pair-wise profile alignment
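An approximate, alignment-free distance matrix based on k-mers can be sketched as follows. The concrete measure used here (one minus the Jaccard index of the two k-mer sets, with k = 3) is an assumption chosen for brevity and not necessarily the measure any particular MSA program uses.

```python
from itertools import combinations


def kmers(seq, k):
    """Set of all substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=3):
    """1 - |shared k-mers| / |all k-mers|; 0.0 if neither sequence has a k-mer."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return 1.0 - len(ka & kb) / len(union) if union else 0.0


def distance_matrix(seqs, k=3):
    """Symmetric n x n matrix; each of the n(n-1)/2 pairs is computed once."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = kmer_distance(seqs[i], seqs[j], k)
    return d


seqs = ["ATTGCCATT", "ATGGCCATT", "ATCCAATTTT", "ATCTTCTT", "ACTGACC"]
d = distance_matrix(seqs)
for row in d:
    print(" ".join(f"{x:.2f}" for x in row))
```

The number of pairs is still quadratic in n, but each entry costs time roughly linear in the sequence length instead of one dynamic programming table per pair.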
Practical Approaches
● Guide-tree approach
  ● Compute all n(n-1)/2 pair-wise distances (alignments) between the n sequences
  ● Use these distances for hierarchical clustering
    ● e.g. with the neighbor joining algorithm → we will see this later on for tree building
  ● Use the distance-based tree to calculate pair-wise
    ● Sequence-sequence
    ● Sequence-profile
    ● Profile-profile
  ● … alignments bottom up toward the root via a post-order tree traversal
● Many widely-used MSA programs rely on this idea: e.g., Clustal family of tools, T-COFFEE
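The control flow of the guide-tree approach can be sketched in Python as follows. Two simplifications are assumed: the tree is built with plain average-linkage clustering rather than neighbor joining, and the actual sequence/profile alignment is replaced by a placeholder visit function that only records the order in which alignments would be computed. The distance values are invented for illustration.

```python
def guide_tree(dist, names):
    """Average-linkage clustering; returns a nested tuple, e.g. (("a", "b"), "c")."""
    clusters = {i: (names[i], [i]) for i in range(len(names))}  # id -> (subtree, leaves)
    next_id = len(names)
    while len(clusters) > 1:
        def avg(u, v):
            lu, lv = clusters[u][1], clusters[v][1]
            return sum(dist[i][j] for i in lu for j in lv) / (len(lu) * len(lv))

        ids = sorted(clusters)
        u, v = min(((a, b) for x, a in enumerate(ids) for b in ids[x + 1:]),
                   key=lambda pair: avg(*pair))
        (tu, lu), (tv, lv) = clusters.pop(u), clusters.pop(v)
        clusters[next_id] = ((tu, tv), lu + lv)
        next_id += 1
    return next(iter(clusters.values()))[0]


def post_order(tree, visit):
    """Handle both children first, then combine their results at the parent."""
    if isinstance(tree, tuple):
        left = post_order(tree[0], visit)
        right = post_order(tree[1], visit)
        return visit(left, right)
    return tree  # a leaf: a single sequence


names = ["AC", "ATG", "TCG", "TCC"]
dist = [[0, 2, 3, 3],   # hypothetical pair-wise distances
        [2, 0, 3, 3],
        [3, 3, 0, 1],
        [3, 3, 1, 0]]
tree = guide_tree(dist, names)

merges = []
def visit(left, right):
    merges.append((left, right))   # a real implementation would align here
    return f"{left}+{right}"

post_order(tree, visit)
print(tree)
print(merges)
```

The first two recorded merges are sequence-sequence alignments, and the last one combines two already aligned groups, i.e. a profile-profile alignment.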
Progressive MSA
[Figure: guide tree with leaves AC, ATG, TCG, TCC]
AC ATG TCG TCC
Pair-wise alignments at the inner nodes:
ATG
A-C

TCC
TCG
Merge alignments of the two descendant nodes:
-TCC
-TCG
ATG-
A-C-
Profile Alignment
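[Figure: the two child alignments shown column by column as profiles (columns labelled GC, CC, TT and AA, T-, GC), which are aligned to each other to give the merged alignment]
-TCC
-TCG
ATG-
A-C-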
![Page 66: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/66.jpg)
Profile Alignment
● Generalization of pair-wise sequence alignment to pair-wise profile alignment
● Average over all possibilities
    0123456789
S1: PEEKSAVTAL
S2: GEEKAAVLAL
S3: PADKTNVKAA
S4: AADKTNVKAA

    0123456789
S5: EGEWGLVLHV
S6: AAEKTKIRSA

S5 S6
S4 S3 S2 S1
![Page 67: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/67.jpg)
Profile Alignment
● Generalization of pair-wise sequence alignment to pair-wise profile alignment
● Average over all possibilities
    0123456789
S1: PEEKSAVTAL
S2: GEEKAAVLAL
S3: PADKTNVKAA
S4: AADKTNVKAA

    0123456789
S5: EGEWGLVLHV
S6: AAEKTKIRSA

S5 S6
S4 S3 S2 S1
Compute score between position 6 of x and position 7 of y
x y
![Page 68: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/68.jpg)
Profile Alignment
● Generalization of pair-wise sequence alignment to pair-wise profile alignment
● Average over all possibilities
    0123456789
S1: PEEKSAVTAL
S2: GEEKAAVLAL
S3: PADKTNVKAA
S4: AADKTNVKAA

    0123456789
S5: EGEWGLVLHV
S6: AAEKTKIRSA

S5 S6
S4 S3 S2 S1
Weighted average over all 8 (2 * 4) possibilities:
Score: 1/8 * [p(T,V) + p(T,I) + p(L,V) + p(L,I) + p(K,V) + p(K,I) + p(K,V) + p(K,I)]
x y
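The averaging step can be sketched in a few lines of Python. This is a minimal illustration, not the scoring used by any particular aligner: `pair_score` is a hypothetical stand-in for p(a, b) (match = 1, mismatch = -1), where a real tool would look up a substitution matrix, and gap characters are not handled. The two columns are the ones that appear in the formula: T, L, K, K at position 7 of S1..S4 and V, I at position 6 of S5, S6.

```python
from itertools import product

def pair_score(a, b, match=1, mismatch=-1):
    """Hypothetical stand-in for p(a, b); assumed values, not from the lecture."""
    return match if a == b else mismatch

def column(profile, i):
    """Extract column i (0-based) from a list of equally long aligned sequences."""
    return [seq[i] for seq in profile]

def profile_column_score(col_a, col_b, score=pair_score):
    """Average the pair score over every residue pair drawn from the two columns."""
    pairs = list(product(col_a, col_b))
    return sum(score(a, b) for a, b in pairs) / len(pairs)

profile_s1_s4 = ["PEEKSAVTAL", "GEEKAAVLAL", "PADKTNVKAA", "AADKTNVKAA"]
profile_s5_s6 = ["EGEWGLVLHV", "AAEKTKIRSA"]

col_s1_s4 = column(profile_s1_s4, 7)  # ['T', 'L', 'K', 'K']
col_s5_s6 = column(profile_s5_s6, 6)  # ['V', 'I']

score = profile_column_score(col_s1_s4, col_s5_s6)
print(score)  # -1.0 under the assumed match/mismatch values (all 8 pairs differ)
```

Plugging this column score into the usual pair-wise dynamic programming recurrence, in place of the single-residue score, gives a profile-profile alignment.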
![Page 69: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/69.jpg)
Problems with progressive MSA
● Initial pair-wise alignments are “frozen”
● Can't be corrected when new evidence emerges
x
y
z
w
x: GAAGTT
y: GAC-TT → frozen by initial alignment
z: GAACTG
w: GTACTG
y: GA-CTT ← the “C-” in y should be flipped to “-C”
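To see why the flipped gap is preferable once z and w are known, one can compare sum-of-pairs (SP) scores of the two four-sequence alignments. The sketch below assumes a simple scoring scheme (match = 1, mismatch = -1, residue-gap = -1, gap-gap = 0); these values are illustrative and not taken from the lecture.

```python
from itertools import combinations

def sp_score(alignment, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score: sum pair scores over every column and every pair of rows."""
    total = 0
    for col in zip(*alignment):
        for a, b in combinations(col, 2):
            if a == "-" and b == "-":
                continue  # gap-gap pairs contribute nothing in this sketch
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total

# Rows are x, y, z, w from the example above.
frozen = ["GAAGTT", "GAC-TT", "GAACTG", "GTACTG"]
flipped = ["GAAGTT", "GA-CTT", "GAACTG", "GTACTG"]

print(sp_score(frozen), sp_score(flipped))
```

Under these assumed scores the flipped alignment scores 4 points higher, because the C of y now pairs with the C of z and w instead of facing them as a gap. A purely progressive method never revisits the x/y alignment, so it cannot make this correction.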
![Page 70: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/70.jpg)
Iterative Progressive MSA
● e.g. MUSCLE, PRRP, MAFFT
● Execute progressive MSA several times to refine the alignment
![Page 71: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/71.jpg)
MUSCLE Re-Finement
![Page 72: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/72.jpg)
MUSCLE Re-Finement
![Page 73: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/73.jpg)
MUSCLE Re-Finement
![Page 74: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/74.jpg)
MUSCLE Re-Finement
![Page 75: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/75.jpg)
Motif-based approaches
● Find a small motif (substring) common to all sequences
● Called: anchor, block, region, q-gram, etc.
● If a motif is found → shift sequences such that the motifs are “in alignment”
● Then, align regions around these motifs using, for instance, progressive alignment
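The first steps (find a common q-gram, then shift) can be sketched as follows. The function names and the toy sequences are made up for illustration, and the sketch only pads with gaps so that the first occurrence of the motif lines up; aligning the flanking regions would still need a real alignment method.

```python
def common_qgrams(seqs, q):
    """Return all substrings of length q that occur in every sequence."""
    first, *rest = seqs
    candidates = {first[i:i + q] for i in range(len(first) - q + 1)}
    return {m for m in candidates if all(m in s for s in rest)}

def anchor_on_motif(seqs, motif):
    """Pad sequences with gaps so the first occurrence of motif sits in the same columns."""
    starts = [s.find(motif) for s in seqs]
    if min(starts) < 0:
        raise ValueError("motif does not occur in every sequence")
    left = max(starts)
    shifted = ["-" * (left - st) + s for s, st in zip(seqs, starts)]
    width = max(len(s) for s in shifted)
    return [s.ljust(width, "-") for s in shifted]

# Toy input, invented for this example.
seqs = ["ACGTTGA", "TTACGTTC", "CGTTAAG"]
motifs = common_qgrams(seqs, 4)   # {'CGTT'}
anchored = anchor_on_motif(seqs, "CGTT")
for row in anchored:
    print(row)
# --ACGTTGA-
# TTACGTTC--
# ---CGTTAAG
```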
![Page 76: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/76.jpg)
Benchmarking MSAs
● MSA benchmarks → mostly structural protein data that has been manually aligned to reflect the protein structure
● Databases: BAliBASE 2.0, OXBench, PREFAB, etc.
● Simulation
→ focus on alignment
→ focus on phylogeny
![Page 77: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/77.jpg)
Simulation
true MSA
simulate
![Page 78: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/78.jpg)
Simulation
true MSA
simulate
disalign
ACGTTTTACGGGTTTACGTTTGGCAATTTTTT
![Page 79: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/79.jpg)
Simulation
true MSA
simulate
disalign
ACGTTTTACGGGTTTACGTTTGGCAATTTTTT
align
inferred MSA
![Page 80: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/80.jpg)
Simulation
true MSA
simulate
disalign
ACGTTTTACGGGTTTACGTTTGGCAATTTTTT
align
inferred MSA
Count correct sites
Compare SP scores
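One way to "count correct sites" is to check, for every column of the true MSA, whether the same residues are grouped in some column of the inferred MSA. The sketch below is one possible definition, not necessarily the one used in published benchmarks; the function names are hypothetical and the two toy alignments are invented for illustration.

```python
def disalign(msa):
    """Remove all gaps, returning the raw (unaligned) sequences."""
    return [row.replace("-", "") for row in msa]

def column_signatures(msa):
    """Describe each column by the residue index it holds in every row (None for a gap)."""
    counters = [0] * len(msa)
    signatures = []
    for col in zip(*msa):
        sig = []
        for row, ch in enumerate(col):
            if ch == "-":
                sig.append(None)
            else:
                sig.append(counters[row])
                counters[row] += 1
        signatures.append(tuple(sig))
    return signatures

def column_score(true_msa, inferred_msa):
    """Fraction of true columns that are reproduced exactly in the inferred MSA."""
    if disalign(true_msa) != disalign(inferred_msa):
        raise ValueError("both MSAs must contain the same sequences in the same order")
    inferred = set(column_signatures(inferred_msa))
    true_cols = column_signatures(true_msa)
    return sum(sig in inferred for sig in true_cols) / len(true_cols)

# Toy data: treat one alignment as the simulated truth, the other as the inferred MSA.
true_msa = ["GAAGTT", "GA-CTT", "GAACTG", "GTACTG"]
inferred_msa = ["GAAGTT", "GAC-TT", "GAACTG", "GTACTG"]
print(column_score(true_msa, inferred_msa))
```

Here 4 of the 6 true columns are recovered; the two columns around the misplaced gap are counted as wrong.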
![Page 81: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/81.jpg)
Simulation
true MSA
simulate
disalign
ACGTTTTACGGGTTTACGTTTGGCAATTTTTT
align
inferred MSA
Infer tree
![Page 82: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/82.jpg)
Simulation
true MSA
simulate
disalign
ACGTTTTACGGGTTTACGTTTGGCAATTTTTT
align
inferred MSA
Infer tree
Compare trees
![Page 83: Lecture 5 - HITS gGmbHsco.h-its.org/exelixis/web/teaching/lectures14_15/lecture5.pdf · (remember last lecture!) Practical approaches s1 sn s1 sn root s1 s6 s44 s23 s33 sn Guide tree.](https://reader035.fdocuments.in/reader035/viewer/2022081407/5f27c0459a452f284e2f4b9d/html5/thumbnails/83.jpg)
Summary
● MSA is generally difficult due to lack of objective criteria
● MSA as defined per SP score is NP-complete
● Tree-alignment MSA is also NP-complete
● There exist heuristics with performance guarantees
● However, practical approaches use ad hoc heuristics that typically perform better
● Classes of algorithms
  ● Progressive MSA
  ● Progressive iterative MSA
  ● Motif-based approaches
  ● Statistical MSA (not covered)
  ● Phylogeny-aware MSA (not covered)
  ● Simultaneous MSA & tree inference (not covered)