Lecture 24
Bioinformatics
• Inferring molecular phylogeny
• Distance methods
• Discrete methods
• Comparison of different tree-building methods
• Estimating sampling error: the bootstrap
Inferring molecular phylogeny
• The objective of molecular phylogenetics is to convert sequence information (DNA, RNA, proteins) into an evolutionary tree for these sequences.
• The ever-growing number of tree-building methods can be classified, very roughly, in two ways:
• Distance methods versus discrete-character methods.
• Clustering methods versus search methods.
• Both kinds of methods are considered in this lecture.
Distance methods
• The simplest distance method is called UPGMA (Unweighted Pair Group Method with Arithmetic Mean). It is based on the assumption of constant substitution rates and approximately equal lengths of neighbouring branches.
• As a first step, a distance matrix must be built. It gives the distances between all possible pairs of the sequences used for the phylogenetic reconstruction.
• UPGMA starts by calculating branch lengths.
Distance methods: an idealised case
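A. Sequences
Sequence A ACGCGTTGGGCGATGGCAAC
Sequence B ACGCGTTGGGCGACGGTAAT
Sequence C ACGCATTGAATGATGATAAT
Sequence D ACACATTGAGTGATAATAAT
B. Distances between sequences (number of differing sites)
nAB = 3, nAC = 7, nAD = 8, nBC = 6, nBD = 7, nCD = 3
C. Distance table

| OTU | A | B | C | D |
|---|---|---|---|---|
| A | - | 3 | 7 | 8 |
| B | - | - | 6 | 7 |
| C | - | - | - | 3 |
| D | - | - | - | - |

D. The assumed unrooted tree
A and B are neighbours, and so are C and D. The terminal branch lengths are A = 2, B = 1, C = 1 and D = 2, and the internal branch has length 4.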
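The counts in B can be reproduced directly from the sequences. The minimal Python sketch below counts the differing sites for every pair. It then checks that the assumed tree ((A,B),(C,D)) reproduces each distance as a path length. The branch lengths used here are our reading of the figure labels: A = 2, B = 1, C = 1, D = 2, plus an internal branch of 4. The names `count_differences` and `tree_distance` are illustrative and do not come from the lecture.

```python
from itertools import combinations

# Aligned sequences of the idealised example (all 20 sites long).
seqs = {
    "A": "ACGCGTTGGGCGATGGCAAC",
    "B": "ACGCGTTGGGCGACGGTAAT",
    "C": "ACGCATTGAATGATGATAAT",
    "D": "ACACATTGAGTGATAATAAT",
}


def count_differences(s1: str, s2: str) -> int:
    """Number of aligned sites at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2))


# Pairwise differences n_xy for every pair of OTUs.
dist = {(x, y): count_differences(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)}

# Assumed tree ((A,B),(C,D)): terminal branch lengths plus one internal branch.
terminal = {"A": 2, "B": 1, "C": 1, "D": 2}
internal = 4
side = {"A": 0, "B": 0, "C": 1, "D": 1}  # side of the internal branch each tip is on


def tree_distance(x: str, y: str) -> int:
    """Path length between two tips; the internal branch is crossed only if needed."""
    crossing = internal if side[x] != side[y] else 0
    return terminal[x] + terminal[y] + crossing


for (x, y), n in dist.items():
    print(f"n{x}{y} = {n}, path length on tree = {tree_distance(x, y)}")
```

Every path length equals the corresponding count, so the distances fit this tree exactly (they are additive).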
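Diagram illustrating the stepwise construction of a phylogenetic tree for four OTUs according to the unweighted pair group method with arithmetic mean (UPGMA). The resulting tree is ultrametric. Methods used: distance and clustering.

Initial distance table:

| OTU | A | B | C |
|---|---|---|---|
| B | 14 | - | - |
| C | 11 | 11 | - |
| D | 7 | 13 | 8 |

Step 1. The smallest distance is $d_{AD} = 7$, so A and D are joined first, at a node of height $d_{AD}/2 = 3.5$.

Step 2. Values for the reduced table are calculated from the data presented in the initial table:

$$d_{(AD)B} = \frac{d_{AB} + d_{DB}}{2} = 13.5, \qquad d_{(AD)C} = \frac{d_{AC} + d_{DC}}{2} = 9.5$$

| OTU | (AD) | B |
|---|---|---|
| B | 13.5 | - |
| C | 9.5 | 11 |

The smallest value is $d_{(AD)C} = 9.5$, so C joins (AD) at height $d_{(AD)C}/2 = 4.75$.

Step 3. The distance from (ADC) to the remaining OTU is

$$d_{(ADC)B} = \frac{d_{AB} + d_{DB} + d_{CB}}{3} = 12.67,$$

so B joins last, at height $d_{(ADC)B}/2 = 6.33$.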
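The stepwise procedure can be sketched in Python. This is a minimal, illustrative UPGMA; the function name `upgma` and the frozenset-keyed distance dictionary are our own choices. It runs on the diagram's matrix: dAB = 14, dAC = 11, dAD = 7, dBC = 11, dBD = 13, dCD = 8. Each new distance is the unweighted arithmetic mean over all pairs of original OTUs, and each node is placed at half the joining distance.

```python
# Distance table from the UPGMA diagram, keyed by unordered OTU pairs.
D = {
    frozenset("AB"): 14, frozenset("AC"): 11, frozenset("AD"): 7,
    frozenset("BC"): 11, frozenset("BD"): 13, frozenset("CD"): 8,
}


def upgma(dist):
    """Cluster OTUs by UPGMA.

    dist maps frozenset({x, y}) -> distance between original OTUs x and y.
    Returns the joins as (cluster1, cluster2, node_height) in the order made.
    """
    # Each current cluster name -> list of the original OTUs it contains.
    members = {otu: [otu] for pair in dist for otu in pair}
    current = dict(dist)
    joins = []
    while len(members) > 1:
        pair = min(current, key=current.get)  # closest pair of clusters
        x, y = sorted(pair)
        joins.append((x, y, current[pair] / 2))  # clock: node at half the distance
        new = x + y  # concatenated name; fine for single-letter OTUs
        members[new] = members.pop(x) + members.pop(y)
        # Forget distances involving the two clusters just merged.
        current = {p: v for p, v in current.items() if x not in p and y not in p}
        # Unweighted arithmetic mean over all pairs of original OTUs.
        for other, others in members.items():
            if other != new:
                total = sum(dist[frozenset((a, b))] for a in members[new] for b in others)
                current[frozenset((new, other))] = total / (len(members[new]) * len(others))
    return joins


for x, y, height in upgma(D):
    print(f"join {x} + {y} at height {height:.2f}")
```

This prints the joins A + D at 3.50, AD + C at 4.75 and ADC + B at 6.33, matching the diagram. Ties in the minimum would be broken by dictionary order. Cluster names are built by concatenation, which could collide for multi-letter OTU names.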
Neighbour-joining tree construction. Methods: distance and clustering.

| OTU | H | C | G | O |
|---|---|---|---|---|
| C | 1.45* | - | - | - |
| G | 1.51 | 1.57 | - | - |
| O | 2.98 | 2.94 | 3.04 | - |
| R | 7.51 | 7.55 | 7.39 | 7.10 |

H – Human
C – Chimpanzee
G – Gorilla
O – Orangutan
R – Rhesus monkey
* Number of nucleotide substitutions per 100 sites between OTUs.
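Neighbours-relation scores obtained from the distance matrix (see previous slide).
Calculation of the total scores: every quartet of OTUs can be split into two pairs in three ways, and the split with the minimum sum of distances wins. For example, for the quartet H, C, G, O the minimum is $d_{HG} + d_{CO}$. Each of the pairs (HG) and (CO) is therefore assigned a score of 1, and the other pairs score 0.
Summing these scores over all quartets gives the total scores shown in the table.
(OR) has the highest total score.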
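The scoring rule can be sketched in Python (`neighbour_relation_scores` is a hypothetical name). For each of the five quartets drawn from H, C, G, O and R, the split with the smallest sum of within-pair distances gives one point to each of its two pairs. Ties do not occur with these data; if one did, `min` would award the point to the first split listed.

```python
from collections import Counter
from itertools import combinations

# Distances (substitutions per 100 sites) from the primate distance table.
PAIRS = {
    "HC": 1.45, "HG": 1.51, "HO": 2.98, "HR": 7.51,
    "CG": 1.57, "CO": 2.94, "CR": 7.55,
    "GO": 3.04, "GR": 7.39,
    "OR": 7.10,
}
DIST = {frozenset(k): v for k, v in PAIRS.items()}


def d(x, y):
    return DIST[frozenset((x, y))]


def neighbour_relation_scores(otus):
    """Total neighbours-relation score for each pair of OTUs."""
    scores = Counter()
    for w, x, y, z in combinations(otus, 4):
        # The three ways of splitting a quartet into two pairs.
        splits = [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]
        best = min(splits, key=lambda s: d(*s[0]) + d(*s[1]))
        for pair in best:  # each pair of the winning split scores 1
            scores["".join(pair)] += 1
    return scores


scores = neighbour_relation_scores("HCGOR")
print(scores.most_common())
```

With these distances (OR) receives 3 points, winning in every quartet that contains both O and R. (HC) and (HG) receive 2 points each, and the scores total 10, two pairs per quartet.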
Building the neighbour-joining (NJ) tree

| OTU | H | C | G |
|---|---|---|---|
| C | 1.45 | - | - |
| G | 1.51 | 1.57 | - |
| (OR) | 5.25 | 5.25 | 5.22 |

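Treating (OR), which has the highest total score, as a single composite OTU gives the reduced distance table above. The distance from (OR) to each remaining OTU is the average of its distances to O and R, for example $d_{G(OR)} = (3.04 + 7.39)/2 \approx 5.22$.
With only four OTUs left, it is easy to see that

$$d_{HC} + d_{G(OR)} = 6.67 < d_{HG} + d_{C(OR)} = 6.76 < d_{H(OR)} + d_{CG} = 6.82.$$

Therefore, H and C are chosen as one pair of neighbours, and G and (OR) as the other.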
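The final comparison can be checked numerically with the minimal sketch below; variable names are illustrative. It assumes that the composite OTU (OR) is as far from each remaining OTU as the mean of that OTU's distances to O and R, which reproduces the 5.25, 5.25 and 5.22 of the reduced table. It then compares the three pair sums. Without intermediate rounding the sums are 6.665, 6.755 and 6.815 rather than 6.67, 6.76 and 6.82, but their order, and therefore the chosen neighbours, is the same.

```python
# Distances among H, C and G, plus each one's distances to O and R (primate table).
DIST = {frozenset("HC"): 1.45, frozenset("HG"): 1.51, frozenset("CG"): 1.57}
TO_O = {"H": 2.98, "C": 2.94, "G": 3.04}
TO_R = {"H": 7.51, "C": 7.55, "G": 7.39}

# Assumption: the distance from x to composite OTU "OR" is the mean of d(x,O) and d(x,R).
for x in "HCG":
    DIST[frozenset((x, "OR"))] = (TO_O[x] + TO_R[x]) / 2


def d(x, y):
    return DIST[frozenset((x, y))]


# Sum of within-pair distances for each way of pairing the four remaining OTUs.
sums = {
    "HC | G(OR)": d("H", "C") + d("G", "OR"),
    "HG | C(OR)": d("H", "G") + d("C", "OR"),
    "H(OR) | CG": d("H", "OR") + d("C", "G"),
}
for split, total in sorted(sums.items(), key=lambda kv: kv[1]):
    print(f"{split}: {total:.3f}")

best = min(sums, key=sums.get)  # the neighbours form the split with the smallest sum
print("neighbours:", best)
```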
Maximum parsimony. Methods: discrete characters and search/optimisation.
Informative sites (*) in four compared sequences, used for phylogenetic reconstruction.
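
| Sequence / site | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | A | A | G | A | G | T | G | C | A |
| 2 | A | G | C | C | G | T | G | C | G |
| 3 | A | G | A | T | A | T | C | C | A |
| 4 | A | G | A | G | A | T | C | C | G |
| Inf. sites | | | | | * | | * | | * |
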
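A site is parsimony-informative when it has at least two different nucleotides, each present in at least two sequences. Every other site needs the same number of changes on every tree, so it cannot discriminate between trees. The minimal Python sketch below applies this rule to the table; the function name `informative_sites` is illustrative.

```python
from collections import Counter

# Sequences 1-4 from the table, sites 1-9.
SEQS = ["AAGAGTGCA", "AGCCGTGCG", "AGATATCCA", "AGAGATCCG"]


def informative_sites(alignment):
    """1-based positions of parsimony-informative sites.

    A site is informative when at least two different characters each occur
    in at least two of the sequences.
    """
    sites = []
    for pos, column in enumerate(zip(*alignment), start=1):
        counts = Counter(column)
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(pos)
    return sites


print(informative_sites(SEQS))  # [5, 7, 9]
```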
Three possible unrooted trees (I, II and III) for four DNA sequences (1, 2, 3, 4) that have been used to choose the most parsimonious tree.
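One way to choose among the three trees is to count the changes each one requires with Fitch's algorithm, a standard parsimony counting method. The lecture does not say which counting procedure it used. In this minimal Python sketch each four-taxon tree is rooted on its internal branch, which does not change its parsimony length. The figure does not say which topology is tree I, II or III, so the trees are named by topology instead.

```python
# Sequences 1-4 from the informative-sites table (sites 1-9).
SEQS = {1: "AAGAGTGCA", 2: "AGCCGTGCG", 3: "AGATATCCA", 4: "AGAGATCCG"}


def fitch(tree, site):
    """Fitch's algorithm for one site on a rooted binary tree of nested tuples.

    Returns (candidate state set at this node, changes needed in this subtree).
    """
    if isinstance(tree, int):  # a leaf: sequence number
        return {SEQS[tree][site]}, 0
    (left, cost_l), (right, cost_r) = (fitch(child, site) for child in tree)
    common = left & right
    if common:
        return common, cost_l + cost_r
    return left | right, cost_l + cost_r + 1  # disjoint sets cost one change


def tree_length(tree, sites):
    return sum(fitch(tree, s)[1] for s in sites)


# The three unrooted topologies, each rooted on its internal branch.
TREES = {
    "((1,2),(3,4))": ((1, 2), (3, 4)),
    "((1,3),(2,4))": ((1, 3), (2, 4)),
    "((1,4),(2,3))": ((1, 4), (2, 3)),
}
INFORMATIVE = [4, 6, 8]  # 0-based indices of sites 5, 7 and 9
ALL_SITES = range(9)

for name, tree in TREES.items():
    print(name, tree_length(tree, INFORMATIVE), tree_length(tree, ALL_SITES))
```

On the informative sites the three trees need 4, 5 and 6 changes, so ((1,2),(3,4)) is the most parsimonious. Adding the uninformative sites raises every total by the same 6 steps, giving 10, 11 and 12.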
Comparison of different tree-building methods
• Efficiency (how fast is the method?)
• Power (how much data does the method need to produce a reasonable result?)
• Consistency (will it converge on the right answer given enough data?)
• Robustness (will minor violations of the method’s assumptions result in poor estimates of phylogeny?)
• Falsifiability (will the method tell us when its assumptions are violated, so that we know not to use it?)
![Page 12: Lecture 24](https://reader036.fdocuments.in/reader036/viewer/2022062809/56815935550346895dc66e77/html5/thumbnails/12.jpg)
Performance of UPGMA and parsimony methods
Panels: UPGMA and parsimony.
The success rate is the percentage of times that the correct tree was recovered in that region of the parameter space. The white area at the top left of both diagrams marks the region where neither method performs well.
MEGA 3
MEGA 3: Sequence Data Explorer
(Screenshot; callouts mark the variable sites, the parsimony-informative sites, and where the sequences continue.)
MEGA 3: phylogenetic trees
Panels: Neighbor-joining (NJ), Minimum evolution (ME), Maximum parsimony (MP), UPGMA.
Bootstrapping
Panels: NJ, ME, MP, UPGMA.
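The bootstrap slide shows only MEGA's output. The idea behind it can be sketched minimally as follows; this is not MEGA's implementation. Alignment columns are resampled with replacement, a tree is rebuilt from each pseudo-replicate, and the sketch reports how often a grouping reappears. To stay short, the sketch assumes the four sequences of the idealised distance example. It also replaces full tree building with a four-point choice: the split with the smallest sum of pair distances wins, the same comparison used on the neighbour-joining slide. The function names, the 1000 replicates and the seed are arbitrary choices.

```python
import random
from collections import Counter

# The four aligned sequences of the idealised distance example.
SEQS = {
    "A": "ACGCGTTGGGCGATGGCAAC",
    "B": "ACGCGTTGGGCGACGGTAAT",
    "C": "ACGCATTGAATGATGATAAT",
    "D": "ACACATTGAGTGATAATAAT",
}


def differences(s1, s2):
    return sum(a != b for a, b in zip(s1, s2))


def best_split(aln):
    """Quartet split with the smallest sum of within-pair distances.

    Returns None if the smallest sum is tied, i.e. the data do not resolve the tree.
    """
    def d(x, y):
        return differences(aln[x], aln[y])

    sums = {
        "AB|CD": d("A", "B") + d("C", "D"),
        "AC|BD": d("A", "C") + d("B", "D"),
        "AD|BC": d("A", "D") + d("B", "C"),
    }
    ranked = sorted(sums.items(), key=lambda kv: kv[1])
    if ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def bootstrap(aln, replicates=1000, seed=1):
    """Count how often each split is recovered from resampled alignments."""
    rng = random.Random(seed)
    n_sites = len(next(iter(aln.values())))
    counts = Counter()
    for _ in range(replicates):
        # Draw alignment columns with replacement; the same columns for every sequence.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        pseudo = {name: "".join(seq[c] for c in cols) for name, seq in aln.items()}
        counts[best_split(pseudo)] += 1
    return counts


counts = bootstrap(SEQS)
print("tree from the original data:", best_split(SEQS))
print("bootstrap support for AB|CD:", counts["AB|CD"] / 1000)
```

The support value is the fraction of pseudo-replicates that recover the grouping. Tied replicates are counted under `None` rather than credited to any split.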