Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.
![Page 1: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/1.jpg)
Building Phylogenies
Distance-Based Methods
![Page 2: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/2.jpg)
Methods
• Distance-based
• Parsimony
• Maximum likelihood
![Page 3: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/3.jpg)
Distance Matrices
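|   | a | b | c | d |
|---|---|---|---|---|
| a | 0 |   |   |   |
| b | 6 | 0 |   |   |
| c | 7 | 3 | 0 |   |
| d | 14 | 10 | 9 | 0 |

[Figure: tree with leaves a, b, c, d drawn against a scale from 0 to 8]

A distance matrix is additive if there is a tree that fits it exactly, that is, a tree with edge lengths in which the path length between every pair of leaves equals the corresponding matrix entry.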
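One way to test additivity without building the tree is the four-point condition, a standard characterization that the slide itself does not name: for every four taxa, the two largest of the three pairwise sums must be equal. The sketch below assumes a symmetric matrix with a zero diagonal; the function name `is_additive` and the tolerance are illustrative choices.

```python
from itertools import combinations

def is_additive(D, tol=1e-9):
    """Four-point condition on a symmetric distance matrix D (list of lists)."""
    for i, j, k, l in combinations(range(len(D)), 4):
        # The three ways of splitting four taxa into two pairs.
        sums = sorted([D[i][j] + D[k][l],
                       D[i][k] + D[j][l],
                       D[i][l] + D[j][k]])
        # Additive matrices have the two largest sums equal.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# The matrix from the slide, written out in full symmetric form.
D = [[0, 6, 7, 14],
     [6, 0, 3, 10],
     [7, 3, 0, 9],
     [14, 10, 9, 0]]
print(is_additive(D))  # True: the sums are 15, 17, 17
```

For the slide's matrix the smallest sum pairs a with b and c with d, which is the grouping a fitting tree must have.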
![Page 4: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/4.jpg)
Ultrametric Matrices
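|   | a | b | c | d |
|---|---|---|---|---|
| a | 0 |   |   |   |
| b | 2 | 0 |   |   |
| c | 6 | 6 | 0 |   |
| d | 10 | 10 | 10 | 0 |

[Figure: rooted tree with leaves a, b, c, d drawn against a scale from 0 to 5]

Ultrametric = additive + molecular clock assumption: every leaf is at the same distance from the root.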
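A common way to check ultrametricity is the three-point condition: in every triple of taxa, the two largest of the three distances are equal. This is a standard test rather than something stated on the slide; the name `is_ultrametric` and the tolerance are illustrative.

```python
from itertools import combinations

def is_ultrametric(D, tol=1e-9):
    """Three-point condition on a symmetric distance matrix D."""
    for i, j, k in combinations(range(len(D)), 3):
        small, mid, large = sorted([D[i][j], D[i][k], D[j][k]])
        # The two largest distances in every triple must tie.
        if abs(large - mid) > tol:
            return False
    return True

# Ultrametric matrix from this slide.
U = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 10],
     [10, 10, 10, 0]]

# Additive (but not ultrametric) matrix from the previous slide.
A = [[0, 6, 7, 14],
     [6, 0, 3, 10],
     [7, 3, 0, 9],
     [14, 10, 9, 0]]

print(is_ultrametric(U), is_ultrametric(A))
```

The second matrix fails on the triple a, b, c, whose distances 3, 6, 7 have no tie at the top.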
![Page 5: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/5.jpg)
Methods
• Fitch-Margoliash
• UPGMA
• Neighbor-joining
• Many others
![Page 6: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/6.jpg)
Least squares trees
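• Minimize

$$Q = \sum_{i} \sum_{j \ne i} w_{ij}\,(D_{ij} - d_{ij})^2$$

over all trees, where $D_{ij}$ is the observed distance between i and j and $d_{ij}$ is the distance between them in the tree.
• Choice of weights $w_{ij}$:
– Uniform: $w_{ij} = 1$
– Fitch-Margoliash: $w_{ij} = 1/D_{ij}^2$
– Others . . .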
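As a small illustration of the criterion, the sketch below evaluates Q for one candidate set of tree distances. It assumes Q is the weighted sum of squared differences between observed distances D and tree distances d, taken over ordered pairs i ≠ j (summing over i < j instead would halve Q without changing which tree is best). The function name, the `weights` argument, and the perturbed distances are hypothetical.

```python
def least_squares_q(D, d, weights="uniform"):
    """Weighted least-squares misfit between observed D and tree distances d."""
    n = len(D)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if weights == "uniform":
                w = 1.0
            elif weights == "fitch-margoliash":
                w = 1.0 / D[i][j] ** 2
            else:
                raise ValueError("unknown weighting: " + weights)
            q += w * (D[i][j] - d[i][j]) ** 2
    return q

# Observed distances: the additive matrix from the earlier slide.
D = [[0, 6, 7, 14],
     [6, 0, 3, 10],
     [7, 3, 0, 9],
     [14, 10, 9, 0]]

# Hypothetical tree distances that miss only the a-b entry by 1.
d = [row[:] for row in D]
d[0][1] = d[1][0] = 5

print(least_squares_q(D, D))                      # a perfect fit gives 0
print(least_squares_q(D, d))                      # uniform weights
print(least_squares_q(D, d, "fitch-margoliash"))  # error scaled by 1/D^2
```

The Fitch-Margoliash weighting penalizes the same absolute error less when the observed distance is large, which matches the idea that large distances are measured less precisely.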
![Page 7: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/7.jpg)
Sarich's (1969) immunological distances
![Page 8: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/8.jpg)
Least squares tree for Sarich’s data
![Page 9: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/9.jpg)
Clustering Methods
• E.g., UPGMA and Neighbor-Joining
• A cluster is a set of taxa
• Interspecies distances translate into intercluster distances
• Clusters are repeatedly merged
– “Closest” clusters merged first
– Distances are recomputed after merging
![Page 10: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/10.jpg)
UPGMA
• Unweighted pair group method using arithmetic averages
• The distance between clusters $C_i$ and $C_j$ is

$$D_{ij} = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i,\; q \in C_j} D_{pq}$$

• After merging $C_i$ and $C_j$ to create cluster $C_k$, define the distance from k to every other cluster r as

$$D_{kr} = \frac{D_{ir}\,|C_i| + D_{jr}\,|C_j|}{|C_i| + |C_j|}$$
![Page 11: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/11.jpg)
UPGMA: Initialization
1. Assign each sequence i to its own cluster $C_i$
2. Define one leaf (tip) of the tree for each sequence and place it at height 0
![Page 12: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/12.jpg)
UPGMA: Iteration
Repeat until only two clusters remain:
1. Choose the two clusters i and j with smallest $D_{ij}$
2. Create a new cluster k, where $C_k = C_i \cup C_j$
3. Compute $D_{kr}$ for all r.
4. Define a new node k with children i and j, and place it at height $D_{ij}/2$.
5. Add k to the current clusters and delete i and j.

Let i and j be the remaining clusters. Place the root at height $D_{ij}/2$.
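The initialization and iteration steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the nested-tuple tree representation `(left, right, height)` and the function name `upgma` are choices made here, and the final merge of the last two clusters is handled by the same loop that does the earlier merges, which places the root at half the last remaining distance.

```python
def upgma(D, names):
    """UPGMA on a symmetric matrix D; returns nested (left, right, height) tuples."""
    n = len(names)
    # Cluster id -> (subtree, number of leaves). Leaves sit at height 0.
    clusters = {i: (names[i], 1) for i in range(n)}
    dist = {(i, j): float(D[i][j]) for i in range(n) for j in range(i + 1, n)}

    def get(a, b):
        return dist[(min(a, b), max(a, b))]

    next_id = n
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)      # closest pair of clusters
        height = dist[(i, j)] / 2           # new node goes at height D_ij / 2
        (ti, ni), (tj, nj) = clusters[i], clusters[j]
        k = next_id
        next_id += 1
        # Size-weighted average distance from the merged cluster to each other r.
        for r in clusters:
            if r not in (i, j):
                dist[(r, k)] = (get(i, r) * ni + get(j, r) * nj) / (ni + nj)
        # Drop every distance that involves the two merged clusters.
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        del clusters[i], clusters[j]
        clusters[k] = ((ti, tj, height), ni + nj)
    tree, _ = next(iter(clusters.values()))
    return tree

# Ultrametric matrix from the earlier slide.
U = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 10],
     [10, 10, 10, 0]]
print(upgma(U, ["a", "b", "c", "d"]))
```

On this matrix a and b are joined at height 1, c joins them at height 3, and the root joining d sits at height 5, which agrees with the 0 to 5 scale of the ultrametric figure.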
![Page 13: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/13.jpg)
UPGMA Example
![Page 14: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/14.jpg)
![Page 15: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/15.jpg)
![Page 16: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/16.jpg)
![Page 17: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/17.jpg)
UPGMA tree for Sarich’s data
![Page 18: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/18.jpg)
A pitfall of UPGMA
• The algorithm produces an ultrametric tree: the distance from the root to any leaf is the same
• UPGMA assumes a constant molecular clock: all species accumulate mutations (evolve) at the same rate.
![Page 19: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/19.jpg)
UPGMA fails when the molecular clock assumption doesn’t hold
![Page 20: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/20.jpg)
Neighbor Joining
• Saitou and Nei, Molecular Biology and Evolution 4 (1987)
• Idea: Find a pair of leaves that are close to each other but far from other leaves
– Implicitly finds a pair of neighboring leaves
• Advantages:
– Works well for additive matrices and also for nonadditive ones
– Does not have the molecular clock assumption
![Page 21: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/21.jpg)
Long branches must be handled carefully!
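[Figure: four-leaf tree with three branches of length 0.1 and two of length 0.4; leaf labels not recoverable]

Two of the leaves are closer to each other than to either of the other two. The obvious approach of merging the closest pair first produces incorrect clusters!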
![Page 22: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/22.jpg)
Compensating for long edges
Introduce “correction terms”

$$u_i = \frac{1}{n-2} \sum_{j \ne i} D_{ij}$$

($u_i$: average distance to other taxa)

“Corrected” distances:

$$D'_{ij} = D_{ij} - u_i - u_j$$

Distances are reduced for pairs that are far away from all other species: they may be close to each other.
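The effect of the correction can be seen on a four-leaf example with the branch lengths from the long-branch slide. The sketch assumes the correction term u_i is the sum of distances from i divided by n − 2, and the corrected distance is D_ij − u_i − u_j, as used in the neighbor-joining selection step. The leaf names A to D and their arrangement on the tree are assumptions, since the original figure's labels did not survive.

```python
from itertools import combinations

def correction_terms(D):
    """u_i = (sum of distances from i to all other taxa) / (n - 2); needs n > 2."""
    n = len(D)
    return [sum(D[i][j] for j in range(n) if j != i) / (n - 2) for i in range(n)]

def corrected_distances(D):
    """Map each pair (i, j), i < j, to D_ij - u_i - u_j."""
    u = correction_terms(D)
    return {(i, j): D[i][j] - u[i] - u[j]
            for i, j in combinations(range(len(D)), 2)}

# Assumed tree ((A,B),(C,D)): A and C on branches of length 0.1,
# B and D on branches of length 0.4, internal branch of length 0.1.
# Index order: A=0, B=1, C=2, D=3.
D = [[0.0, 0.5, 0.3, 0.6],
     [0.5, 0.0, 0.6, 0.9],
     [0.3, 0.6, 0.0, 0.5],
     [0.6, 0.9, 0.5, 0.0]]

raw_closest = min(combinations(range(4), 2), key=lambda p: D[p[0]][p[1]])
corrected = corrected_distances(D)
nj_choice = min(corrected, key=corrected.get)

print("closest raw pair:", raw_closest)       # A and C, which are not neighbors
print("closest corrected pair:", nj_choice)   # one of the true neighbor pairs
```

Under these assumptions the raw minimum pairs the two short-branch leaves from opposite sides of the tree, while the corrected distances are smallest for the true neighbor pairs (A,B) and (C,D).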
![Page 23: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/23.jpg)
Neighbor-joining
Repeat the following until only two leaves remain:
1. Choose i, j such that $D_{ij} - u_i - u_j$ is minimum
2. Define a new leaf k whose distances to i and j are

$$d_{ik} = \tfrac{1}{2} D_{ij} + \tfrac{1}{2}(u_i - u_j)$$

$$d_{jk} = \tfrac{1}{2} D_{ij} + \tfrac{1}{2}(u_j - u_i)$$

3. Compute the distance from k to every other leaf r

$$D_{kr} = \tfrac{1}{2}\,(D_{ir} + D_{jr} - D_{ij})$$

4. Delete i and j

Connect the 2 remaining leaves by a branch of length $D_{ij}$
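The neighbor-joining steps can be sketched as follows. This is an illustrative implementation under a few assumptions: the correction terms u_i are recomputed in every round with the current number of leaves n, the tree is returned as a plain list of `(node, node, length)` edges, and the internal node names `node0`, `node1`, ... are hypothetical labels that must not clash with the taxon names.

```python
def neighbor_joining(D, names):
    """Neighbor-joining; returns the unrooted tree as (node, node, length) edges."""
    nodes = list(names)
    dist = {a: {b: float(D[i][j]) for j, b in enumerate(names)}
            for i, a in enumerate(names)}
    edges = []
    new_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Correction terms for the current set of leaves.
        u = {a: sum(dist[a][b] for b in nodes if b != a) / (n - 2) for a in nodes}
        # Step 1: pair minimizing D_ij - u_i - u_j.
        pairs = [(a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]]
        i, j = min(pairs, key=lambda p: dist[p[0]][p[1]] - u[p[0]] - u[p[1]])
        # Step 2: new leaf k and its branch lengths to i and j.
        k = "node" + str(new_id)
        new_id += 1
        d_ik = 0.5 * dist[i][j] + 0.5 * (u[i] - u[j])
        d_jk = 0.5 * dist[i][j] + 0.5 * (u[j] - u[i])
        edges.append((i, k, d_ik))
        edges.append((j, k, d_jk))
        # Step 3: distance from k to every other leaf r.
        dist[k] = {k: 0.0}
        for r in nodes:
            if r not in (i, j):
                d_kr = 0.5 * (dist[i][r] + dist[j][r] - dist[i][j])
                dist[k][r] = dist[r][k] = d_kr
        # Step 4: delete i and j; k takes their place.
        nodes = [r for r in nodes if r not in (i, j)] + [k]
    i, j = nodes
    edges.append((i, j, dist[i][j]))  # connect the last two leaves
    return edges

# Additive matrix from the earlier slide.
D = [[0, 6, 7, 14],
     [6, 0, 3, 10],
     [7, 3, 0, 9],
     [14, 10, 9, 0]]
for edge in neighbor_joining(D, ["a", "b", "c", "d"]):
    print(edge)
```

On the additive example matrix the branch lengths come out as 5, 1, 1 and 8 for the four leaves plus an internal branch of length 1, so the path lengths in the tree reproduce the matrix exactly. Several candidate pairs tie in this example, so which pair is joined first depends on iteration order, but the resulting unrooted tree is the same.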
![Page 24: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/24.jpg)
NJ tree for Sarich’s data
![Page 25: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/25.jpg)
Computing distance matrices
• Based on sequence alignment
• Various possibilities:
– Distance = average number of differences
– Try different PAM matrices; distance = index of matrix that gives highest score
– Feng and Doolittle: Based on alignment scores, roughly the ratio to the maximum possible score (see text)
• Read, e.g., PHYLIP documentation: http://evolution.genetics.washington.edu/phylip/general.html
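The first option can be sketched directly, reading "average number of differences" as the fraction of aligned positions at which two sequences differ. Skipping columns that contain a gap character is an assumption made here, not something the slide specifies, and the sequences below are made-up examples.

```python
def p_distance(s, t, gap="-"):
    """Fraction of aligned, ungapped positions at which s and t differ."""
    if len(s) != len(t):
        raise ValueError("sequences must come from the same alignment")
    # Assumption: columns with a gap in either sequence are ignored.
    columns = [(a, b) for a, b in zip(s, t) if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in columns) / len(columns)

def distance_matrix(aligned):
    """Full symmetric matrix of pairwise p-distances."""
    return [[p_distance(s, t) for t in aligned] for s in aligned]

# Hypothetical aligned sequences, for illustration only.
aligned = ["ACGTACGT",
           "ACGTACGA",
           "ACG-TCGA"]
for row in distance_matrix(aligned):
    print(row)
```

A matrix built this way can be passed to any of the distance-based methods above.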
![Page 26: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/26.jpg)
Distance correction
• The amount of evolutionary change is not linearly related to time
• Over a long period of time, a series of substitutions may bring us back to where we started
• Percentage difference may underestimate evolutionary time
![Page 27: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/27.jpg)
Jukes-Cantor Model
![Page 28: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/28.jpg)
Correcting for multiple substitutions in the JC model
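$$t = -\frac{3}{4} \ln\left(1 - \frac{4}{3}\,d\right)$$

where d is the observed fraction of sites that differ and t is the evolutionary time, measured in expected substitutions per site.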
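A short numerical sketch of the correction, assuming the standard Jukes-Cantor formula in which the corrected distance is −(3/4) ln(1 − (4/3) d) for an observed fraction d of differing sites. The function name is illustrative; the formula is only defined for d below 0.75, the level of difference expected between unrelated sequences under this model.

```python
import math

def jukes_cantor(d):
    """Jukes-Cantor corrected distance for an observed fraction d of differing sites."""
    if not 0.0 <= d < 0.75:
        raise ValueError("observed difference must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * d / 3.0)

for d in (0.05, 0.30, 0.60):
    print(d, round(jukes_cantor(d), 4))
```

The corrected value is always at least as large as the observed difference, and the gap widens as the difference grows, which is the underestimation the distance-correction slide describes.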
![Page 29: Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.](https://reader030.fdocuments.in/reader030/viewer/2022033100/56649d765503460f94a58281/html5/thumbnails/29.jpg)
Many other models!