Phylogenetic analysis in the context of multigene sequences Sudhindra R. Gadagkar University of...
![Page 1: Phylogenetic analysis in the context of multigene sequences Sudhindra R. Gadagkar University of Dayton.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ca25503460f949617b1/html5/thumbnails/1.jpg)
Phylogenetic analysis in the context of multigene sequences
Sudhindra R. Gadagkar
University of Dayton
DNA Evolution
The unifying force of all life on earth is DNA
Adenine, Cytosine, Guanine, Thymine
ATGGCATACGTGCAGTTCATCGGCTAGTGTGACATGA
DNA sequence evolution
[Figure: sequences at times t0 and t1]
ATGGCATACGTGCA
ATGGTATAGGTGCA
ATGGCATACGTGAA
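The sequences above differ at only a few sites. A minimal sketch, assuming the three sequences are already aligned, counts the differing sites and divides by the sequence length (the uncorrected p-distance). The function name `p_distance` and the labels seq1 to seq3 are illustrative.

```python
from itertools import combinations

SEQS = {
    "seq1": "ATGGCATACGTGCA",
    "seq2": "ATGGTATAGGTGCA",
    "seq3": "ATGGCATACGTGAA",
}

def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(x != y for x, y in zip(a, b))
    return diffs / len(a)

if __name__ == "__main__":
    # Print every pairwise p-distance.
    for (n1, s1), (n2, s2) in combinations(SEQS.items(), 2):
        print(n1, n2, round(p_distance(s1, s2), 3))
```

Here seq1 and seq2 differ at 2 of 14 sites, seq1 and seq3 at 1, and seq2 and seq3 at 3.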
A phylogenetic tree
A pattern of branching events, with each branching point showing a speciation (or divergence) event.
[Figure: a tree of Taxon A, Taxon B and Taxon C, with branch lengths of 3.5, 3.5, 7.5 and 4]
• Nodes (extinct ancestors)
• Tips (living species)
• Branches (amount of evolution)
• Taxon (pl. Taxa)
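Because branch lengths measure the amount of evolution, the distance between two tips is the sum of the branch lengths on the path joining them. A minimal sketch, assuming (this placement is not explicit in the figure) that the 4-unit branch joins the common ancestor of Taxon A and Taxon B to the root; the node names `anc_AB` and `root` are hypothetical.

```python
# Hypothetical encoding of the three-taxon tree: child -> (parent, branch length).
PARENT = {
    "Taxon A": ("anc_AB", 3.5),
    "Taxon B": ("anc_AB", 3.5),
    "anc_AB": ("root", 4.0),
    "Taxon C": ("root", 7.5),
}

def distances_to_ancestors(node):
    """Map `node` and each of its ancestors to the path length from `node`."""
    dist = {node: 0.0}
    total = 0.0
    while node in PARENT:
        node, length = PARENT[node]
        total += length
        dist[node] = total
    return dist

def tip_distance(a, b):
    """Path length between two nodes, via their most recent common ancestor."""
    up_a = distances_to_ancestors(a)
    total = 0.0
    node = b
    while node not in up_a:
        node, length = PARENT[node]
        total += length
    return total + up_a[node]

if __name__ == "__main__":
    print(tip_distance("Taxon A", "Taxon B"))  # 3.5 + 3.5
    print(tip_distance("Taxon A", "Taxon C"))  # 3.5 + 4 + 7.5
```

Under this reading every tip lies 7.5 units from the root, so the tree is ultrametric.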
What is phylogenetic inference?
• Reconstruction of the evolutionary relationships among “taxa”.
• Representation in a graphical form.
[Figure: a phylogeny shown in three drawings (scale bar 0.05) for M Fin Whale, M Blue Whale, M Cow, M Rat, M Mouse, M Opossum, B Chicken, A Xenopus, F Rainbow Trout, F Loach, F Carp, L Lamprey and S Sea urchin]
Parts of a tree
• Tree size: number of taxa in the phylogeny.
• Interior branch: partitions an unrooted tree into 2 subtrees, each containing at least 2 taxa.
• Cluster size: the smaller of the two subtree sizes produced by an interior branch.
• Depth of a branch: defined in terms of the number of taxa clustered by it.
[Figure: a tree of F. Whale, B. Whale, Cow, Rat, Mouse and Opossum, with the root, an internal branch, an external branch, a node and the outgroup labeled]
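The cluster size of an interior branch can be computed directly from the split it defines. A minimal sketch, assuming each interior branch is given as the set of taxa on one side; the example splits are hypothetical and are not read off the figure.

```python
TAXA = ["F. Whale", "B. Whale", "Cow", "Rat", "Mouse", "Opossum"]

def cluster_size(split, taxa=TAXA):
    """Smaller of the two subtree sizes produced by the interior branch defining `split`."""
    inside = set(split)
    if not inside <= set(taxa):
        raise ValueError("split contains unknown taxa")
    outside = set(taxa) - inside
    # An interior branch must leave at least 2 taxa on each side.
    if len(inside) < 2 or len(outside) < 2:
        raise ValueError("not an interior branch")
    return min(len(inside), len(outside))

if __name__ == "__main__":
    print(cluster_size({"F. Whale", "B. Whale"}))         # 2 versus 4 taxa
    print(cluster_size({"F. Whale", "B. Whale", "Cow"}))  # 3 versus 3 taxa
```

Taking the minimum makes the measure independent of which side of the branch is listed.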
Example of a 6-sequence tree
[Figure: the tree of F. Whale, B. Whale, Cow, Rat, Mouse and Opossum, drawn as a rooted tree and as an unrooted tree]
Phylogenetic analysis using DNA sequences
[Figure: sequences at times t0 and t1]
ATGGCATACGTGCA
ATGGTATAGGTGCA
ATGGCATACGTGAA
Gene Sequences
Homologous (orthologous) gene sequences
• D. melanogaster ATGTCGTTGACCAACAAGAACGTGATTTTCGTGGCCGGTCT...
• D. pseudoobscura ATGTCTCTCACCAACAAGAACGTCGTTTTCGTGGCCGGTCT...
• D. crassifemur ATGTTCATCGCTGGCAAGAACATCATCTTTGTCGCTGGTCT...
• D. mulleri ATGGCCATCGCTAACAAGAACATCATCTTCGTCGCTGGACT...
Distance Matrix
[ D.me D.ps D.cr D.mu]
[D.me]
[D.ps] 0.14
[D.cr] 0.24 0.24
[D.mu] 0.21 0.20 0.21
[Figure: tree of D. melanogaster, D. pseudoobscura, D. mulleri and D. crassifemur]
[Figure: two trees of F. Whale, B. Whale, Cow, Rat, Mouse and Opossum, in panels titled "Expected or species tree" and "Realized tree for gene X"]
Two-fold Challenge
• Today’s challenge is the flood of data, in two ways:
1. The increasing number of taxa (say, species) for which molecular data is available.
2. The increasing amount of molecular data that is available for each taxon.
Why is reconstructing the evolutionary history of a large number of taxa a challenge?
The number of possible trees increases enormously as the number of taxa increases.
Number of rooted trees
• The number of bifurcating rooted trees is given by the following formula, where m is the number of taxa:
$$1 \times 3 \times 5 \times \cdots \times (2m-3) = \frac{(2m-3)!}{2^{m-2}\,(m-2)!}$$
Source: Nei and Kumar, 2000. Molecular Evolution and Phylogenetics
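A short sketch of the formula, evaluated exactly as a product of odd numbers and, for large m, in log10 form via `math.lgamma` so that astronomically large counts stay manageable. The function names are illustrative.

```python
import math

def n_rooted_trees(m: int) -> int:
    """Number of bifurcating rooted trees for m taxa: 1 * 3 * 5 * ... * (2m - 3)."""
    if m < 2:
        raise ValueError("need at least two taxa")
    count = 1
    for k in range(3, 2 * m - 2, 2):  # odd numbers 3, 5, ..., 2m - 3
        count *= k
    return count

def log10_rooted_trees(m: int) -> float:
    """log10 of the same count from (2m-3)! / (2^(m-2) (m-2)!), using Gamma(n + 1) = n!."""
    ln = math.lgamma(2 * m - 2) - (m - 2) * math.log(2) - math.lgamma(m - 1)
    return ln / math.log(10)

if __name__ == "__main__":
    for m in (3, 4, 5, 10):
        print(m, n_rooted_trees(m))
    print(round(log10_rooted_trees(100), 1))
```

For example, 3, 4 and 5 taxa give 3, 15 and 105 rooted trees, and 10 taxa already give 34,459,425.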
3 taxa
Source: Brian Golding, Reconstructing Phylogenies, http://helix.biology.mcmaster.ca/721/phylo/phylo.html
4 taxa
Source: Brian Golding, Reconstructing Phylogenies, http://helix.biology.mcmaster.ca/721/phylo/phylo.html
More taxa
Source: Brian Golding, Reconstructing Phylogenies, http://helix.biology.mcmaster.ca/721/phylo/phylo.html
So many trees!
[Figure: number of possible trees versus number of sequences (0 to 400), with values reaching 10^200]
For comparison:
• 10^79 atoms in the universe
• 10^37 atoms in the bodies of all humans by the year 2035
• 5 × 10^30 prokaryotes living today
• 5 × 10^11 stars in the Milky Way
How many trees represent the true relationship?
Only ONE out of all possible trees is the true tree!
Which is the true tree?
Choose a criterion (optimality criterion).
Score the fit of the data to a given tree under that criterion.
The tree with the optimal score is chosen as the best tree.
The optimal tree found in this way is expected to be the closest to the true tree.
Optimality Criteria
Minimum Evolution (ME)
Branch lengths are computed for each tree using pairwise distances obtained from the sequences. The sum of branch lengths (S) is used as the optimality score.
[Figure: ME workflow with boxes for Data, Substitution Model, Distance Computer, Topology, Branch lengths Computer, and Sum of branch lengths]
The tree with the smallest S-value is chosen.
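For four taxa there are only three unrooted topologies, so the ME criterion can be applied by scoring all of them. A minimal sketch, assuming ordinary least-squares branch lengths (the slide does not say how branch lengths are fitted), for which the tree length of topology ij|kl has the closed form S = (d_ij + d_kl)/2 + (d_ik + d_il + d_jk + d_jl)/4. The distances are those of the Drosophila distance matrix; the function names are illustrative.

```python
# Pairwise distances from the Drosophila distance matrix.
D = {
    frozenset({"D.me", "D.ps"}): 0.14,
    frozenset({"D.me", "D.cr"}): 0.24,
    frozenset({"D.ps", "D.cr"}): 0.24,
    frozenset({"D.me", "D.mu"}): 0.21,
    frozenset({"D.ps", "D.mu"}): 0.20,
    frozenset({"D.cr", "D.mu"}): 0.21,
}
TAXA = ["D.me", "D.ps", "D.cr", "D.mu"]

def dist(a, b):
    return D[frozenset({a, b})]

def quartet_topologies(taxa):
    """Yield the three unrooted topologies ij|kl for four taxa."""
    first, *rest = taxa
    for partner in rest:
        others = [t for t in rest if t != partner]
        yield (first, partner), tuple(others)

def ols_tree_length(split):
    """Sum of OLS branch lengths (S) for the quartet topology ij|kl."""
    (i, j), (k, l) = split
    within = dist(i, j) + dist(k, l)
    across = dist(i, k) + dist(i, l) + dist(j, k) + dist(j, l)
    return within / 2 + across / 4

def minimum_evolution(taxa):
    """Score every topology and return the one with the smallest S."""
    scores = {split: ols_tree_length(split) for split in quartet_topologies(taxa)}
    best = min(scores, key=scores.get)
    return best, scores

if __name__ == "__main__":
    best, scores = minimum_evolution(TAXA)
    for split, s in scores.items():
        print(split, round(s, 4))
    print("ME tree:", best)
```

The split grouping D. melanogaster with D. pseudoobscura has the smallest S (0.3975, against 0.42 and 0.4225 for the alternatives).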
The Neighbor-Joining method (Saitou and Nei, Mol. Biol. Evol. 4: 406-425, 1987)
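A compact sketch of neighbor joining as it is commonly described (choose the pair minimizing the Q-criterion, compute its two branch lengths, replace the pair by a new internal node), applied to the Drosophila distances. It is an illustration, not the implementation used in the studies cited here; internal node names such as `node1` are hypothetical.

```python
from itertools import combinations

TAXA = ["D.me", "D.ps", "D.cr", "D.mu"]
DROSOPHILA = {
    frozenset({"D.me", "D.ps"}): 0.14,
    frozenset({"D.me", "D.cr"}): 0.24,
    frozenset({"D.ps", "D.cr"}): 0.24,
    frozenset({"D.me", "D.mu"}): 0.21,
    frozenset({"D.ps", "D.mu"}): 0.20,
    frozenset({"D.cr", "D.mu"}): 0.21,
}

def neighbor_joining(taxa, dist):
    """Return the joins as (i, j, new_node, len_i, len_j) plus the final edge (a, b, length)."""
    d = dict(dist)
    nodes = list(taxa)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence r_i: sum of distances from i to every other active node.
        r = {i: sum(d[frozenset({i, k})] for k in nodes if k != i) for i in nodes}
        # Q-criterion: join the pair minimizing (n - 2) d_ij - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset({i, j})]
        len_i = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        len_j = dij - len_i
        u = f"node{len(joins) + 1}"
        for k in nodes:
            if k not in (i, j):
                # Distance from the new internal node to each remaining node.
                d[frozenset({u, k})] = (d[frozenset({i, k})] + d[frozenset({j, k})] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j, u, len_i, len_j))
    a, b = nodes
    return joins, (a, b, d[frozenset({a, b})])

if __name__ == "__main__":
    joins, last = neighbor_joining(TAXA, DROSOPHILA)
    for step in joins:
        print(step)
    print(last)
```

For four taxa the Q-criterion always ties the two complementary pairs (here D.me with D.ps, and D.cr with D.mu), so either first join yields the same unrooted tree, with a total length of 0.3975.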
Properties of the NJ method
• Computationally efficient
• Desirable statistical properties
• Accuracy
• Performance with large phylogenies?
Research Problem
Performance of NJ in inferring large trees:
• Is performance worse with more sequences?
• Is it more difficult to infer deep branches than shallow ones?
• Are branches at similar depths reconstructed with the same efficiency in large and small trees?
Model trees and their features
• 4 basic 6-taxon trees (topologies)
• Equal interior branch lengths
• Trees stacked to make larger trees (e.g., Dx = x trees of type D stacked)
[Figure: the basic model trees A, B, C and D, and larger trees (E, F, G) built by stacking copies of the basic trees]
Kumar and Gadagkar, 2000, J. Mol. Evol., 51:544-553 (Fig. 1)
Additional model topology - the rbcL tree
(From Hillis, Nature, 383:130-131, 1996)
Tree parameters
• Rate: up to 10-fold differences in rate.
• Sequence length: multiples of 100 sites, up to 1000 sites (10 × 100).
• Tree size: Ax, Bx, Cx, Dx, where x varied from 1 to 10, 16, and 32 (i.e., 6x taxa).
Simulating Evolutionary Change
• Starting point or “root” chosen.
• Random ancestral sequence generated for the root.
• Branch length randomly obtained from a Poisson distribution with mean = expected no. of substitutions (evolutionary rate × sequence length multiplier).
[Figure: model tree with numbered branch lengths]
Simulating Evolutionary Change (contd.)
• Equal probability of change from one nucleotide state to any other.
• Process carried out for all branches.
• Resulting data are sequences for the taxa for that “gene”.
• These sequences are used to infer the evolutionary relationships back, using NJ.
• 1000 replications (A to D trees; ≤ 60 taxa), 100 replications (> 60 taxa, rbcL tree).
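The simulation procedure on these two slides can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' program: the tree, its branch-length multipliers and the rate are hypothetical; the Poisson mean is read as rate × sequence length × the branch's length in the model tree (an assumption about the slide's wording); and each substitution changes a random site to one of the other three bases with equal probability.

```python
import math
import random

BASES = "ACGT"

# Hypothetical model tree: parent -> list of (child, branch-length multiplier).
TREE = {
    "root": [("anc1", 1), ("C", 8)],
    "anc1": [("A", 7), ("B", 7)],
}

def poisson(mean, rng):
    """Knuth's multiplication method; adequate only for modest means."""
    threshold = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def mutate(seq, n_subs, rng):
    """Apply n_subs substitutions at random sites (a site may be hit more than once)."""
    seq = list(seq)
    for _ in range(n_subs):
        site = rng.randrange(len(seq))
        # Equal probability of changing to any of the other three bases.
        seq[site] = rng.choice([b for b in BASES if b != seq[site]])
    return "".join(seq)

def simulate(node, seq, rate, rng, tree=TREE, tips=None):
    """Evolve `seq` down the tree from `node`; return {tip name: sequence}."""
    tips = {} if tips is None else tips
    children = tree.get(node, [])
    if not children:
        tips[node] = seq
    for child, multiplier in children:
        mean = rate * len(seq) * multiplier
        simulate(child, mutate(seq, poisson(mean, rng), rng), rate, rng, tree, tips)
    return tips

if __name__ == "__main__":
    rng = random.Random(42)
    root_seq = "".join(rng.choice(BASES) for _ in range(200))
    for name, seq in simulate("root", root_seq, rate=0.01, rng=rng).items():
        print(name, seq[:30])
```

The tip sequences returned here are what would then be handed to NJ to infer the tree back.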
Accurate Inference of Complete Trees
[Figure: % recovery of complete trees versus number of sequences (0 to 200) for 200, 500 and 1000 sites]
Kumar and Gadagkar, 2000, J. Mol. Evol., 51:544-553 (Table 1)
Effect of 0-length branches on NJ performance
[Figure: % branches correct versus sequence length (s, 0 to 1000 sites) for P0, PModel and PRealized]
Kumar and Gadagkar, 2000, J. Mol. Evol., 51:544-553 (Fig. 3)
Reconstruction efficiency of 6-taxon monophyletic clusters
[Figure: % correct replicates versus number of sequences (0 to 200) for 200, 500 and 1000 sites]
% branches inferred correctly
[Figure: percent efficiency (PBR) versus tree size (6, 18, 30, 42, 54, 96 and 192 taxa) at rates (r) of 0.00625, 0.03125 and 0.0625, for 100, 200, 500 and 1000 sites]
Kumar and Gadagkar, 2000, J. Mol. Evol., 51:544-553 (Fig. 4)
Branch depth and NJ efficiency
[Figure: pB versus branch depth (2 to 96) for tree sizes of 6, 24, 42, 60 and 192 taxa]
Kumar and Gadagkar, 2000, J. Mol. Evol., 51:544-553 (Fig. 5B)
Shallow versus deep branches
Results from rbcL tree
[Figure: reconstruction efficiency versus branch depth (2 to 74)]
Kumar and Gadagkar, 2000, J. Mol. Evol., 51:544-553 (Fig. 8B)
Branch depth and efficiency for different inference methods (JC simulations)
Rosenberg and Kumar, 2001, Mol. Biol. Evol., 18:1823-1827 (Fig. 1)
Branch depth and efficiency for different inference methods (HKY simulations)
Rosenberg and Kumar, 2001, Mol. Biol. Evol., 18:1823-1827 (Fig. 2)
The Challenge of Multi-Gene Sequences
• Multi-Gene/Whole Genome sequences increasingly available for many taxa.
• How best to obtain phylogenetic information from these multiple sequences?
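Concatenation vs Consensus
[Figure: in the concatenation approach, gene sequences (e.g., ATGCTGACTG and ATGTCGTCAGTC) are joined into one sequence (ATGCTGACTGATGTCGTCAGTC), from which a single tree of taxa A to E is inferred; in the consensus approach, a tree of A to E is inferred from each gene separately and the gene trees are combined into one consensus tree]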
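The two strategies can be sketched side by side. A minimal sketch, assuming aligned per-gene sequences keyed by taxon, and gene trees summarized by their clades (sets of taxa below an internal node). The slide does not name a consensus rule; majority-rule consensus is used here as one common choice, the tree-inference step itself is omitted, and the example clades are hypothetical.

```python
from collections import Counter

def concatenate(genes):
    """Join each taxon's sequences across genes into one long sequence."""
    taxa = genes[0].keys()
    return {taxon: "".join(gene[taxon] for gene in genes) for taxon in taxa}

def majority_rule(gene_clades):
    """Keep the clades found in more than half of the gene trees."""
    counts = Counter(clade for clades in gene_clades for clade in set(clades))
    return {clade for clade, c in counts.items() if c > len(gene_clades) / 2}

if __name__ == "__main__":
    genes = [{"A": "ATGCTGACTG"}, {"A": "ATGTCGTCAGTC"}]
    print(concatenate(genes)["A"])

    f = frozenset
    gene_trees = [
        {f("AB"), f("ABC"), f("DE")},
        {f("AB"), f("DE"), f("CDE")},
        {f("AC"), f("ABC"), f("DE")},
    ]
    print(sorted("".join(sorted(c)) for c in majority_rule(gene_trees)))
```

Concatenation pools the sites before any tree is built, whereas consensus combines trees that were each inferred from one gene.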
The worst-case scenario approach
• The worst-case scenario is when all the available genes yield highly incorrect phylogenetic reconstructions.
• When faced with such sequences, which strategy to employ: consensus or concatenation?
Simulation with estimated parameters
• Model tree based on the phylogenetic relationships among 66 mammals from Murphy et al. (Nature 409:614-618, 2001).
Source: Fig. 1 from Gadagkar, Rosenberg and Kumar, Molecular and Developmental Evolution (Accepted)
Simulation with estimated parameters
• Sequences for 448 genes downloaded from HOVERGEN (Duret et al., Nucleic Acids Res. 22: 2360-2365, 1994).
• Sequence parameters (length, L; substitution rate, r; transition/transversion rate ratio, κ; and G+C content) were estimated from the data.
Simulation with estimated parameters (contd.)
• For each of the 448 genes, 100 replicate sequences generated by computer simulation, using the estimated parameters and the HKY model of evolution.
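A sketch of how the estimated κ and G+C content could parameterize an HKY rate matrix for such simulations. The construction is an assumption, not the authors' code: it derives base frequencies from G+C content alone (π_G = π_C = GC/2, π_A = π_T = (1 - GC)/2), gives transitions (A↔G, C↔T) κ times the rate of transversions, and rescales the matrix to one expected substitution per unit time.

```python
BASES = "ACGT"
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def hky_rate_matrix(kappa, gc):
    """HKY rate matrix Q (rows/columns in ACGT order), scaled to mean rate 1."""
    pi = {"A": (1 - gc) / 2, "C": gc / 2, "G": gc / 2, "T": (1 - gc) / 2}
    q = [[0.0] * 4 for _ in range(4)]
    for a, x in enumerate(BASES):
        for b, y in enumerate(BASES):
            if a != b:
                transition = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
                q[a][b] = pi[y] * (kappa if transition else 1.0)
        # Diagonal makes each row sum to zero.
        q[a][a] = -sum(q[a])
    # Mean substitution rate at equilibrium, used to rescale Q.
    mu = -sum(pi[x] * q[i][i] for i, x in enumerate(BASES))
    return [[v / mu for v in row] for row in q], pi

if __name__ == "__main__":
    Q, pi = hky_rate_matrix(kappa=4.0, gc=0.6)
    for base, row in zip(BASES, Q):
        print(base, [round(v, 3) for v in row])
```

The full HKY model allows four unequal base frequencies; collapsing them to a single G+C parameter matches the list of estimated parameters above but is a simplification.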
Computer Simulation
[Figure: grid of simulated data sets, with one row per gene (Gene1, Gene2, Gene3, ..., Gene448) and one column per replicate (Rep1, Rep2, Rep3, ..., Rep100)]
Simulation with estimated parameters (contd.)
• Phylogenetic inference was done on each of the 44,800 simulation replicates using NJ-JC and NJ-TN methods.
• The accuracy of each tree was recorded in terms of the number of incorrect branches when compared to the model tree.
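Two ingredients of this step can be sketched briefly: the Jukes-Cantor correction behind NJ-JC, and a count of incorrect branches against the model tree. This is a minimal sketch; the Tamura-Nei distance used by NJ-TN is omitted, and trees are assumed to be summarized as sets of clades in a consistent orientation (the example clades are hypothetical).

```python
import math

def jc_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("JC distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)

def incorrect_branches(model_clades, inferred_clades):
    """Number of inferred interior branches (clades) absent from the model tree."""
    return len(set(inferred_clades) - set(model_clades))

if __name__ == "__main__":
    print(round(jc_distance(0.14), 4))
    f = frozenset
    model = {f("AB"), f("ABC"), f("DE")}
    print(incorrect_branches(model, {f("AB"), f("ABD"), f("DE")}))
```

The JC correction always exceeds the raw p-distance for p > 0 because it accounts for multiple substitutions at the same site.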
Spread of sequence attributes
[Figure: four panels showing the distribution of log sequence length, substitution rate per site (× 10^-9), log kappa and G+C content across the genes]
Simulation lets us play God!
• In computer simulation, evolution is simulated based on a model tree, and replicate sequences are obtained.
• These replicate sequences are then used to infer back the true tree.
• Because the true tree is known, each inferred tree can be scored against it; therefore, for the 100 simulation replicates of each of the 448 genes, we know the worst-performing replicate.
Simulation lets us play God!
[Figure: evolution simulated from a starting sequence (“Start”) along the tree of D. melanogaster, D. pseudoobscura, D. mulleri and D. crassifemur]
The Two-Gene Case
• Data: For each of NJ-JC and NJ-TN, we picked
* 10,000 pairs of worst replicates
* 10,000 pairs of randomly chosen replicates
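A minimal sketch of how such pairs could be drawn, assuming an error table errors[gene][replicate] holding the number of incorrect branches and uniform sampling of two distinct genes per pair. The slides do not describe the sampling scheme, and the toy table in the usage example is hypothetical.

```python
import random

def worst_replicate(errors_by_rep):
    """Index of the replicate with the most incorrect branches (first one on ties)."""
    return max(range(len(errors_by_rep)), key=errors_by_rep.__getitem__)

def sample_pairs(errors, n_pairs, rng):
    """Return (worst_pairs, random_pairs); each pair is ((g1, r1), (g2, r2)) for distinct genes."""
    worst = [worst_replicate(row) for row in errors]
    n_genes, n_reps = len(errors), len(errors[0])
    worst_pairs, random_pairs = [], []
    for _ in range(n_pairs):
        g1, g2 = rng.sample(range(n_genes), 2)
        worst_pairs.append(((g1, worst[g1]), (g2, worst[g2])))
        random_pairs.append(((g1, rng.randrange(n_reps)), (g2, rng.randrange(n_reps))))
    return worst_pairs, random_pairs

if __name__ == "__main__":
    # Hypothetical table: 3 genes x 4 replicates of incorrect-branch counts.
    errors = [[3, 7, 7, 1], [0, 2, 9, 4], [5, 5, 5, 5]]
    worst_pairs, random_pairs = sample_pairs(errors, 3, random.Random(0))
    print(worst_pairs)
    print(random_pairs)
```

With a 448 × 100 table, `sample_pairs(errors, 10_000, rng)` would produce one set of 10,000 worst pairs and one of 10,000 random pairs for a given method.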
Two-gene concatenation
Source: Table 1 from Gadagkar, Rosenberg and Kumar, Molecular and Developmental Evolution (Accepted)
Comparison of the number of incorrectly inferred branches (NJ-JC)
[Figure: two scatter plots, for worst replicate pairs and for random replicate pairs, each with axes from 0 to 50 and the x-axis labeled Gene 1 tree]
Effect of Gene Attributes
Effect of gene attributes (contd.)
[Figure: log length of Gene 2 versus log length of Gene 1 (both axes from 2.00 to 4.00)]
Quality of second gene
Source: Fig. 2 from Gadagkar, Rosenberg and Kumar, Molecular and Developmental Evolution (Accepted)
[Figure panels: Worst case; Random case]
Progressive addition of genes
Source: Fig. 3 from Gadagkar, Rosenberg and Kumar, Molecular and Developmental Evolution (Accepted)
When all 448 genes were used
Source: Fig. 4 from Gadagkar, Rosenberg and Kumar, Molecular and Developmental Evolution (Accepted)
Effect of neighboring branches
Source: Fig. 5 from Gadagkar, Rosenberg and Kumar, Molecular and Developmental Evolution (Accepted)
Summary & Conclusions
• Heck of a lot of data available
• Two dimensions – number of species, and number of sequences per species
• Many methods available to infer phylogenies from a large number of species
• Neighbor-joining (NJ), a fast, distance-based algorithm, works well and infers trees correctly as long as there are no polytomies (multifurcations) in the true tree
• NJ also infers shallow and deep branches with good and equal efficiency
Summary and Conclusions – contd.
• Multigene data available for many species
• How best to obtain phylogenetic info from these sequences (consensus or concatenation)?
• Our simulation results, with biologically realistic parameters and the worst-case approach, show that concatenation is better
• However, concatenation approach appears excessively prone to certain systematic errors.
Acknowledgements
• Co-authors:
– Sudhir Kumar
– Michael Rosenberg
• Help:
– Roman Johnson
– Tushar Gadagkar
– Sankar Subramanian
– Balaji Ramanujam
Arizona State University
University of Dayton
To find out more about our graduate programs, please visit our Biology Department at:
http://biology.udayton.edu
Apply online for free at:
http://gradadmission.udayton.edu