Processing & Testing Phylogenetic Trees. Rooting.
Uploaded by austen-cannon
Rooting
1. Outgroup Rooting: based on external information.
2. Midpoint Rooting: direct a posteriori use of the ultrametricity assumption.
3. Largest-Genetic-Variability-Group Rooting: indirect a posteriori use of the ultrametricity assumption.
Rooting with outgroup
[Figure: unrooted tree of three plants, one fungus, and three animals]
Are fungi relatives of animals or plants? The unrooted tree alone cannot answer this, because it has no direction of descent.
Add an outgroup, e.g., a bacterium.
[Figure: the same tree rooted with a bacterial outgroup; the fungus groups with the animals, so animals plus fungus form one monophyletic group and the plants another]
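Outgroup rooting can be sketched in a few lines. This is a minimal sketch, assuming the unrooted tree is stored as an adjacency list (a representation chosen here for illustration) and the outgroup is a single leaf. The root is placed on the branch joining the outgroup to the rest of the tree, and the rooted tree is written in Newick format. The tip names (P1-P3 for plants, F for the fungus, A1-A3 for animals, B for the bacterium), the internal node names x1-x6, and the topology are hypothetical, modeled on the figure.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency list for an unrooted tree given as (node, node) edges."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj

def to_newick(adj, node, parent):
    """Write the subtree hanging from `node` (seen from `parent`) in Newick form."""
    children = [n for n in adj[node] if n != parent]
    if not children:
        return node  # a leaf
    return "(" + ",".join(to_newick(adj, c, node) for c in children) + ")"

def root_with_outgroup(adj, outgroup):
    """Place the root on the branch connecting the outgroup leaf to the ingroup."""
    (anchor,) = adj[outgroup]  # a leaf has exactly one neighbour
    ingroup = to_newick(adj, anchor, outgroup)
    return f"({outgroup},{ingroup});"

# Hypothetical unrooted tree: plants, a fungus, animals, and a bacterial outgroup.
edges = [("x1", "P1"), ("x1", "P2"), ("x1", "x2"), ("x2", "P3"), ("x2", "x3"),
         ("x3", "B"), ("x3", "x4"), ("x4", "F"), ("x4", "x5"), ("x5", "A1"),
         ("x5", "x6"), ("x6", "A2"), ("x6", "A3")]
adj = build_adjacency(edges)
print(root_with_outgroup(adj, "B"))
```

Rooting on B splits the ingroup into ((P1,P2),P3) and (F,(A1,(A2,A3))), i.e. the fungus comes out as the sister of the animals in this toy tree. Rooting on a different leaf would give a different rooted tree from the same unrooted topology, which is why the choice of outgroup matters.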
Midpoint rooting
Largest variation = Most ancient
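Midpoint rooting places the root halfway along the longest leaf-to-leaf path, which is where the root would sit if the tree were ultrametric. A minimal sketch, assuming an unrooted tree stored as a dict of dicts of branch lengths (the taxa A-D and inner nodes x, y are hypothetical). The longest path is found with two farthest-node searches, a standard shortcut that is valid for trees with non-negative branch lengths.

```python
from collections import defaultdict

def build_weighted(edges):
    """Adjacency dict {node: {neighbour: branch_length}} from (a, b, length) edges."""
    adj = defaultdict(dict)
    for a, b, length in edges:
        adj[a][b] = length
        adj[b][a] = length
    return adj

def farthest(adj, start):
    """Return (node, distance, path) for the node farthest from `start` (iterative DFS)."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for nbr, length in adj[node].items():
            if nbr != parent:
                stack.append((nbr, node, dist + length, path + [nbr]))
    return best

def midpoint_root(adj):
    """Return (u, v, offset): the root lies on edge u-v, `offset` away from u."""
    end1, _, _ = farthest(adj, next(iter(adj)))
    _, diameter, path = farthest(adj, end1)  # longest leaf-to-leaf path
    half = diameter / 2.0
    walked = 0.0
    for u, v in zip(path, path[1:]):
        length = adj[u][v]
        if walked + length >= half:
            return u, v, half - walked
        walked += length
    return None  # single-node tree: nothing to root

adj = build_weighted([("A", "x", 1.0), ("B", "x", 2.0), ("x", "y", 1.0),
                      ("C", "y", 2.0), ("D", "y", 4.0)])
print(midpoint_root(adj))
```

Here the longest path is D-y-x-B (length 7), so the root falls on the D-y branch, 3.5 units from D. If two leaf pairs tie for the longest path, this sketch simply takes whichever the traversal finds first.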
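Species Divergence Times
Consider three species in which A and B diverged T2 time units ago and C split from their common ancestor T1 time units ago (T1 > T2). KXY is the number of substitutions per site between species X and Y, and r is the rate of substitution per site per unit time along each lineage.
If we know T1 and the rate of evolution, then we can infer T2.
If we know T2 and the rate of evolution, then we can infer T1.
$$r = \frac{K_{AC} + K_{BC}}{4T_1}$$
If T1 is known:
$$T_2 = \frac{K_{AB}}{2r} = \frac{2K_{AB}\,T_1}{K_{AC} + K_{BC}}$$
If T2 is known:
$$T_1 = \frac{(K_{AC} + K_{BC})\,T_2}{2K_{AB}}$$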
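The formulas above translate directly into code. A minimal sketch; the distances and the calibration T1 below are hypothetical numbers, not data from the slides, and T2 comes out in whatever time units T1 is given in.

```python
def rate_from_outgroup(k_ac, k_bc, t1):
    """r = (K_AC + K_BC) / (4 T1): substitutions per site per unit time per lineage."""
    return (k_ac + k_bc) / (4.0 * t1)

def t2_from_t1(k_ab, k_ac, k_bc, t1):
    """T2 = K_AB / (2r), with r estimated from the outgroup distances and T1."""
    r = rate_from_outgroup(k_ac, k_bc, t1)
    return k_ab / (2.0 * r)

def t1_from_t2(k_ab, k_ac, k_bc, t2):
    """T1 = (K_AC + K_BC) T2 / (2 K_AB)."""
    return (k_ac + k_bc) * t2 / (2.0 * k_ab)

# Hypothetical example: K_AC = K_BC = 0.2, K_AB = 0.1, calibration T1 = 100 Myr.
r = rate_from_outgroup(0.2, 0.2, 100.0)
t2 = t2_from_t1(0.1, 0.2, 0.2, 100.0)
print(f"r = {r:.4f} per site per Myr, T2 = {t2:.1f} Myr")
```

Feeding the estimated T2 back into `t1_from_t2` recovers the original T1, which is a quick check that the two formulas are consistent with each other.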
• Dating divergence events requires paleontological calibrations.
• This is a complicated problem.
Topological comparisons
• Topological comparisons entail measuring the similarity or dissimilarity among tree topologies.
• The need to compare topologies may arise when dealing with trees that have been inferred from analyses of different sets of data or from different types of analysis of the same data set.
• When two trees derived from different data sets or different methodologies are identical, they are said to be congruent.
• Congruence can sometimes be partial, i.e., limited to some parts of the trees, other parts being incongruent.
Penny and Hendy's topological distance (dT)
A commonly used measure of dissimilarity between two tree topologies. The measure is based on tree partitioning.
dT = 2c
c = the number of partitions resulting in different divisions of the OTUs in the two tree topologies under consideration.
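The partition-based definition of dT can be sketched as follows. This is a minimal sketch, assuming trees written as nested Python tuples of taxon names (a representation chosen for illustration) that are fully bifurcating on the same taxa; nesting is read only for its bipartitions, so the trees are effectively treated as unrooted. Each internal branch becomes a split of the OTUs, and dT counts the splits found in one tree but not the other. For bifurcating trees on the same taxa each tree has exactly c unmatched splits, so the count equals 2c. (This partition metric is often also called the Robinson-Foulds distance.)

```python
def splits(tree):
    """Non-trivial splits (bipartitions) of a tree given as nested tuples.

    Each split is stored as the side that does NOT contain a fixed reference
    taxon, so the same bipartition always has the same representation."""
    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves(child) for child in node))

    taxa = leaves(tree)
    ref = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = taxa - clade if ref in clade else clade
        if 2 <= len(side) <= len(taxa) - 2:  # skip trivial splits and the root
            found.add(side)
        return clade

    walk(tree)
    return found

def topological_distance(tree_a, tree_b):
    """Penny-Hendy dT: number of partitions present in one tree but not the other."""
    return len(splits(tree_a) ^ splits(tree_b))

t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(topological_distance(t1, t2))  # one differing partition per tree: c = 1, dT = 2
```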
Trees inferred from the analysis of a particular data set are called fundamental trees, i.e., they summarize the phylogenetic information in a data set.
Consensus trees are trees that summarize the phylogenetic information in a set of fundamental trees.
• In a strict consensus tree, all conflicting branching patterns are collapsed into multifurcations.
• In an X% majority-rule consensus tree, a branching pattern that occurs with a frequency of X% or more is adopted.
• When X = 100%, the majority-rule consensus tree is identical to the strict consensus tree.
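Both consensus rules can be sketched with split counting. A minimal sketch, assuming fundamental trees written as nested tuples on the same taxa (a representation chosen for illustration; the three trees below are hypothetical). Each tree is reduced to its non-trivial splits, and a split is kept if it occurs in at least X% of the trees; X = 100 gives the strict consensus. The result is the set of splits that defines the consensus tree, and every branch not listed is collapsed. For X > 50 the kept splits are always mutually compatible, because any two of them must occur together in at least one tree.

```python
from collections import Counter

def splits(tree):
    """Non-trivial splits of a nested-tuple tree, each stored as the side
    that does not contain a fixed reference taxon."""
    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves(child) for child in node))

    taxa = leaves(tree)
    ref = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = taxa - clade if ref in clade else clade
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(side)
        return clade

    walk(tree)
    return found

def majority_rule(trees, x_percent):
    """Splits occurring in at least x_percent of the trees (100 = strict consensus)."""
    counts = Counter(s for t in trees for s in splits(t))
    n = len(trees)
    # integer comparison avoids floating-point trouble at exactly X%
    return {s for s, c in counts.items() if c * 100 >= x_percent * n}

trees = [((("A", "B"), "C"), ("D", "E")),
         ((("A", "B"), "D"), ("C", "E")),
         ((("A", "C"), "B"), ("D", "E"))]
print("strict:", majority_rule(trees, 100))
print("60% majority-rule:", majority_rule(trees, 60))
```

With these three trees no split is shared by all of them, so the strict consensus is a complete multifurcation, while the 60% rule keeps the two splits {C,D,E} and {D,E} that each appear in two of the three trees.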
A tree is an evolutionary hypothesis
Q: How can we ascertain that the methodology we have used yields reliable results?
A: We can test the methodology on a phylogeny that is known for certain to be true, and compare the inferred phylogeny with the true phylogeny.
Caminalcules are a group of artificial organisms (belonging to the genus Caminalculus) that were invented by Dr. Joseph H. Camin from the University of Kansas.
Interested in how taxonomists group species, he designed these creatures to show an evolutionary pattern of divergence and diversification in morphology. There are 29 recent “species” of Caminalculus and 48 fossil forms.
The Caminalcules first appeared in print in the journal Systematic Zoology (now Systematic Biology) in 1983, four years after Camin's death in 1979. The first four papers on Caminalcules were written by Robert R. Sokal.
Joseph H. Camin (1922–1979)
[Figure: Caminalcules, with extant and extinct forms shown]
Assessing tree reliability
Phylogenetic reconstruction is a problem of statistical inference. One must assess the reliability of the inferred phylogeny and its component parts.
Questions:
(1) How reliable is the tree?
(2) Which parts of the tree are reliable?
(3) Is this tree significantly better than another one?
Bootstrapping
• A statistical technique that uses intensive random resampling of data to estimate a statistic whose underlying distribution is unknown.
• Characters are resampled with replacement to create many bootstrap replicate data sets (pseudosamples).
• Each bootstrap replicate data set is analyzed.
• The frequency of occurrence of a group (bootstrap proportion) is a measure of support for the group.
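The resampling loop described in these bullets can be sketched as follows. To keep the "analyze each replicate" step self-contained, the sketch assumes four taxa (A-D) and equal-weight parsimony: with four taxa, only sites showing two states twice each (xxyy patterns) favor one of the three unrooted topologies, and every other site costs the same on all three, so the most parsimonious tree is the one backed by the most such sites. Ties are counted as unresolved. This four-taxon shortcut and the sequences below are illustrative assumptions; real analyses run a full tree search on every pseudosample.

```python
import random
from collections import Counter

TOPOLOGIES = ("AB|CD", "AC|BD", "AD|BC")

def site_support(column):
    """Topology favored by one four-taxon site (a, b, c, d), or None if uninformative."""
    a, b, c, d = column
    if a == b and c == d and a != c:
        return "AB|CD"
    if a == c and b == d and a != b:
        return "AC|BD"
    if a == d and b == c and a != b:
        return "AD|BC"
    return None  # costs the same number of steps on all three topologies

def most_parsimonious(columns):
    """Topology supported by the most informative sites; None on a tie."""
    counts = Counter(s for s in map(site_support, columns) if s is not None)
    best = max(counts[t] for t in TOPOLOGIES)
    winners = [t for t in TOPOLOGIES if counts[t] == best]
    return winners[0] if len(winners) == 1 else None

def bootstrap(seqs, replicates=1000, seed=1):
    """Resample alignment columns with replacement; return support (%) per topology."""
    rng = random.Random(seed)
    columns = list(zip(*seqs))  # one tuple per site, taxa in the order A, B, C, D
    tally = Counter()
    for _ in range(replicates):
        pseudosample = rng.choices(columns, k=len(columns))
        tally[most_parsimonious(pseudosample)] += 1
    return {t: 100.0 * tally[t] / replicates for t in TOPOLOGIES}

# Hypothetical sequences for taxa A, B, C, D.
print(bootstrap(["AAAT", "AAAT", "GGGT", "GGGA"], replicates=200))
```

Because three of the four sites support AB|CD and none support the alternatives, nearly every pseudosample recovers AB|CD; the only exceptions are the rare replicates that draw the uninformative site every time.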
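Bootstrapping - an example
Ciliate SSU rDNA - parsimony bootstrap
Partition table (columns 1-9 are the taxa numbered below; * = taxon belongs to the group, . = it does not; Freq = percentage of bootstrap replicates containing the group):
123456789  Freq
---------  ------
.**......  100.00
...**....  100.00
.....**..  100.00
...****..  100.00
...******   95.50
.......**   23.33
...****.*   11.83
...*****.    3.83
.*******.    2.50
.**....*.    1.00
.**.....*    1.00
Taxa: Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Loxodes (4), Tracheloraphis (5), Spirostomum (6), Gruberia (7), Euplotes (8), Tetrahymena (9)
[Figure: bootstrap tree of the nine taxa, tips in the order Ochromonas, Symbiodinium, Prorocentrum, Euplotes, Tetrahymena, Loxodes, Tracheloraphis, Spirostomum, Gruberia; internal branches labeled 100, 96, 23, 100, 100, 100]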
Reduction of a phylogenetic tree by the collapsing of internal branches associated with bootstrap values that are lower than a critical value (C).
(a) Gene tree for -tubulin (b) C = 50% (c) C = 90%
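The collapsing rule can be tried on the ciliate example above. A minimal sketch, assuming the six groups present in the bootstrap tree are the first six rows of the partition table (their frequencies match the branch labels 100, 96, 23, 100, 100, 100). Collapsing a branch whose support is below C amounts to dropping its group; the groups that remain define the reduced, partly multifurcating tree.

```python
TAXA = ["Ochromonas", "Symbiodinium", "Prorocentrum", "Loxodes", "Tracheloraphis",
        "Spirostomum", "Gruberia", "Euplotes", "Tetrahymena"]

# Groups in the ciliate bootstrap tree: partition-table row and bootstrap proportion (%).
TREE_GROUPS = [
    (".**......", 100.00),
    ("...**....", 100.00),
    (".....**..", 100.00),
    ("...****..", 100.00),
    ("...******", 95.50),
    (".......**", 23.33),
]

def decode(pattern):
    """Translate a partition-table row into the names of the taxa marked '*'."""
    return tuple(name for name, mark in zip(TAXA, pattern) if mark == "*")

def reduce_tree(groups, critical):
    """Keep groups with support >= C; branches below C are collapsed."""
    return [decode(p) for p, support in groups if support >= critical]

for c in (50, 90):
    print(f"C = {c}%:", reduce_tree(TREE_GROUPS, c))
```

For both C = 50% and C = 90% only the Euplotes + Tetrahymena branch (23%) collapses, leaving Euplotes, Tetrahymena, and the Loxodes-Gruberia group as a three-way split inside the ciliates.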
Tests for two competing trees
• All these tests use the null hypothesis that the differences between two trees (A and B) are no greater than expected by chance (i.e., from sampling error).
Likelihood Ratio Test
• Likelihood of Hypothesis 1 = $L_1$
• Likelihood of Hypothesis 2 = $L_2$
• $\mathrm{LR} = 2(\ln L_1 - \ln L_2)$
• Compare LR to a χ² distribution or to a simulated distribution.
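A minimal sketch of the likelihood ratio statistic and its χ² tail probability. It assumes one degree of freedom (for example, two hypotheses that differ by a single free parameter), for which the upper tail of χ² is erfc(√(x/2)); other degrees of freedom, or a simulated null distribution, would need a different tail function. The log-likelihoods are hypothetical.

```python
import math

def likelihood_ratio(ln_l1, ln_l2):
    """LR = 2 (ln L1 - ln L2)."""
    return 2.0 * (ln_l1 - ln_l2)

def chi2_sf_df1(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical log-likelihoods for two competing hypotheses.
lr = likelihood_ratio(-1000.0, -1003.0)
p = chi2_sf_df1(lr)
print(f"LR = {lr:.2f}, p = {p:.4f}")
```

An LR of 6 exceeds the 5% critical value of about 3.84 for one degree of freedom, so hypothesis 2 would be rejected in favor of hypothesis 1 at that level.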
Reliability of Phylogenetic Methods
• Phylogenetic methods can also be evaluated in terms of their general performance, particularly their:
consistency - whether they converge on the true tree as more data are added
efficiency - how quickly they can analyze a given amount of data
robustness - how sensitive they are to violations of their assumptions
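Problems with long branches
With long branches, many methods can yield erroneous trees. For example, the maximum-parsimony method tends to group long branches together even when the taxa at their tips are not each other's closest relatives. This phenomenon is called long-branch attraction; the region of branch-length space in which it makes parsimony inconsistent is known as the Felsenstein zone.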
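Long-branch attraction can be reproduced by testing a method on a phylogeny that is known to be true. A minimal sketch, assuming a Jukes-Cantor model and a four-taxon true tree ((A,B),(C,D)) in which A and C sit on long branches (1.0 substitutions per site) and all other branches, including the internal one, are short (0.05). These branch lengths are hypothetical values chosen to fall in the Felsenstein zone. The simulated sites are analyzed with a four-taxon parsimony shortcut in which only xxyy-type sites discriminate among the three topologies.

```python
import math
import random
from collections import Counter

BASES = "ACGT"
TOPOLOGIES = ("AB|CD", "AC|BD", "AD|BC")

def evolve(state, length, rng):
    """Jukes-Cantor: base at the end of a branch of `length` substitutions per site."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * length / 3.0))
    if rng.random() < p_change:
        return rng.choice([b for b in BASES if b != state])
    return state

def simulate_site(long_len, short_len, rng):
    """One site on the true tree ((A,B),(C,D)); A and C are on long branches."""
    u = rng.choice(BASES)          # ancestor of A and B
    v = evolve(u, short_len, rng)  # ancestor of C and D, across the internal branch
    return (evolve(u, long_len, rng), evolve(u, short_len, rng),
            evolve(v, long_len, rng), evolve(v, short_len, rng))

def site_support(column):
    """Topology favored by one four-taxon site, or None if uninformative."""
    a, b, c, d = column
    if a == b and c == d and a != c:
        return "AB|CD"
    if a == c and b == d and a != b:
        return "AC|BD"
    if a == d and b == c and a != b:
        return "AD|BC"
    return None

def most_parsimonious(columns):
    """Topology supported by the most informative sites; None on a tie."""
    counts = Counter(s for s in map(site_support, columns) if s is not None)
    best = max(counts[t] for t in TOPOLOGIES)
    winners = [t for t in TOPOLOGIES if counts[t] == best]
    return winners[0] if len(winners) == 1 else None

rng = random.Random(42)
sites = [simulate_site(1.0, 0.05, rng) for _ in range(2000)]
print("informative sites:", Counter(s for s in map(site_support, sites) if s))
print("most parsimonious tree:", most_parsimonious(sites))
```

With these settings parsimony should favor the wrong grouping AC|BD: parallel changes to the same base on the two long branches create more A = C, B = D sites than the short internal branch creates A = B, C = D sites, and adding more sites only makes the wrong answer more certain.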