![Page 1: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/1.jpg)
A brief introduction to phylogenetics
![Page 2: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/2.jpg)
Genetic Distance
Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences diverged from a common ancestor
Simplest distance: p-distance = proportion of sites that are different
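A minimal sketch of the p-distance, assuming two already-aligned sequences of equal length. The function name `p_distance` and the example sequences are illustrative, and skipping gapped columns is an assumption about how gaps are handled.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with a gap in either sequence are skipped.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


print(p_distance("ATTGCGCC", "ATTGCGCT"))  # 1 difference in 8 sites -> 0.125
```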
![Page 3: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/3.jpg)
Correcting for ‘multiple substitutions’
[Figure: example sequences accumulating substitutions, and a plot with axes "Differences" and "Substitutions"]
![Page 4: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/4.jpg)
Correcting for multiple substitutions
Requires a statistical ‘model’ of how the process of substitution works, to correct for:
- Differences in the rates of different substitution types (e.g. Jukes and Cantor – all substitutions are treated the same versus Kimura 2-parameter model – distinguishes between transitions and transversions)
- Different frequencies of different nucleotides (e.g. GC content – the HKY model adds nucleotide frequency parameters to the Kimura 2-parameter model)
- Different rates at different sites (often modelled using a distribution – e.g. Gamma distribution – see next)
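As a sketch of what such a correction looks like, the standard closed-form distances for the Jukes and Cantor and Kimura 2-parameter models are shown below. The function names are illustrative; `p` is the p-distance, and `P` and `Q` are the observed proportions of sites showing transitions and transversions.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance from the proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("undefined for p >= 0.75 (sequences are saturated)")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(P: float, Q: float) -> float:
    """Kimura 2-parameter distance from transition (P) and transversion (Q) proportions."""
    a = 1 - 2 * P - Q
    b = 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("undefined for these proportions (sequences are saturated)")
    return -0.5 * math.log(a * math.sqrt(b))


# The corrected distance is always at least as large as the raw p-distance.
print(jukes_cantor(0.2))
print(kimura_2p(0.15, 0.05))
```

When transitions make up one third of the observed differences (Q = 2P), the two formulas give the same value, which is one way to see that Jukes and Cantor is a special case of Kimura 2-parameter.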
![Page 5: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/5.jpg)
To perform a gamma correction for site-specific rates you need to know the shape parameter of the gamma distribution
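To illustrate why the shape matters, here is a sketch of the Jukes and Cantor distance with gamma-distributed rates across sites. The function names and the shape values used are illustrative assumptions; `alpha` is the gamma shape parameter, and small values mean strong rate variation among sites.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance assuming the same rate at every site."""
    return -0.75 * math.log(1 - 4 * p / 3)


def jukes_cantor_gamma(p: float, alpha: float) -> float:
    """Jukes-Cantor distance with gamma-distributed rates (shape alpha)."""
    if not 0 <= p < 0.75:
        raise ValueError("undefined for p >= 0.75")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)


p = 0.2
print(jukes_cantor(p))             # equal rates at all sites
print(jukes_cantor_gamma(p, 0.5))  # strong rate variation: larger distance
print(jukes_cantor_gamma(p, 1e6))  # very large alpha: approaches the equal-rates value
```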
![Page 6: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/6.jpg)
Correcting for multiple substitutions (continued…)
Correction for multiple substitutions implies a model of evolution, but some models have many more parameters than others
- Models with few parameters are easy to fit, but may miss some important biology (e.g. there’s typically a big difference between rates of transition and transversion, and it would be dangerous not to model that). Simple models can underfit the data.
- Complex models (many parameters) may be difficult and much slower to estimate. There can also be a danger of over-fitting the data when more parameters are included in a model than are necessary.
(see later…)
![Page 7: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/7.jpg)
Some general points:
- corrected genetic distances (substitutions per site) can be far greater than 1
- smaller genetic distances are more reliable
- model choice has a bigger impact for distantly related sequences
- normally positions with gaps are ignored (complete deletion)
- If you know the rate of evolution for a pair of sequences (and if the rate has remained more or less constant) you can estimate the date at which they diverged
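The last point can be sketched as a one-line calculation. This assumes the rate is given per site per year for a single lineage, so the distance between two sequences accumulates along both lineages. The function name and the numbers are hypothetical.

```python
def divergence_time(distance: float, rate_per_lineage: float) -> float:
    """Years since divergence, assuming a constant rate along both lineages."""
    return distance / (2 * rate_per_lineage)


# Hypothetical values: 0.02 substitutions per site between the two sequences,
# 1e-9 substitutions per site per year along each lineage.
print(divergence_time(0.02, 1e-9))  # about 10 million years
```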
![Page 8: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/8.jpg)
Phylogenetic tree
Diagram consisting of branches and nodes
Branches indicate relationships between the ‘objects’
Internal branches define partitions of the objects
![Page 9: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/9.jpg)
![Page 10: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/10.jpg)
Rooting the Tree
• In an unrooted tree the direction of evolution is unknown
• The root is the hypothesized ancestor of the sequences in the tree
• The root can either be placed on a branch or at a node
• You should start by viewing an unrooted tree
![Page 11: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/11.jpg)
![Page 12: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/12.jpg)
![Page 13: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/13.jpg)
• Many software packages will root trees automatically (e.g. mid-point rooting in NJPlot)
• This always involves assumptions… BEWARE!
![Page 14: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/14.jpg)
Rooting Using an Outgroup
1. The outgroup should be a sequence (or set of sequences) known to be less closely related to the rest of the sequences than they are to each other
2. It should ideally be as closely related as possible to the rest of the sequences while still satisfying condition 1
The root must be somewhere between the outgroup and the rest (either on the node or in a branch)
![Page 15: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/15.jpg)
Sometimes two trees may look very different but, in fact, differ only in the position of the root
![Page 16: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/16.jpg)
Looking at trees
Two trees are different if one tree specifies at least one partition that is not present in the other
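A sketch of this comparison, assuming trees are written as nested tuples of tip names (a representation chosen here for illustration). Each internal branch is described by the set of tips on the side that excludes a fixed reference tip, so the result does not depend on where the tree is rooted or how it is drawn.

```python
def leaves(tree):
    """Set of tip names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Partitions of the tips defined by the internal branches of a tree."""
    all_leaves = leaves(tree)
    reference = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        # Describe the partition by the side that excludes the reference tip.
        side = all_leaves - clade if reference in clade else clade
        # Branches leading to a single tip define no useful partition.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ("A", ("B", ("C", ("D", "E"))))    # same tree, rooted elsewhere
tree_3 = ((("A", "C"), "B"), ("D", "E"))    # genuinely different grouping

print(splits(tree_1) == splits(tree_2))  # True
print(splits(tree_1) == splits(tree_3))  # False
```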
![Page 17: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/17.jpg)
[Figure: tree with tips A, B, C, D, E, F, G, H, I, J; scale bar 0.01]
![Page 18: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/18.jpg)
[Figure: tree with tips I, J, H, F, G, D, E, A, B, C; scale bar 0.01]
![Page 19: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/19.jpg)
[Figure: two trees, one with tips ordered I, J, H, F, G, D, E, A, B, C and one with tips ordered A, B, C, D, E, F, G, H, I, J; scale bars 0.01]
![Page 20: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/20.jpg)
[Figure: two trees, one with tips ordered A, B, G, D, E, H, I, J, F, C and one with tips ordered I, J, H, F, G, D, E, A, B, C; scale bars 0.01]
![Page 21: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/21.jpg)
Phylogenetic Inference
Distance, parsimony and maximum likelihood methods
![Page 22: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/22.jpg)
Need: optimality criteria + an algorithm to search for the best tree given the optimality criteria
![Page 23: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/23.jpg)
Best tree vs. true tree
![Page 24: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/24.jpg)
Types of optimality criteria used to infer phylogeny from sequence
• Distance methods
• Parsimony
• Likelihood
• Others
![Page 25: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/25.jpg)
Distance based methods
Minimum Evolution Principle
“The tree with the smallest sum of branch lengths is the best tree”
![Page 26: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/26.jpg)
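[Figure: unrooted four-taxon tree. Tips A and B join the internal branch t by branches r and s; tips C and D join it by branches u and v]
$d_{AB} \approx r + s$
$d_{CD} \approx u + v$
$d_{AD} \approx r + t + v$
$d_{BC} \approx s + t + u$
etc.
(r, s, u, v, t are estimated so that these relationships are as close as possible to being correct)
Tree length = u + v + t + r + s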
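One way to make the branch lengths "as close as possible" is ordinary least squares, sketched below for the four-taxon tree (A and B on branches r and s, C and D on branches u and v, internal branch t). Least squares is an assumption about the fitting method; the helper names and the distances are made up for illustration.

```python
def solve(matrix, rhs):
    """Solve a small square linear system by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Which branches (r, s, u, v, t) lie on the path between each pair of tips.
PATHS = {
    ("A", "B"): [1, 1, 0, 0, 0],
    ("C", "D"): [0, 0, 1, 1, 0],
    ("A", "C"): [1, 0, 1, 0, 1],
    ("A", "D"): [1, 0, 0, 1, 1],
    ("B", "C"): [0, 1, 1, 0, 1],
    ("B", "D"): [0, 1, 0, 1, 1],
}


def fit_branch_lengths(distances):
    """Least-squares branch lengths for the tree ((A,B),(C,D))."""
    pairs = list(PATHS)
    design = [PATHS[p] for p in pairs]
    observed = [distances[p] for p in pairs]
    # Normal equations: (X^T X) b = X^T d
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(5)]
           for i in range(5)]
    xtd = [sum(row[i] * d for row, d in zip(design, observed))
           for i in range(5)]
    return dict(zip("rsuvt", solve(xtx, xtd)))


# Made-up pairwise distances that fit the tree exactly.
distances = {("A", "B"): 0.3, ("C", "D"): 0.7, ("A", "C"): 0.9,
             ("A", "D"): 1.0, ("B", "C"): 1.0, ("B", "D"): 1.1}
lengths = fit_branch_lengths(distances)
print(lengths)
print("tree length:", sum(lengths.values()))
```

Under minimum evolution, the same fit would be repeated for each candidate topology and the tree with the smallest total length kept.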
![Page 27: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/27.jpg)
Number of possible unrooted trees from n sequences:
$$N_u = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$
e.g. for 20 sequences there are approximately $10^{20}$
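A short sketch of how quickly this number grows, using the count of unrooted bifurcating trees, (2n-5)! / (2^(n-3) (n-3)!). The function name is illustrative.

```python
from math import factorial


def unrooted_trees(n: int) -> int:
    """Number of unrooted bifurcating trees for n sequences."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (4, 5, 6, 10, 20):
    print(n, unrooted_trees(n))
```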
![Page 28: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/28.jpg)
For realistic numbers of sequences it is impossible to consider all possible trees.
Need algorithms that can arrive at the ‘best tree’ without considering all possible trees.
![Page 29: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/29.jpg)
Neighbour joining is a very fast approximation to minimum evolution
![Page 30: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/30.jpg)
Neighbour Joining
[Figure: neighbour-joining diagrams with tips numbered 1 to 8]
Choose the pair that minimizes the length of the resulting tree
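A sketch of this selection step only, not a full neighbour-joining implementation. It uses the usual Q criterion, Q(i, j) = (n - 2) d(i, j) - sum of d(i, k) - sum of d(j, k), whose minimum identifies the pair to join. The function name and the distance matrix are made up for illustration.

```python
from itertools import combinations


def pick_neighbours(names, dist):
    """Return the pair with the smallest Q value (the next pair to join)."""
    n = len(names)

    def d(i, j):
        return dist[(i, j)] if (i, j) in dist else dist[(j, i)]

    # Total distance from each tip to all the others.
    totals = {i: sum(d(i, k) for k in names if k != i) for i in names}

    def q(pair):
        i, j = pair
        return (n - 2) * d(i, j) - totals[i] - totals[j]

    return min(combinations(names, 2), key=q)


names = ["a", "b", "c", "d", "e"]
dist = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
        ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
        ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}

print(pick_neighbours(names, dist))  # ('a', 'b')
```

Note that d and e have the smallest raw distance here, yet a and b are joined first: the criterion looks at the length of the whole resulting tree, not just at the closest pair.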
![Page 31: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/31.jpg)
Maximum Parsimony
Occam’s Razor
Entia non sunt multiplicanda praeter necessitatem.
William of Occam (1300-1349)
The best tree is the one that requires the fewest substitutions
![Page 32: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/32.jpg)
• Check each topology
• Count the minimum number of changes required to explain the data
• Choose the tree with the smallest number of changes
• Usually performs well with closely related sequences, but often performs badly with very distantly related sequences
• With distantly related sequences homoplasy becomes a major problem
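The counting step can be sketched with Fitch's algorithm, which finds the minimum number of changes for one site on a given bifurcating tree. Trees are written as nested pairs of tip names; this representation, the function names and the tiny alignment are assumptions made for illustration.

```python
def fitch(tree, site):
    """Return (possible states at this node, minimum changes below it)."""
    if isinstance(tree, str):
        return {site[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, site)
    right_states, right_cost = fitch(right, site)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # No shared state: one extra change is needed at this node.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum of minimum changes over all sites of an alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += fitch(tree, site)[1]
    return total


alignment = {"A": "AAGT", "B": "AAGC", "C": "GACT", "D": "GACC"}
topologies = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}
scores = {name: parsimony_score(tree, alignment)
          for name, tree in topologies.items()}
print(scores)  # {'AB|CD': 4, 'AC|BD': 5, 'AD|BC': 6}
print(min(scores, key=scores.get))  # AB|CD
```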
![Page 33: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/33.jpg)
Informative sites: Not all sites contain information about the tree topology using the parsimony approach
Homoplasy: characters that are similar for reasons other than common ancestry (increasingly a problem as sequences become more divergent)
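A small sketch of the usual test for a parsimony-informative site: at least two different states must each occur in at least two sequences. The function name and the example columns are illustrative.

```python
from collections import Counter


def is_parsimony_informative(column: str) -> bool:
    """True if at least two states each appear in at least two sequences."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2


for column in ("AAGG", "AAAG", "AAAA", "ACGT"):
    print(column, is_parsimony_informative(column))
```

Only the first column favours one topology over another; a constant site or a change seen in a single sequence costs the same number of changes on every tree.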
![Page 34: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/34.jpg)
Methods for searching for the ‘best’ tree without considering all trees
Branch & Bound: A method that does not have to consider all trees but still guarantees finding the ‘best’ tree. Slow for large numbers of sequences.
Heuristic methods (no guarantee of finding the best tree)
- Start with some tree (e.g. the neighbour-joining tree)
- Consider making a random change to the tree
- Make the change if it improves the score of the tree
- Stop making changes when you can find no further improvement
NNI -> SPR -> TBR
(NNI fastest and least rigorous, TBR slowest and most rigorous)
![Page 35: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/35.jpg)
How confident are we that the tree is correct?
Bootstrap values
Bootstrapping is a statistical technique that uses random resampling of the data to estimate the sampling error of tree topologies
![Page 36: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/36.jpg)
Bootstrapping phylogenies
• Characters are resampled with replacement to create many bootstrap replicate data sets
• Each bootstrap replicate data set is analysed (e.g. with parsimony, distance, ML etc.)
• Agreement among the resulting trees is summarized with a majority-rule consensus tree
• Frequencies of occurrence of groups, bootstrap proportions (BPs), are a measure of support for those groups
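The resampling and counting steps can be sketched as follows. The tree-building step is left as a caller-supplied function, here called `infer_groups`, which is a hypothetical placeholder for any method (parsimony, distance, ML) that returns the groups found in its tree. All names are illustrative.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def bootstrap_proportions(alignment, infer_groups, replicates=100, seed=0):
    """Fraction of replicates in which each group appears."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(replicates):
        replicate = bootstrap_replicate(alignment, rng)
        # infer_groups is a placeholder: it should build a tree from the
        # replicate and return the set of groups (clades) in that tree.
        for group in infer_groups(replicate):
            counts[group] = counts.get(group, 0) + 1
    return {group: count / replicates for group, count in counts.items()}


alignment = {"A": "ACGTAC", "B": "ACGAAC", "C": "TCGAAG"}
print(bootstrap_replicate(alignment, random.Random(1)))
```

Each replicate has the same number of sites as the original alignment, and whole columns are drawn together so that every sequence is resampled at the same positions.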
![Page 37: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/37.jpg)
[Figure: tree with bootstrap values on the branches. Tip labels: Pagurus bernhardus, Pagurus acadianus, Ellasochirus tenuimanus, Labidochirus splendescens, Lithodes aequispina, Paralithodes camtschatica, Pagurus pollicaris (NE), Pagurus pollicaris (GU), Pagurus longicarpus (NE), Pagurus longicarpus (GU), Clibanarius vittatus, Coenobita sp., Artemia salina. Bootstrap values shown: 82, 100, 99, 100, 100, 100, 98, 97, 81, 99. Scale bar 0.05]
![Page 38: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/38.jpg)
Bootstrap - interpretation
• Bootstrapping is a very valuable and widely used technique (it is demanded by some journals)
• BPs give an idea of how likely a given branch would be to be unaffected if additional data, with the same distribution, became available
• BPs are not the same as confidence intervals. There is no simple mapping between bootstrap values and confidence intervals. There is no agreement about what constitutes a ‘good’ bootstrap value (> 70%, > 80%, > 85%?)
• Some theoretical work indicates that BPs can be a conservative estimate of confidence
![Page 39: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/39.jpg)
Inferring trees using Likelihood
![Page 40: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/40.jpg)
The ‘optimality criterion’
The best tree is the one that makes the data have the highest likelihood
The ML optimality criterion will lead to the correct tree given
- enough data (e.g. long enough sequence alignment)
- the correct model (e.g. Kimura 2 parameter model)
![Page 41: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/41.jpg)
Suppose we have a model of evolution (e.g. Jukes & Cantor) that allows us to work out the probability of each pair of characters, given a particular genetic distance (cf. series of scoring matrices like BLOSUM, PAM etc.)
[Figure: plot with axes "Distance" and "Likelihood" for the aligned pair A C G / G A G]
D = 0.3: L = 0.3 * 0.3 * 0.7 ≈ 0.06
D = 0.6: L = 0.6 * 0.6 * 0.4 = 0.144
D = 0.9: L = 0.9 * 0.9 * 0.1 = 0.081
![Page 42: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/42.jpg)
Genetic Distance using Maximum Likelihood
• Require a model of evolution
• Optimise all parameters of the model
• Each evolutionary ‘event’ has an associated likelihood given an inferred genetic distance
• The likelihood of the sequence-pair is a function of the genetic distance (just the product of the likelihoods of each of the inferred ‘events’ at each sequence position)
• Function is maximized
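A sketch of these steps for a single pair of sequences, assuming the Jukes and Cantor model (which has no free parameters besides the distance) and a simple grid search in place of a proper optimiser. The function names, the grid and the counts (100 sites, 20 differences) are illustrative assumptions.

```python
import math


def jc_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of a sequence pair at distance d under Jukes-Cantor."""
    e = math.exp(-4 * d / 3)
    p_same = 0.25 + 0.75 * e   # probability a site shows the same base
    p_diff = 0.25 - 0.25 * e   # probability of one specific different base
    # Each site contributes (1/4) * P(observed pair | d); sites are independent,
    # so the log-likelihoods add up.
    return (n_sites * math.log(0.25)
            + (n_sites - n_diff) * math.log(p_same)
            + n_diff * math.log(p_diff))


def ml_distance(n_sites, n_diff, step=0.001, max_d=5.0):
    """Distance on a grid that maximises the likelihood."""
    grid = [step * k for k in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: jc_log_likelihood(d, n_sites, n_diff))


d_hat = ml_distance(100, 20)
closed_form = -0.75 * math.log(1 - 4 * 0.2 / 3)
print(d_hat, closed_form)
```

For this model the maximum of the likelihood function coincides, up to the grid spacing, with the closed-form Jukes and Cantor correction of the p-distance.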
![Page 43: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/43.jpg)
Phylogenetic trees using Maximum Likelihood
• Require a model of evolution
• Each substitution has an associated likelihood given a branch of a certain length
• A function is derived to represent the likelihood of the data given the tree, branch-lengths and additional parameters
• Optimise over parameters of the model
• Optimise over branch lengths
• Sum the likelihood over all possible sequences at ancestral nodes
• Search for the best tree (using heuristics such as TBR)
![Page 44: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/44.jpg)
Models can be made more parameter rich to increase their realism
• The most common additional parameters are:
– A correction to allow different rates for each type of nucleotide change
– Parameters for equilibrium base frequencies
– A correction for the proportion of sites which are unable to change
– A correction for variable rates at those sites which can change
• The values of the additional parameters will be estimated in the process
![Page 45: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/45.jpg)
Likelihood and the number of parameters
More parameters always lead to a better fit of the data
![Page 46: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/46.jpg)
![Page 47: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/47.jpg)
More parameters always lead to a higher value of the likelihood, whether or not the additional parameters provide a ‘significantly’ better fit to the data
![Page 48: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/48.jpg)
Are the extra parameters justified?
- Likelihood ratio test
Likelihood ratio statistic: $2 \log\left(\dfrac{\text{Maximum Likelihood} \mid H_1}{\text{Maximum Likelihood} \mid H_0}\right)$
Has chi-squared distribution
dof = number of additional parameters
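A sketch of the test for the simplest case, two nested models that differ by one parameter (for example Jukes and Cantor against Kimura 2-parameter), where the chi-squared tail probability has a closed form. The function name and the log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test(log_l_null, log_l_alt):
    """LRT for nested models differing by exactly one parameter.

    Takes the maximised log-likelihoods of the simpler (null) and the
    more general (alternative) model.
    """
    statistic = 2 * (log_l_alt - log_l_null)
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2))
    return statistic, p_value


# Hypothetical maximised log-likelihoods for the two models.
statistic, p_value = likelihood_ratio_test(-1250.0, -1244.0)
print(statistic, p_value)
```

A small p-value suggests the extra parameter is justified by the data. Models differing by more than one parameter would need the chi-squared distribution with the matching degrees of freedom, which this sketch does not cover.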
![Page 49: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/49.jpg)
One model is nested in another if it is a special case of the more general model
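e.g. the Jukes and Cantor model is nested in the Kimura 2P model (it is the special case in which transitions and transversions occur at the same rate)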
[Figure: substitution matrices over the nucleotides G, C, T, A for the J-C and K2P models]
![Page 50: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/50.jpg)
Modeltest
- Uses PAUP
- Tries out many nested models of nucleotide substitution
- Decides how many parameters are justified by the data
GTR does not overfit the data for at least some HIV sequences
![Page 51: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/51.jpg)
Bayesian methods
![Page 52: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/52.jpg)
The ‘optimality criterion’
The best tree is the one that has the highest probability of being the true tree
![Page 53: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/53.jpg)
Likelihood: Choose the tree that makes the data the most likely. Equivalent to maximizing $P(D \mid T)$
Bayesian: Choose the most probable tree (tree with the highest posterior probability). Equivalent to maximizing $P(T \mid D)$
![Page 54: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/54.jpg)
Bayes’ Rule
Probability = Likelihood × Prior Information
(with some normalising factors)
Mathematically: $$P(T \mid D) = \frac{P(D \mid T)\,P(T)}{P(D)}$$
T = Tree
D = Data
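A sketch of the rule for a small, fixed set of candidate trees, where P(D) is simply the sum of likelihood times prior over all candidates. The tree names and the likelihood values are made up for illustration.

```python
def posterior(likelihoods, priors):
    """Posterior probability of each tree from its likelihood and prior."""
    joint = {tree: likelihoods[tree] * priors[tree] for tree in likelihoods}
    evidence = sum(joint.values())   # P(D), the normalising factor
    return {tree: value / evidence for tree, value in joint.items()}


likelihoods = {"T1": 0.006, "T2": 0.003, "T3": 0.001}   # P(D | T), made up
flat_prior = {"T1": 1 / 3, "T2": 1 / 3, "T3": 1 / 3}    # P(T)

print(posterior(likelihoods, flat_prior))
```

With a flat prior the tree with the highest likelihood also has the highest posterior probability, but the posterior values sum to 1 and can be read directly as probabilities.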
![Page 55: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/55.jpg)
Important Terms
Prior probability: the probability of the event before considering the data
Posterior probability: the probability of the event after taking the data into consideration
![Page 56: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/56.jpg)
In molecular phylogenetics the prior is usually ‘flat’ so the max likelihood tree is usually also the max probability tree
So why bother?
![Page 57: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/57.jpg)
1. Because we get the answer as a probability
2. Because this formulation allows us to use another approach to get to the best tree (MCMC – see later)
3. Also allows us to integrate over parameters instead of optimising over parameters
![Page 58: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/58.jpg)
MCMC (Markov Chain Monte Carlo)
Produces a long chain of trees/parameters sampled according to their probability
The number of times the chain visits tree X is proportional to the probability of tree X
![Page 59: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/59.jpg)
Burn-in
• Typically the chain will take some time before trees are sampled according to their probability
• Initially the probability of the sampled trees increases with time
• Programmes need to be allowed to run until the probabilities are fluctuating randomly about a constant mean
• Data generated before the chain reaches a steady state are discarded
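A toy sketch of the idea, using a Metropolis sampler over three candidate trees with known (made-up) posterior probabilities. Real software proposes changes to tree topology, branch lengths and model parameters; here the proposal simply picks any tree at random, and all names are illustrative.

```python
import random


def metropolis(posterior, steps, burn_in, seed=0):
    """Visit frequencies of each tree after discarding the burn-in."""
    rng = random.Random(seed)
    trees = list(posterior)
    current = trees[0]
    visits = {tree: 0 for tree in trees}
    for step in range(steps):
        proposal = rng.choice(trees)   # symmetric proposal
        # Accept with probability min(1, P(proposal) / P(current)).
        if rng.random() < posterior[proposal] / posterior[current]:
            current = proposal
        if step >= burn_in:            # samples before this are discarded
            visits[current] += 1
    kept = steps - burn_in
    return {tree: count / kept for tree, count in visits.items()}


# Only ratios of probabilities are used, so unnormalised values would also work.
target = {"T1": 0.6, "T2": 0.3, "T3": 0.1}
frequencies = metropolis(target, steps=60000, burn_in=10000)
print(frequencies)
```

The visit frequencies come out close to the target probabilities, which is the sense in which the chain samples trees according to their probability.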
![Page 60: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/60.jpg)
Bayesian methods can be
- relatively fast
- easily interpretable
- often very accurate
![Page 61: A brief introduction to phylogenetics. Definition: The number of evolutionary events (usually nucleotide substitutions) that have occurred since two sequences.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649efd5503460f94c11325/html5/thumbnails/61.jpg)
But
- sometimes overestimate confidence
- difficult to be sure of convergence (less of a problem with more recent software versions)
=> difficult to decide how long to run the chain
Software for Bayesian phylogenetics: MrBayes