Parsimony and searching tree-space
The basic idea
To infer trees we want to find clades (groups) that are supported by synapomorphies (shared derived traits).
More simply put, we assume that if species are similar it is usually due to common descent rather than due to chance.
Sometimes the data agree
[Figure: a tree drawn along a time axis, descending from ACCGCTTA, with tip sequences ACCCCTTA, ACCCCATA, ACTGCTTA and ACTGCTAA.]
Sometimes not
[Figure: a tree drawn along a time axis, descending from ACCGCTTA, with tip sequences ACCCCTTC, ACCCCATA, ACTGCTTC and ACTGCTAA.]
Homoplasy
When two or more characters cannot fit on the same tree without at least one of them undergoing a parallel change or a reversal, it is called homoplasy.
[Figure: a tree drawn along a time axis, descending from ACCGCTTA, with tip sequences ACCCCTTC, ACCCCATA, ACTGCTTC and ACTGCTAA.]
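A small Python sketch of how such a conflict can be detected between a pair of sites. It assumes each site has only two states, in which case two sites can share a tree without homoplasy exactly when fewer than four state combinations occur across the taxa. The function name `compatible` is hypothetical; the sequences are the four tip sequences from the slides.

```python
def compatible(char_a, char_b):
    """Pairwise check for two two-state characters (an assumption of this sketch).

    The characters can both fit one tree without homoplasy only if fewer
    than four distinct combinations of states are observed across the taxa.
    """
    if len(set(char_a)) > 2 or len(set(char_b)) > 2:
        raise ValueError("this sketch only handles two-state characters")
    return len(set(zip(char_a, char_b))) < 4


seqs = ["ACCCCTTC", "ACCCCATA", "ACTGCTTC", "ACTGCTAA"]
# One string per alignment column (site), e.g. columns[2] is site 3.
columns = ["".join(s[i] for s in seqs) for i in range(len(seqs[0]))]

print(compatible(columns[2], columns[3]))  # sites 3 and 4 agree
print(compatible(columns[2], columns[7]))  # sites 3 and 8 conflict
```

Sites 3 and 4 split the taxa the same way, while sites 3 and 8 show all four combinations, so at least one of them must change twice on any tree.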
How can we choose the best tree?
To decide which tree is best we can use an optimality criterion.
Parsimony is one such criterion. It chooses the tree which requires the fewest mutations to explain the data.
The Principle of Parsimony is the general scientific principle that prefers the simpler of two competing explanations.
S1 ACCCCTTC
S2 ACCCCATA
S3 ACTGCTTC
S4 ACTGCTAA

Two candidate trees: (1,2),(3,4) and (1,3),(2,4).
[Figures: the states at each site mapped onto both trees, one site per slide.]

Mutations required at each site:

Site          1 2 3 4 5 6 7 8   Total
(1,2),(3,4)   0 0 1 1 0 1 1 2   6
(1,3),(2,4)   0 0 2 2 0 1 1 1   7
According to the parsimony optimality criterion we should prefer the tree (1,2),(3,4) over the tree (1,3),(2,4) as it requires the fewest mutations.
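The site-by-site counts can be checked by brute force. The sketch below is only an illustration: for each site it tries every pair of states at the two internal nodes of a four-taxon tree and keeps the assignment needing the fewest changes. The names `site_cost` and `tree_costs` are hypothetical.

```python
from itertools import product

seqs = {1: "ACCCCTTC", 2: "ACCCCATA", 3: "ACTGCTTC", 4: "ACTGCTAA"}


def site_cost(left, right):
    """Fewest changes at one site on the tree (left pair),(right pair).

    x and y are candidate states for the two internal nodes; the cost counts
    changes on the four tip branches plus the internal branch.
    """
    return min(
        sum(x != s for s in left) + sum(y != s for s in right) + (x != y)
        for x, y in product("ACGT", repeat=2)
    )


def tree_costs(pair_a, pair_b):
    """Per-site costs for the tree that groups pair_a and pair_b."""
    n_sites = len(seqs[1])
    return [
        site_cost([seqs[t][i] for t in pair_a], [seqs[t][i] for t in pair_b])
        for i in range(n_sites)
    ]


costs_12_34 = tree_costs((1, 2), (3, 4))
costs_13_24 = tree_costs((1, 3), (2, 4))
print(costs_12_34, sum(costs_12_34))  # [0, 0, 1, 1, 0, 1, 1, 2] 6
print(costs_13_24, sum(costs_13_24))  # [0, 0, 2, 2, 0, 1, 1, 1] 7
```

Trying all internal states is only practical for tiny trees like these.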
Maximum Parsimony
The parsimony criterion tries to minimise the number of mutations required to explain the data.
The “Small Parsimony Problem” is to compute the number of mutations required on a given tree.
For small examples it is straightforward to see how many mutations are needed.
[Figure: two small trees on Cat, Dog, Rat and Mouse, with tips and internal nodes labelled A or G.]
The Fitch algorithm
For larger examples we need an algorithm to solve the small parsimony problem.
[Figure: a tree with eight tips, a to h.]
States at the site: a = A, b = A, c = C, d = C, e = G, f = G, g = T, h = A
Label the tips of the tree with the observed state at the site.
Pick an arbitrary root to work towards.
Work from the tips of the tree towards the root. Label each node with the intersection of the state sets of its child nodes.
If the intersection is empty, label the node with the union instead and add one to the cost.
[Figure: the same tree with internal nodes labelled A, {A,T}, A, {C,G}, C and {A,C,G}.]
Cost 4
Fitch continued…
The Fitch algorithm also has a second phase that allocates states to the internal nodes but it does not affect the cost.
To find the Fitch cost of an alignment for a particular tree we just sum the Fitch costs of all the sites.
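A minimal Python sketch of the bottom-up phase and of summing over sites. The nested-tuple tree encoding and the names `fitch_site` and `fitch_cost` are assumptions made for this illustration. The data are S1 to S4 from the earlier four-taxon example, with each tree rooted on its internal branch.

```python
def fitch_site(node, states):
    """Return (state set, cost) for the subtree at `node` at a single site.

    A tip is a taxon name (str); an internal node is a pair (left, right).
    `states` maps each taxon name to its observed state at this site.
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # Non-empty intersection: no extra change is needed here.
        return common, left_cost + right_cost
    # Empty intersection: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def fitch_cost(tree, alignment):
    """Sum the single-site Fitch costs over every column of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        states = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total


alignment = {"S1": "ACCCCTTC", "S2": "ACCCCATA", "S3": "ACTGCTTC", "S4": "ACTGCTAA"}
tree_a = (("S1", "S2"), ("S3", "S4"))
tree_b = (("S1", "S3"), ("S2", "S4"))
print(fitch_cost(tree_a, alignment), fitch_cost(tree_b, alignment))  # 6 7
```

Each node is visited once per site, which is why the cost of a given tree can be found in time linear in the size of the tree.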
The “large parsimony problem”
The small parsimony problem (finding the score of a given tree) can be solved in time linear in the size of the tree.
The large parsimony problem is to find the tree with minimum score.
It is known to be NP-Hard.
How many trees are there?
#species   #unrooted binary tip-labelled trees
4          3
5          3*5 = 15
6          3*5*7 = 105
7          3*5*7*9 = 945
10         2,027,025
20         2.2*10^20
n          (2n-5)!!
An exact search for the best tree, where each tree is evaluated according to some optimality criterion such as parsimony, quickly becomes intractable as the number of species increases.
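The counts in the table follow from the double factorial (2n-5)!!, the product of the odd numbers from 3 up to 2n-5. A short Python sketch, with the hypothetical function name `num_unrooted_trees`:

```python
def num_unrooted_trees(n):
    """Number of unrooted binary tip-labelled trees on n >= 3 species: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 species")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd numbers 3, 5, ..., 2n-5
        count *= k
    return count


for n in (4, 5, 6, 7, 10, 20):
    print(n, num_unrooted_trees(n))
```

For 20 species the result is 221,643,095,476,699,771,875, which is the 2.2*10^20 quoted in the table.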
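Counting trees
[Figure: trees built by adding one tip at a time. The single tree on tips 1, 2 and 3 has three branches, so tip 4 can be attached in 3 ways; each resulting tree has five branches, so tip 5 can be attached in 5 ways.]
1 x 3 = 3
1 x 3 x 5 = 15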
Search strategies
Exact search: possible for small n only
Branch and Bound: up to ~20 taxa
Local Search (Heuristics): pick a good starting tree and use moves within a “neighbourhood” to find a better tree.
Meta-heuristics: Genetic algorithms, Simulated annealing, The ratchet
Exact searches
For a small number of taxa (n<=12) it is possible to compute the score of every tree.
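A sketch of such an exhaustive search in Python, under stated assumptions: trees are nested tuples, every unrooted tree is represented by rooting it next to the first taxon, and each tree is scored with a Fitch-style count. All function names are hypothetical, and the data are S1 to S4 from the earlier example.

```python
def insertions(tree, taxon):
    """Yield every rooted tree made by attaching `taxon` to one branch of `tree`."""
    yield (tree, taxon)  # attach on the branch above the current root
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def all_rooted_trees(taxa):
    """All rooted binary trees on `taxa`, built by adding one taxon at a time."""
    trees = [taxa[0]]
    for taxon in taxa[1:]:
        trees = [new for tree in trees for new in insertions(tree, taxon)]
    return trees


def fitch_site(node, states):
    """Return (state set, cost) for one site: intersection if non-empty, else union + 1."""
    if isinstance(node, str):
        return {states[node]}, 0
    (ls, lc), (rs, rc) = fitch_site(node[0], states), fitch_site(node[1], states)
    return (ls & rs, lc + rc) if ls & rs else (ls | rs, lc + rc + 1)


def score(tree, alignment):
    """Parsimony score of `tree`: the sum of the per-site costs."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {t: s[i] for t, s in alignment.items()})[1]
        for i in range(n_sites)
    )


def exhaustive_search(alignment):
    """Score every tree and return (best score, list of best trees)."""
    first, *rest = sorted(alignment)
    best_score, best_trees = None, []
    for subtree in all_rooted_trees(rest):
        tree = (first, subtree)  # rooting next to the first taxon
        s = score(tree, alignment)
        if best_score is None or s < best_score:
            best_score, best_trees = s, [tree]
        elif s == best_score:
            best_trees.append(tree)
    return best_score, best_trees


alignment = {"S1": "ACCCCTTC", "S2": "ACCCCATA", "S3": "ACTGCTTC", "S4": "ACTGCTAA"}
best_score, best_trees = exhaustive_search(alignment)
print(best_score, best_trees)  # 6 [('S1', ('S2', ('S3', 'S4')))]
```

The tree found groups S3 with S4 and therefore S1 with S2, which is the tree (1,2),(3,4). Because the number of trees grows as (2n-5)!!, this approach is only usable for a handful of taxa.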
Branch and Bound searches also guarantee to find the optimal solution but use some clever rules to avoid having to check all trees. They may be effective for up to 25 taxa.
http://evolution.gs.washington.edu/gs541/2005/lecture25.pdf
No need to evaluate this whole branch of the search tree, as no tree can have a score better than 9
The problem of local optima
Nearest Neighbor Interchange (NNI)
Subtree Pruning and Regrafting (SPR)
Tree Bisection and Reconnection (TBR)