Lecture 8 – Searching Tree Space. The Search Tree.

Transcript of Lecture 8 – Searching Tree Space. The Search Tree.

The Search Tree

A. Nearest-neighbor interchange (NNI)

There are 2(n – 3) NNI rearrangements for any unrooted binary tree of n taxa: each of its n – 3 internal branches can be swapped in two ways.
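
The 2(n – 3) count can be checked by brute force. This is a minimal sketch rather than code from any phylogenetics package: the tree is an adjacency dictionary, the helper names (`caterpillar`, `internal_branches`, `splits`, `nni_neighbors`) are hypothetical, and topologies are compared through their sets of bipartitions (splits), which identify an unrooted tree.

```python
def caterpillar(n):
    """Hypothetical helper: an unrooted binary 'caterpillar' tree on n >= 4 taxa.
    Taxa are the ints 0..n-1; internal nodes are the ints n..2n-3."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    spine = list(range(n, 2 * n - 2))            # the n - 2 internal nodes
    link(spine[0], 0)
    link(spine[0], 1)
    for k in range(1, len(spine)):
        link(spine[k - 1], spine[k])
        link(spine[k], k + 1)
    link(spine[-1], n - 1)
    return adj

def internal_branches(adj, n):
    """Each branch joining two internal nodes, listed once."""
    return [(u, v) for u in adj for v in adj[u] if n <= u < v]

def splits(adj, n):
    """The tree's nontrivial bipartitions, each stored as the side lacking taxon 0."""
    result = set()
    for u, v in internal_branches(adj, n):
        side, stack, seen = set(), [v], {u, v}
        while stack:                             # collect taxa reachable from v without crossing u
            x = stack.pop()
            if x < n:
                side.add(x)
            for y in adj[x] - seen:
                seen.add(y)
                stack.append(y)
        if 0 in side:
            side = set(range(n)) - side
        result.add(frozenset(side))
    return frozenset(result)

def nni_neighbors(adj, n):
    """Yield the two NNI rearrangements across every internal branch u-v:
    one subtree b hanging off u trades places with each subtree c hanging off v."""
    for u, v in internal_branches(adj, n):
        b = min(adj[u] - {v})
        for c in sorted(adj[v] - {u}):
            new = {x: set(nbrs) for x, nbrs in adj.items()}
            new[u] ^= {b, c}                     # u drops b, gains c
            new[v] ^= {b, c}                     # v drops c, gains b
            new[b] ^= {u, v}                     # b now hangs off v
            new[c] ^= {u, v}                     # c now hangs off u
            yield new

n = 6
tree = caterpillar(n)
neighbors = {splits(t, n) for t in nni_neighbors(tree, n)}
print(len(neighbors), 2 * (n - 3))               # 6 6
print(splits(tree, n) in neighbors)              # False
```

Each NNI replaces exactly one split and keeps the rest, which is why all 2(n – 3) neighbors come out distinct from each other and from the starting tree.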

B. Subtree Pruning-Regrafting (SPR)

There are 4(n – 3)(n – 2) SPR rearrangements for a tree of n taxa, although not all of them produce distinct trees.
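
To see how these neighborhoods compare with the whole of tree space, the arithmetic can be sketched in Python. The total number of unrooted binary trees, (2n – 5)!!, is a standard count that is not stated on the slide; the function names are illustrative.

```python
from math import prod

def nni_count(n):
    """NNI rearrangements of an unrooted binary tree with n taxa."""
    return 2 * (n - 3)

def spr_count(n):
    """SPR rearrangements as counted on the slide (not all give distinct trees)."""
    return 4 * (n - 3) * (n - 2)

def unrooted_trees(n):
    """Number of unrooted binary topologies: (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5)."""
    return prod(range(1, 2 * n - 4, 2))

for n in (10, 26, 50):
    print(f"n={n:>3}  NNI={nni_count(n):>5}  SPR={spr_count(n):>6}  trees={unrooted_trees(n):.2e}")
# for n = 26: NNI = 46, SPR = 2208
```

Even the SPR neighborhood is a vanishingly small fraction of all topologies, which is why the choice of swapping method and starting trees matters.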

C. Tree bisection-reconnection (TBR)

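The tree is bisected at every branch in turn, and the two resulting subtrees are reconnected in all possible ways (each branch of one subtree joined to each branch of the other). It's not possible to generalize how many TBR rearrangements can be made for a tree of a given size (as we could with NNI and SPR), because the number also depends on the shape of the tree, but TBR swapping searches tree space more thoroughly than SPR or NNI.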
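
A rough Python sketch shows why no formula in n alone works: counting TBR reconnections branch by branch gives different totals for two trees with the same number of taxa. It assumes that cutting a branch leaves subtrees of k and n – k taxa, and that a subtree of k ≥ 2 taxa offers 2k – 3 attachment branches (a lone taxon offers one point). It counts reconnections, not distinct trees, and the split sizes for the two example trees were worked out by hand.

```python
def attachment_points(k):
    """Places to reconnect a subtree with k taxa: its 2k - 3 branches, or the taxon itself if k == 1."""
    return max(1, 2 * k - 3)

def tbr_reconnections(n, split_sizes):
    """Count TBR reconnections for an unrooted binary tree on n taxa.

    split_sizes gives, for each of the tree's 2n - 3 branches, the number of
    taxa on one side of that branch. For every bisection, each attachment
    point on one side is paired with each on the other, minus the pairing
    that rebuilds the original tree. Trees reached from different
    bisections can coincide, so this is not a count of distinct trees.
    """
    if len(split_sizes) != 2 * n - 3:
        raise ValueError("an unrooted binary tree on n taxa has 2n - 3 branches")
    return sum(attachment_points(k) * attachment_points(n - k) - 1 for k in split_sizes)

n = 8
pendant = [1] * n                                # the 8 terminal branches
caterpillar = pendant + [2, 3, 4, 5, 6]          # ((((((A,B),C),D),E),F),G,H)
balanced = pendant + [2, 2, 2, 2, 4]             # (A,B) (C,D) | (E,F) (G,H)
print(tbr_reconnections(n, caterpillar))         # 160
print(tbr_reconnections(n, balanced))            # 136
```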

How greedy should we be?

Take a 26-taxon data set and, first, let's be very greedy.

Ignore ties both in building the starting tree and in swapping.

NNI, examine 42 trees

SPR, examine 2072 trees

TBR, examine 5816 trees

Less greedy: save all equally optimal trees at each step.

NNI, examine 140 trees

SPR, examine 6212 trees

TBR, examine 16,604 trees
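
The difference between the two settings can be sketched with a toy hill climber in Python. Bit strings stand in for trees, single bit flips stand in for NNI/SPR/TBR swaps, `toy_score` is a hypothetical scoring function with a large plateau of equally optimal states, and the first-improvement strategy is an assumption of this sketch rather than a claim about any particular program.

```python
def hill_climb(start, neighbors, score, save_ties=False):
    """Minimize score by swapping (first-improvement strategy).

    save_ties=False: keep a single best state and ignore ties (very greedy).
    save_ties=True:  keep every equally optimal state and swap on each.
    Returns (best score, list of best states, number of states examined).
    """
    best_score, best = score(start), [start]
    done = set()                        # states whose neighborhoods were examined
    examined = 0
    while True:
        todo = [s for s in best if s not in done]
        if not todo:
            return best_score, best, examined
        current = todo[0]
        done.add(current)
        for nb in neighbors(current):
            examined += 1
            s = score(nb)
            if s < best_score:          # improvement: restart from the new state
                best_score, best = s, [nb]
                break
            if save_ties and s == best_score and nb not in best:
                best.append(nb)

def flips(bits):
    """Toy neighborhood: flip one bit at a time."""
    for i in range(len(bits)):
        yield bits[:i] + (1 - bits[i],) + bits[i + 1:]

def toy_score(bits):
    """Toy 'tree length': only the first two bits matter, so the optimum is a 16-state plateau."""
    return bits[0] + bits[1]

start = (1, 1, 0, 0, 0, 0)
score_g, trees_g, n_g = hill_climb(start, flips, toy_score)
print(score_g, len(trees_g), n_g)       # 0 1 9
score_t, trees_t, n_t = hill_climb(start, flips, toy_score, save_ties=True)
print(score_t, len(trees_t), n_t)       # 0 16 99
```

Both runs reach the same score, but saving ties examines many more states and returns every equally optimal one, mirroring the jump in trees examined on the slide.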

Random Addition Sequence and Tree Islands

So, in the example above, using the least greedy settings and starting trees generated by 100 random addition sequences, we'll look at 341,355 different trees.

Island  Size  First tree  Last tree  Score  First replicate  Times hit
---------------------------------------------------------------------
1       2     1           2          278    1                99
2       1     -           -          279    97               1
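
The bookkeeping behind a table like this can be sketched in Python. This is a toy under stated assumptions: random bit strings stand in for starting trees from random addition sequences, single bit flips stand in for TBR, `toy_score` is a hypothetical landscape with two basins, and each island is reduced to the single optimum a search ends on (a real island is a set of equally good trees connected by swapping).

```python
import random

def toy_score(bits):
    """Hypothetical landscape with two basins: all 0s (score 0) and all 1s (score 1)."""
    ones = sum(bits)
    return min(ones, len(bits) - ones + 1)

def climb(state):
    """First-improvement descent by single bit flips (stand-in for TBR swapping)."""
    improved = True
    while improved:
        improved = False
        for i in range(len(state)):
            nb = state[:i] + (1 - state[i],) + state[i + 1:]
            if toy_score(nb) < toy_score(state):
                state, improved = nb, True
                break
    return state

def island_table(replicates=100, length=8, seed=1):
    """Run one search per replicate from a random start and tally where each ends."""
    rng = random.Random(seed)
    islands = {}                        # optimum reached -> [score, first replicate, times hit]
    for rep in range(1, replicates + 1):
        start = tuple(rng.randint(0, 1) for _ in range(length))  # stand-in for a random addition sequence
        end = climb(start)
        if end not in islands:
            islands[end] = [toy_score(end), rep, 0]
        islands[end][2] += 1
    return sorted(islands.values())

for island, (score, first, hits) in enumerate(island_table(), start=1):
    print(f"island {island}: score {score}, first hit in replicate {first}, hit {hits} times")
```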

Transforming Tree Space

We may be better off spending less effort searching any one island and more effort searching for multiple islands.

Parsimony Ratchet (Nixon, 1999. Cladistics 15: 407)

Alternate searches using the real data with searches on a perturbed data set.

Get a starting tree by stepwise addition from the real data.

Reweight a random subset (20–25%) of the characters: this transforms tree space.

Hill climb from the starting tree via greedy TBR with the perturbed data.

If a better tree is found, use that tree to start TBR using the original data.

This is iterated a couple hundred times.
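
The ratchet loop can be sketched in Python on a toy problem. Everything here is an assumption for illustration: the "characters" are hypothetical pairwise constraints on a bit string, `hill_climb` by single bit flips stands in for greedy TBR, a fixed state replaces the stepwise-addition starting tree, doubling the weight of the sampled characters is one assumed way to reweight, and the keep-only-if-better test is applied after the real-data climb.

```python
import random

def make_characters(n_bits, n_chars, seed=42):
    """Hypothetical toy characters: each asks two bits to be equal (True) or different (False)."""
    rng = random.Random(seed)
    return [(rng.randrange(n_bits), rng.randrange(n_bits), rng.random() < 0.5)
            for _ in range(n_chars)]

def score(state, chars, weights):
    """Weighted count of violated characters; lower is better, like tree length."""
    return sum(w for (i, j, same), w in zip(chars, weights)
               if (state[i] == state[j]) != same)

def hill_climb(state, chars, weights):
    """First-improvement search by single bit flips (stand-in for greedy TBR)."""
    current = score(state, chars, weights)
    improved = True
    while improved:
        improved = False
        for k in range(len(state)):
            nb = state[:k] + (1 - state[k],) + state[k + 1:]
            s = score(nb, chars, weights)
            if s < current:
                state, current, improved = nb, s, True
                break
    return state

def ratchet(start, chars, iterations=200, frac=0.25, seed=0):
    rng = random.Random(seed)
    unit = [1] * len(chars)
    best = hill_climb(start, chars, unit)                # search with the real data
    for _ in range(iterations):
        weights = unit[:]
        for j in rng.sample(range(len(chars)), max(1, int(frac * len(chars)))):
            weights[j] = 2                               # assumed reweighting: double the weight
        perturbed = hill_climb(best, chars, weights)     # climb on the transformed landscape
        candidate = hill_climb(perturbed, chars, unit)   # then climb again with the real data
        if score(candidate, chars, unit) < score(best, chars, unit):
            best = candidate                             # keep it only if it is better
    return best

chars = make_characters(n_bits=12, n_chars=40)
start = (0,) * 12
unit = [1] * len(chars)
print(score(hill_climb(start, chars, unit), chars, unit), score(ratchet(start, chars), chars, unit))
```

Because the perturbed weights reshape the landscape, a tree stuck on one island can be pushed off it and then re-optimized under the real weights.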

Simulated Annealing

Designed to search a large, complex, discrete search space.

Laura Salter Kubatko was one of the first to apply it to phylogenies, as a means of estimating ML trees (Salter and Pearl, 2001. Syst. Biol. 50: 7).

It uses a Metropolis-Hastings-style acceptance rule to search tree space and, unlike pure hill climbing, permits downhill moves.

Steps:

Generate an initial state (a starting tree). Initially, a random tree was used.

Propose a stochastic change to the current state (usually a minor change). This was initially derived via a random NNI.

If the proposal improves the tree (has a better ML score), the move is accepted.

Proposals (NNIs) that degrade the tree are accepted with a small probability that shrinks the worse the proposed tree is.

Early on, the acceptance probability is high; it decreases as the search runs (the "temperature" is lowered).
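
The acceptance rule can be sketched in Python. This is a generic simulated annealing loop, not Salter and Pearl's implementation: bit strings stand in for trees, `flip_one` stands in for a random NNI, `toy_log_lik` and its `TARGET` optimum are hypothetical, and the geometric cooling schedule and its constants are assumptions.

```python
import math
import random

def anneal(state, log_lik, propose, t0=2.0, cooling=0.995, steps=5000, seed=0):
    """Simulated annealing that maximizes log_lik (schedule values are assumptions)."""
    rng = random.Random(seed)
    current = log_lik(state)
    best_state, best = state, current
    temperature = t0
    for _ in range(steps):
        candidate = propose(state, rng)
        new = log_lik(candidate)
        delta = new - current                      # > 0: the proposal is better
        # Metropolis rule: always accept improvements; accept a worse proposal
        # with probability exp(delta / T), which shrinks as delta grows more negative.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            state, current = candidate, new
            if current > best:
                best_state, best = state, current
        temperature *= cooling                     # cooling: downhill moves become rarer
    return best_state, best

TARGET = (1, 0, 1, 1, 0, 0, 1, 0, 1, 1)            # hypothetical optimum

def toy_log_lik(bits):
    return -sum(a != b for a, b in zip(bits, TARGET))

def flip_one(bits, rng):
    """Random local change (stand-in for a random NNI)."""
    i = rng.randrange(len(bits))
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]

start = (0,) * len(TARGET)
print(anneal(start, toy_log_lik, flip_one))
```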

Simulated Annealing & RAxML

Stamatakis makes use of a modified simulated annealing in RAxML.

First, he starts with a tree generated by stepwise addition using parsimony.

Second, the SA approach is used to alter the topology via SPR under ML, but only the branches involved in the swap are reoptimized (lazy SPR).

Third, Stamatakis builds proposals to alter branch lengths and model parameters that are accepted only if they improve the likelihood (i.e., this aspect of the search is entirely hill climbing).

This approach allows fairly thorough searches of tree space very quickly, which lets us estimate ML trees for very large data sets (e.g., a couple thousand taxa).
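
The third point, accepting parameter proposals only when they raise the likelihood, can be sketched for a single branch length. This is a hypothetical two-sequence JC69 example, not RAxML's optimizer: the data (30 differences over 200 sites), the proposal width and the step count are assumptions, and the result is compared with the analytic JC69 estimate.

```python
import math
import random

def jc_log_lik(t, n_sites, n_diff):
    """JC69 log-likelihood for two aligned sequences separated by branch length t
    (the constant base-frequency term is dropped)."""
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * e      # P(same base at both ends)
    p_diff = 0.25 - 0.25 * e      # P(one particular different base)
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)

def climb_branch_length(t, n_sites, n_diff, steps=5000, seed=0):
    """Propose multiplicative changes to t; keep a proposal only if it raises lnL."""
    rng = random.Random(seed)
    best = jc_log_lik(t, n_sites, n_diff)
    for _ in range(steps):
        proposal = t * math.exp(rng.uniform(-0.5, 0.5))   # stays positive
        score = jc_log_lik(proposal, n_sites, n_diff)
        if score > best:                                  # pure hill climbing
            t, best = proposal, score
    return t

# Hypothetical data: 30 differences over 200 sites.
t_hat = -0.75 * math.log(1 - (4.0 / 3.0) * (30 / 200))   # analytic JC69 estimate, about 0.167
print(climb_branch_length(0.5, 200, 30), t_hat)
```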

Genetic Algorithms: Paul Lewis (1998. Mol. Biol. Evol. 15: 277)

There are n individuals, each a tree with its own model parameters and branch lengths.

Individuals are ranked by their likelihood. The tree with the highest fitness leaves k offspring in the next generation; the other trees leave offspring in proportion to their rank.

All offspring are subject to branch-length and model mutations.

Some offspring ((n-1)/m) are also subject to random SPR mutations.
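
One generation of this scheme can be sketched in Python. It is a toy, not Lewis's program: bit strings stand in for trees with their parameters, `fitness` is a hypothetical stand-in for the log-likelihood, a one-bit flip stands in for branch-length and model mutations, the SPR mutations on a subset of offspring are left out, and "offspring proportional to rank" is interpreted as sampling parents with weights 1, 2, ... from worst to best.

```python
import random

def next_generation(pop, fitness, n_best, mutate, rng):
    """One generation of a rank-based GA (sketch of the slide's scheme).

    The fittest individual leaves n_best offspring; the remaining
    len(pop) - n_best offspring come from the other individuals, drawn with
    probability proportional to rank (worst = 1). Every offspring is mutated.
    """
    ranked = sorted(pop, key=fitness)               # worst first, best last
    best, others = ranked[-1], ranked[:-1]
    ranks = range(1, len(others) + 1)
    parents = [best] * n_best + rng.choices(others, weights=ranks, k=len(pop) - n_best)
    return [mutate(p, rng) for p in parents]

TARGET = (1, 1, 0, 1, 0, 0, 1, 1, 0, 1)             # hypothetical optimum

def fitness(bits):
    """Toy stand-in for a log-likelihood: higher is better."""
    return -sum(a != b for a, b in zip(bits, TARGET))

def mutate(bits, rng):
    """Toy mutation: flip one random bit."""
    i = rng.randrange(len(bits))
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]

rng = random.Random(0)
pop = [tuple(rng.randint(0, 1) for _ in TARGET) for _ in range(20)]
for _ in range(50):
    pop = next_generation(pop, fitness, n_best=4, mutate=mutate, rng=rng)
print(max(fitness(p) for p in pop))
```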

Genetic Algorithms

Recombination searches tree space broadly.

GARLi was written by Derrick Zwickl and modifies Lewis' GA.

Topological mutations include NNI and SPR rearrangements, plus some limited SPR (regrafting restricted to branches near the pruning point).

Starting trees are generated via stepwise addition with random addition sequences.

This approach allows thorough searches of tree space for up to a couple thousand taxa.