BALANCED MINIMUM EVOLUTION. DISTANCE BASED PHYLOGENETIC RECONSTRUCTION 1. Compute distance matrix D....

BALANCED MINIMUM EVOLUTION

DISTANCE BASED PHYLOGENETIC RECONSTRUCTION

1. Compute distance matrix D.
2. Find binary tree using just D.

Balanced Minimum Evolution (BME) is a distance-based method for building a phylogenetic tree from a distance matrix.
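
Step 1 can be sketched with the Jukes-Cantor (JC69) correction, d = -(3/4) ln(1 - 4p/3), where p is the fraction of differing sites. This is a simpler model than the HKY model used in the computational example later in these slides, swapped in only to keep the sketch short. The function names and the toy sequences are illustrative assumptions.

```python
import math
from itertools import combinations

def jc69_distance(s1, s2):
    """Jukes-Cantor distance between two aligned DNA sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only sites where both sequences have an unambiguous base.
    sites = [(a, b) for a, b in zip(s1.upper(), s2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in sites) / len(sites)
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(seqs):
    """Return D as a dict keyed by unordered taxon pairs."""
    return {frozenset((x, y)): jc69_distance(seqs[x], seqs[y])
            for x, y in combinations(seqs, 2)}

# Hypothetical toy alignment, not the data set used in these slides.
toy = {"t1": "ACGTACGT", "t2": "ACGTACGA", "t3": "ACGAACGA"}
D = distance_matrix(toy)
```

The dictionary keyed by `frozenset` pairs is one convenient way to store a symmetric matrix; a square array would work equally well.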

MINIMUM EVOLUTION PHYLOGENETIC RECONSTRUCTION

Fixed distance matrix.

Tree topology being considered.

Assign branch lengths using ME.

Sum up branch lengths (e.g., 36).

Goal: Find tree topology T with smallest sum of branch lengths (assigned by ME).

That is, minimize the sum of branch lengths over all (2n-5)!! unrooted binary tree topologies on n taxa!
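
To get a feel for how quickly (2n-5)!! grows, here is a small sketch that multiplies the odd factors 3, 5, ..., 2n-5. The function name is an illustrative assumption.

```python
def num_unrooted_binary_topologies(n):
    """Return (2n-5)!!, the number of unrooted binary topologies on n >= 3 labeled taxa."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count

for n in (4, 5, 10):
    print(n, num_unrooted_binary_topologies(n))
# prints:
# 4 3
# 5 15
# 10 2027025
```

Already at 10 taxa there are over two million topologies, which is why exhaustive search is only feasible for very small n.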

MINIMUM EVOLUTION PHYLOGENETIC RECONSTRUCTION

• Given the matrix of pairwise evolutionary distances, the ME approach estimates the length of any given tree topology and then selects the tree topology with the shortest length.

• Minimum evolution is conceptually close to character-based parsimony.

• ME complies with Occam’s principle of scientific inference, which essentially maintains that simpler explanations are preferable to more complicated ones and that ad hoc explanations should be avoided.

• Numerous variants of the ME principle exist, depending on how the branch lengths are estimated and how the tree length is calculated from these branch lengths.

MINIMUM EVOLUTION PHYLOGENETIC RECONSTRUCTION

Fixed distance matrix.

Tree topology being considered.

Assign branch lengths using ME.

Sum up branch lengths (e.g., 36).

How do we assign branch lengths to a tree topology?

LEAST SQUARES ESTIMATE (HOW TO ASSIGN BRANCH LENGTHS TO A TREE TOPOLOGY)

Least Squares

Observe red data points.

Find the blue quadratic that minimizes the sum of the squared distances from the red points to the curve.

ME analogy for least squares on trees:

Red dots → Estimated distances (D)

Blue quadratic → Binary tree

Residual/Error → Sum of branch lengths

MINIMUM EVOLUTION PHYLOGENETIC RECONSTRUCTION

Fixed distance matrix.

Tree topology being considered.

Assign branch lengths using least squares.

Sum up branch lengths (e.g., 36).

Goal: Find tree topology T with smallest sum of branch lengths (assigned by ME).

That is, minimize the sum of branch lengths over all (2n-5)!! unrooted binary tree topologies on n taxa!

LEAST SQUARES ASSIGNMENT OF BRANCH LENGTHS

• If distance estimates are independent with the same variance, use ordinary least squares (OLS).

• If distance estimates are independent with different variances, use weighted least squares (WLS). (BME is a special case of WLS!)

• It is well known that distance estimates obtained from sequences do not have the same variance, because the largest distances are much more variable than the shortest ones (Fitch and Margoliash, 1967), and that they are mutually dependent when they share a common history (or path) in the true phylogeny (Nei and Jin, 1989).

• Thus ordinary least squares fits the features of evolutionary distance data poorly.
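
As a point of comparison, here is a minimal sketch of ordinary least squares branch-length assignment for one fixed quartet topology ((A,B),(C,D)). It builds the pair-by-edge path matrix and solves the normal equations in plain Python. The distances are hypothetical (chosen to be exactly tree-like), and all names are illustrative assumptions.

```python
from itertools import combinations

# Quartet ((A,B),(C,D)): four pendant edges plus one internal edge "int".
TAXA = ["A", "B", "C", "D"]
EDGES = ["A", "B", "C", "D", "int"]

def path_edges(x, y):
    """Edges on the path between taxa x and y in ((A,B),(C,D))."""
    same_side = {x, y} in ({"A", "B"}, {"C", "D"})
    return [x, y] if same_side else [x, y, "int"]

def solve(M, v):
    """Solve the square system M z = v by Gauss-Jordan elimination."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ols_branch_lengths(d):
    """Minimize the sum over pairs of (d_xy - path length in the tree)^2."""
    pairs = list(combinations(TAXA, 2))
    A = [[1.0 if e in path_edges(x, y) else 0.0 for e in EDGES] for x, y in pairs]
    y = [d[p] for p in pairs]
    m = len(EDGES)
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(m)] for i in range(m)]
    Aty = [sum(row[i] * yk for row, yk in zip(A, y)) for i in range(m)]
    return dict(zip(EDGES, solve(AtA, Aty)))

# Hypothetical distances generated from pendant lengths 1, 2, 3, 4 and internal length 5.
d = {("A", "B"): 3.0, ("A", "C"): 9.0, ("A", "D"): 10.0,
     ("B", "C"): 10.0, ("B", "D"): 11.0, ("C", "D"): 7.0}
lengths = ols_branch_lengths(d)
tree_length = sum(lengths.values())
```

Because the example distances fit the topology exactly, the fit recovers the generating branch lengths and a tree length of 15, up to floating-point rounding. A weighted fit would scale each row of the system by the inverse variance of its distance estimate.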

BALANCED MINIMUM EVOLUTION

• In BME, sibling subtrees have equal weight, as opposed to the standard unweighted OLS, where all taxa have the same weight and thus the weight of a subtree is equal to the number of its taxa.

• BME is consistent!

• BME is NP-Hard [W. Day (87)].

• BME outperforms Neighbor Joining, BIONJ, WEIGHBOR and FITCH [Desper, Gascuel 2002].

• FastME (software and web version) is a heuristic that searches for the BME tree using NNI and SPR moves. As a heuristic, it is not guaranteed to find the optimal tree.

WHY IS IT CALLED “BALANCED”?

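$\delta^T_{A|B}$ is the balanced distance between taxa in A and B in tree T.

If A = {a} and B = {b} are single taxa: $\delta^T_{a|b} = d_{ab}$ = distance estimate.

If B is composed of two subtrees B1 and B2: $\delta^T_{A|B} = \tfrac{1}{2}\,\delta^T_{A|B_1} + \tfrac{1}{2}\,\delta^T_{A|B_2}$

The two sibling subtrees get equal weight 1/2 whatever their sizes, which is why the method is called balanced. (Unweighted OLS would weight them by |B1|/|B| and |B2|/|B|.)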
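
The recursion behind the balanced distance can be sketched as follows: when a subtree splits into two sibling subtrees, each sibling contributes with weight 1/2, regardless of how many taxa it contains. Subtrees are written here as nested pairs with taxon names at the leaves. This representation, the function names, and the example distances are illustrative assumptions.

```python
from itertools import product

def leaves(subtree):
    """List the taxa below a subtree given as nested pairs."""
    if isinstance(subtree, tuple):
        left, right = subtree
        return leaves(left) + leaves(right)
    return [subtree]

def balanced_avg(A, B, d):
    """Balanced distance between subtrees A and B: siblings weigh 1/2 each."""
    if isinstance(B, tuple):
        B1, B2 = B
        return 0.5 * balanced_avg(A, B1, d) + 0.5 * balanced_avg(A, B2, d)
    if isinstance(A, tuple):
        return balanced_avg(B, A, d)  # B is a single taxon here, so split A instead
    return d[frozenset((A, B))]       # two single taxa: the distance estimate

def unweighted_avg(A, B, d):
    """Unweighted average: every taxon pair weighs the same."""
    pairs = list(product(leaves(A), leaves(B)))
    return sum(d[frozenset(p)] for p in pairs) / len(pairs)

# Hypothetical distance estimates from taxon a to taxa b, c and e.
d = {frozenset(("a", "b")): 4.0,
     frozenset(("a", "c")): 8.0,
     frozenset(("a", "e")): 12.0}
A, B = "a", ("b", ("c", "e"))
print(balanced_avg(A, B, d))    # 7.0
print(unweighted_avg(A, B, d))  # 8.0
```

In the balanced version, taxon b alone carries as much weight as c and e together. In the unweighted version, all three taxa count equally.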

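PAUPLIN’S FORMULA (SHORTCUT FOR BME!)

D is the distance matrix. T is the tree topology considered.

$\ell(T) = \sum_{i<j} 2^{1-\tau_{ij}} \, d_{ij}$ is the sum of branch lengths assigned by BME, where $d_{ij}$ is the entry of D for taxa i and j and $\tau_{ij}$ is the number of edges on the path between i and j in T.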

BME VERSION 2.0 (PAUPLIN’S FORMULA)

Instead of assigning branch lengths to tree topology T using weighted least squares then summing edge lengths, cut to the chase and use Pauplin’s formula!

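Given distance matrix D, find the binary tree T with the smallest total branch length: $\min_T \sum_{i<j} 2^{1-\tau_{ij}} \, d_{ij}$, where $\tau_{ij}$ is the number of edges on the path between taxa i and j in T.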
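
A minimal sketch of this search for four taxa, where only three topologies exist: each pair of taxa is weighted by 2^(1 - number of edges between them), and the topology with the smallest total is kept. The edge-list tree representation, the function names, and the distances are illustrative assumptions. For realistic numbers of taxa, exhaustive enumeration is not feasible and a heuristic such as FastME is used instead.

```python
from collections import deque
from itertools import combinations

def edge_counts_from(edges, start):
    """Breadth-first search: number of edges from `start` to every node."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def bme_length(edges, taxa, d):
    """Tree length from Pauplin's formula: sum of 2^(1 - tau_ij) * d_ij over pairs."""
    total = 0.0
    for i, j in combinations(taxa, 2):
        tau = edge_counts_from(edges, i)[j]
        total += 2.0 ** (1 - tau) * d[frozenset((i, j))]
    return total

def quartet_edges(p, q, r, s):
    """Edge list of the quartet topology (p,q)|(r,s) with internal nodes u and v."""
    return [(p, "u"), (q, "u"), ("u", "v"), (r, "v"), (s, "v")]

# Hypothetical distances, exactly tree-like on ((A,B),(C,D)).
raw = {("A", "B"): 3.0, ("A", "C"): 9.0, ("A", "D"): 10.0,
       ("B", "C"): 10.0, ("B", "D"): 11.0, ("C", "D"): 7.0}
d = {frozenset(pair): value for pair, value in raw.items()}
taxa = ["A", "B", "C", "D"]

candidates = {
    "AB|CD": quartet_edges("A", "B", "C", "D"),
    "AC|BD": quartet_edges("A", "C", "B", "D"),
    "AD|BC": quartet_edges("A", "D", "B", "C"),
}
scores = {name: bme_length(edges, taxa, d) for name, edges in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

With these distances the generating topology AB|CD scores 15.0 and the other two score 17.5, so AB|CD is selected. No branch lengths were fitted along the way.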

EXERCISE

Which tree is BME optimal? Why?

FASTME ON THE WEB

http://www.atgc-montpellier.fr/fastme/

• Submit distance matrix in Phylip format.

• Initial tree: OLS_GME, balanced_GME, NJ or BIONJ.

• Searches for the optimal tree using moves: OLS_NNI or balanced_NNI.

• Enter email and wait for results!

• Self-contained executable available.
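
For the first step of that list, a distance matrix can be written out as a square Phylip matrix: a first line with the number of taxa, then one row per taxon with its name followed by its distances. The sketch below pads names to 10 characters, as the strict Phylip layout expects. The function name, the 6-decimal formatting, and the example values are assumptions, so check the FastME documentation for the exact variants it accepts.

```python
def to_phylip(taxa, d):
    """Format a symmetric distance dict (keyed by frozenset pairs) as a square Phylip matrix."""
    lines = [str(len(taxa))]
    for x in taxa:
        values = [0.0 if x == y else d[frozenset((x, y))] for y in taxa]
        row = " ".join(f"{v:.6f}" for v in values)
        lines.append(f"{x[:10]:<10} {row}")  # name truncated/padded to 10 characters
    return "\n".join(lines) + "\n"

# Hypothetical three-taxon example.
d = {frozenset(("A", "B")): 3.0,
     frozenset(("A", "C")): 9.0,
     frozenset(("B", "C")): 10.0}
text = to_phylip(["A", "B", "C"], d)
print(text)
```

The resulting text can be saved to a file and submitted on the web form or passed to the standalone executable.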

COMPUTATIONAL EXAMPLE

Download sequence at: http://dl.dropbox.com/u/623333/BME%20Example/GeneSeq8taxa.nex

Calculate distance matrix (use HKY): http://bioweb2.pasteur.fr/phylogeny/intro-en.html

Compute BME tree:

http://www.atgc-montpellier.fr/fastme/

REFERENCES

"Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle." Desper R., Gascuel O., Journal of Computational Biology. 2002 9(5):687-705.

"Theoretical foundation of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least-squares tree fitting." Desper R., Gascuel O., Molecular Biology and Evolution. 2004 21(3):587-598.

"Getting a Tree Fast: Neighbor Joining, FastME, and Distance-Based Methods." Desper R., Gascuel O., Current Protocols in Bioinformatics. 2006 6.3.1-6.3.28. John Wiley & Sons.

"Neighbor-Joining Revealed." Gascuel O., Steel M., Molecular Biology and Evolution. 2006 23(11):1997-2000.