A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees José Augusto Amgarten Quitzau...

A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

José Augusto Amgarten Quitzau, João Meidanis

Scylla Bioinformatics, Brazil; University of Campinas, Brazil

Phylogeny reconstruction methods

Phylogeny reconstruction methods aim at inferring the phylogenetic tree that best describes the evolutionary history for a set of taxa.

Which tree to choose?

“The field of systematics has been in considerable turmoil as various investigators developed different methods of classification and argued their merits. I guarantee you that no one method or view has all the good points.”

Walter M. Fitch – 1984

Consensus as tree constructor

Consensus trees have been used traditionally in tree comparison and calculation of bootstrap values

We propose the use of consensus as a tree constructor

It can be efficiently implemented as long as we keep trees fully resolved

Splits

Every edge in a phylogenetic tree divides the leaves into two subgroups.

Each such pair of subgroups is a split of the tree.
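To make the notion of a split concrete, here is a minimal Python sketch that lists the splits of a small tree. The nested-tuple encoding of the tree and the helper names `leaf_set` and `splits` are assumptions made for illustration; they do not come from the slides.

```python
def leaf_set(tree):
    """Return the frozenset of leaf labels below a nested-tuple tree node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return all splits of the tree, each as a frozenset of its two sides."""
    all_leaves = leaf_set(tree)
    found = set()

    def visit(node):
        below = leaf_set(node)
        # The edge above this node separates `below` from the remaining leaves.
        if below != all_leaves:
            found.add(frozenset([below, all_leaves - below]))
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return found


# Hypothetical 8-leaf tree, written with an arbitrary root on the ABCD | EFGH edge.
example_tree = ((("A", "B"), ("C", "D")), ((("E", "F"), "G"), "H"))
example_splits = splits(example_tree)
print(len(example_splits))  # 13, i.e. 2n - 3 edges for n = 8 leaves
```

Because both children of the arbitrary root describe the same edge of the unrooted tree, storing each split as an unordered pair keeps it from being counted twice.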

Tree weight

Our method assigns a weight to each tree and takes the one with maximum weight

Let the frequency of a split in a collection of trees be the number of trees that contain the split divided by the total number of trees in the collection

Let the weight of an unrooted phylogenetic tree be the product of the frequencies of its splits
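The two definitions above can be sketched in a few lines of Python. Each tree is represented directly by its set of splits; the helper names `split`, `frequency`, and `weight`, and the three four-leaf input trees, are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from fractions import Fraction
from math import prod


def split(side, leaves):
    """Build a split as an unordered pair of leaf sets."""
    side = frozenset(side)
    return frozenset([side, frozenset(leaves) - side])


def frequency(s, collection):
    """Fraction of trees in the collection that contain split s."""
    return Fraction(sum(s in tree for tree in collection), len(collection))


def weight(tree, collection):
    """Product of the frequencies of the tree's splits."""
    return prod(frequency(s, collection) for s in tree)


leaves = "ABCD"
trivial = {split(leaf, leaves) for leaf in leaves}  # one split per leaf edge
t1 = trivial | {split("AB", leaves)}
t2 = trivial | {split("AB", leaves)}
t3 = trivial | {split("AC", leaves)}
collection = [t1, t2, t3]

print(weight(t1, collection))  # 2/3
print(weight(t3, collection))  # 1/3
```

The trivial splits occur in every tree, so they contribute a factor of 1; only the internal split (`AB | CD` in two of three trees, `AC | BD` in one) changes the weight.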

Most probable tree

A most probable tree for a collection of fully resolved phylogenetic trees is a tree that maximizes the weight.
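In symbols, the definitions of frequency and weight combine as follows. The notation is assumed here for illustration and is not taken from the slides: $\mathcal{C}$ is the input collection, $\Sigma(T)$ the set of splits of a tree $T$, and the maximum ranges over fully resolved trees on the same leaf set.

```latex
f_{\mathcal{C}}(s) = \frac{\left|\{\, T \in \mathcal{C} : s \in \Sigma(T) \,\}\right|}{|\mathcal{C}|},
\qquad
w_{\mathcal{C}}(T) = \prod_{s \in \Sigma(T)} f_{\mathcal{C}}(s),
\qquad
T^{*} \in \arg\max_{T} \, w_{\mathcal{C}}(T)
```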

Example

Solution

w = 0.0703125

Running time

The tree weight formula can be written as a product of the frequencies of the small subgroups

We designed an algorithm that finds all most probable trees for a given set of fully resolved phylogenetic trees

The complexity of the algorithm is $O(l^3 t^2 \log(lt))$, where l is the number of leaves and t is the number of trees

Experiments

Data sets used to test the new method:

Synthetic data: from Gascuel’s LIRMM site

K2P – Kimura 2 Parameter, no MC

K2Pm – Kimura 2 Parameter, with MC

COV – Covarion model, no MC

COVm – Covarion model, with MC

Real data: Ribosomal RNA

Experiments

Programs used to test the new method (19):

Software   Method                      Model
fastMe     Minimum evolution           JC, K2P
Mega       Minimum evolution           JC, K2P, TN
Mega       Maximum parsimony
Mega       Neighbor joining            JC, K2P, TN
dnacomp    DNA compatibility
dnaml      Maximum likelihood
dnapars    Maximum parsimony
neighbor   Neighbor joining            JC, K2P
neighbor   UPGMA                       JC, K2P
weighbor   Weighted neighbor joining   JC, K2P

Most probable = Median

Reflects general tendency

Results: average split distance

Data set    Minimum Distance
K2P         43.44
K2Pm        77.78
COV         52.67
COVm        69.11
Ribosomal   60.71

Consensus consistently yields minimum average split distance

May result in better tree
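The comparison behind these figures can be illustrated with a small Python sketch. It assumes that the split distance between two trees is the number of splits found in exactly one of them (the Robinson-Foulds distance); the slides do not state the definition or any normalization, so this reading, the helper names, and the toy trees are all assumptions.

```python
from fractions import Fraction


def split(side, leaves):
    """Build a split as an unordered pair of leaf sets."""
    side = frozenset(side)
    return frozenset([side, frozenset(leaves) - side])


def split_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(tree_a ^ tree_b)


def average_split_distance(tree, collection):
    """Mean split distance from one tree to every tree in the collection."""
    total = sum(split_distance(tree, other) for other in collection)
    return Fraction(total, len(collection))


leaves = "ABCD"
trivial = {split(leaf, leaves) for leaf in leaves}
t1 = trivial | {split("AB", leaves)}
t2 = trivial | {split("AB", leaves)}
t3 = trivial | {split("AC", leaves)}
collection = [t1, t2, t3]

print(average_split_distance(t1, collection))  # 2/3
print(average_split_distance(t3, collection))  # 4/3
```

In this toy collection the tree with the more frequent internal split is also the one closest, on average, to the inputs, which is the sense in which a maximum-weight tree behaves like a median.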

Results: distance to “real” tree

Data set    Consensus Not Worse Than ...
K2P         72 %
K2Pm        39 %
COV         78 %
COVm        72 %
Ribosomal   100 %

Consensus is consistently no worse than the majority of input trees

… of input trees

Theoretical foundations

All splits of a tree

A | BCDEFGH

B | ACDEFGH

AB | CDEFGH

C | ABDEFGH

D | ABCEFGH

H | ABCDEFG

G | ABCDEFH

F | ABCDEGH

E | ABCDFGH

CD | ABEFGH

EF | ABCDGH

EFG | ABCDH

ABCD | EFGH

Small subgroup of each split

A | BCDEFGH

B | ACDEFGH

AB | CDEFGH

C | ABDEFGH

D | ABCEFGH

H | ABCDEFG

G | ABCDEFH

F | ABCDEGH

E | ABCDFGH

CD | ABEFGH

EF | ABCDGH

EFG | ABCDH

ABCD | EFGH
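Picking the small subgroup of a split is a one-line comparison. The sketch below assumes single-character leaf names and a split written as text such as "EFG | ABCDH"; the tie-breaking rule for sides of equal size (take the alphabetically first side) is an assumption, chosen because it agrees with the ABCD | EFGH case in the list above.

```python
def small_subgroup(split_text):
    """Return the smaller side of a split written as 'XYZ | UVW'."""
    left, right = (side.strip() for side in split_text.split("|"))
    # Assumption: when both sides have the same size, prefer the
    # alphabetically first one so the choice is deterministic.
    return min((left, right), key=lambda side: (len(side), side))


print(small_subgroup("EFG | ABCDH"))   # EFG
print(small_subgroup("ABCDEFG | H"))   # H
print(small_subgroup("ABCD | EFGH"))   # ABCD
```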

Small subgroups

EFG

ABCD

Maximal clusters (n-trees)

EFG

ABCD

Fundamental theoretical result

[Figure labels: A, B, AB, C, D, H, G, F, E, EF, EFG, ABCD]

● The small subgroup set of a phylogenetic tree is always a finite set of n-trees

● There are exactly three n-trees in this set, and all n-trees are maximal if and only if the phylogenetic tree is fully resolved
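The result can be checked on the 8-leaf example with a short Python sketch. The slides do not define n-trees, so the code assumes one reading: the small subgroups form a family in which any two clusters are either nested or disjoint, and each maximal cluster is the top of one n-tree. The function names are hypothetical.

```python
def is_hierarchy(clusters):
    """True if every pair of clusters is nested or disjoint."""
    sets = [frozenset(c) for c in clusters]
    return all(a <= b or b <= a or not (a & b) for a in sets for b in sets)


def maximal_clusters(clusters):
    """Clusters that are not strictly contained in any other cluster."""
    sets = {frozenset(c) for c in clusters}
    return {c for c in sets if not any(c < other for other in sets)}


# Small subgroups of the 13 splits of the example tree.
small_subgroups = ["A", "B", "AB", "C", "D", "CD", "ABCD",
                   "E", "F", "EF", "EFG", "G", "H"]

print(is_hierarchy(small_subgroups))           # True
print(len(maximal_clusters(small_subgroups)))  # 3
```

Under this reading the three maximal clusters are ABCD, EFG, and H, which matches the count of three stated for a fully resolved tree.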

Implementation details

[Figure labels: D, E, F, G, EF, GH, ABC]

Dynamic programming

[Figure labels: D, E, F, G, EF, GH, FGH, DEF, ABC, D, E, DE, ABC]

To Do List

Rooted trees

Polytomies

Non-uniform weights for input trees

Acknowledgments

Scylla Bioinformatics and Institute of Computing, Unicamp, for machine time, infrastructure, and support

Brazilian Research Financing Agency CNPq, grant 470420/2004-9
