A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks

Post on 13-Jan-2016

28 views 0 download

description

A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks. Luay Nakhleh Department of Computer Sciences UT Austin. Who’s Involved. UT CS : Tandy Warnow, Luay Nakhleh UT BIO : Randy Linder UNM CS : Bernard Moret. Why Networks?. Lateral gene transfer (LGT) - PowerPoint PPT Presentation

Transcript of A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks

A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks

Luay NakhlehDepartment of Computer Sciences

UT Austin

Who’s Involved

– UT CS: Tandy Warnow, Luay Nakhleh– UT BIO: Randy Linder – UNM CS: Bernard Moret

Why Networks?

• Lateral gene transfer (LGT)– Ochman estimated that 755 of 4,288 ORF’s in

E.coli were from at least 234 LGT events

• Hybridization– Estimates that as many as 30% of all plant

lineages are the products of hybridization– Fish– Some frogs

Phylogenetic Networks

• Rooted, directed, acyclic graphs that actually model the evolutionary process

• “tree” nodes and “network” nodes

• Time constraints

Separate Analysis

• Analyze individual genes separately

• Reconcile the resulting phylogenies

• As opposed to combined analysis in which the datasets are combined (via concatenation) and the combined dataset is then analyzed

SPR Distances Among Gene Trees

A B C D E

A B C D E A B C D E

SPR Distance 1

Maddison’s Method

Given two gene datasets

• Construct two gene trees T1 and T2

• If SPR(T1,T2)=0– Return a tree

• If SPR(T1,T2)=1– Return a network with one reticulation event

Open problem: extend to reconstructing a network with m reticulation events

Challenges

(1) Computational

– Computing SPR distances is of unknown computational complexity (probably hard)

Solving the Computational Challenge

• Galled-networks: reticulation events are independent

• For two gene trees T1 and T2 on n leaves we can– Decide whether SPR(T1,T2)=m in O(mn)

time, and – Construct network N from T1 and T2 in O(mn)

time

Challenges

(2) Systematic

– Obtaining the correct gene trees in practice is very hard (due to missing data, inaccuracy of tree reconstruction methods, wrong assumptions, etc.)

Solving the Systematic Challenge: Our Method SpNet

Given the sequences of two genes I & II on a set of species

• Run MP or ML on gene I and obtain a set U1 of trees, represented by its consensus tree t1

• Run MP or ML on gene II and obtain a set U2 of trees, represented by its consensus tree t2

• Find binary trees T1 and T2, that refine t1 and t2, respectively, and such that SPR(T1,T2)=1

• Build network N from T1 and T2

SpNet: Running Time

• We have a linear-time algorithm for the single hybrid case (implementation and experimental results are available as well)

• We are working on the general case of arbitrary number of reticulation events

Experimental Study

• Generated random networks on 10 and 20 taxa, with 0, 1, and 2 hybrids

• Evolved sequences under the GTR+Gamma model of evolution with invariant sites

• We studies the topological accuracy based on the splits defined by the model and inferred network

Evaluation Criteria

• Detection Quality– How often did the method infer the correct

number of hybrids in the model phylogeny?

• Reconstruction Quality– What is the topological accuracy of the

inferred phylogeny?

Methods

• SpNet(i): Our method where we contract i edges

• NNet: The method of Bryant and Moulton

• NJ

Detection Quality of SpNetModel Phylogeny: 20-taxon Tree

Detection Quality of SpNetModel Phylogeny: 20-taxon 1-hybrid network

Detection Quality of SpNetModel Phylogeny: 20-taxon 2-hybrid network

Reconstruction QualityModel Phylogeny: 20-taxon tree

Reconstruction QualityModel Phylogeny: 20-taxon 1-hybrid network

Reconstruction QualityModel Phylogeny: 20-taxon 1-hybrid network

Conclusions

• Considering a set of “good” trees rather than a single optimal tree is advantageous in network reconstruction

• Separate analysis approaches outperform combined analysis approaches

Ongoing research

• Using other techniques for obtaining unresolved trees (e.g., Bayesian analyses, bootstrapping, etc.)

• Detection vs. reconstruction – visualization and clustering techniques may also be useful (collaboration with St John)

• Refining unresolved networks

• DCM-like network reconstruction