Page 1: Constructing a level-2 phylogenetic network from a dense set of input triplets Leo van Iersel 1, Judith Keijsper 1, Steven Kelk 2, Leen Stougie 12 (1)

Constructing a level-2 phylogenetic network from a dense set of input triplets

Leo van Iersel (1), Judith Keijsper (1), Steven Kelk (2), Leen Stougie (1,2)

(1) Technische Universiteit Eindhoven (TU/e)
(2) Centrum voor Wiskunde en Informatica (CWI), Amsterdam

Email: [email protected] Web: http://homepages.cwi.nl/~kelk

Page 2:

Triplet-based methods (1)

Given a set of rooted triplets zw|x, yx|w, xy|z, wz|y. (Note zw|x = wz|x.) Find a tree from which each of the triplets can be obtained as a minor, i.e. by contracting and deleting edges.

[Figure: the four input triplets zw|x, xy|z, yx|w, wz|y are passed to an algorithm, which outputs a solution tree on the leaves w, z, x, y.]
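A minimal sketch of what "consistent with a triplet" means for a tree, under some assumptions that are not in the slides: a tree is a nested Python tuple, a triplet xy|z is the tuple (x, y, z), and the function names are hypothetical. The check uses the fact that xy|z holds in a tree exactly when the lowest common ancestor of x and y lies strictly below that of x and z.

```python
def leaf_paths(tree, prefix=()):
    """Map every leaf to the sequence of child indices leading to it from the root."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (i,)))
    return paths


def common_prefix(a, b):
    """Depth of the lowest common ancestor of two leaves, given their root paths."""
    depth = 0
    for p, q in zip(a, b):
        if p != q:
            break
        depth += 1
    return depth


def consistent(tree, triplet):
    """True if the tree displays the triplet xy|z, given as (x, y, z)."""
    x, y, z = triplet
    paths = leaf_paths(tree)
    return common_prefix(paths[x], paths[y]) > common_prefix(paths[x], paths[z])


# The tree ((w, z), (x, y)) and the four input triplets from the slide.
tree = (("w", "z"), ("x", "y"))
triplets = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
all_ok = all(consistent(tree, t) for t in triplets)
```

With these inputs every triplet is displayed by the tree, while a conflicting triplet such as xz|y is not.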

Page 5:

From trees to networks…

• The algorithm of Aho et al. (1981) can be used to construct trees from rooted triplets.
• But…what if the algorithm fails? Why might the algorithm fail?
• Possible reason 1: The underlying evolution is tree-like, but the input triplets contain errors.
• Possible reason 2: The triplets are correct, but the underlying evolution is not tree-like. Biological phenomena such as hybridization, horizontal gene transfer, recombination and gene duplication can lead to evolutionary scenarios that are not tree-like!
• Response: try to construct phylogenetic networks instead of phylogenetic trees.
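The tree-building step mentioned above can be sketched roughly as follows. This is an illustrative version of the recursive idea usually attributed to Aho et al., not the authors' code: the name build, the nested-tuple output and the (x, y, z) encoding of xy|z are assumptions. Leaves x and y are joined whenever some triplet xy|z lies inside the current leaf set; if everything collapses into one group, no tree exists.

```python
def build(leaves, triplets):
    """Return a nested-tuple tree consistent with all triplets, or None if none exists."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    # Only triplets whose three leaves are all still present matter at this level.
    inside = [t for t in triplets if set(t) <= leaves]

    # Union-find: merge x and y for every triplet xy|z.
    parent = {v: v for v in leaves}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for x, y, _ in inside:
        parent[find(x)] = find(y)

    groups = {}
    for v in leaves:
        groups.setdefault(find(v), set()).add(v)
    if len(groups) < 2:
        return None  # no split at the root is possible: the algorithm fails

    children = []
    for block in sorted(groups.values(), key=sorted):  # sorted only for stable output
        subtree = build(block, inside)
        if subtree is None:
            return None
        children.append(subtree)
    return tuple(children)


ok = build("wxyz", [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")])
fails = build("xyz", [("x", "y", "z"), ("x", "z", "y")])
```

The second call fails because the triplets xy|z and xz|y tie all three leaves into a single group. Note that this sketch may return a root with more than two children.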

Page 6:

From trees to networks (2)

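• For example, suppose the input is {xy|z, xz|y}. No tree on three leaves can be consistent with both triplets.

[Figure: the two input triplets xy|z and xz|y, and a network on the leaves z, x, y.]

(Note that there are cases where, even if there is at most one triplet per 3 species, a tree is not possible.)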

Page 9:

Level-k phylogenetic networks

[Figure: an example network on the leaves z, x, y, with labels marking the root (only one!), a leaf-vertex, a split-vertex and a recombination-vertex.]

A level-k phylogenetic network is a rooted, directed acyclic graph where every biconnected component (in the underlying undirected graph) contains at most k recombination vertices.
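To make the definition above concrete, here is a hedged sketch that computes the smallest k for which a given network is level-k. The assumptions are mine, not the slides': the network is a list of directed (parent, child) edges without parallel edges, a recombination vertex is a vertex with at least two incoming edges, and the function name network_level is hypothetical. Biconnected components are found with the standard edge-stack depth-first search (recursive, so only suitable for small examples).

```python
from collections import defaultdict


def network_level(edges):
    """Largest number of recombination vertices in any biconnected component."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    disc, low = {}, {}
    stack = []        # undirected edges seen but not yet assigned to a component
    components = []   # each component is a set of frozenset({u, v}) edges
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        for w in adj[u]:
            if w == parent:
                continue
            if w not in disc:
                stack.append((u, w))
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] >= disc[u]:
                    # u separates the subtree of w: pop one whole component
                    comp = set()
                    while True:
                        e = stack.pop()
                        comp.add(frozenset(e))
                        if e == (u, w):
                            break
                    components.append(comp)
            elif disc[w] < disc[u]:
                stack.append((u, w))  # back edge to an ancestor
                low[u] = min(low[u], disc[w])

    for start in list(adj):
        if start not in disc:
            dfs(start, None)

    level = 0
    for comp in components:
        incoming = defaultdict(int)
        for p, v in edges:
            if frozenset((p, v)) in comp:
                incoming[v] += 1
        level = max(level, sum(1 for n in incoming.values() if n >= 2))
    return level


# Hypothetical example networks; vertex names are illustrative only.
tree = [("r", "a"), ("r", "b")]
level1 = [("r", "a"), ("r", "b"), ("a", "z"), ("a", "h"),
          ("b", "y"), ("b", "h"), ("h", "x")]
level2 = [("r", "a"), ("r", "b"), ("a", "h1"), ("b", "h1"),
          ("a", "h2"), ("b", "h2"), ("h1", "x"), ("h2", "y")]
```

In the second example, x hangs below the single recombination vertex h, so the network is consistent with both xy|z and xz|y.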

Page 10:

Level-1 Networks

• A set of input triplets is dense iff, for every subset of 3 species, there is at least one triplet corresponding to those 3 species.
• Therefore, a dense set of input triplets for n species contains Θ(n^3) triplets (at least one and at most three per subset of 3 species).
• Jansson & Sung (2006) showed:

Given a dense set of triplets T for a set L of species, it is possible to determine in polynomial time whether a level-1 phylogenetic network N exists such that all the triplets in T are consistent with N. (And if so, to construct such a network.)

• They later showed, together with Nguyen, how to do this in time linear in |T|. They also showed that, in the non-dense case, the problem is NP-hard.
• But what about level-2 networks, and higher?
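The density condition defined on this slide is easy to check directly. A minimal sketch, assuming a triplet xy|z is stored as the tuple (x, y, z); the function name is_dense is hypothetical.

```python
from itertools import combinations


def is_dense(leaves, triplets):
    """True if every 3-subset of the leaves is covered by at least one triplet."""
    covered = {frozenset(t) for t in triplets}
    return all(frozenset(c) in covered for c in combinations(leaves, 3))


T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
dense = is_dense("wxyz", T)           # all four 3-subsets are covered
not_dense = is_dense("wxyz", T[:3])   # the subset {w, y, z} is now missing
```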

Page 11:

Here is an example of a level-2 network.

Main result: Given a dense set of triplets T for a set L of species, it is possible to determine in time O(|T|^3) whether a level-2 phylogenetic network N exists such that all the triplets in T are consistent with N. (And if so, to construct such a network.)

Page 12:

Algorithm, basic idea

• The basic idea behind Aho’s algorithm for trees is that we are able to determine, recursively, which species belong to which of the two subtrees hanging from some root vertex.
• For level-1 and level-2 networks, if there again exists such a clear dichotomy, we recurse on the two subsets.

[Figure: a root with two sub-networks hanging below it.]

Page 14:

Algorithm, basic idea

• The basic idea behind Aho’s algorithm for trees is that we are able to determine, recursively, which species belong to which of the two subtrees hanging from some root vertex.
• For level-1 networks, if there again exists such a clear dichotomy, we recurse on the two subsets. Otherwise there must exist a network of the form

[Figure: a backbone network, drawn in blue, with five sub-networks hanging from it.]

Find the partition of the species (leaves) into the sub-networks

Find the blue backbone network

Treat each of the partition elements (sub-networks) as leaves to be hung on the backbone

Recurse on the sub-networks

Page 15:

Algorithm, high-level idea

For level-2 networks the idea is similar:

[Figure: a backbone network, drawn in blue, with five sub-networks hanging from it.]

Find the partition of the species (leaves) into the sub-networks

There is a complication in level-2

Find the blue backbone network!

There are more level-2 backbone forms

Treat each of the partition elements (sub-networks) as (meta-)leaves to be hung on the backbone

Recurse on the sub-networks

Page 16:

Definition: inducing new triplet sets from partitions of the leaf set

• Suppose I have a partition P = {P1, P2, …, Pt} of the leaf set L.
• Suppose I have a dense set of triplets T on the leaf set L.
• Let T’ be a new triplet set on leaf set {q1, q2, …, qt} defined as follows: qiqj|qk is in T’ if and only if i, j and k are pairwise distinct and there exists a triplet xy|z in T such that x is in Pi, y is in Pj and z is in Pk.
• Then we say that T’ is the triplet set induced by the partition P of L.
• Critically: if T is dense, then T’ is also dense.
• In some sense this can be perceived as a ‘coarsening’ of the input set.
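The induced triplet set T’ defined on this slide can be computed in a single pass over T. A sketch under stated assumptions: the partition is a list of sets, the new leaf qi is represented by the 0-based index i of its block, a triplet xy|z is the tuple (x, y, z), and the function name induced_triplets is hypothetical.

```python
def induced_triplets(partition, triplets):
    """Triplets on block indices induced by a partition of the leaf set."""
    block_of = {leaf: i for i, block in enumerate(partition) for leaf in block}
    induced = set()
    for x, y, z in triplets:
        i, j, k = block_of[x], block_of[y], block_of[z]
        if len({i, j, k}) == 3:  # keep only triplets spanning three distinct blocks
            # xy|z and yx|z are the same triplet, so store the pair in sorted order
            induced.add((min(i, j), max(i, j), k))
    return induced


P = [{"w", "z"}, {"x"}, {"y"}]
T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
T_prime = induced_triplets(P, T)
```

Here only yx|w and xy|z span three different blocks, and both collapse to the same induced triplet q1q2|q0 (in 0-based numbering).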

Page 17:

Definition: simple level-2 networks

Lemma: There are exactly 4 different backbone networks

A simple level-2 network is any network obtained by “hanging leaves” off one of the above structures.

Page 18:

A picture description of the simple level-2 algorithm

Here the leaves {a,b,c,d,e,f,g,h} have been ‘hung’ from structure 8a, to yield a simple level-2 network.

Page 19:

Level-2 network algorithm

• Assume some oracle gives us the partition of the leaves into sub-networks
• Treat each sub-network as a leaf and construct a simple level-2 network

The simple level-2 network algorithm
• Guess the right “recombination leaf”
• Remove it and remove the triplets that contain this leaf
• One recombination vertex is left, with a caterpillar below it

Page 20:

Suppose we can correctly ‘guess’ that leaf g hangs directly below a recombination node.

If we remove g, and all triplets that contain g, then we know that a level-1 network must be possible on this new set of triplets (because now fewer recombination nodes are needed).

Page 23:

Caterpillar set

A caterpillar set with respect to a dense triplet set T is the set of leaves of a caterpillar subgraph of a network consistent with T.

The empty set is also a caterpillar set.

[Figure: a caterpillar.]

Page 24:

Suppose we subsequently guess that the caterpillar with h now hangs below a recombination node in the new network.

If we remove the h-caterpillar, and all triplets that contain leaves of it, then we know that a level-0 network (that is, a tree) must be possible on this new set of triplets (because now even fewer recombination nodes are needed).

Page 26:

In such a case the resulting tree is UNIQUE (J&S).

Page 27:

So now we have a tree. We are going to guess how to add the h-caterpillar back in, and then guess how to add leaf g back in.

Page 28:

Adding the h-caterpillar back in.

Page 29:

And finally adding leaf g back in.

Page 30:

Level-2 network algorithm

• Assume some oracle gives us the partition of the leaves into sub-networks
• Treat each sub-network as a leaf and construct a simple level-2 network

The simple level-2 network algorithm
• Guess the right “recombination leaf”
• Remove it and remove the triplets that contain this leaf
• One recombination vertex is left, with a caterpillar below it
• Guess the right “caterpillar set”
• Remove it and remove the triplets that contain any element of this set
• Construct the unique tree for the remaining triplets [Jansson&Sung 2006]
• Insert the caterpillar set and the recombination leaf in the tree in the correct way

For each pair of guesses try all 4 backbone structures

Page 31:

Simple level-2 algorithm

Theorem: The simple level-2 network algorithm runs in O(|T|^3) time.

Page 32:

SN-sets to partition the set of leaves

• Jansson & Sung introduced the SN-set to partition the set of leaves
• SN-sets are special subsets of the leaves L, and are defined w.r.t. T
• All sets containing just a single leaf are SN-sets.
• Any other SN-set is any subset of leaves obtained by taking the closure of some subset S of the leaves L w.r.t. the following operation:

If x, y ∈ S and xz|y ∈ T or yz|x ∈ T then z ∈ S

The SN-set that is equal to the total leaf set L is called the trivial SN-set.

An SN-set that is non-trivial, and is not a strict subset of any other non-trivial SN-set, is called a maximal SN-set.

(If the network is a tree there are 2 maximal SN-sets: the leaves of the left subtree of the root and the leaves of the right subtree of the root.)
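The closure operation above translates directly into a fixed-point loop. This is a naive sketch, not the efficient procedure of Jansson & Sung: triplets xy|z are tuples (x, y, z), the function names are hypothetical, and, as an assumption of this sketch, candidate SN-sets are generated only from single leaves and pairs of leaves.

```python
from itertools import combinations


def sn_closure(seed, triplets):
    """Close a leaf set under: x, y in S and xz|y in T  =>  z in S."""
    closed = set(seed)
    changed = True
    while changed:
        changed = False
        for a, b, c in triplets:  # the triplet ab|c (equivalently ba|c)
            # If c and one leaf of the pair {a, b} are in S, add the other leaf.
            for inside, other in ((a, b), (b, a)):
                if inside in closed and c in closed and other not in closed:
                    closed.add(other)
                    changed = True
    return frozenset(closed)


def maximal_sn_sets(leaves, triplets):
    """Non-trivial SN-sets not strictly contained in another non-trivial SN-set."""
    leaves = set(leaves)
    candidates = {frozenset([v]) for v in leaves}
    candidates |= {sn_closure(pair, triplets) for pair in combinations(leaves, 2)}
    nontrivial = {s for s in candidates if s != leaves}
    return {s for s in nontrivial if not any(s < t for t in nontrivial)}


T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
maximal = maximal_sn_sets("wxyz", T)
```

For this tree-like input the result is the two leaf sets {w, z} and {x, y}, matching the remark that a tree has 2 maximal SN-sets.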

Page 33:

Definition: maximal SN-set

• Jansson and Sung proved that the set of maximal SN-sets indeed partitions the leaf set L. So no two maximal SN-sets overlap, and together they completely cover the set of input leaves.
• All SN-sets and all maximal SN-sets can be found in polynomial time.
• Jansson & Sung solved the level-1 problem by observing that each maximal SN-set hangs as a ‘meta-leaf’ on the level-1 backbone network; each maximal SN-set can be completely separated from the rest of the network by removing just one edge.
• There are maximal SN-sets in level-2 networks that can hang under more than one edge!

Page 34:

Definition: highest cut-edge

In a phylogenetic network N, a cut-edge (x,y) is an edge whose removal disconnects the underlying undirected graph.

A cut-edge (x,y) is said to be a trivial cut-edge iff y is a leaf.

A cut-edge (x,y) is said to be highest iff there is no cut-edge (p,q) such that there is a directed path from q to x in N.
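These definitions can be checked by brute force on small networks. A sketch with assumptions of my own: the network is a list of directed (parent, child) edges, cut-edges are found by deleting each edge in turn and testing connectivity of the underlying undirected graph, and the function names are hypothetical.

```python
from collections import defaultdict


def cut_edges(edges):
    """Edges whose removal disconnects the underlying undirected graph."""
    vertices = {v for e in edges for v in e}

    def connected(remaining):
        adj = defaultdict(set)
        for u, v in remaining:
            adj[u].add(v)
            adj[v].add(u)
        start = next(iter(vertices))
        seen, todo = {start}, [start]
        while todo:
            u = todo.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    todo.append(w)
        return seen == vertices

    return [e for e in edges if not connected([f for f in edges if f != e])]


def reachable(edges, source):
    """All vertices reachable from source along directed edges (source included)."""
    children = defaultdict(set)
    for u, v in edges:
        children[u].add(v)
    seen, todo = {source}, [source]
    while todo:
        u = todo.pop()
        for w in children[u]:
            if w not in seen:
                seen.add(w)
                todo.append(w)
    return seen


def highest_cut_edges(edges):
    """Cut-edges (x, y) with no cut-edge (p, q) having a directed path from q to x."""
    cuts = cut_edges(edges)
    return [(x, y) for (x, y) in cuts
            if not any(x in reachable(edges, q) for (p, q) in cuts if (p, q) != (x, y))]


tree = [("r", "a"), ("r", "b"), ("a", "x"), ("a", "y")]
net = [("r", "a"), ("r", "b"), ("a", "z"), ("a", "h"),
       ("b", "y"), ("b", "h"), ("h", "x")]
```

In the tree every edge is a cut-edge but only the two root edges are highest; in the small network the cycle edges are not cut-edges, so the three leaf edges are the highest ones.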

Page 35:

• Fact. Let (x,y) be a highest cut-edge and let L’ be the set of leaves reachable from y. Let L* be a strict subset of L’. Then L* is not a maximal SN-set.
• Proof: the set of leaves reachable from a highest cut-edge (x,y) is itself an SN-set. Clearly, for any two leaves p, q in L’ and any leaf r outside L’, there cannot be a triplet pr|q or qr|p: the edge (x,y) forms a bottleneck. Thus, since T is dense, pq|r must exist.

[Figure: a cut-edge (x,y) with the leaf set L’ below y, showing leaves p, q, r and the triplets on them.]

So: each maximal SN-set can be expressed as the union of the leaves reachable by one or more highest cut-edges.

Page 36:

Central Theorem (simplified). Suppose there is a dense triplet set T consistent with some simple level-2 network N. Then there exists a level-2 network N’ (not necessarily simple) such that, with the exception of perhaps one maximal SN-set with respect to T, every maximal SN-set appears below a single cut-edge in N’. The remaining, ‘odd-one-out’ maximal SN-set (if it exists) will be equal to the union of leaves below two cut-edges.

In other words: there exists at most one maximal SN-set which is the union of the leaves below two highest cut-edges, whereas all other maximal SN-sets consist of the leaves below one highest cut-edge.

Page 37:

The algorithm
• Determine the maximal SN-sets
• Guess the right SN-set to be split
• Treat the maximal SN-sets and the two split sets as leaves {S1,S2,…,Sq}
• Adapt T to a new triplet set T’: SiSk|Sh ∈ T’ if and only if there exist x ∈ Si, y ∈ Sk, z ∈ Sh s.t. xy|z ∈ T
• Construct a simple level-2 network for T’
• Recursively find the sub-networks for the sets S1,S2,…,Sq

Page 38:

Conclusions & open problems

• So we know how to efficiently construct level-2 networks from dense triplet sets. What’s next?
• Applicability: how useful is it?
• Initial implementation: programming and fine-tuning
• Improving running time: in the spirit of the “SN-tree” of J&S&N
• Complexity: what about level-3 and higher?
• Bounds: worst-case, best-case scenarios
• Building all networks
• Properties of output networks as function of input
• Different triplet restrictions
• Confidence: how good are the solutions?
• Exponential-time exact algorithms for NP-hard problems