Constructing evolutionary trees from rooted triples Bang Ye Wu Dept. of Computer Science and...

Post on 02-Jan-2016


Constructing evolutionary trees from rooted triples

Bang Ye Wu

Dept. of Computer Science and Information Engineering

Shu-Te University

An evolutionary tree

A rooted tree. Each leaf represents one species. Internal nodes are unlabelled (inferred common ancestors).

[Figure: a rooted tree with leaves a, b, c, d, e, f]

A (rooted) triple (triplet)

An evolutionary tree of 3 species. A constraint in an evolutionary tree construction problem.

(c(ab)): lca(b,c) = lca(c,a) → lca(a,b)

lca: lowest common ancestor. x → y: "x is an ancestor of y".

a,b should be closer to each other than a,c or b,c.

[Figure: the triple (c(ab)) on leaves a, b, c]
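The lca condition can be checked mechanically. The following is a minimal sketch, assuming a tree is written as nested Python tuples with species names at the leaves and a triple (x(yz)) is encoded as the tuple (x, y, z). Both encodings and all function names are illustrative choices, not taken from the slides.

```python
def leaf_paths(tree, prefix=()):
    """Map each leaf to its path (child indices) from the root."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (i,)))
    return paths


def lca_depth(paths, u, v):
    """Depth of lca(u, v) = length of the common prefix of the two paths."""
    depth = 0
    for i, j in zip(paths[u], paths[v]):
        if i != j:
            break
        depth += 1
    return depth


def satisfies(tree, triple):
    """True if the tree satisfies (x(yz)): lca(y,z) lies strictly below lca(x,y)."""
    x, y, z = triple
    paths = leaf_paths(tree)
    return lca_depth(paths, y, z) > lca_depth(paths, x, y)


EXAMPLE_TREE = (("a", "d"), ("b", "c"))
```

For instance, `satisfies(EXAMPLE_TREE, ("a", "b", "c"))` holds because b and c meet one level below the root, while `satisfies(EXAMPLE_TREE, ("c", "b", "d"))` does not.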

A tree compatible with triples

Given a set of triples, construct a tree satisfying all the triples.

Deciding whether such a tree exists, and constructing one if it does, is solvable in polynomial time. [Aho et al, 1981]

[Figure: a tree on leaves a, d, b, c, shown with three triples it satisfies]
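The polynomial-time procedure of Aho et al. is often referred to as BUILD. The sketch below is an illustrative rendering of it, not code from the slides: it joins y and z for every triple (x(yz)) that lies inside the current species set, fails if everything ends up in one connected component, and otherwise recurses on each component. Triples are assumed to be encoded as tuples (x, y, z), and trees are returned as nested tuples.

```python
def build(species, triples):
    """Return a nested-tuple tree satisfying all triples, or None if none exists."""
    species = list(species)
    if len(species) == 1:
        return species[0]

    parent = {s: s for s in species}  # union-find over the species

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    inside = set(species)
    relevant = [t for t in triples if set(t) <= inside]
    for _x, y, z in relevant:
        parent[find(y)] = find(z)  # y and z must stay in the same child subtree

    groups = {}
    for s in species:
        groups.setdefault(find(s), []).append(s)
    if len(groups) == 1:
        return None  # a single component: the triples conflict

    children = []
    for block in groups.values():
        subtree = build(block, relevant)
        if subtree is None:
            return None
        children.append(subtree)
    return tuple(children)
```

Note that the returned tree need not be binary: a node gets one child per connected component.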

Incompatible (conflicting) triples

[Figure: two conflicting triples]

[Figure: three conflicting triples (pairwise compatible)]

Two optimization problems

The maximum consensus tree:
– The tree satisfying the maximum number of triples.
– NP-hard [Jansson, 2001][Wu, to appear]
– A new heuristic algorithm [this paper]

The maximum compatible set:
– The compatible species subset of maximum cardinality.
– NP-hard [this paper]

Previous heuristic: Best-One-Split-First

If a species x is split from a set V, all triples (x(v1v2)), v1 and v2 in V, will be satisfied.

Repeatedly split one species from the set. Choose the split species greedily.

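triples: (a(bc)),(c(ad)),(b(ad)),(c(bd))

[Figure: c is split from {a,b,c,d}, leaving {a,b,d}; then b is split, leaving {a,d}. The resulting tree has leaves d, a, b, c.]

c is chosen first, and the two triples (c(ad)) and (c(bd)) are satisfied.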
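The greedy rule can be sketched in a few lines. This is a minimal illustration, assuming triples are encoded as tuples (x, y, z) for (x(yz)) and that ties are broken alphabetically; the tie-breaking rule and the function name are assumptions, not part of the slides.

```python
TRIPLES = [("a", "b", "c"), ("c", "a", "d"), ("b", "a", "d"), ("c", "b", "d")]


def best_one_split_first(species, triples):
    """Return (split order, last two species, number of satisfied triples)."""
    remaining = set(species)
    order = []
    satisfied = 0
    while len(remaining) > 2:

        def gain(x):
            # Triples (x(v1v2)) with v1, v2 still in the set are satisfied
            # when x is split off.
            return sum(1 for o, y, z in triples
                       if o == x and y in remaining and z in remaining)

        best = max(sorted(remaining), key=gain)  # first maximum in sorted order
        satisfied += gain(best)
        order.append(best)
        remaining.remove(best)
    return order, sorted(remaining), satisfied
```

On the four example triples this splits c first (gain 2), then b (gain 1), leaving {a,d}, for 3 satisfied triples in total.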

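Previous heuristic: Min-Cut-Split-First

Construct an auxiliary graph:
– Vertex: species.
– Each edge is labeled by a set: for each triple (x(yz)), x is in the label set of edge (y,z).

triples: (a(bc)),(c(ad)),(b(ad)),(c(bd))

[Figure: the auxiliary graph on vertices a, b, c, d; edge (b,c) is labeled {a}, edge (a,d) is labeled {b,c}, and edge (b,d) is labeled {c}]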

– A bipartition corresponds to a split in the tree.
– The labels in the cut of the bipartition correspond to the triples conflicting with the split.

Repeatedly find the bipartition with minimum cut.

[Figure: the min-cut {a,d} | {b,c} of the auxiliary graph and the resulting tree with leaves a, d, b, c]

a min-cut; triple (c(bd)) is conflicting
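One step of this heuristic can be sketched as follows. The sketch builds the labeled auxiliary graph and then finds a minimum cut by brute-force enumeration of bipartitions, which is only practical for small sets; a real implementation would use a proper min-cut algorithm. The tuple encoding (x, y, z) for (x(yz)) and the function names are assumptions made for illustration.

```python
from itertools import combinations

TRIPLES = [("a", "b", "c"), ("c", "a", "d"), ("b", "a", "d"), ("c", "b", "d")]


def auxiliary_graph(species, triples):
    """Map each edge {y, z} to the set of x with a triple (x(yz)) over `species`."""
    inside = set(species)
    labels = {}
    for x, y, z in triples:
        if {x, y, z} <= inside:
            labels.setdefault(frozenset((y, z)), set()).add(x)
    return labels


def min_cut_bipartition(species, triples):
    """Return (cut weight, left side, right side) of a minimum-weight bipartition."""
    species = sorted(species)
    labels = auxiliary_graph(species, triples)
    first, rest = species[0], species[1:]
    best = None
    # Fix `first` on the left so each bipartition is enumerated once.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = {first, *extra}
            right = set(species) - left
            # Each label on a cut edge is one triple conflicting with the split.
            cut = sum(len(lab) for edge, lab in labels.items()
                      if len(edge & left) == 1)
            if best is None or cut < best[0]:
                best = (cut, left, right)
    return best
```

On the example triples the sketch returns the split {a,d} | {b,c} with cut weight 1. The bipartition {a,b,d} | {c} has the same weight, so which one is reported depends on the enumeration order.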

Previous heuristic: Best-Pair-Merge-First

Instead of top-down splitting, BPMF uses the bottom-up merging strategy.

Starting from singleton sets, we repeatedly merge two sets at each step.

Scoring functions are used to evaluate which pair should be merged in each step.

triples: (a(bc)),(c(ad)),(b(ad)),(c(bd))

{a} {b} {c} {d}  →  {a,d} {b} {c}  →  {a,d} {b,c}  →  {a,d,b,c}

[Figure: the partial trees after each merge, ending with a tree on leaves a, d, b, c]
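The slides do not say which scoring functions are used, so the sketch below substitutes one simple, assumed score: the number of triples (x(yz)) with y and z in the two sets being merged and x outside both. It is meant only to illustrate the bottom-up merging loop; the score, the tie-breaking (first best pair found), and the names are all hypothetical.

```python
from itertools import combinations

TRIPLES = [("a", "b", "c"), ("c", "a", "d"), ("b", "a", "d"), ("c", "b", "d")]


def merge_score(a, b, triples):
    """Assumed score: triples (x(yz)) with y, z split across a and b, x outside both."""
    both = a | b
    return sum(1 for x, y, z in triples
               if x not in both and ((y in a and z in b) or (y in b and z in a)))


def best_pair_merge_first(species, triples):
    """Return (nested-tuple tree, sum of the scores of the merges performed)."""
    clusters = [frozenset([s]) for s in sorted(species)]
    tree = {c: next(iter(c)) for c in clusters}
    total = 0
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: merge_score(pair[0], pair[1], triples))
        total += merge_score(a, b, triples)
        merged = a | b
        tree[merged] = (tree[a], tree[b])
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    return tree[clusters[0]], total
```

With this assumed score, the example triples lead to the same merge sequence as on the slide: {a,d} first, then {b,c}, then the two together.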

An exact algorithm for MCTT

Dynamic programming:

F(V) = max{ F(V1) + F(V2) + W(V1,V2) }, taken over all bipartitions (V1,V2) of V.
– F(V): # of satisfied triples over V.
– W(V1,V2): # of triples (x(v1v2)) for x not in V and v1, v2 in V1, V2 respectively.

Computed in order of cardinality, from small to large.

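F values by cardinality n:

n=4: abcd=3
n=3: abc=1, abd=3, bcd=2
n=2: ab=0, ac=0, ad=2, bc=1, bd=1, cd=0
n=1: a=0, b=0, c=0, d=0

[Figure: the four triples (a(bc)), (c(ad)), (b(ad)), (c(bd)) and the resulting tree with leaves a, d, b, c]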
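The recurrence translates directly into code. The following is a minimal sketch, assuming triples are encoded as tuples (x, y, z) for (x(yz)) and subsets are represented as frozensets; it enumerates every subset and every bipartition, so it is exponential and only meant for small inputs. The function name is illustrative.

```python
from itertools import combinations

TRIPLES = [("a", "b", "c"), ("c", "a", "d"), ("b", "a", "d"), ("c", "b", "d")]


def exact_f_table(species, triples):
    """Return a dict mapping every non-empty subset V (frozenset) to F(V)."""
    species = sorted(species)

    def w(v1, v2):
        # Triples (x(v1v2)) with x outside V = V1 | V2 and v1, v2 on opposite sides.
        whole = v1 | v2
        return sum(1 for x, y, z in triples
                   if x not in whole
                   and ((y in v1 and z in v2) or (y in v2 and z in v1)))

    f = {frozenset([s]): 0 for s in species}
    for size in range(2, len(species) + 1):  # cardinality from small to large
        for combo in combinations(species, size):
            v = frozenset(combo)
            anchor, rest = combo[0], combo[1:]
            best = 0
            # Keep `anchor` in V1 so each bipartition is tried once.
            for r in range(len(rest)):
                for extra in combinations(rest, r):
                    v1 = frozenset((anchor, *extra))
                    v2 = v - v1
                    best = max(best, f[v1] + f[v2] + w(v1, v2))
            f[v] = best
    return f
```

Running it on the four example triples reproduces the values listed on the slide, for example F(ad)=2, F(abd)=3 and F(abcd)=3.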

Our new heuristic algorithm (DPWP)

Derived from the exact algorithm.

The number of subsets of each cardinality is limited by a parameter K.

When K = infinity, it is just the exact algorithm.

Time-quality trade-off.

The time complexity is O(n^2 K^2 (n^3 + K)).
– Sorry, there is a mistake in the paper.
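The slides do not spell out how the K subsets of each cardinality are chosen. The sketch below shows one plausible reading, stated as an assumption: candidate subsets of a given cardinality are formed as unions of disjoint retained subsets of smaller cardinalities, and only the K candidates with the highest score are retained. Triples are encoded as tuples (x, y, z) for (x(yz)), and passing `k=None` stands in for K = infinity.

```python
TRIPLES = [("a", "b", "c"), ("c", "a", "d"), ("b", "a", "d"), ("c", "b", "d")]


def dpwp_sketch(species, triples, k):
    """Pruned DP: keep at most k subsets per cardinality (k=None keeps all)."""
    species = sorted(species)
    n = len(species)

    def w(v1, v2):
        whole = v1 | v2
        return sum(1 for x, y, z in triples
                   if x not in whole
                   and ((y in v1 and z in v2) or (y in v2 and z in v1)))

    # kept[s] maps each retained subset of cardinality s to its score.
    kept = {1: {frozenset([s]): 0 for s in species}}
    for size in range(2, n + 1):
        candidates = {}
        for i in range(1, size // 2 + 1):
            for v1, f1 in kept[i].items():
                for v2, f2 in kept[size - i].items():
                    if v1 & v2:
                        continue  # not a bipartition
                    v = v1 | v2
                    score = f1 + f2 + w(v1, v2)
                    if score > candidates.get(v, -1):
                        candidates[v] = score
        ranked = sorted(candidates.items(), key=lambda item: -item[1])
        kept[size] = dict(ranked[:k])  # [:None] keeps every candidate
    return kept[n][frozenset(species)]
```

Because all singletons are always kept, a retained subset can always be extended by one species, so the full species set is reached for any k of at least 1. On the example triples both `k=None` and `k=1` return 3.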

The experiment results (time)

[Figure: running time in seconds (logarithmic ticks from 1 to 10000) versus n (12, 15, 18, 20, 24, 27, 30); series: Exact, DPWP(300), DPWP(600), DPWP(900)]

Average ratio in the test

[Figure: average ratio (axis from 0.8 to 1.2) versus n (12, 15, 18, 20); series: BPMF, BOSF, MCSF, DPWP(300), DPWP(600), DPWP(900)]

Worst ratio in the test

[Figure: worst ratio (axis from 0.8 to 1.6) versus n (12, 15, 18, 20); series: BPMF, BOSF, MCSF, DPWP(300), DPWP(600), DPWP(900)]

Improvement: 100*(DPWP - BestofOther)/BestofOther

[Figure: improvement in % (axis from 0 to 20) versus n (18, 20, 24, 27, 30); series: Max, average]

The MCST problem

Given triples over a species set S, find a subset U of S such that all given triples over U are compatible and |U| is maximum.

We show the problem is NP-hard.
– Reduced from the Feedback Vertex Set problem.

The feedback vertex set problem

Feedback vertex set: a vertex subset containing at least one vertex of each cycle of the given directed graph.
– In other words, removing a feedback vertex set results in an acyclic digraph.
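The definition is easy to verify for a given candidate set: delete the candidate vertices and test whether the remaining digraph is acyclic. The following is a small sketch using Kahn's topological-sort procedure; the arc-list representation and the function name are assumptions made for illustration.

```python
def is_feedback_vertex_set(vertices, arcs, candidate):
    """True if removing `candidate` from the digraph leaves no directed cycle."""
    remaining = set(vertices) - set(candidate)
    indegree = {v: 0 for v in remaining}
    successors = {v: [] for v in remaining}
    for u, v in arcs:
        if u in remaining and v in remaining:
            successors[u].append(v)
            indegree[v] += 1

    # Kahn's algorithm: repeatedly remove vertices with no incoming arcs.
    stack = [v for v in remaining if indegree[v] == 0]
    removed = 0
    while stack:
        u = stack.pop()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                stack.append(v)
    # Every vertex gets removed exactly when the remaining digraph is acyclic.
    return removed == len(remaining)
```

For a directed 3-cycle on vertices 1, 2, 3, any single vertex is a feedback vertex set, while the empty set is not.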

The reduction

[Figure: the reduction, with subtrees T1, T2, ..., Tp, a leaf x, nodes r1, r2, ..., rp, sets V1, V2, V3, ..., Vp, and x1, x2, ...]

Concluding remarks

What is the approximation ratio?

– The Best-One-Split-First algorithm is a 3-approximation algorithm.
– A larger K gives us a better solution, but we do not know the theoretical bound of the ratio.