Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf ·...

15
ADVANCES IN APPLIED MATHEMATICS 13, 375-389 (1992) Evolutionary Trees: An Integer Multicommodity Max-Flow - Min-Cut Theorem PETER L. ERD& Depar!ment of Applied Mathematics, University of Twente, Enschede, The Netherlands and Hungarian Academy of Sciences, H-1344 Budapest, P.0.B 127, Hungary AND LAsz~6 A. SZ~KELY Department of Computer Science, Eiitv6s Lorrind University, H-1088 Budapest, Hungary In biomathematics, the extensions of a leaf-colouration of a binary tree to the whole vertex set with minimum number of colour-changing edges are extensively studied. Our paper generalizes the problem for trees; algorithms and a Menger-type theorem are presented. The LP dual of the problem is a multicommodity flow problem, for which a max-flow-min-cut theorem holds. The problem that we solve is an instance of the NP-hard muhiway cut problem. 0 1992 Academic press, 1~. 1. INTRODUCTION Let T be a tree. A leaf is a vertex of degree 1. Let L denote the set of leaves, 1 L I = 1. We term the nonleaf vertices branching vertices. The set of branching vertices is denoted by B = V(T) - L. Set IBI = b, n = b + 1. Assume C be a set of r colours. A map x: L -+ 2’ is a leaf-colouration of T. A leaf-colouration is simple, if it has only singleton colours in its range. A map X: V(T) + C is termed colourution of T if for Vl E L: X(l) E x(l). The changing number of the colouration 2 is the number of edges whose end-vertices have different colours according to 2. The colouration is an optimal colourution according to the leaf-colouration x, if it has the minimum changing number among all colourations. We term the minimum changing number the length of the tree T, l(T) (according to x). 375 0196-8858/92 $9.00 Copyright Q 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.

Transcript of Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf ·...

Page 1: Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf · 2006. 7. 11. · EVOLUTIONARY TREES 377 a multicommodity flow problem, for which

ADVANCES IN APPLIED MATHEMATICS 13, 375-389 (1992)

Evolutionary Trees: An Integer Multicommodity Max-Flow - Min-Cut Theorem

PETER L. ERD&

Depar!ment of Applied Mathematics, University of Twente, Enschede, The Netherlands and Hungarian Academy of Sciences,

H-1344 Budapest, P.0.B 127, Hungary

AND

LAsz~6 A. SZ~KELY

Department of Computer Science, Eiitv6s Lorrind University, H-1088 Budapest, Hungary

In biomathematics, the extensions of a leaf-colouration of a binary tree to the whole vertex set with minimum number of colour-changing edges are extensively studied. Our paper generalizes the problem for trees; algorithms and a Menger-type theorem are presented. The LP dual of the problem is a multicommodity flow problem, for which a max-flow-min-cut theorem holds. The problem that we solve is an instance of the NP-hard muhiway cut problem. 0 1992 Academic press, 1~.

1. INTRODUCTION

Let T be a tree. A leaf is a vertex of degree 1. Let L denote the set of leaves, 1 L I = 1. We term the nonleaf vertices branching vertices. The set of branching vertices is denoted by B = V(T) - L. Set IBI = b, n = b + 1. Assume C be a set of r colours. A map x: L -+ 2’ is a leaf-colouration of T. A leaf-colouration is simple, if it has only singleton colours in its range. A map X: V(T) + C is termed colourution of T if for Vl E L: X(l) E x(l). The changing number of the colouration 2 is the number of edges whose end-vertices have different colours according to 2. The colouration is an optimal colourution according to the leaf-colouration x, if it has the minimum changing number among all colourations. We term the minimum changing number the length of the tree T, l(T) (according to x).

375 0196-8858/92 $9.00

Copyright Q 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.

Page 2: Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf · 2006. 7. 11. · EVOLUTIONARY TREES 377 a multicommodity flow problem, for which

376 ERDijS AND SZkELY

The notions of changing number and length came from biomathematics. In genetics, an evolutionary tree is a leaf-coloured binary tree, where leaves represent species and colours represent some genetical properties. It is assumed by the parsimony principle, that the most likely evolutionary trees built on the given, coloured leaf set, are those with minimum possible length. (See, e.g., Felsenstein [Fe], Carter et al. [CHPSzW], Steel [St].) Fitch [Fi] and Hartigan [Ha] gave an algorithm to determine the length of the evolutionary tree according to the given simple leaf-coloura- tion. Section 2 provides an O(nr) algorithm to find an optimal colouration for a given x leaf-colouration and describes the structure of all of them. We included Section 2 in this paper, for the following three reasons: (1) our main result requires some corollaries of the algorithm which are not explicit in [Fi, Ha], (2) Fitch did not prove the correctness of his algorithm, and (3) the journals where they published may not be easily accessible for those who work in combinatorial optimization. We make use of the (virtual) generality of leaf-colouration in comparison with simple leaf-col- ouration in a more convenient inductive hypothesis in Section 2.

In the whole paper but Section 2 we restrict our interest to simple leaf-colourations. In this special case (Ix(l)] = 1 for all 1 E L), which seems to be of highest importance for applications, the length of T is equal to k iff the deletion of k well-chosen edges decomposes T into subtrees with one colour being present in each, but the deletion of less than k edges cannot do it. For r = 2, this property is known as the k-edge-connectivity of the tree between two colour-classes of leaves, and Menger’s theorem [Me, LPI gives that the length of T is the maximum number of edge-disjoint paths connecting the colour-classes. The main objective of the present paper is the generalization of this special case of Menger’s theorem.

We say that an oriented path is dour-changing, if its endpoints are vertices differently coloured by x (leaves). Two colour-changing paths are in confzict, if either they use an edge in opposite directions or they use an edge in the same direction and the endpoints to which they are directed have the same colour.

THEOREM 1. The length of T is equal to the maximum number of colour-changing paths, with no two in conflict.

Section 3 provides the proof of Theorem 1. The proof is algorithmic and yields a polynomial algorithm to construct l(T) colour-changing paths with no two in conflict. We also show that in a straightforward generalization of the problem for graphs, the cardinality of any colour-changing path system without conflict provides a lower bound for the changing number (see Section 3 for the definitions). Section 4 describes our problem in terms of linear programming. The dual of the problem of the optimal colouration is

Page 3: Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf · 2006. 7. 11. · EVOLUTIONARY TREES 377 a multicommodity flow problem, for which

EVOLUTIONARY TREES 377

a multicommodity flow problem, for which Theorem 1 turns into a max- flow-min-cut theorem. We would like to know about a biological interpre- tation of the max-flow-min-cut theorem, if there is any.

Section 5 sets our problems in terms of multiway cuts. The Fitch-Hartigan algorithm solves the NP-hard multiway cut problem for a large class of graphs in polynomial time and Theorem 1 provides a min-max theorem for the minimum multiway cut in these instances.

What the biologists really would like to have, is the minimum length evolutionary tree over a set of species, in a much more complicated situation, where the colour of a species is a word made of colours and extensions of this partial colouration to internal vertices are considered. Every bit on every edge may contribute by one to the changing number and the length of the tree is the minimum possible changing number over all extensions. To decide, if a tree with given length (in this more general setting) exists, is NP-complete (see Graham and Foulds [GF, FG]). Hence, we would like to see statistical information on the length as a tool to decide if an evolutionary tree is “acceptable.” Statistical information on evolutionary trees with one-letter colour-words (i.e., evolutionary trees in our sense) in leaves does help, see Steel [St]. The statistical analysis of the most likely evolutionary trees requires the enumeration of evolutionary trees of length k built on a given, coloured leaf set. For r = 2 and k = r - 1, the enumeration was done by Carter et al. [CHPSzW], using multivariate Lagrange inversion and computer algebra. Steel [St] has found a combinatorial decomposition based on Menger’s theorem to solve the enumeration problem for r = 2 and he also solved the enumeration problem for k = r. Erdiis and SzCkely [ESz] gave a simple presentation of Steel’s decomposition. We understood that for the solution of the enumer- ation problem for arbitrary r and k it is necessary (but not sufficient) to come up with the proper generalization of Menger’s theorem. It was the starting point of the present paper. We are indebted to Professors A. Frank, E. Gyori, and L. Lovisz for fruitful discussions on the topics of the present paper.

2. AN ALGORITHM TO FIND OPTIMAL COLOURATIONS

Let T be a tree, x: L -+ 2’ be a leaf-colouration.

DEFINITION. The triplet xe = (xi, x2, p) is an extension of the left-col- ouration x if x1 and x2 are maps from V(T) to 2’, the map p: B --f (1,. . . , b} is a bijection, and, furthermore, for 1 E L we have xi(l) = x(l) and x,(l) = 0, and finally ,yr(v) n x2(u) = 0 for all u E V(T).

DEFINITION. The map X: V(T) -+ C is an implementation of the exten- sion (xi, x2; p) if it can be derived by the following algorithm:

Page 4: Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf · 2006. 7. 11. · EVOLUTIONARY TREES 377 a multicommodity flow problem, for which

378 ERDiiS AND SZlkELY

IMPLEMENTATION ALGORITHM. Step 1. Set X(U) + 0 for u E V(7). Set A + 0. Step 2. If u is a neighbour of a u E A, u tir A (and hence x(u) = 0),

then -if x(u) E ,y,(u> then set x(u) + x(u), and A + A U {u).

-if i(v) E x&u) then (it is up to you), set i(u) + x(u), and A t A U (u}, or do not do anything.

Step 3. Repeat Step 2 with the current set A. Every vertex of T will be examined at most once. If every neighbour of the current A has been examined, then go to Step 4.

Step 4. If B \ A # 0 then take u E B \A for which p is maximal. Step 5. Let x(v) be an arbitrary element of xi(v). Let A + A U Iv).

Go to Step 2. Step 6. If B E A then repeat Step 5 with u + 1 for every leaf 1 with

$2) = 0. If X nowhere equals to 0 then finish the algorithm.

DEFINITION. An extension xe is proper extension of leaf-colouration x if

(i) every implementation of xe is an optimal colouration, (ii) every optimal colouration is an implementation of xe.

THEOREM 2. Let T be an arbitrary tree with leaf-colouration ,y: L -+ 2’. Than there exists a proper extension xe of ,y.

Proof. We construct a proper extension by mathematical induction on b. For a star with midpoint m, set p(m) = 1, xi(m) = the set of colours occurring with maximum multiplicity in the sets x(r): 1 E L, and finally set x,(m) = 0. This is obviously a good choice for xe. Assume T be a tree with diameter at least 3, with a given leaf-colouration x.

Let u be a vertex of T whose every neighbour (except a unique vertex w) is leaf. We denote the set of these leaves by L,. (The existence of such a u is obvious.) Let M, denote the set of colours of C which occur with maximum multiplicity in the sets x(r): 1 E L,, let M, denote the set of colours of C which occur with maximum minus one multiplicity in the sets x(l): I E L,.

Let us define the tree T’ that we obtain by deleting the set L, from T. Define the leaf-colouration x’ on T’ by

x’t”) = x(u) if u E V(T’) n (L \L,), M

1 ifu=u.

Now the tree T’ has fewer branching points than the tree T had; therefore, due to the hypothesis, the leaf-colouration x’ has an A? = (x;, ,&, p’) proper extension. Define the extension xe = (xi, x2, p) of x as

Page 5: Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf · 2006. 7. 11. · EVOLUTIONARY TREES 377 a multicommodity flow problem, for which

EVOLUTIONARY TREES 379

follows:

x*(u) = i ;;(‘u”,’ if 24 E T’ ifuEL,,

( x;w if 24 E T’\[u}

x2(u) = M2 ifu=v 0 ifuEL,,

P(U) = i

1 ifu=v P’(U) + 1 if z4 EB\{v}.

For completeness, define x2(u) = 0 if p(u) = IBI. We have to prove that xe is a proper extension.

(A) Let X be an implementation of xe. We show that X is an optimal colouration. We distinguish two cases.

(al) jj(w) E x,(v). In that case iIT’ is an implementation of xle and the changing number of X on the tree T’ equals to l(T’). Let T, denote the subtree spanned by {v, L,}. The changing number of X on the tree T, is equal to Z(T,) (with respect to the leaf-colouration x on L,). If we apply the inequality 1(T) 2 1(T’) + /CT,) (which can be proved easily), then we obtain that the changing number of X on the tree T is less than or equal to I(T). It proves the claim.

(a2) x(w) @ xi(v). Now two further subcases are distinguished: (i) If i(w) = i(v) then the changing number of Xl T’ is equal to I(T’) - 1. The changing number of x on the tree T, is equal to /CT,) + 1. The repetition of the previous argument proves the claim. (ii) Assume j&w> # j?(v). Then, due to the imple- mentation algorithm, x(v) E xi(v) and iIT, is an optimal colouration. The same holds for jJ T’; therefore X itself is optimal again.

(B) Assume X is an optimal colouration of T. We have to show, that it is an implementation of the extension xe. It is easy to see that x(v) must belong to xi(v) U xz(v). Furthermore, we know that I(T’) + Z(T,) I Z(T). Therefore, if i(v) E xi(v), then ,?lT’ is an optimal colouration. Then, due to the hypothesis, FIT’ is an implementation of xle, and therefore X is an implementation of xe. If x(v) E x2(v), then the map X’: VU’) + C, where

F’(u) = i i(u) ifu#v, E xI( v) ifu=v,

Page 6: Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf · 2006. 7. 11. · EVOLUTIONARY TREES 377 a multicommodity flow problem, for which

380 ERDijS AND SZhELY

is an optimal colouration of T’. So X’ is an implementation of x”. But the colouration 2 can be derived from X’ in a way that conforms to the implementation algorithm. Therefore, X is again an implementation of xe.

0

The combination of the implementation algorithm and the construction of the proof of the theorem gives us an algorithm to determine all the optimal colourations. However, if we need only one optimal colouration, e.g., we need the length of the tree, then we may do it more simply.

If this is the case, we do not construct the colour sets of x2 and it suffices to determine only one implementation of xe. Clearly, we can do it in polynomial time. On the other hand, to determine all implementations (i.e., all optimal colourations) has a higher complexity by the length of the output.

DEFINITION. We say that a u E V(T) k of order k (k = 1,2), if there exists an edge e adjacent to U, such that in the connected component of V(T) - e, containing U, BJu, e), the longest u - leaf distance is of length k. Let TJv, e) denote the union of the other connected component and e (see Fig. 1).

COROLLARY 1. If v is of order one, then there exists an optimal coloura- tion 2 with F(v) E xl(v).

COROLLARY 2. Zf v is of order two and B,(v, e) is totally multicoloured, then xl(v) is the set of colours of B,(v, e), and there is an optimal colouration which assigns the same element of that set to v and its non-leaf neighbours in B&v, e).

FIGURE 1

Page 7: Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf · 2006. 7. 11. · EVOLUTIONARY TREES 377 a multicommodity flow problem, for which

EVOLUTIONARY TREES 381

Assume wi, i = 1,. . . , k, is the list of those neighbours of u in B,(u, e), for which wi is of order one by the edge uwi.

COROLLARY 3. Zf v is of order two and B,(v, e) is not totally multi- cofoured, but B,(wi, wiv) is (i = 1,. . . , k), then x1(v) is the set of colours occurring with maximum multiplicity in B&v, e>, and there i.v an optimal colouration which assigns the same element of that set to v and its nonleaf neighbours in B,(v, e).

Remark. After slight modification, the algorithm of the present section yields optimal colouration for the generalized problem, where some branching points also have an admissible colour set assigned and the colouration has to use an admissible colour in every vertex. Either we have to transform the tree before using the same algorithm by hanging suffi- ciently many leaves of every colour of the admissible colour set from the vertex, which the admissible colour set was assigned to, or, alternatively, we may restrict the set xl(v) to its intersection with the set of admissible colours in v everywhere in the algorithm. We leave the details for the reader.

3. THE PROOF OF THEOREM 1

Note that from now on every leaf-colouration is simple. We are going to prove an equivalent form of Theorem 1. First, we give a general lower bound for the length. Suppose G is a graph, L c V(G), x: L -+ C is a partial colouration with the elements of a set C. Define the,length of G, Z(G), as the minimum number of colour-changing edges over all exten- sions of the colouration to V(G). The notions of optimal colouration, colour-changing path, and conflict are naturally generalized for G.

We say, that 9 is a jungle,’ if it consists of rooted subtrees of a leaf-coloured tree T, such that

-the leaves of the trees are in L, and no colour is repeated in any of the trees,

-the roots are leaves in the trees,

-if an edge belongs to more than one tree, then the edge does not separate the roots of the trees,

-if an edge belongs to more than one tree, then the root-leaf paths of the trees containing that edge end in leaves of different colours.

‘Note that a jungle is thicker than a forest.

Page 8: Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf · 2006. 7. 11. · EVOLUTIONARY TREES 377 a multicommodity flow problem, for which

382 ERDiiS AND SZlkELY

For F E 9, let c(F) denote the number of different colours present in the leaves of F. We claim

THEOREM 3. (a) For arbitrary G, L c V(G) and x: L + C, we have l(G) 2 maximum number colour-changing paths, with no two in con.ict; (b) for a tree T,

I(T) 2 my c (c(F) - l), FEY

where the maximum is taken for jungles.

Proof Assume we are given a colouration 2: V(G) + C, such that XJL = x, and a set of colour-changing paths with no two in conflict. Assign its last colour-changing edge to every colour-changing path, with respect to the orientation of the path and the given colouration. Clearly we have an injection.

For a tree T, note, that the root-leaf paths of the trees of 9 make a colour-changing path system with no two in conflict. 0

THEOREM 4. For a tree T with a simple leaf-colouration ,y, l(T) = maXg CF E ,(c(F) - l), where the maximum is taken for jungles.

It is easy to see the equivalence of Theorems 1 and 4 by building the trees of a jungle from the union of paths with the same starting point, taking the starting point for the root.

Proof. We have l(T) 2 maXg E,,,(c(F) - 1) from Theorem 3. We are left with the nontrivial inequality, Z(T) I maXg CFES(c(F) - 1). We assume that a proper extension of x has already been determined by Section 2.

We apply mathematical induction on b and n. The base case of the induction is b = 1, i.e., T is a star. Now E(T) = IL1 - maximum colour multiplicity in L. We build a jungle 9 with l(T) = CFE ,(c(F) - 0. It is possible to cover L with edge-disjoint trees that are totally multicoloured (no colour is repeated in any of them), with no more trees, than the maximum colour multiplicity in L. Take any leaf for root in any of them.

Our hypothesis is, that we already know Theorem 4 for every leaf-col- oured tree T’ with a fewer number of branching points or with the same number of branching points and fewer number leaves. We distinguish cases.

(A> There exists a vertex u of order one and two leaves of the same colour in V(T) \ TJv, e). Assume m is the maximum multiplicity of a

Page 9: Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf · 2006. 7. 11. · EVOLUTIONARY TREES 377 a multicommodity flow problem, for which

EVOLUTIONARY TREES 383

colour in V(T) \ Ti(u, e). Chop off m - 1 leaves of every colour in V(T) \ T,(v, e), (or less, but all of them, if the multiplicity of the colour is less than m - 11, and call the leftover tree T’. Note that the surviving colours are exactly the elements of xi(u). Recall Corollary 1, we have an optimal colouration X with x(v) E xi(v). Hence we have I(T) = 1(T’) + d,(u) - IXJUN - m. By the hypothesis of the induction on n, there is a jungle F’ in T’ with a sum at least I@‘). In order to obtain a suitable jungle 9 for T, add m - 1 totally multicoloured edge-disjoint stars that contain all the deleted leaves to F’, and

l(T) = V’) + 4(4 -tx~(u>l -m

The choice for the roots of the stars added is arbitrary, since they are edge-disjoint to F’ and each to the other.

(B) We may assume that for all vertices u of order one, V(T) \ T,(u, e) is totally multicoloured. It is easy to see that if T is not a star, then it has a vertex of order two, say U, and edge e shows it. Assume wi, i = 1,. . . , k, is the list of those neighbours of u in B,(v, e), for which wi is of order one by the edge uwi.

(Bl) Assume B, = B&v, e) is totally multicoloured. Let T’ denote the tree that we obtain from T by deleting wi, i = 1,. . . , k, and joining the resulting isolated vertices to U. Let B’ denote what we obtain from B, in the same way. We also think of B’ as a subtree of T’. By hypothesis, we have a jungle 9’ in T’, which produces a sum that is at least l(T’). By Corollary 2, we have f(T) = I(T’).

DEFINITION. We say an F E 9’ is exterior, if it has at least one leaf in both of T2(u, e) and B’. If it has all its leaves from B’, we say it is in tefior .

We have to build a jungle F in T with the same sum as .F’ had. There is a natural way to extend a tree F E 9’ to T by subdivision of edges. We call ir: the lift-up of F and denote it by F*. Lifting up every element of F’, we obtain F’*, which is, unfortunately, not necessarily a jungle. However, we want to take advantage of the idea. Note that either all the exterior trees are rooted in B’, or none of them, since the edge e

Page 10: Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf · 2006. 7. 11. · EVOLUTIONARY TREES 377 a multicommodity flow problem, for which

384 ERDijS AND SZlkELY

cannot be used in opposite directions by the definition of jungle. We distinguish further subcases.

(Bla) The total number of exterior and interior trees equals one. Take F= 9’*.

(Blp) Every exterior tree is rooted in B’. Let F,, . . . , Fk be the exterior tress in F’. F being a jungle, all the leaves of these exterior trees in I/(T) \B’ have different colours. If .P has an interior tree, then it is easy to see, by the optimality of 9’, that it has exactly one. In order to build 9, we start with 9’* and do some surgery on it. Note that at most one exterior tree is rooted in B,(wj, wju); otherwise we contradict the optimality of P. Assume that the exterior tree Fi* is rooted in B,(wj,wjv), then we want to add the whole B,(wj,wjv) to F;* (and delete it from every tree of the jungle that contained it before). Only a leaf of colour c causes a problem, if e* already has a leaf of colour c in IQ) \B&u, e). Delete the latter leaf from FF, and join it to another exterior tree. We can do it, since no other exterior tree may contain the colour c after the alterations. If we do not find another exterior tree, then we have an interior tree; otherwise we are in (Bla). Join the latter leaf to the interior tree (it does not contain colour c as B,(u, e) is totally multicoloured and its colour c has already been used up>. We may have had an interior tree. It did not survive the last paragraph unless it contained some BZ(wj, wju). If it did sur- vive, lift it up and take for its root any leaf of that B,(wj, wju) (see Fig. 2).

(Bly) No exterior tree is rooted in B’. If F’ has no interior trees, take 9= P*. If 9’ has an interior tree, then it is easy to see by the optimality of P, that it has exactly one. Construct 9 in the following way: restrict the exterior trees to V(T) \ B’ and add the star B’ with arbitrary root to them and finally, lift them up.

(B2) B, = B,(u, e) is not totally multicoloured. Since we are not in case (A), every B,(wj, WjU) is totally multicoloured. Assume m is the maximum multiplicity of a colour in B,, m 2 2. Delete every neighbour of u but edge e, and join to u a representative leaf of every colour occurring with maximum multiplicity. Call the new tree T’. By Corollary 3, we have I(T’) = 1(T) - (IL(T)I - lL(T’)l) + (m - 1). By hypothesis, there is a jungle F in T’, which produces a sum equal to l(T’). Call the star of u and the representative leaves B’; we speak about interior and exterior trees of F’ again.

Page 11: Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf · 2006. 7. 11. · EVOLUTIONARY TREES 377 a multicommodity flow problem, for which

EVOLUTIONARY TREES 385

P={AdG,E,r;D-B;E+C}

F={A-rB,C,G,D+E;B+I}

Ieticr1 mwu IWIICl, nrmkr1 IRCUI wIarr1

FIGURE 2

Define a color-u-preserving injection T from the representative leaf set of T’ into the leaf set of B,, whose range does not intersect at least m - 1 components of B, \ {u). Such a r can be defined by the greedy algorithm; always try to define T on a leaf by choosing a leaf of the same colour from the already spoiled components. Whenever we have to pick a new component for a new colour, it means that there are m intact components containing that colour. Define the lift-up F* of F E 9’ by joining F n TJu, e) to 7(p) in T for any representative leaf p of E We may assume, that F’ covers all the representative leaves; if not, we may add trees of singleton leaves to F’. (It is easy to see that there is at most one tree of a singleton leaf.)

First extension. Define F in the following way. Take F’* and add m - 1 internal trees to it that cover all leaves of B, not covered by F’* in the following way: pick m - 1 roots from RZ - 1 arbitrary but fixed components of B, \ {u) not spoiled by 7, and build m - 1 totally multi- coloured rooted trees (stars), such that the selected m - 1 unspoiled components considered above wholly belong to one tree each, and then cover with the trees all the not covered leaves in B,.

The first extension may fail to give the required jungle F only if F has an interior tree or exterior tree rooted in a representative leaf. (Again, either all exterior trees are rooted in a representative leaf or none of them.) The problem may be the use of some wiu edges in both directions, by members of F. In this case we further modify the result of the first extension.

Second extension. If a B&w,, wiu) contains the root of a lift-up tree, then we would like to remove the leaves of B,(w,, wiu> from the other trees rooted elsewhere and add them to the lift-up tree. If we have two

Page 12: Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf · 2006. 7. 11. · EVOLUTIONARY TREES 377 a multicommodity flow problem, for which

386 ERDbS AND SZlkELY

P={E-+A,I;F~N,o,P)

Firri edeuior {E + A, I; B + C, D, M; II + N, 0, P; K 4 F, L, G} Second cdcrrion {E + F, G, P, H + I, J, N; B + C, D, M; K + A, L, 0)

FIGURE 3

lift-up trees rooted in B,(w,, w;u>, then we can add any leaf to one or the other, since their colours are all distinct in T,(wi, w~u). If we have only one lift-up tree F* rooted in B,(wi, wiu), then we may fail in adding a leaf of colour c (belonging to a tree S) from B,(wi, wju) to the tree; i.e., the tree F* already has a leaf of colour c in Ti(wi, wiu). Then switch the leaves of colour c between F* and S (see Fig. 3).

It is easy to see that the second extension eliminated the conflicts that may have survived the first extension and did not create any new ones. 0

4. THE LP CONNECTION

For every edge pq of the tree T and every pair of distinct colours 4 define a variable z,,,~~. If q E L, set z,,,~~ = 0 for every j # x(q). Identify zPq ij and zqP, ji. Consider the linear program

for every Pab colour-changing path

c c zpq,ix(b) 2 ‘7 w eP.6 i: i #x(b)

Page 13: Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf · 2006. 7. 11. · EVOLUTIONARY TREES 377 a multicommodity flow problem, for which

EVOLUTIONARY TREES 387

where the sum is for pq edges of the path with p reached first (remember, that colour-changing paths are oriented paths):

min CzPs,ij.

To describe the dual linear program, for every colour-changing path Pab, introduce a variable hab,

A,, 2 0.

If p, q are not leaves, then for every i, j E C, i # j, have

x(b) =i ,y(u)=i

if q is a leaf, then for j = x(q), i # j, have

,y(b)=j ,y(u)=i

(the first sums are taken for colour-changing paths Pa,, that contain pq and reach p first, while the second sums are taken for colour-changing paths P,, that contain pq and reach q first):

It is easy to see that the maximum number of colour-changing paths, no two in conflict min CZ,,~~: zPq ij

s max CA,,: A,, integer I max CA,, = min Cz,, ij I integer I 1(T). Only the first and last inequalities re-

quire proof from the chain of inequalities above. The first one holds, since the root-leaf paths of a jungle always provide a feasible integer solution for the second linear program; the last one holds, since we have an optimal colouration 2 with I(T) colour-changing edges, define zPaij = 1, if pq is one of the I(T) colour-changing edges, x(p) = i, X(q) = j. By Theorem 1, each of the inequalities holds by equality.

Finally, our second linear program is a multicommodity flow problem, for which a max-flow-min-cut theorem holds.

5. THE MULTIWAY CUT PROBLEM

The multiway cut problem [CR] is the following one: Given a connected simple graph G with positive edge weights and a set of vertices N c V(G), find a minimum-weight edge set that separates all pairs of N. Having a constant weight, the multiway cut problem is the problem of finding an

Page 14: Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf · 2006. 7. 11. · EVOLUTIONARY TREES 377 a multicommodity flow problem, for which

388 ERD6S AND SZ&ELY

optimal colouration which extends a partial colouration that assigns dif- ferent colours to the members of N. Having an instance of the problem of finding an optimal colouration, identify the vertices to which the partial colouration assigned the same colour and represent the arising multiple edges by multiple weights in order to obtain an instance of the weighted multiway cut. People working on the multiway cut problem seem to be unaware of the related work in biomathematics and vice versa.

Dahlhaus et al. [DJPSY] proved that the multiway cut problem is NP-hard even for INI = 3 and equal edge weights. It easily implies that the problem “is 1(G) I m?” is NP-complete. Dahlhaus et al. [DJPSY] also proved that the multiway cut problem is polynomially solvable for planar graphs with fixed INI. Note that identifying some leaf sets of a tree, we may obtain nonplanar graphs. Conversely, by splitting vertices of N into as many leaves as their degree, we obtain

THEOREM 5. Zf N intersects every cycle of G, then the Fitch-Hartigan algorithm described in Section 2 provides a solution in polynomial time for the multiway cut problem with equal edge weights. It also provides a way to obtain all minimum multiway cuts. We have l(G) = the maximum number of colour-changing paths with no two in conflict. q

We already have a version of the Fitch-Hartigan algorithm for edge- weighted trees with positive integer weights and we also have the weighted version of Theorem 5.

REFERENCES

[CR1 S. CHOPRA AND M. R. RAO, On the multiway cut polyhedron, Networks 21(1991), 51-89.

[CHPSW] M. CARTER, M. HENDY, D. PENNY, L. A. SZ~KELY, AND N. C. WORMALD, On the

[DJPSYI

[ESzl

[Fe1

lFi1

[FGI

lGF1

distribution of lengths of evolutionary trees, SIAM J. Discrete Math. 3 (1990), 38-41. E. DAHLHAUS, D. S. JOHNSON, C. H. PAPADIMITRIOU, P. SEYMOUR, AND M. YANNAKAKIS, The complexity of multiway cuts, extended abstract 1983. P. L. ERDGS AND L. A. SZBKELY, Counting bichromatic evolutionary trees, Discrete Appl. Math., to appear. J. FELSENSTEIN, Phylogenies from molecular sequences: Inference and reliability, Annu. Reu. Genetics 22 (19881, 521-565. W. M. FITCH, Towards defining the course of evolution. Minimum change for specific tree topology, Syst. Zool. 20 (1970, 406-416. L. R. FOULDS AND R. L. GRAHAM, The Steiner problem in phylogeny is NP-com- plete, Adu. Appl. Math. 3 (19821, 43-49. R. L. GRAHAM AND L. R. FOULDS, Unlikelihood that minimal phylogenies for a realistic biological study can be constructed in reasonable computational time, Math. Biosci. 60 (19821, 133-142.

Page 15: Evolutionary Trees: An Integer Multicommodity Max-Flow - Min …elp/Multi/ErdosSzekely-AAM92.pdf · 2006. 7. 11. · EVOLUTIONARY TREES 377 a multicommodity flow problem, for which

EVOLUTIONARY TREES 389

[Hal

[LPI

[Mel WI

J. A. HARTIGAN, Minimum mutation fits to a given tree, Biometrics 29 (19731, 53-65. L. Lovkz AND M. D. PLUMMER, “Matching Theory,” AkadCmiai Kiad6, Bu- dapest; North-Holland, Amsterdam, 1986. K. MENGER, Zur allgemeinen Kurventheorie, Fund. Math. 10 (19261, 96-115. M. A. STEEL, Distributions on bicoloured binary trees arising from the principle of parsinomy, Discrete Appl. Math., to appear.