Algorithmic and complexity results for decompositions of...
Transcript of Algorithmic and complexity results for decompositions of...
B
3
4
5
6
7
8
9
A10
tiToasat
11
12
13
14
15
16
17
18
19
©20
21
122
23
m24
s25
s26
fi27
w28
s29
y(
C
m0
1 02 d
D P
RO
OF
IO 2594 1–18
BioSystems xxx (2006) xxx–xxx
Algorithmic and complexity results for decompositions ofbiological networks into monotone subsystems
Bhaskar DasGuptaa,1,∗, German Andres Encisob,2, Eduardo Sontagc,3, Yi Zhanga,1
a Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607, United Statesb Mathematical Biosciences Institute, 250 Mathematics Building, 231 W 18th Avenue, Columbus, OH 43210, United States
c Department of Mathematics, Rutgers University, New Brunswick, NJ 08903, United States
Received 23 January 2006; received in revised form 3 August 2006; accepted 3 August 2006
bstract
A useful approach to the mathematical analysis of large-scale biological networks is based upon their decompositions into mono-one dynamical systems. This paper deals with two computational problems associated to finding decompositions which are optimaln an appropriate sense. In graph-theoretic language, the problems can be recast in terms of maximal sign-consistent subgraphs.he theoretical results include polynomial-time approximation algorithms as well as constant-ratio inapproximability results. Onef the algorithms, which has a worst-case guarantee of 87.9% from optimality, is based on the semidefinite programming relaxation
CT
pproach of Goemans–Williamson [Goemans, M., Williamson, D., 1995. Improved approximation algorithms for maximum cut andatisfiability problems using semidefinite programming. J. ACM 42 (6), 1115–1145]. The algorithm was implemented and tested on
Facto
30
31
32
33
34
E
Drosophila segmentation network and an Epidermal Growtho optimally.
2006 Elsevier Ireland Ltd. All rights reserved.
. Introduction
In living cells, networks of proteins, RNA, DNA,etabolites, and other species process environmental
ignals, control internal events such as gene expres-ion, and produce appropriate cellular responses. The
UN
CO
RReld of systems (molecular) biology is largely concerned
ith the study of such networks, viewed as dynamicalystems. One approach to their mathematical analysis
∗ Corresponding author. Tel.: +1 3123551319; fax: +1 3124130024.E-mail addresses: [email protected] (B. DasGupta),
[email protected] (Y. Zhang), [email protected]. Enciso), [email protected] (E. Sontag).1 Partly supported by NSF grants CCR-0296041, CCR-0206795,CR-0208749 and IIS-0346973.2 Work done while the author was with the Mathematics Depart-ent of Rutgers University and partly supported by NSF grant CCR-
206789.3 Partly supported by NSF grants EIA 0205116 and DMS-0504557.
35
36
37
38
39
40
41
42
43
44
303-2647/$ – see front matter © 2006 Elsevier Ireland Ltd. All rights reservoi:10.1016/j.biosystems.2006.08.001
Please cite this article as: Bhaskar DasGupta et al., Algorithmic and cinto monotone subsystems, BioSystems (2006), doi:10.1016/j.bios
Er Receptor pathway model, and it was found to perform close
relies upon viewing them as made up of subsystemswhose behavior is simpler and easier to understand. Cou-pled with appropriate interconnection rules, the hope isthat emergent properties of the complete system can bededuced from the understanding of these subsystems.Diagrammatically, we picture this as in Fig. 1, whichshows a full system as composed of four subsystems.
A particularly appealing class of candidates for “sim-pler behaved” subsystems are monotone systems, as inHirsch (1985, 1983) and Smith (1995). Monotone sys-tems are a class of dynamical systems for which patho-logical behavior (“chaos”) is ruled out. Even thoughthey may have arbitrarily large dimensionality, mono-tone systems behave in many ways like one-dimensionalsystems. For instance, in monotone systems, bounded
BIO 2594 1–18
trajectories generically converge to steady states, and 45
there are no stable oscillatory behaviors. More precisely, 46
see below, one must extend the notion of monotone sys- 47
tem so as to incorporate input and output channels, as 48
ed.
omplexity results for decompositions of biological networksystems.2006.08.001
T
BIO 2594 1–18
2 B. DasGupta et al. / BioSystem
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
a given system. 109
C
Fig. 1. A system composed of four subsystems.
introduced and initially developed in Angeli and Sontag(2003); inputs and outputs are required so that intercon-nections like those shown in Fig. 1 can be defined.
Monotonicity is closely related, as explained later,to positive and feedback loops in systems. The topicof analyzing the behaviors of such feedback loops is along-standing one in biology in the context of regula-tion, metabolism, and development; a classical referencein that regard is the work (Monod and Jacob, 1961)of Monod and Jacob in 1961. See also, for example,Angeli et al. (2004), Angeli and Sontag (2004), Cinquinand Demongeot (2002), Lewis et al. (1977), Meinhardt(1978), Plathe et al. (1995), Remy et al. (2003), Snoussi(1998) and Thomas (1978).
An interconnection of monotone subsystems, that isto say, an entire system made up of monotone compo-nents, may or may not be monotone: “positive feedback”(in a sense that can be made precise) preserves mono-tonicity, while “negative feedback” destroys it. Thus,oscillators such as circadian rhythm generators requirenegative feedback loops in order for periodic orbits toarise, and hence are not themselves monotone systems,although they can be decomposed into monotone sub-
UN
CO
RR
Esystems (cf. Angeli and Sontag, 2004). A rich theory isbeginning to arise, characterizing the behavior of non-monotone interconnections. For example, Angeli andSontag (2003) shows how to preserve convergence to
Fig. 2. A consistent and an
Please cite this article as: Bhaskar DasGupta et al., Algorithmic andinto monotone subsystems, BioSystems (2006), doi:10.1016/j.bios
ED
PR
OO
F
s xxx (2006) xxx–xxx
equilibria; see also the follow-up papers (Angeli et al.,2004; Enciso et al., 2005; Enciso and Sontag, 2006;Gedeon and Sontag, 2005; De Leenheer et al., 2005).Even for monotone interconnections, the decomposi-tion approach is very useful, as it permits locating andcharacterizing the stability of steady states based uponinput/output behaviors of components, as described inAngeli and Sontag (2004); see also the follow-up papers(Angeli et al., 2004; Enciso and Sontag, 2005; De Leen-heer and Malisoff, 2006).
Moreover, a key point brought up in Sontag (2004,2005) is that new techniques for monotone systems inmany situations allow one to characterize the behaviorof an entire system, based upon the “qualitative” knowl-edge represented by general network topology and theinhibitory or activating character of interconnections,combined with only a relatively small amount of quan-titative data. The latter data may consist of steady-stateresponses of components (dose-response curves and soforth), and there is no need to know the precise formof dynamics or parameters such as kinetic constants inorder to obtain global stability conclusions.
In Section 2 of this paper, we briefly discuss mono-tonicity of systems described by ordinary differentialequations (the study of monotonicity can be extendedto partial differential equations, delay-differential equa-tions, and even more arbitrary dynamical systems, seee.g. Enciso and Sontag, 2006 in the context of mono-tone systems with inputs and outputs). We explain therehow the study of monotone systems, and more generallyof decompositions into monotone systems, relates to asign-consistency property for the graph which describeshow each state variable influences each other variable in
BIO 2594 1–18
inconsistent graph.
Generally, a graph, whose edges are labeled by “+” 110
or “−” signs (sometimes one writes +1, −1 instead of 111
+, −, or uses respectively activating “→” or inhibiting
complexity results for decompositions of biological networksystems.2006.08.001
CT
BIO 2594 1–18
B. DasGupta et al. / BioSystem
“112
c113
s114
i115
e116
o117
d118
c119
e120
e121
o122
3123
r124
(125
126
t127
(128
(129
(130
(131
(132
n133
i134
d135
s136
t137
b138
c139
p140
t141
b142
t143
s144
d145
e146
T147
148
b149
p150
e151
m152
153
(154
g155
c156
a157
a158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
NC
OR
RE
Fig. 3. Pulling-out inconsistent connections.
�” arrows as shown in Fig. 2), is said to be sign-onsistent if all paths between any two nodes have theame net sign, or equivalently, all closed loops have pos-tive parity, i.e. an even number, possibly 0, of negativedges. (For technical reasons, one ignores the directionf arrows, looking only at undirected graphs; see moreetails in Section 2.) Thus, the first graph in Fig. 2 isonsistent, but the second one, which differs in just onedge from the first one, is not (two paths with differ-nt parity are possible from node 1 to node 4, a directdd one as well as an even one transversing nodes 2 and). Self-loops, which in biochemical systems often rep-esent degradation terms, are ignored in this definition.We discuss this point further below.)
When applying decomposition theorems such ashose described in Angeli et al. (2004), Angeli et al.2004), Angeli and Sontag (2003, 2004), Enciso et al.2005), Enciso and Sontag (2005), Enciso and Sontag2006), Gedeon and Sontag (2005), De Leenheer et al.2005) and De Leenheer and Malisoff (2006), Sontag2004, 2005), it tends to be the case that the fewer theumber of interconnections among components, the eas-er it is to obtain useful conclusions. One may view aecomposition into interconnections of monotone sub-ystems as the “pulling out” of “inconsistent” connec-ions among monotone components, the original systemeing a “negative feedback” loop around an otherwiseonsistent system, as represented in Fig. 3. In this inter-retation, the number of interconnections among mono-one components corresponds to the number of variableseing fed-back. In addition, and independently from theheory developed in the above references, one mightpeculate that nature tends to favor systems that areecomposable into small monotone interconnections (orquivalently, have a small number of inconsistent paths).here are two reasons for this.
From a dynamical systems perspective, negative feed-ack loops, although required for homeostasis and foreriodic behavior, have potentially destabilizing effects,specially if there are signal propagation delays; thus,inimizing their number is desirable.Another advantage of consistency is as follows
Sontag, in preparation). Suppose that the nodes in the
Uraphs shown in Fig. 2 represent concentrations of ahemical species in a cell, such as receptors in a certainctivated state or transcription factors. Assume now thatperturbation instantaneously increases the value of thePlease cite this article as: Bhaskar DasGupta et al., Algorithmic and cinto monotone subsystems, BioSystems (2006), doi:10.1016/j.bios
ED
PR
OO
F
s xxx (2006) xxx–xxx 3
concentration of node 1. For the graph on the left, theinstantaneous effect on the other nodes is predictable:nodes 2 and 6 will increase, while nodes 3, 4, and 5will decrease. This unambiguous global effect holds trueregardless of the actual algebraic forms of reactions, val-ues of parameters such and kinetic constants, etc. Incontrast, consider the graph shown on the right. Nowthe net effect of an increase in node 1 is ambiguous. It isimpossible to know if node 4 will be repressed (becauseof the direct edge from 1 to 4) or activated (because ofthe indirect path). There is no way to resolve this ambi-guity unless equations and precise parameter values areassigned to the arrows. Since cells of the same type differin precise parameter values, due to varying concentra-tions of ATP, enzymes, and other chemicals, two cells ofthe same type may react in different ways to the same“stimulus” (increase in concentration of chemical 1).While such epigenetic diversity is sometimes desirable,it makes behavior less predictable. From an evolutionaryviewpoint, a “change in wiring” due to a mutation willhave an ambiguous effect, in this inconsistent network.
Of course, one should not expect large networks to beglobally consistent. However, if the number of inconsis-tencies in a biological interaction graph is small, it maywell be the case that the network is in fact consistentin a practical sense. For example, a gene regulatory net-work represents all potential effects among genes. Theseeffects are mediated by proteins which themselves mayneed to be “activated” in order to perform their func-tion, and this activation may, in turn, depend on certainextracellular ligands being present. Thus, depending onthe particular combination of external signals present,different subgraphs of the original graph describe thesystem under those conditions, and these graphs may beindividually consistent. For example, for the system inFig. 2, the edge from 1 to 2 may not be present under envi-ronmental conditions A, while the edge from 2 to 3 maynot be present under conditions B. Thus, under eitherconditions, A or B, the graph would be consistent, eventhough the entire network is not. See Sontag (in prepa-ration) for more discussion of these issues. In summary,consistency in biological networks may be desirable, andtherefore one might conjecture that true biological net-works tend to maximize it. Evidence that this is indeedthe case is provided by Ma’ayan et al. (in preparation),where the authors compare certain biological networksand appropriately randomized versions of them and showthat the original networks are closer to being consistent,
BIO 2594 1–18
when consistency is measured using a simple heuristic. 207
In the last section of this paper, we apply our algorithms 208
to perform a similar analysis, and once again derive the 209
conclusion that nature seems to favor consistency. 210
omplexity results for decompositions of biological networksystems.2006.08.001
T
BIO 2594 1–18
4 B. DasGupta et al. / BioSystem
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
NC
OR
RE
C
Fig. 4. Dropping the diagonal edge gives consistency.
Thus, we are led to the subject of this paper, namelycomputing the smallest number of edges that have tobe removed so that there remains a consistent graph.For example, for the particular graph shown in Fig. 4the answer is that one edge (the diagonal positive one)suffices (in this case, the solution is unique: no singleother edge would suffice; in other problems, there maybe more than one optimizing solutions).
There has been other work dealing with efficientknock-out strategies in biochemical reaction networks,also formulated, as in this paper, as edge deletion prob-lems. As an example, we mention the recent paper(Klamt, 2006), which dealt with the question of iden-tifying a minimal set of reactions whose removal wouldblock the operation of a prespecified reaction. The prob-lem that we consider is completely different, however.
In this paper, we will study the computational com-plexity of the question of how many edges must beremoved in order to obtain consistency, and we pro-vide a relaxation-based polynomial-time approximationalgorithm guaranteed to solve the problem to about87.9% of the optimum solution, which is based onthe semidefinite programming relaxation approach ofGoemans–Williamson Goemans and Williamson (1995)(A variant of the problem is discussed as well.) We alsoobserve that it is not possible to have a polynomial-timealgorithm with performance too close to the optimal.While our emphasis is on theory, one of the algorithmswas implemented, and we show results of its applica-tion to a Drosophila segmentation network and to anEpidermal Growth Factor Receptor pathway model. Itturns out that, when applying the algorithm, often thesolution is much closer to optimal than the worst-caseguarantee of 87.9%, and indeed often gives an optimalsolution.
The remainder of this paper is organized as follows.Section 2 briefly discusses monotonicity. The discussionis self-contained for the purposes of this paper, and ref-erences are given to the dynamical systems results thatmotivate the problem studied here. The connection to
Uconsistency is also explained there. Section 3 discussesthe associated graph-theoretic problems and notions ofapproximability used in the paper, leading to the state-ment of our main theoretical results in Section 4, whichPlease cite this article as: Bhaskar DasGupta et al., Algorithmic andinto monotone subsystems, BioSystems (2006), doi:10.1016/j.bios
ED
PR
OO
F
s xxx (2006) xxx–xxx
are proved in Section 5. Section 6 contains the men-tioned examples of application of the algorithm. Finally,in Section 6.3 we consider a yeast gene regulatory net-work and various randomized versions of it, concludingthat the original network is far closer to consistent thanmay be expected from chance alone. Several technicalproofs are separately provided in Appendix A.
2. Monotone systems and consistency
We will illustrate the motivation for the problem stud-ied here using systems of ordinary differential equations
x = F (x) (1)
(the dot indicates time derivative, and x = x(t) is a vec-tor), although the discussion applies as well to moregeneral types of dynamical systems such as delay-differential systems or certain systems of reaction-diffusion partial differential equations. In applicationsto biological networks, the component xi(t) of the vec-tor x = x(t) indicates the concentration of the ith speciesin the model at time t.
We will restrict attention to models in which the directeffect that one given variable in the model has overanother is unambiguous, in the sense that it is alwaysinhibitory or always promoting. Thus, if protein A bindsto the promoter region of gene B, we assume that it doesso either to prevent the transcription of the gene or tofacilitate it, no matter what are the respective concen-trations. Mathematically, what we are saying is that werequire that for every i, j = 1, . . . , n, i �= j, the partialderivative ∂Fi/∂xj be either ≥ 0 at all states or ≤ 0 at allstates.
Let us briefly discuss this non-ambiguity assump-tion. First of all, we remark that this assumption doesnot prevent protein A from having an indirect influ-ence, through other molecules, perhaps dimmers of Aitself, that can ultimately lead to the opposite effecton gene B from that of a direct connection. Indeed,this is the whole point of studying graph consistency.Second, in biomolecular networks, ambiguous signs inJacobians often represent heterogeneous mechanisms.For example, take the case where protein A enhances thetranscription rate of gene B only if it is present at low con-centrations, but represses B if its concentration is largerthan some threshold. A careful study of the chemicalmechanism often reveals the existence of an interme-diate form (perhaps a homodimer) that is responsible
BIO 2594 1–18
for this ambiguous effect. (Mathematically, an example 300
is a rate of transcription k1a − k2a2, where a denotes 301
the concentration of A.) Introducing a new species into 302
the model (mathematically, an additional state variable 303
complexity results for decompositions of biological networksystems.2006.08.001
CT
B
oSystem
r304
p305
o306
c307
w308
r309
F310
d311
e312
o313
l314
J315
I316
i317
o318
g319
f320
v321
s322
t323
w324
t325
t326
327
(328
y329
a330
r331
o332
w333
i334
n335
s336
(337
p338
i339
340
o341
l342
f343
i344
o345
≤346
1347
e348
l349
p350
S351
s352
L353
s354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
NCOR
RE
IO 2594 1–18
B. DasGupta et al. / Bi
epresenting this intermediate form) reduces one to theroblem in which Jacobian entries are unambiguous. (Inur example, we would write the rate as k1a − k2c, whereis the concentration of the dimer. In addition, thereould be a new equation such as dc/dt = k3a
2 − k4c
epresenting formation of the dimer and its degradation.)inally, we note that small-scale negative loops are abun-ant in nature. Self-loops or “auto repression” are anxtreme example of these, and appear as a consequencef degradation and other effects. Regarding such self-oops, observe that the requirement of a fixed sign foracobian entries is not imposed on diagonal elements.n fact, these elements play no role in the graph to bentroduced next, nor on monotonicity—the propertiesf monotone systems are not affected by them. Moreenerally, it is often the case that small loops representast dynamics which may be collapsed into a self-loopsia time-scale decomposition (singular perturbations or,pecifically for enzymes, “quasi-steady state approxima-ions”) and hence may be viewed and diagonal termshich may be safely ignored. This is a modeling ques-
ion, to be settled before the algorithms studied here areo be applied.
Given any partial order ≤ defined on Rn, a system1) is said to be monotone with respect to ≤ if x0 ≤0 implies x(t) ≤ y(t) for every t ≥ 0. Here x(t), y(t)re the solutions of (1) with initial conditions x0, y0,espectively. Of course, whether a system is monotoner not depends on the partial order being considered, bute one says simply that a system is monotone if the order
s clear from the context. Monotonicity with respect toontrivial orders rules out chaotic attractors and eventable periodic orbits; see Hirsch (1985, 1983), Smith1995), and is, as discussed in the introduction, a usefulroperty for components when analyzing larger systemsn terms of subsystems.
A useful way to define partial orders in Rn, and thenly one to be further considered in this paper, is as fol-ows. Given a tuple s = (s1, . . . , sn), where si ∈ {1, −1}or every i, we say that x ≤s y if sixi ≤ siyi for every. For instance, the “cooperative order” is the orthantrder ≤s generated by s = (1, . . . , 1). This is the order
defined by x ≤ y if and only if xi ≤ yi for all i =, . . . , n. It is not difficult to verify if a system is coop-rative with respect to an orthant order; the followingemma, known as “Kamke’s condition,” is not hard torove, see Smith (1995) for details (also Angeli andontag, 2003 in the more general context of monotone
Uystems with input and output channels).emma 1. Consider an orthant order ≤s generated by= (s1, . . . , sn). A system (1) is monotone with respect
Please cite this article as: Bhaskar DasGupta et al., Algorithmic and cinto monotone subsystems, BioSystems (2006), doi:10.1016/j.bios
ED
PR
OO
F
s xxx (2006) xxx–xxx 5
to ≤s if and only if
sisj∂Fj
∂xi
≥ 0, i, j = 1, . . . , n, i �= j. (2)
To provide intuition, let us sketch the sufficiency partof the proof for the special case of the cooperativeorder. Suppose by contradiction that the system is notmonotone, and that therefore there is a pair of ini-tial conditions x0 ≤ y0 whose solutions x(t), y(t) ceaseto satisfy x(t) ≤ y(t) at some point. This implies thatat a certain critical moment in time t, there is somecoordinate i so that xi(t−) < yi(t−) but xi(t+) > yi(t+).(This argument is not entirely accurate, but it givesthe flavor of the proof.) Thus xi(t) = yi(t) for some iand the derivative with respect to time of xi is largerthan that of yi at time t, meaning that that Fi(x) >
Fi(y), where x = xi(t) and y = yi(t). However, thiscannot happen if Fi is increasing on all the variablesxj except possibly xi, so that x ≤ y, xi = yi impliesFi(x) ≤ Fi(y). An equivalent way to phrase this con-dition is by ask that ∂Fi/∂xj ≥ 0 at all states for everyi, j, i �= j, which is the Kamke condition for the specialcase of the cooperative order. The name of the orderarises because in a monotone system with respect to thatorder each species promotes or “cooperates” with eachother.
A rephrasing of this characterization of monotonicitywith respect to orthant orders can be given by looking atthe signed digraph G associated to (1). We define thevertex set V (G) and the edge set E(G) of G as fol-lows. Let V (G) = {1, . . . , n}, and given vertices i, j,let (i, j) ∈ E(G) and fE(i, j) = 1 if both ∂Fj/∂xi ≥ 0and the strict inequality holds at least at one state.Similarly let (i, j) ∈ E(G) and fE(i, j) = −1 if both∂Fj/∂xi ≤ 0 and the strict inequality holds at least at onestate. Finally, let (i, j) �∈ E(G) if ∂Fj/∂xi ≡ 0. Recallthat we are assuming that one of the three cases musthold.
Now we can define an orthant cone using any func-tion fV : V (G) → {−1, 1}, by letting x ≤fV y if andonly if fV (i)xi ≤ fV (i)yi for all i. Given fV , we definethe consistency function g : E(G) → {true, false} byg(i, j) = fV (i)fV (j)fE(i, j). Then, the following analogof Lemma 1 holds.
Lemma 2. Consider a system (1) and an orthant cone≤fV . Then (1) is monotone with respect to ≤fV if and
BIO 2594 1–18
only if g(i, j) ≡ 1 on E(G). 399
Proof. Let si = fV (i), i = 1, . . . , n. Note that 400
sisj∂fi/∂xj = 0 if (i, j) �∈ E(G). For (i, j) ∈ E(G), it 401
holds that sisj∂fi/∂xj ≥ 0 if and only if sisjfE(i, j) = 1, 402
omplexity results for decompositions of biological networksystems.2006.08.001
T
oSystem
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
NC
OR
RE
C
BIO 2594 1–18
6 B. DasGupta et al. / Bi
that is, if and only if g(i, j) = 1. The result follows fromLemma 1. �
For the next lemma, let the parity of a chain in G be theproduct of the signs (+1, −1) of its individual edges. Wewill consider in the next result closed undirected chains,that is, sequences xi1 , . . . , xir such that xi1 = xir , andsuch that for every λ = 1, . . . , r − 1 either (xiλ, xiλ+1 ) ∈E(G) or (xiλ+1 , xiλ ) ∈ E(G).
The following lemma (see DeAngelis et al., 1986 aswell as Smith, 1988, page 101) is analogous to the factfrom vector calculus that path integrals of a vector fieldare independent of the particular path of integration ifand only if there exists a potential function. Since theresult is key to the formulation of the problem beingconsidered, we provide a simple and self-contained proofin Appendix A.
Lemma 3. Consider a dynamical system (1) with asso-ciated directed graph G. Then (1) is monotone withrespect to some orthant order if and only if all closedundirected chains of G have parity 1.
2.1. Systems with inputs and outputs
As we discussed in the introduction, a usefulapproach to the analysis of biological networks consistsof decomposing a given system into an interconnectionof monotone subsystems. The formulation of the notionof interconnection requires subsystems to be endowedwith “input and output channels” through which infor-mation is to be exchanged. In order to address this weconsider controlled dynamical systems (Sontag, 1990)which are systems with an additional parameter u ∈ Rm
and which have the form
x = g(x, u). (3)
The values of u over time are specified by means ofa function t → u(t) ∈ Rm, t ≥ 0, called an input orcontrol. Thus each input defines a time-dependentdynamical system in the usual sense. To system (3)there is associated a feedback function h : Rn → R
m,which is usually used to create the closed loop systemx = g(x, h(x)). Finally, ifRn,Rm are ordered by orthantorders ≤fV , ≤q respectively, we say that the system ismonotone if it satisfies (2) for every u, and also
q f (j)∂gj ≥ 0, for every k, j (4)
Uk V∂uk(see also Angeli and Sontag, 2003.) As an example, letus consider the following biological model of testos-terone dynamics (Enciso and Sontag, 2004; Murray and
Please cite this article as: Bhaskar DasGupta et al., Algorithmic andinto monotone subsystems, BioSystems (2006), doi:10.1016/j.bios
ED
PR
OO
F
s xxx (2006) xxx–xxx
Mathematical Biology, 2002):
x1 = A
K + x3− b1x1, x2 = c1x1 − b2x2,
x3 = c2x2 − b3x3. (5)
Drawing the digraph of this system, it is easy to see thatit is not monotone with respect to any orthant order,as follows by application of Lemma 3. On the otherhand, replacing x3 in the first equation by u, we obtaina system that is monotone with respect to the orders≤(1,1,1), ≤(−1) for state and input respectively. Definingh(x) = x3, the closed loop system of this controlledsystem is none other than (5). The paper (Enciso andSontag, 2004) shows how, using this decompositiontogether with the “small gain theorem” from monotoneinput/output theory (Angeli and Sontag, 2003) leadsone to a proof that the system does not have oscillatorybehavior, even under arbitrary delays in the feedbackloop, contrary to the assertion made in Murray andMathematical Biology (2002).
We can carry out this procedure on an arbitrary sys-tem (1) with a directed graph G, as follows: given aset E of edges in G, enumerate the edges in EC as(i1, j1), . . . , (im, jm). For every k = 1, . . . , m, replaceall appearances of xik in the function Fjk
by the vari-able uk, to form the function g(x, u). Define h(x) =(xi1 , . . . , xim ). It is easy to see that this controlled system(3) has closed loop (1).
Note that the controlled system (3) generated by theset E as above has, as associated digraph, the sub-digraphof G generated by E. This is because for every k, one has∂gjk
(x, u)/∂xik ≡ 0, i.e., the edge from ik to jk has been“erased”.
Denote by G the underlying undirected graph of adirected graph G obtained by ignoring the directions ofthe edges. Given a set E ⊆ V (G) of vertices in a (directedor undirected) graph G, denote by G(E) the undirectedsubgraph of G generated by E. The edges of both G andG(E) are labeled with ±1 using the labels in the edgesof G, whenever appropriate. Let E be called consistent ifG(E) has no closed chains with parity −1. Note that thisis equivalent to the existence of fV such that g ≡ 1 on E,by Lemma 4 applied to the open loop system (3). If E isconsistent, then the associated system (3) itself can alsobe shown to be monotone: to verify condition (4), sim-ply define each qk so that (4) is satisfied for k, jk. Since∂g /∂u = ∂F /∂x �≡ 0, this choice is in fact unam-
BIO 2594 1–18
jk k jk ik
biguous. Conversely, if (3) is monotone with respect to 493
the orthant orders ≤fV , ≤q, then in particular it is mono- 494
tone for every fixed constant u, so that E is consistent by 495
Lemma 3. We thus have the following result. 496
complexity results for decompositions of biological networksystems.2006.08.001
CT
B
oSystem
L497
T498
c499
o500
3501
502
a503
t504
o505
i506
s507
‖508
p509
P510
i511
a512
t513
a514
f515
t516
e517
T518
c519
P520
a521
a522
s523
f524
D525
t526
t527
P528
s529
t530
s531
x532
A533
e534
3535
“536
|537
a538
|539
v540
F541
o542
a543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
NC
OR
RE
IO 2594 1–18
B. DasGupta et al. / Bi
emma 4. Let E be a set of edges of the digraph G.hen E is consistent if and only if the correspondingontrolled system (3) is monotone with respect to somerthant orders.
. Statement of problem
A natural problem is therefore the following. Givendynamical system (1) that admits a digraph G, use
he procedure above to decompose it as the closed loopf a monotone controlled system (3), while minimiz-ng the number ‖EC‖ of inputs. Equivalently, find fV
uch that P(E+) = ‖E+‖ is maximized and P(E−) =E−‖ = ‖EC+‖ minimized. This produces the followingroblem formulation.
roblem 1 (Undirected labeling problem (ULP)). Annstance of this problem is (G, h), where G = (V, E) isn undirected graph and h : E �→ {0, 1}. A valid solu-ion is a vertex labeling function f : V → {0, 1}. Definen edge {u, v} ∈ E to be consistent iff h(u, v) ≡ (f (u) +(v)) (mod 2). The objective is then to find a valid solu-
ion maximizing |F | where F is the set of consistentdges.
hat ULP is a correct formulation for our problem isonfirmed by the following easy equivalence.
roposition 1. Consider an instance (G, h) of ULP withn optimal solution having x consistent edges given byvertex labeling function f. Let D be a set of edges of
mallest cardinality that have to be removed such thator the remaining graph, that is the graph G′ = (V, E \
) with the same vertex set V but an edge set E \ D,here exists a vertex labeling function f ′ : V → {0, 1}hat makes every edge consistent. Then, x = |E| − |D|.roof. Since f produces a solution of ULP with x con-istent edges, exactly |E| − x edges are inconsistent,hus |D| ≤ |E| − x, that is, x ≤ |E| − |D|. Conversely,ince there is a solution with |E| − |D| consistent edges,≥ |E| − |D|. �
special case of ULP, namely when h(e) = 1 for all∈ E, is the MAX-CUT problem (defined in Section.1). Moreover, ULP can be posed as a special type ofconstraint satisfaction problem” as follows. We haveE| linear equations over GF (2), one equation per edgend each equation involving exactly two variables, overV | Boolean variables. The goal is to assign values to the
Uariables to satisfy the maximum number of equations.or algorithms and lower-bound results for general casesf these types of problems, such as when the equationsre over GF (p) for an arbitrary prime p > 2, when therePlease cite this article as: Bhaskar DasGupta et al., Algorithmic and cinto monotone subsystems, BioSystems (2006), doi:10.1016/j.bios
ED
PR
OO
F
s xxx (2006) xxx–xxx 7
are an arbitrary number of variables per equation or whenthe goal is to minimize the number of unsatisfied equa-tions, see references such as Amaldi and Kann (1996),Berman and Karpinski (2001), Creignou et al. (2001) andHastad and Venkatesh (2002) and the references therein.
Another interpretation (Sontag, in preparation) ofULP is in statistical mechanics terms. Let us label edgesby “±1” instead of {0, 1}, denoting by wuv = (−1)h(u,v)
the edge parities, now called “interaction energies.” Sim-ilarly, let us consider ±1-valued vertex labeling func-tions, now called (magnetic) “spin configurations,” σ :V → {−1, +1}, σ(v) = (−1)f (v). An edge {u, v} is con-sistent provided that wuvσuσj = 1. A graph with ±1weights is called an Ising spin-glass model in statisticalphysics. A “non-frustrated” spin-glass model is one forwhich there is a spin configuration for which every edgeis consistent (Barahona, 1982; Cipra, 2000; De Simoneet al., 1995; Istrail, 2000). This is the same as a consis-tent graph in our sense. Moreover, a spin configurationthat maximizes the number of consistent edges is one forwhich the “free energy” (with no exterior magnetic field):
−∑ij
wuvσuσv
is minimized, a “ground state”. (When h(e) = 1 orequivalently we = −1 for all edges, one has whatis called the “anti-ferromagnetic case”.) Thus, ourproblem amounts to finding ground states.
Given orthant orders ≤fV and ≤q for Rn and Rm
respectively, we say that a feedback function h is positiveif x ≤fV y implies h(x) ≤q h(y), and that it is negativeif x ≤fV y implies h(x) ≥q h(y). It can be shown thatthe closed loop of a monotone system with a positivefeedback function is actually itself monotone, so that nosystem can be produced in this way that was not mono-tone already. But if h is a negative feedback function, thenseveral results become available which use the methodsof monotone systems for systems that are not monotone,see Angeli and Sontag (2003), Enciso and Sontag (2004)and Enciso and Sontag (2006). For the following result,let (C, ⊆) be the class of consistent subsets of E(G),ordered under inclusion.
Proposition 2. Let E be a consistent set. Then E ismaximal in (C, ⊆) if and only if h is a negative feedbackfunction for every fV such that g ≡ 1 on E.
Proof. Suppose that E is maximal, and let fV besuch that g ≡ 1 on E. Given any edge (i , j ) ∈ EC, it
BIO 2594 1–18
k k
holds that g(ik, jk) = −1. Otherwise one could extend 589
E by adding (ik, jk), thus violating maximality. That 590
is, fV (ik)fV (jk)fE(ik, jk) = −1. By monotonicity, it 591
holds that qkfV (jk)∂gjk/∂uk ≥ 0, and since ∂gjk
/∂uk = 592
omplexity results for decompositions of biological networksystems.2006.08.001
T
oSystem
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
NC
OR
RE
C
BIO 2594 1–18
8 B. DasGupta et al. / Bi
∂Fjk/∂xik , it follows necessarily that
qkfV (jk)fE(ik, jk) = 1.
Therefore it must hold that qk = −fV (ik) for each k,which implies that h is a negative feedback function.
Conversely, if fV is such that g ≡ 1 on E and h is anegative feedback function, then qk = −fV (ik). By thesame argument as above, qkfV (jk)fE(ik, jk) = 1 for allk by monotonicity. Therefore g ≡ −1 on EC. Repeatingthis for all admissible fV , maximality follows. �There is a second, slightly more sophisticated way ofwriting a system (1) as the feedback loop of a system (3)using an arbitrary set of edges E. Given any such E,define S(Ec) = {i|there is some jsuch that (i, j) ∈ Ec}.Now enumerate S(Ec) as {i1, . . . , im}, and for each klabel the set {j|(ik, j) ∈ Ec} as jk1, jk2, . . .. Then foreach k, l, one can replace each appearance of xik inFjkl
by uk, to form the function g(x, u). Then one letsh(x) = (xi1 , . . . , xim ) as above. The closed loop of thissystem (3) is also (1) as before but with the advantage thatthere are |S(Ec)| inputs, and of course |S(Ec)| ≤ |Ec|.
If E is a consistent and maximal set, then one canmake (3) into a monotone system as follows. By let-ting fV be such that g ≡ 1 on E, we define the order≤fV on Rn. For every ik, jkl such that (ik, jkl) ∈ EC,it must hold that fV (ik)fV (jkl)fE(ik, jkl) = −1. Other-wise E ∪ {(ik, jkl)} would be consistent, thus violatingmaximality. By choosing qk = −fV (ik), Eq. (4) is there-fore satisfied. See the proof of Proposition 2. Conversely,if the system generated by E using this second algorithmis monotone with respect to orthant orders, and if h is anegative function, then it is easy to verify that E must beboth consistent and maximal.
Thus the problem of finding E consistent and suchthat P(E−) = ‖S(E−)‖ = ‖S(EC)‖ is smallest, whenrestricted to those sets that are maximal and consistent(this does not change the minimum ‖S(EC)‖), is equiv-alent to the following problem: decompose (1) into thenegative feedback loop of an orthant monotone controlsystem, using the second algorithm above, and using asfew inputs as possible. This produces the following prob-lem formulation.
Problem 2 (Directed labeling problem (DLP)). Aninstance of this problem is (G, h) where G = (V, E) isa directed graph and h : E → {0, 1}. A valid solutionis a vertex labeling function f : V → {0, 1}. Define anedge (u, v) ∈ E to be consistent iff h(u, v) ≡ (f (u) +
Uf (v)) (mod 2). The objective is then to find a validsolution minimizing |g(E − F )| where g(C) = {u ∈ V |∃y ∈ V, (u, y) ∈ C} for any C ⊆ E and F is the set ofconsistent edges.Please cite this article as: Bhaskar DasGupta et al., Algorithmic andinto monotone subsystems, BioSystems (2006), doi:10.1016/j.bios
ED
PR
OO
F
s xxx (2006) xxx–xxx
3.1. Summary of key concepts and results inapproximation algorithms
For anyγ ≥ 1 (resp.γ ≤ 1), aγ-approximate solution(or simply an γ-approximation) of a minimization (resp.,maximization) problem is a solution with an objectivevalue no larger than γ times (resp., no smaller thatγ times) the value of the optimum, and an algorithmachieving such a solution is said to have an approxima-tion ratio of γ .
In Papadimitriou and Yannakakis (1991) Papadim-itriou and Yannakakis defined the class of MAX-SNPoptimization problems and a special approximation-preserving reduction, the so-called L-reduction, that canbe used to show MAX-SNP-hardness of an optimizationproblem. The version of the L-reduction that we providebelow is a slightly modified but equivalent version thatappeared in Berman and Schnitger (1992).
Definition 1. Berman and Schnitger (1992),Papadimitriou and Yannakakis (1991) Given two opti-mization problems Π and Π ′, we say that Π L-reduces toΠ ′ if there are three polynomial-time procedures T1,T2,T3 and two constants a and b > 0 such that the followingtwo conditions are satisfied: (1) For any instance I of Π,algorithm T1 produces an instance I ′ = f (I) of Π ′ gen-erated from T1 such that the optima of I and I ′, OPT(I)and OPT(I ′), denoted by respectively, satisfy OPT(I ′) ≤a · OPT(I). (2) For any solution of I ′ with cost c′, algo-rithm T2 produces another solution with a cost c′′ noworse than c′, and algorithm T3 produces a solution ofI of Π with cost c (possibly from the solution producedby T2) satisfying |c − OPT(I)| ≤ b · ∣∣c′′ − OPT(I ′)
∣∣.An optimization problem is MAX-SNP-hard if any prob-lem in MAX-SNP L-reduces to that problem. The impor-tance of proving MAX-SNP-hardness results comesfrom a result proved by Arora et al. Arora et al. (1998)which shows that, assuming P �=NP, for every MAX-SNP-hard minimization (resp., maximization) problemthere exists a constant ε > 0 such that no polynomialtime algorithm can achieve an approximation ratio bet-ter than 1 + ε (resp., better than 1 − ε).
A special case of the ULP problem, namely whenh(e) = 1 for all e ∈ E, is the well-known MAX-CUTproblem. An instance of this problem is an undirectedgraph G = (V, E). A valid solution is a set S ⊆ V . Theobjective is to find a valid solution that maximizes thenumber of edges {u, v} ∈ E such that |{u, v} ∩ S| = 1.
BIO 2594 1–18
The MAX-CUT problem is known to be MAX-SNP- 689
hard. For further details on these topics, the reader is 690
referred to the excellent book by Vazirani (Vazirani, 691
2001). 692
complexity results for decompositions of biological networksystems.2006.08.001
CT
B
oSystem
693
f694
a695
S696
d697
w698
n699
G700
r701
4702
703
T704
(705
706
707
708
(709
710
711
712
713
(714
715
716
717
O718
a719
n720
2721
G722
R723
U724
c725
t726
t727
e728
i729
s730
a731
s732
1733
i734
s735
e736
i737
t738
t739
i740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
NC
OR
RE
IO 2594 1–18
B. DasGupta et al. / Bi
Some terminology The following notation will be usedor the remainder of the paper. Given a set S of vertices indirected graph G, define Eout(S) = {(u, v) ∈ E(G)|u ∈} as the set of out-bound edges of vertices in S. OPTP (I)enotes the size of an optimal solution for a problem Pith instance I. Recall that the length of a circuit c isormally defined as the number of edges in the circuit.iven a weight function w : E �→ R, the length of c with
espect to w is defined as∑
e∈c w(e).
. Theoretical results
Our theoretical results are summarized as follows.
heorem 1.
a) For some constant ε > 0, it is not possible to approx-imate in polynomial time the ULP and the DLPproblems to within an approximation ratio of 1 − ε
and 1 + ε, respectively, unless P = NP .b) For ULP, we provide a polynomial time α-
approximation algorithm where α ≈ 0.87856 is theapproximation factor for the MAX-CUT problemobtained in Goemans and Williamson (1995) viasemidefinite programming.
c) For DLP, if dmaxin denotes the maximum in-degree of
any vertex in the graph, then we give a polynomial-time approximation algorithm with an approxima-tion ratio of at most dmax
in · O(log |V |).ur computational results are illustrated in Section 6 by
n implementation of the algorithms applied to a 13-ode Drosophila segmentation network, as well as to a00+ node recently published network of the Epidermalrowth Factor Receptor pathway.
emark 1. It should be noted that the complexity ofLP becomes tractable if the network is biased signifi-
antly towards excitatory connections. Obviously, if allhe edges of the given graph G = (V, E) are labeled 0,hen it is possible to label the vertices such that all thedges are consistent. Moreover, given any graph G, its easy to check in O((|V | + |E|)3) time if an optimalolution contains all the edges as consistent by solvingset of linear equations via Gaussian elimination. Now,
uppose that at most L of the edges of G are labeled. Then, obviously at most L inconsistent edges existn any optimal solution. Thus a straightforward way toolve the problem is to consider all possible subsets ofdges in which at most L edges are dropped and check-
Ung, for each such subset, if there is an optimal solutionhat contains all the edges as consistent. The total timeaken is O(|V |2L. · (|V | + |E|)3), which is a polynomialn |V | + |E| if L is a constant.
Please cite this article as: Bhaskar DasGupta et al., Algorithmic and cinto monotone subsystems, BioSystems (2006), doi:10.1016/j.bios
ED
PR
OO
F
s xxx (2006) xxx–xxx 9
5. Proof of Theorem 1
This section provides the proof of Theorem 1, brokenup into a series of technical parts.
5.1. Proof of Theorem 1(a)
Based on the discussion in Section 3.1, it sufficesto show that both these problems are MAX-SNP-hard.ULP is MAX-SNP-hard since its special case, the MAX-CUT problem, is MAX-SNP-hard. To prove MAX-SNP-hardness of DLP, we need the definitions of the followingtwo problems.
Problem 3 (Node deletion problem with bipartite prop-erty (NDBP)). An instance of this problem is an undi-rected graph G = (V, E). A valid solution is a vertexset S ⊆ V , such that G(V − S) is a bipartite graph. Theobjective is to find a valid solution minimizing |S|.Problem 4 (Variance of node deletion problem(VNDP)). An instance of this problem is (G, h) whereG = (V, E) is a directed graph and h : E → {0, 1}. Avalid solutions is a vertex set S ⊆ V with the followingproperty: if GS = (VS, ES) is the graph with VS = V
and ES = E − Eout(S), then GS is free of odd lengthcircuit with respect to weight function h. The objectiveis to find a valid solution minimizing |S|.First, we note that DLP is equivalent to VNDP. If oneidentifies the solution set S in UNDP with the solutionset g(E − F ) in DLP, then the set of consistent edges Fin DLP corresponds to the ES in UNDP since every edge(u, v) ∈ F satisfying h(u, v) ≡ (f (u) + f (v)) (mod 2) isequivalent to stating that GS is free of odd length circuitwith respect to weight function h.
Thus, to prove the MAX-SNP-hardness of DLP itsuffices to prove that of VNDP. NDBP is known to beMAX-SNP-hard (Lund and Yannakakis, 1993). We pro-vide a L-reduction from NDBP to VNDP. For an instanceof VNDP with graph G = (V, E), construct an instanceof DLP with instance (G′, h) as follows (note that G′ isa digraph):
V ′ = V (G′) = V ∪ {Au,v, Bu,v|{u, v} ∈ E},E′ = E(G′)
= {(u, Au,v), (Au,v, Bu,v), (v, Bu,v)|{u, v} ∈ E},and h(e) = 1 for all e ∈ E′ Now, the following
BIO 2594 1–18
holds: 782
(1) If S is a solution to NDBP, it is also a solution 783
to the generated instance of UNDP. The reason 784
omplexity results for decompositions of biological networksystems.2006.08.001
T
oSystems xxx (2006) xxx–xxx
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
Select a uniformly random vector r in the|V |-dimensional unit sphere and set{
0 if r · xv ≥ 0
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
CO
RR
EC
BIO 2594 1–18
10 B. DasGupta et al. / Bi
is as follows. Notice that every odd length (resp.,even length) circuit C in G corresponds to an oddlength (resp., even length) circuit C′ in G′ withrespect to the weight function h. Since G(V − S)is a bipartite graph, it is free of odd length circuits.So for each odd length cycle C of G, there existsu ∈ S such that the deletion of all out-bound edgesof u in G′ breaks its corresponding odd length cycleC′.
(2) If S′ is a solution to UNDP, then we can constructa solution S of NDBP in the following manner: foreach x ∈ S′:
if x = Au,v, add u to T ; if x = Bu,v, add v to T ;
if x = u or x = v, add xto T.
It is now easy to see that since the graph GS′ is free ofodd length circuit with respect to h, G(V − S) has noodd length circuit either.
Hence, we have OPTUNDP(G′, h) ≤ OPTNDBP(G).Moreover, given a solution S′ of UNDP, we are ableto generate a solution S of NDBP such that
||S| − OPTNDBP(G)| ≤ ||S′| − OPTUNDP(G′, h)|.Thus, our reduction satisfies Definition 1 of a L-reduction with a = b = 1.
5.2. Proof of Theorem 1(b)
Our algorithm for ULP uses the semidefinite pro-gramming (SDP) technique used by Goemans andWilliamson in Goemans and Williamson (1995); hencewe use notations and terminologies similar to that usedin the paper (readers not very familiar with this tech-nique are also referred to the excellent explanation ofthis technique in the book by Vazirani Vazirani (2001)).For each vertex v ∈ V , we have a real vector xv ∈ R|V |with ||xv||2 = 1. Then, we can generate from ULP thefollowing vector program (where · denotes the vectorinner product):
Solve the following vector program via SDPmethods:
UNmaximize
1
2
∑h(u,v)=1
(1−xu · xv)+1
2
∑h(u,v)=0
(1+xu · xv)
subject to : for each v ∈ V : xv · xv = 1for each v ∈ V
: xv ∈ R|V |.
850
Please cite this article as: Bhaskar DasGupta et al., Algorithmic andinto monotone subsystems, BioSystems (2006), doi:10.1016/j.bios
ED
PR
OO
F
f (v) =1 otherwise
This proof of the claimed approximation performanceof the above vector program is obtained by adapting theproof in Section 26.5 of Vazirani (2001) for the MAX-2SAT problem to deal with fact that, in our problem,aij = bij = 1/2 as opposed to a different set of values inVazirani (2001). Since there are some subtleties in adapt-ing that proof for readers unfamiliar with this approach,we provide a sketch of the proof in Appendix A. The pro-cedure can be derandomized via methods of conditionalprobabilities (e.g., see Mahajan and Ramesh (1995)).
5.3. Proof of Theorem 1(c)
For an instance of (G, h) of DLP, construct instance(G′ = (V ′, E′), h′) as follows:
V ′ = V ∪ {Cu,v|(u, v) ∈ E & h(u, v) = 0},E′ = {e|e ∈ E & h(e) = 1} ∪ {(u, Cu,v),
× (Cu,v, v)|(u, v) ∈ E & h(u, v) = 0},and
h′(e) = 1for alle ∈ E′.
Note that every odd (resp., even) length circuit in G withrespect to weight function h corresponds to an odd (resp.,even) length circuit in G′ with respect to weight functionh′, and vice versa. Let F is a set of consistent edges in(G, h) with a vertex labeling function f. Now, observethe following:
(1) F ′ is a set of consistent edges in (G′, h′) with avertex labeling function f ′ with f ′(x) = f (x) forx ∈ V ′ ∩ V and f ′(Cu,v) = f (u) = f (v) for an edge(u, v) ∈ F with h(u, v) = 0; thus, an edge (u, v) inF correspond to an edge (u, v) in F ′ if h(u, v) = 1and correspond to a pair of edges (u, Cu,v), (Cu,v, v)in F ′ if h(u, v) = 0.
(2) If (u, v) ∈ E − F is an inconsistent edge in (G, h),′
BIO 2594 1–18
then the edge (Cu,v, v) in G can always be made 851
consistent by choosing f ′(Cu,v) = f (v). 852
Thus, if F ′′ is the set of consistent edges obtained from F 853
following rules (1) and (2) above, then |g(E′ − F ′′)| = 854
complexity results for decompositions of biological networksystems.2006.08.001
UN
CO
RR
EC
T
BIO 2594 1–18
B. DasGupta et al. / BioSystem
|g(E − F )| and thus OPTDLP(G′, h′) = OPTDLP(G, h).855
Consider the NDBP problem on G′. Any solution to DLP856
on (G′, h′) with vertex labeling function f ′ and set of857
consistent edges F ′ cannot contain an odd cycle of con-858
sistent edges and thus provides a solution to NDBP on859
G′ of size |g(E′ − F ′)|. Thus,860
OPTNDBP(G′) ≤ OPTDLP(G′, h′) = OPTDLP(G, h).861
OPTNDBP(G′) can be approximated in polynomial time862
to within an approximation ratio of O(log |V ′|) (Lund863
and Yannakakis, 1993), i.e., we can find a solution864
SNDBP(G′) in polynomial time such that865
|SNDBP(G′)| ≤ O(log |V ′|) · OPTNDBP(G′)866
≤ O(log |V |) · OPTDLP(G, h).867
Now,868
SDLP(G, h) = SNDBP(G′)869
× ∪ {u | ∃v ∈ SNDBP(G′), (u, v) ∈ E},870
is obviously a solution to DLP on (G, h). Recall that871
dmaxin denotes the maximum in-degree of any vertex in872
G. Thus,873
|SDLP(G, h)| ≤ dmaxin · |SNDBP(G′)|874
≤ dmaxin · O(log |V |) · OPTDLP(G, h).875
876
6. Examples of applications of the ULP877
algorithm878
We have implemented the SDP-based algorithm for879
calculating approximate solutions of the undirected880
labeling problem using Matlab, and we illustrate this881
Fig. 5. The network associated to the Drosophila segment polarity, as proposethree edges that have been crossed have been chosen in order to let the remain
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
Please cite this article as: Bhaskar DasGupta et al., Algorithmic and cinto monotone subsystems, BioSystems (2006), doi:10.1016/j.bios
ED
PR
OO
F
s xxx (2006) xxx–xxx 11
algorithm with two applications to biological systems.The first application concerns the relatively small-scale13-variable digraph of a model of the Drosophila seg-ment polarity network. A second application involves adigraph with 300+ variables associated to the humanEpidermal Growth Factor Receptor (EGFR) signalingnetwork. This model was published recently and builtusing information from 242 published papers. Finally,we provide an example involving a yeast gene regula-tory network.
6.1. Drosophila segment polarity
An important part of the development of the earlyDrosophila (fruit fly) embryo is the differentiation ofcells into several stripes (or segments), each of whicheventually gives rise to an identifiable part of the bodysuch as the head, the wings, the abdomen, etc. Each seg-ment then differentiates into a posterior and an anteriorpart, in which case the segment is said to be polarized.(This differentiation process continues up to the pointwhen all identifiable tissues of the fruit fly have devel-oped.) Differentiation at this level starts with differingconcentrations of certain key proteins in the cells; theseproteins form striped patterns by reacting with each otherand by diffusion through the cell membranes.
A model for the network that is responsible for seg-ment polarity (von Dassow et al., 2000) is illustratedin Fig. 5. As explained above, this model is best stud-ied when multiple cells are present interacting with each
BIO 2594 1–18
d in von Dassow et al. (2000), Courtesy of N. Ingolia and PLoS. Theing edges form an orthant monotone system.
other. But it is interesting at the one-cell level in its own 910
right—and difficult enough to study that analytic tools 911
seem mostly unavailable. The arrows with a blunt end 912
are interpreted as having a negative sign in our notation.
omplexity results for decompositions of biological networksystems.2006.08.001
TED
PR
OO
F
oSystems xxx (2006) xxx–xxx
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
ics of the system. This is especially the case given that 961
the open loop digraph has almost no closed oriented 962
paths (except for WG ↔ wg), which is evidence that 963
the dynamics of the control system under constant inputs 964
may be especially simple, e.g. such that all solutions con- 965
verge towards a unique equilibrium. 966
6.1.1. Multiple copies 967
It was mentioned above that the purpose of this 968
network is to create striped patterns of protein con- 969
centrations along multiple cells. In this sense, it is 970
most meaningful to consider a coupled collection 971
of networks as it is given originally in Figs. 6 and 5. 972
Consider a row of k cells, each of which has independent 973
concentration variables for each of the compounds, and 974
let the cell-to-cell interactions be as in Fig. 5 with cyclic 975
boundary conditions (that is, the kth cell is coupled 976
with the first in the natural way). We show that the 977
results can be extended in a very similar manner as 978
before. 979
Given a partition fV of the one-cell network consid- 980
ered above, let fV be the partition of the k-cell network 981
defined by fV (eni) := fV (en) for every i, etc. Thus fV 982
consists of k copies of the partition fV in a natural way. 983
Lemma 6. Let fV be a partition of the nodes of the 1- 984
cell network with n consistent edges. Then with respect 985
Fig. 6. A diagram of the Drosophila embryo during early development.Each hexagon represents a cell containing a copy of the network in Fig.
NCO
RR
EC
BIO 2594 1–18
12 B. DasGupta et al. / Bi
Furthermore, the concentrations of the membrane-boundand inter-cell traveling compounds PTC, PH, HH andWG (membrane) on all cells have been identified inthe one-cell model (so that, say, HH→ PH is now inthe digraph). Finally, PTC acts on the reaction CI→CN itself by promoting it without being itself affected,which in our notation means PTC→+ CN and PTC→−CI.
The implementation. The Matlab implementation ofthe algorithm on this digraph with 13 nodes and 20 edgesproduced several partitions with as many as 17 consistentedges. One of these possible partitions simply consistsof placing the three nodes ci, CI and CN in one set andall other nodes in the other set, whereby the only incon-sistent edges are CL→+ wg, CL→+ ptc, and PTC→+CN. But note that it is desirable for the resulting openloop system to have as simple remaining loops as possi-ble after eliminating all inconsistent edges. In this case,the remaining directed loops
EN−→ ci
+→ CI+→ CN
−→ en+→ EN
EN−→ ci
+→ CI+→ CN
−→ wg+→
WG+→ WG (membrane)
+→ en+→ EN
can still cause difficulties.A second partition which generated 17 consistent
edges is that in which EN, hh, CN, and the membranecompounds PTC, PH, HH are on one set, and the remain-ing compounds on the other. The edges cut are ptc→+PTC, CI→+ CN and en→+ EN, each of which elim-inates one or several positive loops. By writing theremaining consistent digraph in the form of a cascade, itis easy to see that the only loop whatsoever remaining iswg ↔ WG; this makes the analysis proposed in Encisoand Sontag (2006) easier.
In this relatively low dimensional case we can provethat in fact OPT = 17, as the results below will show.
Lemma 5. Any partition of the nodes in the digraph inFig. 5 generates at most 17 consistent edges.
Proof. From Lemma 3, a simple way to prove this state-ment is by showing that there are three disjoint cycleswith odd weighted length in the network associated toFig. 5 (disjoint in the sense that no edge is part of morethan one of the cycles). Such three disjoint cycles existin this case, and they are CI-CN-wg, CI-ptc-PTC, CN-en-EN-hh-HH-PH-PTC. �
U
BIO 2594 1–18
It is surprising that a realistic biological system with asmany as 13 variables and 20 edges can be transformedinto a monotone system after the deletion of only 3 nodes.It is conceivable that this restricts the possible dynam-
6, and neighboring cells interact to form a collective behavior. In thisexample, an initial striped pattern of the genes en and wg induces theproduction of the gene hh, but only in those cells that are producing en.This will further strengthen the pattern of stripes and help differentiatethe various tissues. Courtesy of N. Ingolia and PLoS (Ingolia, 2004).
Please cite this article as: Bhaskar DasGupta et al., Algorithmic and complexity results for decompositions of biological networksinto monotone subsystems, BioSystems (2006), doi:10.1016/j.biosystems.2006.08.001
UN
CO
RR
EC
T
BIO 2594 1–18
B. DasGupta et al. / BioSystem
to the partition fV , there are exactly kn consistent edges986
for the k-cell coupled model.987
Proof. Consider the network consisting of k isolated988
copies of the network, that is, k groups of nodes each of989
which is connected exactly as in the one-cell case. Under990
the partition fV , this network has exactly kn consistent991
edges. To arrive to the coupled network, it is sufficient to992
replace all edges of the form (HHi, PHi) by (HHi+1, PHi)993
and (WGi, eni) by (WGi+1, eni), i = 1, . . . , k (where we994
identify k + 1 with 1). Since by definition fV (HHi+1) =995
fV (HHi) and fV (WGi+1) = fV (WGi), the consistency996
of these edges does not change, and the number of con-997
sistent edges therefore remains constant. �998
In particular, OPT≥ 17k for the coupled system. The999
following result will establish an upper bound for OPT.1000
Lemma 7. Any partition of the nodes in the digraph in1001
the k-cell coupled network generates at most 17k con-1002
sistent edges.1003
Proof. Consider the signed graph in Fig. 7, which is a1004
sub-digraph of the network associated to Fig. 5. Since1005
the inter-cell edges (WGmem,en) and (HH,PH) are not1006
in this graph, it follows that there are k identical copies1007
of it in the k-cell model. If it is shown that at least three1008
edges need to be cut in each of these k sub-digraphs, the1009
result follows immediately.1010
Consider the negative cycle ci-CI-wg-CN-en-EN,1011
which must contain at least one inconsistent edge for1012
Fig. 7. A sub-digraph of the network in Fig. 5, using the notationdefined in the previous sections. Note that this sub-digraph does notinclude any of the two edges (WGmem,en) and (HH,PH), which con-nect the networks of different cells in Fig. 5; this will be important inthe proof of Lemma 7.
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
Please cite this article as: Bhaskar DasGupta et al., Algorithmic and cinto monotone subsystems, BioSystems (2006), doi:10.1016/j.bios
ED
PR
OO
F
s xxx (2006) xxx–xxx 13
any given partition. The remaining edges of the subgraphform a tetrahedron with four negative parity triangles,which cannot all be cut by eliminating any single edge.If follows that no two edges can eliminate all negativeparity cycles in this signed graph, and that therefore20k − 3k = 17k is an upper bound for the number ofconsistent edges in the k-cell network.
Corollary 1. For the k-cell linearly coupled networkdescribed in Fig. 5, it holds OPT = 17k.
Proof. Follows from the previous two results. �
6.2. EGFR signaling
The protein called epidermal growth factor is fre-quently stored in epithelial tissues such as skin, and it isreleased when rapid cell division is needed (for instance,it is mechanically triggered after an injury). Its functionis to bind to a receptor on the membrane of the cells, aptlycalled the epidermal growth factor receptor. The EGFR,on the inner side of the membrane, has the appearance ofa scaffold with dozens of docks to bind with numerousagents, and it starts a reaction of vast proportions at thecell level that ultimately induces cell division.
In their May 2005 paper (Oda et al., 2005), Odaet al. integrate the information that has become avail-able about this process from multiple sources, and theydefine a network with 330 known molecules under211 chemical reactions. The network itself is availablefrom supplementary material in SBML format (SystemsBiology Markup Language, http://www.sbml.org), andwill most likely be subject to continuous updates. Theimplementation. Each reaction in the network classifiesthe molecules as reactants, products, and/or modifiers(enzymes). This information was imported into Matlabusing the Systems Biology Toolbox. The digraph G thatis used for this analysis has many more edges than thedigraph considered in the digraph displayed in Oda et al.(2005). The reason for this is as follows: if molecules Aand B are both reactants in the same reaction, then thepresence of A will have an indirect inhibiting effect on theconcentration of B, since it will accelerate the reactionwhich consumes B (assuming B is not also a product).Therefore a negative edge must also appear from A to B,and vice versa. Similarly, modifiers have an inhibitingeffect on reactants.
We thus define G by letting sign(i, j) = 1 if thereexists a reaction in which j is a product and i is either
BIO 2594 1–18
a reactant or a modifier. We let sign(i, j) = −1 if there 1058
exists a reaction in which j is a reactant, and i is also 1059
either a reactant or a modifier. Similarly sign(i, j) = 0 1060
if the nodes i, j are not simultaneously involved in any 1061
omplexity results for decompositions of biological networksystems.2006.08.001
T
oSystem
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
NC
OR
RE
C
BIO 2594 1–18
14 B. DasGupta et al. / Bi
given reaction, and sign(i, j) is undefined (NaN) if thefirst two conditions above are both satisfied.
In a few of the reactions of this network there is amodifier or a reactant involved which has an inhibitoryeffect in the reaction. The effect of this compound onthe remaining participants of the reaction is the oppositefrom that described above. Determining which com-pounds were inhibitors in the reaction was difficult giventhe nature of this dataset. Therefore the digraph was cor-rected by hand in this implementation by looking at theannotations given for each reaction.
An undefined edge can be thought of as an edge that isboth positive and negative, and it can be dealt with, givenan arbitrary partition, by deleting exactly one of the twosigned edges so that the remaining edge is consistent.Thus, in practice, one can consider undefined edges asedges with sign 0, and simply add the number of unde-fined edges to the number of inconsistent edges in theend of each procedure, in order to form the total numberof inputs. This is the approach followed here; there areexactly seven such entries in the digraph G.
The results. After running the algorithm several hun-dred times for this problem, and choosing that partitionwhich produced the highest number of consistent edges,the induced consistent set contained 636 out of 855 edges(ignoring the edges on the diagonal and the 7 undefinededges). See supplementary material for the relevant Mat-lab functions that carry out this algorithm. A procedureanalogous to that carried out for system (5) allows todecompose the system as the feedback loop of a con-trolled monotone system using 855 − 636 = 219 inputs.Since the induced consistent set is maximal by definition,Proposition 2 guarantees that the function h is a negativefeedback.
Contrary to the previous application, many of thereactions involve several reactants and products in a sin-gle reaction. This induces a denser amount of negativeand positive edges: even though there are 211 reactions,there are 855 (directed) edges in the 330 × 330 graph G.It is very likely that this substantially decreases OPT forthis system.
The approximation ratio of the SDP algorithm is guar-anteed to be at least 0.87 for some r, which gives theestimate OPT≤≈ 636/0.87 ≈ 731 (valid to the extentthat r has sampled the right areas of the 330-dimensionalsphere, but reasonably accurate in practice).
One procedure that can be carried out to lower thenumber of inputs is a hybrid algorithm involving out-
Uhubs, that is, nodes with an abnormally high out-degree.Recall from the description of the DLP algorithm that allthe out-edges of a node xi can be potentially cut at theexpense of only one input u, by replacing all the appear-Please cite this article as: Bhaskar DasGupta et al., Algorithmic andinto monotone subsystems, BioSystems (2006), doi:10.1016/j.bios
ED
PR
OO
F
s xxx (2006) xxx–xxx
ances of xi in fj(x), j �= i, by u. We considered the knodes with the highest out-degrees, and eliminated allthe out-edges associated to these hubs from the reactiondigraph to form the graph G1. Then we run the ULPalgorithm on G1 to find a partition fV of the nodes anda set of m edges that can be cut to eliminate all remain-ing negative closed chains. Finally, we put back on thedigraph those edges that were taken in the first step, andwhich are consistent with respect to the partition fV . Theresult is a decomposition of the system as the negativefeedback loop of a controlled monotone system, usingat most k + m edges.
An implementation of this algorithm with k = 60yielded a total maximum number of inputs k + m = 136.This is a significant improvement over the 226 inputsin the original algorithm. Clearly, it would be worth-while to investigate further the problem of designingefficient algorithms for the DLP problem to generateimproved hybrid algorithmic approaches. The approx-imation ratios in Theorem 1(c) are not very satisfactorysince dmax
in and log |V | could be large factors; hencefuture research work may be carried out in designingbetter approximation algorithms.
We conclude with another, more tentative way to dras-tically reduce the number of inputs necessary to writethis system as the negative closed loop of a controlledmonotone system. The idea is to make suitable changesof variables in the original system using the mass conser-vation laws. Such changes of variables are discussed inmany places, for example in Volpert et al. (2000), Angeliand Sontag (2003). In terms of the associated digraph,the result of the change of variables is often the elimina-tion of one of the closed chains. The simplest target fora suitable change of variables is a set of three nodes thatform part of the same chemical reaction, for instance tworeactants and one product, or one reactant, one productand one modifier. It is easy to see that such nodes areconnected in the associated digraph by an odd lengthtriangle of three edges.
In order to estimate the number of inputs that canpotentially be eliminated by suitable changes of vari-ables, we counted pairwise disjoint, odd length trianglesin the digraph of the EGFR network. Using a greedy algo-rithm to find and tag disjoint negative feedback triangles,we found a maximal number of them in the subgraphassociated to each of the 211 chemical reactions. Specialcare was taken so that any two triangles from differentreactions were themselves disjoint. After carrying out
BIO 2594 1–18
this procedure we found 196 such triangles in the EGFR 1162
network. This is a surprisingly high number, considering 1163
that each of these triangles must have been opened in the 1164
ULP algorithm implementation above and that therefore 1165
complexity results for decompositions of biological networksystems.2006.08.001
CT
B
oSystem
e1166
t1167
i1168
m1169
v1170
s1171
61172
1173
S1174
M1175
n1176
n1177
t1178
i1179
e1180
1181
a1182
f1183
o1184
1185
g1186
o1187
c1188
e1189
t1190
1191
m1192
r1193
s1194
a1195
a1196
i1197
d1198
11199
g1200
1201
l1202
t1203
t1204
e1205
t1206
h1207
a1208
n1209
i1210
d1211
o1212
d1213
11214
11215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
• 1226
1227
• 1228
1229
• 1230
1231
1232
1233
1234
1235
1236
1237
1238
this provides a solution for the vector program with an 1239
objective value of precisely |FOPT|. Thus, it suffices if 1240
we prove our claim on the approximation ratio relative 1241
to SDPOPT. 1242
Next, note that the vector program can indeed be 1243
solved by a SDP approach. Let Y ∈ R|V |×|V | be an 1244
NC
OR
RE
IO 2594 1–18
B. DasGupta et al. / Bi
ach triangle must contain 1 of the 226 edges cut. Tohe extent to which most of these triangles can be elim-nated by suitable changes of variables, this can yield a
uch lower number of edges to cut, and it could pro-ide a way to thus stress the underlying structure of theystem.
.3. A yeast regulatory network
As a final example, we run our algorithm on the yeastaccharomyces cerevisiae gene regulatory network fromilo et al. (2002), downloaded from Anon (2006). This
etwork has 690 nodes and 1082 edges, of which 221 areegative and 861 are positive (we labeled the one “neu-ral” edge as positive; the conclusions will not changef we labeled it negative instead, or we deleted this onedge).
Our algorithm (with 200 randomizations) providesn answer of 43 inconsistent edges, for the best partitionound. In other words, it shows that deleting a mere 4%f edges makes the network consistent.
Also interesting is the following fact. The originalraph has 11 components: a large one of size 664, onef size 5, three of size 3, and six of size 2. All of theseomponents remain connected after edge deletion. Thedges deleted all belong to the largest component, andhey are incident on a total of 65 nodes in this component.
To better appreciate if this small number of deletionsight arise by chance, we also run our algorithm on
andom graphs having 690 nodes and 1082 edges (cho-en uniformly), of which 221 edges (chosen uniformly)re negative. We found that, for such random graphs,bout 12.6% (136.6 ± 5) of edges have to be removedn order to achieve consistency. Thus, the number ofeletions needed in the biological network is roughly5 standard deviations away from the mean for randomraphs.
It would appear that both the topology (i.e., the under-ying graph) and the actual sign assignments contributeo this near-consistency of the yeast network. To jus-ify this remark, we performed the following numericalxperiment. We randomly changed the signs of 50 posi-ive and 50 negative edges, thus obtaining a network thatas the same number of positive and negative edges,nd the same underlying graph, as the original yeastetwork, but with 100 edges, picked randomly, hav-ng different signs. Now, one needs 8.2% (88.3 ± 7.1)eletions, an amount in-between that obtained for the
Uriginal yeast network and the one obtained for ran-om graphs. Changing more signs, 100 positives and00 negatives, leads to a less consistent network, with15.4 ± 4.0 required deletions, or about 10.7% of thePlease cite this article as: Bhaskar DasGupta et al., Algorithmic and cinto monotone subsystems, BioSystems (2006), doi:10.1016/j.bios
ED
PR
OO
F
s xxx (2006) xxx–xxx 15
original edges, although still not as many as for a randomnetwork.
Appendix A. More details on SDP algorithm
In this appendix, we provide details regarding theproof of the SDP algorithm for Theorem 1(b) describedin Section 5.2. The proof method is similar to that usedin better-known problems. For simplicity, we do notdescribe the derandomization methods and provide aproof for the expected approximation ratio only. Definethe following notations for convenience:
The vertex set V of the graph for ULP is simply{1, 2, . . . , |V |};fOPT is an optimal vertex labeling for ULP with FOPTbeing the set of consistent edges;SDPOPT is the maximum value of the objective valueof the vector program
maximize1
2
∑h(u,v)=1
(1 − xu · xv) + 1
2
∑h(u, v)
= 0(1 + xu · xv)
subject to : for eachv ∈ V : xv · xv = 1
for eachv ∈ V : xv ∈ R|V |
It is easy to see that SDPOPT ≥ |FOPT| as follows. Forevery v ∈ V if fOPT(v) = 0 then set
xv = (1, 0, 0, . . . , 0︸ ︷︷ ︸|V |−1|
),
whereas if fOPT(v) = 1 then set
xv = (−1, 0, 0, . . . , 0︸ ︷︷ ︸|V |−1|
);
BIO 2594 1–18
unknown real matrix with yi,j denoting the (i, j)th ele- 1245
ment of Y. It is not difficult to see (via Cholesky decom- 1246
position for real symmetric matrices) that the above vec- 1247
tor program is equivalent to the followingsemidefinite
omplexity results for decompositions of biological networksystems.2006.08.001
T
oSystem
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
OR
RE
C
BIO 2594 1–18
16 B. DasGupta et al. / Bi
programming problem:
maximize1
2
∑h(u,v)=1
(1 − yu,v) + 1
2
∑h(u,v)=0
(1 + yu,v)
subject to : for each v ∈ V : yv,v = 1
Y is a positive semidefinite matrix
Such a problem can be solved in polynomial timewithin an additive error of any constant ε > 0 via ellip-soid, interior-point or convex-programming methods(Alizadeh, 1995; Grotschel et al., 1988; Nesterov andNemirovskii, 1989, 1994; Vaidya, 1989).
Let θu,v denote the angle between the two vectorsxu, xv ∈ R|V | in an optimal solution of the vector pro-gram. Then, using standard trigonometric results,
SDPOPT = 1
2
∑h(u,v)=1
(1 − cos θu,v)
+ 1
2
∑h(u,v)=0
(1 + cos θu,v). (A.1)
Let W be the expected value of the number of consistentedges of ULP after we have performed the randomizedrounding step, namely the step:
select a uniformly random vector r in the |V |-dimensional unit sphere;
setf (v) ={
0 ifr · xv ≥ 0
1 otherwise
Then, via linearity of expectation, it follows that
E[W] =∑
h(u,v)=1
Pr[f (u) �= f (v)]
+∑
h(u,v)=0
Pr[f (u) = f (v)]. (A.2)
Because the vector r was chosen randomly, it is true that
Pr[f (u) �= f (v)] = θu,v
π
and Pr[f (u) = f (v)] = 1 − θu,v
π. (A.3)
UN
CThus,
E[W] =∑
h(u,v)=1
θu,v
π+
∑h(u,v)=0
(1 − θu,v
π
)
≥ ∆ ·⎡⎣1
2
∑h(u,v)=1
(1 − cos θu,v) + 1
2
∑h(u,v)=0
(1 + cos θ
Please cite this article as: Bhaskar DasGupta et al., Algorithmic andinto monotone subsystems, BioSystems (2006), doi:10.1016/j.bios
ED
PR
OO
F
s xxx (2006) xxx–xxx
where
∆ = min
{2
πmin
0≤θ≤π
θ
1 − cos θ, min
0≤θ≤π
2 − 2θπ
1 + cos θ
}can be shown to satisfy ∆ > 0.87856 using elementarycalculus.
A.1. Proof of lemma 3
Proof. Suppose that the system is monotone withrespect to ≤fV , that is,
fV (i)fV (j)fE(i, j) = 1 for all i, j, i �= j.
(by Lemma 2). Let V (G) = A ∪ B, where i ∈ A iffV (i) = 1, and i ∈ B otherwise. Note that by hypoth-esis fE(i, j) = 1 if xi, xj ∈ A or if xi, xj ∈ B. Also,fE(i, j) = −1 if xi ∈ A, xj ∈ B or vice versa. Notingthat every closed chain in G must cross an even numberof times between A and B, it follows that every closedchain has parity 1.
Conversely, let all closed chains in G have parity 1.We define a function fV as follows: consider the par-tition of V (G) induced by letting i∼j if there existsan undirected open chain joining i and j. Pick a rep-resentative ik of every equivalence class, and definefV (ik) = 1, k = 1, . . . , K. Next, given an arbitrary ver-tex i and the representative ik of its connected com-ponent, define fV (i) as the parity (+1 of −1) of anyundirected open chain joining ik with i. To see thatthis function is well defined, note that any two chainsjoining i and j can be put together into a closedchain from ik to itself, which has parity 1 by hypoth-esis. Thus the parity of both open chains must be thesame.
Let now i, j be arbitrary different vertices. If∂Fj/∂xi ≡ 0, then (2) is satisfied for i, j; otherwisethere is an edge joining i with j. By construction ofthe “potential” function fV , it holds that if fV (i) =fV (j) then fE(i, j) = 1, i.e., ∂Fj/∂xi ≥ 0, and so (2)holds as well. If fV (i) �= fV (j), then fE(i, j) = −1, i.e.∂Fj/∂xi ≤ 0. In that case (2) also holds, and the proof iscomplete. �
BIO 2594 1–18
u,v)
⎤⎦ = ∆ · SDPOPT (A.4)
complexity results for decompositions of biological networksystems.2006.08.001
CTE
D P
RO
OF
B
oSystems xxx (2006) xxx–xxx 17
A1312
1313
c1314
j1315
R1316
A1317
1318
1319
A1320
1321
1322
1323
A1324
1325
1326
1327
A1328
1329
1330
A1331
1332
A1333
1334
A1335
1336
1337
1338
A1339
1340
1341
B1342
1343
B1344
1345
1346
1347
B1348
1349
C1350
1351
1352
C1353
1354
C1355
1356
1357
1358
D1359
1360
1361
D1362
1363
1364
D1365
1366
1367
1368
von Dassow, G., Meir, E., Munro, E.M., Odell, G.M., 2000. The seg- 1369
ment polarity network is a robust developmental module. Nature 1370
406, 188–192. 1371
DeAngelis, D.L., Post, W.M., Travis, C.C., 1986. Positive Feedback in 1372
Natural Systems. Springer-Verlag, New York. 1373
Enciso, G.A., Smith, H.L., Sontag, E.D., in press. Non-monotone 1374
systems decomposable into monotone systems with negative feed- 1375
back, J. Diff. Eq. 1376
Enciso, G., Sontag, E., 2004. On the stability of a model of testosterone 1377
dynamics. J. Math. Biol. 49, 627–634. 1378
Enciso, G., Sontag, E.D., 2005. Monotone systems under positive feed- 1379
back: multistability and a reduction theorem. Syst. Contr. Lett. 54, 1380
159–168. 1381
Enciso, G., Sontag, E., 2006. Global attractivity, I/O monotone 1382
small-gain theorems, and biological delay systems. Discr. Contin. 1383
Dynam. Syst. 14, 549–578. 1384
Gedeon, T., Sontag, E.D., 2005. Oscillation in multi-stable monotone 1385
system with slowly varying positive feedback, submitted for pub- 1386
lication (abstract in Sixth SIAM Conference on Control and its 1387
Applications, New Orleans, July). 1388
Goemans, M., Williamson, D., 1995. Improved approximation algo- 1389
rithms for maximum cut and satisfiability problems using semidef- 1390
inite programming. J. ACM 42 (6), 1115–1145. 1391
Grotschel, M., Lovasz, L., Schrijver, A., 1988. Geometric Algorithms 1392
and Combinatorial Optimization. Springer-Verlag, New York, NY. 1393
Hastad, J., Venkatesh, S., 2002. On the advantage over a random assign- 1394
ment. Proceedings of the 34th Annual ACM Symposium on Theory 1395
of Computing, 43–52. 1396
Hirsch, M., 1985. Systems of differential equations that are competitive 1397
or cooperative. II. Convergence almost everywhere. SIAM J. Math. 1398
Anal. 16, 423–439. 1399
Hirsch, M., 1983. Differential equations and convergence almost 1400
everywhere in strongly monotone flows. Contemp. Math. 17, 267– 1401
285. 1402
http://www.weizmann.ac.il/mcb/UriAlon/Papers/networkMotifs/ 1403
yeastData.mat. 1404
Ingolia, N., 2004. Topology and robustness in the drosophila segment 1405
polarity network. Publ. Libr. Sci. 2 (6), 0805–0815. 1406
Istrail, S., 2000. Statistical mechanics, three-dimensionality and np- 1407
completeness. I. Universality of intractability of the partition 1408
functions of the ising model across non-planar lattices. Proceed- 1409
ings of the 32nd ACM Symposium on the Theory of Computing 1410
(STOC00), Portland, Oregon, May 21–23, 2000. ACM Press 87– 1411
96. 1412
Klamt, S., 2006. Generalized concept of minimal cut sets in biochem- 1413
ical networks. Biosystems 83, 233–247. 1414
Lewis, J., Slack, J.M., Wolpert, L., 1977. Thresholds in development. 1415
J. Theor. Biol. 65, 579--590. 1416
Lund, C., Yannakakis, M., 1993. The approximation of maximum sub- 1417
graph problems. Proceedings of the International Colloquium on 1418
Automata, Languages and Programming, Lecture Notes in Com- 1419
puter Science, 700, Springer-Verlag. 1420
Mahajan, S., Ramesh, H., 1995. Derandomizing semidefinite program- 1421
ming based approximation algorithms. Proceedings of the 37th 1422
Annual IEEE symposium on Foundations of Computer Science, 1423
162–169. 1424
Ma’ayan, A., Iyengar, R., Sontag, E.D., in preparation. Sign- 1425
NCOR
RE
IO 2594 1–18
B. DasGupta et al. / Bi
ppendix B. Supplementary data
Supplementary data associated with this articlean be found, in the online version, at 10.1016/.biosystems.2006.08.001.
eferences
lizadeh, F., 1995. Interior point methods in semidefinite program-ming with applications to combinatorial optimization. SIAM J.Optimiz. 5, 13–51.
maldi, E., Kann, V., 1996. On the approximability of some NP-hardminimization problems for linear systems, ECCC Report TR96–015, (available electronically from http://eccc.uni-trier.de/eccc-reports/1996/TR96–015/).
ngeli, D., Ferrell Jr., J.E., Sontag, E.D., 2004. Detection of multi-stability, bifurcations, and hysteresis in a large class of biologicalpositive-feedback systems. Proc. Natl. Acad. Sci. U.S.A. 101,1822–1827.
ngeli, D., De Leenheer, P., Sontag, E.D., 2004. A small-gain theoremfor almost global convergence of monotone systems. Syst. Contr.Lett. 51, 185–202.
ngeli, D., Sontag, E.D., 2003. Monotone control systems. IEEETrans. Autom. Contr. 48, 1684–1698.
ngeli, D., Sontag, E.D., 2004. Multistability in monotone I/O sys-tems. Syst. Contr. Lett. 51, 185–202.
ngeli, D., Sontag, E.D., 2004. An analysis of a circadian model usingthe small-gain approach to monotone systems. Proceedings of theIEEE Conference Decision and Control, Paradise Island, Bahamas,December 2004. IEEE Publications 575–578.
rora, S., Lund, C., Motwani, R., Sudan, M., Szegedy, M., 1998. Proofverification and hardness of approximation problems. J. ACM 45(3), 501–555.
arahona, F., 1982. On the computational complexity of Ising spinglass models. J. Phys. A. Math. Gen. 15, 3241–3253.
erman, P., Karpinski, M., 2001. Efficient amplifiers and boundeddegree optimization, ECCC Report TR01-053, July (available elec-tronically from http://eccc.uni-trier.de/eccc-reports/2001/TR01–053/).
erman, P., Schnitger, G., 1992. On the complexity of approximatingthe independent set problem. Inform. Comput. 96, 77–94.
inquin, O., Demongeot, J., 2002. Positive and negative feedback:striking a balance between necessary antagonists. J. Theor. Biol.216, 229–241.
ipra, B.A., 2000. The Ising Model Is NP-Complete, SIAM News, vol.33, Number 6, July/August.
reignou, N., Khanna, S., Sudan, M., 2001. Complexity classificationsof Boolean constraint satisfaction problems, SIGACT News, 32(4), Whole Number 121, Complexity Theory Column 34, 24–33,November.
e Leenheer, P., Angeli, D., Sontag, E.D., 2005. On predator-prey systems and small-gain theorems. J. Math. Biosci. Eng. 2,25–42.
e Leenheer, P., Malisoff, M., 2006. A small-gain theorem for mono-tone systems with multivalued input-state characteristics. IEEE
U
BIO 2594 1–18
Trans. Automat. Contr. 51, 287–292.e Simone, C., Diehl, M., Junger, M., Mutzel, P., Reinelt, G., Rinaldi,
G., 1995. Exact ground states of Ising spin glasses: new experi-mental results with a branch and cut algorithm. J. Stat. Phys. 80,487–496.
consistency loops detected in biochemical regulatory networks 1426
may provide dynamical stability. 1427
Meinhardt, H., 1978. Space-dependent cell determination under 1428
the control of morphogen gradient. J. Theor. Biol. 74, 307-- 1429
321. 1430
Please cite this article as: Bhaskar DasGupta et al., Algorithmic and complexity results for decompositions of biological networksinto monotone subsystems, BioSystems (2006), doi:10.1016/j.biosystems.2006.08.001
oSystem
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
1474
BIO 2594 1–18
18 B. DasGupta et al. / Bi
Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Alon, D.U., 2002.Network motifs: simple building blocks of complex networks. Sci-ence 298, 824–827.
Monod, J., Jacob, F., 1961. General conclusions: telenomic mecha-nisms in cellular metabolism, growth, and differentiation. ColdSpring Harbor Symp. Quant. Biol. 26, 389--401.
Murray, J.D., 2002. Mathematical Biology. I. An introduction.Springer, New York.
Nesterov, Y., Nemirovskii, A., 1989. Self-Concordant Functions andPolynomial Time Methods in Convex Programming. Central Eco-nomic and Mathematical Institute, USSR Academy of Science.
Nesterov, Y., Nemirovskii, A., 1994. Interior Point Polynomial Meth-ods in Convex Programming. Society of Industrial and AppliedMathematics, Philadelphia, PA.
Oda, K., Matsuoka, Y., Funahashi, A., Kitano, H., 2005. A comprehen-sive pathway map of epidermal growth factor receptor signaling.Mol. Syst. Biol. doi:10.1038/msb4100014.
Papadimitriou, C.H., Yannakakis, M., 1991. Optimization, approxima-tion, and complexity classes. J. Comput. Syst. Sci. 43 (3), 425–440.
Plathe, E., Mestl, T., Omholt, S.W., 1995. Feedback loops, stability
UN
CO
RR
EC
Tand multistationarity in dynamical systems. J. Biol. Syst. 3, 409--413.
Remy, E., Mosse, B., Chaouiya, C., Thieffry, D., 2003. A descriptionof dynamical graphs associated to elementary regulatory circuits.Bioinformatics 19 (Suppl. 2), ii172--ii178.
1475
Please cite this article as: Bhaskar DasGupta et al., Algorithmic andinto monotone subsystems, BioSystems (2006), doi:10.1016/j.bios
OO
F
s xxx (2006) xxx–xxx
Smith, H.L., 1995. Monotone Dynamical Systems. AMS, Providence,R.I.
Smith, H.L., 1988. Systems of ordinary differential equations whichgenerate an order-preserving flow: a survey of results. SIAM Rev.30, 87–111.
Snoussi, E.H., 1998. Necessary conditions for multistationarity andstable periodicity. J. Biol. Syst. 6, 3–9.
Sontag, E.D., 1990. Mathematical Control Theory: DeterministicFinite Dimensional Systems, 2nd ed. 1998. Springer, New York.
Sontag, E.D., 2004. Some new directions in control theory inspired bysystems biology. Syst. Biol. 1, 9–18.
Sontag, E.D., 2005. Molecular systems biology and control. Eur. J.Contr. 11, 396–435.
Sontag, E.D., in preparation. Consistency of indirect effects in biolog-ical networks.
Thomas, R., 1978. Logical analysis of systems comprising feedbackloops. J. Theor. Biol. 73, 631--656.
Vaidya, P., 1989. A new algorithm for minimizing convex functionsover convex sets. Proceedings of the 30th Annual IEEE Symposiumon Foundations of Computer Science 338–343.
Vazirani, V.V., 2001. Approximation Algorithms. Springer-Verlag,
ED
PR
BIO 2594 1–18
Berlin. 1476
Volpert, A.I., Volpert, V.A., Volpert, V.A., 2000. Traveling Wave Solu- 1477
tions of Parabolic Systems, volume 140 of Translations of Mathe- 1478
matical Monographs, AMS. 1479
complexity results for decompositions of biological networksystems.2006.08.001