Reinhard Laubenbacher Virginia Bioinformatics Institute and Mathematics Department Virginia Tech
Discrete models of biological networks
Segunda Escuela Argentina de Matematica y Biologia
Cordoba, Argentina
June 29, 2007
Reinhard Laubenbacher
Virginia Bioinformatics Institute
and Mathematics Department
Virginia Tech
Topics
1. Boolean networks and cellular automata (including probabilistic and sequential BNs)
2. Polynomial dynamical systems over finite fields
3. Logical models
4. Dynamic Bayesian networks
Boolean networks
Definition. Let $f_1,\ldots,f_n$ be Boolean functions in variables $x_1,\ldots,x_n$. A Boolean network is a time-discrete dynamical system
$f = (f_1,\ldots,f_n) : \{0,1\}^n \to \{0,1\}^n$.
The state space of $f$ is the directed graph with the elements of $\{0,1\}^n$ as nodes. There is a directed edge $b \to c$ iff $f(b) = c$.
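A minimal Python sketch of this definition, assuming each $f_i$ is given as a Python function of the whole state tuple; the helper name `state_space` and the two-node toy network are ours, chosen only for illustration. Since $f$ is a function, every node of the state space has exactly one outgoing edge.

```python
from itertools import product

def state_space(fs, n):
    """Return the state space of the Boolean network f = (f_1, ..., f_n)
    as a dict mapping each state b in {0,1}^n to its successor f(b)."""
    edges = {}
    for b in product((0, 1), repeat=n):
        edges[b] = tuple(fi(b) for fi in fs)
    return edges

# Hypothetical 2-node network: f1 = x2, f2 = x1 AND x2.
fs = [lambda x: x[1], lambda x: x[0] & x[1]]
for b, c in state_space(fs, 2).items():
    print(b, "->", c)
```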
Boolean networks
Example:
f1 = NOT x2
f2 = x4 OR (x1 AND x3)
f3 = x4 AND x2
f4 = x2 OR x3
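A sketch that finds the attractors (fixed points and limit cycles) of this four-node example by following every trajectory until a state repeats. It reads the garbled first function as f1 = NOT x2; the helper names are ours.

```python
from itertools import product

def f(x):
    """The four-node example network over {0, 1} (NOT x is written 1 - x)."""
    x1, x2, x3, x4 = x
    return (1 - x2,            # f1 = NOT x2
            x4 | (x1 & x3),    # f2 = x4 OR (x1 AND x3)
            x4 & x2,           # f3 = x4 AND x2
            x2 | x3)           # f4 = x2 OR x3

def attractors(f, n):
    """Iterate from every state until a state repeats; the repeated part of
    the path is a periodic orbit. Return the set of distinct orbits."""
    orbits = set()
    for state in product((0, 1), repeat=n):
        path = []
        while state not in path:
            path.append(state)
            state = f(state)
        orbits.add(frozenset(path[path.index(state):]))
    return orbits

for orbit in sorted(attractors(f, 4), key=len):
    print(sorted(orbit))
```

Under this reading the network has two fixed points, 0111 and 1000, and one limit cycle of length 2, {0001, 1100}.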
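The phase plane
[Figure: trajectory in the phase plane, with axes Compound x and Compound y]
$dx/dt = f(x,y)$, $dy/dt = g(x,y)$
From a point $(x_0, y_0)$, in a short time $dt$ the state moves by
$dx = f(x_0, y_0)\,dt$, $dy = g(x_0, y_0)\,dt$.
Courtesy J. Tyson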
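The displacement formulas above are one forward-Euler step of the continuous model. A minimal sketch, with hypothetical rate functions $f(x,y) = -y$ and $g(x,y) = x$ chosen only for illustration:

```python
def euler_step(f, g, x, y, dt):
    """One forward-Euler step: dx = f(x, y) dt, dy = g(x, y) dt."""
    return x + f(x, y) * dt, y + g(x, y) * dt

# Hypothetical rates for illustration: dx/dt = -y, dy/dt = x.
f = lambda x, y: -y
g = lambda x, y: x
print(euler_step(f, g, 1.0, 0.0, 0.1))  # (1.0, 0.1)
```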
Cellular automata
Definition. A 1-dimensional (binary) cellular automaton (CA) $f$ is a Boolean network in which each $f_i$ depends only on some or all of
$x_{i-1}, x_i, x_{i+1}$ (indices modulo $n$).
Example. $f_i = x_{i-1}$ XOR $x_{i+1}$.
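Binary CAs with a three-cell neighbourhood are commonly indexed by Wolfram's rule numbers (a convention not defined in the lecture): the new value for the neighbourhood (left, center, right) is bit 4·left + 2·center + right of the rule number. A sketch with our own helper names, checking that rule 90 is exactly the XOR example above and stepping a ring of cells:

```python
def rule_output(rule, left, center, right):
    """New cell value under an elementary CA rule number (0-255)."""
    return (rule >> (4 * left + 2 * center + right)) & 1

def ca_step(rule, x):
    """Synchronous update of a ring of cells; x[i - 1] wraps around at i = 0."""
    n = len(x)
    return tuple(rule_output(rule, x[i - 1], x[i], x[(i + 1) % n])
                 for i in range(n))

# Rule 90 ignores the center cell and XORs the two neighbours.
assert all(rule_output(90, l, c, r) == l ^ r
           for l in (0, 1) for c in (0, 1) for r in (0, 1))
print(ca_step(90, (0, 0, 1, 0, 0)))  # (0, 1, 0, 1, 0)
```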
Example
[Figure: evolution of a cellular automaton from an initial state over time steps t = 1 to t = 9]
Rule 90 with 5 nodes
f(x1,x2,…,x5) = (x5 XOR x2, x1 XOR x3, … , x4 XOR x1)
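A sketch that enumerates the 32 states of this 5-node system and reports how many states have a predecessor and which states are fixed points (variable names are ours):

```python
from itertools import product

def rule90(x):
    """Rule 90 on a ring: f_i = x_{i-1} XOR x_{i+1}, indices mod n."""
    n = len(x)
    return tuple(x[i - 1] ^ x[(i + 1) % n] for i in range(n))

states = list(product((0, 1), repeat=5))
image = {rule90(s) for s in states}
fixed = [s for s in states if rule90(s) == s]
print(len(states), len(image), fixed)  # 32 16 [(0, 0, 0, 0, 0)]
```

Because the map is linear over $\{0,1\}$ and, for $n = 5$, its kernel is {00000, 11111}, exactly 16 of the 32 states have a predecessor; the other 16 are "Garden of Eden" states with no incoming edge.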
Boolean network models in biology
Stuart A. Kauffman
Metabolic stability and epigenesis in randomly constructed genetic nets
J. Theor. Biol. 22 (1969) 437-467.
Boolean networks as models for genetic regulatory networks:
Nodes = genes, functions = gene regulation
Variable states: 1 = ON, 0 = OFF
Polynomial dynamical systems
Note: $k = \{0, 1\}$ has a field structure ($1+1=0$).
Fact: Any Boolean function in $n$ variables can be expressed uniquely as a polynomial function in
$k[x_1,\ldots,x_n] / \langle x_i^2 - x_i \rangle$,
and conversely.
Proof: $x \text{ AND } y = xy$, $x \text{ OR } y = x+y+xy$,
$\text{NOT } x = x+1$ (and $x \text{ XOR } y = x+y$).
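The unique square-free polynomial of a Boolean function is usually called its algebraic normal form, and its coefficients can be read off the truth table with the binary Möbius transform. A sketch with our own helper names, reproducing the translations of AND, OR and NOT above:

```python
def anf_coefficients(truth_table, n):
    """Coefficients of the unique square-free polynomial over {0,1} that agrees
    with a Boolean function. truth_table[m] is the value at the point whose
    coordinate x_{i+1} is bit i of m; c[m] is the coefficient of the product
    of the x_{i+1} for the bits i set in m."""
    c = list(truth_table)
    for i in range(n):
        bit = 1 << i
        for m in range(1 << n):
            if m & bit:
                c[m] ^= c[m ^ bit]   # c[m ^ bit] is not touched in this pass
    return c

def table(f, n):
    """Truth table of f in the index order used above."""
    return [f(*((m >> i) & 1 for i in range(n))) for m in range(1 << n)]

# Coefficient order for n = 2: [1, x, y, xy]
print(anf_coefficients(table(lambda x, y: x & y, 2), 2))  # [0, 0, 0, 1]: xy
print(anf_coefficients(table(lambda x, y: x | y, 2), 2))  # [0, 1, 1, 1]: x+y+xy
print(anf_coefficients(table(lambda x: 1 - x, 1), 1))     # [1, 1]: x+1
```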
Polynomial dynamical systems
Let $k$ be a finite field and $f_1, \ldots, f_n \in k[x_1,\ldots,x_n]$. Then
$f = (f_1, \ldots, f_n) : k^n \to k^n$
is an $n$-dimensional polynomial dynamical system over $k$.
Natural generalization of Boolean networks.
Fact: Every function $k^n \to k$ can be represented by a polynomial, so all finite dynamical systems $k^n \to k^n$
are polynomial dynamical systems.
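For a prime field $k = \mathbb{F}_p$ the usual constructive argument is interpolation: $f(x) = \sum_{a \in k^n} f(a) \prod_i \bigl(1 - (x_i - a_i)^{p-1}\bigr)$, since by Fermat's little theorem each factor is 1 when $x_i = a_i$ and 0 otherwise. A sketch that expands this into monomials, assuming $p$ prime (prime-power fields need different arithmetic); helper names and the test function are ours:

```python
from itertools import product
from math import comb

def delta_coeffs(a, p):
    """Coefficients (index = exponent) of 1 - (x - a)^(p-1) over F_p,
    a polynomial that is 1 at x = a and 0 elsewhere."""
    c = [(-comb(p - 1, j) * pow(-a, p - 1 - j, p)) % p for j in range(p)]
    c[0] = (c[0] + 1) % p
    return c

def interpolate(f, n, p):
    """Polynomial {exponent tuple: coefficient} representing f: F_p^n -> F_p."""
    poly = {}
    for a in product(range(p), repeat=n):
        value = f(a) % p
        if value == 0:
            continue
        factors = [delta_coeffs(ai, p) for ai in a]
        for exps in product(range(p), repeat=n):
            term = value
            for factor, e in zip(factors, exps):
                term = term * factor[e] % p
            poly[exps] = (poly.get(exps, 0) + term) % p
    return {m: c for m, c in poly.items() if c}

def evaluate(poly, x, p):
    """Evaluate a polynomial in the dict representation at the point x."""
    total = 0
    for exps, c in poly.items():
        term = c
        for xi, e in zip(x, exps):
            term = term * pow(xi, e, p) % p
        total = (total + term) % p
    return total

# A table-defined function on F_3^2, chosen only for illustration.
p, n = 3, 2
g = lambda a: 1 if a == (1, 2) else 0
poly = interpolate(g, n, p)
assert all(evaluate(poly, x, p) == g(x) for x in product(range(p), repeat=n))
```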
Example
$k = \mathbb{F}_3 = \{0, 1, 2\}$, $n = 3$
$f_1 = x_1 x_2^2 + x_3$,
$f_2 = x_2 + x_3$,
$f_3 = x_1^2 + x_2^2$.
[Figure: dependency graph (wiring diagram)]
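A sketch computing the wiring diagram and the fixed points of this example, reading the stray "2" in the extracted text as the exponent in $f_3 = x_1^2 + x_2^2$. It detects an edge $x_j \to x_i$ by checking whether changing $x_j$ alone ever changes $f_i$; for polynomials whose exponents are all below $p$ this agrees with "$x_j$ appears in $f_i$", because such representations are unique. Helper names are ours.

```python
from itertools import product

P = 3  # k = F_3

def f(x):
    x1, x2, x3 = x
    return ((x1 * x2**2 + x3) % P,   # f1 = x1*x2^2 + x3
            (x2 + x3) % P,           # f2 = x2 + x3
            (x1**2 + x2**2) % P)     # f3 = x1^2 + x2^2

def wiring_diagram(f, n, p):
    """Edges (j, i) meaning x_j -> x_i: changing x_j alone changes f_i somewhere."""
    edges = set()
    for x in product(range(p), repeat=n):
        fx = f(x)
        for j in range(n):
            for v in range(p):
                fy = f(x[:j] + (v,) + x[j + 1:])
                edges.update((j + 1, i + 1) for i in range(n) if fy[i] != fx[i])
    return edges

states = list(product(range(P), repeat=3))
print(sorted(wiring_diagram(f, 3, P)))
print([s for s in states if f(s) == s])  # [(0, 0, 0)]
```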
Sequential polynomial systems
$k = \mathbb{F}_3 = \{0, 1, 2\}$, $n = 3$
$f_1 = x_1 x_2^2 + x_3$
$f_2 = x_2 + x_3$
$f_3 = x_1^2 + x_2^2$
σ = (2 3 1) update schedule:
First update f2.
Then f3, using the new value of x2.
Then f1, using the new values of x2 and x3.
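A sketch comparing one synchronous step with one sequential step under the schedule σ = (2 3 1), again reading $f_3 = x_1^2 + x_2^2$; the starting state (1, 1, 1) is an arbitrary choice for illustration:

```python
P = 3  # k = F_3

# Each f_i as a function of the full current state (x1, x2, x3).
F = {
    1: lambda x: (x[0] * x[1]**2 + x[2]) % P,  # f1 = x1*x2^2 + x3
    2: lambda x: (x[1] + x[2]) % P,            # f2 = x2 + x3
    3: lambda x: (x[0]**2 + x[1]**2) % P,      # f3 = x1^2 + x2^2
}

def synchronous(x):
    """All coordinates are updated from the same old state."""
    return tuple(F[i](x) for i in (1, 2, 3))

def sequential(x, schedule):
    """Update one coordinate at a time in the given order; each update sees
    the values already changed earlier in the same step."""
    x = list(x)
    for i in schedule:
        x[i - 1] = F[i](x)
    return tuple(x)

print(synchronous((1, 1, 1)))            # (2, 2, 2)
print(sequential((1, 1, 1), (2, 3, 1)))  # (0, 2, 2)
```

The two update rules already disagree after a single step, which is why the update schedule is part of the model.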
Sequential systems as biological models
• Different regulatory processes happen on different time scales
• Stochastic effects in the cell affect the “update order” of variables representing different chemical compounds at any given time
Therefore, sequential update adds a realistic feature to models of regulatory networks.
Stochastic models
Polynomial dynamical systems (PDSs) can be modified:
• Choose random update order for each update
(see Sontag et al. for Boolean case)
• Choose an update function at random from a collection at each update
(see Shmulevich et al. for Boolean case)
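A simplified sketch of both modifications for the three-variable $\mathbb{F}_3$ example (reading $f_3 = x_1^2 + x_2^2$). It is not the exact formalism of Sontag et al. or Shmulevich et al.; the candidate functions and probabilities in `candidates` are hypothetical.

```python
import random

P = 3  # k = F_3; the update functions of the sequential example

F = [
    lambda x: (x[0] * x[1]**2 + x[2]) % P,  # f1 = x1*x2^2 + x3
    lambda x: (x[1] + x[2]) % P,            # f2 = x2 + x3
    lambda x: (x[0]**2 + x[1]**2) % P,      # f3 = x1^2 + x2^2
]

def random_order_step(x, rng):
    """Sequential update using a fresh random update order at every step."""
    x = list(x)
    order = list(range(len(F)))
    rng.shuffle(order)
    for i in order:
        x[i] = F[i](x)  # later updates see values changed earlier this step
    return tuple(x)

def random_function_step(x, candidates, rng):
    """Synchronous update in which coordinate i draws its update function from
    candidates[i], a list of (probability, function) pairs."""
    new = []
    for options in candidates:
        weights = [w for w, _ in options]
        _, fi = rng.choices(options, weights=weights)[0]
        new.append(fi(x))
    return tuple(new)

rng = random.Random(1)
# Hypothetical: x1 uses f1 with probability 0.7 and is reset to 0 otherwise.
candidates = [[(0.7, F[0]), (0.3, lambda x: 0)], [(1.0, F[1])], [(1.0, F[2])]]
print(random_order_step((1, 1, 1), rng))
print(random_function_step((1, 1, 1), candidates, rng))
```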
Open mathematical problems
• Determine the relationship between the structure of the fi and the dynamics of the system for special classes of models (see later lectures).
• Determine the effect of the update schedule on dynamics.
• Develop a categorical framework for (sequential/stochastic) PDSs.
• Determine and study a good class of “biologically meaningful” polynomial functions.
Example
A. Jarrah, B. Raposa, and R. Laubenbacher,
Nested canalyzing, unate cascade, and polynomial functions, Physica D, in press
Logical models
E. Snoussi and R. Thomas, Logical identification of all steady states: the concept of feedback loop characteristic states, Bull. Math. Biol. 55 (1993) 973-991.
Key model features:
• Time delays of different lengths for different variables are important
• Positive and negative feedback loops are important
Model description
Basic structure of logical models:
1. Sets of variables x1, … , xn; X1, … , Xn
(Xi = genes and xi = gene products, e.g., proteins. A gene product x regulates a gene Y, with a certain time delay.)
Each variable pair xi, Xi takes on a finite number of distinct states or thresholds (possibly different for different i), corresponding to different modes of action of the variables for different concentration levels.
Model description (cont.)
2. A directed weighted graph with the xi as nodes and threshold levels as edge weights, indicating regulatory relationships and the levels at which they occur.
Each edge has a sign, indicating activation (+) or inhibition (-).
3. A collection of “logical parameters” which can be used to determine the state transition of a given node for a given configuration of inputs.
Features of logical models
• Sophisticated models that include many features of real networks
• Ability to construct continuous models based on the logical model specification
• Models encode intuitive network properties
• Ability to relate structure (positive and negative feedback loops) to dynamics (multistationarity, fixed points vs. periodic behavior)
An Example
X = z
Y = x
Z = y
[Figure: regulatory graph with nodes x, y, z]
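In Thomas' formalism the logical image (X, Y, Z) is compared with the current state (x, y, z), and under the usual asynchronous convention a transition changes one variable whose image differs from its current value. A sketch for Boolean levels only, taking the example equations exactly as written (X = z, Y = x, Z = y); helper names are ours.

```python
from itertools import product

def image(state):
    """Logical functions of the example: X = z, Y = x, Z = y."""
    x, y, z = state
    return (z, x, y)

def async_successors(state):
    """Asynchronous transitions: move one variable to its image value."""
    target = image(state)
    succ = []
    for i, (cur, img) in enumerate(zip(state, target)):
        if cur != img:
            nxt = list(state)
            nxt[i] = img
            succ.append(tuple(nxt))
    return succ

for s in product((0, 1), repeat=3):
    print(s, "->", async_successors(s) or "steady state")
```

With these equations the steady states are 000 and 111, the only states equal to their own image.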
Features of logical models
• Include many features of real biological networks
• Intuitive but complicated formalism and model description
• Difficult to study as a mathematical object
• Difficult to study dynamics for larger models
Dynamic Bayesian networks
Definition. A Bayesian network (BN) is a representation of a joint probability distribution over a set X1, … , Xn of random variables. It consists of
• a directed acyclic graph with the Xi as vertices. A directed edge indicates a conditional dependence relation
• a family of conditional distributions for each variable, given its parents in the graph
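The two components determine the joint distribution through the chain-rule factorization $P(X_1,\ldots,X_n) = \prod_i P(X_i \mid \mathrm{Pa}(X_i))$. A sketch for a hypothetical two-gene network A → B; all probabilities are made up for illustration.

```python
from itertools import product

# Hypothetical two-gene network A -> B with made-up probabilities.
p_A = {1: 0.3, 0: 0.7}                    # P(A = a)
p_B_given_A = {1: {1: 0.9, 0: 0.1},       # P(B = b | A = 1)
               0: {1: 0.2, 0: 0.8}}       # P(B = b | A = 0)

def joint(a, b):
    """Chain-rule factorization: P(A, B) = P(A) * P(B | A)."""
    return p_A[a] * p_B_given_A[a][b]

total = sum(joint(a, b) for a, b in product((0, 1), repeat=2))
print(round(total, 10))          # 1.0
print(round(joint(1, 1), 10))    # 0.27
```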
An example
http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#repr
Inference
Bayes’ rule: P(R=r | e) = P(e | R=r)P(R=r)/P(e)
Cond. Prob.: P(A | B) = P(A∩B)/P(B)
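The page linked above illustrates inference with the classic "sprinkler" network (Cloudy → Sprinkler, Cloudy → Rain, Sprinkler and Rain → WetGrass). A sketch of exact inference by enumeration, computing P(Rain=1 | WetGrass=1) as a ratio of sums of the joint distribution; the tables are the values commonly used for this textbook example and should be read as illustrative.

```python
from itertools import product

# Illustrative conditional probability tables (1 = true, 0 = false).
P_C = 0.5                                   # P(C=1)
P_S = {0: 0.5, 1: 0.1}                      # P(S=1 | C=c)
P_R = {0: 0.2, 1: 0.8}                      # P(R=1 | C=c)
P_W = {(0, 0): 0.0, (1, 0): 0.9,            # P(W=1 | S=s, R=r)
       (0, 1): 0.9, (1, 1): 0.99}

def bern(p1, v):
    """P(V = v) for a binary variable with P(V = 1) = p1."""
    return p1 if v == 1 else 1 - p1

def joint(c, s, r, w):
    return (bern(P_C, c) * bern(P_S[c], s) * bern(P_R[c], r)
            * bern(P_W[(s, r)], w))

def posterior_rain_given_wet():
    """P(R=1 | W=1) = P(R=1, W=1) / P(W=1), both obtained by summing the joint."""
    p_e = sum(joint(c, s, r, 1) for c, s, r in product((0, 1), repeat=3))
    p_r_and_e = sum(joint(c, s, 1, 1) for c, s in product((0, 1), repeat=2))
    return p_r_and_e / p_e

print(round(posterior_rain_given_wet(), 4))  # 0.7079
```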
BN models of gene regulatory networks
Can use BNs to model gene regulatory networks:
Random variables Xi ↔ genes
Directed edges ↔ regulatory relationships
Problem: BNs cannot have directed cycles, so they cannot model feedback loops.
Dynamic Bayesian networks
Definition. A dynamic Bayesian network (DBN) is a representation of the stochastic evolution of a set of random variables {Xi}, using discrete time.
It has two components:
• a directed graph (V, E) encoding conditional dependence relations (as before);
• a family of conditional probability distributions
$P(X_i(t) \mid Pa_i(t-1))$, where $Pa_i = \{X_j \mid (X_j, X_i) \in E\}$
(Dojer et al., BMC Bioinformatics (2006) 7)
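Because every dependence runs from time t-1 to time t, the unrolled graph is acyclic even when the gene-level graph contains a feedback loop. A sketch of the one-step transition probabilities $P(X(t) \mid X(t-1)) = \prod_i P(X_i(t) \mid Pa_i(t-1))$ for a hypothetical two-gene loop in which B represses A and A activates B; all numbers are made up.

```python
from itertools import product

# Hypothetical DBN with a feedback loop: B(t-1) -> A(t) and A(t-1) -> B(t).
P_A1 = {0: 0.9, 1: 0.2}   # P(A(t)=1 | B(t-1)=b): B represses A
P_B1 = {0: 0.1, 1: 0.8}   # P(B(t)=1 | A(t-1)=a): A activates B

def bern(p1, v):
    """P(V = v) for a binary variable with P(V = 1) = p1."""
    return p1 if v == 1 else 1 - p1

def transition(prev, nxt):
    """P(X(t) = nxt | X(t-1) = prev) as a product over the variables."""
    a, b = prev
    a2, b2 = nxt
    return bern(P_A1[b], a2) * bern(P_B1[a], b2)

states = list(product((0, 1), repeat=2))
for s in states:
    print(s, [round(transition(s, t), 3) for t in states])
```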
Dynamic Bayesian networks
DBNs generalize Hidden Markov Models and linear dynamical systems.
Recently used for inference of gene regulatory networks from time courses of microarray data.
Summary
Modeling frameworks:
Boolean networks
Polynomial dynamical systems
Logical models
Dynamic Bayesian networks
(Petri nets)
Model inference from data
Goal: Given a set of experimental observations, infer the most likely model of the network that generated the data.
Model framework: polynomial dynamical systems over a finite field
Data discretization
Step 1: Discretize real-valued data into finitely many states.
This is a difficult problem.
E. Dimitrova, P. Vera-Licona, J. McGee, and R. Laubenbacher, Comparison of data discretization methods for inference of biochemical networks.
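For orientation only, a sketch of the simplest discretization, equal-width binning of one variable's measurements into a fixed number of states. This is a naive baseline of our own choosing, not a method attributed to the cited comparison.

```python
def equal_width(values, num_states):
    """Map real values to {0, ..., num_states - 1} by splitting [min, max]
    into num_states intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)   # constant series: a single state
    width = (hi - lo) / num_states
    return [min(int((v - lo) / width), num_states - 1) for v in values]

print(equal_width([0, 5, 10, 2], 3))  # [0, 1, 2, 0]
```

Equal-width bins are sensitive to outliers, which is one reason discretization is a difficult step in practice.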
Model inference from data
Variables x1, … , xn with values in a finite field k.
$(s_1, t_1), \ldots, (s_r, t_r)$ state transition observations
with $s_j, t_j \in k^n$.
Goal: Identify a collection of “best” dynamical systems
$f = (f_1, \ldots, f_n): k^n \to k^n$
such that f(sj)=tj for all j.
Network inference
Problem: Given $D = \{(s_j, t_j) \in k^n \times k\}$, find the “most likely” model $f: k^n \to k$ such that
$f(s_j) = t_j$ for all $j$.
Let $M = \{f: k^n \to k \mid f(s_j) = t_j \text{ for all } j\}$, viewed as a subset of $k[x_1, \ldots, x_n]$, be the set of all possible models for a particular variable.
Network inference
Let $f, g \in M$. Then $f(s_j) = g(s_j)$ for all $j$, so
$(f-g)(s_j) = 0$ for all $j$. Let
$I = \{h \in k[x_1, \ldots, x_n] \mid h(s_j) = 0 \text{ for all } j\}$.
Let $f_0$ be any element of $M$. Then
$M = f_0 + I$.
Note that $I$ is an ideal, since it is closed under addition and under multiplication by arbitrary polynomials.
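A sketch of $M = f_0 + I$ on the $\mathbb{F}_5$ data of the example given later in the lecture. Here $f_0 = \sum_j t_j \prod_i (1 - (x_i - s_{ji})^{p-1})$ is one particular model (each product is 1 exactly at $s_j$, by Fermat's little theorem), and $h = \prod_j (x_5 - s_{j5})$ is one element of $I$ we picked by hand; adding $h$ gives a different model that fits the same data.

```python
p = 5  # k = F_5
# State transitions (s_j, t_j) for one variable.
data = [((3, 0, 0, 0, 0), 3), ((0, 1, 2, 1, 4), 1), ((0, 1, 2, 1, 0), 0),
        ((0, 1, 2, 1, 1), 0), ((1, 1, 1, 1, 3), 4)]

def delta(s, x):
    """prod_i (1 - (x_i - s_i)^(p-1)): 1 at x = s and 0 elsewhere."""
    value = 1
    for xi, si in zip(x, s):
        value = value * (1 - pow(xi - si, p - 1, p)) % p
    return value

def f0(x):
    """One particular model in M: sum_j t_j * delta_{s_j}(x)."""
    return sum(t * delta(s, x) for s, t in data) % p

def h(x):
    """An element of I: prod_j (x_5 - s_j5) vanishes at every data point."""
    value = 1
    for s, _ in data:
        value = value * (x[4] - s[4]) % p
    return value

g = lambda x: (f0(x) + h(x)) % p   # another model, g in f0 + I

assert all(f0(s) == t and g(s) == t for s, t in data)
print(f0((0, 0, 0, 0, 2)), g((0, 0, 0, 0, 2)))  # 0 3
```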
Model selection
In the absence of additional network information, choose a “minimal” model f from M
(f only reflects relationships among variables that are inherent in the data)
If $f = hg + f'$, with $g \in I$ and $f'$ not divisible by any $r \in I$,
then $f'$ is preferable to $f$, because $hg$ vanishes on all $s_j$.
Model selection
Strategy:
1. Compute $f_0 \in M$ and the coset $f_0 + I$.
2. Compute $f \in f_0 + I$ with the property that $f$ is not divisible by any $g \in I$.
Could use other criteria for model selection:
f must contain certain variables and can’t contain others.
Could also require certain constraints on the dynamics.
Fundamental computational problem
Given $I$ and $f$, decide whether $f \in I$. If not, compute the remainder of $f$ under
“division by $I$.”
This is known as the “ideal membership problem.”
This problem can be solved by Gröbner basis theory.
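In one variable every ideal of $k[x]$ is principal. For distinct data points, $I$ is generated by $\prod_j (x - s_j)$, and this single generator is already a Gröbner basis, so ideal membership and the remainder reduce to ordinary polynomial long division. A sketch of this special case over $\mathbb{F}_p$ ($p$ prime); the multivariate case needs a genuine Gröbner basis computation, e.g. in a computer algebra system. Helper names and the example are ours.

```python
def trim(a):
    """Drop zero leading coefficients (lists are ordered from degree 0 up)."""
    a = list(a)
    while a and a[-1] == 0:
        a.pop()
    return a

def poly_mul(a, b, p):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return trim(out)

def remainder(f, g, p):
    """Remainder of f on division by g over F_p; [] means g divides f."""
    r = trim([c % p for c in f])
    g = trim([c % p for c in g])
    inv = pow(g[-1], p - 2, p)          # inverse of the leading coefficient
    while len(r) >= len(g):
        shift = len(r) - len(g)
        factor = r[-1] * inv % p
        for i, gi in enumerate(g):
            r[shift + i] = (r[shift + i] - factor * gi) % p
        r = trim(r)
    return r

def vanishing_poly(points, p):
    """prod_j (x - s_j), the generator of I for distinct data points."""
    g = [1]
    for s in points:
        g = poly_mul(g, [(-s) % p, 1], p)
    return g

p = 5
g = vanishing_poly([0, 1, 2], p)   # x(x-1)(x-2) = x^3 + 2x^2 + 2x over F_5
f = [0, 0, 0, 0, 1]                # f = x^4
print(g)                           # [0, 2, 2, 1]
print(remainder(f, g, p))          # [0, 4, 2], i.e. 2x^2 + 4x
print(remainder(g, g, p) == [])    # True: g lies in I
```

The remainder $2x^2 + 4x$ agrees with $x^4$ at the points 0, 1, 2 and is the unique model of degree below 3 in the coset $x^4 + I$.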
Wiring diagrams
Goal: Compute all possible minimal wiring diagrams for a given data set.
Wiring diagram:
Vertices = variables
Edges: xi → xj if xi is involved in the regulation of xj, that is, if xi appears in fj.
Wiring diagrams
Problem: Given data (si, ti), i=1, … , r,
(a collection of state transitions for one node in the network), find all minimal (with respect to inclusion) sets of variables $\{y_1, \ldots, y_m\} \subseteq \{x_1, \ldots, x_n\}$ such that
$(f_0 + I) \cap k[y_1, \ldots, y_m] \neq \emptyset$.
Each such minimal set corresponds to a minimal wiring diagram for the variable under consideration.
The “minimal sets” algorithm
For $a \in k$, let $X_a = \{s_i \mid t_i = a\}$.
Let $X = \{X_a \mid a \in k\}$.
Then
$f_0 + I = M = \{f \in k[x_1, \ldots, x_n] \mid f(p) = a \text{ for all } p \in X_a \text{ and all } a \in k\}$.
We want to find $f \in M$ which involves a minimal number of variables, i.e., there is no $g \in M$ whose support is properly contained in $\mathrm{supp}(f)$.
Example
Let n = 5, k = F5. Let
(s1, t1) = [(3, 0, 0, 0, 0); 3]
(s2, t2) = [(0, 1, 2, 1, 4); 1]
(s3, t3) = [(0, 1, 2, 1, 0); 0]
(s4, t4) = [(0, 1, 2, 1, 1); 0]
(s5, t5) = [(1, 1, 1, 1, 3); 4]
Then X0 = {s3, s4}, X1 = {s2}, X2 = Ø, X3 = {s1}, X4 = {s5}.
The algorithm
Definitions.
• For $F \subseteq \{1, \ldots, n\}$, let $R_F = k[x_i \mid i \in F]$.
• Let $\Delta_X = \{F \mid M \cap R_F \neq \emptyset\}$.
• For $p \in X_a$, $q \in X_b$, $a \neq b \in k$, let
$m(p, q) = \prod_{p_i \neq q_i} x_i$.
Let $M_X$ be the monomial ideal in $k[x_1, \ldots, x_n]$ generated by all monomials $m(p, q)$, for all $a \neq b \in k$.
(Note that $\Delta_X$ is a simplicial complex, and $M_X$ is the face ideal of the Alexander dual of $\Delta_X$.)
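A sketch computing the classes $X_a$ and the supports of the generators $m(p, q)$ for the $\mathbb{F}_5$ example data, keeping only the inclusion-minimal supports (these index the minimal monomial generators of $M_X$). Helper names are ours.

```python
from itertools import combinations

data = [((3, 0, 0, 0, 0), 3), ((0, 1, 2, 1, 4), 1), ((0, 1, 2, 1, 0), 0),
        ((0, 1, 2, 1, 1), 0), ((1, 1, 1, 1, 3), 4)]

def partition(data):
    """X_a = {s_j | t_j = a} for each output value a that occurs."""
    classes = {}
    for s, t in data:
        classes.setdefault(t, []).append(s)
    return classes

def monomial_supports(data):
    """Supports of m(p, q) over pairs with different outputs, as sets of
    1-based variable indices, keeping only the inclusion-minimal ones."""
    supports = {frozenset(i + 1 for i in range(len(p)) if p[i] != q[i])
                for (p, a), (q, b) in combinations(data, 2) if a != b}
    return {m for m in supports if not any(o < m for o in supports)}

print(partition(data))
print(monomial_supports(data))  # minimal generators x5 and x1*x2*x3*x4
```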
The algorithm
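Proposition. A subset $F$ of $\{1, \ldots, n\}$ is in $\Delta_X$ if and only if the ideal $\langle x_i \mid i \in F \rangle$ contains the ideal $M_X$.
Proof. Let $F \in \Delta_X$. Then $M \cap R_F \neq \emptyset$.
Let $p \in X_a$ and $q \in X_b$, with $a \neq b$. Then there is $f \in k[x_i \mid i \in F]$ such that
$f(p) = a$ and $f(q) = b$.
So $p$ and $q$ differ in a coordinate $j \in F$. Hence $m(p, q)$ contains $x_j$ as a factor, so it is contained in
$\langle x_j \mid j \in F \rangle$.
Therefore, $M_X \subseteq \langle x_j \mid j \in F \rangle$.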
The algorithm
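Conversely, suppose $M_X \subseteq \langle x_i \mid i \in F \rangle$.
Since this ideal is generated by variables, each generator $m(p, q)$ is divisible by some $x_i$ with $i \in F$.
Therefore, any $p \in X_a$ and $q \in X_b$ with $a \neq b$ differ in some coordinate $i \in F$.
Let $\pi_F$ denote projection onto the coordinates in $F$, and define $f$ on $k^F$ by
$f(\pi_F(p)) = a$ for $p \in X_a$, for all $a \in k$; $f = 0$ at all other points.
This is well defined, because points in different $X_a$ have different projections, and $f$ is a polynomial in the $x_i$, $i \in F$.
Then $f \in M$ and $f$ depends only on the variables $x_i$, $i \in F$. Hence
$f \in M \cap R_F$.
This completes the proof.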
The algorithm
Corollary. To find all possible minimal wiring diagrams, we need to find all minimal subsets of variables y1, … , ym such that MX is contained in <y1, … , ym>.
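A monomial lies in the ideal $\langle y_1, \ldots, y_m \rangle$ exactly when it is divisible by one of the $y$'s, so $M_X \subseteq \langle y_1, \ldots, y_m \rangle$ means that $\{y_1, \ldots, y_m\}$ meets the support of every generator. The minimal wiring diagrams are therefore the minimal hitting sets of the generator supports. A brute-force sketch (fine for small $n$; helper name ours), checked on the generators $x_1x_2, x_2x_3, x_1x_4$ of the example that follows:

```python
from itertools import combinations

def minimal_hitting_sets(supports, n):
    """All inclusion-minimal F in {1..n} meeting every support set.
    Brute force over subsets by increasing size."""
    found = []
    for size in range(n + 1):
        for F in combinations(range(1, n + 1), size):
            F = set(F)
            if any(prev <= F for prev in found):
                continue        # contains a smaller solution: not minimal
            if all(F & s for s in supports):
                found.append(F)
    return found

# Generators of M_X = <x1*x2, x2*x3, x1*x4>, given by their supports.
print(minimal_hitting_sets([{1, 2}, {2, 3}, {1, 4}], 4))
# [{1, 2}, {1, 3}, {2, 4}]
```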
Example
Let MX = < x1x2, x2x3, x1x4 >. Then
MX = < x1, x2x3, x1x4 > ∩ < x2, x2x3, x1x4 >
= < x1, x2x3 > ∩ < x2, x1x4 >
= < x1, x2> ∩ < x1, x3 > ∩ < x2, x1> ∩ < x2, x4 >
= < x1, x2> ∩ < x1, x3 > ∩ < x2, x4 >.
(primary decomposition of MX)
Therefore, the collection of minimal wiring diagrams consists of
{x1, x2}, {x1, x3}, {x2, x4} (the minimal primes in the primary decomposition).
Can be done algorithmically, implemented in computer algebra systems.
Model selection
How do we choose a “best” one from this list?
Example of a scoring method. (See alternative methods in (Jarrah, L., Stigler, Stillman)).
• First assign a score to each variable xi, i=1, … ,n.
• Then use these scores to assign a score to each minimal variable set.
• Choose the minimal set with the highest score.
Scoring method
Let F = {F1, … , Ft} be the output of the algorithm.
For s = 1, … , n, let Zs = # sets in F with s elements.
For i = 1, … , n, let Wi(s) = # sets of size s which contain xi.
$S(x_i) := \sum_s W_i(s) / (s Z_s)$,
where the sum extends over all $s$ such that $Z_s \neq 0$.
$T(F_j) := \prod_{x_i \in F_j} S(x_i)$.
Normalizing the scores $T(F_j)$ gives a probability distribution on the collection $F$ of minimal variable sets.
This scoring method has a bias toward small sets.
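A sketch of the scores with exact rational arithmetic, representing each variable $x_i$ by its index $i$ (helper name ours). Summing $1/(|F| \cdot Z_{|F|})$ over the sets $F$ that contain $x_i$ is the same as $\sum_s W_i(s)/(s Z_s)$.

```python
from fractions import Fraction
from math import prod

def scores(sets):
    """Return S (dict over variables), T (list over sets) and the normalized
    distribution of T over the given minimal variable sets."""
    Z = {}
    for F in sets:
        Z[len(F)] = Z.get(len(F), 0) + 1
    S = {}
    for F in sets:
        for x in F:
            S[x] = S.get(x, 0) + Fraction(1, len(F) * Z[len(F)])
    T = [prod(S[x] for x in F) for F in sets]
    total = sum(T)
    return S, T, [t / total for t in T]

S, T, probs = scores([{1, 2}, {1, 3}, {2, 4}])
print(T)      # [Fraction(1, 9), Fraction(1, 18), Fraction(1, 18)]
print(probs)  # [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
```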
Example
F1 = {x1, x2}, F2 = {x1, x3}, F3 = {x2, x4}
Z1=0, Z2=3, Z3=Z4=0;
W1(2) = 2, W2(2) = 2, W3(2) = 1, W4(2) = 1.
S(x1) = 2/(2·3) = 1/3 = S(x2), S(x3) = 1/(2·3) = 1/6 = S(x4).
T(F1) = 1/9, T(F2) = 1/18 = T(F3).
Example with data
Data (rows are consecutive states (x1, x2, x3, x4)):
1 0 0 2
1 2 2 1
0 2 1 1
1 2 1 2
2 2 0 2
0 1 1 2
Minimal variable sets returned by the minimal sets algorithm:
x1: {x1, x3}, {x1, x2, x4}, {x2, x3, x4}
x2: {x1}, {x2, x3}
x3: {x1, x3}, {x1, x2, x4}, {x2, x3, x4}
x4: {x1, x3}, {x2, x3}, {x1, x2, x4}
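A self-contained sketch of the whole minimal-sets computation on this data, assuming each row maps to the next one (so the transitions for $x_i$ are pairs (row $j$, $i$-th entry of row $j+1$)). It forms the supports of the generators $m(p, q)$ and finds the minimal hitting sets by brute force; helper names are ours.

```python
from itertools import combinations

rows = [(1, 0, 0, 2), (1, 2, 2, 1), (0, 2, 1, 1),
        (1, 2, 1, 2), (2, 2, 0, 2), (0, 1, 1, 2)]

def minimal_sets(rows, var):
    """Minimal variable sets for x_var (1-based), treating consecutive rows
    as state transitions s_j -> s_{j+1}."""
    data = [(rows[j], rows[j + 1][var - 1]) for j in range(len(rows) - 1)]
    n = len(rows[0])
    # Supports of m(p, q) for pairs of inputs with different outputs.
    supports = [{i + 1 for i in range(n) if p[i] != q[i]}
                for (p, a), (q, b) in combinations(data, 2) if a != b]
    found = []
    for size in range(n + 1):
        for F in map(set, combinations(range(1, n + 1), size)):
            if not any(prev <= F for prev in found) and all(F & s for s in supports):
                found.append(F)
    return found

for var in range(1, 5):
    print(f"x{var}:", minimal_sets(rows, var))
```

Under this reading of the table, the sketch reproduces the four lists returned by the algorithm.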
Example with data
Consider the variable sets for variable x1:
F1 = {x1, x3}, F2 = {x1, x2, x4}, F3 = {x2, x3, x4}
Here Z2 = 1 and Z3 = 2, so
S(x1) = 1/(2×1) + 1/(3×2) = 2/3, S(x2) = 2/(3×2) = 1/3, S(x3) = 1/(2×1) + 1/(3×2) = 2/3, S(x4) = 2/(3×2) = 1/3.
T(F1) = (2/3)(2/3) = 4/9 (winner), T(F2) = (2/3)(1/3)(1/3) = 2/27, T(F3) = (1/3)(2/3)(1/3) = 2/27.
Highest scoring set(s) for each variable:
x1: {x1, x3}
x2: {x1}
x3: {x1, x3}
x4: {x2, x3}, {x1, x3}
Method validation: Segment polarity network in the fruit fly
Network in cell: 21 genes, proteins
Albert model: 21 Bool. functions – 44 known interactions
Time series data
– Generated wildtype, knockout
– < 0.01% of $2^{21}$ total states
Minimal Sets Algorithm
– 89% interactions
– 0 false positives, 5 false negatives
– PDS: identified 19/21 functions
Albert and Othmer, The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster, J. Theor. Biol. 2003
Method validation: Simulated gene network
Pandapas network
• 10 genes, 3 external biochemicals
• 17 interactions
Time series data: 9 time points
• Generated 8 time series for wildtype, knockouts G1, G2, G5
• 192 data points
• G6, G9 constant
Data discretization
• 5 states per node
• 95 data points
– 49% reduction
– < 0.00001% of $5^{13}$ total states
Method validation: Simulated gene network
Minimal Sets Algorithm
• 77% interactions
• Identified targets of P2, P3 (x12, x13)
• 11 false positives, 4 false negatives
[Figure: Pandapas network vs. reverse-engineered network]
Summary
Algorithmic method to find all possible minimal wiring diagrams, given a data set:
Finds all possible minimal sets of variables for which there exists a PDS that is consistent with the data.
Provides a statistical measure to select most likely wiring diagram(s).
This algorithm can be used as a preprocessing step for the previous algorithm, which actually finds dynamical models. It improves that algorithm's performance by reducing the number of variables to be considered.
Optimization
Goal: For a given data set, select a model from M which is optimal with respect to
• Model complexity
• Properties of wiring diagram
• Expected dynamic properties
dm2.chr2L CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
dp3.chr4_group3 CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG
dm2.chr2L TATGGACTCAC
dp3.chr4_group3 TGT--ACTTAC

DroMel_4_ CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
DroPse_1_ CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
DroMel_4_ ATTCTATGGACTCAC
DroPse_1_ ------TGTACTTAC
Each alignment can be summarized by counting the number of matches (#M), mismatches (#X), gaps (#G), and spaces (#S).
#M=31, #X=22, #G=3, #S=12
#M=27, #X=18, #G=3, #S=28
2(#M+#X)+#S = 112 (the total length of the two sequences), so #X, #G and #S suffice to specify a summary.
This notation follows Chapter 7 (Parametric Sequence Alignment) by Colin Dewey and Kevin Woods in the book Algebraic Statistics for Computational Biology.
Courtesy Lior Pachter
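A sketch of how a summary can be counted from an alignment written as two equal-length rows with "-" for spaces. We count a gap as a maximal run of consecutive spaces in one row and assume no column has spaces in both rows; the helper name and the toy alignment are ours.

```python
def summary(row1, row2):
    """(#M, #X, #G, #S) for a pairwise alignment given as two equal-length rows."""
    assert len(row1) == len(row2)
    matches = mismatches = spaces = gaps = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            spaces += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    for row in (row1, row2):
        # A gap starts at each '-' that is not preceded by another '-'.
        gaps += sum(1 for i, c in enumerate(row)
                    if c == "-" and (i == 0 or row[i - 1] != "-"))
    return matches, mismatches, gaps, spaces

m, x, g, s = summary("AC-GT", "ACTG-")
print(m, x, g, s)            # 3 0 2 2
print(2 * (m + x) + s)       # 8 = len("ACGT") + len("ACTG")
```

Each match or mismatch column uses one letter from each sequence and each space column uses one letter from one sequence, which is why 2(#M+#X)+#S always equals the total sequence length.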
For the sequences:
>mel
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGAC
>pse
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC
49 #x=24, #S=10, #G=2
There are eight alignments that have this summary.
The alignment polytope is:
[Figure: alignment polytope]
Courtesy Lior Pachter
Parametric sequence alignment
Choose parameters a, b, c and minimize the linear functional
f(M, G, S) = aM+bG+cS
over the convex polytope spanned by the summaries of all possible alignments of the two sequences.
Theorem (Pachter, Sturmfels) This polytope can be described as the Minkowski sum of the Newton polytopes of a collection of polynomials.
The Dynamotope (joint work with A. Jarrah, B. Sturmfels, P. Vera-Licona)
Define the summary (S1, S2, S3, S4) of a polynomial model
$g \in f + I$:
$S_1 = w_1 \cdot (u_1, u_2, u_3, \ldots) + w_2 \cdot (v_1, v_2, v_3, \ldots)$,
where $u_i$ is the number of limit cycles of length $i$, $v_i$ is the number of trees of height $i$, and $w_1$, $w_2$ are suitably chosen weight vectors.
S2 = number of edges in the dependency graph of g.
S3 = “complexity” of g (including complexity of the polynomials gi and the “distance to being a normal form”).
Let w1 and w2 be (1, 1, …). Then
S1 = (1,1,…)·(1, 1, 0, 1) = 3
S2 = (1,1,…)·(0, 0, 1, 1, 1) = 3
Optimization
Choose parameters a, b, c, d. Minimize the linear functional
F = aS1 + bS2 + cS3 + dS4
on the convex polytope (dynamotope) spanned by all summaries (S1, S2, S3, S4) of models in
f + I.
Optimization
Problem: Don’t know how to describe this polytope.
Solution: Combinatorial optimization using an evolutionary algorithm.
Evolutionary algorithm
For f=(f1, … ,fn)
Gene = fi
Chromosome = f
Genotype = {f}
Evolution
Step 1: Choose an initial genotype
Step 2: Use mutation, cross-over of fittest models (with respect to linear functional F) to compute the next generation genotype
Step 3: Iterate many times
Step 4: Choose local/global minimum (if found)
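A toy sketch of this loop, not the authors' implementation. Each chromosome picks, for every coordinate $i$, one candidate $f_i$ from a hypothetical finite pool, and a hypothetical cost function stands in for the linear functional $F$. The fitter half survives each generation (so the best cost never gets worse), and the rest is refilled by one-point crossover and mutation.

```python
import random

def evolve(pools, cost, pop_size=20, generations=50, mutation_rate=0.2, seed=0):
    """Minimize cost(chromosome), where chromosome[i] indexes into pools[i]."""
    rng = random.Random(seed)
    n = len(pools)
    population = [[rng.randrange(len(pool)) for pool in pools]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[: pop_size // 2]          # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0   # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n):                          # random mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(len(pools[i]))
            children.append(child)
        population = parents + children
    return min(population, key=cost)

# Hypothetical pools: the cost of each candidate f_i (e.g. a complexity score).
pools = [[5, 3, 8], [4, 1], [7, 2, 6]]
cost = lambda ch: sum(pool[g] for pool, g in zip(pools, ch))
best = evolve(pools, cost)
print(best, cost(best))
```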
Future work
• Optimal parameter choices for different biological problems
• Further validation of the algorithm with real and simulated data sets
• Characterize the dynamotope computationally
• Study optimal experimental design for this type of network inference