Algebraic Problems in Graphical Modeling
Mathias Drton
Department of Statistics, University of Chicago
Outline
1 What (roughly) are graphical models?
  a.k.a. Markov random fields, Bayesian networks, ...
2 Gaussian models on undirected graphs
  - Determining conditional independences
  - Maximum likelihood estimation
3 Gaussian models on directed graphs
  - Model (Markov) equivalence
  - Parameter identification

[Drton, Sturmfels and Sullivant: Lectures on Algebraic Statistics, Oberwolfach Seminars, Vol. 39, Birkhäuser, Basel, 2009]
What (roughly) are graphical models?
Data: realizations of random variables X1, ..., Xp.

Statistical model: a family of candidates for the joint distribution of (X1, ..., Xp).

Graphical model: a statistical model associated with a graph that has X1, ..., Xp as nodes.

Points of view:

(i) The density function factors over the graph:

    f(x1, x2, x3) = g(x1, x2) h(x2, x3)

(ii) Non-adjacent random variables are 'somehow' independent:

    X1 independent of X3 given X2
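A small numerical check of (i) ⇒ (ii), added here purely for illustration (binary variables, random positive factors g and h of my choosing):

```python
# If f(x1,x2,x3) = g(x1,x2) * h(x2,x3), then X1 _||_ X3 | X2.
# Verify numerically for binary variables and random positive factors.
import numpy as np

rng = np.random.default_rng(0)
g = rng.random((2, 2))                 # g(x1, x2) > 0, otherwise arbitrary
h = rng.random((2, 2))                 # h(x2, x3) > 0, otherwise arbitrary

f = np.einsum("ij,jk->ijk", g, h)      # f[x1,x2,x3] = g[x1,x2] * h[x2,x3]
f /= f.sum()                           # normalize to a probability table

for x2 in range(2):
    cond = f[:, x2, :] / f[:, x2, :].sum()               # P(X1, X3 | X2 = x2)
    prod = np.outer(cond.sum(axis=1), cond.sum(axis=0))  # product of marginals
    assert np.allclose(cond, prod)
print("factorization implies X1 _||_ X3 | X2")
```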
What are graphical models good for?
Very literal application: Inference of networks
(Sachs et al., 2005, Science)
Suitably sparse graphs yield scalable models (e.g., think of computing with 100 binary variables).
Graph helps structure computations.
Combinatorial answers to statistical questions.
Gaussian/Multivariate normal distribution
Let µ ∈ Rp be any vector.
Let Σ ∈ Rp×p be a positive definite matrix.
Definition
The distribution with probability density function
f(x) = 1 / √( (2π)^p det(Σ) ) · exp( −(1/2) (x − µ)^T Σ^{-1} (x − µ) ),   x ∈ R^p,
is called the Gaussian or multivariate normal distribution with mean µ and covariance matrix Σ; in symbols, Np(µ, Σ).
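A quick sanity check of the formula (assuming numpy and scipy are available), comparing a by-hand evaluation with scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

p = 3
mu = np.array([1.0, 0.0, -1.0])
A = np.random.default_rng(1).random((p, p))
Sigma = A @ A.T + p * np.eye(p)        # a positive definite covariance matrix

x = np.array([0.5, -0.2, 0.3])
quad = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)
by_hand = np.exp(-quad / 2) / np.sqrt((2 * np.pi) ** p * np.linalg.det(Sigma))

assert np.isclose(by_hand, multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```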
Undirected Gaussian models
[Figure: 4-cycle X1 — X2 — X3 — X4 — X1]
Inverse covariance matrix
Σ^{-1} = [ σ11  σ12   0   σ14 ]
         [ σ12  σ22  σ23   0  ]
         [  0   σ23  σ33  σ34 ]
         [ σ14   0   σ34  σ44 ]
Because the exponent of the Gaussian density is

    z^T Σ^{-1} z = ∑_{i=1}^4 ∑_{j=1}^4 σij zi zj,
we have the density factorization
    f(x1, ..., x4) = g12(x1, x2) g23(x2, x3) g34(x3, x4) g14(x1, x4).
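A numpy sketch of the zero pattern (entries chosen by me for illustration): place zeros in Σ^{-1} at the non-edges (1,3) and (2,4); Σ itself is dense, but inverting recovers the zeros.

```python
import numpy as np

# Precision matrix for the 4-cycle: zeros exactly at the non-edges (1,3), (2,4).
K = np.array([[2.0, 0.3, 0.0, 0.4],
              [0.3, 2.0, 0.5, 0.0],
              [0.0, 0.5, 2.0, 0.6],
              [0.4, 0.0, 0.6, 2.0]])   # diagonally dominant => positive definite
Sigma = np.linalg.inv(K)

print(np.round(Sigma, 3))                  # dense: no marginal independences
print(np.round(np.linalg.inv(Sigma), 3))   # the zeros at (1,3), (2,4) reappear
```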
Reading off conditional independences
If Xi and Xj are non-adjacent in the graph, then
    (Σ^{-1})_{ij} = 0 ⇐⇒ Xi ⊥⊥ Xj | {Xk : k ≠ i, j}
Are there other conditional independences? Check:

    XA ⊥⊥ XB | XC ⇐⇒ rank(Σ_{(A∪C)×(B∪C)}) ≤ |C|,

for pairwise disjoint sets A, B, C ⊂ {1, ..., p}.
Theorem (Global Markov property)
If Np(µ,Σ) is in a graphical model, then
C separates A and B in the graph =⇒ XA⊥⊥XB | XC ,
and equivalence holds for generic distributions in the model.
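The rank condition is easy to test numerically; a sketch for the 4-cycle (same illustrative Σ^{-1} as above), checking X1 ⊥⊥ X3 | {X2, X4}:

```python
import numpy as np

K = np.array([[2.0, 0.3, 0.0, 0.4],
              [0.3, 2.0, 0.5, 0.0],
              [0.0, 0.5, 2.0, 0.6],
              [0.4, 0.0, 0.6, 2.0]])
Sigma = np.linalg.inv(K)

A, B, C = [0], [2], [1, 3]             # 0-based indices: X1, X3, {X2, X4}
sub = Sigma[np.ix_(A + C, B + C)]      # rows A u C, columns B u C
print(np.linalg.matrix_rank(sub) <= len(C))   # True: {X2, X4} separates X1, X3
```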
Maximum likelihood estimation
Optimize the log-likelihood function
    Σ^{-1} ↦ log det(Σ^{-1}) − trace(Σ^{-1} · S),
where S is a data-derived positive definite matrix.
How difficult is this? What is the algebraic degree of the 'likelihood equations'?
Theorem
The following two statements are equivalent:
(i) The ML estimator in the Gaussian graphical model associated with the graph G is a rational function of S.
(ii) The graph G is chordal.
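For intuition, a minimal numerical sketch of this optimization (generic black-box maximization, not the rational formulas the theorem is about): fit the 4-cycle model by optimizing over concentration matrices with the required zeros.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
S = np.cov(rng.standard_normal((500, 4)), rowvar=False)   # sample covariance

edges = [(0, 1), (1, 2), (2, 3), (0, 3)]   # 4-cycle; (0,2) and (1,3) absent

def unpack(theta):
    K = np.diag(theta[:4])
    for t, (i, j) in zip(theta[4:], edges):
        K[i, j] = K[j, i] = t
    return K

def neg_loglik(theta):
    K = unpack(theta)
    sign, logdet = np.linalg.slogdet(K)
    if sign <= 0:
        return np.inf                      # reject non-positive-definite iterates
    return -(logdet - np.trace(K @ S))

theta0 = np.concatenate([np.ones(4), np.zeros(len(edges))])
res = minimize(neg_loglik, theta0, method="Nelder-Mead")
print(np.round(unpack(res.x), 3))          # fitted K-hat; non-edges stay zero
```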
More on ML estimation −→ Caroline Uhler
Directed graphs
[Figure: DAG with edges X1 → X2, X1 → X3, X2 → X3, X3 → X4]
Structural/regression equations:
X1 = ε1,
X2 = λ12X1 + ε2,
X3 = λ13X1 + λ23X2 + ε3,
X4 = λ34X3 + ε4,
with independent errors εi ∼ N (0, ωi ).
So,

    [ X1 ]   [ 1  −λ12  −λ13    0  ]^{-T}  [ ε1 ]
    [ X2 ] = [ 0    1   −λ23    0  ]       [ ε2 ]
    [ X3 ]   [ 0    0     1   −λ34 ]       [ ε3 ]
    [ X4 ]   [ 0    0     0     1  ]       [ ε4 ]

i.e., X = (I − Λ)^{-T} ε, where Λ holds the edge coefficients λij.
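A simulation sketch of these equations (parameter values made up for illustration); rows of X below are draws of (X1, ..., X4):

```python
import numpy as np

Lambda = np.zeros((4, 4))                  # Lambda[i, j] = lambda_{i+1, j+1}
Lambda[0, 1], Lambda[0, 2], Lambda[1, 2], Lambda[2, 3] = 0.5, -0.3, 0.8, 1.2
omega = np.array([1.0, 0.5, 2.0, 1.0])     # error variances omega_i

rng = np.random.default_rng(3)
eps = rng.normal(scale=np.sqrt(omega), size=(100_000, 4))
X = eps @ np.linalg.inv(np.eye(4) - Lambda)   # row-wise form of (I-Lambda)^{-T} eps
print(np.round(np.cov(X, rowvar=False), 2))   # compare with Sigma on the next slide
```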
Directed Gaussian models
[Figure: DAG with edges X1 → X2, X1 → X3, X2 → X3, X3 → X4]

A Gaussian distribution lies in the directed graphical model if

    Σ = (I − Λ)^{-T} Ω (I − Λ)^{-1}

for Λ ∈ R^E and Ω ≻ 0 diagonal.
Factorization:
    f(x) = ∏_{i=1}^p fi(xi | x_{pa(i)}) = f1(x1) f2(x2 | x1) f3(x3 | x1, x2) f4(x4 | x3)
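The parametrization is easy to evaluate; a sketch with the same illustrative Λ, Ω as in the simulation above (its output should match the empirical covariance there up to Monte Carlo error):

```python
import numpy as np

Lambda = np.zeros((4, 4))
Lambda[0, 1], Lambda[0, 2], Lambda[1, 2], Lambda[2, 3] = 0.5, -0.3, 0.8, 1.2
Omega = np.diag([1.0, 0.5, 2.0, 1.0])

B = np.linalg.inv(np.eye(4) - Lambda)   # (I - Lambda)^{-1}
Sigma = B.T @ Omega @ B                 # (I - Lambda)^{-T} Omega (I - Lambda)^{-1}
print(np.round(Sigma, 2))
```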
Are models associated with different graphs different?
Model equivalence
It is useful to obtain an implicit description of the image of

    φ_G(Λ, Ω) = (I − Λ)^{-T} Ω (I − Λ)^{-1}.

The model is cut out by conditional independences: d-separation.
Theorem
The images of φ_{G1} and φ_{G2} for two acyclic digraphs G1 = (V, E1) and G2 = (V, E2) are the same if and only if G1 and G2 have
(i) same skeleton, and
(ii) same unshielded colliders (induced subgraphs u → v ← w).
    X1 → X2 → X3  ≡  X1 ← X2 ← X3   (same skeleton, no unshielded colliders)
    X1 → X2 ← X3  ≢  X1 → X2 → X3   (the collider at X2 is unshielded)
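The criterion is purely combinatorial and easy to implement; a sketch with DAGs given as edge lists (helper names are my own):

```python
def skeleton(edges):
    return {frozenset(e) for e in edges}

def unshielded_colliders(edges):
    skel = skeleton(edges)
    return {(frozenset({u, w}), v)
            for (u, v) in edges for (w, v2) in edges
            if v == v2 and u != w and frozenset({u, w}) not in skel}

def markov_equivalent(e1, e2):
    return (skeleton(e1) == skeleton(e2)
            and unshielded_colliders(e1) == unshielded_colliders(e2))

chain     = [(1, 2), (2, 3)]   # X1 -> X2 -> X3
backwards = [(2, 1), (3, 2)]   # X1 <- X2 <- X3
collider  = [(1, 2), (3, 2)]   # X1 -> X2 <- X3
print(markov_equivalent(chain, backwards))   # True
print(markov_equivalent(chain, collider))    # False
```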
Hidden/unobserved variables
Hidden variables: project Σ onto the principal submatrix indexed by the observed variables.
Example: ‘Verma graph’
[Figure: the 'Verma graph' on X1, ..., X5; X1 is hidden]
Are models still cut out by conditional independence?
Verma graph: relations in Σ_{{2,3,4,5}×{2,3,4,5}} generated by

    σ23σ24σ25σ34 − σ22σ25σ34^2 − σ23σ24^2σ35 + σ22σ24σ34σ35
    − σ23^2σ25σ44 + σ22σ25σ33σ44 + σ23^2σ24σ45 − σ22σ24σ33σ45.
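A numerical check of the quartic. The edge set below is my assumption (the usual Verma configuration 2 → 3 → 4 → 5 with hidden 1 → 3 and 1 → 5, which is consistent with the relation λ45σ24 − σ25 on the identification slide); for random parameter values the polynomial evaluates to zero up to rounding.

```python
import numpy as np

rng = np.random.default_rng(4)
Lambda = np.zeros((5, 5))
for (i, j) in [(1, 2), (2, 3), (3, 4), (0, 2), (0, 4)]:  # 0-based, assumed edges
    Lambda[i, j] = rng.normal()
Omega = np.diag(rng.random(5) + 0.5)

B = np.linalg.inv(np.eye(5) - Lambda)
S = B.T @ Omega @ B                    # full 5x5 covariance; X1 is hidden

def s(i, j):                           # 1-based entries of Sigma
    return S[i - 1, j - 1]

verma = (s(2,3)*s(2,4)*s(2,5)*s(3,4) - s(2,2)*s(2,5)*s(3,4)**2
         - s(2,3)*s(2,4)**2*s(3,5) + s(2,2)*s(2,4)*s(3,4)*s(3,5)
         - s(2,3)**2*s(2,5)*s(4,4) + s(2,2)*s(2,5)*s(3,3)*s(4,4)
         + s(2,3)**2*s(2,4)*s(4,5) - s(2,2)*s(2,4)*s(3,3)*s(4,5))
print(verma)                           # ~0 if the assumed edge set is right
```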
More on related topics
−→ Jan Draisma, Kelli Talaska & Thomas Richardson
Identification
The parametrization map φ_G for an acyclic digraph G is injective.

Statistical inference about all parameters is possible because we can recover (= identify) all λij and ωi from the covariance matrix Σ.
What about hidden variables?
Identifiability questions can be answered by studying the ideal

    ⟨ σij − [(I − Λ)^{-T} Ω (I − Λ)^{-1}]_{ij} : i ≤ j observed ⟩ ⊂ R[Σ, Λ, Ω].
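In the fully observed case the recovery is classical: regressing each node on its parents reads the λ's and ω's off Σ. A sketch for the four-node DAG from the structural equations above (illustrative parameter values):

```python
import numpy as np

Lambda = np.zeros((4, 4))
Lambda[0, 1], Lambda[0, 2], Lambda[1, 2], Lambda[2, 3] = 0.5, -0.3, 0.8, 1.2
Omega = np.diag([1.0, 0.5, 2.0, 1.0])
B = np.linalg.inv(np.eye(4) - Lambda)
Sigma = B.T @ Omega @ B                    # the "observed" covariance matrix

parents = {0: [], 1: [0], 2: [0, 1], 3: [2]}   # pa(i) in 0-based indexing
for i, pa in parents.items():
    if pa:
        lam = np.linalg.solve(Sigma[np.ix_(pa, pa)], Sigma[np.ix_(pa, [i])])
        omega_i = Sigma[i, i] - (lam.T @ Sigma[np.ix_(pa, [i])]).item()
        print(i + 1, lam.ravel(), omega_i)     # recovers lambda_{ji}, omega_i
    else:
        print(i + 1, [], Sigma[i, i])          # source node: omega_i = sigma_ii
```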
Identification
Example: ‘Verma graph’
[Figure: the 'Verma graph' on X1, ..., X5; X1 is hidden]
The ideal contains, in particular,

    λ45σ24 − σ25,

    λ45(σ22σ33σ44 − σ22σ34^2 − σ23^2σ44) + σ23σ25σ34 − σ23σ24σ35
    + σ22σ34σ35 + σ23^2σ45 − σ22σ33σ45.
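A numerical companion to the first relation, under the same assumed Verma edge set as above: λ45 is identified from the observed covariance as σ25/σ24.

```python
import numpy as np

rng = np.random.default_rng(5)
Lambda = np.zeros((5, 5))
for (i, j) in [(1, 2), (2, 3), (3, 4), (0, 2), (0, 4)]:  # assumed Verma edges
    Lambda[i, j] = rng.normal()
B = np.linalg.inv(np.eye(5) - Lambda)
S = B.T @ np.diag(rng.random(5) + 0.5) @ B

print(S[1, 4] / S[1, 3], Lambda[3, 4])   # sigma_25 / sigma_24 equals lambda_45
```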
More on identification
−→ Luis Garcia & Rina Foygel (Gaussian models)
−→ Jason Morton & Marco Valtorta (Discrete models)