Parameter identifiability of discrete DAG models
with latent variables
John A. Rhodes
Algebraic Statistics 2014
IIT, May 19 – 22
Note: AK ⊂ US, P(x ∈ AK|x ∈ US) ≈ .16, P(p ∈ AK|p ∈ US) ≈ .0023, P(c ∈ AK|c ∈ US) ≈ 1
Thanks to those who made AS2014 possible:
Local Organizers: Sonja Petrovic, Despina Stasi
Program Committee: Stephen Fienberg, Sonja Petrovic, Seth
Sullivant, Henry Wynn, Ruriko Yoshida
IIT grad students: Weronika J. Swiechowicz, Carlo Pierandozzi,
Martin Dillon, Kawkab Alhejoj, Dane Wilburne, Junyu He
IIT undergrads: Xintong Li, Meng (Mamie) Wang
Discrete DAG identifiability 2/29
Collaborators:
Elizabeth Allman, Mathematics, UAF
Elena Stanghellini, Statistics, Perugia
Marco Valtorta, Computer Science, South Carolina
Example:
[Figure: DAG on nodes 0 to 7; node 0 is the latent variable]
Variables Xi have finite state spaces, of size ni .
Q: Are model parameters identifiable?
Parameters: Conditional probabilities P(Xi | pa(Xi))
Joint Distribution: $\sum_{X_0} \prod_i P(X_i \mid \mathrm{pa}(X_i))$
Identifiability: The joint distribution of observable variables
determines the parameters (up to...)
Identifiability:
1) Parameterization is polynomial, so focus on generic behavior.
(generic complex? real? stochastic?)
2) Latent variables ⇒ “label-swapping”
⇒ n0!-to-1 parametrization, at best
Q’: Is the parameterization generically k-to-1 for some finite k?
If so, characterize the fibers of the parameterization.
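Label swapping can be checked numerically. The sketch below assumes a binary star model (latent X0 with three observable children); the parameter values and the helper name observable_joint are hypothetical, chosen only for illustration. Permuting the latent states changes the parameters but not the observable joint distribution.

```python
import numpy as np

# Hypothetical binary star model: latent X0 with observable children X1, X2, X3.
pi = np.array([0.4, 0.6])                    # P(X0)
M1 = np.array([[0.8, 0.2], [0.3, 0.7]])      # row h is P(X1 | X0 = h)
M2 = np.array([[0.9, 0.1], [0.2, 0.8]])      # row h is P(X2 | X0 = h)
M3 = np.array([[0.7, 0.3], [0.4, 0.6]])      # row h is P(X3 | X0 = h)

def observable_joint(pi, M1, M2, M3):
    """Sum the latent state out of P(X0) P(X1|X0) P(X2|X0) P(X3|X0)."""
    return np.einsum("h,hi,hj,hk->ijk", pi, M1, M2, M3)

swap = [1, 0]  # relabel the two latent states
joint = observable_joint(pi, M1, M2, M3)
joint_swapped = observable_joint(pi[swap], M1[swap], M2[swap], M3[swap])

same_joint = np.allclose(joint, joint_swapped)
different_params = not np.allclose(pi, pi[swap])
print(same_joint, different_params)
```

With n0 latent states there are n0! such relabelings, which is why n0!-to-1 is the best one can hope for.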
Common practical approach:
With J = Jacobian of parameterization, N=dim(parameter space)
compute rank(J) at many random points
• If rank (J) < N everywhere, parameters not identifiable (∞-to-1).
• If rank (J) = N, then parameters locally identifiable
• Since local identifiability ⇏ global identifiability, assume/hope
label swapping is the only issue, so that k = n0!.
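A minimal sketch of the rank test, assuming binary variables and a star-shaped model (latent X0 with 2 or 3 observable children). The parameter layout in theta, the finite-difference Jacobian, the evaluation points and the rank tolerance are all illustrative assumptions, not part of the talk. A single point is evaluated per model here; in practice one would repeat at many random points.

```python
import numpy as np

def star_joint(theta, n_leaves):
    """Observable joint of a binary star model, flattened to a vector.

    theta = [P(X0=0), then for each leaf i: P(Xi=0|X0=0), P(Xi=0|X0=1)].
    """
    pi = np.array([theta[0], 1.0 - theta[0]])
    t = pi  # axis 0 is the latent state; one axis is appended per leaf
    for i in range(n_leaves):
        a, b = theta[1 + 2 * i], theta[2 + 2 * i]
        M = np.array([[a, 1.0 - a], [b, 1.0 - b]])
        t = t[..., None] * M.reshape((2,) + (1,) * i + (2,))
    return t.sum(axis=0).ravel()  # marginalize the latent variable

def numerical_jacobian(f, theta, h=1e-6):
    """Central-difference Jacobian of f at theta."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        cols.append((f(theta + step) - f(theta - step)) / (2 * h))
    return np.column_stack(cols)

theta2 = [0.4, 0.8, 0.3, 0.9, 0.2]   # 2 leaves, N = 5
theta3 = theta2 + [0.7, 0.4]         # 3 leaves, N = 7

J2 = numerical_jacobian(lambda th: star_joint(th, 2), theta2)
J3 = numerical_jacobian(lambda th: star_joint(th, 3), theta3)
rank2 = np.linalg.matrix_rank(J2, tol=1e-6)
rank3 = np.linalg.matrix_rank(J3, tol=1e-6)
print("2 leaves: rank", rank2, "of N =", len(theta2))
print("3 leaves: rank", rank3, "of N =", len(theta3))
```

A rank below N indicates an ∞-to-1 parameterization near that point; a rank equal to N indicates only local identifiability and says nothing about the size k of the fibers.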
For any specific DAG and finite state spaces, one can (try to)
answer this question with computational algebra, but....
Q″: What graphical criteria address identifiability?
Cf: "do"-calculus for identifiability of causal effects – determines
exactly what is identifiable, gives rational formulas.
— for nonparametric latent variables —
Simple DAGs:
[Figure: latent node 0 with observable children 1 and 2]
$P(X_1, X_2) = M_1^T D M_2$
D = diag(P(X0)), M1 = P(X1 | X0), M2 = P(X2 | X0)
Non-uniqueness of matrix factorization
⇒ ∞-to-1 parameterization
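The non-uniqueness can be made concrete. In the sketch below (binary variables, hypothetical numbers), a second, genuinely different stochastic parameter set is built that reproduces the same joint matrix P(X1,X2): pick a nearby M1, then solve for the remaining factors.

```python
import numpy as np

# Original (hypothetical) parameters; rows of M1, M2 are indexed by the latent state.
D = np.diag([0.4, 0.6])                      # diag(P(X0))
M1 = np.array([[0.8, 0.2], [0.3, 0.7]])      # P(X1 | X0)
M2 = np.array([[0.9, 0.1], [0.2, 0.8]])      # P(X2 | X0)
P = M1.T @ D @ M2                            # observable joint P(X1, X2)

# Choose a different stochastic M1 and solve M1_alt^T Q = P for Q = D_alt M2_alt.
M1_alt = np.array([[0.75, 0.25], [0.3, 0.7]])
Q_alt = np.linalg.solve(M1_alt.T, P)         # Q_alt[h, j] plays the role of P(X0=h, X2=j)
D_alt = np.diag(Q_alt.sum(axis=1))
M2_alt = Q_alt / Q_alt.sum(axis=1, keepdims=True)

P_alt = M1_alt.T @ D_alt @ M2_alt
print(np.allclose(P, P_alt), bool((Q_alt > 0).all()))
```

Any small enough perturbation of M1 works the same way, so the fiber through a generic point is infinite.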
Simple DAGs:
Star model — tensor decomposition
[Figure: latent node 0 with observable children 1, 2, 3]
Kruskal's Theorem: The decomposition of a generic 3-way tensor is unique
if n1, n2, n3 are sufficiently large relative to n0:
min(n0, n1) + min(n0, n2) + min(n0, n3) ≥ 2n0 + 2
⇒ Parameters are identifiable, up to label swapping.
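For the binary star model, uniqueness can be seen without the full strength of Kruskal's theorem. The sketch below uses a simpler simultaneous-diagonalization argument (not Kruskal's proof) with hypothetical parameter values: an eigendecomposition built from the observable tensor recovers P(X1 | X0) and P(X3 = 0 | X0), in an unknown order of the latent states.

```python
import numpy as np

# Hypothetical true parameters of a binary star model 0 -> 1, 2, 3.
pi = np.array([0.4, 0.6])                    # P(X0)
M1 = np.array([[0.8, 0.2], [0.3, 0.7]])      # row h is P(X1 | X0 = h)
M2 = np.array([[0.9, 0.1], [0.2, 0.8]])
M3 = np.array([[0.7, 0.3], [0.4, 0.6]])

# Observable tensor T[i, j, k] = P(X1 = i, X2 = j, X3 = k).
T = np.einsum("h,hi,hj,hk->ijk", pi, M1, M2, M3)

P12 = T.sum(axis=2)               # equals M1^T D M2
slice0 = T[:, :, 0]               # equals M1^T D diag(P(X3=0|X0)) M2
A = slice0 @ np.linalg.inv(P12)   # equals M1^T diag(P(X3=0|X0)) M1^{-T}

eigvals, eigvecs = np.linalg.eig(A)
eigvals, eigvecs = eigvals.real, eigvecs.real
# Eigenvectors are columns of M1^T up to scale; rescale each to sum to 1.
M1_rec = (eigvecs / eigvecs.sum(axis=0)).T

order = np.argsort(eigvals)           # fix an ordering to compare with the truth
true_order = np.argsort(M3[:, 0])
print(np.allclose(eigvals[order], M3[true_order, 0]))
print(np.allclose(M1_rec[order], M1[true_order]))
```

The eigenvalues must be distinct for this to work, which is a generic condition; the unknown ordering of the eigenvalues is exactly the label swap.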
Example (Kuroki and Pearl 2014)
[Figure: DAG with latent node 0 and observable nodes 1, 2, 3, 4]
By do-calculus, P(X3 | do(X2)) is not identifiable.
But...
if X0 has a finite state space and X1, X4 have larger state spaces,
P(X3 | do(X2)) is identifiable.
In fact, all parameters are identifiable, up to label swapping.
[Figure: DAG with latent node 0 and observable nodes 1, 2, 3, 4]
• reverse 1 → 2, Markov equivalent model
• condition on X2: this gives a generic Kruskal model with the "same"
parameters
• identify P(X4 | X0), up to label swap
• Solve P(X1,X2,X3;X4) = P(X1,X2,X3;X0)P(X4 | X0)
for P(X1,X2,X3;X0) to “uncover” latent
• From P(X0,X1,X2,X3) find remaining parameters.
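The "uncover the latent" step is a linear solve once P(X4 | X0) is known. A minimal sketch, assuming n0 = 2 latent states, n4 = 3 states for X4, and 8 joint states for (X1, X2, X3); the sizes, the random joint distribution and the matrix M4 are all hypothetical. The solve needs P(X4 | X0) to have full row rank, which is where n4 ≥ n0 is used.

```python
import numpy as np

rng = np.random.default_rng(0)
n0, n4, n_rows = 2, 3, 8   # n_rows = number of joint states of (X1, X2, X3)

# Hypothetical joint distribution of (X1, X2, X3) with the latent X0,
# stored as a matrix with rows (x1, x2, x3) and columns x0.
joint_with_latent = rng.dirichlet(np.ones(n_rows * n0)).reshape(n_rows, n0)

# Hypothetical P(X4 | X0): rows indexed by X0, full row rank.
M4 = np.array([[0.6, 0.3, 0.1],
               [0.1, 0.2, 0.7]])

# Observable matrix P(X1,X2,X3; X4) = P(X1,X2,X3; X0) P(X4 | X0).
observed = joint_with_latent @ M4

# Solve for the matrix with the latent uncovered, using a right inverse of M4.
recovered = observed @ np.linalg.pinv(M4)
print(np.allclose(recovered, joint_with_latent))
```

Since P(X4 | X0) is itself only identified up to a label swap, the recovered matrix carries the same ambiguity in the order of its columns.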
More generally...
To gain insight, consider all DAG models with:
• 1 latent, parent of at most 4 observables
• binary variables
Goals:
• Develop algebraic arguments not tied to binary case
• Reduce more complex DAGs to these... (more later)
Is conditioning/marginalizing/Kruskal enough to successfully
analyze these?
Almost ....
(Graph drawings not reproduced. A = number of observable variables.)
Model         | dim(Θ) | 2^A − 1 | k
2-B, B ≥ 0    | ≥ 5    | 3       | ∞
3-0           | 7      | 7       | 2
3-Bx, B ≥ 1   | ≥ 9    | 7       | ∞
4-0           | 9      | 15      | 2
4-1           | 11     | 15      | 2
4-2a          | 13     | 15      | ∞
4-2b,c        | 13     | 15      | 2
4-2d          | 15     | 15      | 2
4-3a,b (A)    | 15     | 15      | 2
4-3c,d        | 17     | 15      | ∞
4-3e,f (B)    | 15     | 15      | 4
4-3g          | 17     | 15      | ∞
4-3h          | 25     | 15      | ∞
4-3i          | 25     | 15      | ∞
4-Bx, B ≥ 4   | ≥ 19   | 15      | ∞
2 interesting cases...
Model A (binary)
With binary variables, the parameterization for
[Figure: Model A graph, with latent node 0 and observable nodes 2, 1, 3, 4]
is generically 2-to-1 on stochastic parameter space.
This model is not reducible to Kruskal.
Sketch:
• Condition on X1,X3 (4 ways), to give 4 matrices
• Construct expressions in these matrices whose eigenvectors
identify parameters.
• Need generic condition: distinct eigenvalues. Equivalently:
There is a 3-way interaction between X0,X1,X3
Why is the 3-way interaction needed?
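[Figure: DAG with latent node 0 and observable nodes 2, 1, 3, 4]
has ∞-to-1 parametrization. So
[Figure: DAG with latent node 0 and observable nodes 2, 1, 3, 4, 5]
does as well.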
But conditioning
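[Figure: DAG with latent node 0 and observable nodes 2, 1, 3, 4, 5]
on X5 yields
[Figure: DAG with latent node 0 and observable nodes 2, 1, 3, 4]
still with an ∞-to-1 parameterization.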
Contradiction, since
[Figure: Model A graph, with latent node 0 and observable nodes 2, 1, 3, 4]
has a 2-to-1 parameterization (Model A).
FLAW: Conditioning gave a non-generic instance, with no 3-way
interaction between X0, X1, X3, so there is no contradiction.
Moral 1: Conditioning must be done carefully, to give a generic
model.
Moral 2: Frameworks such as summary graphs and maximal
ancestral graphs, which graphically depict some consequences of
conditioning, are not helpful here: they do not yield generic instances.
Model A (general)
The model
[Figure: Model A graph, with latent node 0 and observable nodes 2, 1, 3, 4]
is generically identifiable, up to label swapping, provided
n2, n4 ≥ n0
Model B (binary)
With binary variables, the parameterization for
[Figure: Model B graph, with latent node 0 and observable nodes 2, 1, 3, 4]
is generically 4-to-1 on stochastic parameter space,
– not just label swapping –
Sketch:
• Conditioning a generic model on X1 yields 2 generic star models
[Figure: latent node 0 with observable children 1, 2, 3]
• These each have 2-to-1 parameterizations.
• Any of the 4 choices of parameters for them can be “combined”
to give parameters for the original model.
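The sketch can be checked numerically. The code below assumes Model B has edges 0 → 1, 2, 3, 4 and 1 → 2, 3, 4 (consistent with the Model B example later in the talk, where C = {1} and u = 1); all numbers and helper names are hypothetical. Relabeling the latent states only in the slice X1 = 0 gives new parameters with the same observable joint, and the change is not a global label swap.

```python
import numpy as np

rng = np.random.default_rng(1)

# w[h, a] = P(X0 = h, X1 = a); this encodes P(X0) and P(X1 | X0):
# P(X0) = w.sum(axis=1), P(X1 | X0) = w / w.sum(axis=1, keepdims=True).
w = np.array([[0.1, 0.3],
              [0.4, 0.2]])
# Ci[h, a, :] = P(Xi | X0 = h, X1 = a) for i = 2, 3, 4 (random, hypothetical).
C2, C3, C4 = (rng.dirichlet(np.ones(2), size=(2, 2)) for _ in range(3))

def observable_joint(w, C2, C3, C4):
    """P(X1, X2, X3, X4) with the latent X0 summed out."""
    return np.einsum("ha,hai,haj,hak->aijk", w, C2, C3, C4)

def swap_latent_in_slice(arrays, a):
    """Relabel the two latent states only where X1 = a."""
    out = [arr.copy() for arr in arrays]
    for arr in out:
        arr[:, a] = arr[::-1, a].copy()
    return out

w_alt, C2_alt, C3_alt, C4_alt = swap_latent_in_slice([w, C2, C3, C4], a=0)

same_joint = np.allclose(observable_joint(w, C2, C3, C4),
                         observable_joint(w_alt, C2_alt, C3_alt, C4_alt))
is_identity = np.allclose(w_alt, w)
is_global_swap = np.allclose(w_alt, w[::-1])
print(same_joint, is_identity, is_global_swap)
print("P(X0) before:", w.sum(axis=1), "after:", w_alt.sum(axis=1))
```

Here the latent marginal changes from (0.4, 0.6) to (0.7, 0.3), which no relabeling of X0 could produce. Swapping in the slice X1 = 0, in the slice X1 = 1, in both, or in neither gives the 4 points of the fiber.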
Model B (general)
If n2, n3, n4 sufficiently large relative to n0, then
[Figure: Model B graph, with latent node 0 and observable nodes 2, 1, 3, 4]
has a $(n_0!)^{n_1}$-to-1 parameterization.
Moreover, a full fiber can be obtained from any single element by
rational formulas.
Large DAG models
If a DAG model has a k-to-1 parameterization, then
k is unchanged if:
• remove observable sinks with all parents observable
• pass to Markov equivalent graphs
k may change if:
• marginalize/condition on observed variables
Cautions:
• Marginalize only over sinks, but risk losing identifiability.
• Condition carefully, to get generic model.
A general result
Building on Model B
[Figure: Model B graph, with latent node 0 and observable nodes 2, 1, 3, 4],
Theorem: Suppose a DAG has one latent node 0 with no parents,
and three observable sinks 1, 2, 3 that are children of 0.
Let
C = Anc (Chd(0) ∩ Anc(1) ∩ Anc(2) ∩ Anc(3)) ∖ {0},
and
u = |C ∩ Pa(1) ∩ Pa(2) ∩ Pa(3)| .
Then for binary variables, the parametrization is generically k-to-1
with $k = 2^{2^u}$;
the potential fiber can be described, and thus k can be determined
exactly.
(non-binary version also)
Example: Model B
[Figure: Model B graph, with latent node 0 and observable nodes 2, 1, 3, 4]
Sinks 2,3,4, all children of 0,
C = Anc (Chd(0) ∩ Anc(2) ∩ Anc(3) ∩ Anc(4)) ∖ {0}
= {1}
u = |C ∩ Pa(2) ∩ Pa(3) ∩ Pa(4)|
= 1
so $2^{2^1} = 4$-to-1 parameterization
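The computation of C, u and k is mechanical given the graph. A minimal sketch, assuming the DAG is stored as a dict from each node to its list of parents, and that Anc(S) includes S itself (an assumption made so that C = {1} comes out as in the Model B example). The function names and the edge list for Model B (0 → 1, 2, 3, 4 and 1 → 2, 3, 4) are illustrative.

```python
def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set()
    stack = list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def fiber_size_binary(parents, latent, sinks):
    """Compute C, u and k = 2**(2**u) from the three-sink statement."""
    children = {v for v, ps in parents.items() if latent in ps}
    common = set(children)
    for s in sinks:
        common &= ancestors(parents, [s])
    C = ancestors(parents, common) - {latent}
    shared_parents = set(C)
    for s in sinks:
        shared_parents &= set(parents[s])
    u = len(shared_parents)
    return C, u, 2 ** (2 ** u)

# Model B, with the edges assumed above.
model_b = {0: [], 1: [0], 2: [0, 1], 3: [0, 1], 4: [0, 1]}
C, u, k = fiber_size_binary(model_b, latent=0, sinks=[2, 3, 4])
print(C, u, k)

# Star model 0 -> 1, 2, 3 for comparison.
star = {0: [], 1: [0], 2: [0], 3: [0]}
C_star, u_star, k_star = fiber_size_binary(star, latent=0, sinks=[1, 2, 3])
print(C_star, u_star, k_star)
```

For the star model this gives u = 0 and k = 2, matching the 2-to-1 label swap of the binary Kruskal case.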
Example: from beginning of talk
[Figure: DAG on nodes 0 to 7; node 0 is the latent variable]
Remove 7 (observable child with observable parents):
[Figure: the same DAG with node 7 removed, on nodes 0 to 6]
Sinks 4,5,6 all children of 0,
C = Anc (Chd(0) ∩ Anc(4) ∩ Anc(5) ∩ Anc(6)) ∖ {0}
= ∅
u = |C ∩ Pa(4) ∩ Pa(5) ∩ Pa(6)|
= 0
so $2^{2^0} = 2$-to-1 parameterization
Final comments:
• A 2-sink theorem is "under development," building on
[Figure: DAG with latent node 0 and observable nodes 2, 1, 3, 4]
• Multiple latent variables with no/limited common children may
be tractable.
• Main impediment to non-binary variables is awkwardness of
statements.