Page 1: Parameter identifiability of discrete DAG models with ...mypages.iit.edu/~as2014/talks/Rhodes.pdf · is generically 4-to-1 on stochastic parameter space, – not just label swapping

Parameter identifiability of discrete DAG models

with latent variables

John A. Rhodes

Algebraic Statistics 2014

IIT, May 19 – 22

Note: AK ⊂ US, P(x ∈ AK|x ∈ US) ≈ .16, P(p ∈ AK|p ∈ US) ≈ .0023, P(c ∈ AK|c ∈ US) ≈ 1

Page 2: Parameter identifiability of discrete DAG models with ...mypages.iit.edu/~as2014/talks/Rhodes.pdf · is generically 4-to-1 on stochastic parameter space, – not just label swapping

Thanks to those who made AS2014 possible:

Local Organizers: Sonja Petrovic, Despina Stasi

Program Committee: Stephen Fienberg, Sonja Petrovic, Seth Sullivant, Henry Wynn, Ruriko Yoshida

IIT grad students: Weronika J. Swiechowicz, Carlo Pierandozzi, Martin Dillon, Kawkab Alhejoj, Dane Wilburne, Junyu He

IIT undergrads: Xintong Li, Meng (Mamie) Wang

Discrete DAG identifiability 2/29


Parameter identifiability of discrete DAG models

with latent variables

Collaborators:

Elizabeth Allman, Mathematics, UAF

Elena Stanghellini, Statistics, Perugia

Marco Valtorta, Computer Science, South Carolina


Example:

[Figure: DAG with latent node 0 and observable nodes 1, 2, 3, 4, 5, 6, 7]

Variables Xi have finite state spaces, of size ni .

Q: Are model parameters identifiable?


Parameters: Conditional probabilities P(Xi | pa(Xi))

Joint Distribution: $\sum_{X_0} \prod_i P(X_i \mid \mathrm{pa}(X_i))$

Identifiability: The joint distribution of observable variables

determines the parameters (up to...)


Identifiability:

1) Parameterization is polynomial, so focus on generic behavior.

(generic complex? real? stochastic?)

2) Latent variables ⇒ “label-swapping”

⇒ n0!-to-1 parametrization, at best

Q’: Is the parameterization generically k-to-1 for some finite k?

If so, characterize the fibers of the parameterization.
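Label swapping can be checked numerically. The sketch below assumes a model not specified on this slide: one latent variable X0 with three observable children, all with three states. The helper names `random_stochastic` and `observable_joint` are hypothetical and only serve the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_stochastic(rows, cols, rng):
    """Return a rows x cols matrix whose rows are probability vectors."""
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)


def observable_joint(pi, m1, m2, m3):
    """P(X1, X2, X3) for a latent X0 with three observable children."""
    # Sum over the latent state h: pi[h] * m1[h, i] * m2[h, j] * m3[h, k]
    return np.einsum("h,hi,hj,hk->ijk", pi, m1, m2, m3)


n0 = 3  # number of latent states (assumed for illustration)
pi = random_stochastic(1, n0, rng)[0]
m1, m2, m3 = (random_stochastic(n0, 3, rng) for _ in range(3))

# Relabel the latent states consistently in every parameter table.
perm = np.array([2, 0, 1])
p_original = observable_joint(pi, m1, m2, m3)
p_swapped = observable_joint(pi[perm], m1[perm], m2[perm], m3[perm])

same_distribution = np.allclose(p_original, p_swapped)
different_parameters = not np.allclose(pi, pi[perm])
```

Each of the n0! permutations of the latent labels gives another parameter point with the same observable distribution, which is why n0!-to-1 is the best one can expect.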


Common practical approach:

With J = Jacobian of parameterization, N=dim(parameter space)

compute rank(J) at many random points

• If rank (J) < N everywhere, parameters not identifiable (∞-to-1).

• If rank (J) = N, then parameters locally identifiable

• Since local identifiability ⇏ global identifiability, assume/hope

label swapping is only issue, k = n0!.
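The rank test can be sketched as follows. This is a minimal illustration, not the procedure used in the talk: it assumes binary variables and a single latent parent with no edges among the observables, and it approximates the Jacobian by central differences. The functions `star_joint`, `numerical_jacobian` and `generic_rank` are hypothetical names.

```python
import numpy as np


def star_joint(theta, n_children):
    """Joint distribution of binary observables with one binary latent parent.

    theta[0] = P(X0 = 0); theta[1 + 2*c + h] = P(X_c = 0 | X0 = h).
    """
    joint = np.array([theta[0], 1.0 - theta[0]])  # indexed by latent state h
    for c in range(n_children):
        p0 = theta[1 + 2 * c: 3 + 2 * c]           # P(X_c = 0 | X0 = h)
        cond = np.stack([p0, 1.0 - p0], axis=1)     # shape (latent, state)
        joint = joint[..., None] * cond.reshape((2,) + (1,) * c + (2,))
    return joint.sum(axis=0).ravel()                # marginalize the latent


def numerical_jacobian(f, theta, eps=1e-6):
    """Central-difference Jacobian of f at theta (one column per parameter)."""
    cols = []
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        cols.append((f(theta + step) - f(theta - step)) / (2 * eps))
    return np.stack(cols, axis=1)


def generic_rank(n_children, trials=20, seed=0):
    """Largest Jacobian rank seen at random points, and the parameter count N."""
    rng = np.random.default_rng(seed)
    n_params = 1 + 2 * n_children
    ranks = []
    for _ in range(trials):
        theta = rng.uniform(0.1, 0.9, size=n_params)
        jac = numerical_jacobian(lambda t: star_joint(t, n_children), theta)
        ranks.append(int(np.linalg.matrix_rank(jac, tol=1e-7)))
    return max(ranks), n_params


rank_two_children = generic_rank(2)    # rank below N: not identifiable
rank_three_children = generic_rank(3)  # rank equal to N: locally identifiable
```

With two children the rank stays below N = 5, while with three children it reaches N = 7. Full rank only shows local identifiability, so it cannot by itself distinguish a 2-to-1 parameterization from a larger finite k.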


For any specific DAG and finite state spaces, one can (try to)

answer this question with computational algebra, but....

Q”: What graphical criteria address identifiability?

Cf: “do”-calculus for identifiability of causal effects – determines

exactly what is identifiable, gives rational formulas.

— for nonparametric latent variables —


Simple DAGs:

[Figure: latent node 0 with observable children 1 and 2]

$P(X_1, X_2) = M_1^T D M_2$

$D = \mathrm{diag}(P(X_0))$, $M_1 = P(X_1 \mid X_0)$, $M_2 = P(X_2 \mid X_0)$

Non-uniqueness of matrix factorization

⇒ ∞-to-1 parameterization
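A short worked count makes the ∞-to-1 claim concrete. It assumes all three variables are binary, which this slide does not state, and the invertible matrix $A$ is introduced only for illustration; not every choice of $A$ yields stochastic parameters.

```latex
% Non-uniqueness of the factorization: for an invertible n_0 x n_0 matrix A,
P(X_1, X_2) = M_1^T D M_2 = \bigl(M_1^T D A\bigr)\bigl(A^{-1} M_2\bigr).

% Parameter count for binary X_0, X_1, X_2 (n_0 = n_1 = n_2 = 2):
\dim \Theta = (n_0 - 1) + n_0 (n_1 - 1) + n_0 (n_2 - 1) = 1 + 2 + 2 = 5,
\qquad
n_1 n_2 - 1 = 3 .
```

Five parameters map into a 3-dimensional family of distributions for $(X_1, X_2)$, so generic fibers have positive dimension.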


Simple DAGs:

Star model — tensor decomposition

[Figure: latent node 0 with observable children 1, 2, 3]

Kruskal’s Theorem: Decomposition of a generic 3-tensor is unique

if n1, n2, n3 are sufficiently large relative to n0:

n1 + n2 + n3 ≥ 2n0 + 2

⇒ Parameters are identifiable, up to label swapping.
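To see identifiability up to label swapping in action, the sketch below recovers P(X1 | X0) from the observable 3-tensor. It does not follow Kruskal's proof; it swaps in a simpler simultaneous-diagonalization argument that needs the stronger assumption n1 = n2 = n0 with invertible conditional matrices. All numerical values and the helper `rows_match_up_to_permutation` are made up for illustration.

```python
import numpy as np

# Illustrative parameters with n0 = n1 = n2 = 3 and n3 = 2 (assumed values).
pi = np.array([0.5, 0.3, 0.2])                     # P(X0)
m1 = np.array([[0.7, 0.2, 0.1],
               [0.2, 0.6, 0.2],
               [0.1, 0.2, 0.7]])                   # rows: P(X1 | X0 = h)
m2 = np.array([[0.6, 0.3, 0.1],
               [0.2, 0.5, 0.3],
               [0.1, 0.3, 0.6]])                   # rows: P(X2 | X0 = h)
m3 = np.array([[0.8, 0.2],
               [0.5, 0.5],
               [0.2, 0.8]])                        # rows: P(X3 | X0 = h)

# Observable distribution P(X1, X2, X3), with the latent summed out.
tensor = np.einsum("h,hi,hj,hk->ijk", pi, m1, m2, m3)

# Slices T_k = M1^T D diag(m3[:, k]) M2, so T_0 T_1^{-1} is similar to a
# diagonal matrix and its eigenvectors are the columns of M1^T.
t0, t1 = tensor[:, :, 0], tensor[:, :, 1]
_, eigvecs = np.linalg.eig(t0 @ np.linalg.inv(t1))
eigvecs = np.real(eigvecs)
m1_recovered = (eigvecs / eigvecs.sum(axis=0)).T   # rescale to probabilities


def rows_match_up_to_permutation(a, b, tol=1e-6):
    """True if the rows of a are the rows of b in some order."""
    used = set()
    for row in a:
        dists = np.abs(b - row).max(axis=1)
        j = int(np.argmin(dists))
        if dists[j] > tol or j in used:
            return False
        used.add(j)
    return True


recovered_ok = rows_match_up_to_permutation(m1_recovered, m1)
```

The eigenvectors come back in an arbitrary order, which is exactly the label-swapping ambiguity: the rows of M1 are determined, but not which latent state each belongs to.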


Example (Kuroki and Pearl 2014)

[Figure: DAG on latent node 0 and observable nodes 1, 2, 3, 4]

By do-calculus, P(X3 | do(X2)) is not identifiable.

But...

If X0 has finite state space, X1,X4 have larger state spaces,

P(X3 | do(X2)) is identifiable.


In fact, all parameters are identifiable, up to label swapping.

[Figure: DAG on latent node 0 and observable nodes 1, 2, 3, 4]

• reverse 1 → 2, Markov equivalent model

• condition on X2: generic Kruskal model with “same” parameters

• identify P(X4 | X0), up to label swap

• Solve P(X1,X2,X3;X4) = P(X1,X2,X3;X0)P(X4 | X0)

for P(X1,X2,X3;X0) to “uncover” latent

• From P(X0,X1,X2,X3) find remaining parameters.
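The "solve" step in the list above is a linear problem once P(X4 | X0) is known. Below is a minimal sketch under assumptions that are not from the slides: binary X1, X2, X3 flattened into 8 rows, n0 = 2, n4 = 3, and made-up numerical values. It treats P(X1,X2,X3;X4) as a matrix with rows indexed by (X1,X2,X3) and columns by X4.

```python
import numpy as np

rng = np.random.default_rng(3)

n0, n4 = 2, 3
n_cells = 8  # joint states of binary (X1, X2, X3), flattened into rows

# Hypothetical "true" joint of (X1, X2, X3) with the latent X0: rows x n0.
joint_with_latent = rng.random((n_cells, n0))
joint_with_latent /= joint_with_latent.sum()

# P(X4 | X0), assumed already identified; full row rank since n4 >= n0.
m4 = np.array([[0.6, 0.3, 0.1],
               [0.1, 0.3, 0.6]])

# Observable matrix: P(X1,X2,X3; X4) = P(X1,X2,X3; X0) P(X4 | X0).
observed = joint_with_latent @ m4

# Solve X @ m4 = observed for X by least squares on the transposed system.
solution, *_ = np.linalg.lstsq(m4.T, observed.T, rcond=None)
recovered = solution.T
```

The solution is unique because P(X4 | X0) has full row rank, which is where the requirement that X4 has at least as many states as X0 enters.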


More generally...

To gain insight, consider all DAG models with:

• 1 latent, parent of at most 4 observables

• binary variables

Goals:

• Develop algebraic arguments not tied to binary case

• Reduce more complex DAGs to these... (more later)

Is conditioning/marginalizing/Kruskal enough to successfully

analyze these?


Almost ....

[Graph column: DAG figures on latent node 0 and the observable nodes, not reproduced]

Model            dim(Θ)    2^A − 1    k
2-B, B ≥ 0       ≥ 5       3          ∞
3-0              7         7          2
3-Bx, B ≥ 1      ≥ 9       7          ∞
4-0              9         15         2
4-1              11        15         2
4-2a             13        15         ∞
4-2b,c           13        15         2
4-2d             15        15         2
4-3a,b (A)       15        15         2
4-3c,d           17        15         ∞
4-3e,f (B)       15        15         4
4-3g             17        15         ∞
4-3h             25        15         ∞
4-3i             25        15         ∞
4-Bx, B ≥ 4      ≥ 19      15         ∞

2 interesting cases...


Model A (binary)

With binary variables, the parameterization for

[Figure: Model A, DAG on latent node 0 and observable nodes 1, 2, 3, 4]

is generically 2-to-1 on stochastic parameter space.

This model is not reducible to Kruskal.


Sketch:

• Condition on X1,X3 (4 ways), to give 4 matrices

• Construct expressions in these matrices whose eigenvectors

identify parameters.

• Need generic condition: distinct eigenvalues. Equivalently:

There is a 3-way interaction between X0,X1,X3


Why is the 3-way interaction needed?

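[Figure: DAG on latent node 0 and observable nodes 1, 2, 3, 4]

has an ∞-to-1 parameterization. So

[Figure: DAG on latent node 0 and observable nodes 1, 2, 3, 4, 5]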

does as well.


But conditioning

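[Figure: DAG on latent node 0 and observable nodes 1, 2, 3, 4, 5]

on X5 yields

[Figure: DAG on latent node 0 and observable nodes 1, 2, 3, 4]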

still with an ∞-to-1 parameterization.


Contradiction, since

[Figure: Model A, DAG on latent node 0 and observable nodes 1, 2, 3, 4]

has a 2-to-1 parameterization (Model A).

FLAW: Conditioning gave a non-generic instance – no 3-way

interaction between 0,1,3 – there is no contradiction


Moral 1: Conditioning must be done carefully, to give a generic

model.

Moral 2: Frameworks such as summary graphs and maximal

ancestral graphs which graphically depict some consequences of

conditioning are not helpful here – don’t get generic instances.


Model A (general)

The model

[Figure: Model A, DAG on latent node 0 and observable nodes 1, 2, 3, 4]

is generically identifiable, up to label swapping, provided

n2, n4 ≥ n0


Model B (binary)

With binary variables, the parameterization for

[Figure: Model B, DAG on latent node 0 and observable nodes 1, 2, 3, 4]

is generically 4-to-1 on stochastic parameter space,

– not just label swapping –

Sketch:

• Conditioning a generic model on X1 yields 2 generic models of the form [Figure: latent node 0 with three observable children]

• These each have 2-to-1 parameterizations.

• Any of the 4 choices of parameters for them can be “combined”

to give parameters for the original model.
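The "combining" step in the sketch above can be illustrated numerically. The code assumes an edge structure for Model B that is inferred rather than shown in this transcript: 0 → 1, 2, 3, 4 and 1 → 2, 3, 4, all binary. It swaps the latent labels only within the X1 = 1 slice and checks that the observable distribution is unchanged. The helpers `cond_table`, `observable_joint` and `swap_slice` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(2)


def cond_table(shape, rng):
    """Random probability table; the last axis sums to 1."""
    t = rng.uniform(0.1, 0.9, size=shape)
    return t / t.sum(axis=-1, keepdims=True)


def observable_joint(p0, p1, p2, p3, p4):
    """P(X1,X2,X3,X4) with X0 -> X1..X4 and X1 -> X2,X3,X4 (assumed edges)."""
    # h = X0, a = X1, and i, j, k = X2, X3, X4
    return np.einsum("h,ha,hai,haj,hak->aijk", p0, p1, p2, p3, p4)


p0 = cond_table((2,), rng)                 # P(X0)
p1 = cond_table((2, 2), rng)               # P(X1 | X0), indexed [h, a]
p2, p3, p4 = (cond_table((2, 2, 2), rng) for _ in range(3))  # [h, a, state]

# Swap the latent labels only in the slice X1 = 1.
joint01 = p0[:, None] * p1                 # P(X0 = h, X1 = a)
joint01_new = joint01.copy()
joint01_new[:, 1] = joint01[::-1, 1]
p0_new = joint01_new.sum(axis=1)
p1_new = joint01_new / p0_new[:, None]


def swap_slice(t):
    """Swap latent labels in the X1 = 1 slice of a table indexed [h, a, x]."""
    t_new = t.copy()
    t_new[:, 1, :] = t[::-1, 1, :]
    return t_new


p2_new, p3_new, p4_new = map(swap_slice, (p2, p3, p4))

same_observables = np.allclose(
    observable_joint(p0, p1, p2, p3, p4),
    observable_joint(p0_new, p1_new, p2_new, p3_new, p4_new),
)
not_a_global_swap = not (
    np.allclose(p0_new, p0) or np.allclose(p0_new, p0[::-1])
)
```

The new parameters are neither the original point nor its global label swap. Together with those two and the swap restricted to the X1 = 0 slice, this accounts for four parameter points with the same observable distribution.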


Model B (general)

If n2, n3, n4 are sufficiently large relative to n0, then

[Figure: Model B, DAG on latent node 0 and observable nodes 1, 2, 3, 4]

has a $(n_0!)^{n_1}$-to-1 parameterization.

Moreover, a full fiber can be obtained from any single element by

rational formulas.


Large DAG models

If a DAG model has a k-to-1 parameterization, then

k is unchanged if:

• remove observable sinks with all parents observable

• pass to Markov equivalent graphs

k may change if:

• marginalize/condition on observed variables

Cautions:

• Marginalize only over sinks, but risk losing identifiability.

• Condition carefully, to get generic model.


A general result

Building on Model B

[Figure: Model B, DAG on latent node 0 and observable nodes 1, 2, 3, 4],

Theorem: Suppose a DAG has one latent node 0 with no parents,

and three observable sinks 1, 2, 3 that are children of 0.

Let

$C = \mathrm{Anc}\bigl(\mathrm{Chd}(0) \cap \mathrm{Anc}(1) \cap \mathrm{Anc}(2) \cap \mathrm{Anc}(3)\bigr) \setminus \{0\}$,

and

$u = |C \cap \mathrm{Pa}(1) \cap \mathrm{Pa}(2) \cap \mathrm{Pa}(3)|$.

Then for binary variables, the parameterization is generically k-to-1

with $k = 2^{2^u}$.

The potential fiber can be described, and thus k can be determined

exactly.

(non-binary version also)
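The quantities C, u and k in the theorem can be computed mechanically from a parent list. The sketch below makes two assumptions not spelled out in the transcript: Anc(·) is taken to include the node itself, and Model B has edges 0 → 1, 2, 3, 4 and 1 → 2, 3, 4. The functions `ancestors` and `fiber_size_binary` are hypothetical names.

```python
def ancestors(parents, nodes):
    """Ancestors of a set of nodes, including the nodes themselves (assumed)."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def fiber_size_binary(parents, latent, sinks):
    """Return (C, u, k) with k = 2 ** (2 ** u), following the theorem's recipe."""
    children = {v for v, ps in parents.items() if latent in ps}
    common = set(children)
    for s in sinks:
        common &= ancestors(parents, [s])   # Chd(0) ∩ Anc(s1) ∩ Anc(s2) ∩ Anc(s3)
    c = ancestors(parents, common) - {latent}
    shared_parents = set(c)
    for s in sinks:
        shared_parents &= set(parents[s])   # C ∩ Pa(s1) ∩ Pa(s2) ∩ Pa(s3)
    u = len(shared_parents)
    return c, u, 2 ** (2 ** u)


# Model B with the assumed edges 0 -> 1,2,3,4 and 1 -> 2,3,4; sinks 2, 3, 4.
model_b = {0: [], 1: [0], 2: [0, 1], 3: [0, 1], 4: [0, 1]}
result_b = fiber_size_binary(model_b, latent=0, sinks=(2, 3, 4))

# Star model 0 -> 1,2,3 with no edges among observables; sinks 1, 2, 3.
star = {0: [], 1: [0], 2: [0], 3: [0]}
result_star = fiber_size_binary(star, latent=0, sinks=(1, 2, 3))
```

Under these assumptions Model B gives C = {1}, u = 1 and k = 4, and the star model gives C = ∅, u = 0 and k = 2, which is label swapping only.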


Example: Model B

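[Figure: Model B, DAG on latent node 0 and observable nodes 1, 2, 3, 4]

Sinks 2, 3, 4, all children of 0,

$C = \mathrm{Anc}\bigl(\mathrm{Chd}(0) \cap \mathrm{Anc}(2) \cap \mathrm{Anc}(3) \cap \mathrm{Anc}(4)\bigr) \setminus \{0\} = \{1\}$

$u = |C \cap \mathrm{Pa}(2) \cap \mathrm{Pa}(3) \cap \mathrm{Pa}(4)| = 1$

so $2^{2^1} = 4$-to-1 parameterization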


Example: from beginning of talk

[Figure: DAG with latent node 0 and observable nodes 1, 2, 3, 4, 5, 6, 7]

Remove 7 (observable child with observable parents):

[Figure: DAG with latent node 0 and observable nodes 1, 2, 3, 4, 5, 6]


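[Figure: DAG with latent node 0 and observable nodes 1, 2, 3, 4, 5, 6]

Sinks 4, 5, 6, all children of 0,

$C = \mathrm{Anc}\bigl(\mathrm{Chd}(0) \cap \mathrm{Anc}(4) \cap \mathrm{Anc}(5) \cap \mathrm{Anc}(6)\bigr) \setminus \{0\} = \emptyset$

$u = |C \cap \mathrm{Pa}(4) \cap \mathrm{Pa}(5) \cap \mathrm{Pa}(6)| = 0$

so $2^{2^0} = 2$-to-1 parameterization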


Final comments:

• A 2-sink theorem is “under development,” building on

[Figure: DAG on latent node 0 and observable nodes 1, 2, 3, 4]

• Multiple latent variables with no/limited common children may

be manageable.

• Main impediment to non-binary variables is awkwardness of

statements.
