Bayes Net Perspectives on Causation and Causal Inference

Peter Spirtes


Page 2: Bayes Net Perspectives on Causation and Causal Inference

Example Problems

Genetic regulatory networks

Yeast: ~5000 genes, ~2,500,000 potential edges

A gene regulatory network in mouse embryonic stem cells: http://www.pnas.org/content/104/42/16438/F3.expansion.html

Page 3: Bayes Net Perspectives on Causation and Causal Inference

Causal Models → Predictions

Probabilistic: Among the cells that have active Oct4, what percentage have active Rcor2?

Causal: If I experimentally set cells to have active Oct4, what percentage will have active Rcor2?

Page 4: Bayes Net Perspectives on Causation and Causal Inference

Causal Models → Predictions

Counterfactual: Among the cells that did not have active Oct4 at t-1, what percentage would have had active Rcor2 if I had experimentally set them to have active Oct4 at t-1?

Page 5: Bayes Net Perspectives on Causation and Causal Inference

Data → Causal Models

• Large number of variables

• Small observed sample size

• Overlapping variables

• Small number of experiments

• Feedback

• Hidden common causes

• Selection bias

• Many kinds of entities causally interacting

Page 6: Bayes Net Perspectives on Causation and Causal Inference

Outline

• Bayesian Networks

• Search

• Limitations and Extensions of Bayesian Networks: Dynamic, Relational, Cycles, Counterfactual

Page 7: Bayes Net Perspectives on Causation and Causal Inference

Directed Acyclic Graph (DAG)

SES: Socioeconomic Status

PE: Parental Encouragement

CP: College Plans

IQ: Intelligence Quotient

SEX: Sex

The vertices are random variables. All edges are directed. There are no directed cycles.

[Figure: DAG G with edges SES → IQ, SES → PE, SES → CP, SEX → PE, IQ → PE, IQ → CP, PE → CP]

Page 8: Bayes Net Perspectives on Causation and Causal Inference

Population

[Figure: three copies of the DAG over SES, SEX, PE, IQ, CP, one per unit in the population]

Independent, identically distributed

Page 9: Bayes Net Perspectives on Causation and Causal Inference

P Factoring According to G

P(SES,SEX,PE,IQ,CP) = P(SEX) P(SES) P(IQ|SES) P(PE|SES,SEX,IQ) P(CP|PE,SES,IQ)

If P(V) = ∏ P(X | Parents(X)), with the product taken over every vertex X in G, then P factors according to G.

G represents all of the distributions that factor according to G.
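The factorization can be made concrete with a minimal Python sketch. It assumes binary variables and uses a made-up conditional probability function (`p_one` is hypothetical, and its numbers are not estimates from any data); only the parent sets come from the factorization on this slide.

```python
from itertools import product

# Parents of each variable in the example DAG G (read off the factorization).
PARENTS = {
    "SEX": [],
    "SES": [],
    "IQ": ["SES"],
    "PE": ["SES", "SEX", "IQ"],
    "CP": ["PE", "SES", "IQ"],
}

def p_one(var, parent_values):
    """Hypothetical P(var = 1 | parents): illustrative numbers only."""
    base = {"SEX": 0.5, "SES": 0.4, "IQ": 0.3, "PE": 0.2, "CP": 0.1}[var]
    return base + 0.15 * sum(parent_values)

def joint(assignment):
    """P(assignment) as the product of P(X | Parents(X)) over every X in G."""
    prob = 1.0
    for var, parents in PARENTS.items():
        p1 = p_one(var, [assignment[p] for p in parents])
        prob *= p1 if assignment[var] == 1 else 1.0 - p1
    return prob

# Any distribution built this way factors according to G, so it sums to one.
variables = list(PARENTS)
total = sum(joint(dict(zip(variables, values)))
            for values in product([0, 1], repeat=len(variables)))
example = {"SEX": 1, "SES": 1, "IQ": 0, "PE": 1, "CP": 1}
print(total, joint(example))
```

Changing the numbers inside `p_one` gives a different member of the family of distributions that G represents; the graph fixes only which parents each factor may depend on.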

Page 10: Bayes Net Perspectives on Causation and Causal Inference

Conditional Independence

X is independent of Y conditional on Z (denoted IP(X,Y|Z)) iff P(X|Y,Z) = P(X|Z).

IP(CP,SEX|{SES,IQ,PE}) iff P(CP|{SES,IQ,PE,SEX}) = P(CP|{SES,IQ,PE})


Page 11: Bayes Net Perspectives on Causation and Causal Inference

Graphical Entailment

If IP(X,Y|Z) holds for every P that factors according to G, then G entails I(X,Y|Z).

Examples: G entails I(IQ,SEX|∅) and I(IQ,SEX|SES).

Entailments can be read off the graph through d-separation.

Page 12: Bayes Net Perspectives on Causation and Causal Inference

D-separation and D-connection

X is d-separated from Y conditional on Z in G iff G entails that X is independent of Y conditional on Z.

D-separation between X and Y conditional on Z holds when certain kinds of paths do not exist between X and Y.

D-connection (the negation of d-separation) between X and Y conditional on Z holds when certain kinds of paths do exist between X and Y.

Page 13: Bayes Net Perspectives on Causation and Causal Inference

Definition of D-connection

A node X is active on a path U conditional on Z iff

• X is a collider (→ X ←) on U and either X is in Z or there is a directed path from X to a member of Z; or

• X is not a collider on U and X is not in Z.

SES → IQ → PE ← SEX is a path U. PE is active on U conditional on {CP, IQ}, because PE is a collider on U and there is a directed path from PE to CP. IQ is inactive on U conditional on {CP, IQ}, because IQ is not a collider on U and IQ is in the conditioning set.

Page 14: Bayes Net Perspectives on Causation and Causal Inference

Definition of D-connection

A path U is active conditional on Z iff every vertex on U is active relative to Z.

X is d-connected to Y conditional on Z iff there is an active path between X and Y conditional on Z.

SES → IQ → PE ← SEX is inactive conditional on {CP, IQ}.

SES is d-connected to SEX conditional on {CP, IQ} because SES → PE ← SEX is active conditional on {CP, IQ}.
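The two-part definition of d-connection can be checked mechanically. The sketch below is a brute-force illustration that assumes the example DAG's edges are those implied by the factorization P(SEX)P(SES)P(IQ|SES)P(PE|SES,SEX,IQ)P(CP|PE,SES,IQ); the function names are hypothetical, and enumerating every path is only practical for small graphs. The activity test is applied to the interior vertices of a path, with the endpoints assumed not to be in Z.

```python
# Directed edges (cause, effect) of the example DAG G.
EDGES = {("SES", "IQ"), ("SES", "PE"), ("SES", "CP"), ("SEX", "PE"),
         ("IQ", "PE"), ("IQ", "CP"), ("PE", "CP")}

def descendants(node):
    """All vertices reachable from node by a directed path of length >= 1."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for a, b in EDGES:
            if a == current and b not in found:
                found.add(b)
                stack.append(b)
    return found

def vertex_active(prev, node, nxt, z):
    """Apply the definition of an active vertex to an interior vertex of a path."""
    collider = (prev, node) in EDGES and (nxt, node) in EDGES
    if collider:
        return node in z or bool(descendants(node) & z)
    return node not in z

def path_active(path, z):
    """A path is active iff every interior vertex on it is active given z."""
    z = set(z)
    return all(vertex_active(path[i - 1], path[i], path[i + 1], z)
               for i in range(1, len(path) - 1))

def simple_paths(x, y, path=None):
    """Yield every path from x to y that ignores edge direction and repeats no vertex."""
    path = path or [x]
    if path[-1] == y:
        yield path
        return
    for a, b in EDGES:
        for here, there in ((a, b), (b, a)):
            if here == path[-1] and there not in path:
                yield from simple_paths(x, y, path + [there])

def d_connected(x, y, z):
    """X is d-connected to Y given Z iff some path between them is active."""
    return any(path_active(p, z) for p in simple_paths(x, y))

print(path_active(["SES", "IQ", "PE", "SEX"], {"CP", "IQ"}))
print(d_connected("SES", "SEX", {"CP", "IQ"}))
print(d_connected("IQ", "SEX", set()))
```

The three calls correspond to the slide examples: the path through IQ is blocked, SES and SEX are d-connected through the collider PE, and IQ and SEX are d-separated given the empty set.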

Page 15: Bayes Net Perspectives on Causation and Causal Inference

If I is Not Entailed by G

If a conditional independence relation I is not entailed by G, then I may hold in some (but not every) distribution P that factors according to G.

Example: There are P and P′ that factor according to G such that ~IP(SES,CP|∅) and IP′(SES,CP|∅). P′ is said to be unfaithful to G.

Page 16: Bayes Net Perspectives on Causation and Causal Inference

Manipulations

An ideal manipulation assigns a density to a set X of properties (random variables) as a function of the values of a set Z of properties (random variables). An ideal manipulation:

• Directly affects only the variables in X

• Is successful

Example: randomized experiment

Page 17: Bayes Net Perspectives on Causation and Causal Inference

Manipulations and Causal Graph

There is an edge SES → CP in G because there are two ways of manipulating {SES,SEX,IQ,PE} that differ only in the value they assign to SES and that result in different probabilities of CP.

Stable Unit Treatment Value Assumption

Page 18: Bayes Net Perspectives on Causation and Causal Inference

Causal Sufficiency

A set S of variables is causally sufficient if there are no variables not in S that are direct causes of more than one variable in S.

S = {SES,IQ} is causally sufficient.

S = {SES,PE,CP} is not causally sufficient.
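The two examples can be verified with a short sketch. It makes a simplifying assumption: the example DAG is treated as the complete causal graph, so "direct cause" is read as "parent in G" and no variables exist outside the graph. The function name `is_causally_sufficient` is hypothetical.

```python
# Parents (direct causes) of each variable in the example DAG G.
PARENTS = {
    "SEX": [],
    "SES": [],
    "IQ": ["SES"],
    "PE": ["SES", "SEX", "IQ"],
    "CP": ["PE", "SES", "IQ"],
}

def is_causally_sufficient(s):
    """True if no variable outside s is a direct cause of two or more members of s."""
    s = set(s)
    for outside in set(PARENTS) - s:
        children_in_s = [v for v in s if outside in PARENTS[v]]
        if len(children_in_s) > 1:
            return False
    return True

print(is_causally_sufficient({"SES", "IQ"}))
print(is_causally_sufficient({"SES", "PE", "CP"}))
```

The second set fails because IQ, which is left out, is a direct cause of both PE and CP.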

Page 19: Bayes Net Perspectives on Causation and Causal Inference

Causal Markov Assumption

In a population Pop with distribution P and causal graph G, if V is causally sufficient, P(V) factors according to G.

P(SES,SEX,PE,IQ,CP) = P(SEX) P(SES) P(IQ|SES) P(PE|SES,SEX,IQ) P(CP|PE,SES,IQ)

Page 20: Bayes Net Perspectives on Causation and Causal Inference

Representation of Manipulation

P(SES,SEX,PE=1,IQ,CP||PE=1) = P(SEX) P(SES) P(IQ|SES) * 1 * P(CP|PE=1,SES,IQ) = P(SES,SEX,PE=1,IQ,CP) / P(PE=1|SEX,SES,IQ)
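A minimal sketch of this representation follows, assuming binary variables and a made-up conditional probability function (`p_one` is hypothetical and its numbers are illustrative only). The manipulated distribution is obtained by replacing the factor for PE with 1 and forcing PE = 1, which is then compared with ordinary conditioning on PE = 1.

```python
from itertools import product

PARENTS = {
    "SEX": [],
    "SES": [],
    "IQ": ["SES"],
    "PE": ["SES", "SEX", "IQ"],
    "CP": ["PE", "SES", "IQ"],
}

def p_one(var, parent_values):
    """Hypothetical P(var = 1 | parents): illustrative numbers only."""
    base = {"SEX": 0.5, "SES": 0.4, "IQ": 0.3, "PE": 0.2, "CP": 0.1}[var]
    return base + 0.15 * sum(parent_values)

def cond(var, a):
    """P(var = a[var] | parents of var as set in a)."""
    p1 = p_one(var, [a[p] for p in PARENTS[var]])
    return p1 if a[var] == 1 else 1.0 - p1

def joint(a):
    """Pre-manipulation joint: product of all factors."""
    prob = 1.0
    for var in PARENTS:
        prob *= cond(var, a)
    return prob

def manipulated(a, var, value):
    """P(a || var = value): the factor for var is replaced by 1 and var is forced."""
    if a[var] != value:
        return 0.0
    prob = 1.0
    for other in PARENTS:
        if other != var:
            prob *= cond(other, a)
    return prob

ASSIGNMENTS = [dict(zip(PARENTS, values))
               for values in product([0, 1], repeat=len(PARENTS))]

# Probability of CP = 1 after setting PE = 1, versus after observing PE = 1.
p_cp_set = sum(manipulated(a, "PE", 1) for a in ASSIGNMENTS if a["CP"] == 1)
p_cp_seen = (sum(joint(a) for a in ASSIGNMENTS if a["CP"] == 1 and a["PE"] == 1)
             / sum(joint(a) for a in ASSIGNMENTS if a["PE"] == 1))
print(p_cp_set, p_cp_seen)
```

With these illustrative numbers the two quantities differ, because observing PE = 1 also carries information about SES and IQ, whereas setting PE = 1 does not.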

Page 21: Bayes Net Perspectives on Causation and Causal Inference

FCI Algorithm

Looks for the set of DAGs (possibly with latent variables and selection bias) that entail all and only the conditional independence relations that hold in the data according to statistical tests.

Page 22: Bayes Net Perspectives on Causation and Causal Inference

Markov Equivalence

Two DAGs G1 and G2 are Markov equivalent when they contain the same variables and, for all disjoint X, Y, Z, X is entailed to be independent of Y conditional on Z in G1 if and only if X is entailed to be independent of Y conditional on Z in G2.
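Markov equivalence does not have to be tested one independence statement at a time. A standard graphical characterization (due to Verma and Pearl, and not stated on the slide) is that two DAGs over the same variables are Markov equivalent iff they have the same adjacencies and the same unshielded colliders. The sketch below applies that test; the function names and the two variant graphs are hypothetical illustrations, and every variable is assumed to appear in at least one edge.

```python
from itertools import combinations

def skeleton(edges):
    """Adjacencies of a DAG, ignoring edge direction."""
    return {frozenset(edge) for edge in edges}

def unshielded_colliders(edges):
    """Triples a -> c <- b in which a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, c in edges:
        parents.setdefault(c, set()).add(a)
    return {(frozenset((a, b)), c)
            for c, pa in parents.items()
            for a, b in combinations(sorted(pa), 2)
            if frozenset((a, b)) not in skel}

def markov_equivalent(edges1, edges2):
    """Same adjacencies and same unshielded colliders."""
    return (skeleton(edges1) == skeleton(edges2)
            and unshielded_colliders(edges1) == unshielded_colliders(edges2))

# Example DAG G, with edges implied by the factorization used in these slides.
G = {("SES", "IQ"), ("SES", "PE"), ("SES", "CP"), ("SEX", "PE"),
     ("IQ", "PE"), ("IQ", "CP"), ("PE", "CP")}
# Hypothetical variant: reverse SES -> IQ (no collider is created or destroyed).
G_REVERSED_IQ = (G - {("SES", "IQ")}) | {("IQ", "SES")}
# Hypothetical variant: reverse SEX -> PE (destroys the colliders at PE).
G_REVERSED_SEX = (G - {("SEX", "PE")}) | {("PE", "SEX")}

print(markov_equivalent(G, G_REVERSED_IQ))
print(markov_equivalent(G, G_REVERSED_SEX))
```

The first variant is equivalent to G and the second is not, which shows why a search based only on conditional independence can orient some edges but not others.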

Page 23: Bayes Net Perspectives on Causation and Causal Inference

Markov Equivalence Class

[Figure: DAG G and DAG G′, each over SES, SEX, PE, IQ, CP]

Page 24: Bayes Net Perspectives on Causation and Causal Inference

Causal Faithfulness Assumption

In a population Pop with causal graph G and distribution P(V), if V is causally sufficient, IP(X,Y|Z) only if G entails I(X,Y|Z).

~IP(SES,CP|∅) because I(SES,CP|∅) is not entailed by G

+…

Page 25: Bayes Net Perspectives on Causation and Causal Inference

Causal Faithfulness Assumption

Causal Faithfulness is too strong because:

• Consistency can be proved with assumptions about fewer conditional independencies

• It is unlikely to hold, especially when there are many variables

Causal Faithfulness is too weak because it is not sufficient to prove uniform consistency (that is, to put error bounds at finite sample sizes).

Page 26: Bayes Net Perspectives on Causation and Causal Inference

Good Features of FCI Algorithm

• Is pointwise consistent: As sample size → ∞, P(error in output pattern) → 0.

• Can be applied to distributions where tests of conditional independence are known

• Can be applied to hidden variable models (and selection bias models)

Page 27: Bayes Net Perspectives on Causation and Causal Inference

Bad Features of FCI Algorithm

• There is no reliable way to set error bounds on the pattern without making stronger assumptions.

• Can only get a set of Markov equivalent DAGs, not a single DAG

• Doesn’t allow for comparing how much better one model is than another

• Need to assume some version of the Causal Faithfulness Assumption

Page 28: Bayes Net Perspectives on Causation and Causal Inference

Non Independence Constraints

Depending on the parametric family, a DAG can entail constraints that are not conditional independence constraints.

Assuming linearity and non-Gaussian error terms, if a distribution is compatible with X → Y it is not compatible with X ← Y, even though they are Markov equivalent.

Page 29: Bayes Net Perspectives on Causation and Causal Inference

Score-Based Search Strategy

Assign a score to graph and sample based on:

• Maximum likelihood of data given graph

• Simplicity of model

Do search over graph space for highest score
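The slide does not name a particular score. As one illustration of how a likelihood term and a simplicity term can be combined, the Bayesian Information Criterion (BIC) can be written as below, where the symbols are assumptions introduced here: D is the sample, n is the sample size, \hat{\theta}_G is the maximum likelihood estimate of the parameters of graph G, and k_G is the number of free parameters of G.

```latex
\mathrm{BIC}(G; D) = \log P\left(D \mid \hat{\theta}_G, G\right) - \frac{k_G}{2} \log n
```

The first term rewards fit to the data and the second penalizes graphs with more parameters, so a denser graph scores higher only if it improves the likelihood enough to offset the penalty.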

Page 30: Bayes Net Perspectives on Causation and Causal Inference

Advantages of Score-Based Search Strategy

• Get more information about the graph: with additive noise models, a unique DAG

• Doesn’t rely on binary decisions: local mistakes don’t propagate

Page 31: Bayes Net Perspectives on Causation and Causal Inference

Disadvantages of Score-Based Search Strategy

• Scores are often slower to calculate, or it is not known how to calculate them exactly, if the model includes unmeasured variables, selection bias, or unusual distributions

• Search over graph space is often heuristic

Page 32: Bayes Net Perspectives on Causation and Causal Inference

Dynamic Bayes Nets

If the same variable is measured at different times, then the samples from the variable are not i.i.d.

Solution: index each variable by time (time series)

Page 33: Bayes Net Perspectives on Causation and Causal Inference

Dynamic Bayes Nets

Make a template for the causal structure that can be filled in with actual times

[Figure: template over Xt-2, Xt-1, Xt and Yt-2, Yt-1, Yt]

Continuous time or differential equations?
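Filling in a template with actual times can be sketched as follows. The edges of the template are not recoverable from this transcript, so the `TEMPLATE` below is a hypothetical example (each variable depends on its own previous value, and X at t-1 affects Y at t), and `unroll` is an illustrative name.

```python
# Hypothetical template: (cause, lag, effect) means that cause at time t - lag
# is a direct cause of effect at time t.
TEMPLATE = [("X", 1, "X"), ("Y", 1, "Y"), ("X", 1, "Y")]

def unroll(template, times):
    """Instantiate the template at each listed time.

    An edge is kept only when the time of its cause is also in the list.
    """
    times = set(times)
    return sorted(((cause, t - lag), (effect, t))
                  for cause, lag, effect in template
                  for t in times
                  if t - lag in times)

edges = unroll(TEMPLATE, range(3))
for cause, effect in edges:
    print(cause, "->", effect)
```

Each time-indexed variable, such as ("X", 0), becomes an ordinary vertex, so the unrolled graph is again a DAG to which the earlier definitions apply.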

Page 34: Bayes Net Perspectives on Causation and Causal Inference

Population

[Figure: three copies of the DAG over SES, SEX, PE, IQ, CP, linked by parent-of relations]

Page 35: Bayes Net Perspectives on Causation and Causal Inference

Population

[Figure: DAG over SES, SEX, PE, IQ, CP with parent-of relations between units]

• Not i.i.d. distribution

• Violations of SUTVA

• Causal relations between relations (e.g. sibling causes rivalry)

Page 36: Bayes Net Perspectives on Causation and Causal Inference

Extended Manipulation Specification

A manipulation assigns a density to a set of properties or relations at a set of times (measurable set of times T) for a set of units, as a function of the values of a set of properties or relations at a set of times (measurable set of times T) for a set of units.

Page 37: Bayes Net Perspectives on Causation and Causal Inference

Extended Factorization Assumption

[Figure: DAG over SES, SEX, PE, IQ, CP, with parent-of relations from Alice&Jim to Sue and from Alice&Jim to Bob]

P(Alice&Jim.SES, Sue.SEX, Sue.PE, Sue.IQ, Sue.CP, Bob.SEX, Bob.PE, Bob.IQ, Bob.CP) =

Page 38: Bayes Net Perspectives on Causation and Causal Inference

Extended Factorization Assumption

P(Alice&Jim.SES) P(Sue.SEX) P(Sue.IQ|Alice&Jim.SES) P(Sue.PE|Alice&Jim.SES, Sue.SEX, Sue.IQ) P(Sue.CP|Sue.PE, Alice&Jim.SES, Sue.IQ) P(Bob.SEX) P(Bob.IQ|Alice&Jim.SES) P(Bob.PE|Alice&Jim.SES, Bob.SEX, Bob.IQ) P(Bob.CP|Bob.PE, Alice&Jim.SES, Bob.IQ)

Page 39: Bayes Net Perspectives on Causation and Causal Inference

3 Interpretations of Cycles: PE ⇆ CP

• Equilibrium values of PE and CP cause each other.

• Averages of the values of PE and CP while reaching equilibrium influence each other.

• Mixture of PE → CP and CP → PE