Bayes Net Perspectives on Causation and Causal Inference
Peter Spirtes
Example Problems
Genetic regulatory networks
Yeast – ~5000 genes, ~25,000,000 potential edges
A gene regulatory network in mouse embryonic stem cells http://www.pnas.org/content/104/42/16438/F3.expansion.html
Causal Models → Predictions
Probabilistic – Among the cells that have active Oct4, what percentage have active Rcor2?
Causal – If I experimentally set a cell to have active Oct4, what percentage will have active Rcor2?
Counterfactual – Among the cells that did not have active Oct4 at t-1, what percentage would have active Rcor2 if I had experimentally set those cells to have active Oct4 at t-1?
Data → Causal Models
• Large number of variables
• Small observed sample size
• Overlapping variables
• Small number of experiments
• Feedback
• Hidden common causes
• Selection bias
• Many kinds of entities causally interacting
Outline
• Bayesian Networks
• Search
• Limitations and Extensions of Bayesian Networks
  • Dynamic
  • Relational
  • Cycles
  • Counterfactual
Directed Acyclic Graph (DAG)
SES – Socioeconomic Status
PE – Parental Encouragement
CP – College Plans
IQ – Intelligence Quotient
SEX – Sex
The vertices are random variables. All edges are directed. There are no directed cycles.
[Figure: DAG G with edges SES → IQ, SES → PE, SEX → PE, IQ → PE, SES → CP, IQ → CP, PE → CP]
Population
[Figure: a population of units, each with the same causal graph G over SES, SEX, PE, CP, IQ]
Independent, identically distributed (i.i.d.)
P Factoring According to G
P(SES,SEX,PE,IQ,CP) = P(SEX)P(SES)P(IQ|SES) P(PE|SES,SEX,IQ) P(CP|PE,SES,IQ)
[Figure: DAG G]
If $P(\mathbf{V}) = \prod_{X \in \mathbf{V}} P(X \mid \mathrm{Parents}_G(X))$, then P factors according to G.
G represents all of the distributions that factor according to G.
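A minimal sketch of the factorization above, assuming (hypothetically) that every variable is binary and that each conditional probability is drawn at random; `PARENTS`, `random_cpts`, and `joint_from_factorization` are illustrative names, not part of the source. Any choice of conditional probabilities yields a distribution that factors according to G.

```python
import itertools
import random

# Parents of each vertex in G, read off the factorization above.
PARENTS = {
    "SEX": [],
    "SES": [],
    "IQ": ["SES"],
    "PE": ["SES", "SEX", "IQ"],
    "CP": ["PE", "SES", "IQ"],
}
ORDER = ["SEX", "SES", "IQ", "PE", "CP"]  # a topological order of G


def random_cpts(seed=0):
    """Hypothetical parameters: P(X=1 | parent values) for binary variables."""
    rng = random.Random(seed)
    cpts = {}
    for v in ORDER:
        for pa_vals in itertools.product([0, 1], repeat=len(PARENTS[v])):
            cpts[(v, pa_vals)] = rng.uniform(0.1, 0.9)
    return cpts


def joint_from_factorization(cpts):
    """P(v) = product over X of P(X = v[X] | Parents(X) = v[Parents(X)])."""
    joint = {}
    for values in itertools.product([0, 1], repeat=len(ORDER)):
        assignment = dict(zip(ORDER, values))
        p = 1.0
        for v in ORDER:
            pa_vals = tuple(assignment[u] for u in PARENTS[v])
            p1 = cpts[(v, pa_vals)]
            p *= p1 if assignment[v] == 1 else 1.0 - p1
        joint[values] = p
    return joint


joint = joint_from_factorization(random_cpts())
print(len(joint), round(sum(joint.values()), 10))
```

The 32 joint probabilities sum to 1 for every parametrization, which is what makes the product of local conditionals a valid joint distribution.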
Conditional Independence
X is independent of Y conditional on Z (denoted IP(X,Y|Z)) iff P(X|Y,Z) = P(X|Z) for all values with P(Y,Z) > 0.
Example: IP(CP,SEX|{SES,IQ,PE}) iff P(CP|SES,IQ,PE,SEX) = P(CP|SES,IQ,PE).
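The definition can be checked directly on a joint probability table. This sketch uses a small hypothetical joint over binary X, Y, Z built as P(Z)P(X|Z)P(Y|Z), so that Z is a common cause of X and Y; the numbers and the helper names `marginal` and `cond_indep` are illustrative assumptions, not from the source.

```python
import itertools

# Hypothetical joint over binary (X, Y, Z) built as P(Z) P(X|Z) P(Y|Z).
P_Z1 = 0.3
P_X1_GIVEN_Z = {0: 0.2, 1: 0.7}
P_Y1_GIVEN_Z = {0: 0.4, 1: 0.9}
NAMES = ("X", "Y", "Z")


def bern(p1, value):
    return p1 if value == 1 else 1.0 - p1


JOINT = {
    (x, y, z): bern(P_Z1, z) * bern(P_X1_GIVEN_Z[z], x) * bern(P_Y1_GIVEN_Z[z], y)
    for x, y, z in itertools.product([0, 1], repeat=3)
}


def marginal(joint, keep):
    """Sum out every variable not in `keep`; returns {values_of_keep: prob}."""
    idx = [NAMES.index(v) for v in keep]
    out = {}
    for values, p in joint.items():
        key = tuple(values[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def cond_indep(joint, x, y, z, tol=1e-12):
    """Check P(x | y, z) == P(x | z) wherever P(y, z) > 0."""
    p_xyz = marginal(joint, [x, y] + z)
    p_yz = marginal(joint, [y] + z)
    p_xz = marginal(joint, [x] + z)
    p_z = marginal(joint, z)
    for (xv, yv, *zv), p in p_xyz.items():
        zv = tuple(zv)
        if p_yz[(yv,) + zv] == 0:
            continue
        lhs = p / p_yz[(yv,) + zv]
        rhs = p_xz[(xv,) + zv] / p_z[zv]
        if abs(lhs - rhs) > tol:
            return False
    return True


print(cond_indep(JOINT, "X", "Y", ["Z"]))  # X and Y independent given the common cause Z
print(cond_indep(JOINT, "X", "Y", []))     # but dependent marginally
```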
Graphical Entailment
If IP(X,Y|Z) holds for every P that factors according to G, then G entails I(X,Y|Z).
Examples: G entails I(IQ,SEX|∅) and I(IQ,SEX|SES).
Entailments can be read off the graph through d-separation.
[Figure: DAG G]
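Entailment quantifies over every P that factors according to G, so no finite computation proves it; sampling parametrizations can only refute it. As a sanity check, this sketch (binary variables and random conditional probabilities are hypothetical assumptions) confirms that the two entailed independences hold in ten random P, while the non-entailed I(SES,CP|∅) fails. The helper names are illustrative.

```python
import itertools
import random

PARENTS = {"SEX": [], "SES": [], "IQ": ["SES"],
           "PE": ["SES", "SEX", "IQ"], "CP": ["PE", "SES", "IQ"]}
ORDER = ["SEX", "SES", "IQ", "PE", "CP"]  # topological order of G


def random_joint(seed):
    """Joint over binary variables that factors according to G (random CPTs)."""
    rng = random.Random(seed)
    cpt = {(v, pa): rng.uniform(0.05, 0.95)
           for v in ORDER
           for pa in itertools.product([0, 1], repeat=len(PARENTS[v]))}
    joint = {}
    for values in itertools.product([0, 1], repeat=len(ORDER)):
        a = dict(zip(ORDER, values))
        p = 1.0
        for v in ORDER:
            p1 = cpt[(v, tuple(a[u] for u in PARENTS[v]))]
            p *= p1 if a[v] == 1 else 1.0 - p1
        joint[values] = p
    return joint


def marginal(joint, keep):
    idx = [ORDER.index(v) for v in keep]
    out = {}
    for values, p in joint.items():
        key = tuple(values[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def cond_indep(joint, x, y, z, tol=1e-9):
    """Product form of the definition: P(x, y, z) P(z) = P(x, z) P(y, z)."""
    p_xyz = marginal(joint, [x, y] + z)
    p_xz = marginal(joint, [x] + z)
    p_yz = marginal(joint, [y] + z)
    p_z = marginal(joint, z)
    for (xv, yv, *zv), p in p_xyz.items():
        zv = tuple(zv)
        if abs(p * p_z[zv] - p_xz[(xv,) + zv] * p_yz[(yv,) + zv]) > tol:
            return False
    return True


seeds = range(10)
for x, y, z in [("IQ", "SEX", []), ("IQ", "SEX", ["SES"]), ("SES", "CP", [])]:
    holds = all(cond_indep(random_joint(s), x, y, z) for s in seeds)
    print(f"I({x},{y}|{z}) holds in all sampled P: {holds}")
```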
D-separation and D-connection
X is d-separated from Y conditional on Z in G iff G entails that X is independent of Y conditional on Z.
D-separation between X and Y conditional on Z holds when certain kinds of paths do not exist between X and Y.
[Figure: DAG G]
D-connection (the negation of d-separation) between X and Y conditional on Z holds when certain kinds of paths do exist between X and Y.
Definition of D-connection
A node X is active on a path U conditional on Z iff either
• X is a collider on U (→ X ←) and either X is in Z or there is a directed path from X to a member of Z; or
• X is not a collider on U and X is not in Z.
[Figure: DAG G]
SES → IQ → PE ← SEX is a path U. PE is active on U conditional on {CP, IQ}. IQ is inactive on U conditional on {CP, IQ}.
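The node-level definition translates almost line for line into code. In this sketch a path is a list of vertices and edge directions are read from G; `children`, `reaches`, and `is_active` are hypothetical helper names, and path endpoints are treated as non-colliders.

```python
PARENTS = {"SEX": [], "SES": [], "IQ": ["SES"],
           "PE": ["SES", "SEX", "IQ"], "CP": ["PE", "SES", "IQ"]}


def children(v):
    return [c for c, ps in PARENTS.items() if v in ps]


def reaches(v, targets):
    """True if a directed path of length >= 1 leads from v to a member of targets."""
    stack, seen = children(v), set()
    while stack:
        u = stack.pop()
        if u in targets:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(children(u))
    return False


def is_active(path, i, z):
    """Is path[i] active on `path` conditional on z? Endpoints count as non-colliders."""
    v = path[i]
    collider = (0 < i < len(path) - 1
                and path[i - 1] in PARENTS[v] and path[i + 1] in PARENTS[v])
    if collider:
        return v in z or reaches(v, z)
    return v not in z


U = ["SES", "IQ", "PE", "SEX"]   # SES -> IQ -> PE <- SEX
Z = {"CP", "IQ"}
print("PE active:", is_active(U, 2, Z))   # collider whose descendant CP is in Z
print("IQ active:", is_active(U, 1, Z))   # non-collider that is itself in Z
```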
Definition of D-connection
A path U is active conditional on Z iff every vertex on U is active relative to Z.
X is d-connected to Y conditional on Z iff there is an active path between X and Y conditional on Z.
[Figure: DAG G]
SES → IQ → PE ← SEX is inactive conditional on {CP, IQ}.
SES is d-connected to SEX conditional on {CP, IQ} because SES → PE ← SEX is active conditional on {CP, IQ}.
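A brute-force sketch of d-connection for a graph this small: enumerate every path between X and Y that repeats no vertex (ignoring edge direction) and test whether all of its vertices are active. The helper names are hypothetical, and the enumeration is exponential, so real implementations use reachability-based algorithms instead; here it simply confirms the examples on these slides, including the entailed independences I(IQ,SEX|∅) and I(IQ,SEX|SES).

```python
PARENTS = {"SEX": [], "SES": [], "IQ": ["SES"],
           "PE": ["SES", "SEX", "IQ"], "CP": ["PE", "SES", "IQ"]}


def children(v):
    return [c for c, ps in PARENTS.items() if v in ps]


def neighbors(v):
    return PARENTS[v] + children(v)


def reaches(v, targets):
    """True if a directed path of length >= 1 leads from v to a member of targets."""
    stack, seen = children(v), set()
    while stack:
        u = stack.pop()
        if u in targets:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(children(u))
    return False


def is_active(path, i, z):
    v = path[i]
    collider = (0 < i < len(path) - 1
                and path[i - 1] in PARENTS[v] and path[i + 1] in PARENTS[v])
    if collider:
        return v in z or reaches(v, z)
    return v not in z


def simple_paths(x, y):
    """All paths from x to y that ignore edge direction and repeat no vertex."""
    stack = [[x]]
    while stack:
        path = stack.pop()
        if path[-1] == y:
            yield path
            continue
        stack.extend(path + [n] for n in neighbors(path[-1]) if n not in path)


def d_connected(x, y, z):
    z = set(z)
    return any(all(is_active(p, i, z) for i in range(len(p)))
               for p in simple_paths(x, y))


print(d_connected("SES", "SEX", {"CP", "IQ"}))  # active path SES -> PE <- SEX
print(d_connected("IQ", "SEX", set()))          # d-separated: G entails I(IQ,SEX|∅)
print(d_connected("IQ", "SEX", {"SES"}))        # d-separated: G entails I(IQ,SEX|SES)
```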
If I is Not Entailed by G
If conditional independence relation I is not entailed by G, then I may hold in some (but not every) distribution P that factors according to G.
[Figure: DAG G]
Example: There are P and P’ that factor according to G such that ~IP(SES,CP|∅) and IP’(SES,CP|∅). P’ is said to be unfaithful to G.
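One way such a P’ can arise is through cancelling paths. This sketch assumes a linear structural equation model on G with independent unit-variance Gaussian errors and hypothetical edge coefficients. The covariance of SES and CP is the sum of the products of coefficients along the four directed paths from SES to CP, so choosing the SES → CP coefficient as minus the sum of the other three makes the covariance exactly zero. With Gaussian errors, zero covariance means SES and CP are independent in P’, even though G does not entail it.

```python
VARS = ["SEX", "SES", "IQ", "PE", "CP"]            # a topological order of G

# Hypothetical linear coefficients for the edges of G, keyed by (parent, child).
BASE = {("SES", "IQ"): 0.8, ("SES", "PE"): 0.5, ("SEX", "PE"): 0.4,
        ("IQ", "PE"): 0.6, ("IQ", "CP"): 0.7, ("PE", "CP"): 0.9}


def covariance(coef):
    """Covariances of V_v = sum_p coef[(p, v)] * V_p + e_v, independent unit-variance e_v."""
    cov = {}
    for i, v in enumerate(VARS):
        pa = [p for p in VARS[:i] if (p, v) in coef]
        for u in VARS[:i]:
            cov[(v, u)] = cov[(u, v)] = sum(coef[(p, v)] * cov[(p, u)] for p in pa)
        cov[(v, v)] = 1.0 + sum(coef[(p, v)] * coef[(q, v)] * cov[(p, q)]
                                for p in pa for q in pa)
    return cov


# Contribution of the three directed paths SES -> IQ -> CP, SES -> PE -> CP,
# and SES -> IQ -> PE -> CP to cov(SES, CP).
paths = (BASE[("SES", "IQ")] * BASE[("IQ", "CP")]
         + BASE[("SES", "PE")] * BASE[("PE", "CP")]
         + BASE[("SES", "IQ")] * BASE[("IQ", "PE")] * BASE[("PE", "CP")])

P = covariance({**BASE, ("SES", "CP"): 0.3})           # generic SES -> CP coefficient
P_prime = covariance({**BASE, ("SES", "CP"): -paths})  # coefficient cancels the other paths
print(f"P : cov(SES, CP) = {P[('SES', 'CP')]:.4f}")
print(f"P': cov(SES, CP) = {P_prime[('SES', 'CP')]:.4f}")
```

The cancellation requires an exact equality among parameters, which is why unfaithful parametrizations like P’ are often described as measure zero in the parameter space.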
Manipulations
An ideal manipulation:
• assigns a density to a set X of properties (random variables) as a function of the values of a set Z of properties (random variables);
• directly affects only the variables in X;
• is successful.
Example – randomized experiment
Manipulations and Causal Graph
There is an edge SES → CP in G because there are two manipulations of {SES,SEX,IQ,PE} that differ only in the value they assign to SES and that result in different probabilities of CP.
[Figure: DAG G]
Stable Unit Treatment Value Assumption
Causal Sufficiency
A set S of variables is causally sufficient if there are no variables not in S that are direct causes of more than one variable in S.
S = {SES,IQ} is causally sufficient.
S = {SES,PE,CP} is not causally sufficient.
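Given the full causal graph, the definition can be checked mechanically. This sketch assumes G contains every relevant variable, so that "variables not in S" means the vertices of G outside S; `is_causally_sufficient` is a hypothetical helper name.

```python
PARENTS = {"SEX": [], "SES": [], "IQ": ["SES"],
           "PE": ["SES", "SEX", "IQ"], "CP": ["PE", "SES", "IQ"]}


def is_causally_sufficient(s, parents=PARENTS):
    """S is causally sufficient (relative to G) if no variable outside S
    is a direct cause (parent) of more than one variable in S."""
    s = set(s)
    for v in parents:
        if v in s:
            continue
        n_children_in_s = sum(1 for c in s if v in parents[c])
        if n_children_in_s > 1:
            return False
    return True


print(is_causally_sufficient({"SES", "IQ"}))        # no outside common cause
print(is_causally_sufficient({"SES", "PE", "CP"}))  # IQ -> PE and IQ -> CP
```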
[Figure: DAG G]
Causal Markov Assumption
In a population Pop with distribution P and causal graph G, if V is causally sufficient, P(V) factors according to G.
P(SES,SEX,PE,IQ,CP) = P(SEX)P(SES)P(IQ|SES) P(PE|SES,SEX,IQ) P(CP|PE,SES,IQ)
[Figure: DAG G]
Representation of Manipulation
P(SES,SEX,PE=1,IQ,CP||PE=1) = P(SEX)P(SES)P(IQ|SES) * 1 * P(CP|PE=1,SES,IQ) = P(SES,SEX,PE=1,IQ,CP)/P(PE=1|SEX,SES,IQ)
(“||PE=1” denotes the manipulation that sets PE to 1.)
[Figure: causal graph for the manipulation of PE]
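The manipulated distribution drops the factor for PE and keeps every other factor unchanged. This sketch, with hypothetical binary variables and random conditional probabilities, checks the identity on the slide for every assignment and contrasts P(CP=1||PE=1) with the ordinary conditional P(CP=1|PE=1); the two generally differ because SES and IQ are common causes of PE and CP. Helper names are illustrative.

```python
import itertools
import random

PARENTS = {"SEX": [], "SES": [], "IQ": ["SES"],
           "PE": ["SES", "SEX", "IQ"], "CP": ["PE", "SES", "IQ"]}
ORDER = ["SEX", "SES", "IQ", "PE", "CP"]

rng = random.Random(1)  # hypothetical binary CPTs: P(X=1 | parent values)
CPT = {(v, pa): rng.uniform(0.1, 0.9)
       for v in ORDER for pa in itertools.product([0, 1], repeat=len(PARENTS[v]))}


def factor(v, a):
    """P(v = a[v] | parents of v as given in assignment a)."""
    p1 = CPT[(v, tuple(a[u] for u in PARENTS[v]))]
    return p1 if a[v] == 1 else 1.0 - p1


def observational(a):
    p = 1.0
    for v in ORDER:
        p *= factor(v, a)
    return p


def manipulated(a, target="PE", value=1):
    """P(a || target=value): the target's factor is replaced by 1 (0 off the set value)."""
    if a[target] != value:
        return 0.0
    p = 1.0
    for v in ORDER:
        if v != target:
            p *= factor(v, a)
    return p


ASSIGNMENTS = [dict(zip(ORDER, vals)) for vals in itertools.product([0, 1], repeat=5)]

# Identity from the slide: P(v || PE=1) = P(v, PE=1) / P(PE=1 | SEX, SES, IQ).
for a in ASSIGNMENTS:
    if a["PE"] == 1:
        assert abs(manipulated(a) - observational(a) / factor("PE", a)) < 1e-12

p_cp_do = sum(manipulated(a) for a in ASSIGNMENTS if a["CP"] == 1)
p_pe1 = sum(observational(a) for a in ASSIGNMENTS if a["PE"] == 1)
p_cp_given = sum(observational(a) for a in ASSIGNMENTS
                 if a["PE"] == 1 and a["CP"] == 1) / p_pe1
print(f"P(CP=1 || PE=1) = {p_cp_do:.4f}   P(CP=1 | PE=1) = {p_cp_given:.4f}")
```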
FCI Algorithm
Searches for the set of DAGs (possibly with latent variables and selection bias) that entail all and only the conditional independence relations that hold in the data, according to statistical tests.
Markov Equivalence
Two DAGs G1 and G2 are Markov equivalent when they contain the same variables, and for all disjoint X, Y, Z, X is entailed to be independent from Y conditional on Z in G1 if and only if X is entailed to be independent from Y conditional on Z in G2.
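Checking every conditional independence is expensive, but a standard characterization (Verma and Pearl) says two DAGs are Markov equivalent iff they have the same skeleton (adjacencies) and the same unshielded colliders A → C ← B with A and B non-adjacent. This sketch applies that test to G and two hypothetical variants; it assumes its inputs are DAGs and does not check acyclicity.

```python
from itertools import combinations

G = {"SEX": [], "SES": [], "IQ": ["SES"],
     "PE": ["SES", "SEX", "IQ"], "CP": ["PE", "SES", "IQ"]}


def skeleton(parents):
    return {frozenset((c, p)) for c, ps in parents.items() for p in ps}


def unshielded_colliders(parents):
    skel = skeleton(parents)
    out = set()
    for c, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                out.add((a, c, b))
    return out


def markov_equivalent(g1, g2):
    return (set(g1) == set(g2)
            and skeleton(g1) == skeleton(g2)
            and unshielded_colliders(g1) == unshielded_colliders(g2))


G_iq_ses = dict(G, SES=["IQ"], IQ=[])               # IQ -> SES instead of SES -> IQ
G_pe_sex = dict(G, SEX=["PE"], PE=["SES", "IQ"])    # PE -> SEX instead of SEX -> PE
print(markov_equivalent(G, G_iq_ses))  # same skeleton, same unshielded colliders
print(markov_equivalent(G, G_pe_sex))  # loses the colliders SES -> PE <- SEX, IQ -> PE <- SEX
```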
Markov Equivalence Class
[Figure: two Markov equivalent DAGs, G and G’, over SES, SEX, PE, CP, IQ]
Causal Faithfulness Assumption
In a population Pop with causal graph G and distribution P(V), if V is causally sufficient, IP(X,Y|Z) only if G entails I(X,Y|Z).
Example: ~IP(SES,CP|∅), because I(SES,CP|∅) is not entailed by G.
+…
[Figure: DAG G]
Causal Faithfulness Assumption
Causal Faithfulness is too strong because:
• consistency can be proved with assumptions about fewer conditional independencies;
• it is unlikely to hold, especially when there are many variables.
[Figure: DAG G]
Causal Faithfulness is too weak because it is not sufficient to prove uniform consistency (i.e., to put error bounds on the output at finite sample sizes).
Good Features of FCI Algorithm
• Is pointwise consistent: as sample size → ∞, P(error in output pattern) → 0.
• Can be applied to distributions for which tests of conditional independence are known.
• Can be applied to hidden variable models (and selection bias models).
Bad Features of FCI Algorithm
• There is no reliable way to set error bounds on the pattern without making stronger assumptions.
• Can only output a set of Markov equivalent DAGs, not a single DAG.
• Doesn’t allow for comparing how much better one model is than another.
• Needs to assume some version of the Causal Faithfulness Assumption.
Non-Independence Constraints
Depending on the parametric family, a DAG can entail constraints that are not conditional independence constraints.
Assuming linearity and non-Gaussian error terms, if a distribution is compatible with X → Y, it is not compatible with X ← Y, even though the two DAGs are Markov equivalent.
Score-Based Search Strategy
• Assign a score to each graph and sample, based on:
  • the maximum likelihood of the data given the graph;
  • the simplicity of the model.
• Search over graph space for the highest score.
Advantages of Score-Based Search Strategy
• Gets more information about the graph
  • Additive noise models: a unique DAG
• Doesn’t rely on binary decisions
  • Local mistakes don’t propagate
Disadvantages of Score-Based Search Strategy
• Often slower to calculate, or not known how to calculate exactly, if the model includes:
  • unmeasured variables
  • selection bias
  • unusual distributions
• Search over graph space is often heuristic
Dynamic Bayes Nets
If the same variable is measured at different times, the samples of that variable are not i.i.d.
Solution: index each variable by time (time series)
Dynamic Bayes Nets
Make a template for the causal structure that can be filled in with actual times
[Figure: template over Xt-2, Xt-1, Xt and Yt-2, Yt-1, Yt]
Continuous time or differential equations?
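Filling in a template with actual times amounts to copying each lagged edge to every time step at which both endpoints exist. The template edges in this sketch (X → X, Y → Y, and X → Y, each with lag 1) are a hypothetical choice, since the figure's edges are not recoverable, and time indices are written as integers rather than t-2, t-1, t.

```python
# Hypothetical template edges (parent, lag, child): parent at time t - lag -> child at t.
TEMPLATE = [("X", 1, "X"), ("Y", 1, "Y"), ("X", 1, "Y")]


def unroll(template, times):
    """Fill the template in with actual times, keeping edges whose parent time exists."""
    times = list(times)
    edges = []
    for t in times:
        for parent, lag, child in template:
            if t - lag in times:
                edges.append((f"{parent}{t - lag}", f"{child}{t}"))
    return edges


for parent, child in unroll(TEMPLATE, range(3)):
    print(parent, "->", child)
```

Because the same template is reused at every step, the unrolled graph has few free parameters even for long series, which is what lets the repeated measurements serve as data despite not being i.i.d.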
Population
[Figure: a population of units, each with causal graph G over SES, SEX, PE, CP, IQ, linked by parent-of relations]
Population
[Figure: units with causal graph G, linked by parent-of relations]
• Not i.i.d. distribution
• Violations of SUTVA
• Causal relations between relations (e.g., sibling causes rivalry)
Extended Manipulation Specification
A manipulation assigns a density to a set of properties or relations at a set of times (measurable set of times T) for a set of units, as a function of the values of a set of properties or relations at a set of times (measurable set of times T) for a set of units.
Extended Factorization Assumption
[Figure: Sue and Bob, each with causal graph G, linked by parent-of to Alice&Jim]
P(Alice&Jim.SES, Sue.SEX, Sue.PE, Sue.IQ, Sue.CP, Bob.SEX, Bob.PE, Bob.IQ, Bob.CP) =
P(Alice&Jim.SES)
P(Sue.SEX) P(Sue.IQ|Alice&Jim.SES) P(Sue.PE|Alice&Jim.SES, Sue.SEX, Sue.IQ) P(Sue.CP|Sue.PE, Alice&Jim.SES, Sue.IQ)
P(Bob.SEX) P(Bob.IQ|Alice&Jim.SES) P(Bob.PE|Alice&Jim.SES, Bob.SEX, Bob.IQ) P(Bob.CP|Bob.PE, Alice&Jim.SES, Bob.IQ)
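The extended factorization can be generated from the unit-level graph G by attaching SES to the family and every other variable to the child. In this sketch the split into family-level and child-level variables is an assumption read off the names above, and `ground_factors` is a hypothetical helper. Because Alice&Jim.SES is a single variable shared by both children, it contributes exactly one factor, and that shared parent is what makes Sue's and Bob's variables dependent, so the units are not i.i.d.

```python
# Template of G for one child; SES belongs to the family, the rest to the child.
TEMPLATE = {"SEX": [], "IQ": ["SES"], "PE": ["SES", "SEX", "IQ"], "CP": ["PE", "SES", "IQ"]}
FAMILY_LEVEL = {"SES"}


def ground_factors(family, kids):
    def name(var, kid):
        return f"{family}.{var}" if var in FAMILY_LEVEL else f"{kid}.{var}"

    factors = [f"P({family}.SES)"]       # shared family-level variable: a single factor
    for kid in kids:
        for var, parents in TEMPLATE.items():
            given = ", ".join(name(p, kid) for p in parents)
            factors.append(f"P({name(var, kid)}|{given})" if given else f"P({name(var, kid)})")
    return factors


print(" ".join(ground_factors("Alice&Jim", ["Sue", "Bob"])))
```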
3 Interpretations of Cycles: PE ⇆ CP
• Equilibrium values of PE and CP cause each other.
• Averages of the values of PE and CP while reaching equilibrium influence each other.
• A mixture of PE → CP and CP → PE.
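A sketch of the first interpretation only, assuming a hypothetical linear feedback system for one unit: PE is repeatedly set to a·CP + e_PE and CP to b·PE + e_CP. When |a·b| < 1 the iteration converges, and the equilibrium values solve both equations simultaneously, which is the sense in which the equilibrium values of PE and CP cause each other. The coefficients and exogenous inputs are illustrative assumptions.

```python
# Hypothetical linear feedback: PE <- a*CP + e_pe, CP <- b*PE + e_cp.
a, b = 0.5, 0.4          # |a*b| < 1, so the iteration below converges
e_pe, e_cp = 1.0, 2.0    # exogenous inputs for one unit

pe, cp = 0.0, 0.0
for _ in range(100):
    pe, cp = a * cp + e_pe, b * pe + e_cp   # both variables updated simultaneously

# Closed-form equilibrium: solve pe = a*cp + e_pe and cp = b*pe + e_cp jointly.
pe_eq = (e_pe + a * e_cp) / (1 - a * b)
cp_eq = (e_cp + b * e_pe) / (1 - a * b)
print(f"iterated: PE={pe:.6f} CP={cp:.6f}   closed form: PE={pe_eq:.6f} CP={cp_eq:.6f}")
```

Manipulating e_pe in this model shifts both equilibrium values, while the path taken to reach equilibrium is what the second interpretation is about.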
[Figure: causal graph over SES, SEX, PE, CP, IQ]