Penn State - March 23, 2007 1
The TETRAD Project: Computational Aids to
Causal Discovery
Peter Spirtes, Clark Glymour, Richard Scheines
and many others
Department of Philosophy
Carnegie Mellon
Agenda
1. Morning I: Theoretical Overview
Representation, Axioms, Search
2. Morning II: Research Problems
3. Afternoon: TETRAD Demo - Workshop
Part I: Agenda
1. Motivation
2. Representation
3. Connecting Causation to Probability (Independence)
4. Searching for Causal Models
5. Improving on Regression for Causal Inference
1. Motivation
Non-experimental Evidence
Typical Predictive Questions
• Can we predict aggressiveness from the amount of violent TV watched?
Causal Questions:
• Does watching violent TV cause Aggression?
• I.e., if we intervene to change TV watching, will the level of Aggression
change?
        Day Care    Aggressiveness
John    A lot       A lot
Mary    None        A little
Causal Estimation
Manipulated Probability P(Y | X set= x, Z=z)
from
Unmanipulated Probability P(Y,X,Z)
When and how can we use non-experimental data to tell us about the effect of an intervention?
Spartina in the Cape Fear Estuary
What Factors Directly Influence Spartina Growth in the Cape Fear Estuary?
pH, salinity, sodium, phosphorus, magnesium, ammonia, zinc, potassium…, what?
14 variables for 45 samples of Spartina from Cape Fear Estuary.
Biologist concluded salinity must be a factor.
Bayes net analysis says only pH directly affects Spartina biomass.
Biologist’s subsequent greenhouse experiment says: if pH is controlled for, variations in salinity do not affect growth; but if salinity is controlled for, variations in pH do affect growth.
2. Representation
1. Association & causal structure - qualitatively
2. Interventions
3. Statistical Causal Models
   1. Bayes Networks
   2. Structural Equation Models
Causation & Association
X is a cause of Y iff ∃ x1 ≠ x2 such that P(Y | X set= x1) ≠ P(Y | X set= x2)
Causation is asymmetric: X → Y does not imply Y → X
X and Y are associated, i.e. not independent, ¬(X _||_ Y), iff
∃ x1 ≠ x2 such that P(Y | X = x1) ≠ P(Y | X = x2)
Association is symmetric: ¬(X _||_ Y) ⇔ ¬(Y _||_ X)
Direct Causation
X is a direct cause of Y relative to S iff ∃ z, x1 ≠ x2 such that
P(Y | X set= x1, Z set= z) ≠ P(Y | X set= x2, Z set= z),
where Z = S - {X, Y}
[Graph: X → Y]
Causal Graphs
Causal Graph G = {V, E}
Each edge X → Y represents a direct causal claim: X is a direct cause of Y relative to V
[Example graphs for Chicken Pox: Exposure → Rash; Exposure → Infection → Rash]
Causal Graphs
Not Cause Complete
Common Cause Complete
Exposure Infection Symptoms
Omitted Causes
Exposure Infection Symptoms
Omitted
Common Causes
Modeling Ideal Interventions
Interventions on the Effect
[Figure: pre-experimental system and post-intervention system for Room Temperature and Sweaters On]
Modeling Ideal Interventions
Interventions on the Cause
[Figure: pre-experimental system and post-intervention system for Room Temperature and Sweaters On]
Ideal Interventions & Causal Graphs
• Model an ideal intervention by adding an “intervention” variable outside the original system
• Erase all arrows pointing into the variable intervened upon
Pre-intervention graph: Exp → Inf → Rash
Intervene to change Inf
Post-intervention graph? I → Inf → Rash, with the arrow Exp → Inf erased
Conditioning vs. Intervening
P(Y | X = x1) vs. P(Y | X set= x1)
Teeth Slides
Causal Bayes Networks
Causal graph: Smoking [0,1] → Yellow Fingers [0,1]; Smoking [0,1] → Lung Cancer [0,1]
P(S = 0) = .7    P(S = 1) = .3
P(YF = 0 | S = 0) = .99    P(LC = 0 | S = 0) = .95
P(YF = 1 | S = 0) = .01    P(LC = 1 | S = 0) = .05
P(YF = 0 | S = 1) = .20    P(LC = 0 | S = 1) = .80
P(YF = 1 | S = 1) = .80    P(LC = 1 | S = 1) = .20
The Joint Distribution Factors According to the Causal Graph:
P(S, YF, LC) = P(S) P(YF | S) P(LC | S)
i.e., P(V) = ∏ P(X | Immediate Causes of(X)), with the product taken over all X in V
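The factorization can be checked numerically. The following is a minimal sketch, assuming the probability tables shown on this slide; the dictionary layout and the function name joint are illustrative choices, not part of TETRAD.

```python
from itertools import product

# Conditional probability tables copied from the slide.
P_S = {0: 0.7, 1: 0.3}
# Keys are (yf, s) and (lc, s) respectively.
P_YF_GIVEN_S = {(0, 0): 0.99, (1, 0): 0.01, (0, 1): 0.20, (1, 1): 0.80}
P_LC_GIVEN_S = {(0, 0): 0.95, (1, 0): 0.05, (0, 1): 0.80, (1, 1): 0.20}


def joint(s, yf, lc):
    """P(S, YF, LC) = P(S) P(YF | S) P(LC | S)."""
    return P_S[s] * P_YF_GIVEN_S[(yf, s)] * P_LC_GIVEN_S[(lc, s)]


# The eight joint probabilities should sum to 1.
total = sum(joint(s, yf, lc) for s, yf, lc in product((0, 1), repeat=3))
print(round(total, 10))

# P(S = 1, YF = 1, LC = 1) = .3 * .80 * .20
print(round(joint(1, 1, 1), 4))
```

Three small tables (2 + 4 + 4 entries) stand in for the full eight-entry joint table, which is the practical payoff of the factorization.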
Structural Equation Models
Causal Graph: Education → Income; Education → Longevity
Statistical Model:
1. Structural Equations
2. Statistical Constraints
Structural Equation Models
Structural Equations: One equation for each variable V in the graph:
V = f(parents(V), error_V)
For SEM (linear regression), f is a linear function.
Statistical Constraints: Joint distribution over the error terms
Causal Graph: Education → Income; Education → Longevity
Structural Equation Models
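Equations:
Education = ε_ed
Income = β1 Education + ε_Income
Longevity = β2 Education + ε_Longevity
Statistical Constraints: (ε_ed, ε_Income, ε_Longevity) ~ N(0, Σ²)
Σ² diagonal - no variance is zero
Causal Graph: Education → Income; Education → Longevity
SEM Graph (path diagram): Education → Income with coefficient β1; Education → Longevity with coefficient β2; error terms ε_Income → Income and ε_Longevity → Longevity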
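As an illustration of how such a model generates data, here is a minimal simulation sketch. The coefficient values (0.5 and 0.3), the unit error variances, and the function names are assumptions made for the example; the slides do not give numerical values.

```python
import random


def simulate_sem(n, beta1, beta2, seed=0):
    """Draw n cases from Education -> Income, Education -> Longevity.

    Error terms are independent standard normals (an assumption for
    this sketch; the slide only requires a diagonal covariance).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        education = rng.gauss(0.0, 1.0)                     # Education = e_ed
        income = beta1 * education + rng.gauss(0.0, 1.0)    # + e_Income
        longevity = beta2 * education + rng.gauss(0.0, 1.0) # + e_Longevity
        rows.append((education, income, longevity))
    return rows


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rows = simulate_sem(20000, beta1=0.5, beta2=0.3)
education = [r[0] for r in rows]
income = [r[1] for r in rows]
longevity = [r[2] for r in rows]

# With independent errors, regressing each effect on Education
# should recover the structural coefficient approximately.
slope_income = ols_slope(education, income)
slope_longevity = ols_slope(education, longevity)
print(slope_income, slope_longevity)
```

Income and Longevity come out correlated in the simulated data even though neither causes the other, because Education is their common cause.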
3. Connecting Causation to Probability
Causal Structure
Statistical Predictions
The Markov Condition
Causal Graphs
Z Y X
Independence
X _||_ Z | Y
i.e.,
P(X | Y) = P(X | Y, Z)
Causal Markov Axiom
Causal Markov Axiom
If G is a causal graph, and P a probability distribution over the variables in G, then in P:
every variable V is independent of its non-effects, conditional on its immediate causes.
Causal Markov Condition
Two Intuitions:
1) Immediate causes make effects independent of remote causes (Markov).
2) Common causes make their effects independent (Salmon).
Causal Markov Condition
1) Immediate causes make effects independent of remote causes (Markov).
E = Exposure to Chicken Pox
I = Infected
S = Symptoms
Graph: E → I → S
Markov Cond.: E _||_ S | I
Causal Markov Condition
2) Effects are independent conditional on their common causes.
Smoking (S)
Yellow Fingers (YF)
Lung Cancer (LC)
Graph: YF ← S → LC
Markov Cond.: YF _||_ LC | S
Causal Structure → Statistical Data
Acyclic Causal Graph over X1, X2, X3 → Causal Markov Axiom (D-separation) → Independence Relations: X1 _||_ X3 | X2
Causal Markov Axiom
In SEMs, d-separation follows from assuming independence among error terms that have no connection in the path diagram -
i.e., assuming that the model is common cause complete.
Causal Markov and D-Separation
• In acyclic graphs: equivalent
• Cyclic Linear SEMs with uncorrelated errors:
  • D-separation correct
  • Markov condition incorrect
• Cyclic Discrete Variable Bayes Nets:
  • If equilibrium --> d-separation correct
  • Markov incorrect
D-separation: Conditioning vs. Intervening
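[Figure: two graphs over X1, X2, X3 and T; in the second, an intervention variable I sets X2]
Conditioning:
P(X3 | X2) ≠ P(X3 | X2, X1)
X3 and X1 are not independent given X2: ¬(X3 _||_ X1 | X2)
Intervening:
P(X3 | X2 set=) = P(X3 | X2 set=, X1)
X3 _||_ X1 | X2 set=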
4. Search
From Statistical Data
to Probability to Causation
Causal Discovery
Statistical Data → Causal Structure
Data → (Statistical Inference) → Independence Relations: X1 _||_ X3 | X2
Background Knowledge:
- X2 before X3
- no unmeasured common causes
Independence Relations + Background Knowledge → (Discovery Algorithm, Causal Markov Axiom (D-separation)) → Equivalence Class of Causal Graphs over X1, X2, X3
Faithfulness
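[Figure: one causal graph over X1, X2, X3 entails the independence relation X1 _||_ X3 | X2 by the Causal Markov Axiom (D-separation). A second graph over X1, X2, X3 entails No Independence Relations by the Causal Markov Axiom (D-separation); independence relations can then arise only from Special Parameter Values.]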
Faithfulness Assumption
Statistical Constraints arise from Causal Structure, not Coincidence
All independence relations holding in a probability distribution P generated by a causal structure G are entailed by d-separation applied to G.
Faithfulness Assumption
Revenues = a Rate + c Economy + ε_Rev
Economy = b Rate + ε_Econ
a ≠ -bc
[Path diagram: Tax Rate → Tax Revenues (a); Tax Rate → Economy (b); Economy → Tax Revenues (c)]
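To see why the condition a ≠ -bc matters, the sketch below simulates the two equations with hypothetical coefficients. When a = -bc, the direct path and the path through Economy cancel, so Tax Rate and Revenues look uncorrelated even though Tax Rate is a cause of Revenues. All numerical values, the unit error variances, and the function names are assumptions for illustration.

```python
import random


def simulate_tax(n, a, b, c, seed=0):
    """Simulate Economy = b*Rate + e_Econ and Revenues = a*Rate + c*Economy + e_Rev."""
    rng = random.Random(seed)
    rates, revenues = [], []
    for _ in range(n):
        rate = rng.gauss(0.0, 1.0)
        economy = b * rate + rng.gauss(0.0, 1.0)
        revenue = a * rate + c * economy + rng.gauss(0.0, 1.0)
        rates.append(rate)
        revenues.append(revenue)
    return rates, revenues


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


b, c = -0.5, 0.8

# Unfaithful case: a = -b*c, so the total effect a + b*c is zero.
corr_unfaithful = correlation(*simulate_tax(20000, a=-b * c, b=b, c=c))

# Faithful case: same a, but b has the opposite sign, so nothing cancels.
corr_faithful = correlation(*simulate_tax(20000, a=0.4, b=0.5, c=c))

print(corr_unfaithful, corr_faithful)
```

A search procedure that assumes faithfulness would read the near-zero correlation in the first case as the absence of a causal connection, which is why exact cancellations are ruled out by assumption.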
Representations of D-separation Equivalence Classes
We want the representations to:
• Characterize the Independence Relations Entailed by the Equivalence Class
• Represent causal features that are shared by every member of the equivalence class
Patterns & PAGs
• Patterns (Verma and Pearl, 1990): graphical representation of an acyclic d-separation equivalence class - no latent variables.
• PAGs (Richardson, 1994): graphical representation of an equivalence class of models, including latent variable models and models with sample selection bias, that are d-separation equivalent over a set of measured variables X.
Patterns
Possible Edges: X1   X2 (no edge); X1 --- X2 (undirected); X1 → X2 (directed)
[Example: a pattern over X1, X2, X3, X4]
Patterns: What the Edges Mean
X1   X2 (no edge): X1 and X2 are not adjacent in any member of the equivalence class.
X1 --- X2: X1 → X2 in some members of the equivalence class, and X2 → X1 in others.
X1 → X2: X1 → X2 (X1 is a cause of X2) in every member of the equivalence class.
Patterns
[Figure: a Pattern over X1, X2, X3, X4 and the causal graphs it Represents]
PAGs: Partial Ancestral Graphs
What PAG edges mean:
X1   X2 (no edge): X1 and X2 are not adjacent
X1 o-o X2: No set d-separates X2 and X1
X1 o→ X2: X2 is not an ancestor of X1
X1 → X2: X1 is a cause of X2
X1 ↔ X2: There is a latent common cause of X1 and X2
PAGs: Partial Ancestral Graphs
[Figure: a PAG over X1, X2, X3 and the graphs it Represents, including graphs with latent variables T1 and T2, etc.]
Overview of Search Methods
• Constraint Based Searches
  • TETRAD
• Scoring Searches
  • Scores: BIC, AIC, etc.
  • Search: Hill Climb, Genetic Alg., Simulated Annealing
  • Difficult to extend to latent variable models
Heckerman, Meek and Cooper (1999). “A Bayesian Approach to Causal Discovery”, ch. 4 in Computation, Causation, and Discovery, ed. by Glymour and Cooper, MIT Press, pp. 141-166.
Search - Illustration
Causal graph: X1 → X3 ← X2, X3 → X4
Independencies entailed:
X1 _||_ X2
X1 _||_ X4 | X3
X2 _||_ X4 | X3
Search: Adjacency
Causal Graph: X1 → X3 ← X2, X3 → X4
Independencies:
X1 _||_ X2
X1 _||_ X4 | {X3}
X2 _||_ X4 | {X3}
Begin with: the complete undirected graph over X1, X2, X3, X4
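Causal Graph: X1 → X3 ← X2, X3 → X4
Independencies:
X1 _||_ X2
X1 _||_ X4 | {X3}
X2 _||_ X4 | {X3}
Begin with: the complete undirected graph over X1, X2, X3, X4
From X1 _||_ X2: remove the edge X1 --- X2
From X1 _||_ X4 | {X3}: remove the edge X1 --- X4
From X2 _||_ X4 | {X3}: remove the edge X2 --- X4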
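The edge-removal steps above can be sketched in code. This is a minimal illustration in the style of a constraint-based adjacency search (similar in spirit to the adjacency phase of the PC algorithm), not TETRAD's implementation. The independence "oracle" is simply a lookup of the three independencies listed on the slide; the names adjacency_search, independent, and SLIDE_INDEPENDENCIES are hypothetical.

```python
from itertools import combinations


def adjacency_search(variables, independent):
    """Start from the complete undirected graph and remove edges.

    independent(x, y, cond) is an oracle returning True when x and y
    are independent given the frozenset cond.
    Returns the remaining edges and the separating set found per pair.
    """
    adj = {v: set(variables) - {v} for v in variables}
    sepset = {}
    depth = 0
    # Grow the conditioning-set size until no variable has enough neighbours.
    while any(len(adj[x]) - 1 >= depth for x in variables):
        for x in variables:
            for y in sorted(adj[x]):
                # Condition on subsets of x's other current neighbours.
                for cond in combinations(sorted(adj[x] - {y}), depth):
                    if independent(x, y, frozenset(cond)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = frozenset(cond)
                        break
        depth += 1
    edges = {frozenset((x, y)) for x in variables for y in adj[x]}
    return edges, sepset


# The three independencies from the slide, as (pair, conditioning set).
SLIDE_INDEPENDENCIES = {
    (frozenset(("X1", "X2")), frozenset()),
    (frozenset(("X1", "X4")), frozenset(("X3",))),
    (frozenset(("X2", "X4")), frozenset(("X3",))),
}


def independent(x, y, cond):
    return (frozenset((x, y)), cond) in SLIDE_INDEPENDENCIES


edges, sepset = adjacency_search(["X1", "X2", "X3", "X4"], independent)
print(sorted(tuple(sorted(e)) for e in edges))
```

In practice the oracle would be replaced by a statistical test of conditional independence on data; the recorded separating sets are what the orientation phase uses next.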
Search: Orientation in Patterns
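Before Orientation: X --- Y --- Z, with X and Z not adjacent (Y Unshielded)
Collider, when X and Z are dependent given Y, ¬(X _||_ Z | Y):
X → Y ← Z
Non-collider, when X _||_ Z | Y:
X → Y → Z
X ← Y ← Z
X ← Y → Z
represented in the pattern as X --- Y --- Z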
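The collider rule can be sketched as follows. This is an illustrative implementation under stated assumptions: the adjacencies and separating sets are typed in by hand to match the X1, X2, X3, X4 example from the adjacency slides, and the function name orient_colliders is hypothetical.

```python
from itertools import combinations


def orient_colliders(edges, sepset):
    """Orient X -> Y <- Z for each unshielded triple X - Y - Z in which
    Y is NOT in the set that separated X and Z.

    edges:  set of frozenset pairs (undirected adjacencies)
    sepset: dict mapping a non-adjacent pair to its separating set
    Returns a set of directed edges as (tail, head) tuples.
    """
    nodes = sorted({v for e in edges for v in e})
    directed = set()
    for y in nodes:
        neighbours = [v for v in nodes if frozenset((v, y)) in edges]
        for x, z in combinations(neighbours, 2):
            if frozenset((x, z)) in edges:
                continue  # shielded triple: the rule does not apply
            sep = sepset.get(frozenset((x, z)))
            if sep is not None and y not in sep:
                directed.add((x, y))
                directed.add((z, y))
    return directed


# Adjacencies and separating sets from the adjacency-search example.
edges = {
    frozenset(("X1", "X3")),
    frozenset(("X2", "X3")),
    frozenset(("X3", "X4")),
}
sepset = {
    frozenset(("X1", "X2")): frozenset(),
    frozenset(("X1", "X4")): frozenset(("X3",)),
    frozenset(("X2", "X4")): frozenset(("X3",)),
}

colliders = orient_colliders(edges, sepset)
print(sorted(colliders))
```

Only X1 → X3 ← X2 is oriented here: X3 is absent from the set separating X1 and X2, but present in the sets separating X1 from X4 and X2 from X4, so those triples are non-colliders.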
Search: Orientation
Independencies:
X1 _||_ X2
X1 _||_ X4 | X3
X2 _||_ X4 | X3
After Orientation Phase:
[Figure: the resulting Pattern and PAG over X1, X2, X3, X4]
The theory of interventions, simplified
Start with a graphical causal model, without feedback.
Simplest Problem: To predict the probability distribution of other represented variables resulting from an intervention that forces a value x on a variable X, (e.g., everybody has to smoke) but does not otherwise alter the causal structure.
First Thing
Remember: The probability distribution for values of Y conditional on X = x is not in general the same as the probability distribution for values of Y on an intervention that sets X = x.
Recent work by Waldmann gives evidence that adults are sensitive to the difference.
Example
[Graph: X → Y → Z → W, with X → W]
Because X influences Y, the value of X gives information about the value of Y, and vice versa. X and Y are dependent in probability.
But: An intervention that forces a value Y = y on Y, and otherwise does not disturb the system should not change the probability distribution for values of X.
It should, necessarily, make the value of Y independent of X—informally, the value of Y should give no information about the value of X, and vice-versa.
Representing a Simple Manipulation
Observed Structure: Smoking [0,1] → Yellow Fingers [0,1]; Smoking [0,1] → Lung Cancer [0,1]
Structure upon Manipulating Yellow Fingers: Smoking [0,1] → Lung Cancer [0,1]; Yellow Fingers = 0, with the edge from Smoking into Yellow Fingers removed
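A small calculation makes the contrast between conditioning and manipulating concrete. The sketch below reuses the probability tables from the Causal Bayes Networks slide; the function names are illustrative. Observing Yellow Fingers = 0 changes the probability of Lung Cancer, whereas setting Yellow Fingers = 0 leaves it at its marginal value, because the manipulation removes the edge from Smoking.

```python
# Probability tables from the Causal Bayes Networks slide.
P_S = {0: 0.7, 1: 0.3}
# Keys are (yf, s) and (lc, s) respectively.
P_YF_GIVEN_S = {(0, 0): 0.99, (1, 0): 0.01, (0, 1): 0.20, (1, 1): 0.80}
P_LC_GIVEN_S = {(0, 0): 0.95, (1, 0): 0.05, (0, 1): 0.80, (1, 1): 0.20}


def p_lc_given_yf(lc, yf):
    """Conditioning: P(LC = lc | YF = yf) in the observed structure."""
    numerator = sum(
        P_S[s] * P_YF_GIVEN_S[(yf, s)] * P_LC_GIVEN_S[(lc, s)] for s in (0, 1)
    )
    denominator = sum(P_S[s] * P_YF_GIVEN_S[(yf, s)] for s in (0, 1))
    return numerator / denominator


def p_lc_given_set_yf(lc, yf):
    """Intervening: P(LC = lc | YF set= yf).

    After surgery YF has no parents, so the forced value yf carries no
    information about Smoking and drops out of the calculation.
    """
    return sum(P_S[s] * P_LC_GIVEN_S[(lc, s)] for s in (0, 1))


observed = p_lc_given_yf(1, 0)         # about 0.062
manipulated = p_lc_given_set_yf(1, 0)  # 0.095, the marginal P(LC = 1)
print(round(observed, 3), round(manipulated, 3))
```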
Intervention Calculations
[Graph: X → Y → Z → W, with X → W]
1. Set Y = y
2. Do “surgery” on the graph: eliminate edges into Y
3. Use the Markov factorization of the resulting graph and probability distribution to compute the probability distribution for X, Z, W—various effective rules incorporated in what Pearl calls the “Do” calculus.
Intervention Calculations
[Graph after setting Y = y: Y → Z → W, with X → W; the edge X → Y is removed]
Original Markov Factorization
Pr(X, Y, Z, W) = Pr(W | X, Z) Pr(Z | Y) Pr(Y | X) Pr(X)
The Factorization After Intervention
Pr(X, Z, W | Do(Y = y)) = Pr(W | X, Z) Pr(Z | Y = y) Pr(X)
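The two factorizations can be compared numerically. The sketch below uses hypothetical binary probability tables for the graph X → Y → Z → W with X → W (the slides give no numbers for this example) and checks two consequences of the surgery: the post-intervention distribution is a proper distribution, and the marginal of X is unchanged by an intervention on Y.

```python
from itertools import product

# Hypothetical tables, chosen only for illustration.
P_X1 = 0.4                                   # Pr(X = 1)
P_Y1_GIVEN_X = {0: 0.2, 1: 0.7}              # Pr(Y = 1 | X = x)
P_Z1_GIVEN_Y = {0: 0.1, 1: 0.8}              # Pr(Z = 1 | Y = y)
P_W1_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5,   # Pr(W = 1 | X = x, Z = z)
                 (1, 0): 0.4, (1, 1): 0.9}

BINARY = (0, 1)


def bern(p_one, value):
    """Probability of `value` for a binary variable with Pr(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def observational(x, y, z, w):
    """Pr(X, Y, Z, W) = Pr(W | X, Z) Pr(Z | Y) Pr(Y | X) Pr(X)."""
    return (bern(P_W1_GIVEN_XZ[(x, z)], w) * bern(P_Z1_GIVEN_Y[y], z)
            * bern(P_Y1_GIVEN_X[x], y) * bern(P_X1, x))


def after_do_y(x, z, w, y):
    """Pr(X, Z, W | Do(Y = y)): the factor Pr(Y | X) is dropped."""
    return (bern(P_W1_GIVEN_XZ[(x, z)], w) * bern(P_Z1_GIVEN_Y[y], z)
            * bern(P_X1, x))


# The post-intervention probabilities sum to 1.
total_after = sum(after_do_y(x, z, w, 1) for x, z, w in product(BINARY, repeat=3))

# Intervening on Y leaves the marginal of X unchanged ...
p_x1_after = sum(after_do_y(1, z, w, 1) for z, w in product(BINARY, repeat=2))

# ... whereas observing Y = 1 changes it.
p_y1 = sum(observational(x, 1, z, w) for x, z, w in product(BINARY, repeat=3))
p_x1_given_y1 = sum(observational(1, 1, z, w)
                    for z, w in product(BINARY, repeat=2)) / p_y1

print(round(total_after, 10), round(p_x1_after, 4), round(p_x1_given_y1, 4))
```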
What’s The Point?
Pr(X, Z, W | Do(Y = y)) = Pr(W | X, Z) Pr(Z | Y = y) Pr(X)
The probability distribution on the left hand side is a prediction of the effects of an intervention.
The probabilities on the right are all known before the intervention.
So causal structure plus probabilities => prediction of intervention effects; provide a basis for planning.
Surprising Results
[Graph: X → Y → Z → W, with X → W]
Suppose we know:
the causal structure
the joint probability for Y, Z, W only—not for X
We CAN predict the effect on Z of an intervention on Y—even though Y and W are confounded by unobserved X.
Surprising Result
[Graph: X → Y → Z → W, with X → W]
The effect of Z on W is confounded by the probabilistic effect of Y on Z, X on Y and X on W.
But the probabilistic effect on W of an intervention on Z CAN be computed from the probabilities before the intervention.
How? By conditioning on Y.
Surprising Result
[Graph: X → Y → Z → W, with X → W]
Pr(W, Y, X | Do(Z = z)) = Pr(W | Z = z, X) Pr(Y | X) Pr(X) (by surgery)
Pr(W, X | Do(Z = z), Y) = Pr(W | Z = z, X) Pr(X | Y) (condition on Y)
Pr(W | Do(Z = z), Y) = Σ_x Pr(W | Z = z, X = x, Y) Pr(X = x | Y) (marginalize out X)
Pr(W | Do(Z = z), Y) = Pr(W | Z = z, Y) (obscure probability theorem)
Pr(W | Do(Z = z)) = Σ_y Pr(W | Z = z, Y = y) Pr(Y = y) (marginalize out Y)
The right hand side is composed entirely of observed probabilities.
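The derivation can be verified numerically. The sketch below uses hypothetical binary probability tables for X → Y → Z → W with X → W and computes Pr(W = 1 | Do(Z = 1)) in two ways: by surgery on the full model, which uses the unobserved X, and by the final formula above, which uses only the joint distribution of Y, Z, W. The tables and function names are assumptions for illustration.

```python
from itertools import product

# Hypothetical tables, chosen only for illustration.
P_X1 = 0.4                                   # Pr(X = 1)
P_Y1_GIVEN_X = {0: 0.2, 1: 0.7}              # Pr(Y = 1 | X = x)
P_Z1_GIVEN_Y = {0: 0.1, 1: 0.8}              # Pr(Z = 1 | Y = y)
P_W1_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5,   # Pr(W = 1 | X = x, Z = z)
                 (1, 0): 0.4, (1, 1): 0.9}

BINARY = (0, 1)


def bern(p_one, value):
    """Probability of `value` for a binary variable with Pr(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(x, y, z, w):
    return (bern(P_W1_GIVEN_XZ[(x, z)], w) * bern(P_Z1_GIVEN_Y[y], z)
            * bern(P_Y1_GIVEN_X[x], y) * bern(P_X1, x))


def observed(y, z, w):
    """Joint over the measured variables Y, Z, W: X is summed out."""
    return sum(joint(x, y, z, w) for x in BINARY)


def do_z_by_surgery(w, z):
    """Uses X: drop the factor Pr(Z | Y), then sum out X and Y."""
    return sum(bern(P_W1_GIVEN_XZ[(x, z)], w) * bern(P_Y1_GIVEN_X[x], y)
               * bern(P_X1, x) for x, y in product(BINARY, repeat=2))


def do_z_by_adjustment(w, z):
    """Uses only observed(): sum over y of Pr(W | Z = z, Y = y) Pr(Y = y)."""
    total = 0.0
    for y in BINARY:
        p_y = sum(observed(y, z2, w2) for z2, w2 in product(BINARY, repeat=2))
        p_yz = sum(observed(y, z, w2) for w2 in BINARY)
        total += observed(y, z, w) / p_yz * p_y
    return total


def naive_conditioning(w, z):
    """Pr(W = w | Z = z), which ignores the confounding."""
    p_z = sum(observed(y, z, w2) for y, w2 in product(BINARY, repeat=2))
    return sum(observed(y, z, w) for y in BINARY) / p_z


by_surgery = do_z_by_surgery(1, 1)
by_adjustment = do_z_by_adjustment(1, 1)
naive = naive_conditioning(1, 1)
print(round(by_surgery, 4), round(by_adjustment, 4), round(naive, 4))
```

With these tables the surgery and adjustment results agree, while plain conditioning on Z gives a different number, which is the confounding the slide describes.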
Pearl’s “Do” Calculus
Provides rules that permit one to avoid the probability calculations we just went through—graphical properties determine whether effects of an intervention can be predicted.
Applications
• Spartina Grass
• Parenting among Single, Black Mothers
• Pneumonia
• Photosynthesis
• Lead - IQ
• College Retention
• Corn Exports
• Rock Classification
• College Plans
• Political Exclusion
• Satellite Calibration
• Naval Readiness
References
• Causation, Prediction, and Search, 2nd Edition (2000), by P. Spirtes, C. Glymour, and R. Scheines, MIT Press
• Computation, Causation, & Discovery (1999), edited by C. Glymour and G. Cooper, MIT Press
• Causality in Crisis?, (1997) V. McKim and S. Turner (eds.), Univ. of Notre Dame Press.
• TETRAD IV: www.phil.cmu.edu/projects/tetrad
• Web Course on Causal and Statistical Reasoning : www.phil.cmu.edu/projects/csr/
• Causality Lab: www.phil.cmu.edu/projects/causality-lab