Theory-based causal induction
Tom Griffiths
Brown University
Josh Tenenbaum
MIT
Three kinds of causal induction
Three kinds of causal induction
contingency data
“To what extent does C cause E?” (rate on a scale from 0 to 100)

                   E present (e+)    E absent (e-)
C present (c+)           a                 b
C absent (c-)            c                 d
Three kinds of causal induction
contingency data physical systems
A B
The stick-ball machine
(Kushnir, Schulz, Gopnik, & Danks, 2003)
Three kinds of causal induction
contingency data physical systems perceived causality
Michotte (1963)
Michotte (1963)
Three kinds of causal induction
contingency data physical systems perceived causality
bottom-up covariation information
top-down mechanism knowledge
object physics module
Three kinds of causal induction
contingency data physical systems perceived causality
less constrained → more constrained
more data → less data
prior knowledge + statistical inference
Theory-based causal induction
Theory
Bayesianinference
X Y
Z
X Y
Z
X Y
Z
X Y
Z
Hypothesis space
generates
Case  X  Y  Z
 1    1  0  1
 2    0  1  1
 3    1  1  1
 4    0  0  0
...
generates
Data
An analogy to language
Theory
X Y
Z
X Y
Z
X Y
Z
X Y
Z
Hypothesis space
generates
Case  X  Y  Z
 1    1  0  1
 2    0  1  1
 3    1  1  1
 4    0  0  0
...
generates
Data
Grammar
Parse trees
generates
Sentence
generates
The quick brown fox …
Outline
contingency data physical systems perceived causality
Outline
contingency data physical systems perceived causality
“To what extent does C cause E?” (rate on a scale from 0 to 100)

                   E present (e+)    E absent (e-)
C present (c+)           a                 b
C absent (c-)            c                 d
Buehner & Cheng (1997)
“To what extent does the chemical cause gene expression?” (rate on a scale from 0 to 100)

                   E present (e+)    E absent (e-)
C present (c+)           6                 2
C absent (c-)            4                 4

Chemical → Gene
Humans
Buehner & Cheng (1997)
• Showed participants all combinations of P(e+|c+) and P(e+|c-) in increments of 0.25
Humans
Buehner & Cheng (1997)
• Showed participants all combinations of P(e+|c+) and P(e+|c-) in increments of 0.25
• Curious phenomenon, the “frequency illusion”: why do people’s judgments change when the cause does not change the probability of the effect?
Causal graphical models
• Framework for representing, reasoning, and learning about causality (also called Bayes nets)
(Pearl, 2000; Spirtes, Glymour, & Scheines, 1993)
• Becoming widespread in psychology
(Glymour, 2001; Gopnik et al., 2004; Lagnado & Sloman, 2002; Tenenbaum & Griffiths, 2001; Steyvers et al., 2003; Waldmann & Martignon, 1998)
Causal graphical models
X Y
Z
• Variables
Causal graphical models
X Y
Z
• Variables
• Structure
Causal graphical models
X Y
Z
• Variables
• Structure
• Conditional probabilities
P(X)  P(Y)  P(Z|X,Y)
Defines probability distribution over variables (for both observation and intervention)
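A minimal sketch (not from the slides) of how the three components define that joint distribution for the graph X → Z ← Y; the probability values are illustrative assumptions:

```python
from itertools import product

P_X = {1: 0.3, 0: 0.7}   # P(X) -- illustrative values
P_Y = {1: 0.4, 0: 0.6}   # P(Y)
# P(Z = 1 | X, Y), indexed by (x, y)
P_Z1 = {(0, 0): 0.1, (1, 0): 0.8, (0, 1): 0.7, (1, 1): 0.95}

def joint(x, y, z):
    """P(X=x, Y=y, Z=z) factorizes along the graph structure."""
    pz1 = P_Z1[(x, y)]
    return P_X[x] * P_Y[y] * (pz1 if z == 1 else 1 - pz1)

# The factorization defines a proper distribution: the eight joint
# probabilities sum to 1.
total = sum(joint(x, y, z) for x, y, z in product([0, 1], repeat=3))
assert abs(total - 1.0) < 1e-9
```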
Causal graphical models
• Provide a basic framework for representing causal systems
• But… where is the prior knowledge?
Hamadeh et al. (2002) Toxicological sciences.
Clofibrate Wyeth 14,643 Gemfibrozil Phenobarbital
p450 2B1 Carnitine Palmitoyl Transferase 1
(chemicals × genes)
Clofibrate Wyeth 14,643 Gemfibrozil Phenobarbital
p450 2B1 Carnitine Palmitoyl Transferase 1
X
Hamadeh et al. (2002) Toxicological sciences.
(chemicals × genes)
Clofibrate Wyeth 14,643 Gemfibrozil Phenobarbital
p450 2B1 Carnitine Palmitoyl Transferase 1
Chemical X
+++
peroxisome proliferators
Hamadeh et al. (2002) Toxicological sciences.
(chemicals × genes)
Beyond causal graphical models
• Prior knowledge produces expectations about:
– types of entities
– plausible relations
– functional form
• This cannot be captured by graphical models
A theory consists of three interrelated components: a set of phenomena that are in its domain, the causal laws and other
explanatory mechanisms in terms of which the phenomena are accounted for, and the concepts in terms of which the phenomena
and explanatory apparatus are expressed. (Carey, 1985)
Component of theory:          Generates:
• Ontology                    • Variables
• Plausible relations         • Structure
• Functional form             • Conditional probabilities
A causal theory is a hypothesis space generator
P(h|data) ∝ P(data|h) P(h)
Hypotheses are evaluated by Bayesian inference
Theory-based causal induction
• Ontology
– Types: Chemical, Gene, Mouse
– Predicates:
Injected(Chemical,Mouse)
Expressed(Gene,Mouse)
Theory
B → E ← C
E = 1 if effect occurs (mouse expresses gene), else 0
C = 1 if cause occurs (mouse is injected), else 0
• Plausible relations
– For any Chemical c and Gene g, with prior probability p:
For all Mice m, Injected(c,m) → Expressed(g,m)
Theory
P(Graph 1) = p    P(Graph 0) = 1 - p
No hypotheses with E → C, B → C, C → B, ….
Graph 1: B → E ← C    Graph 0: B → E
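A sketch of the "theory as hypothesis-space generator" idea: each candidate Injected → Expressed relation is included with prior probability p, and enumerating inclusion/exclusion of every candidate edge yields the graph hypotheses and their priors. The entity names and the value of p are illustrative assumptions:

```python
from itertools import product

p = 0.2                                  # assumed prior per relation
chemicals, genes = ["c1", "c2"], ["g1"]  # illustrative entities
candidates = [(c, g) for c in chemicals for g in genes]

hypotheses = []
for include in product([False, True], repeat=len(candidates)):
    edges = tuple(e for e, keep in zip(candidates, include) if keep)
    # each candidate edge is present with probability p, absent with 1-p
    prior = p ** len(edges) * (1 - p) ** (len(candidates) - len(edges))
    hypotheses.append((edges, prior))

# The theory generates 2^(number of candidate relations) graphs,
# and their priors sum to 1.
assert len(hypotheses) == 2 ** len(candidates)
assert abs(sum(pr for _, pr in hypotheses) - 1.0) < 1e-12
```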
• Ontology
– Types: Chemical, Gene, Mouse
– Predicates:
Injected(Chemical,Mouse)
Expressed(Gene,Mouse)
• Plausible relations
– For any Chemical c and Gene g, with prior probability p:
For all Mice m, Injected(c,m) → Expressed(g,m)
• Functional form of causal relations
Theory
Functional form
• Structures: Graph 1 = [B → E ← C], Graph 0 = [B → E]
• Parameterization (Generic):

  C  B    Graph 1: P(E=1|C,B)    Graph 0: P(E=1|C,B)
  0  0           p00                    p0
  1  0           p10                    p0
  0  1           p01                    p1
  1  1           p11                    p1
Functional form
• Structures: Graph 1 = [B → E ← C], Graph 0 = [B → E]
• Parameterization (“Noisy-OR”):
  w0, w1: strength parameters for B, C

  C  B    Graph 1: P(E=1|C,B)    Graph 0: P(E=1|C,B)
  0  0            0                      0
  1  0           w1                      0
  0  1           w0                     w0
  1  1    w1 + w0 - w1 w0               w0
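The Noisy-OR parameterization can be sketched directly: each present cause independently fails to produce the effect, so the failure probabilities multiply. The strength values here are illustrative:

```python
# Noisy-OR: P(E=1 | C, B) = 1 - (1 - w0)**B * (1 - w1)**C,
# i.e. the effect is absent only if every present cause fails.
def noisy_or(c, b, w0, w1):
    return 1 - (1 - w0) ** b * (1 - w1) ** c

w0, w1 = 0.2, 0.5  # illustrative strengths for B and C
table = {(c, b): noisy_or(c, b, w0, w1) for c in (0, 1) for b in (0, 1)}

assert table[(0, 0)] == 0                                 # no causes, no effect
assert abs(table[(1, 1)] - (w1 + w0 - w1 * w0)) < 1e-12   # both present
```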
• Ontology
– Types: Chemical, Gene, Mouse
– Predicates:
Injected(Chemical,Mouse)
Expressed(Gene,Mouse)
• Constraints on causal relations
– For any Chemical c and Gene g, with prior probability p:
For all Mice m, Injected(c,m) → Expressed(g,m)
• Functional form of causal relations
– Causes of Expressed(g,m) are independent probabilistic mechanisms, with causal strengths wi. An independent background cause is always present with strength w0.
Theory
Evaluating a causal relationship
P(Graph 1) = p    P(Graph 0) = 1 - p
Graph 1: B → E ← C    Graph 0: B → E

P(Graph 1|D) = P(D|Graph 1) P(Graph 1) / Σi P(D|Graph i) P(Graph i)
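A grid-approximation sketch of this posterior computation (not the slides' actual implementation): assume a Noisy-OR likelihood for Graph 1, uniform priors over the strength parameters, and illustrative contingency counts:

```python
# P(Graph 1 | D) by integrating the Noisy-OR likelihood over the
# strength parameters w0 (background) and w1 (cause) on a grid.
def lik(n1, k1, n0, k0, w0, w1):
    """P(data | w0, w1): k1 of n1 effects with the cause, k0 of n0 without."""
    p1 = w0 + w1 - w0 * w1  # P(e+ | c+) under Noisy-OR
    p0 = w0                 # P(e+ | c-): background only
    return (p1**k1 * (1 - p1)**(n1 - k1)) * (p0**k0 * (1 - p0)**(n0 - k0))

grid = [i / 100 for i in range(1, 100)]  # uniform prior, grid approximation

def marginal(n1, k1, n0, k0, graph):
    if graph == 1:  # C -> E present: average over both strengths
        vals = [lik(n1, k1, n0, k0, w0, w1) for w0 in grid for w1 in grid]
    else:           # C -> E absent: w1 fixed at 0
        vals = [lik(n1, k1, n0, k0, w0, 0.0) for w0 in grid]
    return sum(vals) / len(vals)

def posterior_graph1(n1, k1, n0, k0, p=0.5):
    m1, m0 = marginal(n1, k1, n0, k0, 1), marginal(n1, k1, n0, k0, 0)
    return m1 * p / (m1 * p + m0 * (1 - p))

# A strong contingency (8/8 vs 0/8) supports Graph 1 more than a null
# contingency (4/8 vs 4/8) -- the counts are illustrative.
assert posterior_graph1(8, 8, 8, 0) > posterior_graph1(8, 4, 8, 4)
```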
[Plots: human judgments vs. Bayesian model, ΔP, and Causal power (Cheng, 1997)]
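The two alternative rational models have closed forms computable from a 2×2 contingency table; the counts below are the 6/8 vs. 4/8 condition shown earlier:

```python
# Delta-P and Cheng's (1997) causal power, from contingency counts
# a = N(e+, c+), b = N(e-, c+), c = N(e+, c-), d = N(e-, c-).
def delta_p(a, b, c, d):
    return a / (a + b) - c / (c + d)

def causal_power(a, b, c, d):
    # generative power: Delta-P corrected for the base rate P(e+ | c-)
    return delta_p(a, b, c, d) / (1 - c / (c + d))

assert delta_p(6, 2, 4, 4) == 0.25       # 0.75 - 0.5
assert causal_power(6, 2, 4, 4) == 0.5   # 0.25 / (1 - 0.5)
```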
Generativity is essential
• Predictions result from “ceiling effect”
– ceiling effects only matter if you believe a cause increases the probability of an effect
– follows from use of Noisy-OR (after Cheng, 1997)
[Plot: Bayesian predictions (0–100 scale) across conditions P(e+|c+), P(e+|c-) = 8/8, 6/8, 4/8, 2/8, 0/8]
Noisy-AND-NOT: causes decrease probability of their effects
Noisy-OR: causes increase probability of their effects
Generic: probability differs across conditions
Generativity is essential
[Plots: human judgments vs. Noisy-OR, Generic, and Noisy-AND-NOT models]
Manipulating functional form
Noisy-AND-NOT: causes decrease probability of their effects; appropriate for preventive causes
Noisy-OR: causes increase probability of their effects; appropriate for generative causes
Generic: probability differs across conditions; appropriate for assessing differences
Manipulating functional form
[Plots: Noisy-AND-NOT, Generic, and Noisy-OR models vs. the Generative, Difference, and Preventive conditions]
Causal induction from contingency data
• The simplest case of causal learning: a single cause-effect relationship and plentiful data
• Nonetheless, exhibits complex effects of prior knowledge (in the assumed functional form)
• These effects reflect appropriate causal theories
Outline
contingency data physical systems perceived causality
A B
The stick-ball machine
(Kushnir, Schulz, Gopnik, & Danks, 2003)
Inferring hidden causal structure
• Can people accurately infer hidden causal structure from small amounts of data?
• Kushnir et al. (2003): four kinds of structure
A causes B    B causes A    common cause    separate causes
Inferring hidden causal structure
Common unobserved cause (trials shown 4x, 2x, 2x)
(Kushnir, Schulz, Gopnik, & Danks, 2003)
A causes B    B causes A    common cause    separate causes
Inferring hidden causal structure
Common unobserved cause (trials shown 4x, 2x, 2x)
Independent unobserved causes (trials shown 1x, 2x, 2x, 2x, 2x)
(Kushnir, Schulz, Gopnik, & Danks, 2003)
A causes B    B causes A    common cause    separate causes
Inferring hidden causal structure
Common unobserved cause (trials shown 4x, 2x, 2x)
Independent unobserved causes (trials shown 1x, 2x, 2x, 2x, 2x)
One observed cause (trials shown 2x, 4x)
(Kushnir, Schulz, Gopnik, & Danks, 2003)
A causes B    B causes A    common cause    separate causes
[Bar plots: human probability of each choice (separate, common, A causes B, B causes A) for the Common unobserved cause, Independent unobserved causes, and One observed cause conditions]
• Ontology
– Types: Ball, HiddenCause, Trial
– Predicates: Moves(Ball, Trial), Active(HiddenCause, Trial)
• Plausible relations
– For any Ball a and Ball b (a ≠ b), with prior probability p:
For all Trials t, Moves(a,t) → Moves(b,t)
– For some HiddenCause h and Ball b, with prior probability q:
For all Trials t, Active(h,t) → Moves(b,t)
• Functional form of causal relations
– Causes result in Moves(b,t) with probability ω. Otherwise, Moves(b,t) occurs with probability 0.
– Active(h,t) occurs with probability α.
Theory
Hypotheses
[Likelihoods for the four candidate structures, in terms of the activation and strength parameters]
[Bar plots: model probability of each choice (separate, common, A causes B, B causes A) for the Common unobserved cause, Independent unobserved causes, and One observed cause conditions]
Other physical systems
From blicket detectors…
…to lemur colonies
Oooh, it’s a blicket!
Outline
contingency data physical systems perceived causality
Michotte (1963)
Affected by…
– timing of events
– velocity of balls
– proximity
Nitro X
Affected by…
– timing of events
– velocity of balls
– proximity
(joint work with Liz Baraff)
Test trials
• Show explosions involving multiple cans
– allows inferences about causal structure
• For each trial, choose one of:
– chain reaction
– spontaneous explosions
– other
• Ontology
– Types: Can, HiddenCause
– Predicates: ExplosionTime(Can), ActivationTime(HiddenCause)
• Constraints on causal relations
– For any Can y and Can x, with prior probability 1:
ExplosionTime(y) → ExplosionTime(x)
– For some HiddenCause c and Can x, with prior probability 1:
ActivationTime(c) → ExplosionTime(x)
• Functional form of causal relations
– Explosion at ActivationTime(c), and after appropriate delay from ExplosionTime(y), with probability set by a volatility parameter. Otherwise explosions occur with probability 0.
– Low probability of hidden causes activating.
Theory
Using the theory
Using the theory
• What kind of explosive is this?
[Model parameters: spontaneity, volatility, rate]
Using the theory
• What kind of explosive is this?
• What caused what?
Using the theory
• What kind of explosive is this?
• What caused what?
• What is the causal structure?
Testing a prediction of the theory
• Evidence for a hidden cause should increase with the number of simultaneous explosions
• Four groups of 16 participants saw displays using m = 2, 3, 4, or 6 cans
• For each trial, choose one of:
– chain reaction
– spontaneous explosions
– other (responses coded for reference to a hidden cause)
χ²(3) = 11.36, p < .01
[Plot: probability of hidden cause vs. number of canisters]
Gradual transition from few to most participants identifying a hidden cause
Further predictions
• Explains chain reaction inferences
• Attribution of causality should be sensitive to interaction between time and distance
• Simultaneous explosions that occur sooner provide stronger evidence for common cause
Three kinds of causal induction
contingency data physical systems perceived causality
more constrainedless constrained
prior knowledge+
statistical inference
more data less data
Combining knowledge and statistics
• How do people...
– identify causal relationships from small samples?
– learn hidden causal structure with ease?
– reason about complex dynamic causal systems?
• Constraints from knowledge + powerful statistics
• Key ideas:
– prior knowledge expressed in causal theory
– theory generates hypothesis space for inference
Further questions
• Are there unifying principles across theories?
• Stick-balls:
– Causes result in Moves(b,t) with probability ω. Otherwise, Moves(b,t) occurs with probability 0.
• Nitro X:
– Explosion at ActivationTime(c), and after appropriate delay from ExplosionTime(y), with probability set by a volatility parameter. Otherwise explosions occur with probability 0.
Functional form
1. Each force acting on a system has an opportunity to change its state
2. Without external influence a system will not change its state
Further questions
• Are there unifying principles across theories?
• How are theories learned?
Learning causal theories
Theory: Ontology, Plausible relations, Functional form
X Y
Z
X Y
Z
X Y
Z
X Y
Z
Hypothesis space
generates
Bayesianinference
Case  X  Y  Z
 1    1  0  1
 2    0  1  1
 3    1  1  1
 4    0  0  0
...
generates
Data
Learning causal theories
Theory: Ontology, Plausible relations, Functional form
X Y
Z
X Y
Z
X Y
Z
X Y
Z
Hypothesis space
generates
Case  X  Y  Z
 1    1  0  1
 2    0  1  1
 3    1  1  1
 4    0  0  0
...
generates
Data
Learning causal theories
Theory: Ontology, Plausible relations, Functional form
Case  X  Y  Z
 1    1  0  1
 2    0  1  1
 3    1  1  1
 4    0  0  0
...
(repeated for four data sets)
Bayesianinference
X Y
Z
X Y
Z
X Y
Z
X Y
Z
Hypothesis space
generates
Case  X  Y  Z
 1    1  0  1
 2    0  1  1
 3    1  1  1
 4    0  0  0
...
generates
Data
Further questions
• Are there unifying principles across theories?
• How are theories learned?
• What is an appropriate prior over theories?
Causal induction with rates
• Different functional form results in models that apply to different kinds of data
• Rate: number of times effect occurs in time interval, in presence and absence of cause
Does the electric field cause the mineral to emit particles?
• Ontology
– Types: Mineral, Field, Time
– Predicates: Emitted(Mineral,Time), Active(Field,Time)
• Plausible relations
– For any Mineral m and Field f, with prior probability p:
For all Times t, Active(f,t) → Emitted(m,t)
• Functional form of causal relations
– Causes of Emitted(m,t) are independent probabilistic mechanisms, with causal strengths wi. An independent background cause is always present with strength w0.
– Implies number of emissions is a Poisson process, with rate at time t given by λ(t) = w0 + w1 · Active(f,t)
Theory
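A sketch of this rate-based functional form: emission counts are Poisson with rate w0 when the field is off and w0 + w1 when it is on. The strengths, interval lengths, and counts below are illustrative assumptions:

```python
import math

def poisson_loglik(count, rate, time):
    """Log P(count | Poisson with given rate over the interval)."""
    lam = rate * time
    return count * math.log(lam) - lam - math.lgamma(count + 1)

w0, w1 = 2.0, 3.0  # background and field strengths (emissions per minute)

# Illustrative data: 18 emissions in 10 min with the field off,
# 52 emissions in 10 min with it on.
ll_link = poisson_loglik(18, w0, 10) + poisson_loglik(52, w0 + w1, 10)
ll_none = poisson_loglik(18, w0, 10) + poisson_loglik(52, w0, 10)

# The elevated count favors the causal relation Active(f,t) -> Emitted(m,t).
assert ll_link > ll_none
```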
Causal induction with rates
Rate(e|c+) vs. Rate(e|c-)
[Plot: human judgments vs. Bayesian model, ΔR, and Power (N = 150)]
Learning causal theories
• T1: bacteria die at random
• T2: bacteria die at random, or in waves
P(wave|T2) > P(wave|T1)
• Having inferred the existence of a new force, need to find a mechanism...
Lemur colonies
A researcher in Madagascar is studying the effects of environmental resources on the location of lemur colonies. She has studied twelve different parts of Madagascar, and is trying to establish which areas show evidence of being affected by the distribution of resources in order to decide where she should focus her research.
[Plots: human data for change in spread, location, ratio, and number (and uniform control)]
• Ontology
– Types: Colony, Resource
– Predicates: Location(Colony), Location(Resource)
• Plausible relations
– For any Colony c and Resource r, with probability p:
Location(r) → Location(c)
• Functional form of causal relations
– Without a hidden cause, Location(c) is uniform
– With a hidden cause r, Location(c) is Gaussian with mean Location(r) and covariance matrix Σ
– Location(r) is uniform
Theory
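A one-dimensional sketch of this theory: with no resource, colony locations are uniform on [0, 1]; with a resource at r, locations are Gaussian around r. The locations and parameters below are illustrative assumptions:

```python
import math

def loglik_uniform(xs):
    return 0.0  # density 1 on the unit interval, so log-likelihood 0

def loglik_gaussian(xs, r, sigma):
    """Log-likelihood of locations under a Gaussian around resource r."""
    return sum(-0.5 * ((x - r) / sigma) ** 2
               - math.log(sigma * math.sqrt(2 * math.pi)) for x in xs)

clustered = [0.45 + 0.01 * i for i in range(10)]  # tight cluster near 0.5
spread = [0.05 + 0.1 * i for i in range(10)]      # spread across [0, 1]

# Clustering is evidence for a hidden resource; spread is evidence against.
assert loglik_gaussian(clustered, 0.5, 0.05) > loglik_uniform(clustered)
assert loglik_gaussian(spread, 0.5, 0.05) < loglik_uniform(spread)
```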
Is there a resource?
No: locations uniform    Yes: uniform + regularity around the resource
(sum over all structures; sum over all regularities)
[Plots: human data vs. Bayesian predictions for change in spread, location, ratio, and number (and uniform control)]
A B C E
1 0 0 0
0 1 0 0
0 0 1 1
1 1 1 1
Schulz & Gopnik (in press)
A B C E Biology
Ahchoo!
Ahchoo!
1 0 0 0
0 1 0 0
0 0 1 1
1 1 1 1
Schulz & Gopnik (in press)
Schulz & Gopnik (in press)
A B C E Biology Psychology
Ahchoo!
Ahchoo!
Eek!
Eek!
1 0 0 0
0 1 0 0
0 0 1 1
1 1 1 1
• A theory of sneezing
– a flower is a cause with probability π
– no sneezing without a cause
– causes each produce sneezing with probability ω
• A theory of fear
– an animal is a cause with probability π
– no fear without a cause
– a cause produces fear with probability ω
Common functional form
Common functional form
A B C E
1 0 0 0
0 1 0 0
0 0 1 1
1 1 1 1
• Children: choose just C, never just A or just B
Common functional form
A B C E
1 0 0 0
0 1 0 0
0 0 1 1
1 1 1 1
• Children: choose just C, never just A or just B
A, B, C → E
Priors over cause sets: none: (1-π)³; one cause: π(1-π)²; two causes: π²(1-π); three causes: π³
Common functional form
A B C E
1 0 0 0
0 1 0 0
0 0 1 1
1 1 1 1
• Children: choose just C, never just A or just B
• Bayes: just C is preferred, never just A or just B
Priors: {C}: π(1-π)²; {A,C}, {B,C}: π²(1-π); {A,B,C}: π³
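This comparison can be sketched by scoring every subset of candidate causes against the contingencies above; the prior π per candidate and the strength ω (written `w`) are illustrative assumed values:

```python
from itertools import combinations

pi, w = 0.3, 0.8  # assumed prior per candidate and causal strength
# Trials as ((A, B, C present), effect): the contingencies shown above.
trials = [((1, 0, 0), 0), ((0, 1, 0), 0), ((0, 0, 1), 1), ((1, 1, 1), 1)]

def likelihood(cause_set):
    L = 1.0
    for present, e in trials:
        n = sum(present[i] for i in cause_set)  # active causes this trial
        p_e = 1 - (1 - w) ** n                  # Noisy-OR, zero baseline
        L *= p_e if e == 1 else 1 - p_e
    return L

score = {}
for k in range(4):
    for s in combinations(range(3), k):  # 0 = A, 1 = B, 2 = C
        score[s] = pi**k * (1 - pi)**(3 - k) * likelihood(s)

# "Just C" has the highest posterior score; any set without C is ruled
# out because C alone produced the effect on the third trial.
assert max(score, key=score.get) == (2,)
assert score[(0,)] == score[(1,)] == 0.0
```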
Inter-domain causation
• Physical: noise-making machine
– A & B are magnetic buttons, C is talking
• Psychological: confederate giggling
– A & B are silly faces, C is a switch
• Procedure:
– baseline: which could be causes?
– trials: same contingencies as Experiment 3
– test: which are causes?
(Schulz & Gopnik, in press, Experiment 4)
Inter-domain causation
• A theory with inter-domain causes
– intra-domain entities are causes with probability π1
– inter-domain entities are causes with probability π0
– no effect occurs without a cause
– causes produce effects with probability ω
• Lower prior probability for inter-domain causes (i.e. π0 much lower than π1)
A problem with priors?
• If lack of mechanism results in lower prior probability, shouldn’t inferences change?
• Intra-domain causes (Experiment 3):– biological: 78% took C – psychological: 67% took C
• Inter-domain causes (Experiment 4):– physics: 75% took C – psychological: 81% took C
A B C E
1 0 0 0
0 1 0 0
0 0 1 1
1 1 1 1
A, B, C → E
Priors over cause sets:
{}: (1-π0)(1-π1)²    {C}: π0(1-π1)²
{A,C}, {B,C}: π0π1(1-π1)    {A,B,C}: π0π1²
{A}, {B}: (1-π0)(1-π1)π1    {A,B}: (1-π0)π1²
A B C E
1 0 0 0
0 1 0 0
0 0 1 1
1 1 1 1
Priors for sets containing C: {C}: π0(1-π1)²; {A,C}, {B,C}: π0π1(1-π1); {A,B,C}: π0π1²
A B C E
1 0 0 0
0 1 0 0
0 0 1 1
1 1 1 1
Priors for sets containing C: {C}: π0(1-π1)²; {A,C}, {B,C}: π0π1(1-π1); {A,B,C}: π0π1²
A direct test of inter-domain priors
• Ambiguous causes:
– A and C together produce E
– B and C together produce E
– A and B and C together produce E
• For C intra-domain, choose C (Sobel et al., in press)
• For C inter-domain, should choose A and B
The plausibility matrix
Entities: c1, c2, c3, g1, g2, g3
Predicates: Injected, Expressed
Rows and columns of M: the grounded predicates Injected(c1), Injected(c2), Injected(c3), Expressed(g1), Expressed(g2), Expressed(g3)
Each entry of M gives the plausibility of a causal relation; M identifies the plausible causal graphs
The Chomsky hierarchy
Languages
• Type 0 (computable)
• Type 1 (context sensitive)
• Type 2 (context free)
• Type 3 (regular)
Machines
Turing machine
Bounded TM
Push-down automaton
Finite state automaton
(Chomsky, 1956)
Languages in each class a strict subset of higher classes
Grammaticality and plausibility
• Grammar: indicates admissibility of (infinitely many) sentences generated from terminals
• Theory: indicates plausibility of (infinitely many) relations generated from grounded predicates
[Matrices: sentences × sentences for a grammar; predicates × predicates for a theory]