Causality challenge #2: Pot-Luck
Isabelle Guyon, Clopinet
Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ.
André Elisseeff and Jean-Philippe Pellet, IBM Zürich
Gregory F. Cooper, University of Pittsburgh
Peter Spirtes, Carnegie Mellon
2
Motivations
Outline:
• Motivations
• Learning causal structure
– Cross-sectional studies
  • … from experiments
  • … without experiments
  • Equivalent MB
– Longitudinal studies
• Bring your own problem(s)
3
Causality Workbench
• February 2007: Project starts. Initial funding of the EU Pascal network.
• August 15, 2007: Two-year grant from the US National Science Foundation.
• December 15, 2007: Workbench goes live. First causality challenge: causation and prediction.
• June 3-4, 2008: WCCI 2008, workshop to discuss the results of the first challenge.
• September 15, 2008: Start pot-luck challenge. Target: NIPS 2008.
• Fall, 2008: Start developing an interactive workbench.
4
Why a new challenge?
• Causality challenge #1: favor “depth”
– Single well-defined task
– Rigor of performance assessment
• Causality challenge #2: favor “breadth”
– Many different tasks
– Encourage creativity
5
http://clopinet.com/causality
6
Pot-Luck challenge
• CYTO: Causal Protein-Signaling Networks in human T cells. Learn a protein signaling network from multicolor flow cytometry data. N=11 proteins, P~800 samples per experimental condition. E=9 conditions.
• LOCANET: LOcal CAusal NETwork. Find the local causal structure around a given target variable (depth 3 network) in REGED, CINA, SIDO, MARTI.
• PROMO: Simulated marketing task. Time series of 1000 promotion variables and 100 product sales. Predict a 1000x100 boolean influence matrix, indicating for each (i,j) element whether the ith promotion has a causal influence on the sales of the jth product. Data is provided as time series, with a daily value for each variable for three years.
• SIGNET: Abscisic Acid Signaling Network. Determine the set of 43 boolean rules that describe the interactions of the nodes within a plant signaling network. 300 separate Boolean pseudodynamic simulations of the true rules. Model inspired by a true biological system.
• TIED: Target Information Equivalent Dataset. Illustrates a case in which there are many equivalent Markov boundaries. Find them all.
[Slide annotations: each task is tagged “real” or “artif” (artificial) data; two tasks are tagged “self eval”.]
7
Learning causal structure
8
What is causality?
• Many definitions.
• Pragmatic (engineering) view: predicting the consequences of ACTIONS.
• Distinct from making predictions in a stationary environment.
• Canonical methodology: designed experiments.
• Causal discovery from observational data.
9
The “language” of causal Bayesian networks
• Bayesian network:
– Graph with random variables X1, X2, … Xn as nodes.
– Dependencies represented by edges.
– Allow us to compute P(X1, X2, … Xn) as ∏_i P(Xi | Parents(Xi)).
– Edge directions have no meaning.
• Causal Bayesian network: edge directions indicate causality.
10
LUCAS0: natural
Small example
[Figure: causal network over Lung Cancer, Smoking, Genetics, Coughing, Attention Disorder, Allergy, Anxiety, Peer Pressure, Yellow Fingers, Car Accident, Born an Even Day, and Fatigue, with the Markov boundary marked.]
11
Arrows indicate “mechanisms”
If Lung Cancer (LC) is determined by Smoking (S) and Genetics (G),
• In the language of BN, use the data table:
P(LC=1 | S=1, G=1)=… , P(LC=0 | S=1, G=1)=…
P(LC=1 | S=1, G=0)=… , P(LC=0 | S=1, G=0)=…
P(LC=1 | S=0, G=1)=… , P(LC=0 | S=0, G=1)=…
P(LC=1 | S=0, G=0)=… , P(LC=0 | S=0, G=0)=…
• In the language of Structural Equation Models (SEM), use:
LC = f(S, G) + noise, where usually f is a linear function.
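As a minimal sketch of the two parameterizations above, the Python below encodes the mechanism for LC once as a conditional probability table and once as a linear structural equation. Every number in it (table entries, coefficients, noise scale) is a made-up placeholder for illustration, not a value from LUCAS.

```python
import random

# Hypothetical table of P(LC=1 | S, G); the entries are illustrative only.
P_LC1 = {(1, 1): 0.9, (1, 0): 0.8, (0, 1): 0.3, (0, 0): 0.1}

def sample_lc_bn(s, g, rng):
    """BN view: draw LC from the table row selected by its parents (S, G)."""
    return 1 if rng.random() < P_LC1[(s, g)] else 0

def sample_lc_sem(s, g, rng, a=0.6, b=0.2, sigma=0.1):
    """SEM view: LC = a*S + b*G + Gaussian noise (assumed coefficients)."""
    return a * s + b * g + rng.gauss(0.0, sigma)

rng = random.Random(0)
draws = [sample_lc_bn(1, 0, rng) for _ in range(10000)]
freq = sum(draws) / len(draws)   # empirical estimate of P(LC=1 | S=1, G=0)
lc_value = sample_lc_sem(1, 0, rng)
```

In both views the arrow into LC stands for one local mechanism; only its functional form differs.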
12
Common simplifications
– Assume a Markov process
– Assume a DAG
– Assume causal sufficiency (no hidden common cause)
– Assume stability or faithfulness (no particular parameterization implying dependencies not reflected by the structure)
– Assume linearity of relationships
– Assume Gaussianity of PDFs
– Discard relationships of low statistical significance
– Focus on a local neighborhood of a target variable
– Learn unoriented or partially oriented graphs
– Assume uniqueness of the Markov boundary
13
How about time?
Cross-sectional study
[Figure: example graph with nodes numbered 0 to 11.]
14
How about time?
Cross-sectional study
[Figure: example graph with nodes numbered 0 to 11.]
Longitudinal study
[Figure: the variables 0 to 11 repeated at four successive time points.]
15
Learning causal structure from “cross-sectional” studies:
CYTO, LOCANET, TIED
16
Causal models as particular “generative models”
• Imagine we have “prior knowledge” about a few alternative plausible “causal models” (we basically know the architecture).
• Fit the parameters of the model to data.
• Select the model based on goodness of fit (score), perhaps penalizing higher complexity models.
• Could two models have identical scores?
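One way to see that the answer can be yes: the sketch below fits two candidate models, X → Y and Y → X, with linear Gaussian mechanisms and compares their maximized log-likelihoods. The data-generating coefficients and the choice of log-likelihood as the score are assumptions made for illustration.

```python
import math
import random

def gaussian_loglik(values):
    """Maximized Gaussian log-likelihood (fitted mean and variance) of a sample."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def fit_line(u, v):
    """Least-squares slope and intercept for v ~ slope * u + intercept."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    sxy = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    sxx = sum((a - mu) ** 2 for a in u)
    slope = sxy / sxx
    return slope, mv - slope * mu

def score(cause, effect):
    """Log-likelihood of the model cause -> effect: P(cause) * P(effect | cause)."""
    slope, intercept = fit_line(cause, effect)
    residuals = [e - (slope * c + intercept) for c, e in zip(cause, effect)]
    return gaussian_loglik(cause) + gaussian_loglik(residuals)

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [0.7 * xi + rng.gauss(0.0, 0.5) for xi in x]   # generated as X -> Y

score_xy = score(x, y)   # model X -> Y
score_yx = score(y, x)   # model Y -> X
```

Both models have the same number of parameters, so a complexity penalty would not separate them either; under these assumptions the data alone cannot pick the direction.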
17
Key types of causal relationships 1
Direct cause
[Figure: LUCAS network with Smoking and Lung Cancer highlighted.]
18
Key types of causal relationships 2
Indirect cause (chain)
AN ⊥ LC | S
[Figure: LUCAS network with Anxiety and Lung Cancer highlighted.]
19
Key types of causal relationships 3
Confounder (fork)
YF ⊥ LC | S
[Figure: LUCAS network with Yellow Fingers and Lung Cancer highlighted.]
20
How this might look in data
[Figure: plot of Lung cancer against Yellow Fingers.]
21
Simpson’s paradox
YF ⊥ LC | S
How this might look in data
[Figure: plot of Lung cancer against Yellow Fingers, with the Non-smoking and Smoking groups shown separately.]
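The fork relation YF ⊥ LC | S can be reproduced on simulated data. The sketch below assumes a toy model in which Smoking raises the probability of both Yellow Fingers and Lung Cancer; all probabilities are invented for illustration. It shows the association disappearing within each smoking group, not a sign reversal.

```python
import random

rng = random.Random(0)
rows = []
for _ in range(20000):
    s = int(rng.random() < 0.5)                   # Smoking
    yf = int(rng.random() < (0.8 if s else 0.1))  # Yellow Fingers depends on S only
    lc = int(rng.random() < (0.6 if s else 0.1))  # Lung Cancer depends on S only
    rows.append((s, yf, lc))

def p_lc(rows, yf, s=None):
    """Empirical P(LC=1 | YF=yf), optionally restricted to one smoking group."""
    sel = [lc for (si, yfi, lc) in rows if yfi == yf and (s is None or si == s)]
    return sum(sel) / len(sel)

# Pooled data: yellow fingers look predictive of lung cancer.
marginal_gap = p_lc(rows, 1) - p_lc(rows, 0)
# Within each smoking group the gap is close to zero.
gap_smokers = p_lc(rows, 1, s=1) - p_lc(rows, 0, s=1)
gap_nonsmokers = p_lc(rows, 1, s=0) - p_lc(rows, 0, s=0)
```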
22
Markov equivalence
X1 ⊥ Y | X2
X1 ← X2 → Y: P(X1, X2, Y) = P(X1 | X2, Y) P(Y | X2) P(X2)
X1 → X2 → Y: P(X1, X2, Y) = P(Y | X2, X1) P(X2 | X1) P(X1)
X1 ← X2 ← Y: P(X1, X2, Y) = P(X1 | X2, Y) P(X2 | Y) P(Y)
23
Key types of causal relationships 4
Collider (V-structure)
AL ⊥̸ LC | C (Allergy and Lung Cancer are dependent given Coughing)
[Figure: LUCAS network with Allergy and Lung Cancer highlighted.]
24
How this might look in data
[Figure: plot of Lung cancer against Allergy.]
25
How this might look in data
[Figure: plot of Lung cancer against Allergy, with the Coughing=1 and Coughing=0 groups shown separately.]
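The collider pattern can be simulated in the same spirit. The sketch below assumes Allergy and Lung Cancer are independent causes of Coughing, with invented probabilities. In the pooled data Allergy says nothing about Lung Cancer, but among coughers it does.

```python
import random

rng = random.Random(0)
rows = []
for _ in range(20000):
    al = int(rng.random() < 0.3)                           # Allergy
    lc = int(rng.random() < 0.3)                           # Lung Cancer, independent of Allergy
    c = int(rng.random() < (0.9 if (al or lc) else 0.05))  # Coughing: common effect
    rows.append((al, lc, c))

def p_lc(rows, al, c=None):
    """Empirical P(LC=1 | AL=al), optionally restricted to one coughing group."""
    sel = [lc for (ali, lc, ci) in rows if ali == al and (c is None or ci == c)]
    return sum(sel) / len(sel)

# Pooled data: no association between Allergy and Lung Cancer.
marginal_gap = p_lc(rows, 1) - p_lc(rows, 0)
# Among coughers, Allergy "explains away" the cough and lowers P(LC=1).
gap_coughing = p_lc(rows, 1, c=1) - p_lc(rows, 0, c=1)
```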
26
No Markov equivalence
Colliders (V-structures): X1 ⊥̸ Y | X2
X1 → X2 ← Y
P(X1, X2, Y) = P(X2 | X1, Y) P(X1) P(Y)
27
Structural methods
1. Build unoriented graph (using conditional independencies).
2. Orient colliders.
3. Add more arrows by constraint propagation without creating new colliders.
[Figure: two copies of the example graph with nodes numbered 0 to 11.]
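Step 2 of the structural approach can be sketched as follows, assuming step 1 has already produced an unoriented skeleton and, for each non-adjacent pair, the conditioning set that made the pair independent. The function name, the data layout, and the four-node example are hypothetical; this is not the code of any particular challenge entry.

```python
from itertools import combinations

def orient_colliders(skeleton, sepset):
    """Return directed edges (parent, child) for every unshielded collider.

    skeleton: dict mapping each node to the set of its neighbors.
    sepset:   dict mapping frozenset({x, y}) to the set that separates x and y.
    """
    arrows = set()
    for z, neighbors in skeleton.items():
        for x, y in combinations(sorted(neighbors), 2):
            if y in skeleton[x]:
                continue                      # x and y are adjacent: triple is shielded
            if z not in sepset[frozenset((x, y))]:
                arrows.add((x, z))            # z was not needed to separate x and y,
                arrows.add((y, z))            # so orient x -> z <- y
    return arrows

# Toy skeleton: Allergy - Coughing - Lung Cancer - Smoking
skeleton = {
    "Allergy": {"Coughing"},
    "Coughing": {"Allergy", "Lung Cancer"},
    "Lung Cancer": {"Coughing", "Smoking"},
    "Smoking": {"Lung Cancer"},
}
sepset = {
    frozenset(("Allergy", "Lung Cancer")): set(),              # marginally independent
    frozenset(("Coughing", "Smoking")): {"Lung Cancer"},       # independent given LC
}
arrows = orient_colliders(skeleton, sepset)
```

Only the triple around Coughing is oriented; the edge between Smoking and Lung Cancer stays undirected until step 3 or further evidence resolves it.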
28
… towards CYTO: using experiments to learn the causal structure
29
Manipulating a single variable 1
Direct cause
Smoking manipulated (disconnected from its direct causes): remains predictive of LC.
[Figure: LUCAS network.]
30
Manipulating a single variable 2
Indirect cause
Anxiety manipulated: remains predictive of Lung Cancer.
[Figure: LUCAS network.]
31
Manipulating a single variable 3
Consequence of common cause (correlated, but not cause)
Yellow Fingers manipulated: no longer predictive of LC.
[Figure: LUCAS network.]
32
Manipulating a single variable 4
Direct cause
Genetics manipulated: remains predictive of LC and AD.
[Figure: LUCAS network, with a “?” annotation.]
33
Manipulating a single variable 5
Direct cause
Attention disorder manipulated: no longer predictive of Genetics.
[Figure: LUCAS network.]
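The effect of manipulating a single variable can be checked on a toy model. The sketch below assumes a reduced network Anxiety → Smoking → {Yellow Fingers, Lung Cancer} with invented probabilities, and models a manipulation as setting the variable by a fair coin flip, which disconnects it from its causes.

```python
import random

def draw(rng, manipulated=()):
    """One sample; variables named in `manipulated` ignore their natural causes."""
    def bern(p):
        return int(rng.random() < p)
    an = bern(0.5)
    s = bern(0.5) if "S" in manipulated else bern(0.7 if an else 0.2)
    yf = bern(0.5) if "YF" in manipulated else bern(0.8 if s else 0.1)
    lc = bern(0.6 if s else 0.1)
    return {"AN": an, "S": s, "YF": yf, "LC": lc}

def predictive_gap(var, manipulated=(), n=20000, seed=0):
    """P(LC=1 | var=1) - P(LC=1 | var=0), estimated from n samples."""
    rng = random.Random(seed)
    data = [draw(rng, manipulated) for _ in range(n)]
    def p_lc(value):
        sel = [d["LC"] for d in data if d[var] == value]
        return sum(sel) / len(sel)
    return p_lc(1) - p_lc(0)

gap_yf_natural = predictive_gap("YF")                      # correlated via Smoking
gap_yf_manip = predictive_gap("YF", manipulated=("YF",))   # no longer predictive
gap_s_manip = predictive_gap("S", manipulated=("S",))      # a cause stays predictive
```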
34
The CYTO problem
Karen Sachs et al.
[Figure: T-cell protein-signaling network. Nodes shown include CD3, CD28, LFA-1, Lck, Zap70, LAT, VAV, SLP-76, PI3K, PIP2, PIP3, PLC, PKC, PKA, RAS, Raf, Mek1/2, Erk1/2, Akt, MAPKKK, MEK3/6, MEK4/7, p38, JNK, Cytohesin, and JAB-1; numbered markers 1 to 10 show where each reagent acts.]
Activators
1. α-CD3
2. α-CD28
3. ICAM-2
4. PMA
5. β2cAMP
Inhibitors
6. G06976
7. AKT inh
8. Psitect
9. U0126
10. LY294002
35
… towards LOCANET: learning the causal structure without experiments to predict the consequences of future actions.
36
What if we cannot experiment?
• Experiments may be infeasible, costly, or unethical.
• Using only observations, we may want to predict the effect of new policies.
• Policies may consist of manipulating several variables.
37
LUCAS1: manipulated
Manipulating a few variables
[Figure: LUCAS network after manipulation of a few variables, with the Markov boundary marked.]
38
LUCAS2: manipulated
Manipulating all variables
[Figure: LUCAS network after manipulation of all variables, with the Markov boundary marked.]
39
Causality challenge #1: causation and prediction
• Task: Predict the target (e.g., Lung cancer) in “unmanipulated” or “manipulated” test data.
• Goals:
– Introduce ML people to causal discovery problems.
– Investigate ties between causation and prediction.
• Findings:
– Participants used either causal or non-causal feature selection.
– Good causal discovery (feature set containing the “manipulated” MB) correlated with good predictions.
– However, some participants using non-causal feature selection obtained good prediction results.
40
Causality challenge #2: The LOCANET problem
• Task: Find the local causal structure around a given target variable (depth 3 network) in REGED, CINA, SIDO, MARTI.
• Goal: Analyze in finer detail the extent to which causal discovery methods recover the causal structure, and how this affects prediction of the target values.
41
TIED: Equivalent Markov boundaries
42
Equivalent Markov boundaries
Many almost identical measurements of the same (hidden) variable can lead to many statistically indistinguishable Markov boundaries.
[Figure: target Y with its Markov boundary marked.]
43
Target Information Equivalence (TIE)
Two disjoint subsets of variables V1 and V2 are Target Information Equivalent (TIE) with respect to target Y iff:
• V1 ⊥̸ Y (V1 is not independent of Y)
• V2 ⊥̸ Y (V2 is not independent of Y)
• V1 ⊥ Y | V2
• V2 ⊥ Y | V1
Alexander Statnikov & Constantin Aliferis
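The four TIE conditions can be checked empirically on discrete data. The sketch below uses plug-in (conditional) mutual information with a fixed tolerance in place of a proper statistical test; that shortcut, the function names, and the toy data are assumptions. For simplicity the toy X2 is a deterministic relabeling of X1, which is a cruder construction than the non-deterministic relations in TIED.

```python
import math
import random
from collections import Counter

def cond_mutual_info(a, b, c):
    """Plug-in estimate of I(A; B | C) in nats from three equal-length sequences."""
    n = len(a)
    n_abc = Counter(zip(a, b, c))
    n_ac = Counter(zip(a, c))
    n_bc = Counter(zip(b, c))
    n_c = Counter(c)
    return sum(
        k / n * math.log(k * n_c[ci] / (n_ac[(ai, ci)] * n_bc[(bi, ci)]))
        for (ai, bi, ci), k in n_abc.items()
    )

def mutual_info(a, b):
    """I(A; B): conditional mutual information given a constant."""
    return cond_mutual_info(a, b, [0] * len(a))

def is_tie(v1, v2, y, tol=0.01):
    """Check the four TIE conditions with a crude threshold (assumption)."""
    return (mutual_info(v1, y) > tol and mutual_info(v2, y) > tol
            and cond_mutual_info(v1, y, v2) < tol
            and cond_mutual_info(v2, y, v1) < tol)

rng = random.Random(0)
x1 = [rng.randrange(4) for _ in range(5000)]
x2 = [(v + 1) % 4 for v in x1]                                   # relabeling of x1
x3 = [rng.randrange(4) for _ in range(5000)]                     # unrelated variable
y = [v if rng.random() < 0.8 else rng.randrange(4) for v in x1]  # noisy copy of x1

tie_12 = is_tie(x1, x2, y)
tie_13 = is_tie(x1, x3, y)
```

Here either X1 or X2 alone carries all the information about Y, so each could serve as a Markov boundary member in place of the other, while X3 cannot.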
44
TIE Data (TIED)
Exact equivalence
[Figure: small example over variables X1, X2, X3, X11, and Y, each taking values 0 to 3.]
Small example of the type of relationships implemented in TIED. The following TIE relations hold in the data:
TIE_Y(X1, X2), TIE_Y(X1, X3), TIE_Y(X1, X11), TIE_Y(X2, X3), TIE_Y(X2, X11), TIE_Y(X3, X11)
TIE_X11(X1, X2), TIE_X11(X1, X3), TIE_X11(X2, X3)
Notice that variables X1, X2, X3, X11, and Y are not deterministically related.
Alexander Statnikov & Constantin Aliferis
45
Learning causal structure from “longitudinal” studies:
SIGNET, PROMO
46
SIGNET: a plant signaling network
• Plants lose water and take in carbon dioxide through microscopic pores.
• During drought, the plant hormone abscisic acid (ABA) inhibits pore opening (important for the genetic engineering of new drought-resistant plants).
• Unraveling the ABA signal transduction network took years of research. A recent dynamic model synthesizes many findings (Li, Assmann, Albert, PLOS, 2006).
• The model is used by Jenkins and Soni to generate artificial data. The problem is to reconstruct the network from the data.
47
Abscisic Acid Signaling Network
Li, Assmann, Albert, PLOS, 2006
48
SIGNET: sample data
10111011101011011011010010100010110000110011100001110111101101101111111011001011101011110001111011111010110110001101000111010101011000011101111101011011000110000111101010101100001110111110101101100011000011110101010
- Boolean model; asynchronous updates
- 43 nodes
- 300 simulations
Example of asynchronous updates for a 4-node network:
[Figure: node states of a 4-node network plotted along a time axis.]
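A minimal sketch of asynchronous Boolean updating for a 4-node network is given below. It assumes one common reading of "asynchronous": at every time step the nodes are updated one at a time in a freshly shuffled order, each seeing the values already changed in that step. The four rules are hypothetical and are not taken from the ABA model.

```python
import random

# Hypothetical Boolean rules for nodes A, B, C, D (illustration only).
RULES = {
    "A": lambda s: s["A"],                  # input node keeps its value
    "B": lambda s: s["A"] and not s["D"],
    "C": lambda s: s["B"],
    "D": lambda s: s["C"],
}

def simulate(initial, steps, rng):
    """Run `steps` asynchronous rounds and return the list of states visited."""
    state = dict(initial)
    trajectory = [dict(state)]
    for _ in range(steps):
        order = list(RULES)
        rng.shuffle(order)                  # random update order for this round
        for node in order:
            state[node] = bool(RULES[node](state))
        trajectory.append(dict(state))
    return trajectory

start = {"A": True, "B": False, "C": False, "D": False}
trajectory = simulate(start, steps=6, rng=random.Random(0))
# One row of 0/1 characters per time step, in node order A, B, C, D.
rows = ["".join("1" if snap[n] else "0" for n in "ABCD") for snap in trajectory]
```

Because the update order changes from round to round, repeated simulations from the same start can follow different trajectories, which is why many separate runs are informative.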
49
PROMO: simulated marketing task
• 100 products
• 1000 promotions
• 3 years of daily data
• Goal: quantify the effect of promotions on sales
[Figure: promotions-by-products matrix.]
Jean-Philippe Pellet
50
PROMO: schematically…
The difficulties include:
- non-i.i.d. samples
- seasonal effects
- promotions are binary, sales are continuous
- the problem is more about quantifying the relationships than determining the causal skeleton
[Figure: schematic with 1000 promotions, 100 products, and other variables.]
51
Pot-luck challenge: Bring your own problem
52
From NIPS 2006 workshop…
1. Predict the consequences of a manipulation (similar to a usual predictive modeling task, but the test data is no longer distributed in the same way as the training data; the system undergoes a manipulation to produce the test data).
2. Determine what manipulations are needed to reach a desired system state with maximum probability (e.g., select variables and propose values to achieve a certain value of a response/target variable, with perhaps a cost per variable).
3. Propose system queries to acquire more training data, i.e., design experiments, with perhaps an associated cost per variable and per sample and perhaps with constraints on variables that cannot be controlled.
4. Determine all causal relationships between variables.
5. Determine a local causal region around a response/target variable (causal adjacency).
6. Determine the source cause(s) for a response/target variable.
7. Determine for all variables whether they are, with respect to a response/target variable: cause, effect, consequence of a common cause, cause of a common effect, or unrelated.
8. Predict the existence of unmeasured variables (not part of the set of variables provided in the data), which are potential confounders (are common causes of an observed variable and the target).
9. Predict which variables called “relevant” by feature selection algorithms are potentially causally irrelevant because their correlation to the target is the result of an experimental artifact (e.g., sampling bias or systematic error).
10. Determine a causal order of all variables.
11. Determine a causal direction in time series data in which one variable is causing the other.
12. Determine the direction of time in a time series (mostly of fundamental rather than practical interest).
13. Incorporate prior knowledge in causal discovery.
14. Predict counterfactuals.
53
http://clopinet.com/causality
• September 15, 2008: challenge start.
• October 15, 2008: deadline for (optional) submission of milestone challenge results.
• October 24, 2008: workshop abstracts due.
• November 12, 2008: challenge ends (last day to submit challenge results).
• November 21, 2008: JMLR proceedings paper submission deadline.
• December 12, 2008: challenge results publicly released; workshop.
54
Prizes
• Four prizes (free NIPS workshop entrance or $200).
– Best solution to one or more problems: 3 prizes.
– Best problem: 1 prize.
• All competitors must submit a 6-page paper.
• Criteria: performance/usefulness, novelty/originality, sanity, insight, reproducibility, clarity.