
Statistical Causality

Philip Dawid
Statistical Laboratory

University of Cambridge

Statistical Causality

1. The Problems of Causal Inference
2. Formal Frameworks for Statistical Causality
3. Graphical Representations and Applications
4. Causal Discovery

3. Graphical Representations and Applications

Graphical Representation

• Certain collections of CI properties can be described and manipulated using a DAG representation
  – very far from complete

• Each CI property is represented by a graphical separation property
  – d-separation
  – moralization

Moralization: 1–3

[Figure slides: step-by-step moralization of a DAG]

Extended Conditional Independence

Distribution of Y | T the same in observational and experimental regimes:

  Y | (F_T , T) does not depend on the value of F_T

  T | (F_T = ∅) ∼ P_T

Can express and manipulate using notation and theory of conditional independence:

  Y ⊥⊥ F_T | T

(even though F_T is not random)

[Figure: F_T → T → Y, with F_T taking values 0/1/∅ and T taking values 0/1]
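This extended independence can be illustrated by simulation. In a toy model (all numbers hypothetical) where Y depends on T alone, the conditional distribution of Y given T is the same whether T arises in the idle regime or is set by intervention:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def simulate(regime):
    """Draw (T, Y). regime=None: idle/observational regime (T random);
    regime=t: experimental regime, T set to t.  Y depends only on T here,
    so Y ⊥⊥ F_T | T holds by construction (hypothetical toy model)."""
    if regime is None:
        t = rng.binomial(1, 0.3, n)         # idle regime: T ~ P_T
    else:
        t = np.full(n, regime)              # interventional regime: T := t
    y = rng.binomial(1, 0.2 + 0.5 * t)      # P(Y=1 | T=t) = 0.2 + 0.5 t
    return t, y

t_obs, y_obs = simulate(None)
_, y_do1 = simulate(1)
p_obs = y_obs[t_obs == 1].mean()   # estimates P(Y=1 | T=1, F_T = ∅)
p_exp = y_do1.mean()               # estimates P(Y=1 | F_T = 1)
print(p_obs, p_exp)                # approximately equal
```

With a common cause of T and Y in the observational regime (see the following slides), the two conditional distributions would differ and the ECI would fail.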

Augmented DAG

• Absence of arrow F_T → Y expresses Y ⊥⊥ F_T | T
  – with random variables and intervention variables
  – probabilistic (not functional) relationships

[Figure: DAG with F_T → T, U → T, U → Y, T → Y]

• Treatment assignment ignorable given U
  – (generally) not marginally ignorable

• If U is observed, can fit model (e.g. regression) for dependence of Y on (T, U)
  – causally meaningful

Sufficient Covariate (“(un)confounder”)

Can estimate ACE:

  ACE = E(Y | F_T = 1) − E(Y | F_T = 0) = Σ_u {E(Y | T = 1, U = u) − E(Y | T = 0, U = u)} p(u)

Similarly, whole interventional distribution:

  p(y | F_T = t) = Σ_u p(y | T = t, U = u) p(u)

(“back-door” formula)

[Figure: DAG with U → T, U → Y, T → Y]
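A sketch of the back-door adjustment with a binary sufficient covariate (a hypothetical simulation; all coefficients invented), comparing the naive observational contrast with the U-adjusted ACE:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000

# Toy confounded model (hypothetical): U -> T, U -> Y, T -> Y
u = rng.binomial(1, 0.5, n)
t = rng.binomial(1, 0.2 + 0.6 * u)               # treatment assignment depends on U
y = rng.binomial(1, 0.1 + 0.3 * t + 0.4 * u)     # true ACE = 0.3

naive = y[t == 1].mean() - y[t == 0].mean()      # biased by confounding

# Back-door adjustment: sum_u p(u) [E(Y|T=1,U=u) - E(Y|T=0,U=u)]
ace = sum(
    (u == k).mean()
    * (y[(t == 1) & (u == k)].mean() - y[(t == 0) & (u == k)].mean())
    for k in (0, 1)
)
print(naive, ace)   # naive is biased upward; ace is approximately 0.3
```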

Non-confounding

[Figure: DAG with F_T → T → Y; arrow a: U → T, arrow b: U → Y]

Treatment assignment ignorable given U.
Ignorable marginally if either a or b is absent:

  a: “randomization”
  b: “irrelevance”

– then need not even observe U

Pearlian DAG

• Envisage intervention on every variable in the system

• Augmented DAG model
  – but with intervention indicators implicit

• Every arrow has a causal interpretation

[Figures: example DAG on variables A, B, C, D, E; and the same DAG augmented with intervention indicators F_A , F_B , F_C , F_D , F_E]

Intervention DAG

[Figure: DAG on A, B, C, D, E with intervention indicators F_A , …, F_E]

• e.g., when E is not manipulated, its conditional distribution given its parents C, D is unaffected by the values of A, B and by whether or not any of the other variables is manipulated
  – modular component

More complex DAGs

(A_1 , A_2) = treatment strategy

By d-separation, the required extended conditional independencies hold
(would fail if e.g. U → Y)

[Figure (influence diagram): nodes A_1 , L , A_2 , Y with latent U]

Instrumental Variable

[Figure: DAG with F_X → X → Y and unobserved W → X, W → Y]

Linear model:

  E(Y | X = x, W, F_X = x) = f(W) + βx

So

  E(Y | F_X = x) = E{f(W) | F_X = x} + βx = α + βx

β is the causal regression coefficient
– but not estimable from observational data:

  E(Y | X = x) = E{f(W) | X = x} + βx

Instrumental Variable

[Figure: DAG as before, with instrument Z → X added]

– so can now identify β
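Under the linear model above, one standard way to recover β with a binary instrument Z is the Wald ratio Cov(Z, Y)/Cov(Z, X); a minimal sketch on simulated data (all structural coefficients hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
beta = 2.0                                      # true causal coefficient

w = rng.normal(size=n)                          # unobserved confounder W
z = rng.binomial(1, 0.5, n)                     # instrument: independent of W
x = 1.0 * z + 1.5 * w + rng.normal(size=n)      # treatment
y = beta * x + 2.0 * w + rng.normal(size=n)     # outcome; f(W) = 2W here

ols = np.cov(x, y)[0, 1] / np.var(x)            # confounded regression slope
wald = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]  # IV (Wald) estimate of beta
print(ols, wald)   # ols is biased away from beta; wald is approximately beta
```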

Discrete case

[Figure: instrumental-variable DAG over Z, X, W, Y with F_X]

X, Y, Z binary.

Can develop inequalities for the ACE

  E(Y | F_X = 1) − E(Y | F_X = 0)

in terms of estimable quantities.

Hypothesis Test

[Figure: instrumental-variable DAG over Z, X, W, Y with F_X]

Mendelian Randomisation

[Figure: instrumental-variable DAG over Z, X, W, Y with F_X]

X = serum cholesterol
Y = cancer
W = diet, smoking, hidden tumour, …
Z = APOE gene (E2 allele induces particularly low serum cholesterol)

Does low serum cholesterol level increase the risk of cancer?

Equivalence

[Figure: two observationally equivalent DAGs over Z, X, W, Y (one with latent U), each with F_X]

causal?

Non-equivalence

[Figure: two non-equivalent DAGs over X, Y, W, Z (one with latent U), each with intervention indicators F_Z and F_X]

causal?

Can we identify a causal effect from observational data?

• Model with domain and (explicit or implicit) intervention variables, specified ECI properties
  – e.g. augmented DAG, Pearlian DAG

• Observed variables V, unobserved variables U

• Can identify observational distribution over V

• Want to answer causal query, e.g. p(y | F_X = x)

• When/how can this be done?

Example: “back-door formula”

[Figure: DAG over X, Y, Z with F_X]

Example: “back-door formula”

[Figure: larger DAG over X, Y, Z_1 , …, Z_6 with F_X]

Works for Z = (Z_3 , Z_4 ), and also for Z = (Z_4 , Z_5 )

Example: “front-door formula”

[Figure: DAG with F_X → X → Z → Y and latent U → X, U → Y]
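The front-door formula p(y | F_X = x) = Σ_z p(z | x) Σ_x′ p(y | x′, z) p(x′) can be checked numerically on a small made-up distribution for this DAG (all probabilities below are hypothetical):

```python
import numpy as np

# Hypothetical Pearlian DAG  U -> X, U -> Y, X -> Z, Z -> Y  (U latent); all binary
pu = np.array([0.4, 0.6])                          # p(u)
px_u = np.array([[0.7, 0.3], [0.2, 0.8]])          # px_u[u, x] = p(x | u)
pz_x = np.array([[0.9, 0.1], [0.3, 0.7]])          # pz_x[x, z] = p(z | x)
py_zu = np.zeros((2, 2, 2))                        # py_zu[z, u, y] = p(y | z, u)
for z in (0, 1):
    for u in (0, 1):
        q = 0.1 + 0.5 * z + 0.3 * u
        py_zu[z, u] = [1 - q, q]

# Observational joint p(x, z, y) = sum_u p(u) p(x|u) p(z|x) p(y|z,u)
pxzy = np.einsum('u,ux,xz,zuy->xzy', pu, px_u, pz_x, py_zu)
px = pxzy.sum(axis=(1, 2))                         # p(x)
py_xz = pxzy / pxzy.sum(axis=2, keepdims=True)     # p(y | x, z)

# Front-door formula, using observational quantities only
front = np.einsum('xz,szy,s->xy', pz_x, py_xz, px)

# Ground truth from the full model: p(y | F_X = x) = sum_u p(u) sum_z p(z|x) p(y|z,u)
truth = np.einsum('u,xz,zuy->xy', pu, pz_x, py_zu)
print(np.allclose(front, truth))
```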

do-calculus

For a problem modelled by a Pearlian DAG, the do-calculus is complete:

• We can tell whether a given causal effect is computable (from the observational distribution)

• Any computable causal effect can be computed by successive applications of rules 2 and 3
  – together with probability calculus

• There exist algorithms to accomplish this

4. Causal Discovery

Probabilistic Causality

• Intuitive concepts of “cause”, “direct cause”, …

• Principle of the common cause:
  “Variables are independent, given their common causes”

• Assume causal DAG representation:
  – direct causes of V are its DAG parents
  – all “common causes” included

Probabilistic Causality

• CAUSAL MARKOV CONDITION
  – The causal DAG also represents the observational conditional independence properties of the variables
  • WHEN??
  • WHY??

• CAUSAL FAITHFULNESS CONDITION
  – No extra conditional independencies
  • WHY??

Causal Discovery

• An attempt to learn causal relationships from observational data

• Assume there is an underlying causal DAG (possibly including unobserved variables) satisfying the (faithful) Causal Markov Condition

• Use data to search for a DAG representing the observational independencies (→ model selection)

• Give this a causal interpretation

Causal Discovery

Two main approaches:

• “Constraint-based”
  – Qualitative
  – Infer (patent or latent) conditional independencies between variables
  – Fit conforming DAG model(s)

• Statistical model selection
  – Quantitative
  – General approach, applied to DAG models
  – Need not commit to one model (model uncertainty)

Constraint-Based Methods (complete data)

• Identify/estimate conditional independencies holding between observed variables

• Assume sought-for causal DAG does not involve any variables other than those observed

Wermuth-Lauritzen algorithm

• Assume variables are “causally ordered” a priori: (V_1 , V_2 , …, V_N ), s.t. arrows can only go from lower to higher

• For each i, identify a (smallest) subset S_i of V^(i−1) := (V_1 , V_2 , …, V_{i−1}) such that V_i ⊥⊥ V^(i−1) | S_i

• Draw arrow from each member of S_i to V_i

SGS algorithm (no prior ordering)

1. Start with complete undirected graph over V^(N)

2. Remove edges V – W such that, for some S, V ⊥⊥ W | S

3. Orient any V – Z – W as V → Z ← W if:
   – no edge V – W
   – V and W are dependent given every S ⊆ V^(N) with Z ∈ S

4. Repeat while still possible:
   i.  if V → Z – W but not V – W, orient as V → Z → W
   ii. if there is a directed path from V to W, and an edge V – W, orient as V → W
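Steps 1–3 can be sketched with a conditional-independence oracle in place of statistical tests (here hard-coded for a three-variable collider; a real implementation would test CI on data):

```python
from itertools import combinations

# CI oracle for the hypothetical collider DAG  A -> C <- B:
# the only conditional independence is A and B given any set NOT containing C.
def indep(v, w, s):
    return {v, w} == {'A', 'B'} and 'C' not in s

nodes = ['A', 'B', 'C']
edges = {frozenset(p) for p in combinations(nodes, 2)}   # step 1: complete graph

# Step 2: remove V - W whenever V is independent of W given some S of the rest
for v, w in combinations(nodes, 2):
    others = [n for n in nodes if n not in (v, w)]
    if any(indep(v, w, set(s))
           for r in range(len(others) + 1)
           for s in combinations(others, r)):
        edges.discard(frozenset((v, w)))

# Step 3: orient V - Z - W as V -> Z <- W when V, W are non-adjacent and
# dependent given every conditioning set containing Z
arrows = set()
for z in nodes:
    nbrs = [n for n in nodes if frozenset((n, z)) in edges]
    for v, w in combinations(nbrs, 2):
        if frozenset((v, w)) not in edges:
            rest = [n for n in nodes if n not in (v, w, z)]
            subsets = [set(s) | {z} for r in range(len(rest) + 1)
                       for s in combinations(rest, r)]
            if all(not indep(v, w, s) for s in subsets):
                arrows.update({(v, z), (w, z)})

print(sorted(tuple(sorted(e)) for e in edges), sorted(arrows))
```

For this oracle the edge A – B is removed and the collider A → C ← B is recovered.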

Comments

• Wermuth-Lauritzen algorithm
  – always finds a valid DAG representation
  – need not be faithful
  – depends on prior ordering

• SGS algorithm
  – may not succeed if there is no faithful DAG representation
  – output may not be fully oriented
  – computationally inefficient (too many tests)
  – better variations: PC, PC*

Constraint-Based Methods (incomplete data)

• Allow now for unobserved (latent) variables

• Can modify previous algorithms to work just with conditional independencies between observed variables

• But latent CI has other (quantitative) implications too…

Discrete variables:

[Figure: DAG over observables A, B, C, D with latent U]

No CI properties between observables A, B, C, D.
But the latent structure still implies quantitative constraints on their joint distribution (e.g. a certain conditional probability does not depend on a).

Normal variables:

[Figure: DAG over observables X_1 , X_2 , X_3 , X_4 with latent U]

No CI properties between observables X_1 , X_2 , X_3 , X_4.
But their covariance matrix satisfies tetrad-type equality constraints.

Such properties form basis of TETRAD II program
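For instance, under a (hypothetical) one-factor Gaussian model in which each X_i loads on a single latent U, products of covariances over complementary pairs coincide, so the "tetrad differences" vanish; a quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
lam = np.array([1.0, 0.8, 1.2, 0.6])              # hypothetical factor loadings

u = rng.normal(size=n)                             # single latent factor U
x = lam[:, None] * u + rng.normal(size=(4, n))     # X_i = lam_i * U + noise_i

c = np.cov(x)                                      # 4x4 sample covariance matrix
t1 = c[0, 1] * c[2, 3] - c[0, 2] * c[1, 3]         # tetrad differences: both
t2 = c[0, 1] * c[2, 3] - c[0, 3] * c[1, 2]         # vanish under the one-factor model
print(t1, t2)
```

Here every off-diagonal covariance is lam_i * lam_j, so each tetrad product equals lam_1 lam_2 lam_3 lam_4 and the differences are zero up to sampling error.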

Bayesian Model Selection

• Consider collection M = {M} of models

• Have prior distribution π_M (θ_M ) for parameter θ_M of model M

• Based on data x, compute marginal likelihood for each model M:

  L_M = ∫ p(x | θ_M ) π_M (θ_M ) dθ_M

• Use as score for comparing models, or combine with prior distribution {w_M } over models to get posterior:

  w*_M ∝ w_M L_M

• Algebraically straightforward for discrete or Gaussian DAG models, parametrised by parent-child conditional distributions, having conjugate priors (with local and global independence): see Zoubin Ghahramani’s lectures

• Can arrange hyperparameters so that indistinguishable (Markov equivalent) models get same score
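For discrete DAG models with independent Dirichlet priors on each parent-child conditional distribution, L_M is available in closed form as a product of Dirichlet-multinomial terms. A minimal sketch (toy data and symmetric Dirichlet(1) priors, both hypothetical) comparing the model A → B with the empty graph:

```python
import numpy as np
from math import lgamma

def dm_loglik(counts, alpha=1.0):
    """Log marginal likelihood of one multinomial with a symmetric
    Dirichlet(alpha) prior, given the observed category counts."""
    counts = np.asarray(counts, float)
    k = counts.size
    return (lgamma(k * alpha) - lgamma(k * alpha + counts.sum())
            + sum(lgamma(alpha + c) - lgamma(alpha) for c in counts))

rng = np.random.default_rng(4)
n = 2000
a = rng.binomial(1, 0.5, n)
b = rng.binomial(1, 0.2 + 0.6 * a)          # B strongly depends on A

nab = np.array([[np.sum((a == i) & (b == j)) for j in (0, 1)] for i in (0, 1)])

# M1: A -> B.  Local terms: one for A, one for B per parent state of A.
score_dep = dm_loglik(nab.sum(axis=1)) + sum(dm_loglik(nab[i]) for i in (0, 1))
# M2: empty graph (A, B independent).  One term for A, one for B.
score_ind = dm_loglik(nab.sum(axis=1)) + dm_loglik(nab.sum(axis=0))
print(score_dep > score_ind)   # data favour the dependent model
```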


Mixed data

• Data from experimental and observational regimes

• Model-selection approach:
  – assume Pearlian DAG
  – ignore local likelihood contribution when the response variable is set

• Constraint-based approach?
  – base on ECI properties

A Parting Caution

“NO CAUSES IN, NO CAUSES OUT”

• We have powerful statistical methods for attacking causal problems

• But to apply them we have to make strong assumptions (e.g. ECI assumptions, relating distinct regimes)

• Important to consider and justify these in context
  – e.g., Mendelian randomisation

Thank you!

Further Reading

• A. P. Dawid (2007). Fundamentals of Statistical Causality. Research Report 279, Department of Statistical Science, University College London. 94 pp. http://www.ucl.ac.uk/Stats/research/reports/abs07.html#279

• R. E. Neapolitan (2003). Learning Bayesian Networks. Prentice Hall, Upper Saddle River, New Jersey.

• J. Pearl (2009). Causality: Models, Reasoning and Inference (second edition). Cambridge University Press.

• D. B. Rubin (1978). Bayesian inference for causal effects: the role of randomization. Annals of Statistics 6, 34–68.

• P. Spirtes, C. Glymour, and R. Scheines (2000). Causation, Prediction and Search (second edition). Springer-Verlag, New York.

• P. Suppes (1970). A Probabilistic Theory of Causality. North Holland, Amsterdam.