Statistical Causality Philip Dawid Statistical Laboratory University of Cambridge

Slides: mlg.eng.cam.ac.uk/mlss09/mlss_slides/MLSS09_Dawid_2.pdf

Statistical Causality

Philip Dawid
Statistical Laboratory

University of Cambridge

Statistical Causality

1. The Problems of Causal Inference
2. Formal Frameworks for Statistical Causality
3. Graphical Representations and Applications
4. Causal Discovery

3. Graphical Representations and Applications

Graphical Representation

• Certain collections of CI properties can be described and manipulated using a DAG representation
  – very far from complete
• Each CI property is represented by a graphical separation property
  – d-separation
  – moralization
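The moralization criterion can be sketched in a few lines of Python. This is an illustrative sketch, not code from the lecture: to check whether a DAG implies that A is independent of B given S, restrict to the ancestral set of A, B and S, moralize (link co-parents and drop directions), delete S, and test whether A and B are disconnected. The function name `d_separated` and the parent-dictionary encoding are assumptions made for the example.

```python
from itertools import combinations

def d_separated(parents, a, b, s):
    """Check whether sets a and b are separated by s in a DAG, via moralization.

    `parents` maps every node to the set of its parents (hypothetical encoding).
    """
    a, b, s = set(a), set(b), set(s)
    # 1. Keep only a, b, s and their ancestors.
    keep, stack = set(), list(a | b | s)
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents[v])
    # 2. Moralize: link each node to its parents, and "marry" co-parents.
    nbrs = {v: set() for v in keep}
    for v in keep:
        for p in parents[v]:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(parents[v], 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # 3. Remove s, then look for any path from a to b.
    seen, stack = set(), [v for v in a if v not in s]
    while stack:
        v = stack.pop()
        if v in b:
            return False
        if v not in seen:
            seen.add(v)
            stack.extend(n for n in nbrs[v] if n not in s and n not in seen)
    return True

# Collider A -> C <- B, with a descendant C -> D.
dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
print(d_separated(dag, {"A"}, {"B"}, set()))   # True
print(d_separated(dag, {"A"}, {"B"}, {"C"}))   # False
```

Conditioning on the collider C (or on its descendant D) connects A and B in the moral graph, which is why the second query fails.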

Moralization: 1

Moralization: 2

Moralization: 3

Extended Conditional Independence

Distribution of $Y \mid T$ is the same in observational and experimental regimes:
$Y \mid (F_T, T)$ does not depend on the value of $F_T$
Can express and manipulate using the notation and theory of conditional independence:
$Y \perp\!\!\!\perp F_T \mid T$
(even though $F_T$ is not random)

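Augmented DAG
[Figure: DAG $F_T \to T \to Y$; $F_T$ takes values 0/1/∅, $T$ takes values 0/1; node annotations $T \mid (F_T = \emptyset) \sim P_T$ and $Y \mid T$]
Absence of arrow $F_T \to Y$ expresses $Y \perp\!\!\!\perp F_T \mid T$
– with random variables and intervention variables
– probabilistic (not functional) relationships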

Sufficient Covariate “(un)confounder”
[Figure: augmented DAG over $F_T$, $T$, $U$, $Y$]
• Treatment assignment ignorable given U
  – (generally) not marginally ignorable
• If U is observed, can fit model (e.g. regression) for dependence of Y on (T, U)
  – causally meaningful

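Sufficient covariate “(un)confounder”
Can estimate ACE:
$E(Y \mid F_T = 1) - E(Y \mid F_T = 0) = E\{E(Y \mid U, T = 1) - E(Y \mid U, T = 0)\}$
Similarly, whole interventional distribution:
$p(y \mid F_T = t) = \sum_u p(y \mid u, T = t)\, p(u)$
(“back-door” formula)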
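The back-door adjustment can be checked numerically on a small discrete example. The sketch below is illustrative only: all probabilities are made-up values, U, T and Y are binary, and U is assumed to be a sufficient covariate. It compares the adjusted contrast, which averages P(Y=1 | T=t, U=u) over the marginal distribution of U, with the naive observational contrast P(Y=1 | T=1) − P(Y=1 | T=0).

```python
from itertools import product

# Hypothetical observational model with a sufficient covariate U (all binary).
p_u = {0: 0.6, 1: 0.4}
p_t1_given_u = {0: 0.2, 1: 0.8}                    # P(T=1 | U=u)
p_y1_given_tu = {(0, 0): 0.1, (1, 0): 0.3,         # P(Y=1 | T=t, U=u), keyed by (t, u)
                 (0, 1): 0.5, (1, 1): 0.7}

# Observational joint p(u, t, y).
joint = {}
for u, t, y in product((0, 1), repeat=3):
    pt = p_t1_given_u[u] if t == 1 else 1 - p_t1_given_u[u]
    py = p_y1_given_tu[(t, u)] if y == 1 else 1 - p_y1_given_tu[(t, u)]
    joint[(u, t, y)] = p_u[u] * pt * py

def prob(pred):
    """Probability of the event described by pred(u, t, y)."""
    return sum(p for k, p in joint.items() if pred(*k))

def backdoor(t):
    """sum_u p(u) * P(Y=1 | T=t, U=u), using only the observational joint."""
    total = 0.0
    for u in (0, 1):
        pu = prob(lambda uu, tt, yy: uu == u)
        py = (prob(lambda uu, tt, yy: (uu, tt, yy) == (u, t, 1))
              / prob(lambda uu, tt, yy: (uu, tt) == (u, t)))
        total += pu * py
    return total

def naive(t):
    """P(Y=1 | T=t) in the observational regime, ignoring U."""
    return (prob(lambda uu, tt, yy: (tt, yy) == (t, 1))
            / prob(lambda uu, tt, yy: tt == t))

ace = backdoor(1) - backdoor(0)
naive_diff = naive(1) - naive(0)
print(round(ace, 4), round(naive_diff, 4))   # 0.2 0.4338
```

In this made-up model the true effect of T is 0.2 in each stratum of U, so the adjusted contrast recovers it while the naive contrast is inflated by confounding.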

Non-confounding
[Figure: augmented DAG over $F_T$, $T$, $U$, $Y$, with two arrows labelled a and b]
Treatment assignment ignorable given U
Ignorable marginally if either a or b is absent:
– a absent: “randomization”
– b absent: “irrelevance”
– then need not even observe U

Pearlian DAG

• Envisage intervention on every variable in the system

• Augmented DAG model
  – but with intervention indicators implicit

• Every arrow has a causal interpretation

Pearlian DAG
[Figure: DAG over A, B, C, D, E]

Intervention DAG
[Figure: DAG over A, B, C, D, E with intervention nodes $F_A$, $F_B$, $F_C$, $F_D$, $F_E$]

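Intervention DAG
[Figure: DAG over A, B, C, D, E with intervention nodes $F_A$, $F_B$, $F_C$, $F_D$, $F_E$]
• e.g., $E \perp\!\!\!\perp (A, B, F_A, F_B, F_C, F_D) \mid (C, D, F_E = \emptyset)$
• When E is not manipulated, its conditional distribution, given its parents C, D, is unaffected by the values of A, B and by whether or not any of the other variables is manipulated
  – modular component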

More complex DAGs

[Figure: influence diagram over $U$, $A_1$, $L$, $A_2$, $Y$, with a node denoting the treatment strategy]
By d-separation: (would fail if e.g. U → Y)

Instrumental Variable
[Figure: DAG over $F_X$, $X$, $W$, $Y$]
Linear model: $E(Y \mid X = x, W, F_X = x) = f(W) + \beta x$
So $E(Y \mid F_X = x) = E\{f(W) \mid F_X = x\} + \beta x = \alpha + \beta x$
$\beta$ is causal regression coefficient
– but not estimable from observational data: $E(Y \mid X = x) = E\{f(W) \mid X = x\} + \beta x$

Instrumental Variable
[Figure: DAG over $F_X$, $Z$, $X$, $W$, $Y$]
– so can now identify $\beta$
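A small simulation can illustrate why an instrument helps. One standard way to identify β under a linear model is the ratio Cov(Z, Y) / Cov(Z, X), which assumes Z is independent of W and affects Y only through X. The sketch below is an assumption-laden illustration, not the lecture's derivation: the data-generating equations, the choice f(W) = 2W, and β = 1.5 are all made up.

```python
import random

rng = random.Random(0)
beta = 1.5      # causal coefficient (assumed for the simulation)
n = 20000

zs, xs, ys = [], [], []
for _ in range(n):
    z = rng.gauss(0, 1)                         # instrument Z
    w = rng.gauss(0, 1)                         # unobserved W
    x = z + w + rng.gauss(0, 1)                 # X depends on Z and W
    y = beta * x + 2.0 * w + rng.gauss(0, 1)    # Y = f(W) + beta * X + noise, f(W) = 2W
    zs.append(z)
    xs.append(x)
    ys.append(y)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

ols = cov(xs, ys) / cov(xs, xs)   # observational regression slope, biased by W
iv = cov(zs, ys) / cov(zs, xs)    # instrumental-variable ratio estimate

print(f"beta = {beta}, OLS slope = {ols:.3f}, IV estimate = {iv:.3f}")
```

In this setup the regression of Y on X absorbs the dependence of f(W) on X, so its slope overshoots β, while the ratio estimate stays close to β.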

Discrete case
[Figure: DAG over $F_X$, $Z$, $X$, $W$, $Y$]
X, Y, Z binary
Can develop inequalities for ACE $E(Y \mid F_X = 1) - E(Y \mid F_X = 0)$ in terms of estimable quantities

Hypothesis Test
[Figure: DAG over $F_X$, $Z$, $X$, $W$, $Y$]

Mendelian Randomisation
Does low serum cholesterol level increase the risk of cancer?
[Figure: DAG over $F_X$, $Z$, $X$, $W$, $Y$]
X = serum cholesterol
Y = cancer
W = diet, smoking, hidden tumour, …
Z = APOE gene (E2 allele induces particularly low serum cholesterol)

Equivalence
[Figure: two DAGs, one over $F_X$, $Z$, $X$, $W$, $Y$ and one over $F_X$, $Z$, $U$, $X$, $W$, $Y$; annotated “causal?”]

Non-equivalence
[Figure: two DAGs, one over $F_Z$, $F_X$, $Z$, $X$, $W$, $Y$ and one over $F_Z$, $F_X$, $Z$, $U$, $X$, $W$, $Y$; annotated “causal?”]

Can we identify a causal effect from observational data?
• Model with domain and (explicit or implicit) intervention variables, specified ECI properties
  – e.g. augmented DAG, Pearlian DAG
• Observed variables V, unobserved variables U
• Can identify observational distribution over V
• Want to answer causal query, e.g. $p(y \mid F_X = x)$
  – write as
• When/how can this be done?

Example: “back-door formula”

[Figure: DAG over $F_X$, $X$, $Z$, $Y$]

Example: “back-door formula”

[Figure: DAG over $F_X$, $X$, $Y$ and $Z_1, \ldots, Z_6$]
Works for $Z = (Z_3, Z_4)$, and also for $Z = (Z_4, Z_5)$

Example: “front-door formula”

[Figure: DAG over $F_X$, $X$, $Z$, $U$, $Y$]
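The front-door idea can be verified exactly on a small discrete model. The sketch below assumes the usual front-door structure (U → X, U → Y, X → Z, Z → Y, with U unobserved) and uses the standard front-door adjustment, sum over z of p(z | x) times the sum over x' of p(y | x', z) p(x'). All numerical values are invented for illustration; the helper names are hypothetical.

```python
from itertools import product

# Hypothetical model: U -> X, U -> Y, X -> Z, Z -> Y (U unobserved), all binary.
p_u1 = 0.3
p_x1_u = {0: 0.2, 1: 0.9}                                        # P(X=1 | U)
p_z1_x = {0: 0.1, 1: 0.8}                                        # P(Z=1 | X)
p_y1_zu = {(0, 0): 0.1, (1, 0): 0.6, (0, 1): 0.4, (1, 1): 0.9}   # P(Y=1 | Z, U)

def bern(p1, v):
    """Probability that a Bernoulli(p1) variable takes value v."""
    return p1 if v == 1 else 1 - p1

# Observational distribution over (X, Z, Y), with U summed out.
obs = {}
for x, z, y in product((0, 1), repeat=3):
    obs[(x, z, y)] = sum(
        bern(p_u1, u) * bern(p_x1_u[u], x) * bern(p_z1_x[x], z) * bern(p_y1_zu[(z, u)], y)
        for u in (0, 1))

def p(**fixed):
    """Observational probability of the event fixed by keywords x, z, y."""
    return sum(pr for (x, z, y), pr in obs.items()
               if all({"x": x, "z": z, "y": y}[k] == v for k, v in fixed.items()))

def front_door(x):
    """P(Y=1 | F_X = x) from observables only, via the front-door adjustment."""
    total = 0.0
    for z in (0, 1):
        pz_given_x = p(x=x, z=z) / p(x=x)
        inner = sum(p(x=x2, z=z, y=1) / p(x=x2, z=z) * p(x=x2) for x2 in (0, 1))
        total += pz_given_x * inner
    return total

def truth(x):
    """Interventional P(Y=1 | F_X = x) computed from the full model, using U."""
    return sum(bern(p_u1, u) * bern(p_z1_x[x], z) * p_y1_zu[(z, u)]
               for u in (0, 1) for z in (0, 1))

for x in (0, 1):
    print(x, round(front_door(x), 6), round(truth(x), 6))
```

The two columns agree, whereas the plain conditional P(Y=1 | X=x) does not, because it mixes in the dependence between X and U.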

do-calculus

do-calculus
For a problem modelled by a Pearlian DAG, the do-calculus is complete:
• We can tell whether a given causal effect is computable (from the observational distribution)
• Any computable causal effect can be computed by successive applications of rules 2 and 3
  – together with probability calculus, and property (delete dotted arrows)
• There exist algorithms to accomplish this

4. Causal Discovery

Probabilistic Causality

• Intuitive concepts of “cause”, “direct cause”, …
• Principle of the common cause: “Variables are independent, given their common causes”
• Assume causal DAG representation:
  – direct causes of V are its DAG parents
  – all “common causes” included

Probabilistic Causality

• CAUSAL MARKOV CONDITION
  – The causal DAG also represents the observational conditional independence properties of the variables
  – WHEN??
  – WHY??
• CAUSAL FAITHFULNESS CONDITION
  – No extra conditional independencies
  – WHY??

Causal Discovery

• An attempt to learn causal relationships from observational data

• Assume there is an underlying causal DAG (possibly including unobserved variables) satisfying the (faithful) Causal Markov Condition

• Use data to search for a DAG representing the observational independencies (model selection)

• Give this a causal interpretation

Causal Discovery

Two main approaches:
• “Constraint-based”
  – Qualitative
  – Infer (patent or latent) conditional independencies between variables
  – Fit conforming DAG model(s)
• Statistical model selection
  – Quantitative
  – General approach, applied to DAG models
  – Need not commit to one model (model uncertainty)

Constraint-Based Methods (complete data)
• Identify/estimate conditional independencies holding between observed variables
• Assume sought-for causal DAG does not involve any variables other than those observed

Wermuth-Lauritzen algorithm

• Assume variables are “causally ordered” a priori: $(V_1, V_2, \ldots, V_N)$, s.t. arrows can only go from lower to higher
• For each $i$, identify (smallest) subset $S_i$ of $V^{i-1} := (V_1, V_2, \ldots, V_{i-1})$ such that $V_i \perp\!\!\!\perp V^{i-1} \mid S_i$
• Draw arrow from each member of $S_i$ to $V_i$
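The ordered construction can be sketched as follows, assuming access to a conditional-independence oracle. The oracle signature `indep(v, rest, s)`, the function names, and the three-variable chain used as an example are all hypothetical; in practice the oracle would be replaced by statistical tests.

```python
from itertools import combinations

def smallest_parent_set(v, preds, indep):
    """Smallest S within preds such that v is independent of preds \\ S given S."""
    for size in range(len(preds) + 1):
        for s in combinations(preds, size):
            rest = set(preds) - set(s)
            # With nothing left over, the independence holds trivially.
            if not rest or indep(v, rest, set(s)):
                return set(s)

def wermuth_lauritzen(order, indep):
    """Return the arrows (parent, child) implied by the ordering and the oracle."""
    arrows = set()
    for i, v in enumerate(order):
        for parent in smallest_parent_set(v, order[:i], indep):
            arrows.add((parent, v))
    return arrows

def chain_oracle(v, rest, s):
    """Hypothetical oracle for the chain A -> B -> C: only C is independent of A given B."""
    return (v, rest, s) == ("C", {"A"}, {"B"})

arrows = wermuth_lauritzen(["A", "B", "C"], chain_oracle)
print(sorted(arrows))   # [('A', 'B'), ('B', 'C')]
```

Subsets are tried in order of increasing size, so the first one that passes is a smallest one; a different prior ordering can yield a different DAG from the same oracle.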

SGS algorithm (no prior ordering)

1. Start with complete undirected graph over $V^N$
2. Remove edges V–W s.t., for some S, $V \perp\!\!\!\perp W \mid S$
3. Orient any V–Z–W as V→Z←W if:
   – no edge V–W
   – for each $S \subseteq V^N$ with $Z \in S$, $V \not\perp\!\!\!\perp W \mid S$
4. Repeat while still possible:
   i. if V→Z–W but not V–W, orient as V→Z→W
   ii. if V⇝W (directed path) and V–W, orient as V→W
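The four steps can be sketched with a conditional-independence oracle standing in for statistical tests. This is a brute-force illustration under stated assumptions, not a reference implementation: the function names, the edge encoding, and the hard-coded oracle (which answers queries for the DAG A → C ← B, C → D) are all hypothetical.

```python
from itertools import combinations

def subsets(items):
    """All subsets of `items`, as sets."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield set(combo)

def has_directed_path(directed, src, dst):
    """True if dst is reachable from src along directed edges (tail, head)."""
    stack, seen = [src], set()
    while stack:
        v = stack.pop()
        for a, b in directed:
            if a == v and b not in seen:
                if b == dst:
                    return True
                seen.add(b)
                stack.append(b)
    return False

def sgs(nodes, indep):
    """Sketch of the four steps; indep(v, w, s) is a hypothetical CI oracle."""
    nodes = list(nodes)
    # Step 1: complete undirected graph.
    skeleton = {frozenset(pair) for pair in combinations(nodes, 2)}
    # Step 2: remove V-W if V and W are independent given some S.
    for v, w in combinations(nodes, 2):
        others = [n for n in nodes if n not in (v, w)]
        if any(indep(v, w, s) for s in subsets(others)):
            skeleton.discard(frozenset((v, w)))
    # Step 3: orient V - Z - W as V -> Z <- W when V, W are non-adjacent and
    # dependent given every S containing Z.
    directed = set()
    for z in nodes:
        nbrs = [n for n in nodes if frozenset((n, z)) in skeleton]
        for v, w in combinations(nbrs, 2):
            if frozenset((v, w)) in skeleton:
                continue
            others = [n for n in nodes if n not in (v, w, z)]
            if all(not indep(v, w, s | {z}) for s in subsets(others)):
                directed |= {(v, z), (w, z)}
    undirected = {e for e in skeleton if not any(set(d) == set(e) for d in directed)}
    # Step 4: propagate orientations until nothing changes.
    changed = True
    while changed:
        changed = False
        for edge in list(undirected):
            a, b = tuple(edge)
            for z, w in ((a, b), (b, a)):
                # (i) V -> Z - W with V, W non-adjacent: orient Z -> W.
                rule_i = any(head == z and frozenset((tail, w)) not in skeleton
                             for tail, head in directed if tail != w)
                # (ii) directed path from Z to W alongside Z - W: orient Z -> W.
                rule_ii = has_directed_path(directed, z, w)
                if rule_i or rule_ii:
                    directed.add((z, w))
                    undirected.discard(edge)
                    changed = True
                    break
    return directed, undirected

def oracle(v, w, s):
    """Hypothetical CI oracle for the DAG A -> C <- B, C -> D."""
    pair = {v, w}
    if pair == {"A", "B"}:
        return not (s & {"C", "D"})
    if pair in ({"A", "D"}, {"B", "D"}):
        return "C" in s
    return False

directed, undirected = sgs(["A", "B", "C", "D"], oracle)
print(sorted(directed), undirected)   # [('A', 'C'), ('B', 'C'), ('C', 'D')] set()
```

Step 2 enumerates every conditioning set for every pair, which is the exponential cost that makes the plain procedure impractical beyond a handful of variables.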

Comments
• Wermuth-Lauritzen algorithm
  – always finds a valid DAG representation
  – need not be faithful
  – depends on prior ordering
• SGS algorithm
  – may not succeed if there is no faithful DAG representation
  – output may not be fully oriented
  – computationally inefficient (too many tests)
  – better variations: PC, PC*

Constraint-Based Methods (incomplete data)

• Allow now for unobserved (latent) variables
• Can modify previous algorithms to work just with conditional independencies between observed variables
• But latent CI has other (quantitative) implications too…

Discrete variables:
[Figure: DAG over observables A, B, C, D and latent variable U]
No CI properties between observables A, B, C, D.
But
– does not depend on a

Normal variables:
[Figure: DAG over observables $X_1$, $X_2$, $X_3$, $X_4$ and latent variable U]
No CI properties between observables $X_1, X_2, X_3, X_4$.
But
Such properties form basis of TETRAD II program
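One well-known constraint of this kind is the vanishing tetrad difference. The sketch below assumes, as an illustration only, that the figure is a one-factor model in which U is the common parent of all four observables, so that Cov(X_i, X_j) = λ_i λ_j Var(U) for i ≠ j. The loadings and noise variances are made-up values.

```python
# Hypothetical one-factor model: X_i = lam[i] * U + e_i, with U and the e_i
# independent normal variables.
lam = [0.9, 0.5, 1.3, 0.7]          # loadings (assumed values)
noise_var = [1.0, 0.5, 2.0, 1.5]    # Var(e_i) (assumed values)
var_u = 1.0

def cov(i, j):
    """Implied covariance of X_i and X_j (0-based indices)."""
    c = lam[i] * lam[j] * var_u
    return c + noise_var[i] if i == j else c

# Tetrad differences: products of covariances over disjoint pairs coincide.
t1 = cov(0, 1) * cov(2, 3) - cov(0, 2) * cov(1, 3)
t2 = cov(0, 1) * cov(2, 3) - cov(0, 3) * cov(1, 2)

# Partial covariance of X_1 and X_2 given X_3: non-zero, so conditioning on an
# observable does not separate the other observables.
partial_12_given_3 = cov(0, 1) - cov(0, 2) * cov(1, 2) / cov(2, 2)

print(abs(t1) < 1e-12, abs(t2) < 1e-12, partial_12_given_3 != 0)
```

The tetrad differences vanish whatever the loadings are, even though no pair of observables is independent, marginally or given another observable.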

Bayesian Model Selection

• Consider collection $\mathcal{M} = \{M\}$ of models
• Have prior distribution $\pi_M(\theta_M)$ for parameter $\theta_M$ of model $M$
• Based on data $x$, compute marginal likelihood for each model $M$: $L_M = \int p(x \mid \theta_M)\, \pi_M(\theta_M)\, d\theta_M$
• Use as score for comparing models, or combine with prior distribution $\{w_M\}$ over models to get posterior: $w^*_M \propto w_M L_M$
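The scoring recipe can be illustrated on two binary variables with conjugate Beta priors. This is a minimal sketch under stated assumptions: the counts are invented, every Bernoulli parameter gets an independent Beta(1, 1) prior, and only two candidate models are compared (X and Y independent, versus X → Y).

```python
from math import lgamma, exp

def log_marginal(ones, zeros, a=1.0, b=1.0):
    """Log marginal likelihood of a binary sequence under a Beta(a, b) prior."""
    def log_beta(p, q):
        return lgamma(p) + lgamma(q) - lgamma(p + q)
    return log_beta(a + ones, b + zeros) - log_beta(a, b)

# Hypothetical counts n[(x, y)].
n = {(0, 0): 20, (0, 1): 2, (1, 0): 2, (1, 1): 20}
x0, x1 = n[(0, 0)] + n[(0, 1)], n[(1, 0)] + n[(1, 1)]
y0, y1 = n[(0, 0)] + n[(1, 0)], n[(0, 1)] + n[(1, 1)]

# M0: X and Y independent.  M1: X -> Y, with a separate parameter for Y in
# each stratum of X (local and global parameter independence).
log_L = {
    "M0": log_marginal(x1, x0) + log_marginal(y1, y0),
    "M1": (log_marginal(x1, x0)
           + log_marginal(n[(0, 1)], n[(0, 0)])
           + log_marginal(n[(1, 1)], n[(1, 0)])),
}

prior = {"M0": 0.5, "M1": 0.5}   # w_M

# Posterior w*_M proportional to w_M * L_M, normalised in log space for stability.
top = max(log_L.values())
unnorm = {m: prior[m] * exp(log_L[m] - top) for m in log_L}
total = sum(unnorm.values())
posterior = {m: v / total for m, v in unnorm.items()}
print(posterior)
```

With these strongly associated counts nearly all posterior weight goes to the model with an arrow; the posterior weights can also be carried forward instead of committing to a single model.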

Bayesian Model Selection
• Algebraically straightforward for discrete or Gaussian DAG models, parametrised by parent-child conditional distributions, having conjugate priors (with local and global independence) (see Zoubin Ghahramani’s lectures)
• Can arrange hyperparameters so that indistinguishable (Markov equivalent) models get same score

Mixed data

• Data from experimental and observational regimes

• Model-selection approach:
  – assume Pearlian DAG
  – ignore local likelihood contribution when the response variable is set
• Constraint-based approach?
  – base on ECI properties, e.g.

A Parting Caution

“NO CAUSES IN, NO CAUSES OUT”

• We have powerful statistical methods for attacking causal problems

• But to apply them we have to make strong assumptions (e.g. ECI assumptions, relating distinct regimes)

• Important to consider and justify these in context
  – e.g., Mendelian randomisation

Thank you!

Further Reading

• A. P. Dawid (2007). Fundamentals of Statistical Causality. Research Report 279, Department of Statistical Science, University College London. 94 pp. http://www.ucl.ac.uk/Stats/research/reports/abs07.html#279

• R. E. Neapolitan (2003). Learning Bayesian Networks. Prentice Hall, Upper Saddle River, New Jersey.

• J. Pearl (2009). Causality: Models, Reasoning and Inference (second edition). Cambridge University Press.

• D. B. Rubin (1978). Bayesian inference for causal effects: the role of randomization. Annals of Statistics 6, 34–68.

• P. Spirtes, C. Glymour, and R. Scheines (2000). Causation, Prediction and Search (second edition). Springer-Verlag, New York.

• P. Suppes (1970). A Probabilistic Theory of Causality. North Holland, Amsterdam.