Lan Liu - University of...

52
Introduction to Causal Inference Lan Liu University of Minnesota at Twin Cities [email protected] 1

Transcript of Lan Liu - University of...

Introduction to Causal Inference

Lan Liu

University of Minnesota at Twin Cities

[email protected]

1

Table of contents

Causal ... or not?

How

Topics in Causal Inference

Tools we use...

Causal Inference in Industry

2

The Danger of Ice Cream

3

The Danger of Ice Cream

I “Confounding Bias”

4

The Danger of Ice Cream

I “Confounding Bias”

4

The Danger of Ice Cream

I “Confounding Bias”

4

Marriage vs Longivity

Science points to a very easy way to be happier, have less stress, reduceyour risk of dying from cancer and heart disease, and potentially livelonger:

Simply get married!!

I “Reverse Causality”

5

Marriage vs Longivity

Science points to a very easy way to be happier, have less stress, reduceyour risk of dying from cancer and heart disease, and potentially livelonger: Simply get married!!

I “Reverse Causality”

5

Marriage vs Longivity

Science points to a very easy way to be happier, have less stress, reduceyour risk of dying from cancer and heart disease, and potentially livelonger: Simply get married!!

I “Reverse Causality”

5

World War II

Abraham Wald (THE Wald as in Wald test)

I Britian vs Germany

I Bomber: cumbersome, easily hit by fighters

I Install armour: heavy

I Look at aircraft that had returned from missionsI add to the most hitted areas

I “Selection Bias”

6

World War II

Abraham Wald (THE Wald as in Wald test)

I Britian vs Germany

I Bomber: cumbersome, easily hit by fighters

I Install armour: heavy

I Look at aircraft that had returned from missionsI add to the most hitted areas

I “Selection Bias”

6

World War II

Abraham Wald (THE Wald as in Wald test)

I Britian vs Germany

I Bomber: cumbersome, easily hit by fighters

I Install armour: heavy

I Look at aircraft that had returned from missionsI add to the most hitted areas

I “Selection Bias”

6

How to Make Causal Inference

I Time machine ...

I Parallel universe

I Potential outcomes: Y0, Y1.I Individual causal effect Y1 − Y0

I Movies: Sliding Door, Mr. Nobody

7

How to Make Causal InferenceI Time machine ...

I Parallel universe

I Potential outcomes: Y0, Y1.I Individual causal effect Y1 − Y0

I Movies: Sliding Door, Mr. Nobody

7

How to Make Causal InferenceI Time machine ...

I Parallel universe

I Potential outcomes: Y0, Y1.I Individual causal effect Y1 − Y0

I Movies: Sliding Door, Mr. Nobody7

How to Make Causal Inference

Key: have control over intervention

Golden rule: randomization

8

Not so easy to randomize ...

I Randomization may be costly!

I E.g., google search story, try search: BMW, sun country, iphone

9

Not so easy to randomize ...

I Randomization may be costly!

I E.g., google search story, try search: BMW, sun country, iphone

9

Not so easy to randomize....

I People don’t listen....

I E.g., non-compliance −→ smaller treatment effect

I Confounding

I Ethical reasons: e.g., smoking vs lung cancer

10

Topics in Causal Inference

Measured confounding

I E.g., Study: working out vs body fat

I Subject matter knowledge: women differ from men!

I woman: gym goers vs non goers

I man: gym goers vs non goers

I Stratify on gender

I Better knowledge: not only gender, but also age, race, eating habitsmatter!

I Even better knowledge: what if genes also matter?!

I Only need to stratify on the value of propensity score, i.e.,Pr(go to gym|X ) −→ propensity score matching

11

Topics in Causal Inference

Measured confounding

I E.g., Study: working out vs body fat

I Subject matter knowledge: women differ from men!

I woman: gym goers vs non goers

I man: gym goers vs non goers

I Stratify on gender

I Better knowledge: not only gender, but also age, race, eating habitsmatter!

I Even better knowledge: what if genes also matter?!

I Only need to stratify on the value of propensity score, i.e.,Pr(go to gym|X ) −→ propensity score matching

11

Topics in Causal Inference

Measured confounding

I E.g., Study: working out vs body fat

I Subject matter knowledge: women differ from men!

I woman: gym goers vs non goers

I man: gym goers vs non goers

I Stratify on gender

I Better knowledge: not only gender, but also age, race, eating habitsmatter!

I Even better knowledge: what if genes also matter?!

I Only need to stratify on the value of propensity score, i.e.,Pr(go to gym|X ) −→ propensity score matching

11

Topics in Causal Inference

Measured confounding

I E.g., Study: working out vs body fat

I Subject matter knowledge: women differ from men!

I woman: gym goers vs non goers

I man: gym goers vs non goers

I Stratify on gender

I Better knowledge: not only gender, but also age, race, eating habitsmatter!

I Even better knowledge: what if genes also matter?!

I Only need to stratify on the value of propensity score, i.e.,Pr(go to gym|X ) −→ propensity score matching

11

Topics in Causal Inference

Measured confounding

I E.g., Study: working out vs body fat

I Subject matter knowledge: women differ from men!

I woman: gym goers vs non goers

I man: gym goers vs non goers

I Stratify on gender

I Better knowledge: not only gender, but also age, race, eating habitsmatter!

I Even better knowledge: what if genes also matter?!

I Only need to stratify on the value of propensity score, i.e.,Pr(go to gym|X ) −→ propensity score matching

11

Topics in Causal Inference

Unmeasured confounding

Y = β Xi + εi

Figure: Causal diagram of the confounding bias

I

β̂LS =∆y

∆x=

∆yx + ∆yε∆x

= β +∆yε∆x

I Biased!12

Topics in Causal Inference

Unmeasured confounding

Y = β Xi

Zi

+ εi

Figure: Causal diagram of the confounding bias

I One solution: Instrumental variable −→ Unbiased!

β̂IV =∆y

∆x=

∆yx∆x

= β

13

Topics in Causal Inference

Mediation

I Mediation: causal pathway, underlying mechanism

I E.g.,

Exercise Happy mood Health

Bone density

Heart rate

Figure: Causal diagram of the causal pathways from exercise to health

14

Topics in Causal Inference

Interference

I Interference: your outcome also depends on other people’streatment

I E.g., flu vaccine study −→ herd immunity

15

Topics in Causal Inference

Other topics includes:

I measurement error (surrogate)

I heterogeneity treatment effect

I graphical models

I ...

16

Tools we use...

Almost everything in statistics ...

I Multiple comparison

I Hypothesis testing

I Parametric modeling

I Semiparametric efficiency

I Nonparametric smoothing

I Structural modeling

I ...

Causal inference is a special type of statistics, where we care only certaintype of association, which is due to causation ...

17

Tools we use...

Almost everything in statistics ...

I Multiple comparison

I Hypothesis testing

I Parametric modeling

I Semiparametric efficiency

I Nonparametric smoothing

I Structural modeling

I ...

Causal inference is a special type of statistics, where we care only certaintype of association, which is due to causation ...

17

Do Industry ppl care?

Of course!

I Tech companies: e.g., facebook (interference), amazon, bing (causaleffect of advertisement)...

I Insurance companies: effect of training program for sales persons

I Finance: policy (e.g., increase interest rate) consequence

I Pharmaceutical companies: curing ppl, who are we curing ...

I Sports: effect of certain play strategy

I ...

18

Do Industry ppl care?

Of course!

I Tech companies: e.g., facebook (interference), amazon, bing (causaleffect of advertisement)...

I Insurance companies: effect of training program for sales persons

I Finance: policy (e.g., increase interest rate) consequence

I Pharmaceutical companies: curing ppl, who are we curing ...

I Sports: effect of certain play strategy

I ...

18

My recent research

Optimal Criteria to Exclude the Surrogate Paradox

19

Introduction

I What is surrogate?

Scapegoat

20

Introduction

I What is surrogate?

Scapegoat

20

Introduction

I In biomedical and econometric studies, the measurement of theprimary endpoint may be

I expensive

I inconvenient

I infeasible to collect in a practical length of time.

I Surrogate variables/ biomarkers are usually used as substitutes forthe primary outcomes.

I In cancer studies, the primary outcome is death;

I Surrogate: tumor shrinkage/ other laboratory measure −→ reduce thecost or the duration of the clinical trials

21

Horrible Consequences

I Eg 1., Lipid levels (especially total cholesterol levels) −→ predictors ofcardiovascular-related mortality.

I However, the use of cholesterol-lowering agents −→ increase in overallmortality (Gordon, 1995).

I Eg 2., Anti-arrythmia drug Tamnbocor −→ suppresses arrythmia −→death of over 50,000 people!!

22

Surrogate paradox

I The surrogate paradox: + treatment effect on the surrogate, +surrogate effect on the true endpoint ⇒ − treatment effect on thetrue endpoint.

I Even the sign of the treatment effect is hard to predict, not to saymagnitude!!!

I Happen even in randomize studies

T S Y

U

Figure: Causal diagram of the strong surrogate S for the effect of thetreatment T on outcome Y .

23

Methods

I Long story short: old methods all assume unverifiable assumptions,thus may not be practical to use

I We developed bounds for the treatment effect with surrogatewithout any unverifiable assumptions

I We used linear programming to solve this

I We show that it is not enough to avoid the surrogate paradox merelywith the ACE of surrogate on outcome being positive, instead, werequire its magnitude to pass certain positive threshold.

I Transportability; testability; optimality.

24

Excluding the Paradox

Figure: Partition of the parameter space of (δ0, δ1)

25

Statistical Analysis

Anti-hypertension Drugs

I Thus, we conclude that for evaluating the effect of anti-hypertensiondrug on the long-term death, using high blood pressure as asurrogate cannot guarantee the bounds to exclude null.

I That is, if the unmeasured confounders have certain value, it ispossible that the treatment has a possible effect in reducing the highblood pressure and lowering the high blood pressure could reducethe death rate, but the treatment could increase the death rate.

I Thus, for the development of such anti-hypertension drug, it isrecommended to also collect the information on the long-term deathrate.

26

Entry Level Causal References

I Book by Hernan and Robins https://www.hsph.harvard.edu/

miguel-hernan/causal-inference-book/

I Imbens, Guido W., and Donald B. Rubin. Causal inference instatistics, social, and biomedical sciences. Cambridge UniversityPress, 2015.

27

Edge cutting References

H. Chen, Z. Geng, and J. Jia.Criteria for surrogate end points.Journal of the Royal Statistical Society: Series B (StatisticalMethodology), 69:919–932, 2007.

M.G. Hudgens and M.E. Halloran.Causal vaccine effects on binary postinfection outcomes.Journal of the American Statistical Association, 101:51–64, 2006.

C. Ju and Z. Geng.Criteria for surrogate end points based on causal distributions.Journal of the Royal Statistical Society: Series B (StatisticalMethodology), 72:129–142, 2010.

W. Li, Y. Gu, L. Liu, and E. Tchetgen Tchetgen.Demystify the multiple robust estimators.In Preparation.

L. Liu and M.G. Hudgens.Large sample randomization inference with interference.Journal of the American Statistical Association, 109:288–301, 2014.

28

L. Liu, M.G. Hudgens, and S. Becker-Dreps.On inverse probability-weighted estimators in the presence ofinterference.Biometrika, 103:829–842, 2016.

L. Liu, W. Miao, B. Sun, J. Robins, and E. Tchetgen Tchetgen.Instrumental variable estimation of the marginal average effect oftreatment on the treated.Submitted, 2015.

L. Liu and E. Tchetgen Tchetgen.Indirect adjustment for homophily bias with a negative controlvariable in peer effect analysis.In Preparation.

B. Sun, W. Miao, L. Liu, J. Robins, and E. Tchetgen Tchetgen.Doubly robust instrumental variable estiamtion in missing not atrandom problems.Major revision at Statistical Sinica.

Y. Yin, L. Liu, and Z. Geng.Assessing the treatment effect heterogeneity with a latent variable.Statistical Sinica, 2017.

29

Y. Yin, L. Liu, Z. Geng, and P. Luo.Optimal criteria to exclude the surrogate paradox and sensitivityanalysis.Under Review at JASA, 2017.

30

Acknowledgments

Yunjian Yin

Thank you all for coming

Feel free to let me know if you encounter any causal problem in yourresearch

31

Excluding the Paradox

Figure: Partition of the parameter space of (δ0, δ1)

1

Notation

I Binary T treatment, Y primary outcome, S surrogate endpoint.

I U unmeasured confounder that affects both S and Y

I St the potential outcome of surrogate if T = t

I Yts the potential outcome if T = t and S = s

I We may also use the notation YT=t as the potential primaryoutcome when the intervention is only to set T = t

I Parameter of Interest:

ACE(T −→ Y ) = P(YT=1 = 1)− P(YT=0 = 1)

I Assumption 1. (Randomization) T⊥(Y00,Y01,Y10,Y11,S0,S1,U)

2

Optimality

Apart from the testability, our criterion also has the following optimality.

I DefinitionA criterion to exclude the surrogate paradox is optimal if

I (i) when the criterion is satisfied, the surrogate paradox is absent

I (ii) when the criterion is not satisfied, one can always find a datagenerating mechanism that yields the same observed data but suffersfrom the surrogate paradox.

I That is, one cannot exclude the possibility of surrogate paradoxaccording to the observed data.

3

Optimality

I Intuitively, an ideal criterion to exclude the surrogate paradox will bebased on a sufficient and “almost necessary” condition.

I The sufficiency gives the condition enough strength to rule outsurrogate paradox: if the condition is satisfied, the surrogate paradoxis guaranteed to be absent.

I The “almost necessity” gives the condition enough sharpness tohold as long as the observed data could rule out the possibility ofsurrogate paradox: if the condition fails, there exists a datagenerating process (a set of parameters) with surrogate paradox thatcan generate the same observed data.

4

Optimality

I The “almost necessity” differs from necessity in the sense that anecessary condition would require a criteria to rule out the possibilityof surrogate paradox whenever it is absent.

I Such necessity is impossible to achieve due to non-identification.

I More specifically, we can only identify a set of data-generatingprocess that is consistent with the observed data.

I If and only if none of these data generating mechanisms hassurrogate paradox, the criterion enable us to exclude surrogateparadox.

I The optimality requires a criterion to capture all the information inthe observed data to exclude the surrogate paradox.

5