1 Causal Directed Acyclic Graphs (DAG) (Causal Diagrams) 2013 Eyal Shahar, MD, MPH Professor.

Post on 12-Jan-2016


1

Causal Directed Acyclic Graphs (DAG) (Causal Diagrams)

2013

Eyal Shahar, MD, MPH, Professor

2

What is a causal diagram?

Components
  Variables
  Unidirectional arrows

[Diagram: five variables, A, B, C, D, and E, connected by unidirectional arrows]

3

Rules: displaying variables

Called “nodes” or “vertices”

Should be clearly understood by others

Variables, not values of variables: “smoking status” is okay; “smoking” is not

Displayed along the time axis (left to right), but sometimes we ignore this rule

4

Rules: drawing arrows

An arrow
  From a postulated cause to its postulated effect
  No bidirectional arrows

An arrow with a question mark
  The research question at hand

An arrow without a question mark
  Background theory or an axiom

[Diagrams: variables A, B, and C; one arrow carries a question mark]

5

Rules: drawing arrows

Directed Acyclic Graph
  Circularity does not exist
  A future effect cannot be a cause of its cause in the past

So-called “circularity”
  Directed acyclic graph with time-indexed variables

[Diagrams: variables A, B, C; a time-indexed sequence A(t=1), B(t=2), A(t=3), B(t=4)]
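The acyclicity rule can be checked mechanically. The sketch below is only an illustration: the edge lists are hypothetical (the arrows in the slide's diagram did not survive in this transcript), and the function name `is_acyclic` is an assumption. It uses Kahn's topological-sort algorithm to show that a two-way loop between A and B is not a DAG, whereas the same variables indexed by time form one.

```python
from collections import deque

def is_acyclic(edges):
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1
    # Start from variables that have no causes in the diagram.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Every node is reached only if no cycle exists.
    return visited == len(nodes)

# Hypothetical "circular" diagram: A -> B and B -> A.
circular = [("A", "B"), ("B", "A")]
# The same idea with time-indexed variables (assumed arrows).
time_indexed = [("A_t1", "B_t2"), ("B_t2", "A_t3"), ("A_t3", "B_t4")]

print(is_acyclic(circular))      # False
print(is_acyclic(time_indexed))  # True
```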

6

Example: a causal diagram for gastroesophageal reflux and esophageal disease

[Diagram: nodes R1, R2, S1, S2, T, I1, I2, D1, D2, D1dx, D2dx; one arrow carries a question mark]

R = reflux
S = symptoms
T = treatment
I = imaging
D = esophagus status
Ddx = diagnosed esophagus status

7

How does a causal diagram help in research?

Decodes causal assertions
  All of science is about causation!

Clarifies our wordy or vague causal thoughts about the research topic

Connects “association” with “causation”

Helps us decide which covariates should enter the statistical model—and which should not

Unifies our understanding of confounding bias, colliding bias, information bias (and three other, less well-known biases)

Can depict and explain all types of bias

8

PubMed search (through 2012)

“Causal diagrams”: 83 titles

“Directed acyclic graph”: 137 titles (some irrelevant)

Still not widely known

Rarely used

9

Some references

Pearl J. Causality: models, reasoning, and inference. 2000. Cambridge University Press (2009, second edition)

Greenland S et al. Causal diagrams for epidemiologic research. Epidemiology 1999;10:37-48

Robins JM. Data, design, and background knowledge in etiologic inference. Epidemiology 2001;12:313-320

Hernan MA et al. A structural approach to selection bias. Epidemiology 2004;15:615-625

Shahar E, Shahar DJ. Causal diagrams, information bias, and thought bias. Pragmatic and Observational Research 2010;1:33-47

Shahar E, Shahar DJ. Causal diagrams and three pairs of biases. In: Epidemiology – Current Perspectives on Research and Practice (Lunet N, Editor). www.intechopen.com/books/epidemiology-current-perspectives-on-research-and-practice, 2012: pp. 31-62 (reading material for this module)

10

A natural path between two variables

Formally: a sequence of arrows, regardless of their direction, that connects two variables (and does not pass more than once through each variable)

Informally: “can walk from A to Z, or from Z to A, on bridges”

[Diagrams: example natural paths between A and Z: A B C D E Z; A B C D Z (two versions); A Z]
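The formal definition translates into a simple search. This is a minimal sketch and not part of the original slides: the diagram is hypothetical and the function name `natural_paths` is an assumption. It lists every sequence of arrows that connects two variables, regardless of direction, without passing through any variable twice.

```python
def natural_paths(edges, start, end):
    """List every path from start to end that ignores arrow direction
    and never visits a variable twice."""
    neighbors = {}
    for cause, effect in edges:
        neighbors.setdefault(cause, set()).add(effect)
        neighbors.setdefault(effect, set()).add(cause)

    paths = []

    def walk(node, visited):
        if node == end:
            paths.append(visited)
            return
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in visited:
                walk(nxt, visited + [nxt])

    walk(start, [start])
    return paths

# Hypothetical diagram: A -> B -> Z, C -> A, C -> Z, A -> D <- Z
edges = [("A", "B"), ("B", "Z"), ("C", "A"), ("C", "Z"), ("A", "D"), ("Z", "D")]
for path in natural_paths(edges, "A", "Z"):
    print(" ".join(path))
```

For this assumed diagram the search finds three natural paths: A B Z, A C Z, and A D Z.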

11

Types of natural paths between two variables

Causal paths

Confounding paths

Colliding paths

12

A causal path between two variables (also called “directed path”)

A natural path between A and Z, in which all the arrows point in the same direction (hence, “directed path”)

“A is a cause of Z” or “Z is a cause of A”

[Diagrams: example causal paths between A and Z, some passing through intermediate variables B, C, and D]

13

“Direct” versus “indirect” causal path

“Direct” is often (maybe always) an oversimplification
  Is it really direct? Does no intermediary exist?

Better terminology: “causal paths in which no intermediary variables are known or displayed”

Overall (total) effect: by all directed paths (combined)

[Diagram: variables A, B, Z, with one path labeled “direct” causal path and another labeled indirect causal path]

14

A confounding path between two variables

A natural path between A and Z that contains a shared cause of A and Z on this path (a confounder)

[Diagrams: confounding paths between A and Z, one through C and one through C and X]

Alternative display: A C Z and A C X Z

15

A colliding path between two variables

A natural path between A and Z that contains at least two arrowheads that “collide” at some variable along this path (a collider on the path)

[Diagrams: colliding paths between A and Z, one through C and one through K, L, and M]

Alternative display: A C Z and A K M Z

16

Side point: collider (and confounder) are path-specific terms

A variable called a collider (or a confounder) on one path need not be a collider (or a confounder) on another path

[Diagram: variables A, B, C, D, and Z]

C is a collider on one path (ABCDZ) and a confounder on another path (ACZ)
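The path-specific nature of these terms can be shown with a small classifier. This is a hedged sketch: the arrow directions below are assumptions chosen to agree with the statement above (the slide's arrows are not preserved in this transcript), and `classify_path` is a hypothetical helper. A path is named colliding if two arrowheads meet at some variable on it, confounding if it instead contains a variable from which arrows leave on both sides, and causal otherwise.

```python
def classify_path(path, edges):
    """Name a natural path: 'causal', 'confounding', or 'colliding'."""
    arrows = set(edges)
    # For each step, record whether the arrow points forward along the path.
    forward = []
    for a, b in zip(path, path[1:]):
        if (a, b) in arrows:
            forward.append(True)
        elif (b, a) in arrows:
            forward.append(False)
        else:
            raise ValueError(f"no arrow between {a} and {b}")
    # Collider: arrows enter the variable from both sides (-> node <-).
    colliders = [path[i + 1] for i in range(len(forward) - 1)
                 if forward[i] and not forward[i + 1]]
    if colliders:
        return "colliding", colliders
    # Confounder: arrows leave the variable on both sides (<- node ->).
    confounders = [path[i + 1] for i in range(len(forward) - 1)
                   if not forward[i] and forward[i + 1]]
    if confounders:
        return "confounding", confounders
    return "causal", []

# Assumed arrows: B -> A, B -> C, D -> C, D -> Z, C -> A, C -> Z
edges = [("B", "A"), ("B", "C"), ("D", "C"), ("D", "Z"), ("C", "A"), ("C", "Z")]

print(classify_path(["A", "B", "C", "D", "Z"], edges))  # ('colliding', ['C'])
print(classify_path(["A", "C", "Z"], edges))            # ('confounding', ['C'])
```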

17

Identify and name each natural path between A and Z

[Diagram: variables A, Z, K, L, M, P, Q, R, and S]

18

A bridge to “association”

What is “association”?
  A mathematical phenomenon
  The ability to guess the value of one variable based on the value of another variable

Are there “spurious associations”?
  A mathematical relation between variables is never “spurious”
  Poor word choice
  “The association of A with Z is spurious.” What does the writer have in mind, though?

What creates associations?
  A causal structure

19

A bridge between natural paths and associations

Which natural paths between A and Z contribute to the marginal (crude) association between A and Z?

Open paths
  Causal paths
  Confounding paths

Which natural paths between A and Z do not contribute to an association between A and Z?

Blocked paths
  Colliding paths

20

Identify open paths and blocked paths (between A and Z) in this diagram

[Diagrams: example paths between A and Z through B, C, and D, grouped under “Open paths” and “Blocked paths”]

21

When does an association between A and Z reflect the effect of A on Z?

When only causal paths contribute to the association between A and Z

When confounding paths do not exist, or are somehow blocked
  Almost true: not a sufficient condition

22

How do we block a confounding path?

By conditioning on some variable along the path

What is “conditioning” on a variable?

Restricting the variable to one of its values

Various forms of “adjustment”
  Standardization
  Stratification and a weighted average (Mantel-Haenszel)
  Adding an independent variable to a regression model
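As one illustration of adjustment by stratification, the sketch below computes a Mantel-Haenszel summary odds ratio. The counts are invented for illustration only, and the function names are assumptions. Each stratum of the conditioned variable is a 2x2 table (a, b, c, d) = (exposed diseased, exposed disease-free, unexposed diseased, unexposed disease-free).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a single 2x2 table."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel weighted average of stratum-specific odds ratios."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical counts in two strata of a confounder C.
strata = [(40, 60, 10, 15), (10, 40, 30, 120)]

# Ignoring C means collapsing the strata into one crude table.
crude = tuple(sum(column) for column in zip(*strata))

print(odds_ratio(*crude))          # 1.6875
print(mantel_haenszel_or(strata))  # 1.0
```

In these made-up data the odds ratio is 1 within each stratum, yet the crude odds ratio is about 1.69. The weighted average across strata removes the contribution of the path through C.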

23

Conditioning on a variable…

Dissociates a variable from its causes and its effects

Turns an open natural path into a blocked path

[Diagram: variable V with variables A, B, C on one side and X, Y, Z on the other; the path A V Z is shown twice]

24

Deconfounding = blocking a confounding path

[Diagrams: confounding paths between A and Z through C, and through C and X; several arrows carry question marks]

But what if?

25

Induced paths

Conditioning on a collider creates (or contributes to) the association between the colliding variables

[Diagrams: colliding paths between A and Z, one through C and one through K, L, and M]

Why? Later…

26

Induced paths

An induced path may contain
  Only dashed lines
  Dashed lines and arrows
  Colliders

An induced path may be blocked or open
  An induced path is blocked if there is at least one collider on the path
  An induced path is open if there are no colliders on the path
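The rules for open and blocked paths, before and after conditioning, can be sketched as a single check. This uses the standard path-blocking rule from the d-separation literature, which is assumed here to match the induced-path description above. It is simplified: conditioning on a descendant of a collider is ignored. The diagram and the name `path_is_open` are hypothetical.

```python
def path_is_open(path, edges, conditioned=()):
    """Return True if the path transmits association, given the set of
    variables we condition on. Simplification: descendants of colliders
    are not considered."""
    arrows = set(edges)
    conditioned = set(conditioned)
    for left, node, right in zip(path, path[1:], path[2:]):
        is_collider = (left, node) in arrows and (right, node) in arrows
        if is_collider and node not in conditioned:
            return False  # an unconditioned collider blocks the path
        if not is_collider and node in conditioned:
            return False  # conditioning on a non-collider blocks the path
    return True

# Assumed arrows: C -> A, C -> Z (confounder C); A -> K <- Z (collider K)
edges = [("C", "A"), ("C", "Z"), ("A", "K"), ("Z", "K")]

print(path_is_open(["A", "C", "Z"], edges))                     # True
print(path_is_open(["A", "C", "Z"], edges, conditioned={"C"}))  # False
print(path_is_open(["A", "K", "Z"], edges))                     # False
print(path_is_open(["A", "K", "Z"], edges, conditioned={"K"}))  # True
```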

27

Blocked induced paths

[Diagrams: two examples with variables A, B, C, D, E, and Z; each shows a blocked natural path and the corresponding blocked induced path]

28

Open induced paths

[Diagrams: two examples, one with variables A, B, C, and Z and one with A, B, C, D, E, and Z; each shows a blocked natural path and the corresponding open induced path]

29

Confounding bias and colliding bias

A confounding path contributes to the (marginal) association between A and Z
  This unwanted contribution is called confounding bias

An open induced path contributes to the (conditional) association between A and Z
  This unwanted contribution is called colliding bias

30

Can we block an open induced path? Yes

[Diagrams: two open induced paths, one with variables A, B, C, and Z and one with A, B, C, D, E, and Z]

We can eliminate these paths by conditioning on C

31

Key questions

Why does a collider block a path?
  Why don’t we observe an association between colliding variables?

Why does conditioning on a collider create an association between the colliding variables?

[Diagrams: A and Z colliding at C, shown as a blocked path and as an open induced path]

32

Intuitive explanation

A sample of N patients

Variables
  M: meningitis status (yes, no)
  S: stroke status (yes, no)
  V: vital status (alive, dead)

Assume: causal reality is fully described in the diagram

[Diagram: M and S are both causes of V]

33

Is there a marginal (crude) association between meningitis status and stroke status?

No, we cannot guess stroke status from meningitis status (or vice versa)

Intuition: a common effect (vital status) cannot induce an association between its (past) causes

There is no transfer of guesses across a collider
  A colliding path is a blocked path

34

Suppose we condition on V (vital status)…

Stratum 1 (V=alive): alive patients

Pt   Stroke status   Vital status   Meningitis status
1    No              Alive          ?

My guess: “No”

Stratum 2 (V=dead): dead patients

Pt   Stroke status   Vital status   Meningitis status
2    No              Dead           ?

My guess: “Yes”

We can make some guesses after conditioning

M (meningitis status) and S (stroke status) are associated within the strata of V (the collider)
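The guessing argument can be checked with exact arithmetic. The sketch below is an illustration under stated assumptions: all probabilities are invented, meningitis and stroke are assumed independent in the population, and each raises the probability of death. It computes the odds ratio between M and S in the whole sample and within each stratum of V.

```python
from itertools import product

# Hypothetical probabilities (not from the slides).
P_M = 0.1   # P(meningitis)
P_S = 0.2   # P(stroke)
# P(dead | meningitis, stroke), keyed by (m, s).
P_DEAD = {(0, 0): 0.05, (1, 0): 0.5, (0, 1): 0.4, (1, 1): 0.8}

def joint():
    """Joint distribution of (m, s, dead) with M and S independent."""
    table = {}
    for m, s, dead in product((0, 1), repeat=3):
        p_m = P_M if m else 1 - P_M
        p_s = P_S if s else 1 - P_S
        p_v = P_DEAD[(m, s)] if dead else 1 - P_DEAD[(m, s)]
        table[(m, s, dead)] = p_m * p_s * p_v
    return table

def ms_odds_ratio(table, strata):
    """Odds ratio between M and S, restricted to the given values of V."""
    cell = {(m, s): sum(table[(m, s, v)] for v in strata)
            for m, s in product((0, 1), repeat=2)}
    return cell[(1, 1)] * cell[(0, 0)] / (cell[(1, 0)] * cell[(0, 1)])

table = joint()
crude = ms_odds_ratio(table, strata=(0, 1))   # about 1.0: no association
alive = ms_odds_ratio(table, strata=(0,))     # about 0.63
dead = ms_odds_ratio(table, strata=(1,))      # about 0.2
print(crude, alive, dead)
```

Under these assumed numbers the crude odds ratio is 1, but it falls below 1 within each stratum. Among dead patients, knowing that a patient had no stroke makes meningitis a better guess, which is the reasoning in the table above.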

35

Before and after conditioning…

[Diagrams: M and S colliding at V, shown as a blocked path (before conditioning) and as an open induced path (after conditioning)]

36

Theorem and implications

Theorem
  Colliding variables will be associated within at least one stratum of their collider

Implications
  A Mantel-Haenszel summary measure of association will differ from the crude, if we summarize across a collider
  A regression coefficient will change if we “adjust” for a collider

37

Goal: estimate a measure of effect (causation) by a measure of association

Association is estimating causation (A→Z) when:
  The association between A and Z is due only to A→Z (direct and indirect paths combined)

Methods
  Display variables and causal assumptions in a causal diagram
  Block all confounding paths between A and Z
  Do not create open induced paths between A and Z, or eliminate them if created

38

Confounding bias (again)

The most widely known bias

Historical definitions and identification methods
  “Lack of exchangeability”
  “Mixed effects”
  “Non-collapsibility”
  “Change-in-estimate”

A fair amount of confusion

The basic causal structure

[Diagram: variables A, Z, and C; one arrow carries a question mark]

39

So what is a confounder?
  A confounder is a common cause of the exposure (A) and the disease (Z)

[Diagrams: a path between A and Z through B, C, and D, with one variable labeled “Confounder”]

Note: we can block the path by conditioning on B or C or D.

40

Endless complexity

[Diagram: time-indexed exposure variables E−3 to E1, disease variables D−2 to D2, and variables Q−3 to Q0]

Exposure: E0 (baseline exposure)
Disease: D2 (follow-up)
Question: Which is the confounder?

41

Colliding bias

Formerly known as “selection bias”

Confusing names and types
  “No representativeness”
  “Biased sample”
  “Convenient sampling”
  “Control-selection bias”
  “Survival bias”
  “Informative censoring”

The basic causal structure

[Diagram: variables A, Z, and C; one arrow carries a question mark]

42

But there are many more versions

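[Diagrams: four further versions of the colliding structure, involving A, Z, and C and, in some versions, X and Y; the question marks denote the research question]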

43

Confounder versus collider

[Diagram: A and Z, with one variable labeled “Confounder” and another labeled “Collider”]

44

Confounding bias and colliding bias: an antithetical pair

[Diagrams: four panels with A, Z, and C, labeled “Bias” or “No bias”, for C as a confounder and C as a collider]

45

Even more impressive in text…

                            Confounder                        Collider
Main attribute              common cause                      common effect
Association                 contributes to the association    does not contribute to the
                            between its effects               association between its causes
Type of path                open path                         blocked path
Effect of conditioning      blocked path                      open path
Bias before conditioning?   Yes, confounding bias             No
Bias after conditioning?    No                                Yes, colliding bias

46

What is selection bias?

A type of colliding bias

Should be called “sampling colliding bias”

47

Types of colliding bias

Sampling colliding bias
  Every study is restricted to selected people
  Inevitable conditioning on “selection status” (S)
  Sometimes, this unavoidable conditioning creates colliding bias

Analytical colliding bias
  Restricted analysis: computing association for one stratum of a collider
  Stratified analysis: computing association for each stratum of a collider
  Adjustment by analysis
    Computing a weighted average across the collider
    Adding the collider to a regression model, as a covariate

48

Sampling colliding bias: a wrong sampling decision

What happens if we estimate the effect of marital status (A) on dementia status (Z) in a sample of nursing home residents?
  Restricting recruitment to nursing home residents

Assumptions
  No effect of A on Z
  Both variables affect “place of residence” (P) (nursing home or elsewhere)

49

Causal diagrams

[Diagrams: A (marital status), Z (dementia status), P (place of residence), and S (selection status)]

50

Sampling colliding bias: a wrong sampling decision

What happens if we estimate the effect of coughing status (A) on abdominal pain status (Z) in a sample of hospitalized patients?
  Restricting recruitment to hospitalized patients

Assumptions
  Displayed in the diagram (next slide)
  H is hospitalization status

51

Causal diagram

[Diagram: pneumonia status, ulcer status, A (coughing status), Z (abdominal pain status), H (hospitalization status), and S (selection status); one arrow carries a question mark]

52

Basic causal diagrams for every case-control study

The key feature of a case-control study
  Disease status affects selection into the case-control sample
  Diseased people are much more likely to be selected than disease-free people

[Diagrams: A, Z, and S (selection status); the question marks denote the research question]

No bias, unless we mistakenly create an open path between A and S!
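A small exact calculation illustrates the last statement for the odds ratio. Everything in this sketch is an assumption made for illustration: the population probabilities, the selection probabilities, and the function names. When selection depends on disease status only, the odds ratio in the selected sample equals the population odds ratio. When selection of controls also depends on exposure, it does not.

```python
def sample_odds_ratio(population, selection_prob):
    """Odds ratio between A and Z among selected people.
    population maps (a, z) to its probability; selection_prob(a, z)
    is the probability of being selected into the study."""
    w = {cell: p * selection_prob(*cell) for cell, p in population.items()}
    return w[(1, 1)] * w[(0, 0)] / (w[(1, 0)] * w[(0, 1)])

# Hypothetical population: P(A=1) = 0.3; risk 0.10 if exposed, 0.05 if not.
population = {
    (1, 1): 0.3 * 0.10, (1, 0): 0.3 * 0.90,
    (0, 1): 0.7 * 0.05, (0, 0): 0.7 * 0.95,
}

def everyone(a, z):
    return 1.0

def by_disease_only(a, z):
    # Cases are far more likely to be selected than disease-free people.
    return 0.9 if z else 0.01

def by_disease_and_exposure(a, z):
    # Assumed flaw: exposed controls are twice as likely to be selected.
    if z:
        return 0.9
    return 0.02 if a else 0.01

or_population = sample_odds_ratio(population, everyone)
or_z_only = sample_odds_ratio(population, by_disease_only)
or_biased = sample_odds_ratio(population, by_disease_and_exposure)
print(or_population, or_z_only, or_biased)
```

With these numbers the first two odds ratios agree (about 2.11), while the third is half as large, because selection now depends on both A and Z.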

53

Sampling colliding bias: a wrong sampling decision

Research question: What is the effect of smoking status (A) on cancer status (Z)?

Design: Hospital-based case-control study

Controls: patients with cardiovascular disease (CVD)

54

Causal diagram: smoking and cancer

[Diagram: A (smoking status), Z (cancer status), CVD status, and S; one arrow carries a question mark. Arrow labels: “Sampling decision for controls”, “Always exists in a case-control study”, “Background knowledge”]

Note: CVD and Z collide at S

55

Colliding bias (AKA control selection bias)

[Diagram: A (smoking status), Z (cancer status), CVD status, and S; one arrow carries a question mark]

56

Sampling colliding bias: willingness to participate in a case-control study

[Diagram: A (smoking status), Z (cancer status), “Willing to participate?”, and S; one arrow carries a question mark and one is labeled “Background knowledge”]

57

Control (or case) selection bias

Two main mechanisms

[Diagrams: two structures involving A, Z, S, and B, each labeled “Sampling/participation of controls (or cases)”; one arrow carries a question mark]

Remember: Z→S always exists

We always condition on S

58

Types of colliding bias

Sampling colliding bias
  Every study is restricted to selected people
  Inevitable conditioning on “selection status” (S)
  Sometimes, this unavoidable conditioning creates colliding bias

Analytical colliding bias
  Restricted analysis: computing association for one stratum of a collider
  Stratified analysis: computing association for each stratum of a collider
  Adjustment by analysis
    Computing a weighted average across the collider
    Adding the collider to a regression model, as a covariate

59

Analytical colliding bias: restricted analysis

Research question: what is the effect of dietary fibers on colon polyps?

Design: a cross-sectional study

Analysis: restricted to people who have not yet developed colon cancer

60

Causal diagram

[Diagram: A (dietary fibers), Z (colon polyp status), and colon cancer status; one arrow carries a question mark and two arrows are labeled “Assumed knowledge”]

Note: A and Z collide at colon cancer status

61

Analytical colliding bias

[Diagram: A (dietary fibers), Z (colon polyp status), and colon cancer status; one arrow carries a question mark]

Despite “intuition” we should not restrict the sample to cancer-free people

62

Analytical colliding bias: adjustment

“We adjusted for everything but the kitchen sink”

Traditional steps
  Add a laundry list of covariates to the regression model
  See what happens to the exposure coefficient
  Use the “change-in-estimate” method
    Change in the coefficient = evidence for confounding
  Report the “adjusted” coefficient as a better (less confounded) measure of effect

Prone to colliding bias

63

Analytical colliding bias

Research question: what is the overall effect of gender on blood pressure?

Design: a cross-sectional study

Analysis
  Crude mean difference in systolic blood pressure
  “Adjusted” mean difference (conditioned on waist circumference)

64

Results

Analysis                              Mean SBP men (mmHg)   Mean SBP women (mmHg)   Mean difference (mmHg)
Crude                                 123.8                 122.1                   1.7
“Adjusted” for waist circumference                                                  -3.1

Why do the estimates differ?
Which estimate should be reported?
Is the adjusted estimate less biased?

65

Is abdominal fat (measured by waist circumference) a confounder?

[Diagram: C (abdominal fat), A (gender), and Z (blood pressure)]

No!

66

Revised diagram

[Diagram: C (abdominal fat), A (gender), and Z (blood pressure)]

No need to “adjust for” abdominal fat

“Adjustment” could have:
  blocked a causal path
  created colliding bias

67

Could have blocked a causal path…

[Diagram: C (abdominal fat), A (gender), and Z (blood pressure)]

68

Could have created colliding bias…

[Diagram: U, C (abdominal fat), A (gender), and Z (blood pressure)]

69

Advice on multivariable regression

Do not adjust for an effect of the exposure

Do not adjust for an effect of the outcome

Select covariates according to theory (causal diagram), not mechanistically (change in estimate, stepwise regression)

“Every variable is adjusted for all others” is almost always false
  Confounding is not a reciprocal property
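The first two pieces of advice can be screened automatically once a causal diagram is written down. This is a minimal sketch under assumptions: the arrows below are a guess at the gender and blood pressure diagrams (the transcript does not preserve them), "age" is an invented extra covariate, and the function names are hypothetical. It flags candidate covariates that are effects (descendants) of the exposure or of the outcome.

```python
def descendants(edges, node):
    """All variables reachable from node by following arrows forward."""
    children = {}
    for cause, effect in edges:
        children.setdefault(cause, []).append(effect)
    found, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def covariates_to_avoid(edges, exposure, outcome, candidates):
    """Candidates that are effects of the exposure or of the outcome."""
    effects = descendants(edges, exposure) | descendants(edges, outcome)
    return sorted(c for c in candidates if c in effects)

# Assumed arrows (illustrative only).
edges = [
    ("gender", "abdominal_fat"),
    ("abdominal_fat", "blood_pressure"),
    ("gender", "blood_pressure"),
    ("U", "abdominal_fat"),
    ("U", "blood_pressure"),
    ("age", "blood_pressure"),
]

flagged = covariates_to_avoid(
    edges, "gender", "blood_pressure", ["abdominal_fat", "age"]
)
print(flagged)  # ['abdominal_fat']
```

A screen like this only applies the stated theory. It cannot tell whether the diagram itself is right.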

70

Key points

The essence of epidemiology (and all of science) is causal theories

Your theories (about causation) are not “A is associated with Z”
  “Possessing a cigarette lighter is associated with lung cancer” is true, but who cares?
  That’s not causal knowledge

Your theories about bias are not “intuition” about bias; they are causal theories, too.

Almost every theory in science is about causation, which means an arrow between variables

71

Key points

Magnitude of bias is more important than merely its presence
  Small bias may be ignored
  Magnitude of bias may be difficult to estimate

The bias-variance tradeoff