lecture 2lecturecontent.s3.amazonaws.com/pdf/14993.pdf · Lecture 3, part A: St atistical Reasoning...


Lecture 3, part A: Statistical Reasoning 2


Simple Logistic Regression

Lecture 2


Learning Objectives

In this set of lectures we will develop a framework for simple logistic regression, a method for relating a binary outcome to a single predictor that can be binary, categorical or continuous


Section A

Simple Logistic Regression With a Binary (or Categorical) Predictor


Learning Objectives

Understand how logistic regression relates a function of the probability (proportion) of a binary outcome to a predictor via a linear equation

Interpret the resulting intercept and slope(s) from a logistic regression model in which the predictor of interest is binary or categorical


The “Left Hand Side”

For logistic regression, the equation is a bit more convoluted than with linear regression: the regression models the log odds of a binary outcome (y) as a function of the predictor x1

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1$$

where p = proportion (probability) of y = 1

As noted in the previous section, x1 can be binary, nominal categorical or continuous


The “Left Hand Side”

As with everything else we have done thus far, we will only be able to estimate the regression equation from a sample of data. To indicate the estimates, we can write:

$$\ln\left(\frac{p}{1-p}\right) = \hat{\beta}_0 + \hat{\beta}_1 x_1$$

In the next section, the reason for this choice of scaling will be explained


The “Left Hand Side”

For a given value of x1, we can estimate the ln(odds) via the equation

$$\ln\left(\frac{p}{1-p}\right) = \hat{\beta}_0 + \hat{\beta}_1 x_1$$

The slope compares the ln(odds of y=1) for two groups who differ by one unit of x1, and hence is interpretable as a difference in ln(odds) between two groups


The “Left Hand Side”

Difference in ln(odds)?
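One way to unpack this question is a short worked step that uses only the model form ln(odds) = β̂0 + β̂1·x1. Subtracting the estimated ln(odds) at x1 = 0 from the estimated ln(odds) at x1 = 1 leaves only the slope, and a difference of logs is the log of a ratio:

$$\begin{aligned}
\hat{\beta}_1 &= (\hat{\beta}_0 + \hat{\beta}_1 \cdot 1) - (\hat{\beta}_0 + \hat{\beta}_1 \cdot 0) \\
&= \ln(\text{odds} \mid x_1 = 1) - \ln(\text{odds} \mid x_1 = 0) \\
&= \ln\left(\frac{\text{odds} \mid x_1 = 1}{\text{odds} \mid x_1 = 0}\right) = \ln(\text{odds ratio})
\end{aligned}$$

So a difference in ln(odds) is a ln(odds ratio), and exponentiating the slope gives the odds ratio itself.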


Example 1: Breast Feeding and Sex

Data on anthropometric measures from a random sample of 236 Nepali children [0, 36) months old

Question: what is the relationship between breast feeding and sex of a child?

Data: breast fed: 75%; sex: 52% female (coded 1 = male, 0 = female)


Example 1: Breast Feeding and Sex

Notice: this equation is only estimating two values: ln(odds of being breast fed for males) and ln(odds of being breast fed for females)

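For male children (x1 = 1):

$$\ln(\text{odds}) = \hat{\beta}_0 + \hat{\beta}_1 \times 1 = \hat{\beta}_0 + \hat{\beta}_1$$

For female children (x1 = 0):

$$\ln(\text{odds}) = \hat{\beta}_0 + \hat{\beta}_1 \times 0 = \hat{\beta}_0$$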

Example 1: Breast Feeding and Sex

Interpretation of $\hat{\beta}_1$

Example 1: Breast Feeding and Sex

Interpretation of $\hat{\beta}_0$


Example 1: Breast Feeding and Sex

The resulting equation:

$$\ln\left(\frac{p}{1-p}\right) = \hat{\beta}_0 + \hat{\beta}_1 x_1$$

$\hat{\beta}_1 = 0.002$: the ln(odds ratio) of being breast fed for males to females is 0.002

$\hat{\beta}_0 = 1.12$: the ln(odds) of being breast fed for female children is 1.12


Example 1: Breast Feeding and Sex

Results, antilogged
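As a minimal sketch of the antilogging step, the snippet below exponentiates the two estimates quoted for this example (intercept 1.12, slope 0.002). The variable names are illustrative and not from the lecture.

```python
import math

# Estimates on the ln scale, as quoted for this example
b0_hat = 1.12    # ln(odds) of being breast fed for females (x1 = 0)
b1_hat = 0.002   # ln(odds ratio) of being breast fed, males to females

odds_female = math.exp(b0_hat)          # estimated odds for females
odds_ratio = math.exp(b1_hat)           # estimated odds ratio, males to females
odds_male = math.exp(b0_hat + b1_hat)   # estimated odds for males

print(f"odds of being breast fed, females: {odds_female:.2f}")
print(f"odds ratio, males to females:      {odds_ratio:.3f}")
print(f"odds of being breast fed, males:   {odds_male:.2f}")
```

The odds for males can equivalently be obtained as the odds for females multiplied by the odds ratio.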

Question

The coding choice for a binary predictor is completely arbitrary. For this breast feeding and sex analysis, what would the values of $\hat{\beta}_0$ and $\hat{\beta}_1$ be if sex were coded as 1 for females and 0 for males?


Example 2: Respiratory Failure

Respiratory failure and gestational age¹

Gestational Age    Proportion of sample    Total
34 weeks           0.02                      3,700
35 weeks           0.03                      5,477
36 weeks           0.05                     10,157
37-40 weeks        0.90                    165,993

¹ Respiratory Morbidity in Late Preterm Births: The Consortium on Safe Labor, JAMA. 2010;304(4):419-425


Example 2: Respiratory Failure

Even though the gestational age categories are ordinal, the authors did not want to assume linearity between the ln(odds) of respiratory failure and gestational age category

There are four categories: make one category the reference, and make binary indicator xs for the other 3. The authors used 37-40 weeks as the reference.

x1 = 1 if gestational age = 34 weeks (0 otherwise)
x2 = 1 if gestational age = 35 weeks (0 otherwise)
x3 = 1 if gestational age = 36 weeks (0 otherwise)
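The indicator coding can be sketched in a few lines. This is an illustration only: the category labels and the function name `indicators` are assumptions made for the example, not anything the authors or the lecture specify.

```python
# Reference category and the three indicator categories (labels are illustrative)
REFERENCE = "37-40 weeks"
INDICATOR_LEVELS = ("34 weeks", "35 weeks", "36 weeks")  # x1, x2, x3 in order


def indicators(category):
    """Return (x1, x2, x3) for one gestational age category."""
    if category != REFERENCE and category not in INDICATOR_LEVELS:
        raise ValueError(f"unknown gestational age category: {category!r}")
    # Each x is 1 only for its own category; the reference gets all zeros
    return tuple(int(category == level) for level in INDICATOR_LEVELS)


for category in INDICATOR_LEVELS + (REFERENCE,):
    print(category, indicators(category))
```

The reference group is the one for which every indicator is 0, which is why the intercept describes that group.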


Example 2: Respiratory Failure

Respiratory failure and gestational age

$$\ln\left(\frac{p}{1-p}\right) = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3$$


Example 2: Respiratory Failure

In this model

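$$\hat{\beta}_1 = \ln\left(\frac{\text{odds of respiratory failure, gest age 34}}{\text{odds of respiratory failure, gest age 37-40}}\right)$$

$$\hat{\beta}_2 = \ln\left(\frac{\text{odds of respiratory failure, gest age 35}}{\text{odds of respiratory failure, gest age 37-40}}\right)$$

$$\hat{\beta}_3 = \ln\left(\frac{\text{odds of respiratory failure, gest age 36}}{\text{odds of respiratory failure, gest age 37-40}}\right)$$

$$\hat{\beta}_0 = \ln\left(\text{odds of respiratory failure, gest age 37-40}\right)$$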


Example 2: Respiratory Failure

Respiratory failure and gestational age

$$\ln\left(\frac{p}{1-p}\right) = -5.5 + 3.4x_1 + 2.8x_2 + 2.0x_3$$
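As a hedged illustration of reading this fitted equation on the odds scale, the sketch below exponentiates each coefficient, taking the equation as ln(odds) = -5.5 + 3.4·x1 + 2.8·x2 + 2.0·x3 with 37-40 weeks as the reference. The dictionary keys and variable names are illustrative.

```python
import math

b0_hat = -5.5  # ln(odds) of respiratory failure in the reference group (37-40 weeks)
slopes = {"34 weeks": 3.4, "35 weeks": 2.8, "36 weeks": 2.0}  # ln(odds ratios) vs reference

odds_reference = math.exp(b0_hat)
odds_ratios = {group: math.exp(b) for group, b in slopes.items()}

print(f"estimated odds, 37-40 weeks: {odds_reference:.4f}")
for group, ratio in odds_ratios.items():
    print(f"estimated odds ratio, {group} vs 37-40 weeks: {ratio:.1f}")
```

Each slope compares one category to the same reference, so the three odds ratios are not comparisons between adjacent weeks.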



Example 2: Summary

Logistic regression is a method for relating a binary outcome to a predictor x via a linear equation. The predictor can be binary, categorical or continuous

The resulting linear equation relates the ln(odds) of the binary outcome to the predictor x

Slopes from logistic regression have ln(odds ratio) interpretation and can be exponentiated to estimate odds ratios

The intercept estimates the ln(odds) for the groups with x = 0


Section B

Simple Logistic Regression With a Continuous Predictor


Learning Objectives

Understand why transforming the estimated probability (proportion) that y=1 where y is a binary outcome is necessary to be able to properly estimate logistic regression equations

Use a lowess plot to get a snapshot of the relationship between the ln(odds of y=1) and the continuous predictor x1

Interpret the slope and intercept from simple logistic regression models

Translate the estimated slope into an estimated odds ratio


Background: Underlying Model

Why model the ln(odds) as a linear function of x1?


Background: Underlying Model

Equation for Pr(y = 1), the proportion of subjects with y = 1:

$$p = \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 x_1}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 x_1}}$$

e is the “natural constant” ≈ 2.718

p = probability (proportion) of y = 1
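A small sketch may help show why the ln(odds) scale is convenient: whatever value the linear part β̂0 + β̂1·x1 takes, converting it back with the formula above yields a proportion strictly between 0 and 1. The function name `inverse_logit` is illustrative, not from the lecture.

```python
import math


def inverse_logit(log_odds):
    """Convert a ln(odds) value L to a probability: p = e^L / (1 + e^L)."""
    return math.exp(log_odds) / (1.0 + math.exp(log_odds))


# The ln(odds) can be any real number; the implied p always stays inside (0, 1)
for log_odds in (-8, -2, 0, 2, 8):
    p = inverse_logit(log_odds)
    print(f"ln(odds) = {log_odds:>3}: p = {p:.4f}")

# Round trip: taking ln(p / (1 - p)) recovers the ln(odds) we started from
p = inverse_logit(2)
print(math.log(p / (1 - p)))
```

Modeling p directly as a line in x1 would not guarantee this, since a line can cross below 0 or above 1.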


Example 1: Risk of Obesity and HDL

Data from 2009-10 NHANES¹

Sample of over 6,400 US residents, 16-80 years old

HDL levels: mean 52.4 mg/dl, sd = 16, range 11-1415% of sample is obese by (BMI)

¹ Data obtained via Hosmer, D.W., Lemeshow, S. and Sturdivant, R.X. (2013) Applied Logistic Regression: Third Edition.


Example 1: Risk of Obesity and HDL

Question: does a line reasonably describe the general shape of the relationship between obesity and HDL?

We can estimate a line, using the computer

The line we estimate will be of the form:

$$\ln\left(\frac{p}{1-p}\right) = \hat{\beta}_0 + \hat{\beta}_1 x_1$$

Here, p is the probability of being obese (proportion of individuals who are obese) for a given value of HDL cholesterol, x1


Example 1: Risk of Obesity and HDL

$$\ln\left(\frac{p}{1-p}\right) = \hat{\beta}_0 + \hat{\beta}_1 x_1$$

This formulation makes a strong assumption about the nature of the relationship between the ln(odds) of obesity and HDL cholesterol

How to investigate this assumption?


Example 1: Risk of Obesity and HDL

A smoothed scatterplot of estimated ln(odds) versus HDL

[Figure: Estimated ln(odds) of Obesity By HDL Cholesterol Level. Data from 2009-10 NHANES. Smoothed curve, bandwidth = .5. x-axis: HDL Cholesterol (mg/dL), 0 to 150; y-axis: Estimated ln(odds) of Obesity, -8 to 2.]

Example 1: Risk of Obesity and HDL

Equation of regression line relating ln(odds) of obesity to HDL, from computer:

$$\ln(\text{odds of obesity}) = \ln\left(\frac{p}{1-p}\right) = -0.05 - 0.033x_1$$

Here, $\hat{\beta}_0 = -0.05$ and $\hat{\beta}_1 = -0.033$

Example 1: Risk of Obesity and HDL

Interpretation of $\hat{\beta}_1$

Example 1: Risk of Obesity and HDL

So the odds ratio estimate is $e^{-0.033} \approx 0.967$, or ≈ 0.97.

The odds ratio of being obese for two groups of persons who differ by one mg/dL in HDL levels is 0.97, higher HDL to lower HDL
In other words, higher HDL subjects (by one mg/dL) have 3% lower odds of being obese when compared to the lower HDL subjects

This estimate is for any two groups who differ by one mg/dL in HDL in the population from which the sample was taken:
60 mg/dL to 59 mg/dL
44 mg/dL to 43 mg/dL
etc.

Example 1: Risk of Obesity and HDL

Interpretation of $\hat{\beta}_0$

Example 1: Risk of Obesity and HDL

What is the odds ratio of being obese for persons with HDL of 100 mg/dL versus persons with HDL of 80 mg/dL?
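One way to approach a comparison of groups that differ by more than one unit is to multiply the slope by the difference in x1 before exponentiating. The sketch below does this with the slope estimate of -0.033 per mg/dL quoted for this example; the function name `odds_ratio` is illustrative.

```python
import math

SLOPE = -0.033  # estimated ln(odds ratio) of obesity per 1 mg/dL higher HDL


def odds_ratio(x_high, x_low, slope=SLOPE):
    """Estimated odds ratio comparing x1 = x_high to x1 = x_low."""
    # The difference in ln(odds) is slope * (x_high - x_low); exponentiate it
    return math.exp(slope * (x_high - x_low))


or_one_unit = odds_ratio(60, 59)    # any two groups one mg/dL apart
or_20_units = odds_ratio(100, 80)   # groups 20 mg/dL apart

print(f"odds ratio, 1 mg/dL difference:  {or_one_unit:.3f}")
print(f"odds ratio, 100 vs 80 mg/dL:     {or_20_units:.3f}")
```

The 20-unit odds ratio is the one-unit odds ratio raised to the 20th power, not 20 times the one-unit odds ratio. The intercept cancels in the subtraction, so it plays no role in the comparison.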

Example 2: Breast Feeding and Age

Data on a random sample of 192 Nepali children between 1 and 3 years old (12-36 months). Information includes breast feeding status at time of study (1 = yes, 0 = no), and age of the child in months

The following model can be used to estimate this breast feeding/age association

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1$$

Example 2: Breast Feeding and Age

A smoothed scatterplot of estimated ln(odds) versus age (months)

[Figure: Estimated ln(odds) of Being Breastfed by Age. 192 Nepalese Children 12-36 Months. Smoothed curve, bandwidth = .8. x-axis: Age of Child (months), 10 to 35; y-axis: ln(odds) of Being Breastfed, -2 to 6.]

Example 2: Breast Feeding and Age

Equation of regression line relating ln(odds) of being breast fed to age (months), from computer:

$$\ln(\text{odds of being breastfed}) = \ln\left(\frac{p}{1-p}\right) = 7.30 - 0.24x_1$$

Here, $\hat{\beta}_0 = 7.30$ and $\hat{\beta}_1 = -0.24$

Example 2: Breast Feeding and Age

Interpretation of $\hat{\beta}_1$

Example 2: Breast Feeding and Age

So the odds ratio estimate is $e^{-0.24} \approx 0.79$:

The odds ratio of being breastfed for two groups of children who differ by one month in age is 0.79, older to younger
In other words, older children (by one month of age) have 21% lower odds of being breast fed when compared to younger children

This estimate is for any two groups who differ by one month of age in the population of Nepalese children 12-36 months:
15 months to 14 months
27 months to 26 months, etc.

Example 2: Breast Feeding and Age

Question: what is the estimated relative odds (odds ratio) of being breast fed for children who are 30 months old compared to children who are 24 months old?
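A worked sketch of one way to answer this, using the fitted equation ln(odds of being breastfed) = 7.30 - 0.24·x1: take the difference in estimated ln(odds) between the two ages, then exponentiate.

$$\begin{aligned}
\ln(\text{odds} \mid x_1 = 30) - \ln(\text{odds} \mid x_1 = 24) &= (7.30 - 0.24 \times 30) - (7.30 - 0.24 \times 24) \\
&= -0.24 \times (30 - 24) = -1.44 \\
\widehat{OR}_{30 \text{ vs } 24} &= e^{-1.44} \approx 0.24
\end{aligned}$$

The intercept cancels, so only the slope and the six-month age difference matter.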

Example 2: Breast Feeding and Age

Interpretation of $\hat{\beta}_0$


Example 3: To Categorize or Not?

Respiratory failure and gestational age

$$\ln\left(\frac{p}{1-p}\right) = -5.5 + 3.4x_1 + 2.8x_2 + 2.0x_3$$

Summary

Simple logistic regression can be done with binary, categorical and continuous predictors

When the predictor x1 is continuous, the model estimates a linear relationship between the ln(odds y=1) and x1

The resulting estimated slope from logistic regression with a continuous predictor still has a ln(odds ratio) interpretation, and the intercept a ln(odds when x1=0) interpretation

Section C

Simple Logistic Regression: Accounting for Uncertainty in the Estimates


Learning Objectives

Create 95% CIs for the intercept and slopes from simple logistic regression and convert these to 95% CIs for odds and odds ratios

Estimate p-values for testing the null Ho: β1=0 (and hence the OR =1)

Example 1: Breast Feeding and Sex

In the last two sections, we showed the results from several simple logistic regression models

For example, the relationship between breast feeding and child sex estimated from a random sample of 236 Nepali children [0, 36) months old was given by the following equation (x1 = 1 for males):

$$\ln\left(\frac{p}{1-p}\right) = 1.12 + 0.002x_1$$

I told you this came from a computer package: but what is the algorithm to estimate this equation?

Example 1: Breast Feeding and Sex

There must be some algorithm that will always yield the same results for the same data set

For logistic regression, this approach is called “maximum likelihood”: the estimates for the intercept ($\beta_0$) and the slope ($\beta_1$) are the values that make the observed data “most” likely among all choices for $\beta_0$ and $\beta_1$.

This must be done via the computer
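To make the idea of maximum likelihood concrete, here is a minimal sketch that finds the intercept and slope maximizing the log-likelihood for a binary predictor. The counts are hypothetical (not the lecture's data), and Newton-Raphson is only one of several ways software might search for the maximum; it is used here as an assumption for illustration.

```python
import math

# HYPOTHETICAL counts, for illustration only: (x1, y, number of children)
counts = [(0, 1, 30), (0, 0, 10), (1, 1, 24), (1, 0, 16)]


def log_likelihood(b0, b1):
    """ln(likelihood) of the counts under ln(p/(1-p)) = b0 + b1*x1."""
    total = 0.0
    for x, y, n in counts:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += n * (math.log(p) if y == 1 else math.log(1.0 - p))
    return total


def fit(iterations=25):
    """Newton-Raphson search for the (b0, b1) that maximize the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in counts:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += n * (y - p)        # slope of log-likelihood in b0
            g1 += n * (y - p) * x    # slope of log-likelihood in b1
            w = n * p * (1.0 - p)    # curvature weight
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (curvature matrix) * step = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0_hat, b1_hat = fit()
print(f"b0_hat = {b0_hat:.4f}, b1_hat = {b1_hat:.4f}")
print(f"log-likelihood at estimates: {log_likelihood(b0_hat, b1_hat):.3f}")
```

With a single binary predictor the answer can be checked by hand: the intercept equals the sample ln(odds) in the x1 = 0 group, here ln(30/10), and the slope equals the difference in sample ln(odds) between the groups, here ln(24/16) - ln(30/10).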

Example 1: Breast Feeding and Sex

The values chosen for $\hat{\beta}_0$ and $\hat{\beta}_1$ are just estimates based on a single sample. For a different random sample of 236 children from the same population of [0,36) month olds, the resulting estimates would likely be different

As such, all regression coefficients have an associated standard error that can be used to make statements about the true relationship between ln(odds y=1) and x1 (for example, the true slope $\beta_1$) based on a single sample

Example 1: Breast Feeding and Sex

The estimated regression equation for the relationship between breast feeding and child sex, estimated from a random sample of 236 Nepali children [0, 36) months old, was given by the following equation

$$\ln\left(\frac{p}{1-p}\right) = 1.12 + 0.002x_1$$

$\hat{\beta}_1 = 0.002$ and $\widehat{SE}(\hat{\beta}_1) = 0.30$

$\hat{\beta}_0 = 1.12$ and $\widehat{SE}(\hat{\beta}_0) = 0.21$

Example 1: Breast Feeding and Sex

The random sampling behavior of estimated regression coefficients is approximately normal

As such, we can use the same ideas to create 95% CIs and get p-values

Example 1: Breast Feeding and Sex

The estimated regression equation relating breast feeding to sex was:

$$\ln\left(\frac{p}{1-p}\right) = 1.12 + 0.002x_1$$

$\hat{\beta}_1 = 0.002$ and $\widehat{SE}(\hat{\beta}_1) = 0.30$

95% CI for β1:

$$\hat{\beta}_1 \pm 2\,\widehat{SE}(\hat{\beta}_1) \rightarrow 0.002 \pm 2(0.30) \rightarrow (-0.598, 0.602)$$

Example 1: Breast Feeding and Sex

95% CI for $e^{\beta_1}$
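A minimal sketch of this step, using the slope estimate 0.002 and standard error 0.30 from this example: build the interval on the ln scale with the "estimate ± 2 standard errors" rule, then exponentiate both endpoints. Variable names are illustrative.

```python
import math

b1_hat = 0.002   # estimated ln(odds ratio), males to females
se_b1 = 0.30     # estimated standard error of the slope

# 95% CI on the ln(odds ratio) scale: estimate +/- 2 standard errors
lower_ln = b1_hat - 2 * se_b1
upper_ln = b1_hat + 2 * se_b1

# Exponentiate the estimate and both endpoints to move to the odds ratio scale
or_hat = math.exp(b1_hat)
or_lower = math.exp(lower_ln)
or_upper = math.exp(upper_ln)

print(f"95% CI for slope:      ({lower_ln:.3f}, {upper_ln:.3f})")
print(f"odds ratio estimate:   {or_hat:.2f}")
print(f"95% CI for odds ratio: ({or_lower:.2f}, {or_upper:.2f})")
```

The interval for the odds ratio is not symmetric around the estimate, because the symmetry holds on the ln scale rather than the ratio scale.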

Example 1: Breast Feeding and Sex

p-value for testing:
Ho: β1 = 0 ($e^{\beta_1} = 1$)
HA: β1 ≠ 0 ($e^{\beta_1} \neq 1$)

Assume the null is true, and calculate the standardized “distance” of $\hat{\beta}_1$ from 0

$$z = \frac{\hat{\beta}_1 - 0}{\widehat{SE}(\hat{\beta}_1)} = \frac{\hat{\beta}_1}{\widehat{SE}(\hat{\beta}_1)} = \frac{0.002}{0.3} \approx 0.01$$

The p-value, the probability of being 0.01 or more standard errors away from the mean of 0 on a normal curve, is very large: in this example, p = .997

Summarizing findings: Breast Feeding and Sex

Logistic regression was used to estimate the relationship between breast feeding and child sex using data from a random sample of 236 Nepalese children 0-36 months old. The results showed no substantive or statistically significant association between breast feeding status and sex (odds ratio = 1.00, 95% CI 0.55 to 1.83)

Example 2: Risk of Obesity and HDL

Equation of regression line relating ln(odds) of obesity to HDL, from computer:

$$\ln(\text{odds of obesity}) = \ln\left(\frac{p}{1-p}\right) = -0.05 - 0.033x_1$$

$\hat{\beta}_0 = -0.05$; $\widehat{SE}(\hat{\beta}_0) = 0.13$

$\hat{\beta}_1 = -0.033$; $\widehat{SE}(\hat{\beta}_1) = 0.003$

Example 2: Risk of Obesity and HDL

$\hat{\beta}_1 = -0.033$; $\widehat{SE}(\hat{\beta}_1) = 0.003$

95% CI for β1

95% CI for $e^{\beta_1}$

Example 2: Risk of Obesity and HDL

What is a 95% CI for the odds ratio of being obese for persons with HDL of 100 mg/dL versus persons with HDL of 80 mg/dL?
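One way to sketch this calculation, using the slope estimate -0.033 and standard error 0.003 from this example: form the 95% CI for the slope, multiply the estimate and both endpoints by the 20 mg/dL difference, then exponentiate. Variable names are illustrative.

```python
import math

b1_hat = -0.033   # estimated ln(odds ratio) per 1 mg/dL higher HDL
se_b1 = 0.003     # estimated standard error of the slope
difference = 100 - 80  # mg/dL difference between the two groups

# 95% CI for the slope (per 1 mg/dL), on the ln scale
lower_b1 = b1_hat - 2 * se_b1
upper_b1 = b1_hat + 2 * se_b1

# Per-unit odds ratio and its CI
or_1 = math.exp(b1_hat)
ci_1 = (math.exp(lower_b1), math.exp(upper_b1))

# Scale the ln(odds ratio) and its endpoints by the difference, then exponentiate
or_20 = math.exp(difference * b1_hat)
ci_20 = (math.exp(difference * lower_b1), math.exp(difference * upper_b1))

print(f"per 1 mg/dL:   OR = {or_1:.3f}, 95% CI ({ci_1[0]:.3f}, {ci_1[1]:.3f})")
print(f"100 vs 80:     OR = {or_20:.3f}, 95% CI ({ci_20[0]:.3f}, {ci_20[1]:.3f})")
```

The scaling has to happen on the ln scale before exponentiating; multiplying the odds ratio endpoints themselves by 20 would not give a valid interval.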


Example 3: Respiratory Failure and Gestational Age

Respiratory failure and gestational age

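$$\ln\left(\frac{p}{1-p}\right) = -5.5 + 3.4x_1 + 2.8x_2 + 2.0x_3$$

$\hat{\beta}_1 = 3.4$; $\widehat{SE}(\hat{\beta}_1) = 0.066$

$\hat{\beta}_0 = -5.5$; $\widehat{SE}(\hat{\beta}_0) = 0.039$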


Example 3: Respiratory Failure and Gestational Age

Respiratory failure and gestational age; p-value for slope

$$\ln\left(\frac{p}{1-p}\right) = -5.5 + 3.4x_1 + 2.8x_2 + 2.0x_3$$

$\hat{\beta}_1 = 3.4$; $\widehat{SE}(\hat{\beta}_1) = 0.066$
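As a hedged sketch of the p-value calculation for this slope, the snippet below standardizes the estimate (3.4, with standard error 0.066) and looks up the two-sided tail area of a standard normal curve. The helper name `two_sided_p` is illustrative; the tail area uses the complementary error function from Python's standard library.

```python
import math


def two_sided_p(estimate, standard_error, null_value=0.0):
    """Return (z, p): the standardized distance from the null and the
    two-sided normal-curve probability of being at least that far away."""
    z = (estimate - null_value) / standard_error
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z >= |z|)
    return z, p


z, p = two_sided_p(3.4, 0.066)
print(f"z = {z:.1f}")
print("p < 0.001" if p < 0.001 else f"p = {p:.3f}")
```

An estimate more than 50 standard errors from 0 gives a p-value far too small to report as anything other than "p < 0.001".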


Summary


Section D

Estimating Risk and Functions of Risk from Logistic Regression Results


Learning Objectives

While the results from logistic regression can be interpreted in terms of odds and odds ratios (after exponentiation), for prospective cohort studies, risks can be estimated

With a little bit of work, the results from logistic regression can be converted to probabilities (proportions, risks) and presented on this scale

Risk Estimates From Logistic Regression

In the last several sections we have explored how to relate a binary outcome to a predictor (binary, ordinal and nominal categorical, continuous) via simple logistic regression

We have shown how to translate the results into estimates of odds and odds ratios

The results from logistic regression can also be used to get estimated risks and functions of risk (if the study design allows for risk estimates)

Risk Estimates From Logistic Regression

Recall, the estimated odds of a binary outcome is given by:

$$\widehat{\text{odds}} = \frac{\hat{p}}{1-\hat{p}}$$

where $\hat{p}$ is the estimated proportion of the sample (probability of, risk of) having the event

This expression can be solved in terms of $\hat{p}$:

$$\hat{p} = \frac{\widehat{\text{odds}}}{1+\widehat{\text{odds}}}$$


Example 2: Respiratory Failure

Respiratory failure and gestational age

There are four categories, and 37-40 weeks is the reference category.

x1 = 1 if gestational age = 34 weeks
x2 = 1 if gestational age = 35 weeks
x3 = 1 if gestational age = 36 weeks

$$\ln\left(\frac{p}{1-p}\right) = -5.5 + 3.4x_1 + 2.8x_2 + 2.0x_3$$

Example 2: Respiratory Failure

To compute the estimated risk (probability, proportion) of respiratory failure for the reference group (37-40 weeks):

To compute the estimated risk (probability, proportion) of respiratory failure for gestational age = 34 weeks:
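These two risk calculations can be sketched as follows, taking the fitted equation as ln(odds) = -5.5 + 3.4·x1 + 2.8·x2 + 2.0·x3 with 37-40 weeks as the reference. Each estimated ln(odds) is exponentiated to odds, then converted with p = odds / (1 + odds). The function name `risk` is illustrative.

```python
import math


def risk(log_odds):
    """Convert an estimated ln(odds) to an estimated risk (probability)."""
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


b0_hat = -5.5   # ln(odds) for the reference group (37-40 weeks)
b1_hat = 3.4    # ln(odds ratio), 34 weeks vs 37-40 weeks

risk_reference = risk(b0_hat)          # x1 = x2 = x3 = 0
risk_34_weeks = risk(b0_hat + b1_hat)  # x1 = 1, x2 = x3 = 0

print(f"estimated risk, 37-40 weeks: {risk_reference:.4f}")
print(f"estimated risk, 34 weeks:    {risk_34_weeks:.3f}")
print(f"estimated risk difference:   {risk_34_weeks - risk_reference:.3f}")
print(f"estimated relative risk:     {risk_34_weeks / risk_reference:.1f}")
```

The relative risk from these estimates is smaller than the odds ratio of e^3.4, because the risk in the 34-week group is not small enough for odds and risk to be nearly equal.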


Example 2: Risk of Obesity and HDL

Equation of the regression line relating ln(odds) of obesity to HDL, from computer:

$$\ln(\text{odds of obesity}) = \ln\left(\frac{\hat{p}}{1 - \hat{p}}\right) = -0.05 - 0.033x_1$$

What is the estimated proportion of obese persons among those with HDL measurements of 75? (i.e., the estimated risk of obesity for persons with HDL of 75)


Example 2: Risk of Obesity and HDL

What are the estimated risk difference and relative risk of being obese for persons with HDL of 100 mg/dL versus persons with HDL of 75 mg/dL?


Example 2: Risk of Obesity and HDL

The following graphic shows the estimated risk of obesity as a function of HDL level

[Figure: "Simple Logistic Regression Using NHANES: Estimated Probability (Proportion) of Obesity". Horizontal axis: HDL (mg/dl), 0 to 150. Vertical axis: Estimated Proportion (Probability) of Obese, 0 to 0.4.]


Example 3: Breast Feeding Status and Age

Recall the resulting equation from the sample of 192 Nepali children 12-36 months old, relating breast feeding status to age in months:

$$\ln(\text{odds of being breastfed}) = \ln\left(\frac{\hat{p}}{1 - \hat{p}}\right) = 7.30 - 0.24x_1$$

What does the above equation estimate for 24 month old children?


Example 3: Breast Feeding Status and Age

$$\ln\left(\frac{\hat{p}}{1 - \hat{p}};\ \text{Age} = 24\right) = 7.30 - 0.24(24) = 1.54$$

This is the estimated ln odds of being breast fed for 24 month olds in the sample

To get the corresponding odds, exponentiate the ln(odds):

$$e^{1.54} \approx 4.66$$

To estimate $\hat{p}$ from the odds:

$$\hat{p} = \frac{\widehat{odds}}{1 + \widehat{odds}} = \frac{4.66}{5.66} \approx 0.82$$

So the above results translate into an estimated probability of 0.82 (82%)
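The three steps (ln odds, then odds, then probability) can be checked numerically. This is a small Python sketch using the coefficients 7.30 and -0.24 from the breast feeding equation; the variable names are illustrative.

```python
import math

intercept, slope = 7.30, -0.24   # coefficients from the fitted equation
age_months = 24

ln_odds = intercept + slope * age_months   # step 1: estimated ln(odds)
odds = math.exp(ln_odds)                   # step 2: exponentiate to get odds
p_hat = odds / (1.0 + odds)                # step 3: convert odds to probability

print(f"ln(odds) = {ln_odds:.2f}, odds = {odds:.2f}, p = {p_hat:.2f}")
```

The printed values agree with the hand computation: ln(odds) of 1.54, odds of about 4.66, and an estimated probability of about 0.82.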


Example 3: Breast Feeding Status and Age

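So an estimated 82% of 24 month olds are breast fed in this sample of Nepali children

What about the estimated proportion of 16 month olds?

$$\ln\left(\frac{\hat{p}}{1 - \hat{p}};\ \text{Age} = 16\right) = 7.30 - 0.24(16) = 3.46$$

the corresponding odds is $e^{3.46} \approx 31.8$

the corresponding estimated probability is $\hat{p} = \frac{31.8}{1 + 31.8} \approx 0.97$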


Example 3: Breast Feeding Status and Age

So an estimated 82% of 24 month olds are breast fed in this sample of Nepali children; and an estimated 97% of 16 month olds are breast fed

What about the estimated relative risk of being breast fed for 24 month olds compared to 16 month olds?

$$\widehat{RR} = \frac{\hat{p}_{\text{age}=24}}{\hat{p}_{\text{age}=16}} = \frac{0.82}{0.97} \approx 0.85$$

The estimated risk difference for the same age comparison:

$$\hat{p}_{\text{age}=24} - \hat{p}_{\text{age}=16} = 0.82 - 0.97 = -0.15$$
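The relative risk and risk difference can both be computed directly from the fitted equation. This is a minimal Python sketch using the coefficients 7.30 and -0.24 from the breast feeding equation; the helper name `estimated_risk` is hypothetical.

```python
import math


def estimated_risk(age_months, intercept=7.30, slope=-0.24):
    """Estimated probability of being breast fed at a given age in months."""
    ln_odds = intercept + slope * age_months
    # 1 / (1 + e^(-ln_odds)) is algebraically the same as odds / (1 + odds).
    return 1.0 / (1.0 + math.exp(-ln_odds))


p_24 = estimated_risk(24)
p_16 = estimated_risk(16)

relative_risk = p_24 / p_16      # ratio of the two estimated risks
risk_difference = p_24 - p_16    # difference of the two estimated risks

print(f"RR = {relative_risk:.2f}, RD = {risk_difference:.2f}")
```

Working from unrounded probabilities gives a relative risk of about 0.85 and a risk difference of about -0.15, in line with the values obtained from the rounded risks.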


Example 3: Breast Feeding Status and Age

The following graphic shows the estimated risk (probability, proportion) of being breast fed as a function of child age.

[Figure: "192 Nepali Children 12-36 Months: Probability of Being Breast Fed by Age". Horizontal axis: Age of Child (months), 10 to 35. Vertical axis: Probability (Proportion, Risk) of Breast Fed, 0.2 to 1.]
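Points on such a curve can be tabulated by evaluating the fitted equation across the sampled age range. This is a Python sketch only, using the coefficients 7.30 and -0.24 from Example 3; the 4-month grid is an arbitrary choice for illustration.

```python
import math

INTERCEPT, SLOPE = 7.30, -0.24   # fitted equation from Example 3

# Evaluate the fitted curve every 4 months from 12 to 36 months.
curve = []
for age in range(12, 37, 4):
    ln_odds = INTERCEPT + SLOPE * age
    p_hat = math.exp(ln_odds) / (1.0 + math.exp(ln_odds))
    curve.append((age, p_hat))

for age, p_hat in curve:
    print(f"age {age:2d} months: estimated risk = {p_hat:.2f}")
```

The estimated risk falls from about 0.99 at 12 months to about 0.21 at 36 months. The decline is not a straight line on the probability scale, even though ln(odds) is linear in age.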


Summary

For most types of studies (case control studies excepted), the results from logistic regression can be used to estimate risk (probability, proportion) and hence risk differences and relative risks