Page 1: Introduction to Logistic Regression Rachid Salmi, Jean-Claude Desenclos, Thomas Grein, Alain Moren, Viviane Bremer.

Introduction to Logistic Regression

Rachid Salmi, Jean-Claude Desenclos, Thomas Grein, Alain Moren, Viviane Bremer

Page 2:

Objectives

• When do we need to use logistic regression

• Principles of logistic regression

• Uses of logistic regression

• What to keep in mind

Page 3:

Chlamorea

• Sexually transmitted infection

–Virus recently identified

–Leads to general rash, blush, pimples and feeling of shame

–Increasing prevalence with age

–Risk factors unknown so far

Page 4:

Case control study

• Population of Berlin

• 150 cases, 150 controls

• Hypothesis: Consistent use of condoms protects against chlamorea

• Questionnaire with questions on demographic characteristics, sexual behaviour

• OR, t-test

Page 5:

Results bivariate analysis

                              Cases (n=150)   Controls (n=150)   Odds ratio
Used condoms at last sex      40              90                 0.17
Did not use condoms           110             60                 Ref

Page 6:

Results bivariate analysis

                              Cases (n=150)   Controls (n=150)   Odds ratio
Single                        125             50                 4.7
Currently in a relationship   25              100                Ref
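The odds ratios in the two bivariate tables are cross-product ratios of the 2×2 cells. A minimal Python sketch of that calculation follows; the function name `odds_ratio` and its argument names are illustrative and not from the slides. With the counts as transcribed, the cross-product gives about 0.24 for condom use and 10.0 for single status rather than the printed 0.17 and 4.7, so either the counts or the printed ratios may not have survived transcription intact.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Cross-product odds ratio (a*d)/(b*c) for a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


# Counts as they appear on the slides (exposed row first, reference row second).
or_condoms = odds_ratio(40, 90, 110, 60)
or_single = odds_ratio(125, 50, 25, 100)

print(round(or_condoms, 2))
print(round(or_single, 2))
```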

Page 7:

Results bivariate analysis

                                      Cases (n=150)   Controls (n=150)   T-test
Number of partners during last year   4               2                  p=0.001
Mean age in years                     39              26                 p=0.001

Confounding?

Page 8:

Stratification

[Figure: the crude 2×2 table for chlamorea and condom use (cells a, b, c, d; OR raw) is split into strata by single status, age group and number of partners. Each stratum i has its own 2×2 table (ai, bi, ci, di) and its own odds ratio ORi.]

Page 9:

Let’s go one step back

Page 10:

Simple linear regression

Age  SBP    Age  SBP    Age  SBP
22   131    41   139    52   128
23   128    41   171    54   105
24   116    46   137    56   145
27   106    47   111    57   141
28   114    48   115    58   153
29   123    49   133    59   157
30   117    49   128    63   155
32   122    50   183    67   176
33   99     51   130    71   172
35   121    51   133    77   178
40   147    51   144    81   217

Table 1 Age and systolic blood pressure (SBP) among 33 adult women

Page 11:

[Figure: scatter plot; y-axis: SBP (mm Hg), 80 to 220; x-axis: Age (years), 20 to 90]

adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974

Page 12:

Simple linear regression

• Relation between 2 continuous variables (SBP and age)

• Regression coefficient β1

–Measures association between y and x

–Amount by which y changes on average when x changes by one unit

–Least squares method

$y = \alpha + \beta_1 x_1$

[Figure: straight line of y against x, with intercept α and slope β1]
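As an illustration of the least squares method, the following Python sketch fits $y = \alpha + \beta_1 x_1$ to the age and SBP pairs of Table 1 using the closed-form estimates (slope = Sxy / Sxx, intercept from the means). The function and variable names are illustrative assumptions, not part of the slides.

```python
# Age and SBP pairs from Table 1 (33 adult women), read column by column.
ages = [22, 23, 24, 27, 28, 29, 30, 32, 33, 35, 40,
        41, 41, 46, 47, 48, 49, 49, 50, 51, 51, 51,
        52, 54, 56, 57, 58, 59, 63, 67, 71, 77, 81]
sbps = [131, 128, 116, 106, 114, 123, 117, 122, 99, 121, 147,
        139, 171, 137, 111, 115, 133, 128, 183, 130, 133, 144,
        128, 105, 145, 141, 153, 157, 155, 176, 172, 178, 217]


def least_squares(x, y):
    """Return (alpha, beta1) minimising the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta1 = sxy / sxx                 # slope
    alpha = mean_y - beta1 * mean_x   # intercept: line passes through the means
    return alpha, beta1


alpha, beta1 = least_squares(ages, sbps)
print(f"SBP = {alpha:.1f} + {beta1:.2f} * age")
```

The slope printed here is the average change in SBP (mm Hg) per additional year of age.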

Page 13:

What if we have more than one independent variable?

Page 14:

Multiple risk factors

• Objective: to attribute to each risk factor its respective effect (RR) on the occurrence of disease.

Page 15:

Types of multivariable analysis

• Multiple models

–Linear regression

–Logistic regression

–Cox model

–Poisson regression

–Loglinear model

–Discriminant analysis…

• Choice of the tool according to objectives, study design and variables

Page 16:

Multiple linear regression

• Relation between a continuous variable and a set of i variables

• Partial regression coefficients βi

–Amount by which y changes when xi changes by one unit and all the other xi remain constant

–Measures association between xi and y adjusted for all other xi

• Example

–Number of partners in relation to age & income

$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$

Page 17:

Multiple linear regression

$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$

y                     x1 ... xi
Predicted             Predictor variables
Response variable     Explanatory variables
Outcome variable      Covariables
Dependent             Independent variables

y (number of partners) = α + β1 age + β2 income + β3 gender

Page 18:

What if our outcome variable is dichotomous?

Page 19:

Logistic regression (1)

Table 2 Age and chlamorea

Page 20:

How can we analyse these data?

• Compare mean age of diseased and non-diseased

–Non-diseased: 26 years

–Diseased: 39 years (p=0.0001)

• Linear regression?

Page 21:

Dot-plot: Data from Table 2

[Figure: dot plot; y-axis: Presence of Chlamorea]

Page 22:

Logistic regression (2)

Table 3 Prevalence (%) of chlamorea according to age group

Page 23:

Dot-plot: Data from Table 3

[Figure: dot plot; y-axis: Diseased %, 0 to 100; x-axis: Age group, 0 to 8]

Page 24:

Logistic function (1)

[Figure: logistic curve; y-axis: Probability of disease, 0.0 to 1.0; x-axis: x]

Page 25:

Logistic function

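• Logistic regression models the logit of the outcome = natural logarithm of the odds of the outcome

$\mathrm{logit}(P) = \ln\left(\frac{\text{probability of the outcome } (P)}{\text{probability of not having the outcome } (1-P)}\right)$

$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$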
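To make the logit concrete, here is a small Python sketch of the transformation and its inverse (the logistic function). The function names `logit` and `inverse_logit` are illustrative choices, not taken from the slides.

```python
import math


def logit(p):
    """Natural logarithm of the odds p / (1 - p), for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Logistic function: maps any real z back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Probabilities below 0.5 give negative log-odds, above 0.5 positive ones.
for p in (0.1, 0.5, 0.9):
    z = logit(p)
    print(p, round(z, 3), round(inverse_logit(z), 3))
```

Because the logit ranges over all real numbers while P stays between 0 and 1, a linear expression in the x variables can be placed on the right-hand side without producing impossible probabilities.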

Page 26:

Logistic function

$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$

α = log odds of disease in unexposed

β = log odds ratio associated with being exposed

e^β = odds ratio
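A short derivation may help to see why e^β is the odds ratio. It assumes a single dichotomous exposure x coded 0 (unexposed) and 1 (exposed); the symbols P_0 and P_1 for the probability of disease in each group are introduced here for illustration only.

```latex
\begin{align*}
\ln\frac{P_0}{1-P_0} &= \alpha && (x = 0,\ \text{unexposed}) \\
\ln\frac{P_1}{1-P_1} &= \alpha + \beta && (x = 1,\ \text{exposed}) \\
\beta &= \ln\frac{P_1}{1-P_1} - \ln\frac{P_0}{1-P_0}
       = \ln\frac{P_1/(1-P_1)}{P_0/(1-P_0)} = \ln(\mathrm{OR}) \\
e^{\beta} &= \mathrm{OR}
\end{align*}
```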

Page 27:

Multiple logistic regression

• More than one independent variable

–Dichotomous, ordinal, nominal, continuous …

• Interpretation of βi

–Increase in log-odds for a one unit increase in xi with all the other xi constant

–Measures association between xi and log-odds adjusted for all other xi

$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$

Page 28:

Uses of multivariable analysis

• Etiologic models

–Identify risk factors adjusted for confounders

–Adjust for differences in baseline characteristics

• Predictive models

–Determine diagnosis

–Determine prognosis

Page 29:

Fitting equation to the data

• Linear regression:

–Least squares

• Logistic regression:

–Maximum likelihood
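The sketch below shows one way maximum likelihood fitting can be carried out for a logistic model with a single dichotomous exposure. It uses Newton-Raphson iteration, which is a common choice but is an assumption here: the slides do not say which algorithm the software uses. The counts are those of the condom-use table (used condoms: 40 cases among 130 subjects; did not use: 110 cases among 170), and the function name is hypothetical.

```python
import math


def fit_logistic_grouped(groups, iterations=25):
    """Maximum likelihood fit of ln(P/(1-P)) = alpha + beta*x.

    groups: list of (x, cases, total) tuples, one per exposure level.
    Uses Newton-Raphson steps on the binomial log-likelihood.
    """
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for x, cases, total in groups:
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
            g0 += cases - total * p
            g1 += (cases - total * p) * x
            w = total * p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information matrix) * step = gradient.
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


# x = 1: used condoms at last sex, x = 0: did not use condoms.
alpha, beta = fit_logistic_grouped([(0, 110, 170), (1, 40, 130)])
print(round(math.exp(beta), 4))  # odds ratio estimated by the model
```

With one dichotomous exposure and no other variables, the fitted e^β coincides with the cross-product odds ratio of the 2×2 table, which gives a simple check on the fit.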

Page 30:

Elaborating e^β

• e^β = OR

• What if the independent variable is continuous?

• What is the effect of a change in x by more than one unit?

Page 31:

The Q fever example

• Distance to farm as independent continuous variable, counted in meters

–β in logistic regression was -0.00050013 and statistically significant

• OR for each 1 meter of distance is 0.9995

–Effect per meter too small to be useful

• What is the OR for every 1000 meters?

–$e^{1000\beta} = e^{-1000 \times 0.00050013} \approx 0.6065$
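The rescaling in the Q fever example can be reproduced in a few lines of Python. The coefficient is the one quoted on the slide; the function name `odds_ratio_for_change` is an illustrative assumption.

```python
import math

beta_per_meter = -0.00050013  # coefficient quoted on the slide (per meter)


def odds_ratio_for_change(beta, delta_x):
    """Odds ratio associated with a change of delta_x units in x."""
    return math.exp(beta * delta_x)


or_1m = odds_ratio_for_change(beta_per_meter, 1)
or_1000m = odds_ratio_for_change(beta_per_meter, 1000)
print(round(or_1m, 4), round(or_1000m, 4))
```

Multiplying β by 1000 before exponentiating is equivalent to raising the per-meter odds ratio to the power 1000, or to recoding distance in kilometers before fitting.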

Page 32:

Continuous variables

• Increase in OR for a one unit change in exposure variable

• Logistic model is multiplicative: OR increases exponentially with x

–If OR = 2 for a one unit change in exposure and x increases from 2 to 5: OR = 2 × 2 × 2 = 2^3 = 8

• Verify if OR increases exponentially with x

–When in doubt, treat as qualitative variable
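The multiplicative behaviour follows directly from the model equation. In the sketch below, Δ denotes the size of the change in x and is a symbol introduced here for illustration; all other variables are held constant.

```latex
\begin{align*}
\mathrm{OR}(x \to x + \Delta)
  &= \frac{e^{\alpha + \beta (x + \Delta)}}{e^{\alpha + \beta x}}
   = e^{\beta \Delta}
   = \left(e^{\beta}\right)^{\Delta} \\
\text{with } e^{\beta} = 2,\ \Delta = 5 - 2 = 3:\quad
\mathrm{OR} &= 2^{3} = 8
\end{align*}
```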

Page 33:

Coding of variables (2)

• Nominal variables or ordinal with unequal classes:

–Preferred hair colour of partners:

» No hair=0, grey=1, brown=2, blond=3

–Model assumes that OR for blond partners = (OR for grey-haired partners)^3

–Use indicator variables (dummy variables)

Page 34:

Indicator variables: Hair colour

• Neutralises artificial hierarchy between classes in variable “hair colour of partners”

• No assumptions made

• 3 variables in model using same reference

• OR for each type of hair adjusted for the others in reference to “no hair”
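A minimal Python sketch of indicator (dummy) coding for the hair colour variable follows. It assumes the codes given on the previous slide (no hair=0, grey=1, brown=2, blond=3) with "no hair" as the reference; the dictionary and function names are hypothetical.

```python
# Codes from the slide; "no hair" (0) is the reference class.
HAIR_CODES = {0: "no hair", 1: "grey", 2: "brown", 3: "blond"}
REFERENCE = 0


def indicator_variables(code):
    """Return one 0/1 indicator per non-reference class."""
    return {
        f"hair_{label}": int(code == value)
        for value, label in HAIR_CODES.items()
        if value != REFERENCE
    }


print(indicator_variables(0))  # reference class: all indicators are 0
print(indicator_variables(3))  # blond: only hair_blond is 1
```

Each indicator enters the model with its own coefficient, so the odds ratio for blond versus no hair is estimated freely instead of being forced to equal the cube of the grey versus no hair odds ratio.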

Page 35:

Classes

• Relationship between number of partners during last year and chlamorea

–Code number of partners: 0-1 = 1, 2-3 = 2, 4-5 = 3

• Compatible with assumption of multiplicative model

–If not compatible, use indicator variables

Code nr partners   Cases   Controls   OR
1                  20      40         1.0
2                  22      30         1.5
3                  12      11         2.2

1.5^2 ≈ 2.2
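The compatibility check on this slide can be sketched in Python: compute the odds ratio of each class against class 1 and compare the observed value for class 3 with the square of the value for class 2, which is what a multiplicative model with equally spaced codes would predict. Variable names are illustrative.

```python
# (cases, controls) per coded class of number of partners, from the table.
classes = {1: (20, 40), 2: (22, 30), 3: (12, 11)}
ref_cases, ref_controls = classes[1]

# Odds ratio of each class relative to the reference class 1.
odds_ratios = {
    code: (cases * ref_controls) / (controls * ref_cases)
    for code, (cases, controls) in classes.items()
}

# Under a multiplicative model, two steps up multiplies the odds by OR_2 twice.
expected_or_3 = odds_ratios[2] ** 2

for code, value in odds_ratios.items():
    print(code, round(value, 2))
print("expected OR for class 3:", round(expected_or_3, 2))
```

The observed and expected values for class 3 are close, which is why the slide treats the coded variable as compatible with the multiplicative assumption.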

Page 36:

Risk factors for Chlamorea

[Diagram: No condom use and Chlamorea, together with the other factors: Sex, Hair colour, Age group, Single, Visiting bars, Number of partners]

Page 37:

Unconditional Logistic Regression

Term                Odds Ratio   95% C.I. lower   95% C.I. upper   Coef.     S. E.    Z-Statistic   P-Value
# partners          1.2664       0.2634           10.7082          0.2362    0.9452   0.5486        0.5833
Single (Yes/No)     1.0345       0.3277           3.2660           0.0339    0.5866   0.0578        0.9539
Hair colour (1/0)   1.6126       0.2675           9.7220           0.4778    0.9166   0.5213        0.6022
Hair colour (2/0)   0.7291       0.0991           5.3668           -0.3159   1.0185   -0.3102       0.7564
Hair colour (3/0)   1.1137       0.1573           7.8870           0.1076    0.9988   0.1078        0.9142
Visiting bars       1.5942       0.4953           5.1317           0.4664    0.5965   0.7819        0.4343
Used no Condoms     9.0918       3.0219           27.3533          2.2074    0.5620   3.9278        0.0001
Sex (f/m)           1.3024       0.2278           7.4468           0.2642    0.8896   0.2970        0.7665
CONSTANT            *            *                *                -3.0080   2.0559   -1.4631       0.1434
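The columns of this output are linked: the odds ratio is the exponential of the coefficient, and the Z-statistic is the coefficient divided by its standard error. The Python sketch below recomputes the "Used no Condoms" row from its coefficient and standard error. It assumes a Wald-type 95% interval with a critical value of 1.96, which the slides do not state explicitly; the function name is hypothetical.

```python
import math


def wald_summary(coef, se, z_crit=1.96):
    """Odds ratio, Wald confidence limits and Z-statistic from coef and S.E."""
    return {
        "odds_ratio": math.exp(coef),
        "ci_lower": math.exp(coef - z_crit * se),
        "ci_upper": math.exp(coef + z_crit * se),
        "z": coef / se,
    }


# Coefficient and standard error of the "Used no Condoms" row.
row = wald_summary(2.2074, 0.5620)
for name, value in row.items():
    print(name, round(value, 4))
```

Small differences in the last decimals against the table are to be expected, since the printed coefficient and standard error are themselves rounded.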

Page 38:

Last but not least

Page 39:

Why do we need multivariable analysis?

• Our real world is multivariable

• Multivariable analysis is a tool to determine the relative contribution of all factors

Page 40:

Sequence of analysis

• Descriptive analysis

–Know your dataset

• Bivariate analysis

–Identify associations

• Stratified analysis

–Confounding and effect modifiers

• Multivariable analysis

–Control for confounding

Page 41:

What can go wrong

• Small sample size and too few cases

• Wrong coding

• Skewed distribution of independent variables

–Empty “subgroups”

• Collinearity

–Independent variables express the same information

Page 42:

Do not forget

• Rubbish in - rubbish out

• Check for confounders first

• Number of subjects >> variables in the model

• Keep the model simple

–Statisticians can help with the model but you need to understand the interpretation

• You will need several attempts to find the “best” model

Page 43:

• If in doubt…

Really call a statistician !!!!
