Introduction to logistic regression and Generalized Linear Models
July 14, 2011
Introduction to Statistical Measurement and Modeling
Karen Bandeen-Roche, PhD
Department of Biostatistics
Johns Hopkins University
Data motivation: Osteoporosis data
Scientific question: Can we detect osteoporosis earlier and more safely?
Some related statistical questions:
How does the risk of osteoporosis vary as a function of measures commonly used to screen for osteoporosis?
Does age confound the relationship of screening measures with osteoporosis risk?
Do ultrasound and DPA measurements discriminate osteoporosis risk independently of each other?
Outline
Why we need to generalize linear models
Generalized Linear Model specification: systematic, random model components
Maximum likelihood estimation
Logistic regression as a special case of GLM: systematic model / interpretation
Inference
Example
Regression for categorical outcomes
Why not just apply linear regression to categorical Y’s?
Linear model (A1) will often be unreasonable.
Assumption of equal variances (A3) will nearly always be unreasonable.
Assumption of normality will never be reasonable.
Introduction: Regression for binary outcomes
Yi = 1{event occurs for sampling unit i} = 1 if the event occurs, = 0 otherwise.
pi = probability that the event occurs for sampling unit i = Pr{Yi = 1}
Begin by generalizing the random model (A5)
Probability mass function: Bernoulli
Pr{Yi = 1} = pi; Pr{Yi = 0} = 1-pi
all other yi occur with 0 probability
fYi(y) := Pr{Yi = y} = pi^y (1 - pi)^(1-y), y = 0, 1
Binary regression
By assuming Bernoulli: (A3) is definitely not reasonable
Var(Yi ) = pi(1-pi)
Variance is not constant: rather a function of the mean
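This variance formula follows in one line from the Bernoulli mass function above, since Yi² = Yi for a 0/1 variable:

```latex
\mathrm{Var}(Y_i) = E[Y_i^2] - (E[Y_i])^2 = p_i - p_i^2 = p_i(1 - p_i)
```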
Systematic model
Goal remains to describe E[Yi|xi]
Expectation of Bernoulli Yi = pi
To achieve a reasonable linear model (A1): describe some function of E[Yi|xi] as a linear function of covariates
g(E[Yi|xi]) = xi’β
Some common g: log, log{p/(1-p)}, probit
General framework: Generalized Linear Models
Random model
Y~a density or mass function, fY, not necessarily normal
Technical aside: fY within the “exponential family”
Systematic model
g(E[Yi|xi]) = xi’β = ηi
“g” = “link function”; “xi’β” = “linear predictor”
Reference: Nelder JA, Wedderburn RWM, Generalized linear models, JRSSA 1972; 135:370-384.
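As a concrete illustration of this specification, here is a minimal sketch in Python using the statsmodels library; the data and variable names (age, bmd, osteo) are hypothetical, not the course's osteoporosis data:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: a binary outcome and two screening-style covariates
rng = np.random.default_rng(0)
age = rng.uniform(50, 90, size=200)
bmd = rng.normal(0.8, 0.1, size=200)
eta = -10 + 0.12 * age - 3 * bmd                 # hypothetical linear predictor
osteo = rng.binomial(1, 1 / (1 + np.exp(-eta)))  # Bernoulli outcomes

# Random model: Binomial family; systematic model: g(E[Y|x]) = x'beta with g = logit
X = sm.add_constant(np.column_stack([age, bmd]))
fit = sm.GLM(osteo, X, family=sm.families.Binomial()).fit()
print(fit.summary())
```

Swapping the family (e.g., sm.families.Poisson()) changes the random model and its default link without changing the rest of the specification.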
Types of Generalized Linear Models

Model (link function)   Response                  Distribution      Regression Coef Interpretation
Linear                  Continuous                Gaussian          Change in ave(Y) per unit change in X
Logistic                Binary                    Binomial          Log odds ratio
Log-linear              Times to events / counts  Poisson           Log relative rate
Proportional hazards    Times to events           Semi-parametric   Log hazard ratio
Estimation
The estimate maximizes L(β, a; y, X)
General method: Maximum likelihood (Fisher)
Given {Y1,...,Yn} distributed with joint density or mass function fY(y;θ), a likelihood function L(θ;y) is any function (of θ) that is proportional to fY(y;θ).
If sampling is random, {Y1,...,Yn} are statistically independent, and L(θ;y) ∝ the product of the individual fYi(yi;θ).
Maximum likelihood
The maximum likelihood estimate (MLE), θ̂, maximizes L(θ;y).
Under broad assumptions MLEs are asymptotically Unbiased (consistent)
Efficient (most precise / lowest variance)
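For logistic regression in particular, independence plus the Bernoulli mass function above give the concrete likelihood being maximized, shown here for reference:

```latex
L(\beta; y) = \prod_{i=1}^{n} p_i^{y_i}(1 - p_i)^{1 - y_i},
\qquad
\log L(\beta; y) = \sum_{i=1}^{n}\left[ y_i \log p_i + (1 - y_i)\log(1 - p_i) \right]
```

with each pi linked to the covariates through the systematic model.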
Logistic regression
Yi binary with pi = Pr{Yi = 1}
Example: Yi = 1{person i diagnosed with heart disease}
Simple logistic regression (1 covariate)
Random Model: Bernoulli / Binomial
Systematic Model: log{pi/(1- pi)}= β0 + β1xi
log odds; logit(pi)
Parameter interpretation
β0 = log(heart disease odds) in subpopulation with x=0
β1 = log{px+1/(1-px+1)}- log{px/(1-px)}
Logistic regression: Interpretation notes
β1 = log{px+1/(1-px+1)} - log{px/(1-px)}
   = log of the odds ratio comparing x+1 to x
exp(β1) = [px+1/(1-px+1)] / [px/(1-px)]
   = odds ratio for association of prevalent heart disease with each (say) one-year increment in age
   = factor by which the odds of heart disease increase / decrease with each 1-year increment in age
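A hypothetical worked example (the coefficient value is purely illustrative): if x is age in years and β1 = 0.05, then

```latex
\exp(\beta_1) = e^{0.05} \approx 1.051
```

so the odds of heart disease would be estimated to be about 5% higher per additional year of age.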
Multiple logistic regression
Systematic Model: log{pi/(1-pi)} = β0 + β1xi1 + … + βpxip
Parameter interpretation
β0 = log(heart disease odds) in subpopulation with all x=0
βj = difference in log outcome odds comparing subpopulations who differ by 1 on xj, and whose values on all other covariates are the same
“Adjusting for,” “Controlling for” the other covariates
One can define variables contrasting outcome odds differences between groups, nonlinear relationships, interactions, etc., just as in linear regression
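Continuing in the same hypothetical vein, a short sketch showing how fitted coefficients exponentiate into adjusted odds ratios (all names and values illustrative):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical design matrix: intercept plus three covariates
rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(300, 3)))
eta = X @ np.array([-0.5, 0.8, -0.3, 0.4])
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

fit = sm.Logit(y, X).fit(disp=0)
# exp(beta_j) = adjusted odds ratio per 1-unit difference in x_j,
# holding the other covariates fixed
print(np.exp(fit.params))
```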
Logistic regression - prediction
Translation from ηi to pi
log{pi/(1-pi)} = β0 + β1xi1 + … + βpxip = ηi
Then pi = exp(ηi)/(1 + exp(ηi)), the logistic function of ηi
Graph of pi versus ηi has a sigmoid shape
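A minimal sketch of this translation, with illustrative parameter values:

```python
import numpy as np

def inverse_logit(eta):
    """Logistic function: maps the linear predictor to a probability."""
    return 1 / (1 + np.exp(-eta))

beta = np.array([-10.0, 0.12])  # hypothetical intercept and age coefficient
x = np.array([1.0, 70.0])       # constant term and age = 70
print(inverse_logit(x @ beta))  # predicted Pr{Y = 1 | x}, about 0.17
```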
GLMs - Inference
The negative inverse Hessian matrix of the log likelihood function characterizes Var(β̂) (adjunct)
SE(β̂j) is obtained as the square root of the jth diagonal entry
Typically, β̂ is substituted for β
“Wald” inference applies the paradigm from Lecture 2
Z = (β̂j - β0j)/SE(β̂j) is asymptotically ~ N(0,1) under H0: βj = β0j
Z provides a test statistic for H0: βj= β0j versus HA: βj≠ β0j
β̂j ± z(1-α/2) SE(β̂j) = (L, U) is a (1-α)×100% CI for βj
{exp(L), exp(U)} is a (1-α)×100% CI for exp(βj)
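A sketch of the Wald interval computed directly from an estimate and its standard error (the numbers below are placeholders, not results from any fitted model):

```python
import numpy as np
from scipy import stats

beta_hat, se = 0.05, 0.02  # hypothetical estimate and SE for beta_j
z = stats.norm.ppf(0.975)  # z_{1-alpha/2} for alpha = 0.05, about 1.96

L, U = beta_hat - z * se, beta_hat + z * se
print((L, U))                  # 95% CI for beta_j
print((np.exp(L), np.exp(U)))  # 95% CI for exp(beta_j), the odds ratio
```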
GLMs: “Global” Inference
Analog: F-testing in linear regression
The only difference: log likelihoods replace SS
Hypothesis to be tested is H0: βj1 = ... = βjk = 0
Fit the model excluding xj1,...,xjk; save -2 log likelihood = LS
Fit the “full” (or larger) model adding xj1,...,xjk to the smaller model; save -2 log likelihood = LL
Test statistic S = LS - LL
Distribution under the null hypothesis: χ²k
Define the rejection region based on this distribution
Compute S
Reject or not as S is in rejection region or not
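A sketch of this nested-model comparison with statsmodels on hypothetical data (the two tested covariates have true coefficient 0, so the null holds by construction):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
X_full = sm.add_constant(rng.normal(size=(300, 3)))
eta = X_full @ np.array([0.2, 0.9, 0.0, 0.0])
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

small = sm.Logit(y, X_full[:, :2]).fit(disp=0)  # excludes the k = 2 tested covariates
full = sm.Logit(y, X_full).fit(disp=0)

S = (-2 * small.llf) - (-2 * full.llf)  # Ls - LL
print(S, stats.chi2.sf(S, df=2))        # compare to chi-square with k df
```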
GLMs: “Global” Inference
Many programs refer to “deviance” rather than -2 log likelihood
This quantity equals the difference in -2 log likelihoods between one’s fitted model and a “saturated” model
Deviance measures “fit”
Differences in deviances can be substituted for differences in -2 log likelihood in the method given on the previous page
Likelihood ratio tests have appealing optimality properties
Outline: A few more topics
Model checking: residuals, influence points
ML can be written as an iteratively reweighted least squares algorithm
Predictive accuracy
Framework generalizes easily
Main Points
Generalized linear modeling provides a flexible regression framework for a variety of response types
Continuous, categorical measurement scales
Probability distributions tailored to the outcome
Systematic model to accommodate measurement range, interpretation
Logistic regression
Binary responses (yes, no)
Bernoulli / binomial distribution
Regression coefficients as log odds ratios for association between predictors and outcomes
Main Points
Generalized linear modeling accommodates description, inference, adjustment with the same flexibility as linear modeling
Inference
“Wald”: statistical tests and confidence intervals via parameter estimator standardization
“Likelihood ratio” / “global”: via comparison of log likelihoods from nested models