Lecture 18: Review Lecture Ani Manichaikul [email protected] 15 May 2007


Page 1: Lecture 18: Review Lecture - people.Virginia.EDUpeople.virginia.edu/~am3xa/BiostatII/slides/lecture18.pdf · Effect Modification nIn linear regression, effect modification is a way

Lecture 18: Review Lecture

Ani Manichaikul [email protected]

15 May 2007


Types of Biostatistics

- 1) Descriptive Statistics
  - Exploratory Data Analysis
    - often not in literature
  - Summaries
    - "Table 1" in a paper
  - Goal: visualize relationships, generate hypotheses


Types of Biostatistics

- 2) Inferential Statistics
  - Confirmatory Data Analysis
    - Methods Section of paper
  - Goal: quantify relationships, test hypotheses


Approach to Modeling

A general approach for most statistical modeling is to:

- Define the Population of Interest
- State the Scientific Questions & Underlying Theories
- Describe and Explore the Observed Data
- Define the Model
  - Probability part (models the randomness / noise)
  - Systematic part (models the expectation / signal)


Approach to Modeling

- Estimate the Parameters in the Model
  - Fit the Model to the Observed Data
- Make Inferences about Covariates
- Check the Validity of the Model
  - Verify the Model Assumptions
- Re-define, Re-fit, and Re-check the Model if necessary
- Interpret the results of the Analysis in terms of the Scientific Questions of Interest


Stem-and-Leaf Plots

- Age in years (10 observations)

25, 26, 29, 32, 35, 36, 38, 44, 49, 51

Age Interval   Observations
20-29          5 6 9
30-39          2 5 6 8
40-49          4 9
50-59          1


Grouping: Frequency Distribution Tables

- Shows the number of observations for each range of data
- Intervals can be chosen in ways similar to stem-and-leaf displays

Age Interval   Frequency
20-29          3
30-39          4
40-49          2
50-59          1
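As a minimal sketch, the stem-and-leaf display and the frequency table can both be built from the ten ages listed earlier. The function name `decade_label` is an illustrative choice, not something from the lecture.

```python
from collections import Counter

ages = [25, 26, 29, 32, 35, 36, 38, 44, 49, 51]  # data from the stem-and-leaf slide


def decade_label(age):
    """Return the 10-year interval containing age, e.g. 25 -> '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


# Stem-and-leaf: store the last digit (leaf) of each age under its interval (stem).
stem_and_leaf = {}
for age in sorted(ages):
    stem_and_leaf.setdefault(decade_label(age), []).append(age % 10)

# Frequency table: count the observations in each interval.
frequency = Counter(decade_label(age) for age in ages)

for interval in sorted(frequency):
    leaves = " ".join(str(leaf) for leaf in stem_and_leaf[interval])
    print(f"{interval}  n={frequency[interval]}  leaves: {leaves}")
```

The frequency of each interval is simply the number of leaves on the matching stem, which is why the two displays share the same intervals.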


Histograms

- Pictures of the frequency or relative frequency distribution

[Figure: Histogram of Age; x-axis: Age Category (1 to 4); y-axis: Frequency]


Box-and-Whisker Plots

[Figure: Box Plot of Age; y-axis: Age in Years]

- IQR = 44 - 29 = 15
- Upper Fence = 44 + 15*1.5 = 66.5
- Lower Fence = 29 - 15*1.5 = 6.5
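The fence arithmetic can be checked with a short sketch. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted ages, which reproduces Q1 = 29 and Q3 = 44; other quartile conventions give slightly different values.

```python
from statistics import median

ages = sorted([25, 26, 29, 32, 35, 36, 38, 44, 49, 51])

# Assumption: quartiles are the medians of the lower and upper halves
# of the sorted data (this matches Q1 = 29 and Q3 = 44 on the slide).
half = len(ages) // 2
q1 = median(ages[:half])
q3 = median(ages[-half:])

iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr

# Observations outside the fences would be flagged as potential outliers.
outliers = [a for a in ages if a < lower_fence or a > upper_fence]
print(iqr, lower_fence, upper_fence, outliers)
```

No age falls outside (6.5, 66.5), so the whiskers extend to the smallest and largest observations.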


2 Continuous Variables

- Scatterplot
  - Scatterplots visually display the relationship between two continuous variables

[Figure: Age by Height in cm; x-axis: Age in Years; y-axis: Height in Centimeters]


Why is the power of a test important?

- Power indicates the chance of finding a “significant” difference when there really is one
  - Low power: likely to obtain non-significant results even when significant differences exist
  - High power is desirable!
- Low power is usually caused by small sample size


We’re not always right


Errors in Hypothesis Testing α

- Aim: to keep Type I error small by specifying a small rejection region
- α is set before performing a test, usually at 0.05


Errors in Hypothesis Testing β

- Aim: To keep Type II error small and thus power high


β: Probability of Type II Error

- The value of β is usually unknown since it depends on a specified alternative value.
- β depends on sample size and α.
- Before data collection, scientists decide
  - the test they will perform
  - α
  - the desired β
- They will use this information to choose the sample size
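To make the link between α, β, and sample size concrete, here is a rough sketch for a two-sided one-sample z test. The choice of test, the function names, and the numbers (a true difference of 5 with standard deviation 10) are all assumptions for illustration, not values from the lecture.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z test when the true mean
    differs from the null value by delta and the outcome SD is sigma."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def approx_sample_size(delta, sigma, alpha=0.05, beta=0.20):
    """Approximate n giving power of about 1 - beta (normal approximation
    that ignores the far rejection tail)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(1 - beta)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical planning values: detect a difference of 5 when sigma = 10.
n_needed = approx_sample_size(delta=5, sigma=10)
print(n_needed, round(z_test_power(5, 10, n_needed), 3))
```

Holding α fixed, a smaller desired β or a smaller difference to detect both push the required sample size up.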


P-Values

- Definition: The p-value for a hypothesis test is the probability of obtaining by chance alone, when H0 is true, a value of the test statistic as extreme or more extreme (in the appropriate direction) than the one actually observed.


Steps of Hypothesis Testing

- Define the null hypothesis, H0.
- Define the alternative hypothesis, Ha, where Ha is usually of the form “not H0”.
- Define the type 1 error, α, usually 0.05.
- Calculate the test statistic
- Calculate the P-value
- If the P-value is less than α, reject H0. Otherwise fail to reject H0.


Why use linear regression?

- Linear regression is very powerful. It can be used for many things:
  - Binary X
  - Continuous X
  - Categorical X
  - Adjustment for confounding
  - Interaction
  - Curved relationships between X and Y


SLR: Y = β0 + β1X1

- Linear regression is used for continuous outcome variables
- β0: mean outcome when X=0 (Center!)
- Binary X = “dummy variable” for group
  - β1: mean difference in outcome between groups
- Continuous X
  - β1: mean difference in outcome corresponding to a 1-unit increase in X
  - Center X to give meaning to β0
- Test β1=0 in the population
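The interpretations of β0 and β1 can be illustrated with a small least-squares sketch. All data values below are hypothetical and `fit_slr` is an illustrative helper, not code from the course.

```python
from statistics import mean


def fit_slr(x, y):
    """Least-squares estimates (b0, b1) for Y = b0 + b1*X."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data: a 0/1 group indicator (dummy variable) and a continuous outcome.
group = [0, 0, 0, 0, 1, 1, 1, 1]
outcome = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0, 17.0]

b0, b1 = fit_slr(group, outcome)
mean0 = mean(y for g, y in zip(group, outcome) if g == 0)
mean1 = mean(y for g, y in zip(group, outcome) if g == 1)
print(b0, b1, mean0, mean1 - mean0)  # b0 is the group-0 mean, b1 the mean difference

# Hypothetical continuous X: after centering, the intercept is the mean outcome.
age = [25, 30, 35, 40]
height = [170.0, 172.0, 169.0, 175.0]
age_centered = [a - mean(age) for a in age]
b0_centered, slope = fit_slr(age_centered, height)
```

With a dummy variable the fitted intercept equals the mean of the reference group and the slope equals the difference in group means. With a centered continuous X the intercept equals the overall mean outcome.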


Assumptions of Linear Regression

- L  Linear relationship
- I  Independent observations
- N  Normally distributed around line
- E  Equal variance across X’s


In Simple Linear Regression

- In simple linear regression (SLR):
  - One Predictor / Covariate / Explanatory Variable: X
- In multiple linear regression (MLR):
  - Same Assumptions as SLR (i.e. L.I.N.E.), but:
  - More than one Covariate: X1, X2, X3, …, Xp

Model:
- Y ~ N(µ, σ²)
- µ = E(Y | X) = β0 + β1X1 + β2X2 + β3X3 + ... + βpXp


Regression Methods


Regression Methods


Nested models

- One model is nested within another if the parent model contains one set of variables and the extended model contains all of the original variables plus one or more additional variables.


Difference in assessing variables: “nested models”

- other predictor(s)
  - assess with t test if single variable defines predictor
  - assess with F test (today) if two or more variables are needed to define the predictor
- potential confounder(s)
  - compare CI of primary predictor to see whether new parameter is significantly different


The F test

Fobs = [ (RSS_parent - RSS_nested) / (# of new variables added) ] / [ RSS_nested / residual df_nested ]

( )4.4

228.49

28.496.69

Fobs =−

=

What is Fcr?

H0: all new β’s = 0 in population

HA: at least one new β is not 0 in population
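A minimal sketch of the F statistic follows. Here `rss_extended` and `resid_df_extended` refer to the model that contains the added variables (subscripted "nested" in the formula on this slide), and the numbers are hypothetical rather than taken from the lecture's worked example.

```python
def nested_f_statistic(rss_parent, rss_extended, n_new, resid_df_extended):
    """F statistic comparing a parent model with an extended model
    that adds n_new variables to it."""
    numerator = (rss_parent - rss_extended) / n_new
    denominator = rss_extended / resid_df_extended  # estimated residual variance
    return numerator / denominator


# Hypothetical values for illustration only.
f_obs = nested_f_statistic(
    rss_parent=120.0, rss_extended=100.0, n_new=2, resid_df_extended=50
)
print(f_obs)
```

The observed value would then be compared with the critical value of an F distribution with (number of new variables, residual df of the extended model) degrees of freedom.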


The F test: notes

- The F test can be used to compare any two nested models
- If only one variable is added, it’s easier to compare the models using the t test for that variable
  - t² = F if one variable is added
- For any regression, the estimated variance of the residuals is RSS/(residual df)


Nested Models

- Comparing nested models
  - 1 new variable: use t test for that variable
  - 2+ new variables: use F test
- Categorical predictor
  - set one group as reference
  - create dummy variable for other groups
  - include/exclude all dummy variables
  - evaluate categorical predictor with F test


Effect Modification

- In linear regression, effect modification is a way of allowing the association between the primary predictor and the outcome to change with the level of another predictor.
  - If the 3rd predictor is binary, that results in a graph in which the two lines (for the two groups) are no longer parallel.


Splines and Quadratic Terms

- Splines are used to allow the regression line to bend
  - the breakpoint is arbitrary and decided graphically or by hypothesis
  - the actual slope above and below the breakpoint is usually of more interest than the coefficient for the spline (i.e. the change in slope)
- Quadratic term allows for curvature in the model


Logistic regression

- For binary outcomes
- Model the log odds of the probability, which we also call the logit
- Baseline term interpreted as log odds
- Other coefficients are log odds ratios


Logistic regression model

log[ odds(Relief|Tx) ] = log[ P(relief|Tx) / P(no relief|Tx) ] = β0 + β1Tx

where: Tx = 0 if Placebo
       Tx = 1 if Drug


Then…

- log( odds(Relief|Drug) ) = β0 + β1
- log( odds(Relief|Placebo) ) = β0
- log( odds(R|D) ) - log( odds(R|P) ) = β1


And…

- Thus: log[ odds(R|D) / odds(R|P) ] = β1
- And: OR = exp(β1) = e^β1 !!
- So: exp(β1) = odds ratio of relief for patients taking the Drug -vs- patients taking the Placebo.


Logistic Regression

Logit estimates                                   Number of obs   =         70
                                                  LR chi2(1)      =       2.83
                                                  Prob > chi2     =     0.0926
Log likelihood = -46.99169                        Pseudo R2       =     0.0292

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |   .8137752   .4889211     1.66   0.096    -.1444926    1.772043
       _cons |  -.2876821    .341565    -0.84   0.400    -.9571372    .3817731
------------------------------------------------------------------------------

Estimates:

log( odds(relief) ) = β̂0 + β̂1 Drug = -0.288 + 0.814(Drug)

Therefore: OR = exp(0.814) = 2.26 !
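The z statistic, p-value, odds ratio, and confidence interval in this output can be reproduced from the coefficient and its standard error alone. The sketch below uses a normal approximation with a critical value of 1.96; the helper name `wald_summary` is illustrative.

```python
from math import erfc, exp, sqrt


def wald_summary(coef, se, z_crit=1.96):
    """Wald z, two-sided p-value, odds ratio and 95% CI for a logit coefficient."""
    z = coef / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail area of a standard normal
    ci_log = (coef - z_crit * se, coef + z_crit * se)
    return {
        "z": z,
        "p": p_value,
        "ci_log_or": ci_log,
        "or": exp(coef),
        "ci_or": (exp(ci_log[0]), exp(ci_log[1])),
    }


# Coefficient and standard error for `drug` from the output above.
drug = wald_summary(0.8137752, 0.4889211)
print(round(drug["z"], 2), round(drug["p"], 3), round(drug["or"], 2))
```

Because the interval for the log odds ratio includes 0, the exponentiated interval for the odds ratio includes 1, which agrees with p = 0.096 > 0.05.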


Adding other variables

- What if Pr(relief) = function of Drug or Placebo AND Age
- We could easily include age in a model such as:

log( odds(relief) ) = β0 + β1Drug + β2Age


Logistic Regression

- As in MLR, we can include many additional covariates.
- For a Logistic Regression model with p predictors:

log( odds(Y=1) ) = β0 + β1X1 + ... + βpXp

where: odds(Y=1) = Pr(Y=1) / (1 - Pr(Y=1)) = Pr(Y=1) / Pr(Y=0)


Types of interpretation

- β0 + β1 = ln(odds) (for X=1)
- β1 = difference in log odds
- e^(β0+β1) = odds (for X=1)
- e^β1 = odds ratio
- But we started with P(Y=1). Can we find that?


More useful math

- odds = probability / (1 - probability)
- probability = odds / (1 + odds)
- so probability (for X=1) = e^(β0+β1) / (1 + e^(β0+β1))
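These conversions can be applied to the fitted drug/placebo model from the earlier Stata output (β0 = -0.2876821, β1 = 0.8137752). This is a small sketch with illustrative function names.

```python
from math import exp


def odds_to_probability(odds):
    return odds / (1 + odds)


def probability_to_odds(p):
    return p / (1 - p)


# Estimates from the drug/placebo logistic regression output.
b0, b1 = -0.2876821, 0.8137752

p_placebo = odds_to_probability(exp(b0))    # X = 0
p_drug = odds_to_probability(exp(b0 + b1))  # X = 1
print(round(p_placebo, 3), round(p_drug, 3))

# The ratio of the two odds recovers exp(b1), the odds ratio.
odds_ratio = probability_to_odds(p_drug) / probability_to_odds(p_placebo)
```

The predicted probabilities of relief are about 0.43 on placebo and 0.63 on drug, and their odds differ by the factor exp(β1) ≈ 2.26.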


Nested models

- Adding a single new variable to the model
- null model: ln( p / (1 - p) ) = β0 + β1(Age - 30)
- full model: ln( p / (1 - p) ) = β0 + β1(Age - 30) + β2(Multivitamin)


Comparing nested models that differ by one variable

- Compare models with p-value or CI
- What method is this?
  - The Wald test, a test that applies the CLT, like
    - Z test comparing proportions in 2x2 table
    - analogous to the t test for linear regression
- H0: the new variable is not needed
  - or H0: βnew = 0 in the population


Conclusion from the Wald test

- The p-value for multivitamin is 0.007 (<0.05) and the CI for the multivitamin coefficient does not include 0 (CI for OR doesn’t include 1)
- Reject H0
- Conclude that the larger model is better: after adjusting for age, multivitamin use is still an important predictor of physician visits in the population


Interpretation - log odds

- β0: the log odds of not visiting a physician for a 30-year-old person who reports not regularly taking multivitamins
- β1: the log odds ratio of not visiting a physician for a one year increase in age controlling for multivitamin use
- β2: the log odds ratio of not visiting a physician for those who take multivitamins compared with those who do not, adjusting for age


Interpretation - odds and odds ratio

- exp{β0}: the odds of not visiting a physician for a 30-year-old person who reports not regularly taking multivitamins


Interpretation - odds and odds ratio

- exp{β1}: after adjusting for multivitamin use, the odds of not visiting a physician change by a factor of exp{β1}=1.001 for each additional year of age
  - additional age is associated with lower frequency of physician visits in these students, but the association is not statistically significant (p>0.05)


Interpretation - odds and odds ratio

- exp{β2}: the odds ratio of not visiting a physician for those who take multivitamins compared with those who do not is exp{β2}=0.46, adjusting for age
  - taking multivitamins is associated with regular physician visits (p=0.007)


Interpretation In General

- Also: log[ odds(Y=1|X1+1, X2) / odds(Y=1|X1, X2) ] = β1
- And: OR = exp(β1) !!
- exp(β1) is the multiplicative change in odds for a 1 unit increase in X1 provided X2 is held constant.
- The result is similar for X2


CHD by smoking and coffee

- Yi = 1 if CHD case, 0 if control
- COFi = 1 if Coffee Drinker, 0 if not
- SMKi = 1 if Smoker, 0 if not
- pi = Pr(Yi = 1)
- ni = Number observed at pattern i of Xs


Logistic Regression Model

- Yi are from a Binomial(ni, pi) distribution
- Yi are independent
- log odds(Yi=1) (or, logit(Yi=1)) is a function of
  - Coffee
  - Smoking
  - and coffee x smoking interaction


Logistic Regression Model

log( pi / (1 - pi) ) = β0 + β1COFi + β2SMKi + β3COFi*SMKi

- Which implies that Pr(Yi=1) is the logistic function

pi = e^(β0 + β1X1i + β2X2i + β3X1iX2i) / (1 + e^(β0 + β1X1i + β2X2i + β3X1iX2i))


Interpretations

- exp{β1}: odds ratio of being a CHD case for coffee drinkers -vs- non-drinkers among non-smokers
- exp{β1+β3}: odds ratio of being a CHD case for coffee drinkers -vs- non-drinkers among smokers


Interpretations

- exp{β2}: odds ratio of being a CHD case for smokers -vs- non-smokers among non-coffee drinkers
- exp{β2+β3}: odds ratio of being a CHD case for smokers -vs- non-smokers among coffee drinkers


Interpretations

- e^β0 / (1 + e^β0): fraction of cases among non-smoking non-coffee drinking individuals in the sample (determined by sampling plan)
- exp{β3}: ratio of odds ratios


exp{β3} Interpretations

- exp{β3}: factor by which odds ratio of being a CHD case for coffee drinkers -vs- nondrinkers is multiplied for smokers as compared to non-smokers

or

- exp{β3}: factor by which odds ratio of being a CHD case for smokers -vs- non-smokers is multiplied for coffee drinkers as compared to non-coffee drinkers


Some Special Cases

- Given

log[ Pr(Y=1) / Pr(Y=0) ] = β0 + β1COF + β2SMK + β3COF*SMK

- If β1 = β2 = β3 = 0
  - Neither smoking nor coffee drinking is associated with increased risk of CHD


Some Special Cases

- Given

log[ Pr(Y=1) / Pr(Y=0) ] = β0 + β1COF + β2SMK + β3COF*SMK

- If β1 = β3 = 0
  - Smoking, but not coffee drinking, is associated with increased risk of CHD


Some Special Cases

- If β3 = 0
  - Smoking and coffee drinking are both associated with risk of CHD but the odds ratio of CHD-smoking is the same at all levels of coffee
  - Smoking and coffee drinking are both associated with risk of CHD but the odds ratio of CHD-coffee is the same at all levels of smoking.


Confounding

- In epidemiological terms, Z is a “confounder” of the relationship of Y with X if Z is related to both X and Y and Z is not in the causal pathway between X and Y
- In statistical terms, Z is a “confounder” of the relationship of Y with X if the X coefficient changes when Z is added to a regression of Y on X


Confounding

- For example, consider the two models

Y = β0 + β1X + ε1

Y = α0 + α1X + α2Z + ε2

- then Z is a confounder of the X, Y relationship if β1 ≠ α1


Look at Confidence Intervals

- Without Smoking: OR = e^0.79 = 2.2
- 95% CI for log(OR): 0.79 ± 1.96(0.33) = (0.13, 1.44)
- 95% CI for OR: (e^0.13, e^1.44) = (1.14, 4.24)


Look at Confidence Intervals

- With Smoking (adjusting for smoking): OR = e^0.53 = 1.7
- 95% CI for log(OR): 0.53 ± 1.96(0.35) = (-0.17, 1.22)
- 95% CI for OR: (e^-0.17, e^1.22) = (0.85, 3.39)


Conclusion

- So, ignoring smoking, the CHD and coffee OR is 2.2 (95% CI: 1.14 - 4.26)
- Adjusting for smoking gives more modest evidence for a coffee effect
- In this case-control study, smoking is a weak-to-moderate confounder of the coffee-CHD association


Interaction Model

Model 3

Variable          Est    se     z
Intercept         -1.0   .30    -3.4
Coffee            .69    .45    1.5
Smoking           1.3    .55    2.4
Coffee*Smoking    -.43   .73    -.59


Testing Interaction Term

- Z = -0.59, p-value = 0.554
- 95% Confidence interval for exp{β1+β3}
  - (0.42, 3.99)
- Both of the above suggest that there is little evidence that smoking is an effect modifier!
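As a rough check, the quantities discussed for the interaction model can be recomputed from the rounded Model 3 estimates (Intercept -1.0, Coffee .69, Smoking 1.3, Coffee*Smoking -.43 with se .73). Because the inputs are rounded, the p-value differs slightly from the 0.554 reported on the slide.

```python
from math import erfc, exp, sqrt

# Rounded estimates from the Model 3 table.
est = {"intercept": -1.0, "coffee": 0.69, "smoking": 1.3, "coffee_x_smoking": -0.43}
se_interaction = 0.73

or_coffee_nonsmokers = exp(est["coffee"])                          # exp{β1}
or_coffee_smokers = exp(est["coffee"] + est["coffee_x_smoking"])   # exp{β1+β3}
ratio_of_odds_ratios = exp(est["coffee_x_smoking"])                # exp{β3}

# Wald test of the interaction term (H0: β3 = 0).
z = est["coffee_x_smoking"] / se_interaction
p_value = erfc(abs(z) / sqrt(2))

print(round(or_coffee_nonsmokers, 2), round(or_coffee_smokers, 2),
      round(z, 2), round(p_value, 3))
```

The coffee odds ratio is about 2.0 among non-smokers and about 1.3 among smokers, but the Wald test gives little evidence that the two differ.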


Likelihood Ratio Test

- The Likelihood Ratio Test will help decide whether or not additional term(s) “significantly” improve the model fit
- Likelihood Ratio Test (LRT) statistic for comparing nested models is
  - -2 times the difference between the log likelihoods (LLs) for the Null -vs- Extended models
  - the obtained is identical to from an analysis of variance test for linear regression models


Likelihood Ratio Test

Deviance is a term used for the difference in -2*log likelihood relative to the best possible value from a perfectly predicting model.

Change in deviance is the same as change in -2LL.
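A minimal sketch of the LRT calculation is given below. The chi-square tail probability is computed with a standard recurrence so that no external library is needed. The extended-model log likelihood (-46.99169) is from the drug/placebo output earlier in the lecture; the null log likelihood (-48.4067) is an assumption, back-calculated so that the statistic matches the reported LR chi2(1) = 2.83.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    half_x = x / 2
    if df % 2 == 1:
        k, tail = 1, erfc(sqrt(half_x))   # closed form for df = 1
    else:
        k, tail = 2, exp(-half_x)         # closed form for df = 2
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        tail += half_x ** (k / 2) * exp(-half_x) / gamma(k / 2 + 1)
        k += 2
    return tail


def likelihood_ratio_test(ll_null, ll_extended, df):
    """LRT statistic (-2 times the difference in log likelihoods) and its p-value."""
    stat = -2 * (ll_null - ll_extended)
    return stat, chi2_sf(stat, df)


# ll_null is a back-calculated assumption; ll_extended is from the Stata output.
stat, p_value = likelihood_ratio_test(ll_null=-48.4067, ll_extended=-46.99169, df=1)
print(round(stat, 2), round(p_value, 4))
```

The degrees of freedom equal the number of terms added in the extended model, so the same function applies when several covariates are tested at once.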


LRT Example


Model comparisons using likelihood ratio test


Summary: Unadjusted ORs

- The odds of CHD were estimated to be 3.4 times higher among smokers compared to non-smokers
  - 95% CI: (1.7, 7.9)
- The odds of CHD were estimated to be 2.2 times higher among coffee drinkers compared to non-coffee drinkers
  - 95% CI: (1.1, 4.3)


Summary: Adjusted ORs

- Controlling for the potential confounding of smoking, the coffee odds ratio was estimated to be 1.7 with 95% CI: (.85, 3.4).
- Hence, the evidence in these data is insufficient to conclude coffee has an independent effect on CHD beyond that of smoking.


Comparing the models

- Models C and F are both nested in Model A
- Models C and F cannot be directly compared to one another, but we can see which has a smaller p-value when compared to Model A
  - C vs. A: X² = 26.5 with 2 df
  - F vs. A: X² = 21.7 with 3 df


What next?

- Model C improves prediction beyond gender alone (Model A) more than Model F.
- Model C should be the next parent model, and we should test the new variables in Model F to see if they continue to improve prediction within the context of Model C.
- When a tentative final model is identified, the assumptions of logistic regression should be checked.


Flexibility in linear models

- A spline allows the “slope” for a continuous predictor to change at a given point; the coefficient is for the difference in log odds ratio
- An interaction term allows the odds ratio for one variable to differ by the value of a second variable; the coefficient is for the difference in log odds ratio


Poisson regression model

- Log-linear model for mean rate

where p is the number of predictors in the model

- Random component:
- Here:
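For reference, a standard way to write a Poisson log-linear model with p predictors is sketched below. The symbols λi and Yi are assumptions chosen for illustration and may not match the notation used on the original slide.

```latex
\log(\lambda_i) = \beta_0 + \beta_1 X_{1i} + \dots + \beta_p X_{pi},
\qquad Y_i \sim \mathrm{Poisson}(\lambda_i)
```

Under this form, exp(βj) is the multiplicative change in the mean rate for a one-unit increase in Xj with the other predictors held fixed.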


Exponentiating Poisson regression models


Interpreting Poisson regression parameters


Modelling rates

- Of key interest in Poisson regression models is to make inference about rates of events
- We are often interested in whether the rate of cancer, or some other disease, varies by population subgroups such as gender, race, or age


Person-years

- In defining rates, it is crucial to state what denominator we have in mind
- For disease, we are usually interested in disease rate per person, per year
- If the HIV incidence rate is 5 per 1 million person years, that means we expect to see 5 new cases of HIV per 1 million persons per year


Modelling Danish Cancer cases with an offset

- We observed Danish cancer cases in 6 age groups over a period of 4 years
- The model:

predicts log rates per 10,000 person years


Interpretation of coefficients


More about offsets

- The purpose of an offset is to specify the denominator of the predicted rates
- We should always try to use an offset if we suspect the underlying population sizes vary for the observed counts
- Typically, we’ll use log(N) as the offset, where N is the sample size or number of person years generating each count
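The role of the offset can be seen in a small sketch with one binary predictor, where the Poisson maximum-likelihood fit has a closed form because the model reproduces each group's observed rate. The counts and person-years below are hypothetical, not the Danish cancer data.

```python
from math import exp, log

# Hypothetical counts and person-years for two groups (x = 0 or 1).
cases = {0: 12, 1: 45}
person_years = {0: 40_000, 1: 50_000}

# With a single binary predictor, the fitted rates equal the observed rates,
# so the coefficients can be written down directly.
rate = {x: cases[x] / person_years[x] for x in cases}
b0 = log(rate[0])            # log rate per person-year in the reference group
b1 = log(rate[1] / rate[0])  # log rate ratio


def expected_count(x, n):
    """Expected count exp(offset + b0 + b1*x), with offset = log(n)."""
    return exp(log(n) + b0 + b1 * x)


# Rescaling the denominator gives rates per 10,000 person-years.
rate_per_10000 = {x: 10_000 * exp(b0 + b1 * x) for x in cases}
print(rate_per_10000, exp(b1))
```

Without the offset, the larger number of cases in group 1 would mix together a higher rate and a larger denominator. With log(N) as the offset, exp(b1) is a pure rate ratio.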


Poisson regression for cohort studies

- Log-linear regression can be used to estimate relative risks for cohort studies (but not case-control studies)
- Relative risk is like relative rate, but we are comparing risks (probability of disease) instead of rates (expected cases per person-year) across groups
- Could also estimate relative risk by transforming results from logistic regression


Grand summary

- Exploratory analysis includes graphs and tables - good to get a feel for the data
- Confirmatory analysis is useful for making definitive conclusions
- Linear models provide us with a framework in which to perform confirmatory analysis in many settings


Grand summary: linear models

- Linear regression: for continuous (normal) outcomes
- Logistic regression: for binary outcomes
- Poisson regression: for counts


Grand summary: modelling

- In all generalized linear models, we can use the following tools to make models more flexible:
  - Adjust for confounders using additive covariates
  - Allow effect modification by interaction terms
  - Curved and bent lines through polynomials and splines


Grand summary: testing

- We can test significance of a single predictor using z-test (or t-test for linear regression)
- Test significance of several covariates using a pair of nested models by a likelihood ratio test
- Know how to interpret p-values and confidence intervals!