Ch8 Regression Revby Rao

Page 1: Ch8 Regression Revby Rao
Page 2: Ch8 Regression Revby Rao

Medical Statistics (full English class)

Shaoqi Rao, PhD

School of Public Health

Sun Yat-Sen University

Slides adapted from Dr. Ji-Qian Fang’s

Page 3: Ch8 Regression Revby Rao

Chapter 8 Linear Regression

Page 4: Ch8 Regression Revby Rao

How does the value of one variable depend on that of another one?

• How does the son’s height depend on the father’s height?

• How does the death rate of animals depend on the drug dosage?

• How does an infant’s weight depend on age in months?

• How does the body surface area depend on height?

---- To explore linear dependence quantitatively between two continuous variables.

Page 5: Ch8 Regression Revby Rao

8.1 Statistical Description of Linear Regression

8.1.1 Linear regression equation

Initial meaning of “regression”: Galton noted that if the father is tall, his son will be relatively tall; if the father is short, his son will be relatively short. But if the father is very tall, his son will usually not be taller than his father; if the father is very short, his son will usually not be shorter than his father.

Otherwise, ……?!

Galton called this phenomenon “regression to the mean”.

Page 6: Ch8 Regression Revby Rao

Independent variable (explanatory variable), X

randomly changing

or fixed by the researcher

Dependent variable (response variable), Y

randomly following a linear equation

Page 7: Ch8 Regression Revby Rao

What is regression in statistics?

To find out the track of the means

[Figure: scatter plot of son’s height (cm) against father’s height (cm); both axes run from 100 to 220.]

Page 8: Ch8 Regression Revby Rao

Given the value of X, Y varies around a center ($\mu_{y|x}$)

All the centers lie on a line -- the regression line.

The relationship between the center $\mu_{y|x}$ and X is described by a linear equation

$\mu_{y|x} = \alpha + \beta X$

Page 9: Ch8 Regression Revby Rao

Linear regression

Try to estimate $\alpha$ and $\beta$, getting

$\hat{Y} = a + bX$

Where

a -- estimate of $\alpha$, intercept

b -- estimate of $\beta$, slope

$\hat{Y}$ -- estimate of $\mu_{y|x}$
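A small simulation can make the idea of "Y varies around a center" concrete. The sketch below assumes normally distributed scatter around the line, and its population values (`ALPHA`, `BETA`, `SIGMA`) are made up for illustration; they do not come from the slides.

```python
import random

def simulate_y(x, alpha, beta, sigma, rng):
    """Draw one Y for a given X: the center alpha + beta * x plus normal scatter."""
    return alpha + beta * x + rng.gauss(0.0, sigma)

# Hypothetical population values, chosen only for illustration.
ALPHA, BETA, SIGMA = 74.0, 0.57, 2.3

rng = random.Random(0)  # fixed seed so the run is repeatable
for x in (160, 170, 180):
    draws = [simulate_y(x, ALPHA, BETA, SIGMA, rng) for _ in range(5000)]
    sample_mean = sum(draws) / len(draws)
    # The sample mean of Y at each X should sit close to the line.
    print(f"X = {x}: mean of Y = {sample_mean:.2f}, line gives {ALPHA + BETA * x:.2f}")
```

With many draws at each X, the printed means should track the line closely, which is what "the track of the means" refers to.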

Page 10: Ch8 Regression Revby Rao

8.1.2 Regression coefficient and its calculation

To find a straight line to best fit the points.

Residual: $y - \hat{y}$

Fit of the regression line: $\sum (y - \hat{y})^2$

Principle of least squares: To find a straight line that minimizes the sum of squared residuals.

Under such a principle, it is easy to get the formulas for a and b by calculus:

$b = \frac{l_{xy}}{l_{xx}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$   (8.3)

$a = \bar{y} - b\bar{x}$   (8.4)

Such a line must go through the point $(\bar{x}, \bar{y})$, and cross the vertical axis at a ---- Why?
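One way to answer the "Why?" is to write out the least-squares conditions. In the sketch below, $Q(a, b)$ is a label introduced here for the sum of squared residuals; it is not notation from the slides.

```latex
\begin{align*}
Q(a,b) &= \sum_i (y_i - a - b x_i)^2 \\
\frac{\partial Q}{\partial a} &= -2 \sum_i (y_i - a - b x_i) = 0
  \;\Rightarrow\; a = \bar{y} - b\bar{x} \\
\frac{\partial Q}{\partial b} &= -2 \sum_i x_i (y_i - a - b x_i) = 0
  \;\Rightarrow\; b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
  = \frac{l_{xy}}{l_{xx}} \\
\hat{Y}\big|_{X = \bar{x}} &= a + b\bar{x} = \bar{y},
  \qquad \hat{Y}\big|_{X = 0} = a
\end{align*}
```

The first condition forces the fitted value at $\bar{x}$ to equal $\bar{y}$, and setting X = 0 in the fitted equation leaves only the intercept a.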

Page 11: Ch8 Regression Revby Rao

Example 8.1 Calculate the regression equation of the height of son Y on the height of father X.

No.                        1    2    3    4    5    6    7    8    9   10
Father’s height, X (cm)  150  153  155  158  161  164  165  167  168  169
Son’s height, Y (cm)     159  157  163  166  169  170  169  167  169  170

No.                       11   12   13   14   15   16   17   18   19   20
Father’s height, X (cm)  170  171  172  174  175  177  178  181  183  185
Son’s height, Y (cm)     173  170  170  176  178  174  173  178  176  180

$\bar{x} = 168.8$, $\bar{y} = 170.35$, $l_{xx} = 1859.2$, $l_{xy} = 1059.4$

$b = \frac{l_{xy}}{l_{xx}} = \frac{1059.4}{1859.2} = 0.5698$

$a = 170.35 - (0.5698)(168.8) = 74.17$

$\hat{Y} = 74.17 + 0.5698X$
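The arithmetic of Example 8.1 can be reproduced in a few lines. This is a minimal sketch using only the standard library; the function name `least_squares_line` and the dictionary keys are illustrative choices, not part of the course material.

```python
def least_squares_line(x, y):
    """Return the intermediate sums and the least-squares intercept a and slope b."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    l_xx = sum((xi - x_bar) ** 2 for xi in x)
    l_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = l_xy / l_xx          # slope, formula (8.3)
    a = y_bar - b * x_bar    # intercept, formula (8.4)
    return {"x_bar": x_bar, "y_bar": y_bar, "l_xx": l_xx, "l_xy": l_xy, "a": a, "b": b}

# Data of Example 8.1 (heights in cm).
FATHER = [150, 153, 155, 158, 161, 164, 165, 167, 168, 169,
          170, 171, 172, 174, 175, 177, 178, 181, 183, 185]
SON = [159, 157, 163, 166, 169, 170, 169, 167, 169, 170,
       173, 170, 170, 176, 178, 174, 173, 178, 176, 180]

fit = least_squares_line(FATHER, SON)
print(f"b = {fit['b']:.4f}, a = {fit['a']:.2f}")  # b = 0.5698, a = 74.17
```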

Page 12: Ch8 Regression Revby Rao
Page 13: Ch8 Regression Revby Rao

8.2 Statistical Inference on Regression

8.2.1 Hypothesis tests

8.2.1.1 The t-test for regression coefficient

b is the sample regression coefficient, changing from sample to sample

There is a population regression coefficient, denoted by $\beta$

Question: Whether $\beta = 0$ or not?

$H_0$: $\beta = 0$, $H_1$: $\beta \ne 0$, $\alpha = 0.05$

Page 14: Ch8 Regression Revby Rao

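Statistic

$t_b = \frac{b - 0}{s_b}, \quad \nu = n - 2$

Standard deviation of regression coefficient

$s_b = \frac{s}{\sqrt{\sum (X - \bar{X})^2}}$

Standard deviation of residual

$s = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$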

Page 15: Ch8 Regression Revby Rao

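For Example 8.1

$H_0$: $\beta = 0$, $H_1$: $\beta \ne 0$

$s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{94.92}{18}} = 2.2964$

$s_b = \frac{s}{\sqrt{l_{xx}}} = \frac{2.2964}{\sqrt{1859.2}} = 0.05326$

$t_b = \frac{b - 0}{s_b} = \frac{0.5698}{0.05326} = 10.70, \quad \nu = 20 - 2 = 18$

p < 0.001.

Reject $H_0$ ---- the regression of the son’s height on the father’s height is statistically significant.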
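The t-test for the slope can be checked numerically on the data of Example 8.1. The sketch below is only an illustration: `slope_t_test` is a hypothetical helper name, and the critical value 2.101 (two-sided, 5% level, 18 degrees of freedom) is taken from a standard t table rather than computed. Results may differ from the hand calculation in the last digit because no intermediate rounding is applied here.

```python
import math

def slope_t_test(x, y):
    """Return (t_b, nu, s, s_b) for testing H0: beta = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    l_xx = sum((xi - x_bar) ** 2 for xi in x)
    l_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = l_xy / l_xx
    a = y_bar - b * x_bar
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_res / (n - 2))   # standard deviation of residual
    s_b = s / math.sqrt(l_xx)         # standard deviation of b
    return (b - 0) / s_b, n - 2, s, s_b

FATHER = [150, 153, 155, 158, 161, 164, 165, 167, 168, 169,
          170, 171, 172, 174, 175, 177, 178, 181, 183, 185]
SON = [159, 157, 163, 166, 169, 170, 169, 167, 169, 170,
       173, 170, 170, 176, 178, 174, 173, 178, 176, 180]

T_CRIT_18 = 2.101  # assumption: two-sided 5% critical value for 18 df, from a t table

t_b, nu, s, s_b = slope_t_test(FATHER, SON)
print(f"t_b = {t_b:.2f} with nu = {nu}; reject H0: {abs(t_b) > T_CRIT_18}")
```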

Page 16: Ch8 Regression Revby Rao

8.2.1.2 Analysis of variance

$H_0$: The contribution of the linear regression is 0

$H_1$: The contribution of the linear regression is not 0

(1) Before regression, we can only use $\bar{y}$ to estimate $\mu_{y|x}$

(2) After regression, we can use $\hat{Y}$ to estimate $\mu_{y|x}$

(3) The regression makes the sum of squared deviations decline

$SS_{Regression} = SS_{Total} - SS_{Residual}$, $\nu_{Regression} = \nu_{Total} - \nu_{Residual} = 1$

(4) To test $H_0$: the contribution of regression is 0, the F-statistic is used

$F = \frac{MS_{Regression}}{MS_{Residual}}$

Page 17: Ch8 Regression Revby Rao

For Example 8.1

Source        SS      DF   MS       F        P
Regression   603.63    1   603.63   114.54   < 0.01
Residual      94.92   18     5.27
Total        698.55   19

Conclusion: the regression of the son’s height on the father’s height is statistically significant.

The slight difference between these two approaches:

• The t test can be used for both one-sided and two-sided problems;

• ANOVA is for two-sided problems only. However, the idea of ANOVA can easily be extended to the cases of nonlinear regression and multiple regression.
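The ANOVA decomposition for Example 8.1, and its agreement with the t-test, can be sketched as follows. The helper name `regression_anova` is illustrative. Because the code does not round b before use, the sums of squares may differ from the table in the second decimal place.

```python
import math

def regression_anova(x, y):
    """Split the total sum of squares into regression and residual parts."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    l_xx = sum((xi - x_bar) ** 2 for xi in x)
    l_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = l_xy / l_xx
    a = y_bar - b * x_bar
    fitted = [a + b * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ms_reg = ss_reg / 1            # regression has 1 degree of freedom
    ms_res = ss_res / (n - 2)      # residual has n - 2 degrees of freedom
    t_b = b / (math.sqrt(ms_res) / math.sqrt(l_xx))
    return {"ss_total": ss_total, "ss_reg": ss_reg, "ss_res": ss_res,
            "ms_res": ms_res, "F": ms_reg / ms_res, "t_b": t_b}

FATHER = [150, 153, 155, 158, 161, 164, 165, 167, 168, 169,
          170, 171, 172, 174, 175, 177, 178, 181, 183, 185]
SON = [159, 157, 163, 166, 169, 170, 169, 167, 169, 170,
       173, 170, 170, 176, 178, 174, 173, 178, 176, 180]

table = regression_anova(FATHER, SON)
print(f"F = {table['F']:.2f}, t_b squared = {table['t_b'] ** 2:.2f}")
```

In simple linear regression the two printed numbers coincide, which is why the two tests lead to the same two-sided conclusion.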

Page 18: Ch8 Regression Revby Rao

8.2.2 Determination coefficient

Determination coefficient: the contribution of the regression, as a percentage

$R^2 = \frac{SS_{Regression}}{SS_{Total}}, \quad 0 \le R^2 \le 1$

• It reflects the percentage of the total sum of squared deviations that can be explained by the regression.

• If both X and Y are random variables, $R^2$ = square of correlation coefficient

For Example 8.1

$SS_{Regression} = 603.63$, $SS_{Total} = 698.55$

$R^2 = \frac{SS_{Regression}}{SS_{Total}} = \frac{603.63}{698.55} = 0.8641$, $r^2 = 0.9296^2 = 0.8641$
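The equality between the determination coefficient and the squared correlation coefficient can be verified on Example 8.1. A minimal sketch follows; the helper name `r_and_r_squared` is an assumption made for illustration.

```python
import math

def r_and_r_squared(x, y):
    """Return the correlation coefficient r and R^2 = SS_Regression / SS_Total."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    l_xx = sum((xi - x_bar) ** 2 for xi in x)
    l_yy = sum((yi - y_bar) ** 2 for yi in y)   # same quantity as SS_Total
    l_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = l_xy / l_xx
    a = y_bar - b * x_bar
    ss_reg = sum((a + b * xi - y_bar) ** 2 for xi in x)
    r = l_xy / math.sqrt(l_xx * l_yy)
    return r, ss_reg / l_yy

FATHER = [150, 153, 155, 158, 161, 164, 165, 167, 168, 169,
          170, 171, 172, 174, 175, 177, 178, 181, 183, 185]
SON = [159, 157, 163, 166, 169, 170, 169, 167, 169, 170,
       173, 170, 170, 176, 178, 174, 173, 178, 176, 180]

r, r_squared = r_and_r_squared(FATHER, SON)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, R^2 = {r_squared:.4f}")
```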

Page 19: Ch8 Regression Revby Rao

In practice, it is suggested to report the value of the determination coefficient after a regression analysis, to describe how good the regression is.

Here is a story:

X: An index of liver function

Y: A score for psychological status

Regression is statistically significant, with r = 0.2 and b = 0.01.

Claimed: “the index for liver function can be improved by psychological consultation”

Is it wrong?

Why?
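Part of the answer is that statistical significance and strength of association are different things. The sketch below uses the standard test statistic for a correlation coefficient, $t_r = r\sqrt{n-2}/\sqrt{1-r^2}$, with r = 0.2. The sample sizes are hypothetical, since the story does not state one.

```python
import math

def t_from_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r = 0.2
print(f"share of variation explained: r^2 = {r * r:.2f}")

# Hypothetical sample sizes: the same weak correlation becomes
# "significant" once the sample is large enough.
for n in (30, 100, 400):
    print(f"n = {n}: t = {t_from_r(r, n):.2f}")
```

For large samples a |t| of about 2 is roughly the two-sided 5% threshold, so a correlation that explains only 4% of the variation can still pass the test. Significance also says nothing about causality.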

Page 20: Ch8 Regression Revby Rao

8.3 The Application of Linear Regression

8.3.1 Two interval estimations

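8.3.1.1 Confidence interval for $\mu_{y|x}$

$\hat{Y}_0 \pm t_{\alpha,\nu}\, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$

8.3.1.2 Prediction interval for Y

$\hat{Y}_0 \pm t_{\alpha,\nu}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$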
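The two intervals can be compared on the data of Example 8.1. The sketch below makes several assumptions for illustration: the evaluation point x0 = 170 cm is arbitrary, the critical value is read as the two-sided 5% value for 18 degrees of freedom (2.101, from a standard t table), and `intervals_at` is a hypothetical helper name.

```python
import math

def intervals_at(x0, x, y, t_crit):
    """Return (y0_hat, confidence interval for the mean, prediction interval for Y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    l_xx = sum((xi - x_bar) ** 2 for xi in x)
    l_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = l_xy / l_xx
    a = y_bar - b * x_bar
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_res / (n - 2))
    h = 1 / n + (x0 - x_bar) ** 2 / l_xx
    y0_hat = a + b * x0
    ci_half = t_crit * s * math.sqrt(h)        # for the conditional mean
    pi_half = t_crit * s * math.sqrt(1 + h)    # for an individual Y
    return (y0_hat,
            (y0_hat - ci_half, y0_hat + ci_half),
            (y0_hat - pi_half, y0_hat + pi_half))

FATHER = [150, 153, 155, 158, 161, 164, 165, 167, 168, 169,
          170, 171, 172, 174, 175, 177, 178, 181, 183, 185]
SON = [159, 157, 163, 166, 169, 170, 169, 167, 169, 170,
       173, 170, 170, 176, 178, 174, 173, 178, 176, 180]

T_CRIT_18 = 2.101  # assumption: two-sided 5% critical value for 18 df

y0_hat, ci, pi = intervals_at(170, FATHER, SON, T_CRIT_18)
print(f"fitted value at X = 170: {y0_hat:.2f}")
print(f"95% CI for the mean:  ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for a single Y: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is always the wider of the two, because it adds the scatter of an individual Y to the uncertainty of the estimated mean.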

Page 21: Ch8 Regression Revby Rao

8.3.3 On the basic assumptions ---- LINE

(1) Linear: There exists a linear tendency between the dependent variable and the independent variable

(2) Independent: The individual observations are independent of each other

(3) Normal: Given the value of X, the corresponding Y follows a normal distribution

(4) Equal variances: The variances of Y for different values of X are all equal, denoted by $\sigma^2$.

Page 22: Ch8 Regression Revby Rao

In practice, one may use a scatter diagram to observe whether the basic assumptions are met.

The assumption of linearity is essential: using a linear model to describe a curvilinear relationship is obviously inappropriate;

The assumption of independence is also essential;

Violation of the assumptions of normal distribution and equal variance might not seriously affect the least squares estimates, though the formulas introduced for statistical inference might no longer be valid.

When assumptions (1), (3) or (4) are violated, some transformations are worth trying.
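Alongside a scatter diagram, the residuals themselves can be inspected. The sketch below lists the residuals of Example 8.1 against X and compares their mean square in the lower and upper halves of the X range. The split-half comparison is a crude illustration added here, not a formal test and not a method from the slides.

```python
def fit_and_residuals(x, y):
    """Fit the least-squares line and return (a, b, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    l_xx = sum((xi - x_bar) ** 2 for xi in x)
    l_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = l_xy / l_xx
    a = y_bar - b * x_bar
    return a, b, [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Data of Example 8.1, already sorted by father's height.
FATHER = [150, 153, 155, 158, 161, 164, 165, 167, 168, 169,
          170, 171, 172, 174, 175, 177, 178, 181, 183, 185]
SON = [159, 157, 163, 166, 169, 170, 169, 167, 169, 170,
       173, 170, 170, 176, 178, 174, 173, 178, 176, 180]

a, b, res = fit_and_residuals(FATHER, SON)

# Residuals against X: a curved pattern would suggest non-linearity.
for xi, e in zip(FATHER, res):
    print(f"{xi:5d} {e:+6.2f}")

# Crude spread comparison between the two halves of the X range.
half = len(res) // 2
ms_low = sum(e * e for e in res[:half]) / half
ms_high = sum(e * e for e in res[half:]) / (len(res) - half)
print(f"mean squared residual: lower half {ms_low:.2f}, upper half {ms_high:.2f}")
```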

Page 23: Ch8 Regression Revby Rao

Summary: Regression and Correlation

1. Distinction and connection

Distinction:

Correlation: Both X and Y are random

Regression: Y must be random; X could be random or not random

Page 24: Ch8 Regression Revby Rao

Connection: When both X and Y are random

1) Same sign for correlation coefficient and regression coefficient

2) t tests are equivalent

$t_r = t_b$

3) Determination Coefficient

Determination Coefficient $= \frac{SS_{Regression}}{SS_{Total}}$

Determination Coefficient $= r^2$
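The first two connections can be confirmed numerically on the data of Example 8.1. In this sketch, $t_r$ is computed with the standard formula $t_r = r\sqrt{n-2}/\sqrt{1-r^2}$, which the slides do not spell out, and the helper name `both_t_statistics` is illustrative.

```python
import math

def both_t_statistics(x, y):
    """Return (r, b, t_r, t_b) for the same sample."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    l_xx = sum((xi - x_bar) ** 2 for xi in x)
    l_yy = sum((yi - y_bar) ** 2 for yi in y)
    l_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = l_xy / math.sqrt(l_xx * l_yy)
    b = l_xy / l_xx
    a = y_bar - b * x_bar
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_b = math.sqrt(ss_res / (n - 2)) / math.sqrt(l_xx)
    t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    t_b = b / s_b
    return r, b, t_r, t_b

FATHER = [150, 153, 155, 158, 161, 164, 165, 167, 168, 169,
          170, 171, 172, 174, 175, 177, 178, 181, 183, 185]
SON = [159, 157, 163, 166, 169, 170, 169, 167, 169, 170,
       173, 170, 170, 176, 178, 174, 173, 178, 176, 180]

r, b, t_r, t_b = both_t_statistics(FATHER, SON)
print(f"r = {r:.4f}, b = {b:.4f} (same sign: {(r > 0) == (b > 0)})")
print(f"t_r = {t_r:.3f}, t_b = {t_b:.3f}")
```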

Page 25: Ch8 Regression Revby Rao

2. Caution -- for regression and correlation

1) Don’t put just any two variables together for correlation and regression; they must have some relation in subject matter;

2) Correlation and regression do not necessarily mean causality ---- sometimes the relation may be indirect, or there may be no real relation at all;

Page 26: Ch8 Regression Revby Rao

3) A big value of r does not necessarily mean a big regression coefficient b;

4) To reject $H_0$: $\rho = 0$ does not necessarily mean that the correlation is strong, only that $\rho \ne 0$;

5) A statistically significant regression equation does not necessarily mean that one can predict Y well by X, only that $\beta \ne 0$; whether the prediction is good depends on the coefficient of determination;

6) A scatter diagram is useful before working with linear correlation and linear regression;

7) The regression equation should not be applied beyond the range of the data set.
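Caution 3) can be illustrated by changing the unit of X. In the sketch below the father's heights of Example 8.1 are re-expressed in metres, a hypothetical change made only for illustration: the slope b grows a hundredfold while r stays the same, so the size of b says nothing about the strength of the correlation.

```python
import math

def slope_and_r(x, y):
    """Return the regression coefficient b and the correlation coefficient r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    l_xx = sum((xi - x_bar) ** 2 for xi in x)
    l_yy = sum((yi - y_bar) ** 2 for yi in y)
    l_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return l_xy / l_xx, l_xy / math.sqrt(l_xx * l_yy)

FATHER = [150, 153, 155, 158, 161, 164, 165, 167, 168, 169,
          170, 171, 172, 174, 175, 177, 178, 181, 183, 185]
SON = [159, 157, 163, 166, 169, 170, 169, 167, 169, 170,
       173, 170, 170, 176, 178, 174, 173, 178, 176, 180]

b_cm, r_cm = slope_and_r(FATHER, SON)

# Same fathers, heights in metres instead of centimetres.
father_m = [xi / 100 for xi in FATHER]
b_m, r_m = slope_and_r(father_m, SON)

print(f"X in cm: b = {b_cm:.4f}, r = {r_cm:.4f}")
print(f"X in m:  b = {b_m:.2f}, r = {r_m:.4f}")
```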

Page 27: Ch8 Regression Revby Rao