Lecture 2 Regression relationships


RDP Statistical Methods in Scientific Research - Lecture 2 1

Lecture 2

Regression relationships

2.1 The influence of actual widths of the anorexics

2.2 Testing the importance of each influence

2.3 Comments on the anorexia study


2.1 The influence of actual widths of the anorexics

Anorexics               Controls
BPI   Actual width      BPI   Actual width
130   22.5              202   18.2
194   19.2              140   24.2
160   19.3              168   16.0
120   23.3              160   21.3
152   21.3              147   21.3
144   22.8              133   24.9
120   28.2              229   17.2
141   21.9              172   19.9
                        130   22.0
                        206   19.2
                        153   22.1


Scatter plot


Observations

BPI decreases with actual width

The controls have smaller waists than the anorexics!

Actual width appears to be a stronger determinant of BPI than anorexic status
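These observations can be checked directly from the data table. The sketch below is illustrative only: it assumes the last three rows of the table belong to the controls (8 anorexics, 11 controls), and the variable and function names are not from the lecture.

```python
from math import sqrt

# (BPI, actual width) pairs transcribed from the data table.
anorexics = [(130, 22.5), (194, 19.2), (160, 19.3), (120, 23.3),
             (152, 21.3), (144, 22.8), (120, 28.2), (141, 21.9)]
controls = [(202, 18.2), (140, 24.2), (168, 16.0), (160, 21.3),
            (147, 21.3), (133, 24.9), (229, 17.2), (172, 19.9),
            (130, 22.0), (206, 19.2), (153, 22.1)]


def mean(values):
    return sum(values) / len(values)


def correlation(pairs):
    """Pearson correlation between BPI and actual width."""
    ys = [y for y, _ in pairs]
    xs = [x for _, x in pairs]
    my, mx = mean(ys), mean(xs)
    sxy = sum((x - mx) * (y - my) for y, x in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


mean_aw_anorexics = mean([x for _, x in anorexics])
mean_aw_controls = mean([x for _, x in controls])
mean_bpi_anorexics = mean([y for y, _ in anorexics])
mean_bpi_controls = mean([y for y, _ in controls])
r_all = correlation(anorexics + controls)

print(f"mean actual width: anorexics {mean_aw_anorexics:.2f}, controls {mean_aw_controls:.2f}")
print(f"mean BPI:          anorexics {mean_bpi_anorexics:.1f}, controls {mean_bpi_controls:.1f}")
print(f"correlation of BPI with actual width (all subjects): {r_all:.2f}")
```

A negative correlation corresponds to the first observation, and the smaller mean actual width among the controls to the second.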


Five models for the data

1  INTERCEPT                             Neither anorexia nor actual width affects BPI
2  INTERCEPT + GROUP                     Anorexia affects BPI, but actual width does not
3  INTERCEPT + AW                        Actual width affects BPI, but anorexia does not
4  INTERCEPT + GROUP + AW                Anorexia and actual width affect BPI additively
5  INTERCEPT + GROUP + AW + INTERACTION  Anorexia and actual width affect BPI non-additively

Which model fits the data best?

How can we judge?

How should we play off goodness-of-fit against complexity?


Residuals

The residuals are the vertical distances between the observed points and the fitted model:

$$\text{residual} = \mathrm{BPI}_{\text{observed}} - \mathrm{BPI}_{\text{fitted}}$$

For example, for Model 4 we have:


Showing only the residuals, we have:


Moving them all down to 0 gives:

The goodness-of-fit of a model is assessed in terms of the residual sum of squares, RSS (the smaller, the better):

$$\mathrm{RSS} = \sum \text{residual}^2$$
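As a minimal sketch of this definition, the snippet below computes the residuals and RSS for the simplest possible fit, a single mean for all 19 subjects (Model 1 in the list of five models). The BPI values are transcribed from the data table; the function name `rss` is illustrative.

```python
# BPI values for all 19 subjects (8 anorexics followed by 11 controls).
bpi = [130, 194, 160, 120, 152, 144, 120, 141,
       202, 140, 168, 160, 147, 133, 229, 172, 130, 206, 153]


def rss(observed, fitted):
    """Residual sum of squares: the sum of (observed - fitted)^2."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))


grand_mean = sum(bpi) / len(bpi)
fitted = [grand_mean] * len(bpi)   # intercept-only fit: same value for everyone
residuals = [o - f for o, f in zip(bpi, fitted)]
rss_intercept_only = rss(bpi, fitted)

print(f"fitted value = {grand_mean:.1f}")
print(f"RSS          = {rss_intercept_only:.2f}")
```

The same `rss` function applies to any of the five models once its fitted values are available; only the `fitted` list changes.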


Model fits

degrees-of-freedom (df) = n − number of parameters

Goodness-of-fit improves as terms are added to the model, but model complexity (the number of parameters) increases, which is a bad thing

         Anorexics           Controls
Model    intercept  slope    intercept  slope    RSS        df
1        157.9       0       157.9       0       16952.95   18
2        145.1       0       167.3       0       14681.06   17
3        338.5      -8.47    338.5      -8.47     6368.05   17
4        324.2      -8.03    332.4      -8.03     6087.24   16
5        296.2      -6.77    346.0      -8.93     5936.12   15
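The RSS and df columns can be reproduced from the raw data with ordinary least squares. The following is a sketch under stated assumptions, not the lecture's own computation: it assigns the last three rows of the data table to the controls, takes the slopes as negative (BPI decreases with actual width), and uses closed-form sums of squares rather than a regression library. All function names are illustrative.

```python
# (BPI, actual width) pairs transcribed from the data table.
anorexics = [(130, 22.5), (194, 19.2), (160, 19.3), (120, 23.3),
             (152, 21.3), (144, 22.8), (120, 28.2), (141, 21.9)]
controls = [(202, 18.2), (140, 24.2), (168, 16.0), (160, 21.3),
            (147, 21.3), (133, 24.9), (229, 17.2), (172, 19.9),
            (130, 22.0), (206, 19.2), (153, 22.1)]


def mean(values):
    return sum(values) / len(values)


def sums(pairs):
    """Return (n, mean_x, mean_y, Sxx, Sxy) for (y, x) pairs."""
    ys = [y for y, _ in pairs]
    xs = [x for _, x in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for y, x in pairs)
    return len(pairs), mx, my, sxx, sxy


def rss_of(pairs, intercept, slope):
    """Residual sum of squares of the line intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for y, x in pairs)


def fit_models(group_a, group_b):
    """Least-squares fits of Models 1 to 5 for two groups."""
    n, mx, my, sxx, sxy = sums(group_a + group_b)
    _, mxa, mya, sxxa, sxya = sums(group_a)
    _, mxb, myb, sxxb, sxyb = sums(group_b)

    slope3 = sxy / sxx                            # one line for everyone
    slope4 = (sxya + sxyb) / (sxxa + sxxb)        # common slope, pooled within groups
    slope5a, slope5b = sxya / sxxa, sxyb / sxxb   # a separate line per group

    # (model, (intercept, slope) for group a, same for group b, number of parameters)
    specs = [
        (1, (my, 0.0), (my, 0.0), 1),
        (2, (mya, 0.0), (myb, 0.0), 2),
        (3, (my - slope3 * mx, slope3), (my - slope3 * mx, slope3), 2),
        (4, (mya - slope4 * mxa, slope4), (myb - slope4 * mxb, slope4), 3),
        (5, (mya - slope5a * mxa, slope5a), (myb - slope5b * mxb, slope5b), 4),
    ]
    fits = {}
    for model, pa, pb, n_params in specs:
        total = rss_of(group_a, *pa) + rss_of(group_b, *pb)
        fits[model] = {"anorexics": pa, "controls": pb, "rss": total, "df": n - n_params}
    return fits


fits = fit_models(anorexics, controls)
for model, fit in fits.items():
    (ia, sa), (ib, sb) = fit["anorexics"], fit["controls"]
    print(f"{model}  {ia:7.1f} {sa:6.2f}  {ib:7.1f} {sb:6.2f}  {fit['rss']:9.2f}  {fit['df']}")
```

With the data as transcribed, the RSS, df and slope columns agree with the table. The one exception seems to be the Model 5 intercept for the controls, which this sketch puts nearer 351.0 than the tabulated 346.0, so that entry may be worth checking against the original slides.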


2.2 Testing the importance of each influence

Interaction

We start with the most complex model (Model 5), and see whether it can be simplified

That is, we test $H_0$: there is no aw × group interaction (Model 4 is valid)

If the observations are normally distributed and Model 4 is true, then $F_{\mathrm{int}}$ follows an F-distribution with (1, 15) degrees-of-freedom, that is $F_{\mathrm{int}} \sim F_{1,15}$, where

$$F_{\mathrm{int}} = \frac{(\mathrm{RSS}_4 - \mathrm{RSS}_5)/(\mathrm{df}_4 - \mathrm{df}_5)}{\mathrm{RSS}_5/\mathrm{df}_5}$$


Interaction

Large values of $F_{\mathrm{int}}$ indicate that $H_0$ is false

Here, we have

$$F_{\mathrm{int}} = \frac{(6087.24 - 5936.12)/(16 - 15)}{5936.12/15} = 0.38$$

This value is too small to suggest that interaction is important

The p-value is $p = P(F \ge 0.38)$ where $F \sim F_{1,15}$, and

p = 0.5459
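The interaction test can be reproduced from the RSS and df values alone. The sketch below is one way to do it with the Python standard library only. It relies on the fact that an F variable with (1, df) degrees of freedom is the square of a t variable with df degrees of freedom, and integrates the t density numerically with Simpson's rule. The function names and the step count are illustrative choices, not part of the lecture.

```python
from math import gamma, pi, sqrt


def f_statistic(rss_simple, df_simple, rss_complex, df_complex):
    """F statistic for comparing a simpler model with a more complex one."""
    return ((rss_simple - rss_complex) / (df_simple - df_complex)) / (rss_complex / df_complex)


def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def p_value_f1(f, df, steps=2000):
    """P(F >= f) for F ~ F(1, df), using F(1, df) = t(df) squared.

    The area under the t density between 0 and sqrt(f) is found with
    Simpson's rule (steps must be even); the two tails are what is left.
    """
    upper = sqrt(f)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)


f_int = f_statistic(6087.24, 16, 5936.12, 15)
p_int = p_value_f1(f_int, 15)
print(f"F_int = {f_int:.2f}, p = {p_int:.4f}")
```

Where SciPy is available, `scipy.stats.f.sf(f_int, 1, 15)` should give the same tail probability without the hand-written integration.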


Actual width

Take the model in which BPI depends on actual width only (Model 3), and see whether the effect of actual width is necessary

That is, we test $H_0$: actual width does not affect BPI, which means that Model 1 is valid

If the observations are normally distributed and Model 1 is true, then $F_{\mathrm{aw}} \sim F_{1,17}$, where

$$F_{\mathrm{aw}} = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_3)/(\mathrm{df}_1 - \mathrm{df}_3)}{\mathrm{RSS}_3/\mathrm{df}_3}$$


Actual width

We have

$$F_{\mathrm{aw}} = \frac{(16952.95 - 6368.05)/(18 - 17)}{6368.05/17} = 28.26$$

This value is too large to come from the $F_{1,17}$ distribution

The p-value is $p = P(F \ge 28.26)$ where $F \sim F_{1,17}$, and

p < 0.0001

$H_0$: actual width does not affect BPI is rejected


Group

Accepting that actual width is needed in the model, now take Model 4, and see whether it can be simplified by removing the effect of anorexia

That is, we test $H_0$: anorexia does not affect BPI (once aw is allowed for), which means that Model 3 is valid

If the observations are normally distributed and Model 3 is true, then $F_{\mathrm{group|aw}} \sim F_{1,16}$, where

$$F_{\mathrm{group|aw}} = \frac{(\mathrm{RSS}_3 - \mathrm{RSS}_4)/(\mathrm{df}_3 - \mathrm{df}_4)}{\mathrm{RSS}_4/\mathrm{df}_4}$$


Group

We have

$$F_{\mathrm{group|aw}} = \frac{(6368.05 - 6087.24)/(17 - 16)}{6087.24/16} = 0.74$$

This value is too small to suggest that group is important

The p-value is $p = P(F \ge 0.74)$ where $F \sim F_{1,16}$, and

p = 0.4030
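The three F statistics in this section all follow the same pattern: the drop in RSS per degree of freedom given up, divided by the residual mean square of the larger model. As a sketch, they can be reproduced in one pass from the RSS and df values in the Model fits table; the dictionary layout and names here are illustrative.

```python
# RSS and residual degrees of freedom from the Model fits table.
model_fits = {
    1: (16952.95, 18),
    2: (14681.06, 17),
    3: (6368.05, 17),
    4: (6087.24, 16),
    5: (5936.12, 15),
}


def f_statistic(simple, larger):
    """F statistic for reducing model `larger` to the nested model `simple`."""
    rss_s, df_s = model_fits[simple]
    rss_l, df_l = model_fits[larger]
    return ((rss_s - rss_l) / (df_s - df_l)) / (rss_l / df_l)


comparisons = {
    "interaction (Model 4 vs Model 5)": (4, 5),
    "actual width (Model 1 vs Model 3)": (1, 3),
    "group given aw (Model 3 vs Model 4)": (3, 4),
}
f_values = {label: f_statistic(s, l) for label, (s, l) in comparisons.items()}
for label, f in f_values.items():
    print(f"{label}: F = {f:.2f}")
```

Only the actual-width comparison produces a large F; the other two are below 1, meaning the extra term explains less variation than a typical residual degree of freedom.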


Final model

This is Model 3, which states that BPI has

$$\text{mean} = 338.5 - 8.47\,\mathrm{aw}$$

$$\text{standard deviation} = \sqrt{395.74} = 19.89$$

and that being anorexic has no significant effect on body perception index
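To see what the final model implies, the sketch below evaluates the fitted mean at a few actual widths. It takes the intercept as 338.5 and the slope as −8.47 (negative, in line with the observation that BPI decreases with actual width). The widths used are hypothetical values inside the observed range of 16.0 to 28.2, and the names are illustrative.

```python
from math import sqrt

INTERCEPT = 338.5           # Model 3 intercept
SLOPE = -8.47               # mean BPI falls by about 8.47 per unit of actual width
RESIDUAL_SD = sqrt(395.74)  # residual standard deviation quoted for the final model


def expected_bpi(actual_width):
    """Mean BPI under Model 3 for a given actual width (same line for both groups)."""
    return INTERCEPT + SLOPE * actual_width


for aw in (18.0, 22.0, 26.0):   # hypothetical widths within the observed range
    print(f"aw = {aw:4.1f}: mean BPI = {expected_bpi(aw):6.1f} (sd {RESIDUAL_SD:.2f})")
```

Because group does not enter Model 3, the same prediction applies to an anorexic and to a control with the same actual width.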


Order of fitting is important

Test interaction first: if this is significant, then the two main effects should not be tested, because Model 5 is needed to describe the data

Then determine whether actual width is needed in the model

As actual width is needed, test the effect of group (the factor that is of interest), by comparing Model 3 with Model 4

If actual width were not needed, test the effect of group by comparing Model 1 with Model 2
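The testing order above amounts to a small decision procedure, sketched here as a function. The significance threshold `alpha=0.05` is an assumption, since the slides do not state one, and the function and argument names are hypothetical.

```python
def choose_model(p_interaction, p_aw, p_group_given_aw, p_group_alone, alpha=0.05):
    """Follow the testing order on this slide and return the chosen model number.

    alpha is an assumed significance threshold; the lecture does not state one.
    """
    if p_interaction < alpha:
        return 5   # interaction needed: the main effects are not tested
    if p_aw < alpha:
        # Actual width is needed, so judge group by Model 3 versus Model 4.
        return 4 if p_group_given_aw < alpha else 3
    # Actual width is not needed, so judge group by Model 1 versus Model 2.
    return 2 if p_group_alone < alpha else 1


# p-values quoted on the slides; 0.0001 stands in for "p < 0.0001".
chosen = choose_model(p_interaction=0.5459, p_aw=0.0001,
                      p_group_given_aw=0.4030, p_group_alone=0.1232)
print(f"Chosen model: {chosen}")
```

The last argument is used only on the branch where actual width is not needed, which is why the Model 1 versus Model 2 comparison plays no part in the final choice for these data.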


Order of fitting is important

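To compare Model 1 with Model 2, find

$$F_{\mathrm{group}} = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)/(\mathrm{df}_1 - \mathrm{df}_2)}{\mathrm{RSS}_2/\mathrm{df}_2}$$

which is

$$F_{\mathrm{group}} = \frac{(16952.95 - 14681.06)/(18 - 17)}{14681.06/17} = 2.631$$

The p-value is $p = P(F \ge 2.63)$ where $F \sim F_{1,17}$, and

p = 0.1232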


Order of fitting is important

The t-statistic for testing the effect of anorexia shown on Slide 1.11 was equal to 1.622

The square of 1.622 is 2.631, which is equal to $F_{\mathrm{group}}$

This is no coincidence: these two tests are in fact identical

BUT, in this case, due to the important influence of actual width, any analysis that fails to account for aw is invalid
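The identity between the two tests can be checked numerically. The sketch below assumes the earlier t-test was the usual pooled-variance two-sample test, which is the version whose square equals the F statistic for Model 1 versus Model 2. BPI values are transcribed from the data table, with the last three rows assigned to the controls; the names are illustrative.

```python
from math import sqrt

# BPI values transcribed from the data table.
bpi_anorexics = [130, 194, 160, 120, 152, 144, 120, 141]
bpi_controls = [202, 140, 168, 160, 147, 133, 229, 172, 130, 206, 153]


def mean(values):
    return sum(values) / len(values)


def within_ss(values):
    """Sum of squared deviations from the values' own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


n1, n2 = len(bpi_anorexics), len(bpi_controls)
rss2 = within_ss(bpi_anorexics) + within_ss(bpi_controls)   # RSS of Model 2
rss1 = within_ss(bpi_anorexics + bpi_controls)              # RSS of Model 1
df2 = n1 + n2 - 2

# Pooled two-sample t statistic (controls minus anorexics).
pooled_var = rss2 / df2
t_stat = (mean(bpi_controls) - mean(bpi_anorexics)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# F statistic for Model 1 versus Model 2 (one extra parameter).
f_group = (rss1 - rss2) / (rss2 / df2)

print(f"t = {t_stat:.3f}, t^2 = {t_stat ** 2:.3f}, F_group = {f_group:.3f}")
```

The pooled variance in the t statistic is exactly the residual mean square of Model 2, which is why the two statistics coincide after squaring.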


2.3 Comments on the anorexia study

Choice of subjects

The anorexics were consecutive unmarried female patients at St George’s Hospital, London

The controls were volunteer fifth form pupils from Putney Girls’ High School, with normal dietary habits

Ages: Anorexics mean = 19.7, sd = 3.6
      Controls  mean = 15.4, sd = 0.5

This was not a suitable control group for this study: the controls were markedly younger than the anorexics and far more uniform in age