2DS00 Statistics 1 for Chemical Engineering Lecture 3.


Week schedule

Week 1: Measurement and statistics

Week 2: Error propagation

Week 3: Simple linear regression analysis

Week 4: Multiple linear regression analysis

Week 5: Nonlinear regression analysis

Detailed contents of week 3

• Least Squares Method

• simple linear regression

– parameter estimates

– residuals

– confidence intervals

– significance test

– influential points

– lack-of-fit

Least Squares

• measurements of time and distance

• estimate speed (assuming constant speed)

Time (s)   Measured distance   Calculated distance   Measured - calculated   Square
1          36.754              21                    15.754                  248.19
2          71.845              32                    39.845                  1587.62
3          60.479              43                    17.479                  305.52
4          101.149             54                    47.149                  2223.03
5          103.150             65                    38.15                   1455.42
6          111.148             76                    35.148                  1235.38
7          142.170             87                    55.17                   3043.73
8          157.334             98                    59.334                  3520.52
9          161.843             109                   52.843                  2792.38
10         206.030             120                   86.03                   7401.16
Sum of squares                                                               23812.96

Table of measurements and squares
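The table can be reproduced with a few lines of code. The sketch below assumes the model distance = S0 + v × time; the calculated column (21, 32, 43, ...) appears to correspond to the trial values S0 = 10 and v = 11, which is an inference from the numbers and not stated on the slide. The function names are illustrative.

```python
# Measured distances from the table; times are 1..10 seconds.
times = list(range(1, 11))
distances = [36.754, 71.845, 60.479, 101.149, 103.150,
             111.148, 142.170, 157.334, 161.843, 206.030]


def sum_of_squares(s0, v):
    """Sum of squared differences between measured and calculated distance."""
    return sum((d - (s0 + v * t)) ** 2 for t, d in zip(times, distances))


def least_squares_line(x, y):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Trial values inferred from the calculated column (assumption).
ss_trial = sum_of_squares(10.0, 11.0)

# Least squares estimates of starting position S0 and speed v.
s0_hat, v_hat = least_squares_line(times, distances)
ss_min = sum_of_squares(s0_hat, v_hat)

print(f"trial sum of squares:   {ss_trial:.2f}")
print(f"least squares S0, v:    {s0_hat:.3f}, {v_hat:.3f}")
print(f"minimal sum of squares: {ss_min:.2f}")
```

The least squares estimates are, by construction, the pair (S0, v) for which no other choice gives a smaller sum of squares.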

Visualisation of sums of squares

[Figure: sum of squares (axis from 1117 to 14965) plotted against parameter v (8.56 to 23.56) and parameter S0 (-10 to 50)]

Types of regression analysis

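Linear means linear in the coefficients, not necessarily a linear function of x!

• Simple linear regression: $Y = \beta_0 + \beta_1 x + \varepsilon$

• Multiple linear regression: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon$

• Non-linear regression: a model that is non-linear in its coefficients (the slide's example, a formula in Y and C, is not recoverable)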
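To make "linear in the coefficients" concrete, here are three illustrative models. They are examples chosen for this note, not formulas taken from the slides.

```latex
\begin{align*}
Y &= \beta_0 + \beta_1 x^2 + \varepsilon
  && \text{linear regression: linear in } \beta_0, \beta_1 \\
Y &= \beta_0 + \beta_1 \ln x + \varepsilon
  && \text{linear regression: linear in } \beta_0, \beta_1 \\
Y &= \beta_0 e^{\beta_1 x} + \varepsilon
  && \text{non-linear regression: non-linear in } \beta_1
\end{align*}
```

The first two are curved functions of x but can still be fitted with linear regression after replacing x by x² or ln x.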

Surface tension nitrobenzene

• measurements of temperature and surface tension

• temperature ranges from 40 to 200 °C

• scatter plot indicates linear relation

Regression analysis of nitrobenzene example

Confidence intervals

• parameter estimates: estimate ± $t_{14-2;\,0.025}$ × standard error

• predicted values (extrapolation is dangerous; predictions are most accurate at the mean of the independent variable)

-----------------------------------------------------------------------------
                               Standard          T
Parameter       Estimate         Error       Statistic        P-Value
-----------------------------------------------------------------------------
Intercept       45.9629        0.164407       279.568          0.0000
Slope          -0.113016       0.0014057      -80.3984         0.0000
-----------------------------------------------------------------------------
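As a worked example of "estimate ± t × standard error", the sketch below computes 95% confidence intervals from the output above. The critical value 2.179 is the tabulated t quantile for 14 - 2 = 12 degrees of freedom and an upper tail of 0.025; it is hard-coded here as an assumption so that no statistics library is needed.

```python
# Estimates and standard errors copied from the regression output.
estimates = {
    "Intercept": (45.9629, 0.164407),
    "Slope": (-0.113016, 0.0014057),
}

# Tabulated t quantile: 12 degrees of freedom, upper tail 0.025 (assumption).
T_CRIT = 2.179


def confidence_interval(estimate, std_error, t_crit=T_CRIT):
    """Return (lower, upper) limits of estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


intervals = {name: confidence_interval(est, se)
             for name, (est, se) in estimates.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: [{low:.4f}, {high:.4f}]")
```

Neither interval contains zero, which agrees with the very small P-values in the output.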

Extrapolation

[Figure: Plot of Fitted Model; x-axis TempC (100 to 300), y-axis OppSpan (surface tension, 12 to 36)]
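Why extrapolation is dangerous can be read off the standard textbook formula for the standard error of the predicted mean response at a point $x_0$. The symbols $s$ (residual standard deviation), $n$ (number of observations) and $S_{xx}$ are introduced here and do not appear on the slide.

```latex
\[
\operatorname{se}\bigl(\hat{Y}(x_0)\bigr)
  = s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}},
\qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 .
\]
```

The standard error is smallest at $x_0 = \bar{x}$ and grows as $x_0$ moves away from the mean. Beyond the measured range there is the additional risk that the linear model itself no longer holds.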

Significance testing

-----------------------------------------------------------------------------
                               Standard          T
Parameter       Estimate         Error       Statistic        P-Value
-----------------------------------------------------------------------------
Intercept       45.9629        0.164407       279.568          0.0000
Slope          -0.113016       0.0014057      -80.3984         0.0000
-----------------------------------------------------------------------------

Analysis of Variance
-----------------------------------------------------------------------------
Source          Sum of Squares    Df    Mean Square    F-Ratio     P-Value
-----------------------------------------------------------------------------
Model              374.924         1     374.924       6463.91      0.0000
Residual           0.696032       12     0.0580027
-----------------------------------------------------------------------------
Total (Corr.)      375.62         13
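The entries of the analysis of variance table are linked by simple arithmetic. The sketch below recomputes them from the sums of squares and degrees of freedom in the output; the variable names are illustrative, and R² is computed here although it is not shown on the slide.

```python
import math

# Values copied from the Analysis of Variance output.
ss_model, df_model = 374.924, 1
ss_resid, df_resid = 0.696032, 12
t_slope = -80.3984  # T statistic of the slope from the parameter table

ms_model = ss_model / df_model          # mean square of the model
ms_resid = ss_resid / df_resid          # mean square of the residuals
f_ratio = ms_model / ms_resid           # F statistic for the regression
r_squared = ss_model / (ss_model + ss_resid)
resid_sd = math.sqrt(ms_resid)          # estimate of the error standard deviation

print(f"residual mean square: {ms_resid:.7f}")
print(f"F-ratio:              {f_ratio:.2f}")
print(f"t_slope squared:      {t_slope ** 2:.2f}")
print(f"R squared:            {r_squared:.5f}")
print(f"residual std. dev.:   {resid_sd:.4f}")
```

In simple linear regression the F-ratio equals the square of the T statistic of the slope, so both tests lead to the same conclusion.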

Simple linear regression: model assumptions

Model: $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

Assumptions:

• the model is linear (+ enough terms)

• the $\varepsilon_i$'s are normally distributed with $\mu = 0$ and constant variance $\sigma^2$

• the $\varepsilon_i$'s are independent.

Normality checking + independence

• check normality by considering residuals

• apply both graphical checks and the Shapiro-Wilk test

• check independence by using the Durbin-Watson test

• also check residuals by plotting them against time
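The Durbin-Watson statistic is easy to compute by hand from residuals in time order. The sketch below uses the standard definition; both residual series are made-up illustrations, not data from the lecture.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals (residuals in time order)."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that drift slowly: positive autocorrelation.
drifting = [-1.0, -0.8, -0.5, -0.1, 0.3, 0.6, 0.9, 1.1]

# Hypothetical residuals that alternate in sign: negative autocorrelation.
alternating = [1.0, -1.0, 1.0, -1.0]

print(durbin_watson(drifting))
print(durbin_watson(alternating))
```

Values near 2 are consistent with independent errors, values near 0 indicate positive autocorrelation and values near 4 indicate negative autocorrelation. A formal test compares the statistic with tabulated bounds.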

Residuals

• use studentized residuals in order to obtain universal scale

Plot             Checks
e versus Y       homogeneity of variance
e versus Y       linearity
e versus time    independence of errors
e versus xi      homogeneity of variance
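A minimal sketch of how studentized residuals can be computed for a straight-line fit. It assumes the internally studentized definition, e_i / (s √(1 - h_ii)); the slides do not say which variant is meant. The data are the time and distance measurements from the least squares example.

```python
import math


def studentized_residuals(x, y):
    """Fit y = b0 + b1*x by least squares and return
    (internally studentized residuals, leverages h_ii)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    # Leverage of each point in simple linear regression.
    leverages = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    studentized = [e / (s * math.sqrt(1 - h))
                   for e, h in zip(residuals, leverages)]
    return studentized, leverages


times = list(range(1, 11))
distances = [36.754, 71.845, 60.479, 101.149, 103.150,
             111.148, 142.170, 157.334, 161.843, 206.030]

stud, lev = studentized_residuals(times, distances)
for t, r, h in zip(times, stud, lev):
    print(f"t={t:2d}  studentized residual={r:6.2f}  leverage={h:.3f}")
```

Because the residuals are divided by their estimated standard deviation, values can be judged on the same scale for any data set; absolute values well above 2 deserve a closer look.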

Lack-of-fit test

• if multiple measurements at the same x are available, then we may test whether the model can be improved significantly

• the test is based on two different ways of computing the standard deviation

• note the difference with testing whether the model is significant
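The two ways of estimating the variance can be sketched as follows: the pure error comes from the spread of replicate measurements around their own group mean, and the lack-of-fit part comes from what remains of the residual sum of squares. The data are hypothetical (roughly quadratic, two replicates per x) and the function names are illustrative.

```python
from collections import defaultdict


def fit_line(x, y):
    """Least squares (intercept, slope) for y = b0 + b1*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def lack_of_fit(x, y):
    """Split the residual sum of squares into pure error and lack of fit."""
    n = len(x)
    intercept, slope = fit_line(x, y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2
                   for xi, yi in zip(x, y))

    # Group the replicate measurements by their x value.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)

    # Pure error: variation of replicates around their own group mean.
    ss_pure = sum(sum((yi - sum(g) / len(g)) ** 2 for yi in g)
                  for g in groups.values())
    m = len(groups)                      # number of distinct x values
    ss_lof = ss_resid - ss_pure
    df_lof, df_pure = m - 2, n - m
    f_ratio = (ss_lof / df_lof) / (ss_pure / df_pure)
    return {"ss_lof": ss_lof, "ss_pure": ss_pure,
            "df_lof": df_lof, "df_pure": df_pure, "F": f_ratio}


# Hypothetical data: a curved relation fitted with a straight line.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [1.1, 0.9, 4.2, 3.8, 9.1, 8.9, 16.2, 15.8]

result = lack_of_fit(x, y)
print(result)
```

A large F-ratio, compared with the F distribution with (df_lof, df_pure) degrees of freedom, suggests that the straight line misses structure in the data even when the regression itself is highly significant.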

Influential points

Regression lines tend to be pulled towards remote points; see http://www.stat.sc.edu/~west/javahtml/Regression.html

[Figure: scatter plot of Y against X with a group of points and one remote point labelled "Influential point"]
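The effect of a remote point can be shown by fitting the line with and without it. The data below are hypothetical: a flat cluster of five points and one remote point. The leverage formula is the standard one for simple linear regression.

```python
def fit_line(x, y):
    """Least squares (intercept, slope) for y = b0 + b1*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def leverage(x, index):
    """Leverage h_ii of the point x[index] in simple linear regression."""
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return 1 / n + (x[index] - x_mean) ** 2 / sxx


# Hypothetical cluster without a clear trend, plus one remote point.
x_cluster = [1.0, 1.5, 2.0, 2.5, 3.0]
y_cluster = [2.1, 1.9, 2.0, 2.2, 1.8]
x_all = x_cluster + [10.0]
y_all = y_cluster + [8.0]

_, slope_without = fit_line(x_cluster, y_cluster)
_, slope_with = fit_line(x_all, y_all)
h_remote = leverage(x_all, len(x_all) - 1)

print(f"slope without remote point: {slope_without:.3f}")
print(f"slope with remote point:    {slope_with:.3f}")
print(f"leverage of remote point:   {h_remote:.3f}")
```

A leverage close to 1 means the fitted line is almost forced through that point, so a single measurement largely determines the slope.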

Check-list

1. apply regression analysis

2. check whether regression is significant. If applicable, apply lack-of-fit test

3. study residual plots for constant variance

4. check for outliers

5. check normality of residuals (graphical checks, Shapiro-Wilk test)

6. check independence of residuals (residual plots, Durbin-Watson test)

7. check for influential points

Causality and regression

Significant regression results do not imply a causal relation!

Statistical results must be explained (afterwards) by chemical theory.
