CIVL 7012/8012 - Memphis Linear...𝑖= observed value of dependent variable (tip amount)....

CIVL 7012/8012 Simple Linear Regression Lecture 2


Page 1

CIVL 7012/8012

Simple Linear Regression

Lecture 2

Page 2

Correlation

• Correlation is the degree to which two continuous variables are linearly associated.

• It is most often represented by a scatterplot and the Pearson correlation coefficient, denoted by \(r\).

• The scatterplot provides a visual picture of how the two continuous variables are correlated.

• The coefficient is a measure of the linear association between the two variables.

Page 3

Correlation

• If there is no correlation between the two variables, the points will form a horizontal or vertical line, or show complete randomness (no obvious pattern).

• Note that it does not matter which variable is on the x-axis and which is on the y-axis.

• The pattern the two variables form determines the strength and direction of their correlation.

Page 4

Correlation

• The stronger the correlation, the more closely the points follow a straight line.

• The coefficient ranges from -1 to +1:

+1 indicates a perfect positive correlation

-1 indicates a perfect negative correlation

0 indicates no linear correlation

• There are no strict rules for interpretation; however, as a guideline, it is suggested that:

\(0 \le |r| < 0.3\): weak correlation

\(0.3 \le |r| < 0.7\): moderate correlation

\(|r| \ge 0.7\): strong correlation

Page 5

Correlation

Snapshot from Multivariate Lecture 6

πœŒπ‘‹π‘Œ is the correlation notation for the entire population.

Pearson correlation coefficient (π‘Ÿ) is for our sample representing

the population.

π‘Ÿ = π‘₯𝑖 βˆ’ π‘₯ 𝑦𝑖 βˆ’ 𝑦

π‘₯𝑖 βˆ’ π‘₯ 2 𝑦𝑖 βˆ’ 𝑦 2

Page 6

Correlation calculation

Meal  Bill ($)  Tip ($)  Bill deviations  Tip deviations  Deviation products     Bill deviations squared  Tip deviations squared
      x         y        x_i - x̄          y_i - ȳ         (x_i - x̄)(y_i - ȳ)     (x_i - x̄)²               (y_i - ȳ)²
1     35        6        -37.5            -4              150                    1406.25                  16
2     110       18       37.5             8               300                    1406.25                  64
3     66        11       -6.5             1               -6.5                   42.25                    1
4     75        7        2.5              -3              -7.5                   6.25                     9
5     100       14       27.5             4               110                    756.25                   16
6     49        4        -23.5            -6              141                    552.25                   36
Sum                                                       687                    4169.5                   142

\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{687}{\sqrt{(4169.5)(142)}} = 0.892 \]
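The table's arithmetic can be checked with a short Python sketch of the deviation-product formula for r. The function name `pearson_r` is illustrative, and the bill and tip values are the six meals in the table. Unrounded, r is about 0.892834, which is the "Multiple R" in the Excel output later in this lecture. Swapping the arguments gives exactly the same value, which is why it does not matter which variable goes on which axis.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient from the deviation-product formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 687 for this data
    s_xx = sum((xi - x_bar) ** 2 for xi in x)                          # 4169.5
    s_yy = sum((yi - y_bar) ** 2 for yi in y)                          # 142
    return s_xy / math.sqrt(s_xx * s_yy)


bill = [35, 110, 66, 75, 100, 49]
tip = [6, 18, 11, 7, 14, 4]

print(round(pearson_r(bill, tip), 6))                # 0.892834
print(pearson_r(tip, bill) == pearson_r(bill, tip))  # True: variable order does not matter
```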

Page 7

Correlation significance test (t-test)

• Is it statistically significant?

• Conduct a t-test:

• \(H_0: \rho = 0\) vs. \(H_1: \rho \neq 0\) at \(\alpha = 0.05\)

• \(t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\), with df = n - 2

• \(t = \dfrac{0.892\sqrt{6-2}}{\sqrt{1-0.892^2}} = 3.947\), where \(r = 0.892\)

Page 8

Correlation significance test (t-test)


β€’ π‘‘π‘π‘Žπ‘™π‘ > π‘‘π‘π‘Ÿπ‘–π‘‘. βˆ’βˆ’β†’ π‘Ÿπ‘’π‘—π‘’π‘π‘‘ 𝑛𝑒𝑙𝑙

Page 9

SLR Lecture 1 Recap

Page 10

Recap - Quick Review

• SLR is a comparison of two models:

• one in which the independent variable does not exist,

• and one that uses the best-fit regression line.

• If there is only one variable (the dependent variable), the best prediction for any value is the mean of the dependent variable.

• The vertical distance between the best-fit line and an observed value is called the residual (or error).

• The residuals are squared and added together to give the sum of squared residuals/errors (SSE).

• SLR finds the best-fitting line through the data, i.e., the line that minimizes the SSE.

Page 11

Recap - Example

[Figure: "Tips for service ($)", tip ($) plotted against meal # (1 to 6), with the mean \(\bar{y} = 10\) drawn as the best-fit line]

Meal #  Tip ($)
1       6
2       18
3       11
4       7
5       14
6       4

Page 12

Recap - Residuals (Errors)

[Figure: "Tips for service ($)", residuals of each tip from the mean line \(\bar{y} = 10\) (-4, +8, +1, -3, +4, -6) with their squares (16, 64, 1, 9, 16, 36)]

Squared Residuals (Errors)

#  Residual  Residual²
1  -4        16
2  +8        64
3  +1        1
4  -3        9
5  +4        16
6  -6        36

Sum of squared errors: \(SSE = \sum \text{Residuals}^2 = 142\)
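With only the dependent variable, the prediction for every meal is the mean tip, so the residuals and the SSE follow directly. A minimal Python check (variable names are illustrative):

```python
tips = [6, 18, 11, 7, 14, 4]

y_bar = sum(tips) / len(tips)              # 10.0: the mean-only prediction for every meal
residuals = [y - y_bar for y in tips]      # observed minus predicted
sse_mean_only = sum(e ** 2 for e in residuals)

print(residuals)       # [-4.0, 8.0, 1.0, -3.0, 4.0, -6.0]
print(sse_mean_only)   # 142.0
print(sum(residuals))  # 0.0: deviations from the mean always cancel out
```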

Page 13

Recap – Population vs. Sample Eq.

• If we knew the "population" parameters \(\beta_0\) and \(\beta_1\), we could use the SLR equation as is:

\[ E(y) = \beta_0 + \beta_1 x \]

• In reality, we almost never have the population parameters, so we have to estimate them from sample data. With sample data, the SLR equation changes slightly:

\[ \hat{y} = b_0 + b_1 x \]

• where \(\hat{y}\) ("y-hat") is the point estimator of \(E(y)\),

• i.e., \(\hat{y}\) is the estimated mean value of \(y\) for a given \(x\).

Page 14

Recap – OLS criterion

\(y_i\) = observed value of the dependent variable (tip amount).

\(\hat{y}_i\) = estimated (predicted) value of the dependent variable (predicted tip amount based on the regression model).

\[ \min \sum (y_i - \hat{y}_i)^2 \]

[Figure: observed and predicted tip amounts, with legend "observed" and "predicted"]

Page 15

Recap - SLR parameter equations

\[ \hat{y}_i = b_0 + b_1 x \]

Slope:

\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

Intercept:

\[ b_0 = \bar{y} - b_1 \bar{x} \]

where

\(\bar{x}\) = mean of the independent variable ($ bill)

\(\bar{y}\) = mean of the dependent variable ($ tip)

\(x_i\) = value of the independent variable

\(y_i\) = value of the dependent variable
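The two parameter equations translate directly into code. Below is a minimal Python sketch, applied to the bill and tip data used in this recap. The function name `ols_fit` is hypothetical.

```python
def ols_fit(x, y):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # sum of deviation products
    s_xx = sum((xi - x_bar) ** 2 for xi in x)                          # sum of squared x deviations
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1


bill = [35, 110, 66, 75, 100, 49]
tip = [6, 18, 11, 7, 14, 4]
b0, b1 = ols_fit(bill, tip)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")  # b0 = -1.9457, b1 = 0.1648
```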

Page 16

Recap - OLS Calculations

Meal      Bill ($)   Tip ($)  Bill deviations (S_x)  Tip deviations  Deviation products     Bill deviations squared (S_x²)
          x          y        x_i - x̄                y_i - ȳ         (x_i - x̄)(y_i - ȳ)     (x_i - x̄)²
1         35         6        -37.5                  -4              150                    1406.25
2         110        18       37.5                   8               300                    1406.25
3         66         11       -6.5                   1               -6.5                   42.25
4         75         7        2.5                    -3              -7.5                   6.25
5         100        14       27.5                   4               110                    756.25
6         49         4        -23.5                  -6              141                    552.25
Mean/Sum  x̄ = 72.5   ȳ = 10                                          687                    4169.5

Page 17

Recap - OLS Calculations

Deviation products    Bill deviations squared
(x_i - x̄)(y_i - ȳ)    (x_i - x̄)²
150                   1406.25
300                   1406.25
-6.5                  42.25
-7.5                  6.25
110                   756.25
141                   552.25
Sum: 687              4169.5

\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{687}{4169.5} = 0.1648 \]

Page 18

Recap - OLS Calculations

π’ƒπŸŽ = 𝟏𝟎 βˆ’ 𝟎. πŸπŸ”πŸ’πŸ–(πŸ•πŸ. πŸ“)

π’ƒπŸ = 𝟎. πŸπŸ”πŸ’πŸ–

π’ƒπŸŽ = π’š + π’ƒπŸπ’™

Bill ($) Tip ($)

𝒙 π’š

35 6

110 18

66 11

75 7

100 14

49 4

π‘₯ = 72.5 𝑦 = 10

π’ƒπŸŽ = 𝟏𝟎 βˆ’ 𝟏𝟏. πŸ—πŸ’πŸ“πŸ•

π’ƒπŸŽ = βˆ’πŸ. πŸ—πŸ’πŸ“πŸ•

Page 19

Recap – New Best-Fit Line & Parameters

\[ \hat{y}_i = b_0 + b_1 x \]

\(b_0 = -1.9457\) (intercept)

\(b_1 = 0.1648\) (slope)

\[ \hat{y}_i = -1.9457 + 0.1648x \quad \text{or} \quad \hat{y}_i = 0.1648x - 1.9457 \]

Page 20

Recap - Final SLR line

[Figure: "Bill vs. Tip Amount ($)", tip ($) plotted against bill ($) with the fitted line \(\hat{y}_i = -1.9457 + 0.1648x\); intercept \(b_0 = -1.9457\), slope \(b_1 = 0.1648\)]

Page 21

Recap - SLR Model Interpretation

\[ \hat{y}_i = -1.9457 + 0.1648x \]

For every $1 increase in the bill amount (\(x\)), we expect the tip amount to increase by $0.1648, or about 16 cents (positive coefficient).

If the bill amount (\(x\)) is zero, the expected (predicted) tip amount is -$1.9457, or negative $1.95!

Does this make any sense? No. A $0 bill lies far outside the observed bills ($35 to $110), so the intercept is an extrapolation. In real-world problems, the intercept may or may not make sense.

Page 22

SLR – Lecture 2

Page 23

Model fit and Coefficient of Determination

[Figure: two panels, "Tips ($)" (tips by meal # with the mean line only) and "Bills vs Tips ($)" (tips vs. bill with the regression line)]

DV only (tips): \(SSE = SST = 142\). With only the DV, the only sum of squares is due to error. Therefore, it is also the total, and maximum, sum of squares for this data sample.

IV and DV (bills and tips): SST remains the same (\(SST = 142\)), but the SSE is reduced substantially. The difference between the SST and the SSE is due to regression (SSR):

\[ SST - SSE = SSR, \qquad SSE = ? \]

Page 24

Estimate regression values

Predicted tips from \(\hat{y}_i = -1.9457 + 0.1648x\), computed with the unrounded coefficients \(b_0 = -1.94568\) and \(b_1 = 0.164768\):

Meal  Bill ($)  Tip ($)  ŷ_i = -1.9457 + 0.1648x   ŷ_i (predicted tip $)
      x_i       y_i
1     35        6        -1.9457 + 0.1648(35)      3.8212
2     110       18       -1.9457 + 0.1648(110)     16.1788
3     66        11       -1.9457 + 0.1648(66)      8.9290
4     75        7        -1.9457 + 0.1648(75)      10.4119
5     100       14       -1.9457 + 0.1648(100)     14.5311
6     49        4        -1.9457 + 0.1648(49)      6.1280
      x̄ = 72.5  ȳ = 10

\[ \min \sum (y_i - \hat{y}_i)^2 \]
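The predicted column can be reproduced in Python. This sketch plugs in the unrounded coefficients b0 = -1.94568 and b1 = 0.164768 from the Excel output later in this lecture. With the rounded -1.9457 and 0.1648, the predictions shift slightly (for example, 3.8223 instead of 3.8212 for meal 1).

```python
B0, B1 = -1.94568, 0.164768  # unrounded intercept and slope (Excel output)

bill = [35, 110, 66, 75, 100, 49]
predicted = [B0 + B1 * x for x in bill]  # y-hat_i = b0 + b1 * x_i

for meal, (x, y_hat) in enumerate(zip(bill, predicted), start=1):
    print(f"meal {meal}: bill ${x:>3} -> predicted tip ${y_hat:.4f}")
```

A useful least-squares property shows up here as well. The mean of the predicted tips equals the observed mean tip, ȳ = 10.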

Page 25

Regression errors (residuals)

Meal  Bill ($)  Tip ($)  ŷ_i (predicted tip $)  Error y - ŷ_i (observed - predicted)
      x         y
1     35        6        3.8212                 6 - 3.8212 = 2.1788
2     110       18       16.1788                18 - 16.1788 = 1.8212
3     66        11       8.9290                 11 - 8.9290 = 2.0710
4     75        7        10.4119                7 - 10.4119 = -3.4119
5     100       14       14.5311                14 - 14.5311 = -0.5311
6     49        4        6.1280                 4 - 6.1280 = -2.1280
      x̄ = 72.5  ȳ = 10

Page 26

Regression errors (residuals) - SSE

Meal  Bill ($)  Tip ($)  ŷ_i (predicted tip $)  Error (y - ŷ_i)  (y - ŷ_i)²
      x         y
1     35        6        3.8212                 2.1788           4.7472
2     110       18       16.1788                1.8212           3.3168
3     66        11       8.9290                 2.0710           4.2890
4     75        7        10.4119                -3.4119          11.6412
5     100       14       14.5311                -0.5311          0.2821
6     49        4        6.1280                 -2.1280          4.5282
      x̄ = 72.5  ȳ = 10                                           SSE = 28.8044

Page 27

SSE comparison

Sum of squared errors (SSE) comparison

D.V. (tip $) only: SSE = 16 + 64 + 1 + 9 + 16 + 36 = 142

D.V. & I.V. (tip $ as a function of bill $): SSE = 4.7472 + 3.3168 + 4.2890 + 11.6412 + 0.2821 + 4.5282 ≈ 28.8044

Page 28

Comparison of two lines

• When we conducted the regression, the SSE decreased from 142 to 28.8044.

• The remaining 28.8044 is allocated to ERROR (unexplained).

• What happened to the difference (113.1956)?

• 113.1956 is the sum of squares due to REGRESSION (SSR).

• \(SST = SSR + SSE\)

• In this case: 142 = 113.1956 + 28.8044

Page 29

Comparison of two lines

[Figure: two panels, "Tips ($)" (mean line only) and "Bills vs Tips ($)" (regression line)]

DV only: \(SSE = SST = 142\)

IV and DV: \(SST = 142\), \(SSE = 28.8044\), \(SST - SSE = SSR = 113.1956\)

Page 30

Coefficient of Determination (\(r^2\))

• How well does the estimated regression equation fit our data?

• This is where regression starts to look a lot like ANOVA: the SST is partitioned into SSE and SSR.

• The larger the SSR, the smaller the SSE.

• The coefficient of determination quantifies the share of SST that is explained by the regression (SSR), expressed as a percentage (%).

[Diagram: SST partitioned into SSR and SSE]

\[ \text{Coefficient of Determination} = r^2 = \frac{SSR}{SST} \]

Page 31

Coefficient of Determination (\(r^2\))


ANOVA

Source      df  SS        MS        F        Significance F
Regression  1   113.1956  113.1956  15.7192  0.016611541
Residual    4   28.80441  7.201103
Total       5   142
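Each mean square in the ANOVA table is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. This Python sketch rebuilds those columns from SST = 142 and SSE = 28.80441, with k = 1 independent variable and n = 6 observations.

```python
n, k = 6, 1                  # observations, independent variables
sst, sse = 142.0, 28.80441   # total and error sums of squares
ssr = sst - sse              # regression sum of squares

df_reg, df_res, df_tot = k, n - k - 1, n - 1   # 1, 4, 5
ms_reg = ssr / df_reg        # mean square for regression
ms_res = sse / df_res        # mean square for residual (mean squared error)
f_stat = ms_reg / ms_res     # overall F statistic

print(f"Regression  df={df_reg}  SS={ssr:.4f}  MS={ms_reg:.4f}  F={f_stat:.4f}")
print(f"Residual    df={df_res}  SS={sse:.5f}  MS={ms_res:.6f}")
print(f"Total       df={df_tot}  SS={sst:g}")
```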

Page 32

π‘Ÿ2 Interpretation

β€’ πΆπ‘œπ‘’π‘“π‘“π‘–π‘π‘–π‘’π‘›π‘‘ π‘œπ‘“ π·π‘’π‘‘π‘’π‘Ÿπ‘šπ‘–π‘›π‘Žπ‘‘π‘–π‘œπ‘› = π‘Ÿ2 =𝑆𝑆𝑅

𝑆𝑆𝑇

β€’ πΆπ‘œπ‘’π‘“π‘“π‘–π‘π‘–π‘’π‘›π‘‘ π‘œπ‘“ π·π‘’π‘‘π‘’π‘Ÿπ‘šπ‘–π‘›π‘Žπ‘‘π‘–π‘œπ‘› = π‘Ÿ2 =113.1956

142

β€’ πΆπ‘œπ‘’π‘“π‘“π‘–π‘π‘–π‘’π‘›π‘‘ π‘œπ‘“ π·π‘’π‘‘π‘’π‘Ÿπ‘šπ‘–π‘›π‘Žπ‘‘π‘–π‘œπ‘› = π‘Ÿ2 = 0.7972 π‘œπ‘Ÿ 79.72%

β€’ We can conclude that 79.72% of the total sum of squares

can be explained using the estimates from the regression

equation to predict the tip amount. And that the remainder

(20.28%) is error.

β€’ This is a β€œGood fit”!
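The ratio is a one-line computation. The Python sketch below also compares it with the square of the unrounded Pearson r from the correlation slides (0.892834, which Excel reports as "Multiple R"). In simple linear regression the two agree, up to rounding.

```python
ssr, sst = 113.1956, 142.0
r_squared = ssr / sst                               # coefficient of determination
print(f"r^2 = {r_squared:.4f} ({r_squared:.2%})")   # r^2 = 0.7972 (79.72%)
print(f"unexplained share = {1 - r_squared:.2%}")   # unexplained share = 20.28%

r = 0.892834                                        # Pearson r for bill vs. tip
print(round(r ** 2, 4))                             # 0.7972
```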

Page 33

3 squared differences

[Figure: "Bills vs. Tips ($)", tip ($) vs. bill ($) with the regression line \(\hat{y}_i = -1.9457 + 0.1648x\) and the mean line \(\bar{y} = 10\)]

\[ SSE = \sum (y_i - \hat{y}_i)^2 \]

\[ SST = \sum (y_i - \bar{y})^2 \]

\[ SSR = \sum (\hat{y}_i - \bar{y})^2 \]
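The three definitions can be computed side by side. In this Python sketch, the fitted line is recomputed from the data so that nothing is rounded. This makes the identity SST = SSR + SSE hold to floating-point precision.

```python
bill = [35, 110, 66, 75, 100, 49]
tip = [6, 18, 11, 7, 14, 4]

n = len(bill)
x_bar, y_bar = sum(bill) / n, sum(tip) / n
s_xx = sum((x - x_bar) ** 2 for x in bill)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(bill, tip)) / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in bill]

sse = sum((y - yh) ** 2 for y, yh in zip(tip, y_hat))   # observed vs. fitted line
sst = sum((y - y_bar) ** 2 for y in tip)                 # observed vs. mean
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # fitted line vs. mean

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
```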

Page 34

Model fit

\[ \hat{y}_i = -1.9457 + 0.1648x \]

Questions:

• Once a regression line is calculated, how much better is it than using only the mean of the dependent variable? (coefficient of determination, \(r^2\))

• How confident are we that the relationship between \(x\) and \(y\) is significant? (t-test of the slope)

Page 35

Regression with Excel

• Produce the SLR model in Excel.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.892834
R Square            0.797152
Adjusted R Square   0.74644
Standard Error      2.683487
Observations        6

ANOVA
            df  SS        MS        F        Significance F
Regression  1   113.1956  113.1956  15.7192  0.016611541
Residual    4   28.80441  7.201103
Total       5   142

              Coefficients  Standard Error  t Stat    P-value   Lower 95%     Upper 95%    Lower 95.0%   Upper 95.0%
Intercept     -1.94568      3.205964        -0.60689  0.576683  -10.84685887  6.955504991  -10.84685887  6.955504991
X Variable 1  0.164768      0.041558        3.964745  0.016612  0.049383684   0.280152232  0.049383684   0.280152232
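The "Standard Error", coefficient standard errors and 95% bounds in the Excel output can be reproduced from the data. This Python sketch uses the standard SLR formulas for the coefficient standard errors and a t-table critical value of 2.776 (df = 4). Excel uses the unrounded critical value, so the bounds here differ slightly from Excel's.

```python
import math

bill = [35, 110, 66, 75, 100, 49]
tip = [6, 18, 11, 7, 14, 4]
T_CRIT = 2.776  # t-table value t_{0.975, 4} (alpha = 0.05, n - 2 = 4 df)

n = len(bill)
x_bar, y_bar = sum(bill) / n, sum(tip) / n
s_xx = sum((x - x_bar) ** 2 for x in bill)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(bill, tip)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(bill, tip))
mse = sse / (n - 2)                                    # residual MS in the ANOVA table
std_error = math.sqrt(mse)                             # "Standard Error" under Regression Statistics
se_b1 = math.sqrt(mse / s_xx)                          # standard error of the slope
se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / s_xx))   # standard error of the intercept

print(f"Standard Error = {std_error:.6f}")
for name, est, se in (("Intercept", b0, se_b0), ("X Variable 1", b1, se_b1)):
    low, high = est - T_CRIT * se, est + T_CRIT * se
    print(f"{name:<13}{est:10.6f}{se:10.6f}  t = {est / se:8.5f}  95% CI [{low:.4f}, {high:.4f}]")
```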

Page 36

Testing slope -1

• Is the relationship between \(y\) and \(x\) significant?

• Test the slope \(\beta_1\) (two-tailed t-test).

• Remember: \(b_1\) is for our sample and \(\beta_1\) is for the population.

• We will use our sample slope \(b_1\) to test whether the true population slope \(\beta_1\) is significantly different from 0.

\[ \hat{y}_i = -1.9457 + 0.1648x \]

Page 37

Testing slope -2

Steps to conduct a t-test on the slope \(\beta_1\):

• Step 1: Specify the hypotheses:

• \(H_0: \beta_1 = 0\) vs. \(H_1: \beta_1 \neq 0\) at \(\alpha = 0.05\)

• Step 2: Determine the test statistic:

\[ t = \frac{b_1 - \beta_1}{SE_{b_1}} \]

• where \(\beta_1\) is the true coefficient for the entire population (0 under \(H_0\))

• and \(SE_{b_1} = \sqrt{\dfrac{SSE/(n-2)}{\sum (x_i - \bar{x})^2}}\) is the standard error of the slope \(b_1\)

Page 38

Testing slope -3

• Step 2 calculation:

• \(SE_{b_1} = \sqrt{\dfrac{SSE/(n-2)}{\sum (x_i - \bar{x})^2}} = \sqrt{\dfrac{28.8044/(6-2)}{4169.5}} = 0.0416\)

• \(t = \dfrac{b_1 - \beta_1}{SE_{b_1}} = \dfrac{0.1648 - 0}{0.0416} = 3.9615\) (Excel reports 3.9647, using unrounded values)

• Step 3: Quantify the evidence of the test

• Method 1: Critical value method

• Compare the calculated \(t\) to the critical \(t\):

• \(\pm t_{1-\alpha/2,\,n-2} = \pm t_{0.975,4}\)

\[ \hat{y}_i = -1.9457 + 0.1648x \]
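Steps 2 and 3 of the slope test can be sketched in Python. The function name `slope_t_test` is hypothetical, and the critical value 2.776 is the t-table value t_{0.975,4} used on these slides. Because SE is not rounded to 0.0416 here, t comes out near 3.9655 rather than 3.9615, but the decision is the same.

```python
import math


def slope_t_test(b1, sse, s_xx, n, t_crit, beta1_h0=0.0):
    """Two-tailed t-test of H0: beta1 = beta1_h0 for a simple linear regression slope."""
    se_b1 = math.sqrt((sse / (n - 2)) / s_xx)   # standard error of the slope
    t_calc = (b1 - beta1_h0) / se_b1
    reject = abs(t_calc) > t_crit               # critical value method
    return se_b1, t_calc, reject


# Values from the tip example; 2.776 is the t-table value t_{0.975, 4}
se_b1, t_calc, reject = slope_t_test(b1=0.1648, sse=28.8044, s_xx=4169.5, n=6, t_crit=2.776)
print(f"SE_b1 = {se_b1:.4f}, t = {t_calc:.4f}, reject H0: {reject}")
```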

Page 39

Testing slope -4

• Step 3: Quantify the evidence of the test

• Method 1: Critical value method

• Compare the calculated \(t\) to the critical \(t\) (remember \(\alpha = 0.05\)):

• \(\pm t_{1-\alpha/2,\,n-2} = \pm t_{0.975,4} = \pm 2.776\)

Page 40

Testing slope -5

• Step 3, Method 1: Critical value method

• Compare the calculated \(t\) to the critical \(t\) (remember \(\alpha = 0.05\)):

• \(t_{calculated} = 3.9615 > t_{critical} = 2.776\)

• \(t_{calculated}\) falls in the critical (rejection) region, so we reject the null hypothesis \(H_0: \beta_1 = 0\). This means \(\beta_1 \neq 0\), and there is a statistically significant relationship between \(x\) and \(y\).

[Figure: t distribution with a central area of 0.95 and a rejection region of 0.025 in each tail]

Page 41

Testing slope -6

• Step 3, Method 2: p-value method

• Compare the calculated (estimated) \(p\)-value to the desired significance level (remember \(\alpha = 0.05\)).

• \(p = 2P(t > |t_{computed}|) = 2P(t > 3.9615) \approx 0.017\), with 4 degrees of freedom (Excel reports a P-value of 0.016612 for the slope)

• A \(p\)-value of 0.017 < \(\alpha = 0.05\), so we reject the null hypothesis \(H_0: \beta_1 = 0\). This means \(\beta_1 \neq 0\), and there is a statistically significant relationship between \(x\) and \(y\).
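Statistical software computes this p-value directly, for example Excel's T.DIST.2T or SciPy's scipy.stats.t.sf. To stay within the Python standard library, the sketch below instead uses the closed-form Student t CDF. A closed form exists only for small integer degrees of freedom, and the formula used here is valid for df = 4 only, which is the df in this example.

```python
import math


def t_cdf_df4(t: float) -> float:
    """Student t CDF for exactly 4 degrees of freedom (closed form; not valid for other df)."""
    u = 1.0 + t * t / 4.0
    return 0.5 + 0.375 * (t / math.sqrt(u)) * (1.0 - t * t / (12.0 * u))


def two_tailed_p_df4(t_calc: float) -> float:
    """p = 2 * P(T > |t_calc|) for T following a t distribution with 4 df."""
    return 2.0 * (1.0 - t_cdf_df4(abs(t_calc)))


print(round(two_tailed_p_df4(3.9615), 3))    # 0.017  (slide's rounded t)
print(round(two_tailed_p_df4(3.964745), 4))  # 0.0166 (Excel's t Stat)
print(round(t_cdf_df4(2.776), 3))            # 0.975  (sanity check at the critical value)
```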

Page 42

SLR Example with R

• Start an R session

• Load the dataset "airquality" included in base R

• Explore and plot the data

• Run a simple linear regression model with

"Ozone" as the DV (\(y\))

"Temp" as the IV (\(x\))

• Follow along in the R session; the model results are as follows:

Page 43

SLR Example with R

• Dataset = airquality → 153 observations of 6 variables

• Start the R session and follow the instructions in the code

• Use simple linear regression to predict ozone levels ("Ozone") based on the temperature ("Temp").

ID  Ozone  Solar.R  Wind  Temp  Month  Day
1   41     190      7.4   67    5      1
2   36     118      8     72    5      2
3   12     149      12.6  74    5      3
4   18     313      11.5  62    5      4
5   NA     NA       14.3  56    5      5
6   28     NA       14.9  66    5      6
7   23     299      8.6   65    5      7
8   19     99       13.8  59    5      8
9   8      19       20.1  61    5      9
10  NA     194      8.6   69    5      10

Page 44

Step 1: scatter plot

Ozone  Temp
41     67
36     72
12     74
18     62
NA     56
28     66
23     65
19     59
8      61
NA     69

Page 45

STEP 3: CORRELATION (Ozone vs Temp)

• What is the correlation coefficient (\(r\)) for Ozone vs. Temp? (see R session) In this case, \(r = 0.698\).

• Is the relationship strong? MODERATE! → run the model (see R session)

Page 46

Model results (model m1)

• Model: \(y = \beta_0 + \beta_1 x\), estimated from the sample as \(\hat{y} = b_0 + b_1 x\)

• \(b_0 = -146.996\) (intercept), \(b_1 = +2.429\) (slope)

• Regression line for this model: \(\hat{y} = -146.996 + 2.429x\)

Page 47

Results interpretation (model m1) -1

Residuals:

• Residuals are the differences between the actual observed response values (Ozone levels in our case) and the response values predicted by the model.

• The "Residuals" section of the model output summarizes them in 5 points (Min, 1Q, Median, 3Q, Max) to help assess how well the model fits the data.

• For a well-fitting model, these points are roughly symmetric around the mean residual value of 0.

• We do not have very good symmetry here.

• So, the model predicts certain points that fall far away from the actual observed points.

Page 48

Results interpretation (model m1) -2

Model Coefficients:

• \(b_0 = -146.996\) (\(y\)-intercept)

No practical interpretation; mathematically, it is the predicted Ozone level when Temp = 0 °F, far outside the observed temperatures.

• \(b_1 = +2.429\) (slope)

For every 1 °F increase in temperature (\(x\)), the Ozone level is expected to increase by 2.429 units.

• Std. error = 0.2331

This is the standard error of the slope estimate: from sample to sample, the estimated slope would typically vary by about 0.2331 Ozone units per °F.

• t-value for "Temp" = coefficient / std. error = 2.429 / 0.233 = 10.418

The t-value is significant: Pr(>|t|) < 2e-16, which is significant at any conventional level (e.g., at the 0.001 level, i.e., 99.9% confidence).

Page 49

Results interpretation (model m1) -3

• Residual Standard Error = 23.71 on 114 degrees of freedom

• The Residual Standard Error is the average amount by which the response "Ozone" deviates from the true regression line (it estimates the standard deviation of the error term).

• In our example, the actual Ozone level deviates from the true regression line by approximately 23.71 units, on average.

• Degrees of freedom are the number of data points (observations) used minus 2 (one parameter each for the "intercept" and the "Temp" slope).

We started the model with 153 data points in the "airquality" dataset.

We removed 37 data points that were NAs.

We are left with 116 data points.

116 data points lead to (116 - 2 parameters) = 114 DF.
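The degrees-of-freedom bookkeeping can be illustrated on the ten rows of airquality shown earlier, using only Ozone and Temp. This Python sketch drops any row with a missing value, which is what R's lm() does under its default na.action. Applying the same count to the full dataset gives 153 - 37 = 116 rows and 114 residual df. The variable names are illustrative.

```python
# First ten airquality rows from the slides, as (Ozone, Temp); None marks an NA.
rows = [(41, 67), (36, 72), (12, 74), (18, 62), (None, 56),
        (28, 66), (23, 65), (19, 59), (8, 61), (None, 69)]

N_PARAMS = 2  # intercept and Temp slope

complete = [(oz, tp) for oz, tp in rows if oz is not None and tp is not None]
df_sample = len(complete) - N_PARAMS
print(len(complete), df_sample)   # 8 6

# Full dataset: 153 rows, 37 of them removed because of NAs
n_used = 153 - 37
print(n_used, n_used - N_PARAMS)  # 116 114
```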

Page 50

Results interpretation (model m1) -4

• \(R\)-squared = 0.4877 (\(R^2\) = coefficient of determination)

\(R^2\) varies from 0 to 1; in this case, 48.77% of the variation in \(y\) is explained by \(x\).

• Adjusted \(R^2\) = 0.4832

Adjusted \(R^2\) accounts for how many independent variables entered the model. It is typically lower than \(R^2\), depending on how much the additional independent variables (\(x\)'s) contribute to explaining \(y\).

A sharp drop from \(R^2\) to adjusted \(R^2\) indicates a poor model.
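The usual adjustment is adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. This short Python sketch checks the formula against both models in this lecture. The function name is illustrative.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


print(round(adjusted_r2(0.797152, n=6, k=1), 5))   # tips model (Excel reports 0.74644)
print(round(adjusted_r2(0.4877, n=116, k=1), 4))   # Ozone ~ Temp model (R reports 0.4832)
```

With only six observations, the penalty is large for the tips model (0.7972 drops to 0.7464). With 116 observations, it barely moves the Ozone model.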

\(F\)-test (the F-value measures the overall model significance):

• At the desired significance level (say 95% confidence, \(\alpha = 0.05\)), the \(F\)-test shows whether the model as a whole is statistically significant.

• In this model, the \(F\)-statistic = 108.5 on 1 and 114 degrees of freedom.

• The \(F\)-statistic p-value is Pr(>F) < 2.2e-16; that is, the \(F\)-statistic is significant at any reasonable level of significance (or you could say at 99.99% confidence).
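With a single independent variable, the overall F-test and the t-test of the slope are equivalent. F equals t², and the two p-values coincide: compare Excel's Significance F (0.016611541) with the slope P-value (0.016612). Here is a quick Python check on the t and F values reported in this lecture:

```python
# (slope t value, overall F statistic) as reported for the two single-predictor models
reported = {
    "tip ~ bill (Excel)": (3.964745, 15.7192),
    "Ozone ~ Temp (R)": (10.418, 108.5),
}

for model, (t_slope, f_stat) in reported.items():
    print(f"{model}: t^2 = {t_slope ** 2:.2f}, reported F = {f_stat}")
```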

Page 51

SLR – R code