Least Squares Regression Fitting a Line to Bivariate Data.


Transcript of Least Squares Regression Fitting a Line to Bivariate Data.

Page 1: Least Squares Regression Fitting a Line to Bivariate Data.

Least Squares Regression

Fitting a Line to Bivariate Data

Page 2: Least Squares Regression Fitting a Line to Bivariate Data.

Linear Relationships

Avg. occupants per car

1980: 6/car
1990: 3/car
2000: 1.5/car

By the year 2010 every fourth car will have nobody in it!

Food for Thought

What kind of mathematical relationship is there between year and avg. no. of occupants per car?

Why might the relationship break down by 2010?

Page 3: Least Squares Regression Fitting a Line to Bivariate Data.

Basic Terminology

Scatterplots, correlation: interested in association between 2 variables (assign x and y arbitrarily)

Least squares regression: does one quantitative variable explain or cause changes in another variable?

Page 4: Least Squares Regression Fitting a Line to Bivariate Data.

Basic Terminology (cont.)

Explanatory variable: explains or causes changes in the other variable; the x-variable (independent variable).

Response variable: the y-variable; it responds to changes in the x-variable (dependent variable).

Page 5: Least Squares Regression Fitting a Line to Bivariate Data.

Examples

Fertilizer (x), corn yield (y)
Advertising $ (x), store income (y)
Drug dose (x), blood pressure (y)
Daily temperature (x), natural gas demand (y)
Change in min wage (x), unemployment rate (y)

Page 6: Least Squares Regression Fitting a Line to Bivariate Data.

Simplest Relationship

Simplest equation that describes the dependence of variable y on variable x:

y = b0 + b1x

This is a linear equation; its graph is a line with slope b1 and y-intercept b0.

Page 7: Least Squares Regression Fitting a Line to Bivariate Data.

Graph

[Figure: graph of the line y = b0 + b1x against x, with the y-intercept b0 marked on the y-axis and the slope shown as b1 = rise/run.]

Page 8: Least Squares Regression Fitting a Line to Bivariate Data.

Notation

Data: (x1, y1), (x2, y2), . . . , (xn, yn)

Draw the line y = b0 + b1x through the scatterplot; the point on the line corresponding to xi is

\[ \hat{y}_i = b_0 + b_1 x_i \]

\(\hat{y}_i\) is the value of y predicted by the line \(\hat{y} = b_0 + b_1 x\) when \(x = x_i\);

\(y_i\) is the observed value of y when \(x = x_i\).

Page 9: Least Squares Regression Fitting a Line to Bivariate Data.

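Observed y, Predicted y

[Figure: scatterplot with a fitted line; the predicted y at x = 2.7 is marked on the line.]

Predicted y when x = 2.7: yhat = a + bx = a + b*2.7 (here a is the intercept b0 and b is the slope b1)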

Page 10: Least Squares Regression Fitting a Line to Bivariate Data.

Scatterplot: Fuel Consumption vs Car Weight

[Figure: scatterplot of fuel consumption (gal/100 miles) against car weight (1000 lbs).]

“Best” line?

Page 11: Least Squares Regression Fitting a Line to Bivariate Data.

Scatterplot with least squares prediction line

[Figure: FUEL CONSUMPTION vs CAR WEIGHT scatterplot with fitted line y = 1.639x - 0.3631, r^2 = 0.9538; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP. (gal/100 miles).]

Page 12: Least Squares Regression Fitting a Line to Bivariate Data.

How do we draw the line? Residuals

The i-th residual is the vertical deviation of the i-th data point from the line:

\[ y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i) \]

residual = observed y - predicted y

Page 13: Least Squares Regression Fitting a Line to Bivariate Data.

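Residuals: graphically

[Figure: Graphical Display of Residuals. At X = Xi the residual ei = Yi - Yhat_i is the vertical distance between the observed Yi and the predicted Yhat_i on the line; points above the line have a positive residual, points below it a negative residual.]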

Page 14: Least Squares Regression Fitting a Line to Bivariate Data.

Criterion for choosing what line to draw: method of least squares

The method of least squares chooses the line that makes the sum of squares of the residuals as small as possible.

This line has slope b1 and intercept b0 that minimize

\[ \sum_{i=1}^{n} [y_i - (b_0 + b_1 x_i)]^2 \]

for the given observations \((x_i, y_i)\).

Page 15: Least Squares Regression Fitting a Line to Bivariate Data.

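Least Squares Line y = b0 + b1x: Slope b1 and Intercept b0

For the observations (x1, y1), (x2, y2), . . . , (xn, yn):

slope

\[ b_1 = r \frac{s_y}{s_x} \]

y intercept

\[ b_0 = \bar{y} - b_1 \bar{x} \]

where

\[ s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \]

is the standard deviation of x1, x2, ..., xn;

\[ s_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \]

is the standard deviation of y1, y2, ..., yn;

\[ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y} \]

is the correlation between x and y.

\[ SSE = \sum_{i=1}^{n} y_i^2 - b_0 \sum_{i=1}^{n} y_i - b_1 \sum_{i=1}^{n} x_i y_i \]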
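The slope and intercept formulas above translate directly into a few lines of code. The following is a minimal Python sketch, not part of the original slides; the function name `least_squares_line` and the three made-up points in the usage lines are illustrative assumptions.

```python
import math


def least_squares_line(xs, ys):
    """Return (b0, b1, r) for the least squares line y = b0 + b1*x.

    Hypothetical helper: follows the slide formulas b1 = r * s_y / s_x
    and b0 = y_bar - b1 * x_bar.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations use the divisor n - 1.
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when x or y has no variation")
    # Correlation between x and y.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = s_xy / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1, r


# Made-up points that lie exactly on y = 1 + 2x, so the fit should
# recover intercept 1 and slope 2 up to floating-point rounding.
b0, b1, r = least_squares_line([0, 1, 2], [1, 3, 5])
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, r = {r:.3f}")
```

The function returns r along with the coefficients because the slides reuse r later when judging how useful the line is.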

Page 16: Least Squares Regression Fitting a Line to Bivariate Data.

Example: Income vs Consumption Expenditure

Income (x)   Consumption Expenditure (y)     (both in $1,000's)
 1            7
 5            6
 9            9
13            8
17           10

Page 17: Least Squares Regression Fitting a Line to Bivariate Data.

Questions

Construct scatterplot; determine if linear model is appropriate. If so …

… find the least squares prediction line Estimate consumption expenditure in a

household with an income of (i) $6,000 (ii) $25,000. Comfortable with estimates?

Compute the residuals

Page 18: Least Squares Regression Fitting a Line to Bivariate Data.

Scatterplot

[Figure: Consumption Expenditure scatterplot; x-axis: Household Income ($1,000's), y-axis: Expenditure ($1,000's).]

Page 19: Least Squares Regression Fitting a Line to Bivariate Data.

Solution

Inc. x   Exp. y   xi-xbar   (xi-xbar)^2   yi-ybar   (yi-ybar)^2   (xi-xbar)(yi-ybar)
 1        7        -8        64            -1        1              8
 5        6        -4        16            -2        4              8
 9        9         0         0             1        1              0
13        8         4        16             0        0              0
17       10         8        64             2        4             16
Sum:
45       40         0       160             0       10             32

\[ \bar{x} = \frac{45}{5} = 9; \quad \bar{y} = \frac{40}{5} = 8 \]

\[ s_x = \sqrt{\frac{160}{4}} = \sqrt{40} = 6.325; \quad s_y = \sqrt{\frac{10}{4}} = \sqrt{2.5} = 1.581 \]

\[ r = \frac{32}{4(6.325)(1.581)} = .8 \]

Page 20: Least Squares Regression Fitting a Line to Bivariate Data.

Calculations

\[ b_1 = r \frac{s_y}{s_x} = .8 \left( \frac{1.581}{6.325} \right) = .2 \]

\[ b_0 = \bar{y} - b_1 \bar{x} = 8 - .2(9) = 8 - 1.8 = 6.2 \]

least squares prediction line:

\[ \hat{y} = 6.2 + .2x \]

Page 21: Least Squares Regression Fitting a Line to Bivariate Data.

least squares prediction line

\[ \hat{y} = b_0 + b_1 x = 6.2 + .2x \]

income = $6,000, x = 6:

\[ \hat{y} = 6.2 + .2(6) = 7.4 \]

($7,400)

income = $25,000, x = 25:

\[ \hat{y} = 6.2 + .2(25) = 11.2 \]

($11,200)
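The two predictions can be reproduced with a small helper that also reports whether x falls inside the range of the observed incomes (1 to 17, in $1,000's). This is a sketch only; the function name `predict` and the idea of returning an extrapolation flag are assumptions made for illustration, not part of the slides.

```python
def predict(x, b0=6.2, b1=0.2, x_min=1, x_max=17):
    """Return (y_hat, is_extrapolation) for the line y_hat = b0 + b1*x.

    Hypothetical helper; x_min and x_max are the smallest and largest
    incomes in the five-household data set.
    """
    y_hat = b0 + b1 * x
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation


for income in (6, 25):
    y_hat, extrapolating = predict(income)
    note = "outside the observed x range" if extrapolating else "within the observed x range"
    print(f"x = {income}: predicted expenditure {y_hat:.1f} ({note})")
```

The flag is one way to answer the "Comfortable with estimates?" question: x = 25 lies well beyond the largest observed income, so that estimate deserves less trust than the one at x = 6.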

Page 22: Least Squares Regression Fitting a Line to Bivariate Data.

Least Squares Prediction Line

[Figure: Consumption Expenditure scatterplot with fitted line y = 6.2 + 0.2x; x-axis: Household Income ($1,000's), y-axis: Expenditure ($1,000's).]

Page 23: Least Squares Regression Fitting a Line to Bivariate Data.

Consumption Expenditure Prediction When x=$6,000

[Figure: Consumption Expenditure scatterplot with fitted line y = 6.2 + 0.2x and the prediction (6, 7.4) marked; x-axis: Household Income ($1,000's), y-axis: Expenditure ($1,000's).]

Page 24: Least Squares Regression Fitting a Line to Bivariate Data.

Consumption Expenditure Prediction When x=$25,000

[Figure: Consumption Expenditure scatterplot with fitted line y = 6.2 + 0.2x extended to the prediction (25, 11.2); x-axis: Household Income ($1,000's), y-axis: Expenditure ($1,000's).]

Page 25: Least Squares Regression Fitting a Line to Bivariate Data.

The least squares line always goes through the point with coordinates (xbar, ybar)

(xbar, ybar) = (9, 8)

Page 26: Least Squares Regression Fitting a Line to Bivariate Data.

C. Compute the Residuals

Inc. x   ConE y   yhat=6.2+.2x   y - yhat   (y - yhat)^2
 1        7        6.4             .6         .36
 5        6        7.2           -1.2        1.44
 9        9        8              1          1
13        8        8.8            -.8         .64
17       10        9.6             .4         .16

Sum of residuals = 0; sum of (residuals)^2 = 3.6
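The residual table can be checked with a short script. This is an illustrative Python sketch using the five (income, expenditure) pairs and the fitted line from the slides; the variable names are assumptions.

```python
# Household income and consumption expenditure, both in $1,000's.
xs = [1, 5, 9, 13, 17]
ys = [7, 6, 9, 8, 10]
b0, b1 = 6.2, 0.2  # least squares line y_hat = 6.2 + 0.2x

predicted = [b0 + b1 * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]
sse = sum(e ** 2 for e in residuals)

for x, y, y_hat, e in zip(xs, ys, predicted, residuals):
    print(f"x = {x:2d}  y = {y:2d}  y_hat = {y_hat:4.1f}  residual = {e:5.2f}")
print(f"sum of residuals = {sum(residuals):.2f}, sum of squared residuals = {sse:.2f}")
```

Because of floating-point rounding the residual sum comes out as a number extremely close to 0 rather than exactly 0, which is why the script formats it to two decimals.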

Page 27: Least Squares Regression Fitting a Line to Bivariate Data.

Residuals

[Figure: Consumption Expenditure scatterplot with fitted line y = 6.2 + 0.2x, showing the residuals; x-axis: Household Income ($1,000's), y-axis: Expenditure ($1,000's).]

Page 28: Least Squares Regression Fitting a Line to Bivariate Data.

Income Residual Plot

[Figure: residual plot; x-axis: Income, y-axis: Residuals (scale -2 to 2).]

Page 29: Least Squares Regression Fitting a Line to Bivariate Data.

Sum of residuals, sum of (residuals)^2

Note that
* sum of residuals = 0
* sum of (residuals)^2 = 3.6
* From formula in box on p. 7:

\[ SSE = \sum y_i^2 - b_0 \sum y_i - b_1 \sum x_i y_i = 330 - 6.2(40) - .2(392) = 330 - 248 - 78.4 = 3.6 \]

Any other line drawn through the scatterplot will have sum of (residuals)^2 > 3.6
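Both claims on this slide can be spot-checked numerically. The Python sketch below evaluates the shortcut formula and compares the least squares line with three arbitrarily chosen alternative lines; the alternatives are made up for illustration, and a handful of comparisons is a sanity check, not a proof of minimality.

```python
xs = [1, 5, 9, 13, 17]
ys = [7, 6, 9, 8, 10]


def sum_sq_residuals(b0, b1):
    """Sum of squared residuals for the line y = b0 + b1*x on the data above."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Shortcut formula: SSE = sum(y^2) - b0*sum(y) - b1*sum(x*y).
sum_y2 = sum(y * y for y in ys)              # 330
sum_y = sum(ys)                              # 40
sum_xy = sum(x * y for x, y in zip(xs, ys))  # 392
shortcut = sum_y2 - 6.2 * sum_y - 0.2 * sum_xy

best = sum_sq_residuals(6.2, 0.2)

# Arbitrary alternative lines (hypothetical choices) for comparison.
others = {(b0, b1): sum_sq_residuals(b0, b1)
          for b0, b1 in [(6.0, 0.2), (6.2, 0.25), (5.0, 0.3)]}

print(f"shortcut SSE = {shortcut:.2f}, direct SSE = {best:.2f}")
for (b0, b1), value in others.items():
    print(f"line y = {b0} + {b1}x has sum of squared residuals {value:.4f}")
```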

Page 30: Least Squares Regression Fitting a Line to Bivariate Data.

Car Weight, Fuel Consumption Example, cont.

[Figure: FUEL CONSUMPTION vs CAR WEIGHT scatterplot; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP. (gal/100 miles).]

(xi, yi): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

Page 31: Least Squares Regression Fitting a Line to Bivariate Data.

Wt (x)   Fuel (y)   xi-xbar   (xi-xbar)^2   yi-ybar   (yi-ybar)^2   (xi-xbar)(yi-ybar)
3.4      5.5         .5        .25           1.11      1.2321         .555
3.8      5.9         .9        .81           1.51      2.2801        1.359
4.1      6.5        1.2       1.44           2.11      4.4521        2.532
2.2      3.3        -.7        .49          -1.09      1.1881         .763
2.6      3.6        -.3        .09           -.79       .6241         .237
2.9      4.6         0         0              .21       .0441         0
2.0      2.9        -.9        .81          -1.49      2.2201        1.341
2.7      3.6        -.2        .04           -.79       .6241         .158
1.9      3.1       -1.0       1             -1.29      1.6641        1.29
3.4      4.9         .5        .25            .51       .2601         .255
col. sum
29       43.9        0        5.18           0        14.589         8.49

Page 32: Least Squares Regression Fitting a Line to Bivariate Data.

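Calculations

\[ \bar{x} = 2.9; \quad \bar{y} = 4.39; \quad s_x = \sqrt{\frac{5.18}{9}} = .7587; \quad s_y = \sqrt{\frac{14.589}{9}} = 1.2732 \]

\[ r = \frac{8.49}{9(.7587)(1.2732)} = .9766 \]

\[ \text{slope } b_1 = r \frac{s_y}{s_x} = .9766 \left( \frac{1.2732}{.7587} \right) = 1.639 \]

\[ \text{intercept } b_0 = \bar{y} - b_1 \bar{x} = 4.39 - 1.639(2.9) = -.3631 \]

least squares prediction line:

\[ \hat{y} = b_0 + b_1 x = -.3631 + 1.639x \]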
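The summary statistics for the ten cars can be recomputed from the raw data. The following Python sketch follows the same steps as the slide (means, standard deviations, r, then slope and intercept); the variable names are assumptions, and small differences in the last digit come from the slide rounding intermediate values.

```python
import math

# Car weight (1000 lbs) and fuel consumption (gal/100 miles) for the ten cars.
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

n = len(weights)
x_bar = sum(weights) / n
y_bar = sum(fuel) / n

# Sums of squared deviations and of cross products (the column sums in the table).
s_xx = sum((x - x_bar) ** 2 for x in weights)
s_yy = sum((y - y_bar) ** 2 for y in fuel)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, fuel))

s_x = math.sqrt(s_xx / (n - 1))
s_y = math.sqrt(s_yy / (n - 1))
r = s_xy / ((n - 1) * s_x * s_y)

b1 = r * s_y / s_x       # slope
b0 = y_bar - b1 * x_bar  # intercept

print(f"x_bar = {x_bar:.2f}, y_bar = {y_bar:.2f}, s_x = {s_x:.4f}, s_y = {s_y:.4f}")
print(f"r = {r:.4f}, slope = {b1:.3f}, intercept = {b0:.4f}")
```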

Page 33: Least Squares Regression Fitting a Line to Bivariate Data.

Scatterplot with least squares prediction line

[Figure: FUEL CONSUMPTION vs CAR WEIGHT scatterplot with fitted line y = 1.639x - 0.3631, r^2 = 0.9538; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP. (gal/100 miles).]

Page 34: Least Squares Regression Fitting a Line to Bivariate Data.

The Least Squares Line Always Goes Through (xbar, ybar)

(xbar, ybar) = (2.9, 4.39)

Page 35: Least Squares Regression Fitting a Line to Bivariate Data.

Using the least squares line for prediction. Fuel consumption of 3,000 lb car? (x=3)

\[ \hat{y} = -.3631 + 1.639(3) = 4.5539 \]

[Figure: Fuel Consumption vs Car Weight: Scatterplot and Least Squares Line, y = -0.3631 + 1.639x, with the predicted point (3.0, 4.5539) marked; x-axis: CAR WEIGHT, y-axis: FUEL CONSUMPTION.]

Page 36: Least Squares Regression Fitting a Line to Bivariate Data.

Be Careful!

Fuel consumption of 500 lb car? (x = .5)

\[ \hat{y} = -.3631 + 1.639(.5) = .4564 \]

That is .4564 gal/100 miles (219 mpg).

[Figure: FUEL CONSUMPTION vs CAR WEIGHT scatterplot with fitted line y = 1.639x - 0.3631, r^2 = 0.9538; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP. (gal/100 miles).]

x = .5 is outside the range of the x-data that we used to determine the least squares line

Page 37: Least Squares Regression Fitting a Line to Bivariate Data.

Avoid GIGO! Evaluating the least squares line

1. Create scatterplot. Approximately linear?

2. Calculate r^2, the square of the correlation coefficient

3. Examine residual plot

Page 38: Least Squares Regression Fitting a Line to Bivariate Data.

r^2: The Variation Accounted For

The square of the correlation coefficient r gives important information about the usefulness of the least squares line

Page 39: Least Squares Regression Fitting a Line to Bivariate Data.

r^2: important information for evaluating the usefulness of the least squares line

The square of the correlation coefficient, r^2, is the fraction of the variation in y that is explained by the least squares regression of y on x.

-1 ≤ r ≤ 1 implies 0 ≤ r^2 ≤ 1

The square of the correlation coefficient, r^2, is the fraction of the variation in y that is explained by the variation in x.
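One way to see the "fraction of variation explained" reading is to compare the squared correlation with 1 - SSE/SST, where SST is the total variation of y about its mean. For a least squares line the two agree. The Python sketch below checks this on the income and expenditure data from the earlier example; the name SST and the variable names are assumptions introduced here, not notation from the slides.

```python
import math

# Income and consumption expenditure example, both in $1,000's.
xs = [1, 5, 9, 13, 17]
ys = [7, 6, 9, 8, 10]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xx = sum((x - x_bar) ** 2 for x in xs)
sst = sum((y - y_bar) ** 2 for y in ys)  # total variation in y
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least squares line; s_xy / s_xx is algebraically the same as r * s_y / s_x.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Variation left over after fitting the line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

r = s_xy / math.sqrt(s_xx * sst)
r_squared_from_r = r ** 2
r_squared_from_sse = 1 - sse / sst

print(f"r^2 from r: {r_squared_from_r:.2f}; 1 - SSE/SST: {r_squared_from_sse:.2f}")
```

Here SST is 10 and SSE is 3.6, so the line accounts for 64% of the variation in expenditure, matching r = .8.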

Page 40: Least Squares Regression Fitting a Line to Bivariate Data.

Example: car weight, fuel consumption

x=car weight, y=fuel consumption

r^2 = (.9766)^2 ≈ .95

About 95% of the variation in fuel consumption (y) is explained by the linear relationship between car weight (x) and fuel consumption (y).

What else affects fuel consumption?

– Driver, size of engine, tires, road, etc.

Page 41: Least Squares Regression Fitting a Line to Bivariate Data.

Example: SAT scores

[Figure: SAT Mean per State vs % Seniors Taking Test, scatterplot with fitted line y = -2.2375x + 1023.4, R^2 = 0.7542; x-axis: % of Seniors Taking Test, y-axis: Mean SAT Score.]

Page 42: Least Squares Regression Fitting a Line to Bivariate Data.

SAT scores: calculations

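\[ \bar{x} = 33.882, \quad s_x = 24.103, \quad \bar{y} = 947.549, \quad s_y = 62.1, \quad r = -.868 \]

\[ b_1 = r \frac{s_y}{s_x}, \quad b_0 = \bar{y} - b_1 \bar{x} \]

\[ \text{slope } b_1 = -.868 \left( \frac{62.1}{24.103} \right) = -2.23635 \]

\[ \text{intercept } b_0 = 947.549 - (-2.236)(33.882) = 1023.309 \]

least squares prediction line:

\[ \hat{y} = 1023.309 - 2.236x \]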

Page 43: Least Squares Regression Fitting a Line to Bivariate Data.

SAT scores: result

[Figure: SAT Mean per State vs % Seniors Taking Test, scatterplot with fitted line y = -2.2375x + 1023.4, R^2 = 0.7542; x-axis: % of Seniors Taking Test, y-axis: Mean SAT Score.]

r^2 = (-.868)^2 = .7534

If 57% of NC seniors take the SAT, the predicted mean score is

\[ \hat{y} = 1023.309 - 2.23635(57) = 895.84 \]
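The SAT line needs only the five summary statistics, not the state-by-state data. The Python sketch below recomputes it; the function name `predict_mean_sat` is a hypothetical label. The slide rounds the slope to -2.236 before computing the intercept, so the unrounded intercept here comes out near 1023.32 rather than 1023.309, and the prediction differs from 895.84 only in the second decimal.

```python
# Summary statistics from the slide (x = % of seniors taking the test,
# y = state mean SAT score).
x_bar, s_x = 33.882, 24.103
y_bar, s_y = 947.549, 62.1
r = -0.868

b1 = r * s_y / s_x       # slope
b0 = y_bar - b1 * x_bar  # intercept, computed without rounding the slope


def predict_mean_sat(pct_taking):
    """Predicted state mean SAT score when pct_taking percent of seniors test."""
    return b0 + b1 * pct_taking


print(f"slope = {b1:.5f}, intercept = {b0:.3f}")
print(f"predicted mean score at 57%: {predict_mean_sat(57):.2f}")
print(f"r^2 = {r ** 2:.4f}")
```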

Page 44: Least Squares Regression Fitting a Line to Bivariate Data.

Avoid GIGO! Evaluating the least squares line

1. Create scatterplot. Approximately linear?

2. Calculate r^2, the square of the correlation coefficient

3. Examine residual plot

Page 45: Least Squares Regression Fitting a Line to Bivariate Data.

Residuals

residual = observed y - predicted y = y - yhat

Properties of residuals

1. The residuals always sum to 0 (therefore the mean of the residuals is 0)

2. The least squares line always goes through the point (xbar, ybar)

Page 46: Least Squares Regression Fitting a Line to Bivariate Data.

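Graphically

residual = y - yhat

[Figure: at X = xi the residual ei = yi - yhat_i is the vertical distance between the observed yi and the predicted yhat_i on the line.]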

Page 47: Least Squares Regression Fitting a Line to Bivariate Data.

Residual Plot

Residuals help us determine if fitting a least squares line to the data makes sense

When a least squares line is appropriate, it should model the underlying relationship; nothing interesting should be left behind

We make a scatterplot of the residuals in the hope of finding…

NOTHING!

Page 48: Least Squares Regression Fitting a Line to Bivariate Data.

Car Wt/ Fuel Consump: Residuals

CAR WT.   FUEL CONSUMP.   Pred FUEL CONSUMP.   Residuals
3.4       5.5             5.209498069           0.290501931
3.8       5.9             5.865096525           0.034903475
4.1       6.5             6.356795367           0.143204633
2.2       3.3             3.242702703           0.057297297
2.6       3.6             3.898301158          -0.29830115
2.9       4.6             4.39                  0.21
2         2.9             2.914903475          -0.01490347
2.7       3.6             4.062200772          -0.46220077
1.9       3.1             2.751003861           0.348996139
3.4       4.9             5.209498069          -0.309498069

Page 49: Least Squares Regression Fitting a Line to Bivariate Data.

Example: Car wt/fuel consump. residual plot page 13

[Figure: RESIDUALS vs WT(X); x-axis: WT(X) from 1.5 to 4.5, y-axis: RESIDUALS from -0.6 to 0.4.]

Page 50: Least Squares Regression Fitting a Line to Bivariate Data.

SAT Residuals

[Figure: %TAKE Residual Plot; x-axis: %TAKE from 0 to 80, y-axis: Residuals from -100 to 100.]

Page 51: Least Squares Regression Fitting a Line to Bivariate Data.

Linear Relationship?

[Figure: "Linear(?)" scatterplot of Y (0 to 60) against X (-4 to 8).]

Page 52: Least Squares Regression Fitting a Line to Bivariate Data.

Garbage In Garbage Out

GIGO

[Figure: the same scatterplot of Y against X with the fitted line y = 4x + 11 drawn through it.]

Page 53: Least Squares Regression Fitting a Line to Bivariate Data.

Residual Plot – Clue to GIGO

[Figure: Residual Plot; x-axis: X Variable from -4 to 8, y-axis: Residuals from -20 to 20.]

Page 54: Least Squares Regression Fitting a Line to Bivariate Data.

GIGO

[Figure: the scatterplot with fitted line y = 4x + 11 (Y against X) shown together with its residual plot (Residuals against X Variable).]
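The effect can be reproduced on a small curved data set. The slides do not give the data behind these plots, so the points below are hypothetical: y = x^2 + 5 for integer x from -3 to 7, chosen only because they are clearly non-linear and happen to yield the same fitted line y = 4x + 11. This Python sketch fits the line and prints the residuals so the pattern is visible without a plot.

```python
# Hypothetical curved data (NOT the data from the slides).
xs = list(range(-3, 8))
ys = [x ** 2 + 5 for x in xs]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least squares slope and intercept (s_xy / s_xx equals r * s_y / s_x).
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(f"fitted line: y = {b1:.0f}x + {b0:.0f}")
for x, e in zip(xs, residuals):
    print(f"x = {x:2d}  residual = {e:6.1f}")
```

The residuals still sum to 0, but they are positive at both ends of the x range and negative in the middle. That U shape is the "something interesting left behind" a residual plot is meant to expose: least squares will return a line for any data, whether or not a line is appropriate.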