Chapter 8 – 1 Regression & Correlation:Extended Treatment Overview The Scatter Diagram Bivariate...


Chapter 8 – 1

Regression & Correlation: Extended Treatment

• Overview
• The Scatter Diagram
• Bivariate Linear Regression
• Prediction Error
• Coefficient of Determination
• Correlation Coefficient
• Anova and the F statistic
• Multiple Regression


Chapter 8 – 2

Overview

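The table crosses the level of measurement of the Dependent Variable (interval or nominal) with that of the Independent Variables (nominal or interval):

• Nominal dependent, nominal independent: Considers the distribution of one variable across the categories of another variable

• Interval dependent, nominal independent: Considers the difference between the mean of one group on a variable and the mean of another group

• Nominal dependent, interval independent: Considers how a change in a variable affects a discrete outcome

• Interval dependent, interval independent: Considers the degree to which a change in one or two variables results in a change in another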


Chapter 8 – 3

Overview

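The same table (Dependent Variable by Independent Variables), now showing the technique for each cell:

• Nominal dependent, nominal independent: Lambda (you already know how to deal with two nominal variables)

• Interval dependent, nominal independent: Anova and F-Test (TODAY!)

• Nominal dependent, interval independent: Logistic Regression (this cell is not covered in this course)

• Interval dependent, interval independent: Regression and Correlation (TODAY!)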


Chapter 8 – 4

General Examples

Does a change in one variable significantly affect another variable?

Do two scores co-vary positively (high on one score, high on the other; low on one, low on the other)?

Do two scores co-vary negatively (high on one score, low on the other; low on one, high on the other)?

Does a change in two or more variables significantly affect another variable?



Chapter 8 – 5

Specific Examples

Does getting older significantly influence a person’s political views?

Does marital satisfaction increase with length of marriage?

How does an additional year of education affect one’s earnings?

How do education and seniority affect one’s earnings?



Chapter 8 – 6

Scatter Diagrams

• Scatter Diagram (scatterplot) – A visual method used to display a relationship between two interval-ratio variables.

• Typically, the independent variable is placed on the X-axis (horizontal axis), while the dependent variable is placed on the Y-axis (vertical axis).


Chapter 8 – 7

Scatter Diagram Example

• The data…


Chapter 8 – 8

Scatter Diagram Example


Chapter 8 – 9

A Scatter Diagram Example of a Negative Relationship


Chapter 8 – 10

Linear Relationships

• Linear relationship – A relationship between two interval-ratio variables in which the observations displayed in a scatter diagram can be approximated with a straight line.

• Deterministic (perfect) linear relationship – A relationship between two interval-ratio variables in which all the observations (the dots) fall along a straight line. The line provides a predicted value of Y (the vertical axis) for any value of X (the horizontal axis).


Chapter 8 – 11

Graph the data below and examine the relationship:


Chapter 8 – 12

The Seniority-Salary Relationship


Chapter 8 – 13

Example: Education & Prestige

Does education predict occupational prestige? If so, then the higher the respondent’s level of education, as measured by number of years of schooling, the greater the prestige of the respondent’s occupation.

Take a careful look at the scatter diagram on the next slide and see whether you think a relationship exists between these two variables…


Chapter 8 – 14

Scatterplot of Prestige by Education


Chapter 8 – 15

Example: Education & Prestige

• The scatter diagram data can be approximated by a straight line, so there is a linear relationship between these two variables.

• In addition, since occupational prestige rises as years of education increase, we can also say that the relationship is a positive one.


Chapter 8 – 16

Take your best guess?

If you know nothing else about a person except that he or she lives in the United States, and I asked you to guess his or her age, what would you guess?

The mean age for U.S. residents.

Now if I tell you that this person owns a skateboard, would you change your guess? (Of course!)

With quantitative analyses we are generally trying to predict, or take our best guess at, the value of the dependent variable. One way to assess the relationship between two variables is to consider the degree to which the extra information from the second variable makes your guess better. If someone owns a skateboard, that is likely to indicate that s/he is younger, and we may be able to guess closer to the actual value.


Chapter 8 – 17

Take your best guess?

• Similar to the example of age and the skateboard, we can take a much better guess at someone’s occupational prestige, if we have information about her/his years or level of education.


Chapter 8 – 18

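Equation for a Straight Line

$$Y = a + bX$$

where
a = intercept
b = slope
Y = dependent variable
X = independent variable

[Figure: a straight line plotted with X on the horizontal axis and Y on the vertical axis. The line crosses the Y-axis at a, and its slope is b = rise / run.]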


Chapter 8 – 19

Bivariate Linear Regression Equation

$$\hat{Y} = a + bX$$

• Y-intercept (a) – The point where the regression line crosses the Y-axis, or the value of Y when X = 0.

• Slope (b) – The change in variable Y (the dependent variable) with a unit change in X (the independent variable).

The estimates of a and b are obtained by ordinary least squares (OLS): they minimize the sum of the squared differences between the observed and predicted values, $\sum (Y - \hat{Y})^2$. Under the standard regression assumptions, these estimates are the Best Linear Unbiased Estimators (BLUE) of the intercept and slope.


Chapter 8 – 20

SPSS Regression Output: 1996 GSS, Education & Prestige

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .559a   .313       .312                11.90

a. Predictors: (Constant), HIGHEST YEAR OF SCHOOL COMPLETED

Coefficients (a)

                                   Unstandardized Coefficients   Standardized Coefficients
Model 1                            B        Std. Error           Beta                        t        Sig.
(Constant)                         6.120    1.531                                            3.997    .000
HIGHEST YEAR OF SCHOOL COMPLETED   2.762    .111                 .559                        24.886   .000

a. Dependent Variable: RS OCCUPATIONAL PRESTIGE SCORE (1980)

Now let’s interpret the SPSS output...


Chapter 8 – 21

The Regression Equation

Prediction Equation:

$$\hat{Y} = 6.120 + 2.762X$$

The (Constant) row of the Coefficients table gives the intercept, B = 6.120 (Std. Error 1.531, t = 3.997, Sig. = .000). This line represents the predicted value of Y when X is zero.


Chapter 8 – 22

The Regression Equation

Prediction Equation:

$$\hat{Y} = 6.120 + 2.762X$$

The HIGHEST YEAR OF SCHOOL COMPLETED row of the Coefficients table gives the slope, B = 2.762 (Std. Error .111, Beta = .559, t = 24.886, Sig. = .000). This line represents the predicted change in Y for each additional year of education.


Chapter 8 – 23

Interpreting the regression equation

$$\hat{Y} = 6.120 + 2.762X$$

• If a respondent had zero years of schooling, this model predicts that his or her occupational prestige score would be 6.120 points.

• For each additional year of education, our model predicts a 2.762 point increase in occupational prestige.
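To make the interpretation concrete, here is a minimal Python sketch that plugs values of X into the prediction equation. It assumes only the intercept and slope reported in the SPSS output; the function name `predict_prestige` and the chosen years of schooling are made up for illustration.

```python
# Coefficients taken from the SPSS output for the 1996 GSS example.
INTERCEPT = 6.120  # predicted prestige score at zero years of schooling
SLOPE = 2.762      # predicted change in prestige per additional year


def predict_prestige(years_of_education: float) -> float:
    """Return the predicted prestige score (hypothetical helper name)."""
    return INTERCEPT + SLOPE * years_of_education


# Illustrative inputs only: 0, 12 and 16 years of schooling.
for years in (0, 12, 16):
    print(years, round(predict_prestige(years), 3))
```

Moving from any value of X to X + 1 always changes the prediction by the slope, which is what "for each additional year" means.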


Chapter 8 – 24

Ordinary Least Squares

• Least-squares line (best fitting line) – A line where the error sum of squares, or $\sum e^2$, is at a minimum.

• Least-squares method – The technique that produces the least squares line.


Chapter 8 – 25

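Estimating the slope: b

• The bivariate regression coefficient, or the slope of the regression line, can be obtained from the observed X and Y scores.

$$b = \frac{S_{YX}}{S_X^2} = \frac{\sum (X - \bar{X})(Y - \bar{Y}) / (N - 1)}{\sum (X - \bar{X})^2 / (N - 1)} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$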


Chapter 8 – 26

Covariance and Variance

$$\text{Covariance} = S_{YX} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}$$

$$\text{Variance of } X = S_X^2 = \frac{\sum (X - \bar{X})(X - \bar{X})}{N - 1} = \frac{\sum (X - \bar{X})^2}{N - 1}$$

Covariance of X and Y – A measure of how X and Y vary together. Covariance will be close to zero when X and Y are unrelated. It will be greater than zero when the relationship is positive and less than zero when the relationship is negative.

Variance of X – We have talked a lot about variance in the dependent variable. This is simply the variance for the independent variable.


Chapter 8 – 27

Estimating the Intercept

The regression line always goes through the point corresponding to the mean of both X and Y, by definition.

So we utilize this information to solve for a:

$$a = \bar{Y} - b\bar{X}$$


Chapter 8 – 28

Back to the original scatterplot:


Chapter 8 – 29

A Representative Line


Chapter 8 – 30

Other Representative Lines


Chapter 8 – 31

Calculating the Regression Equation


Chapter 8 – 32

Calculating the Regression Equation


Chapter 8 – 33

The Least Squares Line!


Chapter 8 – 34

Summary: Properties of the Regression Line

• Represents the predicted values for Y for any and all values of X.

• Always goes through the point corresponding to the mean of both X and Y.

• It is the best fitting line in that it minimizes the sum of the squared deviations.

• Has a slope that can be positive or negative.
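These properties can be checked with a short Python sketch. The five (X, Y) pairs below are invented purely for illustration and are not the data from the slides. The slope is computed as covariance over variance and the intercept as the mean of Y minus b times the mean of X.

```python
from statistics import mean

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = mean(x), mean(y)

# Covariance of X and Y, and variance of X (both divided by N - 1).
covariance_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
variance_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

b = covariance_xy / variance_x  # slope
a = y_bar - b * x_bar           # intercept


def error_sum_of_squares(intercept: float, slope: float) -> float:
    """Sum of squared differences between observed and predicted Y."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


print("slope:", b, "intercept:", a)
print("SSE of least-squares line:", error_sum_of_squares(a, b))
print("SSE of a nearby line:", error_sum_of_squares(a + 0.5, b))
```

The fitted line passes through the point of means, and shifting the line in any direction makes the error sum of squares larger.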


Chapter 8 – 35

Prediction Errors

Back to our original data…

Consider the prediction of Y for one country: Norway

Norway’s observed Y = 73


Chapter 8 – 36

Take your best guess?

If you didn’t know the percentage of citizens in Norway who agreed to pay higher prices for environmental protection (Y), what would you guess?

The mean for Y, or $\bar{Y} = 56.45$ (the horizontal line in Figure 8).

With this prediction the error for Norway is:

$$Y - \bar{Y} = 73 - 56.45 = 16.55$$


Chapter 8 – 37

IMPROVING THE PREDICTION

• Let’s see if we can reduce the error of prediction for Norway by using the linear regression equation:

$$\hat{Y} = 49.19 + 0.59(22) = 62.17$$

• The new error of prediction is:

$$Y - \hat{Y} = 73 - 62.17 = 10.83$$

• Have we improved the prediction?

• Yes! By 5.72 (16.55 − 10.83 = 5.72).
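The Norway arithmetic can be reproduced with a few lines of Python. This sketch uses only the numbers shown on the slides (intercept 49.19, slope 0.59, Norway's X of 22, observed Y of 73, mean Y of 56.45); the variable names are chosen here for readability.

```python
# Values as given on the slides for the Norway example.
a, b = 49.19, 0.59  # intercept and slope of the regression equation
norway_x = 22       # Norway's score on the independent variable
norway_y = 73       # Norway's observed Y
mean_y = 56.45      # mean of Y across all countries

# Error when the mean of Y is used as the prediction.
error_using_mean = norway_y - mean_y

# Error when the regression equation is used instead.
predicted_y = a + b * norway_x
error_using_regression = norway_y - predicted_y

improvement = error_using_mean - error_using_regression

print(round(predicted_y, 2))             # 62.17
print(round(error_using_mean, 2))        # 16.55
print(round(error_using_regression, 2))  # 10.83
print(round(improvement, 2))             # 5.72
```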


Chapter 8 – 38

SUM OF SQUARED DEVIATION

• We have looked only at Norway. To calculate deviations from the mean for all the cases, we square the deviations and sum them; we call the result the total sum of squares, or SST:

$$SST = \sum (Y - \bar{Y})^2$$

• The sum of squared deviations from the regression line is called the error sum of squares, or SSE:

$$SSE = \sum (Y - \hat{Y})^2$$


Chapter 8 – 39

MEASURING THE IMPROVEMENT IN PREDICTION

The improvement in the prediction error resulting from our use of the linear prediction equation is called the regression sum of squares or SSR. It is calculated by subtracting SSE from SST or:

SSR=SST-SSE
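The three sums of squares can be computed directly from data, as in the following Python sketch. The data are hypothetical (invented for illustration, not the GNP data from the slides), and the line is fitted with the least-squares formulas for b and a.

```python
from statistics import mean

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar, y_bar = mean(x), mean(y)

# Least-squares slope and intercept.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

# Total sum of squares: squared deviations from the mean of Y.
sst = sum((yi - y_bar) ** 2 for yi in y)
# Error sum of squares: squared deviations from the regression line.
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
# Regression sum of squares: the improvement in prediction.
ssr = sst - sse

print("SST:", sst, "SSE:", sse, "SSR:", ssr)
```

For a least-squares line with an intercept, SSR obtained by subtraction also equals the sum of squared deviations of the predicted values from the mean of Y.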


Chapter 8 – 40

EXAMPLE: GNP AND WILLINGNESS TO PAY MORE

Calculating the error sum of squares (SSE)


Chapter 8 – 41

Example: GNP and Willingness to Pay More

We already have the total sum of squares (SST) from Table 4:

$$SST = \sum (Y - \bar{Y})^2 = 3{,}032.7$$

The regression sum of squares, or SSR, is thus:

$$SSR = SST - SSE = 3{,}032.7 - 2{,}625.92 = 406.78$$


Chapter 8 – 42

Coefficient of Determination

• Coefficient of Determination ($r^2$) – A PRE measure reflecting the proportional reduction of error that results from using the linear regression model.

• The total sum of squares (SST) measures the prediction error when the independent variable is ignored (E1): E1 = SST

• The error sum of squares (SSE) measures the prediction errors when using the independent variable and the linear regression equation (E2): E2 = SSE

$$PRE = r^2 = \frac{E_1 - E_2}{E_1}$$


Chapter 8 – 43

Coefficient of Determination

Thus...

$$PRE = r^2 = \frac{E_1 - E_2}{E_1} = \frac{SST - SSE}{SST} = \frac{\sum (Y - \bar{Y})^2 - \sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2} = \frac{3{,}032.7 - 2{,}625.92}{3{,}032.7} = 0.13$$

$r^2 = 0.13$ means: by using GNP and the linear prediction rule to predict Y (the percentage willing to pay higher prices), the error of prediction is reduced by 13 percent (0.13 × 100). $r^2$ also reflects the proportion of the total variation in the dependent variable, Y, explained by the independent variable, X.
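The same calculation in Python, using only the SST and SSE values reported on the slides (a minimal sketch; the variable names are chosen here for readability):

```python
# Sums of squares as reported on the slides for the GNP example.
sst = 3032.7   # total sum of squares (E1)
sse = 2625.92  # error sum of squares (E2)

ssr = sst - sse        # regression sum of squares
r_squared = ssr / sst  # proportional reduction of error

print(round(ssr, 2))        # 406.78
print(round(r_squared, 2))  # 0.13
```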


Chapter 8 – 44

Coefficient of Determination

$r^2$ can also be calculated using this equation:

$$r^2 = \frac{[\text{Covariance}(X, Y)]^2}{[\text{Variance}(X)][\text{Variance}(Y)]} = \frac{S_{YX}^2}{S_X^2 S_Y^2}$$


Chapter 8 – 45

The Correlation Coefficient

• Pearson’s Correlation Coefficient (r) – The square root of $r^2$, taking the sign of the slope. It is a measure of association between two interval-ratio variables.

• Symmetrical measure – No specification of independent or dependent variables.

• Ranges from –1.0 to +1.0. The sign (±) indicates direction. The closer the number is to ±1.0, the stronger the association between X and Y.

$$r = \frac{\text{cov}(X, Y)}{\text{s.d.}(X) \cdot \text{s.d.}(Y)}$$
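As an illustration of the correlation formula, the Python sketch below computes r as the covariance divided by the product of the two standard deviations. The data are hypothetical (invented for illustration), and `pearson_r` is a helper name chosen here, not a function from the slides.

```python
from math import sqrt
from statistics import mean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's r = cov(X, Y) / (s.d.(X) * s.d.(Y)), all using N - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    sd_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    sd_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return cov / (sd_x * sd_y)


# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)
print("r:", r)
print("r squared:", r ** 2)
# Symmetrical measure: swapping X and Y gives the same value.
print("r with X and Y swapped:", pearson_r(y, x))
```

Squaring r gives the coefficient of determination for the same data, and swapping the two variables leaves r unchanged.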


Chapter 8 – 46

The Correlation Coefficient

r = 0 means that there is no linear association between the two variables.

[Figure: scatter diagram of Y against X illustrating r = 0]


Chapter 8 – 47

The Correlation Coefficient

r = +1 means a perfect positive correlation.

[Figure: scatter diagram of Y against X illustrating r = +1]


Chapter 8 – 48

The Correlation Coefficient

r = –1 means a perfect negative correlation.

[Figure: scatter diagram of Y against X illustrating r = –1]


Chapter 8 – 49

Testing the Significance of r2 using Anova

• $r^2$ is an estimate based on sample data.

• We test it for statistical significance to assess the probability that the linear relationship it expresses is zero in the population.

• This technique, analysis of variance (Anova), is based on the regression sum of squares (SSR) and the error sum of squares (SSE).


Chapter 8 – 50

Determining df

• There are df associated with both the regression sum of squares (SSR) and the error sum of squares (SSE).

• For SSR, df = k, where k is the number of independent variables in the regression equation. In the bivariate case, df = 1.

• For SSE, df = N − (k + 1). In the bivariate case, df = N − (1 + 1) = N − 2.


Chapter 8 – 51

Calculating Mean Squares

Mean squares are averages computed by dividing each sum of squares by its corresponding degrees of freedom.

To calculate MSR and MSE for our example:

$$MSR = \frac{SSR}{1} = \frac{406.78}{1} = 406.78$$

$$MSE = \frac{SSE}{N - 2} = \frac{2{,}625.92}{11 - 2} = 291.77$$


Chapter 8 – 52

The F Statistic

The mean squares regression (MSR) and mean squares error (MSE) compose the obtained F-statistic.

$$F = \frac{MSR}{MSE}$$

The larger MSR is relative to MSE, the larger the F ratio and the more likely $r^2$ is larger than zero in the population. The null hypothesis states that $r^2$ is zero in the population.

Thus:

$$F = \frac{MSR}{MSE} = \frac{406.78}{291.77} = 1.39$$


Chapter 8 – 53

Making a Decision

• Use Appendix F to determine the probability of F = 1.39.

• Listed are F values for the numerator df associated with MSR (df = 1) and the denominator df associated with MSE (df = 9).

• Choose the table marked “p=.05.” For df (1, 9) the critical F = 5.12.

• The obtained F is smaller than the critical F (1.39 < 5.12). We cannot reject the null hypothesis.

• Conclusion: we cannot conclude that the linear relationship, as expressed in $r^2$, differs from zero in the population.
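The full chain from sums of squares to the decision can be sketched in Python. All numbers come from the slides (SSR, SSE, N = 11, k = 1, and the critical F of 5.12 read from the p = .05 table); the variable names are chosen here for readability.

```python
# Values from the slides for the GNP example.
ssr = 406.78       # regression sum of squares
sse = 2625.92      # error sum of squares
n = 11             # number of cases
k = 1              # number of independent variables
f_critical = 5.12  # critical F for df (1, 9) at p = .05, from the table

# Degrees of freedom.
df_regression = k
df_error = n - (k + 1)

# Mean squares: each sum of squares divided by its df.
msr = ssr / df_regression
mse = sse / df_error

f_obtained = msr / mse
reject_null = f_obtained > f_critical

print("df:", df_regression, df_error)
print("MSR:", round(msr, 2), "MSE:", round(mse, 2))
print("F:", round(f_obtained, 2))
print("Reject the null hypothesis?", reject_null)
```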


Chapter 8 – 54

The Anova Table

The results of the ANOVA test are often summarized in a table such as Table 9.


Chapter 8 – 55

Multiple Regression

• An extension of bivariate regression.

• We examine the effect of two or more independent variables on the dependent variable.

• The calculations are easily accomplished using SPSS or other statistical software.

• The General Form of the Multiple Regression Equation (2 independent variables):

$$\hat{Y} = a + b_1 X_1 + b_2 X_2$$


Chapter 8 – 56

Multiple Regression

$$\hat{Y} = a + b_1 X_1 + b_2 X_2$$

• $\hat{Y}$ = Predicted Y

• $X_1$ = The score on independent variable $X_1$

• $X_2$ = The score on independent variable $X_2$

• a = the Y-intercept, or the value of Y when both $X_1$ and $X_2$ are equal to zero

• $b_1$ = the change in Y with a unit change in $X_1$, when $X_2$ is controlled

• $b_2$ = the change in Y with a unit change in $X_2$, when $X_1$ is controlled


Chapter 8 – 57

Multiple Regression

• The hypothesis: the higher the state’s expenditure per pupil, the lower the teen pregnancy rate. The higher the state’s unemployment rate, the higher the teen pregnancy rate.

• The multiple linear equation is:

$$\hat{Y} = 49.813 + 9.736 X_1 - 0.007 X_2$$

• $\hat{Y}$ = Teen pregnancy rate; $X_1$ = Unemployment rate; $X_2$ = Expenditure per pupil


Chapter 8 – 58

Interpretation

$$\hat{Y} = 49.813 + 9.736 X_1 - 0.007 X_2$$

• A state’s teen pregnancy rate ($\hat{Y}$) goes up by 9.736 for each 1% increase in the unemployment rate ($X_1$), holding expenditure per pupil ($X_2$) constant.

• A state’s teen pregnancy rate goes down by .007 with each $1 increase in the state’s expenditure per pupil ($X_2$), holding the unemployment rate ($X_1$) constant.

• The value of a, 49.813, reflects the state’s teen pregnancy rate when both the unemployment rate and the state’s expenditure per pupil are equal to zero.
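The meaning of "holding the other variable constant" can be seen in a small Python sketch. It assumes the coefficients from the slides, with a positive intercept and a negative expenditure coefficient as the interpretation describes; the function name and the input values (5% unemployment, $5,000 per pupil) are hypothetical and chosen only for illustration.

```python
# Coefficients from the multiple regression equation on the slides.
A = 49.813   # intercept
B1 = 9.736   # unemployment rate coefficient
B2 = -0.007  # expenditure per pupil coefficient


def predict_teen_pregnancy_rate(unemployment_rate: float,
                                expenditure_per_pupil: float) -> float:
    """Predicted teen pregnancy rate (hypothetical helper name)."""
    return A + B1 * unemployment_rate + B2 * expenditure_per_pupil


# Hypothetical inputs, not taken from the slides.
base = predict_teen_pregnancy_rate(5.0, 5000.0)

# Raise unemployment by one point, holding expenditure constant.
more_unemployment = predict_teen_pregnancy_rate(6.0, 5000.0)

# Raise expenditure by one dollar, holding unemployment constant.
more_spending = predict_teen_pregnancy_rate(5.0, 5001.0)

print(round(more_unemployment - base, 3))  # change equals b1
print(round(more_spending - base, 3))      # change equals b2
```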


Chapter 8 – 59

Multiple Regression and Coefficient of Determination: $R^2$

• The coefficient of determination for multiple regression is $R^2$.

• It measures the proportional reduction of error that results from using the linear regression model.

• We obtained an $R^2$ of .267. This means that by using states’ unemployment rates and expenditures per pupil to predict pregnancy rates, the error of prediction is reduced by 26.7% (.267 × 100).


Chapter 8 – 60

The Multiple Correlation Coefficient: R

• The square root of $R^2$, R, is the multiple correlation coefficient.

• It measures the linear relationship between the dependent variable and the combined effect of two or more independent variables.

• For our example, R = .517. It indicates that there is a moderate relationship between teen pregnancy rate and both unemployment rate and expenditure per pupil.


Chapter 8 – 61

ANOVA and $R^2$

• The statistical significance of $R^2$ is assessed by performing an Anova test, calculating an F ratio and determining its level of significance.

• With 2 and 43 df, we would need an F of 5.18 to reject the null hypothesis that $R^2 = 0$ at the .01 level.

• The obtained F ratio exceeds the critical F (7.85 > 5.18). We can reject the null hypothesis with p < .01.
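As a rough cross-check, the F ratio can be approximated from $R^2$ and the degrees of freedom. The slides do not show this route; it relies on the standard identity F = (R²/k) / ((1 − R²)/(N − k − 1)), which follows from MSR/MSE with SSR = R²·SST and SSE = (1 − R²)·SST. The sketch below uses the rounded $R^2$ of .267 from the slides, so the result differs slightly from the reported 7.85.

```python
# Values from the slides for the teen pregnancy example.
r_squared = 0.267  # rounded coefficient of determination
k = 2              # number of independent variables (numerator df)
df_error = 43      # denominator df, N - (k + 1)
f_critical = 5.18  # critical F for df (2, 43) at the .01 level

# Identity assumed here (not shown on the slides):
# F = (R^2 / k) / ((1 - R^2) / df_error)
f_approx = (r_squared / k) / ((1 - r_squared) / df_error)
reject_null = f_approx > f_critical

print("approximate F:", round(f_approx, 2))
print("Reject the null hypothesis?", reject_null)
```

The small gap between this approximation and the reported F comes from rounding $R^2$ to three decimals; the decision is the same either way.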