Multiple Regression. Objectives Explanation The most direct interpretation of the regression variate...


Objectives

• Maximize the predictive power of the independent variables as represented in the variate.

• Compare two or more sets of independent variables to ascertain the predictive power of each variate.


Explanation

• The most direct interpretation of the regression variate is a determination of the relative importance of each independent variable in the prediction of the dependent measure.

• Assess the nature of the relationships between the independent variables and the dependent variable.

• Provide insight into the relationships among independent variables.

[Figure: diagram of Y' and the independent variables X1, X2 and X3.]

Sample Problem (Leslie Salt Property): Finding a Fair Price for the Land

Variable Name   Description
PRICE           Sale price in $000 per acre
COUNTY          San Mateo = 0, Santa Clara = 1
SIZE            Size of the property in acres
ELEVATION       Average elevation in feet above sea level
SEWER           Distance (in feet) to nearest sewer connection
DATE            Date of sale counting backward from current time (in months)
FLOOD           Subject to flooding by tidal action = 1; otherwise = 0
DISTANCE        Distance in miles from Leslie property


PRICE  COUNTY  SIZE    ELEVATION  SEWER  DATE  FLOOD  DISTANCE
4.5    1       138.4   10         3000   -103  0      0.3
10.6   1       52      4          0      -103  0      2.5
1.7    0       16.1    0          2640   -98   1      10.3
5      0       1695.2  1          3500   -93   0      14
5      0       845     1          1000   -92   1      14
3.3    1       6.9     2          10000  -86   0      0
5.7    1       105.9   4          0      -68   0      0
6.2    1       56.6    4          0      -64   0      0
19.4   1       51.4    20         1300   -63   0      1.2
3.2    1       22.1    0          6000   -62   0      0
4.7    1       22.1    0          6000   -61   0      0
6.9    1       27.7    3          4500   -60   0      0
8.1    1       18.6    5          5000   -59   0      0.5
11.6   1       69.9    8          0      -59   0      4.4
19.3   1       145.7   10         0      -59   0      4.2
11.7   1       77.2    9          0      -59   0      4.5
13.3   1       26.2    8          0      -59   0      4.7
15.1   1       102.3   6          0      -59   0      4.9
12.4   1       49.5    11         0      -59   0      4.6
15.3   1       12.2    8          0      -59   0      5
12.2   0       320.6   0          4000   -54   0      16.5
18.1   1       9.9     5          0      -54   0      5.2
16.8   1       15.3    2          0      -53   0      5.5
5.9    0       55.2    0          1320   -49   1      11.9
4      0       116.2   2          900    -45   1      5.5
37.2   0       15      5          0      -39   0      7.2
18.2   0       23.4    5          4420   -39   0      5.5
15.1   0       132.8   2          2640   -35   0      10.2
22.9   0       12      5          3400   -16   0      5.5
15.2   0       67      2          900    -5    1      5.5
21.9   0       30.8    2          900    -4    0      5.5

[Figure: histogram of leslie_salt[, 1] (frequency scale) and histogram of Y (density scale).]

[Figure: correlation plot of SEWER, FLOOD, SIZE, COUNTY, DISTANCE, ELEVATION, DATE and PRICE.]

           PRICE     COUNTY    SIZE      ELEVATION  SEWER     DATE      FLOOD     DISTANCE
PRICE      100.00%   -18.22%   -23.97%   35.18%     -39.12%   59.47%    -32.31%   9.33%
COUNTY     -18.22%   100.00%   -33.94%   47.52%     -5.00%    -36.98%   -55.18%   -74.22%
SIZE       -23.97%   -33.94%   100.00%   -20.95%    5.34%     -34.95%   10.89%    55.69%
ELEVATION  35.18%    47.52%    -20.95%   100.00%    -35.94%   -5.65%    -37.31%   -36.25%
SEWER      -39.12%   -5.00%    5.34%     -35.94%    100.00%   -15.15%   -11.31%   -15.87%
DATE       59.47%    -36.98%   -34.95%   -5.65%     -15.15%   100.00%   1.54%     4.44%
FLOOD      -32.31%   -55.18%   10.89%    -37.31%    -11.31%   1.54%     100.00%   42.33%
DISTANCE   9.33%     -74.22%   55.69%    -36.25%    -15.87%   4.44%     42.33%    100.00%

[Figure: scatterplot matrix of PRICE, COUNTY, SIZE, ELEVATION, SEWER, DATE, FLOOD and DISTANCE.]


$$\log(\mathrm{PRICE}) = b_0 + b_1\,\mathrm{ELEVATION} + b_2\,\mathrm{SEWER} + b_3\,\mathrm{DATE} + b_4\,\mathrm{FLOOD} + \varepsilon$$

> summary(model)

Call:
lm(formula = leslie_salt[, 1] ~ leslie_salt[, 4] + leslie_salt[, 5] + leslie_salt[, 6])

Residuals:
    Min      1Q  Median      3Q     Max
-9.6076 -3.2506 -0.0281  2.8770 20.2776

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)      21.2787636  2.9203157   7.286 7.75e-08 ***
leslie_salt[, 4]  0.5614588  0.2515472   2.232 0.034107 *
leslie_salt[, 5] -0.0005871  0.0004460  -1.316 0.199129
leslie_salt[, 6]  0.1836824  0.0421712   4.356 0.000172 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.559 on 27 degrees of freedom
Multiple R-squared: 0.5327, Adjusted R-squared: 0.4807
F-statistic: 10.26 on 3 and 27 DF, p-value: 0.000111


Assumptions

• Linearity of the dependent variable in terms of independent variables.

[Figure: plot of y against X1.]

Linearity (cts.)

If the relationship is curved, a higher-order term of the independent variable should be included. In that case, define a new variable by taking the square (for this case) of that independent variable and use the squared values in the regression.

Use: Visual inspection
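A minimal sketch of this idea in R, using simulated data rather than the Leslie Salt data. The variable names (x1, x1_sq, y) and the simulated coefficients are assumptions made only for illustration.

```r
# Hypothetical simulated data: y depends on x1 with curvature
set.seed(1)
x1 <- seq(0, 10, length.out = 50)
y  <- 2 + 1.5 * x1 - 0.2 * x1^2 + rnorm(50, sd = 0.5)

# Straight-line model
linear_fit <- lm(y ~ x1)

# Define the new squared variable and add it to the regression
x1_sq         <- x1^2
quadratic_fit <- lm(y ~ x1 + x1_sq)

# Visual inspection: a curved band of residuals around zero
# suggests that a higher-order term is missing
plot(fitted(linear_fit), resid(linear_fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

r2_linear    <- summary(linear_fit)$r.squared
r2_quadratic <- summary(quadratic_fit)$r.squared
```

Comparing r2_linear with r2_quadratic, and repeating the residual plot for quadratic_fit, shows whether the squared term has absorbed the curvature.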

More troublesome is the MODERATOR effect

• If the relationship between an independent variable and the dependent variable is affected by another independent variable, the situation is termed a moderator effect.

• The most common moderator effect in multiple regression is the bilinear moderator, in which the slope of the relationship of one independent variable (X1) changes across values of the moderator variable (X2).


Example

Family income (X2) can be a positive moderator of the relationship between family size (X1) and credit card usage (Y). The expected change in credit card usage per unit change in family size (the slope on X1) might then be lower for families with low incomes and higher for families with high incomes.

Without the moderator effect, we are assuming that family size has a constant effect on credit card usage.


Adding Moderator Effect

The idea comes from observing a self-moderator effect. If a variable has a moderator effect on itself, then we would assume a nonlinear (second-degree) relationship with the dependent variable.

Thus, if there is a moderator effect, add X1X2 as an independent variable to the regression equation. We will return to this later.
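As a rough illustration, the product term can be added in R with the `x1:x2` formula operator. The data below are simulated to mimic the family size and income example; the numbers and the helper slope_x1_at() are hypothetical.

```r
# Hypothetical simulated data: the effect of x1 grows with x2
set.seed(2)
n  <- 200
x1 <- runif(n, 1, 6)      # family size (illustrative)
x2 <- runif(n, 20, 100)   # family income (illustrative)
y  <- 5 + 1 * x1 + 0.1 * x2 + 0.05 * x1 * x2 + rnorm(n)

# Regression with the product term X1X2 as an extra independent variable
fit_mod <- lm(y ~ x1 + x2 + x1:x2)
b <- coef(fit_mod)

# With the product term, the slope on x1 is b1 + b3 * x2,
# so it changes across values of the moderator
slope_x1_at <- function(income) unname(b["x1"] + b["x1:x2"] * income)
slope_low  <- slope_x1_at(30)
slope_high <- slope_x1_at(90)
```

A positive coefficient on `x1:x2` makes slope_high larger than slope_low, which is the pattern described in the credit card example.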

Assumption: Homoscedasticity

Constant variance of the error terms.

• Heteroscedasticity (cts.)

in residuals within variables


Heteroscedasticity (cts.)

• Use: Levene test.

Levene test: tests the equality of variances. Levene's test works by testing the null hypothesis that the variances of the groups are the same. The reported p-value is the probability, under that null hypothesis, of a test statistic at least as extreme as the one observed. If it is smaller than the selected significance level (usually 5%), the null hypothesis is rejected: the variances are considered too different for parametric tests that assume equal variances to be applied usefully.

In SPSS it is reported. In R: in the «lawstat» library, use the levene.test() function.
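To show what the test computes, here is a base-R sketch that builds the statistic by hand as a one-way ANOVA on absolute deviations from each group's median (the median-centred, Brown-Forsythe form). The function name levene_by_hand and the simulated groups are assumptions for illustration; in practice the packaged function would be used.

```r
# Levene-type test by hand: ANOVA on absolute deviations from group medians
levene_by_hand <- function(y, group) {
  group  <- as.factor(group)
  centre <- ave(y, group, FUN = median)   # each observation's group median
  z      <- abs(y - centre)               # absolute deviations
  tab    <- anova(lm(z ~ group))          # one-way ANOVA on the deviations
  list(statistic = tab[["F value"]][1], p.value = tab[["Pr(>F)"]][1])
}

# Hypothetical data: the third group has a much larger spread
set.seed(3)
g <- rep(c("a", "b", "c"), each = 40)
y <- c(rnorm(40, sd = 1), rnorm(40, sd = 1), rnorm(40, sd = 4))

res <- levene_by_hand(y, g)
res$p.value   # a small p-value leads to rejecting equal variances
```

For regression residuals, a grouping variable is needed, for example a binary predictor such as FLOOD or a binned version of the fitted values.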


Use F test for more than 2 groups…

[Figure: Residuals vs Fitted plot for lm(leslie_salt[, 1] ~ leslie_salt[, 4] + leslie_salt[, 5] + leslie_salt[, 6 ...; observations 2, 25 and 26 are labelled.]

Assumptions

• Independence of the error terms.

Check the coordinates!!!


Independence of Error Terms

• Use: Durbin-Watson

The value of the Durbin-Watson statistic ranges from 0 to 4. As a general rule of thumb, the residuals are uncorrelated if the Durbin-Watson statistic is approximately 2. A value close to 0 indicates strong positive correlation, while a value close to 4 indicates strong negative correlation.

• In SPSS, Durbin-Watson is reported.
• In R, under the «lmtest» library, use dwtest():

dwtest(formula, order.by = NULL, alternative = c("greater", "two.sided", "less"), iterations = 15, exact = NULL, tol = 1e-10, data = list())

For our regression model:

> dwtest(model)

Durbin-Watson test

data: model
DW = 2.3762, p-value = 0.7783
alternative hypothesis: true autocorrelation is greater than 0
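The statistic itself is simple to compute from the residuals: the sum of squared successive differences divided by the sum of squared residuals. The sketch below does this in base R on simulated data; dw_statistic and the simulated series are illustrative names, and the p-value still comes from dwtest().

```r
# Durbin-Watson statistic from a fitted model's residuals (in data order)
dw_statistic <- function(fit) {
  e <- resid(fit)
  sum(diff(e)^2) / sum(e^2)
}

set.seed(4)
n <- 200
x <- rnorm(n)

# Case 1: independent errors, statistic should be near 2
y_indep  <- 1 + 2 * x + rnorm(n)
dw_indep <- dw_statistic(lm(y_indep ~ x))

# Case 2: positively autocorrelated AR(1) errors, statistic well below 2
e_ar  <- as.numeric(stats::filter(rnorm(n), 0.8, method = "recursive"))
y_ar  <- 1 + 2 * x + e_ar
dw_ar <- dw_statistic(lm(y_ar ~ x))
```

Because the statistic depends on the order of the observations, it is only meaningful when the rows have a natural ordering such as time.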

Assumptions

• Normality of the error term distribution.

[Figure: Normal Q-Q plot of standardized residuals against theoretical quantiles for lm(leslie_salt[, 1] ~ leslie_salt[, 4] + leslie_salt[, 6] + leslie_salt[, 7 ...; observations 2, 10 and 26 are labelled.]

qqPlot(model)

[Figure: output of qqPlot(model), studentized residuals against t quantiles.]

Diagnostics

Call:
lm(formula = leslie_salt[, 1] ~ leslie_salt[, 4] + leslie_salt[, 5] + leslie_salt[, 6])

Residuals:
    Min      1Q  Median      3Q     Max
-9.6076 -3.2506 -0.0281  2.8770 20.2776

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)      21.2787636  2.9203157   7.286 7.75e-08 ***
leslie_salt[, 4]  0.5614588  0.2515472   2.232 0.034107 *
leslie_salt[, 5] -0.0005871  0.0004460  -1.316 0.199129
leslie_salt[, 6]  0.1836824  0.0421712   4.356 0.000172 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.559 on 27 degrees of freedom
Multiple R-squared: 0.5327, Adjusted R-squared: 0.4807
F-statistic: 10.26 on 3 and 27 DF, p-value: 0.000111

Identifying Influential Observations

• Observations that lie outside the general patterns of the data set
• Observations that strongly influence regression results

Types of Influential Observations

1. Outliers – observations that have large residuals (based on the dependent variable)
2. Leverage points – observations that are distinct from the remaining observations based on their independent variable values
3. Influential observations – all observations that have a disproportionate effect on the regression results

Outliers

• Typical boxplot test.
• In the «car» library:

> outlierTest(model)
  rstudent unadjusted p-value Bonferonni p
2 3.704906          0.0010527     0.032634

Leverage

An observation with an extreme value on a predictor variable is called a point with high leverage. Leverage is a measure of how far an independent variable's value deviates from its mean. These leverage points can have an unusually large effect on the estimates of the regression coefficients. We hope to see very few (if any) points in the plot representing high values of leverage. High leverage can also point toward outliers, which are defined as observations with large residuals in regression. You should say something about the number of cases that appear to represent high leverage.

Leverage: cut-off point defined in terms of p (number of independent variables) and n (number of observations).
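Leverage values can be read off a fitted model with hatvalues(). The sketch below uses the built-in mtcars data as a stand-in for the Leslie Salt data. The cut-off 2(p + 1)/n is one commonly quoted rule of thumb and is an assumption here, not necessarily the cut-off intended on the slide.

```r
# Stand-in model on a built-in data set (not the Leslie Salt data)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

h <- hatvalues(fit)            # leverage of each observation
p <- length(coef(fit)) - 1     # number of independent variables
n <- nobs(fit)                 # number of observations

# Assumed rule of thumb: flag leverage above 2(p + 1)/n
cutoff        <- 2 * (p + 1) / n
high_leverage <- names(h)[h > cutoff]

plot(h, type = "h", xlab = "Index", ylab = "Hat value")
abline(h = cutoff, lty = 2)
```

The hat values always sum to p + 1, so the average leverage is (p + 1)/n and the assumed cut-off is twice that average.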

[Figure: Leverage Plots. Four panels plot leslie_salt[, 1] | others against leslie_salt[, 4] | others, leslie_salt[, 6] | others, leslie_salt[, 7] | others and leslie_salt[, 8] | others.]

[Figure: index plot of cooks.distance(model) against observation index.]

Cook's Distance: cut-off point 4/(n - p - 1), where p is the number of independent variables and n is the number of observations.

R Code

# Influential observations (plot functions from the «car» library)
library(car)

# Added-variable plots
avPlots(model)

# Cook's D plot
# identify D values > 4/(n-k-1); length(model$coefficients) equals k + 1
cutoff <- 4 / (nrow(leslie_salt) - length(model$coefficients))
plot(model, which = 4, cook.levels = cutoff)

# Influence plot
influencePlot(model, id.method = "identify", main = "Influence Plot",
              sub = "Circle size is proportional to Cook's Distance")
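The same cut-off can be applied without plotting, using only base R. This sketch again uses mtcars as a stand-in data set; the object names are illustrative.

```r
# Stand-in model on a built-in data set (not the Leslie Salt data)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

d <- cooks.distance(fit)       # Cook's distance for each observation
n <- nobs(fit)                 # number of observations
k <- length(coef(fit)) - 1     # number of independent variables

# Cut-off from the code comment above: 4/(n - k - 1)
cutoff  <- 4 / (n - k - 1)
flagged <- d[d > cutoff]       # observations exceeding the cut-off

plot(d, type = "h", xlab = "Index", ylab = "Cook's distance")
abline(h = cutoff, lty = 2)
```

Flagged observations are candidates for closer inspection, not automatic deletion.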

[Figure: Influence Plot of studentized residuals against hat-values; circle size is proportional to Cook's distance. Labelled observations: 2, 4, 5, 9, 10, 21, 26, 30.]

[Figure: Residuals vs Leverage plot of standardized residuals against leverage, with Cook's distance contours at 0.5 and 1, for lm(leslie_salt[, 1] ~ leslie_salt[, 4] + leslie_salt[, 6] + leslie_salt[, 7 ...; observations 2, 4 and 9 are labelled.]

Assessing Multicollinearity

A key issue in interpreting the regression variate is the correlation among the independent variables.

Our task in a regression analysis includes the following:

1. Assess the degree of multicollinearity
2. Determine its impact on results
3. Apply the necessary remedies if needed

Assess the degree of multicollinearity

• The simplest and most obvious way: identify collinearity in the correlation matrix. Check for correlations above 90%.

• A direct measure of multicollinearity is tolerance (1/VIF): the amount of variability of the selected independent variable not explained by the other independent variables. Computation:
  • Take each independent variable in turn and treat it as the dependent variable. Regress it on the other independent variables and compute R².
  • Tolerance is then 1 - R².

• For example, if the other variables explain 25% of an independent variable, then the tolerance of this variable is 75%. Tolerance should be more than 10%.

> 1/vif(model)
leslie_salt[, 4] leslie_salt[, 6] leslie_salt[, 7] leslie_salt[, 8]
       0.8081325        0.9959058        0.7650806        0.7715437
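The computation described above can be sketched in base R by running one auxiliary regression per independent variable. The predictors below come from the built-in mtcars data and are only a stand-in; vif() from «car» should give the reciprocal of these tolerances.

```r
# Stand-in predictor set from a built-in data set
X <- mtcars[, c("wt", "hp", "qsec", "drat")]

# For each predictor: regress it on the others, tolerance = 1 - R^2
tolerance <- sapply(names(X), function(v) {
  aux <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  1 - summary(aux)$r.squared
})

vif_by_hand <- 1 / tolerance
tolerance
```

A tolerance below 0.10 (VIF above 10) would signal a multicollinearity problem under the threshold given above.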


Further…

• See http://www.statmethods.net/stats/rdiagnostics.html for diagnostic tests with R.


Partial Correlation

• A partial correlation coefficient is a way of expressing the unique relationship between the criterion and a predictor. Partial correlation represents the correlation between the criterion and a predictor after common variance with other predictors has been removed from both the criterion and the predictor of interest.

# t statistic of each coefficient
t.values <- model$coeff / sqrt(diag(vcov(model)))
# magnitude of each partial correlation, recovered from t and the residual df
partcorr <- sqrt((t.values^2) / ((t.values^2) + model$df.residual))
partcorr

leslie_salt[, 4] leslie_salt[, 6] leslie_salt[, 7] leslie_salt[, 8]
       0.6562662        0.8043296        0.6043579        0.5740840
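The t-value shortcut can be checked against the definition: residualize both the criterion and the predictor on the other predictors, then correlate the two sets of residuals. This sketch uses mtcars as a stand-in data set and restores the sign that the square root drops.

```r
# Stand-in model on a built-in data set (not the Leslie Salt data)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Shortcut: partial correlations from t values (intercept dropped, sign kept)
t_vals         <- coef(summary(fit))[-1, "t value"]
partial_from_t <- sign(t_vals) * sqrt(t_vals^2 / (t_vals^2 + df.residual(fit)))

# Definition, for the predictor wt:
# remove the variance shared with hp and qsec from both mpg and wt
res_y      <- resid(lm(mpg ~ hp + qsec, data = mtcars))
res_x      <- resid(lm(wt  ~ hp + qsec, data = mtcars))
partial_wt <- cor(res_y, res_x)
```

The two routes agree exactly for wt, which is a useful sanity check when applying the shortcut to a new model.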


Part (Semi-partial) Correlation

• A semipartial correlation coefficient represents the correlation between the criterion and a predictor that has been residualized with respect to all other predictors in the equation. Note that the criterion remains unaltered in the semipartial. Only the predictor is residualized. After removing variance that the predictor has in common with other predictors, the semipartial expresses the correlation between the residualized predictor and the unaltered criterion. An important advantage of the semipartial is that the denominator of the coefficient (the total variance of the criterion, Y) remains the same no matter which predictor is being examined. This makes the semipartial very interpretable. The square of the semipartial can be interpreted as the proportion of the criterion variance associated uniquely with the predictor. It is also possible to use the semipartial to fully deconstruct the variance components in a regression analysis.
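A short base-R sketch of the semipartial, again with mtcars standing in for real project data. Only the predictor is residualized, and the squared semipartial equals the drop in R² when that predictor is removed from the full model.

```r
# Stand-in models on a built-in data set
full    <- lm(mpg ~ wt + hp + qsec, data = mtcars)
reduced <- lm(mpg ~ hp + qsec, data = mtcars)

# Residualize only the predictor wt on the other predictors
res_x <- resid(lm(wt ~ hp + qsec, data = mtcars))

# Semipartial: unaltered criterion against the residualized predictor
semipartial_wt <- cor(mtcars$mpg, res_x)

# Proportion of criterion variance uniquely associated with wt
r2_drop <- summary(full)$r.squared - summary(reduced)$r.squared
```

Here semipartial_wt^2 and r2_drop are the same quantity computed two ways.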


Project (Step1):

Go to web page: http://luna.cas.usf.edu/~mbrannic/files/regression/Partial.html

Replicate the results there using a dataset of your own. Be creative in problem formulation. Data may be imaginary. Use at least 5 independent variables.


Comparing Regression Models

• In multiple regression, the hardest problem is deciding which variables to enter into the equation, even after checking assumptions and diagnostics such as multicollinearity.

• Adjusted R² is not a proper way of comparing models.
• Next we learn a better way.

Stepwise Regression

• Start with the most basic model. Pick your favourite independent variable and construct the model. Test it.

Recall the correlation matrix (price in logs):

           PRICE     COUNTY    SIZE      ELEVATION  SEWER     DATE      FLOOD     DISTANCE
PRICE      100.00%   -18.22%   -23.97%   35.18%     -39.12%   59.47%    -32.31%   9.33%
COUNTY     -18.22%   100.00%   -33.94%   47.52%     -5.00%    -36.98%   -55.18%   -74.22%
SIZE       -23.97%   -33.94%   100.00%   -20.95%    5.34%     -34.95%   10.89%    55.69%
ELEVATION  35.18%    47.52%    -20.95%   100.00%    -35.94%   -5.65%    -37.31%   -36.25%
SEWER      -39.12%   -5.00%    5.34%     -35.94%    100.00%   -15.15%   -11.31%   -15.87%
DATE       59.47%    -36.98%   -34.95%   -5.65%     -15.15%   100.00%   1.54%     4.44%
FLOOD      -32.31%   -55.18%   10.89%    -37.31%    -11.31%   1.54%     100.00%   42.33%
DISTANCE   9.33%     -74.22%   55.69%    -36.25%    -15.87%   4.44%     42.33%    100.00%

$$\mathrm{price} = \beta_0 + \beta_1\,\mathrm{date}$$

Call:
lm(formula = leslie_salt[, 1] ~ leslie_salt[, 6])

Residuals:
     Min       1Q   Median       3Q      Max
-1.12046 -0.34364  0.04853  0.39719  1.00081

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      3.322336   0.269975  12.306  4.9e-13 ***
leslie_salt[, 6] 0.018124   0.004257   4.257 0.000198 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5719 on 29 degrees of freedom
Multiple R-squared: 0.3846, Adjusted R-squared: 0.3634
F-statistic: 18.12 on 1 and 29 DF, p-value: 0.0001982

Our focus is the improvement in RSS, so we need the residual sum of squares. It is not given directly in the R summary report (it is given in SPSS), but anova() reports it.

> anova(m1)
Analysis of Variance Table

Response: leslie_salt[, 1]
                 Df Sum Sq Mean Sq F value    Pr(>F)
leslie_salt[, 6]  1 5.9282  5.9282  18.124 0.0001982 ***
Residuals        29 9.4858  0.3271
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

• Now let's add another variable, say SEWER, and assume we have done all the testing.

Call:
lm(formula = leslie_salt[, 1] ~ leslie_salt[, 6] + leslie_salt[, 5])

Residuals:
     Min       1Q   Median       3Q      Max
-1.21681 -0.21980  0.08597  0.29875  0.81520

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       3.442e+00  2.442e-01  14.093 3.07e-14 ***
leslie_salt[, 6]  1.643e-02  3.841e-03   4.278 0.000199 ***
leslie_salt[, 5] -1.105e-04  3.797e-05  -2.910 0.007013 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.51 on 28 degrees of freedom
Multiple R-squared: 0.5275, Adjusted R-squared: 0.4937
F-statistic: 15.63 on 2 and 28 DF, p-value: 2.766e-05

Analysis of Variance Table

Response: leslie_salt[, 1]
                 Df Sum Sq Mean Sq F value    Pr(>F)
leslie_salt[, 6]  1 5.9282  5.9282 22.7903 5.146e-05 ***
leslie_salt[, 5]  1 2.2024  2.2024  8.4671  0.007013 **
Residuals        28 7.2833  0.2601
---

How much improvement do we have?

Our aim is to check whether the improvement in RSS is statistically significant or not. Define

The numerator measures the average improvement as we add a new variable (we may add a bunch of new variables), and the statistic scales this improvement with respect to the original model. The degrees of freedom of the statistic are (1, degrees of freedom of old model).

In our case

So the new model is superior: the improvement is statistically significant.
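As a worked check, the comparison can be reproduced from the two residual sums of squares reported by anova() earlier (9.4858 on 29 degrees of freedom for the DATE-only model, 7.2833 on 28 for the model with SEWER added). The sketch uses the standard partial F statistic, which is an assumption about the form intended here:

$$F = \frac{(RSS_{old} - RSS_{new}) / (df_{old} - df_{new})}{RSS_{new} / df_{new}}$$

In this standard form the denominator is the residual mean square of the extended model, so the reference distribution is F(1, 28).

```r
# Residual sums of squares and residual degrees of freedom from the anova tables
rss_old <- 9.4858; df_old <- 29   # log(price) ~ DATE
rss_new <- 7.2833; df_new <- 28   # log(price) ~ DATE + SEWER

# Standard partial F statistic (assumed form, see lead-in)
f_stat  <- ((rss_old - rss_new) / (df_old - df_new)) / (rss_new / df_new)
p_value <- pf(f_stat, df_old - df_new, df_new, lower.tail = FALSE)

f_stat
p_value
```

Up to rounding, f_stat and p_value match the F value and Pr(>F) that the second anova table reports for leslie_salt[, 5]; calling anova() on the two fitted models should give the same comparison directly.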


Back to the moderator effect.

To test the moderator effect, we use

as the simple model and

as the extended model and then decide accordingly.
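One way to carry out this comparison in R is sketched below. It assumes the simple model contains X1 and X2 and the extended model adds the product X1X2, following the earlier slide on adding a moderator effect; the data are simulated and all names are illustrative.

```r
# Hypothetical simulated data with a genuine moderator effect
set.seed(5)
n  <- 150
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.5 * x2 + 0.4 * x1 * x2 + rnorm(n)

simple   <- lm(y ~ x1 + x2)            # assumed simple model
extended <- lm(y ~ x1 + x2 + x1:x2)    # assumed extended model with X1X2

# F test on the improvement in RSS between the nested models
cmp           <- anova(simple, extended)
f_interaction <- cmp[["F"]][2]
p_interaction <- cmp[["Pr(>F)"]][2]
cmp
```

A small p_interaction supports keeping the product term. With a single added term, this F statistic equals the square of the t value on `x1:x2` in summary(extended).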


Mediation


Project (Step2,3 and 4):

• Find the best regression equation for your Project.
• Test moderator effects.
• Test mediation effects.