Multiple Linear Regression

Page 1: Multiple Linear Regression

REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA COLLIN TOL ZPP
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT Satisfaction
  /METHOD=ENTER ProductQuality EcommerceActivity ComplaintResolution SalesforceImage ProductLine FirmSize
  /SCATTERPLOT=(*ZRESID ,*ZPRED) (*SRESID ,*ZPRED)
  /RESIDUALS HIST(ZRESID) NORM(ZRESID).

Regression

Notes

Output Created                                      12-Sep-2012 21:27:57
Comments
Input
  Active Dataset                                    DataSet1
  Filter                                            <none>
  Weight                                            <none>
  Split File                                        <none>
  N of Rows in Working Data File                    100
Missing Value Handling
  Definition of Missing                             User-defined missing values are treated as missing.
  Cases Used                                        Statistics are based on cases with no missing values for any variable used.
Syntax                                              The REGRESSION command listed above.
Resources
  Processor Time                                    0:00:02.200
  Elapsed Time                                      0:00:02.276
  Memory Required                                   3276 bytes
  Additional Memory Required for Residual Plots     1128 bytes

Page 2: Multiple Linear Regression

Descriptive Statistics

                         Mean     Std. Deviation    N
Satisfaction             6.918    1.1918            100
Product Quality          7.810    1.3963            100
E-commerce Activity      3.672    .7005             100
Complaint Resolution     5.442    1.2084            100
Salesforce Image         5.123    1.0723            100
Product Line             5.805    1.3153            100
Firm Size                .51      .502              100

Correlations (columns: Satisfaction, Product Quality, E-commerce Activity)

                           Satisfaction    Product Quality    E-commerce Activity
Pearson Correlation
  Satisfaction             1.000           .486               .283
  Product Quality          .486            1.000              -.137
  E-commerce Activity      .283            -.137              1.000
  Complaint Resolution     .603            .106               .140
  Salesforce Image         .500            -.152              .792
  Product Line             .551            .477               -.053
  Firm Size                .229            -.101              .061
Sig. (1-tailed)
  Satisfaction             .               .000               .002
  Product Quality          .000            .                  .087
  E-commerce Activity      .002            .087               .
  Complaint Resolution     .000            .146               .082
  Salesforce Image         .000            .066               .000
  Product Line             .000            .000               .301
  Firm Size                .011            .159               .273

N = 100 for every pair of variables.

The descriptive statistics indicate that, apart from firm size (a two-category variable), the standard deviations are small relative to the means. The data are therefore not very volatile, and a transformation is probably not required. The means for product line (5.805), salesforce image (5.123) and complaint resolution (5.442) look acceptable.
The correlation matrix indicates that none of the significant correlations is above the cutoff of .8, so collinearity should not be a problem. The largest correlation between two predictors is .792 (e-commerce activity and salesforce image), which is just under the cutoff. The highest correlation with satisfaction is complaint resolution at .603, and it is significant.
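The cutoff check described above can be sketched in a few lines of Python. The correlations are copied from the Correlations output; the dictionary layout and the names `predictor_corr` and `CUTOFF` are only illustrative, not part of the SPSS run.

```python
# Pearson correlations among the six predictors, copied from the Correlations output.
predictor_corr = {
    ("Product Quality", "E-commerce Activity"): -0.137,
    ("Product Quality", "Complaint Resolution"): 0.106,
    ("Product Quality", "Salesforce Image"): -0.152,
    ("Product Quality", "Product Line"): 0.477,
    ("Product Quality", "Firm Size"): -0.101,
    ("E-commerce Activity", "Complaint Resolution"): 0.140,
    ("E-commerce Activity", "Salesforce Image"): 0.792,
    ("E-commerce Activity", "Product Line"): -0.053,
    ("E-commerce Activity", "Firm Size"): 0.061,
    ("Complaint Resolution", "Salesforce Image"): 0.230,
    ("Complaint Resolution", "Product Line"): 0.561,
    ("Complaint Resolution", "Firm Size"): 0.089,
    ("Salesforce Image", "Product Line"): -0.061,
    ("Salesforce Image", "Firm Size"): 0.289,
    ("Product Line", "Firm Size"): -0.177,
}

CUTOFF = 0.8  # rule-of-thumb cutoff used in the note above

# Pairs whose absolute correlation exceeds the cutoff (expected to be empty here).
flagged = [pair for pair, r in predictor_corr.items() if abs(r) > CUTOFF]

# The strongest predictor-predictor correlation, whatever its sign.
strongest_pair, strongest_r = max(predictor_corr.items(), key=lambda kv: abs(kv[1]))

print("pairs above cutoff:", flagged)
print("strongest pair:", strongest_pair, strongest_r)
```

No pair is flagged, but the strongest pair sits close to the cutoff, which is why the VIF and tolerance values are still worth reading.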
Page 3: Multiple Linear Regression

Correlations (columns: Complaint Resolution, Salesforce Image)

                           Complaint Resolution    Salesforce Image
Pearson Correlation
  Satisfaction             .603                    .500
  Product Quality          .106                    -.152
  E-commerce Activity      .140                    .792
  Complaint Resolution     1.000                   .230
  Salesforce Image         .230                    1.000
  Product Line             .561                    -.061
  Firm Size                .089                    .289
Sig. (1-tailed)
  Satisfaction             .000                    .000
  Product Quality          .146                    .066
  E-commerce Activity      .082                    .000
  Complaint Resolution     .                       .011
  Salesforce Image         .011                    .
  Product Line             .000                    .272
  Firm Size                .189                    .002

N = 100 for every pair of variables.
Page 4: Multiple Linear Regression

Correlations (columns: Product Line, Firm Size)

                           Product Line    Firm Size
Pearson Correlation
  Satisfaction             .551            .229
  Product Quality          .477            -.101
  E-commerce Activity      -.053           .061
  Complaint Resolution     .561            .089
  Salesforce Image         -.061           .289
  Product Line             1.000           -.177
  Firm Size                -.177           1.000
Sig. (1-tailed)
  Satisfaction             .000            .011
  Product Quality          .000            .159
  E-commerce Activity      .301            .273
  Complaint Resolution     .000            .189
  Salesforce Image         .272            .002
  Product Line             .               .039
  Firm Size                .039            .

N = 100 for every pair of variables.

Variables Entered/Removed(b)

Model    Variables Entered                                      Variables Removed    Method
1        Firm Size, E-commerce Activity, Complaint              .                    Enter
         Resolution, Product Quality, Product Line,
         Salesforce Image(a)

a. All requested variables entered.

b. Dependent Variable: Satisfaction

Model Summary(b)

Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
1        .895(a)    .801        .788                 .5483

a. Predictors: (Constant), Firm Size, E-commerce Activity, Complaint Resolution, Product Quality, Product Line, Salesforce Image

b. Dependent Variable: Satisfaction

The model summary shows the following. 1) Multiple R (.895) is the correlation between the observed values of the outcome and the values predicted from the set of predictors. 2) R square is this correlation squared (.895^2 = .801) and is also called the coefficient of determination. It indicates that about 80% of the total variation in satisfaction is explained by the regression model consisting of Product Quality, E-commerce Activity, Complaint Resolution, Salesforce Image, Product Line and Firm Size.
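As a quick check of the arithmetic, R square can be recovered two ways: by squaring Multiple R, or as the regression sum of squares divided by the total sum of squares. The sketch below uses the rounded figures from the Model Summary and ANOVA output; the variable names are illustrative only.

```python
# Values copied from the Model Summary and ANOVA output (rounded as printed by SPSS).
multiple_r = 0.895
ss_regression = 112.669
ss_total = 140.628

# Route 1: square the multiple correlation coefficient.
r_squared_from_r = multiple_r ** 2

# Route 2: share of the total sum of squares explained by the regression.
r_squared_from_ss = ss_regression / ss_total

print(round(r_squared_from_r, 3))   # 0.801
print(round(r_squared_from_ss, 3))  # 0.801
```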
Adjusted R square takes into account the number of independent variables in the regression equation and the sample size. The difference (R^2 - adjusted R^2) is the SHRINKAGE, here .801 - .788 = .013. The adjusted value tells us that 78.8% of the variance in satisfaction would be accounted for if the model had been derived from the population from which the sample was taken.
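The adjustment can be reproduced with the usual formula, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the sample size and p the number of predictors. A minimal sketch, assuming the rounded R square of .801 from the output (the function name is hypothetical):

```python
def adjusted_r_squared(r_squared: float, n: int, p: int) -> float:
    """Adjusted R^2 for a model with p predictors fitted to n cases."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)


r_squared = 0.801  # R Square from the Model Summary
n = 100            # cases used
p = 6              # predictors entered

adj = adjusted_r_squared(r_squared, n, p)
shrinkage = r_squared - adj

print(round(adj, 3))        # 0.788
print(round(shrinkage, 3))  # 0.013
```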
The standard error of the estimate is the expected standard deviation of the prediction error. It is the square root of the residual sum of squares divided by its degrees of freedom, n - p - 1: square root of (27.958 / 93) = .548. (Dividing by n - 2 = 98 applies only to a single predictor and would give .534, not the .5483 in the table.) This value is an estimate of the standard deviation of the actual dependent values around the regression line.
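The figure in the Model Summary can be reproduced from the ANOVA output. This is a small sketch using the printed residual sum of squares; the names are illustrative.

```python
import math

ss_residual = 27.958  # residual sum of squares from the ANOVA output
n = 100               # cases used
p = 6                 # predictors in the model

# Residual degrees of freedom for a model with an intercept and p predictors.
df_residual = n - p - 1

# Standard error of the estimate: root of the residual mean square.
see = math.sqrt(ss_residual / df_residual)

print(df_residual)    # 93
print(round(see, 4))  # 0.5483
```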
Page 5: Multiple Linear Regression

ANOVA(b)

Model 1        Sum of Squares    df    Mean Square    F         Sig.
Regression     112.669           6     18.778         62.464    .000(a)
Residual       27.958            93    .301
Total          140.628           99

a. Predictors: (Constant), Firm Size, E-commerce Activity, Complaint Resolution, Product Quality, Product Line, Salesforce Image

b. Dependent Variable: Satisfaction

Coefficients(a): unstandardized and standardized coefficients

Model 1                  B         Std. Error    Beta     t         Sig.
(Constant)               -1.250    .492                   -2.542    .013
Product Quality          .365      .046          .427     7.881     .000
E-commerce Activity      -.333     .135          -.196    -2.473    .015
Complaint Resolution     .300      .060          .304     4.994     .000
Salesforce Image         .701      .093          .631     7.507     .000
Product Line             .203      .061          .224     3.323     .001
Firm Size                .271      .123          .114     2.207     .030

a. Dependent Variable: Satisfaction

Coefficients(a): 95.0% Confidence Interval for B and Correlations

Model 1                  Lower Bound    Upper Bound    Zero-order    Partial    Part
(Constant)               -2.227         -.274
Product Quality          .273           .457           .486          .633       .364
E-commerce Activity      -.601          -.066          .283          -.248      -.114
Complaint Resolution     .181           .419           .603          .460       .231
Salesforce Image         .516           .887           .500          .614       .347
Product Line             .082           .325           .551          .326       .154
Firm Size                .027           .515           .229          .223       .102

a. Dependent Variable: Satisfaction

Coefficients(a): Collinearity Statistics

Model 1                  Tolerance    VIF
(Constant)
Product Quality          .727         1.375
E-commerce Activity      .341         2.935
Complaint Resolution     .576         1.736
Salesforce Image         .303         3.304
Product Line             .469         2.130
Firm Size                .798         1.253

a. Dependent Variable: Satisfaction

The hypotheses are H0: B1 = B2 = B3 = B4 = B5 = B6 = 0 and H1: at least one Bi is not 0. If the F test is significant, H0 is rejected in favor of H1. The ANOVA table provides a statistical test of the overall model fit in terms of the F ratio. The total sum of squares (140.628) is the squared error if only the mean of satisfaction is used to predict the dependent variable. Using the 6 independent variables reduces this error by (112.669 / 140.628) * 100 = 80.12%. The reduction is statistically significant, with an F ratio of 62.464 and a significance level of .000.
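The F ratio is the regression mean square divided by the residual mean square. The sketch below rebuilds it from the printed sums of squares and degrees of freedom; variable names are illustrative.

```python
# Sums of squares and sample details copied from the ANOVA output.
ss_regression = 112.669
ss_residual = 27.958
ss_total = 140.628
n = 100  # cases used
p = 6    # predictors

df_regression = p
df_residual = n - p - 1

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_ratio = ms_regression / ms_residual
error_reduction_pct = ss_regression / ss_total * 100

print(round(ms_regression, 3), round(ms_residual, 3))
print(round(f_ratio, 2))
print(round(error_reduction_pct, 2))
```

The result agrees with the F of 62.464 in the table up to rounding of the printed sums of squares.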
Degrees of freedom in the ANOVA table: regression df = p = 6, residual df = (n - p - 1) = 93, total df = (n - 1) = 99.

The zero-order correlation is the Pearson correlation between each independent variable and the dependent variable. The Pearson correlation between product line and satisfaction is .551. The partial correlation between product line and satisfaction, controlling for all the other independent variables, is .326. The part correlation between product line and satisfaction is .154. It represents the unique relationship between the predictor and the dependent variable: its square (.154^2 = .024) is the share of the variance in satisfaction that only product line accounts for.
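The partial and part correlations can be recovered from each predictor's t value, using the standard identities partial = t / sqrt(t^2 + df) and part = t * sqrt((1 - R^2) / df), with df the residual degrees of freedom. This is a sketch from the printed t values, so it matches the table only up to rounding; the names are illustrative.

```python
import math

df_residual = 93                # n - p - 1
r_squared = 112.669 / 140.628   # regression SS / total SS from the ANOVA output

# t values copied from the Coefficients output.
t_values = {
    "Product Quality": 7.881,
    "E-commerce Activity": -2.473,
    "Complaint Resolution": 4.994,
    "Salesforce Image": 7.507,
    "Product Line": 3.323,
    "Firm Size": 2.207,
}

partial = {}
part = {}
for name, t in t_values.items():
    # Partial correlation: predictor vs. outcome with the other predictors removed from both.
    partial[name] = t / math.sqrt(t ** 2 + df_residual)
    # Part (semipartial) correlation: other predictors removed from the predictor only.
    part[name] = t * math.sqrt((1.0 - r_squared) / df_residual)

for name in t_values:
    print(f"{name:22s} partial={partial[name]:.3f} part={part[name]:.3f}")
```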
VIF and tolerance are used to diagnose multicollinearity. None of the independent variables has a VIF above 10 or a tolerance below .10, indicating that multicollinearity is not an issue in this regression model. VIF is the reciprocal of tolerance: VIF = 1 / TOL.
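The reciprocal relationship and the two rules of thumb can be checked directly against the printed tolerances. A minimal sketch follows; because the tolerances are rounded to three decimals, the computed VIF values differ slightly from the table in the third decimal.

```python
# Tolerance values copied from the Collinearity Statistics output.
tolerance = {
    "Product Quality": 0.727,
    "E-commerce Activity": 0.341,
    "Complaint Resolution": 0.576,
    "Salesforce Image": 0.303,
    "Product Line": 0.469,
    "Firm Size": 0.798,
}

# VIF is the reciprocal of tolerance.
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

# Rule-of-thumb check from the note above: VIF above 10 or tolerance below .10.
problem_predictors = [
    name for name in tolerance if vif[name] > 10 or tolerance[name] < 0.10
]

for name in tolerance:
    print(f"{name:22s} TOL={tolerance[name]:.3f} VIF={vif[name]:.3f}")
print("flagged:", problem_predictors)
```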
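Use the standardized estimates (Beta) to write the regression model. Because they are unit-free, they also show the relative importance of the predictors:
Z(Satisfaction) = .427 Z(Product Quality) - .196 Z(E-commerce Activity) + .304 Z(Complaint Resolution) + .631 Z(Salesforce Image) + .224 Z(Product Line) + .114 Z(Firm Size)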
The standard error of a regression coefficient is an estimate of how much that coefficient will vary between samples of the same size taken from the same population. That is, the s.e. of product line (.061) is the standard deviation of its estimate across multiple samples. A smaller s.e. implies a more reliable estimate and a narrower confidence interval.
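To make the link between the standard error, the t value and the confidence interval concrete, here is a sketch for product line. The critical value 1.986 is an assumption entered by hand (the two-tailed .05 critical t for 93 degrees of freedom, about 1.9858); the rest comes from the Coefficients output.

```python
# Product Line row of the Coefficients output.
b = 0.203          # unstandardized coefficient
std_error = 0.061  # standard error of b

# Assumption: two-tailed 5% critical value of t with 93 df, typed in by hand.
T_CRITICAL = 1.986

# t statistic: how many standard errors the coefficient lies from zero.
t_value = b / std_error

# 95% confidence interval: coefficient plus or minus t_critical standard errors.
lower = b - T_CRITICAL * std_error
upper = b + T_CRITICAL * std_error

print(f"t = {t_value:.3f}")
print(f"95% CI = ({lower:.3f}, {upper:.3f})")
```

The small differences from the table (t = 3.323, interval .082 to .325) come from SPSS working with unrounded values of B and the standard error.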
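For a one standard deviation increase in e-commerce activity, satisfaction decreases on average by .196 standard deviations, holding the other predictors constant (Beta = -.196). In original units, a one-unit increase in e-commerce activity lowers satisfaction by .333 units (B = -.333).

For a large firm as compared to a small firm, satisfaction is on average .271 units higher, holding the other predictors constant (B = .271). The corresponding standardized coefficient is .114.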
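The two sets of coefficients are linked by Beta = B * (SD of the predictor / SD of the dependent variable). The sketch below recomputes each Beta from the unstandardized B and the standard deviations in the Descriptive Statistics output; it is an illustration from rounded figures, so agreement is to about three decimals.

```python
# Standard deviation of the dependent variable (Satisfaction).
sd_satisfaction = 1.1918

# Unstandardized B (Coefficients output) and predictor SD (Descriptive Statistics output).
predictors = {
    "Product Quality": (0.365, 1.3963),
    "E-commerce Activity": (-0.333, 0.7005),
    "Complaint Resolution": (0.300, 1.2084),
    "Salesforce Image": (0.701, 1.0723),
    "Product Line": (0.203, 1.3153),
    "Firm Size": (0.271, 0.502),
}

# Beta rescales B so that both predictor and outcome are in standard deviation units.
beta = {
    name: b * sd_x / sd_satisfaction for name, (b, sd_x) in predictors.items()
}

for name, value in beta.items():
    print(f"{name:22s} Beta = {value:.3f}")
```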
Page 6: Multiple Linear Regression

Collinearity Diagnostics(a): Eigenvalue, Condition Index and Variance Proportions (first three columns)

                                                      Variance Proportions
Model    Dimension    Eigenvalue    Condition Index    (Constant)    Product Quality    E-commerce Activity
1        1            6.411         1.000              .00           .00                .00
         2            .454          3.759              .00           .00                .00
         3            .069          9.641              .00           .03                .05
         4            .036          13.285             .02           .25                .00
         5            .014          21.129             .15           .05                .02
         6            .009          25.994             .68           .62                .02
         7            .007          31.388             .15           .04                .91

a. Dependent Variable: Satisfaction

Collinearity Diagnostics(a): Variance Proportions (remaining columns)

                      Variance Proportions
Model    Dimension    Complaint Resolution    Salesforce Image    Product Line    Firm Size
1        1            .00                     .00                 .00             .01
         2            .00                     .00                 .00             .77
         3            .02                     .05                 .08             .06
         4            .36                     .00                 .01             .01
         5            .44                     .03                 .79             .04
         6            .12                     .16                 .10             .02
         7            .06                     .76                 .01             .10

a. Dependent Variable: Satisfaction

Residuals Statistics(a)

                                      Minimum    Maximum    Mean      Std. Deviation    N
Predicted Value                       4.546      9.550      6.918     1.0668            100
Std. Predicted Value                  -2.223     2.467      .000      1.000             100
Standard Error of Predicted Value     .094       .237       .142      .030              100
Adjusted Predicted Value              4.491      9.469      6.919     1.0705            100
Residual                              -1.3412    1.0493     .0000     .5314             100
Std. Residual                         -2.446     1.914      .000      .969              100
Stud. Residual                        -2.518     1.987      -.001     1.005             100
Deleted Residual                      -1.4215    1.1317     -.0011    .5720             100
Stud. Deleted Residual                -2.595     2.020      -.004     1.017             100
Mahal. Distance                       1.912      17.483     5.940     3.056             100
Cook's Distance                       .000       .074       .011      .017              100
Centered Leverage Value               .019       .177       .060      .031              100

a. Dependent Variable: Satisfaction

Charts

From the collinearity diagnostics table it is evident that, under the variance proportion columns, most of the variables load on different dimensions. For example, product quality has 62% of its variance on dimension 6, product line has 79% on dimension 5, and firm size has 77% on dimension 2. This indicates that there is no serious multicollinearity in the data. The dimensions that carry these loadings have relatively small eigenvalues and large variance proportions. The one dimension worth watching is dimension 7 (condition index 31.388), where e-commerce activity (.91) and salesforce image (.76) both load, in line with their correlation of .792.
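The condition index of a dimension is the square root of the largest eigenvalue divided by that dimension's eigenvalue. The sketch below recomputes it from the printed eigenvalues and flags dimensions above 30, a commonly quoted threshold that is an assumption here, not something stated in the output. Because the small eigenvalues are rounded to three decimals, the last few indices match the table only approximately.

```python
import math

# Eigenvalues for dimensions 1 to 7, copied from the Collinearity Diagnostics output.
eigenvalues = [6.411, 0.454, 0.069, 0.036, 0.014, 0.009, 0.007]

largest = max(eigenvalues)

# Condition index: sqrt(largest eigenvalue / eigenvalue of the dimension).
condition_index = [math.sqrt(largest / ev) for ev in eigenvalues]

# Assumption: 30 as the rule-of-thumb threshold for a dimension that deserves a closer look.
THRESHOLD = 30.0
high_dimensions = [
    dim for dim, ci in enumerate(condition_index, start=1) if ci > THRESHOLD
]

for dim, ci in enumerate(condition_index, start=1):
    print(f"dimension {dim}: condition index = {ci:.3f}")
print("dimensions above threshold:", high_dimensions)
```

For a flagged dimension, the next step is to read across its row of variance proportions and see whether two or more coefficients load heavily on it.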
Page 7: Multiple Linear Regression

[Figure: Histogram. Dependent Variable: Satisfaction. x-axis: Regression Standardized Residual (-3 to 2); y-axis: Frequency (0 to 15). Mean = -4.90E-15, Std. Dev. = 0.969, N = 100]

The histogram looks like a normal distribution. For the HBAT data the distribution of errors is roughly normal, though there is a deficiency of residuals exactly at 0.
Page 8: Multiple Linear Regression

[Figure: Normal P-P Plot of Regression Standardized Residual. Dependent Variable: Satisfaction. x-axis: Observed Cum Prob (0.0 to 1.0); y-axis: Expected Cum Prob (0.0 to 1.0)]

The normal P-P plot also shows any deviation from normality. The straight line in this diagram represents a normal distribution, while the dots represent the observed residuals. In a perfectly normally distributed data set all the residuals lie on the line. This is pretty much what we see for the HBAT data.
Page 9: Multiple Linear Regression

[Figure: Scatterplot. Dependent Variable: Satisfaction. x-axis: Regression Standardized Predicted Value (-3 to 3); y-axis: Regression Standardized Residual (-3 to 2)]

The scatterplot of ZRESID (y-axis) against ZPRED (x-axis) should look like a random array of dots evenly dispersed around zero. This graph is used to check for homoscedasticity, i.e. equal error variances, Var(u) = sigma squared. In our figure most of the residual values are distributed evenly around zero. This implies we have homoscedasticity, which is one of the major assumptions of regression. Heteroskedasticity will generally produce a funnel-shaped pattern.
Page 10: Multiple Linear Regression

[Figure: Scatterplot. Dependent Variable: Satisfaction. x-axis: Regression Standardized Predicted Value (-3 to 3); y-axis: Regression Studentized Residual (-3 to 2)]