QTIA Report


Quantitative Techniques in Analysis: Multiple Regression Model


IBAD UR REHMAN - 5873

SAAD AHMED - 5810

SHOAIB SHAIKH - 5785

MBA (M)

MULTIPLE REGRESSION MODEL – PREDICTION OF MATH ACHIEVEMENTS

SUBMITTED TO: USMAN ALI WARRAICH


Objectives of Multiple Regression Model

Multiple regression models are appropriate for two types of research problem:

1. Prediction

2. Explanation

In our regression model we will predict a particular dependent variable with the help of some predictors (independent variables). After that we will explain the output by examining the regression coefficients: their magnitudes, signs and statistical significance. By doing this we can determine the relative importance of each individual variable in the prediction of the dependent variable and the nature of the relationships between the independent variables and the dependent variable.

Research Model

The multiple regression problem selected for this assignment is:

Prediction of math achievement by a combination of variables: motivation, competence, pleasure, grades in high school, parents' education, and gender

That is, we will evaluate how well these variables predict and explain the dependent variable, math achievement.

Dependent and Independent Variables

Multiple linear regression is a dependence technique; therefore we must specify which variable is dependent and which are independent. Six predictors are initially included in the research problem (five scale variables and one dichotomous nominal variable). They are:

1. Motivation: A scale variable measuring the motivation level of students. The values lie between 1 and 4.

2. Pleasure: Another scale variable, measuring the personal pleasure level of students. It captures how much pleasure a student gets in life. The values lie between 1 and 4.

3. Grades in High School: The grades obtained by the cases, scaled from 8 to 1, indicating grades A to D.

4. Competence: Another subjective predictor, indicating the ability, aptitude or IQ of the students, again scored from 1 to 4.

5. Parent’s Education: A scale variable measuring the education level of the students' parents on a scale from 2 to 10, where 2 indicates the lowest education level (less than high school) and 10 the highest (PhD).

6. Gender: A nominal variable that is dichotomous (having two categories) can be included as a predictor in a multiple regression model. Here gender has two categories, male and female, coded ‘0’ and ‘1’ respectively.


The dependent variable, which is to be predicted here, is the math achievement test score. We will see how well math achievement can be predicted from a combination of these predictors. The table from the test run showing the variables entered is given below.

Variables Entered/Removed (b)

Model  Variables Entered                                Variables Removed  Method
1      competence scale, gender, parents' education,    .                  Enter
       pleasure scale, grades in h.s.,
       motivation scale (a)

a. All requested variables entered.
b. Dependent Variable: math achievement test

Sample Size

Sample size is the most influential element under the researcher's control in a research problem, and it plays an important role in determining the statistical power of an analysis. Our research problem includes 75 cases for each variable.

Assumptions in Multiple Regression Analysis

Multiple regression analysis is based on certain assumptions. The assumptions tested for this model are:

1. Normality

2. Multicollinearity

1) Normality

We have to check that the residuals are normally distributed, because the calculation of confidence intervals and the various significance tests for the coefficients are all based on the assumption of normally distributed errors. To check whether the residuals follow a normal distribution, we consider the tests of normality.


Tests of Normality

                        Kolmogorov-Smirnov (a)      Shapiro-Wilk
                        Statistic   df   Sig.       Statistic   df   Sig.
math achievement test   .071        75   .200*      .966        75   .040

a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.

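The null hypothesis of these tests is that the data follow a normal distribution; if the significance value is greater than 0.05, normality is not rejected. In the Kolmogorov-Smirnov test the significance value (.200) is greater than 0.05, so on this test we can conclude that the errors are normally distributed. The Shapiro-Wilk significance value (.040) is slightly below 0.05, so the two tests do not fully agree and the normality assumption holds only approximately.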

2) Multicollinearity

Multicollinearity (or collinearity) occurs when there are high inter-correlations among some set of the predictor variables. In other words, multicollinearity happens when two or more predictors contain much of the same information. For checking multicollinearity we have to consider the collinearity statistics of the following table.

Collinearity Statistics

Model 1               Tolerance   VIF
(Constant)
gender                .811        1.233
grades in h.s.        .747        1.339
competence scale      .549        1.821
motivation scale      .641        1.561
pleasure scale        .776        1.289
parents' education    .796        1.256

Tolerance and VIF give the same information (Tolerance = 1 / VIF). They tell us whether there is multicollinearity. If the tolerance value is less than 0.5, or equivalently VIF is greater than 2, there is probably a problem with multicollinearity. Here the tolerance values for all predictors are greater than 0.5, so collinearity among the predictors is low. Competence scale (tolerance .549) is quite near the threshold and is possibly correlated with the motivation scale.
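As a quick check of the relation Tolerance = 1 / VIF, the following Python sketch recomputes VIF from the tolerance values in the table above and applies the cut-offs used in this report. It is only an illustration: the variable and function names are our own, and the figures are copied by hand from the SPSS output rather than recomputed from the data.

```python
# Tolerance values copied by hand from the Collinearity Statistics table.
tolerance = {
    "gender": 0.811,
    "grades in h.s.": 0.747,
    "competence scale": 0.549,
    "motivation scale": 0.641,
    "pleasure scale": 0.776,
    "parents' education": 0.796,
}

# Cut-offs used in this report (assumption: other texts use looser ones).
TOLERANCE_MIN = 0.5
VIF_MAX = 2.0


def vif_from_tolerance(tol):
    """VIF is the reciprocal of tolerance."""
    return 1.0 / tol


vif = {name: vif_from_tolerance(tol) for name, tol in tolerance.items()}

# A predictor is flagged if it breaks either cut-off.
flagged = [
    name
    for name, tol in tolerance.items()
    if tol < TOLERANCE_MIN or vif[name] > VIF_MAX
]

for name, tol in tolerance.items():
    print(f"{name:20s} tolerance={tol:.3f}  VIF={vif[name]:.3f}")
print("Predictors flagged for multicollinearity:", flagged)
```

With these values no predictor is flagged, and competence scale has the largest VIF of the six.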

After checking these assumptions on which the model is presumed to be based, we can proceed to the further tests and results to assess our model.


TEST RUN 1

Statistical Significance of the Model

The null hypothesis of the ANOVA table is, “There is no relationship between the predictors and the outcome”.

The significance value (p-value) in the ANOVA table given below indicates whether the combination of the independent variables significantly predicts the dependent variable. If the p-value is less than 0.05, the model is considered statistically significant and the null hypothesis is rejected.

ANOVA (b)

Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    1523.461         6    253.910       9.284    .000 (a)
Residual      1750.333         64   27.349
Total         3273.794         70

a. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s., competence scale
b. Dependent Variable: math achievement test

Here the Sig. value is less than 0.05, which shows that the above predictors are contributing to the model and predicting the math achievement results significantly. The ANOVA table shows that F = 9.284 and is significant. This indicates that the combination of the predictors significantly predicts math achievement.
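The F value in the ANOVA table is the ratio of the regression mean square to the residual mean square, where each mean square is a sum of squares divided by its degrees of freedom. The short Python sketch below reproduces it from the figures in the table. The function name is our own, and the numbers are copied by hand from the SPSS output.

```python
def f_statistic(ss_regression, df_regression, ss_residual, df_residual):
    """F = (SS_regression / df_regression) / (SS_residual / df_residual)."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual


# Values from the Test Run 1 ANOVA table.
f_run1 = f_statistic(
    ss_regression=1523.461, df_regression=6,
    ss_residual=1750.333, df_residual=64,
)
print(f"F for Test Run 1: {f_run1:.3f}")
```

The Sig. value itself comes from the F distribution with 6 and 64 degrees of freedom; it is not recomputed here.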

Model Summary Table

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .682   .465       .415                5.22962

a. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s., competence scale

The Model Summary table shows that the multiple correlation coefficient (R), using all the predictors, is .682, R square is .465 and the adjusted R square is .415, meaning that about 46.5% of the variance in math achievement can be predicted from gender, competence and the other variables combined. Note that the adjusted R square is lower than the unadjusted value. This is because it takes into account the sample size and the number of independent variables in the regression model.
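To make the adjustment concrete, the sketch below recomputes R square and adjusted R square from the sums of squares in the Test Run 1 ANOVA table. The function names are our own, and the number of cases is taken as the total df plus one (70 + 1 = 71), which is an inference from the table rather than a figure stated in the output.

```python
def r_squared(ss_regression, ss_total):
    """Share of the total variation explained by the regression."""
    return ss_regression / ss_total


def adjusted_r_squared(r2, n_cases, n_predictors):
    """Penalise R square for the number of predictors relative to the cases."""
    return 1.0 - (1.0 - r2) * (n_cases - 1) / (n_cases - n_predictors - 1)


# Values from the Test Run 1 ANOVA table.
ss_regression = 1523.461
ss_total = 3273.794
df_total = 70
n_cases = df_total + 1  # assumption: cases used = total df + 1
n_predictors = 6

r2 = r_squared(ss_regression, ss_total)
adj_r2 = adjusted_r_squared(r2, n_cases, n_predictors)
print(f"R square = {r2:.3f}, adjusted R square = {adj_r2:.3f}")
```

Because the factor (n - 1) / (n - k - 1) is greater than one whenever there is at least one predictor, the adjusted value is always below the unadjusted one.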


Coefficients Table

This is a very important table in the multiple regression analysis. Here the individual predictors are assessed. The results in this table show which variables are contributing significantly to the model and which are not. It also shows the relative importance of the predictors and the type of relationship each has with the dependent variable.

The null hypothesis for each row of this table is “B (beta) = 0”. We do not reject the null hypothesis if the Sig. value is greater than 0.05.

Coefficients (a)

                      Unstandardized         Standardized
                      Coefficients           Coefficients
Model 1               B         Std. Error   Beta          t        Sig.
(Constant)            -7.859    4.546                      -1.729   .089
gender                -3.729    1.382        -.274         -2.698   .009
grades in h.s.        1.966     .455         .457          4.325    .000
competence scale      .360      1.253        .035          .287     .775
motivation scale      1.701     1.207        .161          1.410    .164
pleasure scale        .852      1.180        .075          .722     .473
parents' education    .592      .314         .193          1.883    .064

a. Dependent Variable: math achievement test

This table has several columns to explain. The t value and the Sig. value opposite each independent variable indicate whether that variable is contributing significantly to the equation for predicting math achievement from the whole set of predictors. Thus, high school grades and gender are the only variables in this table that add significantly to the prediction: their Sig. values are less than 0.05, so the null hypothesis is rejected for them.
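Each t value in the Coefficients table is the unstandardized coefficient B divided by its standard error. The sketch below recomputes the t values for the six predictors of Test Run 1 and lists those whose reported Sig. value is below 0.05. The dictionary layout and names are our own, the figures are copied by hand from the table, and small differences from the printed t values are expected because B and the standard errors are rounded to three decimals.

```python
ALPHA = 0.05

# (B, Std. Error, reported t, reported Sig.) from the Test Run 1 Coefficients table.
coefficients = {
    "gender": (-3.729, 1.382, -2.698, 0.009),
    "grades in h.s.": (1.966, 0.455, 4.325, 0.000),
    "competence scale": (0.360, 1.253, 0.287, 0.775),
    "motivation scale": (1.701, 1.207, 1.410, 0.164),
    "pleasure scale": (0.852, 1.180, 0.722, 0.473),
    "parents' education": (0.592, 0.314, 1.883, 0.064),
}

# t = B / Std. Error for every predictor.
t_values = {name: b / se for name, (b, se, _, _) in coefficients.items()}

# Predictors for which the null hypothesis B = 0 is rejected at the 5% level.
significant = [name for name, (_, _, _, sig) in coefficients.items() if sig < ALPHA]

for name, (_, _, reported_t, sig) in coefficients.items():
    print(f"{name:20s} t={t_values[name]:7.3f} (reported {reported_t:7.3f})  Sig.={sig:.3f}")
print("Significant at 0.05:", significant)
```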

The standardized Beta coefficients in the table show the relative importance of each predictor in the model. Grades in high school, with the highest Beta value of .457, contributes the most to the research model, while competence scale, with a value of .035, contributes the least.

The signs of the coefficients show the type of relationship a predictor has with the predicted variable. Except for gender, all predictors show a positive relationship with the dependent variable (math achievement). Individual interpretation of the predictors is given later.

The magnitude of the unstandardized B coefficients shows how much change in the math achievement score would take place per unit change in each predictor, with the other predictors held constant. Here an additional one unit in high school grades would increase the math achievement score by 1.966. The other variables are interpreted later in the report.

According to the coefficients table, competence scale, pleasure scale, motivation scale and parents' education are not statistically significant. This means they are not contributing significantly to our research model; therefore, to increase the overall predictive capability of the model, i.e. to increase the F value, we eliminate the insignificant variables.

TEST RUN 2

Now we re-run the test after removing the predictor with the highest Sig. value, which is competence scale with a Sig. value of .775. The rest of the predictors are included.

Dependent and Independent variables

Variables Entered/Removed (b)

Model  Variables Entered                                Variables Removed  Method
1      parents' education, pleasure scale, gender,      Competence scale   Enter
       grades in h.s., motivation scale (a)

a. All requested variables entered.
b. Dependent Variable: math achievement test

Following are the results obtained:

Statistical Significance of the Model


ANOVA (b)

Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    1530.302         5    306.060       11.673   .000 (a)
Residual      1756.774         67   26.221
Total         3287.076         72

a. Predictors: (Constant), parents' education, pleasure scale, gender, grades in h.s., motivation scale
b. Dependent Variable: math achievement test

In this table the F value has increased from 9.284 to 11.673 in the new model, which is consistent with the removal of an insignificant variable.

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .682   .466       .426                5.12060

a. Predictors: (Constant), parents' education, pleasure scale, gender, grades in h.s., motivation scale

In this table, interestingly, the R square value has increased slightly compared with the previous model summary (.466 against .465) despite the use of fewer predictors. Removing a predictor cannot by itself raise R square on the same cases, so this small rise reflects the larger number of cases in the analysis (the total df in the ANOVA table rises from 70 to 72), presumably because cases with missing competence scale values are no longer excluded. The adjusted R square has also risen, from .415 to .426. The competence scale variable had little predictive power, possibly because of its correlation with the other predictors, which may have decreased their predictive power as well. Now almost 47% of the variation in the math achievement test can be explained by the model.

Coefficients Table


Coefficients (a)

                      Unstandardized         Standardized
                      Coefficients           Coefficients
Model 1               B         Std. Error   Beta          t        Sig.
(Constant)            -7.707    4.423                      -1.742   .086
gender                -3.827    1.305        -.284         -2.933   .005
grades in h.s.        2.051     .407         .482          5.042    .000
motivation scale      1.872     1.022        .177          1.832    .071
pleasure scale        .946      1.067        .084          .886     .379
parents' education    .548      .283         .187          1.934    .057

a. Dependent Variable: math achievement test

In these results, gender and grades in high school are still significantly predicting math achievement, while the other predictors are not. Pleasure scale, being the weakest predictor, is removed in the next run.

TEST RUN 3

In this step we remove the pleasure scale, which produced a p-value of .379, and analyze the results.

Dependent and Independent variables

Variables Entered/Removed (b)

Model  Variables Entered                                Variables Removed  Method
1      parents' education, motivation scale,            Pleasure scale     Enter
       grades in h.s., gender (a)

a. All requested variables entered.
b. Dependent Variable: math achievement test


Statistical Significance of the Model

ANOVA (b)

Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    1509.723         4    377.431       14.440   .000 (a)
Residual      1777.353         68   26.138
Total         3287.076         72

a. Predictors: (Constant), parents' education, motivation scale, grades in h.s., gender
b. Dependent Variable: math achievement test

The F value has increased further, to 14.440 in the new model compared with 11.673 in the previous model, which is consistent with the removal of an insignificant variable.

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .678   .459       .427                5.11249

a. Predictors: (Constant), parents' education, motivation scale, grades in h.s., gender

In this table, the R square value has decreased very slightly compared with the previous model summary (.459 against .466) because we are now using fewer predictors. The slight decrease is due to the removal of a predictor (pleasure scale) that was not affecting the model significantly. It means pleasure scale was not contributing much to explaining the variation in math achievement. Now 46% of the variation in the math achievement test is explained by the model.
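The rise in F across the three test runs, while R square stays almost unchanged, follows from the way F depends on R square, the number of predictors k and the number of cases n. The sketch below writes F in terms of R square and recomputes it for each run from the ANOVA sums of squares. The function and dictionary names are our own, and the number of cases is again taken as the total df plus one.

```python
def f_from_r_squared(r2, n_cases, n_predictors):
    """F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    df_residual = n_cases - n_predictors - 1
    return (r2 / n_predictors) / ((1.0 - r2) / df_residual)


# Regression SS, total SS, total df and number of predictors for each run,
# copied by hand from the three ANOVA tables.
runs = {
    "Test Run 1": {"ss_regression": 1523.461, "ss_total": 3273.794, "df_total": 70, "k": 6},
    "Test Run 2": {"ss_regression": 1530.302, "ss_total": 3287.076, "df_total": 72, "k": 5},
    "Test Run 3": {"ss_regression": 1509.723, "ss_total": 3287.076, "df_total": 72, "k": 4},
}

f_values = {}
for name, run in runs.items():
    n_cases = run["df_total"] + 1  # assumption: cases used = total df + 1
    r2 = run["ss_regression"] / run["ss_total"]
    f_values[name] = f_from_r_squared(r2, n_cases, run["k"])
    print(f"{name}: n={n_cases}, k={run['k']}, R square={r2:.3f}, F={f_values[name]:.3f}")
```

With R square nearly constant, dividing it by a smaller k gives a larger numerator, so dropping a weak predictor raises F.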

Coefficients Table

Coefficients (a)

                      Unstandardized         Standardized
                      Coefficients           Coefficients
Model 1               B         Std. Error   Beta          t        Sig.
(Constant)            -5.444    3.605                      -1.510   .136
gender                -3.631    1.284        -.269         -2.828   .006
grades in h.s.        1.991     .400         .468          4.972    .000
motivation scale      2.148     .972         .203          2.211    .030
parents' education    .580      .280         .198          2.070    .042

a. Dependent Variable: math achievement test

Finally, all the predictors are now contributing significantly to the model in predicting math achievement. Each predictor's p-value is less than 0.05. The same results can be obtained in a single run through the backward method.

Backward Method

Variables


Variables Entered/Removed (b)

Model  Variables Entered                              Variables Removed  Method
1      parents' education, pleasure scale, gender,    .                  Enter
       motivation scale, grades in h.s.,
       competence scale (a)
2      .                                              competence scale   Backward (criterion: Probability of
                                                                         F-to-remove >= .100).
3      .                                              pleasure scale     Backward (criterion: Probability of
                                                                         F-to-remove >= .100).

a. All requested variables entered.
b. Dependent Variable: math achievement test

Model Summary


Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .682 (a)   .465       .415                5.22962
2       .682 (b)   .465       .423                5.19258
3       .677 (c)   .458       .425                5.18352

a. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s., competence scale
b. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s.
c. Predictors: (Constant), parents' education, gender, motivation scale, grades in h.s.

ANOVA (d)

Model                Sum of Squares   df   Mean Square   F        Sig.
1      Regression    1523.461         6    253.910       9.284    .000 (a)
       Residual      1750.333         64   27.349
       Total         3273.794         70
2      Regression    1521.208         5    304.242       11.284   .000 (b)
       Residual      1752.586         65   26.963
       Total         3273.794         70
3      Regression    1500.446         4    375.111       13.961   .000 (c)
       Residual      1773.348         66   26.869
       Total         3273.794         70

a. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s., competence scale
b. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s.
c. Predictors: (Constant), parents' education, gender, motivation scale, grades in h.s.
d. Dependent Variable: math achievement test

Coefficients (a)

                             Unstandardized         Standardized
                             Coefficients           Coefficients
Model                        B         Std. Error   Beta          t        Sig.
1      (Constant)            -7.859    4.546                      -1.729   .089
       gender                -3.729    1.382        -.274         -2.698   .009
       grades in h.s.        1.966     .455         .457          4.325    .000
       competence scale      .360      1.253        .035          .287     .775
       motivation scale      1.701     1.207        .161          1.410    .164
       pleasure scale        .852      1.180        .075          .722     .473
       parents' education    .592      .314         .193          1.883    .064
2      (Constant)            -7.750    4.498                      -1.723   .090
       gender                -3.750    1.370        -.275         -2.737   .008
       grades in h.s.        2.007     .429         .467          4.681    .000
       motivation scale      1.873     1.039        .177          1.803    .076
       pleasure scale        .967      1.102        .085          .878     .383
       parents' education    .591      .312         .193          1.893    .063
3      (Constant)            -5.462    3.659                      -1.493   .140
       gender                -3.518    1.342        -.258         -2.621   .011
       grades in h.s.        1.947     .423         .453          4.608    .000
       motivation scale      2.158     .985         .204          2.190    .032
       parents' education    .627      .309         .204          2.031    .046

a. Dependent Variable: math achievement test

In the above tables, approximately the same results are obtained as when the predictors were removed individually in 3 steps.
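The removal rule shown in the Variables Entered/Removed table (Probability of F-to-remove >= .100) can be replayed on the reported Sig. values. For a single predictor the F-to-remove is the square of its t value, so its probability equals the Sig. value in the Coefficients table. The sketch below is only a replay of the printed output under that rule: it does not refit the regression, and the function and variable names are our own.

```python
# Removal criterion shown in the Variables Entered/Removed table.
REMOVAL_THRESHOLD = 0.100

# Sig. values copied by hand from the backward-method Coefficients table.
reported_sig_by_model = {
    1: {
        "gender": 0.009,
        "grades in h.s.": 0.000,
        "competence scale": 0.775,
        "motivation scale": 0.164,
        "pleasure scale": 0.473,
        "parents' education": 0.064,
    },
    2: {
        "gender": 0.008,
        "grades in h.s.": 0.000,
        "motivation scale": 0.076,
        "pleasure scale": 0.383,
        "parents' education": 0.063,
    },
    3: {
        "gender": 0.011,
        "grades in h.s.": 0.000,
        "motivation scale": 0.032,
        "parents' education": 0.046,
    },
}


def variable_to_remove(sig_values, threshold=REMOVAL_THRESHOLD):
    """Return the weakest predictor if it meets the removal criterion, else None."""
    weakest = max(sig_values, key=sig_values.get)
    return weakest if sig_values[weakest] >= threshold else None


removed = {
    model: variable_to_remove(sig_values)
    for model, sig_values in reported_sig_by_model.items()
}
for model, name in removed.items():
    if name is None:
        print(f"Model {model}: no predictor meets the criterion, stop")
    else:
        print(f"Model {model}: remove {name}")
```

Note that in Model 2 motivation scale (.076) and parents' education (.063) stay in because they are below the .100 removal criterion, even though they are above 0.05.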

Excluding the Constant

The Sig. value of the constant term is greater than 0.05; therefore it is insignificant in the model. We can exclude the constant term. The results are given below:

Model Summary

Model   R          R Square (b)   Adjusted R Square   Std. Error of the Estimate
1       .935 (a)   .875           .863                5.30902
2       .935 (c)   .875           .865                5.26945
3       .935 (d)   .875           .867                5.23083
4       .933 (e)   .870           .864                5.29067

a. Predictors: parents' education, gender, motivation scale, grades in h.s., pleasure scale, competence scale
b. For regression through the origin (the no-intercept model), R Square measures the proportion of the variability in the dependent variable about the origin explained by regression. This CANNOT be compared to R Square for models which include an intercept.
c. Predictors: parents' education, gender, motivation scale, grades in h.s., pleasure scale
d. Predictors: parents' education, gender, motivation scale, grades in h.s.
e. Predictors: parents' education, gender, grades in h.s.

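By excluding the constant term, R square becomes 87%. However, as footnote b of the table states, for regression through the origin R square measures the variability of the dependent variable about the origin, so this 87% cannot be compared with the R square of the models that include a constant.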

ANOVA (f, g)

Model                Sum of Squares   df   Mean Square   F         Sig.
1      Regression    12799.551        6    2133.258      75.686    .000 (a)
       Residual      1832.071         65   28.186
       Total         14631.622 (b)    71
2      Regression    12798.992        5    2559.798      92.188    .000 (c)
       Residual      1832.630         66   27.767
       Total         14631.622 (b)    71
3      Regression    12798.397        4    3199.599      116.938   .000 (d)
       Residual      1833.225         67   27.362
       Total         14631.622 (b)    71
4      Regression    12728.224        3    4242.741      151.574   .000 (e)
       Residual      1903.398         68   27.991
       Total         14631.622 (b)    71

a. Predictors: parents' education, gender, motivation scale, grades in h.s., pleasure scale, competence scale
b. This total sum of squares is not corrected for the constant because the constant is zero for regression through the origin.
c. Predictors: parents' education, gender, motivation scale, grades in h.s., pleasure scale
d. Predictors: parents' education, gender, motivation scale, grades in h.s.
e. Predictors: parents' education, gender, grades in h.s.
f. Dependent Variable: math achievement test
g. Linear Regression through the Origin

From the above ANOVA table we can see that the F value increases (to 151.574 in the final model) when the insignificant constant term is excluded.
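The much larger R square of the no-constant models comes from the denominator: the total sum of squares is taken about zero (14631.622) instead of about the mean. The sketch below recomputes R square for the four no-constant models from the ANOVA table above and prints the corrected total from the backward-method ANOVA for contrast. The function and variable names are our own, and the figures are copied by hand from the SPSS output.

```python
def r_squared_through_origin(ss_regression, ss_total_uncorrected):
    """R square about the origin: regression SS over the uncorrected total SS."""
    return ss_regression / ss_total_uncorrected


# Uncorrected total SS (footnote b of the no-constant ANOVA table).
ss_total_uncorrected = 14631.622
# Corrected total SS from the backward-method ANOVA that includes a constant.
ss_total_corrected = 3273.794

# Regression SS for the four no-constant models.
ss_regression_by_model = {1: 12799.551, 2: 12798.992, 3: 12798.397, 4: 12728.224}

r2_origin = {
    model: r_squared_through_origin(ss, ss_total_uncorrected)
    for model, ss in ss_regression_by_model.items()
}

for model, value in r2_origin.items():
    print(f"Model {model}: R square through the origin = {value:.3f}")
print(f"Uncorrected total SS is {ss_total_uncorrected / ss_total_corrected:.1f} "
      "times the corrected total SS")
```

Because the two kinds of R square are measured against different totals, a rise from about .46 to about .87 does not show that the no-constant model fits better.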

Coefficients (a, b)

                             Unstandardized         Standardized
                             Coefficients           Coefficients
Model                        B         Std. Error   Beta          t        Sig.
1      gender                -4.077    1.388        -.208         -2.937   .005
       grades in h.s.        1.679     .430         .691          3.908    .000
       competence scale      .179      1.268        .042          .141     .888
       motivation scale      1.073     1.168        .220          .918     .362
       pleasure scale        -.198     1.027        -.044         -.193    .848
       parents' education    .549      .318         .186          1.727    .089
2      gender                -4.085    1.376        -.208         -2.968   .004
       grades in h.s.        1.701     .396         .700          4.295    .000
       motivation scale      1.163     .968         .239          1.202    .234
       pleasure scale        -.133     .911         -.030         -.146    .884
       parents' education    .549      .316         .186          1.740    .087
3      gender                -4.154    1.284        -.212         -3.234   .002
       grades in h.s.        1.695     .391         .697          4.336    .000
       motivation scale      1.061     .662         .218          1.601    .114
       parents' education    .539      .306         .182          1.763    .083
4      gender                -4.068    1.298        -.207         -3.134   .003
       grades in h.s.        2.108     .297         .867          7.092    .000
       parents' education    .647      .302         .219          2.146    .035

a. Dependent Variable: math achievement test
b. Linear Regression through the Origin

In the coefficients table, when the constant is excluded only three variables are left in the final model explaining the variation in the math achievement test.

In the further discussion we include the insignificant constant term in the multiple regression equation predicting the math achievement score, because excluding it changes the results greatly.

Conclusion

After this discussion we conclude that in our research model, i.e. the prediction of math achievement test scores with the help of six predictors, only four predictors are contributing significantly. These four predictors are:

1) Gender

2) Grades in High School

3) Motivation Scale, and

4) Parent’s Education

Multiple Regression Equation

Y` = β₁X₁ + β₂X₂ + β₃X₃ + β₄X₄ + C

Where Y`=Predicted Math Achievement

X₁=Gender

X₂=Grades in High School

X₃=Motivation scale

X₄=Parent’s Education

C=Constant term

Interpretation of the equation

With the help of the coefficients table from Test Run 3, we can interpret the beta values (the unstandardized B coefficients) of the predictors in the equation.

1) Gender

The beta value of gender is -3.631. We have assigned 0 to male and 1 to female; therefore the beta means that a female is predicted to score 3.631 less than a male in the math achievement test.

2) Grades in high school

The magnitude of beta is 1.991, which means that a one-unit increase in students' high school grades predicts an increase of 1.991 in math achievement.

3) Motivation Scale

Here the beta of 2.148 means that a one-unit increase in motivation would increase math achievement by 2.148 units.

4) Parent’s Education

The beta here is small. An increase of one level in the parents' education would increase the math achievement score by only 0.58.

5) Constant

The constant term in this equation is -5.444. That is, a student's predicted score is negative if the values of all the predictors are equal to zero.

Therefore we can conclude that the model (gender, parent’s education, motivation and grades in high school) can predict the math achievement test.
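Putting the Test Run 3 coefficients into the regression equation gives a prediction rule that can be sketched in a few lines of Python. The coefficients and the constant are the ones reported in the coefficients table; the function name, argument names and the example student are hypothetical and serve only to illustrate how the equation is used.

```python
# Unstandardized B coefficients and constant from the Test Run 3 Coefficients table.
COEFFICIENTS = {
    "gender": -3.631,             # 0 = male, 1 = female
    "grades_hs": 1.991,           # grades in high school, scale 1 to 8
    "motivation": 2.148,          # motivation scale, 1 to 4
    "parents_education": 0.580,   # parents' education, scale 2 to 10
}
CONSTANT = -5.444


def predict_math_achievement(gender, grades_hs, motivation, parents_education):
    """Predicted score: Y' = B1*X1 + B2*X2 + B3*X3 + B4*X4 + C."""
    return (
        COEFFICIENTS["gender"] * gender
        + COEFFICIENTS["grades_hs"] * grades_hs
        + COEFFICIENTS["motivation"] * motivation
        + COEFFICIENTS["parents_education"] * parents_education
        + CONSTANT
    )


# Hypothetical students who differ only in gender.
male_score = predict_math_achievement(gender=0, grades_hs=6, motivation=3.0, parents_education=5)
female_score = predict_math_achievement(gender=1, grades_hs=6, motivation=3.0, parents_education=5)
print(f"Predicted score, male:   {male_score:.3f}")
print(f"Predicted score, female: {female_score:.3f}")
```

The two hypothetical students differ by exactly the gender coefficient, which is the interpretation given for gender above.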

Future Use of the Equation

1. Our multiple regression model can be used for forecasting. Math achievement scores can be predicted in the future as well.

2. With the combination of these variables we can also predict other types of test similar to the math achievement test.

3. This model can be used in other quantitative techniques such as logistic regression.

Limitations:

1. The variables initially used in the model, such as Pleasure, Motivation and Competence, were subjective and are difficult to use as scalar quantities. (They were obtained by administering various group and individual tests.)

2. The pleasure provided to students should have a limit, as an increase in the level of pleasure will subsequently affect math achievement.

3. The constant term is included in the equation even though its p-value was insignificant, in order to give a realistic prediction of math achievement with the help of the given predictors.