Quantitative Techniques in Analysis: Multiple Regression Model
IBAD UR REHMAN - 5873
SAAD AHMED - 5810
SHOAIB SHAIKH - 5785
MBA (M)
MULTIPLE REGRESSION MODEL – PREDICTION OF MATH ACHIEVEMENTS
SUBMITTED TO: USMAN ALI WARRAICH
Objectives of Multiple Regression Model
Multiple regression models are appropriate for either of two types of research problems:
1. Prediction
2. Explanation
In our regression model we will predict a particular dependent variable with the help of some predictors (independent variables). We will then explain the output by examining the regression coefficients: their magnitudes, signs, and statistical significance. By doing this we can determine the relative importance of each individual variable in the prediction of the dependent variable, and the nature of the relationships between the independent variables and the dependent variable.
Research Model
Our Multiple regression problem selected for this assignment is:
Prediction of math achievement by a combination of variables: motivation, competence, pleasure, grades in high school, parent's education, and gender
That is, we will evaluate and conclude how well these variables predict and explain the dependent variable, math achievement.
Dependent and Independent Variables
Multiple linear regression is a dependence technique; therefore we must specify which variable is dependent and which are independent. Six predictors are initially included in the research problem (five scale and one nominal dichotomous). They are:
1. Motivation: A scale variable measuring the motivation level of students. The values lie between 1 and 4.
2. Pleasure: Another scale variable, measuring the personal pleasure level of students. It defines and differentiates the pleasure a student gets in life. The values lie between 1 and 4.
3. Grades in High School: Also included in the prediction model, showing the grades obtained by the cases, scaled from 8 to 1 (grade A down to grade D).
4. Competence: Another subjective predictor, indicating the ability, aptitude, or IQ of the students, again scored from 1 to 4.
5. Parent's Education: A scale variable measuring the education of the students' parents on a scale from 2 to 10, where 2 indicates the lowest education level (less than high school) and 10 the highest (PhD).
6. Gender: A nominal dichotomous variable (two categories) that can be included as a predictor in the multiple regression model. Here gender has two categories, male and female, coded '0' and '1' respectively.
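A dichotomous predictor like gender enters the model as a 0/1 dummy variable. A minimal sketch (the labels and tiny sample below are hypothetical illustrations; only the 0 = male, 1 = female coding comes from this report):

```python
# Sketch: dummy-coding a dichotomous predictor before regression.
# The coding 0 = male, 1 = female follows the report; the sample is made up.

def encode_gender(labels):
    """Map 'male' -> 0 and 'female' -> 1."""
    coding = {"male": 0, "female": 1}
    return [coding[label] for label in labels]

genders = ["male", "female", "female", "male"]
print(encode_gender(genders))  # [0, 1, 1, 0]
```

With this coding, the regression coefficient on gender reads as the predicted difference for females relative to males, holding the other predictors constant.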
The dependent variable, which is to be predicted here, is the math achievement test. We will see whether math achievement can be predicted better from a combination of these predictors. The table from the test run showing the variables entered is given below.
Variables Entered/Removed (b)
Model 1
Variables Entered: competence scale, gender, parents' education, pleasure scale, grades in h.s., motivation scale (a)
Variables Removed: (none)
Method: Enter
a. All requested variables entered.
b. Dependent Variable: math achievement test
Sample Size
Sample size is the element most within our control in the research problem, and it plays an important role in determining the statistical power of an analysis. Our research problem includes 75 cases for each variable.
Assumptions in Multiple Regression Analysis
Multiple regression analysis is based on certain assumptions. Assumptions tested for this model are:
1. Normality
2. Multicollinearity
1) Normality
We have to check the normal distribution of the residuals, because the calculation of confidence intervals and the various significance tests for the coefficients are all based on the assumption of normally distributed errors. To check whether the residuals follow a normal distribution, we consider the tests of normality.
Tests of Normality
                        Kolmogorov-Smirnov (a)        Shapiro-Wilk
                        Statistic  df  Sig.           Statistic  df  Sig.
math achievement test   .071       75  .200*          .966       75  .040
a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.
The decision rule for this table is: if the significance value is greater than 0.05, the residuals follow a normal distribution. According to the Kolmogorov-Smirnov result, the significance value (.200) is greater than 0.05; therefore we can conclude that the errors are normally distributed. (Note that the Shapiro-Wilk significance value, .040, falls just below 0.05, so this conclusion rests on the Kolmogorov-Smirnov test.)
2) Multicollinearity
Multicollinearity (or collinearity) occurs when there are high inter-correlations among a set of the predictor variables. In other words, multicollinearity happens when two or more predictors carry much of the same information. To check for multicollinearity we consider the collinearity statistics in the following table.
Collinearity Statistics
Predictor            Tolerance  VIF
gender               .811       1.233
grades in h.s.       .747       1.339
competence scale     .549       1.821
motivation scale     .641       1.561
pleasure scale       .776       1.289
parents' education   .796       1.256
Tolerance and VIF give the same information (Tolerance = 1/VIF): they tell us whether there is multicollinearity. If the tolerance value is less than 0.5, or the VIF is greater than 2, then there is probably a problem with multicollinearity. Here the tolerance values for all predictors are greater than 0.5, so there is low collinearity among the predictors. The competence scale is quite near the threshold, possibly because it is correlated with the motivation scale.
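The Tolerance = 1/VIF relationship can be verified directly against the table's values. A minimal sketch (tolerance values taken from the collinearity table above; the VIFs agree with the table to rounding):

```python
# Sketch: VIF from tolerance. Tolerance_j = 1 - R²_j, where R²_j comes
# from regressing predictor j on all the other predictors; VIF_j = 1/Tolerance_j.

def vif(tolerance):
    return 1.0 / tolerance

# Tolerance values from the report's collinearity table:
tolerances = {
    "gender": 0.811,
    "grades in h.s.": 0.747,
    "competence scale": 0.549,
}
for name, tol in tolerances.items():
    print(f"{name}: VIF = {vif(tol):.3f}")
```

Running this reproduces the table's VIF column (1.233, 1.339, 1.821), confirming the two statistics are redundant views of the same quantity.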
After checking these assumptions on which the model is presumed to be based, we can proceed to the further tests and results to assess our model.
TEST RUN 1
Statistical Significance of the Model
The hypothesis of the ANOVA table is, “There is no relationship between the predictors and the outcome”
The significance value in the ANOVA table given below indicates whether the combination of the independent variables significantly predicts the dependent variable. If the p-value is less than 0.05, the result is considered statistically significant and the null hypothesis is rejected.
ANOVA (b)
Model 1      Sum of Squares  df  Mean Square  F      Sig.
Regression   1523.461        6   253.910      9.284  .000 (a)
Residual     1750.333        64  27.349
Total        3273.794        70
a. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s., competence scale
b. Dependent Variable: math achievement test
Here the Sig. value is less than 0.05, which shows that the above predictors are contributing to the model and predicting the math achievement results significantly. The ANOVA table shows that F = 9.284 and is significant. This also indicates that the combination of the predictors significantly predicts math achievement.
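The F statistic can be reproduced by hand from the ANOVA table's sums of squares and degrees of freedom. A minimal sketch:

```python
# Sketch: recomputing the ANOVA F statistic from the table above.
# F = (SS_regression / df_regression) / (SS_residual / df_residual)

def f_statistic(ss_reg, df_reg, ss_res, df_res):
    ms_reg = ss_reg / df_reg   # mean square for regression
    ms_res = ss_res / df_res   # mean square for residual (error)
    return ms_reg / ms_res

# Values from the Test Run 1 ANOVA table:
f = f_statistic(1523.461, 6, 1750.333, 64)
print(round(f, 3))  # 9.284
```

The mean squares (253.910 and 27.349) and the F ratio all match the table.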
Model Summary Table
Model Summary
Model  R     R Square  Adjusted R Square  Std. Error of the Estimate
1      .682  .465      .415               5.22962
a. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s., competence scale
The Model Summary table shows that the multiple correlation coefficient (R), using all the predictors, is .682; R square is .465 and the adjusted R square is .415, meaning that 46.5% of the variance in math achievement can be predicted from gender, competence, and the other variables combined. Note that the adjusted R square is lower than the unadjusted value because it takes into account the sample size and the number of independent variables in the regression model.
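The adjusted R square can be reproduced from R square, the sample size, and the number of predictors. Note that the ANOVA table's total df of 70 implies that 71 cases actually entered this analysis, fewer than the 75 collected, presumably because of missing values. A minimal sketch:

```python
# Sketch: adjusted R² = 1 - (1 - R²) * (n - 1) / (n - k - 1),
# where n is the number of cases used and k the number of predictors.

def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Test Run 1: R² = .465, k = 6 predictors; total df = 70 implies n = 71.
print(round(adjusted_r2(0.465, 71, 6), 3))  # 0.415
```

This matches the .415 in the Model Summary table, and makes explicit how the adjustment penalizes extra predictors relative to the sample size.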
Coefficients Table
This is a very important table in the multiple regression analysis. Here the individual predictors are assessed. The results in this table show which variables contribute significantly to the model and which do not. It also shows the relative importance of the predictors and the type of relationship each has with the dependent variable.
The null hypothesis for each coefficient in this table is "β (beta) = 0". We do not reject the null hypothesis if the sig. value is greater than 0.05.
Coefficients (a)
Model 1              B       Std. Error  Beta   t       Sig.
(Constant)           -7.859  4.546              -1.729  .089
gender               -3.729  1.382       -.274  -2.698  .009
grades in h.s.       1.966   .455        .457   4.325   .000
competence scale     .360    1.253       .035   .287    .775
motivation scale     1.701   1.207       .161   1.410   .164
pleasure scale       .852    1.180       .075   .722    .473
parents' education   .592    .314        .193   1.883   .064
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.)
a. Dependent Variable: math achievement test
In this table there are several columns to explain. The t value and the Sig. opposite each independent variable indicate whether that variable contributes significantly to the equation for predicting math achievement from the whole set of predictors. Thus high school grades and gender are the only variables in this table that add significantly to the prediction, as their sig. values are less than 0.05, rejecting the null hypothesis.
The standardized beta coefficients in the table show the relative importance of each predictor in the model. Grades in high school, with the highest beta value of 0.457, contributes the most to the research model, while the competence scale, with a value of 0.035, contributes the least.
The signs of the beta values show the type of relationship each predictor has with the predicted variable. Except for gender, all predictors show a positive relationship with the dependent variable (math achievement). Individual interpretation of the predictors will be given later.
The magnitudes of the unstandardized B coefficients indicate how much the math achievement score would change with a one-unit change in each predictor. Here, one additional mark in high school grades
would increase the math achievement score by 1.966. Other variables will be interpreted later on in the report.
According to the coefficients table, the sig. values of the competence scale, pleasure scale, motivation, and parent's education are not statistically significant. This means they are not contributing significantly to our research model; therefore, to increase the overall predictive capability of the model (i.e., to increase the F value), we have to eliminate the insignificant variables.
TEST RUN 2
Now we re-run the test, removing the predictor with the highest sig. value, the competence scale, with a sig. value of 0.775. The rest of the predictors are retained.
Dependent and Independent variables
Variables Entered/Removed (b)
Model 1
Variables Entered: parents' education, pleasure scale, gender, grades in h.s., motivation scale (a)
Variables Removed: competence scale
Method: Enter
a. All requested variables entered.
b. Dependent Variable: math achievement test
Following are the results obtained:
Statistical Significance of the Model
ANOVA (b)
Model 1      Sum of Squares  df  Mean Square  F       Sig.
Regression   1530.302        5   306.060      11.673  .000 (a)
Residual     1756.774        67  26.221
Total        3287.076        72
a. Predictors: (Constant), parents' education, pleasure scale, gender, grades in h.s., motivation scale
b. Dependent Variable: math achievement test
Here the F value has increased in the new model, consistent with the removal of an insignificant variable.
Model Summary Table

Model Summary
Model  R     R Square  Adjusted R Square  Std. Error of the Estimate
1      .682  .466      .426               5.12060
a. Predictors: (Constant), parents' education, pleasure scale, gender, grades in h.s., motivation scale
Interestingly, the R square value increases slightly compared with the previous model summary, despite using fewer predictors. This indicates that the new model predicts more of the variance in math achievement than the previous one. This is because the competence scale had little predictive power and was correlated with the other predictors, which may have reduced their predictive power as well. Now almost 47% of the variation in the math achievement test can be explained by the model.
Coefficients Table
Coefficients (a)
Model 1              B       Std. Error  Beta   t       Sig.
(Constant)           -7.707  4.423              -1.742  .086
gender               -3.827  1.305       -.284  -2.933  .005
grades in h.s.       2.051   .407        .482   5.042   .000
motivation scale     1.872   1.022       .177   1.832   .071
pleasure scale       .946    1.067       .084   .886    .379
parents' education   .548    .283        .187   1.934   .057
a. Dependent Variable: math achievement test
In these results, gender and grades in high school are still significantly predicting math achievement, while the other predictors are not. The pleasure scale, being the weakest predictor, should be removed in the next cycle.
TEST RUN 3
In this step we remove the pleasure scale, which produced a p value of 0.379, and analyze the results.
Dependent and Independent variables
Variables Entered/Removed (b)
Model 1
Variables Entered: parents' education, motivation scale, grades in h.s., gender (a)
Variables Removed: pleasure scale
Method: Enter
a. All requested variables entered.
b. Dependent Variable: math achievement test
Statistical Significance of the Model
ANOVA (b)
Model 1      Sum of Squares  df  Mean Square  F       Sig.
Regression   1509.723        4   377.431      14.440  .000 (a)
Residual     1777.353        68  26.138
Total        3287.076        72
a. Predictors: (Constant), parents' education, motivation scale, grades in h.s., gender
b. Dependent Variable: math achievement test
The F value has now increased further, to 14.44, compared with the previous model's 11.673, consistent with the removal of another insignificant variable.
Model Summary Table

Model Summary
Model  R     R Square  Adjusted R Square  Std. Error of the Estimate
1      .678  .459      .427               5.11249
a. Predictors: (Constant), parents' education, motivation scale, grades in h.s., gender
In this table the R square value has decreased, but only very slightly, compared with the previous model summary, because we are now using fewer predictors. The small decrease is due to the removal of a predictor (pleasure scale) that was not affecting the model significantly. It means that the
pleasure scale was not contributing much to explaining the variation in math achievement. Now about 46% of the variation in the math achievement test is explained by the model.
Coefficients Table
Coefficients (a)
Model 1              B       Std. Error  Beta   t       Sig.
(Constant)           -5.444  3.605              -1.510  .136
gender               -3.631  1.284       -.269  -2.828  .006
grades in h.s.       1.991   .400        .468   4.972   .000
motivation scale     2.148   .972        .203   2.211   .030
parents' education   .580    .280        .198   2.070   .042
a. Dependent Variable: math achievement test
Finally, all the predictors now contribute significantly to the model in predicting math achievement; each predictor's p value is less than 0.05. We can obtain the same results in one step through the backward method.
Backward Method
Variables
Variables Entered/Removed (b)
Model 1
Variables Entered: parents' education, pleasure scale, gender, motivation scale, grades in h.s., competence scale (a)
Variables Removed: (none)
Method: Enter
Model 2
Variables Entered: (none)
Variables Removed: competence scale
Method: Backward (criterion: Probability of F-to-remove >= .100)
Model 3
Variables Entered: (none)
Variables Removed: pleasure scale
Method: Backward (criterion: Probability of F-to-remove >= .100)
a. All requested variables entered.
b. Dependent Variable: math achievement test
Model Summary
Model Summary
Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .682 (a)  .465      .415               5.22962
2      .682 (b)  .465      .423               5.19258
3      .677 (c)  .458      .425               5.18352
a. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s., competence scale
b. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s.
c. Predictors: (Constant), parents' education, gender, motivation scale, grades in h.s.
ANOVA (d)
Model 1      Sum of Squares  df  Mean Square  F       Sig.
Regression   1523.461        6   253.910      9.284   .000 (a)
Residual     1750.333        64  27.349
Total        3273.794        70
Model 2      Sum of Squares  df  Mean Square  F       Sig.
Regression   1521.208        5   304.242      11.284  .000 (b)
Residual     1752.586        65  26.963
Total        3273.794        70
Model 3      Sum of Squares  df  Mean Square  F       Sig.
Regression   1500.446        4   375.111      13.961  .000 (c)
Residual     1773.348        66  26.869
Total        3273.794        70
a. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s., competence scale
b. Predictors: (Constant), parents' education, pleasure scale, gender, motivation scale, grades in h.s.
c. Predictors: (Constant), parents' education, gender, motivation scale, grades in h.s.
d. Dependent Variable: math achievement test
Coefficients (a)
Model 1              B       Std. Error  Beta   t       Sig.
(Constant)           -7.859  4.546              -1.729  .089
gender               -3.729  1.382       -.274  -2.698  .009
grades in h.s.       1.966   .455        .457   4.325   .000
competence scale     .360    1.253       .035   .287    .775
motivation scale     1.701   1.207       .161   1.410   .164
pleasure scale       .852    1.180       .075   .722    .473
parents' education   .592    .314        .193   1.883   .064
Model 2              B       Std. Error  Beta   t       Sig.
(Constant)           -7.750  4.498              -1.723  .090
gender               -3.750  1.370       -.275  -2.737  .008
grades in h.s.       2.007   .429        .467   4.681   .000
motivation scale     1.873   1.039       .177   1.803   .076
pleasure scale       .967    1.102       .085   .878    .383
parents' education   .591    .312        .193   1.893   .063
Model 3              B       Std. Error  Beta   t       Sig.
(Constant)           -5.462  3.659              -1.493  .140
gender               -3.518  1.342       -.258  -2.621  .011
grades in h.s.       1.947   .423        .453   4.608   .000
motivation scale     2.158   .985        .204   2.190   .032
parents' education   .627    .309        .204   2.031   .046
a. Dependent Variable: math achievement test
The above tables give approximately the same results as were obtained when the predictors were removed individually in three steps.
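The backward method's logic can be sketched as a loop: find the predictor with the largest p value, and while that p value meets the removal criterion (probability of F-to-remove >= .100 in the SPSS output), drop it and refit. The sketch below does not actually refit models; it simply walks the per-model p values already reported in the backward-method coefficients tables above:

```python
# Sketch of the backward-elimination logic. A real run would refit the
# regression at each step; here each step's p values are taken directly
# from the report's backward-method coefficients tables (Models 1-3).

REMOVAL_P = 0.100  # SPSS criterion: Probability of F-to-remove >= .100

p_values_by_model = [
    {"gender": .009, "grades in h.s.": .000, "competence scale": .775,
     "motivation scale": .164, "pleasure scale": .473, "parents' education": .064},
    {"gender": .008, "grades in h.s.": .000, "motivation scale": .076,
     "pleasure scale": .383, "parents' education": .063},
    {"gender": .011, "grades in h.s.": .000, "motivation scale": .032,
     "parents' education": .046},
]

def backward_eliminate(models):
    """Walk the fitted models, removing the worst predictor each step."""
    step = 0
    current = dict(models[step])
    while True:
        worst = max(current, key=current.get)   # predictor with largest p
        if current[worst] < REMOVAL_P:
            return set(current)                 # all predictors retained
        step += 1
        current = dict(models[step])            # "refit" without `worst`

print(sorted(backward_eliminate(p_values_by_model)))
```

Running the sketch drops the competence scale first (p = .775), then the pleasure scale (p = .383), and stops with gender, grades in h.s., motivation scale, and parents' education retained, matching the SPSS backward output.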
Excluding the Constant
The sig. value of the constant term is greater than 0.05; therefore it is insignificant to the model, and we can exclude it. The results are as under:
Model Summary
Model  R         R Square (b)  Adjusted R Square  Std. Error of the Estimate
1      .935 (a)  .875          .863               5.30902
2      .935 (c)  .875          .865               5.26945
3      .935 (d)  .875          .867               5.23083
4      .933 (e)  .870          .864               5.29067
a. Predictors: parents' education, gender, motivation scale, grades in h.s., pleasure scale, competence scale
b. For regression through the origin (the no-intercept model), R Square measures the proportion of the variability in the dependent variable about the origin explained by regression. This CANNOT be compared to R Square for models which include an intercept.
c. Predictors: parents' education, gender, motivation scale, grades in h.s., pleasure scale
d. Predictors: parents' education, gender, motivation scale, grades in h.s.
e. Predictors: parents' education, gender, grades in h.s.
By excluding the constant term, the R square becomes 87%, meaning that 87% of the variation in math achievement about the origin is explained by the independent variables. Note, however, that as footnote b states, R square for regression through the origin cannot be compared with R square for models that include an intercept.
ANOVA (f,g)
Model 1      Sum of Squares   df  Mean Square  F        Sig.
Regression   12799.551        6   2133.258     75.686   .000 (a)
Residual     1832.071         65  28.186
Total        14631.622 (b)    71
Model 2      Sum of Squares   df  Mean Square  F        Sig.
Regression   12798.992        5   2559.798     92.188   .000 (c)
Residual     1832.630         66  27.767
Total        14631.622 (b)    71
Model 3      Sum of Squares   df  Mean Square  F        Sig.
Regression   12798.397        4   3199.599     116.938  .000 (d)
Residual     1833.225         67  27.362
Total        14631.622 (b)    71
Model 4      Sum of Squares   df  Mean Square  F        Sig.
Regression   12728.224        3   4242.741     151.574  .000 (e)
Residual     1903.398         68  27.991
Total        14631.622 (b)    71
a. Predictors: parents' education, gender, motivation scale, grades in h.s., pleasure scale, competence scale
b. This total sum of squares is not corrected for the constant because the constant is zero for regression through the origin.
c. Predictors: parents' education, gender, motivation scale, grades in h.s., pleasure scale
d. Predictors: parents' education, gender, motivation scale, grades in h.s.
e. Predictors: parents' education, gender, grades in h.s.
f. Dependent Variable: math achievement test
g. Linear Regression through the Origin

From the above ANOVA table we can see the increased F value (151.574) resulting from the exclusion of the insignificant constant term.
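The no-intercept R square of .875 can be reproduced from this ANOVA table: for regression through the origin, R square is the regression sum of squares divided by the uncorrected total sum of squares. A minimal sketch:

```python
# Sketch: R² for regression through the origin, computed from the
# UNCORRECTED total sum of squares (see footnote b of the ANOVA table).
# R² = SS_regression / SS_total(uncorrected)

def r2_through_origin(ss_reg, ss_total_uncorrected):
    return ss_reg / ss_total_uncorrected

# Model 1 values from the no-intercept ANOVA table:
print(round(r2_through_origin(12799.551, 14631.622), 3))  # 0.875
```

Because the denominator is measured about the origin rather than the mean, this R square is inflated relative to the with-intercept models and cannot be compared with them, as the Model Summary footnote warns.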
Coefficients (a,b)
Model 1              B       Std. Error  Beta   t       Sig.
gender               -4.077  1.388       -.208  -2.937  .005
grades in h.s.       1.679   .430        .691   3.908   .000
competence scale     .179    1.268       .042   .141    .888
motivation scale     1.073   1.168       .220   .918    .362
pleasure scale       -.198   1.027       -.044  -.193   .848
parents' education   .549    .318        .186   1.727   .089
Model 2              B       Std. Error  Beta   t       Sig.
gender               -4.085  1.376       -.208  -2.968  .004
grades in h.s.       1.701   .396        .700   4.295   .000
motivation scale     1.163   .968        .239   1.202   .234
pleasure scale       -.133   .911        -.030  -.146   .884
parents' education   .549    .316        .186   1.740   .087
Model 3              B       Std. Error  Beta   t       Sig.
gender               -4.154  1.284       -.212  -3.234  .002
grades in h.s.       1.695   .391        .697   4.336   .000
motivation scale     1.061   .662        .218   1.601   .114
parents' education   .539    .306        .182   1.763   .083
Model 4              B       Std. Error  Beta   t       Sig.
gender               -4.068  1.298       -.207  -3.134  .003
grades in h.s.       2.108   .297        .867   7.092   .000
parents' education   .647    .302        .219   2.146   .035
a. Dependent Variable: math achievement test
b. Linear Regression through the Origin
In the coefficients table, when the constant is excluded, only three variables are left in the final model to explain the variation in the math achievement test.
In the further discussion we include the insignificant constant term in the multiple regression equation predicting math achievement scores, because its exclusion changed the results greatly.
Conclusion
After this discussion we come to the conclusion that in our research model, i.e. the prediction of math achievement test scores with the help of six predictors, only four predictors contribute significantly. These four predictors are:
1) Gender
2) Grades in High School
3) Motivation Scale, and
4) Parent's Education
Multiple Regression Equation
Y` = β₁X₁ + β₂X₂ + β₃X₃ + β₄X₄ + C
Where Y`=Predicted Math Achievement
X₁=Gender
X₂=Grades in High School
X₃=Motivation scale
X₄=Parent’s Education
C=Constant term
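As a worked example, the equation can be evaluated with the Test Run 3 coefficients for a hypothetical student (the input values below are illustrative, not from the data; the coefficients are from the report):

```python
# Sketch: evaluating the multiple regression equation
# Y' = b1*X1 + b2*X2 + b3*X3 + b4*X4 + C with the Test Run 3 coefficients.

def predict_math(gender, grades_hs, motivation, parents_ed):
    return (-3.631 * gender       # X1: 0 = male, 1 = female
            + 1.991 * grades_hs   # X2: grades in high school
            + 2.148 * motivation  # X3: motivation scale
            + 0.580 * parents_ed  # X4: parents' education
            - 5.444)              # C: constant term

# Hypothetical student: male, grades = 6, motivation = 3, parents' ed = 5.
score = predict_math(gender=0, grades_hs=6, motivation=3, parents_ed=5)
print(round(score, 3))  # 15.846
```

Each term mirrors one line of the interpretation below: for example, switching gender from 0 to 1 lowers the predicted score by 3.631.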
Interpretation of the equation
With the help of the coefficients table from Test Run 3, we can interpret the beta values of the predictors in the equation.
1) Gender
The beta value of gender is -3.631. We assigned 0 to male and 1 to female; therefore the beta means that a female would score 3.631 less than a male on the math achievement test.
2) Grades in High School
The magnitude of the beta is 1.991, which means that a one-unit increase in a student's high school grades predicts an increase of 1.991 in math achievement.
3) Motivation Scale
Here the beta of 2.148 means that a one-unit increase in motivation would increase math achievement by 2.148 units.
4) Parent's Education
The beta here is small: an increase of one level in parents' education would increase the math achievement score by only 0.58.
5) Constant
The constant term in this equation is -5.444; that is, a student would score negatively if all the predictors were equal to zero.
Therefore we can conclude that the model (gender, parent’s education, motivation and grades in high school) can predict the math achievement test.
Future Use of the Equation
1. Our multiple regression model can be used for forecasting: math achievement scores can be predicted in the future as well.
2. With the combination of these variables we can also predict other tests similar to the math achievement test.
3. This model can be used in other quantitative techniques, such as logistic regression.
Limitations:
1. The variables initially used in the model were subjective, like pleasure, motivation, and competence, which are difficult to treat as scalar quantities. (Their values were obtained through various group and individual tests.)
2. The pleasure provided to students should have a limit, as an increase in the level of pleasure will subsequently affect math achievement.
3. The constant term is included in the equation even though its p value was insignificant, to give a realistic prediction of math achievement from the given predictors.