Multiple Regression Teaching Materials Agus Tri Basuki, M.Sc.
MULTIPLE REGRESSION AND CLASSICAL ASSUMPTION TESTING
In statistics, linear regression is a linear approach to modeling the relationship between
a scalar response and one or more explanatory variables. The case of one explanatory
variable is called simple linear regression. With more than one explanatory variable, it is
called multiple linear regression.
Multiple linear regression is the most common form of linear regression analysis. As a
predictive analysis, multiple linear regression is used to explain the relationship between
one continuous dependent variable and two or more independent variables. The
independent variables can be continuous or categorical (coded as dummy variables).
Assumptions:
1. The regression residuals must be normally distributed.
2. A linear relationship is assumed between the dependent variable and the independent
variables.
3. The residuals are homoscedastic: plotted against the fitted values, they form a roughly
rectangular band.
4. The absence of multicollinearity is assumed in the model, which means that the
independent variables are not highly correlated with one another.
Benefits of Regression
There are 3 main uses for multiple linear regression analysis.
First, it can be used to identify the strength of the effect that an independent variable has
on the dependent variable.
Second, it can be used to estimate the effect or impact of change. That is, multiple linear
regression analysis helps us to understand how much the dependent variable will change
when we change the independent variable.
Third, multiple linear regression analysis predicts future trends and values. Multiple
linear regression analysis can be used to obtain point estimates.
When choosing a model for multiple linear regression analysis, another important
consideration is model fit. Adding independent variables to a multiple linear regression
model never decreases, and in practice almost always increases, the amount of variance
explained in the dependent variable (usually expressed as R²). Therefore, adding too many
independent variables without theoretical justification can produce an overfitted model.
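The point about R² can be checked numerically. The following is a minimal sketch in Python with NumPy, using synthetic data rather than the data set from these materials; the helper names `r_squared` and `adjusted_r_squared` are introduced here for illustration only. It adds a regressor that is unrelated to the dependent variable and compares R² and adjusted R² before and after.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an OLS fit of y on X (X must already contain a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / tss

def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (constant excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(0)       # synthetic data, for illustration only
n = 33
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
irrelevant = rng.normal(size=n)      # a regressor with no relation to y

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, irrelevant])

r2_small = r_squared(X_small, y)
r2_big = r_squared(X_big, y)

print("R2 without / with the extra regressor:", r2_small, r2_big)
print("Adjusted R2 without / with:",
      adjusted_r_squared(r2_small, n, 1), adjusted_r_squared(r2_big, n, 2))
```

Because the smaller model is nested in the larger one, R² cannot fall when the extra column is added. Adjusted R² penalizes the additional regressor, which is one reason it is often reported alongside R².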
Example:
Here is the multiple regression equation:
GDP = β0 + β1 GFCF + β2 TR + β3 TRADE + β4 AK + β5 IVA + ε
or, in logarithmic form,
LOG(GDP) = β0 + β1 LOG(GFCF) + β2 LOG(TR) + β3 LOG(TRADE) + β4 LOG(AK) + β5 LOG(IVA) + ε
Where
GDP : Gross Domestic Product
GFCF : Gross Fixed Capital Formation
TR : Tax Receipts
TRADE : Trade Value
AK : Labor Force
IVA : Industrial Value Added
Figure 1
Research Steps With Regression
[Flowchart. Boxes: Literature Study (Previous Theories and Studies); Identification of
Research Variables and Modeling; Making Hypotheses; Data Collection Process; Data
Processing; Model Specification Test and Classical Assumption Test; Model Estimation and
Hypothesis Test; Conclusions and Recommendations. The chart also contains a "Fulfilled"
decision with Yes and No branches and a "Revision" step.]
ABCDEFG Country Macro Data (Kurs = exchange rate)
Year GDP Kurs GFCF TR Trade IVA
1986 2,047,292,604,338,300 1,283 525,767,839,086,000 14,993,000,000,000 819,472,884,152,016 798,544,781,014,800
1987 2,155,799,176,654,500 1,644 554,680,733,051,900 18,827,000,000,000 998,818,533,420,652 848,962,545,204,100
1988 2,292,814,846,746,000 1,686 618,517,644,265,400 21,435,000,000,000 1,083,459,633,537,290 907,301,991,769,700
1989 2,501,110,525,830,700 1,770 710,782,070,626,400 26,678,000,000,000 1,227,592,114,499,590 1,053,730,190,492,900
1990 2,726,249,821,167,800 1,843 825,058,186,077,300 37,431,600,000,000 1,441,964,271,955,940 1,161,956,026,945,300
1991 2,969,643,847,614,300 1,950 931,494,177,183,200 39,098,000,000,000 1,628,539,758,976,440 1,277,017,158,693,800
1992 3,184,067,029,251,900 2,030 964,890,833,384,600 44,499,900,000,000 1,828,527,952,182,690 1,503,686,756,495,200
1993 3,415,041,652,725,900 2,087 1,028,569,876,323,000 47,344,000,000,000 1,725,393,071,594,540 1,482,119,880,662,300
1994 3,672,538,070,470,900 2,161 1,170,056,947,025,800 60,958,000,000,000 1,905,206,289,918,400 1,647,643,312,108,300
1995 3,980,897,516,197,100 2,249 1,333,804,863,859,200 68,017,000,000,000 2,148,036,149,401,290 1,819,329,402,675,000
1996 4,285,149,005,555,500 2,342 1,527,398,594,230,900 75,810,000,000,000 2,239,622,144,209,320 2,013,806,086,902,600
1997 4,486,545,547,065,900 2,909 1,658,266,039,607,700 100,505,700,000,000 2,512,191,662,450,940 2,117,949,036,253,500
1998 3,897,609,098,897,500 10,014 1,110,903,117,229,100 143,626,700,000,000 3,748,961,858,536,470 1,822,466,026,611,800
1999 3,928,444,247,354,300 7,855 908,769,274,450,800 179,430,400,000,000 2,472,716,527,537,000 1,858,334,024,795,000
2000 4,121,726,241,993,900 8,422 1,060,872,288,362,200 99,643,790,476,190 2,944,432,464,112,750 1,967,791,836,600,200
2001 4,271,899,954,667,000 10,261 1,129,749,087,480,800 190,614,200,000,000 2,981,495,922,803,340 2,021,590,172,211,700
2002 4,464,113,041,849,300 9,311 1,182,784,395,845,000 215,467,500,000,000 2,637,374,038,628,250 2,107,764,749,213,200
2003 4,677,514,123,258,600 8,577 1,189,884,726,120,600 249,404,313,253,852 2,507,919,080,077,860 2,186,913,010,032,200
2004 4,912,833,962,560,100 8,939 1,364,599,072,554,300 283,093,000,000,000 2,935,973,057,506,130 2,273,100,844,234,100
2005 5,192,500,538,917,800 9,705 1,513,164,999,669,000 312,488,056,626,926 3,322,573,914,833,110 2,380,026,639,458,400
2006 5,478,137,490,010,100 9,159 1,552,460,084,009,500 343,625,377,952,311 3,103,755,304,808,870 2,486,855,317,982,500
2007 5,825,726,531,358,900 9,141 1,697,209,598,268,500 374,762,699,277,696 3,194,202,162,925,520 2,604,234,878,138,900
2008 6,176,068,457,506,700 9,699 1,898,942,099,974,500 658,701,000,000,000 3,616,792,130,902,920 2,701,585,275,030,700
2009 6,461,950,710,551,600 10,390 1,961,482,072,067,500 619,922,000,000,000 2,940,970,850,172,160 2,798,525,931,939,900
2010 6,864,133,100,000,000 9,090 2,127,840,682,000,000 723,307,000,000,000 3,205,637,598,235,500 2,936,192,400,000,000
2011 7,287,635,300,000,000 8,770 2,316,359,104,097,800 873,874,000,000,000 3,656,936,354,349,880 3,122,633,400,000,000
2012 7,727,083,400,000,000 9,387 2,527,728,790,484,400 980,518,000,000,000 3,831,311,903,721,210 3,288,298,000,000,000
2013 8,156,497,800,000,000 10,461 2,654,375,036,700,900 1,077,310,000,000,000 3,967,106,232,237,770 3,431,080,900,000,000
2014 8,566,271,200,000,000 11,865 2,775,733,579,064,600 1,145,282,928,434,310 4,116,716,178,297,760 3,577,694,800,000,000
2015 8,976,931,500,000,000 13,389 2,916,601,630,562,700 1,164,554,577,386,610 3,764,719,516,265,140 3,672,595,500,000,000
2016 8,164,934,610,302,790 12,935 2,542,077,573,709,210 935,748,526,661,157 4,201,921,841,339,830 3,547,268,352,813,750
2017 8,380,512,700,553,410 13,343 2,612,020,756,005,150 974,202,791,298,637 4,304,178,262,152,610 3,638,770,115,628,690
2018 8,596,090,790,804,030 13,751 2,681,963,938,301,090 1,012,657,055,936,110 4,406,434,682,965,400 3,730,271,878,443,630
Source: Hypothetical data
Enter the data in Excel, save the file under the name "ABCDEFG hypothetical data", then
close the file.
Open EViews
A dialog appears on the screen.
Fill in the date specification with
Start date : 1986
End date : 2018
Next, move the data created in Excel into EViews.
Click File > Import > Import from file …
A dialog will appear on the screen.
Locate the file to be imported and click Open
Click Next
Click Next
Click Finish
The imported data then appear on the screen.
How to Run the Regression
Click Quick > Estimate Equation …
The Equation Estimation dialog then appears.
Click OK
All variables are then transformed to logarithms, so the equation becomes
LS LOG(GDP) C LOG(GFCF) LOG(TR) LOG(TRADE) LOG(AK) LOG(IVA)
Click Estimate
Click OK
Note on the output: some coefficients are very large and some are very small.
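For readers who want to see what the `LS LOG(GDP) C …` command computes, here is a minimal sketch of a log-log OLS fit in Python with NumPy. It uses synthetic data, not the ABCDEFG data set; the variables `gfcf` and `iva` are hypothetical stand-ins for two of the regressors, and the "true" elasticities 0.4 and 0.2 are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(42)   # synthetic data only, not the ABCDEFG data set
n = 33                            # same number of annual observations as 1986-2018

# Hypothetical positive regressors standing in for GFCF and IVA.
gfcf = np.exp(rng.normal(loc=5.0, scale=0.5, size=n))
iva = np.exp(rng.normal(loc=4.0, scale=0.5, size=n))

# Dependent variable generated with assumed elasticities 0.4 and 0.2.
log_gdp = (1.0 + 0.4 * np.log(gfcf) + 0.2 * np.log(iva)
           + rng.normal(scale=0.01, size=n))

# Design matrix: a constant (the "C" in the EViews command) plus logged regressors.
X = np.column_stack([np.ones(n), np.log(gfcf), np.log(iva)])

# Ordinary least squares: minimize the sum of squared residuals.
beta, *_ = np.linalg.lstsq(X, log_gdp, rcond=None)
intercept, elasticity_gfcf, elasticity_iva = beta

print("intercept:", intercept)
print("elasticity w.r.t. GFCF:", elasticity_gfcf)
print("elasticity w.r.t. IVA:", elasticity_iva)
```

Because both sides of the equation are in logarithms, the slope coefficients are read directly as elasticities, and the fit should recover values close to the assumed 0.4 and 0.2.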
Classical Assumption Tests
The classical assumption tests used in linear regression with the Ordinary Least Squares
(OLS) approach include the linearity test, normality test, autocorrelation test,
heteroscedasticity test, and multicollinearity test. However, not all classical assumption
tests must be performed on every linear regression model estimated with OLS.
1. Linearity tests are rarely carried out, because the model is assumed to be linear from
the outset. When one is performed, it serves only to gauge the degree of linearity.
2. Normality is not a requirement for BLUE (Best Linear Unbiased Estimator), and some
authors do not treat it as a condition that must be fulfilled.
3. Autocorrelation arises mainly in time series data. Testing for autocorrelation in data
without a time ordering (such as a pure cross section) is not meaningful.
4. Multicollinearity testing is needed when the regression uses more than one
independent variable. With only one independent variable, multicollinearity cannot
occur.
5. Heteroscedasticity usually occurs in cross-section data; panel data are closer in
character to cross-section data than to time series data.
Normality test
Click View > Residual Diagnostics > Histogram – Normality Test
(Note on the regression output: in the logarithmic model, each coefficient is an elasticity.)
The test results show a probability value (for the Jarque-Bera statistic) greater than 0.05,
so the residuals are normally distributed.
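The statistic behind this normality check can be sketched by hand. The code below is a minimal Python illustration of the textbook Jarque-Bera formula, JB = n/6 · (S² + (K − 3)²/4), where S is skewness and K is kurtosis; the function name `jarque_bera` and the tiny sample are made up for the example, and the value may differ slightly from what a particular EViews version reports. Under the null hypothesis JB follows a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exactly exp(−JB/2).

```python
import math

def jarque_bera(residuals):
    """Return (JB statistic, p-value) using the textbook formula."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments of order 2, 3 and 4 (divided by n, not n - 1).
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)   # chi-square(2) upper-tail probability
    return jb, p_value

# Made-up residuals, symmetric around zero, for illustration only.
jb, p_value = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(jb, p_value)
decision = "normal" if p_value > 0.05 else "not normal"
print("Residuals treated as", decision, "at the 5% level")
```

The decision rule is the same as in the text: a probability value above 0.05 means normality of the residuals is not rejected.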
Linearity Test
Click View > Stability Diagnostics > Ramsey RESET Test …
A dialog appears on the screen.
Click OK
Because the probability value of the F-statistic is less than 0.05, the regression equation
is not linear.
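The idea of the RESET test can be sketched as follows: re-estimate the model with the squared fitted values added as an extra regressor and check with an F-statistic whether that term helps. The Python code below is a minimal illustration on synthetic data with one fitted term; the function names are hypothetical, and the p-value step (which needs the F distribution) is left out.

```python
import numpy as np

def ols_ssr_and_fit(X, y):
    """Sum of squared residuals and fitted values from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    return float(resid @ resid), fitted

def reset_f_statistic(X, y):
    """RESET F-statistic with one added term (squared fitted values)."""
    n, k = X.shape                       # k counts the constant as well
    ssr_restricted, fitted = ols_ssr_and_fit(X, y)
    X_aug = np.column_stack([X, fitted ** 2])
    ssr_unrestricted, _ = ols_ssr_and_fit(X_aug, y)
    q = 1                                # number of added terms
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - k - q))

rng = np.random.default_rng(1)           # synthetic data, for illustration only
n = 100
x = rng.uniform(0.0, 2.0, size=n)
noise = rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])

f_linear = reset_f_statistic(X, 1.0 + 2.0 * x + noise)                  # truly linear
f_curved = reset_f_statistic(X, 1.0 + 2.0 * x + 3.0 * x ** 2 + noise)   # omitted x^2

print("F when the true relation is linear:", f_linear)
print("F when a quadratic term is omitted:", f_curved)
```

A large F-statistic (small probability value) signals that the linear specification misses something, which is the situation described in the text.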
Autocorrelation Test
Click View > Residual Diagnostics > Serial Correlation LM Test …
A dialog will appear.
Click OK
Because the probability value of Obs*R-squared is less than 0.05, the residuals of the
regression equation are autocorrelated.
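The Obs*R-squared figure comes from an auxiliary regression of the residuals on the original regressors and lagged residuals (the Breusch-Godfrey approach). Below is a minimal Python sketch on synthetic data with AR(1) errors. It assumes two lags and sets pre-sample lagged residuals to zero; the function name is hypothetical. With two lags the statistic is compared with a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exp(−LM/2).

```python
import math
import numpy as np

def breusch_godfrey_lm(X, y, lags=2):
    """Obs*R-squared from regressing OLS residuals on X and lagged residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    n = len(e)
    # Lagged residuals; values before the sample start are set to zero.
    lagged = np.column_stack(
        [np.concatenate([np.zeros(j), e[:-j]]) for j in range(1, lags + 1)]
    )
    Z = np.column_stack([X, lagged])
    gamma, *_ = np.linalg.lstsq(Z, e, rcond=None)
    u = e - Z @ gamma
    r2 = 1.0 - float(u @ u) / float(np.sum((e - e.mean()) ** 2))
    return n * r2

rng = np.random.default_rng(2)            # synthetic data, for illustration only
n = 200
x = rng.normal(size=n)
shocks = rng.normal(size=n)
errors = np.zeros(n)
for t in range(1, n):                     # AR(1) errors with coefficient 0.8
    errors[t] = 0.8 * errors[t - 1] + shocks[t]

y = 1.0 + 2.0 * x + errors
X = np.column_stack([np.ones(n), x])

lm_stat = breusch_godfrey_lm(X, y, lags=2)
p_value = math.exp(-lm_stat / 2.0)        # chi-square(2) upper-tail probability
print(lm_stat, p_value)
```

As in the text, a probability value below 0.05 indicates autocorrelated residuals.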
Heteroscedasticity Test
Click View > Residual Diagnostics > Heteroskedasticity Tests …
Then select White
Because the probability value of Obs*R-squared is greater than 0.05, the regression
equation does not contain heteroscedasticity (the residuals are homoscedastic).
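The White statistic is also an Obs*R-squared, this time from regressing the squared residuals on the regressors and their squares (and cross products when there are several regressors). The Python sketch below is deliberately simplified to a single regressor, so the auxiliary regression has two slope terms and the statistic is compared with chi-square(2); the function name and the synthetic data are assumptions for illustration.

```python
import math
import numpy as np

def white_lm_single_regressor(x, y):
    """White Obs*R-squared for a regression of y on a constant and one regressor x."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2                      # squared residuals
    Z = np.column_stack([np.ones(n), x, x ** 2])  # auxiliary regressors
    gamma, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    u = e2 - Z @ gamma
    r2 = 1.0 - float(u @ u) / float(np.sum((e2 - e2.mean()) ** 2))
    return n * r2

rng = np.random.default_rng(3)        # synthetic data, for illustration only
n = 500
x = rng.uniform(1.0, 5.0, size=n)
z = rng.normal(size=n)

y_het = 1.0 + 2.0 * x + x * z         # error spread grows with x
y_hom = 1.0 + 2.0 * x + z             # constant error spread

lm_het = white_lm_single_regressor(x, y_het)
lm_hom = white_lm_single_regressor(x, y_hom)
p_het = math.exp(-lm_het / 2.0)       # chi-square(2) upper-tail probability
p_hom = math.exp(-lm_hom / 2.0)
print("heteroscedastic case:", lm_het, p_het)
print("homoscedastic case:", lm_hom, p_hom)
```

A probability value above 0.05, as in the EViews result described above, means homoscedasticity is not rejected.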
Multicollinearity Test
The multicollinearity test assesses whether the independent variables in the regression
model are correlated with one another, that is, whether the classical assumption of no
multicollinearity (no linear relationship among the independent variables) is violated.
The presence or absence of multicollinearity is checked by looking at the VIF (Variance
Inflation Factor) and Tolerance values.
Hypothesis:
H0: There is a multicollinearity problem
H1: There is no multicollinearity problem
VIF < 10: H0 rejected, H1 accepted
VIF > 10: H1 rejected, H0 accepted
The steps for obtaining the VIF and tolerance (TOL) values are as follows
Click View > Coefficient Diagnostics > Variance Inflation Factors
In the result, the Centered VIF values are above 10, meaning that the model contains
multicollinearity.
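The centered VIF of a regressor equals 1 / (1 − R²), where R² comes from regressing that regressor on a constant and all the other regressors; tolerance is its reciprocal. A minimal Python sketch on synthetic data follows. The function name `centered_vif` and the variables `x1`, `x2`, `x3` are hypothetical.

```python
import numpy as np

def centered_vif(regressors):
    """Centered VIF for each column of a 2-D array of regressors (no constant column)."""
    n, k = regressors.shape
    vifs = []
    for j in range(k):
        target = regressors[:, j]
        others = np.delete(regressors, j, axis=1)
        Z = np.column_stack([np.ones(n), others])   # auxiliary regression with constant
        coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
        resid = target - Z @ coef
        r2 = 1.0 - float(resid @ resid) / float(np.sum((target - target.mean()) ** 2))
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(4)                 # synthetic data, for illustration only
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)             # nearly a copy of x1
x3 = rng.normal(size=n)                        # unrelated to x1 and x2

vifs = centered_vif(np.column_stack([x1, x2, x3]))
tolerance = 1.0 / vifs
for name, v, t in zip(["x1", "x2", "x3"], vifs, tolerance):
    flag = "multicollinearity" if v > 10 else "ok"
    print(name, "VIF =", v, "TOL =", t, flag)
```

With the rule of thumb used in the text, the two nearly identical regressors exceed the threshold of 10, while the unrelated one stays close to 1.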
Another way of testing multicollinearity
Select (highlight) all variables in the workfile.
Then right-click and choose Copy.
Then click Quick > Group Statistics > Correlations
Click OK
Almost all variables have correlation values above 0.85, meaning that the model contains
multicollinearity.
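The pairwise-correlation check can be sketched in a few lines of Python with NumPy. The data are synthetic and the names `x1`, `x2`, `x3` are placeholders; the 0.85 threshold is the one used in the text.

```python
import numpy as np

rng = np.random.default_rng(5)                 # synthetic data, for illustration only
n = 100
x1 = rng.normal(size=n)
data = {
    "x1": x1,
    "x2": x1 + 0.1 * rng.normal(size=n),       # strongly related to x1
    "x3": rng.normal(size=n),                  # unrelated
}
names = list(data)

# Correlation matrix with variables in columns.
corr = np.corrcoef(np.column_stack([data[k] for k in names]), rowvar=False)

# List every pair whose absolute correlation exceeds the 0.85 threshold.
high_pairs = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > 0.85
]
print(np.round(corr, 3))
print("Pairs above 0.85:", high_pairs)
```

Pairwise correlations only detect relationships between two variables at a time, so this check complements rather than replaces the VIF.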
Or by comparing R² values
a. LS LOG(GDP) C LOG(GFCF) LOG(TR) LOG(TRADE) LOG(AK) LOG(IVA), compute R²₁
b. LS LOG(GFCF) C LOG(TR) LOG(TRADE) LOG(AK) LOG(IVA), compute R²₂
c. LS LOG(TR) C LOG(GFCF) LOG(TRADE) LOG(AK) LOG(IVA), compute R²₃
d. LS LOG(TRADE) C LOG(GFCF) LOG(TR) LOG(AK) LOG(IVA), compute R²₄
e. LS LOG(AK) C LOG(GFCF) LOG(TR) LOG(TRADE) LOG(IVA), compute R²₅
f. LS LOG(IVA) C LOG(GFCF) LOG(TR) LOG(TRADE) LOG(AK), compute R²₆
The model does not contain multicollinearity if:
R²₁ > R²₂, R²₃, R²₄, R²₅, R²₆
Value R²₁ = 0.9987
Value R²₂ = 0.9691
Value R²₃ = 0.9781
Value R²₄ = 0.9457
Value R²₅ = 0.9870
Value R²₆ = 0.9928
Because R²₁ > R²₂, R²₃, R²₄, R²₅, and R²₆:
0.9987 > 0.9691, 0.9781, 0.9457, 0.9870, and 0.9928
the model does not contain multicollinearity by this criterion.
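The comparison can be written out as a short calculation. The Python sketch below uses the R² values reported above and assumes they are listed in the same order as regressions b to f. It also converts each auxiliary R² into the VIF it implies, 1 / (1 − R²), to show how this criterion relates to the VIF check.

```python
# R-squared of the main regression (a) and of the auxiliary regressions (b to f).
# Assumption: the reported values follow the order of regressions b to f.
r2_main = 0.9987
r2_aux = {
    "LOG(GFCF)": 0.9691,
    "LOG(TR)": 0.9781,
    "LOG(TRADE)": 0.9457,
    "LOG(AK)": 0.9870,
    "LOG(IVA)": 0.9928,
}

# Criterion from the text: no multicollinearity if every auxiliary R2 is below the main R2.
flagged = [name for name, r2 in r2_aux.items() if r2 >= r2_main]
no_multicollinearity_by_r2 = not flagged

# VIF implied by each auxiliary R2.
implied_vif = {name: 1.0 / (1.0 - r2) for name, r2 in r2_aux.items()}

print("Flagged by the R2 comparison:", flagged)
for name, v in implied_vif.items():
    print(name, "implied VIF =", round(v, 1))
```

The two criteria need not agree: the R² comparison flags nothing here, yet every implied VIF is above 10, in line with the Centered VIF result reported earlier.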
Results of the regression equation above
Estimation Command:
=========================
LS LOG(GDP) C LOG(GFCF) LOG(TR) LOG(TRADE) LOG(AK) LOG(IVA)

Estimation Equation:
=========================
LOG(GDP) = C(1) + C(2)*LOG(GFCF) + C(3)*LOG(TR) + C(4)*LOG(TRADE) + C(5)*LOG(AK) + C(6)*LOG(IVA)

Substituted Coefficients:
=========================
LOG(GDP) = 4.6489 + 0.3739 LOG(GFCF) + 0.0498 LOG(TR) - 0.0688 LOG(TRADE) + 0.7019 LOG(AK) + 0.1781 LOG(IVA)
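Since the model is in logarithms, each slope coefficient can be read as an elasticity. The short Python sketch below uses the substituted coefficients above to work out the percentage change in GDP implied by a 1% change in one regressor, holding the others fixed; the helper name `pct_change_in_gdp` is introduced here for illustration.

```python
# Slope coefficients taken from the substituted equation above.
elasticities = {
    "GFCF": 0.3739,
    "TR": 0.0498,
    "TRADE": -0.0688,
    "AK": 0.7019,
    "IVA": 0.1781,
}

def pct_change_in_gdp(elasticity, pct_change_x):
    """Exact percentage change in GDP implied by a log-log coefficient."""
    return ((1.0 + pct_change_x / 100.0) ** elasticity - 1.0) * 100.0

for name, b in elasticities.items():
    effect = pct_change_in_gdp(b, 1.0)
    print(f"A 1% increase in {name} is associated with a {effect:.4f}% change in GDP")
```

For small changes the exact figure is very close to the coefficient itself, which is why a coefficient of 0.3739 is usually described as "a 1% increase in GFCF raises GDP by about 0.37%".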