Post on 16-Jan-2016
4-1
MGMG 522 : Session #4
Choosing the Independent Variables and a Functional Form
(Ch. 6 & 7)
4-2
Major Specification Problems
1. Problem with the selection of the independent variables.
2. Problem with the functional form.
3. Problem with the form of the error term.
4-3
Problems with the Selection of the Independent Variables
The choice of independent variables is up to the researcher to decide. This freedom does not come without a cost.
Problems:
1. Omitted variables
2. Irrelevant variables
Your underlying theory should give you some hints about what independent variables should be included in your regression model. The statistical fit is less important than the underlying theory.
4-4
Case 1: Omitted Variables
This occurs when you don't include important independent variables in your regression model when you should, either because you don't think of them or because you think of them but can't get the data.
True model: Y = β0 + β1X1 + β2X2 + ε
Your model: Y = β0 + β1X1 + ε*
where ε* = β2X2 + ε
If X1 and X2 are not completely independent, ε* will not be independent of X1, a violation of classical assumption #3 (all X's are uncorrelated with ε).
4-5
Problems:
1. OLS is no longer BLUE.
2. Coefficient estimates are biased: E(β̂k) ≠ βk.
3. Also, the variances of the coefficient estimates, VAR(β̂k), decrease. See p. 4-8.
For a 2-independent-variable model, it can be shown that
E(β̂1) = β1 + β2α1
where α1 is from: X2 = α0 + α1X1 + u
and u is a classical error term.
Coefficient estimates could be unbiased if β2 = 0 or α1 = 0. But that is unlikely.
4-6
E(β̂1) = β1 + β2α1
The amount of bias = β2α1. Or, the amount of bias = β2·f(r_X1,X2). The direction of the bias can be determined by the signs of β2 and α1. For example, (−)(−) = (+) or (−)(+) = (−).
To correct for the omitted-variables problem:
1. Think again about your theory. What other important variables could be missing?
2. If the signs of the included coefficient estimates are unexpected, you could probably tell the direction of the bias and have some clues about the missing variables.
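The bias formula E(β̂1) = β1 + β2α1 can be illustrated with a small Monte Carlo sketch (Python/NumPy; the coefficients and sample sizes below are hypothetical choices, not from the text):

```python
# Monte Carlo sketch of omitted-variable bias (hypothetical numbers).
# True model: Y = b0 + b1*X1 + b2*X2 + eps, but we regress Y on X1 alone.
# Theory: E(b1_hat) = b1 + b2*a1, where a1 is the slope of X2 on X1.
import numpy as np

rng = np.random.default_rng(0)
b0, b1, b2 = 1.0, 2.0, 3.0
a1 = 0.5                                 # X2 = a1*X1 + v, so X1 and X2 correlate
n, reps = 200, 500

estimates = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = a1 * x1 + rng.normal(scale=0.5, size=n)
    y = b0 + b1 * x1 + b2 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1])          # X2 is omitted here
    estimates.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

mean_b1_hat = float(np.mean(estimates))
predicted = b1 + b2 * a1                 # 2 + 3*0.5 = 3.5, not the true b1 = 2
print(mean_b1_hat, predicted)
```

The average estimate should land near β1 + β2α1 = 3.5 rather than the true β1 = 2, matching the bias formula.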
4-7
Case 2: Irrelevant Variables
This occurs when you have some unnecessary independent variables in your regression model.
True model: Y = β0 + β1X1 + ε
Your model: Y = β0 + β1X1 + β2X2 + ε**
where ε** = ε − β2X2
The coefficient estimates will still be unbiased, but VAR(β̂k) will increase, lowering the reported t-values.
4-8
For the model Y = β0 + β1X1 + β2X2 + ε,
VAR(β̂1) = σ² / [(1 − r12²) Σ(X1i − X̄1)²]
If r12 ≠ 0, the variance will increase. If r12 = 0, or the irrelevant variable is not in the regression model, the variance will stay the same.
Now, it seems that having extra unnecessary variables is not as serious a problem compared to the omitted-variables problem. In fact, we want neither one of these problems.
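A simulation sketch of this variance inflation (hypothetical numbers; the inflation factor 1/(1 − r12²) is the standard two-regressor result):

```python
# Sketch: an irrelevant X2 correlated with X1 inflates VAR(b1_hat)
# by roughly 1/(1 - r12^2); all numbers here are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 2000
est_short, est_long = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)    # population corr(X1, X2) = 0.8
    y = 1.0 + 2.0 * x1 + rng.normal(size=n)     # true b2 = 0: X2 is irrelevant
    Xs = np.column_stack([np.ones(n), x1])          # correct model
    Xl = np.column_stack([np.ones(n), x1, x2])      # with irrelevant X2
    est_short.append(np.linalg.lstsq(Xs, y, rcond=None)[0][1])
    est_long.append(np.linalg.lstsq(Xl, y, rcond=None)[0][1])

var_ratio = float(np.var(est_long) / np.var(est_short))
print(var_ratio)          # theory predicts about 1/(1 - 0.64) = 2.78
```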
4-9
Four Criteria to Help You Choose the Independent Variables
1. Theory
2. t-Test
3. Adj-R²
4. Bias
If all four conditions are met, that variable should be in your model. If not, that variable doesn't belong in your model. If some conditions are met while some are not, use your judgment.
4-10
When you add an omitted variable, usually:
– Adj-R² will rise
– Coefficient estimates will change
When you add an irrelevant variable, usually:
– Adj-R² will fall
– Coefficient estimates will not change
– t-values become less significant
Don't rely on the Adj-R² criterion alone, because it can be shown that Adj-R² will rise if you include a variable with a t-value > 1, even if it is not significant. Adj-R² will also rise if you delete a variable with a t-value < 1 from your regression model. See an example on pp. 173-176.
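The t-value > 1 rule can be verified numerically. The sketch below (hand-rolled OLS on simulated data) fits a model with and without an extra regressor and checks that adjusted R² rises exactly when the added variable's |t| exceeds 1:

```python
# Check: adding a regressor raises adjusted R^2 exactly when its |t| > 1.
# Hand-rolled OLS on simulated (hypothetical) data.
import numpy as np

def ols_fit(X, y):
    """Return (coefficients, t-values, adjusted R^2); X includes a constant."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    rss = float(resid @ resid)
    tss = float(np.sum((y - y.mean()) ** 2))
    s2 = rss / (n - k)
    t = beta / np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    adj_r2 = 1 - (rss / (n - k)) / (tss / (n - 1))
    return beta, t, adj_r2

rng = np.random.default_rng(2)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                        # only weakly related to Y
y = 1 + 2 * x1 + 0.1 * x2 + rng.normal(size=n)

_, _, r2_small = ols_fit(np.column_stack([np.ones(n), x1]), y)
_, t_big, r2_big = ols_fit(np.column_stack([np.ones(n), x1, x2]), y)

t_x2 = float(t_big[2])
rule_holds = (r2_big > r2_small) == (abs(t_x2) > 1)
print(t_x2, r2_small, r2_big, rule_holds)
```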
4-11
Three Methods to Avoid When Choosing Independent Variables
1. Data Mining
2. Stepwise Regression
3. Sequential Search
You should specify as few models as possible. The more you look, the higher the chance you will find a model that has a good statistical fit with not much theoretical support. Do not select a variable based on its t-value, because that technique creates a systematic bias. How?
4-12
Lagged Variable
Sometimes the change in Y is caused not by the change in X in the current period, but by the change in X in an earlier period. The coefficient estimate of a lagged variable measures the change in Y this period as a result of a change in X in that earlier period.
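A minimal sketch of a lagged-variable regression, pairing Yt with Xt-1 (simulated data; the coefficients are hypothetical):

```python
# Minimal sketch of a lagged-variable regression: Y_t = b0 + b1*X_{t-1} + eps.
# Simulated data; the coefficients 0.5 and 1.5 are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
T = 300
x = rng.normal(size=T)
eps = 0.1 * rng.normal(size=T)
y = 0.5 + 1.5 * np.concatenate([[0.0], x[:-1]]) + eps

# Align the series: pair Y_t with X_{t-1}, dropping the first observation.
y_t = y[1:]
x_lag = x[:-1]
X = np.column_stack([np.ones(T - 1), x_lag])
b0_hat, b1_hat = np.linalg.lstsq(X, y_t, rcond=None)[0]
print(b0_hat, b1_hat)     # should recover roughly 0.5 and 1.5
```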
4-13
Other Specification Criteria
Besides the four criteria outlined on p. 4-9, there are other specification criteria:
1. Ramsey's Regression Specification Error Test (RESET)
2. Akaike's Information Criterion (AIC)
3. Schwarz Criterion (SC)
4-14
Ramsey's Regression Specification Error Test (RESET)
H0: There is no specification error.
H1: There is a specification error.
If the F-value from the RESET is higher than the critical F-value, we can reject H0, meaning that there is a specification error. However, RESET doesn't tell us how to correct it. If the F-value from the RESET is lower than the critical F-value, we cannot reject H0, meaning that we probably have a correct specification.
RESET is more useful for confirming our model than for telling us what's wrong and how to correct it.
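One common way to run RESET by hand is to refit the model with powers of the fitted values (ŷ², ŷ³) added, then compute the F-statistic for those extra terms. The sketch below (simulated data, hypothetical numbers) applies it to a correctly specified and a misspecified model:

```python
# Hand-rolled sketch of Ramsey's RESET: refit with powers of y_hat added,
# then F-test the added terms. Data are simulated for illustration.
import numpy as np

def reset_F(X, y, powers=(2, 3)):
    """F-statistic for adding y_hat**p terms to the regression of y on X."""
    n = len(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ beta
    rss_c = float(np.sum((y - y_hat) ** 2))             # constrained RSS
    Xu = np.column_stack([X] + [y_hat ** p for p in powers])
    beta_u = np.linalg.lstsq(Xu, y, rcond=None)[0]
    rss_u = float(np.sum((y - Xu @ beta_u) ** 2))       # unconstrained RSS
    m = len(powers)                                     # number of constraints
    k = Xu.shape[1] - 1                                 # slopes, unconstrained
    return ((rss_c - rss_u) / m) / (rss_u / (n - k - 1))

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])

F_ok = reset_F(X, 1 + 2 * x + rng.normal(size=n))        # correct linear model
F_bad = reset_F(X, 1 + 2 * x ** 2 + rng.normal(size=n))  # true form is quadratic
print(F_ok, F_bad)        # F_bad should be far above any critical value
```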
4-15
AIC and SC
AIC and SC are used to compare two regression models. Both AIC and SC penalize the addition of another independent variable if it doesn't improve the overall fit significantly. Between two regression models, the one with the lower AIC and SC values is preferred.
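A sketch using one common textbook scaling of the two criteria, AIC = ln(RSS/n) + 2(K+1)/n and SC = ln(RSS/n) + ln(n)(K+1)/n (software packages may use different but equivalent scalings); data are simulated:

```python
# Sketch using one textbook scaling of AIC and SC (smaller is better);
# other software may scale these differently. Data are simulated.
import numpy as np

def aic_sc(rss, n, k):
    """k = number of slope coefficients, so K+1 parameters including b0."""
    aic = np.log(rss / n) + 2 * (k + 1) / n
    sc = np.log(rss / n) + np.log(n) * (k + 1) / n
    return aic, sc

rng = np.random.default_rng(5)
n = 150
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                      # irrelevant variable
y = 1 + 2 * x1 + rng.normal(size=n)

def rss_of(X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return float(np.sum((y - X @ b) ** 2))

rss1 = rss_of(np.column_stack([np.ones(n), x1]))
rss2 = rss_of(np.column_stack([np.ones(n), x1, x2]))
aic1, sc1 = aic_sc(rss1, n, 1)
aic2, sc2 = aic_sc(rss2, n, 2)
print(aic1, aic2, sc1, sc2)   # SC penalizes the extra variable more heavily
```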
4-16
Choosing a Functional Form
Now that you have a set of independent variables, you still need to specify a functional form: that is, how Y is related to each X.
About the intercept term:
1. Do not suppress the intercept term, even if theory suggests it.
2. Do not rely on the estimate of the intercept term for analysis or inference.
4-17
Different Functional Forms
1. Linear Form: Y = β0 + β1X1 + ε
2. Double-Log Form: lnY = β0 + β1lnX1 + ε
3. Semilog Form: lnY = β0 + β1X1 + ε, or Y = β0 + β1lnX1 + ε
4. Polynomial Form: Y = β0 + β1X1 + β2(X1)² + β3(X1)³ + ε
5. Inverse Form: Y = β0 + β1(1/X1) + ε
• Theory usually suggests only the signs of the coefficients, not the functional form.
4-18
Linear Form
Y = β0 + β1X1 + β2X2 + … + ε
The slope is constant: ΔY/ΔX1 = β1.
But the elasticity of Y with respect to X is not constant: elasticity = (ΔY/Y)/(ΔX1/X1) = β1(X1/Y).
Unless the theory suggests otherwise, the linear form should be used.
4-19
Double-Log Form
lnY = β0 + β1lnX1 + β2lnX2 + … + ε
This is another popular form besides the linear form. It is also known as the "Log-linear" form. The slope is not constant. But the elasticity of Y with respect to X is constant: elasticity = (ΔY/Y)/(ΔX1/X1) = ΔlnY/ΔlnX1 = β1.
See p. 212 for more information.
4-20
Semilog Form
lnY = β0 + β1X1 + β2lnX2 + ε, or
Y = β0 + β1lnX1 + β2X2 + ε
Similar to the Double-Log form, except that some variables, but not all, are in log form.
See p. 214 for more information.
4-21
Polynomial Form
Y = β0 + β1X1 + β2(X1)² + β3(X1)³ + ε
Appropriate for a model where changes in X cause Y to increase/decrease over some range and decrease/increase over another range.
See p. 217 for more information.
4-22
Inverse Form
Y = β0 + β1(1/X1) + ε
Appropriate for a model where the impact of an independent variable approaches zero as its value gets large.
See p. 219 for more information.
4-23
Selecting a Functional Form
Rely on your theory. What does your theory tell you about the relationships?
Do not compare Adj-R² from a model that is linear in the variables with one that is nonlinear in the variables, because:
– Adj-R² values are not comparable when Y is transformed. Use the quasi-R² for comparison instead:
Quasi-R² = 1 − Σ[Yi − antilog(lnŶi)]² / Σ(Yi − Ȳ)²
– Adj-R² may look good inside the range of the sample, but could look bad outside the range of the sample.
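The quasi-R² computation can be sketched as follows (simulated data with a hypothetical log-linear model): fit lnY, take the antilog of the fitted values, and measure the fit in the units of Y itself:

```python
# Sketch of the quasi-R^2 idea: fit lnY, antilog the fitted values, and
# judge the fit in the units of Y itself. Simulated (hypothetical) data.
import numpy as np

rng = np.random.default_rng(6)
n = 120
x = rng.uniform(1, 10, size=n)
y = 3 * x ** 1.5 * np.exp(0.1 * rng.normal(size=n))   # log-linear truth

X = np.column_stack([np.ones(n), np.log(x)])          # double-log regression
b = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
y_hat = np.exp(X @ b)                                 # antilog of fitted lnY

quasi_r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)
print(b[1], quasi_r2)      # slope near the true elasticity of 1.5
```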
4-24
Dummy Variables
1. Dummy Intercept takes the form: Y = β0 + β1X1 + β2D + ε
2. Dummy Slope takes the form: Y = β0 + β1X1 + β2X1D + ε
3. Both Dummy Intercept and Dummy Slope take the form: Y = β0 + β1X1 + β2X1D + β3D + ε
** We will discuss the concept and use of dummy variables again in "Panel Data Regression" if we have time.
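Form 3 above (both dummy intercept and dummy slope) can be estimated directly by building the D and X1·D columns (simulated data, hypothetical coefficients):

```python
# Sketch of form 3: Y = b0 + b1*X1 + b2*X1*D + b3*D + eps,
# with simulated data and hypothetical coefficients.
import numpy as np

rng = np.random.default_rng(7)
n = 400
x = rng.normal(size=n)
d = (rng.uniform(size=n) < 0.5).astype(float)     # 0/1 group indicator
y = 1.0 + 2.0 * x + 0.5 * x * d + 1.5 * d + 0.2 * rng.normal(size=n)

X = np.column_stack([np.ones(n), x, x * d, d])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(b)   # roughly [1.0, 2.0, 0.5, 1.5]
# Interpretation: the intercept is b0 for D=0 and b0+b3 for D=1;
# the slope on X1 is b1 for D=0 and b1+b2 for D=1.
```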
4-25
Appendix: General F-Test
Any F-test we've seen so far can be thought of as a special case of the general F-test. The general F-test tests more than one coefficient at a time. The null hypothesis for the general F-test is what we think is correct. We usually want to "accept" H0. This contrasts with the traditional way of hypothesis testing we've learned.
4-26
Steps in the General F-Test
1. Specify the null and alternative hypotheses.
2. The null hypothesis will be used as a constraint to be put on the equation.
3. Calculate the RSS from the constrained and the unconstrained equations.
4. If the fits of the two equations are not significantly different, we will "accept" H0. If the fits are significantly different, reject H0.
4-27
F-statistic:
F = [(RSS_C − RSS_U)/M] / [RSS_U/(n − K − 1)]
RSS_C = Residual Sum of Squares from the constrained equation
RSS_U = Residual Sum of Squares from the unconstrained equation
M = # of constraints
K = # of independent variables in the unconstrained equation
n = # of observations
4-28
Example of the General F-Test
Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε ----- (1)
Suppose you think β1 = β3 = β4 = 0.
In other words, Y = β0 + β2X2 + ε. ----- (2)
Therefore, your H0 is β1 = β3 = β4 = 0.
And your H1: the original model fits the data OK.
You'll run OLS on (1) and obtain RSS_U.
You'll run OLS on (2) and obtain RSS_C.
Substitute RSS_U and RSS_C from (1) and (2) into the F-statistic formula.
Then compare your F-value with the critical F-value and make the decision whether or not to reject H0.
Note that for this example, K = 4 and M = 3.
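The example can be sketched in code: simulate data in which H0: β1 = β3 = β4 = 0 is actually true, compute RSS_U and RSS_C, and form the F-statistic with M = 3 and K = 4 (all numbers hypothetical):

```python
# Sketch of the general F-test example: H0: b1 = b3 = b4 = 0 in
# Y = b0 + b1X1 + b2X2 + b3X3 + b4X4 + eps. Simulated data with H0 true.
import numpy as np

rng = np.random.default_rng(8)
n = 500
X1, X2, X3, X4 = (rng.normal(size=n) for _ in range(4))
y = 1.0 + 2.0 * X2 + rng.normal(size=n)        # only X2 matters, so H0 holds

def rss(X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return float(np.sum((y - X @ b) ** 2))

rss_u = rss(np.column_stack([np.ones(n), X1, X2, X3, X4]))   # unconstrained (1)
rss_c = rss(np.column_stack([np.ones(n), X2]))               # constrained (2)
M, K = 3, 4
F = ((rss_c - rss_u) / M) / (rss_u / (n - K - 1))
print(F)    # should be small here, since H0 is true
```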
4-29
Chow Test
The Chow test is a test of whether two data sets can be combined into one data set, because the slopes are not statistically different. Put differently, there is no structural change in the model between the two data sets (e.g., before and after a war).
H0: Slopes in the two data sets are not different (no structural change).
H1: Slopes in the two data sets are different (there is structural change).
4-30
Steps in the Chow Test
1. Run two separate OLS regressions with the same specification for each data set and record the RSS from each. Call these RSS_1 and RSS_2.
2. Combine the two data sets into one, run OLS with the same specification again, and record the RSS. Call it RSS_T.
F = [(RSS_T − RSS_1 − RSS_2)/(K + 1)] / [(RSS_1 + RSS_2)/(N1 + N2 − 2K − 2)]
K = # of independent variables
N1 = # of observations in sample 1
N2 = # of observations in sample 2
3. Reject H0 if F-value > critical F-value.
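The three steps can be sketched as follows (simulated data with a deliberate slope break between the two samples, so the test should reject H0):

```python
# Sketch of the Chow test steps on simulated data with a deliberate
# break: the slope changes between samples, so F should be large.
import numpy as np

rng = np.random.default_rng(9)
n1, n2, K = 120, 150, 1
x1 = rng.normal(size=n1)
y1 = 1.0 + 2.0 * x1 + 0.3 * rng.normal(size=n1)   # sample 1: slope 2.0
x2 = rng.normal(size=n2)
y2 = 1.0 + 3.5 * x2 + 0.3 * rng.normal(size=n2)   # sample 2: slope 3.5

def rss(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return float(np.sum((y - X @ b) ** 2))

rss_1, rss_2 = rss(x1, y1), rss(x2, y2)
rss_T = rss(np.concatenate([x1, x2]), np.concatenate([y1, y2]))

F = (((rss_T - rss_1 - rss_2) / (K + 1))
     / ((rss_1 + rss_2) / (n1 + n2 - 2 * K - 2)))
print(F)    # far above the critical value -> reject H0 (structural change)
```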