UNSW ECON2206 Solutions, Semester 1 2011 - Introductory Econometrics


School of Economics

Introductory Econometrics

ECON2206/ECON3290

Tutorial Program

Sample Answers

Session 1, 2011


Table of Contents

Week 2 Tutorial Exercises
    Problem Set
    STATA Hints
Week 3 Tutorial Exercises
    Review Questions
    Problem Set
Week 4 Tutorial Exercises
    Review Questions
    Problem Set
Week 5 Tutorial Exercises
    Review Questions
    Problem Set
Week 6 Tutorial Exercises
    Review Questions
    Problem Set
Week 7 Tutorial Exercises
    Review Questions
    Problem Set
Week 8 Tutorial Exercises
    Review Questions
    Problem Set
Week 9 Tutorial Exercises
    Review Questions
    Problem Set
Week 10 Tutorial Exercises
    Readings
    Review Questions
    Problem Set
Week 11 Tutorial Exercises
    Review Questions
    Problem Set
Week 12 Tutorial Exercises
    Review Questions
    Problem Set
Week 13 Tutorial Exercises
    Review Questions
    Problem Set


Week 2 Tutorial Exercises

Problem Set

Q1 Wooldridge 1.1
• (i) Ideally, we could randomly assign students to classes of different sizes. That is, each student is assigned a different class size without regard to any student characteristics such as ability and family background. For reasons we will see in Chapter 2, we would like substantial variation in class sizes (subject, of course, to ethical considerations and resource constraints).

• (ii) A negative correlation means that larger class size is associated with lower performance. We might find a negative correlation because larger class size actually hurts performance. However, with observational data, there are other reasons we might find a negative relationship. For example, children from more affluent families might be more likely to attend schools with smaller class sizes, and affluent children generally score better on standardized tests. Another possibility is that, within a school, a principal might assign the better students to smaller classes. Or, some parents might insist their children are in the smaller classes, and these same parents tend to be more involved in their children's education.

• (iii) Given the potential for confounding factors, some of which are listed in (ii), finding a negative correlation would not be strong evidence that smaller class sizes actually lead to better performance. Some way of controlling for the confounding factors is needed, and this is the subject of multiple regression analysis.

Q2 Wooldridge 1.2
• (i) Here is one way to pose the question: If two firms, say A and B, are identical in all respects except that firm A supplies job training one hour per worker more than firm B, by how much would firm A's output differ from firm B's?

• (ii) Firms are likely to choose job training depending on the characteristics of workers. Some observed characteristics are years of schooling, years in the workforce, and experience in a particular job. Firms might even discriminate based on age, gender, or race. Perhaps firms choose to offer training to more or less able workers, where "ability" might be difficult to quantify but where a manager has some idea about the relative abilities of different employees. Moreover, different kinds of workers might be attracted to firms that offer more job training on average, and this might not be evident to employers.

• (iii) The amount of capital and technology available to workers would also affect output. So, two firms with exactly the same kinds of employees would generally have different outputs if they use different amounts of capital or technology. The quality of managers would also have an effect.

• (iv) No, unless the amount of training is randomly assigned. The many factors listed in parts (ii) and (iii) can contribute to finding a positive correlation between output and training even if job training does not improve worker productivity.

Q3 Wooldridge C1.3 (meap01_ch01.do)
• (i) The largest is 100; the smallest is 0.
• (ii) 38 out of 1,823, or about 2.1 percent of the sample.
• (iii) 17.
• (iv) The average of math4 is about 71.9 and the average of read4 is about 60.1. So, at least in 2001, the reading test was harder to pass.
• (v) The sample correlation between math4 and read4 is about .843, which is a very high degree of (linear) association. Not surprisingly, schools that have high pass rates on one test have a strong tendency to have high pass rates on the other test.


• (vi) The average of exppp is about $5,194.87. The standard deviation is $1,091.89, which shows rather wide variation in spending per pupil. [The minimum is $1,206.88 and the maximum is $11,957.64.]

• (vii) About 8.7%. Note that, by a Taylor expansion, log(x + h) − log(x) = log(1 + h/x) ≈ h/x for positive x and small h/x (in absolute value).
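The log approximation in (vii) is easy to check numerically. The following Python sketch is purely illustrative (the course's computing is done in STATA, and the numbers here are arbitrary):

```python
import math

# log(x + h) - log(x) = log(1 + h/x) ~ h/x for small h/x,
# which is why 100*(log difference) is read as a percentage change.
x = 5194.87        # e.g. a spending-per-pupil figure in dollars
h = 0.087 * x      # an 8.7% increase

exact = math.log(x + h) - math.log(x)   # exact log difference
approx = h / x                          # first-order approximation
print(exact, approx)                    # close; the gap is of order (h/x)^2
```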

Q4 Wooldridge C1.4 (jtrain2_ch01.do)
• (i) 185/445 ≈ .416 is the fraction of men receiving job training, or about 41.6%.
• (ii) For men receiving job training, the average of re78 is about 6.35, or $6,350. For men not receiving job training, the average of re78 is about 4.55, or $4,550. The difference is $1,800, which is very large. On average, the men receiving the job training had earnings about 40% higher than those not receiving training.
• (iii) About 24.3% of the men who received training were unemployed in 1978; the figure is 35.4% for men not receiving training. This, too, is a big difference.
• (iv) The differences in earnings and unemployment rates suggest the training program had strong, positive effects. Our conclusions about economic significance would be stronger if we could also establish statistical significance (which is done in Computer Exercise C9.10 in Chapter 9).

STATA Hints
• Some STATA do-files for solving the Problem Set are posted on the course website. It is important to understand the commands in the do-files so that you will be able to write your own do-files for your assignments and the course project.


Week 3 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• The minimum requirement for OLS to be carried out for the data set (xi, yi), i = 1,…,n, with the sample size n > 2, is that the sample variance of x is positive. In what circumstances is the sample variance of x zero? When all observations on x have the same value, there is no variation in x.

• The OLS estimation of the simple regression model has the following properties: a) the sum of the residuals is zero; b) the sample covariance of the residuals and x is zero. Why? How would you relate them to the "least squares" principle? These are derived from the first-order conditions for minimising the sum of squared residuals (SSR).

• Convince yourself that the point (x̄, ȳ), the sample means of x and y, is on the sample regression function (SRF), which is a straight line. This can be derived from the first of the first-order conditions for minimising the SSR.

• How do you know that SST = SSE + SSR is true? The dependent variable values can be expressed as the fitted value plus the residual: yi = ŷi + ûi. After taking the average away from both sides of the equation, we find (yi − ȳ) = (ŷi − ȳ) + ûi. The required "SST = SSE + SSR" is obtained by squaring and summing both sides of the above equation, noting (ŷi − ȳ) = β̂1(xi − x̄) and using the second of the first-order conditions for minimising the SSR.
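The SST = SSE + SSR decomposition can also be verified numerically. A minimal Python sketch with made-up data (illustrative only; the course's software is STATA):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)

# OLS slope and intercept for the simple regression of y on x
b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x          # fitted values
uhat = y - yhat             # residuals

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((yhat - y.mean()) ** 2)
SSR = np.sum(uhat ** 2)
print(np.isclose(SST, SSE + SSR))   # True: SST = SSE + SSR
print(abs(uhat.sum()) < 1e-8)       # True: residuals sum to zero
```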

• Which of the following models is (are) nonlinear model(s)?
  a) sales = β0/[1 + exp(−β1 ad_expenditure)] + u
  b) sales = β0 + β1 log(ad_expenditure) + u
  c) sales = β0 + β1 exp(ad_expenditure) + u
  d) sales = exp(β0 + β1 ad_expenditure + u)
The model in a) is nonlinear (in its parameters); the others are linear in parameters, in the case of d) after taking logs.
• Can you follow the proofs of Theorems 2.1-2.3?
You should be able to follow the proofs. If not at this point, try again.

Problem Set

Q1 Wooldridge 2.4
• (i) When cigs = 0, predicted birth weight is 119.77 ounces. When cigs = 20, the predicted bwght is 109.49. This is about an 8.6% drop.

• (ii) Not necessarily. There are many other factors that can affect birth weight, particularly overall health of the mother and quality of prenatal care. These could be correlated with cigarette smoking during pregnancy. Also, something such as caffeine consumption can affect birth weight, and might also be correlated with cigarette smoking.

• (iii) If we want a predicted bwght of 125, then cigs = (125 − 119.77)/(−.524) ≈ −10.18, or about −10 cigarettes. This is nonsense, of course, and it shows what happens when we are trying to predict something as complicated as birth weight with only a single explanatory variable. The largest predicted birth weight is necessarily 119.77. Yet almost 700 of the births in the sample had a birth weight higher than 119.77.

• (iv) 1,176 out of 1,388 women did not smoke while pregnant, or about 84.7%. Because we are using only cigs to explain birth weight, we have only one predicted birth weight at cigs = 0. The predicted birth weight is necessarily roughly in the middle of the observed birth weights at cigs = 0, and so we will under predict high birth weights.


at cigs = 0 The predicted birth weight is necessarily roughly in the middle of the observed birth weights at cigs = 0 and so we will under predict high birth rates

Q2 Wooldridge 2.7
• (i) When we condition on inc in computing an expectation, inc becomes a constant. So E(u|inc) = E(√inc·e|inc) = √inc·E(e|inc) = √inc·0 = 0, because E(e|inc) = E(e) = 0.

• (ii) Again, when we condition on inc in computing a variance, inc becomes a constant. So Var(u|inc) = Var(√inc·e|inc) = (√inc)²·Var(e|inc) = σe²·inc, because Var(e|inc) = σe².

• (iii) Families with low incomes do not have much discretion about spending; typically, a low-income family must spend on food, clothing, housing, and other necessities. Higher income people have more discretion, and some might choose more consumption while others more saving. This discretion suggests wider variability in saving among higher income families.
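The result in (ii), that the conditional variance of u grows linearly with inc, can be illustrated by simulation. A Python sketch (made-up numbers, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(42)
sigma2_e = 4.0                  # Var(e), a made-up value
results = {}
for inc in (10.0, 40.0):        # two fixed income levels
    e = rng.normal(0.0, np.sqrt(sigma2_e), size=200_000)
    u = np.sqrt(inc) * e        # u = sqrt(inc)*e, as in the model
    results[inc] = u.var()
print(results)                  # sample variances close to 4*10 = 40 and 4*40 = 160
```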

Q3 Wooldridge C2.6 (meap93_ch02.do)
• (i) It seems plausible that another dollar of spending has a larger effect for low-spending schools than for high-spending schools. At low-spending schools, more money can go toward purchasing more books and computers, and toward hiring better qualified teachers. At high levels of spending, we would expect little, if any, effect because the high-spending schools already have high-quality teachers, nice facilities, plenty of books, and so on.

• (ii) If we take changes, as usual, we obtain
  Δmath10 = β1·Δlog(expend) ≈ (β1/100)·(%Δexpend),
just as in the second row of Table 2.3. So, if %Δexpend = 10, Δmath10 = β1/10.

• (iii) The regression results are
  predicted math10 = −69.34 + 11.16 log(expend), n = 408, R² = .0297

• (iv) If expend increases by 10 percent, math10 increases by about 1.1 percentage points. This is not a huge effect, but it is not trivial for low-spending schools, where a 10 percent increase in spending might be a fairly small dollar amount.

• (v) In this data set, the largest value of math10 is 66.7, which is not especially close to 100. In fact, the largest fitted value is only about 30.2.

Q4 Wooldridge C2.7 (charity_ch02.do)
• (i) The average gift is about 7.44 Dutch guilders. Out of 4,268 respondents, 2,561 did not give a gift, or about 60 percent.

• (ii) The average mailings per year are about 2.05. The minimum value is .25 and the maximum value is 3.5.

• (iii) The estimated equation is
  predicted gift = 2.01 + 2.65 mailsyear, n = 4268, R² = .0138

• (iv) The slope coefficient from part (iii) means that each mailing per year is associated with (perhaps even "causes") an estimated 2.65 additional guilders, on average. Therefore, if each mailing costs one guilder, the expected profit from each mailing is estimated to be 1.65 guilders. This is only the average, however. Some mailings generate no contributions, or a contribution less than the mailing cost; other mailings generated much more than the mailing cost.

• (v) Because the smallest mailsyear in the sample is .25, the smallest predicted value of gifts is 2.01 + 2.65(.25) ≈ 2.67. Even if we look at the overall population, where some people have received no mailings, the smallest predicted value is about two. So, with this estimated equation, we never predict zero charitable gifts.


Week 4 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What do we mean when we say "regress wage on educ and expr"?
Use OLS to estimate the multiple linear regression model wage = β0 + β1 educ + β2 expr + u.
• Why, and under what circumstance, do we need to "control for" expr in the regression model in order to quantify the effect of educ on wage?
We need to control for expr in the model when the partial, or ceteris paribus, effect of educ on wage is the parameter of interest and expr and educ are correlated.
• What is the bias of an estimator?
bias = E(β̂j) − βj.
• What is the "omitted variable bias"?
It is the bias resulting from omitting relevant variables that are correlated with the explanatory variables in a regression model.

• What is the consequence of adding an irrelevant variable to a regression model?
Including an irrelevant variable will generally increase the variance of the OLS estimators.
• What is the requirement of the ZCM assumption, in your own words?
You must do this by yourself.
• Why is assuming E(u) = 0 not restrictive when an intercept is included in the regression model?
When the mean value of the disturbance is not zero, you can always define the new disturbance as the original disturbance minus the mean value, and define the new intercept as the original intercept plus the mean value.
• In terms of notation, why do we need two subscripts for independent variables?
In our notation, the first subscript is for indexing observations (1st obs, 2nd obs, etc.) and the second subscript is for indexing variables (1st variable, 2nd variable, etc.).
• Using OLS to estimate a multiple regression model with k independent variables and an intercept, how many first-order conditions are there?
k + 1.
• How do you know that the OLS estimators are linear combinations of the observations on the dependent variable?
By the first-order conditions, we see that the OLS estimators are the solution to k + 1 linear equations with k + 1 unknowns. The "right-hand sides" of the equations are linear functions of the observations on the dependent variable (say, y). Hence, the solution is linear combinations of the y-observations.

• What is the Gauss-Markov theorem?
Read the textbook.
• Try to explain why R² never decreases (it is likely to increase) when additional explanatory variables are added to the regression model.
First, the smaller the SSR, the greater the R-squared. OLS chooses the parameter estimates to minimise the SSR. With extra explanatory variables, the R-squared does not decrease because the SSR does not increase, which is true because OLS has a larger parameter space to choose estimates from (more possibilities to make the SSR small).
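The argument can be illustrated with a short simulation: adding a regressor, even one that played no role in generating y, cannot lower the R-squared. A Python sketch (illustrative only; not part of the course material):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                  # irrelevant regressor
y = 1.0 + 0.5 * x1 + rng.normal(size=n)  # x2 plays no role in generating y

def r_squared(y, X):
    """R-squared from an OLS fit of y on X (X must include the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    uhat = y - X @ beta
    return 1.0 - (uhat @ uhat) / ((y - y.mean()) @ (y - y.mean()))

ones = np.ones(n)
r2_small = r_squared(y, np.column_stack([ones, x1]))
r2_big = r_squared(y, np.column_stack([ones, x1, x2]))
print(r2_big >= r2_small)   # True: the SSR cannot rise when a regressor is added
```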

• What is an endogenous explanatory variable? An exogenous explanatory variable?
See the top of page 88.

• What is multicollinearity, and what is its likely effect on the OLS estimators?
Read pages 95-99.


Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 3.2
• (i) Yes. Because of budget constraints, it makes sense that, the more siblings there are in a family, the less education any one child in the family has. To find the increase in the number of siblings that reduces predicted education by one year, we solve 1 = .094(Δsibs), so Δsibs = 1/.094 ≈ 10.6.

• (ii) Holding sibs and feduc fixed, one more year of mother's education implies .131 years more of predicted education. So if a mother has four more years of education, her son is predicted to have about half a year (.524) more years of education.

• (iii) Since the number of siblings is the same, but meduc and feduc are both different, the coefficients on meduc and feduc both need to be accounted for. The predicted difference in education between B and A is .131(4) + .210(4) = 1.364.

Q2 Wooldridge 3.10
• (i) Because x1 is highly correlated with x2 and x3, and these latter variables have large partial effects on y, the simple and multiple regression coefficients on x1 can differ by large amounts. We have not done this case explicitly, but given equation (3.46) and the discussion with a single omitted variable, the intuition is pretty straightforward.

• (ii) Here we would expect β̃1 and β̂1 to be similar (subject, of course, to what we mean by "almost uncorrelated"). The amount of correlation between x2 and x3 does not directly affect the multiple regression estimate on x1 if x1 is essentially uncorrelated with x2 and x3.

• (iii) In this case we are (unnecessarily) introducing multicollinearity into the regression: x2 and x3 have small partial effects on y, and yet x2 and x3 are highly correlated with x1. Adding x2 and x3 likely increases the standard error of the coefficient on x1 substantially, so se(β̂1) is likely to be much larger than se(β̃1).

• (iv) In this case, adding x2 and x3 will decrease the residual variance without causing much collinearity (because x1 is almost uncorrelated with x2 and x3), so we should see se(β̂1) smaller than se(β̃1). The amount of correlation between x2 and x3 does not directly affect se(β̂1).

Q3 Wooldridge C3.2 (hprice1_ch03.do)
• (i) The estimated equation is
  predicted price = −19.32 + .128 sqrft + 15.20 bdrms, n = 88, R² = .632

• (ii) Holding square footage constant, (the price increase) = 15.20 (increase in bedrooms). Hence, the price increases by 15.20, which means $15,200.

• (iii) Now, (the price increase) = .128 (increase in square feet) + 15.20 (increase in bedrooms) = .128(140) + 15.20(1) = 33.12. That is, $33,120. Because the house size is also increasing, the effect is larger than that in (ii).

• (iv) About 63.2%.
• (v) The predicted price is −19.32 + .128(2,438) + 15.20(4) = 353.544, or $353,544.
• (vi) The residual = 300 − 353.544 = −53.544 (in thousands of dollars). The buyer under-paid, judged by the predicted price. But, of course, there are many other features of a house (some that we cannot even measure) that affect price, and we have not controlled for those.

Q4 Wooldridge C3.6 (wage2_ch03_c3_6.do)
• (i) The slope coefficient from regressing IQ on educ is 3.53383.
• (ii) The slope coefficient from log(wage) on educ is .05984.
• (iii) The slope coefficients from log(wage) on educ and IQ are .03912 and .00586, respectively.
• (iv) The computed β̃1 = β̂1 + δ̂1·β̂2 = .03912 + 3.53383(.00586) ≈ .05984.
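The identity used in (iv), simple slope = multiple-regression slope + (auxiliary slope) × (partner coefficient), holds exactly in any sample. A Python sketch with synthetic data (the data-generating numbers are made up, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
educ = rng.normal(13.0, 2.0, size=n)
iq = 100.0 + 3.5 * (educ - 13.0) + rng.normal(0.0, 10.0, size=n)
lwage = 1.0 + 0.04 * educ + 0.006 * iq + rng.normal(0.0, 0.3, size=n)

def ols(y, *xs):
    """OLS coefficients (intercept first) of y on the given regressors."""
    X = np.column_stack([np.ones(len(y))] + list(xs))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_tilde = ols(lwage, educ)[1]     # simple slope: log(wage) on educ
b = ols(lwage, educ, iq)          # multiple regression: educ and IQ
delta = ols(iq, educ)[1]          # auxiliary slope: IQ on educ
print(np.isclose(b_tilde, b[1] + delta * b[2]))   # True, by OLS algebra
```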


Week 5 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the CLM assumptions?
These are MLR1-MLR3 and MLR6. MLR6 is a very strong assumption that implies both MLR4 and MLR5.
• What is the sampling distribution of the OLS estimators under the CLM assumptions?
The OLS estimators under the CLM assumptions follow the normal distribution.
• What are the standard errors of the OLS estimators?
The standard error is the square root of the estimated variance of the estimator.
• What is the null hypothesis about a parameter?
In this context, the null hypothesis is a statement that assigns a known value (or hypothesised value) to the parameter of interest. Usually, the null is a maintained proposition (from economic theories or experience) that will only be rejected on very strong evidence.
• What is a one-tailed (two-tailed) alternative hypothesis?
This should be clear once you finish your reading.
• In testing hypotheses, what is a Type 1 (Type 2) error? What is the level of significance?
A Type 1 error is the rejection of a true null; a Type 2 error is the non-rejection of a false null. The level of significance is the probability of a Type 1 error.
• The decision rule we use can be stated as "reject the null if the t-statistic exceeds the critical value". How is the critical value determined?
The critical value is determined by the level of significance (or the level of confidence in the case of confidence intervals) and the distribution of the test statistic. For example, for the t-stat, we use the t-distribution when the sample size is small, and the normal distribution when the sample size is large (df > 120).

• Justify the statement "Given the observed test statistic, the p-value is the smallest significance level at which the null hypothesis would be rejected."
If you used the observed t-stat as your critical value, you would just reject the null (because the t-stat is equal to this value). This "critical value" (= the t-stat value) implies a level of significance, say p, which is exactly the p-value. For any significance level smaller than p, the corresponding critical value is more extreme than the t-stat value and does not lead to the rejection of the null. Hence, p is the smallest significance level at which the null would be rejected.

• What is the 90% confidence interval for a parameter?
It is an interval [L, U] defined by two sample statistics: the lower bound L and the upper bound U. The probability that [L, U] covers the parameter is .90.
• In constructing a confidence interval for a parameter, what is the level of confidence?
It is the probability that the CI covers the parameter.
• When the level of confidence increases, how would the width of the confidence interval change (holding other things fixed)?
The width will increase.

• Try to convince yourself that the event "the 90% confidence interval covers a hypothesised value of the parameter" is the same as the event "the null of the parameter being the hypothesised value cannot be rejected in favour of the two-tailed alternative at the 10% level of significance". It becomes apparent if you start with P(|t-stat| ≤ c) = .90 and disentangle the expression for the t-stat.
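This duality between confidence intervals and two-tailed tests can be checked with a few numbers. A small Python sketch (the estimate and standard error are illustrative values, not taken from the problem sets):

```python
# For a 90% CI, the two-tailed 10% critical value c satisfies P(|t| <= c) = .90.
bhat, se, c = 2.0, 0.5, 1.645     # made-up estimate, standard error, large-sample c

for a0 in (1.5, 3.2):             # two hypothesised values of the parameter
    t = (bhat - a0) / se
    lo, hi = bhat - c * se, bhat + c * se
    in_ci = lo <= a0 <= hi
    rejected = abs(t) > c
    print(a0, in_ci, rejected)
    assert in_ci == (not rejected)   # covered by the CI exactly when not rejected
```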


Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 4.1
• (i) and (iii) generally cause the t statistics not to have a t distribution under H0. Homoskedasticity is one of the CLM assumptions. An important omitted variable violates Assumption MLR4 (hence MLR6) in general. The CLM assumptions contain no mention of the sample correlations among independent variables, except to rule out the case where the correlation is one (MLR3).

Q2 Wooldridge 4.2
• (i) H0: β3 = 0. H1: β3 > 0.

• (ii) The proportionate effect on predicted salary is .00024(50) = .012. To obtain the percentage effect, we multiply this by 100: 1.2%. Therefore, a 50 point ceteris paribus increase in ros is predicted to increase salary by only 1.2%. Practically speaking, this is a very small effect for such a large change in ros.

• (iii) The 10% critical value for a one-tailed test, using df = ∞, is obtained from Table G.2 as 1.282. The t statistic on ros is .00024/.00054 ≈ .44, which is well below the critical value. Therefore, we fail to reject H0 at the 10% significance level.

• (iv) Based on this sample, the estimated ros coefficient appears to be different from zero only because of sampling variation. On the other hand, including ros may not be causing any harm; it depends on how correlated it is with the other independent variables (although these are very significant even with ros in the equation).

Q3 Wooldridge 4.5
• (i) 4.12 ± 1.96(.94), or about [2.28, 5.96].
• (ii) No, because the value 4 is well inside the 95% CI.
• (iii) Yes, because 1 is well outside the 95% CI.

Q4 Wooldridge C4.8 (401ksubs_ch04.do)
• (i) There are 2,017 single people in the sample of 9,275.

• (ii) The estimated equation is
  predicted nettfa = −43.04 + .799 inc + .843 age
                     (4.08)   (.060)     (.092)
  n = 2017, R² = .119
The coefficient on inc indicates that one more dollar in income (holding age fixed) is reflected in about 80 more cents in predicted nettfa; no surprise there. The coefficient on age means that, holding income fixed, if a person gets another year older, his/her nettfa is predicted to increase by about $843. (Remember, nettfa is in thousands of dollars.) Again, this is not surprising.

• (iii) The intercept is not very interesting, as it gives the predicted nettfa for inc = 0 and age = 0. Clearly, there is no one with even close to these values in the relevant population.

• (iv) The t statistic is (.843 − 1)/.092 ≈ −1.71. Against the one-sided alternative H1: β2 < 1, the p-value is about .044. Therefore, we can reject H0: β2 = 1 at the 5% significance level (against the one-sided alternative).

• (v) The slope coefficient on inc in the simple regression is about .821, which is not very different from the .799 obtained in part (ii). As it turns out, the correlation between inc and age in the sample of single people is only about .039, which helps explain why the simple and multiple regression estimates are not very different; see the text also.
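The t statistic and p-value in (iv) can be reproduced with the standard normal cdf (the sample is large, so the normal approximation is appropriate). A Python sketch:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

b2, se2 = 0.843, 0.092        # age coefficient and its standard error from part (ii)
t = (b2 - 1.0) / se2          # test H0: beta2 = 1 against H1: beta2 < 1
p_one_sided = norm_cdf(t)     # lower-tail p-value
print(round(t, 2), round(p_one_sided, 3))   # -1.71 and 0.044, as in the answer
```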


Week 6 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• How would you test hypotheses about a single linear combination of parameters?
We may re-parameterise the regression model to isolate the "single linear combination". Then OLS on the re-parameterised model will provide the estimate of the "single linear combination" and the related standard error. See the example in the textbook and lecture slides.

• What are exclusion restrictions for a regression model?
Exclusion restrictions are the null hypothesis that a group of x-variables have zero coefficients in the regression model.
• What are restricted and unrestricted models?
When the null hypothesis is a set of restrictions on the parameters, the regression under the null is called the restricted model, while the regression under the alternative (which simply states that the null is false) is called the unrestricted model.
• How do you compute the F-statistic, given that you have the SSRs?
Clearly, the general F-stat is based on the relative difference between the SSRs under the restricted and unrestricted models: F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)], where you should be able to explain the meanings of the various symbols.
• What are general linear restrictions on parameters?
These are linear equations on the parameters. For example, β1 − 2β2 + 3β3 = 0.
• What is the test for the overall significance of a regression?
This is the F-test for the null hypothesis that all the coefficients of the x-variables are zero.

• How would you report your regression results?
See the guidelines in Section 4.6 of the textbook.
• Why would you care about the asymptotic properties of the OLS estimators?
When MLR6 does not hold, the finite-sample distribution of the OLS estimators is not available. For reasonably large samples (large n), we use the asymptotic distribution of the OLS estimators, which is known from studying the asymptotic properties, to approximate the finite-sample distribution of the OLS estimators, which is unknown.
• Comparing the inference procedures in Chapter 5 with those in Chapter 4, can you list the similarities and differences?
It is beneficial to make such a list by yourself.
• Under MLR1-MLR5, the OLS estimators are consistent, asymptotically normal, and asymptotically efficient. Try to explain these properties in your own words.
This is really a test of your understanding of these notions in econometrics.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 4.6
• (i) With df = n − 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because each test is two-tailed, the critical value is 1.987. The t statistic for H0: β0 = 0 is about −.89, which is much less than 1.987 in absolute value. Therefore, we fail to reject H0: β0 = 0. The t statistic for H0: β1 = 1 is (.976 − 1)/.049 ≈ −.49, which is even less significant. (Remember, we reject H0 in favor of H1 in this case only if |t| > 1.987.)

• (ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions, and the df in the unrestricted model is 86. We are given SSRr = 209.44899 and SSRur = 165.64451. Therefore,
  F = [(209.44899 − 165.64451)/2] / [165.64451/86] ≈ 11.37,
which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is 4.85.
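The F computation in (ii) takes only a couple of lines in Python (the helper function is ours, purely for illustration):

```python
def f_stat(ssr_r, ssr_ur, q, df_ur):
    """SSR form of the F statistic: [(SSR_r - SSR_ur)/q] / [SSR_ur/(n - k - 1)]."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

# Numbers from part (ii): q = 2 restrictions, 86 df in the unrestricted model
F = f_stat(209.44899, 165.64451, q=2, df_ur=86)
print(round(F, 2))   # 11.37, far above the 1% critical value of 4.85
```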

• (iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions, and there are 88 − 5 = 83 df in the unrestricted model. The F statistic is F = [(.829 − .820)/3] / [(1 − .829)/83] ≈ 1.46. The 10% critical value (again using 90 denominator df in Table G.3a) is 2.15, so we fail to reject H0 at even the 10% level. In fact, the p-value is about .23.

• (iv) If heteroskedasticity were present, Assumption MLR5 would be violated, and the F statistic would not have an F distribution under the null hypothesis. Therefore, comparing the F statistic against the usual critical values, or obtaining the p-value from the F distribution, would not be especially meaningful.

Q2 Wooldridge 4.8
• (i) We use Property VAR.3 from Appendix B: Var(β̂1 − 3β̂2) = Var(β̂1) + 9 Var(β̂2) − 6 Cov(β̂1, β̂2).

• (ii) t = (β̂1 − 3β̂2 − 1)/se(β̂1 − 3β̂2), so we need the standard error of β̂1 − 3β̂2.

• (iii) Because θ = β1 − 3β2, we can write β1 = θ + 3β2. Plugging this into the population model gives
  y = β0 + (θ + 3β2)x1 + β2x2 + β3x3 + u = β0 + θx1 + β2(3x1 + x2) + β3x3 + u.
This last equation is what we would estimate by regressing y on x1, 3x1 + x2, and x3. The coefficient and standard error on x1 are what we want.
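Both routes, the variance formula in (i) and the re-parameterised regression in (iii), give the same standard error for β̂1 − 3β̂2, which can be confirmed on synthetic data. A Python sketch (the data-generating numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1.0 + 0.8 * x1 + 0.3 * x2 - 0.2 * x3 + rng.normal(size=n)

def cov_matrix(y, X):
    """OLS coefficients and their estimated covariance matrix s^2 (X'X)^-1."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    uhat = y - X @ beta
    s2 = (uhat @ uhat) / (n - X.shape[1])     # df = n - k - 1
    return beta, s2 * np.linalg.inv(X.T @ X)

# Route (i): Var(b1 - 3*b2) = Var(b1) + 9*Var(b2) - 6*Cov(b1, b2)
X = np.column_stack([np.ones(n), x1, x2, x3])
_, V = cov_matrix(y, X)
var_direct = V[1, 1] + 9.0 * V[2, 2] - 6.0 * V[1, 2]

# Route (iii): regress y on x1, (3*x1 + x2), x3; the x1 coefficient is theta
Z = np.column_stack([np.ones(n), x1, 3.0 * x1 + x2, x3])
_, Vz = cov_matrix(y, Z)
print(np.isclose(var_direct, Vz[1, 1]))   # True: identical estimated variances
```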

Q3 Wooldridge 4.10
• (i) We need to compute the F statistic for the overall significance of the regression with n = 142 and k = 4: F = [.0395/(1 − .0395)](137/4) ≈ 1.41. The 5% critical value with 4 and 120 df is 2.45, which is well above the value of F. Therefore, we fail to reject H0: β1 = β2 = β3 = β4 = 0 at the 10% level. No explanatory variable is individually significant at the 5% level. The largest absolute t statistic is on dkr, tdkr ≈ 1.60, which is not significant at the 5% level against a two-sided alternative.

• (ii) The F statistic (with the same df) is now [.0330/(1 − .0330)](137/4) ≈ 1.17, which is even lower than in part (i). None of the t statistics is significant at a reasonable level.

• (iii) We probably should not use the logs, as the logarithm is not defined for firms that have zero for dkr or eps. Therefore, we would lose some firms in the regression.

• (iv) It seems very weak. There are no significant t statistics at the 5% level (against a two-sided alternative), and the F statistics are insignificant in both cases. Plus, less than 4% of the variation in return is explained by the independent variables.

Q4 Wooldridge C4.9 (discrim_c4_9.do)
• (i) The results from the OLS regression, with standard errors in parentheses, are
log(psoda) = −1.46 + .073 prpblck + .137 log(income) + .380 prppov
(0.29) (.031) (.027) (.133)
n = 401, R² = .087.
• The p-value for testing H₀: β₁ = 0 against the two-sided alternative is about .018, so that we reject H₀ at the 5% level but not at the 1% level.

• (ii) The correlation is about −.84, indicating a strong degree of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for log(income) is about 5.1, and that for prppov is about 2.86 (two-sided p-value = .004).

• (iii) The OLS regression results when log(hseval) is added are
log(psoda) = −.84 + .098 prpblck − .053 log(income) + .052 prppov + .121 log(hseval)
(.29) (.029) (.038) (.134) (.018)
n = 401, R² = .184.
The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about .12 percent. The two-sided p-value is zero to three decimal places.

• (iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the 15% significance level against a two-sided alternative for log(income), and prppov does not have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at the 5% level because the outcome of the F₂,₃₉₆ statistic is about 3.52 with p-value = .030. All of the control variables – log(income), prppov, and log(hseval) – are highly correlated, so it is not surprising that some are individually insignificant.

• (v) Because the regression in (iii) contains the most controls, log(hseval) is individually significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable. It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if prpblck increases by .10 (ten percentage points), psoda is estimated to increase by about 1%, other factors held fixed.

Q5 Wooldridge 5.2
• This is about the inconsistency of the simple regression of pctstck on funds. A higher tolerance of risk means more willingness to invest in the stock market, so β₂ > 0. By assumption, funds and risktol are positively correlated. Now we use equation (5.5), where δ₁ > 0: plim(β̃₁) = β₁ + β₂δ₁ > β₁, so β̃₁ has a positive inconsistency (asymptotic bias). This makes sense: if we omit risktol from the regression and it is positively correlated with funds, some of the estimated effect of funds is actually due to the effect of risktol.
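The positive inconsistency is easy to see in a small simulation. The sketch below uses made-up parameter values (β₁ = 1, β₂ = 0.8, δ₁ = 0.5), so the short-regression slope should converge to β₁ + β₂δ₁ = 1.4 rather than to β₁:

```python
import random

random.seed(1)
beta1, beta2, delta1 = 1.0, 0.8, 0.5  # illustrative values, not from the data
n = 50000

funds = [random.gauss(0, 1) for _ in range(n)]
# risktol is positively correlated with funds (delta1 > 0)
risktol = [delta1 * f + random.gauss(0, 1) for f in funds]
y = [beta1 * f + beta2 * r + random.gauss(0, 1)
     for f, r in zip(funds, risktol)]

# OLS slope from regressing y on funds only (risktol omitted)
mx = sum(funds) / n
my = sum(y) / n
slope = (sum((f - mx) * (yi - my) for f, yi in zip(funds, y))
         / sum((f - mx) ** 2 for f in funds))
print(slope)  # close to beta1 + beta2*delta1 = 1.4, not beta1 = 1.0
```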

Q6 Wooldridge C5.1 (wage1_c5_1.do)
• (i) The estimated equation is
wage = −2.87 + .599 educ + .022 exper + .169 tenure
(0.73) (.051) (.012) (.022)
n = 526, R² = .306, σ̂ = 3.085.
• Below is a histogram of the 526 residuals û_i, i = 1, 2, …, 526. The histogram uses 27 bins, which is suggested by the formula in the Stata manual for 526 observations. For comparison, the normal distribution that provides the best fit to the histogram is also plotted.



• (ii) With log(wage) as the dependent variable, the estimated equation is
log(wage) = .284 + .092 educ + .0041 exper + .022 tenure
(.104) (.007) (.0017) (.003)
n = 526, R² = .316, σ̂ = .441.
The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below.

• (iii) The residuals from the log(wage) regression appear to be more normally distributed. Certainly the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed to the left. In the wage regression there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.

• The skewness and kurtosis of the residuals from the OLS regression may be used to test the null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null distribution of the Jarque-Bera test statistic (which takes into account both skewness and kurtosis) is approximately the chi-squared with 2 degrees of freedom. Hence, at the 5% level, we reject the normality of u whenever the JB statistic is greater than 5.99. In STATA, this test is carried out by the command "sktest". With the data in WAGE1.RAW, JB is extremely large for the wage model and JB = 10.59 for the log(wage) model. While normality is rejected for both models, we do see that the JB statistic is much smaller for the log(wage) model [also compare its p-values of the skewness and excess kurtosis tests against those from the wage model]. Hence normality appears to be a reasonable approximation to the distribution of the error term in the log(wage) model.
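The JB statistic itself is simple to compute from the residuals' sample skewness and kurtosis: JB = (n/6)[S² + (K − 3)²/4]. A minimal sketch (the five data points are only there to exercise the formula, not the WAGE1 residuals):

```python
def jarque_bera(resid):
    """JB = (n/6) * (S^2 + (K - 3)^2 / 4), with S = skewness, K = kurtosis."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n
    m3 = sum((e - mean) ** 3 for e in resid) / n
    m4 = sum((e - mean) ** 4 for e in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return (n / 6) * (skew ** 2 + (kurt - 3) ** 2 / 4)

jb = jarque_bera([1, 2, 3, 4, 5])
print(jb)  # compare against the chi-squared(2) 5% critical value, 5.99
```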


Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the advantages of using the log of a variable in regression? Find the "rules of thumb" for taking logs.
See page 191 of Wooldridge (pp. 198-199 for the 3rd edition).

• Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithm. Do you remember Table 2.3?
Consult Table 2.3.

• How do you compute the change in y caused by Δx when the model is built for log(y)?
A precise computation is given by Equation (6.8) of Wooldridge.

• Why do we need "interaction" terms in regression models?
An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for an example.

• What is the adjusted R-squared? What is the difference between it and the R-squared?
The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added. However, the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.
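The relationship between the two measures is adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). As a numerical check, the figures from the wage regression in Q3 of this week's problem set (R² = .135, n = 935, k = 3) reproduce its reported adjusted R² of .132:

```python
def adj_r2(r2, n, k):
    """Adjusted R-squared: penalises additional regressors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

abar2 = adj_r2(0.135, 935, 3)
print(round(abar2, 3))  # 0.132
```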

• How do you construct an interval prediction for given x-values?
This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.

• How do you predict y for given x-values when the model is built for log(y)?
Check the list on page 212 (p. 220 for the 3rd edition).

• What is involved in "residual analysis"?
See pages 209-210 (pp. 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 6.4
• (i) Holding all other factors fixed, we have
Δlog(wage) = β₁Δeduc + β₂Δeduc·pareduc = (β₁ + β₂pareduc)Δeduc.
Dividing both sides by Δeduc gives the result. The sign of β₂ is not obvious, although β₂ > 0 if we think a child gets more out of another year of education the more highly educated are the child's parents.
• (ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 − 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)

• (iii) When we add pareduc by itself, the coefficient on the interaction term is negative. The t statistic on educ·pareduc is about −1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2 Wooldridge 6.5
• This would make little sense. Performances on math and science exams are measures of outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The variable scill could be a dependent variable in an identical regression equation.

Q3 Wooldridge C6.3 (C6.2 for the 3rd edition) (wage2_c6_3.do)
• (i) Holding exper (and the elements in u) fixed, we have
Δlog(wage) = β₁Δeduc + β₃Δeduc·exper = (β₁ + β₃exper)Δeduc,
or Δlog(wage)/Δeduc = β₁ + β₃exper. This is the approximate proportionate change in wage given one more year of education.

• (ii) H₀: β₃ = 0. If we think that education and experience interact positively – so that people with more experience are more productive when given another year of education – then β₃ > 0 is the appropriate alternative.

• (iii) The estimated equation is
log(wage) = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper
(.24) (.0174) (.0200) (.00153)
n = 935, R² = .135, adjusted R² = .132.
The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H₁: β₃ > 0. Therefore, we reject H₀: β₃ = 0 in favor of H₁: β₃ > 0 at the 2% level.

• (iv) We rewrite the equation as
log(wage) = β₀ + θ₁educ + β₂exper + β₃educ·(exper − 10) + u,
and run the regression of log(wage) on educ, exper, and educ·(exper − 10). We want the coefficient on educ. We obtain θ̂₁ ≈ .0761 and se(θ̂₁) ≈ .0066. The 95% CI for θ₁ is about .063 to .089.
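The CI endpoints can be reproduced directly from the reported θ̂₁ and its standard error (using the normal critical value 1.96, which is reasonable with n = 935):

```python
theta1, se = 0.0761, 0.0066
lo, hi = theta1 - 1.96 * se, theta1 + 1.96 * se
print(round(lo, 3), round(hi, 3))  # 0.063 0.089
```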

Q4 Wooldridge C6.8 (hprice1_c6_8.do)
• (i) The estimated equation (where price is in dollars) is
price = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
(29,475.0) (.642) (13.24) (9,010.1)
n = 88, R² = .672, adjusted R² = .661, σ̂ = 59,833.
The predicted price at lotsize = 10,000, sqrft = 2,300, and bdrms = 4 is about $336,714.

• (ii) The regression is priceᵢ on (lotsizeᵢ − 10,000), (sqrftᵢ − 2,300), and (bdrmsᵢ − 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.

• (iii) We must use equation (6.36) to obtain the standard error of ê⁰ and then use equation (6.37) (assuming that price is normally distributed). From the regression in part (ii), se(ŷ⁰) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê⁰) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile in the t₈₄ distribution gives the 95% CI for price⁰, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval. But we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
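The prediction-interval arithmetic in part (iii) can be sketched as follows (numbers taken from the regression output above; 1.99 approximates the 97.5th percentile of the t₈₄ distribution):

```python
import math

y0 = 336706.7        # predicted price at the given x-values
se_y0 = 7374.5       # se of the predicted mean, from the part (ii) regression
sigma_hat = 59833.0  # estimated error standard deviation

# Equation (6.36): the prediction-error se combines both sources of uncertainty
se_e0 = math.sqrt(se_y0 ** 2 + sigma_hat ** 2)
t975 = 1.99
lo, hi = y0 - t975 * se_e0, y0 + t975 * se_e0
print(round(se_e0, 1), round(lo), round(hi))  # 60285.7 216738 456675
```

Note that σ̂ dominates se(ŷ⁰) here, which is why the prediction interval is so much wider than the CI for the mean in part (ii).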


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is a qualitative factor? Give some examples.
This should be easy.
• How do you use dummy variables to represent qualitative factors?
A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy. For example, a factor with 5 values requires 4 dummy variables.
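Constructing k − 1 dummies for a k-level factor can be sketched as follows (the region names are made up for illustration; one level is left out as the base group):

```python
levels = ["north", "south", "east", "west", "central"]  # a 5-level factor
base = levels[0]  # omitted base group

def make_dummies(value, levels, base):
    """Return a dict of k-1 dummy variables for one observation."""
    return {f"d_{lv}": int(value == lv) for lv in levels if lv != base}

row = make_dummies("east", levels, base)
print(row)  # {'d_south': 0, 'd_east': 1, 'd_west': 0, 'd_central': 0}
```

Dropping the base group avoids the dummy variable trap (perfect collinearity with the intercept).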

• How do you use dummy variables to represent an ordinal variable?
An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

• How do you test for differences in regression functions across different groups?
We generally use the F test on a properly-defined restricted model and unrestricted model. The Chow test is a special case of the F test where the null hypothesis is no difference across groups.

• What can you achieve by interacting group dummy variables with other regressors?
The interaction allows the slope coefficients to differ across groups.

• What is "program evaluation"?
It assesses the effectiveness of a program by checking the differences in regression coefficients for the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary?
The PRF is the conditional probability of "success" for given explanatory variables. The SRF is the predicted conditional probability of "success" for given explanatory variables.

• What are the shortcomings of the LPM?
The major drawback of the LPM is that the predicted probability of success may be outside the interval [0,1].

• Does MLR.5 hold for the LPM?
No. As indicated in the textbook and lecture, the conditional variance of the disturbance is not a constant and depends on the explanatory variables: Var(u|x) = E(y|x)[1 − E(y|x)].

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 7.4
• (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant.
• (ii) 100·[exp(−.283) − 1] ≈ −24.7%, and so the estimate is somewhat smaller in magnitude.
• (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is
log(salary) = β₀ + β₁log(sales) + β₂roe + δ₁consprod + δ₂utility + δ₃trans + u,
where trans is a dummy variable for the transportation industry. Now the base group is finance, and so the coefficient δ₁ directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.

Q2 Wooldridge 7.6
• In Section 3.3 – in particular, in the discussion surrounding Table 3.2 – we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β₁ (with ability in the error term) has a downward bias. Because we think β₁ ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3 Wooldridge 7.9
• (i) Plugging in u = 0 and d = 1 gives f₁(z) = (β₀ + δ₀) + (β₁ + δ₁)z.
• (ii) Setting f₀(z) = f₁(z) gives β₀ + β₁z = (β₀ + δ₀) + (β₁ + δ₁)z. Therefore, provided δ₁ ≠ 0, we have z = −δ₀/δ₁. Clearly, z is positive if and only if δ₀/δ₁ is negative, which means δ₀ and δ₁ must have opposite signs.

• (iii) Using part (ii), we have totcoll = .357/.030 = 11.9 years (totcoll = total college years).
• (iv) The estimated years of college at which women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed – not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.

Q4 Wooldridge C7.13 (apple_c7_13.do)
• (i) 412/660 ≈ .624.
• (ii) The OLS estimates of the LPM are
ecobuy = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize
(.165) (.109) (.132) (.00053) (.013)
+ .025 educ − .00050 age
(.008) (.00125)
n = 660, R² = .110.
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)

• (iii) The F statistic, with 4 and 653 df, is 4.43, with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children – other factors equal – the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in the functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule – predict one when the fitted probability is ≥ .5 and zero otherwise – gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
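The percent-correctly-predicted figures follow directly from the counts reported above:

```python
correct0, total0 = 102, 248  # correctly predicted among ecobuy = 0
correct1, total1 = 340, 412  # correctly predicted among ecobuy = 1

frac0 = correct0 / total0
frac1 = correct1 / total1
overall = (correct0 + correct1) / (total0 + total1)
print(round(frac0, 3), round(frac1, 3), round(overall, 2))  # 0.411 0.825 0.67
```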


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is heteroskedasticity in a regression model?
When homoskedasticity (MLR.5) fails and the variance of the disturbance (u) changes across the observation index (i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with OLS under heteroskedasticity?
The usual t-stat and F-stat are not valid because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator will no longer be (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA?
These are the corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option "robust" with the "regress" command, e.g., regress lwage educ, robust.

• How do you detect if there is heteroskedasticity?
The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

• If heteroskedasticity is present in a known form, how would you estimate the model?
In this case, the WLS estimator should be used, which is more efficient than the OLS estimator. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model?
If there is strong evidence for heteroskedasticity, the FGLS estimator should be used, which is based on an exponential functional form. See Section 8.4 for details.

• What are the steps in the FGLS estimation?
You should summarise them from Section 8.4.

• How would you handle the heteroskedasticity of the LPM?
The heteroskedasticity functional form is known for the LPM; hence the WLS can be used in principle. However, because the LPM can produce predicted probabilities that are outside the interval (0,1), the known functional form p(x)[1 − p(x)] may not be useful for the WLS. In that case, it may be necessary to use FGLS for the LPM.
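For the LPM, the known variance function h(x) = p(x)[1 − p(x)] translates into WLS weights 1/h(x). A minimal sketch (the fitted probabilities are illustrative; a fitted value outside (0,1) makes the weight unusable, which is exactly why FGLS may be needed):

```python
def lpm_weight(p_hat):
    """WLS weight for the LPM: 1 / [p(1-p)]; undefined outside (0, 1)."""
    if not 0 < p_hat < 1:
        raise ValueError("fitted probability outside (0, 1); WLS weight undefined")
    return 1.0 / (p_hat * (1.0 - p_hat))

print(round(lpm_weight(0.5), 2), round(lpm_weight(0.1), 2))  # 4.0 11.11
```

Observations with fitted probabilities near 0 or 1 receive very large weights, since their disturbance variance is small.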

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2 Wooldridge 8.2
• With Var(u|inc, price, educ, female) = σ²inc², we have h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc:
beer/inc = β₀(1/inc) + β₁ + β₂price/inc + β₃educ/inc + β₄female/inc + u/inc.
Notice that β₁, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.
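The transformation can be sketched directly: each variable, including the constant (which becomes 1/inc), is divided by √h(x) = inc. The sample observation below is made up:

```python
# One made-up observation: (beer, inc, price, educ, female)
beer, inc, price, educ, female = 10.0, 4.0, 2.0, 12.0, 1.0

# Divide everything, including the intercept "variable" 1, by sqrt(h(x)) = inc
beer_t = beer / inc      # transformed dependent variable
const_t = 1.0 / inc      # the coefficient on this is beta0
inc_t = 1.0              # beta1 is now the "intercept" of the transformed model
price_t = price / inc
educ_t = educ / inc
female_t = female / inc
print(beer_t, const_t, price_t, educ_t, female_t)  # 2.5 0.25 0.5 3.0 0.25
```

Running OLS on the transformed variables (with no additional intercept) gives the WLS estimates.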

Q3 Wooldridge 8.3
• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR.4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR.4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has a smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4 Wooldridge 8.5
• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.
• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.
• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.
• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.
• (v) We just plug the values of the independent variables into the OLS regression line:
smokes = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052.
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)
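The turning point and the predicted probability in parts (iii) and (v) are straightforward to reproduce (coefficients taken from the estimated equation above):

```python
import math

# Part (iii): turning point of the quadratic in age
turning_point = 0.020 / (2 * 0.00026)
print(round(turning_point, 2))  # 38.46

# Part (v): predicted probability of smoking for the given person
smokes = (0.656 - 0.069 * math.log(67.44) + 0.012 * math.log(6500)
          - 0.029 * 16 + 0.020 * 77 - 0.00026 * 77 ** 2)
print(round(smokes, 4))  # 0.0052
```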

Q5 Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:
e401k = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male
(.081) (.0006) (.000005) (.0039) (.00005) (.0121)
[.079] [.0006] [.000005] [.0038] [.00004] [.0121]
n = 9,275, R² = .094.
There are no important differences; if anything, the robust standard errors are smaller.

• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model u² = δ₀ + δ₁p(x) + δ₂p(x)² + v, with the restrictions δ₀ = 0, δ₁ = 1, and δ₂ = −1. Remember that, for the LPM, the fitted values ŷ are estimates of p(x). So, when we run the regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.

• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on ŷ is about 1.010, the coefficient on ŷ² is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are
e401k = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male
(.076) (.0005) (.000004) (.0037) (.00004) (.0117)
n = 9,275, R² = .108.
There are no important differences with the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)
• What is functional form misspecification? What is its major consequence in regression analysis?
As the word "misspecification" suggests, it is a mis-specified model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification?
RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). RESET is based on the F-test for the joint significance of the squared and cubed predicted value (ŷ) in an "expanded" model that includes all the original regressors as well as the squared and cubed ŷ, where the latter represent possible nonlinear functions of the regressors.

• What are nested models? And nonnested models?
Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another?
The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test two nonnested models against each other?
There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models and decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis?
A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved one and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, only depends on the proxy.

• Can you analyse the consequences of measurement errors?
With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis?
Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?
Exogenous sample selection is a nonrandom sampling scheme in which a sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR.1-4 and hence does not introduce bias in the OLS estimation. On the other hand, an endogenous sample selection scheme selects a sample on the basis of the dependent variable and causes bias in the OLS estimation.

• What are outliers and influential observations?
Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of parameters. They are called influential observations if the estimation results when including them are significantly different from the results when excluding them.

• Consider the simple regression model yᵢ = α + (β + bᵢ)xᵢ + uᵢ, for i = 1, 2, …, n, where the "slope parameter" contains a random variable bᵢ, and α and β are constant parameters. Assume that the usual MLR.1-5 hold for yᵢ = α + βxᵢ + uᵢ, E(bᵢ|xᵢ) = 0, and Var(bᵢ|xᵢ) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference?
As the model can be written as yᵢ = α + βxᵢ + (bᵢxᵢ + uᵢ), where the overall disturbance is (bᵢxᵢ + uᵢ), given the stated assumptions, MLR.1-4 hold for this model and the OLS estimator of the "slope parameter" is unbiased (from β). However, MLR.5 does not hold, since the conditional variance of the overall disturbance, Var(bᵢxᵢ + uᵢ|xᵢ), will depend on xᵢ, and the usual OLS standard errors will not be valid.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 9.1
• There is functional form misspecification if β₆ ≠ 0 or β₇ ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test:
F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] = 2.97.
With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)

Q2 Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.
• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β₃ < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β₁ [ignoring the presence of log(enroll) in the model]. So, when we control for the poverty rate, the effect of spending falls.

• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of 1.26 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1), we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) are much more important in explaining student performance than are spending per student or other school characteristics.

Q3 Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates using the 1988 data are
log(scrap) = .409 + .057 grant
(.241) (.406)
n = 54, R2 = .0004.
The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain
log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
(.089) (.147) (.044)
n = 54, R2 = .873,
where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller in magnitude than before, but it is still pretty significant.
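The t statistics against the non-zero null H0: β = 1 in (iv) and (v) are simple arithmetic on the reported estimate and its two standard errors; a quick Python check (variable names are ours):

```python
# t statistic for H0: beta = 1 (rather than the default H0: beta = 0),
# using the log(scrap87) estimate with the usual and robust standard errors
b, se_usual, se_robust = 0.831, 0.044, 0.071
t_usual = (b - 1) / se_usual
t_robust = (b - 1) / se_robust
print(round(t_usual, 2), round(t_robust, 2))  # → -3.84 -2.38
```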


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data?
The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations are dependent on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that, for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation?
A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (hence has a distribution). The SP can be viewed as a "random curve" with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation or autocorrelation?
The serial (or auto) correlation of a time-series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted as Corr(yt, yt−h) = Cov(yt, yt−h)/[Var(yt)Var(yt−h)]^(1/2).
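The sample analogue of this formula is easy to compute; a minimal Python sketch on a simulated AR(1) series (the helper autocorr and all values are illustrative, not from the course data):

```python
import numpy as np

def autocorr(y, h):
    # Sample version of Corr(y_t, y_{t-h}) = Cov(y_t, y_{t-h}) / [Var(y_t)Var(y_{t-h})]^(1/2)
    return np.corrcoef(y[h:], y[:-h])[0, 1]

# Simulate a positively autocorrelated AR(1) series: y_t = 0.8*y_{t-1} + e_t
rng = np.random.default_rng(0)
e = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.8 * y[t - 1] + e[t]

print(round(autocorr(y, 1), 2))  # close to the true value 0.8
```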

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)?
Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.
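One standard way to estimate the LRP and its standard error in a single regression is to reparameterise the FDL model so that the LRP appears as the coefficient on z_t. A hedged Python sketch on simulated data (all names and values are illustrative; in STATA the same result comes from a postestimation linear-combination command):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
z = rng.normal(size=T + 2)
# FDL model: y_t = a0 + d0*z_t + d1*z_{t-1} + d2*z_{t-2} + u_t
d0, d1, d2 = 0.5, 0.3, 0.2          # true LRP = d0 + d1 + d2 = 1.0
zt, zt1, zt2 = z[2:], z[1:-1], z[:-2]
y = 1.0 + d0 * zt + d1 * zt1 + d2 * zt2 + 0.1 * rng.normal(size=T)

# Reparameterise with theta = d0 + d1 + d2 (the LRP):
# y_t = a0 + theta*z_t + d1*(z_{t-1} - z_t) + d2*(z_{t-2} - z_t) + u_t,
# so the LRP and its standard error are read off the z_t coefficient directly.
X = np.column_stack([np.ones(T), zt, zt1 - zt, zt2 - zt])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (T - X.shape[1])
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
print(f"LRP estimate = {beta[1]:.3f}, se = {se[1]:.3f}")
```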

• What are TS1-6 (assumptions about time series regression)? How do they differ from the assumptions in MLR1-6?
TS1-6 include: 1) linearity in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6 mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of OLS standard errors, t-stats and F-stats for statistical inference, TS3-TS6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors?
Strictly exogenous regressors satisfy the assumption TS3, whereas contemporaneously exogenous regressors only satisfy the condition (10.10) (or (z10) of the Slides), which is weaker than TS3.

• What is a trending time series? What is a time trend?
A time series is trending if it has the tendency of growing (or shrinking) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results?
First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious because


it is purely induced by the time trends and has little to say about the true relationship between the two time series.

• Why would you include a time trend in regressions with trending variables?
Including a time trend in regressions allows one to study the relationship among time series variables that is not induced by the time trends in the time series.

• What is seasonality in a time series? Give an example of a time series variable with seasonality.
Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have week-day seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
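The construction can be sketched in a few lines; a minimal Python example with hypothetical quarterly data (Q1 as the base category):

```python
import numpy as np

# Hypothetical quarter index 1..4, repeating over 10 years of quarterly data
quarters = np.tile([1, 2, 3, 4], 10)

# With Q1 as the base, define dummies for the second, third and fourth quarters
Q2 = (quarters == 2).astype(float)
Q3 = (quarters == 3).astype(float)
Q4 = (quarters == 4).astype(float)

# Design matrix for y_t = b0 + b1*Q2_t + b2*Q3_t + b3*Q4_t + (other regressors) + u_t
X = np.column_stack([np.ones(quarters.size), Q2, Q3, Q4])
print(X.shape)  # → (40, 4)
```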

Problem Set

Q1 Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated, such as stock returns, do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no serial correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2
• We follow the hint and write
gGDP_{t-1} = α0 + δ0 int_{t-1} + δ1 int_{t-2} + u_{t-1},
and plug this into the right-hand side of the int_t equation:
int_t = γ0 + γ1(α0 + δ0 int_{t-1} + δ1 int_{t-2} + u_{t-1} − 3) + v_t
= (γ0 + γ1α0 − 3γ1) + γ1δ0 int_{t-1} + γ1δ1 int_{t-2} + γ1 u_{t-1} + v_t.
Now, by assumption, u_{t-1} has zero mean and is uncorrelated with all right-hand-side variables in the previous equation, except itself of course. So Cov(int_t, u_{t-1}) = E(int_t · u_{t-1}) = γ1 E(u_{t-1}²) > 0, because γ1 > 0. If σ² = E(u_t²) for all t, then Cov(int_t, u_{t-1}) = γ1σ². This violates the strict exogeneity assumption TS3: while u_t is uncorrelated with int_t, int_{t-1}, and so on, u_t is correlated with int_{t+1}.


Q3 Wooldridge 10.7
• (i) pe_{t-1} and pe_{t-2} must be increasing by the same amount as pe_t.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pe_{t-2}, pe_{t-1} and pe_t by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is
i3_t = 1.61 + .343 inf_t + .382 inf_{t-1} − .190 def_t + .569 def_{t-1}
(.40) (.125) (.134) (.221) (.197)
n = 55, R2 = .685, adj-R2 = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than .606, which we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inf_{t-1}.

• (iv) The F statistic for the joint significance of inf_{t-1} and def_{t-1} is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.



Week 2 Tutorial Exercises

Problem Set
Q1 Wooldridge 1.1
• (i) Ideally, we could randomly assign students to classes of different sizes. That is, each student is assigned a different class size without regard to any student characteristics, such as ability and family background. For reasons we will see in Chapter 2, we would like substantial variation in class sizes (subject, of course, to ethical considerations and resource constraints).

• (ii) A negative correlation means that larger class size is associated with lower performance. We might find a negative correlation because larger class size actually hurts performance. However, with observational data, there are other reasons we might find a negative relationship. For example, children from more affluent families might be more likely to attend schools with smaller class sizes, and affluent children generally score better on standardized tests. Another possibility is that, within a school, a principal might assign the better students to smaller classes. Or, some parents might insist their children are in the smaller classes, and these same parents tend to be more involved in their children's education.

• (iii) Given the potential for confounding factors, some of which are listed in (ii), finding a negative correlation would not be strong evidence that smaller class sizes actually lead to better performance. Some way of controlling for the confounding factors is needed, and this is the subject of multiple regression analysis.

Q2 Wooldridge 1.2
• (i) Here is one way to pose the question: If two firms, say A and B, are identical in all respects except that firm A supplies job training one hour per worker more than firm B, by how much would firm A's output differ from firm B's?

• (ii) Firms are likely to choose job training depending on the characteristics of workers. Some observed characteristics are years of schooling, years in the workforce, and experience in a particular job. Firms might even discriminate based on age, gender or race. Perhaps firms choose to offer training to more or less able workers, where "ability" might be difficult to quantify but where a manager has some idea about the relative abilities of different employees. Moreover, different kinds of workers might be attracted to firms that offer more job training on average, and this might not be evident to employers.

bull (iii) The amount of capital and technology available to workers would also affect output So two firms with exactly the same kinds of employees would generally have different outputs if they use different amounts of capital or technology The quality of managers would also have an effect

bull (iv) No unless the amount of training is randomly assigned The many factors listed in parts (ii) and (iii) can contribute to finding a positive correlation between output and training even if job training does not improve worker productivity

Q3 Wooldridge C1.3 (meap01_ch01.do)
• (i) The largest is 100; the smallest is 0.
• (ii) 38 out of 1,823, or about 2.1 percent of the sample.
• (iii) 17.
• (iv) The average of math4 is about 71.9 and the average of read4 is about 60.1. So, at least in 2001, the reading test was harder to pass.
• (v) The sample correlation between math4 and read4 is about .843, which is a very high degree of (linear) association. Not surprisingly, schools that have high pass rates on one test have a strong tendency to have high pass rates on the other test.


• (vi) The average of exppp is about $5,194.87. The standard deviation is $1,091.89, which shows rather wide variation in spending per pupil. [The minimum is $1,206.88 and the maximum is $11,957.64.]

• (vii) 8.7%. Note that, by Taylor expansion, log(x + h) − log(x) = log(1 + h/x) ≈ h/x for positive x and small h/x (in absolute value).
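The quality of this log approximation is easy to verify numerically; a small Python check (the values of x and h are arbitrary illustrations):

```python
import math

# Percentage differences via log differences: log(x + h) - log(x) = log(1 + h/x) ≈ h/x
x, h = 100.0, 8.7
exact = math.log(x + h) - math.log(x)
approx = h / x
print(round(exact, 4), approx)  # → 0.0834 0.087
```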

Q4 Wooldridge C1.4 (jtrain2_ch01.do)
• (i) 185/445 ≈ .416 is the fraction of men receiving job training, or about 41.6%.
• (ii) For men receiving job training, the average of re78 is about 6.35, or $6,350. For men not receiving job training, the average of re78 is about 4.55, or $4,550. The difference is $1,800, which is very large. On average, the men receiving the job training had earnings about 40% higher than those not receiving training.

• (iii) About 24.3% of the men who received training were unemployed in 1978; the figure is 35.4% for men not receiving training. This, too, is a big difference.

• (iv) The differences in earnings and unemployment rates suggest the training program had strong, positive effects. Our conclusions about economic significance would be stronger if we could also establish statistical significance (which is done in Computer Exercise C9.10 in Chapter 9).

STATA Hints
• Some of the STATA do-files for solving the Problem Set are posted on the course website. It is important to understand the commands in the do-files so that you will be able to write your own do-files for your assignments and the course project.


Week 3 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• The minimum requirement for OLS to be carried out for the data set (xi, yi), i = 1,…,n, with the sample size n > 2, is that the sample variance of x is positive. In what circumstances is the sample variance of x zero?
When all observations on x have the same value, there is no variation in x.

• The OLS estimation of the simple regression model has the following properties: a) the sum of the residuals is zero; b) the sample covariance of the residuals and x is zero. Why? How would you relate them to the "least squares" principle?
These are derived from the first-order conditions for minimising the sum of squared residuals (SSR).

• Convince yourself that the point (x̄, ȳ), the sample means of x and y, is on the sample regression function (SRF), which is a straight line.
This can be derived from the first of the first-order conditions for minimising the SSR.

• How do you know that SST = SSE + SSR is true?
The dependent variable values can be expressed as the fitted value plus the residual: yi = ŷi + ûi. After taking the average away from both sides of the equation, we find (yi − ȳ) = (ŷi − ȳ) + ûi. The required "SST = SSE + SSR" is obtained by squaring and summing both sides of the above equation, noting (ŷi − ȳ) = β̂1(xi − x̄) and using the second of the first-order conditions for minimising the SSR.
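The decomposition can be confirmed numerically for any OLS fit; a minimal Python sketch on simulated data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 + 1.5 * x + rng.normal(size=50)

# Simple OLS fit: slope from sample covariance/variance, intercept from the means
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
uhat = y - yhat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((yhat - y.mean()) ** 2)
SSR = np.sum(uhat ** 2)
print(SST, SSE + SSR)  # equal up to floating-point error
```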

• Which of the following models is (are) nonlinear model(s)?
a) sales = β0/[1 + exp(−β1 ad_expenditure)] + u
b) sales = β0 + β1 log(ad_expenditure) + u
c) sales = β0 + β1 exp(ad_expenditure) + u
d) sales = exp(β0 + β1 ad_expenditure + u)
The model in a) is nonlinear.
• Can you follow the proofs of Theorems 2.1-2.3?
You should be able to follow the proofs. If not at this point, try again.

Problem Set
Q1 Wooldridge 2.4
• (i) When cigs = 0, predicted birth weight is 119.77 ounces. When cigs = 20, the predicted birth weight is 109.49. This is about an 8.6% drop.
• (ii) Not necessarily. There are many other factors that can affect birth weight, particularly overall health of the mother and quality of prenatal care. These could be correlated with cigarette smoking during pregnancy. Also, something such as caffeine consumption can affect birth weight and might also be correlated with cigarette smoking.

• (iii) If we want a predicted bwght of 125, then cigs = (125 − 119.77)/(−.524) ≈ −10.18, or about −10 cigarettes. This is nonsense, of course, and it shows what happens when we are trying to predict something as complicated as birth weight with only a single explanatory variable. The largest predicted birth weight is necessarily 119.77. Yet almost 700 of the births in the sample had a birth weight higher than 119.77.

• (iv) 1,176 out of 1,388 women did not smoke while pregnant, or about 84.7%. Because we are using only cigs to explain birth weight, we have only one predicted birth weight at cigs = 0. The predicted birth weight is necessarily roughly in the middle of the observed birth weights at cigs = 0, and so we will under predict high birth rates.

Q2 Wooldridge 2.7
• (i) When we condition on inc in computing an expectation, inc becomes a constant. So E(u|inc) = E(√inc·e|inc) = √inc·E(e|inc) = √inc·0 = 0, because E(e|inc) = E(e) = 0.
• (ii) Again, when we condition on inc in computing a variance, inc becomes a constant. So Var(u|inc) = Var(√inc·e|inc) = (√inc)²Var(e|inc) = σe²·inc, because Var(e|inc) = σe².
• (iii) Families with low incomes do not have much discretion about spending; typically, a low-income family must spend on food, clothing, housing, and other necessities. Higher income people have more discretion, and some might choose more consumption while others more saving. This discretion suggests wider variability in saving among higher income families.

Q3 Wooldridge C2.6 (meap93_ch02.do)
• (i) It seems plausible that another dollar of spending has a larger effect for low-spending schools than for high-spending schools. At low-spending schools, more money can go toward purchasing more books, computers, and for hiring better qualified teachers. At high levels of spending, we would expect little, if any, effect because the high-spending schools already have high-quality teachers, nice facilities, plenty of books, and so on.

• (ii) If we take changes, as usual we obtain
Δmath10 = β1Δlog(expend) ≈ (β1/100)(%Δexpend),
just as in the second row of Table 2.3. So, if %Δexpend = 10, Δmath10 = β1/10.
• (iii) The regression results are

math10 = −69.34 + 11.16 log(expend), n = 408, R2 = .0297.
• (iv) If expend increases by 10 percent, math10 increases by about 1.1 percentage points. This is not a huge effect, but it is not trivial for low-spending schools, where a 10 percent increase in spending might be a fairly small dollar amount.

• (v) In this data set, the largest value of math10 is 66.7, which is not especially close to 100. In fact, the largest fitted value is only about 30.2.

Q4 Wooldridge C2.7 (charity_ch02.do)
• (i) The average gift is about 7.44 Dutch guilders. Out of 4,268 respondents, 2,561 did not give a gift, or about 60 percent.
• (ii) The average mailings per year are about 2.05. The minimum value is .25 and the maximum value is 3.5.
• (iii) The estimated equation is

gift = 2.01 + 2.65 mailsyear, n = 4,268, R2 = .0138.
• (iv) The slope coefficient from part (iii) means that each mailing per year is associated with (perhaps even "causes") an estimated 2.65 additional guilders, on average. Therefore, if each mailing costs one guilder, the expected profit from each mailing is estimated to be 1.65 guilders. This is only the average, however. Some mailings generate no contributions, or a contribution less than the mailing cost; other mailings generated much more than the mailing cost.

• (v) Because the smallest mailsyear in the sample is .25, the smallest predicted value of gifts is 2.01 + 2.65(.25) ≈ 2.67. Even if we look at the overall population, where some people have received no mailings, the smallest predicted value is about two. So, with this estimated equation, we never predict zero charitable gifts.


Week 4 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What do we mean when we say "regress wage on educ and expr"?
Use OLS to estimate the multiple linear regression model wage = β0 + β1educ + β2expr + u.
• Why and under what circumstance do we need to "control for" expr in the regression model in order to quantify the effect of educ on wage?
We need to control for expr in the model when the partial, or ceteris paribus, effect of educ on wage is the parameter of interest and expr and educ are correlated.

• What is the bias of an estimator?
bias = E(β̂j) − βj

• What is the "omitted variable bias"?
It is the bias resulting from omitting relevant variables that are correlated with the explanatory variables in a regression model.

bull What is the consequence of adding an irrelevant variable to a regression model Including an irrelevant variable will generally increase the variance of the OLS estimators

bull What is the requirement of the ZCM assumption in your own words You must do this by yourself

bull Why assuming E(u) = 0 is not restrictive when an intercept is included in the regression model When the mean value of the disturbance is not zero you can always define the new disturbance as the original disturbance minus the mean value and define the new intercept as the original intercept plus the mean value

bull In terms of notation why do we need two subscripts for independent variables In our notation the first subscript is for indexing observations (1st obs 2nd obs etc) and the second subscript is for indexing variables (1st variable 2nd variable etc)

bull Using OLS to estimate a multiple regression model with k independent variables and an intercept how many first-order conditions are there k + 1

• How do you know that the OLS estimators are linear combinations of the observations on the dependent variable?
By the first-order conditions, we see that the OLS estimators are the solution to k+1 linear equations with k+1 unknowns. The "right-hand sides" of the equations are linear functions of observations on the dependent variable (say, y). Hence, the solution is linear combinations of the y-observations.

• What is the Gauss-Markov theorem?
Read the textbook.
• Try to explain why R2 never decreases (it is likely to increase) when additional explanatory variables are added to the regression model.
First, the smaller is the SSR, the greater is the R-squared. The OLS chooses parameter estimates to minimise the SSR. With extra explanatory variables, the R-squared does not decrease because the SSR does not increase, which is true because the OLS has a larger parameter space over which to choose estimates (more possibilities to make the SSR small).

bull What is an endogenous explanatory variable Exogenous explanatory variable See the top of page 88

bull What is multicollinearity and its likely effect on the OLS estimators Read pages 95-99


Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 3.2
• (i) Yes. Because of budget constraints, it makes sense that, the more siblings there are in a family, the less education any one child in the family has. To find the increase in the number of siblings that reduces predicted education by one year, we solve 1 = .094(Δsibs), so Δsibs = 1/.094 ≈ 10.6.

• (ii) Holding sibs and feduc fixed, one more year of mother's education implies .131 years more of predicted education. So, if a mother has four more years of education, her son is predicted to have about a half a year (.524) more years of education.

• (iii) Since the number of siblings is the same, but meduc and feduc are both different, the coefficients on meduc and feduc both need to be accounted for. The predicted difference in education between B and A is .131(4) + .210(4) = 1.364.

Q2 Wooldridge 3.10
• (i) Because x1 is highly correlated with x2 and x3, and these latter variables have large partial effects on y, the simple and multiple regression coefficients on x1 can differ by large amounts. We have not done this case explicitly, but given equation (3.46) and the discussion with a single omitted variable, the intuition is pretty straightforward.

• (ii) Here we would expect β̃1 and β̂1 to be similar (subject, of course, to what we mean by "almost uncorrelated"). The amount of correlation between x2 and x3 does not directly affect the multiple regression estimate on x1 if x1 is essentially uncorrelated with x2 and x3.

• (iii) In this case, we are (unnecessarily) introducing multicollinearity into the regression: x2 and x3 have small partial effects on y, and yet x2 and x3 are highly correlated with x1. Adding x2 and x3 likely increases the standard error of the coefficient on x1 substantially, so se(β̂1) is likely to be much larger than se(β̃1).

• (iv) In this case, adding x2 and x3 will decrease the residual variance without causing much collinearity (because x1 is almost uncorrelated with x2 and x3), so we should see se(β̂1) smaller than se(β̃1). The amount of correlation between x2 and x3 does not directly affect se(β̂1).

Q3 Wooldridge C3.2 (hprice1_ch03.do)
• (i) The estimated equation is
price = −19.32 + .128 sqrft + 15.20 bdrms, n = 88, R2 = .632.
• (ii) Holding square footage constant, (the price increase) = 15.20 × (increase in bedrooms). Hence, the price increases by 15.20, which means $15,200.
• (iii) Now, (the price increase) = .128 × (increase in square feet) + 15.20 × (increase in bedrooms) = .128(140) + 15.20(1) = 33.12. That is, $33,120. Because the house size is also increasing, the effect is larger than that in (ii).

• (iv) About 63.2%.
• (v) The predicted price is −19.32 + .128(2,438) + 15.20(4) = 353.544, or $353,544.
• (vi) The residual = 300,000 − 353,544 = −53,544. The buyer under-paid, judged by the predicted price. But, of course, there are many other features of a house (some that we cannot even measure) that affect price, and we have not controlled for those.

Q4 Wooldridge C3.6 (wage2_ch03_c3_6.do)
• (i) The slope coefficient from regressing IQ on educ is 3.53383.
• (ii) The slope coefficient from log(wage) on educ is .05984.
• (iii) The slope coefficients from log(wage) on educ and IQ are .03912 and .00586, respectively.
• (iv) The computed β̃1 = β̂1 + β̂2δ̃1 = .03912 + 3.53383(.00586) ≈ .05984.
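The identity in (iv) is just arithmetic on the three reported slopes; a quick Python check (variable names are ours):

```python
# Omitted-variable identity: tilde_b1 = hat_b1 + hat_b2 * tilde_d1,
# using the slope estimates reported above
tilde_d1 = 3.53383   # slope from regressing IQ on educ
hat_b1 = 0.03912     # educ coefficient, log(wage) on educ and IQ
hat_b2 = 0.00586     # IQ coefficient, log(wage) on educ and IQ

tilde_b1 = hat_b1 + hat_b2 * tilde_d1
print(round(tilde_b1, 4))  # → 0.0598, matching the simple regression slope in (ii)
```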


Week 5 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the CLM assumptions?
These are MLR1, MLR2, MLR3 and MLR6. MLR6 is a very strong assumption that implies both MLR4 and MLR5.

bull What is the sampling distribution of the OLS estimators under the CLM assumptions The OLS estimators under the CLM assumptions follow the normal distribution

bull What are the standard errors of the OLS estimators The standard error is the square root of the estimated variance of the estimator

bull What is the null hypothesis about a parameter In this context the null hypothesis is a statement that assigns a known value (or hypothesised value) to the parameter of interest Usually the null is a maintained proposition (from economic theories or experience) that will only be rejected on very strong evidence

bull What is a one-tailed (two-tailed) alternative hypothesis This should be clear once you finish your reading

• In testing hypotheses, what is a Type 1 (Type 2) error? What is the level of significance?
A Type 1 error is the rejection of a true null. A Type 2 error is the non-rejection of a false null. The level of significance is the probability of making a Type 1 error.

• The decision rule we use can be stated as "reject the null if the t-statistic exceeds the critical value". How is the critical value determined?
The critical value is determined by the level of significance (or the level of confidence, in the case of confidence intervals) and the distribution of the test statistic. For example, for the t-stat, we use the t-distribution when the sample size is small and the normal distribution when the sample size is large (df > 120).

• Justify the statement: "Given the observed test statistic, the p-value is the smallest significance level at which the null hypothesis would be rejected."
If you used the observed t-stat as your critical value, you would just reject the null (because the t-stat is equal to this value). This "critical value" (= the t-stat value) implies a level of significance, say p, which is exactly the p-value. For any significance level smaller than p, the corresponding critical value is more extreme than the t-stat value and does not lead to the rejection of the null. Hence, p is the smallest significance level at which the null would be rejected.

• What is the 90% confidence interval for a parameter?
It is an interval [L, U] defined by two sample statistics: the lower bound L and the upper bound U. The probability that [L, U] covers the parameter is .90.

bull In constructing a confidence interval for a parameter what is the level of confidence It is the probability that the CI covers the parameter

bull When the level of confidence increases how would the width of the confidence interval change (holding other things fixed) The width will increase

• Try to convince yourself that the event "the 90% confidence interval covers a hypothesised value of the parameter" is the same as the event "the null of the parameter being the hypothesised value cannot be rejected in favour of the two-tailed alternative at the 10% level of significance". It becomes apparent if you start with P(|t-stat| ≤ c) = 0.90 and disentangle the expression for the t-stat.


Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 4.1
• (i) and (iii) generally cause the t statistics not to have a t distribution under H0.

Homoskedasticity is one of the CLM assumptions. An important omitted variable violates Assumption MLR4 (and hence MLR6) in general. The CLM assumptions contain no mention of the sample correlations among independent variables, except to rule out the case where the correlation is one (MLR3).

Q2 Wooldridge 4.2
• (i) H0: β3 = 0. H1: β3 > 0.
• (ii) The proportionate effect on predicted salary is .00024(50) = .012. To obtain the percentage effect, we multiply this by 100: 1.2%. Therefore, a 50-point ceteris paribus increase in ros is predicted to increase salary by only 1.2%. Practically speaking, this is a very small effect for such a large change in ros.

• (iii) The 10% critical value for a one-tailed test, using df = ∞, is obtained from Table G.2 as 1.282. The t statistic on ros is .00024/.00054 ≈ .44, which is well below the critical value. Therefore, we fail to reject H0 at the 10% significance level.

• (iv) Based on this sample, the estimated ros coefficient appears to be different from zero only because of sampling variation. On the other hand, including ros may not be causing any harm; it depends on how correlated it is with the other independent variables (although these are very significant even with ros in the equation).

Q3 Wooldridge 4.5
• (i) .412 ± 1.96(.094), or about [.228, .596].
• (ii) No, because the value .4 is well inside the 95% CI.
• (iii) Yes, because 1 is well outside the 95% CI.
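The arithmetic behind Q3 can be verified directly with a few lines of Python (a quick check only; the variable names are illustrative):

```python
# 95% CI from Q3(i): point estimate .412, standard error .094
beta_hat, se = 0.412, 0.094
lower = beta_hat - 1.96 * se
upper = beta_hat + 1.96 * se
print(round(lower, 3), round(upper, 3))  # about .228 and .596

# (ii): .4 is inside the interval, so H0: beta = .4 is not rejected
assert lower < 0.4 < upper
# (iii): 1 is outside the interval, so H0: beta = 1 is rejected
assert not (lower < 1 < upper)
```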

Q4 Wooldridge C4.8 (401ksubs_ch04.do)
• (i) There are 2,017 single people in the sample of 9,275.
• (ii) The estimated equation is
 nettfa = −43.04 + .799 inc + .843 age
   (4.08) (.060) (.092)
 n = 2017, R2 = .119
The coefficient on inc indicates that one more dollar in income (holding age fixed) is reflected in about 80 more cents in predicted nettfa; no surprise there. The coefficient on age means that, holding income fixed, if a person gets another year older his/her nettfa is predicted to increase by about $843. (Remember, nettfa is in thousands of dollars.) Again, this is not surprising.

• (iii) The intercept is not very interesting, as it gives the predicted nettfa for inc = 0 and age = 0. Clearly, there is no one with even close to these values in the relevant population.

• (iv) The t statistic is (.843 − 1)/.092 ≈ −1.71. Against the one-sided alternative H1: β2 < 1, the p-value is about .044. Therefore, we can reject H0: β2 = 1 at the 5% significance level (against the one-sided alternative).

• (v) The slope coefficient on inc in the simple regression is about .821, which is not very different from the .799 obtained in part (ii). As it turns out, the correlation between inc and age in the sample of single people is only about .039, which helps explain why the simple and multiple regression estimates are not very different; see the text also.


Week 6 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• How would you test hypotheses about a single linear combination of parameters? We may re-parameterise the regression model to isolate the "single linear combination". Then OLS on the re-parameterised model will provide the estimate of the "single linear combination" and the related standard error. See the example in the textbook and lecture slides.

• What are exclusion restrictions for a regression model? Exclusion restrictions are a null hypothesis that a group of x-variables have zero coefficients in the regression model.

• What are restricted and unrestricted models? When the null hypothesis is a set of restrictions on the parameters, the regression under the null is called the restricted model, while the regression under the alternative (which simply states that the null is false) is called the unrestricted model.

• How do you compute the F-statistic, given that you have the SSRs? The general F-stat is based on the relative difference between the SSRs under the restricted and unrestricted models: F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)], where you should be able to explain the meanings of the various symbols.
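As an illustration, the SSR form of the F statistic can be coded in a few lines of Python (a sketch for checking hand calculations; the function name is made up). The numbers are those used in Q1(ii) of this week's problem set:

```python
def f_stat(ssr_r, ssr_ur, q, df_ur):
    # F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n - k - 1)]
    # q = number of restrictions, df_ur = n - k - 1 in the unrestricted model
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

# Numbers from Q1(ii) below: q = 2 restrictions, 86 df
print(round(f_stat(209448.99, 165644.51, 2, 86), 2))  # 11.37
```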

• What are general linear restrictions on parameters? These are linear equations on the parameters. For example, β1 − 2β2 + 3β3 = 0.

• What is the test for the overall significance of a regression? This is the F-test for the null hypothesis that all the coefficients of the x-variables are zero.

• How would you report your regression results? See the guidelines in Section 4.6 of the textbook.

• Why would you care about the asymptotic properties of the OLS estimators? When MLR6 does not hold, the finite-sample distribution of the OLS estimators is not available. For reasonably large samples (large n), we use the asymptotic distribution of the OLS estimators, which is known from studying the asymptotic properties, to approximate the finite-sample distribution, which is unknown.

• Comparing the inference procedures in Chapter 5 with those in Chapter 4, can you list the similarities and differences? It is beneficial to make such a list by yourself.

• Under MLR1-MLR5, the OLS estimators are consistent, asymptotically normal, and asymptotically efficient. Try to explain these properties in your own words. This is really a test of your understanding of these notions in econometrics.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 4.6
• (i) With df = n − 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because each test is two-tailed, the critical value is 1.987. The t statistic for H0: β0 = 0 is about −.89, which is much less than 1.987 in absolute value. Therefore, we fail to reject H0: β0 = 0. The t statistic for H0: β1 = 1 is (.976 − 1)/.049 ≈ −.49, which is even less significant. (Remember, we reject H0 in favor of H1 in this case only if |t| > 1.987.)

• (ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions, and the df in the unrestricted model is 86. We are given SSRr = 209,448.99 and SSRur = 165,644.51. Therefore, F = [(209,448.99 − 165,644.51)/2] / [165,644.51/86] ≈ 11.37,


which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is 4.85.

• (iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions, and there are 88 − 5 = 83 df in the unrestricted model. The F statistic is F = [(.829 − .820)/3] / [(1 − .829)/83] ≈ 1.46. The 10% critical value (again using 90 denominator df in Table G.3a) is 2.15, so we fail to reject H0 at even the 10% level. In fact, the p-value is about .23.

• (iv) If heteroskedasticity were present, Assumption MLR5 would be violated, and the F statistic would not have an F distribution under the null hypothesis. Therefore, comparing the F statistic against the usual critical values, or obtaining the p-value from the F distribution, would not be especially meaningful.
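The R-squared form used in part (iii) can be checked the same way with a short Python sketch (the function name is illustrative):

```python
def f_stat_r2(r2_ur, r2_r, q, df_ur):
    # R-squared form: F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/df_ur]
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)

# Part (iii): q = 3 restrictions, 83 df in the unrestricted model
print(round(f_stat_r2(0.829, 0.820, 3, 83), 2))  # 1.46
```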

Q2 Wooldridge 4.8
• (i) We use Property VAR.3 from Appendix B: Var(β̂1 − 3β̂2) = Var(β̂1) + 9 Var(β̂2) − 6 Cov(β̂1, β̂2).
• (ii) t = (β̂1 − 3β̂2 − 1)/se(β̂1 − 3β̂2), so we need the standard error of β̂1 − 3β̂2.
• (iii) Because θ = β1 − 3β2, we can write β1 = θ + 3β2. Plugging this into the population model gives
 y = β0 + (θ + 3β2)x1 + β2x2 + β3x3 + u = β0 + θx1 + β2(3x1 + x2) + β3x3 + u.
This last equation is what we would estimate by regressing y on x1, 3x1 + x2, and x3. The coefficient and standard error on x1 are what we want.
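The substitution in part (iii) is an identity, so both parameterisations give the same value of the regression function for any numbers we plug in. A quick Python check with arbitrary illustrative values (not from any data set):

```python
# Arbitrary illustrative parameter and regressor values
b0, b1, b2, b3 = 1.0, 0.5, -0.3, 0.2
x1, x2, x3 = 2.0, 4.0, -1.0

theta = b1 - 3 * b2                       # theta = beta1 - 3*beta2
original = b0 + b1 * x1 + b2 * x2 + b3 * x3
reparam = b0 + theta * x1 + b2 * (3 * x1 + x2) + b3 * x3
assert abs(original - reparam) < 1e-12    # identical by construction
```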

Q3 Wooldridge 4.10
• (i) We need to compute the F statistic for the overall significance of the regression, with n = 142 and k = 4: F = [.0395/(1 − .0395)](137/4) ≈ 1.41. The 5% critical value with 4 and 120 df is 2.45, which is well above the value of F. Therefore, we fail to reject H0: β1 = β2 = β3 = β4 = 0 at the 10% level. No explanatory variable is individually significant at the 5% level. The largest absolute t statistic is on dkr, tdkr ≈ 1.60, which is not significant at the 5% level against a two-sided alternative.

• (ii) The F statistic (with the same df) is now [.0330/(1 − .0330)](137/4) ≈ 1.17, which is even lower than in part (i). None of the t statistics is significant at a reasonable level.

• (iii) We probably should not use the logs, as the logarithm is not defined for firms that have zero for dkr or eps. Therefore, we would lose some firms in the regression.

• (iv) It seems very weak. There are no significant t statistics at the 5% level (against a two-sided alternative), and the F statistics are insignificant in both cases. Plus, less than 4% of the variation in return is explained by the independent variables.
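The overall-significance F statistic in part (i) is just the R-squared form with the restricted R-squared equal to zero; a one-line Python check (variable names illustrative):

```python
# Overall significance: R-squared form with restricted R-squared = 0
r2, n, k = 0.0395, 142, 4
f_overall = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(f_overall, 2))  # 1.41, well below the 5% critical value of 2.45
```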

Q4 Wooldridge C4.9 (discrim_c4_9.do)
• (i) The results from the OLS regression, with standard errors in parentheses, are
 log(psoda) = −1.46 + .073 prpblck + .137 log(income) + .380 prppov
   (0.29) (.031) (.027) (.133)
 n = 401, R2 = .087

• The p-value for testing H0: β1 = 0 against the two-sided alternative is about .018, so we reject H0 at the 5% level but not at the 1% level.

• (ii) The correlation is about −.84, indicating a strong degree of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for log(income) is about 5.1, and that for prppov is about 2.86 (two-sided p-value = .004).

• (iii) The OLS regression results when log(hseval) is added are
 log(psoda) = −.84 + .098 prpblck − .053 log(income) + .052 prppov + .121 log(hseval)
   (.29) (.029) (.038) (.134) (.018)


 n = 401, R2 = .184
The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about .12 percent. The two-sided p-value is zero to three decimal places.

• (iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the 15% significance level against a two-sided alternative for log(income), and prppov does not have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at the 5% level because the outcome of the F(2,396) statistic is about 3.52 with p-value = .030. All of the control variables (log(income), prppov, and log(hseval)) are highly correlated, so it is not surprising that some are individually insignificant.

• (v) Because the regression in (iii) contains the most controls, log(hseval) is individually significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable. It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if prpblck increases by .10, psoda is estimated to increase by 1%, other factors held fixed.

Q5 Wooldridge 5.2
• This is about the inconsistency of the simple regression of pctstck on funds. A higher tolerance of risk means more willingness to invest in the stock market, so β2 > 0. By assumption, funds and risktol are positively correlated. Now we use equation (5.5), where δ1 > 0: plim(β̂1) = β1 + β2δ1 > β1, so β̂1 has a positive inconsistency (asymptotic bias). This makes sense: if we omit risktol from the regression and it is positively correlated with funds, some of the estimated effect of funds is actually due to the effect of risktol.

Q6 Wooldridge C5.1 (wage1_c5_1.do)
• (i) The estimated equation is
 wage = −2.87 + .599 educ + .022 exper + .169 tenure
   (.73) (.051) (.012) (.022)
 n = 526, R2 = .306, σ̂ = 3.085

• Below is a histogram of the 526 residuals ûi, i = 1, 2, …, 526. The histogram uses 27 bins, which is suggested by the formula in the Stata manual for 526 observations. For comparison, the normal distribution that provides the best fit to the histogram is also plotted.



• (ii) With log(wage) as the dependent variable, the estimated equation is
 log(wage) = .284 + .092 educ + .0041 exper + .022 tenure
   (.104) (.007) (.0017) (.003)
 n = 526, R2 = .316, σ̂ = .441
The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below.

• (iii) The residuals from the log(wage) regression appear to be more normally distributed. Certainly the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed to the left. In the wage regression there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.

• The skewness and kurtosis of the residuals from the OLS regression may be used to test the null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null distribution of the Jarque-Bera test statistic (which takes into account both skewness and kurtosis) is approximately chi-squared with 2 degrees of freedom. Hence, at the 5% level, we reject the normality of u whenever the JB statistic is greater than 5.99. In STATA, this test is carried out by the command "sktest". With the data in WAGE1.RAW, the JB statistic is extremely large for the wage model and JB = 10.59 for the log(wage) model. While normality is rejected for both models, we do see that the JB statistic is much smaller for the log(wage) model [also compare its p-values of the skewness and excess-kurtosis tests against those from the wage model]. Hence, normality appears to be a reasonable approximation to the distribution of the error term in the log(wage) model.
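The JB statistic mentioned above combines the sample skewness S and kurtosis K as JB = (n/6)[S² + (K − 3)²/4]. A small Python sketch of that formula (the function name is illustrative):

```python
def jarque_bera(n, skew, kurt):
    # JB = (n/6) * [S^2 + (K - 3)^2 / 4]; approximately chi-squared(2)
    # under the null of normality, so reject at the 5% level when JB > 5.99
    return (n / 6) * (skew ** 2 + (kurt - 3) ** 2 / 4)

# Residuals with zero skewness and kurtosis 3 give JB = 0
print(jarque_bera(526, 0.0, 3.0))  # 0.0
```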


Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the advantages of using the log of a variable in regression? Find the "rules of thumb" for taking logs. See page 191 of Wooldridge (pp. 198-199 for the 3rd edition).

• Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithm. Do you remember Table 2.3? Consult Table 2.3.

• How do you compute the change in y caused by Δx when the model is built for log(y)? A precise computation is given by Equation (6.8) of Wooldridge.

• Why do we need "interaction" terms in regression models? An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for an example.

• What is the adjusted R-squared? What is the difference between it and the R-squared? The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added; however, the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.
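The penalty is easy to see from the formula adjusted-R2 = 1 − (1 − R2)(n − 1)/(n − k − 1). A short Python check (illustrative helper name), using the R2 reported for the Q3 regression in this week's problem set:

```python
def adj_r2(r2, n, k):
    # Adjusted R-squared: penalises each additional regressor
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# R2 = .135 with n = 935 and k = 3 regressors reproduces the
# adjusted R2 of .132 reported in Q3 of the problem set
print(round(adj_r2(0.135, 935, 3), 3))  # 0.132
```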

• How do you construct an interval prediction for given x-values? This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.

• How do you predict y for given x-values when the model is built for log(y)? Check the list on page 212 (p. 220 for the 3rd edition).

• What is involved in "residual analysis"? See pages 209-210 (pp. 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 6.4
• (i) Holding all other factors fixed, we have
 Δlog(wage) = β1Δeduc + β2Δeduc·pareduc = (β1 + β2 pareduc)Δeduc.
Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if we think a child gets more out of another year of education the more highly educated are the child's parents.

• (ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 − 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)

• (iii) When we add pareduc by itself, the coefficient on the interaction term is negative. The t statistic on educ·pareduc is about −1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2 Wooldridge 6.5
• This would make little sense. Performances on math and science exams are measures of outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The science score variable could be a dependent variable in an identical regression equation.

Q3 Wooldridge C6.3 (C6.2 for the 3rd edition) (wage2_c6_3.do)
• (i) Holding exper (and the elements in u) fixed, we have
 Δlog(wage) = β1Δeduc + β2Δeduc·exper = (β1 + β2 exper)Δeduc,
or Δlog(wage)/Δeduc = β1 + β2 exper. This is the approximate proportionate change in wage given one more year of education.

• (ii) H0: β3 = 0. If we think that education and experience interact positively, so that people with more experience are more productive when given another year of education, then β3 > 0 is the appropriate alternative.

• (iii) The estimated equation is
 log(wage) = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper
   (.24) (.0174) (.0200) (.00153)
 n = 935, R2 = .135, adjusted R2 = .132
The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.

• (iv) We rewrite the equation as
 log(wage) = β0 + θ1educ + β2exper + β3educ·(exper − 10) + u,
and run the regression of log(wage) on educ, exper, and educ·(exper − 10). We want the coefficient on educ. We obtain θ̂1 ≈ .0761 and se(θ̂1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.

Q4 Wooldridge C6.8 (hprice1_c6_8.do)
• (i) The estimated equation (where price is in dollars) is
 price = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
   (29,475.0) (0.642) (13.24) (9,010.1)
 n = 88, R2 = .672, adjusted R2 = .661, σ̂ = 59,833
The predicted price at lotsize = 10,000, sqrft = 2,300, and bdrms = 4 is about $336,714.

• (ii) The regression is pricei on (lotsizei − 10,000), (sqrfti − 2,300), and (bdrmsi − 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.

• (iii) We must use equation (6.36) to obtain the standard error of ê0 and then use equation (6.37) (assuming that price is normally distributed). From the regression in part (ii), se(ŷ0) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile in the t(84) distribution gives the 95% CI for price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval, but we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
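The standard-error and interval arithmetic in part (iii) can be replicated in Python (a quick check only; variable names are illustrative):

```python
from math import sqrt

se_y0, sigma_hat = 7374.5, 59833.0
se_e0 = sqrt(se_y0 ** 2 + sigma_hat ** 2)   # se of the prediction error
half_width = 1.99 * se_e0                   # 1.99 = approx. 97.5th pct of t(84)
lower = 336706.7 - half_width
upper = 336706.7 + half_width
print(round(se_e0), round(lower), round(upper))  # 60286 216738 456675
```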


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is a qualitative factor? Give some examples. This should be easy.
• How do you use dummy variables to represent qualitative factors? A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy. For example, a factor with 5 values requires 4 dummy variables.

• How do you use dummy variables to represent an ordinal variable? An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

• How do you test for differences in regression functions across different groups? We generally use the F test on properly-defined restricted and unrestricted models. The Chow test is a special case of the F test, where the null hypothesis is no difference across groups.

• What can you achieve by interacting group dummy variables with other regressors? The interaction allows the slope coefficients to differ across groups.

• What is "program evaluation"? It assesses the effectiveness of a program by checking the differences in regression coefficients for the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary? The PRF is the conditional probability of "success" for given explanatory variables. The SRF is the predicted conditional probability of "success" for given explanatory variables.

• What are the shortcomings of the LPM? The major drawback of the LPM is that the predicted probability of success may be outside the interval [0,1].

bull Does MLR5 hold for the LPM No as indicated in the textbook and lecture the conditional variance of the disturbance is not a constant and depends on explanatory variables var(u|x) = E(y|x)[1minus E(y|x)]

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 7.4
• (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant.
• (ii) 100·[exp(−.283) − 1] ≈ −24.7%, and so the estimate is somewhat smaller in magnitude.
• (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is
 log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility + δ3trans + u,
where trans is a dummy variable for the transportation industry. Now the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.
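The exact percentage difference in part (ii) comes from 100·[exp(coefficient) − 1]; a quick Python check (the function name is illustrative):

```python
from math import exp

def exact_pct(coef):
    # Exact percentage difference implied by a dummy coefficient
    # in a log(y) model: 100*[exp(coef) - 1]
    return 100 * (exp(coef) - 1)

# The -.283 coefficient on utility implies a salary lower by roughly 24.7%,
# within rounding of the figure reported in part (ii)
assert abs(exact_pct(-0.283) + 24.7) < 0.1
```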

Q2 Wooldridge 7.6
• In Section 3.3 (in particular, in the discussion surrounding Table 3.2) we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3 Wooldridge 7.9
• (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.
• (ii) Setting f0(z*) = f1(z*) gives β0 + β1z* = (β0 + δ0) + (β1 + δ1)z*. Therefore, provided δ1 ≠ 0, we have z* = −δ0/δ1. Clearly, z* is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.

• (iii) Using part (ii), we have totcoll* = .357/.030 = 11.9 years (totcoll = total college years).
• (iv) The estimated years of college at which women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed, not even close. In fact, at four years of college the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.
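The crossover calculation in part (iii) is a one-liner (variable names illustrative):

```python
# Crossover from part (ii): z* = -delta0/delta1,
# with the estimates delta0_hat = -.357 and delta1_hat = .030
d0_hat, d1_hat = -0.357, 0.030
z_star = -d0_hat / d1_hat
print(round(z_star, 1))  # 11.9 years of total college
```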

Q4 Wooldridge C7.13 (apple_c7_13.do)
• (i) 412/660 ≈ .624.
• (ii) The OLS estimates of the LPM are
 ecobuy = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize
   (.165) (.109) (.132) (.00053) (.013)
  + .025 educ − .00050 age
   (.008) (.00125)
 n = 660, R2 = .110
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)

• (iii) The F test, with 4 and 653 df, is 4.43 with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children (other factors equal), the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in the functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule (predict one when the fitted probability is at least .5, and zero otherwise) gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
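The fractions in part (vi) can be reproduced directly (a quick arithmetic check; variable names illustrative):

```python
# Counts reported in part (vi) of the answer above
correct0, total0 = 102, 248   # correctly predicted among ecobuy = 0
correct1, total1 = 340, 412   # correctly predicted among ecobuy = 1
overall = (correct0 + correct1) / (total0 + total1)
print(round(correct0 / total0, 3))   # 0.411
print(round(correct1 / total1, 3))   # 0.825
print(round(100 * overall))          # 67
```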


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is heteroskedasticity in a regression model? When homoskedasticity (MLR5) fails and the variance of the disturbance (u) changes across observations (indexed by i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with OLS under heteroskedasticity? The usual t-stat and F-stat are not valid, because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator will no longer be (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA? These are corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option "robust" with the "regress" command, e.g., regress lwage educ, robust.

• How do you detect if there is heteroskedasticity? The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

• If heteroskedasticity is present in a known form, how would you estimate the model? In this case the WLS estimator should be used, which is more efficient than the OLS estimator. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model? If there is strong evidence of heteroskedasticity, the FGLS estimator should be used, which is based on an exponential functional form. See Section 8.4 for details.

• What are the steps in the FGLS estimation? You should summarise these from Section 8.4.

• How would you handle the heteroskedasticity of the LPM? The heteroskedasticity functional form is known for the LPM, hence the WLS can be used in principle. However, because the LPM can produce predicted probabilities outside the interval (0,1), the known functional form p(x)[1 − p(x)] may not be usable for the WLS. In that case it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2 Wooldridge 8.2
• With Var(u|inc, price, educ, female) = σ²inc², we have h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc:
 beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.


Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.

Q3 Wooldridge 8.3
• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has a smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4 Wooldridge 8.5
• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.
• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.
• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.
• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.

• (v) We just plug the values of the independent variables into the OLS regression line:
 smokes = .656 − .069·log(67.44) + .012·log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052.
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)
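The turning-point formula from part (iii) can be checked quickly in Python (variable names illustrative):

```python
# Turning point of b1*age + b2*age^2 is at age* = -b1/(2*b2);
# here b1 = .020 and b2 = -.00026, so age* = .020/(2*.00026)
b_age, b_age2 = 0.020, -0.00026
age_star = -b_age / (2 * b_age2)
print(round(age_star, 2))  # 38.46
```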

Q5 Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:
 e401k = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male
   (.081) (.0006) (.000005) (.0039) (.00005) (.0121)
   [.079] [.0006] [.000005] [.0038] [.00004] [.0121]
 n = 9,275, R2 = .094
There are no important differences; if anything, the robust standard errors are smaller.

• (ii) This is a general claim. Since Var(y|x) = p(x)[1 - p(x)], we can write E(u²|x) = p(x) - p(x)². Written in error form, u² = p(x) - p(x)² + v. In other words, we can write this as a regression model u² = δ0 + δ1 p(x) + δ2 p(x)² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = -1. Remember that, for the LPM, the fitted values ŷ are estimates of p(x). So, when we run the regression of û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to -1.

• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on e401k-hat is about 1.010, the coefficient on (e401k-hat)² is about -.970, and the intercept is about -.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are
e401k-hat = -.488 + .0126 inc - .000062 inc² + .0255 age - .00030 age² - .0055 male
            (.076) (.0005) (.000004) (.0037) (.00004) (.0117)
n = 9,275, R² = .108.
There are no important differences from the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.
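The two-step WLS procedure for the LPM used in (iv) (OLS first, then reweight by 1/√[ŷ(1-ŷ)]) can be sketched on simulated data; everything below is invented for illustration and is not the 401k data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
x = rng.uniform(0, 1, n)
p = 0.2 + 0.5 * x                       # true linear probability
y = (rng.uniform(size=n) < p).astype(float)
X = np.column_stack([np.ones(n), x])

# step 1: OLS to get fitted probabilities yhat
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b_ols

# step 2: WLS with estimated variance h = yhat*(1 - yhat);
# in practice, fitted values outside (0, 1) must be dropped or adjusted
h = yhat * (1 - yhat)
w = 1 / np.sqrt(h)
b_wls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
print(b_wls)  # approximately [0.2, 0.5]
```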


Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)
• What is functional form misspecification? What is its major consequence in regression analysis?
As the word "misspecification" suggests, it is a mis-specified model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification?
The RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). The RESET is based on the F-test for the joint significance of the squared and cubed predicted value (ŷ) in an "expanded" model that includes all the original regressors as well as the squared and cubed ŷ, where the latter represent possible nonlinear functions of the regressors.

• What are nested models? And nonnested models?
Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another?
The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test two nonnested models against each other?
There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models, and decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis?
A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved variable and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, depends only on the proxy.

• Can you analyse the consequences of measurement errors?
With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis?
Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?


Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR1-4, and hence does not introduce bias in the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable, and causes bias in the OLS estimation.

• What are outliers and influential observations?
Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS parameter estimates. Outliers are called influential observations if the estimation results with them included are significantly different from the results with them excluded.

• Consider the simple regression model yi = α + (β + bi)xi + ui, for i = 1, 2, …, n, where the "slope parameter" contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR1-5 hold for yi = α + βxi + ui, E(bi|xi) = 0, and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference?
The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui). Given the stated assumptions, MLR1-4 hold for this model, and the OLS estimator of the "slope parameter" is unbiased (for β). However, MLR5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui|xi) = ω²xi² + Var(ui|xi), depends on xi, and the usual OLS standard errors will not be valid.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 9.1
• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test: F = [(.375 - .353)/2]/[(1 - .375)/(177 - 8)] ≈ 2.97. With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
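The R-squared form of the F statistic can be verified directly with the values quoted above:

```python
# F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/(n - k - 1)]
r2_ur, r2_r = 0.375, 0.353   # with and without ceoten^2, comten^2
n, k, q = 177, 7, 2          # 7 regressors in the unrestricted model
F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))
print(round(F, 2))  # 2.97
```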

Q2 Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.

• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward-biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So, when we control for the poverty rate, the effect of spending falls.

• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about -2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ -(1.26/100)(%Δenroll) = -.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1), we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than spending per student or other school characteristics.

Q3 Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates using the 1988 data are
log(scrap)-hat = .409 + .057 grant
                 (.241) (.406)
n = 54, R² = .0004.
The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain
log(scrap88)-hat = .021 - .254 grant88 + .831 log(scrap87)
                   (.089) (.147) (.044)
n = 54, R² = .873,
where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is -.254/.147 ≈ -1.73. We use the 5% critical value for 40 df in Table G.2: -1.68. Because t = -1.73 < -1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.

• (iv) The t statistic is (.831 - 1)/.044 ≈ -3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is -.254/.142 ≈ -1.79, so the coefficient is even more significantly less than zero than when we use the usual standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 - 1)/.071 ≈ -2.38, which is notably smaller in magnitude than before, but it is still pretty significant.
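The t statistics in (iii)-(v) are simple ratios and can be double-checked:

```python
# usual and robust t statistics for H0: beta_grant88 = 0
t_grant = -0.254 / 0.147          # usual SE
t_grant_rob = -0.254 / 0.142      # robust SE

# usual and robust t statistics for H0: beta_log(scrap87) = 1
t_lag = (0.831 - 1) / 0.044       # usual SE
t_lag_rob = (0.831 - 1) / 0.071   # robust SE

print(round(t_grant, 2), round(t_grant_rob, 2))  # -1.73 -1.79
print(round(t_lag, 2), round(t_lag_rob, 2))      # -3.84 -2.38
```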


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data?
The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations depend on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that, for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation?
A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (hence has a distribution). The SP can be viewed as a "random curve", with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation or autocorrelation?
The serial (or auto) correlation of a time-series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted Corr(yt, yt-h) = Cov(yt, yt-h)/[Var(yt)Var(yt-h)]^(1/2).

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)?
Read ie_Slides10 pages 5-7. Read Section 10.2. Try Wooldridge 10.3.
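Since the answer above just points to the readings, here is a minimal numerical sketch of the idea, on invented data: in a finite distributed lag model yt = α + δ0 zt + δ1 zt-1 + δ2 zt-2 + ut, the LRP is δ0 + δ1 + δ2, and its standard error follows from the covariance matrix of the OLS estimates:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 500
z = rng.standard_normal(T)
u = 0.5 * rng.standard_normal(T)
alpha, d0, d1, d2 = 1.0, 0.5, 0.3, 0.2     # hypothetical values; true LRP = 1.0

z0, z1, z2 = z[2:], z[1:-1], z[:-2]        # z_t, z_{t-1}, z_{t-2}
y = alpha + d0 * z0 + d1 * z1 + d2 * z2 + u[2:]

X = np.column_stack([np.ones(T - 2), z0, z1, z2])
b = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b
s2 = resid @ resid / (len(y) - X.shape[1])
V = s2 * np.linalg.inv(X.T @ X)            # covariance matrix of the estimates

r = np.array([0.0, 1.0, 1.0, 1.0])         # LRP = sum of the lag coefficients
lrp = r @ b
se_lrp = np.sqrt(r @ V @ r)
print(round(lrp, 2), round(se_lrp, 3))
```

In STATA, the same standard error is typically obtained with `lincom` after `regress`, or by reparameterising the model so that the LRP appears as a single coefficient.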

• What are TS1-6 (assumptions about time series regression)? How do they differ from the assumptions MLR1-6?
TS1-6 include: 1) linearity in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6 mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of OLS standard errors, t-stats and F-stats for statistical inference, TS3-TS6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors?
Strictly exogenous regressors satisfy assumption TS3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS3.

• What is a trending time series? What is a time trend?
A time series is trending if it has the tendency to grow (or shrink) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results?
First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one series on the other will produce a significant slope coefficient. Such "statistical significance" is spurious because


it is purely induced by the time trends and says little about the true relationship between the two time series.

• Why would you include a time trend in regressions with trending variables?
Including a time trend in regressions allows one to study the relationship among the time series variables that is not induced by the time trends.

• What is seasonality in a time series? Give an example of a time series variable with seasonality.
Seasonality in a time series is fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have day-of-week seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.

Problem Set

Q1 Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated, such as stock returns, do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1; in particular, we do not need the homoskedasticity and no-serial-correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2
• We follow the hint and write
gGDPt-1 = α0 + δ0 intt-1 + δ1 intt-2 + ut-1,
and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0 intt-1 + δ1 intt-2 + ut-1 - 3) + vt
     = (γ0 + γ1α0 - 3γ1) + γ1δ0 intt-1 + γ1δ1 intt-2 + γ1 ut-1 + vt.
Now, by assumption, ut-1 has zero mean and is uncorrelated with all the right-hand-side variables in the previous equation, except itself, of course. So Cov(intt, ut-1) = E(intt ⋅ ut-1) = γ1E(ut-1²) > 0, because γ1 > 0. If σ² = E(ut²) for all t, then Cov(intt, ut-1) = γ1σ². This violates the strict exogeneity assumption TS3: while ut is uncorrelated with intt, intt-1, and so on, ut is correlated with intt+1.
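A small simulation (with made-up parameter values) illustrates the point: when γ1 > 0, the covariance between intt and ut-1 is positive, so int is not strictly exogenous:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 20000
a0, d0, d1 = 3.0, -0.2, -0.1     # gGDP equation (hypothetical values)
g0, g1 = 1.0, 0.5                # int_t = g0 + g1*(gGDP_{t-1} - 3) + v_t
u = rng.standard_normal(T)
v = rng.standard_normal(T)

gGDP = np.zeros(T)
intr = np.zeros(T)
for t in range(1, T):
    intr[t] = g0 + g1 * (gGDP[t - 1] - 3.0) + v[t]
    gGDP[t] = a0 + d0 * intr[t] + d1 * intr[t - 1] + u[t]

# sample Cov(int_t, u_{t-1}); theory says it equals g1 * Var(u) = 0.5
cov = np.cov(intr[2:], u[1:-1])[0, 1]
print(round(cov, 2))  # close to 0.5
```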


Q3 Wooldridge 10.7
• (i) pet-1 and pet-2 must be increasing by the same amount as pet.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. A permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pet, pet-1 and pet-2 by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is
i3t-hat = 1.61 + .343 inft + .382 inft-1 - .190 deft + .569 deft-1
          (.40) (.125) (.134) (.221) (.197)
n = 55, R² = .685, adjusted R² = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than the .606 we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inft-1.

• (iv) The F statistic for the joint significance of inft-1 and deft-1 is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.


Week 2 Tutorial Exercises

Problem Set

Q1 Wooldridge 1.1
• (i) Ideally, we could randomly assign students to classes of different sizes. That is, each student is assigned a different class size without regard to any student characteristics, such as ability and family background. For reasons we will see in Chapter 2, we would like substantial variation in class sizes (subject, of course, to ethical considerations and resource constraints).

• (ii) A negative correlation means that larger class size is associated with lower performance. We might find a negative correlation because larger class size actually hurts performance. However, with observational data, there are other reasons we might find a negative relationship. For example, children from more affluent families might be more likely to attend schools with smaller class sizes, and affluent children generally score better on standardized tests. Another possibility is that, within a school, a principal might assign the better students to smaller classes. Or, some parents might insist their children are in the smaller classes, and these same parents tend to be more involved in their children's education.

• (iii) Given the potential for confounding factors, some of which are listed in (ii), finding a negative correlation would not be strong evidence that smaller class sizes actually lead to better performance. Some way of controlling for the confounding factors is needed, and this is the subject of multiple regression analysis.

Q2 Wooldridge 1.2
• (i) Here is one way to pose the question: if two firms, say A and B, are identical in all respects except that firm A supplies one hour more of job training per worker than firm B, by how much would firm A's output differ from firm B's?

• (ii) Firms are likely to choose job training depending on the characteristics of workers. Some observed characteristics are years of schooling, years in the workforce, and experience in a particular job. Firms might even discriminate based on age, gender or race. Perhaps firms choose to offer training to more able or less able workers, where "ability" might be difficult to quantify but where a manager has some idea about the relative abilities of different employees. Moreover, different kinds of workers might be attracted to firms that offer more job training on average, and this might not be evident to employers.

• (iii) The amount of capital and technology available to workers would also affect output. So, two firms with exactly the same kinds of employees would generally have different outputs if they use different amounts of capital or technology. The quality of managers would also have an effect.

• (iv) No, unless the amount of training is randomly assigned. The many factors listed in parts (ii) and (iii) can contribute to finding a positive correlation between output and training even if job training does not improve worker productivity.

Q3 Wooldridge C1.3 (meap01_ch01.do)
• (i) The largest is 100; the smallest is 0.
• (ii) 38 out of 1,823, or about 2.1 percent of the sample.
• (iii) 17.
• (iv) The average of math4 is about 71.9, and the average of read4 is about 60.1. So, at least in 2001, the reading test was harder to pass.
• (v) The sample correlation between math4 and read4 is about .843, which is a very high degree of (linear) association. Not surprisingly, schools that have high pass rates on one test have a strong tendency to have high pass rates on the other test.


• (vi) The average of exppp is about $5,194.87. The standard deviation is $1,091.89, which shows rather wide variation in spending per pupil. [The minimum is $1,206.88 and the maximum is $11,957.64.]
• (vii) About 8.7%. Note that, by Taylor expansion, log(x + h) - log(x) = log(1 + h/x) ≈ h/x for positive x and small h/x (in absolute value).
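The Taylor-expansion point can be illustrated with hypothetical numbers (say, spending per pupil rising from $5,500 to $6,000; these figures are for illustration only):

```python
import math

exact = 100 * math.log(6000 / 5500)       # exact log difference
approx = 100 * (6000 - 5500) / 5500       # the h/x approximation
print(round(exact, 1), round(approx, 1))  # 8.7 9.1
```

The two values are close because h/x is small, which is exactly the condition stated in the answer.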

Q4 Wooldridge C1.4 (jtrain2_ch01.do)
• (i) 185/445 ≈ .416 is the fraction of men receiving job training, or about 41.6%.
• (ii) For men receiving job training, the average of re78 is about 6.35, or $6,350. For men not receiving job training, the average of re78 is about 4.55, or $4,550. The difference is $1,800, which is very large. On average, the men receiving the job training had earnings about 40% higher than those not receiving training.

• (iii) About 24.3% of the men who received training were unemployed in 1978; the figure is 35.4% for men not receiving training. This, too, is a big difference.

• (iv) The differences in earnings and unemployment rates suggest the training program had strong, positive effects. Our conclusions about economic significance would be stronger if we could also establish statistical significance (which is done in Computer Exercise C9.10 in Chapter 9).

STATA Hints
• Some STATA do-files for solving the Problem Set are posted on the course website. It is important to understand the commands in the do-files so that you will be able to write your own do-files for your assignments and the course project.


Week 3 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• The minimum requirement for OLS to be carried out for the data set (xi, yi), i = 1, …, n, with the sample size n > 2, is that the sample variance of x is positive. In what circumstances is the sample variance of x zero?
When all observations on x have the same value, there is no variation in x.

• The OLS estimation of the simple regression model has the following properties: a) the sum of the residuals is zero; b) the sample covariance of the residuals and x is zero. Why? How would you relate them to the "least squares" principle?
These are derived from the first-order conditions for minimising the sum of squared residuals (SSR).

• Convince yourself that the point (x̄, ȳ), the sample means of x and y, is on the sample regression function (SRF), which is a straight line.
This can be derived from the first of the first-order conditions for minimising the SSR.

• How do you know that SST = SSE + SSR is true?
The dependent variable values can be expressed as the fitted value plus the residual: yi = ŷi + ûi. After taking the average away from both sides of the equation, we find (yi - ȳ) = (ŷi - ȳ) + ûi. The required "SST = SSE + SSR" is obtained by squaring and summing both sides of the above equation, noting (ŷi - ȳ) = β̂1(xi - x̄) and using the second of the first-order conditions for minimising the SSR.
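The identity can be confirmed numerically on any simulated data set (a sketch with invented numbers):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.standard_normal(n)
y = 2.0 + 1.5 * x + rng.standard_normal(n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b
uhat = y - yhat

SST = np.sum((y - y.mean()) ** 2)    # total sum of squares
SSE = np.sum((yhat - y.mean()) ** 2) # explained sum of squares
SSR = np.sum(uhat ** 2)              # residual sum of squares
print(np.isclose(SST, SSE + SSR))    # True
```

The decomposition holds exactly (up to machine precision) because the regression includes an intercept, so the residuals sum to zero and are uncorrelated with the fitted values.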

• Which of the following models is (are) nonlinear model(s)?
a) sales = β0/[1 + exp(-β1 ad_expenditure)] + u
b) sales = β0 + β1 log(ad_expenditure) + u
c) sales = β0 + β1 exp(ad_expenditure) + u
d) sales = exp(β0 + β1 ad_expenditure + u)
The model in a) is nonlinear (in parameters).
• Can you follow the proofs of Theorems 2.1-2.3?
You should be able to follow the proofs. If not at this point, try again.

Problem Set

Q1 Wooldridge 2.4
• (i) When cigs = 0, predicted birth weight is 119.77 ounces. When cigs = 20, bwght-hat = 109.49. This is about an 8.6% drop.
• (ii) Not necessarily. There are many other factors that can affect birth weight, particularly the overall health of the mother and the quality of prenatal care. These could be correlated with cigarette smoking during pregnancy. Also, something such as caffeine consumption can affect birth weight and might also be correlated with cigarette smoking.

• (iii) If we want a predicted bwght of 125, then cigs = (125 - 119.77)/(-.514) ≈ -10.18, or about -10 cigarettes! This is nonsense, of course, and it shows what happens when we are trying to predict something as complicated as birth weight with only a single explanatory variable. The largest predicted birth weight is necessarily 119.77. Yet almost 700 of the births in the sample had a birth weight higher than 119.77.

• (iv) 1,176 out of 1,388 women did not smoke while pregnant, or about 84.7%. Because we are using only cigs to explain birth weight, we have only one predicted birth weight


at cigs = 0. The predicted birth weight is necessarily roughly in the middle of the observed birth weights at cigs = 0, and so we will under-predict high birth weights.
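The arithmetic in (i) and (iii), using the estimated line bwght-hat = 119.77 - .514 cigs:

```python
b0, b1 = 119.77, -0.514        # estimated intercept and slope on cigs

pred0 = b0 + b1 * 0            # predicted bwght at cigs = 0
pred20 = b0 + b1 * 20          # predicted bwght at cigs = 20
drop_pct = 100 * (pred0 - pred20) / pred0
cigs_for_125 = (125 - b0) / b1 # cigs needed for a prediction of 125

print(round(pred20, 2), round(drop_pct, 1), round(cigs_for_125, 2))
# 109.49 8.6 -10.18
```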

Q2 Wooldridge 2.7
• (i) When we condition on inc in computing an expectation, inc becomes a constant. So E(u|inc) = E(√inc·e|inc) = √inc·E(e|inc) = √inc·0 = 0, because E(e|inc) = E(e) = 0.
• (ii) Again, when we condition on inc in computing a variance, inc becomes a constant. So Var(u|inc) = Var(√inc·e|inc) = (√inc)²Var(e|inc) = σε²inc, because Var(e|inc) = σε².
• (iii) Families with low incomes do not have much discretion about spending; typically, a low-income family must spend on food, clothing, housing and other necessities. Higher-income families have more discretion, and some might choose more consumption while others choose more saving. This discretion suggests wider variability in saving among higher-income families.

Q3 Wooldridge C2.6 (meap93_ch02.do)
• (i) It seems plausible that another dollar of spending has a larger effect for low-spending schools than for high-spending schools. At low-spending schools, more money can go toward purchasing more books and computers, and toward hiring better qualified teachers. At high levels of spending, we would expect little, if any, effect, because the high-spending schools already have high-quality teachers, nice facilities, plenty of books, and so on.

• (ii) If we take changes, as usual we obtain
Δmath10 = β1Δlog(expend) ≈ (β1/100)(%Δexpend),
just as in the second row of Table 2.3. So, if %Δexpend = 10, then Δmath10 = β1/10.
• (iii) The regression results are
math10-hat = -69.34 + 11.16 log(expend), n = 408, R² = .0297.
• (iv) If expend increases by 10 percent, math10-hat increases by about 1.1 percentage points.

This is not a huge effect, but it is not trivial for low-spending schools, where a 10 percent increase in spending might be a fairly small dollar amount.

• (v) In this data set, the largest value of math10 is 66.7, which is not especially close to 100. In fact, the largest fitted value is only about 30.2.

Q4 Wooldridge C2.7 (charity_ch02.do)
• (i) The average gift is about 7.44 Dutch guilders. Out of 4,268 respondents, 2,561 did not give a gift, or about 60 percent.
• (ii) The average number of mailings per year is about 2.05. The minimum value is .25 and the maximum value is 3.5.
• (iii) The estimated equation is
gift-hat = 2.01 + 2.65 mailsyear, n = 4,268, R² = .0138.
• (iv) The slope coefficient from part (iii) means that each mailing per year is associated with (perhaps even "causes") an estimated 2.65 additional guilders in gifts, on average. Therefore, if each mailing costs one guilder, the expected profit from each mailing is estimated to be 1.65 guilders. This is only the average, however: some mailings generate no contribution, or a contribution less than the mailing cost, while other mailings generate much more than the mailing cost.
• (v) Because the smallest mailsyear in the sample is .25, the smallest predicted value of gifts is 2.01 + 2.65(.25) ≈ 2.67. Even if we look at the overall population, where some people have received no mailings, the smallest predicted value is about two. So, with this estimated equation, we never predict zero charitable gifts.
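Parts (iv) and (v) are simple arithmetic with the estimated equation:

```python
b0, b1 = 2.01, 2.65                 # estimated intercept and slope

profit_per_mailing = b1 - 1.0       # if each mailing costs one guilder
smallest_pred = b0 + b1 * 0.25      # minimum mailsyear in the sample is .25

print(round(profit_per_mailing, 2), round(smallest_pred, 2))  # 1.65 2.67
```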


Week 4 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What do we mean when we say "regress wage on educ and expr"?
Use OLS to estimate the multiple linear regression model wage = β0 + β1educ + β2expr + u.
• Why, and under what circumstances, do we need to "control for" expr in the regression model in order to quantify the effect of educ on wage?
We need to control for expr in the model when the partial (or ceteris paribus) effect of educ on wage is the parameter of interest and expr and educ are correlated.

• What is the bias of an estimator?
bias = E(β̂j) - βj.

• What is the "omitted variable bias"?
It is the bias resulting from omitting relevant variables that are correlated with the explanatory variables in a regression model.

• What is the consequence of adding an irrelevant variable to a regression model?
Including an irrelevant variable will generally increase the variance of the OLS estimators.

• What is the requirement of the ZCM assumption, in your own words?
You must do this by yourself.

• Why is assuming E(u) = 0 not restrictive when an intercept is included in the regression model?
When the mean value of the disturbance is not zero, you can always define the new disturbance as the original disturbance minus its mean value, and define the new intercept as the original intercept plus the mean value.

• In terms of notation, why do we need two subscripts for independent variables?
In our notation, the first subscript indexes observations (1st obs, 2nd obs, etc.) and the second subscript indexes variables (1st variable, 2nd variable, etc.).

• Using OLS to estimate a multiple regression model with k independent variables and an intercept, how many first-order conditions are there?
k + 1.

• How do you know that the OLS estimators are linear combinations of the observations on the dependent variable?
By the first-order conditions, we see that the OLS estimators are the solution to k + 1 linear equations in k + 1 unknowns. The "right-hand sides" of the equations are linear functions of the observations on the dependent variable (say, y). Hence the solution is a set of linear combinations of the y-observations.

• What is the Gauss-Markov theorem? Read the textbook.
• Try to explain why R2 never decreases (and is likely to increase) when additional explanatory variables are added to the regression model. First, the smaller the SSR, the greater the R2. OLS chooses the parameter estimates to minimise the SSR. With extra explanatory variables, the R2 does not decrease because the SSR does not increase, which is true because OLS has a larger parameter space over which to minimise the SSR (more possibilities for making the SSR small).
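This SSR argument can be illustrated numerically. A minimal sketch in Python (assuming numpy is available; the data and model below are made up for illustration, not from the course files):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                    # an extra regressor, even an irrelevant one
y = 1.0 + 2.0 * x1 + rng.normal(size=n)    # x2 plays no role in the true model

def ssr(y, cols):
    """Sum of squared residuals from OLS of y on an intercept plus cols."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

sst = (y - y.mean()) @ (y - y.mean())
ssr_small = ssr(y, [x1])           # y on x1 only
ssr_big = ssr(y, [x1, x2])         # y on x1 and x2
r2_small = 1 - ssr_small / sst
r2_big = 1 - ssr_big / sst

print(ssr_big <= ssr_small)   # True: adding a regressor cannot raise the SSR
print(r2_big >= r2_small)     # True: so R-squared cannot fall
```

Even though x2 is irrelevant in the true model, the larger regression still (weakly) lowers the SSR and (weakly) raises the R2.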

• What is an endogenous explanatory variable? An exogenous explanatory variable? See the top of page 88.

• What is multicollinearity, and what is its likely effect on the OLS estimators? Read pages 95-99.


Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 3.2
• (i) Yes. Because of budget constraints, it makes sense that the more siblings there are in a family, the less education any one child in the family has. To find the increase in the number of siblings that reduces predicted education by one year, we solve 1 = .094(Δsibs), so Δsibs = 1/.094 ≈ 10.6.

• (ii) Holding sibs and feduc fixed, one more year of mother's education implies .131 years more of predicted education. So if a mother has four more years of education, her son is predicted to have about half a year (.524) more education.

• (iii) Since the number of siblings is the same, but meduc and feduc are both different, the coefficients on meduc and feduc both need to be accounted for. The predicted difference in education between B and A is .131(4) + .210(4) = 1.364.

Q2 Wooldridge 3.10
• (i) Because x1 is highly correlated with x2 and x3, and these latter variables have large partial effects on y, the simple and multiple regression coefficients on x1 can differ by large amounts. We have not done this case explicitly, but given equation (3.46) and the discussion with a single omitted variable, the intuition is pretty straightforward.

• (ii) Here we would expect β̃1 and β̂1 to be similar (subject, of course, to what we mean by "almost uncorrelated"). The amount of correlation between x2 and x3 does not directly affect the multiple regression estimate on x1 if x1 is essentially uncorrelated with x2 and x3.

• (iii) In this case we are (unnecessarily) introducing multicollinearity into the regression: x2 and x3 have small partial effects on y, and yet x2 and x3 are highly correlated with x1. Adding x2 and x3 likely increases the standard error of the coefficient on x1 substantially, so se(β̂1) is likely to be much larger than se(β̃1).

• (iv) In this case, adding x2 and x3 will decrease the residual variance without causing much collinearity (because x1 is almost uncorrelated with x2 and x3), so we should see se(β̂1) smaller than se(β̃1). The amount of correlation between x2 and x3 does not directly affect se(β̂1).

Q3 Wooldridge C3.2 (hprice1_ch03.do)
• (i) The estimated equation is
price-hat = −19.32 + .128 sqrft + 15.20 bdrms, n = 88, R2 = .632.
• (ii) Holding square footage constant, Δ(predicted price) = 15.20 Δ(bdrms). Hence the predicted price increases by 15.20, which means $15,200, since price is measured in thousands of dollars.
• (iii) Now Δ(predicted price) = .128 Δ(sqrft) + 15.20 Δ(bdrms) = .128(140) + 15.20(1) = 33.12. That is, $33,120. Because the house size is also increasing, the effect is larger than that in (ii).

• (iv) About 63.2%.
• (v) The predicted price is −19.32 + .128(2,438) + 15.20(4) = 353.544, or $353,544.
• (vi) The residual = 300,000 − 353,544 = −53,544. The buyer under-paid, judged by the predicted price. But, of course, there are many other features of a house (some that we cannot even measure) that affect price, and we have not controlled for those.

Q4 Wooldridge C3.6 (wage2_ch03_c3_6.do)
• (i) The slope coefficient from regressing IQ on educ is 3.53383.
• (ii) The slope coefficient from log(wage) on educ is .05984.
• (iii) The slope coefficients from log(wage) on educ and IQ are .03912 and .00586, respectively.
• (iv) The computed β̃1 = β̂1 + β̂2·δ̂1 = .03912 + 3.53383(.00586) ≈ .05984.
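The omitted-variable algebra in (iv) is easy to verify with the reported estimates; a quick check in Python (numbers taken from parts (i)-(iii) above):

```python
# Numbers from the three regressions reported above (WAGE2 data).
delta1 = 3.53383        # slope from regressing IQ on educ
beta1_tilde = 0.05984   # slope from the simple regression of log(wage) on educ
beta1_hat = 0.03912     # educ coefficient, log(wage) on educ and IQ
beta2_hat = 0.00586     # IQ coefficient

# Omitted-variable relationship: beta1_tilde = beta1_hat + beta2_hat * delta1
implied = beta1_hat + beta2_hat * delta1
print(round(implied, 5))   # 0.05983, matching the .05984 in (ii) up to rounding
```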


Week 5 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the CLM assumptions?
These are MLR1, MLR2, MLR3 and MLR6. MLR6 is a very strong assumption that implies both MLR4 and MLR5.

• What is the sampling distribution of the OLS estimators under the CLM assumptions? Under the CLM assumptions, the OLS estimators follow the normal distribution.

• What are the standard errors of the OLS estimators? The standard error is the square root of the estimated variance of the estimator.

• What is the null hypothesis about a parameter? In this context, the null hypothesis is a statement that assigns a known (hypothesised) value to the parameter of interest. Usually the null is a maintained proposition (from economic theory or experience) that will only be rejected on very strong evidence.

• What is a one-tailed (two-tailed) alternative hypothesis? This should be clear once you finish your reading.

• In testing hypotheses, what is a Type 1 (Type 2) error? What is the level of significance? A Type 1 error is the rejection of a true null; a Type 2 error is the non-rejection of a false null. The level of significance is the probability of a Type 1 error.

• The decision rule we use can be stated as "reject the null if the t-statistic exceeds the critical value". How is the critical value determined? The critical value is determined by the level of significance (or the level of confidence, in the case of confidence intervals) and the distribution of the test statistic. For example, for the t-stat we use the t-distribution when the sample size is small, and the normal distribution when the sample size is large (df > 120).

• Justify the statement "Given the observed test statistic, the p-value is the smallest significance level at which the null hypothesis would be rejected." If you used the observed t-stat as your critical value, you would just reject the null (because the t-stat equals this critical value). This "critical value" (= the t-stat value) implies a level of significance, say p, which is exactly the p-value. For any significance level smaller than p, the corresponding critical value is more extreme than the t-stat value and does not lead to the rejection of the null. Hence p is the smallest significance level at which the null would be rejected.
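For large df, the two-sided p-value can be computed from the standard normal distribution; a small sketch in Python (the function names are ours, not from the course materials):

```python
import math

def norm_cdf(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_sided_p(t_stat):
    """Smallest significance level at which H0 is rejected (large-sample t)."""
    return 2 * (1 - norm_cdf(abs(t_stat)))

p = two_sided_p(1.96)
print(round(p, 3))   # 0.05: H0 is rejected at any level above .05, not below
```

A larger |t| gives a smaller p-value, which is why the p-value summarises the strength of the evidence against the null.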

• What is the 90% confidence interval for a parameter? It is an interval [L, U] defined by two sample statistics: the lower bound L and the upper bound U. The probability that [L, U] covers the parameter is .90.

• In constructing a confidence interval for a parameter, what is the level of confidence? It is the probability that the CI covers the parameter.

• When the level of confidence increases, how does the width of the confidence interval change (holding other things fixed)? The width will increase.

• Try to convince yourself that the event "the 90% confidence interval covers a hypothesised value of the parameter" is the same as the event "the null of the parameter being the hypothesised value cannot be rejected in favour of the two-tailed alternative at the 10% level of significance". It becomes apparent if you start with P(|t-stat| ≤ c) = .90 and disentangle the expression for the t-stat.


Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 4.1
• (i) and (iii) generally cause the t statistics not to have a t distribution under H0. Homoskedasticity is one of the CLM assumptions. An important omitted variable violates Assumption MLR4 (and hence MLR6) in general. The CLM assumptions contain no mention of the sample correlations among independent variables, except to rule out the case where the correlation is one (MLR3).

Q2 Wooldridge 4.2
• (i) H0: β3 = 0. H1: β3 > 0.
• (ii) The proportionate effect on predicted salary is .00024(50) = .012. To obtain the percentage effect, we multiply this by 100: 1.2%. Therefore, a 50-point ceteris paribus increase in ros is predicted to increase salary by only 1.2%. Practically speaking, this is a very small effect for such a large change in ros.

• (iii) The 10% critical value for a one-tailed test, using df = ∞, is obtained from Table G.2 as 1.282. The t statistic on ros is .00024/.00054 ≈ .44, which is well below the critical value. Therefore, we fail to reject H0 at the 10% significance level.

• (iv) Based on this sample, the estimated ros coefficient appears to be different from zero only because of sampling variation. On the other hand, including ros may not be causing any harm; it depends on how correlated it is with the other independent variables (although these are very significant even with ros in the equation).

Q3 Wooldridge 4.5
• (i) .412 ± 1.96(.094), or about [.228, .596].
• (ii) No, because the value .4 is well inside the 95% CI.
• (iii) Yes, because 1 is well outside the 95% CI.

Q4 Wooldridge C4.8 (401ksubs_ch04.do)
• (i) There are 2,017 single people in the sample of 9,275.
• (ii) The estimated equation is
nettfa-hat = −43.04 + .799 inc + .843 age
(4.08) (.060) (.092)
n = 2,017, R2 = .119.
The coefficient on inc indicates that one more dollar in income (holding age fixed) is reflected in about 80 more cents in predicted nettfa; no surprise there. The coefficient on age means that, holding income fixed, if a person gets another year older, his/her nettfa is predicted to increase by about $843. (Remember, nettfa is in thousands of dollars.) Again, this is not surprising.

• (iii) The intercept is not very interesting, as it gives the predicted nettfa for inc = 0 and age = 0. Clearly, there is no one with even close to these values in the relevant population.

• (iv) The t statistic is (.843 − 1)/.092 ≈ −1.71. Against the one-sided alternative H1: β2 < 1, the p-value is about .044. Therefore, we can reject H0: β2 = 1 at the 5% significance level (against the one-sided alternative).

• (v) The slope coefficient on inc in the simple regression is about .821, which is not very different from the .799 obtained in part (ii). As it turns out, the correlation between inc and age in the sample of single people is only about .039, which helps explain why the simple and multiple regression estimates are not very different; see the text also.


Week 6 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• How would you test hypotheses about a single linear combination of parameters?
We may re-parameterise the regression model to isolate the "single linear combination". The OLS estimation of the re-parameterised model will then provide the estimate of the "single linear combination" and the related standard error. See the example in the textbook and lecture slides.

• What are exclusion restrictions for a regression model? Exclusion restrictions are the null hypothesis that a group of x-variables have zero coefficients in the regression model.

• What are restricted and unrestricted models? When the null hypothesis is a set of restrictions on the parameters, the regression under the null is called the restricted model, while the regression under the alternative (which simply states that the null is false) is called the unrestricted model.

• How do you compute the F-statistic, given that you have the SSRs? Clearly, the general F-stat is based on the relative difference between the SSRs under the restricted and unrestricted models: F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)], where you should be able to explain the meanings of the various symbols.
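As a numerical sketch (Python rather than the course's Stata), the SSR form of the F statistic can be computed directly; the numbers below are those of Q1 (Wooldridge 4.6), where q = 2 and n − k − 1 = 86:

```python
def f_stat(ssr_r, ssr_ur, q, df_ur):
    """F = [(SSRr - SSRur)/q] / [SSRur/df_ur], with df_ur = n - k - 1."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

# SSRs from Wooldridge 4.6(ii): restricted 209,448.99, unrestricted 165,644.51
print(round(f_stat(209448.99, 165644.51, q=2, df_ur=86), 2))   # 11.37
```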

• What are general linear restrictions on parameters? These are linear equations on the parameters, for example β1 − 2β2 + 3β3 = 0.

• What is the test for the overall significance of a regression? This is the F-test for the null hypothesis that all the coefficients on the x-variables are zero.

• How would you report your regression results? See the guidelines in Section 4.6 of the textbook.

• Why would you care about the asymptotic properties of the OLS estimators? When MLR6 does not hold, the finite-sample distribution of the OLS estimators is not available. For reasonably large samples (large n), we use the asymptotic distribution of the OLS estimators, which is known from studying the asymptotic properties, to approximate the finite-sample distribution of the OLS estimators, which is unknown.

• Comparing the inference procedures in Chapter 5 with those in Chapter 4, can you list the similarities and differences? It is beneficial to make a list by yourself.

• Under MLR1-MLR5, the OLS estimators are consistent, asymptotically normal, and asymptotically efficient. Try to explain these properties in your own words. This is really a test of your understanding of these notions in econometrics.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 4.6
• (i) With df = n − 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because each test is two-tailed, the critical value is 1.987. The t statistic for H0: β0 = 0 is about −.89, which is much less than 1.987 in absolute value. Therefore, we fail to reject H0: β0 = 0. The t statistic for H0: β1 = 1 is (.976 − 1)/.049 ≈ −.49, which is even less significant. (Remember, we reject H0 in favor of H1 in this case only if |t| > 1.987.)

• (ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions, and the df in the unrestricted model is 86. We are given SSRr = 209,448.99 and SSRur = 165,644.51. Therefore, F = [(209,448.99 − 165,644.51)/2] / [165,644.51/86] ≈ 11.37,


which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is 4.85.

• (iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions, and there are 88 − 5 = 83 df in the unrestricted model. The F statistic is F = [(.829 − .820)/3] / [(1 − .829)/83] ≈ 1.46. The 10% critical value (again using 90 denominator df in Table G.3a) is 2.15, so we fail to reject H0 at even the 10% level. In fact, the p-value is about .23.

• (iv) If heteroskedasticity were present, Assumption MLR5 would be violated, and the F statistic would not have an F distribution under the null hypothesis. Therefore, comparing the F statistic against the usual critical values, or obtaining the p-value from the F distribution, would not be especially meaningful.

Q2 Wooldridge 4.8
• (i) We use Property VAR.3 from Appendix B:
Var(β̂1 − 3β̂2) = Var(β̂1) + 9 Var(β̂2) − 6 Cov(β̂1, β̂2).
• (ii) t = (β̂1 − 3β̂2 − 1)/se(β̂1 − 3β̂2), so we need the standard error of β̂1 − 3β̂2.
• (iii) Because θ = β1 − 3β2, we can write β1 = θ + 3β2. Plugging this into the population model gives
y = β0 + (θ + 3β2)x1 + β2x2 + β3x3 + u = β0 + θx1 + β2(3x1 + x2) + β3x3 + u.
This last equation is what we would estimate by regressing y on x1, 3x1 + x2, and x3. The coefficient and standard error on x1 are what we want.
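The re-parameterisation trick in (iii) can be checked on simulated data: the coefficient on x1 in the re-parameterised regression equals β̂1 − 3β̂2 from the original regression. A sketch in Python (assuming numpy; the simulated model is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 0.5 * x1 + 0.2 * x2 - 0.3 * x3 + rng.normal(size=n)

def ols(y, *cols):
    """OLS coefficients from regressing y on an intercept plus cols."""
    X = np.column_stack((np.ones(len(y)),) + cols)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

b = ols(y, x1, x2, x3)                 # original model
theta_direct = b[1] - 3 * b[2]         # beta1_hat - 3 * beta2_hat

g = ols(y, x1, 3 * x1 + x2, x3)        # re-parameterised model
theta_reparam = g[1]                   # coefficient on x1 is theta_hat

print(np.isclose(theta_direct, theta_reparam))   # True
```

The re-parameterisation is a nonsingular linear transformation of the regressors, so the fitted values are identical and the two routes give exactly the same estimate of θ.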

Q3 Wooldridge 4.10
• (i) We need to compute the F statistic for the overall significance of the regression, with n = 142 and k = 4: F = [.0395/(1 − .0395)](137/4) ≈ 1.41. The 5% critical value with 4 and 120 df is 2.45, which is well above the value of F. Therefore, we fail to reject H0: β1 = β2 = β3 = β4 = 0 at the 10% level. No explanatory variable is individually significant at the 5% level. The largest absolute t statistic is on dkr, tdkr ≈ 1.60, which is not significant at the 5% level against a two-sided alternative.
• (ii) The F statistic (with the same df) is now [.0330/(1 − .0330)](137/4) ≈ 1.17, which is even lower than in part (i). None of the t statistics is significant at a reasonable level.
• (iii) We probably should not use the logs, as the logarithm is not defined for firms that have zero for dkr or eps. Therefore, we would lose some firms in the regression.
• (iv) It seems very weak. There are no significant t statistics at the 5% level (against a two-sided alternative), and the F statistics are insignificant in both cases. Plus, less than 4% of the variation in return is explained by the independent variables.
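The R-squared form of the overall-significance F statistic used in (i) and (ii) is easy to reproduce; a quick check in Python:

```python
def f_overall(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

print(round(f_overall(0.0395, n=142, k=4), 2))   # 1.41 (part i)
print(round(f_overall(0.0330, n=142, k=4), 2))   # 1.17 (part ii)
```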

Q4 Wooldridge C4.9 (discrim_c4_9.do)
• (i) The results from the OLS regression, with standard errors in parentheses, are
log(psoda)-hat = −1.46 + .073 prpblck + .137 log(income) + .380 prppov
(.29) (.031) (.027) (.133)
n = 401, R2 = .087.
• The p-value for testing H0: β1 = 0 against the two-sided alternative is about .018, so we reject H0 at the 5% level but not at the 1% level.

• (ii) The correlation is about −.84, indicating a strong degree of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for log(income) is about 5.1, and that for prppov is about 2.86 (two-sided p-value = .004).

• (iii) The OLS regression results when log(hseval) is added are
log(psoda)-hat = −.84 + .098 prpblck − .053 log(income) + .052 prppov + .121 log(hseval)
(.29) (.029) (.038) (.134) (.018)
n = 401, R2 = .184.
The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about .12 percent. The two-sided p-value is zero to three decimal places.

• (iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the 15% significance level against a two-sided alternative for log(income), and prppov does not have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at the 5% level because the outcome of the F(2,396) statistic is about 3.52, with p-value = .030. All of the control variables – log(income), prppov and log(hseval) – are highly correlated, so it is not surprising that some are individually insignificant.

• (v) Because the regression in (iii) contains the most controls, log(hseval) is individually significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable. It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if prpblck increases by .10, psoda is estimated to increase by about 1%, other factors held fixed.

Q5 Wooldridge 5.2
• This is about the inconsistency of the simple regression of pctstck on funds. A higher tolerance of risk means more willingness to invest in the stock market, so β2 > 0. By assumption, funds and risktol are positively correlated. Now we use equation (5.5), where δ1 > 0: plim(β̃1) = β1 + β2δ1 > β1, so β̃1 has a positive inconsistency (asymptotic bias). This makes sense: if we omit risktol from the regression and it is positively correlated with funds, some of the estimated effect of funds is actually due to the effect of risktol.

Q6 Wooldridge C5.1 (wage1_c5_1.do)
• (i) The estimated equation is
wage-hat = −2.87 + .599 educ + .022 exper + .169 tenure
(.73) (.051) (.012) (.022)
n = 526, R2 = .306, σ̂ = 3.085.
• Below is a histogram of the 526 residuals ûi, i = 1, 2, ..., 526. The histogram uses 27 bins, which is suggested by the formula in the Stata manual for 526 observations. For comparison, the normal distribution that provides the best fit to the histogram is also plotted.


14

• (ii) With log(wage) as the dependent variable, the estimated equation is
log(wage)-hat = .284 + .092 educ + .0041 exper + .022 tenure
(.104) (.007) (.0017) (.003)
n = 526, R2 = .316, σ̂ = .441.
The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below.

• (iii) The residuals from the log(wage) regression appear to be more normally distributed. Certainly, the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed. In the wage regression, there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.

• The skewness and kurtosis of the residuals from the OLS regression may be used to test the null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null distribution of the Jarque-Bera test statistic (which takes into account both skewness and kurtosis) is approximately chi-squared with 2 degrees of freedom. Hence, at the 5% level, we reject the normality of u whenever the JB statistic is greater than 5.99. In Stata, this test is carried out by the command "sktest". With the data in WAGE1.RAW, the JB statistic is extremely large for the wage model, and JB = 10.59 for the log(wage) model. While normality is rejected for both models, we do see that the JB statistic is much smaller for the log(wage) model [also compare the p-values of the skewness and excess kurtosis tests against those from the wage model]. Hence, normality appears to be a more reasonable approximation to the distribution of the error term in the log(wage) model.
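A minimal sketch of the JB statistic in Python (pure standard library; the simulated residuals below are made up for illustration and are not the WAGE1 residuals):

```python
import random

def jarque_bera(resid):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), with S = skewness, K = kurtosis."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [r - mean for r in resid]
    m2 = sum(d ** 2 for d in dev) / n
    skew = (sum(d ** 3 for d in dev) / n) / m2 ** 1.5
    kurt = (sum(d ** 4 for d in dev) / n) / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

random.seed(1)
symmetric = [random.gauss(0, 1) for _ in range(5000)]        # roughly normal draws
right_skewed = [random.expovariate(1) for _ in range(5000)]  # clearly non-normal

print(jarque_bera(right_skewed) > 5.99)                      # True: normality strongly rejected
print(jarque_bera(symmetric) < jarque_bera(right_skewed))    # True
```

The 5.99 cutoff is the 5% critical value of the chi-squared distribution with 2 degrees of freedom, as in the answer above.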


Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the advantages of using the log of a variable in regression? Find the "rules of thumb" for taking logs. See page 191 of Wooldridge (pp. 198-199 for the 3rd edition).

• Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithm. Do you remember Table 2.3? Consult Table 2.3.

• How do you compute the change in y caused by Δx when the model is built for log(y)? A precise computation is given by Equation (6.8) of Wooldridge.

• Why do we need "interaction" terms in regression models? An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for an example.

• What is the adjusted R-squared? What is the difference between it and the R-squared? The primary attraction of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added; however, the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.

• How do you construct an interval prediction for given x-values? This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.

• How do you predict y for given x-values when the model is built for log(y)? Check the list on page 212 (p. 220 for the 3rd edition).

• What is involved in "residual analysis"? See pages 209-210 (pp. 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 6.4
• (i) Holding all other factors fixed, we have
Δlog(wage) = β1Δeduc + β2Δeduc·pareduc = (β1 + β2·pareduc)Δeduc.
Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if we think a child gets more out of another year of education the more highly educated are the child's parents.

• (ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 − 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)

• (iii) When we add pareduc by itself, the coefficient on the interaction term is negative. The t statistic on educ·pareduc is about −1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2 Wooldridge 6.5
• This would make little sense. Performances on math and science exams are measures of outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The science pass rate could itself be a dependent variable in an identical regression equation.

Q3 Wooldridge C6.3 (C6.2 for the 3rd Edition) (wage2_c6_3.do)
• (i) Holding exper (and the elements in u) fixed, we have
Δlog(wage) = β1Δeduc + β2Δeduc·exper = (β1 + β2·exper)Δeduc,
or Δlog(wage)/Δeduc = β1 + β2·exper. This is the approximate proportionate change in wage given one more year of education.

• (ii) H0: β3 = 0. If we think that education and experience interact positively – so that people with more experience are more productive when given another year of education – then β3 > 0 is the appropriate alternative.

• (iii) The estimated equation is
log(wage)-hat = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper
(.24) (.0174) (.0200) (.00153)
n = 935, R2 = .135, adjusted R2 = .132.
The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.

• (iv) We rewrite the equation as log(wage) = β0 + θ1educ + β2exper + β3educ·(exper − 10) + u, and run the regression of log(wage) on educ, exper, and educ·(exper − 10). We want the coefficient on educ. We obtain θ̂1 ≈ .0761 and se(θ̂1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.

Q4 Wooldridge C6.8 (hprice1_c6_8.do)
• (i) The estimated equation (where price is in dollars) is
price-hat = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
(29,475.0) (.642) (13.24) (9,010.1)
n = 88, R2 = .672, adjusted R2 = .661, σ̂ = 59,833.
The predicted price at lotsize = 10,000, sqrft = 2,300, and bdrms = 4 is about $336,714.

• (ii) The regression is price_i on (lotsize_i − 10,000), (sqrft_i − 2,300), and (bdrms_i − 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.

• (iii) We must use equation (6.36) to obtain the standard error of ê0 and then use equation (6.37) (assuming that price is normally distributed). From the regression in part (ii), se(ŷ0) ≈ 7,374.5, and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile in the t(84) distribution gives the 95% CI for price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval, but we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
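The standard-error and interval arithmetic in (iii) can be reproduced directly; a quick check in Python using the figures reported above:

```python
import math

# Figures reported in the answer above (all in dollars):
se_yhat0 = 7374.5      # se of the predicted mean, from the regression in (ii)
sigma_hat = 59833.0    # regression standard error
yhat0 = 336706.7       # predicted price
t975 = 1.99            # approximate 97.5th percentile of t(84)

se_e0 = math.sqrt(se_yhat0 ** 2 + sigma_hat ** 2)   # equation (6.36)
lower = yhat0 - t975 * se_e0                        # equation (6.37)
upper = yhat0 + t975 * se_e0
print(round(se_e0))                  # 60286
print(round(lower), round(upper))    # 216738 456675
```

Note that the prediction-error se is dominated by σ̂, which is why adding explanatory power (reducing σ̂) is what would tighten the interval.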


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is a qualitative factor? Give some examples. This should be easy.
• How do you use dummy variables to represent qualitative factors?
A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy. For example, a factor with 5 values requires 4 dummy variables.
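A minimal sketch of the coding in Python (the factor levels and observations below are made up for illustration):

```python
# A hypothetical 5-level factor (e.g. region) coded as 4 dummies,
# with level "A" as the base group.
levels = ["A", "B", "C", "D", "E"]
obs = ["B", "A", "E", "C", "B"]    # made-up sample values

dummies = {lvl: [1 if x == lvl else 0 for x in obs] for lvl in levels[1:]}
print(dummies["B"])   # [1, 0, 0, 0, 1]
print(dummies["E"])   # [0, 0, 1, 0, 0]
```

The base group gets no dummy; its behaviour is absorbed by the intercept, which avoids the dummy variable trap.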

• How do you use dummy variables to represent an ordinal variable? An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

• How do you test for differences in regression functions across different groups? We generally use the F test on properly-defined restricted and unrestricted models. The Chow test is a special case of the F test where the null hypothesis is no difference across groups.

• What can you achieve by interacting group dummy variables with other regressors? The interaction allows the slope coefficients to differ across groups.

• What is "program evaluation"? It assesses the effectiveness of a program by checking the differences in regression coefficients between the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary? The PRF is the conditional probability of "success" for given explanatory variables. The SRF is the predicted conditional probability of "success" for given explanatory variables.

• What are the shortcomings of the LPM? The major drawback of the LPM is that the predicted probability of success may be outside the interval [0, 1].

• Does MLR5 hold for the LPM? No. As indicated in the textbook and lectures, the conditional variance of the disturbance is not constant and depends on the explanatory variables: var(u|x) = E(y|x)[1 − E(y|x)].

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 7.4
• (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant.
• (ii) 100·[exp(−.283) − 1] ≈ −24.7%, and so the estimate is somewhat smaller in magnitude.
• (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is
log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility + δ3trans + u,
where trans is a dummy variable for the transportation industry. Now the base group is finance, so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.
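The approximate-versus-exact comparison in (i) and (ii) is a one-liner; a quick check in Python (the exact value is −24.65, which the answer rounds to −24.7):

```python
import math

coef = -0.283   # coefficient on the utility dummy in the log(salary) equation
approx = 100 * coef                   # approximate percentage difference
exact = 100 * (math.exp(coef) - 1)    # exact difference implied by the log model
print(round(approx, 1))   # -28.3
print(round(exact, 1))    # -24.6, smaller in magnitude than the approximation
```

For dummy coefficients that are large in magnitude, the 100·[exp(coef) − 1] formula should be preferred to the 100·coef approximation.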

Q2 Wooldridge 7.6
• In Section 3.3 – in particular, in the discussion surrounding Table 3.2 – we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3 Wooldridge 7.9
• (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.
• (ii) Setting f0(z*) = f1(z*) gives β0 + β1z* = (β0 + δ0) + (β1 + δ1)z*. Therefore, provided δ1 ≠ 0, we have z* = −δ0/δ1. Clearly, z* is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.

• (iii) Using part (ii), we have totcoll* = .357/.030 = 11.9 years (totcoll = total college years).
• (iv) The estimated years of college at which women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed – not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.

Q4 Wooldridge C7.13 (apple_c7_13.do)
• (i) 412/660 ≈ .624.
• (ii) The OLS estimates of the LPM are
ecobuy-hat = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize
(.165) (.109) (.132) (.00053) (.013)
+ .025 educ − .00050 age
(.008) (.00125)
n = 660, R2 = .110.
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)

• (iii) The F test, with 4 and 653 df, is 4.43 with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children (other factors equal), the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in the functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule (predict one when ecobuy-hat ≥ .5 and zero otherwise) gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
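The fraction-correctly-predicted calculation can be sketched in Python; this is a toy illustration (made-up data, and percent_correct is a hypothetical helper), not the apple data set:

```python
def percent_correct(y, yhat, threshold=0.5):
    """Fraction of observations where the 0/1 prediction matches y."""
    pred = [1 if p >= threshold else 0 for p in yhat]
    return sum(int(a == b) for a, b in zip(y, pred)) / len(y)

# Toy data: five observations with their LPM fitted probabilities.
y_toy = [0, 0, 1, 1, 1]
yhat_toy = [0.30, 0.60, 0.70, 0.40, 0.90]
share = percent_correct(y_toy, yhat_toy)   # 3 of 5 predictions are correct
```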


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes) • What is heteroskedasticity in a regression model?

When homoskedasticity (MLR5) fails and the variance of the disturbance (u) changes across observations (indexed by i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with OLS under heteroskedasticity? The usual t-stat and F-stat are not valid because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator will no longer be (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA? These are the corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option "robust" with the "regress" command, e.g., regress lwage educ, robust

• How do you detect whether there is heteroskedasticity? The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

• If heteroskedasticity is present in a known form, how would you estimate the model? In this case the WLS estimator should be used, which is more efficient than the OLS estimator. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model? If there is strong evidence for heteroskedasticity, the FGLS estimator should be used, which is based on an exponential functional form for the variance. See Section 8.4 for details.

• What are the steps in the FGLS estimation? You should summarise them from Section 8.4.

• How would you handle the heteroskedasticity of the LPM? The heteroskedasticity functional form is known for the LPM, hence the WLS can be used in principle. However, because the LPM can produce predicted probabilities that are outside the interval (0, 1), the known functional form p(x)[1 − p(x)] may not be useful for the WLS. In that case it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 8.1 • Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing

that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2 Wooldridge 8.2 • With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity

function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc: beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc


Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.
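A sketch of the transformation in Python (transform_row is a hypothetical helper with illustrative values; each variable is divided by inc, since √h(x) = inc):

```python
def transform_row(beer, inc, price, educ, female):
    """Divide the equation through by inc: beta0 now rides on 1/inc,
    and beta1 (the slope on inc) rides on the constant 1."""
    y_star = beer / inc
    x_star = (1.0 / inc, 1.0, price / inc, educ / inc, female / inc)
    return y_star, x_star

y_star, x_star = transform_row(beer=10.0, inc=2.0, price=4.0,
                               educ=12.0, female=1.0)
```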

Q3 Wooldridge 8.3 • False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR4, and as we

know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has a smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4 Wooldridge 8.5 • (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust

ones are practically very similar.

• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.

• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.

• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.

• (v) We just plug the values of the independent variables into the OLS regression line: smokes-hat = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052. Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)
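The part (v) arithmetic is easy to reproduce from the quoted coefficients (a Python sketch; restaurn and white are zero for this person, so those terms drop out):

```python
import math

smokes_hat = (0.656
              - 0.069 * math.log(67.44)   # log(cigpric)
              + 0.012 * math.log(6500)    # log(income)
              - 0.029 * 16                # educ
              + 0.020 * 77                # age
              - 0.00026 * 77**2)          # age squared
# smokes_hat is about .0052, essentially zero
```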

Q5 Wooldridge C8.10 (401ksubs_c8_10.do) • (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the

heteroskedasticity-robust standard errors are in [·]:
e401k-hat = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male
(.081) (.0006) (.000005) (.0039) (.00005) (.121)
[.079] [.0006] [.000005] [.0038] [.00004] [.121]
n = 9,275, R² = .094
There are no important differences; if anything, the robust standard errors are smaller.

• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model u² = δ0 + δ1p(x) + δ2p(x)² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = −1. Remember that, for the LPM, the fitted values ŷ are estimates of p(x). So, when we run the regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.

• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on e401k-hat is about 1.010, the coefficient on e401k-hat² is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are
e401k-hat = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male
(.076) (.0005) (.000004) (.0037) (.00004) (.117)
n = 9,275, R² = .108
There are no important differences with the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings • Read Chapter 9 thoroughly. • Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes) • What is functional form misspecification? What is its major consequence in regression

analysis? As the word "misspecification" suggests, it is a mis-specified model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence, the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification? The RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). The RESET is based on the F test for the joint significance of the squared and cubed predicted values (ŷ² and ŷ³) in an "expanded" model that includes all the original regressors as well as ŷ² and ŷ³, where the latter represent possible nonlinear functions of the regressors.

• What are nested models? And nonnested models? Two models are nested if one is a restricted version of the other (by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another? The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test two nonnested models? There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models and decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis? A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved one and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, depends only on the proxy.

• Can you analyse the consequences of measurement errors? With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis? Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?


Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR1-4 and hence does not introduce bias in the OLS estimation. On the other hand, endogenous sample selection selects the sample on the basis of the dependent variable and causes bias in the OLS estimation.

• What are outliers and influential observations? Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of parameters. They are called influential observations if the estimation results including them are significantly different from the results excluding them.

• Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1, 2, …, n, where the "slope parameter" contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR1-5 hold for yi = α + βxi + ui, E(bi|xi) = 0, and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference? The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui). Given the stated assumptions, MLR1-4 hold for this model, and the OLS estimator of the "slope parameter" is unbiased (from β). However, MLR5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui|xi), depends on xi, and so the usual OLS standard errors will not be valid.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 9.1 • There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population

parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test: F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] = 2.97. With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
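The R-squared form of the F statistic used above can be wrapped in a small helper (a Python sketch; f_stat_r2 is a hypothetical name):

```python
def f_stat_r2(r2_ur, r2_r, q, df_denom):
    """F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/(n - k - 1)]."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_denom)

# Numbers from the solution: q = 2 restrictions, n - k - 1 = 177 - 8.
F = f_stat_r2(0.375, 0.353, q=2, df_denom=177 - 8)   # about 2.97
```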

Q2 Wooldridge 9.3 • (i) Eligibility for the federally funded school lunch program is very tightly linked to being

economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.

• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward-biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So, when we control for the poverty rate, the effect of spending falls.

• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1), we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than spending per student or other school characteristics.

Q3 Wooldridge 9.5 • The sample selection in this case is arguably endogenous. Because prospective students may

look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report their crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do) • (i) If the grants were awarded to firms based on firm or worker characteristics, grant could

easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates using the 1988 data are log(scrap)-hat = .409 + .057 grant; (.241) (.406); n = 54, R² = .0004. The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain log(scrap88)-hat = .021 − .254 grant88 + .831 log(scrap87); (.089) (.147) (.044); n = 54, R² = .873, where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant88 = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant88 < 0 at the 5% level.

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.

• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller than before, but it is still pretty significant.
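The t statistics in parts (iii)-(v) can be recomputed directly from the quoted estimates and standard errors (Python sketch):

```python
t_grant = -0.254 / 0.147            # part (iii), usual SE: about -1.73
t_lag = (0.831 - 1) / 0.044         # part (iv), H0: coefficient = 1: about -3.84
t_grant_robust = -0.254 / 0.142     # part (v), robust SE: about -1.79
t_lag_robust = (0.831 - 1) / 0.071  # part (v), robust SE: about -2.38
```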


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes) • What are the main features of time series data? How do time series data differ from cross-

sectional data? The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations are dependent on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that, for a random sample, the observations are required to be independent of one another.

• What is a stochastic process and its realisation? A stochastic process (SP) is a sequence of random variables indexed by time. At any fixed point in time, the SP is a random variable (and hence has a distribution). The SP can be viewed as a "random curve", with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation or autocorrelation? The serial (or auto) correlation of a time series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted Corr(yt, yt−h) = Cov(yt, yt−h)/[Var(yt)Var(yt−h)]^(1/2).
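A sample analogue of this autocorrelation can be sketched in pure Python (illustrative only; this version uses the overall sample mean and variance):

```python
def autocorr(y, h):
    """Sample correlation between y_t and y_{t-h}."""
    n = len(y)
    ybar = sum(y) / n
    cov = sum((y[t] - ybar) * (y[t - h] - ybar) for t in range(h, n)) / n
    var = sum((v - ybar) ** 2 for v in y) / n
    return cov / var

trend = [float(t) for t in range(20)]   # a strongly trending toy "series"
r1 = autocorr(trend, 1)                 # high positive autocorrelation
```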

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)? Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.

• What are TS1-6 (assumptions about time series regression)? How do they differ from the assumptions MLR1-6? TS1-6 include: 1) linear in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6 mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of OLS standard errors, t-stats and F-stats for statistical inference, TS3-TS6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors? Strictly exogenous regressors satisfy assumption TS3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS3.

• What is a trending time series? What is a time trend? A time series is trending if it has the tendency of growing (or shrinking) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results? First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious because it is purely induced by the time trends and has little to say about the true relationship between the two time series.

• Why would you include a time trend in regressions with trending variables? Including a time trend in such regressions allows one to study the relationship among the time series variables that is not induced by the time trends in the series.

• What is seasonality in a time series? Give an example of a time series variable with seasonality. Seasonality in a time series is a fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have day-of-week seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model? We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
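A sketch of how the three dummies might be constructed (Python; seasonal_dummies is a hypothetical helper):

```python
def seasonal_dummies(quarter):
    """quarter is 1, 2, 3 or 4; returns the (Q2, Q3, Q4) dummies,
    with the first quarter as the base."""
    return tuple(int(quarter == q) for q in (2, 3, 4))
```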

Problem Set

Q1 Wooldridge 10.1 • (i) Disagree. Most time series processes are correlated over time, and many of them strongly

correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated – such as stock returns – do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no-serial-correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2 • We follow the hint and write

gGDPt−1 = α0 + δ0intt−1 + δ1intt−2 + ut−1, and plug this into the right-hand side of the intt equation: intt = γ0 + γ1(α0 + δ0intt−1 + δ1intt−2 + ut−1 − 3) + vt = (γ0 + γ1α0 − 3γ1) + γ1δ0intt−1 + γ1δ1intt−2 + γ1ut−1 + vt. Now, by assumption, ut−1 has zero mean and is uncorrelated with all right-hand-side variables in the previous equation, except itself of course. So Cov(intt, ut−1) = E(intt·ut−1) = γ1E(ut−1²) > 0, because γ1 > 0. If σ² = E(ut²) for all t, then Cov(intt, ut−1) = γ1σ². This violates the strict exogeneity assumption TS3. While ut is uncorrelated with intt, intt−1, and so on, ut is correlated with intt+1.


Q3 Wooldridge 10.7 • (i) pet−1 and pet−2 must be increasing by the same amount as pet. • (ii) The long-run effect, by definition, should be the change in gfr when pe increases

permanently. But a permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pet−2, pet−1 and pet by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do) • (i) The sample correlation between inf and def is only about .098, which is pretty small.

Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is i3t = 1.61 + .343 inft + .382 inft−1 − .190 deft + .569 deft−1 (.40) (.125) (.134) (.221) (.197) n = 55, R² = .685, adjusted R² = .660

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than the .606 we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inft−1.

• (iv) The F statistic for the joint significance of inft−1 and deft−1 is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.
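The LRP in part (iii) is simply the sum of the coefficients on current and lagged inf; as a one-line check in Python:

```python
# LRP of i3 with respect to inf, from the estimates in part (ii).
lrp = 0.343 + 0.382   # about .725
```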


• (vi) The average of exppp is about $5,194.87. The standard deviation is $1,091.89, which shows rather wide variation in spending per pupil. [The minimum is $1,206.88 and the maximum is $11,957.64.]

• (vii) About 8.7%. Note that, by a Taylor expansion, log(x + h) − log(x) = log(1 + h/x) ≈ h/x for positive x and small h/x (in absolute value).
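The quality of the log approximation is easy to check numerically (a Python sketch with illustrative values):

```python
import math

x, h = 5500.0, 500.0                    # illustrative spending levels
exact = math.log(x + h) - math.log(x)   # log(1 + h/x), about .087
approx = h / x                          # about .091
# the approximation is close; log(1 + z) < z, so the log difference
# slightly understates h/x
```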

Q4 Wooldridge C1.4 (jtrain2_ch01.do) • (i) 185/445 ≈ .416 is the fraction of men receiving job training, or about 41.6%. • (ii) For men receiving job training, the average of re78 is about 6.35, or $6,350. For men not

receiving job training, the average of re78 is about 4.55, or $4,550. The difference is $1,800, which is very large. On average, the men receiving job training had earnings about 40% higher than those not receiving training.

• (iii) About 24.3% of the men who received training were unemployed in 1978; the figure is 35.4% for men not receiving training. This, too, is a big difference.

• (iv) The differences in earnings and unemployment rates suggest the training program had strong, positive effects. Our conclusions about economic significance would be stronger if we could also establish statistical significance (which is done in Computer Exercise C9.10 in Chapter 9).

STATA Hints • Some STATA do-files for solving the Problem Set are posted on the course website. It is

important to understand the commands in the do-files, so that you will be able to write your own do-files for your assignments and the course project.


Week 3 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes) • The minimum requirement for OLS to be carried out for the data set (xi, yi), i = 1, …, n, with

sample size n > 2, is that the sample variance of x is positive. In what circumstances is the sample variance of x zero? When all observations on x have the same value, so there is no variation in x.

• The OLS estimation of the simple regression model has the following properties: a) the sum of the residuals is zero; b) the sample covariance of the residuals and x is zero.

Why? How would you relate them to the "least squares" principle? These are derived from the first-order conditions for minimising the sum of squared residuals (SSR).

• Convince yourself that the point (x̄, ȳ), the sample means of x and y, is on the sample regression function (SRF), which is a straight line. This can be derived from the first of the first-order conditions for minimising the SSR.

• How do you know that SST = SSE + SSR is true? The dependent variable values can be expressed as the fitted value plus the residual: yi = ŷi + ûi. After taking the average away from both sides of this equation, we find (yi − ȳ) = (ŷi − ȳ) + ûi. The required "SST = SSE + SSR" is obtained by squaring and summing both sides of the above equation, noting that (ŷi − ȳ) = β̂1(xi − x̄) and using the second of the first-order conditions for minimising the SSR.
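The decomposition can be verified numerically for a simple OLS fit (a Python sketch on made-up data):

```python
# Toy data and the OLS slope/intercept formulas for simple regression.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar

fitted = [b0 + b1 * x for x in xs]
resid = [y - f for y, f in zip(ys, fitted)]
SST = sum((y - ybar) ** 2 for y in ys)
SSE = sum((f - ybar) ** 2 for f in fitted)
SSR = sum(e ** 2 for e in resid)
# SST equals SSE + SSR (up to floating-point rounding)
```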

• Which of the following models is (are) nonlinear model(s)? a) sales = β0/[1 + exp(−β1 ad_expenditure)] + u b) sales = β0 + β1 log(ad_expenditure) + u c) sales = β0 + β1 exp(ad_expenditure) + u d) sales = exp(β0 + β1 ad_expenditure + u)

The model in a) is nonlinear. • Can you follow the proofs of Theorems 2.1-2.3?

You should be able to follow the proofs. If not at this point, try again.

Problem Set Q1 Wooldridge 2.4 • (i) When cigs = 0, predicted birth weight is 119.77 ounces. When cigs = 20, bwght-hat =

109.49. This is about an 8.6% drop. • (ii) Not necessarily. There are many other factors that can affect birth weight,

particularly the overall health of the mother and the quality of prenatal care. These could be correlated with cigarette smoking during pregnancy. Also, something such as caffeine consumption can affect birth weight and might also be correlated with cigarette smoking.

• (iii) If we want a predicted bwght of 125, then cigs = (125 − 119.77)/(−.514) ≈ −10.18, or about −10 cigarettes! This is nonsense, of course, and it shows what happens when we are trying to predict something as complicated as birth weight with only a single explanatory variable. The largest predicted birth weight is necessarily 119.77. Yet almost 700 of the births in the sample had a birth weight higher than 119.77.

• (iv) 1,176 out of 1,388 women did not smoke while pregnant, or about 84.7%. Because we are using only cigs to explain birth weight, we have only one predicted birth weight

at cigs = 0. The predicted birth weight is necessarily roughly in the middle of the observed birth weights at cigs = 0, and so we will under-predict high birth weights.

Q2 Wooldridge 2.7 • (i) When we condition on inc in computing an expectation, inc becomes a constant. So

E(u|inc) = E(√inc·e|inc) = √inc·E(e|inc) = √inc·0 = 0, because E(e|inc) = E(e) = 0. • (ii) Again, when we condition on inc in computing a variance, inc becomes a constant. So

Var(u|inc) = Var(√inc·e|inc) = (√inc)²Var(e|inc) = σe²inc, because Var(e|inc) = σe². • (iii) Families with low incomes do not have much discretion about spending; typically, a

low-income family must spend on food, clothing, housing, and other necessities. Higher-income people have more discretion, and some might choose more consumption while others more saving. This discretion suggests wider variability in saving among higher-income families.

Q3 Wooldridge C2.6 (meap93_ch02.do) • (i) It seems plausible that another dollar of spending has a larger effect for low-spending

schools than for high-spending schools. At low-spending schools, more money can go toward purchasing more books and computers, and toward hiring better qualified teachers. At high levels of spending, we would expect little, if any, effect, because the high-spending schools already have high-quality teachers, nice facilities, plenty of books, and so on.

bull (ii) If we take changes as usual we obtain ∆119898119886119905ℎ10 = 1205731∆ log(119890119909119901119890119899119889) asymp 1205731

100 (∆119890119909119901119890119899119889)

just as in the second row of Table 23 So if Δexpend = 10 Δmath10 = β110 bull (iii) The regression results are

119898119886119905ℎ10 = minus6934 + 1116 log(119890119909119901119890119899119889) 119899 = 408 1198772 = 0297 bull (iv) If expend increases by 10 percent 119898119886119905ℎ10 increases by about 11 percentage points

This is not a huge effect but it is not trivial for low-spending schools where a 10 percent increase in spending might be a fairly small dollar amount

• (v) In this data set, the largest value of math10 is 66.7, which is not especially close to 100. In fact, the largest fitted value is only about 30.2.
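The percentage-point arithmetic in (ii) and (iv) can be sketched directly, using the fitted coefficient quoted above (β1 = 11.16):

```python
beta1 = 11.16           # coefficient on log(expend) in the fitted equation
pct_change_expend = 10  # a 10 percent increase in spending

# From Table 2.3: delta math10 ≈ (beta1 / 100) * (%delta expend)
delta_math10 = beta1 / 100 * pct_change_expend
print(delta_math10)  # about 1.1 percentage points
```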

Q4. Wooldridge C2.7 (charity_ch02.do)
• (i) The average gift is about 7.44 Dutch guilders. Out of 4,268 respondents, 2,561 did not

give a gift, or about 60 percent.
• (ii) The average number of mailings per year is about 2.05. The minimum value is 0.25 (which amounts to one mailing every four years) and the maximum value is 3.5.
• (iii) The estimated equation is

gift-hat = 2.01 + 2.65 mailsyear, n = 4268, R² = .0138.
• (iv) The slope coefficient from part (iii) means that each additional mailing per year is associated with

– perhaps even "causes" – an estimated 2.65 additional guilders, on average. Therefore, if each mailing costs one guilder, the expected profit from each mailing is estimated to be 1.65 guilders. This is only the average, however; some mailings generate no contribution, or a contribution less than the mailing cost, while other mailings generate much more than the mailing cost.

• (v) Because the smallest mailsyear in the sample is 0.25, the smallest predicted value of gift is 2.01 + 2.65(0.25) ≈ 2.67. Even if we look at the overall population, where some people have received no mailings, the smallest predicted value is about two. So, with this estimated equation, we never predict zero charitable gifts.


Week 4 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What do we mean when we say "regress wage on educ and expr"?

Use OLS to estimate the multiple linear regression model wage = β0 + β1educ + β2expr + u.
• Why, and under what circumstance, do we need to "control for" expr in the regression model

in order to quantify the effect of educ on wage? We need to control for expr in the model when the partial (ceteris paribus) effect of educ on wage is the parameter of interest and expr and educ are correlated.

• What is the bias of an estimator? bias = E(β̂j) − βj.

• What is the "omitted variable bias"? It is the bias resulting from omitting relevant variables that are correlated with the included explanatory variables in a regression model.

• What is the consequence of adding an irrelevant variable to a regression model? Including an irrelevant variable will generally increase the variances of the OLS estimators.

• What is the requirement of the ZCM assumption, in your own words? You must do this by yourself.

• Why is assuming E(u) = 0 not restrictive when an intercept is included in the regression model? When the mean of the disturbance is not zero, you can always define a new disturbance as the original disturbance minus its mean, and a new intercept as the original intercept plus that mean.

• In terms of notation, why do we need two subscripts for independent variables? In our notation, the first subscript indexes observations (1st obs, 2nd obs, etc.) and the second subscript indexes variables (1st variable, 2nd variable, etc.).

• Using OLS to estimate a multiple regression model with k independent variables and an intercept, how many first-order conditions are there? k + 1.

• How do you know that the OLS estimators are linear combinations of the observations on the dependent variable? From the first-order conditions, we see that the OLS estimators are the solution to k + 1 linear equations in k + 1 unknowns. The "right-hand sides" of the equations are linear functions of the observations on the dependent variable (say, y). Hence, the solution is a set of linear combinations of the y-observations.

• What is the Gauss-Markov theorem? Read the textbook.
• Try to explain why R² never decreases (and is likely to increase) when additional explanatory

variables are added to the regression model. First, the smaller the SSR, the greater the R-squared. OLS chooses parameter estimates to minimise the SSR. With extra explanatory variables, the R-squared does not decrease because the SSR does not increase, which is true because OLS has a larger parameter space from which to choose estimates (more possibilities for making the SSR small).

• What is an endogenous explanatory variable? An exogenous explanatory variable? See the top of page 88.

• What is multicollinearity, and what is its likely effect on the OLS estimators? Read pages 95-99.
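The earlier point about R² never decreasing can be illustrated numerically. The following is a sketch with simulated data (assuming NumPy is available; the variables are made up): adding even a pure-noise regressor cannot increase the SSR, so R² cannot fall.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
noise = rng.normal(size=n)               # an irrelevant extra regressor
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def r_squared(cols, y):
    """R-squared from an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    tss = (y - y.mean()) @ (y - y.mean())
    return 1 - (resid @ resid) / tss

r2_small = r_squared([x1], y)
r2_big = r_squared([x1, noise], y)
print(r2_small, r2_big)  # r2_big can never be below r2_small
```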


Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 3.2
• (i) Yes. Because of budget constraints, it makes sense that the more siblings there are in a

family, the less education any one child in the family has. To find the increase in the number of siblings that reduces predicted education by one year, we solve 1 = .094(Δsibs), so Δsibs = 1/.094 ≈ 10.6.

• (ii) Holding sibs and feduc fixed, one more year of mother's education implies .131 years more of predicted education. So if a mother has four more years of education, her son is predicted to have about half a year (.524) more education.

• (iii) Since the number of siblings is the same, but meduc and feduc are both different, the coefficients on meduc and feduc both need to be accounted for. The predicted difference in education between B and A is .131(4) + .210(4) = 1.364.

Q2. Wooldridge 3.10
• (i) Because x1 is highly correlated with x2 and x3, and these latter variables have large partial

effects on y, the simple and multiple regression coefficients on x1 can differ by large amounts. We have not done this case explicitly, but given equation (3.46) and the discussion with a single omitted variable, the intuition is pretty straightforward.

• (ii) Here we would expect β̃1 and β̂1 to be similar (subject, of course, to what we mean by "almost uncorrelated"). The amount of correlation between x2 and x3 does not directly affect the multiple regression estimate on x1 if x1 is essentially uncorrelated with x2 and x3.

• (iii) In this case we are (unnecessarily) introducing multicollinearity into the regression: x2 and x3 have small partial effects on y, and yet x2 and x3 are highly correlated with x1. Adding x2 and x3 likely increases the standard error of the coefficient on x1 substantially, so se(β̂1) is likely to be much larger than se(β̃1).

• (iv) In this case, adding x2 and x3 will decrease the residual variance without causing much collinearity (because x1 is almost uncorrelated with x2 and x3), so we should see se(β̂1) smaller than se(β̃1). The amount of correlation between x2 and x3 does not directly affect se(β̂1).

Q3. Wooldridge C3.2 (hprice1_ch03.do)
• (i) The estimated equation is

price-hat = −19.32 + .128 sqrft + 15.20 bdrms, n = 88, R² = .632.
• (ii) Holding square footage constant, (the price increase) = 15.20 × (increase in bedrooms).

Hence, the price increases by 15.20, which means $15,200.
• (iii) Now, (the price increase) = .128 × (increase in square feet) + 15.20 × (increase in bedrooms)

= .128(140) + 15.20(1) = 33.12. That is, $33,120. Because the house size is also increasing, the effect is larger than that in (ii).

• (iv) About 63.2%.
• (v) The predicted price is −19.32 + .128(2,438) + 15.20(4) = 353.544, or $353,544.
• (vi) The residual = 300,000 − 353,544 = −53,544. The buyer under-paid, judged by the

predicted price. But, of course, there are many other features of a house (some that we cannot even measure) that affect price, and we have not controlled for those.

Q4. Wooldridge C3.6 (wage2_ch03_c3_6.do)
• (i) The slope coefficient from regressing IQ on educ is 3.53383.
• (ii) The slope coefficient from regressing log(wage) on educ is .05984.
• (iii) The slope coefficients from regressing log(wage) on educ and IQ are .03912 and .00586, respectively.
• (iv) The computed β̃1 = β̂1 + δ̂1·β̂2 = .03912 + 3.53383(.00586) ≈ .05984.
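The identity used in (iv), β̃1 = β̂1 + δ̂1·β̂2, holds exactly in any sample. Below is a pure-Python check on a small made-up data set (standing in for log(wage), educ and IQ; these are not the WAGE2 numbers):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def demean(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

# Made-up data: y plays the role of log(wage), x1 of educ, x2 of IQ
y  = [2.1, 2.9, 3.2, 2.5, 3.8, 3.0, 2.2, 3.5]
x1 = [10.0, 12.0, 14.0, 11.0, 16.0, 12.0, 10.0, 15.0]
x2 = [95.0, 100.0, 110.0, 98.0, 120.0, 105.0, 92.0, 112.0]

yd, x1d, x2d = demean(y), demean(x1), demean(x2)
s11, s22, s12 = dot(x1d, x1d), dot(x2d, x2d), dot(x1d, x2d)
sy1, sy2 = dot(yd, x1d), dot(yd, x2d)
det = s11 * s22 - s12 ** 2

b1 = (sy1 * s22 - sy2 * s12) / det  # multiple-regression slope on x1
b2 = (sy2 * s11 - sy1 * s12) / det  # multiple-regression slope on x2
delta1 = s12 / s11                  # slope from regressing x2 on x1
simple = sy1 / s11                  # simple-regression slope of y on x1

# Omitted-variable identity: simple slope = b1 + delta1 * b2
print(simple, b1 + delta1 * b2)
```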


Week 5 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the CLM assumptions?

These are MLR1, MLR2, MLR3 and MLR6, where MLR6 is a very strong assumption that implies both MLR4 and MLR5.

• What is the sampling distribution of the OLS estimators under the CLM assumptions? Under the CLM assumptions, the OLS estimators follow the normal distribution.

• What are the standard errors of the OLS estimators? The standard error of an estimator is the square root of its estimated variance.

• What is the null hypothesis about a parameter? In this context, the null hypothesis is a statement that assigns a known (hypothesised) value to the parameter of interest. Usually, the null is a maintained proposition (from economic theory or experience) that will only be rejected on very strong evidence.

• What is a one-tailed (two-tailed) alternative hypothesis? This should be clear once you finish your reading.

• In testing hypotheses, what is a Type 1 (Type 2) error? What is the level of significance? A Type 1 error is the rejection of a true null; a Type 2 error is the non-rejection of a false null. The level of significance is the probability of a Type 1 error.

• The decision rule we use can be stated as "reject the null if the t-statistic exceeds the critical value". How is the critical value determined? The critical value is determined by the level of significance (or the level of confidence, in the case of confidence intervals) and the distribution of the test statistic. For example, for the t-stat we use the t-distribution when the sample size is small, and the normal distribution when the sample size is large (df > 120).

• Justify the statement: "Given the observed test statistic, the p-value is the smallest significance level at which the null hypothesis would be rejected." If you used the observed t-stat as your critical value, you would just reject the null (because the t-stat is equal to this value). This "critical value" (= the t-stat value) implies a level of significance, say p, which is exactly the p-value. For any significance level smaller than p, the corresponding critical value is more extreme than the t-stat value and does not lead to the rejection of the null. Hence, p is the smallest significance level at which the null would be rejected.

• What is the 90% confidence interval for a parameter? It is an interval [L, U] defined by two sample statistics: the lower bound L and the upper bound U. The probability that [L, U] covers the parameter is 0.9.

• In constructing a confidence interval for a parameter, what is the level of confidence? It is the probability that the CI covers the parameter.

• When the level of confidence increases, how will the width of the confidence interval change (holding other things fixed)? The width will increase.

• Try to convince yourself that the event "the 90% confidence interval covers a hypothesised value of the parameter" is the same as the event "the null of the parameter being the hypothesised value cannot be rejected in favour of the two-tailed alternative at the 10% level of significance". It becomes apparent if you start with P(|t-stat| ≤ c) = 0.9 and disentangle the expression for the t-stat.


Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 4.1
• (i) and (iii) generally cause the t statistics not to have a t distribution under H0.

Homoskedasticity is one of the CLM assumptions. An important omitted variable violates Assumption MLR4 (and hence MLR6) in general. The CLM assumptions contain no mention of the sample correlations among independent variables, except to rule out the case where the correlation is one (MLR3).

Q2. Wooldridge 4.2
• (i) H0: β3 = 0; H1: β3 > 0.
• (ii) The proportionate effect on predicted salary is .00024(50) = .012. To obtain the percentage effect, we

multiply this by 100: 1.2%. Therefore, a 50-point ceteris paribus increase in ros is predicted to increase salary by only 1.2%. Practically speaking, this is a very small effect for such a large change in ros.

• (iii) The 10% critical value for a one-tailed test, using df = ∞, is obtained from Table G.2 as 1.282. The t statistic on ros is .00024/.00054 ≈ .44, which is well below the critical value. Therefore, we fail to reject H0 at the 10% significance level.

• (iv) Based on this sample, the estimated ros coefficient appears to be different from zero only because of sampling variation. On the other hand, including ros may not be causing any harm; it depends on how correlated it is with the other independent variables (although these are very significant even with ros in the equation).

Q3. Wooldridge 4.5
• (i) .412 ± 1.96(.094), or about [.228, .596].
• (ii) No, because the value .4 is well inside the 95% CI.
• (iii) Yes, because 1 is well outside the 95% CI.
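The interval in (i) and the two hypothesis checks are simple arithmetic (values as quoted above):

```python
# 95% CI: estimate ± 1.96 * se, using the values from part (i)
est, se = 0.412, 0.094
lo, hi = est - 1.96 * se, est + 1.96 * se
print(lo, hi)  # about 0.228 and 0.596

# (ii): 0.4 lies inside the CI, so H0: beta = 0.4 is not rejected;
# (iii): 1 lies outside it, so H0: beta = 1 is rejected.
print(lo <= 0.4 <= hi, lo <= 1.0 <= hi)
```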

Q4. Wooldridge C4.8 (401ksubs_ch04.do)
• (i) There are 2,017 single people in the sample of 9,275.
• (ii) The estimated equation is

nettfa-hat = −43.04 + .799 inc + .843 age, (4.08) (.060) (.092), n = 2017, R² = .119. The coefficient on inc indicates that one more dollar in income (holding age fixed) is reflected in about 80 more cents in predicted nettfa; no surprise there. The coefficient on age means that, holding income fixed, if a person gets another year older, his/her nettfa is predicted to increase by about $843. (Remember, nettfa is in thousands of dollars.) Again, this is not surprising.

• (iii) The intercept is not very interesting, as it gives the predicted nettfa for inc = 0 and age = 0. Clearly, there is no one with even close to these values in the relevant population.

• (iv) The t statistic is (.843 − 1)/.092 ≈ −1.71. Against the one-sided alternative H1: β2 < 1, the p-value is about .044. Therefore, we can reject H0: β2 = 1 at the 5% significance level (against the one-sided alternative).

• (v) The slope coefficient on inc in the simple regression is about .821, which is not very different from the .799 obtained in part (ii). As it turns out, the correlation between inc and age in the sample of single people is only about .039, which helps explain why the simple and multiple regression estimates are not very different; see the text also.


Week 6 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• How would you test hypotheses about a single linear combination of parameters?

We may re-parameterise the regression model to isolate the "single linear combination". Then OLS on the re-parameterised model will provide the estimate of the "single linear combination" and the related standard error. See the example in the textbook and lecture slides.

• What are exclusion restrictions for a regression model? Exclusion restrictions form a null hypothesis that a group of x-variables have zero coefficients in the regression model.

• What are restricted and unrestricted models? When the null hypothesis is a set of restrictions on the parameters, the regression under the null is called the restricted model, while the regression under the alternative (which simply states that the null is false) is called the unrestricted model.

• How do you compute the F-statistic, given that you have the SSRs? Clearly, the general F-stat is based on the relative difference between the SSRs under the restricted and unrestricted models: F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)], where you should be able to explain the meanings of the various symbols.

• What are general linear restrictions on parameters? These are linear equations in the parameters, for example, β1 − 2β2 + 3β3 = 0.

• What is the test for the overall significance of a regression? It is the F-test for the null hypothesis that the coefficients on all the x-variables are zero.

• How would you report your regression results? See the guidelines in Section 4.6 of the textbook.

• Why would you care about the asymptotic properties of the OLS estimators? When MLR6 does not hold, the finite-sample distribution of the OLS estimators is not available. For reasonably large samples (large n), we use the asymptotic distribution of the OLS estimators, which is known from studying their asymptotic properties, to approximate the finite-sample distribution, which is unknown.

• Comparing the inference procedures in Chapter 5 with those in Chapter 4, can you list the similarities and differences? It is beneficial to make such a list by yourself.

• Under MLR1-MLR5, the OLS estimators are consistent, asymptotically normal, and asymptotically efficient. Try to explain these properties in your own words. This is really a test of your understanding of these notions in econometrics.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 4.6
• (i) With df = n − 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because

each test is two-tailed, the critical value is 1.987. The t statistic for H0: β0 = 0 is about −.89, which is much less than 1.987 in absolute value. Therefore, we fail to reject H0: β0 = 0. The t statistic for H0: β1 = 1 is (.976 − 1)/.049 ≈ −.49, which is even less significant. (Remember, we reject H0 in favor of H1 in this case only if |t| > 1.987.)

• (ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions, and the df in the unrestricted model is 86. We are given SSRr = 209,448.99 and SSRur = 165,644.51. Therefore, F = [(209,448.99 − 165,644.51)/2]/[165,644.51/86] ≈ 11.37,


which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is 4.85.

• (iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions, and there are 88 − 5 = 83 df in the unrestricted model. The F statistic is F = [(.829 − .820)/3]/[(1 − .829)/83] ≈ 1.46. The 10% critical value (again using 90 denominator df in Table G.3a) is 2.15, so we fail to reject H0 at even the 10% level. In fact, the p-value is about .23.

• (iv) If heteroskedasticity were present, Assumption MLR5 would be violated, and the F statistic would not have an F distribution under the null hypothesis. Therefore, comparing the F statistic against the usual critical values, or obtaining the p-value from the F distribution, would not be especially meaningful.
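Both F computations above can be reproduced in a few lines; this is a sketch using the numbers quoted in parts (ii) and (iii):

```python
def f_from_ssr(ssr_r, ssr_ur, q, df_ur):
    """F statistic from restricted and unrestricted SSRs."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

def f_from_r2(r2_r, r2_ur, q, df_ur):
    """Equivalent R-squared form of the F statistic."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)

f_ii = f_from_ssr(209448.99, 165644.51, q=2, df_ur=86)  # part (ii)
f_iii = f_from_r2(0.820, 0.829, q=3, df_ur=83)          # part (iii)
print(round(f_ii, 2), round(f_iii, 2))  # about 11.37 and 1.46
```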

Q2. Wooldridge 4.8
• (i) We use Property VAR.3 from Appendix B:

Var(β̂1 − 3β̂2) = Var(β̂1) + 9·Var(β̂2) − 6·Cov(β̂1, β̂2).
• (ii) t = (β̂1 − 3β̂2 − 1)/se(β̂1 − 3β̂2), so we need the standard error of β̂1 − 3β̂2.
• (iii) Because θ = β1 − 3β2, we can write β1 = θ + 3β2. Plugging this into the population model

gives y = β0 + (θ + 3β2)x1 + β2x2 + β3x3 + u = β0 + θx1 + β2(3x1 + x2) + β3x3 + u. This last equation is what we would estimate, by regressing y on x1, 3x1 + x2, and x3. The coefficient and standard error on x1 are what we want.
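The re-parameterisation can be verified numerically: the coefficient on x1 in the transformed regression equals β̂1 − 3β̂2 from the original one. Here is a sketch with simulated data (assuming NumPy; all numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.8 * x1 + 0.5 * x2 - 0.3 * x3 + rng.normal(size=n)

def ols(y, *cols):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Original regression: y on x1, x2, x3
b0, b1, b2, b3 = ols(y, x1, x2, x3)

# Re-parameterised regression: y on x1, 3*x1 + x2, x3
a0, theta, a2, a3 = ols(y, x1, 3 * x1 + x2, x3)

# The coefficient on x1 is now theta = b1 - 3*b2, exactly
print(theta, b1 - 3 * b2)
```

The second regression spans the same column space as the first, so the remaining coefficients (a2, a3) are unchanged; only the coefficient on x1 is re-interpreted.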

Q3. Wooldridge 4.10
• (i) We need to compute the F statistic for the overall significance of the regression, with n =

142 and k = 4: F = [.0395/(1 − .0395)](137/4) ≈ 1.41. The 5% critical value with 4 and 120 df is 2.45, which is well above the value of F. Therefore, we fail to reject H0: β1 = β2 = β3 = β4 = 0 at the 10% level. No explanatory variable is individually significant at the 5% level. The largest absolute t statistic is that on dkr, tdkr ≈ 1.60, which is not significant at the 5% level against a two-sided alternative.

• (ii) The F statistic (with the same df) is now [.0330/(1 − .0330)](137/4) ≈ 1.17, which is even lower than in part (i). None of the t statistics is significant at a reasonable level.

• (iii) We probably should not use the logs, as the logarithm is not defined for firms that have zero for dkr or eps. Therefore, we would lose some firms in the regression.

• (iv) It seems very weak. There are no significant t statistics at the 5% level (against a two-sided alternative), and the F statistics are insignificant in both cases. Plus, less than 4% of the variation in return is explained by the independent variables.
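The overall-significance statistics in (i) and (ii) come from the R-squared form of the F statistic with q = k; a quick sketch using the numbers above:

```python
def overall_f(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero."""
    return (r2 / (1 - r2)) * ((n - k - 1) / k)

f_i = overall_f(0.0395, n=142, k=4)   # part (i)
f_ii = overall_f(0.0330, n=142, k=4)  # part (ii)
print(round(f_i, 2), round(f_ii, 2))  # about 1.41 and 1.17
```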

Q4. Wooldridge C4.9 (discrim_c4_9.do)
• (i) The results from the OLS regression, with standard errors in parentheses, are

log(psoda)-hat = −1.46 + .073 prpblck + .137 log(income) + .380 prppov, (.29) (.031) (.027) (.133), n = 401, R² = .087.

• The p-value for testing H0: β1 = 0 against the two-sided alternative is about .018, so we reject H0 at the 5% level but not at the 1% level.

• (ii) The correlation is about −.84, indicating a strong degree of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for log(income) is about 5.1, and that for prppov is about 2.86 (two-sided p-value = .004).

• (iii) The OLS regression results when log(hseval) is added are: log(psoda)-hat = −.84 + .098 prpblck − .053 log(income) + .052 prppov + .121 log(hseval), (.29) (.029) (.038) (.134) (.018),


n = 401, R² = .184. The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about .12 percent. The two-sided p-value is zero to three decimal places.

• (iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the 15% significance level against a two-sided alternative for log(income), and prppov does not have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at the 5% level because the outcome of the F(2,396) statistic is about 3.52, with p-value = .030. All of the control variables – log(income), prppov and log(hseval) – are highly correlated, so it is not surprising that some are individually insignificant.

• (v) Because the regression in (iii) contains the most controls, log(hseval) is individually significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable. It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if prpblck increases by .10, psoda is estimated to increase by 1%, other factors held fixed.

Q5. Wooldridge 5.2
• This is about the inconsistency of the simple regression of pctstck on funds. A higher

tolerance of risk means more willingness to invest in the stock market, so β2 > 0. By assumption, funds and risktol are positively correlated. Now we use equation (5.5) with δ1 > 0: plim(β̃1) = β1 + β2δ1 > β1, so β̃1 has a positive inconsistency (asymptotic bias). This makes sense: if we omit risktol from the regression and it is positively correlated with funds, some of the estimated effect of funds is actually due to the effect of risktol.

Q6. Wooldridge C5.1 (wage1_c5_1.do)
• (i) The estimated equation is

wage-hat = −2.87 + .599 educ + .022 exper + .169 tenure, (.73) (.051) (.012) (.022), n = 526, R² = .306, σ̂ = 3.085.

• Below is a histogram of the 526 residuals ûi, i = 1, 2, ..., 526. The histogram uses 27 bins, which is suggested by the formula in the Stata manual for 526 observations. For comparison, the normal distribution that provides the best fit to the histogram is also plotted.

[Histogram of the wage-equation residuals with the best-fitting normal density overlaid]


• (ii) With log(wage) as the dependent variable, the estimated equation is: log(wage)-hat = .284 + .092 educ + .0041 exper + .022 tenure, (.104) (.007) (.0017) (.003), n = 526, R² = .316, σ̂ = .441. The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below.

• (iii) The residuals from the log(wage) regression appear to be more normally distributed.

Certainly, the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed to the left. In the wage regression there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.

• The skewness and kurtosis of the residuals from an OLS regression may be used to test the null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null distribution of the Jarque-Bera test statistic (which takes into account both skewness and kurtosis) is approximately chi-squared with 2 degrees of freedom. Hence, at the 5% level, we reject the normality of u whenever the JB statistic is greater than 5.99. In Stata this test is carried out by the command "sktest". With the data in WAGE1.RAW, the JB statistic is extremely large for the wage model and JB = 10.59 for the log(wage) model. While normality is rejected for both models, we do see that the JB statistic is much smaller for the log(wage) model [also compare its p-values for the skewness and excess-kurtosis tests against those from the wage model]. Hence, normality appears to be a reasonable approximation to the distribution of the error term in the log(wage) model.
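The JB statistic referred to above combines the sample skewness S and kurtosis K as JB = n[S²/6 + (K − 3)²/24]. Below is a minimal pure-Python sketch on a made-up sample (note that Stata's sktest is a related, but not identical, test):

```python
def jarque_bera(x):
    """JB = n * (S**2 / 6 + (K - 3)**2 / 24) from sample moments."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)

# A perfectly symmetric sample has skewness exactly 0, so JB here
# reflects only the kurtosis term; reject normality at 5% if JB > 5.99.
jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(jb)
```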


Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the advantages of using the log of a variable in regression? Find the "rules of

thumb" for taking logs. See page 191 of Wooldridge (pp. 198-199 for the 3rd edition).

• Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithm. Do you remember Table 2.3? Consult Table 2.3.

• How do you compute the change in y caused by Δx when the model is built for log(y)? A precise computation is given by Equation (6.8) of Wooldridge.

• Why do we need "interaction" terms in regression models? An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for an example.

• What is the adjusted R-squared? What is the difference between it and the R-squared? The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added, but the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.

• How do you construct an interval prediction for given x-values? This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.

• How do you predict y for given x-values when the model is built for log(y)? Check the list on page 212 (p. 220 for the 3rd edition).

• What is involved in "residual analysis"? See pages 209-210 (pp. 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 6.4
• (i) Holding all other factors fixed, we have

Δlog(wage) = β1Δeduc + β2Δ(educ·pareduc) = (β1 + β2·pareduc)Δeduc. Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if we think a child gets more out of another year of education the more highly educated are the child's parents.

• (ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 − 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)

• (iii) When we add pareduc by itself, the coefficient on the interaction term becomes negative. The t statistic on educ·pareduc is about −1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2. Wooldridge 6.5
• This would make little sense. Performances on math and science exams are measures of

outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold


performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The science score (the variable scill) could itself be a dependent variable in an identical regression equation.

Q3. Wooldridge C6.3 (C6.2 for the 3rd Edition) (wage2_c6_3.do)
• (i) Holding exper (and the elements in u) fixed, we have

Δlog(wage) = β1Δeduc + β3Δ(educ·exper) = (β1 + β3·exper)Δeduc, or Δlog(wage)/Δeduc = β1 + β3·exper. This is the approximate proportionate change in wage given one more year of education.

• (ii) H0: β3 = 0. If we think that education and experience interact positively – so that people with more experience are more productive when given another year of education – then β3 > 0 is the appropriate alternative.

• (iii) The estimated equation is: log(wage)-hat = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper, (.24) (.0174) (.0200) (.00153), n = 935, R² = .135, adjusted R² = .132. The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.

• (iv) We rewrite the equation as log(wage) = β0 + θ1educ + β2exper + β3educ·(exper − 10) + u, and run the regression of log(wage) on educ, exper, and educ·(exper − 10). We want the coefficient on educ. We obtain θ̂1 ≈ .0761 and se(θ̂1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.

Q4. Wooldridge C6.8 (hprice1_c6_8.do)
• (i) The estimated equation (where price is in dollars) is

price-hat = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms, (29,475.0) (.642) (13.24) (9,010.1), n = 88, R² = .672, adjusted R² = .661, σ̂ = 59,833. The predicted price at lotsize = 10,000, sqrft = 2,300 and bdrms = 4 is about $336,714.

• (ii) The regression is pricei on (lotsizei − 10,000), (sqrfti − 2,300), and (bdrmsi − 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.

• (iii) We must use equation (6.36) to obtain the standard error of ê0 and then use equation (6.37) (assuming that price is normally distributed). From the regression in part (ii), se(ŷ0) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile of the t(84) distribution gives the 95% CI for price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval, but we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
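The standard-error combination and the 95% prediction interval in (iii) can be reproduced from the quantities quoted above:

```python
se_yhat = 7374.5     # se of the predicted mean price, from the regression in (ii)
sigma_hat = 59833.0  # estimated error standard deviation
t975 = 1.99          # approximate 97.5th percentile of the t(84) distribution

# se of the prediction error combines both sources of uncertainty
se_e0 = (se_yhat ** 2 + sigma_hat ** 2) ** 0.5

price0 = 336706.7
lo, hi = price0 - t975 * se_e0, price0 + t975 * se_e0
print(round(se_e0), round(lo), round(hi))  # about 60286, 216738, 456675
```

Because σ̂ dwarfs se(ŷ0), the prediction interval is driven almost entirely by the unexplained variation in price.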


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is a qualitative factor? Give some examples.

This should be easy.
• How do you use dummy variables to represent qualitative factors?

A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy. For example, a factor with 5 values requires 4 dummy variables.

• How do you use dummy variables to represent an ordinal variable? An ordinal variable may be represented by dummies in the same way as a multi-level factor.

• How do you test for differences in regression functions across different groups? We generally use the F test on properly-defined restricted and unrestricted models. The Chow test is a special case of the F test where the null hypothesis is no difference across groups.

• What can you achieve by interacting group dummy variables with other regressors? The interactions allow the slope coefficients to differ across groups.

• What is "program evaluation"? It assesses the effectiveness of a program by checking the differences in regression coefficients between the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary? The PRF is the conditional probability of "success" for given explanatory variables. The SRF is the predicted conditional probability of "success" for given explanatory variables.

• What are the shortcomings of the LPM?
The major drawback of the LPM is that the predicted probability of success may fall outside the interval [0,1].

• Does MLR5 hold for the LPM?
No. As indicated in the textbook and lectures, the conditional variance of the disturbance is not constant and depends on the explanatory variables: Var(u|x) = E(y|x)[1 − E(y|x)].

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 7.4
• (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant.
• (ii) 100·[exp(−.283) − 1] ≈ −24.7%, and so the estimate is somewhat smaller in magnitude.
• (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is
log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility + δ3trans + u,
where trans is a dummy variable for the transportation industry. Now the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.
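The conversion in part (ii) — from a log-salary dummy coefficient to an exact percentage difference — is easy to verify directly (a quick Python sketch, not part of the original solutions):

```python
import math

# For a dummy coefficient beta-hat in a log(y) model, the approximate
# percentage difference is 100*beta-hat; the exact one is 100*[exp(beta-hat) - 1].
beta_utility = -0.283
approx_pct = 100 * beta_utility                   # approximate: -28.3%
exact_pct = 100 * (math.exp(beta_utility) - 1)    # exact: smaller in magnitude
print(round(approx_pct, 1), round(exact_pct, 1))
```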

Q2 Wooldridge 7.6
• In Section 3.3 – in particular, in the discussion surrounding Table 3.2 – we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or, we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3 Wooldridge 7.9
• (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.
• (ii) Setting f0(z*) = f1(z*) gives β0 + β1z* = (β0 + δ0) + (β1 + δ1)z*. Therefore, provided δ1 ≠ 0, we have z* = −δ0/δ1. Clearly, z* is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.
• (iii) Using part (ii), we have totcoll* = .357/.030 = 11.9 years (totcoll = total college years).
• (iv) The estimated years of college where women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed – not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.
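Parts (ii)–(iv) amount to a few lines of arithmetic, sketched below in Python (illustrative only; the estimates are the ones quoted in the solution):

```python
import math

# Where two group-specific lines cross: z* = -delta0/delta1.
delta0, delta1 = -0.357, 0.030    # intercept and slope shifts for women
z_star = -delta0 / delta1         # years of college where the gap would close

gap_at_4 = delta0 + delta1 * 4    # log-wage gap at four years of college
pct_gap = 100 * (math.exp(gap_at_4) - 1)   # exact percentage gap
print(round(z_star, 1), round(gap_at_4, 3), round(pct_gap, 1))
```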

Q4 Wooldridge C7.13 (apple_c7_13.do)
• (i) 412/660 ≈ .624.
• (ii) The OLS estimates of the LPM are
ecobuy-hat = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize
                     (.165)  (.109)               (.132)              (.00053)               (.013)
                  + .025 educ − .00050 age
                     (.008)            (.00125)
n = 660, R² = .110.
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)
• (iii) The F test, with 4 and 653 df, is 4.43 with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children – other factors equal – the couple with two children has a .048 higher probability of buying eco-labeled apples.
• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in the functional form.) The coefficient on log(faminc) is about .045 (t statistic = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.
• (vi) Using the standard prediction rule – predict one when ecobuy-hat ≥ .5 and zero otherwise – gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
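The 0.5 prediction rule and the per-group fractions correctly predicted can be sketched on hypothetical data (toy values, not the apple data set):

```python
# Hypothetical LPM fitted probabilities and observed binary outcomes.
fitted = [0.2, 0.6, 0.8, 0.4, 0.55, 0.3]
actual = [0,   1,   1,   1,   0,    0]

# Standard rule: predict 1 when the fitted probability is at least 0.5.
predicted = [1 if p >= 0.5 else 0 for p in fitted]

def pct_correct(group):
    """Fraction correctly predicted among observations with y = group."""
    idx = [i for i, y in enumerate(actual) if y == group]
    return sum(predicted[i] == group for i in idx) / len(idx)

print(pct_correct(0), pct_correct(1))
```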


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is heteroskedasticity in a regression model?
When homoskedasticity (MLR5) fails, and the variance of the disturbance (u) changes across observations (indexed by i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with OLS under heteroskedasticity?
The usual t-stat and F-stat are not valid, because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator is no longer (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA?
These are corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option "robust" with the "regress" command, e.g., regress lwage educ, robust.

• How do you detect whether there is heteroskedasticity?
The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

• If heteroskedasticity is present in a known form, how would you estimate the model?
In this case, the WLS estimator should be used, as it is more efficient than the OLS estimator. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model?
If there is strong evidence for heteroskedasticity, the FGLS estimator should be used, in which the unknown variance function is modelled with an exponential functional form. See Section 8.4 for details.

• What are the steps in the FGLS estimation?
You should summarise them from Section 8.4.

• How would you handle the heteroskedasticity of the LPM?
The heteroskedasticity has a known functional form for the LPM: Var(u|x) = p(x)[1 − p(x)]. Hence, WLS can be used in principle. However, because the LPM can produce predicted probabilities outside the interval (0,1), the known functional form p(x)[1 − p(x)] may not be usable for WLS. In that case, it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2 Wooldridge 8.2
• With Var(u|inc, price, educ, female) = σ²·inc², h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc:
beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.


Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.
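The transformation can be sketched numerically for one hypothetical observation (made-up values; not from a textbook data set):

```python
# Dividing every term by sqrt(h(x)) = inc makes the error homoskedastic
# when Var(u|x) = sigma^2 * inc^2.
obs = {"beer": 20.0, "inc": 40.0, "price": 3.0, "educ": 12.0, "female": 1.0}

inc = obs["inc"]
transformed = {
    "beer/inc": obs["beer"] / inc,
    "1/inc": 1.0 / inc,        # the old intercept beta0 now multiplies 1/inc
    "const": 1.0,              # the old inc slope beta1 becomes the intercept
    "price/inc": obs["price"] / inc,
    "educ/inc": obs["educ"] / inc,
    "female/inc": obs["female"] / inc,
}
print(transformed)
```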

Q3 Wooldridge 8.3
• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has a smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4 Wooldridge 8.5
• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.
• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.
• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.
• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.
• (v) We just plug the values of the independent variables into the OLS regression line:
smokes-hat = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052.
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)

Q5 Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:
e401k-hat = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .035 male
                   (.081)   (.0006)           (.000005)           (.0039)         (.00005)        (.121)
                   [.079]   [.0006]           [.000005]           [.0038]         [.00004]        [.121]
n = 9,275, R² = .094.
There are no important differences; if anything, the robust standard errors are smaller.
• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model u² = δ0 + δ1p(x) + δ2p(x)² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = −1. Remember that, for the LPM, the fitted values, ŷ, are estimates of p(x). So, when we run the regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.
• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on e401k-hat is about 1.010, the coefficient on e401k-hat² is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are
e401k-hat = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .055 male
                   (.076)   (.0005)           (.000004)           (.0037)         (.00004)        (.117)
n = 9,275, R² = .108.
There are no important differences with the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)
• What is functional form misspecification? What is its major consequence in regression analysis?
As the word "misspecification" suggests, it is a mis-specified model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence, the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification?
The RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). The RESET is based on the F test for the joint significance of the squared and cubed predicted values (ŷ² and ŷ³) in an "expanded" model that includes all the original regressors as well as ŷ² and ŷ³, where the latter represent possible nonlinear functions of the regressors.

• What are nested models? And nonnested models?
Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another?
The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test for two nonnested models?
There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models, and decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis?
A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved variable and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, depends only on the proxy.

• Can you analyse the consequences of measurement errors?
With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis?
Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?


Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR1, MLR3 and MLR4, and hence does not introduce bias in the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable, and causes bias in the OLS estimation.

• What are outliers and influential observations?
Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of parameters. They are called influential observations if the estimation results with them included are significantly different from the results with them excluded.

• Consider the simple regression model yi = α + (β + bi)xi + ui, for i = 1, 2,…, n, where the "slope parameter" contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR1-5 hold for yi = α + βxi + ui, with E(bi|xi) = 0 and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference?
The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui). Given the stated assumptions, MLR1-4 hold for this model, and the OLS estimator of the "slope parameter" is unbiased (from β). However, MLR5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui|xi), depends on xi, and the usual OLS standard errors will not be valid.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 9.1
• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test: F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] ≈ 2.97. With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
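The R-squared form of the F statistic used here is a one-liner to compute (an illustrative Python sketch; the inputs are the numbers quoted above):

```python
# F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/df_ur],
# where q is the number of restrictions and df_ur the unrestricted df.
def f_stat(r2_ur, r2_r, q, df_ur):
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)

# Numbers from Q1: q = 2 restrictions, unrestricted df = 177 - 8 = 169.
print(round(f_stat(0.375, 0.353, 2, 177 - 8), 2))
```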

Q2 Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.
• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So, when we control for the poverty rate, the effect of spending falls.
• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.
• (v) In column (1), we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) are much more important in explaining student performance than are spending per student or other school characteristics.

Q3 Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.
• (ii) The simple regression estimates using the 1988 data are
log(scrap) = .409 + .057 grant
                      (.241)   (.406)
n = 54, R² = .0004.
The coefficient on grant is actually positive, but not statistically different from zero.
• (iii) When we add log(scrap87) to the equation, we obtain
log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
                          (.089)  (.147)               (.044)
n = 54, R² = .873,
where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.
• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller than before, but it is still pretty significant.


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data?
The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations are dependent on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that, for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation?
A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (and hence has a distribution). The SP can be viewed as a "random curve", with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation, or autocorrelation?
The serial (or auto) correlation of a time series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted Corr(yt, yt−h) = Cov(yt, yt−h)/[Var(yt)Var(yt−h)]^(1/2).
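The sample analogue of this quantity can be sketched in a few lines (an illustrative Python sketch on made-up data; under stationarity the two variances in the denominator are usually replaced by the single overall sample variance, as done here):

```python
# Sample autocorrelation of a stored series y at lag h.
def sample_autocorr(y, h):
    n = len(y)
    ybar = sum(y) / n
    cov = sum((y[t] - ybar) * (y[t - h] - ybar) for t in range(h, n)) / n
    var = sum((yt - ybar) ** 2 for yt in y) / n
    return cov / var

y = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0]   # hypothetical series
print(round(sample_autocorr(y, 1), 3))
```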

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)?
Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.
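As a small numerical reminder (an illustrative sketch, not from the slides), the LRP in a finite distributed lag model is just the sum of the lag coefficients:

```python
# In y_t = alpha0 + delta0*z_t + delta1*z_{t-1} + ... + delta_q*z_{t-q} + u_t,
# the long-run propensity is LRP = delta0 + delta1 + ... + delta_q.
deltas = [0.343, 0.382]   # e.g. the two inf coefficients from Q4 below
lrp = sum(deltas)
print(round(lrp, 3))
```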

• What are TS1-6 (assumptions about time series regression)? How do they differ from the assumptions in MLR1-6?
TS1-6 include: 1) linearity in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6 mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of OLS standard errors, t-stat and F-stat for statistical inference, TS3-TS6 are needed. These are very strong assumptions, and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors?
Strictly exogenous regressors satisfy assumption TS3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS3.

• What is a trending time series? What is a time trend?
A time series is trending if it has the tendency to grow (or shrink) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results?
First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious, because it is purely induced by the time trends and has little to say about the true relationship between the two time series.

• Why would you include a time trend in regressions with trending variables?
Including a time trend in regressions allows one to study the relationship among time series variables that is not induced by the time trends in the time series.

• What is seasonality in a time series? Give an example of a time series variable with seasonality.
Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have week-day seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
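The construction above can be sketched as (illustrative Python, not course material):

```python
# Quarterly seasonal dummies with the first quarter as the base group.
def quarter_dummies(q):
    """For q in {1, 2, 3, 4}, return the (Q2, Q3, Q4) dummy values."""
    return tuple(int(q == j) for j in (2, 3, 4))

# Each quarter maps to a distinct pattern; Q1 (the base) is all zeros.
print([quarter_dummies(q) for q in (1, 2, 3, 4)])
```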

Problem Set

Q1 Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated – such as stock returns – do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.
• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no serial correlation assumptions.
• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.
• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2
• We follow the hint and write
gGDP_{t−1} = α0 + δ0 int_{t−1} + δ1 int_{t−2} + u_{t−1},
and plug this into the right-hand side of the int_t equation:
int_t = γ0 + γ1(α0 + δ0 int_{t−1} + δ1 int_{t−2} + u_{t−1} − 3) + v_t
        = (γ0 + γ1α0 − 3γ1) + γ1δ0 int_{t−1} + γ1δ1 int_{t−2} + γ1u_{t−1} + v_t.
Now, by assumption, u_{t−1} has zero mean and is uncorrelated with all right-hand-side variables in the previous equation, except itself of course. So Cov(int_t, u_{t−1}) = E(int_t·u_{t−1}) = γ1E(u_{t−1}²) > 0, because γ1 > 0. If σ² = E(u_t²) for all t, then Cov(int_t, u_{t−1}) = γ1σ². This violates the strict exogeneity assumption TS3. While u_t is uncorrelated with int_t, int_{t−1}, and so on, u_t is correlated with int_{t+1}.


Q3 Wooldridge 10.7
• (i) pet−1 and pet−2 must be increasing by the same amount as pet.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pet−2, pet−1 and pet by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.
• (ii) The equation with the lags is
i3_t = 1.61 + .343 inf_t + .382 inf_{t−1} − .190 def_t + .569 def_{t−1}
          (.40)    (.125)           (.134)               (.221)          (.197)
n = 55, R² = .685, adjusted R² = .660.
• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than the .606 that we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inf_{t−1}.
• (iv) The F statistic for joint significance of inf_{t−1} and def_{t−1} is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.


Week 3 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• The minimum requirement for OLS to be carried out for the data set (xi, yi), i = 1,…,n, with the sample size n > 2, is that the sample variance of x is positive. In what circumstances is the sample variance of x zero?
When all observations on x have the same value, there is no variation in x.

• The OLS estimation of the simple regression model has the following properties: a) the sum of the residuals is zero; b) the sample covariance of the residuals and x is zero. Why? How would you relate them to the "least squares" principle?
These are derived from the first-order conditions for minimising the sum of squared residuals (SSR).

• Convince yourself that the point (x̄, ȳ), the sample means of x and y, is on the sample regression function (SRF), which is a straight line.
This can be derived from the first of the first-order conditions for minimising the SSR.

• How do you know that SST = SSE + SSR is true?
The dependent variable values can be expressed as the fitted value plus the residual. After taking the average away from both sides of the equation, we find (yi − ȳ) = (ŷi − ȳ) + ûi. The required "SST = SSE + SSR" is obtained by squaring and summing both sides of the above equation, noting (ŷi − ȳ) = β̂1(xi − x̄), and using the second of the first-order conditions for minimising the SSR.
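The decomposition can also be verified numerically on a tiny example (an illustrative Python sketch with made-up data, using the textbook formulas for the OLS estimates):

```python
# Hypothetical data for a simple regression of y on x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# OLS estimates: b1 = Sxy/Sxx, b0 = ybar - b1*xbar.
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

sst = sum((yi - ybar) ** 2 for yi in y)    # total sum of squares
sse = sum((fi - ybar) ** 2 for fi in fitted)   # explained sum of squares
ssr = sum(ui ** 2 for ui in resid)         # residual sum of squares
print(round(sst, 6), round(sse + ssr, 6))  # the two should agree
```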

• Which of the following models is (are) nonlinear model(s)?
a) sales = β0[1 + exp(−β1 ad_expenditure)] + u
b) sales = β0 + β1 log(ad_expenditure) + u
c) sales = β0 + β1 exp(ad_expenditure) + u
d) sales = exp(β0 + β1 ad_expenditure + u)
The model in a) is nonlinear.
• Can you follow the proofs of Theorems 2.1-2.3?
You should be able to follow the proofs. If not at this point, try again.

Problem Set
Q1 Wooldridge 2.4
• (i) When cigs = 0, predicted birth weight is 119.77 ounces. When cigs = 20, bwght-hat = 109.49. This is about an 8.6% drop.
• (ii) Not necessarily. There are many other factors that can affect birth weight, particularly overall health of the mother and quality of prenatal care. These could be correlated with cigarette smoking during pregnancy. Also, something such as caffeine consumption can affect birth weight, and might also be correlated with cigarette smoking.
• (iii) If we want a predicted bwght of 125, then cigs = (125 − 119.77)/(−.514) ≈ −10.18, or about −10 cigarettes. This is nonsense, of course, and it shows what happens when we are trying to predict something as complicated as birth weight with only a single explanatory variable. The largest predicted birth weight is necessarily 119.77. Yet almost 700 of the births in the sample had a birth weight higher than 119.77.
• (iv) 1,176 out of 1,388 women did not smoke while pregnant, or about 84.7%. Because we are using only cigs to explain birth weight, we have only one predicted birth weight at cigs = 0. The predicted birth weight is necessarily roughly in the middle of the observed birth weights at cigs = 0, and so we will under predict high birth weights.

Q2. Wooldridge 2.7
• (i) When we condition on inc in computing an expectation, inc becomes a constant. So E(u|inc) = E(√inc·e|inc) = √inc·E(e|inc) = √inc·0 = 0, because E(e|inc) = E(e) = 0.
• (ii) Again, when we condition on inc in computing a variance, inc becomes a constant. So Var(u|inc) = Var(√inc·e|inc) = (√inc)²·Var(e|inc) = σε²·inc, because Var(e|inc) = σε².
• (iii) Families with low incomes do not have much discretion about spending; typically, a low-income family must spend on food, clothing, housing, and other necessities. Higher-income families have more discretion, and some might choose more consumption while others choose more saving. This discretion suggests wider variability in saving among higher-income families.
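The form of heteroskedasticity in (ii) can be checked by simulation. A small sketch (the values of sigma_e and the income levels are hypothetical):

```python
# Simulation check of Q2(ii): with u = sqrt(inc)*e and e ~ N(0, sigma_e^2),
# the conditional variance of u is sigma_e^2 * inc, so it grows with income.
import random
random.seed(1)

sigma_e = 2.0                      # hypothetical: Var(e) = 4
for inc in (100.0, 400.0):         # two hypothetical income levels
    u = [(inc ** 0.5) * random.gauss(0.0, sigma_e) for _ in range(200000)]
    var_u = sum(x * x for x in u) / len(u)   # E(u|inc) = 0, so this estimates Var(u|inc)
    # the sample variance should be close to sigma_e^2 * inc (400 and 1600 here)
    print(round(var_u), sigma_e ** 2 * inc)
```

Doubling √inc scales every draw of u, which is exactly why the variance scales linearly in inc.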

Q3. Wooldridge C2.6 (meap93_ch02.do)
• (i) It seems plausible that another dollar of spending has a larger effect for low-spending schools than for high-spending schools. At low-spending schools, more money can go toward purchasing more books and computers and toward hiring better qualified teachers. At high levels of spending, we would expect little, if any, effect because the high-spending schools already have high-quality teachers, nice facilities, plenty of books, and so on.
• (ii) If we take changes, as usual, we obtain Δmath10 = β1·Δlog(expend) ≈ (β1/100)(%Δexpend), just as in the second row of Table 2.3. So, if %Δexpend = 10, then Δmath10 = β1/10.
• (iii) The regression results are math10 = −69.34 + 11.16 log(expend), n = 408, R² = .0297.
• (iv) If expend increases by 10 percent, math10 increases by about 1.1 percentage points. This is not a huge effect, but it is not trivial for low-spending schools, where a 10 percent increase in spending might be a fairly small dollar amount.
• (v) In this data set, the largest value of math10 is 66.7, which is not especially close to 100. In fact, the largest fitted value is only about 30.2.

Q4. Wooldridge C2.7 (charity_ch02.do)
• (i) The average gift is about 7.44 Dutch guilders. Out of 4,268 respondents, 2,561 did not give a gift, or about 60 percent.
• (ii) The average number of mailings per year is about 2.05. The minimum value is .25 (which means one mailing every four years) and the maximum value is 3.5.
• (iii) The estimated equation is gift = 2.01 + 2.65 mailsyear, n = 4268, R² = .0138.
• (iv) The slope coefficient from part (iii) means that each mailing per year is associated with – perhaps even "causes" – an estimated 2.65 additional guilders, on average. Therefore, if each mailing costs one guilder, the expected profit from each mailing is estimated to be 1.65 guilders. This is only the average, however. Some mailings generate no contributions, or a contribution less than the mailing cost; other mailings generate much more than the mailing cost.
• (v) Because the smallest mailsyear in the sample is .25, the smallest predicted value of gift is 2.01 + 2.65(.25) ≈ 2.67. Even if we look at the overall population, where some people have received no mailings, the smallest predicted value is about two. So, with this estimated equation, we never predict zero charitable gifts.


Week 4 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What do we mean when we say "regress wage on educ and exper"?
Use OLS to estimate the multiple linear regression model wage = β0 + β1educ + β2exper + u.
• Why, and under what circumstance, do we need to "control for" exper in the regression model in order to quantify the effect of educ on wage?
We need to control for exper in the model when the partial (or ceteris paribus) effect of educ on wage is the parameter of interest and exper and educ are correlated.
• What is the bias of an estimator?
bias = E(β̂j) − βj.
• What is the "omitted variable bias"?
It is the bias that results from omitting relevant variables that are correlated with the explanatory variables in a regression model.
• What is the consequence of adding an irrelevant variable to a regression model?
Including an irrelevant variable will generally increase the variance of the OLS estimators.
• What is the requirement of the ZCM assumption, in your own words?
You must do this by yourself.
• Why is assuming E(u) = 0 not restrictive when an intercept is included in the regression model?
When the mean value of the disturbance is not zero, you can always define the new disturbance as the original disturbance minus the mean value, and define the new intercept as the original intercept plus the mean value.
• In terms of notation, why do we need two subscripts for independent variables?
In our notation, the first subscript indexes observations (1st obs, 2nd obs, etc.) and the second subscript indexes variables (1st variable, 2nd variable, etc.).
• Using OLS to estimate a multiple regression model with k independent variables and an intercept, how many first-order conditions are there?
k + 1.
• How do you know that the OLS estimators are linear combinations of the observations on the dependent variable?
By the first-order conditions, we see that the OLS estimators are the solution to k + 1 linear equations with k + 1 unknowns. The "right-hand sides" of the equations are linear functions of the observations on the dependent variable (say, y). Hence the solution is a set of linear combinations of the y-observations.

• What is the Gauss-Markov theorem?
Read the textbook.
• Try to explain why R² never decreases (and is likely to increase) when additional explanatory variables are added to the regression model.
First, the smaller the SSR, the greater the R-squared. OLS chooses parameter estimates to minimise the SSR. With extra explanatory variables, the R-squared does not decrease because the SSR does not increase, which is true because OLS has a larger parameter space from which to choose estimates (more possibilities to make the SSR small).
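The argument can be checked with a small simulation (Python, made-up data; the course do-files use Stata). Even a pure-noise regressor cannot lower the R-squared:

```python
# Sketch: R-squared cannot fall when a regressor is added, because OLS
# minimises SSR over a larger parameter space in the bigger model.
import numpy as np
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                  # pure noise, unrelated to y
y = 1 + 2 * x1 + rng.normal(size=n)

def r_squared(y, X):
    # OLS via least squares on [1, X]; returns 1 - SSR/SST
    X = np.column_stack([np.ones(len(y))] + X)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

r2_small = r_squared(y, [x1])
r2_big = r_squared(y, [x1, x2])
print(r2_big >= r2_small)                # True: SSR cannot increase
```

This is also why the adjusted R-squared (Week 7) is preferred for model comparison: the plain R-squared rewards any added regressor, relevant or not.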

• What is an endogenous explanatory variable? An exogenous explanatory variable?
See the top of page 88.
• What is multicollinearity, and what is its likely effect on the OLS estimators?
Read pages 95-99.


Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 3.2
• (i) Yes. Because of budget constraints, it makes sense that the more siblings there are in a family, the less education any one child in the family has. To find the increase in the number of siblings that reduces predicted education by one year, we solve 1 = .094(Δsibs), so Δsibs = 1/.094 ≈ 10.6.
• (ii) Holding sibs and feduc fixed, one more year of mother's education implies .131 years more of predicted education. So if a mother has four more years of education, her son is predicted to have about half a year (.524) more years of education.
• (iii) Since the number of siblings is the same, but meduc and feduc are both different, the coefficients on meduc and feduc both need to be accounted for. The predicted difference in education between B and A is .131(4) + .210(4) = 1.364.

Q2. Wooldridge 3.10
• (i) Because x1 is highly correlated with x2 and x3, and these latter variables have large partial effects on y, the simple and multiple regression coefficients on x1 can differ by large amounts. We have not done this case explicitly, but given equation (3.46) and the discussion with a single omitted variable, the intuition is pretty straightforward.
• (ii) Here we would expect β̃1 and β̂1 to be similar (subject, of course, to what we mean by "almost uncorrelated"). The amount of correlation between x2 and x3 does not directly affect the multiple regression estimate on x1 if x1 is essentially uncorrelated with x2 and x3.
• (iii) In this case we are (unnecessarily) introducing multicollinearity into the regression: x2 and x3 have small partial effects on y, and yet x2 and x3 are highly correlated with x1. Adding x2 and x3 likely increases the standard error of the coefficient on x1 substantially, so se(β̂1) is likely to be much larger than se(β̃1).
• (iv) In this case, adding x2 and x3 will decrease the residual variance without causing much collinearity (because x1 is almost uncorrelated with x2 and x3), so we should see se(β̂1) smaller than se(β̃1). The amount of correlation between x2 and x3 does not directly affect se(β̂1).

Q3. Wooldridge C3.2 (hprice1_ch03.do)
• (i) The estimated equation is price = −19.32 + .128 sqrft + 15.20 bdrms, n = 88, R² = .632.
• (ii) Holding square footage constant, Δprice = 15.20 × (increase in bdrms). Hence the price increases by 15.20, which means $15,200.
• (iii) Now Δprice = .128 × (increase in sqrft) + 15.20 × (increase in bdrms) = .128(140) + 15.20(1) = 33.12. That is, $33,120. Because the house size is also increasing, the effect is larger than that in (ii).
• (iv) About 63.2%.
• (v) The predicted price is −19.32 + .128(2,438) + 15.20(4) = 353.544, or $353,544.
• (vi) The residual = 300 − 353.544 = −53.544, that is, −$53,544. The buyer under-paid, judged by the predicted price. But, of course, there are many other features of a house (some that we cannot even measure) that affect price, and we have not controlled for those.

Q4. Wooldridge C3.6 (wage2_ch03_c3_6.do)
• (i) The slope coefficient from regressing IQ on educ is 3.53383.
• (ii) The slope coefficient from regressing log(wage) on educ is .05984.
• (iii) The slope coefficients from regressing log(wage) on educ and IQ are .03912 and .00586, respectively.
• (iv) The computed β̃1 = β̂1 + δ̃1β̂2 = .03912 + 3.53383(.00586) ≈ .05984.
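The relationship in (iv) is an exact algebraic identity, so it holds in any data set. A sketch with simulated data standing in for WAGE2 (all coefficient values below are made up):

```python
# Check of the omitted-variable identity: the simple-regression slope of
# log(wage) on educ equals b_educ + b_iq * delta1, where delta1 is the
# slope from regressing the omitted variable (IQ) on educ.
import numpy as np
rng = np.random.default_rng(42)
n = 500
educ = rng.normal(13, 2, n)
iq = 50 + 3.5 * educ + rng.normal(0, 10, n)          # IQ correlated with educ
lwage = 5 + 0.04 * educ + 0.006 * iq + rng.normal(0, 0.3, n)

def slopes(y, *xs):
    # OLS with an intercept; returns the slope coefficients only
    X = np.column_stack([np.ones(n)] + list(xs))
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

delta1, = slopes(iq, educ)            # IQ on educ
simple, = slopes(lwage, educ)         # log(wage) on educ (short regression)
b1, b2 = slopes(lwage, educ, iq)      # log(wage) on educ and IQ (long regression)
print(np.isclose(simple, b1 + b2 * delta1))   # True: identity holds exactly
```

This is the same decomposition that underlies the omitted variable bias formula: the short-regression slope picks up the long-regression slope plus the omitted variable's effect routed through its correlation with educ.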


Week 5 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the CLM assumptions?
These are MLR.1, MLR.2, MLR.3 and MLR.6. MLR.6 is a very strong assumption that implies both MLR.4 and MLR.5.
• What is the sampling distribution of the OLS estimators under the CLM assumptions?
The OLS estimators under the CLM assumptions follow the normal distribution.
• What are the standard errors of the OLS estimators?
The standard error is the square root of the estimated variance of the estimator.
• What is the null hypothesis about a parameter?
In this context, the null hypothesis is a statement that assigns a known value (or hypothesised value) to the parameter of interest. Usually, the null is a maintained proposition (from economic theories or experience) that will only be rejected on very strong evidence.
• What is a one-tailed (two-tailed) alternative hypothesis?
This should be clear once you finish your reading.
• In testing hypotheses, what is a Type 1 (Type 2) error? What is the level of significance?
A Type 1 error is the rejection of a true null. A Type 2 error is the non-rejection of a false null. The level of significance is the probability of a Type 1 error.
• The decision rule we use can be stated as "reject the null if the t-statistic exceeds the critical value". How is the critical value determined?
The critical value is determined by the level of significance (or the level of confidence, in the case of confidence intervals) and the distribution of the test statistic. For example, for the t-stat we use the t-distribution when the sample size is small, and the normal distribution when the sample size is large (df > 120).
• Justify the statement "Given the observed test statistic, the p-value is the smallest significance level at which the null hypothesis would be rejected."
If you used the observed t-stat as your critical value, you would just reject the null (because the t-stat is equal to this value). This "critical value" (= the t-stat value) implies a level of significance, say p, which is exactly the p-value. For any significance level smaller than p, the corresponding critical value is more extreme than the t-stat value and does not lead to the rejection of the null. Hence, p is the smallest significance level at which the null would be rejected.
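A concrete large-sample illustration (the t statistic below is hypothetical; with large df, normal critical values apply, so the normal CDF can be computed from the error function in the standard library):

```python
# The p-value as the smallest significance level at which H0 is rejected:
# for a two-sided test with a large-sample t statistic, p = P(|Z| > |t|).
import math

def norm_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

t_stat = 2.10                                   # hypothetical observed t-stat
p_value = 2 * (1 - norm_cdf(abs(t_stat)))       # about .036
# any alpha above p rejects; any alpha below p does not:
print(p_value < 0.05, p_value > 0.01)           # reject at 5%, not at 1%
```

So reporting the p-value conveys the test outcome for every significance level at once, which is why it is more informative than a single reject/fail-to-reject statement.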

• What is the 90% confidence interval for a parameter?
It is an interval [L, U] defined by two sample statistics: the lower bound L and the upper bound U. The probability that [L, U] covers the parameter is .90.
• In constructing a confidence interval for a parameter, what is the level of confidence?
It is the probability that the CI covers the parameter.
• When the level of confidence increases, how will the width of the confidence interval change (holding other things fixed)?
The width will increase.
• Try to convince yourself that the event "the 90% confidence interval covers a hypothesised value of the parameter" is the same as the event "the null of the parameter being the hypothesised value cannot be rejected in favour of the two-tailed alternative at the 10% level of significance".
It becomes apparent if you start with P(|t-stat| < c) = .90 and disentangle the expression for the t-stat.


Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 4.1
• (i) and (iii) generally cause the t statistics not to have a t distribution under H0. Homoskedasticity is one of the CLM assumptions. An important omitted variable violates Assumption MLR.4 (hence MLR.6) in general. The CLM assumptions contain no mention of the sample correlations among independent variables, except to rule out the case where the correlation is one (MLR.3).

Q2. Wooldridge 4.2
• (i) H0: β3 = 0. H1: β3 > 0.
• (ii) The proportionate effect on salary is .00024(50) = .012. To obtain the percentage effect, we multiply this by 100: 1.2%. Therefore, a 50-point ceteris paribus increase in ros is predicted to increase salary by only 1.2%. Practically speaking, this is a very small effect for such a large change in ros.
• (iii) The 10% critical value for a one-tailed test, using df = ∞, is obtained from Table G.2 as 1.282. The t statistic on ros is .00024/.00054 ≈ .44, which is well below the critical value. Therefore, we fail to reject H0 at the 10% significance level.
• (iv) Based on this sample, the estimated ros coefficient appears to be different from zero only because of sampling variation. On the other hand, including ros may not be causing any harm; it depends on how correlated it is with the other independent variables (although these are very significant even with ros in the equation).

Q3. Wooldridge 4.5
• (i) .412 ± 1.96(.094), or about [.228, .596].
• (ii) No, because the value .4 is well inside the 95% CI.
• (iii) Yes, because 1 is well outside the 95% CI.

Q4. Wooldridge C4.8 (401ksubs_ch04.do)
• (i) There are 2,017 single people in the sample of 9,275.
• (ii) The estimated equation is
nettfa = −43.04 + .799 inc + .843 age
(4.08) (.060) (.092)
n = 2017, R² = .119.
The coefficient on inc indicates that one more dollar in income (holding age fixed) is reflected in about 80 more cents in predicted nettfa; no surprise there. The coefficient on age means that, holding income fixed, if a person gets another year older, his/her nettfa is predicted to increase by about $843. (Remember, nettfa is in thousands of dollars.) Again, this is not surprising.
• (iii) The intercept is not very interesting, as it gives the predicted nettfa for inc = 0 and age = 0. Clearly, there is no one with even close to these values in the relevant population.
• (iv) The t statistic is (.843 − 1)/.092 ≈ −1.71. Against the one-sided alternative H1: β2 < 1, the p-value is about .044. Therefore, we can reject H0: β2 = 1 at the 5% significance level (against the one-sided alternative).
• (v) The slope coefficient on inc in the simple regression is about .821, which is not very different from the .799 obtained in part (ii). As it turns out, the correlation between inc and age in the sample of single people is only about .039, which helps explain why the simple and multiple regression estimates are not very different; see the text also.


Week 6 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• How would you test hypotheses about a single linear combination of parameters?
We may re-parameterise the regression model to isolate the "single linear combination". The OLS estimation of the re-parameterised model then provides the estimate of the "single linear combination" and the related standard error. See the example in the textbook and lecture slides.
• What are exclusion restrictions for a regression model?
Exclusion restrictions are the null hypothesis that a group of x-variables have zero coefficients in the regression model.
• What are restricted and unrestricted models?
When the null hypothesis is a set of restrictions on the parameters, the regression under the null is called the restricted model, while the regression under the alternative (which simply states that the null is false) is called the unrestricted model.
• How do you compute the F-statistic, given that you have the SSRs?
The general F-stat is based on the relative difference between the SSRs under the restricted and unrestricted models: F = [(SSRr − SSRur)/q]/[SSRur/(n − k − 1)], where you should be able to explain the meanings of the various symbols.
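In code the formula is one line. Here it is checked against the SSR values used in Wooldridge 4.6(ii) in this week's problem set (q = 2 restrictions, n − k − 1 = 86 df):

```python
# The SSR form of the F statistic: F = [(SSRr - SSRur)/q] / [SSRur/(n-k-1)]
def f_stat(ssr_r, ssr_ur, q, df_ur):
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

F = f_stat(209448.99, 165644.51, q=2, df_ur=86)
print(round(F, 2))   # ≈ 11.37, matching the problem-set answer
```

Note that SSRr ≥ SSRur always, so F ≥ 0: imposing restrictions can never reduce the sum of squared residuals.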

• What are general linear restrictions on parameters?
These are linear equations on the parameters. For example, β1 − 2β2 + 3β3 = 0.
• What is the test for the overall significance of a regression?
This is the F-test for the null hypothesis that all the coefficients of the x-variables are zero.
• How would you report your regression results?
See the guidelines in Section 4.6 of the textbook.
• Why would you care about the asymptotic properties of the OLS estimators?
When MLR.6 does not hold, the finite-sample distribution of the OLS estimators is not available. For reasonably large samples (large n), we use the asymptotic distribution of the OLS estimators, which is known from studying the asymptotic properties, to approximate the finite-sample distribution of the OLS estimators, which is unknown.
• Comparing the inference procedures in Chapter 5 with those in Chapter 4, can you list the similarities and differences?
It is beneficial to make the list by yourself.
• Under MLR.1-MLR.5, the OLS estimators are consistent, asymptotically normal, and asymptotically efficient. Try to explain these properties in your own words.
This is really a test of your understanding of these notions in econometrics.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 4.6
• (i) With df = n − 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because each test is two-tailed, the critical value is 1.987. The t statistic for H0: β0 = 0 is about −.89, which is much less than 1.987 in absolute value. Therefore, we fail to reject H0: β0 = 0. The t statistic for H0: β1 = 1 is (.976 − 1)/.049 ≈ −.49, which is even less significant. (Remember, we reject H0 in favor of H1 in this case only if |t| > 1.987.)
• (ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions, and the df in the unrestricted model is 86. We are given SSRr = 209,448.99 and SSRur = 165,644.51. Therefore, F = [(209,448.99 − 165,644.51)/2]/[165,644.51/86] ≈ 11.37, which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is 4.85.

• (iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions and there are 88 − 5 = 83 df in the unrestricted model. The F statistic is F = [(.829 − .820)/3]/[(1 − .829)/83] ≈ 1.46. The 10% critical value (again using 90 denominator df in Table G.3a) is 2.15, so we fail to reject H0 at even the 10% level. In fact, the p-value is about .23.
• (iv) If heteroskedasticity were present, Assumption MLR.5 would be violated, and the F statistic would not have an F distribution under the null hypothesis. Therefore, comparing the F statistic against the usual critical values, or obtaining the p-value from the F distribution, would not be especially meaningful.

Q2. Wooldridge 4.8
• (i) We use Property VAR.3 from Appendix B: Var(β̂1 − 3β̂2) = Var(β̂1) + 9Var(β̂2) − 6Cov(β̂1, β̂2).
• (ii) t = (β̂1 − 3β̂2 − 1)/se(β̂1 − 3β̂2), so we need the standard error of β̂1 − 3β̂2.
• (iii) Because θ = β1 − 3β2, we can write β1 = θ + 3β2. Plugging this into the population model gives
y = β0 + (θ + 3β2)x1 + β2x2 + β3x3 + u = β0 + θx1 + β2(3x1 + x2) + β3x3 + u.
This last equation is what we would estimate by regressing y on x1, 3x1 + x2, and x3. The coefficient and standard error on x1 are what we want.
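A quick simulated check of the reparameterisation (all numbers made up): the coefficient on x1 in the transformed regression equals β̂1 − 3β̂2 from the original regression, so its standard error can be read off directly.

```python
# Reparameterisation trick: regress y on x1, (3*x1 + x2), x3; the x1
# coefficient is then theta-hat = b1 - 3*b2 from the original regression.
import numpy as np
rng = np.random.default_rng(7)
n = 300
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 0.8 * x1 + 0.5 * x2 - 0.3 * x3 + rng.normal(size=n)

def ols(y, X):
    X = np.column_stack([np.ones(len(y))] + list(X))
    return np.linalg.lstsq(X, y, rcond=None)[0]

b = ols(y, [x1, x2, x3])            # original model
b_re = ols(y, [x1, 3 * x1 + x2, x3])  # reparameterised model
print(np.isclose(b_re[1], b[1] - 3 * b[2]))   # True: same point estimate
```

The equality is exact because the two regressor sets span the same column space; only the basis changes, which is what makes the standard error on x1 directly usable for the t test.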

Q3. Wooldridge 4.10
• (i) We need to compute the F statistic for the overall significance of the regression, with n = 142 and k = 4: F = [.0395/(1 − .0395)](137/4) ≈ 1.41. The 5% critical value with 4 and 120 df is 2.45, which is well above the value of F. Therefore, we fail to reject H0: β1 = β2 = β3 = β4 = 0 at the 10% level. No explanatory variable is individually significant at the 5% level. The largest absolute t statistic is on dkr, t_dkr ≈ 1.60, which is not significant at the 5% level against a two-sided alternative.
• (ii) The F statistic (with the same df) is now [.0330/(1 − .0330)](137/4) ≈ 1.17, which is even lower than in part (i). None of the t statistics is significant at a reasonable level.
• (iii) We probably should not use the logs, as the logarithm is not defined for firms that have zero for dkr or eps. Therefore, we would lose some firms in the regression.
• (iv) It seems very weak. There are no significant t statistics at the 5% level (against a two-sided alternative), and the F statistics are insignificant in both cases. Plus, less than 4% of the variation in return is explained by the independent variables.

Q4. Wooldridge C4.9 (discrim_c4_9.do)
• (i) The results from the OLS regression, with standard errors in parentheses, are
log(psoda) = −1.46 + .073 prpblck + .137 log(income) + .380 prppov
(.29) (.031) (.027) (.133)
n = 401, R² = .087.
The p-value for testing H0: β1 = 0 against the two-sided alternative is about .018, so we reject H0 at the 5% level but not at the 1% level.
• (ii) The correlation is about −.84, indicating a strong degree of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for log(income) is about 5.1, and that for prppov is about 2.86 (two-sided p-value = .004).
• (iii) The OLS regression results when log(hseval) is added are
log(psoda) = −.84 + .098 prpblck − .053 log(income) + .052 prppov + .121 log(hseval)
(.29) (.029) (.038) (.134) (.018)
n = 401, R² = .184.
The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about .12 percent. The two-sided p-value is zero to three decimal places.
• (iv) Adding log(hseval) makes log(income) and prppov individually insignificant (log(income) at even the 15% significance level against a two-sided alternative, and prppov does not have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at the 5% level, because the outcome of the F(2,396) statistic is about 3.52 with p-value = .030. All of the control variables – log(income), prppov and log(hseval) – are highly correlated, so it is not surprising that some are individually insignificant.
• (v) Because the regression in (iii) contains the most controls, log(hseval) is individually significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable. It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if prpblck increases by .10, psoda is estimated to increase by 1%, other factors held fixed.

Q5. Wooldridge 5.2
• This is about the inconsistency of the simple regression of pctstck on funds. A higher tolerance of risk means more willingness to invest in the stock market, so β2 > 0. By assumption, funds and risktol are positively correlated. Now we use equation (5.5), where δ1 > 0: plim(β̃1) = β1 + β2δ1 > β1, so β̃1 has a positive inconsistency (asymptotic bias). This makes sense: if we omit risktol from the regression and it is positively correlated with funds, some of the estimated effect of funds is actually due to the effect of risktol.

Q6. Wooldridge C5.1 (wage1_c5_1.do)
• (i) The estimated equation is
wage = −2.87 + .599 educ + .022 exper + .169 tenure
(.73) (.051) (.012) (.022)
n = 526, R² = .306, σ̂ = 3.085.
Below is a histogram of the 526 residuals ûi, i = 1, 2, ..., 526. The histogram uses 27 bins, which is suggested by the formula in the Stata manual for 526 observations. For comparison, the normal distribution that provides the best fit to the histogram is also plotted.


• (ii) With log(wage) as the dependent variable, the estimated equation is
log(wage) = .284 + .092 educ + .0041 exper + .022 tenure
(.104) (.007) (.0017) (.003)
n = 526, R² = .316, σ̂ = .441.
The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below.
• (iii) The residuals from the log(wage) regression appear to be more normally distributed. Certainly the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed to the right. In the wage regression there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.
• The skewness and kurtosis of the residuals from the OLS regression may be used to test the null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null distribution of the Jarque-Bera test statistic (which takes into account both skewness and kurtosis) is approximately chi-squared with 2 degrees of freedom. Hence, at the 5% level, we reject the normality of u whenever the JB statistic is greater than 5.99. In Stata this test is carried out by the command "sktest". With the data in WAGE1.RAW, the JB statistic is extremely large for the wage model and JB = 1059 for the log(wage) model. While normality is rejected for both models, we do see that the JB statistic is much smaller for the log(wage) model [also compare its p-values of the skewness and excess kurtosis tests against those from the wage model]. Hence, normality appears to be a reasonable approximation to the distribution of the error term in the log(wage) model.
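A minimal version of the JB statistic can be computed directly from sample moments. This sketch uses simulated residuals rather than WAGE1.RAW, and note that Stata's sktest uses a related but adjusted procedure, so the numbers would not match sktest output exactly:

```python
# Jarque-Bera statistic from sample skewness and kurtosis:
# JB = n/6 * [skew^2 + (kurt - 3)^2 / 4], approximately chi2(2) under H0.
import random
random.seed(3)

def jarque_bera(u):
    n = len(u)
    m = sum(u) / n
    s2 = sum((x - m) ** 2 for x in u) / n
    skew = sum((x - m) ** 3 for x in u) / n / s2 ** 1.5
    kurt = sum((x - m) ** 4 for x in u) / n / s2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

normal_u = [random.gauss(0, 1) for _ in range(2000)]      # normal "residuals"
skewed_u = [random.expovariate(1) for _ in range(2000)]   # right-skewed ones
print(jarque_bera(normal_u))   # typically small, below the 5% cutoff of 5.99
print(jarque_bera(skewed_u))   # large: skewness drives a clear rejection
```

As with the wage versus log(wage) comparison above, the size of the JB statistic reflects how far the residuals' skewness and kurtosis are from the normal benchmarks of 0 and 3.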


Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the advantages of using the log of a variable in regression? Find the "rules of thumb" for taking logs.
See page 191 of Wooldridge (pp. 198-199 for the 3rd edition).
• Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithm. Do you remember Table 2.3?
Consult Table 2.3.
• How do you compute the change in y caused by Δx when the model is built for log(y)?
A precise computation is given by Equation (6.8) of Wooldridge.
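The exact computation of Equation (6.8) is %Δŷ = 100·[exp(β̂Δx) − 1], versus the usual approximation 100·β̂Δx. A sketch using the educ coefficient from the Week 6 log(wage) equation as an illustrative β̂:

```python
# Exact vs. approximate percentage change in y from a log(y) model.
import math

b = 0.092     # educ coefficient from the log(wage) model above (illustrative)
dx = 4        # four more years of education
approx = 100 * b * dx                    # the rule-of-thumb approximation
exact = 100 * (math.exp(b * dx) - 1)     # Equation (6.8)
print(round(approx, 1), round(exact, 1)) # the gap widens as b*dx grows
```

For small β̂Δx the two agree closely; here, with β̂Δx = .368, the approximation understates the exact percentage change by several points.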

• Why do we need "interaction" terms in regression models?
An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for an example.
• What is the adjusted R-squared? What is the difference between it and the R-squared?
The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added; however, the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.
• How do you construct an interval prediction for given x-values?
This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.
• How do you predict y for given x-values when the model is built for log(y)?
Check the list on page 212 (p. 220 for the 3rd edition).
• What is involved in "residual analysis"?
See pages 209-210 (pp. 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 6.4
• (i) Holding all other factors fixed, we have Δlog(wage) = β1Δeduc + β2Δeduc·pareduc = (β1 + β2pareduc)Δeduc. Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if we think a child gets more out of another year of education the more highly educated are the child's parents.
• (ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 − 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)
• (iii) When we add pareduc by itself, the coefficient on the interaction term is negative. The t statistic on educ·pareduc is about −1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2. Wooldridge 6.5
• This would make little sense. Performances on math and science exams are measures of outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The variable sci11 could be a dependent variable in an identical regression equation.

Q3. Wooldridge C6.3 (C6.2 for the 3rd edition) (wage2_c6_3.do)
• (i) Holding exper (and the elements in u) fixed, we have Δlog(wage) = β1Δeduc + β3Δeduc·exper = (β1 + β3exper)Δeduc, or Δlog(wage)/Δeduc = β1 + β3exper. This is the approximate proportionate change in wage given one more year of education.
• (ii) H0: β3 = 0. If we think that education and experience interact positively – so that people with more experience are more productive when given another year of education – then β3 > 0 is the appropriate alternative.
• (iii) The estimated equation is
log(wage) = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper
(.24) (.0174) (.0200) (.00153)
n = 935, R² = .135, adjusted R² = .132.
The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.
• (iv) We rewrite the equation as log(wage) = β0 + θ1educ + β2exper + β3educ·(exper − 10) + u, where θ1 = β1 + 10β3, and run the regression of log(wage) on educ, exper, and educ·(exper − 10). We want the coefficient on educ. We obtain θ̂1 ≈ .0761 and se(θ̂1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.

Q4 Wooldridge C68 (hprice1_c6_8do) bull (i) The estimated equation (where price is in dollars) is

119901119903120484119888119890 = minus217703 + 2068 lotsize + 12278 sqrft + 138525 bdrms (294750) (0642) (1324) (90101) n = 88 R2 = 672 adjusted-R2 = 661 120590 = 59833 The predicted price at lotsize = 10000 sqrft = 2300 and bdrms = 4 is about $336714

bull (ii) The regression is pricei on (lotsizei ndash 10000) (sqrfti ndash 2300) and (bdrmsi ndash 4) We want the intercept estimate and the associated 95 CI from this regression The CI is approximately 3367067 plusmn 14665 or about $322042 to $351372 when rounded to the nearest dollar

• (iii) We must use equation (6.36) to obtain the standard error of ê0 and then use equation (6.37) (assuming that price is normally distributed). From the regression in part (ii), se(ŷ0) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile in the t(84) distribution gives the 95% CI for price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval. But we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
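The prediction-interval arithmetic in part (iii) combines the two pieces as follows (a sketch using the numbers reported above):

```python
import math

# Prediction interval for price0: the se of the prediction error combines the
# sampling error of y-hat0 with the error standard deviation (eq. 6.36).
se_yhat0 = 7374.5      # from the regression in part (ii)
sigma_hat = 59833.0    # estimated error standard deviation
se_e0 = math.sqrt(se_yhat0**2 + sigma_hat**2)

crit = 1.99            # approx. 97.5th percentile of t(84)
y0 = 336706.7
lower, upper = y0 - crit * se_e0, y0 + crit * se_e0
print(round(se_e0, 1), round(lower), round(upper))  # 60285.7 216738 456675
```

Note how the error standard deviation dominates: the interval width comes almost entirely from σ̂, not from the sampling error of the fitted value.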


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is a qualitative factor? Give some examples.

This should be easy.
• How do you use dummy variables to represent qualitative factors?

A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy. For example, a factor with 5 values requires 4 dummy variables.

• How do you use dummy variables to represent an ordinal variable? An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

• How do you test for differences in regression functions across different groups? We generally use the F test on properly-defined restricted and unrestricted models. The Chow test is a special case of the F test where the null hypothesis is no difference across groups.

• What can you achieve by interacting group dummy variables with other regressors? The interaction allows the slope coefficients to differ across groups.

• What is "program evaluation"? It assesses the effectiveness of a program by checking the differences in regression coefficients for the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary? The PRF is the conditional probability of "success" for given explanatory variables. The SRF is the predicted conditional probability of "success" for given explanatory variables.

• What are the shortcomings of the LPM? The major drawback of the LPM is that the predicted probability of success may be outside the interval [0,1].

• Does MLR.5 hold for the LPM? No. As indicated in the textbook and lecture, the conditional variance of the disturbance is not constant and depends on the explanatory variables: Var(u|x) = E(y|x)[1 − E(y|x)].
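The coding rule above (a factor with five levels needs four dummies, with one level serving as the base group) can be sketched in a few lines of pure Python; the level names here are hypothetical:

```python
# One-hot coding for a qualitative factor: a factor with 5 levels needs 4 dummies.
levels = ["north", "south", "east", "west", "central"]  # hypothetical factor levels
base = levels[0]                                        # base (omitted) category

def make_dummies(value):
    """Return the 4 dummy values (one per non-base level) for one observation."""
    return {lev: int(value == lev) for lev in levels if lev != base}

print(make_dummies("south"))  # {'south': 1, 'east': 0, 'west': 0, 'central': 0}
```

An observation in the base category gets a zero for every dummy, which is why only 4 dummies (not 5) are needed alongside the intercept.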

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 7.4
• (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant.
• (ii) 100·[exp(−.283) − 1] ≈ −24.7%, and so the estimate is somewhat smaller in magnitude.
• (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is
log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility + δ3trans + u,
where trans is a dummy variable for the transportation industry. Now the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.
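The exact percentage difference in part (ii) comes from exponentiating the log-point coefficient; a quick check (the rounding in the answer above is to one decimal):

```python
import math

# Exact percentage difference implied by a log-point coefficient of -.283.
coef = -0.283
approx = 100 * coef                 # approximate difference: -28.3%
exact = 100 * (math.exp(coef) - 1)  # exact difference, smaller in magnitude
print(round(exact, 2))  # -24.65
```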

Q2 Wooldridge 7.6
• In Section 3.3 – in particular, in the discussion surrounding Table 3.2 – we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3 Wooldridge 7.9
• (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.
• (ii) Setting f0(z*) = f1(z*) gives β0 + β1z* = (β0 + δ0) + (β1 + δ1)z*. Therefore, provided δ1 ≠ 0, we have z* = −δ0/δ1. Clearly, z* is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.
• (iii) Using part (ii), we have totcoll* = .357/.030 = 11.9 years (totcoll = total college years).
• (iv) The estimated number of years of college at which women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed – not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.

Q4 Wooldridge C7.13 (apple_c7_13.do)
• (i) 412/660 ≈ .624.
• (ii) The OLS estimates of the LPM are
ecobuy = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize + .025 educ − .00050 age
(.165) (.109) (.132) (.00053) (.013) (.008) (.00125)
n = 660, R² = .110.
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)

• (iii) The F test, with 4 and 653 df, is 4.43 with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children – other factors equal – the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in the functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule – predict one when ecobuy-hat ≥ .5 and zero otherwise – gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
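The fractions correctly predicted in part (vi) combine as follows (a quick check of the overall rate from the counts above):

```python
# Percent correctly predicted, overall and by outcome, from the counts above.
correct_0, n_0 = 102, 248   # ecobuy = 0
correct_1, n_1 = 340, 412   # ecobuy = 1

rate_0 = correct_0 / n_0
rate_1 = correct_1 / n_1
overall = (correct_0 + correct_1) / (n_0 + n_1)
print(round(rate_0, 3), round(rate_1, 3), round(overall, 2))  # 0.411 0.825 0.67
```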


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is heteroskedasticity in a regression model?
When homoskedasticity (MLR.5) fails and the variance of the disturbance (u) changes across the observation index (i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with OLS under heteroskedasticity? The usual t-stat and F-stat are not valid because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator is no longer (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA? These are the corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option "robust" with the "regress" command, e.g., regress lwage educ, robust.

• How do you detect if there is heteroskedasticity? The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

• If heteroskedasticity is present in a known form, how would you estimate the model? In this case, the WLS estimators should be used, which are more efficient than the OLS estimators. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model? If there is strong evidence for heteroskedasticity, the FGLS estimators should be used, which are based on an exponential functional form. See Section 8.4 for details.

• What are the steps in the FGLS estimation? You should summarise them from Section 8.4.

• How would you handle the heteroskedasticity of the LPM? The heteroskedasticity functional form is known for the LPM; hence, WLS can be used in principle. However, because the LPM can produce predicted probabilities outside the interval (0,1), the known functional form p(x)[1 − p(x)] may not be usable for WLS. In that case, it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2 Wooldridge 8.2
• With Var(u|inc,price,educ,female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc:
beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.
Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.
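The division by inc can be written as a tiny helper (a sketch with hypothetical numbers; the transformed regressors are exactly those in the equation above):

```python
# Transform one observation for WLS: divide everything by inc = sqrt(h(x)).
def transform(beer, inc, price, educ, female):
    """Return (y*, 1/inc, 1, price/inc, educ/inc, female/inc)."""
    return (beer / inc, 1.0 / inc, 1.0, price / inc, educ / inc, female / inc)

# Hypothetical observation: beer=10, inc=2, price=3, educ=4, female=1.
print(transform(10, 2, 3, 4, 1))  # (5.0, 0.5, 1.0, 1.5, 2.0, 0.5)
```

Running OLS on the transformed data is exactly WLS with weights 1/inc²; note that the constant regressor (the 1.0) now carries β1.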

Q3 Wooldridge 8.3
• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR.4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR.4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has a smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4 Wooldridge 8.5
• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.
• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.
• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.
• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.

• (v) We just plug the values of the independent variables into the OLS regression line:
smokes = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052.
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)
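Parts (iii) and (v) can be verified directly (a sketch; coefficients as reported above):

```python
import math

# (iii) Turning point of the quadratic in age: age* = b_age / (2 * b_age2).
b_age, b_age2 = 0.020, 0.00026
turning_point = b_age / (2 * b_age2)
print(round(turning_point, 2))  # 38.46

# (v) Plug the person's values into the estimated LPM for smokes.
smokes_hat = (0.656 - 0.069 * math.log(67.44) + 0.012 * math.log(6500)
              - 0.029 * 16 + 0.020 * 77 - 0.00026 * 77**2)
print(round(smokes_hat, 4))  # about .005
```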

Q5 Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:
e401k = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male
(.081) (.0006) (.000005) (.0039) (.00005) (.0121)
[.079] [.0006] [.000005] [.0038] [.00004] [.0121]
n = 9,275, R² = .094.
There are no important differences; if anything, the robust standard errors are smaller.

• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model u² = δ0 + δ1p(x) + δ2p(x)² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = −1. Remember that, for the LPM, the fitted values ŷ are estimates of p(x). So, when we run the regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.

• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on the fitted value of e401k is about 1.010, the coefficient on its square is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are
e401k = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male
(.076) (.0005) (.000004) (.0037) (.00004) (.0117)
n = 9,275, R² = .108.
There are no important differences with the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)
• What is functional form misspecification? What is its major consequence in regression analysis?
As the word "misspecification" suggests, it is a mis-specified model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence, the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification? RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). RESET is based on the F test for the joint significance of the squared and cubed predicted values (ŷ² and ŷ³) in an "expanded" model that includes all the original regressors as well as ŷ² and ŷ³, where the latter represent possible nonlinear functions of the regressors.

• What are nested models? And nonnested models? Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another? The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test between two nonnested models? There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models and decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis? A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved variable and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, depends only on the proxy.

• Can you analyse the consequences of measurement errors? With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis? Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?


Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR.1, MLR.3, or MLR.4, and hence does not introduce bias in the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable and causes bias in the OLS estimation.

• What are outliers and influential observations? Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS parameter estimates. They are called influential observations if the estimation results with them included are significantly different from the results with them excluded.

• Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1, 2, …, n, where the "slope parameter" contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR.1-5 hold for yi = α + βxi + ui, with E(bi|xi) = 0 and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference? The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui). Given the stated assumptions, MLR.1-4 hold for this model and the OLS estimator of the "slope parameter" is unbiased (for β). However, MLR.5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui|xi) = ω²xi² + σ², depends on xi, and the usual OLS standard errors will not be valid.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 9.1
• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test:
F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] ≈ 2.97.
With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
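The R-squared form of the F statistic can be checked numerically (values from the answer above; q = 2 restrictions, n − k − 1 = 177 − 8 denominator df):

```python
# R-squared form of the F test for the joint significance of ceoten^2, comten^2.
r2_ur, r2_r = 0.375, 0.353   # unrestricted and restricted R-squareds
q, df_denom = 2, 177 - 8     # number of restrictions, denominator df

F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_denom)
print(round(F, 2))  # 2.97
```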

Q2 Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.

• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So, when we control for the poverty rate, the effect of spending falls.

• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1), we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than spending per student or other school characteristics.

Q3 Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates using the 1988 data are
log(scrap) = .409 + .057 grant
(.241) (.406)
n = 54, R² = .0004.
The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain
log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
(.089) (.147) (.044)
n = 54, R² = .873,
where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller than before, but it is still pretty significant.
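The t statistics in parts (iii)-(v) are simple ratios of the reported estimates to their standard errors (a quick check):

```python
# t statistics for the jtrain regressions, using the estimates reported above.
t_grant = -0.254 / 0.147            # H0: beta_grant = 0, usual se
t_lag   = (0.831 - 1) / 0.044       # H0: beta_log(scrap87) = 1, usual se
t_grant_rob = -0.254 / 0.142        # robust se
t_lag_rob   = (0.831 - 1) / 0.071   # robust se

print([round(t, 2) for t in (t_grant, t_lag, t_grant_rob, t_lag_rob)])
# [-1.73, -3.84, -1.79, -2.38]
```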


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data?
The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations are dependent on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that, for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation? A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (hence has a distribution). The SP can be viewed as a "random curve" with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation or autocorrelation? The serial (or auto) correlation of a time-series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted Corr(yt, yt−h) = Cov(yt, yt−h)/[Var(yt)Var(yt−h)]^(1/2).
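The lag-h correlation formula above can be computed directly from an observed series (a minimal pure-Python sketch using the sample moments of the overlapping pairs):

```python
import math

def autocorr(y, h):
    """Sample lag-h autocorrelation: Corr(y_t, y_{t-h}) over overlapping pairs."""
    a, b = y[h:], y[:-h]               # y_t and y_{t-h}
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(va * vb)

# A perfectly trending series is perfectly autocorrelated at lag 1.
print(autocorr([1, 2, 3, 4, 5], 1))  # 1.0
```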

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)? Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.

• What are TS.1-6 (assumptions about time series regression)? How do they differ from the assumptions MLR.1-6? TS.1-6 include: 1) linear in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR.1-6 mainly in that MLR.2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS.3 is needed. To ensure the validity of OLS standard errors, t-stats and F-stats for statistical inference, TS.3-TS.6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors? Strictly exogenous regressors satisfy assumption TS.3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS.3.

• What is a trending time series? What is a time trend? A time series is trending if it has the tendency of growing (or shrinking) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results? First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious because it is purely induced by the time trends and has little to say about the true relationship between the two time series.

• Why would you include a time trend in regressions with trending variables? Including a time trend in regressions allows one to study the relationship among the time series variables that is not induced by the time trends in the series.

• What is seasonality in a time series? Give an example of a time series variable with seasonality. Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have day-of-week seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model? We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
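The quarterly dummy coding can be sketched as follows (first quarter as the base, as in the answer above):

```python
# Seasonal dummies for quarterly data: Q2, Q3, Q4, with Q1 as the base quarter.
def seasonal_dummies(quarter):
    """quarter in {1, 2, 3, 4} -> (Q2, Q3, Q4)."""
    return tuple(int(quarter == q) for q in (2, 3, 4))

print([seasonal_dummies(q) for q in (1, 2, 3, 4)])
# [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
```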

Problem Set

Q1 Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated – such as stock returns – do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no serial correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results because we may simply find a spurious association between yt and the trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2
• We follow the hint and write
gGDPt−1 = α0 + δ0intt−1 + δ1intt−2 + ut−1,
and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0intt−1 + δ1intt−2 + ut−1 − 3) + vt
= (γ0 + γ1α0 − 3γ1) + γ1δ0intt−1 + γ1δ1intt−2 + γ1ut−1 + vt.
Now, by assumption, ut−1 has zero mean and is uncorrelated with all right-hand-side variables in the previous equation, except itself of course. So
Cov(intt, ut−1) = E(intt·ut−1) = γ1E(u²t−1) > 0,
because γ1 > 0. If σ² = E(u²t) for all t, then Cov(intt, ut−1) = γ1σ². This violates the strict exogeneity assumption TS.3. While ut is uncorrelated with intt, intt−1, and so on, ut is correlated with intt+1.


Q3 Wooldridge 10.7
• (i) pet−1 and pet−2 must be increasing by the same amount as pet.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pet, pet−1, and pet−2 by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is
i3t = 1.61 + .343 inft + .382 inft−1 − .190 deft + .569 deft−1
(.40) (.125) (.134) (.221) (.197)
n = 55, R² = .685, adjusted R² = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than the .606 we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inft−1.
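The LRP in part (iii) is just the sum of the inf coefficients (a quick check against the static-model estimate):

```python
# Long-run propensity of i3 with respect to inf in the distributed lag model.
b_inf, b_inf_lag = 0.343, 0.382
lrp = b_inf + b_inf_lag
print(round(lrp, 3))  # 0.725, versus .606 from the static model
```

In STATA, the standard error of this sum could be obtained by reparameterizing the model or with a post-estimation command for linear combinations of coefficients.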

• (iv) The F statistic for the joint significance of inft−1 and deft−1 is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.


at cigs = 0. The predicted birth weight is necessarily roughly in the middle of the observed birth weights at cigs = 0, and so we will under-predict high birth weights.

Q2 Wooldridge 2.7
• (i) When we condition on inc in computing an expectation, inc becomes a constant. So E(u|inc) = E(√inc·e|inc) = √inc·E(e|inc) = √inc·0 = 0, because E(e|inc) = E(e) = 0.
• (ii) Again, when we condition on inc in computing a variance, inc becomes a constant. So Var(u|inc) = Var(√inc·e|inc) = (√inc)²Var(e|inc) = σe²inc, because Var(e|inc) = σe².
• (iii) Families with low incomes do not have much discretion about spending; typically, a low-income family must spend on food, clothing, housing, and other necessities. Higher income people have more discretion, and some might choose more consumption while others more saving. This discretion suggests wider variability in saving among higher income families.

Q3 Wooldridge C2.6 (meap93_ch02.do)
• (i) It seems plausible that another dollar of spending has a larger effect for low-spending schools than for high-spending schools. At low-spending schools, more money can go toward purchasing more books and computers and toward hiring better-qualified teachers. At high levels of spending, we would expect little, if any, effect, because high-spending schools already have high-quality teachers, nice facilities, plenty of books, and so on.

• (ii) If we take changes, as usual, we obtain
Δmath10 = β1 Δlog(expend) ≈ (β1/100)(%Δexpend),
just as in the second row of Table 2.3. So, if %Δexpend = 10, Δmath10 = β1/10.

• (iii) The regression results are
math10 = −69.34 + 11.16 log(expend), n = 408, R² = .0297.

• (iv) If expend increases by 10 percent, math10 increases by about 1.1 percentage points. This is not a huge effect, but it is not trivial for low-spending schools, where a 10 percent increase in spending might be a fairly small dollar amount.

• (v) In this data set, the largest value of math10 is 66.7, which is not especially close to 100. In fact, the largest fitted value is only about 30.2.

Q4 Wooldridge C2.7 (charity_ch02.do)
• (i) The average gift is about 7.44 Dutch guilders. Out of 4268 respondents, 2561 did not give a gift, or about 60 percent.
• (ii) The average number of mailings per year is about 2.05. The minimum value is .25 and the maximum value is 3.5.
• (iii) The estimated equation is
gift = 2.01 + 2.65 mailsyear, n = 4268, R² = .0138.
• (iv) The slope coefficient from part (iii) means that each mailing per year is associated with – perhaps even "causes" – an estimated 2.65 additional guilders, on average. Therefore, if each mailing costs one guilder, the expected profit from each mailing is estimated to be 1.65 guilders. This is only the average, however: some mailings generate no contributions, or a contribution less than the mailing cost, while other mailings generate much more than the mailing cost.
• (v) Because the smallest mailsyear in the sample is .25, the smallest predicted value of gifts is 2.01 + 2.65(.25) ≈ 2.67. Even if we look at the overall population, where some people have received no mailings, the smallest predicted value is about two. So, with this estimated equation, we never predict zero charitable gifts.


Week 4 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What do we mean when we say "regress wage on educ and expr"?
Use OLS to estimate the multiple linear regression model wage = β0 + β1educ + β2expr + u.

• Why, and under what circumstance, do we need to "control for" expr in the regression model in order to quantify the effect of educ on wage?
We need to control for expr in the model when the partial, or ceteris paribus, effect of educ on wage is the parameter of interest and expr and educ are correlated.

• What is the bias of an estimator?
bias = E(β̂j) − βj.

• What is the "omitted variable bias"?
It is the bias resulting from omitting relevant variables that are correlated with the explanatory variables in a regression model.

• What is the consequence of adding an irrelevant variable to a regression model?
Including an irrelevant variable will generally increase the variance of the OLS estimators.

• What is the requirement of the ZCM assumption, in your own words?
You must do this by yourself.

• Why is assuming E(u) = 0 not restrictive when an intercept is included in the regression model?
When the mean of the disturbance is not zero, you can always define the new disturbance as the original disturbance minus its mean, and define the new intercept as the original intercept plus that mean.

• In terms of notation, why do we need two subscripts for independent variables?
In our notation, the first subscript indexes observations (1st obs, 2nd obs, etc.) and the second subscript indexes variables (1st variable, 2nd variable, etc.).

• Using OLS to estimate a multiple regression model with k independent variables and an intercept, how many first-order conditions are there?
k + 1.

• How do you know that the OLS estimators are linear combinations of the observations on the dependent variable?
By the first-order conditions, we see that the OLS estimators are the solution to k + 1 linear equations in k + 1 unknowns. The "right-hand sides" of the equations are linear functions of the observations on the dependent variable (say, y). Hence the solution is a linear combination of the y-observations.

• What is the Gauss-Markov theorem?
Read the textbook.

• Try to explain why R² never decreases (and is likely to increase) when additional explanatory variables are added to the regression model.
First, the smaller the SSR, the greater the R-squared. OLS chooses the parameter estimates to minimise the SSR. With extra explanatory variables, the R-squared does not decrease because the SSR does not increase, which is true because OLS has a larger parameter space from which to choose estimates (more possibilities for making the SSR small).
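This can also be seen mechanically with simulated data (a Python sketch; the simulated model and variable names are purely illustrative): adding a regressor, even an irrelevant one, cannot increase the SSR, so the R-squared cannot fall.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)               # an irrelevant regressor: y does not depend on it
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def r_squared(y, cols):
    """R-squared from an OLS fit of y on the given columns plus an intercept."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = ((y - X @ beta) ** 2).sum()
    sst = ((y - y.mean()) ** 2).sum()
    return 1 - ssr / sst

r2_small = r_squared(y, [x1])
r2_big = r_squared(y, [x1, x2])       # extra variable: SSR cannot rise, R-squared cannot fall
print(r2_big >= r2_small)             # True
```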

• What is an endogenous explanatory variable? An exogenous explanatory variable?
See the top of page 88.

• What is multicollinearity, and what is its likely effect on the OLS estimators?
Read pages 95-99.


Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 3.2
• (i) Yes. Because of budget constraints, it makes sense that, the more siblings there are in a family, the less education any one child in the family has. To find the increase in the number of siblings that reduces predicted education by one year, we solve 1 = .094(Δsibs), so Δsibs = 1/.094 ≈ 10.6.
• (ii) Holding sibs and feduc fixed, one more year of mother's education implies .131 years more of predicted education. So if a mother has four more years of education, her son is predicted to have about half a year (.524) more years of education.
• (iii) Since the number of siblings is the same, but meduc and feduc are both different, the coefficients on meduc and feduc both need to be accounted for. The predicted difference in education between B and A is .131(4) + .210(4) = 1.364.

Q2 Wooldridge 3.10
• (i) Because x1 is highly correlated with x2 and x3, and these latter variables have large partial effects on y, the simple and multiple regression coefficients on x1 can differ by large amounts. We have not done this case explicitly, but given equation (3.46) and the discussion with a single omitted variable, the intuition is pretty straightforward.
• (ii) Here we would expect β̃1 and β̂1 to be similar (subject, of course, to what we mean by "almost uncorrelated"). The amount of correlation between x2 and x3 does not directly affect the multiple regression estimate on x1 if x1 is essentially uncorrelated with x2 and x3.
• (iii) In this case we are (unnecessarily) introducing multicollinearity into the regression: x2 and x3 have small partial effects on y, and yet x2 and x3 are highly correlated with x1. Adding x2 and x3 likely increases the standard error of the coefficient on x1 substantially, so se(β̂1) is likely to be much larger than se(β̃1).
• (iv) In this case, adding x2 and x3 will decrease the residual variance without causing much collinearity (because x1 is almost uncorrelated with x2 and x3), so we should see se(β̂1) smaller than se(β̃1). The amount of correlation between x2 and x3 does not directly affect se(β̂1).

Q3 Wooldridge C3.2 (hprice1_ch03.do)
• (i) The estimated equation is
price = −19.32 + .128 sqrft + 15.20 bdrms, n = 88, R² = .632.
• (ii) Holding square footage constant, (the price increase) = 15.20 × (increase in bedrooms). Hence the price increases by 15.20, which means $15,200.
• (iii) Now (the price increase) = .128 × (increase in square feet) + 15.20 × (increase in bedrooms) = .128(140) + 15.20(1) = 33.12. That is, $33,120. Because the house size is also increasing, the effect is larger than that in (ii).
• (iv) About 63.2%.
• (v) The predicted price is −19.32 + .128(2438) + 15.20(4) = 353.544, or $353,544.
• (vi) The residual is 300,000 − 353,544 = −53,544. The buyer underpaid, judged by the predicted price. But, of course, there are many other features of a house (some that we cannot even measure) that affect price, and we have not controlled for those.

Q4 Wooldridge C3.6 (wage2_ch03_c3_6.do)
• (i) The slope coefficient from regressing IQ on educ is 3.53383.
• (ii) The slope coefficient from regressing log(wage) on educ is .05984.
• (iii) The slope coefficients from regressing log(wage) on educ and IQ are .03912 and .00586, respectively.
• (iv) The computed β̃1 = β̂1 + δ̃1β̂2 = .03912 + 3.53383(.00586) ≈ .05984.
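Part (iv) is the omitted-variable identity β̃1 = β̂1 + δ̃1β̂2: the simple-regression slope on educ equals the multiple-regression slope plus the IQ-on-educ auxiliary slope times the IQ coefficient. A quick numeric check of the values reported above (Python used purely as a calculator):

```python
beta1_hat = 0.03912      # educ coefficient from the multiple regression, part (iii)
beta2_hat = 0.00586      # IQ coefficient from the multiple regression, part (iii)
delta1_tilde = 3.53383   # slope from regressing IQ on educ, part (i)

beta1_tilde = beta1_hat + delta1_tilde * beta2_hat
print(round(beta1_tilde, 5))  # 0.05983, matching the simple-regression slope in (ii)
```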


Week 5 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the CLM assumptions?
These are MLR1, MLR2, MLR3 and MLR6. MLR6 is a very strong assumption that implies both MLR4 and MLR5.

• What is the sampling distribution of the OLS estimators under the CLM assumptions?
The OLS estimators under the CLM assumptions follow the normal distribution.

• What are the standard errors of the OLS estimators?
The standard error is the square root of the estimated variance of the estimator.

• What is the null hypothesis about a parameter?
In this context, the null hypothesis is a statement that assigns a known (or hypothesised) value to the parameter of interest. Usually the null is a maintained proposition (from economic theory or experience) that will only be rejected on very strong evidence.

• What is a one-tailed (two-tailed) alternative hypothesis?
This should be clear once you finish your reading.

• In testing hypotheses, what is a Type 1 (Type 2) error? What is the level of significance?
A Type 1 error is the rejection of a true null; a Type 2 error is the non-rejection of a false null. The level of significance is the probability of a Type 1 error.

• The decision rule we use can be stated as "reject the null if the t-statistic exceeds the critical value". How is the critical value determined?
The critical value is determined by the level of significance (or the level of confidence, in the case of confidence intervals) and the distribution of the test statistic. For example, for the t-stat we use the t-distribution when the sample size is small, and the normal distribution when the sample size is large (df > 120).

• Justify the statement "Given the observed test statistic, the p-value is the smallest significance level at which the null hypothesis would be rejected."
If you used the observed t-stat as your critical value, you would just reject the null (because the t-stat is equal to this value). This "critical value" (= the t-stat value) implies a level of significance, say p, which is exactly the p-value. For any significance level smaller than p, the corresponding critical value is more extreme than the t-stat value and does not lead to the rejection of the null. Hence p is the smallest significance level at which the null would be rejected.

• What is the 90% confidence interval for a parameter?
It is an interval [L, U] defined by two sample statistics: the lower bound L and the upper bound U. The probability that [L, U] covers the parameter is .90.

• In constructing a confidence interval for a parameter, what is the level of confidence?
It is the probability that the CI covers the parameter.

• When the level of confidence increases, how would the width of the confidence interval change (holding other things fixed)?
The width will increase.

• Try to convince yourself that the event "the 90% confidence interval covers a hypothesised value of the parameter" is the same as the event "the null of the parameter being the hypothesised value cannot be rejected in favour of the two-tailed alternative at the 10% level of significance".
It becomes apparent if you start with P(|t-stat| ≤ c) = .90 and disentangle the expression for the t-stat.


Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 4.1
• (i) and (iii) generally cause the t statistics not to have a t distribution under H0. Homoskedasticity is one of the CLM assumptions. An important omitted variable violates Assumption MLR4 (and hence MLR6) in general. The CLM assumptions contain no mention of the sample correlations among independent variables, except to rule out the case where the correlation is one (MLR3).

Q2 Wooldridge 4.2
• (i) H0: β3 = 0. H1: β3 > 0.
• (ii) The proportionate effect on salary is .00024(50) = .012. To obtain the percentage effect, we multiply this by 100: 1.2%. Therefore, a 50-point ceteris paribus increase in ros is predicted to increase salary by only 1.2%. Practically speaking, this is a very small effect for such a large change in ros.
• (iii) The 10% critical value for a one-tailed test, using df = ∞, is obtained from Table G.2 as 1.282. The t statistic on ros is .00024/.00054 ≈ .44, which is well below the critical value. Therefore, we fail to reject H0 at the 10% significance level.
• (iv) Based on this sample, the estimated ros coefficient appears to be different from zero only because of sampling variation. On the other hand, including ros may not be causing any harm; it depends on how correlated it is with the other independent variables (although these are very significant even with ros in the equation).

Q3 Wooldridge 4.5
• (i) .412 ± 1.96(.094), or about [.228, .596].
• (ii) No, because the value .4 is well inside the 95% CI.
• (iii) Yes, because 1 is well outside the 95% CI.

Q4 Wooldridge C4.8 (401ksubs_ch04.do)
• (i) There are 2017 single people in the sample of 9275.
• (ii) The estimated equation is
nettfa = −43.04 + .799 inc + .843 age
         (4.08)   (.060)     (.092)
n = 2017, R² = .119.
The coefficient on inc indicates that one more dollar in income (holding age fixed) is reflected in about 80 more cents in predicted nettfa; no surprise there. The coefficient on age means that, holding income fixed, if a person gets another year older, his/her nettfa is predicted to increase by about $843. (Remember, nettfa is in thousands of dollars.) Again, this is not surprising.
• (iii) The intercept is not very interesting, as it gives the predicted nettfa for inc = 0 and age = 0. Clearly, there is no one with even close to these values in the relevant population.
• (iv) The t statistic is (.843 − 1)/.092 ≈ −1.71. Against the one-sided alternative H1: β2 < 1, the p-value is about .044. Therefore, we can reject H0: β2 = 1 at the 5% significance level (against the one-sided alternative).
• (v) The slope coefficient on inc in the simple regression is about .821, which is not very different from the .799 obtained in part (ii). As it turns out, the correlation between inc and age in the sample of single people is only about .039, which helps explain why the simple and multiple regression estimates are not very different; see the text also.


Week 6 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• How would you test hypotheses about a single linear combination of parameters?
We may re-parameterise the regression model to isolate the "single linear combination". Then OLS on the re-parameterised model will provide the estimate of the "single linear combination" and the related standard error. See the example in the textbook and lecture slides.

• What are exclusion restrictions for a regression model?
Exclusion restrictions are the null hypothesis that a group of x-variables have zero coefficients in the regression model.

• What are restricted and unrestricted models?
When the null hypothesis is a set of restrictions on the parameters, the regression under the null is called the restricted model, while the regression under the alternative (which simply states that the null is false) is called the unrestricted model.

• How do you compute the F-statistic, given that you have the SSRs?
Clearly, the general F-stat is based on the relative difference between the SSRs under the restricted and unrestricted models: F = [(SSRr − SSRur)/q]/[SSRur/(n − k − 1)], where you should be able to explain the meanings of the various symbols.
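The SSR form above, and the equivalent R-squared form used later in the problem set, can be sketched as small helper functions (Python for illustration; the numeric check reuses the SSRs from Q1(ii) of this week's problem set):

```python
def f_ssr(ssr_r, ssr_ur, q, df_ur):
    """F statistic from restricted/unrestricted SSRs; df_ur = n - k - 1."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

def f_r2(r2_r, r2_ur, q, df_ur):
    """Equivalent R-squared form (valid when the dependent variable is unchanged)."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)

# q = 2 exclusion restrictions, df_ur = 86 (the Q1(ii) example from the problem set)
print(round(f_ssr(209448.99, 165644.51, 2, 86), 2))  # 11.37
```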

• What are general linear restrictions on parameters?
These are linear equations on the parameters. For example, β1 − 2β2 + 3β3 = 0.

• What is the test for the overall significance of a regression?
This is the F-test for the null hypothesis that all the coefficients of the x-variables are zero.

• How would you report your regression results?
See the guidelines in Section 4.6 of the textbook.

• Why would you care about the asymptotic properties of the OLS estimators?
When MLR6 does not hold, the finite-sample distribution of the OLS estimators is not available. For reasonably large samples (large n), we use the asymptotic distribution of the OLS estimators, which is known from studying the asymptotic properties, to approximate the finite-sample distribution of the OLS estimators, which is unknown.

• Comparing the inference procedures in Chapter 5 with those in Chapter 4, can you list the similarities and differences?
It is beneficial to make such a list by yourself.

• Under MLR1-MLR5, the OLS estimators are consistent, asymptotically normal and asymptotically efficient. Try to explain these properties in your own words.
This is really a test of your understanding of these notions in econometrics.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 4.6
• (i) With df = n − 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because each test is two-tailed, the critical value is 1.987. The t statistic for H0: β0 = 0 is about −.89, which is much less than 1.987 in absolute value. Therefore, we fail to reject H0: β0 = 0. The t statistic for H0: β1 = 1 is (.976 − 1)/.049 ≈ −.49, which is even less significant. (Remember, we reject H0 in favour of H1 in this case only if |t| > 1.987.)

• (ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions, and the df in the unrestricted model is 86. We are given SSRr = 209,448.99 and SSRur = 165,644.51. Therefore,
F = [(209,448.99 − 165,644.51)/2]/[165,644.51/86] ≈ 11.37,
which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is 4.85.

• (iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions, and there are 88 − 5 = 83 df in the unrestricted model. The F statistic is F = [(.829 − .820)/3]/[(1 − .829)/83] ≈ 1.46. The 10% critical value (again using 90 denominator df in Table G.3a) is 2.15, so we fail to reject H0 at even the 10% level. In fact, the p-value is about .23.

• (iv) If heteroskedasticity were present, Assumption MLR5 would be violated, and the F statistic would not have an F distribution under the null hypothesis. Therefore, comparing the F statistic against the usual critical values, or obtaining the p-value from the F distribution, would not be especially meaningful.

Q2 Wooldridge 4.8
• (i) We use Property VAR.3 from Appendix B:
Var(β̂1 − 3β̂2) = Var(β̂1) + 9 Var(β̂2) − 6 Cov(β̂1, β̂2).
• (ii) t = (β̂1 − 3β̂2 − 1)/se(β̂1 − 3β̂2), so we need the standard error of β̂1 − 3β̂2.
• (iii) Because θ = β1 − 3β2, we can write β1 = θ + 3β2. Plugging this into the population model gives
y = β0 + (θ + 3β2)x1 + β2x2 + β3x3 + u = β0 + θx1 + β2(3x1 + x2) + β3x3 + u.
This last equation is what we would estimate, by regressing y on x1, 3x1 + x2, and x3. The coefficient and standard error on x1 are what we want.
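The reparameterisation in (iii) can be verified numerically (a Python sketch with simulated data; the data-generating coefficients are arbitrary): the coefficient on x1 in the regression of y on x1, 3x1 + x2 and x3 equals β̂1 − 3β̂2 from the original regression, exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 2 * x1 + 0.5 * x2 - x3 + rng.normal(size=n)  # arbitrary illustrative model

def ols(y, *cols):
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b = ols(y, x1, x2, x3)           # original regression: b[1] = beta1_hat, b[2] = beta2_hat
g = ols(y, x1, 3 * x1 + x2, x3)  # reparameterised regression: g[1] = theta_hat
print(np.isclose(g[1], b[1] - 3 * b[2]))  # True: theta_hat = beta1_hat - 3*beta2_hat
```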

Q3 Wooldridge 4.10
• (i) We need to compute the F statistic for the overall significance of the regression, with n = 142 and k = 4: F = [.0395/(1 − .0395)](137/4) ≈ 1.41. The 5% critical value with 4 and 120 df is 2.45, which is well above the value of F. Therefore, we fail to reject H0: β1 = β2 = β3 = β4 = 0 at the 10% level. No explanatory variable is individually significant at the 5% level. The largest absolute t statistic is that on dkr, t_dkr ≈ 1.60, which is not significant at the 5% level against a two-sided alternative.
• (ii) The F statistic (with the same df) is now [.0330/(1 − .0330)](137/4) ≈ 1.17, which is even lower than in part (i). None of the t statistics is significant at a reasonable level.
• (iii) We probably should not use the logs, as the logarithm is not defined for firms that have zero for dkr or eps. Therefore, we would lose some firms in the regression.
• (iv) It seems very weak. There are no significant t statistics at the 5% level (against a two-sided alternative), and the F statistics are insignificant in both cases. Plus, less than 4% of the variation in return is explained by the independent variables.

Q4 Wooldridge C4.9 (discrim_c4_9.do)
• (i) The results from the OLS regression, with standard errors in parentheses, are
log(psoda) = −1.46 + .073 prpblck + .137 log(income) + .380 prppov
             (.29)   (.031)         (.027)             (.133)
n = 401, R² = .087.
The p-value for testing H0: β1 = 0 against the two-sided alternative is about .018, so that we reject H0 at the 5% level but not at the 1% level.

• (ii) The correlation is about −.84, indicating a strong degree of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for log(income) is about 5.1, and that for prppov is about 2.86 (two-sided p-value = .004).

• (iii) The OLS regression results when log(hseval) is added are
log(psoda) = −.84 + .098 prpblck − .053 log(income) + .052 prppov + .121 log(hseval)
             (.29)  (.029)         (.038)             (.134)        (.018)
n = 401, R² = .184.
The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about .12 percent. The two-sided p-value is zero to three decimal places.

• (iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the 15% significance level against a two-sided alternative for log(income), and prppov does not have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at the 5% level, because the outcome of the F(2,396) statistic is about 3.52, with p-value = .030. All of the control variables – log(income), prppov and log(hseval) – are highly correlated, so it is not surprising that some are individually insignificant.

• (v) Because the regression in (iii) contains the most controls, log(hseval) is individually significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable. It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if prpblck increases by .10, psoda is estimated to increase by 1%, other factors held fixed.

Q5 Wooldridge 5.2
• This is about the inconsistency of the simple regression of pctstck on funds. A higher tolerance of risk means more willingness to invest in the stock market, so β2 > 0. By assumption, funds and risktol are positively correlated. Now we use equation (5.5), where δ1 > 0: plim(β̃1) = β1 + β2δ1 > β1, so β̃1 has a positive inconsistency (asymptotic bias). This makes sense: if we omit risktol from the regression and it is positively correlated with funds, some of the estimated effect of funds is actually due to the effect of risktol.

Q6 Wooldridge C5.1 (wage1_c5_1.do)
• (i) The estimated equation is
wage = −2.87 + .599 educ + .022 exper + .169 tenure
       (.73)   (.051)      (.012)       (.022)
n = 526, R² = .306, σ̂ = 3.085.

• Below is a histogram of the 526 residuals û_i, i = 1, 2, ..., 526. The histogram uses 27 bins, which is suggested by the formula in the Stata manual for 526 observations. For comparison, the normal distribution that provides the best fit to the histogram is also plotted.

• (ii) With log(wage) as the dependent variable, the estimated equation is
log(wage) = .284 + .092 educ + .0041 exper + .022 tenure
            (.104) (.007)      (.0017)       (.003)
n = 526, R² = .316, σ̂ = .441.
The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below.

• (iii) The residuals from the log(wage) regression appear to be more normally distributed. Certainly the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed. In the wage regression there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.

• The skewness and kurtosis of the residuals from the OLS regression may be used to test the null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null distribution of the Jarque-Bera test statistic (which takes into account both skewness and kurtosis) is approximately chi-squared with 2 degrees of freedom. Hence, at the 5% level, we reject the normality of u whenever the JB statistic is greater than 5.99. In Stata this test is carried out by the command "sktest". With the data in WAGE1.RAW, the JB statistic is extremely large for the wage model and JB = 1059 for the log(wage) model. While normality is rejected for both models, we do see that the JB statistic is much smaller for the log(wage) model [also compare its p-values of the skewness and excess-kurtosis tests against those from the wage model]. Hence, normality appears to be a more reasonable approximation to the distribution of the error term in the log(wage) model.
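The JB statistic described above is straightforward to compute from a set of residuals (a Python sketch; the course itself uses Stata's sktest, and the simulated skewed sample below merely stands in for actual residuals):

```python
import numpy as np
from scipy import stats

def jarque_bera(u):
    """JB = (n/6) * (S^2 + (K - 3)^2 / 4), compared with the chi-squared(2) distribution."""
    n = len(u)
    s = stats.skew(u)
    k = stats.kurtosis(u, fisher=False)  # raw kurtosis, equal to 3 under normality
    return (n / 6) * (s ** 2 + (k - 3) ** 2 / 4)

rng = np.random.default_rng(0)
u = rng.exponential(size=500) - 1        # strongly skewed "residuals" with mean zero
print(jarque_bera(u) > 5.99)             # True: normality rejected at the 5% level
```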


Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the advantages of using the log of a variable in regression? Find the "rules of thumb" for taking logs.
See page 191 of Wooldridge (pp. 198-199 for the 3rd edition).

• Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithm. Do you remember Table 2.3?
Consult Table 2.3.

• How do you compute the change in y caused by Δx when the model is built for log(y)?
A precise computation is given by Equation (6.8) of Wooldridge.

• Why do we need "interaction" terms in regression models?
An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for an example.

• What is the adjusted R-squared? What is the difference between it and the R-squared?
The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added, but the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.

• How do you construct an interval prediction for given x-values?
This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.

• How do you predict y for given x-values when the model is built for log(y)?
Check the list on page 212 (p. 220 for the 3rd edition).

• What is involved in "residual analysis"?
See pages 209-210 (pp. 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 6.4
• (i) Holding all other factors fixed, we have
Δlog(wage) = β1Δeduc + β2Δeduc·pareduc = (β1 + β2 pareduc)Δeduc.
Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if we think a child gets more out of another year of education the more highly educated are the child's parents.
• (ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 − 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)
• (iii) When we add pareduc by itself, the coefficient on the interaction term becomes negative. The t statistic on educ·pareduc is about −1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2 Wooldridge 6.5
• This would make little sense. Performances on math and science exams are measures of outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The science pass rate could itself be a dependent variable in an identical regression equation.

Q3 Wooldridge C6.3 (C6.2 for the 3rd edition) (wage2_c6_3.do)
• (i) Holding exper (and the elements in u) fixed, we have
Δlog(wage) = β1Δeduc + β3Δeduc·exper = (β1 + β3 exper)Δeduc,
or Δlog(wage)/Δeduc = β1 + β3 exper. This is the approximate proportionate change in wage given one more year of education.
• (ii) H0: β3 = 0. If we think that education and experience interact positively – so that people with more experience are more productive when given another year of education – then β3 > 0 is the appropriate alternative.
• (iii) The estimated equation is
log(wage) = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper
            (.24)  (.0174)      (.0200)       (.00153)
n = 935, R² = .135, adjusted R² = .132.
The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favour of H1: β3 > 0 at the 2% level.
• (iv) We rewrite the equation as
log(wage) = β0 + θ1educ + β2exper + β3educ(exper − 10) + u,
and run the regression of log(wage) on educ, exper and educ(exper − 10). We want the coefficient on educ. We obtain θ̂1 ≈ .0761 and se(θ̂1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.

Q4 Wooldridge C6.8 (hprice1_c6_8.do)
• (i) The estimated equation (where price is in dollars) is
price = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
        (29,475.0)  (.642)          (13.24)        (9,010.1)
n = 88, R² = .672, adjusted R² = .661, σ̂ = 59,833.
The predicted price at lotsize = 10,000, sqrft = 2,300 and bdrms = 4 is about $336,714.
• (ii) The regression is price_i on (lotsize_i − 10,000), (sqrft_i − 2,300) and (bdrms_i − 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.
• (iii) We must use equation (6.36) to obtain the standard error of ê0, and then use equation (6.37) (assuming that price is normally distributed). From the regression in part (ii), se(ŷ0) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile in the t(84) distribution gives the 95% CI for price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval, but we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
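The standard-error and interval calculations in (iii) can be reproduced directly (a Python sketch; the inputs are the values reported above):

```python
se_yhat0 = 7374.5     # se of the predicted mean, from the part (ii) regression
sigma_hat = 59833.0   # regression standard error
t975 = 1.99           # approximate 97.5th percentile of the t(84) distribution
price0_hat = 336706.7

se_e0 = (se_yhat0 ** 2 + sigma_hat ** 2) ** 0.5  # equation (6.36): prediction-error se
lo = price0_hat - t975 * se_e0                   # equation (6.37): 95% prediction interval
hi = price0_hat + t975 * se_e0
print(round(se_e0), round(lo), round(hi))  # 60286 216738 456675
```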


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

• What is a qualitative factor? Give some examples.

This should be easy.

• How do you use dummy variables to represent qualitative factors?

A factor with two levels (or values) is easily represented by one dummy variable. A factor with more than two levels (or values) needs more than one dummy; for example, a factor with 5 values requires 4 dummy variables.
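A minimal sketch of this coding (the level names are hypothetical): a 5-level factor becomes 4 dummies, with the omitted level serving as the base group.

```python
levels = ["L1", "L2", "L3", "L4", "L5"]  # hypothetical 5-level factor; L1 is the base

def to_dummies(value):
    # one dummy per non-base level; the base group is all zeros
    return [1 if value == lev else 0 for lev in levels[1:]]

print(to_dummies("L3"))  # [0, 1, 0, 0]
print(to_dummies("L1"))  # [0, 0, 0, 0]  (base group)
```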

• How do you use dummy variables to represent an ordinal variable?

An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

• How do you test for differences in regression functions across different groups?

We generally use the F test on properly defined restricted and unrestricted models. The Chow test is a special case of the F test, where the null hypothesis is no difference across groups.

• What can you achieve by interacting group dummy variables with other regressors?

The interaction allows the slope coefficients to differ across groups.

• What is "program evaluation"?

It assesses the effectiveness of a program by checking the differences in regression coefficients for the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary?

The PRF is the conditional probability of "success" for given explanatory variables. The SRF is the predicted conditional probability of "success" for given explanatory variables.

• What are the shortcomings of the LPM?

The major drawback of the LPM is that the predicted probability of success may be outside the interval [0, 1].

• Does MLR5 hold for the LPM?

No. As indicated in the textbook and lecture, the conditional variance of the disturbance is not constant and depends on the explanatory variables: Var(u|x) = E(y|x)[1 − E(y|x)].

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 7.4

• (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant.

• (ii) 100·[exp(−.283) − 1] ≈ −24.7%, and so the estimate is somewhat smaller in magnitude.

• (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is

log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility + δ3trans + u,

where trans is a dummy variable for the transportation industry. Now the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.

Q2 Wooldridge 7.6

• In Section 3.3, in particular in the discussion surrounding Table 3.2, we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3 Wooldridge 7.9

• (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.

• (ii) Setting f0(z*) = f1(z*) gives β0 + β1z* = (β0 + δ0) + (β1 + δ1)z*. Therefore, provided δ1 ≠ 0, we have z* = −δ0/δ1. Clearly, z* is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.

• (iii) Using part (ii), we have totcoll* = .357/.030 = 11.9 years (totcoll = total college years).

• (iv) The estimated years of college at which women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed, not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.

Q4 Wooldridge C7.13 (apple_c7_13.do)

• (i) 412/660 ≈ .624.

• (ii) The OLS estimates of the LPM are

ecobuy-hat = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize
            (.165) (.109)        (.132)       (.00053)        (.013)
             + .025 educ − .00050 age
            (.008)       (.00125)
n = 660, R² = .110.

If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)

• (iii) The F test, with 4 and 653 df, is 4.43, with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children (other factors equal), the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in the functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule (predict one when ecobuy-hat ≥ .5 and zero otherwise) gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
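The prediction-rule fractions in part (vi) follow from simple counts (the counts come from the solution above):

```python
correct0, n0 = 102, 248  # correct predictions among ecobuy = 0
correct1, n1 = 340, 412  # correct predictions among ecobuy = 1

frac0 = correct0 / n0                        # fraction correct when ecobuy = 0
frac1 = correct1 / n1                        # fraction correct when ecobuy = 1
overall = (correct0 + correct1) / (n0 + n1)  # overall fraction correct

print(round(frac0, 3), round(frac1, 3), round(overall, 2))  # 0.411 0.825 0.67
```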


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

• What is heteroskedasticity in a regression model?

When homoskedasticity (MLR5) fails, and the variance of the disturbance (u) changes across the observation index (i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with OLS under heteroskedasticity?

The usual t-stat and F-stat are not valid, because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator is no longer (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA?

These are corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option "robust" with the "regress" command, e.g., regress lwage educ, robust.

• How do you detect if there is heteroskedasticity?

The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

• If heteroskedasticity is present in a known form, how would you estimate the model?

In this case, the WLS estimator should be used, which is more efficient than the OLS estimator. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model?

If there is strong evidence for heteroskedasticity, the FGLS estimator should be used, which is based on an exponential functional form. See Section 8.4 for details.

• What are the steps in the FGLS estimation?

You should summarise from Section 8.4.

• How would you handle the heteroskedasticity of the LPM?

The heteroskedasticity functional form is known for the LPM, so the WLS can be used in principle. However, because the LPM can produce predicted probabilities outside the interval (0, 1), the known functional form p(x)[1 − p(x)] may not be useful for the WLS. In that case, it may be necessary to use FGLS for the LPM.
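A minimal sketch of the WLS weighting for the LPM (the clipping bound eps is an illustrative choice, not from the text): each observation is weighted by 1/ĥ_i with ĥ_i = p̂_i(1 − p̂_i), after forcing fitted probabilities inside (0, 1).

```python
def lpm_weight(p_hat, eps=0.01):
    # clip fitted probabilities that fall outside (0, 1); eps is illustrative
    p = min(max(p_hat, eps), 1 - eps)
    h = p * (1 - p)  # Var(u|x) = p(x)[1 - p(x)] for the LPM
    return 1.0 / h   # WLS weight

print(lpm_weight(0.5))                       # 4.0
print(lpm_weight(1.05) == lpm_weight(0.99))  # True: clipped to 1 - eps
```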

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 8.1

• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2 Wooldridge 8.2

• With Var(u|inc, price, educ, female) = σ²inc², we have h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc:

beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.

Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.
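The division by inc can be sketched as a data transformation (the variable names are illustrative); OLS on the transformed variables, with no separate intercept, is the WLS estimator:

```python
def transform(obs):
    # obs: one observation as a dict; divide everything through by inc
    inc = obs["inc"]
    return {
        "beer_t": obs["beer"] / inc,  # transformed dependent variable
        "x0_t": 1.0 / inc,            # its coefficient is beta0
        "const_t": 1.0,               # the new "intercept" picks up beta1
        "price_t": obs["price"] / inc,
        "educ_t": obs["educ"] / inc,
        "female_t": obs["female"] / inc,
    }

row = transform({"beer": 10.0, "inc": 2.0, "price": 4.0, "educ": 12.0, "female": 1.0})
print(row["beer_t"], row["x0_t"], row["educ_t"])  # 5.0 0.5 6.0
```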

Q3 Wooldridge 8.3

• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has a smaller bias. It is possible that WLS would have more bias than OLS, or less. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4 Wooldridge 8.5

• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.

• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.

• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.

• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.

• (v) We just plug the values of the independent variables into the OLS regression line:

smokes-hat = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052.

Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)

Q5 Wooldridge C8.10 (401ksubs_c8_10.do)

• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:

e401k-hat = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male
           (.081) (.0006)     (.000005)      (.0039)     (.00005)     (.0121)
           [.079] [.0006]     [.000005]      [.0038]     [.00004]     [.0121]
n = 9,275, R² = .094.

There are no important differences; if anything, the robust standard errors are smaller.

• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model u² = δ0 + δ1p(x) + δ2p(x)² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = −1. Remember that, for the LPM, the fitted values ŷ are estimates of p(x). So, when we run the regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.

• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on e401k-hat is about 1.010, the coefficient on e401k-hat² is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are

e401k-hat = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male
           (.076) (.0005)     (.000004)      (.0037)     (.00004)     (.0117)
n = 9,275, R² = .108.

There are no important differences from the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings

• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)

• What is functional form misspecification? What is its major consequence in regression analysis?

As the word "misspecification" suggests, it is a mis-specified model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification?

The RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). The RESET is based on the F test for the joint significance of the squared and cubed predicted value (y-hat) in an "expanded" model that includes all the original regressors as well as the squared and cubed y-hat, where the latter represent possible nonlinear functions of the regressors.
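The F statistic behind RESET (and the nested-model tests below) can be sketched as a generic SSR-form F; for RESET there are q = 2 restrictions, one each for the squared and cubed ŷ terms. The SSR values below are hypothetical, for illustration only:

```python
def f_stat(ssr_r, ssr_ur, q, df_ur):
    # SSR form of the F statistic: q restrictions,
    # df_ur = n - k - 1 degrees of freedom in the unrestricted model
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

# hypothetical SSRs: restricted 100, unrestricted 80, q = 2, df = 44
print(round(f_stat(100.0, 80.0, 2, 44), 1))  # 5.5
```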

• What are nested models? And nonnested models?

Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another?

The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test two nonnested models against each other?

There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models, and decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis?

A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved variable and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, depends only on the proxy.

• Can you analyse the consequences of measurement errors?

With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis?

Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?

Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR1, MLR3, or MLR4, and hence does not introduce bias in the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable, and causes bias in the OLS estimation.

• What are outliers and influential observations?

Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS parameter estimates. They are called influential observations if the estimation results when including them are significantly different from the results when excluding them.

• Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1, 2, …, n, where the "slope parameter" contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR1-5 hold for yi = α + βxi + ui, E(bi|xi) = 0, and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference?

The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui). Given the stated assumptions, MLR1-4 hold for this model, and the OLS estimator of the "slope parameter" is unbiased (from β). However, MLR5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui|xi), depends on xi, and so the usual OLS standard errors will not be valid.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 9.1

• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test: F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] ≈ 2.97. With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
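The R-squared form of the F statistic used above, reproduced numerically:

```python
r2_ur, r2_r = 0.375, 0.353  # R-squared with and without ceoten^2, comten^2
q, df_ur = 2, 177 - 8       # 2 restrictions; unrestricted degrees of freedom

F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)
print(round(F, 2))  # 2.97
```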

Q2 Wooldridge 9.3

• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.

• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So, when we control for the poverty rate, the effect of spending falls.

• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1), we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than is spending per student or other school characteristics.

Q3 Wooldridge 9.5

• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report their crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do)

• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates using the 1988 data are

log(scrap) = .409 + .057 grant
            (.241) (.406)
n = 54, R² = .0004.

The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain

log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
              (.089) (.147)         (.044)
n = 54, R² = .873,

where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.

• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller than before, but it is still pretty significant.


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

• What are the main features of time series data? How do time series data differ from cross-sectional data?

The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations are dependent on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that, for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation?

A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (and hence has a distribution). The SP can be viewed as a "random curve", with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation or autocorrelation?

The serial (or auto) correlation of a time series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted Corr(yt, yt−h) = Cov(yt, yt−h)/[Var(yt)Var(yt−h)]^(1/2).
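A sample analogue of this definition can be sketched in a few lines (a plain sample autocorrelation at lag h, using the common convention of dividing both the covariance and the variance by n):

```python
def autocorr(y, h):
    # sample autocorrelation of the series y at lag h
    n = len(y)
    mean = sum(y) / n
    cov = sum((y[t] - mean) * (y[t - h] - mean) for t in range(h, n)) / n
    var = sum((v - mean) ** 2 for v in y) / n
    return cov / var

# a trending series is positively autocorrelated at lag 1
print(round(autocorr([1, 2, 3, 4, 5], 1), 1))  # 0.4
```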

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)?

Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.

• What are TS1-6 (the assumptions about time series regression)? How do they differ from the assumptions in MLR1-6?

TS1-6 include: 1) linearity in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6 mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of the OLS standard errors, t-stats and F-stats for statistical inference, TS3-TS6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors?

Strictly exogenous regressors satisfy assumption TS3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS3.

• What is a trending time series? What is a time trend?

A time series is trending if it has the tendency to grow (or shrink) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results?

First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious because it is purely induced by the time trends and has little to say about the true relationship between the two time series.

• Why would you include a time trend in regressions with trending variables?

Including a time trend in regressions allows one to study the relationship among the time series variables that is not induced by the time trends.

• What is seasonality in a time series? Give an example of a time series variable with seasonality.

Seasonality in a time series is fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have day-of-week seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model?

We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
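A sketch of this construction (Q1 as the base quarter):

```python
def quarter_dummies(q):
    # q in {1, 2, 3, 4}; the first quarter is the base group
    return {"Q2": int(q == 2), "Q3": int(q == 3), "Q4": int(q == 4)}

print(quarter_dummies(3))  # {'Q2': 0, 'Q3': 1, 'Q4': 0}
print(quarter_dummies(1))  # {'Q2': 0, 'Q3': 0, 'Q4': 0}  (base quarter)
```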

Problem Set

Q1 Wooldridge 10.1

• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated, such as stock returns, do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no serial correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2

• We follow the hint and write gGDP_{t−1} = α0 + δ0 int_{t−1} + δ1 int_{t−2} + u_{t−1}, and plug this into the right-hand side of the int_t equation:

int_t = γ0 + γ1(α0 + δ0 int_{t−1} + δ1 int_{t−2} + u_{t−1} − 3) + v_t
      = (γ0 + γ1α0 − 3γ1) + γ1δ0 int_{t−1} + γ1δ1 int_{t−2} + γ1u_{t−1} + v_t.

Now, by assumption, u_{t−1} has zero mean and is uncorrelated with all the right-hand-side variables in the previous equation, except itself of course. So Cov(int_t, u_{t−1}) = E(int_t · u_{t−1}) = γ1E(u_{t−1}²) > 0, because γ1 > 0. If σ² = E(u_t²) for all t, then Cov(int_t, u_{t−1}) = γ1σ². This violates the strict exogeneity assumption TS3: while u_t is uncorrelated with int_t, int_{t−1}, and so on, u_t is correlated with int_{t+1}.


Q3 Wooldridge 10.7

• (i) pe_{t−1} and pe_{t−2} must be increasing by the same amount as pe_t.

• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pe_{t−2}, pe_{t−1}, and pe_t by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)

• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is

i3_t = 1.61 + .343 inf_t + .382 inf_{t−1} − .190 def_t + .569 def_{t−1}
      (.40)  (.125)       (.134)          (.221)       (.197)
n = 55, R² = .685, adj. R² = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than the .606 we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inf_{t−1}.
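The LRP computation, reproduced (coefficients from the equation in part (ii)):

```python
b_inf0, b_inf1 = 0.343, 0.382  # coefficients on inf_t and inf_{t-1}

lrp = b_inf0 + b_inf1  # long-run propensity: sum of the lag coefficients
print(round(lrp, 3))   # 0.725, vs .606 from the static model
```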

• (iv) The F statistic for the joint significance of inf_{t−1} and def_{t−1} is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.



Week 4 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

• What do we mean when we say "regress wage on educ and expr"?

Use OLS to estimate the multiple linear regression model wage = β0 + β1educ + β2expr + u.

• Why, and under what circumstance, do we need to "control for" expr in the regression model in order to quantify the effect of educ on wage?

We need to control for expr in the model when the partial, or ceteris paribus, effect of educ on wage is the parameter of interest, and expr and educ are correlated.

• What is the bias of an estimator?

bias = E(β̂j) − βj.

• What is the "omitted variable bias"?

It is the bias resulting from omitting relevant variables that are correlated with the explanatory variables in a regression model.

• What is the consequence of adding an irrelevant variable to a regression model?

Including an irrelevant variable will generally increase the variance of the OLS estimators.

• What is the requirement of the ZCM assumption, in your own words?

You must do this by yourself.

• Why is assuming E(u) = 0 not restrictive when an intercept is included in the regression model?

When the mean value of the disturbance is not zero, you can always define the new disturbance as the original disturbance minus the mean value, and define the new intercept as the original intercept plus the mean value.

• In terms of notation, why do we need two subscripts for independent variables? In our notation, the first subscript indexes observations (1st obs, 2nd obs, etc.) and the second subscript indexes variables (1st variable, 2nd variable, etc.).

• Using OLS to estimate a multiple regression model with k independent variables and an intercept, how many first-order conditions are there? k + 1.

• How do you know that the OLS estimators are linear combinations of the observations on the dependent variable? By the first-order conditions, we see that the OLS estimators are the solution to k + 1 linear equations with k + 1 unknowns. The “right-hand sides” of the equations are linear functions of the observations on the dependent variable (say, y). Hence the solution is a set of linear combinations of the y-observations.

• What is the Gauss-Markov theorem? Read the textbook.

• Try to explain why R² never decreases (and is likely to increase) when additional explanatory variables are added to the regression model. First, the smaller the SSR, the greater the R-squared. OLS chooses parameter estimates to minimise the SSR. With extra explanatory variables, the R-squared does not decrease because the SSR does not increase, which is true because OLS has a larger parameter space from which to choose estimates (more possibilities to make the SSR small).

• What is an endogenous explanatory variable? An exogenous explanatory variable? See the top of page 88.

• What is multicollinearity, and what is its likely effect on the OLS estimators? Read pages 95-99.


Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 3.2
• (i) Yes. Because of budget constraints, it makes sense that the more siblings there are in a family, the less education any one child in the family has. To find the increase in the number of siblings that reduces predicted education by one year, we solve 1 = .094(Δsibs), so Δsibs = 1/.094 ≈ 10.6.

• (ii) Holding sibs and feduc fixed, one more year of mother’s education implies .131 years more of predicted education. So if a mother has four more years of education, her son is predicted to have about half a year (.524) more years of education.

• (iii) Since the number of siblings is the same, but meduc and feduc are both different, the coefficients on meduc and feduc both need to be accounted for. The predicted difference in education between B and A is .131(4) + .210(4) = 1.364.
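The three calculations above can be reproduced with a short sketch (coefficient values taken from the estimated equation in the problem; the variable names are for illustration only):

```python
# Sketch: predicted-education arithmetic for Wooldridge 3.2.
b_sibs, b_meduc, b_feduc = -0.094, 0.131, 0.210

# (i) extra siblings needed to reduce predicted education by one year
delta_sibs = 1 / 0.094
# (ii) effect of four more years of mother's education
effect_meduc = b_meduc * 4
# (iii) B's parents each have four more years of education than A's
diff_BA = b_meduc * 4 + b_feduc * 4

print(round(delta_sibs, 1), round(effect_meduc, 3), round(diff_BA, 3))
```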

Q2 Wooldridge 3.10
• (i) Because x1 is highly correlated with x2 and x3, and these latter variables have large partial effects on y, the simple and multiple regression coefficients on x1 can differ by large amounts. We have not done this case explicitly, but given equation (3.46) and the discussion with a single omitted variable, the intuition is pretty straightforward.

• (ii) Here we would expect β̃1 (the simple regression estimate) and β̂1 (the multiple regression estimate) to be similar (subject, of course, to what we mean by “almost uncorrelated”). The amount of correlation between x2 and x3 does not directly affect the multiple regression estimate on x1 if x1 is essentially uncorrelated with x2 and x3.

• (iii) In this case we are (unnecessarily) introducing multicollinearity into the regression: x2 and x3 have small partial effects on y, and yet x2 and x3 are highly correlated with x1. Adding x2 and x3 likely increases the standard error of the coefficient on x1 substantially, so se(β̂1) is likely to be much larger than se(β̃1).

• (iv) In this case, adding x2 and x3 will decrease the residual variance without causing much collinearity (because x1 is almost uncorrelated with x2 and x3), so we should see se(β̂1) smaller than se(β̃1). The amount of correlation between x2 and x3 does not directly affect se(β̂1).

Q3 Wooldridge C3.2 (hprice1_ch03.do)
• (i) The estimated equation is
price-hat = −19.32 + .128 sqrft + 15.20 bdrms, n = 88, R² = .632.

• (ii) Holding square footage constant, Δ(predicted price) = 15.20 Δbdrms. Hence the predicted price increases by 15.20, which means $15,200.

• (iii) Now Δ(predicted price) = .128 Δsqrft + 15.20 Δbdrms = .128(140) + 15.20(1) = 33.12, that is, $33,120. Because the house size is also increasing, the effect is larger than that in (ii).

• (iv) About 63.2%.

• (v) The predicted price is −19.32 + .128(2438) + 15.20(4) = 353.544, or $353,544.

• (vi) The residual = 300 − 353.544 = −53.544 (price is in thousands of dollars). The buyer under-paid, judged by the predicted price. But, of course, there are many other features of a house (some that we cannot even measure) that affect price, and we have not controlled for those.

Q4 Wooldridge C3.6 (wage2_ch03_c3_6.do)
• (i) The slope coefficient from regressing IQ on educ is δ̂1 = 3.53383.
• (ii) The slope coefficient from log(wage) on educ is β̃1 = .05984.
• (iii) The slope coefficients from log(wage) on educ and IQ are β̂1 = .03912 and β̂2 = .00586, respectively.
• (iv) The computed β̃1 = β̂1 + β̂2δ̂1 = .03912 + 3.53383(.00586) ≈ .05984.
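The identity in (iv) can be verified numerically (a sketch using the slope estimates quoted in parts (i)-(iii); small rounding differences are expected):

```python
# Check of the omitted-variable algebra: the simple-regression slope equals the
# multiple-regression slope on educ plus the IQ slope times the slope from the
# auxiliary regression of IQ on educ.
delta1 = 3.53383     # slope of IQ on educ, part (i)
beta1_hat = 0.03912  # educ slope in the multiple regression, part (iii)
beta2_hat = 0.00586  # IQ slope in the multiple regression, part (iii)

beta1_tilde = beta1_hat + beta2_hat * delta1
print(round(beta1_tilde, 4))  # 0.0598, matching the .05984 in part (ii) up to rounding
```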


Week 5 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the CLM assumptions? These are MLR1, MLR2, MLR3 and MLR6. MLR6 is a very strong assumption that implies both MLR4 and MLR5.

• What is the sampling distribution of the OLS estimators under the CLM assumptions? Under the CLM assumptions, the OLS estimators follow the normal distribution (conditional on the explanatory variables).

• What are the standard errors of the OLS estimators? The standard error is the square root of the estimated variance of the estimator.

• What is the null hypothesis about a parameter? In this context, the null hypothesis is a statement that assigns a known (hypothesised) value to the parameter of interest. Usually the null is a maintained proposition (from economic theories or experience) that will only be rejected on very strong evidence.

• What is a one-tailed (two-tailed) alternative hypothesis? This should be clear once you finish your reading.

• In testing hypotheses, what is a Type 1 (Type 2) error? What is the level of significance? A Type 1 error is the rejection of a true null. A Type 2 error is the non-rejection of a false null. The level of significance is the probability of a Type 1 error.

• The decision rule we use can be stated as “reject the null if the t-statistic exceeds the critical value”. How is the critical value determined? The critical value is determined by the level of significance (or the level of confidence, in the case of confidence intervals) and the distribution of the test statistic. For example, for the t-stat we use the t-distribution when the sample size is small, and the normal distribution when the sample size is large (df > 120).

• Justify the statement “given the observed test statistic, the p-value is the smallest significance level at which the null hypothesis would be rejected”. If you used the observed t-stat as your critical value, you would just reject the null (because the t-stat is equal to this value). This “critical value” (= the t-stat value) implies a level of significance, say p, which is exactly the p-value. For any significance level smaller than p, the corresponding critical value is more extreme than the t-stat value and does not lead to the rejection of the null. Hence p is the smallest significance level at which the null would be rejected.

• What is the 90% confidence interval for a parameter? It is an interval [L, U] defined by two sample statistics: the lower bound L and the upper bound U. The probability that [L, U] covers the parameter is .90.

• In constructing a confidence interval for a parameter, what is the level of confidence? It is the probability that the CI covers the parameter.

• When the level of confidence increases, how does the width of the confidence interval change (holding other things fixed)? The width increases.

• Try to convince yourself that the event “the 90% confidence interval covers a hypothesised value of the parameter” is the same as the event “the null of the parameter being the hypothesised value cannot be rejected in favour of the two-tailed alternative at the 10% level of significance”. It becomes apparent if you start with P(|t-stat| ≤ c) = .90 and disentangle the expression for the t-stat.


Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 4.1
• (i) and (iii) generally cause the t statistics not to have a t distribution under H0. Homoskedasticity is one of the CLM assumptions. An important omitted variable violates Assumption MLR4 (and hence MLR6) in general. The CLM assumptions contain no mention of the sample correlations among independent variables, except to rule out the case where the correlation is one (MLR3).

Q2 Wooldridge 4.2
• (i) H0: β3 = 0. H1: β3 > 0.

• (ii) The proportionate effect on predicted salary is .00024(50) = .012. To obtain the percentage effect, we multiply this by 100: 1.2%. Therefore, a 50-point ceteris paribus increase in ros is predicted to increase salary by only 1.2%. Practically speaking, this is a very small effect for such a large change in ros.

• (iii) The 10% critical value for a one-tailed test, using df = ∞, is obtained from Table G.2 as 1.282. The t statistic on ros is .00024/.00054 ≈ .44, which is well below the critical value. Therefore, we fail to reject H0 at the 10% significance level.

• (iv) Based on this sample, the estimated ros coefficient appears to be different from zero only because of sampling variation. On the other hand, including ros may not be causing any harm; it depends on how correlated it is with the other independent variables (although these are very significant even with ros in the equation).

Q3 Wooldridge 4.5
• (i) .412 ± 1.96(.094), or about [.228, .596].
• (ii) No, because the value .4 is well inside the 95% CI.
• (iii) Yes, because 1 is well outside the 95% CI.
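The interval arithmetic in (i) and the decisions in (ii)-(iii) can be sketched as follows (a hypothetical helper; the estimate, standard error, and critical value are those quoted above):

```python
# Sketch: 95% CI as estimate +/- critical value * se, then check whether the
# hypothesised values fall inside the interval.
def conf_int(estimate, se, crit=1.96):
    return (estimate - crit * se, estimate + crit * se)

lo, hi = conf_int(0.412, 0.094)
print(round(lo, 3), round(hi, 3))  # 0.228 0.596
print(lo < 0.4 < hi)               # True: cannot reject beta = .4
print(lo < 1.0 < hi)               # False: reject beta = 1
```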

Q4 Wooldridge C4.8 (401ksubs_ch04.do)
• (i) There are 2,017 single people in the sample of 9,275.

• (ii) The estimated equation is
nettfa-hat = −43.04 + .799 inc + .843 age
(4.08) (.060) (.092)
n = 2017, R² = .119.
The coefficient on inc indicates that one more dollar in income (holding age fixed) is reflected in about 80 more cents in predicted nettfa; no surprise there. The coefficient on age means that, holding income fixed, if a person gets another year older, his/her nettfa is predicted to increase by about $843. (Remember, nettfa is in thousands of dollars.) Again, this is not surprising.

• (iii) The intercept is not very interesting, as it gives the predicted nettfa for inc = 0 and age = 0. Clearly, there is no one with even close to these values in the relevant population.

• (iv) The t statistic is (.843 − 1)/.092 ≈ −1.71. Against the one-sided alternative H1: β2 < 1, the p-value is about .044. Therefore, we can reject H0: β2 = 1 at the 5% significance level (against the one-sided alternative).

• (v) The slope coefficient on inc in the simple regression is about .821, which is not very different from the .799 obtained in part (ii). As it turns out, the correlation between inc and age in the sample of single people is only about .039, which helps explain why the simple and multiple regression estimates are not very different; see also the text.
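The t statistic in (iv) is a one-line computation; a sketch (estimate and standard error from the equation above):

```python
# Sketch: t statistic for H0: beta2 = 1 against H1: beta2 < 1.
def t_stat(estimate, hypothesized, se):
    return (estimate - hypothesized) / se

t = t_stat(0.843, 1.0, 0.092)
print(round(t, 2))  # -1.71
# A one-sided 5% critical value is about -1.645, so t < -1.645 and we reject H0.
```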


Week 6 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• How would you test hypotheses about a single linear combination of parameters? We may re-parameterise the regression model to isolate the “single linear combination”. Then OLS on the re-parameterised model will provide the estimate of the “single linear combination” and the related standard error. See the example in the textbook and lecture slides.

• What are exclusion restrictions for a regression model? Exclusion restrictions are the null hypothesis that a group of x-variables have zero coefficients in the regression model.

• What are restricted and unrestricted models? When the null hypothesis is a set of restrictions on the parameters, the regression under the null is called the restricted model, while the regression under the alternative (which simply states that the null is false) is called the unrestricted model.

• How do you compute the F-statistic, given that you have the SSRs? The general F-stat is based on the relative difference between the SSRs under the restricted and unrestricted models: F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)], where you should be able to explain the meanings of the various symbols.
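The SSR-form formula can be sketched as a small helper; the numbers used to exercise it are the SSR values quoted for Wooldridge 4.6(ii) in this week's problem set:

```python
# Sketch: F = [(SSRr - SSRur)/q] / [SSRur/(n - k - 1)], where q is the number
# of restrictions and df_ur = n - k - 1 is the unrestricted degrees of freedom.
def f_statistic(ssr_r, ssr_ur, q, df_ur):
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

F = f_statistic(209448.99, 165644.51, q=2, df_ur=86)
print(round(F, 2))  # 11.37
```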

• What are general linear restrictions on parameters? These are linear equations on the parameters. For example, β1 − 2β2 + 3β3 = 0.

• What is the test for the overall significance of a regression? It is the F-test for the null hypothesis that all the coefficients on the x-variables are zero.

• How would you report your regression results? See the guidelines in Section 4.6 of the textbook.

• Why would you care about the asymptotic properties of the OLS estimators? When MLR6 does not hold, the finite-sample distribution of the OLS estimators is not available. For reasonably large samples (large n), we use the asymptotic distribution of the OLS estimators, which is known from studying the asymptotic properties, to approximate the finite-sample distribution of the OLS estimators, which is unknown.

• Comparing the inference procedures in Chapter 5 with those in Chapter 4, can you list the similarities and differences? It is beneficial to make such a list by yourself.

• Under MLR1-MLR5, the OLS estimators are consistent, asymptotically normal, and asymptotically efficient. Try to explain these properties in your own words. This is really a test of your understanding of these notions in econometrics.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 4.6
• (i) With df = n − 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because each test is two-tailed, the critical value is 1.987. The t statistic for H0: β0 = 0 is about −.89, which is much less than 1.987 in absolute value. Therefore, we fail to reject H0: β0 = 0. The t statistic for H0: β1 = 1 is (.976 − 1)/.049 ≈ −.49, which is even less significant. (Remember, we reject H0 in favor of H1 in this case only if |t| > 1.987.)

• (ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions, and the df in the unrestricted model is 86. We are given SSRr = 209,448.99 and SSRur = 165,644.51. Therefore,
F = [(209,448.99 − 165,644.51)/2] / [165,644.51/86] ≈ 11.37,
which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is 4.85.

• (iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions, and there are 88 − 5 = 83 df in the unrestricted model. The F statistic is F = [(.829 − .820)/3] / [(1 − .829)/83] ≈ 1.46. The 10% critical value (again using 90 denominator df in Table G.3a) is 2.15, so we fail to reject H0 at even the 10% level. In fact, the p-value is about .23.

• (iv) If heteroskedasticity were present, Assumption MLR5 would be violated, and the F statistic would not have an F distribution under the null hypothesis. Therefore, comparing the F statistic against the usual critical values, or obtaining the p-value from the F distribution, would not be especially meaningful.

Q2 Wooldridge 4.8
• (i) We use Property VAR.3 from Appendix B:
Var(β̂1 − 3β̂2) = Var(β̂1) + 9 Var(β̂2) − 6 Cov(β̂1, β̂2).

• (ii) t = (β̂1 − 3β̂2 − 1)/se(β̂1 − 3β̂2), so we need the standard error of β̂1 − 3β̂2.

• (iii) Because θ = β1 − 3β2, we can write β1 = θ + 3β2. Plugging this into the population model gives
y = β0 + (θ + 3β2)x1 + β2x2 + β3x3 + u = β0 + θx1 + β2(3x1 + x2) + β3x3 + u.
This last equation is what we would estimate by regressing y on x1, 3x1 + x2, and x3. The coefficient and standard error on x1 are what we want.
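A numerical illustration of the reparameterisation in (iii), with made-up parameter values (not from the text): with noiseless data generated from y = b0 + b1·x1 + b2·x2 + b3·x3, regressing y on x1, (3x1 + x2), x3 recovers θ = b1 − 3b2 as the coefficient on x1. A 4-point exact linear solve stands in for OLS here, since with zero error they coincide.

```python
# Sketch: Gaussian elimination with partial pivoting for a small square system.
def solve(A, b):
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

b0, b1, b2, b3 = 1.0, 2.0, 0.5, -1.0                 # hypothetical true parameters
data = [(1, 2, 3), (2, 1, 0), (0, 4, 1), (3, 3, 2)]  # four (x1, x2, x3) points
y = [b0 + b1*x1 + b2*x2 + b3*x3 for x1, x2, x3 in data]

# reparameterised design: intercept, x1, 3*x1 + x2, x3
A = [[1, x1, 3*x1 + x2, x3] for x1, x2, x3 in data]
coef = solve(A, y)
print(round(coef[1], 6))  # coefficient on x1 = theta = b1 - 3*b2 = 0.5
```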

Q3 Wooldridge 4.10
• (i) We need to compute the F statistic for the overall significance of the regression, with n = 142 and k = 4: F = [.0395/(1 − .0395)](137/4) ≈ 1.41. The 5% critical value with 4 and 120 df is 2.45, which is well above the value of F. Therefore, we fail to reject H0: β1 = β2 = β3 = β4 = 0 at the 10% level. No explanatory variable is individually significant at the 5% level. The largest absolute t statistic is on dkr, t_dkr ≈ 1.60, which is not significant at the 5% level against a two-sided alternative.

• (ii) The F statistic (with the same df) is now [.0330/(1 − .0330)](137/4) ≈ 1.17, which is even lower than in part (i). None of the t statistics is significant at a reasonable level.

• (iii) We probably should not use the logs, as the logarithm is not defined for firms that have zero for dkr or eps. Therefore, we would lose some firms in the regression.

• (iv) It seems very weak. There are no significant t statistics at the 5% level (against a two-sided alternative), and the F statistics are insignificant in both cases. Plus, less than 4% of the variation in return is explained by the independent variables.
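The overall-significance F statistics in (i) and (ii) use the R-squared form with the restricted R-squared equal to zero; a sketch:

```python
# Sketch: overall-significance F statistic, F = [R2/k] / [(1 - R2)/(n - k - 1)].
def overall_f(r2, n, k):
    return (r2 / k) / ((1 - r2) / (n - k - 1))

F1 = overall_f(0.0395, 142, 4)
F2 = overall_f(0.0330, 142, 4)
print(round(F1, 2), round(F2, 2))  # 1.41 1.17
```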

Q4 Wooldridge C4.9 (discrim_c4_9.do)
• (i) The results from the OLS regression, with standard errors in parentheses, are
log(psoda)-hat = −1.46 + .073 prpblck + .137 log(income) + .380 prppov
(.29) (.031) (.027) (.133)
n = 401, R² = .087.

• The p-value for testing H0: β1 = 0 against the two-sided alternative is about .018, so we reject H0 at the 5% level but not at the 1% level.

• (ii) The correlation is about −.84, indicating a strong degree of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for log(income) is about 5.1, and that for prppov is about 2.86 (two-sided p-value = .004).

• (iii) The OLS regression results when log(hseval) is added are
log(psoda)-hat = −.84 + .098 prpblck − .053 log(income) + .052 prppov + .121 log(hseval)
(.29) (.029) (.038) (.134) (.018)
n = 401, R² = .184.
The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about .12 percent. The two-sided p-value is zero to three decimal places.

• (iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the 15% significance level against a two-sided alternative for log(income), and prppov does not have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at the 5% level, because the outcome of the F(2,396) statistic is about 3.52 with p-value = .030. All of the control variables, log(income), prppov and log(hseval), are highly correlated, so it is not surprising that some are individually insignificant.

• (v) Because the regression in (iii) contains the most controls, log(hseval) is individually significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable. It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if prpblck increases by .10, psoda is estimated to increase by 1%, other factors held fixed.

Q5 Wooldridge 5.2
• This is about the inconsistency of the simple regression of pctstck on funds. A higher tolerance of risk means more willingness to invest in the stock market, so β2 > 0. By assumption, funds and risktol are positively correlated. Now we use equation (5.5), where δ1 > 0: plim(β̃1) = β1 + β2δ1 > β1, so β̃1 has a positive inconsistency (asymptotic bias). This makes sense: if we omit risktol from the regression and it is positively correlated with funds, some of the estimated effect of funds is actually due to the effect of risktol.

Q6 Wooldridge C5.1 (wage1_c5_1.do)
• (i) The estimated equation is
wage-hat = −2.87 + .599 educ + .022 exper + .169 tenure
(.73) (.051) (.012) (.022)
n = 526, R² = .306, σ̂ = 3.085.

• Below is a histogram of the 526 residuals û_i, i = 1, 2, ..., 526. The histogram uses 27 bins, which is suggested by the formula in the Stata manual for 526 observations. For comparison, the normal distribution that provides the best fit to the histogram is also plotted.


• (ii) With log(wage) as the dependent variable, the estimated equation is
log(wage)-hat = .284 + .092 educ + .0041 exper + .022 tenure
(.104) (.007) (.0017) (.003)
n = 526, R² = .316, σ̂ = .441.
The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below.

• (iii) The residuals from the log(wage) regression appear to be more normally distributed. Certainly the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed to the left. In the wage regression there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.

• The skewness and kurtosis of the residuals from the OLS regression may be used to test the null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null distribution of the Jarque-Bera test statistic (which takes into account both skewness and kurtosis) is approximately chi-squared with 2 degrees of freedom. Hence, at the 5% level, we reject the normality of u whenever the JB statistic is greater than 5.99. In STATA this test is carried out by the command “sktest”. With the data in WAGE1.RAW, the JB statistic is extremely large for the wage model, and JB = 1059 for the log(wage) model. While normality is rejected for both models, we do see that the JB statistic is much smaller for the log(wage) model [also compare its p-values of the skewness and excess kurtosis tests against those from the wage model]. Hence, normality appears to be a reasonable approximation to the distribution of the error term in the log(wage) model.
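The Jarque-Bera statistic described above can be sketched directly from its formula; the residual moments below are purely illustrative, not the actual WAGE1 values:

```python
# Sketch: JB = n/6 * (S^2 + (K - 3)^2 / 4), approximately chi-squared(2) under
# normality, so reject normality at the 5% level when JB > 5.99.
def jarque_bera(n, skewness, kurtosis):
    return (n / 6.0) * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)

jb_normal = jarque_bera(526, 0.0, 3.0)   # exactly normal moments
jb_skewed = jarque_bera(526, -0.2, 4.0)  # hypothetical skewed, fat-tailed residuals
print(jb_normal, round(jb_skewed, 2))    # 0.0 and 25.42; 25.42 > 5.99 -> reject
```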


Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the advantages of using the log of a variable in regression? Find the “rules of thumb” for taking logs. See page 191 of Wooldridge (pp. 198-199 for the 3rd edition).

• Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithm. Do you remember Table 2.3? Consult Table 2.3.

• How do you compute the change in y caused by Δx when the model is built for log(y)? A precise computation is given by Equation (6.8) of Wooldridge.

• Why do we need “interaction” terms in regression models? An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for example.

• What is the adjusted R-squared, and what is the difference between it and the R-squared? The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added; however, the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.

• How do you construct an interval prediction for given x-values? This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.

• How do you predict y for given x-values when the model is built for log(y)? Check the list on page 212 (p. 220 for the 3rd edition).

• What is involved in “residual analysis”? See pages 209-210 (pp. 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 6.4
• (i) Holding all other factors fixed, we have
Δlog(wage) = β1Δeduc + β2Δ(educ·pareduc) = (β1 + β2·pareduc)Δeduc.
Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if we think a child gets more out of another year of education the more highly educated are the child’s parents.

• (ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 − 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)

• (iii) When we add pareduc by itself, the coefficient on the interaction term is negative. The t statistic on educ·pareduc is about −1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2 Wooldridge 6.5
• This would make little sense. Performances on math and science exams are measures of outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The variable sci11 could be a dependent variable in an identical regression equation.

Q3 Wooldridge C6.3 (C6.2 for the 3rd Edition) (wage2_c6_3.do)
• (i) Holding exper (and the elements in u) fixed, we have
Δlog(wage) = β1Δeduc + β3Δ(educ·exper) = (β1 + β3·exper)Δeduc,
or Δlog(wage)/Δeduc = β1 + β3·exper. This is the approximate proportionate change in wage given one more year of education.

• (ii) H0: β3 = 0. If we think that education and experience interact positively, so that people with more experience are more productive when given another year of education, then β3 > 0 is the appropriate alternative.

• (iii) The estimated equation is
log(wage)-hat = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper
(.24) (.0174) (.0200) (.00153)
n = 935, R² = .135, adjusted R² = .132.
The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.

• (iv) We rewrite the equation as
log(wage) = β0 + θ1educ + β2exper + β3educ·(exper − 10) + u,
and run the regression of log(wage) on educ, exper, and educ·(exper − 10). We want the coefficient on educ. We obtain θ̂1 ≈ .0761 and se(θ̂1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.

Q4 Wooldridge C6.8 (hprice1_c6_8.do)
• (i) The estimated equation (where price is in dollars) is
price-hat = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
(29,475.0) (.642) (13.24) (9,010.1)
n = 88, R² = .672, adjusted R² = .661, σ̂ = 59,833.
The predicted price at lotsize = 10,000, sqrft = 2,300, and bdrms = 4 is about $336,714.

• (ii) The regression is price_i on (lotsize_i − 10,000), (sqrft_i − 2,300), and (bdrms_i − 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.

• (iii) We must use equation (6.36) to obtain the standard error of ê0 and then use equation (6.37) (assuming that price is normally distributed). From the regression in part (ii), se(ŷ0) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile in the t(84) distribution gives the 95% CI for price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval, but we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
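The standard-error combination in (iii) is a sum of squares under a square root; a sketch using the two quantities quoted above:

```python
# Sketch: se(e0) combines the sampling error of the fitted value with the
# error standard deviation, se(e0) = sqrt(se(yhat0)^2 + sigma_hat^2).
import math

se_yhat0 = 7374.5    # se of the fitted value, from the part (ii) regression
sigma_hat = 59833.0  # estimated error standard deviation (root MSE)

se_e0 = math.sqrt(se_yhat0 ** 2 + sigma_hat ** 2)
lo = 336706.7 - 1.99 * se_e0
hi = 336706.7 + 1.99 * se_e0
print(round(lo), round(hi))  # about $216,738 to $456,675
```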


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is a qualitative factor? Give some examples. This should be easy.

• How do you use dummy variables to represent qualitative factors? A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy: for example, a factor with 5 values requires 4 dummy variables.

• How do you use dummy variables to represent an ordinal variable? An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

• How do you test for differences in regression functions across different groups? We generally use the F test on properly-defined restricted and unrestricted models. The Chow test is a special case of the F test, where the null hypothesis is no difference across groups.

• What can you achieve by interacting group dummy variables with other regressors? The interaction allows the slope coefficients to differ across groups.

• What is “program evaluation”? It assesses the effectiveness of a program by checking the differences in regression coefficients for the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary? The PRF is the conditional probability of “success” for given explanatory variables. The SRF is the predicted conditional probability of “success” for given explanatory variables.

• What are the shortcomings of the LPM? The major drawback of the LPM is that the predicted probability of success may be outside the interval [0, 1].

• Does MLR5 hold for the LPM? No. As indicated in the textbook and lectures, the conditional variance of the disturbance is not constant and depends on the explanatory variables: Var(u|x) = E(y|x)[1 − E(y|x)].

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 7.4
• (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant.

• (ii) 100·[exp(−.283) − 1] ≈ −24.7, and so the estimate is somewhat smaller in magnitude.

• (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is
log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility + δ3trans + u,
where trans is a dummy variable for the transportation industry. Now the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.
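The exact percentage difference used in (ii) can be sketched as follows (the answer above rounds the result to about −24.7):

```python
# Sketch: 100*[exp(beta-hat) - 1], the exact percentage difference implied by a
# dummy-variable coefficient in a log(salary) equation.
import math

def exact_percent_diff(coef):
    return 100.0 * (math.exp(coef) - 1.0)

pct = exact_percent_diff(-0.283)
print(round(pct, 2))  # -24.65, smaller in magnitude than the -28.3 approximation
```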

Q2 Wooldridge 7.6
• In Section 3.3, in particular in the discussion surrounding Table 3.2, we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3 Wooldridge 79 bull (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0+δ0) + (β1+δ1)z bull (ii) Setting f0(z) = f1(z) gives β0 + β1z = (β0+δ0) + (β1+δ1)z Therefore provided δ1 ne 0

we have z = minusδ0δ1 Clearly z is positive if and only if δ0δ1 is negative which means δ0 and δ1 must have opposite signs

• (iii) Using part (ii), we have totcoll* = .357/.030 = 11.9 years (totcoll = total college years).
• (iv) The estimated number of years of college at which women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed – not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.
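The calculations in (iii) and (iv) can be verified with a quick Python check (coefficients δ0 = −.357 and δ1 = .030 are taken from the estimated equation discussed in the text):

```python
import math

# Part (iii): the catch-up point z* = -delta0/delta1.
z_star = 0.357 / 0.030
print(round(z_star, 1))                 # 11.9

# Part (iv): remaining log-wage gap at totcoll = 4, and its percent equivalent.
gap_log = -0.357 + 0.030 * 4
print(round(gap_log, 3))                # -0.237
print(round(math.exp(gap_log) - 1, 3))  # -0.211, i.e. about 21.1% lower
```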

Q4 Wooldridge C7.13 (apple_c7_13.do)
• (i) 412/660 ≈ .624.
• (ii) The OLS estimates of the LPM are
ecobuy-hat = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize
(.165) (.109) (.132) (.00053) (.013)
+ .025 educ − .00050 age
(.008) (.00125)
n = 660, R² = .110.
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)

• (iii) The F test, with 4 and 653 df, is 4.43 with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children – other factors equal – the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in the functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule – predict one when ecobuy-hat ≥ .5 and zero otherwise – gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
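The fractions in (vi) can be checked directly (counts taken from the answer above):

```python
# Percent correctly predicted for each outcome, and overall.
correct0, n0 = 102, 248      # ecobuy = 0
correct1, n1 = 340, 412      # ecobuy = 1
overall = (correct0 + correct1) / (n0 + n1)
print(round(correct0 / n0, 3))   # 0.411
print(round(correct1 / n1, 3))   # 0.825
print(round(overall, 2))         # 0.67
```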


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is heteroskedasticity in a regression model?

When homoskedasticity (MLR5) fails and the variance of the disturbance (u) changes across the observation index (i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with OLS under heteroskedasticity? The usual t-stat and F-stat are not valid, because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator is no longer (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA? These are the corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option "robust" with the "regress" command, e.g., regress lwage educ, robust.

• How do you detect whether heteroskedasticity is present? The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.
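As a sketch of the Breusch-Pagan logic on simulated data (statsmodels also ships a ready-made het_breuschpagan; the data-generating numbers here are illustrative only): regress the squared OLS residuals on the regressors and form LM = n·R², which is chi-squared under the null of homoskedasticity.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 10, n)
u = rng.normal(0, 1 + 0.5 * x)            # error s.d. grows with x: heteroskedastic
y = 1.0 + 2.0 * x + u

# Step 1: OLS residuals from the original regression.
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - X @ beta

# Step 2: regress squared residuals on the regressors; LM = n * R^2.
u2 = uhat ** 2
g = np.linalg.lstsq(X, u2, rcond=None)[0]
r2 = 1 - np.sum((u2 - X @ g) ** 2) / np.sum((u2 - u2.mean()) ** 2)
lm = n * r2
pval = stats.chi2.sf(lm, df=1)            # df = number of slope regressors
print(lm, pval)                           # large LM, small p-value: reject homoskedasticity
```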

• If heteroskedasticity is present in a known form, how would you estimate the model? In this case the WLS estimator should be used; it is more efficient than the OLS estimator. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model? If there is strong evidence of heteroskedasticity, the FGLS estimator should be used, which is based on an exponential functional form for the variance. See Section 8.4 for details.

• What are the steps in the FGLS estimation? You should summarise these from Section 8.4.

• How would you handle the heteroskedasticity of the LPM? The heteroskedasticity functional form is known for the LPM, hence the WLS can be used in principle. However, because the LPM can produce predicted probabilities that are outside the interval (0, 1), the known functional form p(x)[1 − p(x)] may not be usable for the WLS. In that case, it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2 Wooldridge 8.2
• With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc:
beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.


Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and of the functional forms of the explanatory variables in the original equation.
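The division-by-√h(x) device in Q2 is exactly WLS: OLS on the transformed variables reproduces weighted least squares with weights 1/h(x). A small Python sketch with simulated data (variable names and numbers are illustrative only) makes the equivalence concrete:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
inc = rng.uniform(1, 10, n)
price = rng.uniform(1, 5, n)
u = rng.normal(0, 1, n) * inc                   # Var(u|x) proportional to inc^2
beer = 3.0 + 0.5 * inc - 2.0 * price + u

X = np.column_stack([np.ones(n), inc, price])

# OLS on every variable divided by sqrt(h) = inc ...
Xt, yt = X / inc[:, None], beer / inc
b_transformed = np.linalg.lstsq(Xt, yt, rcond=None)[0]

# ... coincides with WLS using weights 1/h = 1/inc^2.
W = np.diag(1.0 / inc ** 2)
b_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ beer)
print(np.allclose(b_transformed, b_wls))        # True
```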

Q3 Wooldridge 8.3
• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has the smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4 Wooldridge 8.5
• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.
• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.
• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.
• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.
• (v) We just plug the values of the independent variables into the OLS regression line:
smokes-hat = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052.
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)
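A quick Python check of the turning point in (iii) and the fitted probability in (v), using the coefficients reported above:

```python
import math

# Part (iii): turning point of the age quadratic.
turning_point = 0.020 / (2 * 0.00026)
print(round(turning_point, 1))          # 38.5

# Part (v): fitted probability for the given person.
phat = (0.656 - 0.069 * math.log(67.44) + 0.012 * math.log(6500)
        - 0.029 * 16 + 0.020 * 77 - 0.00026 * 77 ** 2)
print(round(phat, 4))                   # 0.0052
```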

Q5 Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:
e401k-hat = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male
(.081) (.0006) (.000005) (.0039) (.00005) (.0121)
[.079] [.0006] [.000005] [.0038] [.00004] [.0121]
n = 9,275, R² = .094.
There are no important differences; if anything, the robust standard errors are smaller.

• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model, u² = δ0 + δ1p(x) + δ2p(x)² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = −1. Remember that, for the LPM, the fitted values, ŷ, are estimates of p(x). So, when we run the regression of û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.
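The restriction (δ0, δ1, δ2) = (0, 1, −1) is an algebraic property of the binary-response variance, which a deterministic check illustrates (a sketch, not part of the original solution: we regress the exact variance p(1 − p) on p and p² over a grid of probabilities):

```python
import numpy as np

p = np.linspace(0.05, 0.95, 50)          # grid of probabilities
var_u = p * (1 - p)                      # Var(y|x) for a binary y
X = np.column_stack([np.ones_like(p), p, p ** 2])
delta = np.linalg.lstsq(X, var_u, rcond=None)[0]
print(np.round(delta, 8))                # approximately [0, 1, -1]
```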

• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on e401k-hat is about 1.010, the coefficient on e401k-hat² is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are
e401k-hat = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male
(.076) (.0005) (.000004) (.0037) (.00004) (.0117)
n = 9,275, R² = .108.
There are no important differences from the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the end of the chapter.

Review Questions (these may or may not be discussed in tutorial classes)
• What is functional form misspecification? What is its major consequence in regression analysis? As the word "misspecification" suggests, it refers to a model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is misspecified if the partial effect of age on log wage is inverted-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification? The RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). The RESET is based on the F-test for the joint significance of the squared and cubed predicted values (ŷ² and ŷ³) in an "expanded" model that includes all the original regressors as well as the squared and cubed ŷ, where the latter represent possible nonlinear functions of the regressors.
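The RESET procedure just described can be sketched in Python: fit the original model, then F-test the joint significance of ŷ² and ŷ³ in the expanded regression. The data here are simulated and deliberately quadratic, so the (misspecified) linear fit should be rejected:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(0, 5, n)
y = 1.0 + 0.5 * x + 0.8 * x ** 2 + rng.normal(0, 1, n)   # true model is quadratic

X = np.column_stack([np.ones(n), x])                      # misspecified: linear only
yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
ssr_r = np.sum((y - yhat) ** 2)                           # restricted SSR

Xe = np.column_stack([X, yhat ** 2, yhat ** 3])           # expanded model
ssr_u = np.sum((y - Xe @ np.linalg.lstsq(Xe, y, rcond=None)[0]) ** 2)

q, dfree = 2, n - Xe.shape[1]
F = ((ssr_r - ssr_u) / q) / (ssr_u / dfree)
pval = stats.f.sf(F, q, dfree)
print(F, pval)                # large F, tiny p-value: neglected nonlinearity detected
```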

• What are nested models? And nonnested models? Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another? The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test two nonnested models? There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models, and decide which model can be "excluded". The second is to test the significance of the predicted value from one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis? A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved variable and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, depends only on the proxy.

• Can you analyse the consequences of measurement errors? With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis? Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection? Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR1, MLR3 and MLR4, and hence does not introduce bias into the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable, and causes bias in the OLS estimation.

• What are outliers and influential observations? Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS parameter estimates. Outliers are called influential observations if the estimation results with them included are significantly different from the results with them excluded.

• Consider the simple regression model yi = α + (β + bi)xi + ui, for i = 1, 2, …, n, where the "slope parameter" contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR1-5 hold for yi = α + βxi + ui, that E(bi|xi) = 0, and that Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference? The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui). Given the stated assumptions, MLR1-4 hold for this model, so the OLS estimator of the "slope parameter" is unbiased (for β). However, MLR5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui|xi), depends on xi, and so the usual OLS standard errors are not valid.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 9.1
• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test: F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] ≈ 2.97. With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
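The R-squared form of the F statistic can be reproduced with the numbers above (q = 2 restrictions, denominator df = 177 − 8 = 169):

```python
# F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/df]
r2_ur, r2_r, q, df = 0.375, 0.353, 2, 177 - 8
F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df)
print(round(F, 2))   # 2.97
```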

Q2 Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.

• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So, when we control for the poverty rate, the effect of spending falls.

• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1), we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than is spending per student or other school characteristics.

Q3 Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report their crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates, using the 1988 data, are
log(scrap) = .409 + .057 grant
(.241) (.406)
n = 54, R² = .0004.
The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain
log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
(.089) (.147) (.044)
n = 54, R² = .873,
where the years are indicated as suffixes on the variables for clarity. The t statistic for H0: βgrant88 = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant88 < 0 at the 5% level.

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller in magnitude than before, but still pretty significant.
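The t statistics in (iii)-(v) can be checked directly; note that the null in (iv) is β = 1, so the statistic is (estimate − 1)/se rather than estimate/se:

```python
t_grant = -0.254 / 0.147            # part (iii)
t_lag_usual = (0.831 - 1) / 0.044   # part (iv): H0 is beta = 1
t_lag_robust = (0.831 - 1) / 0.071  # part (v), robust se
print(round(t_grant, 2))            # -1.73
print(round(t_lag_usual, 2))        # -3.84
print(round(t_lag_robust, 2))       # -2.38
```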


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data? The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations are dependent on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that, for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation? A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (and hence has a distribution). The SP can be viewed as a "random curve", with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation or autocorrelation? The serial (or auto) correlation of a time series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial correlation of yt is usually denoted Corr(yt, yt-h) = Cov(yt, yt-h)/[Var(yt)Var(yt-h)]^(1/2).
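The sample analogue of this formula can be computed directly. In this sketch the series is a simulated AR(1) with coefficient 0.8, so its lag-1 autocorrelation should come out near 0.8:

```python
import numpy as np

def autocorr(y, h):
    """Sample correlation between y_t and y_{t-h}."""
    y = np.asarray(y, dtype=float)
    y0, y1 = y[:-h] - y[:-h].mean(), y[h:] - y[h:].mean()
    return (y0 * y1).sum() / np.sqrt((y0 ** 2).sum() * (y1 ** 2).sum())

rng = np.random.default_rng(3)
y = np.zeros(5000)
for t in range(1, 5000):
    y[t] = 0.8 * y[t - 1] + rng.normal()    # AR(1) with coefficient 0.8
print(round(autocorr(y, 1), 2))             # close to 0.8
```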

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)? Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.

• What are TS1-6 (the assumptions for time series regression)? How do they differ from MLR1-6? TS1-6 include: 1) linearity in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6 mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of the OLS standard errors, t-stats and F-stats for statistical inference, TS3-TS6 are needed. These are very strong assumptions, and they can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors? Strictly exogenous regressors satisfy assumption TS3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS3.

• What is a trending time series? What is a time trend? A time series is trending if it has the tendency to grow (or shrink) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results? First, two time trends are always "correlated" because they grow together in the same (or the opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious, because it is purely induced by the time trends and says little about the true relationship between the two time series.

• Why would you include a time trend in regressions with trending variables? Including a time trend in the regression allows one to study the relationship among the time series variables that is not induced by the time trends.

• What is seasonality in a time series? Give an example of a time series variable with seasonality. Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have weekday seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model? We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
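For example, the three quarterly dummies can be built in pandas as follows (an illustrative sketch; the column names are assumptions, and drop_first makes the first quarter the base category):

```python
import pandas as pd

df = pd.DataFrame({"quarter": [1, 2, 3, 4, 1, 2, 3, 4]})
dummies = pd.get_dummies(df["quarter"], prefix="Q", drop_first=True)
df = df.join(dummies)          # adds columns Q_2, Q_3, Q_4 (Q1 is the base)
print(df.columns.tolist())     # ['quarter', 'Q_2', 'Q_3', 'Q_4']
```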

Problem Set

Q1 Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them are strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated – such as stock returns – do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1; in particular, we do not need the homoskedasticity and no serial correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2
• We follow the hint and write
gGDPt-1 = α0 + δ0 intt-1 + δ1 intt-2 + ut-1,
and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0 intt-1 + δ1 intt-2 + ut-1 − 3) + vt
= (γ0 + γ1α0 − 3γ1) + γ1δ0 intt-1 + γ1δ1 intt-2 + γ1ut-1 + vt.
Now, by assumption, ut-1 has zero mean and is uncorrelated with all the right-hand-side variables in the previous equation, except itself, of course. So Cov(intt, ut-1) = E(intt·ut-1) = γ1E(u²t-1) > 0, because γ1 > 0. If σ² = E(u²t) for all t, then Cov(intt, ut-1) = γ1σ². This violates the strict exogeneity assumption, TS3. While ut is uncorrelated with intt, intt-1, and so on, ut is correlated with intt+1.


Q3 Wooldridge 10.7
• (i) pet-1 and pet-2 must be increasing by the same amount as pet.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means that the level of pe increases and stays at the new level, and this is achieved by increasing pet-2, pet-1 and pet by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is
i3t = 1.61 + .343 inft + .382 inft-1 − .190 deft + .569 deft-1
(.40) (.125) (.134) (.221) (.197)
n = 55, R² = .685, adj-R² = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than the .606 we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inft-1.

• (iv) The F statistic for the joint significance of inft-1 and deft-1 is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.
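The LRP arithmetic in (iii) and the p-value in (iv) can be checked numerically; the 2 and 50 degrees of freedom used below are an assumption (n = 55, four regressors plus an intercept):

```python
from scipy import stats

# Part (iii): long-run propensity as the sum of the lag coefficients.
lrp = 0.343 + 0.382
print(round(lrp, 3))               # 0.725

# Part (iv): p-value of F = 5.22 with 2 and 50 df (df assumed as noted above).
pval = stats.f.sf(5.22, 2, 50)
print(round(pval, 3))
```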


Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 3.2
• (i) Yes. Because of budget constraints, it makes sense that, the more siblings there are in a family, the less education any one child in the family has. To find the increase in the number of siblings that reduces predicted education by one year, we solve 1 = .094(Δsibs), so Δsibs = 1/.094 ≈ 10.6.

• (ii) Holding sibs and feduc fixed, one more year of mother's education implies .131 years more of predicted education. So if a mother has four more years of education, her son is predicted to have about half a year (.524) more years of education.

• (iii) Since the number of siblings is the same, but meduc and feduc are both different, the coefficients on meduc and feduc both need to be accounted for. The predicted difference in education between B and A is .131(4) + .210(4) = 1.364.

Q2 Wooldridge 3.10
• (i) Because x1 is highly correlated with x2 and x3, and these latter variables have large partial effects on y, the simple and multiple regression coefficients on x1 can differ by large amounts. We have not done this case explicitly, but given equation (3.46) and the discussion with a single omitted variable, the intuition is pretty straightforward.

• (ii) Here we would expect β̃1 and β̂1 to be similar (subject, of course, to what we mean by "almost uncorrelated"). The amount of correlation between x2 and x3 does not directly affect the multiple regression estimate on x1 if x1 is essentially uncorrelated with x2 and x3.

• (iii) In this case we are (unnecessarily) introducing multicollinearity into the regression: x2 and x3 have small partial effects on y, and yet x2 and x3 are highly correlated with x1. Adding x2 and x3 likely increases the standard error of the coefficient on x1 substantially, so se(β̂1) is likely to be much larger than se(β̃1).

• (iv) In this case, adding x2 and x3 will decrease the residual variance without causing much collinearity (because x1 is almost uncorrelated with x2 and x3), so we should see se(β̂1) smaller than se(β̃1). The amount of correlation between x2 and x3 does not directly affect se(β̂1).

Q3 Wooldridge C3.2 (hprice1_ch03.do)
• (i) The estimated equation is
price-hat = −19.32 + .128 sqrft + 15.20 bdrms
n = 88, R² = .632.
• (ii) Holding square footage constant, (the price increase) = 15.20 (increase in bedrooms). Hence the price increases by 15.20, which means $15,200.
• (iii) Now (the price increase) = .128 (increase in square feet) + 15.20 (increase in bedrooms) = .128(140) + 15.20(1) = 33.12. That is, $33,120. Because the house's size is also increasing, the effect is larger than that in (ii).
• (iv) About 63.2%.
• (v) The predicted price is −19.32 + .128(2,438) + 15.20(4) = 353.544, or $353,544.
• (vi) The residual = 300,000 − 353,544 = −53,544. The buyer under-paid, judged by the predicted price. But, of course, there are many other features of a house (some that we cannot even measure) that affect price, and we have not controlled for those.
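The predicted price and residual in (v)-(vi) can be reproduced with a quick check (price is measured in $1000s in the estimated equation):

```python
# Part (v): plug the house's characteristics into the estimated equation.
pred = -19.32 + 0.128 * 2438 + 15.20 * 4
print(round(pred, 3))            # 353.544, i.e. $353,544

# Part (vi): residual = actual - predicted (both in $1000s).
resid = 300.000 - pred
print(round(resid, 3))           # -53.544, i.e. -$53,544
```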

Q4 Wooldridge C3.6 (wage2_ch03_c3_6.do)
• (i) The slope coefficient from regressing IQ on educ is 3.53383.
• (ii) The slope coefficient from regressing log(wage) on educ is .05984.
• (iii) The slope coefficients from regressing log(wage) on educ and IQ are .03912 and .00586, respectively.
• (iv) The computed β̃1 = β̂1 + δ̃1β̂2 = .03912 + 3.53383(.00586) ≈ .05984.
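The decomposition in (iv) is exact OLS algebra, not an approximation, which a simulation makes clear (a sketch; the data-generating numbers are arbitrary, and the names educ, iq, lwage are just placeholders):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
educ = rng.normal(12, 2, n)
iq = 50 + 3.5 * educ + rng.normal(0, 10, n)
lwage = 1.0 + 0.04 * educ + 0.006 * iq + rng.normal(0, 0.3, n)

def slopes(X, y):
    """OLS slope coefficients (intercept dropped)."""
    Z = np.column_stack([np.ones(len(y)), *X])
    return np.linalg.lstsq(Z, y, rcond=None)[0][1:]

delta1 = slopes([educ], iq)[0]           # IQ on educ
beta_tilde = slopes([educ], lwage)[0]    # simple regression
b1, b2 = slopes([educ, iq], lwage)       # multiple regression

# Identity: simple slope = multiple slope + (omitted coeff) x (auxiliary slope)
print(np.isclose(beta_tilde, b1 + b2 * delta1))   # True
```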


Week 5 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the CLM assumptions? These are MLR1, MLR2, MLR3 and MLR6. MLR6 is a very strong assumption that implies both MLR4 and MLR5.

• What is the sampling distribution of the OLS estimators under the CLM assumptions? Under the CLM assumptions, the OLS estimators follow the normal distribution.

bull What are the standard errors of the OLS estimators The standard error is the square root of the estimated variance of the estimator

• What is the null hypothesis about a parameter? In this context, the null hypothesis is a statement that assigns a known value (or hypothesised value) to the parameter of interest. Usually the null is a maintained proposition (from economic theories or experience) that will only be rejected on very strong evidence.

• What is a one-tailed (two-tailed) alternative hypothesis? This should be clear once you finish your reading.

• In testing hypotheses, what is a Type 1 (Type 2) error? What is the level of significance? A Type 1 error is the rejection of a true null; a Type 2 error is the non-rejection of a false null. The level of significance is the probability of committing a Type 1 error.

• The decision rule we use can be stated as "reject the null if the t-statistic exceeds the critical value". How is the critical value determined? The critical value is determined by the level of significance (or the level of confidence, in the case of confidence intervals) and the distribution of the test statistic. For example, for the t-stat, we use the t-distribution when the sample size is small, and the normal distribution when the sample size is large (df > 120).

• Justify the statement: "Given the observed test statistic, the p-value is the smallest significance level at which the null hypothesis would be rejected." If you used the observed t-stat as your critical value, you would just reject the null (because the t-stat equals this value). This "critical value" (= the t-stat value) implies a level of significance, say p, which is exactly the p-value. For any significance level smaller than p, the corresponding critical value is more extreme than the t-stat value and does not lead to the rejection of the null. Hence p is the smallest significance level at which the null would be rejected.

• What is the 90% confidence interval for a parameter? It is an interval [L, U] defined by two sample statistics: the lower bound L and the upper bound U. The probability that [L, U] covers the parameter is 0.90.

• In constructing a confidence interval for a parameter, what is the level of confidence? It is the probability that the CI covers the parameter.

• When the level of confidence increases, how would the width of the confidence interval change (holding other things fixed)? The width will increase.

• Try to convince yourself that the event "the 90% confidence interval covers a hypothesised value of the parameter" is the same as the event "the null of the parameter being the hypothesised value cannot be rejected in favour of the two-tailed alternative at the 10% level of significance". It becomes apparent if you start with P(|t-stat| ≤ c) = 0.90 and disentangle the expression for the t-stat.


Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 4.1 • (i) and (iii) generally cause the t statistics not to have a t distribution under H0.

Homoskedasticity is one of the CLM assumptions. An important omitted variable violates Assumption MLR4 (hence MLR6) in general. The CLM assumptions contain no mention of the sample correlations among independent variables, except to rule out the case where the correlation is one (MLR3).

Q2. Wooldridge 4.2 • (i) H0: β3 = 0. H1: β3 > 0. • (ii) The proportionate effect on predicted salary is .00024(50) = .012. To obtain the percentage effect, we multiply this by 100: 1.2%. Therefore, a 50 point ceteris paribus increase in ros is predicted to increase salary by only 1.2%. Practically speaking, this is a very small effect for such a large change in ros.

• (iii) The 10% critical value for a one-tailed test, using df = ∞, is obtained from Table G.2 as 1.282. The t statistic on ros is .00024/.00054 ≈ .44, which is well below the critical value. Therefore, we fail to reject H0 at the 10% significance level.

• (iv) Based on this sample, the estimated ros coefficient appears to be different from zero only because of sampling variation. On the other hand, including ros may not be causing any harm; it depends on how correlated it is with the other independent variables (although these are very significant even with ros in the equation).

Q3. Wooldridge 4.5 • (i) 4.12 ± 1.96(.94), or about [2.28, 5.96]. • (ii) No, because the value 4 is well inside the 95% CI. • (iii) Yes, because 1 is well outside the 95% CI.

Q4. Wooldridge C4.8 (401ksubs_ch04.do) • (i) There are 2,017 single people in the sample of 9,275. • (ii) The estimated equation is

nettfa = −43.04 + .799 inc + .843 age
(4.08) (.060) (.092)
n = 2,017, R2 = .119.
The coefficient on inc indicates that one more dollar in income (holding age fixed) is reflected in about 80 more cents in predicted nettfa; no surprise there. The coefficient on age means that, holding income fixed, if a person gets another year older, his/her nettfa is predicted to increase by about $843. (Remember, nettfa is in thousands of dollars.) Again, this is not surprising.

• (iii) The intercept is not very interesting, as it gives the predicted nettfa for inc = 0 and age = 0. Clearly, there is no one with even close to these values in the relevant population.

• (iv) The t statistic is (.843 − 1)/.092 ≈ −1.71. Against the one-sided alternative H1: β2 < 1, the p-value is about .044. Therefore, we can reject H0: β2 = 1 at the 5% significance level (against the one-sided alternative).

• (v) The slope coefficient on inc in the simple regression is about .821, which is not very different from the .799 obtained in part (ii). As it turns out, the correlation between inc and age in the sample of single people is only about .039, which helps explain why the simple and multiple regression estimates are not very different; see the text also.


Week 6 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes) • How would you test hypotheses about a single linear combination of parameters?

We may re-parameterise the regression model to isolate the "single linear combination". Then OLS on the re-parameterised model will provide the estimate of the "single linear combination" and the related standard error. See the example in the textbook and lecture slides.

• What are exclusion restrictions for a regression model? Exclusion restrictions are a null hypothesis that a group of x-variables have zero coefficients in the regression model.

• What are restricted and unrestricted models? When the null hypothesis is a set of restrictions on the parameters, the regression under the null is called the restricted model, while the regression under the alternative (which simply states that the null is false) is called the unrestricted model.

• How do you compute the F-statistic, given that you have the SSRs? Clearly, the general F-stat is based on the relative difference between the SSRs under the restricted and unrestricted models: F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)], where you should be able to explain the meanings of the various symbols.
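In code the formula is one line. The sketch below uses an illustrative helper name of our choosing and checks the result against the SSR values that appear in Q1(ii) of this week's problem set:

```python
def f_stat(ssr_r, ssr_ur, q, n, k):
    # F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n - k - 1)]
    # q = number of restrictions; n - k - 1 = df of the unrestricted model
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))

# SSR_r = 209,448.99, SSR_ur = 165,644.51, q = 2, df_ur = 86 (n = 88, k = 1)
F = f_stat(209448.99, 165644.51, q=2, n=88, k=1)  # about 11.37
```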

• What are general linear restrictions on parameters? These are linear equations on the parameters. For example, β1 − 2β2 + 3β3 = 0.

• What is the test for the overall significance of a regression? This is the F-test for the null hypothesis that all the coefficients of the x-variables are zero.

• How would you report your regression results? See the guidelines in Section 4.6 of the textbook.

• Why would you care about the asymptotic properties of the OLS estimators? When MLR6 does not hold, the finite-sample distribution of the OLS estimators is not available. For reasonably large samples (large n), we use the asymptotic distribution of the OLS estimators, which is known from studying the asymptotic properties, to approximate the finite-sample distribution of the OLS estimators, which is unknown.

• Comparing the inference procedures in Chapter 5 with those in Chapter 4, can you list the similarities and differences? It is beneficial to make such a list by yourself.

• Under MLR1-MLR5, the OLS estimators are consistent, asymptotically normal, and asymptotically efficient. Try to explain these properties in your own words. This is really a test of your understanding of these notions in econometrics.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 4.6 • (i) With df = n − 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because each test is two-tailed, the critical value is 1.987. The t statistic for H0: β0 = 0 is about −.89, which is much less than 1.987 in absolute value. Therefore, we fail to reject H0: β0 = 0. The t statistic for H0: β1 = 1 is (.976 − 1)/.049 ≈ −.49, which is even less significant. (Remember, we reject H0 in favor of H1 in this case only if |t| > 1.987.)

• (ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions and the df in the unrestricted model is 86. We are given SSRr = 209,448.99 and SSRur = 165,644.51. Therefore, F = [(209,448.99 − 165,644.51)/2] / [165,644.51/86] ≈ 11.37,


which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is 4.85.

• (iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions and there are 88 − 5 = 83 df in the unrestricted model. The F statistic is F = [(.829 − .820)/3] / [(1 − .829)/83] ≈ 1.46. The 10% critical value (again using 90 denominator df in Table G.3a) is 2.15, so we fail to reject H0 at even the 10% level. In fact, the p-value is about .23.
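The same computation in the R-squared form, as a minimal sketch (`f_stat_r2` is an illustrative name of our choosing). Setting the restricted R-squared to zero gives the overall-significance F statistic used in Q3 below:

```python
def f_stat_r2(r2_ur, r2_r, q, df_ur):
    # F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/df_ur]
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)

F = f_stat_r2(0.829, 0.820, q=3, df_ur=83)          # about 1.46, as above
F_overall = f_stat_r2(0.0395, 0.0, q=4, df_ur=137)  # about 1.41, Q3(i)
```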

• (iv) If heteroskedasticity were present, Assumption MLR5 would be violated, and the F statistic would not have an F distribution under the null hypothesis. Therefore, comparing the F statistic against the usual critical values, or obtaining the p-value from the F distribution, would not be especially meaningful.

Q2. Wooldridge 4.8 • (i) We use Property VAR.3 from Appendix B: Var(β̂1 − 3β̂2) = Var(β̂1) + 9 Var(β̂2) − 6 Cov(β̂1, β̂2). • (ii) t = (β̂1 − 3β̂2 − 1)/se(β̂1 − 3β̂2), so we need the standard error of β̂1 − 3β̂2. • (iii) Because θ = β1 − 3β2, we can write β1 = θ + 3β2. Plugging this into the population model gives y = β0 + (θ + 3β2)x1 + β2x2 + β3x3 + u = β0 + θx1 + β2(3x1 + x2) + β3x3 + u. This last equation is what we would estimate, by regressing y on x1, 3x1 + x2, and x3. The coefficient and standard error on x1 are what we want.

Q3. Wooldridge 4.10 • (i) We need to compute the F statistic for the overall significance of the regression, with n = 142 and k = 4: F = [.0395/(1 − .0395)](137/4) ≈ 1.41. The 5% critical value with 4 and 120 df is 2.45, which is well above the value of F. Therefore, we fail to reject H0: β1 = β2 = β3 = β4 = 0 at the 10% level. No explanatory variable is individually significant at the 5% level. The largest absolute t statistic is on dkr, t_dkr ≈ 1.60, which is not significant at the 5% level against a two-sided alternative.

• (ii) The F statistic (with the same df) is now [.0330/(1 − .0330)](137/4) ≈ 1.17, which is even lower than in part (i). None of the t statistics is significant at a reasonable level.

• (iii) We probably should not use the logs, as the logarithm is not defined for firms that have zero for dkr or eps. Therefore, we would lose some firms in the regression.

• (iv) It seems very weak. There are no significant t statistics at the 5% level (against a two-sided alternative), and the F statistics are insignificant in both cases. Plus, less than 4% of the variation in return is explained by the independent variables.

Q4. Wooldridge C4.9 (discrim_c4_9.do) • (i) The results from the OLS regression, with standard errors in parentheses, are
log(psoda) = −1.46 + .073 prpblck + .137 log(income) + .380 prppov
(.29) (.031) (.027) (.133)
n = 401, R2 = .087.

• The p-value for testing H0: β1 = 0 against the two-sided alternative is about .018, so that we reject H0 at the 5% level but not at the 1% level.

• (ii) The correlation is about −.84, indicating a strong degree of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for log(income) is about 5.1, and that for prppov is about 2.86 (two-sided p-value = .004).

• (iii) The OLS regression results when log(hseval) is added are
log(psoda) = −.84 + .098 prpblck − .053 log(income) + .052 prppov + .121 log(hseval)
(.29) (.029) (.038) (.134) (.018)


n = 401, R2 = .184. The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about .12 percent. The two-sided p-value is zero to three decimal places.

• (iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the 15% significance level against a two-sided alternative for log(income), and prppov does not have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at the 5% level, because the outcome of the F(2,396) statistic is about 3.52 with p-value = .030. All of the control variables – log(income), prppov, and log(hseval) – are highly correlated, so it is not surprising that some are individually insignificant.

• (v) Because the regression in (iii) contains the most controls, log(hseval) is individually significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable. It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if prpblck increases by .10, psoda is estimated to increase by about 1%, other factors held fixed.

Q5. Wooldridge 5.2 • This is about the inconsistency of the simple regression of pctstck on funds. A higher tolerance of risk means more willingness to invest in the stock market, so β2 > 0. By assumption, funds and risktol are positively correlated. Now we use equation (5.5), where δ1 > 0: plim(β̃1) = β1 + β2δ1 > β1, so β̃1 has a positive inconsistency (asymptotic bias). This makes sense: if we omit risktol from the regression and it is positively correlated with funds, some of the estimated effect of funds is actually due to the effect of risktol.

Q6. Wooldridge C5.1 (wage1_c5_1.do) • (i) The estimated equation is

wage = −2.87 + .599 educ + .022 exper + .169 tenure
(.73) (.051) (.012) (.022)
n = 526, R2 = .306, σ̂ = 3.085.

• Below is a histogram of the 526 residuals ûi, i = 1, 2, …, 526. The histogram uses 27 bins, which is suggested by the formula in the Stata manual for 526 observations. For comparison, the normal distribution that provides the best fit to the histogram is also plotted.



• (ii) With log(wage) as the dependent variable, the estimated equation is
log(wage) = .284 + .092 educ + .0041 exper + .022 tenure
(.104) (.007) (.0017) (.003)
n = 526, R2 = .316, σ̂ = .441.
The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below.

• (iii) The residuals from the log(wage) regression appear to be more normally distributed.

Certainly the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed to the left. In the wage regression there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.

• The skewness and kurtosis of the residuals from the OLS regression may be used to test the null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null distribution of the Jarque-Bera test statistic (which takes into account both skewness and kurtosis) is approximately the chi-squared with 2 degrees of freedom. Hence, at the 5% level, we reject the normality of u whenever the JB statistic is greater than 5.99. In STATA, this test is carried out by the command "sktest". With the data in WAGE1.RAW, the JB statistic is extremely large for the wage model and JB = 10.59 for the log(wage) model. While normality is rejected for both models, we do see that the JB statistic is much smaller for the log(wage) model [also compare its p-values of the skewness and excess-kurtosis tests against those from the wage model]. Hence, normality appears to be a reasonable approximation to the distribution of the error term in the log(wage) model.
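The JB statistic itself is easy to compute from the residuals. The sketch below is a generic illustration of the formula (not the Stata `sktest` implementation, which uses an adjusted version):

```python
def jarque_bera(resid):
    # JB = (n/6) * [S^2 + (K - 3)^2 / 4], where S is the sample skewness
    # and K the sample kurtosis of the residuals
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n
    m3 = sum((e - mean) ** 3 for e in resid) / n
    m4 = sum((e - mean) ** 4 for e in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return (n / 6.0) * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

# Compare the result against 5.99, the 5% critical value of chi-squared(2)
```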


Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes) • What are the advantages of using the log of a variable in regression? Find the "rules of thumb" for taking logs. See page 191 of Wooldridge (pp. 198-199 for the 3rd edition).

• Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithm. Do you remember Table 2.3? Consult Table 2.3.

• How do you compute the change in y caused by Δx when the model is built for log(y)? A precise computation is given by Equation (6.8) of Wooldridge.

• Why do we need "interaction" terms in regression models? An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for an example.

• What is the adjusted R-squared? What is the difference between it and the R-squared? The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added. However, the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.

• How do you construct an interval prediction for given x-values? This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.

• How do you predict y for given x-values when the model is built for log(y)? Check the list on page 212 (p. 220 for the 3rd edition).

• What is involved in "residual analysis"? See pages 209-210 (pp. 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 6.4 • (i) Holding all other factors fixed, we have Δlog(wage) = β1Δeduc + β2Δ(educ·pareduc) = (β1 + β2 pareduc)Δeduc. Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if we think a child gets more out of another year of education the more highly educated are the child's parents.

• (ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 − 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)

• (iii) When we add pareduc by itself, the coefficient on the interaction term is negative. The t statistic on educ·pareduc is about −1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2. Wooldridge 6.5 • This would make little sense. Performances on math and science exams are measures of outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The science pass rate could itself be a dependent variable in an identical regression equation.

Q3. Wooldridge C6.3 (C6.2 for the 3rd Edition) (wage2_c6_3.do) • (i) Holding exper (and the elements in u) fixed, we have Δlog(wage) = β1Δeduc + β3(Δeduc)·exper = (β1 + β3 exper)Δeduc, or Δlog(wage)/Δeduc = β1 + β3 exper. This is the approximate proportionate change in wage given one more year of education.

• (ii) H0: β3 = 0. If we think that education and experience interact positively – so that people with more experience are more productive when given another year of education – then β3 > 0 is the appropriate alternative.

• (iii) The estimated equation is
log(wage) = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper
(.24) (.0174) (.0200) (.00153)
n = 935, R2 = .135, adjusted R2 = .132.
The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.

• (iv) We rewrite the equation as log(wage) = β0 + θ1educ + β2exper + β3educ·(exper − 10) + u, where θ1 = β1 + 10β3, and run the regression of log(wage) on educ, exper, and educ·(exper − 10). We want the coefficient on educ. We obtain θ̂1 ≈ .0761 and se(θ̂1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.

Q4. Wooldridge C6.8 (hprice1_c6_8.do) • (i) The estimated equation (where price is in dollars) is
price = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
(29,475.0) (.642) (13.24) (9,010.1)
n = 88, R2 = .672, adjusted R2 = .661, σ̂ = 59,833.
The predicted price at lotsize = 10,000, sqrft = 2,300, and bdrms = 4 is about $336,714.

• (ii) The regression is pricei on (lotsizei − 10,000), (sqrfti − 2,300), and (bdrmsi − 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.

• (iii) We must use equation (6.36) to obtain the standard error of ê0 and then use equation (6.37) (assuming that price is normally distributed). From the regression in part (ii), se(ŷ0) ≈ 7,374.5, and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile in the t(84) distribution gives the 95% CI for price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval. But we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
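The arithmetic in part (iii) can be reproduced directly. The numbers are taken from the solution above; the variable names are ours:

```python
from math import sqrt

se_yhat0 = 7374.5    # se of the predicted mean at the chosen x-values, part (ii)
sigma_hat = 59833.0  # estimated standard deviation of the error term
se_e0 = sqrt(se_yhat0 ** 2 + sigma_hat ** 2)  # se of the prediction error
c = 1.99             # approximate 97.5th percentile of the t(84) distribution
point = 336706.7
lower, upper = point - c * se_e0, point + c * se_e0  # 95% prediction interval
```

Note how `sigma_hat` dominates `se_yhat0` inside the square root, which is why prediction intervals for an individual outcome are so much wider than confidence intervals for the mean.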


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes) • What is a qualitative factor? Give some examples.

This should be easy. • How do you use dummy variables to represent qualitative factors?

A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy. For example, a factor with 5 values requires 4 dummy variables.

• How do you use dummy variables to represent an ordinal variable? An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

• How do you test for differences in regression functions across different groups? We generally use the F test on properly-defined restricted and unrestricted models. The Chow test is a special case of the F test, where the null hypothesis is no difference across groups.

• What can you achieve by interacting group dummy variables with other regressors? The interaction allows the slope coefficients to differ across groups.

• What is "program evaluation"? It assesses the effectiveness of a program by checking the differences in regression coefficients for the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary? The PRF is the conditional probability of "success" for given explanatory variables. The SRF is the predicted conditional probability of "success" for given explanatory variables.

• What are the shortcomings of the LPM? The major drawback of the LPM is that the predicted probability of success may be outside the interval [0, 1].

bull Does MLR5 hold for the LPM No as indicated in the textbook and lecture the conditional variance of the disturbance is not a constant and depends on explanatory variables var(u|x) = E(y|x)[1minus E(y|x)]

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 7.4 • (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant. • (ii) 100·[exp(−.283) − 1] ≈ −24.7%, and so the estimate is somewhat smaller in magnitude. • (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility + δ3trans + u, where trans is a dummy variable for the transportation industry. Now the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.

Q2. Wooldridge 7.6 • In Section 3.3 – in particular, in the discussion surrounding Table 3.2 – we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3. Wooldridge 7.9 • (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z. • (ii) Setting f0(z) = f1(z) gives β0 + β1z = (β0 + δ0) + (β1 + δ1)z. Therefore, provided δ1 ≠ 0, we have z* = −δ0/δ1. Clearly, z* is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.

• (iii) Using part (ii), we have totcoll* = .357/.030 = 11.9 years (totcoll = total college years). • (iv) The estimated years of college at which women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed – not even close. In fact, at four years of college the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.

Q4. Wooldridge C7.13 (apple_c7_13.do) • (i) 412/660 ≈ .624. • (ii) The OLS estimates of the LPM are

ecobuy = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize
(.165) (.109) (.132) (.00053) (.013)
+ .025 educ − .00050 age
(.008) (.00125)
n = 660, R2 = .110.
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)

• (iii) The F test, with 4 and 653 df, is 4.43 with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children – other factors equal – the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in the functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule – predict one when the fitted probability is at least .5 and zero otherwise – gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
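The prediction rule can be written as a small helper (illustrative name and toy data of our choosing; with the actual APPLE data it would reproduce the 102/248 and 340/412 fractions above):

```python
def fraction_correct(actual, prob, cutoff=0.5):
    # Predict 1 when the fitted probability is at least the cutoff,
    # then report the fraction of observations predicted correctly
    hits = sum(1 for a, p in zip(actual, prob) if (p >= cutoff) == (a == 1))
    return hits / len(actual)

# Applied within the ecobuy = 0 and ecobuy = 1 subsamples separately, this
# gives the two group-specific fractions; on the full sample, the overall rate.
example = fraction_correct([0, 0, 1, 1], [0.2, 0.7, 0.6, 0.4])
```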


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes) • What is heteroskedasticity in a regression model?

When homoskedasticity (MLR5) fails, and the variance of the disturbance (u) changes across the observation index (i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with OLS under heteroskedasticity? The usual t-stat and F-stat are not valid because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator will no longer be (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA? These are the corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option "robust" with the "regress" command, e.g., regress lwage educ, robust.

• How do you detect if there is heteroskedasticity? The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

• If heteroskedasticity is present in a known form, how would you estimate the model? In this case, the WLS estimator should be used, which is more efficient than the OLS estimator. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model? If there is strong evidence for heteroskedasticity, the FGLS estimator should be used, which is based on an exponential functional form for the variance. See Section 8.4 for details.

• What are the steps in the FGLS estimation? You should summarise them from Section 8.4.

• How would you handle the heteroskedasticity of the LPM? The heteroskedasticity functional form is known for the LPM; hence, WLS can be used in principle. However, because the LPM can produce predicted probabilities that are outside the interval (0, 1), the known functional form p(x)[1 − p(x)] may not be useful for the WLS. In that case, it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 8.1 • Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2. Wooldridge 8.2 • With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc: beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.


Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.
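The transformation generalises: dividing every observation by √h_i and running OLS on the transformed data is exactly WLS. A minimal sketch, with a hypothetical helper name:

```python
from math import sqrt

def wls_transform(X, y, h):
    # Divide each observation (including the constant column) by sqrt(h_i);
    # OLS on the transformed data is WLS for Var(u|x) = sigma^2 * h(x)
    Xt = [[v / sqrt(hi) for v in row] for row, hi in zip(X, h)]
    yt = [yi / sqrt(hi) for yi, hi in zip(y, h)]
    return Xt, yt

# With h_i = inc_i^2, the constant column 1 becomes 1/inc, and the original
# slope on inc becomes the constant of the transformed equation, as above.
```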

Q3. Wooldridge 8.3 • False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has a smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4. Wooldridge 8.5 • (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar. • (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116. • (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years. • (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.

bull (v) We just plug the values of the independent variables into the OLS regression line 119904119898119900119896119890119904 = 656 minus 069log(6744)+012log(6500) minus029(16)+020(77) minus00026(77) asymp 0052 Thus the estimated probability of smoking for this person is close to zero (In fact this person is not a smoker so the equation predicts well for this particular observation)
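The turning point in part (iii) comes from maximising the fitted quadratic in age: the derivative of β̂1·age + β̂2·age² is zero at age* = −β̂1/(2β̂2). A quick check of the arithmetic, using the coefficients quoted in the solution:

```python
# turning point of b1*age + b2*age^2 occurs at age* = -b1/(2*b2)
b_age, b_agesq = 0.020, -0.00026
turning_point = -b_age / (2 * b_agesq)
print(round(turning_point, 2))  # 38.46
```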

Q5 Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:
e401k = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .035 male
(.081) (.0006) (.000005) (.0039) (.00005) (.121)
[.079] [.0006] [.000005] [.0038] [.00004] [.121]
n = 9,275, R² = .094.
There are no important differences; if anything, the robust standard errors are smaller.

• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model u² = δ0 + δ1p(x) + δ2p(x)² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = −1. Remember that, for the LPM, the fitted values ŷ are estimates of p(x). So, when we run the regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.
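The claim can be checked by simulation: generate data from a correctly specified LPM, estimate it by OLS, and regress the squared residuals on ŷ and ŷ². With a large sample, the three estimates land near 0, 1 and −1. A sketch, where the design p(x) = .2 + .5x is made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.uniform(0, 1, n)
p = 0.2 + 0.5 * x                      # true P(y=1|x): a correctly specified LPM
y = (rng.uniform(0, 1, n) < p).astype(float)

# estimate the LPM by OLS
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ b                           # fitted values: estimates of p(x)
u2 = (y - yhat) ** 2                   # squared residuals

# regress u^2 on yhat and yhat^2 (with an intercept)
Z = np.column_stack([np.ones(n), yhat, yhat**2])
d = np.linalg.lstsq(Z, u2, rcond=None)[0]
print(np.round(d, 2))                  # roughly [0, 1, -1], as the theory predicts
```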

• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. In the regression of û² on the fitted value and its square, the coefficient on the fitted value is about 1.010, the coefficient on its square is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are
e401k = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .055 male
(.076) (.0005) (.000004) (.0037) (.00004) (.117)
n = 9,275, R² = .108.
There are no important differences from the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)
• What is functional form misspecification? What is its major consequence in regression analysis?
As the word "misspecification" suggests, it is a mis-specified model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification?
The RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). The RESET is based on the F-test for the joint significance of the squared and cubed predicted values (ŷ² and ŷ³) in an "expanded" model that includes all the original regressors as well as ŷ² and ŷ³, where the latter represent possible nonlinear functions of the regressors.
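A bare-bones version of this F-test form of RESET can be sketched with simulated data (in practice one would use a packaged routine, such as Stata's RESET test after regress; this only illustrates the mechanics):

```python
import numpy as np

def reset_F(y, X):
    """RESET F statistic for adding yhat^2 and yhat^3 to a regression.
    X must include a column of ones."""
    n = X.shape[0]
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    yhat = X @ b
    ssr_r = np.sum((y - yhat) ** 2)                 # restricted SSR
    Xe = np.column_stack([X, yhat**2, yhat**3])     # expanded model
    be = np.linalg.lstsq(Xe, y, rcond=None)[0]
    ssr_ur = np.sum((y - Xe @ be) ** 2)             # unrestricted SSR
    q, df = 2, n - Xe.shape[1]
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df)

rng = np.random.default_rng(2)
n = 400
x = rng.normal(0, 1, n)
X = np.column_stack([np.ones(n), x])
y_ok = 1 + 2 * x + rng.normal(0, 1, n)                  # correct functional form
y_bad = 1 + 2 * x + 1.5 * x**2 + rng.normal(0, 1, n)    # neglected nonlinearity
print(reset_F(y_bad, X) > reset_F(y_ok, X))             # True: RESET flags y_bad
```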

• What are nested models? And nonnested models?
Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another?
The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test between two nonnested models?
There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models, and decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis?
A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved variable and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, depends only on the proxy.

• Can you analyse the consequences of measurement errors?
With a careful reading of Section 9.4, you should be able to analyse them.

• In what circumstances will missing observations cause major concerns in regression analysis?
Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?


Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR.1, MLR.3 or MLR.4, and hence does not introduce bias into the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable, and causes bias in the OLS estimation.

• What are outliers and influential observations?
Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of parameters. They are called influential observations if the estimation results when including them are significantly different from the results when excluding them.

• Consider the simple regression model yi = α + (β + bi)xi + ui, for i = 1, 2, …, n, where the "slope parameter" contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR.1-MLR.5 hold for yi = α + βxi + ui, with E(bi|xi) = 0 and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference?
The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui). Given the stated assumptions, MLR.1-MLR.4 hold for this model, and the OLS estimator of the "slope parameter" is unbiased (for β). However, MLR.5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui), depends on xi, so the usual OLS standard errors will not be valid.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 9.1
• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test:
F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] ≈ 2.97.
With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
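The arithmetic of the R-squared form of the F statistic can be checked directly, using the values quoted above:

```python
r2_ur, r2_r = 0.375, 0.353        # unrestricted and restricted R-squared
q, df_ur = 2, 177 - 8             # number of restrictions; unrestricted df
F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)
print(round(F, 2))                # 2.97
```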

Q2 Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.

• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So, when we control for the poverty rate, the effect of spending falls.

• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of 1.26 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1), we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than is spending per student or other school characteristics.

Q3 Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report their crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates using the 1988 data are
log(scrap) = .409 + .057 grant
(.241) (.406)
n = 54, R² = .0004.
The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain
log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
(.089) (.147) (.044)
n = 54, R² = .873,
where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller in magnitude than before, but it is still pretty significant.


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data?
The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations depend on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that, for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation?
A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (and hence has a distribution). The SP can be viewed as a "random curve", with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation or autocorrelation?
The serial (or auto) correlation of a time series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted as Corr(yt, yt-h) = Cov(yt, yt-h)/[Var(yt)Var(yt-h)]^(1/2).
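In a sample, the lag-h autocorrelation is just the correlation between the series and its h-period lag. A small illustration on a simulated AR(1) series (simulated data only, not course data):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 5000
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * y[t - 1] + rng.normal()   # AR(1) with coefficient .8

def autocorr(y, h):
    """Sample Corr(y_t, y_{t-h})."""
    return np.corrcoef(y[h:], y[:-h])[0, 1]

print(round(autocorr(y, 1), 1))            # about 0.8 for this AR(1)
```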

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)?
Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.
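One standard way to get the LRP and its standard error in one step (Stata's lincom offers a direct alternative) is re-parameterisation: in yt = α + δ0 zt + δ1 zt-1 + δ2 zt-2 + ut, substitute θ = δ0 + δ1 + δ2 and regress yt on zt, (zt-1 − zt) and (zt-2 − zt); the coefficient on zt is then θ, with its usual standard error. A sketch with simulated data and made-up lag coefficients:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 300
z = rng.normal(0, 1, T)
z0, z1, z2 = z[2:], z[1:-1], z[:-2]            # z_t, z_{t-1}, z_{t-2}
y = 1 + 0.5 * z0 + 0.3 * z1 + 0.1 * z2 + rng.normal(0, 1, T - 2)
n = len(y)

# original FDL regression: coefficients delta0, delta1, delta2
X = np.column_stack([np.ones(n), z0, z1, z2])
d = np.linalg.lstsq(X, y, rcond=None)[0]

# re-parameterised regression: the coefficient on z_t is theta, the LRP
Xr = np.column_stack([np.ones(n), z0, z1 - z0, z2 - z0])
th = np.linalg.lstsq(Xr, y, rcond=None)[0]

print(np.isclose(th[1], d[1] + d[2] + d[3]))   # True: LRP recovered exactly
```

Because the second regression is a linear re-parameterisation of the first, the equality holds exactly in any sample, not just on average.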

• What are TS.1-TS.6 (the assumptions for time series regression)? How do they differ from MLR.1-MLR.6?
TS.1-TS.6 are: 1) linearity in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR.1-MLR.6 mainly in that MLR.2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS.3 is needed. To ensure the validity of the OLS standard errors, t-stats and F-stats for statistical inference, TS.3-TS.6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors?
Strictly exogenous regressors satisfy assumption TS.3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS.3.

• What is a trending time series? What is a time trend?
A time series is trending if it has the tendency to grow (or shrink) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results?
First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious, because it is purely induced by the time trends and has little to say about the true relationship between the two time series.

• Why would you include a time trend in regressions with trending variables?
Including a time trend in such regressions allows one to study the relationship among the time series variables that is not induced by the time trends themselves.

• What is seasonality in a time series? Give an example of a time series variable with seasonality.
Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have day-of-week seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.

Problem Set

Q1 Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated, such as stock returns, do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no serial correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2
• We follow the hint and write
gGDPt-1 = α0 + δ0 intt-1 + δ1 intt-2 + ut-1,
and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0 intt-1 + δ1 intt-2 + ut-1 − 3) + vt
= (γ0 + γ1α0 − 3γ1) + γ1δ0 intt-1 + γ1δ1 intt-2 + γ1ut-1 + vt.
Now, by assumption, ut-1 has zero mean and is uncorrelated with all right-hand-side variables in the previous equation, except itself of course. So
Cov(intt, ut-1) = E(intt·ut-1) = γ1E(ut-1²) > 0,
because γ1 > 0. If σ² = E(ut²) for all t, then Cov(intt, ut-1) = γ1σ². This violates the strict exogeneity assumption TS.3. While ut is uncorrelated with intt, intt-1, and so on, ut is correlated with intt+1.


Q3 Wooldridge 10.7
• (i) pet-1 and pet-2 must be increasing by the same amount as pet.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means that the level of pe increases and stays at the new level, and this is achieved by increasing pet-2, pet-1, and pet by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is
i3t = 1.61 + .343 inft + .382 inft-1 − .190 deft + .569 deft-1
(.40) (.125) (.134) (.221) (.197)
n = 55, R² = .685, adjusted R² = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than the .606 we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inft-1.

• (iv) The F statistic for the joint significance of inft-1 and deft-1 is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.


Week 5 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the CLM assumptions?
These are MLR.1, MLR.2, MLR.3 and MLR.6. MLR.6 is a very strong assumption that implies both MLR.4 and MLR.5.

• What is the sampling distribution of the OLS estimators under the CLM assumptions?
The OLS estimators, under the CLM assumptions, follow normal distributions.

• What are the standard errors of the OLS estimators?
The standard error of an estimator is the square root of the estimated variance of the estimator.

• What is the null hypothesis about a parameter?
In this context, the null hypothesis is a statement that assigns a known value (or hypothesised value) to the parameter of interest. Usually, the null is a maintained proposition (from economic theories or experience) that will only be rejected on very strong evidence.

• What is a one-tailed (two-tailed) alternative hypothesis?
This should be clear once you finish your reading.

• In testing hypotheses, what is a Type 1 (Type 2) error? What is the level of significance?
A Type 1 error is the rejection of a true null. A Type 2 error is the non-rejection of a false null. The level of significance is the probability of committing a Type 1 error.

• The decision rule we use can be stated as "reject the null if the t-statistic exceeds the critical value". How is the critical value determined?
The critical value is determined by the level of significance (or the level of confidence, in the case of confidence intervals) and the distribution of the test statistic. For example, for the t-stat, we use the t-distribution when the sample size is small, and the normal distribution when the sample size is large (df > 120).

• Justify the statement: "Given the observed test statistic, the p-value is the smallest significance level at which the null hypothesis would be rejected."
If you used the observed t-stat as your critical value, you would just reject the null (because the t-stat is equal to this value). This "critical value" (= the t-stat value) implies a level of significance, say p, which is exactly the p-value. For any significance level smaller than p, the corresponding critical value is more extreme than the t-stat value and does not lead to the rejection of the null. Hence p is the smallest significance level at which the null would be rejected.
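For a large-sample t statistic (normal approximation), the two-sided p-value P(|Z| > |t|) can be computed with the standard library alone; the t values used here are made up for illustration:

```python
import math

def two_sided_p(t):
    """Two-sided p-value P(|Z| > |t|) under the standard normal approximation."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

print(round(two_sided_p(1.96), 3))  # 0.05, the familiar 5% two-tailed cutoff
```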

• What is the 90% confidence interval for a parameter?
It is an interval [L, U] defined by two sample statistics: the lower bound L and the upper bound U. The probability that [L, U] covers the parameter is 0.9.

• In constructing a confidence interval for a parameter, what is the level of confidence?
It is the probability that the CI covers the parameter.

• When the level of confidence increases, how will the width of the confidence interval change (holding other things fixed)?
The width will increase.

• Try to convince yourself that the event "the 90% confidence interval covers a hypothesised value of the parameter" is the same as the event "the null of the parameter being the hypothesised value cannot be rejected in favour of the two-tailed alternative at the 10% level of significance".
It becomes apparent if you start with P(|t-stat| ≤ c) = 0.90 and disentangle the expression for the t-stat.


Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 4.1
• (i) and (iii) generally cause the t statistics not to have a t distribution under H0. Homoskedasticity is one of the CLM assumptions. An important omitted variable violates Assumption MLR.4 (and hence MLR.6) in general. The CLM assumptions contain no mention of the sample correlations among independent variables, except to rule out the case where the correlation is one (MLR.3).

Q2 Wooldridge 4.2
• (i) H0: β3 = 0. H1: β3 > 0.
• (ii) The proportionate effect on predicted salary is .00024(50) = .012. To obtain the percentage effect, we multiply this by 100: 1.2%. Therefore, a 50-point ceteris paribus increase in ros is predicted to increase salary by only 1.2%. Practically speaking, this is a very small effect for such a large change in ros.

• (iii) The 10% critical value for a one-tailed test, using df = ∞, is obtained from Table G.2 as 1.282. The t statistic on ros is .00024/.00054 ≈ .44, which is well below the critical value. Therefore, we fail to reject H0 at the 10% significance level.

• (iv) Based on this sample, the estimated ros coefficient appears to be different from zero only because of sampling variation. On the other hand, including ros may not be causing any harm; it depends on how correlated it is with the other independent variables (although these are very significant even with ros in the equation).

Q3 Wooldridge 4.5
• (i) .412 ± 1.96(.094), or about [.228, .596].
• (ii) No, because the value .4 is well inside the 95% CI.
• (iii) Yes, because 1 is well outside the 95% CI.

Q4 Wooldridge C4.8 (401ksubs_ch04.do)
• (i) There are 2,017 single people in the sample of 9,275.
• (ii) The estimated equation is
nettfa = −43.04 + .799 inc + .843 age
(4.08) (.060) (.092)
n = 2,017, R² = .119.
The coefficient on inc indicates that one more dollar in income (holding age fixed) is reflected in about 80 more cents in predicted nettfa; no surprise there. The coefficient on age means that, holding income fixed, if a person gets another year older, his/her nettfa is predicted to increase by about $843. (Remember, nettfa is in thousands of dollars.) Again, this is not surprising.

• (iii) The intercept is not very interesting, as it gives the predicted nettfa for inc = 0 and age = 0. Clearly, there is no one with even close to these values in the relevant population.

• (iv) The t statistic is (.843 − 1)/.092 ≈ −1.71. Against the one-sided alternative H1: β2 < 1, the p-value is about .044. Therefore, we can reject H0: β2 = 1 at the 5% significance level (against the one-sided alternative).

• (v) The slope coefficient on inc in the simple regression is about .821, which is not very different from the .799 obtained in part (ii). As it turns out, the correlation between inc and age in the sample of single people is only about .039, which helps explain why the simple and multiple regression estimates are not very different; see the text also.


Week 6 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• How would you test hypotheses about a single linear combination of parameters?
We may re-parameterise the regression model to isolate the "single linear combination". OLS estimation of the re-parameterised model then provides the estimate of the "single linear combination" and the related standard error. See the example in the textbook and lecture slides.

• What are exclusion restrictions for a regression model?
Exclusion restrictions are a null hypothesis that a group of x-variables have zero coefficients in the regression model.

• What are restricted and unrestricted models?
When the null hypothesis is a set of restrictions on the parameters, the regression under the null is called the restricted model, while the regression under the alternative (which simply states that the null is false) is called the unrestricted model.

• How do you compute the F-statistic, given that you have the SSRs?
Clearly, the general F-stat is based on the relative difference between the SSRs under the restricted and unrestricted models: F = [(SSRr − SSRur)/q]/[SSRur/(n − k − 1)], where you should be able to explain the meanings of the various symbols.
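As a sketch, the formula translates directly into code (the SSR values in the example call are hypothetical):

```python
def F_stat(ssr_r, ssr_ur, q, n, k):
    """F statistic from restricted/unrestricted SSRs: q restrictions,
    n observations, k regressors in the unrestricted model."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))

# hypothetical numbers: 2 restrictions, n = 90, k = 3
print(round(F_stat(250.0, 200.0, 2, 90, 3), 2))  # 10.75
```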

• What are general linear restrictions on parameters?
These are linear equations on the parameters, for example, β1 − 2β2 + 3β3 = 0.

• What is the test for the overall significance of a regression?
This is the F-test for the null hypothesis that all the coefficients on the x-variables are zero.

• How would you report your regression results?
See the guidelines in Section 4.6 of the textbook.

• Why would you care about the asymptotic properties of the OLS estimators?
When MLR.6 does not hold, the finite-sample distribution of the OLS estimators is not available. For reasonably large samples (large n), we use the asymptotic distribution of the OLS estimators, which is known from studying their asymptotic properties, to approximate the finite-sample distribution of the OLS estimators, which is unknown.

• Comparing the inference procedures in Chapter 5 with those in Chapter 4, can you list the similarities and differences?
It is beneficial to make such a list by yourself.

• Under MLR.1-MLR.5, the OLS estimators are consistent, asymptotically normal, and asymptotically efficient. Try to explain these properties in your own words.
This is really a test of your understanding of these notions in econometrics.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 4.6
• (i) With df = n − 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because each test is two-tailed, the critical value is 1.987. The t statistic for H0: β0 = 0 is about −.89, which is much less than 1.987 in absolute value. Therefore, we fail to reject H0: β0 = 0. The t statistic for H0: β1 = 1 is (.976 − 1)/.049 ≈ −.49, which is even less significant. (Remember, we reject H0 in favor of H1 in this case only if |t| > 1.987.)

• (ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions, and the df in the unrestricted model is 86. We are given SSRr = 209,448.99 and SSRur = 165,644.51. Therefore,
F = [(209,448.99 − 165,644.51)/2]/[165,644.51/86] ≈ 11.37,


which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is 4.85.

• (iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions, and there are 88 − 5 = 83 df in the unrestricted model. The F statistic is
F = [(.829 − .820)/3]/[(1 − .829)/83] ≈ 1.46.
The 10% critical value (again using 90 denominator df in Table G.3a) is 2.15, so we fail to reject H0 at even the 10% level. In fact, the p-value is about .23.

• (iv) If heteroskedasticity were present, Assumption MLR.5 would be violated, and the F statistic would not have an F distribution under the null hypothesis. Therefore, comparing the F statistic against the usual critical values, or obtaining the p-value from the F distribution, would not be especially meaningful.

Q2 Wooldridge 4.8
• (i) We use Property VAR.3 from Appendix B:
Var(β̂1 − 3β̂2) = Var(β̂1) + 9·Var(β̂2) − 6·Cov(β̂1, β̂2).
• (ii) t = (β̂1 − 3β̂2 − 1)/se(β̂1 − 3β̂2), so we need the standard error of β̂1 − 3β̂2.
• (iii) Because θ = β1 − 3β2, we can write β1 = θ + 3β2. Plugging this into the population model gives
y = β0 + (θ + 3β2)x1 + β2x2 + β3x3 + u = β0 + θx1 + β2(3x1 + x2) + β3x3 + u.
This last equation is what we would estimate by regressing y on x1, 3x1 + x2, and x3. The coefficient and standard error on x1 are what we want.
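The re-parameterisation in (iii) is a pure linear reshuffling of the regressors, so in any sample the coefficient on x1 in the transformed regression equals β̂1 − 3·β̂2 from the original regression, exactly. A quick numerical check on simulated data:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 0.7 * x1 + 0.2 * x2 - 0.4 * x3 + rng.normal(size=n)

# original regression: y on x1, x2, x3
X = np.column_stack([np.ones(n), x1, x2, x3])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# transformed regression: y on x1, 3*x1 + x2, x3
Xt = np.column_stack([np.ones(n), x1, 3 * x1 + x2, x3])
bt = np.linalg.lstsq(Xt, y, rcond=None)[0]

# coefficient on x1 in the transformed model equals beta1_hat - 3*beta2_hat
print(np.isclose(bt[1], b[1] - 3 * b[2]))  # True
```

The transformed regression is useful precisely because OLS then delivers the standard error of the combination directly, rather than requiring the covariance term from (i).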

Q3 Wooldridge 4.10
• (i) We need to compute the F statistic for the overall significance of the regression, with n = 142 and k = 4:
F = [.0395/(1 − .0395)](137/4) ≈ 1.41.
The 5% critical value with 4 and 120 df is 2.45, which is well above the value of F. Therefore, we fail to reject H0: β1 = β2 = β3 = β4 = 0 at the 10% level. No explanatory variable is individually significant at the 5% level. The largest absolute t statistic is that on dkr, tdkr ≈ 1.60, which is not significant at the 5% level against a two-sided alternative.

• (ii) The F statistic (with the same df) is now [.0330/(1 − .0330)](137/4) ≈ 1.17, which is even lower than in part (i). None of the t statistics is significant at a reasonable level.

• (iii) We probably should not use the logs, as the logarithm is not defined for firms that have zero for dkr or eps. Therefore, we would lose some firms in the regression.

• (iv) It seems very weak. There are no significant t statistics at the 5% level (against a two-sided alternative), and the F statistics are insignificant in both cases. Plus, less than 4% of the variation in return is explained by the independent variables.

Q4 Wooldridge C4.9 (discrim_c4_9.do)
• (i) The results from the OLS regression, with standard errors in parentheses, are
log(psoda) = −1.46 + .073 prpblck + .137 log(income) + .380 prppov
(.29) (.031) (.027) (.133)
n = 401, R² = .087.

• The p-value for testing H0: β1 = 0 against the two-sided alternative is about .018, so we reject H0 at the 5% level but not at the 1% level.

• (ii) The correlation is about −.84, indicating a strong degree of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for log(income) is about 5.1, and that for prppov is about 2.86 (two-sided p-value = .004).

• (iii) The OLS regression results when log(hseval) is added are
log(psoda) = −.84 + .098 prpblck − .053 log(income) + .052 prppov + .121 log(hseval)
(.29) (.029) (.038) (.134) (.018)
n = 401, R² = .184.
The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about .12 percent. The two-sided p-value is zero to three decimal places.

• (iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the 15% significance level against a two-sided alternative for log(income), and prppov does not have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at the 5% level, because the outcome of the F(2,396) statistic is about 3.52 with p-value = .030. All of the control variables – log(income), prppov, and log(hseval) – are highly correlated, so it is not surprising that some are individually insignificant.

• (v) Because the regression in (iii) contains the most controls, log(hseval) is individually significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable. It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if prpblck increases by .10, psoda is estimated to increase by about 1%, other factors held fixed.

Q5. Wooldridge 5.2
• This is about the inconsistency of the simple regression of pctstck on funds. A higher tolerance of risk means more willingness to invest in the stock market, so β2 > 0. By assumption, funds and risktol are positively correlated. Now we use equation (5.5), where δ1 > 0: plim(β̃1) = β1 + β2δ1 > β1, so β̃1 has a positive inconsistency (asymptotic bias). This makes sense: if we omit risktol from the regression and it is positively correlated with funds, some of the estimated effect of funds is actually due to the effect of risktol.
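The positive inconsistency β1 + β2δ1 can be illustrated with a small simulation (a sketch, not from the text; the variable names and the parameter values β1 = 1, β2 = 2, δ1 = .5 are all hypothetical):

```python
import random

random.seed(1)
n = 20000
# x1 plays the role of funds; x2 (risktol) is positively correlated with x1 via delta1 = 0.5
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]
# true model: y = 1.0*x1 + 2.0*x2 + u, so beta1 = 1 and beta2 = 2
y = [1.0 * a + 2.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

# simple OLS slope of y on x1 alone, i.e. omitting x2 from the regression
mx = sum(x1) / n
my = sum(y) / n
slope = sum((a - mx) * (c - my) for a, c in zip(x1, y)) / sum((a - mx) ** 2 for a in x1)
print(round(slope, 2))  # close to beta1 + beta2*delta1 = 1 + 2*(0.5) = 2, not to beta1 = 1
```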

Q6. Wooldridge C5.1 (wage1_c5_1.do)
• (i) The estimated equation is:

wage = −2.87 + .599 educ + .022 exper + .169 tenure
       (0.73)  (.051)      (.012)       (.022)
n = 526, R2 = .306, σ̂ = 3.085

• Below is a histogram of the 526 residuals, ûi, i = 1, 2, …, 526. The histogram uses 27 bins, which is suggested by the formula in the Stata manual for 526 observations. For comparison, the normal distribution that provides the best fit to the histogram is also plotted.



• (ii) With log(wage) as the dependent variable, the estimated equation is:

log(wage) = .284 + .092 educ + .0041 exper + .022 tenure
            (.104)  (.007)     (.0017)       (.003)
n = 526, R2 = .316, σ̂ = .441

The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below.

• (iii) The residuals from the log(wage) regression appear to be more normally distributed. Certainly the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed to the right. In the wage regression there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.

• The skewness and kurtosis of the residuals from the OLS regression may be used to test the null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null distribution of the Jarque-Bera test statistic (which takes into account both skewness and kurtosis) is approximately chi-squared with 2 degrees of freedom. Hence, at the 5% level, we reject the normality of u whenever the JB statistic is greater than 5.99. In STATA, this test is carried out by the command "sktest". With the data in WAGE1.RAW, the JB statistic is extremely large for the wage model, and JB = 10.59 for the log(wage) model. While normality is rejected for both models, we do see that the JB statistic is much smaller for the log(wage) model [also compare its p-values of the skewness and excess kurtosis tests against those from the wage model]. Hence, normality appears to be a reasonable approximation to the distribution of the error term in the log(wage) model.
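The JB statistic above is easy to compute from the sample skewness and kurtosis (an illustrative Python sketch; the moment values plugged in below are made up for illustration, not the WAGE1 results):

```python
def jarque_bera(skewness, kurtosis, n):
    """JB statistic from sample skewness and kurtosis; ~ chi-squared(2) under normality."""
    return (n / 6.0) * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)

# hypothetical residual moments, for illustration only
jb = jarque_bera(0.6, 4.1, 526)
print(jb > 5.99)  # True: reject normality at the 5% level when JB exceeds 5.99
```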


Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

• What are the advantages of using the log of a variable in regression? Find the "rules of thumb" for taking logs. See page 191 of Wooldridge (pp. 198-199 for the 3rd edition).

• Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithm. Do you remember Table 2.3? Consult Table 2.3.

• How do you compute the change in y caused by Δx when the model is built for log(y)? A precise computation is given by Equation (6.8) of Wooldridge.
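The exact computation in Equation (6.8) can be sketched in a couple of lines (illustrative Python, not part of the tutorial; the .092 educ coefficient is taken from the Week 6 log(wage) equation):

```python
import math

def exact_pct_change(beta_hat, dx):
    """Exact % change in y implied by a log(y) model: 100*[exp(beta*dx) - 1], eq. (6.8)."""
    return 100.0 * (math.exp(beta_hat * dx) - 1.0)

# with a coefficient of .092 on educ, one more year of education raises predicted wage by
print(round(exact_pct_change(0.092, 1), 1))  # 9.6 (%), versus the 9.2% log approximation
```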

• Why do we need "interaction" terms in regression models? An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for example.

• What is the adjusted R-squared? What is the difference between it and the R-squared? The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added. However, the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.
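The penalty is visible in the formula adjusted-R2 = 1 − (1 − R2)(n − 1)/(n − k − 1); a quick check (illustrative Python, using the R2 reported later in this week's Q3):

```python
def adj_r2(r2, n, k):
    """Adjusted R-squared: 1 - (1 - R2)*(n - 1)/(n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# the Week 7 Q3 regression: R2 = .135 with n = 935 and k = 3 regressors
print(round(adj_r2(0.135, 935, 3), 3))  # 0.132, matching the reported adjusted R-squared
```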

• How do you construct an interval prediction for given x-values? This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.

• How do you predict y for given x-values when the model is built for log(y)? Check the list on page 212 (p. 220 for the 3rd edition).

• What is involved in "residual analysis"? See pages 209-210 (pp. 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 6.4
• (i) Holding all other factors fixed, we have
Δlog(wage) = β1Δeduc + β2Δeduc·pareduc = (β1 + β2 pareduc)Δeduc.
Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if we think a child gets more out of another year of education the more highly educated are the child's parents.

• (ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 − 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)

• (iii) When we add pareduc by itself, the coefficient on the interaction term is negative. The t statistic on educ·pareduc is about −1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2. Wooldridge 6.5
• This would make little sense. Performances on math and science exams are measures of outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The variable sci11 could be a dependent variable in an identical regression equation.

Q3. Wooldridge C6.3 (C6.2 for the 3rd Edition) (wage2_c6_3.do)
• (i) Holding exper (and the elements in u) fixed, we have
Δlog(wage) = β1Δeduc + β3Δeduc·exper = (β1 + β3 exper)Δeduc,
or Δlog(wage)/Δeduc = β1 + β3 exper. This is the approximate proportionate change in wage given one more year of education.

• (ii) H0: β3 = 0. If we think that education and experience interact positively – so that people with more experience are more productive when given another year of education – then β3 > 0 is the appropriate alternative.

• (iii) The estimated equation is:

log(wage) = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper
            (0.24)  (.0174)      (.0200)       (.00153)
n = 935, R2 = .135, adjusted R2 = .132

The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.

• (iv) We rewrite the equation as
log(wage) = β0 + θ1educ + β2exper + β3educ·(exper − 10) + u,
and run the regression log(wage) on educ, exper, and educ·(exper − 10). We want the coefficient on educ. We obtain θ̂1 ≈ .0761 and se(θ̂1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.
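The reparametrization implies θ1 = β1 + 10β3, so the point estimate and CI can be cross-checked from the part (iii) estimates (a rounding-level arithmetic check in Python; 1.96 is used as the large-sample 97.5th percentile):

```python
# point estimates reported in part (iii); the exact OLS theta-hat (.0761) differs only by rounding
beta1, beta3 = 0.0440, 0.00320
theta1 = beta1 + 10 * beta3          # theta1 = beta1 + 10*beta3
se_theta1 = 0.0066                    # reported se from the reparametrized regression
lo, hi = theta1 - 1.96 * se_theta1, theta1 + 1.96 * se_theta1
print(round(theta1, 4), round(lo, 3), round(hi, 3))  # 0.076 0.063 0.089
```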

Q4. Wooldridge C6.8 (hprice1_c6_8.do)
• (i) The estimated equation (where price is in dollars) is:

price = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
        (29,475.0)  (0.642)         (13.24)        (9,010.1)
n = 88, R2 = .672, adjusted R2 = .661, σ̂ = 59,833

The predicted price at lotsize = 10,000, sqrft = 2,300, and bdrms = 4 is about $336,714.

• (ii) The regression is price_i on (lotsize_i − 10,000), (sqrft_i − 2,300), and (bdrms_i − 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.

• (iii) We must use equation (6.36) to obtain the standard error of ê0 and then use equation (6.37) (assuming that price is normally distributed). But from the regression in part (ii), se(ŷ0) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile in the t(84) distribution gives the 95% CI for price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval. But we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
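The prediction-interval arithmetic can be reproduced from the reported numbers (illustrative Python, not part of the original answer):

```python
import math

se_yhat0 = 7374.5    # se of the expected value at the given x's, from part (ii)
sigma_hat = 59833.0  # estimated error standard deviation
se_e0 = math.sqrt(se_yhat0 ** 2 + sigma_hat ** 2)  # eq. (6.36): se of the prediction error
t975 = 1.99          # approximate 97.5th percentile of t(84)
point = 336706.7
lo, hi = point - t975 * se_e0, point + t975 * se_e0
print(round(se_e0), round(lo), round(hi))  # 60286 216738 456675
```

Note how σ̂ dominates se(ŷ0) here, which is why more explanatory power (a smaller σ̂) is what would tighten the interval.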


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

• What is a qualitative factor? Give some examples. This should be easy.

• How do you use dummy variables to represent qualitative factors? A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy. For example, a factor with 5 values requires 4 dummy variables.

• How do you use dummy variables to represent an ordinal variable? An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

• How do you test for differences in regression functions across different groups? We generally use the F test on properly-defined restricted and unrestricted models. The Chow test is a special case of the F test, where the null hypothesis is no difference across groups.

• What can you achieve by interacting group dummy variables with other regressors? The interaction allows the slope coefficients to differ across groups.

• What is "program evaluation"? It assesses the effectiveness of a program by checking the differences in regression coefficients for the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary? The PRF is the conditional probability of "success" for given explanatory variables. The SRF is the predicted conditional probability of "success" for given explanatory variables.

• What are the shortcomings of the LPM? The major drawback of the LPM is that the predicted probability of success may be outside the interval [0,1].

• Does MLR5 hold for the LPM? No. As indicated in the textbook and lecture, the conditional variance of the disturbance is not constant and depends on the explanatory variables: Var(u|x) = E(y|x)[1 − E(y|x)].

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 7.4
• (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant.
• (ii) 100·[exp(−.283) − 1] ≈ −24.7%, so the estimate is somewhat smaller in magnitude.
• (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is
log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility + δ3trans + u,
where trans is a dummy variable for the transportation industry. Now the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.

Q2. Wooldridge 7.6
• In Section 3.3 – in particular, in the discussion surrounding Table 3.2 – we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3. Wooldridge 7.9
• (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.
• (ii) Setting f0(z*) = f1(z*) gives β0 + β1z* = (β0 + δ0) + (β1 + δ1)z*. Therefore, provided δ1 ≠ 0, we have z* = −δ0/δ1. Clearly, z* is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.
• (iii) Using part (ii), we have totcoll* = .357/.030 = 11.9 years (totcoll = total college years).
• (iv) The estimated years of college where women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed – not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.
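The arithmetic in parts (ii)-(iv), using the reported estimates δ0 = −.357 and δ1 = .030, can be checked directly (illustrative Python):

```python
import math

delta0, delta1 = -0.357, 0.030
z_star = -delta0 / delta1                  # part (ii)-(iii): years of college where the gap closes
gap_at_4 = delta0 + delta1 * 4             # part (iv): log-wage gap at four years of college
pct = 100 * (math.exp(gap_at_4) - 1)       # exact percentage difference
print(round(z_star, 1), round(gap_at_4, 3), round(pct, 1))  # 11.9 -0.237 -21.1
```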

Q4. Wooldridge C7.13 (apple_c7_13.do)
• (i) 412/660 ≈ .624.
• (ii) The OLS estimates of the LPM are:

ecobuy = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize + .025 educ − .00050 age
         (.165)  (.109)        (.132)        (.00053)        (.013)        (.008)      (.00125)
n = 660, R2 = .110

If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)

• (iii) The F test, with 4 and 653 df, is 4.43 with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children – other factors equal – the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule – predict one when the fitted probability is ≥ .5 and zero otherwise – gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
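The "percent correctly predicted" bookkeeping in part (vi) follows directly from the reported counts (248 non-buyers, 412 buyers; illustrative Python):

```python
correct0, n0 = 102, 248   # ecobuy = 0 observations predicted correctly
correct1, n1 = 340, 412   # ecobuy = 1 observations predicted correctly
overall = (correct0 + correct1) / (n0 + n1)
print(round(correct0 / n0, 3), round(correct1 / n1, 3), round(100 * overall))  # 0.411 0.825 67
```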


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

• What is heteroskedasticity in a regression model? When homoskedasticity (MLR5) fails and the variance of the disturbance (u) changes across the observation index (i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with OLS under heteroskedasticity? The usual t-stat and F-stat are not valid because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator will no longer be (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA? These are corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option "robust" with the "regress" command, e.g., regress lwage educ, robust.

• How do you detect whether there is heteroskedasticity? The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

• If heteroskedasticity is present in a known form, how would you estimate the model? In this case, the WLS estimator should be used, which is more efficient than the OLS estimator. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model? If there is strong evidence of heteroskedasticity, the FGLS estimator should be used, which is based on an exponential functional form. See Section 8.4 for details.

• What are the steps in the FGLS estimation? You should summarise from Section 8.4.

• How would you handle the heteroskedasticity of the LPM? The heteroskedasticity functional form is known for the LPM. Hence, WLS can be used in principle. However, because the LPM can produce predicted probabilities outside the interval (0,1), the known functional form p(x)[1 − p(x)] may not be useful for WLS. In that case, it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2. Wooldridge 8.2
• With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc:
beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.

Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.
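The division by inc can be sketched mechanically (a minimal illustration in Python, not Stata; the two data rows are hypothetical):

```python
# hypothetical (beer, inc, price, educ, female) observations
rows = [
    (12.0, 40.0, 3.0, 12, 0),
    (5.0, 80.0, 3.5, 16, 1),
]
# divide every term, including the intercept's constant 1, by sqrt(h(x)) = inc
transformed = [
    (beer / inc, 1.0 / inc, 1.0, price / inc, educ / inc, female / inc)
    for beer, inc, price, educ, female in rows
]
# each transformed row is (beer/inc, 1/inc, constant, price/inc, educ/inc, female/inc);
# OLS of the first element on the remaining columns (with no extra intercept) is the WLS estimator
print(transformed[0][0])  # 12/40 = 0.3
```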

Q3. Wooldridge 8.3
• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has a smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4. Wooldridge 8.5
• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.
• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.
• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.
• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.
• (v) We just plug the values of the independent variables into the OLS regression line:
smokes = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052.
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)
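The turning-point and plug-in computations in (iii) and (v) can be reproduced from the reported coefficients (illustrative Python):

```python
import math

# part (iii): the age quadratic turns at age* = b_age / (2 * b_age2)
print(round(0.020 / (2 * 0.00026), 2))  # 38.46

# part (v): plugging the given values into the reported OLS line
smokes = (0.656 - 0.069 * math.log(67.44) + 0.012 * math.log(6500)
          - 0.029 * 16 + 0.020 * 77 - 0.00026 * 77 ** 2)
print(round(smokes, 4))  # 0.0052
```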

Q5. Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:

e401k = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male
        (.081)  (.0006)     (.000005)      (.0039)     (.00005)      (.0121)
        [.079]  [.0006]     [.000005]      [.0038]     [.00004]      [.0121]
n = 9,275, R2 = .094

There are no important differences; if anything, the robust standard errors are smaller.

• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model u² = δ0 + δ1p(x) + δ2p(x)² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = −1. Remember that, for the LPM, the fitted values, ŷ, are estimates of p(x). So, when we run the regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.

• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on ŷ is about 1.010, the coefficient on ŷ² is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are:

e401k = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male
        (.076)  (.0005)     (.000004)      (.0037)     (.00004)      (.0117)
n = 9,275, R2 = .108

There are no important differences with the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)

• What is functional form misspecification? What is its major consequence in regression analysis? As the word "misspecification" suggests, it is a mis-specified model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence, the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification? RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). RESET is based on the F-test for the joint significance of the squared and cubed predicted values (ŷ) in an "expanded" model that includes all the original regressors as well as the squared and cubed ŷ, where the latter represent possible nonlinear functions of the regressors.

• What are nested models? And nonnested models? Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another? The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test two nonnested models? There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models, and decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis? A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved one and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, depends only on the proxy.

• Can you analyse the consequences of measurement errors? With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis? Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection? Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR1, 3, and 4, and hence does not introduce bias in the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable, and causes bias in the OLS estimation.

• What are outliers and influential observations? Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of parameters. They are called influential observations if the estimation results when including them are significantly different from the results when excluding them.

• Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1, 2, …, n, where the "slope parameter" contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR1-5 hold for yi = α + βxi + ui, E(bi|xi) = 0, and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference? As the model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui), MLR1-4 hold for this model under the stated assumptions, and the OLS estimator of the "slope parameter" is unbiased (from β). However, MLR5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui|xi), depends on xi, and the usual OLS standard errors will not be valid.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 9.1
• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test: F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] ≈ 2.97. With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
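The exclusion-restriction form of the F statistic used here generalises the earlier overall F test (an illustrative Python sketch; the helper name is ours):

```python
def f_exclusion(r2_ur, r2_r, n, k_ur, q):
    """R-squared form of the F statistic for q exclusion restrictions:
    F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/(n - k_ur - 1)]."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k_ur - 1))

# Wooldridge 9.1: n = 177, 7 regressors in the unrestricted model, q = 2 restrictions,
# so the denominator df is 177 - 7 - 1 = 169, i.e. the (177 - 8) in the text
print(round(f_exclusion(0.375, 0.353, 177, 7, 2), 2))  # 2.97
```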

Q2. Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.
• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So, when we control for the poverty rate, the effect of spending falls.
• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of 1.26 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1), we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than spending per student or other school characteristics.

Q3. Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4. Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.
• (ii) The simple regression estimates using the 1988 data are:

log(scrap) = .409 + .057 grant
             (.241)  (.406)
n = 54, R2 = .0004

The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain:

log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
               (.089)  (.147)         (.044)
n = 54, R2 = .873

where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller than before, but it is still pretty significant.
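As a quick check, the t statistics in parts (iii)-(v) can be recomputed from the reported coefficients and standard errors; a minimal sketch using the numbers above:

```python
# t statistics for Q4 (iii)-(v), from the reported estimates and standard errors.
b_grant, se_grant = -0.254, 0.147      # coefficient on grant88 and its (usual) se
b_lag, se_lag = 0.831, 0.044           # coefficient on log(scrap87) and its se
se_lag_robust = 0.071                  # heteroskedasticity-robust se

t_grant = b_grant / se_grant           # H0: beta_grant = 0        -> about -1.73
t_lag = (b_lag - 1) / se_lag           # H0: beta_log(scrap87) = 1 -> about -3.84
t_lag_robust = (b_lag - 1) / se_lag_robust   # robust version       -> about -2.38
```

Each value matches the figure quoted in the solution text.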


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data?
The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations are dependent on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation?
A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (hence has a distribution). The SP can be viewed as a "random curve" with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation or autocorrelation?
The serial (or auto) correlation of a time-series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted as Corr(yt, yt−h) = Cov(yt, yt−h)/[Var(yt)Var(yt−h)]^(1/2).
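A minimal sketch of this definition in code, computing the lag-h sample autocorrelation of a series (the short series below is invented purely for illustration):

```python
from math import sqrt

def sample_autocorr(y, h):
    """Sample correlation between (y_t, y_{t-h}) over the overlapping observations."""
    a, b = y[h:], y[:-h]                      # y_t and y_{t-h} pairs
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)
    va = sum((u - ma) ** 2 for u in a) / len(a)
    vb = sum((v - mb) ** 2 for v in b) / len(b)
    return cov / sqrt(va * vb)

# A series that rises then falls: adjacent values move together,
# so the lag-1 autocorrelation is clearly positive.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0]
r1 = sample_autocorr(y, 1)
```

For an exactly linear series the lag-1 autocorrelation is 1, which is a handy sanity check.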

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)?
Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.

• What are TS1-6 (assumptions about time series regression)? How do they differ from the assumptions in MLR1-6?
TS1-6 include: 1) linear in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6 mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of OLS standard errors, t-stats and F-stats for statistical inference, TS3-TS6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors?
Strictly exogenous regressors satisfy the assumption TS3, whereas contemporaneously exogenous regressors only satisfy the condition (10.10) (or (z10) of the Slides), which is weaker than TS3.

• What is a trending time series? What is a time trend?
A time series is trending if it has the tendency of growing (or shrinking) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results?
First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious because it is purely induced by the time trends and has little to say about the true relationship between the two time series.
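The point can be illustrated with a small simulation: two series generated independently, sharing nothing but a time trend, still produce a hugely "significant" slope when one is regressed on the other. The simple-regression formulas are written out by hand; all numbers are simulated.

```python
import random
from math import sqrt

random.seed(42)
n = 100
# Two unrelated series, each with its own linear time trend plus noise.
x = [0.5 * t + random.gauss(0, 1) for t in range(n)]
y = [0.3 * t + random.gauss(0, 1) for t in range(n)]

# OLS of y on x by the textbook formulas.
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
slope = sxy / sxx                      # close to 0.3/0.5 = 0.6: pure trend ratio
intercept = my - slope * mx
ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
se_slope = sqrt(ssr / (n - 2) / sxx)
t_stat = slope / se_slope              # enormous, despite no true relationship
```

The "slope" is just the ratio of the two trend rates, and its t statistic is far beyond any critical value — exactly the spurious significance described above.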

• Why would you include a time trend in regressions with trending variables?
Including a time trend in regressions allows one to study the relationship among time series variables that is not induced by the time trends in the series.

• What is seasonality in a time series? Give an example of a time series variable with seasonality.
Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have week-day seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
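A sketch of this dummy construction for quarterly data (the quarter codes are invented for illustration):

```python
# Quarter index (1-4) for each observation; Q1 is the base quarter.
quarters = [1, 2, 3, 4, 1, 2, 3, 4]

Q2 = [1 if q == 2 else 0 for q in quarters]
Q3 = [1 if q == 3 else 0 for q in quarters]
Q4 = [1 if q == 4 else 0 for q in quarters]
# Each observation switches on at most one dummy; Q1 observations have all zeros,
# so the Q1 effect is absorbed into the intercept.
```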

Problem Set

Q1 Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated – such as stock returns – do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no serial correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2
• We follow the hint and write
gGDPt−1 = α0 + δ0 intt−1 + δ1 intt−2 + ut−1,
and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0 intt−1 + δ1 intt−2 + ut−1 − 3) + vt
     = (γ0 + γ1α0 − 3γ1) + γ1δ0 intt−1 + γ1δ1 intt−2 + γ1ut−1 + vt.
Now, by assumption, ut−1 has zero mean and is uncorrelated with all right-hand-side variables in the previous equation, except itself of course. So
Cov(intt, ut−1) = E(intt · ut−1) = γ1E(ut−1²) > 0,
because γ1 > 0. If σ² = E(ut²) for all t, then Cov(intt, ut−1) = γ1σ². This violates the strict exogeneity assumption TS3: while ut is uncorrelated with intt, intt−1, and so on, ut is correlated with intt+1.


Q3 Wooldridge 10.7
• (i) pet−1 and pet−2 must be increasing by the same amount as pet.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pet−2, pet−1 and pet by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is
i3t = 1.61 + .343 inft + .382 inft−1 − .190 deft + .569 deft−1, (.40) (.125) (.134) (.221) (.197), n = 55, R2 = .685, adj-R2 = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than the .606 we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inft−1.
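The LRP arithmetic can be checked directly. Its standard error requires the covariance of the two coefficient estimates (which STATA's lincom would report); the covariance below is a made-up number purely to show the formula, not the actual estimate.

```python
from math import sqrt

# Long-run propensity: sum of the coefficients on inf_t and inf_{t-1}.
b_inf, b_inf_lag = 0.343, 0.382
lrp = b_inf + b_inf_lag                 # 0.725, vs .606 from the static model

# Var(b0 + b1) = Var(b0) + Var(b1) + 2 Cov(b0, b1).
# Variances from the reported standard errors; the covariance is hypothetical.
var_b, var_blag, cov_hyp = 0.125 ** 2, 0.134 ** 2, -0.008
se_lrp = sqrt(var_b + var_blag + 2 * cov_hyp)
```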

• (iv) The F statistic for joint significance of inft−1 and deft−1 is about 5.22 with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.



Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 4.1
• (i) and (iii) generally cause the t statistics not to have a t distribution under H0. Homoskedasticity is one of the CLM assumptions. An important omitted variable violates Assumption MLR4 (hence MLR6) in general. The CLM assumptions contain no mention of the sample correlations among independent variables, except to rule out the case where the correlation is one (MLR3).

Q2 Wooldridge 4.2
• (i) H0: β3 = 0. H1: β3 > 0.
• (ii) The proportionate effect on predicted salary is .00024(50) = .012. To obtain the percentage effect, we multiply this by 100: 1.2%. Therefore, a 50 point ceteris paribus increase in ros is predicted to increase salary by only 1.2%. Practically speaking, this is a very small effect for such a large change in ros.

• (iii) The 10% critical value for a one-tailed test, using df = ∞, is obtained from Table G.2 as 1.282. The t statistic on ros is .00024/.00054 ≈ .44, which is well below the critical value. Therefore, we fail to reject H0 at the 10% significance level.

• (iv) Based on this sample, the estimated ros coefficient appears to be different from zero only because of sampling variation. On the other hand, including ros may not be causing any harm; it depends on how correlated it is with the other independent variables (although these are very significant even with ros in the equation).

Q3 Wooldridge 4.5
• (i) .412 ± 1.96(.094), or about [.228, .596].
• (ii) No, because the value .4 is well inside the 95% CI.
• (iii) Yes, because 1 is well outside the 95% CI.

Q4 Wooldridge C4.8 (401ksubs_ch04.do)
• (i) There are 2,017 single people in the sample of 9,275.
• (ii) The estimated equation is
nettfa = −43.04 + .799 inc + .843 age, (4.08) (.060) (.092), n = 2017, R2 = .119.
The coefficient on inc indicates that one more dollar in income (holding age fixed) is reflected in about 80 more cents in predicted nettfa; no surprise there. The coefficient on age means that, holding income fixed, if a person gets another year older, his/her nettfa is predicted to increase by about $843. (Remember, nettfa is in thousands of dollars.) Again, this is not surprising.

• (iii) The intercept is not very interesting, as it gives the predicted nettfa for inc = 0 and age = 0. Clearly, there is no one with even close to these values in the relevant population.

• (iv) The t statistic is (.843 − 1)/.092 ≈ −1.71. Against the one-sided alternative H1: β2 < 1, the p-value is about .044. Therefore, we can reject H0: β2 = 1 at the 5% significance level (against the one-sided alternative).

• (v) The slope coefficient on inc in the simple regression is about .821, which is not very different from the .799 obtained in part (ii). As it turns out, the correlation between inc and age in the sample of single people is only about .039, which helps explain why the simple and multiple regression estimates are not very different; see the text also.


Week 6 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• How would you test hypotheses about a single linear combination of parameters?
We may re-parameterise the regression model to isolate the "single linear combination". Then OLS on the re-parameterised model will provide the estimate of the "single linear combination" and the related standard error. See the example in the textbook and lecture slides.

• What are exclusion restrictions for a regression model?
Exclusion restrictions are the null hypothesis that a group of x-variables have zero coefficients in the regression model.

• What are restricted and unrestricted models?
When the null hypothesis is a set of restrictions on the parameters, the regression under the null is called the restricted model, while the regression under the alternative (which simply states that the null is false) is called the unrestricted model.

• How do you compute the F-statistic, given that you have the SSRs?
Clearly, the general F-stat is based on the relative difference between the SSRs under the restricted model and the unrestricted model: F = [(SSRr − SSRur)/q]/[SSRur/(n − k − 1)], where you should be able to explain the meanings of the various symbols.
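A sketch of this formula as a small helper, checked against the SSR values used in Wooldridge 4.6(ii) from this week's problem set (SSRr = 209,448.99, SSRur = 165,644.51, q = 2 restrictions, 86 df in the unrestricted model):

```python
def f_stat(ssr_r, ssr_ur, q, df_ur):
    """F = [(SSR_r - SSR_ur)/q] / [SSR_ur/df_ur], where df_ur = n - k - 1."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

F = f_stat(209448.99, 165644.51, q=2, df_ur=86)   # about 11.37
```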

• What are general linear restrictions on parameters?
These are linear equations on the parameters. For example, β1 − 2β2 + 3β3 = 0.

• What is the test for the overall significance of a regression?
This is the F-test for the null hypothesis that all the coefficients of the x-variables are zero.

• How would you report your regression results?
See the guidelines in Section 4.6 of the textbook.

• Why would you care about the asymptotic properties of the OLS estimators?
When MLR6 does not hold, the finite-sample distribution of the OLS estimators is not available. For reasonably large samples (large n), we use the asymptotic distribution of the OLS estimators, which is known from studying the asymptotic properties, to approximate the finite-sample distribution of the OLS estimators, which is unknown.

• Comparing the inference procedures in Chapter 5 with those in Chapter 4, can you list the similarities and differences?
It is beneficial to make a list by yourself.

• Under MLR1-MLR5, the OLS estimators are consistent, asymptotically normal, and asymptotically efficient. Try to explain these properties in your own words.
This is really a test of your understanding of these notions in econometrics.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 4.6
• (i) With df = n − 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because each test is two-tailed, the critical value is 1.987. The t statistic for H0: β0 = 0 is about −.89, which is much less than 1.987 in absolute value. Therefore, we fail to reject H0: β0 = 0. The t statistic for H0: β1 = 1 is (.976 − 1)/.049 ≈ −.49, which is even less significant. (Remember, we reject H0 in favor of H1 in this case only if |t| > 1.987.)

• (ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions and the df in the unrestricted model is 86. We are given SSRr = 209,448.99 and SSRur = 165,644.51. Therefore,
F = [(209,448.99 − 165,644.51)/2]/[165,644.51/86] ≈ 11.37,
which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is 4.85.

• (iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions and there are 88 − 5 = 83 df in the unrestricted model. The F statistic is F = [(.829 − .820)/3]/[(1 − .829)/83] ≈ 1.46. The 10% critical value (again using 90 denominator df in Table G.3a) is 2.15, so we fail to reject H0 at even the 10% level. In fact, the p-value is about .23.

• (iv) If heteroskedasticity were present, Assumption MLR5 would be violated, and the F statistic would not have an F distribution under the null hypothesis. Therefore, comparing the F statistic against the usual critical values, or obtaining the p-value from the F distribution, would not be especially meaningful.

Q2 Wooldridge 4.8
• (i) We use Property VAR.3 from Appendix B: Var(β̂1 − 3β̂2) = Var(β̂1) + 9 Var(β̂2) − 6 Cov(β̂1, β̂2).
• (ii) t = (β̂1 − 3β̂2 − 1)/se(β̂1 − 3β̂2), so we need the standard error of β̂1 − 3β̂2.
• (iii) Because θ = β1 − 3β2, we can write β1 = θ + 3β2. Plugging this into the population model gives
y = β0 + (θ + 3β2)x1 + β2x2 + β3x3 + u = β0 + θx1 + β2(3x1 + x2) + β3x3 + u.
This last equation is what we would estimate by regressing y on x1, 3x1 + x2, and x3. The coefficient and standard error on x1 are what we want.
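The variance formula in (i) is easy to exercise numerically; the variance and covariance values below are hypothetical, purely to illustrate the formula:

```python
from math import sqrt

# Var(b1 - 3*b2) = Var(b1) + 9*Var(b2) - 6*Cov(b1, b2).
# All three inputs are hypothetical illustrative values.
var_b1, var_b2, cov_b12 = 0.04, 0.01, 0.005

var_theta = var_b1 + 9 * var_b2 - 6 * cov_b12   # 0.10
se_theta = sqrt(var_theta)                       # about 0.316
```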

Q3 Wooldridge 4.10
• (i) We need to compute the F statistic for the overall significance of the regression, with n = 142 and k = 4: F = [.0395/(1 − .0395)](137/4) ≈ 1.41. The 5% critical value with 4 and 120 df is 2.45, which is well above the value of F. Therefore, we fail to reject H0: β1 = β2 = β3 = β4 = 0 at the 10% level. No explanatory variable is individually significant at the 5% level. The largest absolute t statistic is on dkr, tdkr ≈ 1.60, which is not significant at the 5% level against a two-sided alternative.

• (ii) The F statistic (with the same df) is now [.0330/(1 − .0330)](137/4) ≈ 1.17, which is even lower than in part (i). None of the t statistics is significant at a reasonable level.

• (iii) We probably should not use the logs, as the logarithm is not defined for firms that have zero for dkr or eps. Therefore, we would lose some firms in the regression.

• (iv) It seems very weak. There are no significant t statistics at the 5% level (against a two-sided alternative), and the F statistics are insignificant in both cases. Plus, less than 4% of the variation in return is explained by the independent variables.

Q4 Wooldridge C4.9 (discrim_c4_9.do)
• (i) The results from the OLS regression, with standard errors in parentheses, are
log(psoda) = −1.46 + .073 prpblck + .137 log(income) + .380 prppov, (0.29) (.031) (.027) (.133), n = 401, R2 = .087.

• The p-value for testing H0: β1 = 0 against the two-sided alternative is about .018, so that we reject H0 at the 5% level but not at the 1% level.

• (ii) The correlation is about −.84, indicating a strong degree of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for log(income) is about 5.1, and that for prppov is about 2.86 (two-sided p-value = .004).

• (iii) The OLS regression results when log(hseval) is added are
log(psoda) = −.84 + .098 prpblck − .053 log(income) + .052 prppov + .121 log(hseval), (.29) (.029) (.038) (.134) (.018), n = 401, R2 = .184.
The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about .12 percent. The two-sided p-value is zero to three decimal places.

• (iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the 15% significance level against a two-sided alternative for log(income), and prppov does not have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at the 5% level, because the outcome of the F(2,396) statistic is about 3.52 with p-value = .030. All of the control variables – log(income), prppov, and log(hseval) – are highly correlated, so it is not surprising that some are individually insignificant.

• (v) Because the regression in (iii) contains the most controls, log(hseval) is individually significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable. It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if prpblck increases by .10, psoda is estimated to increase by 1%, other factors held fixed.

Q5 Wooldridge 5.2
• This is about the inconsistency of the simple regression of pctstck on funds. A higher tolerance of risk means more willingness to invest in the stock market, so β2 > 0. By assumption, funds and risktol are positively correlated. Now we use equation (5.5), where δ1 > 0: plim(β̃1) = β1 + β2δ1 > β1, so β̃1 has a positive inconsistency (asymptotic bias). This makes sense: if we omit risktol from the regression and it is positively correlated with funds, some of the estimated effect of funds is actually due to the effect of risktol.

Q6 Wooldridge C5.1 (wage1_c5_1.do)
• (i) The estimated equation is
wage = −2.87 + .599 educ + .022 exper + .169 tenure, (.73) (.051) (.012) (.022), n = 526, R2 = .306, σ̂ = 3.085.

• Below is a histogram of the 526 residuals ûi, i = 1, 2, …, 526. The histogram uses 27 bins, which is suggested by the formula in the Stata manual for 526 observations. For comparison, the normal distribution that provides the best fit to the histogram is also plotted. [Histogram omitted.]


• (ii) With log(wage) as the dependent variable, the estimated equation is
log(wage) = .284 + .092 educ + .0041 exper + .022 tenure, (.104) (.007) (.0017) (.003), n = 526, R2 = .316, σ̂ = .441.
The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below. [Histogram omitted.]

• (iii) The residuals from the log(wage) regression appear to be more normally distributed. Certainly the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed to the left. In the wage regression, there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.

• The skewness and kurtosis of the residuals from the OLS regression may be used to test the null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null distribution of the Jarque-Bera test statistic (which takes into account both skewness and kurtosis) is approximately chi-squared with 2 degrees of freedom. Hence, at the 5% level, we reject the normality of u whenever the JB statistic is greater than 5.99. In STATA, this test is carried out by the command "sktest". With the data in WAGE1.RAW, the JB statistic is extremely large for the wage model and JB = 10.59 for the log(wage) model. While normality is rejected for both models, we do see that the JB statistic is much smaller for the log(wage) model [also compare its p-values of the skewness and excess kurtosis tests against those from the wage model]. Hence, normality appears to be a reasonable approximation to the distribution of the error term in the log(wage) model.
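The JB statistic itself is straightforward to compute from the residuals; a from-scratch sketch (the five-point data set is invented just to exercise the formula):

```python
def jarque_bera(u):
    """JB = (n/6) * (S^2 + (K - 3)^2 / 4), S = skewness, K = kurtosis."""
    n = len(u)
    m = sum(u) / n
    m2 = sum((x - m) ** 2 for x in u) / n
    m3 = sum((x - m) ** 3 for x in u) / n
    m4 = sum((x - m) ** 4 for x in u) / n
    S = m3 / m2 ** 1.5   # 0 under normality
    K = m4 / m2 ** 2     # 3 under normality
    return (n / 6) * (S ** 2 + (K - 3) ** 2 / 4)

# Symmetric data: S = 0, so only the kurtosis term contributes.
jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
```

Reject normality at the 5% level when JB exceeds 5.99, the chi-squared(2) critical value quoted above.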


Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the advantages of using the log of a variable in regression? Find the "rules of thumb" for taking logs.
See page 191 of Wooldridge (pp. 198-199 for the 3rd edition).

• Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithm. Do you remember Table 2.3?
Consult Table 2.3.

• How do you compute the change in y caused by Δx when the model is built for log(y)?
A precise computation is given by Equation (6.8) of Wooldridge.

• Why do we need the "interaction" terms in regression models?
An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for an example.

• What is the adjusted R-squared? What is the difference between it and the R-squared?
The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added. However, the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.

• How do you construct an interval prediction for given x-values?
This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.

• How do you predict y for given x-values when the model is built for log(y)?
Check the list on page 212 (p. 220 for the 3rd edition).

• What is involved in "residual analysis"?
See pages 209-210 (pp. 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 6.4
• (i) Holding all other factors fixed, we have
Δlog(wage) = β1Δeduc + β2Δeduc·pareduc = (β1 + β2 pareduc)Δeduc.
Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if we think a child gets more out of another year of education the more highly educated are the child's parents.

• (ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 − 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)

• (iii) When we add pareduc by itself, the coefficient on the interaction term is negative. The t statistic on educ·pareduc is about −1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2 Wooldridge 6.5
• This would make little sense. Performances on math and science exams are measures of outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The science pass rate could be a dependent variable in an identical regression equation.

Q3 Wooldridge C6.3 (C6.2 for the 3rd edition) (wage2_c6_3.do)
• (i) Holding exper (and the elements in u) fixed, we have
Δlog(wage) = β1Δeduc + β2Δeduc·exper = (β1 + β2 exper)Δeduc,
or Δlog(wage)/Δeduc = β1 + β2 exper. This is the approximate proportionate change in wage given one more year of education.

• (ii) H0: β3 = 0. If we think that education and experience interact positively – so that people with more experience are more productive when given another year of education – then β3 > 0 is the appropriate alternative.

• (iii) The estimated equation is
log(wage) = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper, (.24) (.0174) (.0200) (.00153), n = 935, R2 = .135, adjusted R2 = .132.
The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.

• (iv) We rewrite the equation as
log(wage) = β0 + θ1educ + β2exper + β3educ·(exper − 10) + u,
and run the regression of log(wage) on educ, exper, and educ·(exper − 10). We want the coefficient on educ. We obtain θ̂1 ≈ .0761 and se(θ̂1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.

Q4 Wooldridge C6.8 (hprice1_c6_8.do)
• (i) The estimated equation (where price is in dollars) is
price = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms, (29,475.0) (0.642) (13.24) (9,010.1), n = 88, R2 = .672, adjusted R2 = .661, σ̂ = 59,833.
The predicted price at lotsize = 10,000, sqrft = 2,300, and bdrms = 4 is about $336,714.

• (ii) The regression is pricei on (lotsizei − 10,000), (sqrfti − 2,300), and (bdrmsi − 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.

• (iii) We must use equation (6.36) to obtain the standard error of ê0 and then use equation (6.37) (assuming that price is normally distributed). From the regression in part (ii), se(ŷ0) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile in the t(84) distribution gives the 95% CI for price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval. But we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
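The prediction-interval arithmetic in (iii) can be reproduced step by step:

```python
from math import sqrt

se_yhat0 = 7374.5      # se of the predicted conditional mean, from part (ii)
sigma_hat = 59833.0    # regression standard error
point = 336706.7       # point prediction at the given x-values

# Standard error of the prediction error combines both sources of uncertainty.
se_e0 = sqrt(se_yhat0 ** 2 + sigma_hat ** 2)   # about 60,285.7

t975 = 1.99            # approx. 97.5th percentile of t(84)
lo = point - t975 * se_e0                      # about $216,738
hi = point + t975 * se_e0                      # about $456,675
```

Note how se_e0 is dominated by sigma_hat: most of the interval width comes from the unexplained variation in price, not from estimation error in the regression line.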


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is a qualitative factor? Give some examples.
This should be easy.
• How do you use dummy variables to represent qualitative factors?
A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy. For example, a factor with 5 values requires 4 dummy variables.

• How do you use dummy variables to represent an ordinal variable?
An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

• How do you test for differences in regression functions across different groups?
We generally use the F test on properly defined restricted and unrestricted models. The Chow test is a special case of the F test, where the null hypothesis is no difference across groups.

• What can you achieve by interacting group dummy variables with other regressors?
The interaction allows the slope coefficients to differ across groups.

• What is "program evaluation"?
It assesses the effectiveness of a program by checking the differences in regression coefficients for the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary?
The PRF is the conditional probability of "success" for given explanatory variables. The SRF is the predicted conditional probability of "success" for given explanatory variables.

• What are the shortcomings of the LPM?
The major drawback of the LPM is that the predicted probability of success may be outside the interval [0, 1].

• Does MLR5 hold for the LPM?
No. As indicated in the textbook and lectures, the conditional variance of the disturbance is not constant and depends on the explanatory variables: Var(u|x) = E(y|x)[1 − E(y|x)].
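A one-line illustration of this heteroskedasticity: with p(x) = E(y|x), the conditional variance p(1 − p) changes as the response probability changes, so it cannot be constant unless p(x) is.

```python
def lpm_var(p):
    """Conditional variance of the LPM disturbance: Var(u|x) = p(1 - p)."""
    return p * (1 - p)

# Different fitted probabilities imply different error variances:
v_low, v_mid = lpm_var(0.1), lpm_var(0.5)   # 0.09 vs 0.25
```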

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 7.4
• (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant.
• (ii) 100·[exp(−.283) − 1] ≈ −24.7%, and so the estimate is somewhat smaller in magnitude.
• (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is
log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility + δ3trans + u,
where trans is a dummy variable for the transportation industry. Now the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.

Q2 Wooldridge 7.6
• In Section 3.3 – in particular, in the discussion surrounding Table 3.2 – we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3 Wooldridge 7.9
• (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.
• (ii) Setting f0(z) = f1(z) gives β0 + β1z = (β0 + δ0) + (β1 + δ1)z. Therefore, provided δ1 ≠ 0, we have z* = −δ0/δ1. Clearly, z* is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.
• (iii) Using part (ii), we have totcoll* = .357/.030 = 11.9 years (totcoll = total college years).
• (iv) The estimated years of college at which women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed – not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.
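The arithmetic in parts (iii) and (iv) is easy to verify (a quick sketch in Python rather than STATA; the coefficients −.357 and .030 are those reported above):

```python
import math

delta0, delta1 = -0.357, 0.030  # coefficients on female and female*totcoll

# (iii) years of college at which the predicted gender gap would close
crossover = -delta0 / delta1

# (iv) predicted log-wage gap at four years of college, and the exact
# percentage difference implied by the log difference
gap_4yr = delta0 + delta1 * 4
pct_4yr = 100 * (math.exp(gap_4yr) - 1)
```

The crossover is about 11.9 years, and the four-year gap is −.237 in logs, i.e. roughly 21% lower wages for women.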

Q4 Wooldridge C7.13 (apple_c7_13.do)
• (i) 412/660 ≈ .624.
• (ii) The OLS estimates of the LPM are

ecobuy-hat = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize + .025 educ − .00050 age
(.165) (.109) (.132) (.00053) (.013) (.008) (.00125)
n = 660, R² = .110.

If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)

• (iii) The F test, with 4 and 653 df, is 4.43 with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children – other factors equal – the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule – predict one when ecobuy-hat ≥ .5 and zero otherwise – gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
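The fractions in part (vi) follow directly from the counts of correct predictions (a small check in Python; the counts 102/248 and 340/412 are those quoted above):

```python
# Correct predictions by outcome, as reported for n = 660 observations
correct_0, n_0 = 102, 248   # ecobuy = 0
correct_1, n_1 = 340, 412   # ecobuy = 1

frac_0 = correct_0 / n_0                         # fraction correct when ecobuy = 0
frac_1 = correct_1 / n_1                         # fraction correct when ecobuy = 1
overall = (correct_0 + correct_1) / (n_0 + n_1)  # overall fraction correctly predicted
```

The overall rate, (102 + 340)/660 ≈ .670, matches the "about 67%" in the answer.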


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is heteroskedasticity in a regression model?

When homoskedasticity (MLR.5) fails and the variance of the disturbance (u) changes across the observation index (i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with OLS under heteroskedasticity?
The usual t-stat and F-stat are not valid because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator is no longer (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA?
These are corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, they are easily obtained by using the option “robust” with the “regress” command, e.g., regress lwage educ, robust.

• How do you detect whether there is heteroskedasticity?
The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

• If heteroskedasticity is present in a known form, how would you estimate the model?
In this case, the WLS estimators should be used, which are more efficient than the OLS estimators. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model?
If there is strong evidence of heteroskedasticity, the FGLS estimators should be used, which are based on an exponential model of the variance function. See Section 8.4 for details.

• What are the steps in the FGLS estimation?
You should summarise them from Section 8.4.

• How would you handle the heteroskedasticity of the LPM?
The heteroskedasticity functional form is known for the LPM. Hence, WLS can be used in principle. However, because the LPM can produce predicted probabilities outside the interval (0,1), the known functional form p(x)[1 − p(x)] may not be usable for WLS. In that case, it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2 Wooldridge 8.2
• With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc:
beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.


Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.
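Dividing by inc works because it makes the transformed error homoskedastic; writing x for (inc, price, educ, female):

```latex
\operatorname{Var}\!\left(\frac{u}{inc}\,\Big|\,\mathbf{x}\right)
  = \frac{\operatorname{Var}(u \mid \mathbf{x})}{inc^{2}}
  = \frac{\sigma^{2}\,inc^{2}}{inc^{2}}
  = \sigma^{2}.
```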

Q3 Wooldridge 8.3
• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR.4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR.4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has the smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve “biases” associated with OLS.

Q4 Wooldridge 8.5
• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.
• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.
• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.
• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.

• (v) We just plug the values of the independent variables into the OLS regression line:
smokes-hat = .656 − .069·log(67.44) + .012·log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052.
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)
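Parts (iii) and (v) are plug-in calculations that can be checked quickly (a sketch in Python; the coefficients and the values 67.44, 6,500, 16 and 77 are those in the answer above):

```python
import math

# (iii) turning point of the age quadratic: age* = .020 / [2(.00026)]
turning_point = 0.020 / (2 * 0.00026)

# (v) fitted probability for the given person
# (price 67.44, income 6,500, educ = 16, age = 77)
smokes_hat = (0.656
              - 0.069 * math.log(67.44)
              + 0.012 * math.log(6500)
              - 0.029 * 16
              + 0.020 * 77
              - 0.00026 * 77**2)
```

The turning point is about 38.46 years and the fitted probability is about .0052, close to zero as claimed.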

Q5 Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:

e401k-hat = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male
(.081) (.0006) (.000005) (.0039) (.00005) (.0121)
[.079] [.0006] [.000005] [.0038] [.00004] [.0121]
n = 9,275, R² = .094.

There are no important differences; if anything, the robust standard errors are smaller.

• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model u² = δ0 + δ1p(x) + δ2p(x)² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = −1. Remember that, for the LPM, the fitted values ŷ are estimates of p(x). So, when we run the regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.

• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on ŷ is about 1.010, the coefficient on ŷ² is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are
e401k-hat = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male
(.076) (.0005) (.000004) (.0037) (.00004) (.0117)
n = 9,275, R² = .108.
There are no important differences with the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings bull Read Chapter 9 thoroughly bull Make sure that you know the meanings of the Key Terms at the chapter end

Review Questions (these may or may not be discussed in tutorial classes)
• What is functional form misspecification? What is its major consequence in regression analysis?

As the word “misspecification” suggests, it is a mis-specified model: one with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage is inverted-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence, the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification?
The RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). The RESET is based on the F-test for the joint significance of the squared and cubed predicted values (ŷ² and ŷ³) in an “expanded” model that includes all the original regressors as well as the squared and cubed ŷ, where the latter represent possible nonlinear functions of the regressors.

• What are nested models? And nonnested models?
Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another?
The purpose of testing one model against another is to select a “best” or “preferred” model for statistical inference and prediction.

• How would you test two nonnested models?
There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models and decide which model can be “excluded”. The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis?
A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved one and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, depends only on the proxy.

• Can you analyse the consequences of measurement errors?
With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis?
Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?


Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR.1, MLR.3 or MLR.4, and hence does not introduce bias into the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable and causes bias in the OLS estimation.

• What are outliers and influential observations?
Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of parameters. Outliers are called influential observations if the estimation results with them included differ significantly from the results with them excluded.

• Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1, 2, …, n, where the “slope parameter” contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR.1-5 hold for yi = α + βxi + ui, E(bi|xi) = 0, and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the “slope parameter” be biased from β? Will the usual OLS standard errors be valid for statistical inference?
The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui). Given the stated assumptions, MLR.1-4 hold for this model, and the OLS estimator of the “slope parameter” is unbiased (from β). However, MLR.5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui | xi), depends on xi, and the usual OLS standard errors will not be valid.
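The variance claim in the last answer can be made explicit. Assuming, in addition, that bi and ui are conditionally uncorrelated (implicit in the question):

```latex
\operatorname{Var}(b_i x_i + u_i \mid x_i)
  = x_i^2 \operatorname{Var}(b_i \mid x_i) + \operatorname{Var}(u_i \mid x_i)
  = \omega^2 x_i^2 + \sigma^2 ,
```

which depends on x_i whenever ω² > 0, so MLR.5 fails even though MLR.1-4 hold.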

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 9.1
• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test: F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] ≈ 2.97. With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
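The F statistic quoted above can be reproduced with the R-squared form (a quick check in Python, using R²ur = .375, R²r = .353, q = 2 and 177 − 8 = 169 df):

```python
r2_ur, r2_r = 0.375, 0.353
q, df_ur = 2, 177 - 8

# R-squared form of the F statistic for q exclusion restrictions
F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)
```

This gives F ≈ 2.97, between the 10% and 5% critical values, as stated.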

Q2 Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.

• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So, when we control for the poverty rate, the effect of spending falls.

• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1), we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than are spending per student or other school characteristics.

Q3 Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report their crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates using the 1988 data are
log(scrap)-hat = .409 + .057 grant
(.241) (.406)
n = 54, R² = .0004.
The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain
log(scrap88)-hat = .021 − .254 grant88 + .831 log(scrap87)
(.089) (.147) (.044)
n = 54, R² = .873,
where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller than before, but still pretty significant.
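The t statistics in parts (iii)-(v) are just coefficient/standard-error ratios, shifted by the hypothesised value when H0 is not zero (a quick check in Python on the numbers above):

```python
# H0: beta_grant = 0, with the usual and the robust standard errors
t_grant = -0.254 / 0.147
t_grant_robust = -0.254 / 0.142

# H0: the coefficient on log(scrap87) equals 1, usual and robust standard errors
t_lag = (0.831 - 1) / 0.044
t_lag_robust = (0.831 - 1) / 0.071
```

These reproduce −1.73, −1.79, −3.84 and −2.38, respectively.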


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data?

The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations are dependent on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a “random sample”. Recall that, for a random sample, the observations are required to be independent of one another.

• What is a stochastic process and its realisation?
A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (and hence has a distribution). The SP can be viewed as a “random curve”, with the horizontal axis being the time index, where the outcome of the underlying experiment is a “curve” (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the “random curve”, or the SP.

• What is serial correlation or autocorrelation?
The serial (or auto) correlation of a time-series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted Corr(yt, yt−h) = Cov(yt, yt−h)/[Var(yt)Var(yt−h)]^(1/2).
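For observed data, the population quantity above is estimated by the sample autocorrelation. A minimal sketch in Python (the estimator shown divides the lag-h autocovariance by the full-sample variance, one common convention):

```python
def autocorr(y, h):
    """Sample autocorrelation at lag h: the lag-h autocovariance divided by
    the sample variance, both computed around the full-sample mean."""
    n = len(y)
    ybar = sum(y) / n
    num = sum((y[t] - ybar) * (y[t - h] - ybar) for t in range(h, n))
    den = sum((v - ybar) ** 2 for v in y)
    return num / den

# A short, strongly trending series has positive first-order autocorrelation
r1 = autocorr([1, 2, 3, 4, 5], 1)
```

At lag 0 the function returns exactly 1, and for the trending series above r1 is positive.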

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)?
Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.

• What are TS.1-6 (the assumptions about time series regression)? How do they differ from MLR.1-6?
TS.1-6 include: 1) linearity in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR.1-6 mainly in that MLR.2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS.3 is needed. To ensure the validity of the OLS standard errors, t-stats and F-stats for statistical inference, TS.3-TS.6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are “strictly exogenous” regressors and “contemporaneously exogenous” regressors?
Strictly exogenous regressors satisfy assumption TS.3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS.3.

• What is a trending time series? What is a time trend?
A time series is trending if it has the tendency to grow (or shrink) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce “spurious” results?
First, two time trends are always “correlated” because they grow in the same (or opposite) direction over time. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are “correlated”, regressing one series on the other will produce a significant slope coefficient. Such “statistical significance” is spurious, because it is purely induced by the time trends and has little to say about the true relationship between the two time series.

• Why would you include a time trend in regressions with trending variables?
Including a time trend in such regressions allows one to study the relationship among the time series variables that is not induced by their time trends.

• What is seasonality in a time series? Give an example of a time series variable with seasonality.
Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have day-of-week seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.

Problem Set

Q1 Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly so. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated – such as stock returns – do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no serial correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in regression models. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2
• We follow the hint and write
gGDPt−1 = α0 + δ0intt−1 + δ1intt−2 + ut−1,
and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0intt−1 + δ1intt−2 + ut−1 − 3) + vt
= (γ0 + γ1α0 − 3γ1) + γ1δ0intt−1 + γ1δ1intt−2 + γ1ut−1 + vt.
Now, by assumption, ut−1 has zero mean and is uncorrelated with all the right-hand-side variables in the previous equation, except itself, of course. So
Cov(intt, ut−1) = E(intt·ut−1) = γ1E(ut−1²) > 0,
because γ1 > 0. If σ² = E(ut²) for all t, then Cov(intt, ut−1) = γ1σ². This violates the strict exogeneity assumption TS.3: while ut is uncorrelated with intt, intt−1, and so on, ut is correlated with intt+1.


Q3 Wooldridge 10.7
• (i) pet−1 and pet−2 must be increasing by the same amount as pet.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means that the level of pe increases and stays at the new level, and this is achieved by increasing pet−2, pet−1 and pet by the same amount.
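Concretely, the finite distributed lag model behind this question (Chapter 10's fertility example) is

```latex
gfr_t = \alpha_0 + \delta_0\, pe_t + \delta_1\, pe_{t-1} + \delta_2\, pe_{t-2} + u_t,
\qquad \mathrm{LRP} = \delta_0 + \delta_1 + \delta_2 ,
```

so a permanent one-unit increase in pe (raising pe_t, pe_{t−1} and pe_{t−2} together) changes gfr by the LRP in the long run.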

Q4 Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is
i3t = 1.61 + .343 inft + .382 inft−1 − .190 deft + .569 deft−1
(.40) (.125) (.134) (.221) (.197)
n = 55, R² = .685, adj. R² = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than .606, which we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inft−1.

• (iv) The F statistic for the joint significance of inft−1 and deft−1 is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.


Week 6 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• How would you test hypotheses about a single linear combination of parameters?

We may re-parameterise the regression model to isolate the “single linear combination”. The OLS on the re-parameterised model will then provide the estimate of the “single linear combination” and the related standard error. See the example in the textbook and lecture slides.

• What are exclusion restrictions for a regression model?
Exclusion restrictions are a null hypothesis stating that a group of x-variables have zero coefficients in the regression model.

• What are restricted and unrestricted models?
When the null hypothesis is a set of restrictions on the parameters, the regression under the null is called the restricted model, while the regression under the alternative (which simply states that the null is false) is called the unrestricted model.

• How do you compute the F-statistic, given that you have the SSRs?
The general F-stat is based on the relative difference between the SSRs under the restricted and unrestricted models: F = [(SSRr − SSRur)/q]/[SSRur/(n − k − 1)], where you should be able to explain the meanings of the various symbols.
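Both forms of the F statistic are one-liners (a sketch in Python rather than STATA; the illustrative numbers are the SSRs and R-squareds that appear in Q1 below):

```python
def f_stat_ssr(ssr_r, ssr_ur, q, df_ur):
    """F statistic from the restricted and unrestricted SSRs."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

def f_stat_r2(r2_r, r2_ur, q, df_ur):
    """Equivalent R-squared form (valid when both models share the same regressand)."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)

F_ssr = f_stat_ssr(209448.99, 165644.51, q=2, df_ur=86)  # Q1(ii) below: ~11.37
F_r2 = f_stat_r2(0.820, 0.829, q=3, df_ur=83)            # Q1(iii) below: ~1.46
```

Here q is the number of restrictions and df_ur = n − k − 1 is the unrestricted degrees of freedom.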

• What are general linear restrictions on parameters?
These are linear equations on the parameters. For example, β1 − 2β2 + 3β3 = 0.

• What is the test for the overall significance of a regression?
This is the F-test for the null hypothesis that all the coefficients on the x-variables are zero.

• How would you report your regression results?
See the guidelines in Section 4.6 of the textbook.

• Why would you care about the asymptotic properties of the OLS estimators?
When MLR.6 does not hold, the finite-sample distribution of the OLS estimators is not available. For reasonably large samples (large n), we use the asymptotic distribution of the OLS estimators, which is known from studying their asymptotic properties, to approximate the finite-sample distribution, which is unknown.

• Comparing the inference procedures in Chapter 5 with those in Chapter 4, can you list the similarities and differences?
It is beneficial to make such a list by yourself.

• Under MLR.1-MLR.5, the OLS estimators are consistent, asymptotically normal and asymptotically efficient. Try to explain these properties in your own words.
This is really a test of your understanding of these notions in econometrics.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 4.6
• (i) With df = n − 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because each test is two-tailed, the critical value is 1.987. The t statistic for H0: β0 = 0 is about −.89, which is much less than 1.987 in absolute value. Therefore, we fail to reject H0: β0 = 0. The t statistic for H0: β1 = 1 is (.976 − 1)/.049 ≈ −.49, which is even less significant. (Remember, we reject H0 in favor of H1 in this case only if |t| > 1.987.)

• (ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions, and the df in the unrestricted model is 86. We are given SSRr = 209,448.99 and SSRur = 165,644.51. Therefore, F = [(209,448.99 − 165,644.51)/2]/[165,644.51/86] ≈ 11.37,


which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is 4.85.

• (iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions, and there are 88 − 5 = 83 df in the unrestricted model. The F statistic is F = [(.829 − .820)/3]/[(1 − .829)/83] ≈ 1.46. The 10% critical value (again using 90 denominator df in Table G.3a) is 2.15, so we fail to reject H0 even at the 10% level. In fact, the p-value is about .23.

• (iv) If heteroskedasticity were present, Assumption MLR.5 would be violated, and the F statistic would not have an F distribution under the null hypothesis. Therefore, comparing the F statistic against the usual critical values, or obtaining the p-value from the F distribution, would not be especially meaningful.

Q2 Wooldridge 4.8
• (i) We use Property VAR.3 from Appendix B:
Var(β̂1 − 3β̂2) = Var(β̂1) + 9Var(β̂2) − 6Cov(β̂1, β̂2).
• (ii) t = (β̂1 − 3β̂2 − 1)/se(β̂1 − 3β̂2), so we need the standard error of β̂1 − 3β̂2.
• (iii) Because θ1 = β1 − 3β2, we can write β1 = θ1 + 3β2. Plugging this into the population model gives y = β0 + (θ1 + 3β2)x1 + β2x2 + β3x3 + u = β0 + θ1x1 + β2(3x1 + x2) + β3x3 + u. This last equation is what we would estimate by regressing y on x1, 3x1 + x2, and x3. The coefficient and standard error on x1 are what we want.
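Part (iii) can be illustrated with a small simulation (a sketch with made-up data; the tiny normal-equations solver is only for self-containedness). The coefficient on x1 in the re-parameterised regression reproduces β̂1 − 3β̂2 from the original regression:

```python
import random

def ols(X, y):
    """OLS via the normal equations X'X b = X'y, solved by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(len(y))) for q in range(k)]
         for p in range(k)]
    c = [sum(X[i][p] * y[i] for i in range(len(y))) for p in range(k)]
    for p in range(k):                  # forward elimination
        for q in range(p + 1, k):
            f = A[q][p] / A[p][p]
            for r in range(p, k):
                A[q][r] -= f * A[p][r]
            c[q] -= f * c[p]
    beta = [0.0] * k
    for p in reversed(range(k)):        # back substitution
        beta[p] = (c[p] - sum(A[p][r] * beta[r] for r in range(p + 1, k))) / A[p][p]
    return beta

random.seed(1)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
y = [1 + 0.5 * x1[i] + 0.2 * x2[i] - 0.3 * x3[i] + random.gauss(0, 1)
     for i in range(n)]

# Original regression: y on 1, x1, x2, x3
b0, b1, b2, b3 = ols([[1, x1[i], x2[i], x3[i]] for i in range(n)], y)
# Re-parameterised regression: y on 1, x1, (3*x1 + x2), x3
t0, theta, t2, t3 = ols([[1, x1[i], 3 * x1[i] + x2[i], x3[i]] for i in range(n)], y)
# theta should equal b1 - 3*b2, up to floating-point error
```

The identity holds exactly in the algebra, so in STATA one could equivalently regress on the transformed regressor and read off the standard error of the linear combination directly.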

Q3 Wooldridge 410 bull (i) We need to compute the F statistic for the overall significance of the regression with n =

142 and k = 4 F = [0395(1 ndash 0395)](1374) asymp 141 The 5 critical value with 4 and 120 df is 245 which is well above the value of F Therefore we fail to reject H0 β1 = β2 = β3 = β4 = 0 at the 10 level No explanatory variable is individually significant at the 5 level The largest absolute t statistic is on dkr tdkr asymp 160 which is not significant at the 5 level against a two-sided alternative

bull (ii) The F statistic (with the same df) is now [0330(1 ndash 0330)](1374) asymp 117 which is even lower than in part (i) None of the t statistics is significant at a reasonable level

bull (iii) We probably should not use the logs as the logarithm is not defined for firms that have zero for dkr or eps Therefore we would lose some firms in the regression

• (iv) It seems very weak. There are no significant t statistics at the 5% level (against a two-sided alternative), and the F statistics are insignificant in both cases. Plus, less than 4% of the variation in return is explained by the independent variables.

Q4. Wooldridge C4.9 (discrim_c4_9.do) • (i) The results from the OLS regression, with standard errors in parentheses, are

log(psoda) = −1.46 + .073 prpblck + .137 log(income) + .380 prppov (.29) (.031) (.027) (.133) n = 401, R² = .087

• The p-value for testing H0: β1 = 0 against the two-sided alternative is about .018, so that we reject H0 at the 5% level but not at the 1% level.

• (ii) The correlation is about −.84, indicating a strong degree of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for log(income) is about 5.1, and that for prppov is about 2.86 (two-sided p-value = .004).

• (iii) The OLS regression results when log(hseval) is added are log(psoda) = −.84 + .098 prpblck − .053 log(income) + .052 prppov + .121 log(hseval) (.29) (.029) (.038) (.134) (.018)


n = 401, R² = .184. The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about .12 percent. The two-sided p-value is zero to three decimal places.

• (iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the 15% significance level against a two-sided alternative for log(income), and prppov does not have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at the 5% level, because the outcome of the F(2,396) statistic is about 3.52 with p-value = .030. All of the control variables – log(income), prppov, and log(hseval) – are highly correlated, so it is not surprising that some are individually insignificant.

• (v) Because the regression in (iii) contains the most controls, log(hseval) is individually significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable. It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if prpblck increases by .10, psoda is estimated to increase by about 1%, other factors held fixed.

Q5. Wooldridge 5.2 • This is about the inconsistency of the simple regression of pctstck on funds. A higher

tolerance of risk means more willingness to invest in the stock market, so β2 > 0. By assumption, funds and risktol are positively correlated. Now we use equation (5.5), where δ1 > 0: plim(β̃1) = β1 + β2δ1 > β1, so β̃1 has a positive inconsistency (asymptotic bias). This makes sense: if we omit risktol from the regression and it is positively correlated with funds, some of the estimated effect of funds is actually due to the effect of risktol.

Q6. Wooldridge C5.1 (wage1_c5_1.do) • (i) The estimated equation is

wage = −2.87 + .599 educ + .022 exper + .169 tenure (.73) (.051) (.012) (.022) n = 526, R² = .306, σ̂ = 3.085

• Below is a histogram of the 526 residuals ûi, i = 1, 2, …, 526. The histogram uses 27 bins, which is suggested by the formula in the Stata manual for 526 observations. For comparison, the normal distribution that provides the best fit to the histogram is also plotted.

• [Histogram of the wage-equation residuals, with the best-fitting normal density overlaid, omitted.]


• (ii) With log(wage) as the dependent variable, the estimated equation is log(wage) = .284 + .092 educ + .0041 exper + .022 tenure (.104) (.007) (.0017) (.003) n = 526, R² = .316, σ̂ = .441. The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below.

• (iii) The residuals from the log(wage) regression appear to be more normally distributed.

Certainly, the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed to the left. In the wage regression there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.

• The skewness and kurtosis of the residuals from the OLS regression may be used to test the null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null distribution of the Jarque-Bera test statistic (which takes into account both skewness and kurtosis) is approximately chi-squared with 2 degrees of freedom. Hence, at the 5% level, we reject the normality of u whenever the JB statistic is greater than 5.99. In STATA, this test is carried out by the command "sktest". With the data in WAGE1.RAW, the JB statistic is extremely large for the wage model and JB = 10.59 for the log(wage) model. While normality is rejected for both models, we do see that the JB statistic is much smaller for the log(wage) model [also compare its p-values of the skewness and excess-kurtosis tests against those from the wage model]. Hence, normality appears to be a reasonable approximation to the distribution of the error term in the log(wage) model.
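The JB statistic is a simple function of the sample skewness and kurtosis. A hedged sketch of the computation (the skewness and kurtosis values below are hypothetical, not the WAGE1 results):

```python
# Jarque-Bera: JB = n/6 * (S^2 + (K - 3)^2 / 4), approximately chi-squared(2)
# under the null of normality; reject at the 5% level when JB > 5.99.
def jarque_bera(n, skew, kurt):
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

jb = jarque_bera(n=526, skew=0.2, kurt=3.6)  # hypothetical S and K
print(jb > 5.99)  # whether normality is rejected at the 5% level
```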


Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes) • What are the advantages of using the log of a variable in regression? Find the "rules of

thumb" for taking logs. See page 191 of Wooldridge (pp. 198-199 for the 3rd edition).

• Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithm. Do you remember Table 2.3? Consult Table 2.3.

• How do you compute the change in y caused by Δx when the model is built for log(y)? A precise computation is given by Equation (6.8) of Wooldridge.

• Why do we need the "interaction" terms in regression models? An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for an example.

• What is the adjusted R-squared? What is the difference between it and the R-squared? The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added. However, the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.

• How do you construct an interval prediction for given x-values? This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.

• How do you predict y for given x-values when the model is built for log(y)? Check the list on page 212 (p. 220 for the 3rd edition).

• What is involved in "residual analysis"? See pages 209-210 (pp. 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 6.4 • (i) Holding all other factors fixed, we have

Δlog(wage) = β1Δeduc + β2Δ(educ·pareduc) = (β1 + β2 pareduc)Δeduc. Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if we think a child gets more out of another year of education the more highly educated are the child's parents.

• (ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 − 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)

• (iii) When we add pareduc by itself, the coefficient on the interaction term is negative. The t statistic on educ·pareduc is about −1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2. Wooldridge 6.5 • This would make little sense. Performances on math and science exams are measures of

outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold


performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The variable sci11 could be a dependent variable in an identical regression equation.

Q3. Wooldridge C6.3 (C6.2 for the 3rd edition) (wage2_c6_3.do) • (i) Holding exper (and the elements in u) fixed, we have

Δlog(wage) = β1Δeduc + β3Δ(educ·exper) = (β1 + β3 exper)Δeduc, or Δlog(wage)/Δeduc = β1 + β3 exper. This is the approximate proportionate change in wage given one more year of education.

• (ii) H0: β3 = 0. If we think that education and experience interact positively – so that people with more experience are more productive when given another year of education – then β3 > 0 is the appropriate alternative.

• (iii) The estimated equation is log(wage) = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper (.24) (.0174) (.0200) (.00153) n = 935, R² = .135, adjusted R² = .132. The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.

• (iv) We rewrite the equation as log(wage) = β0 + θ1educ + β2exper + β3educ·(exper − 10) + u, and run the regression of log(wage) on educ, exper, and educ·(exper − 10). We want the coefficient on educ. We obtain θ̂1 ≈ .0761 and se(θ̂1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.
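The 95% CI in part (iv) is just θ̂1 ± critical value × se(θ̂1); with n = 935, the normal critical value 1.96 is an adequate approximation. A quick check using the estimates reported above:

```python
# 95% CI for theta1, using the point estimate and standard error quoted above.
theta_hat, se = 0.0761, 0.0066
lo, hi = theta_hat - 1.96 * se, theta_hat + 1.96 * se
print(round(lo, 3), round(hi, 3))  # about .063 to .089
```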

Q4. Wooldridge C6.8 (hprice1_c6_8.do) • (i) The estimated equation (where price is in dollars) is

price = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms (29,475.0) (0.642) (13.24) (9,010.1) n = 88, R² = .672, adjusted R² = .661, σ̂ = 59,833. The predicted price at lotsize = 10,000, sqrft = 2,300, and bdrms = 4 is about $336,714.

• (ii) The regression is pricei on (lotsizei − 10,000), (sqrfti − 2,300), and (bdrmsi − 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.

• (iii) We must use equation (6.36) to obtain the standard error of ê0, and then use equation (6.37) (assuming that price is normally distributed). From the regression in part (ii), se(ŷ0) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile in the t84 distribution gives the 95% CI for price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval, but we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
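The standard-error calculation in part (iii) can be verified directly; the numbers below are the ones reported above:

```python
import math

se_y0, sigma_hat = 7374.5, 59833.0  # se of the fitted value, and the error sd
se_e0 = math.sqrt(se_y0 ** 2 + sigma_hat ** 2)  # prediction-error se, eq. (6.36)
lo = 336706.7 - 1.99 * se_e0  # 1.99 = approx. 97.5th percentile of t(84)
hi = 336706.7 + 1.99 * se_e0
print(round(se_e0), round(lo), round(hi))  # about 60286, 216738, 456675
```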


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes) • What is a qualitative factor? Give some examples.

This should be easy. • How do you use dummy variables to represent qualitative factors?

A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy. For example, a factor with 5 values requires 4 dummy variables.

• How do you use dummy variables to represent an ordinal variable? An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

bull How do you test for differences in regression functions across different groups We generally use the F test on properly-defined restricted model and unrestricted model Chow test is a special case of the F test where the null hypothesis is no difference across groups

• What can you achieve by interacting group dummy variables with other regressors? The interaction allows the slope coefficients to differ across groups.

• What is "program evaluation"? It assesses the effectiveness of a program by checking the differences in regression coefficients for the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary? The PRF is the conditional probability of "success" for given explanatory variables. The SRF is the predicted conditional probability of "success" for given explanatory variables.

• What are the shortcomings of the LPM? The major drawback of the LPM is that the predicted probability of success may be outside the interval [0, 1].

• Does MLR.5 hold for the LPM? No. As indicated in the textbook and lectures, the conditional variance of the disturbance is not a constant and depends on the explanatory variables: Var(u|x) = E(y|x)[1 − E(y|x)].

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 7.4 • (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t

statistic is −.283/.099 ≈ −2.86, which is statistically significant. • (ii) 100·[exp(−.283) − 1] ≈ −24.7, and so the estimate is somewhat smaller in magnitude. • (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can

be estimated to obtain the standard error of this difference is log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility + δ3trans + u, where trans is a dummy variable for the transportation industry. Now the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.

Q2. Wooldridge 7.6 • In Section 3.3 – in particular, in the discussion surrounding Table 3.2 – we discussed how to

determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only


strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3. Wooldridge 7.9 • (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z. • (ii) Setting f0(z) = f1(z) gives β0 + β1z = (β0 + δ0) + (β1 + δ1)z. Therefore, provided δ1 ≠ 0,

we have z = −δ0/δ1. Clearly, z is positive if and only if δ0/δ1 is negative, which means that δ0 and δ1 must have opposite signs.

• (iii) Using part (ii), we have totcoll = .357/.030 = 11.9 years (totcoll = total college years). • (iv) The estimated years of college at which women catch up to men is much too high to be

practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed – not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.

Q4. Wooldridge C7.13 (apple_c7_13.do) • (i) 412/660 ≈ .624. • (ii) The OLS estimates of the LPM are

ecobuy = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize (.165) (.109) (.132) (.00053) (.013) + .025 educ − .00050 age (.008) (.00125) n = 660, R² = .110. If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)

• (iii) The F test, with 4 and 653 df, is 4.43, with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children – other factors equal – the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in the functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule – predict one when the fitted probability is ≥ .5 and zero otherwise – gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes) • What is heteroskedasticity in a regression model?

When homoskedasticity (MLR.5) fails and the variance of the disturbance (u) changes across the observation index (i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with the OLS under heteroskedasticity? The usual t-stat and F-stat are not valid because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator will no longer be (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA? These are the corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option "robust" with the "regress" command, e.g., regress lwage educ, robust

• How do you detect if there is heteroskedasticity? The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

• If heteroskedasticity is present in a known form, how would you estimate the model? In this case, the WLS estimator should be used, which is more efficient than the OLS estimator. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model? If there is strong evidence for heteroskedasticity, the FGLS estimator should be used, which is based on an exponential functional form for the variance. See Section 8.4 for details.

• What are the steps in the FGLS estimation? You should summarise them from Section 8.4.

• How would you handle the heteroskedasticity of the LPM? The heteroskedasticity functional form is known for the LPM; hence, the WLS can be used in principle. However, because the LPM can produce predicted probabilities that are outside the interval (0, 1), the known functional form p(x)[1 − p(x)] may not be useful for the WLS. In that case, it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 8.1 • Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing

that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2. Wooldridge 8.2 • With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity

function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc: beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.


Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.
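A minimal sketch of the transformation, assuming h(x) = inc²: every term is divided by inc, so the transformed error u/inc is homoskedastic. The data values below are hypothetical, for illustration only.

```python
# Transform one observation of the beer equation by dividing through by inc.
# The original intercept beta0 now multiplies 1/inc, and the original slope
# on inc (beta1) becomes the intercept of the transformed equation.
def transform_row(beer, inc, price, educ, female):
    return {
        "beer_t": beer / inc,   # new dependent variable
        "const_t": 1.0 / inc,   # regressor carrying beta0
        "price_t": price / inc,
        "educ_t": educ / inc,
        "female_t": female / inc,
    }

row = transform_row(beer=20.0, inc=400.0, price=2.5, educ=12.0, female=0.0)
print(row["beer_t"], row["const_t"])  # 0.05 0.0025
```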

Q3. Wooldridge 8.3 • False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR.4, and as we

know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR.4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has the smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4. Wooldridge 8.5 • (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust

ones are practically very similar. • (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116. • (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so

about 38 and one-half years. • (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking

restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.

• (v) We just plug the values of the independent variables into the OLS regression line: smokes = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052. Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)
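The plug-in calculation in part (v) can be checked in a few lines; the coefficients and plugged-in values are the ones reported above:

```python
import math

# Plug the given values (cigpric = 67.44, income = 6,500, educ = 16, age = 77)
# into the estimated linear probability model for smokes.
smokes = (0.656 - 0.069 * math.log(67.44) + 0.012 * math.log(6500)
          - 0.029 * 16 + 0.020 * 77 - 0.00026 * 77 ** 2)
print(round(smokes, 4))  # about .0052
```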

Q5. Wooldridge C8.10 (401ksubs_c8_10.do) • (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the

heteroskedasticity-robust standard errors are in [·]: e401k = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male (.081) (.0006) (.000005) (.0039) (.00005) (.0121) [.079] [.0006] [.000005] [.0038] [.00004] [.0121] n = 9,275, R² = .094. There are no important differences; if anything, the robust standard errors are smaller.

• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model u² = δ0 + δ1p(x) + δ2p(x)² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = −1. Remember that, for the LPM, the fitted values ŷ are estimates of p(x). So, when we run the regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.
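The claim that E(u²|x) = p(x) − p(x)² for a binary outcome can be illustrated with a quick simulation (a hedged sketch; p = .3 is an arbitrary choice):

```python
import random

random.seed(1)
p = 0.3
n = 200_000
# y ~ Bernoulli(p), u = y - p; the sample mean of u^2 should be close to p - p^2.
u_sq = [(float(random.random() < p) - p) ** 2 for _ in range(n)]
mean_u_sq = sum(u_sq) / n
print(round(mean_u_sq, 3), p - p * p)  # both close to 0.21
```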

• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on the fitted value is about 1.010, the coefficient on its square is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are e401k = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male (.076) (.0005) (.000004) (.0037) (.00004) (.0117) n = 9,275, R² = .108. There are no important differences from the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings • Read Chapter 9 thoroughly. • Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes) • What is functional form misspecification? What is its major consequence in regression

analysis? As the word "misspecification" suggests, it is a mis-specified model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence, the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification? The RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). The RESET is based on the F-test for the joint significance of the squared and cubed predicted values (y-hat) in an "expanded" model that includes all the original regressors as well as the squared and cubed y-hat, where the latter represent possible nonlinear functions of the regressors.

• What are nested models? And nonnested models? Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another? The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test two nonnested models? There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models, and decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis? A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved one and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, only depends on the proxy.

• Can you analyse the consequences of measurement errors? With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis? Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?


Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR.1, 3, and 4, and hence does not introduce bias into the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable, and causes bias in the OLS estimation.

• What are outliers and influential observations? Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of parameters. These are called influential observations if the estimation results when including them are significantly different from the results when excluding them.

• Consider the simple regression model yi = α + (β + bi)xi + ui, for i = 1, 2, …, n, where the "slope parameter" contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR.1-5 hold for yi = α + βxi + ui, E(bi|xi) = 0, and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference? The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui). Given the stated assumptions, MLR.1-4 hold for this model, and the OLS estimator of the "slope parameter" is unbiased (from β). However, MLR.5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui), will depend on xi, and the usual OLS standard errors will not be valid.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 9.1 • There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population

parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test: F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] ≈ 2.97. With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
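As a quick check of the arithmetic (a sketch using only the numbers quoted above):

```python
# R-squared form of the F test: q = 2 restrictions, 177 - 8 = 169 denominator df.
F = ((0.375 - 0.353) / 2) / ((1 - 0.375) / (177 - 8))
print(round(F, 2))  # 2.97, between the 10% (2.30) and 5% (3.00) critical values
```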

Q2. Wooldridge 9.3 • (i) Eligibility for the federally funded school lunch program is very tightly linked to being

economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.

• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward-biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So, when we control for the poverty rate, the effect of spending falls.

• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1), we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) are much more important in explaining student performance than are spending per student or other school characteristics.

Q3 Wooldridge 9.5

• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do)

• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates using the 1988 data are
log(scrap) = .409 + .057 grant
(.241) (.406)
n = 54, R² = .0004.
The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain
log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
(.089) (.147) (.044)
n = 54, R² = .873,
where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.

• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller in magnitude than before, but it is still pretty significant.
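The t statistics in parts (iii)-(v) all follow the same template, t = (estimate − hypothesised value)/se; a small Python sketch (illustrative only) checks the arithmetic:

```python
def t_stat(b_hat, se, b_null=0.0):
    """t statistic for H0: beta = b_null."""
    return (b_hat - b_null) / se

print(round(t_stat(-0.254, 0.147), 2))      # grant88, H0: beta = 0 -> -1.73
print(round(t_stat(0.831, 0.044, 1.0), 2))  # log(scrap87), H0: beta = 1 -> -3.84
print(round(t_stat(0.831, 0.071, 1.0), 2))  # same H0 with the robust se -> -2.38
```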


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

• What are the main features of time series data? How do time series data differ from cross-sectional data?
The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations are dependent on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that, for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation?
A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (hence has a distribution). The SP can be viewed as a "random curve" with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation or autocorrelation?
The serial (or auto) correlation of a time-series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted as Corr(yt, yt−h) = Cov(yt, yt−h)/[Var(yt)Var(yt−h)]^(1/2).
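As an illustration (a pure-Python sketch, not from the course materials), the lag-h sample autocorrelation is just an ordinary sample correlation between yt and yt−h:

```python
def autocorr(y, h):
    """Sample correlation between y_t and y_{t-h}."""
    a, b = y[h:], y[:-h]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

x = [1.0, -1.0] * 10           # perfectly alternating series
print(round(autocorr(x, 1), 6))  # -> -1.0 (extreme negative serial correlation)
```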

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)?
Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.
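The reparameterisation trick behind the LRP standard error (in STATA one could instead use `lincom` after `regress`) can be sketched in pure Python on simulated data; all numbers below are invented for illustration. In yt = α + δ0 zt + δ1 zt−1 + δ2 zt−2 + ut, substituting θ = δ0 + δ1 + δ2 gives yt = α + θ zt + δ1(zt−1 − zt) + δ2(zt−2 − zt) + ut, so the coefficient (and usual standard error) on zt in the rewritten model is the LRP:

```python
import random

def ols(X, y):
    """OLS coefficients via the normal equations (X'X)b = X'y,
    solved with Gaussian elimination (partial pivoting)."""
    n, k = len(y), len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    c = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    for j in range(k):                               # forward elimination
        piv = max(range(j, k), key=lambda r: abs(A[r][j]))
        A[j], A[piv], c[j], c[piv] = A[piv], A[j], c[piv], c[j]
        for r in range(j + 1, k):
            f = A[r][j] / A[j][j]
            A[r] = [arj - f * ajj for arj, ajj in zip(A[r], A[j])]
            c[r] -= f * c[j]
    beta = [0.0] * k
    for j in range(k - 1, -1, -1):                   # back substitution
        beta[j] = (c[j] - sum(A[j][q] * beta[q] for q in range(j + 1, k))) / A[j][j]
    return beta

random.seed(1)
z = [random.gauss(0, 1) for _ in range(203)]
u = [random.gauss(0, 1) for _ in range(203)]
# simulated FDL data with true LRP = 0.5 + 0.3 + 0.1 = 0.9
y = [1.0 + 0.5 * z[t] + 0.3 * z[t - 1] + 0.1 * z[t - 2] + u[t] for t in range(2, 203)]

# (a) estimate the FDL model directly and sum the lag coefficients
b = ols([[1.0, z[t], z[t - 1], z[t - 2]] for t in range(2, 203)], y)
lrp_direct = b[1] + b[2] + b[3]

# (b) reparameterised model: the coefficient on z_t IS the LRP
b2 = ols([[1.0, z[t], z[t - 1] - z[t], z[t - 2] - z[t]] for t in range(2, 203)], y)
lrp_reparam = b2[1]
```

The two estimates agree algebraically, since the second design matrix is a nonsingular linear transformation of the first; the point of (b) is that its standard output delivers se(LRP) directly.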

• What are TS1-6 (assumptions about time series regression)? How do they differ from the assumptions MLR1-6?
TS1-6 include: 1) linear in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6 mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of OLS standard errors, t-stats and F-stats for statistical inference, TS3-TS6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors?
Strictly exogenous regressors satisfy the assumption TS3, whereas contemporaneously exogenous regressors only satisfy the condition (10.10) (or (z10) of the Slides), which is weaker than TS3.

• What is a trending time series? What is a time trend?
A time series is trending if it has the tendency of growing (or shrinking) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results?
First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious, because


it is purely induced by the time trends and has little to say about the true relationship between the two time series.

• Why would you include a time trend in regressions with trending variables?
Including a time trend in regressions allows one to study the relationship among time series variables that is not induced by the time trends in the time series.

• What is seasonality in a time series? Give an example of a time series variable with seasonality.
Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have week-day seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.

Problem Set

Q1 Wooldridge 10.1

• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated (such as stock returns) do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no serial correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2

• We follow the hint and write
gGDPt−1 = α0 + δ0 intt−1 + δ1 intt−2 + ut−1,
and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0 intt−1 + δ1 intt−2 + ut−1 − 3) + vt
= (γ0 + γ1α0 − 3γ1) + γ1δ0 intt−1 + γ1δ1 intt−2 + γ1ut−1 + vt.
Now, by assumption, ut−1 has zero mean and is uncorrelated with all right-hand-side variables in the previous equation, except itself of course. So Cov(intt, ut−1) = E(intt · ut−1) = γ1E(ut−1²) > 0, because γ1 > 0. If σ² = E(ut²) for all t, then Cov(intt, ut−1) = γ1σ². This violates the strict exogeneity assumption, TS3. While ut is uncorrelated with intt, intt−1, and so on, ut is correlated with intt+1.


Q3 Wooldridge 10.7

• (i) pet−1 and pet−2 must be increasing by the same amount as pet.

• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pet−2, pet−1, and pet by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)

• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is
i3t = 1.61 + .343 inft + .382 inft−1 − .190 deft + .569 deft−1
(.40) (.125) (.134) (.221) (.197)
n = 55, R² = .685, adjusted R² = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than .606, which we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inft−1.

• (iv) The F statistic for joint significance of inft−1 and deft−1 is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.


which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is 4.85.

• (iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions, and there are 88 − 5 = 83 df in the unrestricted model. The F statistic is F = [(.829 − .820)/3]/[(1 − .829)/83] ≈ 1.46. The 10% critical value (again using 90 denominator df in Table G.3a) is 2.15, so we fail to reject H0 at even the 10% level. In fact, the p-value is about .23.

• (iv) If heteroskedasticity were present, Assumption MLR5 would be violated, and the F statistic would not have an F distribution under the null hypothesis. Therefore, comparing the F statistic against the usual critical values, or obtaining the p-value from the F distribution, would not be especially meaningful.

Q2 Wooldridge 4.8

• (i) We use Property VAR.3 from Appendix B:
Var(β̂1 − 3β̂2) = Var(β̂1) + 9 Var(β̂2) − 6 Cov(β̂1, β̂2).

• (ii) t = (β̂1 − 3β̂2 − 1)/se(β̂1 − 3β̂2), so we need the standard error of β̂1 − 3β̂2.

• (iii) Because θ = β1 − 3β2, we can write β1 = θ + 3β2. Plugging this into the population model gives
y = β0 + (θ + 3β2)x1 + β2x2 + β3x3 + u = β0 + θx1 + β2(3x1 + x2) + β3x3 + u.
This last equation is what we would estimate by regressing y on x1, 3x1 + x2, and x3. The coefficient and standard error on x1 are what we want.

Q3 Wooldridge 4.10

• (i) We need to compute the F statistic for the overall significance of the regression, with n = 142 and k = 4: F = [.0395/(1 − .0395)](137/4) ≈ 1.41. The 5% critical value with 4 and 120 df is 2.45, which is well above the value of F. Therefore, we fail to reject H0: β1 = β2 = β3 = β4 = 0 at the 10% level. No explanatory variable is individually significant at the 5% level. The largest absolute t statistic is on dkr, tdkr ≈ 1.60, which is not significant at the 5% level against a two-sided alternative.

• (ii) The F statistic (with the same df) is now [.0330/(1 − .0330)](137/4) ≈ 1.17, which is even lower than in part (i). None of the t statistics is significant at a reasonable level.

• (iii) We probably should not use the logs, as the logarithm is not defined for firms that have zero for dkr or eps. Therefore, we would lose some firms in the regression.

• (iv) It seems very weak. There are no significant t statistics at the 5% level (against a two-sided alternative), and the F statistics are insignificant in both cases. Plus, less than 4% of the variation in return is explained by the independent variables.

Q4 Wooldridge C4.9 (discrim_c4_9.do)

• (i) The results from the OLS regression, with standard errors in parentheses, are
log(psoda) = −1.46 + .073 prpblck + .137 log(income) + .380 prppov
(.29) (.031) (.027) (.133)
n = 401, R² = .087.

• The p-value for testing H0: β1 = 0 against the two-sided alternative is about .018, so that we reject H0 at the 5% level but not at the 1% level.

• (ii) The correlation is about −.84, indicating a strong degree of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for log(income) is about 5.1, and that for prppov is about 2.86 (two-sided p-value = .004).

• (iii) The OLS regression results when log(hseval) is added are
log(psoda) = −.84 + .098 prpblck − .053 log(income) + .052 prppov + .121 log(hseval)
(.29) (.029) (.038) (.134) (.018)


n = 401, R² = .184. The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about .12 percent. The two-sided p-value is zero to three decimal places.

• (iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the 15% significance level against a two-sided alternative for log(income), and prppov does not have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at the 5% level, because the outcome of the F(2,396) statistic is about 3.52 with p-value = .030. All of the control variables (log(income), prppov, and log(hseval)) are highly correlated, so it is not surprising that some are individually insignificant.

• (v) Because the regression in (iii) contains the most controls, log(hseval) is individually significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable. It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if prpblck increases by .10, psoda is estimated to increase by about 1%, other factors held fixed.

Q5 Wooldridge 5.2

• This is about the inconsistency of the simple regression of pctstck on funds. A higher tolerance of risk means more willingness to invest in the stock market, so β2 > 0. By assumption, funds and risktol are positively correlated. Now we use equation (5.5), where δ1 > 0: plim(β̃1) = β1 + β2δ1 > β1, so β̃1 has a positive inconsistency (asymptotic bias). This makes sense: if we omit risktol from the regression and it is positively correlated with funds, some of the estimated effect of funds is actually due to the effect of risktol.
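The inconsistency formula plim β̃1 = β1 + β2δ1 can be illustrated by simulation (a sketch with invented numbers, not from the text): below, β1 = 1, β2 = 2, and δ1 = Cov(funds, risktol)/Var(funds) = 0.5/1.25 = 0.4, so the simple regression slope should settle near 1.8 rather than 1.

```python
import random

random.seed(0)
n = 20000
risktol = [random.gauss(0, 1) for _ in range(n)]
funds = [0.5 * r + random.gauss(0, 1) for r in risktol]   # corr(funds, risktol) > 0
pctstck = [1.0 * f + 2.0 * r + random.gauss(0, 1)         # true beta1 = 1, beta2 = 2
           for f, r in zip(funds, risktol)]

# simple regression slope of pctstck on funds, omitting risktol
mf = sum(funds) / n
mp = sum(pctstck) / n
b1_tilde = (sum((f - mf) * (p - mp) for f, p in zip(funds, pctstck))
            / sum((f - mf) ** 2 for f in funds))
print(round(b1_tilde, 2))  # close to 1.8 = beta1 + beta2*delta1, not to beta1 = 1
```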

Q6 Wooldridge C5.1 (wage1_c5_1.do)

• (i) The estimated equation is
wage = −2.87 + .599 educ + .022 exper + .169 tenure
(.73) (.051) (.012) (.022)
n = 526, R² = .306, σ̂ = 3.085.

• Below is a histogram of the 526 residuals ûi, i = 1, 2, …, 526. The histogram uses 27 bins, which is suggested by the formula in the Stata manual for 526 observations. For comparison, the normal distribution that provides the best fit to the histogram is also plotted.



• (ii) With log(wage) as the dependent variable, the estimated equation is
log(wage) = .284 + .092 educ + .0041 exper + .022 tenure
(.104) (.007) (.0017) (.003)
n = 526, R² = .316, σ̂ = .441.
The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below.

• (iii) The residuals from the log(wage) regression appear to be more normally distributed. Certainly the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed to the right. In the wage regression, there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.

• The skewness and kurtosis of the residuals from the OLS regression may be used to test the null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null distribution of the Jarque-Bera test statistic (which takes into account both skewness and kurtosis) is approximately chi-squared with 2 degrees of freedom. Hence, at the 5% level, we reject the normality of u whenever the JB statistic is greater than 5.99. In STATA, this test is carried out by the command "sktest". With the data in WAGE1.RAW, JB = … (extremely large) for the wage model and JB = 1059 for the log(wage) model. While normality is rejected for both models, we do see that the JB statistic is much smaller for the log(wage) model [also compare its p-values of the skewness and excess kurtosis tests against those from the wage model]. Hence, normality appears to be a reasonable approximation to the distribution of the error term in the log(wage) model.
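A back-of-envelope version of the Jarque-Bera statistic (a pure-Python sketch, not the sktest implementation) from the sample skewness S and kurtosis K:

```python
def jarque_bera(resid):
    """JB = (n/6) * (S^2 + (K - 3)^2 / 4); compare with the chi-squared(2)
    5% critical value 5.99."""
    n = len(resid)
    m = sum(resid) / n
    m2 = sum((r - m) ** 2 for r in resid) / n
    m3 = sum((r - m) ** 3 for r in resid) / n
    m4 = sum((r - m) ** 4 for r in resid) / n
    S = m3 / m2 ** 1.5          # skewness
    K = m4 / m2 ** 2            # kurtosis
    return n / 6.0 * (S ** 2 + (K - 3.0) ** 2 / 4.0)

# symmetric three-point sample: S = 0, K = 1.5, so JB = 0.5*(2.25/4) = 0.28125
print(round(jarque_bera([-1.0, 0.0, 1.0]), 5))  # -> 0.28125
```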


Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

• What are the advantages of using the log of a variable in regression? Find the "rules of thumb" for taking logs.
See page 191 of Wooldridge (pp. 198-199 for the 3rd edition).

• Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithm. Do you remember Table 2.3?
Consult Table 2.3.

• How do you compute the change in y caused by Δx when the model is built for log(y)?
A precise computation is given by Equation (6.8) of Wooldridge.

• Why do we need the "interaction" terms in regression models?
An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for example.

• What is the adjusted R-squared? What is the difference between it and the R-squared?
The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added. However, the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.

• How do you construct an interval prediction for given x-values?
This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.

• How do you predict y for given x-values when the model is built for log(y)?
Check the list on page 212 (p. 220 for the 3rd edition).

• What is involved in "residual analysis"?
See pages 209-210 (pp. 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 6.4

• (i) Holding all other factors fixed, we have
Δlog(wage) = β1Δeduc + β2Δeduc·pareduc = (β1 + β2 pareduc)Δeduc.
Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if we think a child gets more out of another year of education the more highly educated are the child's parents.

• (ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 − 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)

• (iii) When we add pareduc by itself, the coefficient on the interaction term is negative. The t statistic on educ·pareduc is about −1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2 Wooldridge 6.5

• This would make little sense. Performances on math and science exams are measures of outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold


performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The science score variable could be a dependent variable in an identical regression equation.

Q3 Wooldridge C6.3 (C6.2 for the 3rd Edition) (wage2_c6_3.do)

• (i) Holding exper (and the elements in u) fixed, we have
Δlog(wage) = β1Δeduc + β3Δeduc·exper = (β1 + β3 exper)Δeduc,
or Δlog(wage)/Δeduc = β1 + β3 exper. This is the approximate proportionate change in wage given one more year of education.

• (ii) H0: β3 = 0. If we think that education and experience interact positively, so that people with more experience are more productive when given another year of education, then β3 > 0 is the appropriate alternative.

• (iii) The estimated equation is
log(wage) = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper
(.24) (.0174) (.0200) (.00153)
n = 935, R² = .135, adjusted R² = .132.
The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.

• (iv) We rewrite the equation as
log(wage) = β0 + θ1 educ + β2 exper + β3 educ·(exper − 10) + u,
and run the regression of log(wage) on educ, exper, and educ·(exper − 10). We want the coefficient on educ. We obtain θ1 ≈ .0761 and se(θ1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.
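The reparameterisation in (iv) just recentres exper; the estimated return to education at any experience level is β1 + β3·exper. A trivial sketch using the part (iii) estimates:

```python
def return_to_educ(exper, b_educ=0.0440, b_inter=0.00320):
    """Estimated return to education at a given experience level,
    using the part (iii) coefficients: b1 + b3*exper."""
    return b_educ + b_inter * exper

# at exper = 10: about 7.6% per extra year of education, matching theta1 = .0761
print(round(100 * return_to_educ(10), 2))
```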

Q4 Wooldridge C6.8 (hprice1_c6_8.do)

• (i) The estimated equation (where price is in dollars) is
price = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
(29,475.0) (.642) (13.24) (9,010.1)
n = 88, R² = .672, adjusted R² = .661, σ̂ = 59,833.
The predicted price at lotsize = 10,000, sqrft = 2,300 and bdrms = 4 is about $336,714.

• (ii) The regression is pricei on (lotsizei − 10,000), (sqrfti − 2,300) and (bdrmsi − 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.

• (iii) We must use equation (6.36) to obtain the standard error of ê⁰ and then use equation (6.37) (assuming that price is normally distributed). From the regression in part (ii), se(ŷ⁰) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê⁰) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile in the t84 distribution gives the 95% CI for price⁰, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval. But we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
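The 95% prediction interval in (iii) combines the two sources of uncertainty as in equations (6.36)-(6.37); a small sketch of the arithmetic:

```python
import math

def prediction_interval(y0_hat, se_y0, sigma_hat, t_crit=1.99):
    """Prediction interval y0_hat +/- t_crit * se(e0),
    where se(e0) = sqrt(se(y0_hat)^2 + sigma_hat^2)."""
    se_e0 = math.sqrt(se_y0 ** 2 + sigma_hat ** 2)
    return y0_hat - t_crit * se_e0, y0_hat + t_crit * se_e0

lo, hi = prediction_interval(336706.7, 7374.5, 59833.0)
print(round(lo), round(hi))  # -> 216738 456675
```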


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

• What is a qualitative factor? Give some examples.
This should be easy.

• How do you use dummy variables to represent qualitative factors?
A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy. For example, a factor with 5 values requires 4 dummy variables.

• How do you use dummy variables to represent an ordinal variable?
An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

• How do you test for differences in regression functions across different groups?
We generally use the F test on properly-defined restricted and unrestricted models. The Chow test is a special case of the F test where the null hypothesis is no difference across groups.

• What can you achieve by interacting group dummy variables with other regressors?
The interaction allows the slope coefficients to be different across groups.

• What is "program evaluation"?
It assesses the effectiveness of a program by checking the differences in regression coefficients for the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary?
The PRF is the conditional probability of "success" for given explanatory variables. The SRF is the predicted conditional probability of "success" for given explanatory variables.

• What are the shortcomings of the LPM?
The major drawback of the LPM is that the predicted probability of success may be outside the interval [0,1].

• Does MLR5 hold for the LPM?
No. As indicated in the textbook and lecture, the conditional variance of the disturbance is not a constant and depends on the explanatory variables: Var(u|x) = E(y|x)[1 − E(y|x)].

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 7.4

• (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant.

• (ii) 100·[exp(−.283) − 1] ≈ −24.7%, and so the estimate is somewhat smaller in magnitude.

• (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is
log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility + δ3trans + u,
where trans is a dummy variable for the transportation industry. Now the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.

Q2 Wooldridge 7.6

• In Section 3.3 (in particular, in the discussion surrounding Table 3.2) we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only


strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3 Wooldridge 7.9

• (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.

• (ii) Setting f0(z) = f1(z) gives β0 + β1z = (β0 + δ0) + (β1 + δ1)z. Therefore, provided δ1 ≠ 0, we have z = −δ0/δ1. Clearly, z is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.

• (iii) Using part (ii), we have totcoll = .357/.030 = 11.9 years (totcoll = total college years).

• (iv) The estimated years of college at which women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed, not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.

Q4 Wooldridge C7.13 (apple_c7_13.do)

• (i) 412/660 ≈ .624.

• (ii) The OLS estimates of the LPM are
ecobuy = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize
(.165) (.109) (.132) (.00053) (.013)
+ .025 educ − .00050 age
(.008) (.00125)
n = 660, R² = .110.
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)

• (iii) The F test, with 4 and 653 df, is 4.43, with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children (other factors equal), the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in the functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule (predict one when the fitted probability is at least .5 and zero otherwise) gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
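The "fraction correctly predicted" calculations in (vi) can be sketched as follows (pure Python, with a toy example rather than the apple data):

```python
def fraction_correct(y, p_hat, cutoff=0.5):
    """Per-class and overall fraction correctly predicted for a binary
    response, under the usual rule: predict 1 when p_hat >= cutoff."""
    pred = [1 if p >= cutoff else 0 for p in p_hat]
    tally = {0: [0, 0], 1: [0, 0]}            # class -> [correct, total]
    for yi, pi in zip(y, pred):
        tally[yi][1] += 1
        tally[yi][0] += int(yi == pi)
    by_class = {c: v[0] / v[1] for c, v in tally.items() if v[1] > 0}
    overall = sum(v[0] for v in tally.values()) / len(y)
    return by_class, overall

# toy data: two of the four observations are classified correctly
by_class, overall = fraction_correct([0, 0, 1, 1], [0.2, 0.7, 0.6, 0.4])
print(by_class, overall)  # -> {0: 0.5, 1: 0.5} 0.5
```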


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

• What is heteroskedasticity in a regression model?
When homoskedasticity (MLR5) fails, and the variance of the disturbance (u) changes across the observation index (i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with the OLS under heteroskedasticity?
The usual t-stat and F-stat are not valid, because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator will no longer be (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA?
These are the corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option "robust" with the "regress" command, e.g., regress lwage educ, robust.

• How do you detect if there is heteroskedasticity?
The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

• If heteroskedasticity is present in a known form, how would you estimate the model?
In this case, the WLS estimator should be used, which is more efficient than the OLS estimator. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model?
If there is strong evidence for heteroskedasticity, the FGLS estimator should be used, which is based on an exponential functional form. See Section 8.4 for details.

• What are the steps in the FGLS estimation?
You should summarise from Section 8.4.

• How would you handle the heteroskedasticity of the LPM? The heteroskedasticity functional form is known for the LPM, hence the WLS can be used in principle. However, because the LPM can produce predicted probabilities outside the interval (0,1), the known functional form p(x)[1 − p(x)] may not be usable for the WLS. In that case it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2 Wooldridge 8.2
• With Var(u|inc, price, educ, female) = σ²inc², we have h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc:
beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.


Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.
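The equivalence behind this answer – OLS on the equation divided through by inc is exactly WLS with weights 1/inc² – can be checked numerically. A sketch with made-up data (variable names chosen to mirror the problem, not the real dataset):

```python
import numpy as np

# Made-up data with Var(u|x) proportional to inc^2
rng = np.random.default_rng(3)
n = 200
inc = rng.uniform(1, 10, n)
price = rng.uniform(1, 3, n)
beer = 5 + 0.8 * inc - 2 * price + inc * rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), inc, price])
# WLS normal equations with weights W_i = 1/inc_i^2: (X'WX) b = X'Wy
W = 1 / inc ** 2
b_wls = np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (W * beer))
# OLS on the transformed equation: beer/inc on (1/inc, 1, price/inc)
Xt = X / inc[:, None]
b_trans = np.linalg.lstsq(Xt, beer / inc, rcond=None)[0]
print(b_wls, b_trans)   # identical estimates
```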

Q3 Wooldridge 8.3
• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR.4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR.4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has a smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4 Wooldridge 8.5
• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.

• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116, or 11.6 percentage points.

• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.

• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 (10.1 percentage point) lower chance of smoking. This is similar to the effect of having four more years of education.

• (v) We just plug the values of the independent variables into the OLS regression line:
smokes = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052.
Thus the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)

Q5 Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:
e401k = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male
(.081) (.0006) (.000005) (.0039) (.00005) (.0121)
[.079] [.0006] [.000005] [.0038] [.00004] [.0121]
n = 9,275, R² = .094.
There are no important differences; if anything, the robust standard errors are smaller.

• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − [p(x)]². Written in error form, u² = p(x) − [p(x)]² + v. In other words, we can write this as a regression model u² = δ0 + δ1 p(x) + δ2 [p(x)]² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = −1. Remember that, for the LPM, the fitted values ŷ are estimates of p(x). So, when we run the regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.

• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on ŷ is about 1.010, the coefficient on ŷ² is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are
e401k = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male
(.076) (.0005) (.000004) (.0037) (.00004) (.0117)
n = 9,275, R² = .108.
There are no important differences from the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)
• What is functional form misspecification? What is its major consequence in regression analysis? As the word "misspecification" suggests, it refers to a model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is misspecified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification? The RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). The RESET is based on the F test for the joint significance of the squared and cubed predicted values (ŷ² and ŷ³) in an "expanded" model that includes all the original regressors as well as ŷ² and ŷ³, where the latter represent possible nonlinear functions of the regressors.
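The RESET mechanics can be sketched in a few lines (toy simulated data; the true model is deliberately nonlinear, so the test should reject):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(0, 4, n)
y = 1 + x + 0.5 * x ** 2 + rng.normal(0, 1, n)   # true model is nonlinear in x

def ssr(Z, y):
    """Sum of squared residuals from OLS of y on Z."""
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    e = y - Z @ b
    return e @ e

X = np.column_stack([np.ones(n), x])              # mis-specified linear model
yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
Xe = np.column_stack([X, yhat ** 2, yhat ** 3])   # expanded model

q = 2                                             # restrictions: yhat^2, yhat^3
F = ((ssr(X, y) - ssr(Xe, y)) / q) / (ssr(Xe, y) / (n - Xe.shape[1]))
print(F)   # a large F signals neglected nonlinearity
```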

• What are nested models? And nonnested models? Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another? The purpose is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test two nonnested models? There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models and decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis? A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved one and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, depends only on the proxy.

• Can you analyse the consequences of measurement errors? With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis? Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?


Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR.1, MLR.3 and MLR.4, and hence does not introduce bias in the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable and causes bias in the OLS estimation.

• What are outliers and influential observations? Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of parameters. They are called influential observations if the estimation results with them included are significantly different from the results with them excluded.

• Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1, 2, …, n, where the "slope parameter" contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR.1-5 hold for yi = α + βxi + ui, E(bi|xi) = 0 and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference? The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui). Given the stated assumptions, MLR.1-4 hold for this model, and the OLS estimator of the "slope parameter" is unbiased (for β). However, MLR.5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui | xi), will depend on xi, and hence the usual OLS standard errors will not be valid.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 9.1
• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test: F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] ≈ 2.97. With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)

Q2 Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.

• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So, when we control for the poverty rate, the effect of spending falls.

• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1), we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than spending per student or other school characteristics.

Q3 Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates using the 1988 data are
log(scrap) = .409 + .057 grant
(.241) (.406)
n = 54, R² = .0004.
The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain
log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
(.089) (.147) (.044)
n = 54, R² = .873,
where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller than before, but it is still pretty significant.


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data? The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations are dependent on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation? A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (and hence has a distribution). The SP can be viewed as a "random curve" with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation or autocorrelation? The serial (or auto) correlation of a time series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted as Corr(yt, yt−h) = Cov(yt, yt−h)/[Var(yt)Var(yt−h)]^(1/2).
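A simple estimator of this quantity from a single observed series can be sketched as follows (an illustrative textbook-style estimator, not part of the course materials):

```python
import numpy as np

def sample_autocorr(y, h):
    """Sample autocorrelation of y at lag h, Corr(y_t, y_{t-h})."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    if h == 0:
        return 1.0
    return float(y[h:] @ y[:-h]) / float(y @ y)

trend = np.arange(100.0)           # a strongly trending "series"
print(sample_autocorr(trend, 1))   # close to 1: trending series are highly autocorrelated
```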

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)? Read ie_Slides10 pages 5-7. Read Section 10.2. Try Wooldridge 10.3.
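One standard device for getting the LRP and its standard error in a single regression is the reparameterization trick: in yt = α + δ0zt + δ1zt−1 + δ2zt−2 + ut, regress yt on zt, (zt−1 − zt) and (zt−2 − zt); the coefficient (and standard error) on zt is then the LRP = δ0 + δ1 + δ2. (In STATA one could alternatively run the original regression and use lincom.) A numerical check on simulated toy data:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 202
z = rng.normal(0, 1, T)
# True lag coefficients 0.5, 0.3, 0.1, so the true LRP is 0.9
y = 1 + 0.5 * z[2:] + 0.3 * z[1:-1] + 0.1 * z[:-2] + rng.normal(0, 0.1, T - 2)
z0, z1, z2 = z[2:], z[1:-1], z[:-2]

# Original parameterization: sum the lag coefficients by hand
X = np.column_stack([np.ones(T - 2), z0, z1, z2])
d = np.linalg.lstsq(X, y, rcond=None)[0]
# Reparameterized: the coefficient on z_t IS the LRP
Xr = np.column_stack([np.ones(T - 2), z0, z1 - z0, z2 - z0])
th = np.linalg.lstsq(Xr, y, rcond=None)[0]
print(d[1] + d[2] + d[3], th[1])   # identical: both equal the estimated LRP
```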

• What are TS.1-6 (the assumptions about time series regression)? How do they differ from MLR.1-6? TS.1-6 include: 1) linearity in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR.1-6 mainly in that MLR.2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS.3 is needed. To ensure the validity of the OLS standard errors, t-stats and F-stats for statistical inference, TS.3-TS.6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors? Strictly exogenous regressors satisfy assumption TS.3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS.3.

• What is a trending time series? What is a time trend? A time series is trending if it has the tendency to grow (or shrink) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results? First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious because it is purely induced by the time trends and has little to say about the true relationship between the two time series.

• Why would you include a time trend in regressions with trending variables? Including a time trend in such regressions allows one to study the relationship among the time series variables that is not induced by their time trends.

• What is seasonality in a time series? Give an example of a time series variable with seasonality. Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have day-of-week seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model? We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
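The construction above can be sketched in a couple of lines (toy quarterly index, illustrative only):

```python
import numpy as np

# Three years of quarterly data; Q1 is the base quarter, so only Q2-Q4 enter
# the regression alongside the other regressors.
quarter = np.array([1, 2, 3, 4] * 3)
Q2 = (quarter == 2).astype(float)
Q3 = (quarter == 3).astype(float)
Q4 = (quarter == 4).astype(float)
print(Q2[:4], Q3[:4], Q4[:4])   # [0. 1. 0. 0.] [0. 0. 1. 0.] [0. 0. 0. 1.]
```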

Problem Set

Q1 Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated – such as stock returns – do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no serial correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2
• We follow the hint and write
gGDPt−1 = α0 + δ0 intt−1 + δ1 intt−2 + ut−1,
and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0 intt−1 + δ1 intt−2 + ut−1 − 3) + vt
= (γ0 + γ1α0 − 3γ1) + γ1δ0 intt−1 + γ1δ1 intt−2 + γ1ut−1 + vt.
Now, by assumption, ut−1 has zero mean and is uncorrelated with all right-hand-side variables in the previous equation, except itself of course. So Cov(intt, ut−1) = E(intt · ut−1) = γ1E(ut−1²) > 0, because γ1 > 0. If σ² = E(ut²) for all t, then Cov(intt, ut−1) = γ1σ². This violates the strict exogeneity assumption TS.3. While ut is uncorrelated with intt, intt−1, and so on, ut is correlated with intt+1.


Q3 Wooldridge 10.7
• (i) pet−1 and pet−2 must be increasing by the same amount as pet.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pet−2, pet−1, and pet by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is
i3t = 1.61 + .343 inft + .382 inft−1 − .190 deft + .569 deft−1
(.40) (.125) (.134) (.221) (.197)
n = 55, R² = .685, adj. R² = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than the .606 we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inft−1.

• (iv) The F statistic for the joint significance of inft−1 and deft−1 is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.


n = 401, R² = .184. The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about .12 percent. The two-sided p-value is zero to three decimal places.

• (iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the 15% significance level against a two-sided alternative for log(income), and prppov does not have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at the 5% level because the outcome of the F(2,396) statistic is about 3.52, with p-value = .030. All of the control variables – log(income), prppov and log(hseval) – are highly correlated, so it is not surprising that some are individually insignificant.

• (v) Because the regression in (iii) contains the most controls, log(hseval) is individually significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable. It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if prpblck increases by .10, psoda is estimated to increase by 1%, other factors held fixed.

Q5 Wooldridge 5.2
• This is about the inconsistency of the simple regression of pctstck on funds. A higher tolerance of risk means more willingness to invest in the stock market, so β2 > 0. By assumption, funds and risktol are positively correlated. Now we use equation (5.5), where δ1 > 0: plim(β̂1) = β1 + β2δ1 > β1, so β̂1 has a positive inconsistency (asymptotic bias). This makes sense: if we omit risktol from the regression and it is positively correlated with funds, some of the estimated effect of funds is actually due to the effect of risktol.

Q6 Wooldridge C5.1 (wage1_c5_1.do)
• (i) The estimated equation is
wage = −2.87 + .599 educ + .022 exper + .169 tenure
(.73) (.051) (.012) (.022)
n = 526, R² = .306, σ̂ = 3.085.

• Below is a histogram of the 526 residuals ûi, i = 1, 2, …, 526. The histogram uses 27 bins, which is suggested by the formula in the Stata manual for 526 observations. For comparison, the normal distribution that provides the best fit to the histogram is also plotted.



• (ii) With log(wage) as the dependent variable, the estimated equation is
log(wage) = .284 + .092 educ + .0041 exper + .022 tenure
(.104) (.007) (.0017) (.003)
n = 526, R² = .316, σ̂ = .441.
The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below.

• (iii) The residuals from the log(wage) regression appear to be more normally distributed. Certainly the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed to the left. In the wage regression there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.

• The skewness and kurtosis of the residuals from the OLS regression may be used to test the null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null distribution of the Jarque-Bera test statistic (which takes into account both skewness and kurtosis) is approximately chi-squared with 2 degrees of freedom. Hence, at the 5% level, we reject the normality of u whenever the JB statistic is greater than 5.99. In STATA, this test is carried out by the command "sktest". With the data in WAGE1.RAW, the JB statistic is extremely large for the wage model and JB = 10.59 for the log(wage) model. While normality is rejected for both models, we do see that the JB statistic is much smaller for the log(wage) model [also compare its p-values of the skewness and excess kurtosis tests against those from the wage model]. Hence, normality appears to be a reasonable approximation to the distribution of the error term in the log(wage) model.
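The JB statistic itself is easy to compute from the residuals' sample skewness and kurtosis. A small sketch with toy numbers (not the WAGE1 residuals):

```python
import numpy as np

def jarque_bera(u):
    """JB = n/6 * [S^2 + (K - 3)^2 / 4], S = skewness, K = kurtosis."""
    u = np.asarray(u, dtype=float)
    n = u.size
    c = u - u.mean()
    m2 = np.mean(c ** 2)
    S = np.mean(c ** 3) / m2 ** 1.5
    K = np.mean(c ** 4) / m2 ** 2
    return n / 6 * (S ** 2 + (K - 3) ** 2 / 4)

# Symmetric toy data: skewness is exactly 0, so only the kurtosis term matters
print(jarque_bera([-2, -1, 0, 1, 2]))   # ≈ 0.352, well below the 5.99 cutoff
```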


Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the advantages of using the log of a variable in regression? Find the "rules of thumb" for taking logs. See page 191 of Wooldridge (pp. 198-199 for the 3rd edition).

• Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithm. Do you remember Table 2.3? Consult Table 2.3.

• How do you compute the change in y caused by Δx when the model is built for log(y)? A precise computation is given by Equation (6.8) of Wooldridge.
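To illustrate the difference between the approximation 100·β̂Δx and the exact computation 100·[exp(β̂Δx) − 1] (the form of Equation (6.8); the coefficient value here is just an illustrative return-to-education number):

```python
import math

b, dx = 0.092, 1                          # illustrative coefficient and change
approx = 100 * b * dx                     # approximate % change in y
exact = 100 * (math.exp(b * dx) - 1)      # exact % change in y
print(round(approx, 2), round(exact, 2))  # 9.2 9.64
```

The gap widens as β̂Δx gets larger, which is why the exact formula matters for big changes.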

• Why do we need "interaction" terms in regression models? An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for an example.

• What is the adjusted R-squared? What is the difference between it and the R-squared? The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added. However, the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.

• How do you construct an interval prediction for given x-values? This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.

• How do you predict y for given x-values when the model is built for log(y)? Check the list on page 212 (p. 220 for the 3rd edition).
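The key point of that list is that exp of the fitted log(y) underpredicts y; one version of the correction rescales it by α̂ = mean of exp(ûi) (the smearing factor). A sketch with simulated data (illustrative only):

```python
import numpy as np

# Simulated log-linear model: log(y) = 1 + 0.5 x + u, u ~ N(0, 0.6^2)
rng = np.random.default_rng(7)
n = 2000
x = rng.uniform(0, 2, n)
logy = 1 + 0.5 * x + rng.normal(0, 0.6, n)
y = np.exp(logy)

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, logy, rcond=None)[0]
uhat = logy - X @ b
alpha = np.mean(np.exp(uhat))      # smearing factor; close to exp(0.6^2/2) ≈ 1.20
yhat = alpha * np.exp(X @ b)       # corrected prediction of y itself
print(round(alpha, 2))
```

Without the α̂ factor, exp(X @ b) would systematically underpredict y on average.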

• What is involved in "residual analysis"? See pages 209-210 (pp. 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 6.4
• (i) Holding all other factors fixed, we have
Δlog(wage) = β1Δeduc + β2Δeduc·pareduc = (β1 + β2 pareduc)Δeduc.
Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if we think a child gets more out of another year of education the more highly educated are the child's parents.

• (ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 − 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)

• (iii) When we add pareduc by itself, the coefficient on the interaction term is negative. The t statistic on educ·pareduc is about −1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2 Wooldridge 6.5
• This would make little sense. Performances on math and science exams are measures of outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The science pass rate could be a dependent variable in an identical regression equation.

Q3 Wooldridge C6.3 (C6.2 for the 3rd Edition) (wage2_c6_3.do)
• (i) Holding exper (and the elements in u) fixed, we have
Δlog(wage) = β1Δeduc + β3Δeduc·exper = (β1 + β3 exper)Δeduc,
or Δlog(wage)/Δeduc = β1 + β3 exper. This is the approximate proportionate change in wage given one more year of education.

• (ii) H0: β3 = 0. If we think that education and experience interact positively – so that people with more experience are more productive when given another year of education – then β3 > 0 is the appropriate alternative.

• (iii) The estimated equation is
log(wage) = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper
(.24) (.0174) (.0200) (.00153)
n = 935, R² = .135, adjusted R² = .132.
The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.

• (iv) We rewrite the equation as
log(wage) = β0 + θ1 educ + β2 exper + β3 educ·(exper − 10) + u,
and run the regression of log(wage) on educ, exper, and educ·(exper − 10). We want the coefficient on educ. We obtain θ̂1 ≈ .0761 and se(θ̂1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.

Q4 Wooldridge C6.8 (hprice1_c6_8.do)
• (i) The estimated equation (where price is in dollars) is
price = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
(29,475.0) (.642) (13.24) (9,010.1)
n = 88, R² = .672, adjusted R² = .661, σ̂ = 59,833.
The predicted price at lotsize = 10,000, sqrft = 2,300, and bdrms = 4 is about $336,714.

• (ii) The regression is pricei on (lotsizei − 10,000), (sqrfti − 2,300), and (bdrmsi − 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.

• (iii) We must use equation (6.36) to obtain the standard error of ê0 and then use equation (6.37) (assuming that price is normally distributed). From the regression in part (ii), se(ŷ0) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile in the t(84) distribution gives the 95% CI for price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval. But we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
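The arithmetic in part (iii) can be reproduced directly from the pieces reported in the solution:

```python
import math

# se(e0) = sqrt(se(yhat0)^2 + sigma_hat^2); CI = yhat0 ± t_{.975,84} * se(e0)
se_yhat0, sigma_hat = 7374.5, 59833.0
yhat0, t975 = 336706.7, 1.99
se_e0 = math.sqrt(se_yhat0 ** 2 + sigma_hat ** 2)
lo, hi = yhat0 - t975 * se_e0, yhat0 + t975 * se_e0
print(round(se_e0), round(lo), round(hi))   # 60286 216738 456675
```

Note how se(ê0) is dominated by σ̂: the uncertainty about the regression line is tiny compared with the error variance, which is why the prediction interval is so wide.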


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes) bull What is a qualitative factor Give some examples

This should be easy.
• How do you use dummy variables to represent qualitative factors?

A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy. For example, a factor with 5 values requires 4 dummy variables.
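The "levels minus one" rule can be illustrated with a tiny sketch (the factor values and base level here are hypothetical, purely for illustration):

```python
# Encode a 5-level factor with 4 dummies, treating "north" as the base group
levels = ["north", "south", "east", "west", "central"]  # hypothetical regions
base = "north"
dummy_levels = [lv for lv in levels if lv != base]      # 4 dummies for 5 levels

def to_dummies(value):
    """Return the 4-element dummy vector for one observation."""
    return [1 if value == lv else 0 for lv in dummy_levels]

print(to_dummies("south"))  # → [1, 0, 0, 0]
print(to_dummies("north"))  # base group is all zeros → [0, 0, 0, 0]
```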

• How do you use dummy variables to represent an ordinal variable? An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

• How do you test for differences in regression functions across different groups? We generally use the F test on properly-defined restricted and unrestricted models. The Chow test is a special case of the F test, where the null hypothesis is no difference across groups.

• What can you achieve by interacting group dummy variables with other regressors? The interaction allows the slope coefficients to differ across groups.

• What is "program evaluation"? It assesses the effectiveness of a program by checking the differences in regression coefficients for the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary? The PRF is the conditional probability of "success" for given explanatory variables. The SRF is the predicted conditional probability of "success" for given explanatory variables.

• What are the shortcomings of the LPM? The major drawback of the LPM is that the predicted probability of success may be outside the interval [0,1].

• Does MLR5 hold for the LPM? No. As indicated in the textbook and lecture, the conditional variance of the disturbance is not constant and depends on the explanatory variables: Var(u|x) = E(y|x)[1 − E(y|x)].

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 7.4
• (i) The approximate difference is just the coefficient on utility multiplied by 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant.
• (ii) 100·[exp(−.283) − 1] ≈ −24.7%, and so the estimate is somewhat smaller in magnitude.
• (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is
log(salary) = β0 + β1 log(sales) + β2 roe + δ1 consprod + δ2 utility + δ3 trans + u,
where trans is a dummy variable for the transportation industry. Now the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.
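The exact percentage difference in part (ii) comes from converting the log-point coefficient; a sketch of the conversion:

```python
import math

# Exact percentage difference implied by a log-point coefficient of -.283
coef_utility = -0.283
approx_pct = 100 * coef_utility                 # log approximation
exact_pct = 100 * (math.exp(coef_utility) - 1)  # exact conversion
print(round(approx_pct, 1))  # → -28.3
print(round(exact_pct, 2))   # about -24.65, i.e. the -24.7% in the text up to rounding
```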

Q2 Wooldridge 7.6
• In Section 3.3 – in particular, in the discussion surrounding Table 3.2 – we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3 Wooldridge 7.9
• (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.
• (ii) Setting f0(z*) = f1(z*) gives β0 + β1 z* = (β0 + δ0) + (β1 + δ1)z*. Therefore, provided δ1 ≠ 0, we have z* = −δ0/δ1. Clearly, z* is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.
• (iii) Using part (ii), we have totcoll* = .357/.030 = 11.9 years (totcoll = total college years).
• (iv) The estimated number of years of college at which women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed – not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.
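The crossover point in part (iii) and the four-year gap in part (iv) can be reproduced directly from the two reported coefficients:

```python
import math

# Gender gap as a function of total college years, using the estimates above
delta0, delta1 = -0.357, 0.030  # coefficients on female and female*totcoll

crossover = -delta0 / delta1    # years of college at which the gap would close
gap_4yrs = delta0 + delta1 * 4  # log-wage gap at four years of college

print(round(crossover, 1))                       # → 11.9
print(round(100 * (math.exp(gap_4yrs) - 1), 1))  # → -21.1
```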

Q4 Wooldridge C7.13 (apple_c7_13.do)
• (i) 412/660 ≈ .624.
• (ii) The OLS estimates of the LPM are
ecobuy = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize + .025 educ − .00050 age
(.165) (.109) (.132) (.00053) (.013) (.008) (.00125)
n = 660, R2 = .110.
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)

• (iii) The F test, with 4 and 653 df, is 4.43 with p-value ≈ .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children – other factors equal – the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule – predict one when ecobuy-hat ≥ .5 and zero otherwise – gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
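The fractions in part (vi) are straightforward to verify from the reported counts, including the overall percent correctly predicted:

```python
# Percent correctly predicted under the .5 rule, from the counts in the solution
correct_0, total_0 = 102, 248  # ecobuy = 0 observations predicted correctly
correct_1, total_1 = 340, 412  # ecobuy = 1 observations predicted correctly

pct_0 = 100 * correct_0 / total_0
pct_1 = 100 * correct_1 / total_1
overall = 100 * (correct_0 + correct_1) / (total_0 + total_1)

print(round(pct_0, 1), round(pct_1, 1), round(overall))  # → 41.1 82.5 67
```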


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is heteroskedasticity in a regression model?

When homoskedasticity (MLR5) fails and the variance of the disturbance (u) changes across the observation index (i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with OLS under heteroskedasticity? The usual t-stat and F-stat are not valid because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator is no longer (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA? These are corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option "robust" with the "regress" command, e.g., regress lwage educ, robust.

• How do you detect whether there is heteroskedasticity? The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

• If heteroskedasticity is present in a known form, how would you estimate the model? In this case, the WLS estimator should be used, which is more efficient than the OLS estimator. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model? If there is strong evidence of heteroskedasticity, the FGLS estimator should be used, which is based on an exponential functional form. See Section 8.4 for details.

• What are the steps in the FGLS estimation? You should summarise them from Section 8.4.

• How would you handle the heteroskedasticity of the LPM? The heteroskedasticity functional form is known for the LPM, hence the WLS can be used in principle. However, because the LPM can produce predicted probabilities outside the interval (0,1), the known functional form p(x)[1 − p(x)] may not be usable for the WLS. In that case, it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2 Wooldridge 8.2
• With Var(u|inc, price, educ, female) = σ²inc², we have h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and the transformed equation is obtained by dividing the original equation by inc:
beer/inc = β0(1/inc) + β1 + β2 price/inc + β3 educ/inc + β4 female/inc + u/inc.
Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.
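A small simulation illustrates why dividing through by inc works: if Var(u|inc) = σ²inc², the transformed error u/inc has constant standard deviation σ. (This is a sketch with made-up values of σ and inc, not the beer data.)

```python
import random
import statistics

# If Var(u|inc) = sigma^2 * inc^2, the transformed error u/inc is homoskedastic
random.seed(1)
sigma = 2.0
for inc in [10.0, 50.0]:              # two hypothetical income levels
    u = [random.gauss(0, sigma * inc) for _ in range(20000)]
    u_t = [e / inc for e in u]        # the WLS transformation: divide by sqrt(h) = inc
    print(inc, round(statistics.stdev(u), 1), round(statistics.stdev(u_t), 2))
# the raw error sd grows with inc; the transformed sd stays near sigma = 2
```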

Q3 Wooldridge 8.3
• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has the smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4 Wooldridge 8.5
• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.
• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.
• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.
• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.
• (v) We just plug the values of the independent variables into the OLS regression line:
smokes = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052.
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)
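The plug-in calculation in part (v) is easy to replicate (values from the solution above; note the quadratic term uses age²):

```python
import math

# Plug-in prediction from the smoking LPM, using the estimates in part (v)
cigpric, income, educ, age = 67.44, 6500.0, 16, 77

smokes_hat = (0.656
              - 0.069 * math.log(cigpric)
              + 0.012 * math.log(income)
              - 0.029 * educ
              + 0.020 * age
              - 0.00026 * age**2)  # quadratic in age
print(round(smokes_hat, 4))  # → 0.0052
```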

Q5 Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:
e401k = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male
(.081) (.0006) (.000005) (.0039) (.00005) (.0121)
[.079] [.0006] [.000005] [.0038] [.00004] [.0121]
n = 9,275, R2 = .094.
There are no important differences; if anything, the robust standard errors are smaller.
• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model u² = δ0 + δ1 p(x) + δ2 p(x)² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = −1. Remember that, for the LPM, the fitted values ŷ are estimates of p(x). So, when we run the regression of û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.
• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on ŷ is about 1.010, the coefficient on ŷ² is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are
e401k = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male
(.076) (.0005) (.000004) (.0037) (.00004) (.0117)
n = 9,275, R2 = .108.
There are no important differences from the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.
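The identity Var(u|x) = p(x)[1 − p(x)] behind part (ii) can be verified by direct computation for any p, since u takes the value 1 − p with probability p and −p with probability 1 − p:

```python
# For binary y with P(y=1|x) = p, the error u = y - p has
# E(u^2|x) = p(1-p)^2 + (1-p)p^2 = p - p^2, matching part (ii).
for p in [0.1, 0.3, 0.5, 0.697]:
    e_u2 = (1 - p) ** 2 * p + (0 - p) ** 2 * (1 - p)  # E(u^2) by direct sum
    print(round(e_u2, 4), round(p - p * p, 4))        # the two columns always agree
```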


Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)
• What is functional form misspecification? What is its major consequence in regression analysis? As the word "misspecification" suggests, it is a mis-specified model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification? The RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). The RESET is based on the F-test for the joint significance of the squared and cubed predicted value (y-hat) in an "expanded" model that includes all the original regressors as well as the squared and cubed y-hat, where the latter represent possible nonlinear functions of the regressors.

• What are nested models? And nonnested models? Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another? The purpose is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test two nonnested models against each other? There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models, and decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis? A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved one and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, depends only on the proxy.

• Can you analyse the consequences of measurement errors? With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis? Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?
Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR1, MLR3 and MLR4, and hence does not introduce bias in the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable, and causes bias in the OLS estimation.

• What are outliers and influential observations? Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of the parameters. They are called influential observations if the estimation results when including them are significantly different from the results when excluding them.

• Consider the simple regression model y_i = α + (β + b_i)x_i + u_i for i = 1, 2, …, n, where the "slope parameter" contains a random variable b_i, and α and β are constant parameters. Assume that the usual MLR1-5 hold for y_i = α + βx_i + u_i, E(b_i|x_i) = 0 and Var(b_i|x_i) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference? The model can be written as y_i = α + βx_i + (b_i x_i + u_i), where the overall disturbance is (b_i x_i + u_i). Given the stated assumptions, MLR1-4 hold for this model, and the OLS estimator of the "slope parameter" is unbiased (for β). However, MLR5 does not hold, since the conditional variance of the overall disturbance, Var(b_i x_i + u_i | x_i), depends on x_i, and the usual OLS standard errors will not be valid.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 9.1
• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test:
F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] ≈ 2.97.
With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
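The R-squared form of the F statistic can be checked directly (here k = 7 regressors in the unrestricted model, so the denominator df is 177 − 8 = 169):

```python
# R-squared form of the F statistic for the two quadratic terms (values above)
r2_ur, r2_r = 0.375, 0.353  # unrestricted and restricted R-squared
q, n, k = 2, 177, 7         # 2 restrictions; n = 177; 7 unrestricted regressors

F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))
print(round(F, 2))  # → 2.97
```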

Q2 Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.

• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So, when we control for the poverty rate, the effect of spending falls.

• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1), we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than spending per student or other school characteristics.

Q3 Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates, using the 1988 data, are
log(scrap) = .409 + .057 grant
(.241) (.406)
n = 54, R2 = .0004.
The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain
log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
(.089) (.147) (.044)
n = 54, R2 = .873,
where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller than before, but it is still pretty significant.
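The t statistics in parts (iii)-(v) follow from the reported coefficients and standard errors; note that the null for the lagged variable is β = 1, not β = 0:

```python
# t statistics for the jtrain tests, using the estimates reported above
b_grant, se_grant = -0.254, 0.147
b_lag, se_lag = 0.831, 0.044
se_lag_robust = 0.071  # heteroskedasticity-robust se

t_grant = b_grant / se_grant               # H0: beta_grant = 0
t_lag = (b_lag - 1) / se_lag               # H0: beta_lag = 1 (note the shift)
t_lag_robust = (b_lag - 1) / se_lag_robust
print(round(t_grant, 2), round(t_lag, 2), round(t_lag_robust, 2))  # → -1.73 -3.84 -2.38
```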


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data? The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations depend on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that, for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation? A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (hence has a distribution). The SP can be viewed as a "random curve", with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation or autocorrelation? The serial (or auto) correlation of a time-series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of y_t is usually denoted Corr(y_t, y_{t−h}) = Cov(y_t, y_{t−h})/[Var(y_t)Var(y_{t−h})]^(1/2).
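The sample analogue of this autocorrelation can be computed from a single realisation; a minimal sketch on a simulated AR(1) series (not course data), whose lag-1 autocorrelation should be near its coefficient .7:

```python
import random

def autocorr(y, h):
    """Sample lag-h autocorrelation of a series y."""
    n = len(y)
    ybar = sum(y) / n
    num = sum((y[t] - ybar) * (y[t - h] - ybar) for t in range(h, n))
    den = sum((v - ybar) ** 2 for v in y)
    return num / den

# Simulated AR(1): y_t = .7 y_{t-1} + e_t, so Corr(y_t, y_{t-1}) is about .7
random.seed(0)
y = [0.0]
for _ in range(5000):
    y.append(0.7 * y[-1] + random.gauss(0, 1))
print(round(autocorr(y, 1), 2))
```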

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)? Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.

• What are TS1-6 (assumptions about time series regression)? How do they differ from the assumptions in MLR1-6? TS1-6 include: 1) linear in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6 mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of the OLS standard errors, t-stat and F-stat for statistical inference, TS3-TS6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors? Strictly exogenous regressors satisfy assumption TS3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS3.

• What is a trending time series? What is a time trend? A time series is trending if it has the tendency of growing (or shrinking) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results? First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious, because it is purely induced by the time trends and has little to say about the true relationship between the two time series.

• Why would you include a time trend in regressions with trending variables? Including a time trend in regressions allows one to study the relationship among the time series variables that is not induced by the time trends in the series.

• What is seasonality in a time series? Give an example of a time series variable with seasonality. Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have week-day seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model? We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
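The three-dummy encoding for quarterly data can be sketched as follows (the quarter sequence is made up for illustration):

```python
# Quarterly seasonal dummies (Q2, Q3, Q4) with the first quarter as the base
quarters = [1, 2, 3, 4, 1, 2]  # hypothetical quarter index for six observations

rows = [[1 if q == k else 0 for k in (2, 3, 4)] for q in quarters]
for q, row in zip(quarters, rows):
    print(q, row)  # Q1 rows are all zeros; every other quarter switches on one dummy
```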

Problem Set

Q1 Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated – such as stock returns – do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no serial correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results, because we may simply find a spurious association between y_t and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2
• We follow the hint and write
gGDP_{t-1} = α0 + δ0 int_{t-1} + δ1 int_{t-2} + u_{t-1},
and plug this into the right-hand side of the int_t equation:
int_t = γ0 + γ1(α0 + δ0 int_{t-1} + δ1 int_{t-2} + u_{t-1} − 3) + v_t
= (γ0 + γ1α0 − 3γ1) + γ1δ0 int_{t-1} + γ1δ1 int_{t-2} + γ1 u_{t-1} + v_t.
Now, by assumption, u_{t-1} has zero mean and is uncorrelated with all right-hand-side variables in the previous equation, except itself of course. So
Cov(int_t, u_{t-1}) = E(int_t · u_{t-1}) = γ1 E(u_{t-1}²) > 0,
because γ1 > 0. If σ² = E(u_t²) for all t, then Cov(int_t, u_{t-1}) = γ1σ². This violates the strict exogeneity assumption TS3. While u_t is uncorrelated with int_t, int_{t-1}, and so on, u_t is correlated with int_{t+1}.


Q3 Wooldridge 10.7
• (i) pe_{t-1} and pe_{t-2} must be increasing by the same amount as pe_t.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pe_{t-2}, pe_{t-1}, and pe_t by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is
i3_t = 1.61 + .343 inf_t + .382 inf_{t-1} − .190 def_t + .569 def_{t-1}
(.40) (.125) (.134) (.221) (.197)
n = 55, R2 = .685, adjusted R2 = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than the .606 we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inf_{t-1}.

• (iv) The F statistic for the joint significance of inf_{t-1} and def_{t-1} is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.
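The LRP in part (iii) is just the sum of the coefficients on current and lagged inf:

```python
# Long-run propensity of i3 with respect to inf, from the lagged model above
b_inf, b_inf_lag1 = 0.343, 0.382

lrp = b_inf + b_inf_lag1  # sum of the distributed-lag coefficients
print(round(lrp, 3))      # → 0.725
```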

Page 14: UNSW -Econ2206 Solutions Semester 1 2011 - Introductory Eco No Metrics


• (ii) With log(wage) as the dependent variable, the estimated equation is
log(wage) = .284 + .092 educ + .0041 exper + .022 tenure
(.104) (.007) (.0017) (.003)
n = 526, R2 = .316, σ̂ = .441.
The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below.

• (iii) The residuals from the log(wage) regression appear to be more normally distributed. Certainly, the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed to the left. In the wage regression, there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.

• The skewness and kurtosis of the residuals from the OLS regression may be used to test the null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null distribution of the Jarque-Bera test statistic (which takes into account both skewness and kurtosis) is approximately chi-squared with 2 degrees of freedom. Hence, at the 5% level, we reject the normality of u whenever the JB statistic is greater than 5.99. In STATA, this test is carried out by the command "sktest". With the data in WAGE1.RAW, the JB statistic is extremely large for the wage model and JB = 10.59 for the log(wage) model. While normality is rejected for both models, we do see that the JB statistic is much smaller for the log(wage) model [also compare its p-values of the skewness and excess kurtosis tests against those from the wage model]. Hence, normality appears to be a reasonable approximation to the distribution of the error term in the log(wage) model.
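The JB formula is easy to compute from sample moments; a sketch on simulated residuals (not the WAGE1 data) showing how a skewed error distribution inflates the statistic:

```python
import random

def jarque_bera(res):
    """JB = n/6 * (S^2 + (K-3)^2/4), with S and K the sample skewness and kurtosis."""
    n = len(res)
    m = sum(res) / n
    m2 = sum((e - m) ** 2 for e in res) / n
    m3 = sum((e - m) ** 3 for e in res) / n
    m4 = sum((e - m) ** 4 for e in res) / n
    s, k = m3 / m2 ** 1.5, m4 / m2 ** 2
    return n / 6 * (s ** 2 + (k - 3) ** 2 / 4)

random.seed(2)
normal_res = [random.gauss(0, 1) for _ in range(2000)]         # symmetric errors
skewed_res = [random.expovariate(1) - 1 for _ in range(2000)]  # right-skewed errors
print(round(jarque_bera(normal_res), 1))  # typically small (below the 5.99 critical value)
print(round(jarque_bera(skewed_res)))     # huge: normality clearly rejected
```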


Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the advantages of using the log of a variable in regression? Find the "rules of thumb" for taking logs. See page 191 of Wooldridge (pp. 198-199 for the 3rd edition).

• Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithm. Do you remember Table 2.3? Consult Table 2.3.

• How do you compute the change in y caused by Δx when the model is built for log(y)? A precise computation is given by Equation (6.8) of Wooldridge.

• Why do we need the "interaction" terms in regression models? An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for example.

• What is the adjusted R-squared? What is the difference between it and the R-squared? The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added. However, the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.

• How do you construct an interval prediction for given x-values? This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.

• How do you predict y for given x-values when the model is built for log(y)? Check the list on page 212 (p. 220 for the 3rd edition).

• What is involved in "residual analysis"? See pages 209-210 (pp. 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 6.4
• (i) Holding all other factors fixed, we have

Δlog(wage) = β1Δeduc + β2Δ(educ·pareduc) = (β1 + β2·pareduc)Δeduc.

Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if we think a child gets more out of another year of education the more highly educated are the child's parents.

• (ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 − 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)
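With an interaction term, the return to education is β1 + β2·pareduc, so the difference in returns across two parental-education levels is just β2 times the difference in pareduc. The arithmetic can be checked directly (the coefficient .00078 is the estimate quoted in the answer):

```python
# Estimated coefficient on educ*pareduc, from the answer above.
b_interact = 0.00078

# Difference in the return to education between pareduc = 32 and pareduc = 24.
diff = b_interact * (32 - 24)   # about .0062, i.e. roughly .62 percentage points
```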

• (iii) When we add pareduc by itself, the coefficient on the interaction term is negative. The t statistic on educ·pareduc is about −1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2. Wooldridge 6.5
• This would make little sense. Performances on math and science exams are measures of outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The variable sci11 could be a dependent variable in an identical regression equation.

Q3. Wooldridge C6.3 (C6.2 for the 3rd Edition) (wage2_c6_3.do)
• (i) Holding exper (and the elements in u) fixed, we have

Δlog(wage) = β1Δeduc + β3Δ(educ·exper) = (β1 + β3·exper)Δeduc,

or Δlog(wage)/Δeduc = β1 + β3·exper. This is the approximate proportionate change in wage given one more year of education.

• (ii) H0: β3 = 0. If we think that education and experience interact positively – so that people with more experience are more productive when given another year of education – then β3 > 0 is the appropriate alternative.

• (iii) The estimated equation is

log(wage)^ = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper
            (.24)  (.0174)      (.0200)       (.00153)
n = 935, R² = .135, adjusted R² = .132.

The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.

• (iv) We rewrite the equation as

log(wage) = β0 + θ1educ + β2exper + β3educ·(exper − 10) + u,

and run the regression of log(wage) on educ, exper and educ·(exper − 10). We want the coefficient on educ. We obtain θ1^ ≈ .0761 and se(θ1^) ≈ .0066. The 95% CI for θ1 is about .063 to .089.
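The reparameterization sets θ1 = β1 + 10β3, the return to education at exper = 10, so the point estimate from part (iii) should reproduce θ1^ up to rounding. A quick numerical check (values taken from the answers above):

```python
b1, b3 = 0.0440, 0.00320     # estimates of beta1 and beta3 from part (iii)
theta1 = b1 + 10 * b3        # return to education at exper = 10; about .0760

# 95% CI using the standard error from the rewritten regression in part (iv)
se = 0.0066
lo, hi = 0.0761 - 1.96 * se, 0.0761 + 1.96 * se   # about (.063, .089)
```

The tiny gap between .0760 and the reported .0761 is rounding in the published coefficients.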

Q4. Wooldridge C6.8 (hprice1_c6_8.do)
• (i) The estimated equation (where price is in dollars) is

price^ = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
        (29,475.0)  (0.642)          (13.24)        (9,010.1)
n = 88, R² = .672, adjusted R² = .661, σ̂ = 59,833.

The predicted price at lotsize = 10,000, sqrft = 2,300 and bdrms = 4 is about $336,714.

• (ii) The regression is price_i on (lotsize_i − 10,000), (sqrft_i − 2,300) and (bdrms_i − 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.

• (iii) We must use equation (6.36) to obtain the standard error of ê0 and then use equation (6.37) (assuming that price is normally distributed). From the regression in part (ii), se(ŷ0) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^1/2 ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile in the t(84) distribution gives the 95% CI for price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval. But we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
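The prediction-error standard error combines the sampling uncertainty in ŷ0 with the error standard deviation, se(ê0) = [se(ŷ0)² + σ̂²]^1/2. The arithmetic in the answer can be reproduced directly (all numbers are those quoted above):

```python
import math

se_y0 = 7374.5      # se of the predicted conditional mean, from part (ii)
sigma = 59833.0     # root MSE of the regression
se_e0 = math.sqrt(se_y0 ** 2 + sigma ** 2)    # about 60,285.8

t_crit = 1.99       # approximate 97.5th percentile of t(84)
point = 336706.7
ci = (point - t_crit * se_e0, point + t_crit * se_e0)   # roughly ($216,738, $456,675)
```

Note how σ̂ dominates: the interval is wide mainly because individual house prices vary a lot around the regression line, not because the mean is imprecisely estimated.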


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is a qualitative factor? Give some examples.
This should be easy.
• How do you use dummy variables to represent qualitative factors?
A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy. For example, a factor with 5 values requires 4 dummy variables.
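As a concrete sketch, a 5-level factor turns into 4 dummies once a base level is chosen (the level names here are hypothetical, echoing the industry dummies used later in this problem set):

```python
levels = ["finance", "consprod", "utility", "trans", "other"]  # hypothetical 5-level factor
base = "finance"                                               # base (omitted) category
dummy_names = [lv for lv in levels if lv != base]              # 4 dummies, not 5

def to_dummies(value):
    """Map one observation of the factor to its 4 dummy values."""
    return [int(value == lv) for lv in dummy_names]
```

A base-category observation maps to all zeros; any other level switches on exactly one dummy. Including all 5 dummies plus an intercept would create perfect collinearity (the dummy variable trap).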

• How do you use dummy variables to represent an ordinal variable?
An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

• How do you test for differences in regression functions across different groups?
We generally use the F test on properly-defined restricted and unrestricted models. The Chow test is a special case of the F test where the null hypothesis is no difference across groups.

• What can you achieve by interacting group dummy variables with other regressors?
The interaction allows the slope coefficients to differ across groups.

• What is "program evaluation"?
It assesses the effectiveness of a program by checking the differences in regression coefficients for the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary?
The PRF is the conditional probability of "success" for given explanatory variables. The SRF is the predicted conditional probability of "success" for given explanatory variables.

• What are the shortcomings of the LPM?
The major drawback of the LPM is that the predicted probability of success may be outside the interval [0, 1].

• Does MLR5 hold for the LPM?
No. As indicated in the textbook and lectures, the conditional variance of the disturbance is not constant and depends on the explanatory variables: Var(u|x) = E(y|x)[1 − E(y|x)].

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 7.4
• (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant.

• (ii) 100·[exp(−.283) − 1] ≈ −24.7%, and so the estimate is somewhat smaller in magnitude.

• (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is

log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility + δ3trans + u,

where trans is a dummy variable for the transportation industry. Now the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.
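For a dummy coefficient c in a log-dependent-variable model, the approximate percentage difference is 100·c while the exact one is 100·[exp(c) − 1]; the gap grows with |c|. A quick check with the utility coefficient from part (i):

```python
import math

c = -0.283                         # coefficient on utility, from part (i)
approx = 100 * c                   # approximate difference: -28.3%
exact = 100 * (math.exp(c) - 1)    # exact difference, smaller in magnitude
```

For coefficients near zero the two agree closely; here the exact effect is noticeably smaller in magnitude than the −28.3% approximation, as the answer notes.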

Q2. Wooldridge 7.6
• In Section 3.3 – in particular, in the discussion surrounding Table 3.2 – we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3. Wooldridge 7.9
• (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.

• (ii) Setting f0(z*) = f1(z*) gives β0 + β1z* = (β0 + δ0) + (β1 + δ1)z*. Therefore, provided δ1 ≠ 0, we have z* = −δ0/δ1. Clearly, z* is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.

• (iii) Using part (ii), we have totcoll* = .357/.030 = 11.9 years (totcoll = total college years).

• (iv) The estimated years of college at which women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed – not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.
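The crossover point z* = −δ0/δ1 and the four-year gap can be verified with the point estimates δ0 = −.357 and δ1 = .030 quoted in the answer:

```python
import math

d0, d1 = -0.357, 0.030
z_star = -d0 / d1                  # years of college at which the gap would close: 11.9
gap4 = d0 + d1 * 4                 # log-wage gap at four years of college: about -.237
pct4 = 100 * (math.exp(gap4) - 1)  # exact percentage gap, about -21.1%
```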

Q4. Wooldridge C7.13 (apple_c7_13.do)
• (i) 412/660 ≈ .624.

• (ii) The OLS estimates of the LPM are

ecobuy^ = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize
         (.165)  (.109)        (.132)        (.00053)        (.013)
        + .025 educ − .00050 age
         (.008)       (.00125)
n = 660, R² = .110.

If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)

• (iii) The F test, with 4 and 653 df, is 4.43, with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children – other factors equal – the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in the functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule – predict one when ecobuy^ ≥ .5 and zero otherwise – gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is heteroskedasticity in a regression model?
When homoskedasticity (MLR5) fails and the variance of the disturbance (u) changes across the observation index (i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with the OLS under heteroskedasticity? The usual t-stat and F-stat are not valid because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator will no longer be (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA? These are the corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option "robust" with the "regress" command, e.g., regress lwage educ, robust.

• How do you detect whether there is heteroskedasticity? The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

• If heteroskedasticity is present in a known form, how would you estimate the model? In this case the WLS estimator should be used, which is more efficient than the OLS estimator. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model? If there is strong evidence for heteroskedasticity, the FGLS estimator should be used, which is based on an exponential variance functional form. See Section 8.4 for details.

• What are the steps in the FGLS estimation? You should summarise from Section 8.4.

• How would you handle the heteroskedasticity of the LPM? The heteroskedasticity functional form is known for the LPM, hence WLS can be used in principle. However, because the LPM can produce predicted probabilities outside the interval (0, 1), the known functional form p(x)[1 − p(x)] may not be usable for WLS. In that case it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2. Wooldridge 8.2
• With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc:

beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.

Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.
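Mechanically, WLS here is just OLS after dividing every variable (including the constant and y) by √h(x) = inc. A minimal sketch of the per-observation transformation, with hypothetical variable names mirroring the beer-demand equation:

```python
def transform_row(inc, price, educ, female):
    """Divide one observation's regressors through by sqrt(h(x)) = inc.

    Returns [1/inc, 1, price/inc, educ/inc, female/inc]; note that the
    original intercept becomes the 1/inc regressor, while the original
    slope on inc (beta1) becomes the new constant. The dependent variable
    beer would be divided by inc in the same way.
    """
    return [1.0 / inc, 1.0, price / inc, educ / inc, female / inc]

# Example observation with inc = 50 (hypothetical values):
row = transform_row(inc=50.0, price=2.0, educ=12.0, female=1.0)
```

Running OLS on the transformed data yields the WLS estimates, since the transformed error u/inc is homoskedastic under the stated variance assumption.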

Q3. Wooldridge 8.3
• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has the smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4. Wooldridge 8.5
• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.

• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.

• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.

• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.

• (v) We just plug the values of the independent variables into the OLS regression line:

smokes^ = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052.

Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)

Q5. Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:

e401k^ = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male
        (.081)  (.0006)     (.000005)      (.0039)     (.00005)      (.0121)
        [.079]  [.0006]     [.000005]      [.0038]     [.00004]      [.0121]
n = 9,275, R² = .094.

There are no important differences; if anything, the robust standard errors are smaller.

• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model u² = δ0 + δ1p(x) + δ2p(x)² + v, with the restrictions δ0 = 0, δ1 = 1 and δ2 = −1. Remember that, for the LPM, the fitted values ŷ are estimates of p(x). So, when we run the regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.

• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on e401k^ is about 1.010, the coefficient on (e401k^)² about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are

e401k^ = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male
        (.076)  (.0005)     (.000004)      (.0037)     (.00004)      (.0117)
n = 9,275, R² = .108.

There are no important differences with the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)
• What is functional form misspecification? What is its major consequence in regression analysis?
As the word "misspecification" suggests, it is a mis-specified model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification?
The RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of regressors). The RESET is based on the F test for the joint significance of the squared and cubed predicted value (y-hat) in an "expanded" model that includes all the original regressors as well as the squared and cubed y-hat, where the latter represent possible nonlinear functions of the regressors.

• What are nested models? And nonnested models?
Two models are nested if one is a restricted version of the other (by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another?
The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test two nonnested models?
There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models, and decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis?
A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved one and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, only depends on the proxy.

• Can you analyse the consequences of measurement errors?
With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis?
Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?


Exogenous sample selection is a nonrandom sampling scheme in which a sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR1, MLR3 and MLR4, and hence does not introduce bias in the OLS estimation. On the other hand, an endogenous sample selection scheme selects a sample on the basis of the dependent variable and causes bias in the OLS estimation.

• What are outliers and influential observations?
Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of parameters. They are called influential observations if the estimation results including them are significantly different from the results excluding them.

• Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1, 2, …, n, where the "slope parameter" contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR1-5 hold for yi = α + βxi + ui, E(bi|xi) = 0 and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference?
The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui). Given the stated assumptions, MLR1-4 hold for this model, and the OLS estimator of the "slope parameter" is unbiased (from β). However, MLR5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui|xi), will depend on xi, and the usual OLS standard errors will not be valid.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 9.1
• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test:

F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] ≈ 2.97.

With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
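This is the usual R-squared form of the F statistic with q = 2 restrictions and 177 − 8 = 169 residual degrees of freedom; the arithmetic can be checked directly:

```python
r2_ur, r2_r = 0.375, 0.353   # unrestricted and restricted R-squared
q, df = 2, 177 - 8           # number of restrictions and residual df

# F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/df]
F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df)   # about 2.97
```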

Q2. Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.

• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So, when we control for the poverty rate, the effect of spending falls.

• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1), we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than spending per student or other school characteristics.

Q3. Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4. Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates using the 1988 data are

log(scrap)^ = .409 + .057 grant
             (.241)  (.406)
n = 54, R² = .0004.

The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain

log(scrap88)^ = .021 − .254 grant88 + .831 log(scrap87)
               (.089)  (.147)         (.044)
n = 54, R² = .873,

where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.

• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller than before, but it is still pretty significant.


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data?
The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations are dependent on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that, for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation?
A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (hence has a distribution). The SP can be viewed as a "random curve" with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation or autocorrelation?
The serial (or auto) correlation of a time series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted Corr(yt, yt−h) = Cov(yt, yt−h)/[Var(yt)Var(yt−h)]^1/2.
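A lag-h sample autocorrelation can be computed straight from this definition (a minimal sketch; the function name and the illustrative series are ours):

```python
def autocorr(y, h):
    """Sample lag-h autocorrelation of the series y: Corr(y_t, y_{t-h})."""
    n = len(y)
    ybar = sum(y) / n
    num = sum((y[t] - ybar) * (y[t - h] - ybar) for t in range(h, n))
    den = sum((v - ybar) ** 2 for v in y)
    return num / den

# A steadily growing series is strongly serially correlated:
r1 = autocorr([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 1)
```

At h = 0 this is 1 by construction; for the trending series above, the lag-1 autocorrelation is positive and large, illustrating why observations on economic time series cannot be treated as independent.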

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)? Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.

• What are TS1-6 (assumptions about time series regression)? How do they differ from the assumptions in MLR1-6?
TS1-6 include: 1) linearity in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6 mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of OLS standard errors, t-stats and F-stats for statistical inference, TS3-TS6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors?
Strictly exogenous regressors satisfy assumption TS3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS3.

• What is a trending time series? What is a time trend?
A time series is trending if it has the tendency of growing (or shrinking) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce “spurious” results? First, two time trends are always “correlated” because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are “correlated”, regressing one on the other will produce a significant slope coefficient. Such “statistical significance” is spurious because it is purely induced by the time trends and has little to say about the true relationship between the two time series.

• Why would you include a time trend in regressions with trending variables? Including a time trend in regressions allows one to study the relationship among time series variables that is not induced by the time trends in the time series.

• What is seasonality in a time series? Give an example of a time series variable with seasonality. Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have day-of-week seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model? We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
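A hedged sketch of how such dummies could be constructed before estimation (pure Python, with made-up quarter labels; in STATA this is typically done with "tabulate ..., generate()"):

```python
# Build the three quarterly dummies Q2-Q4, with Q1 as the base (omitted) category.
def seasonal_dummies(quarters):
    return [{f"Q{q}": 1 if quarter == q else 0 for q in (2, 3, 4)}
            for quarter in quarters]

rows = seasonal_dummies([1, 2, 3, 4, 1])
# rows[1] is {"Q2": 1, "Q3": 0, "Q4": 0}: a second-quarter observation
```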

Problem Set

Q1 Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly so. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated – such as stock returns – do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no serial correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2
• We follow the hint and write gGDPt−1 = α0 + δ0intt−1 + δ1intt−2 + ut−1, and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0intt−1 + δ1intt−2 + ut−1 − 3) + vt
= (γ0 + γ1α0 − 3γ1) + γ1δ0intt−1 + γ1δ1intt−2 + γ1ut−1 + vt.
Now, by assumption, ut−1 has zero mean and is uncorrelated with all right-hand-side variables in the previous equation, except itself of course. So Cov(intt, ut−1) = E(intt·ut−1) = γ1E(ut−1²) > 0, because γ1 > 0. If σ² = E(ut²) for all t, then Cov(intt, ut−1) = γ1σ². This violates the strict exogeneity assumption TS3: while ut is uncorrelated with intt, intt−1, and so on, ut is correlated with intt+1.


Q3 Wooldridge 10.7
• (i) pet−1 and pet−2 must be increasing by the same amount as pet.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pet, pet−1 and pet−2 by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is
i3t = 1.61 + .343 inft + .382 inft−1 − .190 deft + .569 deft−1
(0.40) (.125) (.134) (.221) (.197)
n = 55, R² = .685, adj. R² = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than .606, which we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inft−1.

• (iv) The F statistic for joint significance of inft−1 and deft−1 is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.


Week 7 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the advantages of using the log of a variable in regression? Find the “rules of thumb” for taking logs. See page 191 of Wooldridge (pp. 198-199 for the 3rd edition).

• Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithm. Do you remember Table 2.3? Consult Table 2.3.

• How do you compute the change in y caused by Δx when the model is built for log(y)? A precise computation is given by Equation (6.8) of Wooldridge.
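A minimal sketch of the idea behind that exact computation, with made-up illustrative values for the coefficient and the change in x:

```python
import math

# In a log(y) model, the exact percentage change in y is 100*[exp(beta*dx) - 1],
# while 100*beta*dx is the usual first-order approximation.
beta_hat, dx = 0.057, 1.0        # hypothetical coefficient and change in x
exact_pct = 100 * (math.exp(beta_hat * dx) - 1)   # slightly above 5.7
approx_pct = 100 * beta_hat * dx                  # 5.7, the approximation
```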

• Why do we need “interaction” terms in regression models? An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for an example.

• What is the adjusted R-squared? What is the difference between it and the R-squared? The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added. However, the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.
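The formula behind this, adj-R² = 1 − (1 − R²)(n − 1)/(n − k − 1), can be checked against figures reported in this week's Q3 answer (R² = .135, n = 935, k = 3 regressors; decimal points restored):

```python
# Adjusted R-squared from the ordinary R-squared, sample size n and k regressors.
def adj_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

value = adj_r2(0.135, 935, 3)   # ≈ .132, matching the reported adjusted R-squared
```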

• How do you construct an interval prediction for given x-values? This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.

• How do you predict y for given x-values when the model is built for log(y)? Check the list on page 212 (p. 220 for the 3rd edition).

• What is involved in “residual analysis”? See pages 209-210 (pp. 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 6.4
• (i) Holding all other factors fixed, we have Δlog(wage) = β1Δeduc + β2Δeduc·pareduc = (β1 + β2pareduc)Δeduc. Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if we think a child gets more out of another year of education the more highly educated are the child's parents.

• (ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 − 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)

• (iii) When we add pareduc by itself, the coefficient on the interaction term is negative. The t statistic on educ·pareduc is about −1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2 Wooldridge 6.5
• This would make little sense. Performances on math and science exams are measures of outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The variable sci11 could be a dependent variable in an identical regression equation.

Q3 Wooldridge C6.3 (C6.2 for the 3rd Edition) (wage2_c6_3.do)
• (i) Holding exper (and the elements in u) fixed, we have Δlog(wage) = β1Δeduc + β3Δeduc·exper = (β1 + β3exper)Δeduc, or Δlog(wage)/Δeduc = β1 + β3exper. This is the approximate proportionate change in wage given one more year of education.

• (ii) H0: β3 = 0. If we think that education and experience interact positively – so that people with more experience are more productive when given another year of education – then β3 > 0 is the appropriate alternative.

• (iii) The estimated equation is
log(wage) = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper
(.24) (.0174) (.0200) (.00153)
n = 935, R² = .135, adjusted R² = .132.
The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.

• (iv) We rewrite the equation as log(wage) = β0 + θ1educ + β2exper + β3educ·(exper − 10) + u, and run the regression of log(wage) on educ, exper and educ·(exper − 10). We want the coefficient on educ. We obtain θ̂1 ≈ .0761 and se(θ̂1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.
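The reparameterization can also be understood through the variance formula Var(θ̂1) = Var(b1) + 100·Var(b3) + 20·Cov(b1, b3). A hedged sketch with part (iii)'s estimates; the covariance is NOT reported in the text and is a hypothetical value chosen to be roughly consistent with the reported se:

```python
import math

# theta1 = beta1 + 10*beta3 is the return to education at exper = 10; the rewritten
# regression reports its standard error directly, avoiding the covariance below.
b1, b3 = 0.0440, 0.00320                      # estimates from part (iii)
var_b1, var_b3 = 0.0174 ** 2, 0.00153 ** 2
cov_b1_b3 = -2.465e-5                         # hypothetical Cov(b1, b3)

theta1 = b1 + 10 * b3                         # 0.0760, close to the reported .0761
se_theta1 = math.sqrt(var_b1 + 100 * var_b3 + 20 * cov_b1_b3)
```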

Q4 Wooldridge C6.8 (hprice1_c6_8.do)
• (i) The estimated equation (where price is in dollars) is
price-hat = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
(29,475.0) (0.642) (13.24) (9,010.1)
n = 88, R² = .672, adjusted R² = .661, σ̂ = 59,833.
The predicted price at lotsize = 10,000, sqrft = 2,300 and bdrms = 4 is about $336,714.

• (ii) The regression is pricei on (lotsizei − 10,000), (sqrfti − 2,300) and (bdrmsi − 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.

• (iii) We must use equation (6.36) to obtain the standard error of ê0, and then use equation (6.37) (assuming that price is normally distributed). From the regression in part (ii), se(ŷ0) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile in the t84 distribution, the 95% CI for price0 at the given values of the explanatory variables is 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval. But we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
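The arithmetic in parts (i) and (iii) can be checked numerically (coefficients and standard errors as reported, decimal points restored; small rounding differences are expected):

```python
import math

# Point prediction from part (i), then the 95% prediction interval of part (iii):
# se(e0)^2 = se(y0)^2 + sigma^2, with the t(84) 97.5th percentile ≈ 1.99.
b0, b_lot, b_sqrft, b_bdrms = -21770.3, 2.068, 122.78, 13852.5
y0 = b0 + b_lot * 10000 + b_sqrft * 2300 + b_bdrms * 4   # ≈ 336,714

se_y0, sigma = 7374.5, 59833.0
se_e0 = math.sqrt(se_y0 ** 2 + sigma ** 2)               # ≈ 60,286
lo, hi = y0 - 1.99 * se_e0, y0 + 1.99 * se_e0            # ≈ $216,700 to $456,700
```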


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is a qualitative factor? Give some examples. This should be easy.
• How do you use dummy variables to represent qualitative factors? A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy. For example, a factor with 5 values requires 4 dummy variables.

• How do you use dummy variables to represent an ordinal variable? An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

• How do you test for differences in regression functions across different groups? We generally use the F test on properly-defined restricted and unrestricted models. The Chow test is a special case of the F test, where the null hypothesis is no difference across groups.
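The Chow statistic's arithmetic can be sketched as follows (all numbers invented for illustration; k is the number of slope regressors, so each group regression estimates k + 1 parameters):

```python
# Chow F: SSR_r comes from the pooled regression, SSR_ur = SSR_1 + SSR_2 from
# the two separate group regressions; k + 1 restrictions are being tested.
def chow_F(ssr_pooled, ssr_g1, ssr_g2, n, k):
    ssr_ur = ssr_g1 + ssr_g2
    return ((ssr_pooled - ssr_ur) / (k + 1)) / (ssr_ur / (n - 2 * (k + 1)))

F = chow_F(ssr_pooled=120.0, ssr_g1=50.0, ssr_g2=55.0, n=200, k=3)   # ≈ 6.86
```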

• What can you achieve by interacting group dummy variables with other regressors? The interaction allows the slope coefficients to differ across groups.

• What is “program evaluation”? It assesses the effectiveness of a program by checking the differences in regression coefficients for the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary? The PRF is the conditional probability of “success” for given explanatory variables. The SRF is the predicted conditional probability of “success” for given explanatory variables.

• What are the shortcomings of the LPM? The major drawback of the LPM is that the predicted probability of success may be outside the interval [0,1].

• Does MLR5 hold for the LPM? No. As indicated in the textbook and lectures, the conditional variance of the disturbance is not constant and depends on the explanatory variables: Var(u|x) = E(y|x)[1 − E(y|x)].
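This variance formula follows from a two-line calculation, which a small check makes concrete: with p = p(x), the error u = y − p equals 1 − p with probability p and −p with probability 1 − p.

```python
# E[u^2] = p(1-p)^2 + (1-p)p^2 = p(1-p)[(1-p) + p] = p(1-p), i.e. Var(u|x) = p(x)[1-p(x)].
def lpm_var(p):
    return p * (1 - p) ** 2 + (1 - p) * p ** 2

check = abs(lpm_var(0.3) - 0.3 * 0.7) < 1e-12   # matches p(1 - p)
```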

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 7.4
• (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant.
• (ii) 100·[exp(−.283) − 1] ≈ −24.7, and so the estimate is somewhat smaller in magnitude.
• (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility + δ3trans + u, where trans is a dummy variable for the transportation industry. Now the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.

Q2 Wooldridge 7.6
• In Section 3.3 – in particular, in the discussion surrounding Table 3.2 – we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3 Wooldridge 7.9
• (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.
• (ii) Setting f0(z) = f1(z) gives β0 + β1z = (β0 + δ0) + (β1 + δ1)z. Therefore, provided δ1 ≠ 0, we have z = −δ0/δ1. Clearly, z is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.

• (iii) Using part (ii), we have totcoll = .357/.030 = 11.9 years (totcoll = total college years).
• (iv) The estimated number of years of college at which women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed – not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.

Q4 Wooldridge C7.13 (apple_c7_13.do)
• (i) 412/660 ≈ .624.
• (ii) The OLS estimates of the LPM are
ecobuy-hat = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize
(.165) (.109) (.132) (.00053) (.013)
+ .025 educ − .00050 age
(.008) (.00125)
n = 660, R² = .110.
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)

• (iii) The F test, with 4 and 653 df, is 4.43, with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children – other factors equal – the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.
• (vi) Using the standard prediction rule – predict one when ecobuy-hat ≥ .5 and zero otherwise – gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
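The part (vi) counts can be turned into the reported fractions directly (decimal points restored from the text's figures):

```python
# Percent correctly predicted, by outcome and overall, under the rule
# "predict 1 when the fitted probability >= .5".
correct_0, n_0 = 102, 248     # ecobuy = 0: correctly predicted / total
correct_1, n_1 = 340, 412     # ecobuy = 1: correctly predicted / total
frac_0 = correct_0 / n_0                           # ≈ .411
frac_1 = correct_1 / n_1                           # ≈ .825
overall = (correct_0 + correct_1) / (n_0 + n_1)    # ≈ .67
```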


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is heteroskedasticity in a regression model? When homoskedasticity (MLR5) fails and the variance of the disturbance (u) changes across the observation index (i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with the OLS under heteroskedasticity? The usual t-stat and F-stat are not valid because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator will no longer be (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA? These are the corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option “robust” with the “regress” command, e.g., regress lwage educ, robust.

• How do you detect whether there is heteroskedasticity? The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.
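The Breusch-Pagan LM version of the test reduces to simple arithmetic once the auxiliary regression of the squared residuals is run. A hedged sketch with invented numbers:

```python
# Breusch-Pagan LM statistic: regress the squared OLS residuals on the regressors,
# then LM = n * R^2 from that auxiliary regression; compare with chi-square(k).
n, r2_aux = 200, 0.046        # invented sample size and auxiliary R-squared
LM = n * r2_aux               # 9.2
# With k = 3 regressors the 5% chi-square(3) critical value is about 7.81,
# so this hypothetical sample would reject homoskedasticity.
```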

• If heteroskedasticity is present in a known form, how would you estimate the model? In this case, the WLS estimator should be used, which is more efficient than the OLS estimator. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model? If there is strong evidence for heteroskedasticity, the FGLS estimator should be used, which is based on an exponential functional form. See Section 8.4 for details.

• What are the steps in the FGLS estimation? You should summarise these from Section 8.4.

• How would you handle the heteroskedasticity of the LPM? The heteroskedasticity functional form is known for the LPM, hence the WLS can be used in principle. However, because the LPM can produce predicted probabilities that are outside the interval (0,1), the known functional form p(x)[1 − p(x)] may not be useful for the WLS. In that case, it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2 Wooldridge 8.2
• With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc:
beer/inc = β0(1/inc) + β1 + β2price/inc + β3educ/inc + β4female/inc + u/inc.
Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.
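The transformation is mechanical once √h(x) = inc is known. A hedged sketch with two invented data rows (in practice WLS software does this division internally):

```python
# Divide every term by sqrt(h(x)) = inc so the transformed error u/inc is
# homoskedastic; OLS on the transformed data is the WLS estimator.
rows = [
    {"beer": 12.0, "inc": 40.0, "price": 2.5, "educ": 12.0, "female": 0.0},
    {"beer": 8.0,  "inc": 25.0, "price": 2.8, "educ": 16.0, "female": 1.0},
]
transformed = [
    {
        "y": r["beer"] / r["inc"],
        "one_over_inc": 1.0 / r["inc"],   # carries beta0
        "const": 1.0,                     # carries beta1, the original slope on inc
        "price": r["price"] / r["inc"],
        "educ": r["educ"] / r["inc"],
        "female": r["female"] / r["inc"],
    }
    for r in rows
]
```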

Q3 Wooldridge 8.3
• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR4, and as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has a smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve “biases” associated with OLS.

Q4 Wooldridge 8.5
• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.
• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.
• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.
• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.
• (v) We just plug the values of the independent variables into the OLS regression line: smokes-hat = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052. Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)
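The arithmetic in parts (iii) and (v) can be verified directly (coefficients as reported, decimal points restored):

```python
import math

# Turning point of the age quadratic, and the fitted probability for the
# person described in part (v).
b_age, b_age2 = 0.020, -0.00026
turning_point = -b_age / (2 * b_age2)                 # ≈ 38.46 years

p_hat = (0.656 - 0.069 * math.log(67.44) + 0.012 * math.log(6500)
         - 0.029 * 16 + 0.020 * 77 - 0.00026 * 77 ** 2)   # ≈ .005, essentially zero
```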

Q5 Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:
e401k-hat = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .035 male
(.081) (.0006) (.000005) (.0039) (.00005) (.0121)
[.079] [.0006] [.000005] [.0038] [.00004] [.0121]
n = 9,275, R² = .094.
There are no important differences; if anything, the robust standard errors are smaller.

• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model, u² = δ0 + δ1p(x) + δ2p(x)² + v, with the restrictions δ0 = 0, δ1 = 1 and δ2 = −1. Remember that, for the LPM, the fitted values ŷ are estimates of p(x). So, when we run the regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.

• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on e401k-hat is about 1.010, the coefficient on e401k-hat² about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030, and the largest is about .697. The WLS estimates of the LPM are
e401k-hat = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .055 male
(.076) (.0005) (.000004) (.0037) (.00004) (.0117)
n = 9,275, R² = .108.
There are no important differences with the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)
• What is functional form misspecification? What is its major consequence in regression analysis? As the word “misspecification” suggests, it is a mis-specified model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification? RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). RESET is based on the F test for the joint significance of the squared and cubed predicted values (ŷ² and ŷ³) in an “expanded” model that includes all the original regressors as well as ŷ² and ŷ³, where the latter represent possible nonlinear functions of the regressors.
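Once the expanded regression is run, the RESET statistic is just the usual R-squared-form F with two restrictions. A hedged sketch with invented numbers:

```python
# RESET F: joint significance of yhat^2 and yhat^3 (q = 2 restrictions) in the
# expanded model with k_expanded regressors in total.
def reset_F(r2_restricted, r2_expanded, n, k_expanded):
    q = 2   # yhat^2 and yhat^3
    return ((r2_expanded - r2_restricted) / q) / ((1 - r2_expanded) / (n - k_expanded - 1))

F = reset_F(r2_restricted=0.30, r2_expanded=0.32, n=150, k_expanded=5)   # ≈ 2.12
```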

• What are nested models? And nonnested models? Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another? The purpose of testing one model against another is to select a “best” or “preferred” model for statistical inference and prediction.

• How would you test between two nonnested models? There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models, and decide which model can be “excluded”. The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis? A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved variable and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, only depends on the proxy.

• Can you analyse the consequences of measurement errors? With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis? Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection? Exogenous sample selection is a nonrandom sampling scheme in which a sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR1, MLR3 and MLR4, and hence does not introduce bias in the OLS estimation. On the other hand, an endogenous sample selection scheme selects a sample on the basis of the dependent variable, and causes bias in the OLS estimation.

• What are outliers and influential observations? Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of parameters. They are called influential observations if the estimation results with them included are significantly different from the results with them excluded.

• Consider the simple regression model yi = α + (β + bi)xi + ui, for i = 1, 2, …, n, where the “slope parameter” contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR1-5 hold for yi = α + βxi + ui, with E(bi|xi) = 0 and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the “slope parameter” be biased from β? Will the usual OLS standard errors be valid for statistical inference? As the model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui), MLR1-4 hold for this model under the stated assumptions, and the OLS estimator of the “slope parameter” is unbiased (from β). However, MLR5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui|xi) = ω²xi² + σ², depends on xi, and the usual OLS standard errors will not be valid.
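The heteroskedasticity of the overall disturbance can be seen in a small simulation, sketched here under made-up parameter values (ω = 1, σ = 0.5):

```python
import random

# The overall error b_i*x_i + u_i in the random-slope model has conditional
# variance w^2 * x^2 + s^2, so its spread grows with |x|.
random.seed(1)
w, s = 1.0, 0.5

def overall_error(x):
    return random.gauss(0, w) * x + random.gauss(0, s)

small = [overall_error(0.5) for _ in range(20000)]   # theory: var ≈ .25 + .25 = .5
large = [overall_error(3.0) for _ in range(20000)]   # theory: var ≈ 9 + .25 = 9.25

def var(z):
    return sum(v * v for v in z) / len(z)
```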

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 9.1
• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test: F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] = 2.97. With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
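Written out, the F computation is (decimal points restored from the figures in the answer):

```python
# R-squared form of the F statistic with q = 2 restrictions and
# n - k - 1 = 177 - 8 = 169 denominator degrees of freedom.
F = ((0.375 - 0.353) / 2) / ((1 - 0.375) / (177 - 8))   # ≈ 2.97
```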

Q2 Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.

• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward-biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So, when we control for the poverty rate, the effect of spending falls.

• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1), we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than spending per student or other school characteristics.

Q3 Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report their crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates using the 1988 data are
log(scrap) = .409 + .057 grant
                  (.241)  (.406)
n = 54, R² = .0004.
The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain
log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
                     (.089)  (.147)              (.044)
n = 54, R² = .873,
where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller than before, but it is still pretty significant.
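These t statistics are just ratios of (estimate minus hypothesized value) to standard error; as a quick arithmetic check in Python, using the coefficients and standard errors reported above:

```python
import math  # not strictly needed here, but kept for consistency with later checks

# t statistic for H0: beta = b0 is (estimate - b0) / se
def t_stat(estimate, se, b0=0.0):
    return (estimate - b0) / se

# (iii) usual SE: t for H0: beta_grant = 0
print(round(t_stat(-0.254, 0.147), 2))         # -1.73, below the -1.68 critical value
# (iv) t for H0: beta_log(scrap87) = 1
print(round(t_stat(0.831, 0.044, b0=1.0), 2))  # -3.84
# (v) robust SEs
print(round(t_stat(-0.254, 0.142), 2))         # -1.79
print(round(t_stat(0.831, 0.071, b0=1.0), 2))  # -2.38
```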


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data?
The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations are dependent on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation?
A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (hence has a distribution). The SP can be viewed as a "random curve" with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve" or the SP.

• What is serial correlation or autocorrelation?
The serial (or auto) correlation of a time-series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted as Corr(yt, yt−h) = Cov(yt, yt−h)/[Var(yt)Var(yt−h)]^(1/2).
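The sample analogue of this formula is easy to compute; a minimal Python sketch (the short series below is purely illustrative):

```python
import math

def sample_autocorr(y, h):
    """Sample Corr(y_t, y_{t-h}): correlation between the series and its h-period lag."""
    a, b = y[h:], y[:-h]                 # pairs (y_t, y_{t-h})
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(va * vb)

# A linear trend is perfectly correlated with its own lag:
print(sample_autocorr([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 1))  # 1.0
```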

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)?
Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.
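One standard way to obtain the LRP and its standard error in a single regression is to reparameterise the FDL model so that the LRP is itself a coefficient. A hedged simulation sketch (the model, its coefficients, and the tiny OLS solver below are all illustrative, not course code):

```python
import random

def ols(X, y):
    """OLS by solving the normal equations (X'X) b = X'y with Gaussian elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for i in range(k):                       # forward elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            A[r] = [a - f * c for a, c in zip(A[r], A[i])]
            b[r] -= f * b[i]
    beta = [0.0] * k
    for i in reversed(range(k)):             # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return beta

random.seed(1)
n = 5000
z = [random.gauss(0, 1) for _ in range(n + 2)]
# FDL model: y_t = 2 + .5 z_t + .3 z_{t-1} + .2 z_{t-2} + u_t, so LRP = .5 + .3 + .2 = 1
y = [2 + .5 * z[t] + .3 * z[t - 1] + .2 * z[t - 2] + random.gauss(0, .1)
     for t in range(2, n + 2)]
# Reparameterised regression: y on z_t, (z_{t-1} - z_t), (z_{t-2} - z_t).
# The coefficient on z_t is then the LRP, and its usual SE is the SE of the LRP.
X = [[1.0, z[t], z[t - 1] - z[t], z[t - 2] - z[t]] for t in range(2, n + 2)]
theta = ols(X, y)[1]
print(round(theta, 2))  # ≈ 1.0, the LRP
```

In STATA the same trick works by generating the differenced regressors before `regress`, so the LRP's standard error is read straight off the output.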

• What are TS1-6 (assumptions about time series regression)? How do they differ from the assumptions MLR1-6?
TS1-6 include: 1) linear in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6 mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of OLS standard errors, t-stat and F-stat for statistical inference, TS3-TS6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors?
Strictly exogenous regressors satisfy the assumption TS3, whereas contemporaneously exogenous regressors only satisfy the condition (10.10) (or (z10) of the Slides) that is weaker than TS3.

• What is a trending time series? What is a time trend?
A time series is trending if it has the tendency of growing (or shrinking) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results?
First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious because it is purely induced by the time trends and has little to say about the true relationship between the two time series.
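The point can be illustrated with a small simulation. In the sketch below the two series are independent by construction, yet the common trend alone produces a hugely "significant" slope:

```python
import math
import random

random.seed(42)
n = 100
t = list(range(n))
# Two series that are unrelated by construction but each contain a linear time trend
x = [0.5 * s + random.gauss(0, 1) for s in t]
y = [0.3 * s + random.gauss(0, 1) for s in t]

# Simple OLS of y on x, with the usual (non-robust) t statistic for the slope
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
b0 = my - b1 * mx
ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(ssr / (n - 2) / sxx)
print(abs(b1 / se_b1) > 10)  # True: a huge t statistic despite no true relationship
```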

• Why would you include a time trend in regressions with trending variables?
Including a time trend in regressions allows one to study the relationship among time series variables that is not induced by the time trends in the time series.

• What is seasonality in a time series? Give an example of a time series variable with seasonality.
Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have week-day seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
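Given a quarter index for each observation, the three dummies can be constructed mechanically; a minimal sketch (the quarter sequence is illustrative):

```python
quarters = [1, 2, 3, 4, 1, 2, 3, 4]          # quarter of each observation
# First quarter is the base group, so define dummies only for Q2-Q4
Q2 = [1 if q == 2 else 0 for q in quarters]
Q3 = [1 if q == 3 else 0 for q in quarters]
Q4 = [1 if q == 4 else 0 for q in quarters]
print(Q2)  # [0, 1, 0, 0, 0, 1, 0, 0]
# Each observation falls in exactly one quarter, so the dummies never overlap
assert all(a + b + c in (0, 1) for a, b, c in zip(Q2, Q3, Q4))
```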

Problem Set

Q1 Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated (such as stock returns) do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no serial correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2
• We follow the hint and write
gGDPt−1 = α0 + δ0 intt−1 + δ1 intt−2 + ut−1,
and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0 intt−1 + δ1 intt−2 + ut−1 − 3) + vt
     = (γ0 + γ1α0 − 3γ1) + γ1δ0 intt−1 + γ1δ1 intt−2 + γ1ut−1 + vt.
Now, by assumption, ut−1 has zero mean and is uncorrelated with all right-hand-side variables in the previous equation, except itself of course. So Cov(intt, ut−1) = E(intt·ut−1) = γ1E(ut−1²) > 0, because γ1 > 0. If σ² = E(ut²) for all t, then Cov(intt, ut−1) = γ1σ². This violates the strict exogeneity assumption TS3. While ut is uncorrelated with intt, intt−1, and so on, ut is correlated with intt+1.
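The covariance result can be checked numerically; a simulation sketch under illustrative parameter values (γ1 = .5, σ² = 1, and lag coefficients chosen so the system is stable):

```python
import random

random.seed(0)
g0, g1, a0, d0, d1 = 3.0, 0.5, 3.0, 0.4, 0.2  # illustrative; chosen for stability
n = 200_000
gdp = 3.0                                # start gGDP at its 3% target
ints, u_series = [0.0, 0.0], [0.0, 0.0]  # two placeholder start-up values
for t in range(2, n):
    v = random.gauss(0, 1)
    i_t = g0 + g1 * (gdp - 3.0) + v          # int_t responds to last period's gGDP
    u = random.gauss(0, 1)                   # sigma^2 = 1
    gdp = a0 + d0 * i_t + d1 * ints[-1] + u  # gGDP_t, using int_t and int_{t-1}
    ints.append(i_t)
    u_series.append(u)

# Sample Cov(int_t, u_{t-1}) should be close to g1 * sigma^2 = 0.5
a, b = ints[2:], u_series[1:-1]
ma, mb = sum(a) / len(a), sum(b) / len(b)
cov = sum((x - ma) * (z - mb) for x, z in zip(a, b)) / len(a)
print(round(cov, 1))  # ≈ 0.5
```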


Q3 Wooldridge 10.7
• (i) pet−1 and pet−2 must be increasing by the same amount as pet.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pet−2, pet−1 and pet by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is
i3t = 1.61 + .343 inft + .382 inft−1 − .190 deft + .569 deft−1
        (.40)  (.125)       (.134)          (.221)        (.197)
n = 55, R² = .685, adj-R² = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than .606, which we obtain from the static model in (10.15). But the estimates are fairly close considering the size and significance of the coefficient on inft−1.

• (iv) The F statistic for joint significance of inft−1 and deft−1 is about 5.22 with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.


performance on the science test fixed while studying the effects of staff on the math pass rate. This would be an example of controlling for too many factors in a regression equation. The variable sci11 could be a dependent variable in an identical regression equation.

Q3 Wooldridge C6.3 (C6.2 for the 3rd Edition) (wage2_c6_3.do)
• (i) Holding exper (and the elements in u) fixed, we have
Δlog(wage) = β1Δeduc + β3Δeduc·exper = (β1 + β3exper)Δeduc,
or Δlog(wage)/Δeduc = β1 + β3exper. This is the approximate proportionate change in wage given one more year of education.

• (ii) H0: β3 = 0. If we think that education and experience interact positively (so that people with more experience are more productive when given another year of education), then β3 > 0 is the appropriate alternative.

• (iii) The estimated equation is
log(wage) = 5.95 + .0440 educ − .0215 exper + .00320 educ·exper
                 (.24)    (.0174)        (.0200)         (.00153)
n = 935, R² = .135, adjusted R² = .132.
The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.

• (iv) We rewrite the equation as
log(wage) = β0 + θ1educ + β2exper + β3educ·(exper − 10) + u,
and run the regression log(wage) on educ, exper, and educ·(exper − 10). We want the coefficient on educ. We obtain θ1 ≈ .0761 and se(θ1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.
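As a consistency check on the reparameterisation, θ1 should equal β1 + 10β3; using the point estimates from parts (iii) and (iv) (small differences are rounding):

```python
# theta1 = beta1 + 10*beta3: the return to education evaluated at exper = 10
theta1_implied = 0.0440 + 10 * 0.00320
print(round(theta1_implied, 3))    # 0.076, matching theta1-hat ≈ .0761

# 95% CI from theta1-hat and its standard error (normal critical value 1.96)
lo = 0.0761 - 1.96 * 0.0066
hi = 0.0761 + 1.96 * 0.0066
print(round(lo, 3), round(hi, 3))  # 0.063 0.089
```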

Q4 Wooldridge C6.8 (hprice1_c6_8.do)
• (i) The estimated equation (where price is in dollars) is
price-hat = −21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
                 (29,475.0)    (0.642)            (13.24)          (9,010.1)
n = 88, R² = .672, adjusted R² = .661, σ̂ = 59,833.
The predicted price at lotsize = 10,000, sqrft = 2,300, and bdrms = 4 is about $336,714.

• (ii) The regression is pricei on (lotsizei − 10,000), (sqrfti − 2,300), and (bdrmsi − 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.

• (iii) We must use equation (6.36) to obtain the standard error of ê0 and then use equation (6.37) (assuming that price is normally distributed). But from the regression in part (ii), se(ŷ0) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile in the t84 distribution gives the 95% CI for price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval. But we have not used many factors to explain housing price. If we had more, we could presumably reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.
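The prediction-interval arithmetic can be verified directly (all inputs are the numbers quoted above; the last decimal of se(ê0) shifts slightly with intermediate rounding):

```python
import math

se_yhat0 = 7374.5           # se of the predicted mean price, from part (ii)
sigma_hat = 59833.0         # estimated error standard deviation
se_e0 = math.sqrt(se_yhat0 ** 2 + sigma_hat ** 2)
print(round(se_e0, 1))      # ≈ 60285.7 (quoted as 60,285.8 above)

point = 336706.7            # intercept from the recentred regression in part (ii)
half = 1.99 * se_e0         # 1.99 ≈ 97.5th percentile of the t(84) distribution
print(round(point - half), round(point + half))  # 216738 456675
```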


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is a qualitative factor? Give some examples.
This should be easy.
• How do you use dummy variables to represent qualitative factors?
A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy. For example, a factor with 5 values requires 4 dummy variables.

• How do you use dummy variables to represent an ordinal variable?
An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

• How do you test for differences in regression functions across different groups?
We generally use the F test on properly-defined restricted and unrestricted models. The Chow test is a special case of the F test, where the null hypothesis is no difference across groups.
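The Chow form of this F test compares the pooled (restricted) SSR with the sum of the two group-specific SSRs; a sketch with purely hypothetical numbers (n, k and the SSRs below are made up for illustration):

```python
# Chow test: pooled (restricted) regression vs separate regressions per group.
# All numbers are hypothetical, for illustration only.
ssr_pooled = 120.0           # SSR imposing equal coefficients across groups
ssr_1, ssr_2 = 50.0, 55.0    # SSRs from the two group-specific regressions
n, k = 100, 3                # total observations, number of slope regressors

ssr_ur = ssr_1 + ssr_2       # unrestricted SSR
q = k + 1                    # restrictions: all k slopes plus the intercept
df = n - 2 * (k + 1)         # unrestricted degrees of freedom
F = ((ssr_pooled - ssr_ur) / q) / (ssr_ur / df)
print(round(F, 2))           # 3.29, to be compared with the F(q, df) critical value
```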

• What can you achieve by interacting group dummy variables with other regressors?
The interaction allows the slope coefficients to be different across groups.

• What is "program evaluation"?
It assesses the effectiveness of a program by checking the differences in regression coefficients for the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary?
The PRF is the conditional probability of "success" for given explanatory variables. The SRF is the predicted conditional probability of "success" for given explanatory variables.

• What are the shortcomings of the LPM?
The major drawback of the LPM is that the predicted probability of success may be outside the interval [0,1].

• Does MLR5 hold for the LPM?
No. As indicated in the textbook and lecture, the conditional variance of the disturbance is not a constant and depends on explanatory variables: Var(u|x) = E(y|x)[1 − E(y|x)].

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 7.4
• (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant.
• (ii) 100·[exp(−.283) − 1] ≈ −24.7%, and so the estimate is somewhat smaller in magnitude.
• (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is
log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility + δ3trans + u,
where trans is a dummy variable for the transportation industry. Now, the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.
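The approximate and exact percentage differences in (i) and (ii) are related by the usual log transformation; a quick check:

```python
import math

b_utility = -0.283
print(round(100 * b_utility, 1))           # -28.3: approximate % difference
print(100 * (math.exp(b_utility) - 1))     # roughly -24.6 to -24.7: exact % difference
print(round(b_utility / 0.099, 2))         # -2.86: t statistic
```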

Q2 Wooldridge 7.6
• In Section 3.3 (in particular, in the discussion surrounding Table 3.2) we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3 Wooldridge 7.9
• (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.
• (ii) Setting f0(z) = f1(z) gives β0 + β1z = (β0 + δ0) + (β1 + δ1)z. Therefore, provided δ1 ≠ 0, we have z = −δ0/δ1. Clearly, z is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.
• (iii) Using part (ii), we have totcoll = .357/.030 = 11.9 years (totcoll = total college years).
• (iv) The estimated years of college where women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed, not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.
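A quick check of the arithmetic in (iii) and (iv), with δ0 = −.357 and δ1 = .030 taken from the estimated equation:

```python
import math

d0, d1 = -0.357, 0.030
print(round(-d0 / d1, 1))                     # 11.9 years of college to close the gap
gap4 = d0 + d1 * 4                            # log-wage gap at four years of college
print(round(gap4, 3))                         # -0.237
print(round(100 * (math.exp(gap4) - 1), 1))   # ≈ -21.1% (exact % difference)
```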

Q4 Wooldridge C7.13 (apple_c7_13.do)
• (i) 412/660 ≈ .624.
• (ii) The OLS estimates of the LPM are
ecobuy-hat = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize
                   (.165)  (.109)             (.132)            (.00053)          (.013)
                   + .025 educ − .00050 age
                     (.008)          (.00125)
n = 660, R² = .110.
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)

• (iii) The F test, with 4 and 653 df, is 4.43 with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children (other factors equal), the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in the functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule (predict one when ecobuy-hat ≥ .5 and zero otherwise) gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
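The fractions in (vi) follow directly from the underlying counts (102 of the 248 non-buyers and 340 of the 412 buyers are correctly predicted):

```python
n0, n1 = 248, 412            # observations with ecobuy = 0 and ecobuy = 1
right0, right1 = 102, 340    # correctly predicted in each group
print(round(right0 / n0, 3))                    # 0.411
print(round(right1 / n1, 3))                    # 0.825
print(round((right0 + right1) / (n0 + n1), 2))  # 0.67 overall
```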


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is heteroskedasticity in a regression model?
When homoskedasticity (MLR5) fails and the variance of the disturbance (u) changes across the observation index (i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with the OLS under heteroskedasticity?
The usual t-stat and F-stat are not valid, because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator will no longer be (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA?
These are the corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option "robust" with the "regress" command, e.g., regress lwage educ, robust.

• How do you detect if there is heteroskedasticity?
The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

• If heteroskedasticity is present in a known form, how would you estimate the model?
In this case, the WLS estimators should be used, which are more efficient than the OLS estimators. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model?
If there is strong evidence for heteroskedasticity, the FGLS estimators should be used, which are based on an exponential functional form. See Section 8.4 for details.

• What are the steps in the FGLS estimation?
You should summarise from Section 8.4.

• How would you handle the heteroskedasticity of the LPM?
The heteroskedasticity functional form is known for the LPM. Hence, the WLS can be used in principle. However, because the LPM can produce predicted probabilities that are outside the interval (0,1), the known functional form p(x)[1 − p(x)] may not be useful for the WLS. In that case, it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2 Wooldridge 8.2
• With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc:
beer/inc = β0(1/inc) + β1 + β2price/inc + β3educ/inc + β4female/inc + u/inc.
Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.
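That dividing through by inc removes the heteroskedasticity can be checked by simulation; a minimal sketch with illustrative values (σ = 2, three income levels):

```python
import random

random.seed(7)
sigma = 2.0
incomes = [10.0, 50.0, 250.0]   # three widely spread income levels (illustrative)
vars_t = []
for inc in incomes:
    # Var(u|inc) = sigma^2 * inc^2, as in the problem
    u = [sigma * inc * random.gauss(0, 1) for _ in range(20000)]
    t = [ui / inc for ui in u]  # transformed error u/inc has variance sigma^2
    vars_t.append(sum(x * x for x in t) / len(t))
print([round(v, 2) for v in vars_t])  # each ≈ sigma**2 = 4, regardless of inc
```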

Q3 Wooldridge 8.3
• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR4, and as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has a smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4 Wooldridge 8.5
• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.
• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.
• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.
• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.
• (v) We just plug the values of the independent variables into the OLS regression line:
smokes-hat = .656 − .069 log(67.44) + .012 log(6500) − .029(16) + .020(77) − .00026(77²) ≈ .0052.
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)
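Both the turning point in (iii) and the plug-in prediction in (v) are quick to verify (coefficients from the estimated equation above):

```python
import math

# (iii) turning point of the age quadratic: beta_age / (2 * beta_age^2 coefficient)
print(round(0.020 / (2 * 0.00026), 2))   # 38.46 years

# (v) predicted smoking probability at the stated values
p = (0.656 - 0.069 * math.log(67.44) + 0.012 * math.log(6500)
     - 0.029 * 16 + 0.020 * 77 - 0.00026 * 77 ** 2)
print(round(p, 3))                       # ≈ 0.005, essentially zero
```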

Q5 Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:
e401k-hat = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male
                  (.081)   (.0006)       (.000005)         (.0039)       (.00005)       (.0121)
                  [.079]   [.0006]       [.000005]         [.0038]       [.00004]       [.0121]
n = 9275, R² = .094.
There are no important differences; if anything, the robust standard errors are smaller.

• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model u² = δ0 + δ1p(x) + δ2p(x)² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = −1. Remember that, for the LPM, the fitted values ŷ are estimates of p(x). So, when we run the regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.

• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on e401k-hat is about 1.010, the coefficient on (e401k-hat)² is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are
e401k-hat = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male
                  (.076)   (.0005)       (.000004)         (.0037)       (.00004)       (.0117)
n = 9275, R² = .108.
There are no important differences with the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)
• What is functional form misspecification? What is its major consequence in regression analysis?
As the word "misspecification" suggests, it is a mis-specified model with a wrong functional form for explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence, the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification?
The RESET may be used to test for functional form misspecification, more specifically, for neglected nonlinearities (nonlinear functions of regressors). The RESET is based on the F-test for the joint significance of the squared and cubed predicted value (y-hat) in an "expanded" model that includes all the original regressors as well as the squared and cubed y-hat, where the latter represent possible nonlinear functions of the regressors.

• What are nested models? And nonnested models?
Two models are nested if one is a restricted version of the other (by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another?
The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test two nonnested models against each other?
There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models, and decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis?
A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved one and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, only depends on the proxy.

• Can you analyse the consequences of measurement errors?
With a careful reading of Section 9.4, you should be able to analyse these.

• In what circumstances will missing observations cause major concerns in regression analysis?
Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?
Exogenous sample selection is a nonrandom sampling scheme in which a sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR1, MLR3 or MLR4, and hence does not introduce bias in the OLS estimation. On the other hand, an endogenous sample selection scheme selects a sample on the basis of the dependent variable, and causes bias in the OLS estimation.

• What are outliers and influential observations?
Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of parameters. These are called influential observations if the estimation results when including them are significantly different from the results when excluding them.

• Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1, 2, …, n, where the "slope parameter" contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR1-5 hold for yi = α + βxi + ui, E(bi|xi) = 0 and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference?
The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui). Given the stated assumptions, MLR1-4 hold for this model, and the OLS estimator of the "slope parameter" is unbiased (from β). However, MLR5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui | xi), will be dependent on xi, and the usual OLS standard errors will not be valid.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 9.1
• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test: F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] = 2.97. With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
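The R-squared form of the F statistic can be recomputed directly (R² values and degrees of freedom from above):

```python
r2_ur, r2_r = 0.375, 0.353   # unrestricted and restricted R-squared
q, df_ur = 2, 177 - 8        # number of restrictions; n - k - 1 = 169
F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)
print(round(F, 2))           # 2.97, between the 10% (2.30) and 5% (3.00) critical values
```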

Q2 Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.

bull (ii) We can use our usual reasoning on omitting important variables from a regression equation The variables log(expend) and lnchprg are negatively correlated school districts with poorer children spend on average less on schools Further β3 lt 0 From Table 32 omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β1 [ignoring the presence of log(enroll) in the model] So when we control for the poverty rate the effect of spending falls

• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1) we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) are much more important in explaining student performance than are spending per student or other school characteristics.

Q3 Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates using the 1988 data are
log(scrap) = .409 + .057 grant
(.241) (.406)
n = 54, R² = .0004.
The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain
log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
(.089) (.147) (.044)
n = 54, R² = .873,
where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller than before, but it is still pretty significant.
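The t statistics quoted in (iii)-(v) are just coefficient-to-standard-error ratios; a quick check using the estimates above:

```python
# t statistics from the jtrain estimates above
t_grant_usual  = -0.254 / 0.147       # H0: beta_grant = 0, usual se
t_grant_robust = -0.254 / 0.142       # same H0, robust se
t_lag_usual    = (0.831 - 1) / 0.044  # H0: beta_log(scrap87) = 1, usual se
t_lag_robust   = (0.831 - 1) / 0.071  # same H0, robust se
print(round(t_grant_usual, 2), round(t_grant_robust, 2),
      round(t_lag_usual, 2), round(t_lag_robust, 2))  # -1.73 -1.79 -3.84 -2.38
```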


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data?
The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations are dependent on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation?
A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (hence has a distribution). The SP can be viewed as a "random curve" with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation, or autocorrelation?
The serial (or auto) correlation of a time-series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted Corr(yt, yt−h) = Cov(yt, yt−h)/[Var(yt)Var(yt−h)]^(1/2).
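The definition above translates directly into code. Here is a minimal sketch of the sample autocorrelation at lag h, in pure Python, applied to an illustrative trending series:

```python
import math

def autocorr(y, h):
    """Sample Corr(y_t, y_{t-h}): covariance of the lagged pairs,
    scaled by the square root of the product of their variances."""
    a, b = y[h:], y[:-h]                      # (y_t, y_{t-h}) pairs
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b)) / len(a)
    va = sum((x - ma) ** 2 for x in a) / len(a)
    vb = sum((z - mb) ** 2 for z in b) / len(b)
    return cov / math.sqrt(va * vb)

# a linear trend is perfectly autocorrelated at lag 1
print(round(autocorr(list(range(10)), 1), 6))  # 1.0
```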

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)?
Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.

• What are TS1-6 (assumptions about time series regression)? How do they differ from the assumptions in MLR1-6?
TS1-6 include: 1) linear in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6 mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of the OLS standard errors, t-stats and F-stats for statistical inference, TS3-TS6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors?
Strictly exogenous regressors satisfy assumption TS3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS3.

• What is a trending time series? What is a time trend?
A time series is trending if it has the tendency of growing (or shrinking) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results?
First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious because it is purely induced by the time trends and has little to say about the true relationship between the two time series.

• Why would you include a time trend in regressions with trending variables?
Including a time trend in regressions allows one to study the relationship among time series variables that is not induced by the time trends in the time series.

• What is seasonality in a time series? Give an example of a time series variable with seasonality.
Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have week-day seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
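Constructing the three dummies is mechanical; a small sketch (the quarter index used here is made up for illustration):

```python
# Quarterly seasonal dummies with the first quarter as the base group
quarters = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]  # hypothetical quarter index
Q2 = [int(q == 2) for q in quarters]
Q3 = [int(q == 3) for q in quarters]
Q4 = [int(q == 4) for q in quarters]
# a first-quarter observation has all three dummies equal to zero (the base)
print(Q2[:4], Q3[:4], Q4[:4])  # [0, 1, 0, 0] [0, 0, 1, 0] [0, 0, 0, 1]
```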

Problem Set

Q1 Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated – such as stock returns – do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no serial correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2
• We follow the hint and write
gGDPt−1 = α0 + δ0 intt−1 + δ1 intt−2 + ut−1,
and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0 intt−1 + δ1 intt−2 + ut−1 − 3) + vt
= (γ0 + γ1α0 − 3γ1) + γ1δ0 intt−1 + γ1δ1 intt−2 + γ1ut−1 + vt.
Now, by assumption, ut−1 has zero mean and is uncorrelated with all right-hand-side variables in the previous equation, except itself of course. So Cov(intt, ut−1) = E(intt · ut−1) = γ1E(u²t−1) > 0, because γ1 > 0. If σ² = E(u²t) for all t, then Cov(intt, ut−1) = γ1σ². This violates the strict exogeneity assumption TS3. While ut is uncorrelated with intt, intt−1, and so on, ut is correlated with intt+1.
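The violation of TS3 can be illustrated by simulation. The sketch below generates data from the two equations with hypothetical parameter values (γ1 = .5, δ0 = .3, δ1 = .2, unit error variances) and confirms that the sample covariance between intt and ut−1 is positive, as derived above:

```python
import random

random.seed(42)
g0, g1 = 1.0, 0.5           # policy-rule parameters (gamma0, gamma1), hypothetical
a0, d0, d1 = 2.0, 0.3, 0.2  # gGDP equation parameters (alpha0, delta0, delta1)
n = 20000
int_t, gGDP, u = [0.0, 0.0], [3.0, 3.0], [0.0, 0.0]
for t in range(2, n):
    ut, vt = random.gauss(0, 1), random.gauss(0, 1)
    it = g0 + g1 * (gGDP[-1] - 3) + vt       # int_t responds to gGDP_{t-1}
    gt = a0 + d0 * it + d1 * int_t[-1] + ut  # gGDP_t = a0 + d0 int_t + d1 int_{t-1} + u_t
    int_t.append(it)
    gGDP.append(gt)
    u.append(ut)
# sample covariance of (int_t, u_{t-1}); theory above says it equals g1 * Var(u) = .5
x, z = int_t[2:], u[1:-1]
mx, mz = sum(x) / len(x), sum(z) / len(z)
cov = sum((a - mx) * (b - mz) for a, b in zip(x, z)) / len(x)
print(cov > 0.3)  # True: int_t is correlated with u_{t-1}, so TS3 fails
```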


Q3 Wooldridge 10.7
• (i) pet−1 and pet−2 must be increasing by the same amount as pet.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means that the level of pe increases and stays at the new level, and this is achieved by increasing pet−2, pet−1, and pet by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is
i3t = 1.61 + .343 inft + .382 inft−1 − .190 deft + .569 deft−1
(.40) (.125) (.134) (.221) (.197)
n = 55, R² = .685, adj. R² = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than the .606 we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inft−1.
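The LRP is just the sum of the coefficients on the current and lagged values of inf; in code:

```python
# long-run propensity: sum of the distributed-lag coefficients on inf
b_inf = [0.343, 0.382]  # coefficients on inf_t and inf_{t-1} from the equation above
lrp = sum(b_inf)
print(round(lrp, 3))  # 0.725
```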

• (iv) The F statistic for the joint significance of inft−1 and deft−1 is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.

Page 17: UNSW - Econ2206 Solutions Semester 1 2011 - Introductory Econometrics


Week 8 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is a qualitative factor? Give some examples.
This should be easy.
• How do you use dummy variables to represent qualitative factors?
A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy. For example, a factor with 5 values requires 4 dummy variables.
• How do you use dummy variables to represent an ordinal variable?
An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.

• How do you test for differences in regression functions across different groups?
We generally use the F test on properly-defined restricted and unrestricted models. The Chow test is a special case of the F test where the null hypothesis is no difference across groups.
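For concreteness, the Chow statistic is the usual F statistic computed from the pooled (restricted) SSR and the sum of the two group-specific SSRs; a sketch with made-up numbers:

```python
# Chow test: F = [SSR_p - (SSR_1 + SSR_2)] / (SSR_1 + SSR_2) * [n - 2(k+1)] / (k+1)
ssr_pooled, ssr_1, ssr_2 = 85.0, 40.0, 38.0  # hypothetical sums of squared residuals
n, k = 120, 3                                # observations and slope regressors
ssr_ur = ssr_1 + ssr_2                       # unrestricted SSR: groups fit separately
F = (ssr_pooled - ssr_ur) / ssr_ur * (n - 2 * (k + 1)) / (k + 1)
print(round(F, 2))  # 2.51, compared against the F(k+1, n-2(k+1)) critical value
```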

• What can you achieve by interacting group dummy variables with other regressors?
The interaction allows the slope coefficients to differ across groups.

• What is "program evaluation"?
It assesses the effectiveness of a program by checking the differences in the regression coefficients for the control (or base) group and the treatment group.

• What are the interpretations of the PRF and SRF when the dependent variable is binary?
The PRF is the conditional probability of "success" for given explanatory variables. The SRF is the predicted conditional probability of "success" for given explanatory variables.

• What are the shortcomings of the LPM?
The major drawback of the LPM is that the predicted probability of success may be outside the interval [0, 1].

• Does MLR5 hold for the LPM?
No. As indicated in the textbook and lectures, the conditional variance of the disturbance is not a constant and depends on the explanatory variables: Var(u|x) = E(y|x)[1 − E(y|x)].

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 7.4
• (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is statistically significant.
• (ii) 100·[exp(−.283) − 1] ≈ −24.7%, and so the estimate is somewhat smaller in magnitude.
• (iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is
log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility + δ3trans + u,
where trans is a dummy variable for the transportation industry. Now the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.
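The approximate-versus-exact percentage interpretation in (i)-(ii) is easy to reproduce (the coefficient is taken from the solution above):

```python
import math

b_utility = -0.283                       # coefficient on the utility dummy
approx = 100 * b_utility                 # approximate % difference in salary
exact = 100 * (math.exp(b_utility) - 1)  # exact % difference
print(round(approx, 1))  # -28.3
print(round(exact, 1))   # about -24.6, matching the -24.7 above up to rounding
```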

Q2 Wooldridge 7.6
• In Section 3.3 – in particular, in the discussion surrounding Table 3.2 – we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3 Wooldridge 7.9
• (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.
• (ii) Setting f0(z) = f1(z) gives β0 + β1z = (β0 + δ0) + (β1 + δ1)z. Therefore, provided δ1 ≠ 0, we have z = −δ0/δ1. Clearly, z is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.

• (iii) Using part (ii), we have totcoll = .357/.030 = 11.9 years (totcoll = total college years).
• (iv) The estimated years of college where women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed – not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.
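The crossover point and the four-year gap in (iii)-(iv) follow directly from the two interaction coefficients:

```python
d0, d1 = -0.357, 0.030     # coefficients on female and female*totcoll from above
z_star = -d0 / d1          # years of college at which the gender gap would close
gap_4yrs = d0 + d1 * 4     # log-wage gap at four years of college
print(round(z_star, 1))    # 11.9
print(round(gap_4yrs, 3))  # -0.237
```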

Q4 Wooldridge C7.13 (apple_c7_13.do)
• (i) 412/660 ≈ .624.
• (ii) The OLS estimates of the LPM are
ecobuy-hat = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize + .025 educ − .00050 age
(.165) (.109) (.132) (.00053) (.013) (.008) (.00125)
n = 660, R² = .110.
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)
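The marginal price effects quoted above are just the LPM coefficients scaled by the 10-cent (.10) price change:

```python
b_ecoprc, b_regprc = -0.803, 0.719   # LPM price coefficients from above
d_price = 0.10                       # a 10-cent price increase
print(round(b_ecoprc * d_price, 3))  # -0.08
print(round(b_regprc * d_price, 3))  # 0.072
```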

• (iii) The F test, with 4 and 653 df, is 4.43 with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children – other factors equal – the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in the functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule – predict one when ecobuy-hat ≥ .5 and zero otherwise – gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
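The overall percent correctly predicted follows from the counts reported above:

```python
# fraction correctly predicted, by outcome and overall
correct_0, n_0 = 102, 248  # ecobuy = 0 observations
correct_1, n_1 = 340, 412  # ecobuy = 1 observations
overall = (correct_0 + correct_1) / (n_0 + n_1)
print(round(correct_0 / n_0, 3))  # 0.411
print(round(correct_1 / n_1, 3))  # 0.825
print(round(overall, 2))          # 0.67
```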


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is heteroskedasticity in a regression model?
When homoskedasticity (MLR5) fails and the variance of the disturbance (u) changes across the observation index (i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with OLS under heteroskedasticity?
The usual t-stat and F-stat are not valid because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator will no longer be (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA?
These are corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option "robust" with the "regress" command, e.g., regress lwage educ, robust.

• How do you detect if there is heteroskedasticity?
The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

• If heteroskedasticity is present in a known form, how would you estimate the model?
In this case, the WLS estimator should be used, which is more efficient than the OLS estimator. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model?
If there is strong evidence for heteroskedasticity, the FGLS estimator should be used, which is based on an exponential functional form. See Section 8.4 for details.

• What are the steps in the FGLS estimation?
You should summarise these from Section 8.4.

• How would you handle the heteroskedasticity of the LPM?
The heteroskedasticity functional form is known for the LPM. Hence, WLS can be used in principle. However, because the LPM can produce predicted probabilities that are outside the interval (0, 1), the known functional form p(x)[1 − p(x)] may not be useful for the WLS. In that case, it may be necessary to use FGLS for the LPM.
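A minimal sketch of the WLS weighting for the LPM, assuming out-of-range fitted probabilities are first clipped into (0, 1) (the clipping bound here is an arbitrary illustrative choice, not from the text):

```python
import math

def lpm_weight(p_hat, eps=0.01):
    """WLS weight 1/sqrt(h) for the LPM, where h = p(1 - p).
    Fitted probabilities outside (0, 1) are clipped first (hypothetical bound eps)."""
    p = min(max(p_hat, eps), 1 - eps)
    return 1.0 / math.sqrt(p * (1 - p))

print(round(lpm_weight(0.5), 2))            # 2.0: h = .25 is largest, so smallest weight
print(lpm_weight(1.3) == lpm_weight(0.99))  # True: out-of-range fits are clipped
```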

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2 Wooldridge 8.2
• With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc:
beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.
Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.

Q3 Wooldridge 8.3
• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has the smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4 Wooldridge 8.5
• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.
• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.
• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.
• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.

• (v) We just plug the values of the independent variables into the OLS regression line:
smokes-hat = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052.
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)

Q5 Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:
e401k-hat = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male
(.081) (.0006) (.000005) (.0039) (.00005) (.0121)
[.079] [.0006] [.000005] [.0038] [.00004] [.0121]
n = 9,275, R² = .094.
There are no important differences; if anything, the robust standard errors are smaller.

• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model u² = δ0 + δ1p(x) + δ2p(x)² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = −1. Remember that, for the LPM, the fitted values, ŷ, are estimates of p(x). So, when we run the regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.

• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on e401k-hat is about 1.010, the coefficient on (e401k-hat)² is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are
e401k-hat = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male
(.076) (.0005) (.000004) (.0037) (.00004) (.0117)
n = 9,275, R² = .108.
There are no important differences with the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)
• What is functional form misspecification? What is its major consequence in regression analysis?
As the word "misspecification" suggests, it is a mis-specified model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence, the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification?
The RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). The RESET is based on the F test for the joint significance of the squared and cubed predicted value (y-hat) in an "expanded" model that includes all the original regressors as well as the squared and cubed y-hat, where the latter represent possible nonlinear functions of the regressors.

• What are nested models? And nonnested models?
Two models are nested if one is a restricted version of the other (by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another?
The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test two nonnested models?
There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models, and decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis?
A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved one and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, only depends on the proxy.

• Can you analyse the consequences of measurement errors?
With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis?
Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?
Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR1, MLR3 or MLR4, and hence does not introduce bias in the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable, and causes bias in the OLS estimation.

• What are outliers and influential observations?
Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of parameters. They are called influential observations if the estimation results when including them are significantly different from the results when excluding them.

bull Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1 2hellip n where the ldquoslop parameterrdquo contains a random variable bi α and β are constant parameters Assume that the usual MLR1-5 hold for yi = α + βxi + ui E(bi|xi) = 0 and Var(bi|xi) = ω2 If we regress y on x with a intercept will the estimator of the ldquoslope parameterrdquo be biased from β Will the usual OLS standard errors be valid for statistical inference As the model can be writted as yi = α + β xi + (bixi + ui) where the overall disturbance is (bixi + ui) Given the stated assumptions MLR1-4 hold for this model and the OLS estimator of the ldquoslope parameterrdquo is unbiased (from β) However MLR5 does not hold since the conditional variance of the overall disturbance var(bixi + ui) will be dependent on xi and the usual OLS standard errors will not be valid

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 91 bull There is functional form misspecification if β6 ne 0 or β7 ne 0 where these are the population

parameters on ceoten2 and comten2 respectively Therefore we test the joint significance of these variables using the R-squared form of the F test F = [(375 minus 353)2][(1 minus 375)(177 ndash 8)] = 297 With 2 and infin df the 10 critical value is 230 while the 5 critical value is 300 Thus the p-value is slightly above 05 which is reasonable evidence of functional form misspecification (Of course whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter)

Q2 Wooldridge 93 bull (i) Eligibility for the federally funded school lunch program is very tightly linked to being

economically disadvantaged Therefore the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty

bull (ii) We can use our usual reasoning on omitting important variables from a regression equation The variables log(expend) and lnchprg are negatively correlated school districts with poorer children spend on average less on schools Further β3 lt 0 From Table 32 omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β1 [ignoring the presence of log(enroll) in the model] So when we control for the poverty rate the effect of spending falls

bull (iii) Once we control for lnchprg the coefficient on log(enroll) becomes negative and has a t of about ndash217 which is significant at the 5 level against a two-sided alternative The coefficient implies that Δmath10 asymp minus(126100)(Δenroll) = minus0126(Δenroll) Therefore a 10 increase in enrollment leads to a drop in math10 of 126 percentage points

25

bull (iv) Both math10 and lnchprg are percentages Therefore a ten percentage point increase in lnchprg leads to about a 323 percentage point fall in math10 a sizeable effect

bull (v) In column (1) we are explaining very little of the variation in pass rates on the MEAP math test less than 3 In column (2) we are explaining almost 19 (which still leaves much variation unexplained) Clearly most of the variation in math10 is explained by variation in lnchprg This is a common finding in studies of school performance family income (or related factors such as living in poverty) are much more important in explaining student performance than are spending per student or other school characteristics

Q3. Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report their crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4. Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates using the 1988 data are
log(scrap) = .409 + .057 grant
(.241) (.406)
n = 54, R² = .0004.
The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain
log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
(.089) (.147) (.044)
n = 54, R² = .873,
where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.
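The one-sided t test carried out in part (iii) can be reproduced with a few lines of Python (our own sketch; the numbers −.254, .147 and the −1.68 critical value are from the answer above):

```python
def t_stat(beta_hat, beta_null, se):
    """t statistic for H0: beta = beta_null, given an estimate and its SE."""
    return (beta_hat - beta_null) / se

# H0: beta_grant = 0 against H1: beta_grant < 0;
# 5% one-sided critical value for 40 df is -1.68 (Table G.2)
t_grant = t_stat(-0.254, 0.0, 0.147)
reject = t_grant < -1.68
print(round(t_grant, 2), reject)  # about -1.73, True (reject H0)
```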

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller in magnitude than before, but still pretty significant.


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data?
The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations depend on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation?
A stochastic process (SP) is a collection of random variables indexed by time. At any fixed point in time, the SP is a random variable (hence has a distribution). The SP can be viewed as a "random curve" with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation, or autocorrelation?
The serial (or auto) correlation of a time-series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted Corr(yt, yt−h) = Cov(yt, yt−h)/[Var(yt)Var(yt−h)]^(1/2).
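As a rough illustration (our own sketch, not from the tutorial program), the sample analogue of this correlation at lag h can be computed in a few lines of Python. For simplicity the sketch uses the full-sample mean and variance, as one would for a stationary series:

```python
def autocorr(y, h):
    """Sample analogue of Corr(y_t, y_{t-h}) for a list of observations,
    assuming a common mean and variance (stationarity)."""
    n = len(y)
    ybar = sum(y) / n
    var = sum((v - ybar) ** 2 for v in y) / n
    cov = sum((y[t] - ybar) * (y[t - h] - ybar) for t in range(h, n)) / n
    return cov / var

# A steadily growing series is strongly autocorrelated at lag 1
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(autocorr(y, 1))  # 0.625 for this series
```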

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)?
Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.

• What are TS1-6 (assumptions about time series regression)? How do they differ from the assumptions MLR1-6?
TS1-6 include: 1) linearity in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6 mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of the OLS standard errors, t-stats and F-stats for statistical inference, TS3-TS6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors?
Strictly exogenous regressors satisfy assumption TS3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS3.

• What is a trending time series? What is a time trend?
A time series is trending if it has a tendency to grow (or shrink) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results?
First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious, because it is purely induced by the time trends and has little to say about the true relationship between the two time series.

• Why would you include a time trend in regressions with trending variables?
Including a time trend in such regressions allows one to study the relationship among the time series variables that is not induced by the time trends in the series.

• What is seasonality in a time series? Give an example of a time series variable with seasonality.
Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have day-of-the-week seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
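The construction just described (three dummies with the first quarter as the base) can be sketched in Python (our own illustration; the function name and sample data are made up):

```python
def seasonal_dummies(quarters, base=1):
    """Map a list of quarter numbers (1-4) to dummy columns for the three
    non-base quarters, e.g. Q2, Q3, Q4 when Q1 is the base category."""
    others = [q for q in (1, 2, 3, 4) if q != base]
    return {"Q%d" % q: [1 if obs == q else 0 for obs in quarters]
            for q in others}

# Two years of quarterly observations
qtrs = [1, 2, 3, 4, 1, 2, 3, 4]
dummies = seasonal_dummies(qtrs)
print(dummies["Q2"])  # [0, 1, 0, 0, 0, 1, 0, 0]
```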

Problem Set

Q1. Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated, such as stock returns, do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no-serial-correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2. Wooldridge 10.2
• We follow the hint and write
gGDPt−1 = α0 + δ0 intt−1 + δ1 intt−2 + ut−1,
and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0 intt−1 + δ1 intt−2 + ut−1 − 3) + vt
= (γ0 + γ1α0 − 3γ1) + γ1δ0 intt−1 + γ1δ1 intt−2 + γ1ut−1 + vt.
Now, by assumption, ut−1 has zero mean and is uncorrelated with all right-hand-side variables in the previous equation, except itself of course. So
Cov(intt, ut−1) = E(intt · ut−1) = γ1E(u²t−1) > 0,
because γ1 > 0. If σ² = E(u²t) for all t, then Cov(intt, ut−1) = γ1σ². This violates the strict exogeneity assumption TS3: while ut is uncorrelated with intt, intt−1, and so on, ut is correlated with intt+1.


Q3. Wooldridge 10.7
• (i) pet−1 and pet−2 must be increasing by the same amount as pet.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pet−2, pet−1 and pet by the same amount.

Q4. Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is
i3t = 1.61 + .343 inft + .382 inft−1 − .190 deft + .569 deft−1
(.40) (.125) (.134) (.221) (.197)
n = 55, R² = .685, adj. R² = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than the .606 we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inft−1.
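The LRP of a finite distributed lag model is just the sum of the coefficients on the current and lagged regressor, so the .725 above is a one-line computation (our own check in Python; in STATA the standard error would come from lincom or a reparameterised regression):

```python
def lrp(coefs):
    """Long-run propensity of a finite distributed lag model: the sum of
    the coefficients on the current and lagged values of the regressor."""
    return sum(coefs)

# Coefficients on inf_t and inf_{t-1} from the estimated equation above
print(round(lrp([0.343, 0.382]), 3))  # 0.725
```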

• (iv) The F statistic for the joint significance of inft−1 and deft−1 is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.


strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

Q3. Wooldridge 7.9
• (i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.
• (ii) Setting f0(z) = f1(z) gives β0 + β1z = (β0 + δ0) + (β1 + δ1)z. Therefore, provided δ1 ≠ 0, we have z = −δ0/δ1. Clearly, z is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.

• (iii) Using part (ii), we have totcoll = .357/.030 = 11.9 years (totcoll = total college years).
• (iv) The estimated number of years of college at which women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed, not even close. In fact, at four years of college, the difference in predicted log wage is still −.357 + .030(4) = −.237, or about 21.1% less for women.

Q4. Wooldridge C7.13 (apple_c7_13.do)
• (i) 412/660 ≈ .624.
• (ii) The OLS estimates of the LPM are
ecobuy-hat = .424 − .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize
(.165) (.109) (.132) (.00053) (.013)
+ .025 educ − .00050 age
(.008) (.00125)
n = 660, R² = .110.
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)
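In an LPM, the change in the probability from a change in one regressor is simply the coefficient times the change, which is how the .080 figure above is obtained. A one-line Python check (our own sketch, with the −.803 coefficient from the estimates above):

```python
def lpm_prob_change(beta, dx):
    """In a linear probability model, the change in P(y = 1) from a
    change dx in a regressor is simply beta * dx (other factors fixed)."""
    return beta * dx

# A 10-cent ($0.10) increase in ecoprc, whose coefficient is about -0.803
print(round(lpm_prob_change(-0.803, 0.10), 3))  # about -0.08
```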

• (iii) The F test, with 4 and 653 df, is 4.43 with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children (other factors equal), the couple with two children has a .048 higher probability of buying eco-labeled apples.

• (iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in the functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule (predict one when ecobuy-hat ≥ .5 and zero otherwise) gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
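The fractions correctly predicted in part (vi) come from a simple 2×2 tally of actual versus predicted outcomes. A quick Python check (our own sketch; the counts 102 of 248 zeros and 340 of 412 ones are from the answer above):

```python
def pct_correct(n00, n01, n10, n11):
    """Fractions correctly predicted from a 2x2 table, where n_ab is the
    count of observations with actual outcome a and predicted outcome b."""
    frac0 = n00 / (n00 + n01)          # correct among actual zeros
    frac1 = n11 / (n10 + n11)          # correct among actual ones
    overall = (n00 + n11) / (n00 + n01 + n10 + n11)
    return frac0, frac1, overall

# ecobuy: 102 of 248 zeros and 340 of 412 ones predicted correctly
f0, f1, ov = pct_correct(102, 248 - 102, 412 - 340, 340)
print(round(f0, 3), round(f1, 3), round(ov, 2))  # 0.411 0.825 0.67
```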


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What is heteroskedasticity in a regression model?
When homoskedasticity (MLR5) fails and the variance of the disturbance (u) changes across the observation index (i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with OLS under heteroskedasticity?
The usual t-stat and F-stat are not valid, because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator is no longer (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA?
These are corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, they are easily obtained by using the option "robust" with the "regress" command, e.g., regress lwage educ, robust.

• How do you detect whether heteroskedasticity is present?
The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.

• If heteroskedasticity is present in a known form, how would you estimate the model?
In this case, the WLS estimator should be used, as it is more efficient than the OLS estimator. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model?
If there is strong evidence of heteroskedasticity, the FGLS estimator should be used, in which the unknown variance function is modelled with an exponential functional form. See Section 8.4 for details.

• What are the steps in the FGLS estimation?
You should summarise these from Section 8.4.

• How would you handle the heteroskedasticity of the LPM?
The heteroskedasticity functional form is known for the LPM, hence WLS can be used in principle. However, because the LPM can produce predicted probabilities outside the interval (0,1), the known functional form p(x)[1 − p(x)] may not be usable for WLS. In that case, it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2. Wooldridge 8.2
• With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc:
beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.

21

Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and of the functional forms of the explanatory variables in the original equation.
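The WLS transformation in Q2 (divide every variable by √h(x) = inc) can be illustrated numerically. A small Python sketch (our own; the observation values are made up), showing that the slope variable inc indeed becomes the constant after the transformation:

```python
import math

def wls_transform(row, hx):
    """Divide one observation's variables by sqrt(h(x)), so that the
    transformed error is homoskedastic; here h(x) = inc^2, sqrt = inc."""
    w = math.sqrt(hx)
    return {k: v / w for k, v in row.items()}

# One (hypothetical) observation: beer, a constant, inc, price, educ, female
obs = {"beer": 20.0, "const": 1.0, "inc": 50.0, "price": 3.0,
       "educ": 12.0, "female": 1.0}
t = wls_transform(obs, hx=obs["inc"] ** 2)
print(t["inc"])  # 1.0 -- the slope variable has become the constant
```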

Q3. Wooldridge 8.3
• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has a smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4. Wooldridge 8.5
• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.
• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.
• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.
• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.

• (v) We just plug the values of the independent variables into the OLS regression line:
smokes-hat = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052.
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)
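The plug-in calculation in part (v) can be reproduced in Python (our own sketch; the function name is made up, the coefficients are those reported in the answers above, and restaurn is set to zero as in the plug-in, with its −.101 coefficient from part (iv) included for completeness):

```python
import math

def smokes_fitted(cigpric, income, educ, age, restaurn=0.0):
    """Fitted value from the estimated LPM for smoking, using the
    coefficients reported above; restaurn defaults to 0 as in part (v)."""
    return (0.656 - 0.069 * math.log(cigpric) + 0.012 * math.log(income)
            - 0.029 * educ + 0.020 * age - 0.00026 * age ** 2
            - 0.101 * restaurn)

p = smokes_fitted(cigpric=67.44, income=6500.0, educ=16.0, age=77.0)
print(round(p, 3))  # about 0.005 -- essentially a zero probability
```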

Q5. Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:
e401k-hat = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .035 male
(.081) (.0006) (.000005) (.0039) (.00005) (.121)
[.079] [.0006] [.000005] [.0038] [.00004] [.121]
n = 9,275, R² = .094.
There are no important differences; if anything, the robust standard errors are smaller.

• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model, u² = δ0 + δ1p(x) + δ2p(x)² + v, with the restrictions δ0 = 0, δ1 = 1 and δ2 = −1. Remember that, for the LPM, the fitted values ŷ are estimates of p(x). So, when we run the regression of û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.

• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on e401k-hat is about 1.010, the coefficient on e401k-hat² about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are
e401k-hat = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .055 male
(.076) (.0005) (.000004) (.0037) (.00004) (.117)
n = 9,275, R² = .108.
There are no important differences from the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)
• What is functional form misspecification? What is its major consequence in regression analysis?
As the word "misspecification" suggests, it refers to a model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is misspecified if the partial effect of age on log wage is inverted-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification?
RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). RESET is based on the F test for the joint significance of the squared and cubed predicted values (ŷ) in an "expanded" model that includes all the original regressors as well as the squared and cubed ŷ, where the latter represent possible nonlinear functions of the regressors.

• What are nested models? And nonnested models?
Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another?
The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test between two nonnested models?
There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models, and decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis?
A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved one and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, depends only on the proxy.

• Can you analyse the consequences of measurement errors?
With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis?
Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?
Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR1, MLR3 and MLR4, and hence does not introduce bias into the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable, and causes bias in the OLS estimation.

• What are outliers and influential observations?
Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of the parameters. They are called influential observations if the estimation results when including them are significantly different from the results when excluding them.

• Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1, 2, …, n, where the "slope parameter" contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR1-5 hold for yi = α + βxi + ui, E(bi|xi) = 0 and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference?
As the model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui), given the stated assumptions, MLR1-4 hold for this model and the OLS estimator of the "slope parameter" is unbiased (from β). However, MLR5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui | xi), depends on xi (it equals ω²xi² + σ² when bi and ui are uncorrelated), and hence the usual OLS standard errors will not be valid.
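A small Monte Carlo sketch (ours, not part of the tutorial program; all parameter values are arbitrary) illustrates the first point: with a random slope, the OLS slope estimate still averages out to β across replications, even though the overall error bixi + ui is heteroskedastic.

```python
import random

def ols_slope(x, y):
    """Simple-regression OLS slope (with an intercept)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

random.seed(1)
alpha, beta, omega, sigma, n, reps = 1.0, 2.0, 0.5, 1.0, 200, 500
slopes = []
for _ in range(reps):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    # random slope: y_i = alpha + (beta + b_i) x_i + u_i
    y = [alpha + (beta + random.gauss(0.0, omega)) * xi
         + random.gauss(0.0, sigma) for xi in x]
    slopes.append(ols_slope(x, y))
print(round(sum(slopes) / reps, 1))  # close to beta = 2.0: unbiased
```

The spread of the individual slope estimates, however, reflects the xi-dependent error variance, which is why the usual homoskedasticity-based standard errors are invalid here.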


Problem Set

Q1 Wooldridge 101 bull (i) Disagree Most time series processes are correlated over time and many of them strongly

correlated This means they cannot be independent across observations which simply represent different time periods Even series that do appear to be roughly uncorrelated ndash such as stock returns ndash do not appear to be independently distributed as they have dynamic forms of heteroskedasticity

bull (ii) Agree This follows immediately from Theorem 101 In particular we do not need the homoskedasticity and no serial correlation assumptions

bull (iii) Disagree Trending variables are used all the time as dependent variables in a regression model We do need to be careful in interpreting the results because we may simply find a spurious association between yt and trending explanatory variables Including a trend in the regression is a good idea with trending dependent or independent variables As discussed in Section 105 the usual R-squared can be misleading when the dependent variable is trending

bull (iv) Agree With annual data each time period represents a year and is not associated with any season

Q2 Wooldridge 102 bull We follow the hint and write

gGDPt-1 = α0 + δ0 intt-1 + δ1intt-2 + ut-1 and plug this into the right-hand-side of the intt equation int t = γ0 + γ1( α0 + δ0int t-1 + δ1int t-2 + ut-1 ndash 3) + vt = (γ0 + γ1 α0 ndash 3 γ1) + γ1 δ0int t-1 + γ1 δ1int t-2 + γ1ut-1 + vt Now by assumption u t-1 has zero mean and is uncorrelated with all right-hand-side variables in the previous equation except itself of course So Cov(intt ut-1) = E(int t sdot ut-1) = γ1E(ut-12) gt 0 because γ1 gt 0 If σ2 = E(ut2 ) for all t then Cov(intt ut-1) = γ1 σ2 This violates the strict exogeneity assumption TS3 While u t is uncorrelated with intt intt-1 and so on ut is correlated with intt+1

28

Q3 Wooldridge 107 bull (i) pet-1 and pet-2 must be increasing by the same amount as pet bull (ii) The long-run effect by definition should be the change in gfr when pe increases

permanently But a permanent increase means the level of pe increases and stays at the new level and this is achieved by increasing pet-1 pet-1 and pet by the same amount

Q4 Wooldridge C1010 (intdef_c10_10do) bull (i) The sample correlation between inf and def is only about 098 which is pretty small

Perhaps surprisingly inflation and the deficit rate are practically uncorrelated over this period Of course this is a good thing for estimating the effects of each variable on i3 as it implies almost no multicollinearity

bull (ii) The equation with the lags is i3t = 161 + 343 inf t + 382 inf t-1 minus 190 def t + 569 def t-1 (040) (125) (134) (221) (197) n = 55 R2 = 685 adjR2 = 660

bull (iii) The estimated LRP of i3 with respect to inf is 343 + 382 = 725 which is somewhat larger than 606 which we obtain from the static model in (1015) But the estimates are fairly close considering the size and significance of the coefficient on inft-1

bull (iv) The F statistic for significance of inf t-1 and def t-1 is about 522 with p-value asymp 009 So they are jointly significant at the 1 level It seems that both lags belong in the model

• (v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.

• (vi) Using the standard prediction rule – predict one when the fitted probability is at least .5 and zero otherwise – gives the fraction correctly predicted for ecobuy = 0 as 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
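The .5 prediction rule and the within-group fractions correctly predicted take only a few lines to compute. This is a minimal Python sketch with made-up fitted probabilities, not the actual ecobuy data:

```python
# Usual LPM prediction rule: classify y-hat >= .5 as 1, else 0.
# The tiny arrays below are invented for illustration only.
yhat = [0.21, 0.48, 0.55, 0.73, 0.95, 1.02]   # fitted probabilities (LPM fits can exceed 1)
y    = [0,    0,    1,    0,    1,    1   ]   # observed outcomes

pred = [1 if p >= 0.5 else 0 for p in yhat]

# Fraction correctly predicted within each outcome group, and overall
correct_0 = sum(1 for yi, pi in zip(y, pred) if yi == 0 and pi == 0) / y.count(0)
correct_1 = sum(1 for yi, pi in zip(y, pred) if yi == 1 and pi == 1) / y.count(1)
overall   = sum(1 for yi, pi in zip(y, pred) if yi == pi) / len(y)
```

As in the answer above, the rule can predict the y = 1 group much better than the y = 0 group even when the overall rate looks moderate.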


Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

• What is heteroskedasticity in a regression model?
When homoskedasticity (MLR.5) fails and the variance of the disturbance (u) changes with the observation index (i), we say that heteroskedasticity is present.

• In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with OLS under heteroskedasticity?
The usual t-stat and F-stat are not valid because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator is no longer (asymptotically) efficient, and there are better estimators.

• What are the heteroskedasticity-robust standard errors? How do you use them in STATA?
These are corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, they are easily obtained by adding the option "robust" to the "regress" command, e.g., regress lwage educ, robust
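For readers working outside STATA, the robust ("sandwich") standard errors can be computed directly. A minimal numpy sketch on simulated data (the variables here are invented for illustration, not a real wage dataset):

```python
import numpy as np

# White heteroskedasticity-robust standard errors for OLS, basic version
# (no small-sample correction), on simulated data with Var(u|educ) growing in educ.
rng = np.random.default_rng(0)
n = 500
educ = rng.uniform(8, 18, n)
u = rng.normal(0, 1, n) * educ          # heteroskedastic disturbance
lwage = 0.5 + 0.08 * educ + u

X = np.column_stack([np.ones(n), educ])
beta = np.linalg.solve(X.T @ X, X.T @ lwage)        # OLS estimates
uhat = lwage - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Usual OLS variance estimate: s^2 (X'X)^{-1}
usual_V = (uhat @ uhat / (n - 2)) * XtX_inv
# White robust sandwich: (X'X)^{-1} [sum uhat_i^2 x_i x_i'] (X'X)^{-1}
robust_V = XtX_inv @ (X.T * uhat**2) @ X @ XtX_inv

usual_se, robust_se = np.sqrt(np.diag(usual_V)), np.sqrt(np.diag(robust_V))
```

The t-stats formed with robust_se are the valid ones under heteroskedasticity, mirroring what `regress ..., robust` reports.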

• How do you detect if there is heteroskedasticity?
The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.
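The Breusch-Pagan idea – regress the squared OLS residuals on the regressors and compare LM = n·R² with a chi-squared critical value – can be sketched as follows, on simulated data built to be heteroskedastic:

```python
import numpy as np

# Breusch-Pagan test sketch: auxiliary regression of uhat^2 on the regressors,
# LM = n * R^2, compared with chi2(k). Data are simulated for illustration.
rng = np.random.default_rng(1)
n = 400
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, 1 + 0.3 * x)      # Var(u|x) increases with x

X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)               # OLS on the original model
u2 = (y - X @ b) ** 2                               # squared residuals

g = np.linalg.solve(X.T @ X, X.T @ u2)              # auxiliary regression of u^2 on x
fit = X @ g
r2 = 1 - np.sum((u2 - fit) ** 2) / np.sum((u2 - u2.mean()) ** 2)
LM = n * r2     # compare with the chi2(1) critical value, e.g. 3.84 at the 5% level
```

With the strong heteroskedasticity built into the simulation, LM comfortably exceeds the 5% critical value.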

• If heteroskedasticity is present in a known form, how would you estimate the model?
In this case the WLS estimator should be used, which is more efficient than the OLS estimator. See Section 8.4 for details.

• If heteroskedasticity is present in an unknown form, how would you estimate the model?
If there is strong evidence of heteroskedasticity, the FGLS estimator should be used, which is based on an exponential variance function. See Section 8.4 for details.

• What are the steps in the FGLS estimation?
You should summarise them from Section 8.4.
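As a rough illustration of those steps – OLS residuals, regress log(û²) on the regressors, exponentiate the fitted values, then apply WLS with weights 1/ĥ – here is a sketch on simulated data, following the exponential-variance version of FGLS from Section 8.4:

```python
import numpy as np

# FGLS sketch on simulated data:
#   1) OLS of y on x, save residuals uhat
#   2) regress log(uhat^2) on x, save fitted values
#   3) hhat = exp(fitted values)
#   4) WLS with weights 1/hhat
rng = np.random.default_rng(2)
n = 600
x = rng.uniform(1, 5, n)
y = 2.0 + 1.0 * x + rng.normal(0, 1, n) * x        # Var(u|x) = x^2

X = np.column_stack([np.ones(n), x])
b_ols = np.linalg.solve(X.T @ X, X.T @ y)                  # step 1
uhat = y - X @ b_ols
g = np.linalg.solve(X.T @ X, X.T @ np.log(uhat ** 2))      # step 2
hhat = np.exp(X @ g)                                       # step 3
w = 1.0 / hhat                                             # step 4: weighted least squares
b_fgls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
```

The FGLS slope estimate stays close to the true value of 1 while using the estimated variance function, which is the point of the procedure when the form of heteroskedasticity is unknown.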

• How would you handle the heteroskedasticity of the LPM?
The heteroskedasticity function is known for the LPM, so WLS can be used in principle. However, because the LPM can produce predicted probabilities outside the interval (0,1), the known functional form p(x)[1 − p(x)] may not be usable for WLS. In that case it may be necessary to use FGLS for the LPM.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2 Wooldridge 8.2
• With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and the transformed equation is obtained by dividing the original equation by inc:
beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.
Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and of the functional forms of the explanatory variables in the original equation.

Q3 Wooldridge 8.3
• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR.4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR.4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has the smaller bias: WLS may have more bias than OLS, or less. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4 Wooldridge 8.5
• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.
• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116.
• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.
• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.
• (v) We just plug the values of the independent variables into the OLS regression line:
smokes-hat = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052.
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)

Q5 Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]:
e401k-hat = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male
(.081) (.0006) (.000005) (.0039) (.00005) (.0121)
[.079] [.0006] [.000005] [.0038] [.00004] [.0121]
n = 9,275, R² = .094.
There are no important differences; if anything, the robust standard errors are smaller.
• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model u² = δ0 + δ1 p(x) + δ2 p(x)² + v, with the restrictions δ0 = 0, δ1 = 1 and δ2 = −1. Remember that, for the LPM, the fitted values y-hat are estimates of p(x). So, when we run the regression of u-hat² on y-hat and y-hat² (including an intercept), the intercept estimate should be close to zero, the coefficient on y-hat should be close to one, and the coefficient on y-hat² should be close to −1.
• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on e401k-hat is about 1.010, the coefficient on e401k-hat² is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).
• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are
e401k-hat = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male
(.076) (.0005) (.000004) (.0037) (.00004) (.0117)
n = 9,275, R² = .108.
There are no important differences from the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)

• What is functional form misspecification? What is its major consequence in regression analysis?
As the word "misspecification" suggests, it refers to a model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is misspecified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence the consequences of omitting important variables apply, the major one being estimation bias.

• How would you test for functional form misspecification?
RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). RESET is based on the F-test for the joint significance of the squared and cubed predicted values (y-hat) in an "expanded" model that includes all the original regressors as well as the squared and cubed y-hat, where the latter represent possible nonlinear functions of the regressors.
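The RESET procedure just described can be sketched in a few lines; the data here are simulated so that the fitted linear model omits a genuine quadratic term:

```python
import numpy as np

# RESET sketch: fit the model, then test joint significance of yhat^2 and yhat^3
# in an expanded regression via the R-squared form of the F statistic.
rng = np.random.default_rng(3)
n = 300
x = rng.uniform(0, 4, n)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 1, n)    # true model is quadratic

def r2(y, X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return 1 - e @ e / ((y - y.mean()) @ (y - y.mean()))

X_r = np.column_stack([np.ones(n), x])                  # misspecified linear model
yhat = X_r @ np.linalg.lstsq(X_r, y, rcond=None)[0]
X_ur = np.column_stack([X_r, yhat**2, yhat**3])         # expanded model

r2_r, r2_ur = r2(y, X_r), r2(y, X_ur)
q, df = 2, n - X_ur.shape[1]
F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df)   # compare with an F(2, df) critical value
```

Because the omitted quadratic is strong here, F lands far above the 5% critical value (about 3.03 with these df), correctly flagging the misspecification.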

• What are nested models? And nonnested models?
Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another?
The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test two nonnested models?
There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models, and to decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis?
A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved one and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, depends only on the proxy.

• Can you analyse the consequences of measurement errors?
With a careful reading of Section 9.4, you should be able to analyse them.

• In what circumstances will missing observations cause major concerns in regression analysis?
Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?
Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR.1, MLR.3 and MLR.4, and hence does not introduce bias in the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable and causes bias in the OLS estimation.

• What are outliers and influential observations?
Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS parameter estimates. They are called influential observations if the estimation results with them included differ significantly from the results with them excluded.

• Consider the simple regression model yi = α + (β + bi)xi + ui, for i = 1, 2, …, n, where the "slope parameter" contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR.1-5 hold for yi = α + βxi + ui, E(bi|xi) = 0 and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference?
The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui). Given the stated assumptions, MLR.1-4 hold for this model, and the OLS estimator of the "slope parameter" is unbiased (from β). However, MLR.5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui|xi), depends on xi, so the usual OLS standard errors are not valid.

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 9.1
• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test:
F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] ≈ 2.97.
With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
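The arithmetic of the R-squared form of the F statistic, laid out step by step:

```python
# R-squared form of the F statistic from the answer above
r2_ur, r2_r = 0.375, 0.353   # R-squared with and without ceoten^2, comten^2
q, df = 2, 177 - 8           # numerator restrictions and denominator df

F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df)   # approximately 2.97
```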

Q2 Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.
• (ii) We can use our usual reasoning about omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So, when we control for the poverty rate, the effect of spending falls.
• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.
• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.
• (v) In column (1) we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2) we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than is spending per student or other school characteristics.

Q3 Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report their crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.
• (ii) The simple regression estimates using the 1988 data are
log(scrap) = .409 + .057 grant
(.241) (.406)
n = 54, R² = .0004.
The coefficient on grant is actually positive, but not statistically different from zero.
• (iii) When we add log(scrap87) to the equation, we obtain
log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
(.089) (.147) (.044)
n = 54, R² = .873,
where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.
• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller than before, but still pretty significant.


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

• What are the main features of time series data? How do time series data differ from cross-sectional data?
The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations depend on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation?
A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (and hence has a distribution). The SP can be viewed as a "random curve" with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", i.e., of the SP.

• What is serial correlation or autocorrelation?
The serial (or auto) correlation of a time-series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted Corr(yt, yt−h) = Cov(yt, yt−h)/[Var(yt)Var(yt−h)]^(1/2).
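A sample analogue of this formula, applied to a simulated AR(1) series for illustration:

```python
import numpy as np

# Sample autocorrelation Corr(y_t, y_{t-h}) for a simulated AR(1) series
rng = np.random.default_rng(4)
T = 5000
y = np.empty(T)
y[0] = 0.0
for t in range(1, T):
    y[t] = 0.8 * y[t - 1] + rng.normal()    # AR(1) with rho = 0.8

def autocorr(y, h):
    # align (y_t, y_{t-h}) pairs and form the sample correlation
    y0, yh = y[h:] - y.mean(), y[:-h] - y.mean()
    return np.sum(y0 * yh) / np.sqrt(np.sum(y0**2) * np.sum(yh**2))

rho1 = autocorr(y, 1)   # should be near the true value 0.8
```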

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)?
Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.
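As a brief illustration alongside those readings: in a finite distributed lag model y_t = α + δ0 z_t + δ1 z_{t-1} + δ2 z_{t-2} + u_t, the LRP is δ0 + δ1 + δ2. In STATA, the LRP and its standard error are typically obtained with `lincom` after `regress`; the sketch below, on simulated data, just computes the point estimate by OLS:

```python
import numpy as np

# Finite distributed lag model on simulated data; LRP = d0 + d1 + d2 (true 0.9)
rng = np.random.default_rng(5)
T = 400
z = rng.normal(0, 1, T)
u = rng.normal(0, 0.5, T)
d0, d1, d2 = 0.5, 0.3, 0.1
y = 1.0 + d0 * z[2:] + d1 * z[1:-1] + d2 * z[:-2] + u[2:]   # lose 2 obs to lags

# OLS of y_t on z_t, z_{t-1}, z_{t-2}
X = np.column_stack([np.ones(T - 2), z[2:], z[1:-1], z[:-2]])
b = np.linalg.lstsq(X, y, rcond=None)[0]
LRP = b[1] + b[2] + b[3]        # estimated long-run propensity
```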

• What are TS.1-6 (the assumptions about time series regression)? How do they differ from the assumptions MLR.1-6?
TS.1-6 comprise 1) linearity in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR.1-6 mainly in that MLR.2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS.3 is needed. To ensure the validity of the OLS standard errors, t-stat and F-stat for statistical inference, TS.3-TS.6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors?
Strictly exogenous regressors satisfy assumption TS.3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS.3.

• What is a trending time series? What is a time trend?
A time series is trending if it has a tendency to grow (or shrink) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results?
First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious because it is purely induced by the time trends and has little to say about the true relationship between the two time series.
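This effect is easy to reproduce by simulation: two unrelated series built around the same linear trend yield a large slope when one is regressed on the other, and the slope collapses toward zero once a time trend is included:

```python
import numpy as np

# Spurious regression sketch: y and x are unrelated apart from a common
# deterministic time trend.
rng = np.random.default_rng(6)
T = 200
t = np.arange(T)
y = 0.05 * t + rng.normal(0, 1, T)      # trending, unrelated to x
x = 0.05 * t + rng.normal(0, 1, T)

# Regression of y on x alone: the slope picks up the shared trends
X1 = np.column_stack([np.ones(T), x])
b1 = np.linalg.lstsq(X1, y, rcond=None)[0]

# Regression of y on x and the time trend: the slope on x collapses toward zero
X2 = np.column_stack([np.ones(T), x, t])
b2 = np.linalg.lstsq(X2, y, rcond=None)[0]
```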

• Why would you include a time trend in regressions with trending variables?
Including a time trend in such regressions allows one to study the relationship among the time series variables that is not induced by the time trends in the series.

• What is seasonality in a time series? Give an example of a time series variable with seasonality.
Seasonality in a time series is fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have day-of-the-week seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, taking the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
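A minimal sketch of the dummy construction, using a made-up sequence of quarters:

```python
# Quarterly seasonal dummies with Q1 as the base quarter
quarters = [1, 2, 3, 4, 1, 2, 3, 4]   # illustrative two-year quarterly index

Q2 = [1 if q == 2 else 0 for q in quarters]
Q3 = [1 if q == 3 else 0 for q in quarters]
Q4 = [1 if q == 4 else 0 for q in quarters]
# These three columns (plus an intercept) enter the regression;
# Q1 is the omitted base category.
```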

Problem Set

Q1 Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated – such as stock returns – do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.
• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no-serial-correlation assumptions.
• (iii) Disagree. Trending variables are used all the time as dependent variables in regression models. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.
• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2
• We follow the hint and write
gGDP_{t-1} = α0 + δ0 int_{t-1} + δ1 int_{t-2} + u_{t-1},
and plug this into the right-hand side of the int_t equation:
int_t = γ0 + γ1(α0 + δ0 int_{t-1} + δ1 int_{t-2} + u_{t-1} − 3) + v_t
= (γ0 + γ1α0 − 3γ1) + γ1δ0 int_{t-1} + γ1δ1 int_{t-2} + γ1 u_{t-1} + v_t.
Now, by assumption, u_{t-1} has zero mean and is uncorrelated with all the right-hand-side variables in the previous equation, except itself of course. So
Cov(int_t, u_{t-1}) = E(int_t · u_{t-1}) = γ1 E(u_{t-1}²) > 0,
because γ1 > 0. If σ² = E(u_t²) for all t, then Cov(int_t, u_{t-1}) = γ1σ². This violates the strict exogeneity assumption TS.3: while u_t is uncorrelated with int_t, int_{t-1}, and so on, u_t is correlated with int_{t+1}.


Q3 Wooldridge 10.7
• (i) pe_{t-1} and pe_{t-2} must be increasing by the same amount as pe_t.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pe_{t-2}, pe_{t-1} and pe_t by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.
• (ii) The equation with the lags is
i3_t = 1.61 + .343 inf_t + .382 inf_{t-1} − .190 def_t + .569 def_{t-1}
(0.40) (.125) (.134) (.221) (.197)
n = 55, R² = .685, adj. R² = .660.
• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than the .606 we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inf_{t-1}.
• (iv) The F statistic for the joint significance of inf_{t-1} and def_{t-1} is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.

Page 20: UNSW -Econ2206 Solutions Semester 1 2011 - Introductory Eco No Metrics

20

Week 9 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes) bull What is heteroskedasticity in a regression model

When homoskedasticity (MLR5) fails and the variance of the disturbance (u) changes across observation index (i) we say that heteroskedasticity is present

bull In the presence of heteroskedasticity are the t-stat and F-stat from the usual OLS still valid Why Are there any other problems with the OLS under heteroskedasticity The usual t-stat and F-stat are not valid because the usual OLS standard errors are incorrect under heteroskedasticity Further the OLS estimator will no longer be (asymptotically) efficient and there are better estimators

bull What are the heteroskedasticity-robust standard errors How do you use them in STATA These are the corrected standard errors that take into account the possible presence of heteroskedasticity The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics In STATA these are easily obtained by using the option ldquorobustrdquo with the ldquoregressrdquo command eg regress lwage educ robust

bull How do you detect if there is heteroskedasticity The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity See Section 83 for details

bull If heteroskedasticity is present in a known form how would you estimate the model In this case the WLS estimators should be used which is more efficient than the OLS estimators See Section 84 for details

bull If heteroskedasticity is present in an unknown form how would you estimate the model If there is strong evidence for heteroskedasticity the FGLS estimators should be used which is based on an exponential functional form See Section 84 for details

bull What are the steps in the FGLS estimation You should summarise from Section 84

bull How would you handle the heteroskedasticity of the LPM The heteroskedasticity functional form is known for the LPM Hence the WLS can be used in principle However because the LPM can produce predicted probabilities that are outside the interval (01) the known functional form p(x)[1-p(x)] may not be useful for the WLS In that case it may be necessary to use FGLS for the LPM

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 8.1
• Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2. Wooldridge 8.2
• With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore √h(x) = inc, and the transformed equation is obtained by dividing the original equation by inc: beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.
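The division by inc can be done mechanically before estimation. A minimal pure-Python sketch with invented numbers (the variable ordering below is hypothetical):

```python
# hypothetical data rows: (beer, inc, price, educ, female)
rows = [(3.0, 40.0, 2.5, 12, 1), (5.0, 80.0, 3.0, 16, 0)]

# divide every term of the equation by inc = sqrt(h(x)); the transformed
# regressors are 1/inc (for beta0), a constant 1 (for beta1), price/inc, etc.
transformed = [(beer / inc, 1.0 / inc, 1.0, price / inc, educ / inc, female / inc)
               for beer, inc, price, educ, female in rows]
print(transformed[0])
```

Running OLS on the transformed data is numerically identical to WLS with weights 1/inc² on the original data.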


Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.

Q3. Wooldridge 8.3
• False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR.4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR.4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has the smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve "biases" associated with OLS.

Q4. Wooldridge 8.5
• (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.
• (ii) The effect is −.029(4) = −.116, so the probability of smoking falls by about .116, i.e. 11.6 percentage points.
• (iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.
• (iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.
• (v) We just plug the values of the independent variables into the OLS regression line: smokes-hat = .656 − .069 log(67.44) + .012 log(6,500) − .029(16) + .020(77) − .00026(77²) ≈ .0052. Thus the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)

Q5. Wooldridge C8.10 (401ksubs_c8_10.do)
• (i) In the following equation, estimated by OLS, the usual standard errors are in (·) and the heteroskedasticity-robust standard errors are in [·]: e401k-hat = −.506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male, with usual standard errors (.081), (.0006), (.000005), (.0039), (.00005), (.0121) and robust standard errors [.079], [.0006], [.000005], [.0038], [.00004], [.0121]; n = 9,275, R² = .094. There are no important differences; if anything, the robust standard errors are smaller.
• (ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model u² = δ0 + δ1 p(x) + δ2 p(x)² + v, with the restrictions δ0 = 0, δ1 = 1 and δ2 = −1. Remember that, for the LPM, the fitted values ŷ are estimates of p(x). So when we run the regression û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.
• (iii) The White LM statistic and F statistic are about 581.9 and 310.32, respectively, both of which are very significant. The coefficient on e401k-hat is about 1.010, the coefficient on e401k-hat² is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


• (iv) The smallest fitted value is about .030 and the largest is about .697. The WLS estimates of the LPM are e401k-hat = −.488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male, with standard errors (.076), (.0005), (.000004), (.0037), (.00004), (.0117); n = 9,275, R² = .108. There are no important differences from the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant using either estimation method.


Week 10 Tutorial Exercises

Readings
• Read Chapter 9 thoroughly.
• Make sure that you know the meanings of the Key Terms at the end of the chapter.

Review Questions (these may or may not be discussed in tutorial classes)
• What is functional form misspecification? What is its major consequence in regression analysis? As the word "misspecification" suggests, it is a mis-specified model with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage is upside-down-U shaped. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence the consequences of omitting important variables apply, and the major one is estimation bias.

• How would you test for functional form misspecification? RESET may be used to test for functional form misspecification, more specifically for neglected nonlinearities (nonlinear functions of the regressors). RESET is based on the F-test for the joint significance of the squared and cubed predicted values (ŷ² and ŷ³) in an "expanded" model that includes all the original regressors as well as ŷ² and ŷ³, where the latter represent possible nonlinear functions of the regressors.

• What are nested models? And nonnested models? Two models are nested if one is a restricted version of the other (by restricting the values of parameters). Two models are nonnested if neither nests the other.

• What is the purpose of testing one model against another? The purpose of testing one model against another is to select a "best" or "preferred" model for statistical inference and prediction.

• How would you test two nonnested models? There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models, and decide which model can be "excluded". The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

• What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis? A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved one and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, depends only on the proxy.

• Can you analyse the consequences of measurement errors? With a careful reading of Section 9.4, you should be able to.

• In what circumstances will missing observations cause major concerns in regression analysis? Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.

• What is exogenous sample selection? What is endogenous sample selection?


Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR.1, 3 and 4, and hence does not introduce bias into the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable, and causes bias in the OLS estimation.

• What are outliers and influential observations? Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of parameters. They are called influential observations if the estimation results when including them are significantly different from the results when excluding them.

• Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1, 2, …, n, where the "slope parameter" contains a random variable bi, and α and β are constant parameters. Assume that the usual MLR.1-5 hold for yi = α + βxi + ui, E(bi|xi) = 0 and Var(bi|xi) = ω². If we regress y on x with an intercept, will the estimator of the "slope parameter" be biased from β? Will the usual OLS standard errors be valid for statistical inference? The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui). Given the stated assumptions, MLR.1-4 hold for this model and the OLS estimator of the "slope parameter" is unbiased (from β). However, MLR.5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui|xi), depends on xi, and so the usual OLS standard errors will not be valid.

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 9.1
• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test: F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] ≈ 2.97. With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
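Assuming the decimal points lost in extraction (R-squareds of .375 and .353), the F statistic is simple arithmetic and can be checked directly:

```python
# R-squared form of the F statistic for the two exclusion restrictions
r2_ur, r2_r = 0.375, 0.353   # unrestricted and restricted R-squareds
q, df = 2, 177 - 8           # 2 restrictions; denominator df = n - k - 1 = 169
F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df)
print(round(F, 2))  # 2.97
```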

Q2. Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.
• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward-biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So when we control for the poverty rate, the effect of spending falls.
• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1) we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2) we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than spending per student or other school characteristics.

Q3. Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4. Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.
• (ii) The simple regression estimates using the 1988 data are log(scrap) = .409 + .057 grant, with standard errors (.241) and (.406); n = 54, R² = .0004. The coefficient on grant is actually positive, but not statistically different from zero.
• (iii) When we add log(scrap87) to the equation, we obtain log(scrap88) = .021 − .254 grant88 + .831 log(scrap87), with standard errors (.089), (.147) and (.044); n = 54, R² = .873, where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant88 = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df from Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant88 < 0 at the 5% level.
• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller in magnitude than before, but still pretty significant.
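Assuming the decimal points lost in extraction (a grant88 coefficient of −.254 with usual and robust standard errors .147 and .142, and a log(scrap87) coefficient of .831 with standard errors .044 and .071), the quoted t statistics can be reproduced in a few lines:

```python
# t statistics for parts (iii)-(v)
t_grant = -0.254 / 0.147           # grant88, usual se
t_unit = (0.831 - 1) / 0.044       # H0: coefficient on log(scrap87) equals 1
t_grant_rob = -0.254 / 0.142       # grant88, robust se
t_unit_rob = (0.831 - 1) / 0.071   # unit-coefficient test, robust se
print([round(t, 2) for t in (t_grant, t_unit, t_grant_rob, t_unit_rob)])
# [-1.73, -3.84, -1.79, -2.38]
```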


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data? The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations are dependent on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation? A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (and hence has a distribution). The SP can be viewed as a "random curve", with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation, or autocorrelation? The serial (or auto) correlation of a time series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial correlation of yt is usually denoted Corr(yt, yt−h) = Cov(yt, yt−h)/[Var(yt)Var(yt−h)]^(1/2).
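The definition translates directly into code. A minimal pure-Python sketch that computes the lag-h sample autocorrelation and applies it to a simulated persistent series (all parameters invented):

```python
import random

def autocorr(y, h):
    """Sample autocorrelation of y at lag h: Corr(y_t, y_{t-h})."""
    n = len(y)
    m = sum(y) / n
    num = sum((y[t] - m) * (y[t - h] - m) for t in range(h, n))
    den = sum((v - m) ** 2 for v in y)
    return num / den

# an AR(1)-style series y_t = 0.9 y_{t-1} + e_t is strongly serially correlated
random.seed(3)
y = [0.0]
for _ in range(499):
    y.append(0.9 * y[-1] + random.gauss(0, 1))
print(round(autocorr(y, 1), 2))  # typically close to 0.9 for this persistent series
```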

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)? Read ie_Slides10, pages 5-7, and Section 10.2. Try Wooldridge 10.3.

• What are TS.1-6 (assumptions about time series regression)? How do they differ from the assumptions MLR.1-6? TS.1-6 are: 1) linearity in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR.1-6 mainly in that MLR.2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS.3 is needed. To ensure the validity of OLS standard errors, t-stats and F-stats for statistical inference, TS.3-TS.6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors? Strictly exogenous regressors satisfy assumption TS.3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS.3.

• What is a trending time series? What is a time trend? A time series is trending if it has the tendency of growing (or shrinking) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results? First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious because it is purely induced by the time trends and says little about the true relationship between the two time series.

• Why would you include a time trend in regressions with trending variables? Including a time trend in such regressions allows one to study the relationship among the time series variables that is not induced by the time trends in the series.

• What is seasonality in a time series? Give an example of a time series variable with seasonality. Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have week-day seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model? We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
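For example, with Q1 as the base quarter, the three dummies can be generated mechanically from a quarter index. A minimal pure-Python sketch (in Stata, `tabulate quarter, generate(Q)` does something similar):

```python
# quarterly seasonal dummies with Q1 as the base quarter
quarters = [1, 2, 3, 4, 1, 2, 3, 4]          # hypothetical quarter index per observation
Q2 = [1 if q == 2 else 0 for q in quarters]
Q3 = [1 if q == 3 else 0 for q in quarters]
Q4 = [1 if q == 4 else 0 for q in quarters]
print(Q2, Q3, Q4)
```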

Problem Set

Q1. Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated (such as stock returns) do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1; in particular, we do not need the homoskedasticity and no-serial-correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in regression models. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2. Wooldridge 10.2
• We follow the hint and write gGDP_{t-1} = α0 + δ0 int_{t-1} + δ1 int_{t-2} + u_{t-1}, and plug this into the right-hand side of the int_t equation: int_t = γ0 + γ1(α0 + δ0 int_{t-1} + δ1 int_{t-2} + u_{t-1} − 3) + v_t = (γ0 + γ1α0 − 3γ1) + γ1δ0 int_{t-1} + γ1δ1 int_{t-2} + γ1u_{t-1} + v_t. Now, by assumption, u_{t-1} has zero mean and is uncorrelated with all right-hand-side variables in the previous equation, except itself of course. So Cov(int_t, u_{t-1}) = E(int_t · u_{t-1}) = γ1E(u_{t-1}²) > 0, because γ1 > 0. If σ² = E(u_t²) for all t, then Cov(int_t, u_{t-1}) = γ1σ². This violates the strict exogeneity assumption TS.3: while u_t is uncorrelated with int_t, int_{t-1}, and so on, u_t is correlated with int_{t+1}.
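The positive covariance can also be seen by simulating the two-equation system with invented parameter values (γ1 = 0.8 and Var(u) = 1 below, so the theory predicts Cov(int_t, u_{t-1}) = γ1σ² = 0.8):

```python
import random

random.seed(4)
a0, d0, d1 = 3.0, -0.5, -0.2   # hypothetical gGDP-equation parameters
g0, g1 = 1.0, 0.8              # hypothetical policy-rule parameters, gamma1 > 0
T = 20000

u = [random.gauss(0, 1) for _ in range(T)]
v = [random.gauss(0, 1) for _ in range(T)]
gdp, intr = [3.0, 3.0], [1.0, 1.0]
for t in range(2, T):
    # int_t = gamma0 + gamma1*(gGDP_{t-1} - 3) + v_t
    intr.append(g0 + g1 * (gdp[t - 1] - 3) + v[t])
    # gGDP_t = alpha0 + delta0*int_t + delta1*int_{t-1} + u_t
    gdp.append(a0 + d0 * intr[t] + d1 * intr[t - 1] + u[t])

# sample covariance between int_t and u_{t-1}
pairs = [(intr[t], u[t - 1]) for t in range(2, T)]
mi = sum(p[0] for p in pairs) / len(pairs)
mu = sum(p[1] for p in pairs) / len(pairs)
cov = sum((a - mi) * (b - mu) for a, b in pairs) / len(pairs)
print(round(cov, 2))  # positive, approximately gamma1 * Var(u) = 0.8
```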


Q3. Wooldridge 10.7
• (i) pe_{t-1} and pe_{t-2} must be increasing by the same amount as pe_t.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pe_{t-2}, pe_{t-1} and pe_t by the same amount.

Q4. Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.
• (ii) The equation with the lags is i3_t = 1.61 + .343 inf_t + .382 inf_{t-1} − .190 def_t + .569 def_{t-1}, with standard errors (.40), (.125), (.134), (.221) and (.197); n = 55, R² = .685, adjusted R² = .660.
• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than the .606 we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inf_{t-1}.
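Assuming the restored decimals, the LRP is just the sum of the coefficients on current and lagged inf:

```python
# long-run propensity: sum of the coefficients on inf_t and inf_{t-1}
lrp = 0.343 + 0.382
print(round(lrp, 3))  # 0.725
```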

• (iv) The F statistic for the joint significance of inf_{t-1} and def_{t-1} is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.

Page 21: UNSW -Econ2206 Solutions Semester 1 2011 - Introductory Eco No Metrics

21

Notice that β1 which is the slope on inc in the original model is now a constant in the transformed equation This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation

Q3 Wooldridge 83 bull False The unbiasedness of WLS and OLS hinges crucially on Assumption MLR4 and as we

know from Chapter 4 this assumption is often violated when an important variable is omitted When MLR4 does not hold both WLS and OLS are biased Without specific information on how the omitted variable is correlated with the included explanatory variables it is not possible to determine which estimator has a small bias It is possible that WLS would have more bias than OLS or less bias Because we cannot know we should not claim to use WLS in order to solve ldquobiasesrdquo associated with OLS

Q4 Wooldridge 85 bull (i) No For each coefficient the usual standard errors and the heteroskedasticity-robust

ones are practically very similar bull (ii) The effect is minus029(4) = minus116 so the probability of smoking falls by about 116 bull (iii) As usual we compute the turning point in the quadratic 020[2(00026)] asymp 3846 so

about 38 and one-half years bull (iv) Holding other factors in the equation fixed a person in a state with restaurant smoking

restrictions has a 101 lower chance of smoking This is similar to the effect of having four more years of education

bull (v) We just plug the values of the independent variables into the OLS regression line 119904119898119900119896119890119904 = 656 minus 069log(6744)+012log(6500) minus029(16)+020(77) minus00026(77) asymp 0052 Thus the estimated probability of smoking for this person is close to zero (In fact this person is not a smoker so the equation predicts well for this particular observation)

Q5 Wooldridge C810 (401ksubs_c8_10do) bull (i) In the following equation estimated by OLS the usual standard errors are in (sdot) and the

heteroskedasticity-robust standard errors are in [sdot] 119890401119896 = minus506 + 0124 inc minus 000062 inc2 + 0265 age minus 00031 age2 minus 0035 male (081) (0006) (000005) (0039) (00005) (0121) [079] [0006] [000005] [0038] [00004] [0121] n = 9275 R2 = 094 There are no important differences if anything the robust standard errors are smaller

bull (ii) This is a general claim Since Var(y|x) = p(x)[1-p(x)] we can write E(u2|x) = p(x)-p(x)2 Written in error form u2 = p(x) - p(x)2 + v In other words we can write this as a regression model u2 = δ0 + δ1p(x) + δ2p(x)2 + v with the restrictions δ0 = 0 δ1 = 1 and δ2 = -1 Remember that for the LPM the fitted values 119910 are estimates of p(x) So when we run the regression 1199062 on 119910 and 1199102 (including an intercept) the intercept estimates should be close to zero the coefficient on 119910 should be close to one and the coefficient on 1199102 should be close to ndash1

bull (iii) The White LM statistic and F statistic about 5819 and 31032 respectively both of which are very significant The coefficient on 119890401119896 is about 1010 the coefficient on 119890401119896 2 about minus970 and the intercept is about -009 These estimates are quite close to what we expect to find from the theory in part (ii)

22

bull (iv) The smallest fitted value is about 030 and the largest is about 697 The WLS estimates of the LPM are 119890401119896 = minus488 + 0126 inc minus 000062 inc2 + 0255 age minus 00030 age2 minus 0055 male (076) (0005) (000004) (0037) (00004) (0117) n = 9275 R2 = 108 There are no important differences with the OLS estimates The largest relative change is in the coefficient on male but this variable is very insignificant using either estimation method

23

Week 10 Tutorial Exercises

Readings bull Read Chapter 9 thoroughly bull Make sure that you know the meanings of the Key Terms at the chapter end

Review Questions (these may or may not be discussed in tutorial classes) bull What is functional form misspecification What is its major consequence in regression

analysis As the word ldquomisspecificationrdquo suggests it is a mis-specified model with a wrong functional form for explanatory variables For example a log wage model that does not include age2 is mis-specified if the partial effect of age on log wage is upside-down-U shaped In linear regression models the functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables Hence the consequences of omitting important variables apply and the major one is estimation bias

bull How would you test for functional form misspecification The RESET may be used to test for functional form misspecification more specifically for neglected nonlinearities (nonlinear function of regressors) The RESET is based on the F-test for the joint significance of the squared and cubed predicted value (y-hat) in a ldquoexpandedrdquo model that includes all the original regressors as well as the squared and cubed y-hat where the latter represent possible nonlinear functions of the regressors

bull What are nested models And nonnested models Two models are nested if one is a restricted version of the other (by restricting the values of parameters) Two models are nonnested if neither nests the other

bull What is the purpose of testing one model against another The purpose of testing one model against another is to select a ldquobestrdquo or ldquopreferredrdquo model for statistical inference and prediction

bull How would you test for two nonnested models There are two strategies The first is to test exclusion restrictions in an expanded model that nest both candidate models and decide which model can be ldquoexcludedrdquo The second is to test the significance of the predicted value of one model in the other model (known as Davidson-MacKinnon test)

bull What is a proxy variable What are the conditions for a proxy variable to be valid in regression analysis A proxy variable is one that is used to represents the influence of an unobserved (and important) explanatory variable There are two conditions for the validity of a proxy The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved and the proxy) The second is that the conditional mean of the unobserved given other explanatory variables and the proxy only depends on the proxy

bull Can you analyse the consequences of measurement errors With a careful reading of Section 94 you should be able to analyse

bull In what circumstances missing observations will cause major concerns in regression analysis Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample Nonrandom samples may cause bias in the OLS estimation

bull What is exogenous sample selection What is endogenous sample selection

24

The exogenous sample selection is a nonrandom sampling scheme in which a sample is chosen on the basis of the explanatory variables It is easily seen that such sampling scheme does not violate MLR134 and hence does not introduce bias in the OLS estimation On the other hand the endogenous sample selection scheme selects a sample on the basis of the dependent variable and causes bias in the OLS estimation

bull What are outliers and influential observations Outliers are unusual observations that are far away from the centre of other observations A small number of outliers can have large impact on the OLS estimates of parameters These are called influential observations if the estimation results when including them are significantly different from the results when excluding them

bull Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1 2hellip n where the ldquoslop parameterrdquo contains a random variable bi α and β are constant parameters Assume that the usual MLR1-5 hold for yi = α + βxi + ui E(bi|xi) = 0 and Var(bi|xi) = ω2 If we regress y on x with a intercept will the estimator of the ldquoslope parameterrdquo be biased from β Will the usual OLS standard errors be valid for statistical inference As the model can be writted as yi = α + β xi + (bixi + ui) where the overall disturbance is (bixi + ui) Given the stated assumptions MLR1-4 hold for this model and the OLS estimator of the ldquoslope parameterrdquo is unbiased (from β) However MLR5 does not hold since the conditional variance of the overall disturbance var(bixi + ui) will be dependent on xi and the usual OLS standard errors will not be valid

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 91 bull There is functional form misspecification if β6 ne 0 or β7 ne 0 where these are the population

parameters on ceoten2 and comten2 respectively Therefore we test the joint significance of these variables using the R-squared form of the F test F = [(375 minus 353)2][(1 minus 375)(177 ndash 8)] = 297 With 2 and infin df the 10 critical value is 230 while the 5 critical value is 300 Thus the p-value is slightly above 05 which is reasonable evidence of functional form misspecification (Of course whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter)

Q2 Wooldridge 93 bull (i) Eligibility for the federally funded school lunch program is very tightly linked to being

economically disadvantaged Therefore the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty

bull (ii) We can use our usual reasoning on omitting important variables from a regression equation The variables log(expend) and lnchprg are negatively correlated school districts with poorer children spend on average less on schools Further β3 lt 0 From Table 32 omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β1 [ignoring the presence of log(enroll) in the model] So when we control for the poverty rate the effect of spending falls

bull (iii) Once we control for lnchprg the coefficient on log(enroll) becomes negative and has a t of about ndash217 which is significant at the 5 level against a two-sided alternative The coefficient implies that Δmath10 asymp minus(126100)(Δenroll) = minus0126(Δenroll) Therefore a 10 increase in enrollment leads to a drop in math10 of 126 percentage points

25

bull (iv) Both math10 and lnchprg are percentages Therefore a ten percentage point increase in lnchprg leads to about a 323 percentage point fall in math10 a sizeable effect

bull (v) In column (1) we are explaining very little of the variation in pass rates on the MEAP math test less than 3 In column (2) we are explaining almost 19 (which still leaves much variation unexplained) Clearly most of the variation in math10 is explained by variation in lnchprg This is a common finding in studies of school performance family income (or related factors such as living in poverty) are much more important in explaining student performance than are spending per student or other school characteristics

Q3 Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates using the 1988 data are log(scrap) = .409 + .057 grant, (.241) (.406), n = 54, R2 = .0004. The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain log(scrap88) = .021 − .254 grant88 + .831 log(scrap87), (.089) (.147) (.044), n = 54, R2 = .873, where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller than before, but it is still pretty significant.
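The borderline t statistics in (iii) and (iv) can be reproduced the same way; a Python sketch (scipy assumed) that also recovers the one-sided p-value for the grant coefficient:

```python
from scipy import stats

# t statistics implied by the reported coefficients and standard errors;
# n = 54 with three estimated parameters gives 51 df
df = 54 - 3

t_grant = -0.254 / 0.147          # H0: beta_grant = 0
t_lag = (0.831 - 1.0) / 0.044     # H0: beta_log(scrap87) = 1

p_grant = stats.t.cdf(t_grant, df)  # one-sided p-value for H1: beta_grant < 0

print(round(t_grant, 2))  # -1.73
print(round(t_lag, 2))    # -3.84
print(p_grant)            # just below .05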


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data?
The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations are dependent on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that, for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation?
A stochastic process (SP) is a random variable that depends on the time index. At any fixed point in time, the SP is a random variable (hence has a distribution). The SP can be viewed as a "random curve" with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", or the SP.

• What is serial correlation, or autocorrelation?
The serial (or auto) correlation of a time-series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted as Corr(yt, yt-h) = Cov(yt, yt-h)/[Var(yt)Var(yt-h)]^(1/2).
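The formula above is straightforward to compute from data; a Python sketch (numpy assumed) on a simulated AR(1) series, whose lag-h autocorrelation should be near 0.8^h:

```python
import numpy as np

def autocorr(y, h):
    """Sample autocorrelation Corr(y_t, y_{t-h})."""
    y = np.asarray(y, dtype=float)
    a = y[h:] - y[h:].mean()      # y_t, demeaned
    b = y[:-h] - y[:-h].mean()    # y_{t-h}, demeaned
    return (a * b).sum() / np.sqrt((a**2).sum() * (b**2).sum())

# Simulated AR(1): y_t = 0.8 y_{t-1} + e_t
rng = np.random.default_rng(0)
y = np.zeros(5000)
for t in range(1, 5000):
    y[t] = 0.8 * y[t - 1] + rng.standard_normal()

print(autocorr(y, 1))  # close to 0.8
print(autocorr(y, 2))  # close to 0.64
```

The estimated autocorrelations decay geometrically, which is the signature of an autoregressive process.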

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)?
Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.
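In a finite distributed lag model yt = α + δ0zt + δ1zt-1 + δ2zt-2 + ut, the LRP is δ0 + δ1 + δ2. One standard trick (usable in STATA, e.g. via lincom or by reparameterizing the regression) is to regress y on zt, (zt-1 − zt) and (zt-2 − zt), so that the coefficient on zt, and its standard error, is the LRP directly. A Python sketch on simulated data with hypothetical coefficients (numpy only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
z = rng.standard_normal(n)
u = rng.standard_normal(n)

# Hypothetical FDL: y_t = 1 + .5 z_t + .3 z_{t-1} + .1 z_{t-2} + u_t, so LRP = .9
y = 1.0 + 0.5 * z[2:] + 0.3 * z[1:-1] + 0.1 * z[:-2] + u[2:]

# Reparameterized design: columns 1, z_t, (z_{t-1} - z_t), (z_{t-2} - z_t);
# the coefficient on z_t is then the LRP itself
X = np.column_stack([np.ones(n - 2), z[2:], z[1:-1] - z[2:], z[:-2] - z[2:]])
lrp = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(lrp)  # close to the true LRP of 0.9
```

The reparameterization is exact: because the two design matrices span the same column space, this coefficient equals δ̂0 + δ̂1 + δ̂2 from the unrestricted FDL regression, and its reported standard error is the standard error of that sum.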

• What are TS.1-6 (assumptions about time series regression)? How do they differ from the assumptions MLR.1-6?
TS.1-6 include: 1) linear in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR.1-6 mainly in that MLR.2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS.3 is needed. To ensure the validity of OLS standard errors, t-stats and F-stats for statistical inference, TS.3-TS.6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors?
Strictly exogenous regressors satisfy the assumption TS.3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS.3.

• What is a trending time series? What is a time trend?
A time series is trending if it has the tendency of growing (or shrinking) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results?
First, two time trends are always "correlated" because they grow together in the same (or opposite) direction. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one on the other will produce a significant slope coefficient. Such "statistical significance" is spurious because it is purely induced by the time trends and has little to say about the true relationship between the two time series.
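This is easy to see in a small simulation (numpy assumed; the trend slopes are arbitrary): two series built from independent noise around separate linear trends, yet OLS of one on the other yields an enormous t statistic.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
t = np.arange(n, dtype=float)

# Two unrelated series that each contain a linear time trend
y = 0.5 * t + rng.standard_normal(n)
x = 0.3 * t + rng.standard_normal(n)

# Simple OLS of y on x and the usual t statistic for the slope
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
s2 = (resid**2).sum() / (n - 2)
se_b1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
tstat = b[1] / se_b1

print(abs(tstat) > 10)  # True: hugely "significant" despite no true relation
```

Including t itself as a regressor (detrending) removes this spurious significance, which motivates the next point.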

• Why would you include a time trend in regressions with trending variables?
Including a time trend in regressions allows one to study the relationship among time series variables that is not induced by the time trends in the time series.

• What is seasonality in a time series? Give an example of a time series variable with seasonality.
Seasonality in a time series is the fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have week-day seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
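Constructing the three dummies is mechanical; a Python sketch with a hypothetical two-year quarterly index:

```python
# Quarterly seasonal dummies with the first quarter as the base period
quarters = [1, 2, 3, 4, 1, 2, 3, 4]  # hypothetical quarter index, two years

Q2 = [int(q == 2) for q in quarters]
Q3 = [int(q == 3) for q in quarters]
Q4 = [int(q == 4) for q in quarters]

print(Q2)  # [0, 1, 0, 0, 0, 1, 0, 0]
print(Q3)  # [0, 0, 1, 0, 0, 0, 1, 0]
print(Q4)  # [0, 0, 0, 1, 0, 0, 0, 1]
```

Only three dummies are included because adding a Q1 dummy alongside the intercept would create perfect collinearity (the dummy variable trap).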

Problem Set

Q1 Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated, such as stock returns, do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1; in particular, we do not need the homoskedasticity and no serial correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in a regression model. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2
• We follow the hint and write gGDPt-1 = α0 + δ0 intt-1 + δ1 intt-2 + ut-1, and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0 intt-1 + δ1 intt-2 + ut-1 − 3) + vt = (γ0 + γ1α0 − 3γ1) + γ1δ0 intt-1 + γ1δ1 intt-2 + γ1ut-1 + vt.
Now, by assumption, ut-1 has zero mean and is uncorrelated with all right-hand-side variables in the previous equation, except itself of course. So Cov(intt, ut-1) = E(intt · ut-1) = γ1E(ut-1²) > 0, because γ1 > 0. If σ² = E(ut²) for all t, then Cov(intt, ut-1) = γ1σ². This violates the strict exogeneity assumption TS.3: while ut is uncorrelated with intt, intt-1, and so on, ut is correlated with intt+1.
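The covariance result can be checked by simulation; a Python sketch (numpy assumed) with hypothetical parameter values, γ1 = 0.8 and Var(u) = 1, so Cov(intt, ut-1) should be near γ1σ² = 0.8:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
a0, d0, d1 = 3.0, -0.5, -0.2  # hypothetical alpha0, delta0, delta1
g0, g1 = 0.0, 0.8             # gamma1 > 0, as in the problem

u = rng.standard_normal(n)    # Var(u) = 1
v = rng.standard_normal(n)
gGDP = np.zeros(n)
intr = np.zeros(n)            # "int" is avoided as a variable name

for t in range(2, n):
    intr[t] = g0 + g1 * (gGDP[t - 1] - 3.0) + v[t]           # policy rule
    gGDP[t] = a0 + d0 * intr[t] + d1 * intr[t - 1] + u[t]    # GDP equation

# Sample Cov(int_t, u_{t-1}): pairs (intr[t], u[t-1]) for t >= 3
cov_ulag = np.cov(intr[3:], u[2:-1])[0, 1]
print(cov_ulag)  # near gamma1 * Var(u) = 0.8
```

The positive sample covariance confirms that the policy-rule feedback makes the interest rate correlated with past errors, so strict exogeneity fails even though contemporaneous exogeneity can still hold.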


Q3 Wooldridge 10.7
• (i) pet-1 and pet-2 must be increasing by the same amount as pet.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means that the level of pe increases and stays at the new level, and this is achieved by increasing pet, pet-1 and pet-2 by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is i3t = 1.61 + .343 inft + .382 inft-1 − .190 deft + .569 deft-1, (.40) (.125) (.134) (.221) (.197), n = 55, R2 = .685, adj. R2 = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than .606, which we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inft-1.

• (iv) The F statistic for joint significance of inft-1 and deft-1 is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.
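The p-value can be recovered from the F distribution (scipy assumed); with 2 restrictions and n − k − 1 = 55 − 5 = 50 denominator df:

```python
from scipy import stats

# Upper-tail probability of F = 5.22 under F(2, 50)
p = stats.f.sf(5.22, 2, 55 - 5)
print(round(p, 3))  # about .009
```

This matches the reported p-value, confirming joint significance at the 1% level.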

Page 22: UNSW -Econ2206 Solutions Semester 1 2011 - Introductory Eco No Metrics

22

bull (iv) The smallest fitted value is about 030 and the largest is about 697 The WLS estimates of the LPM are 119890401119896 = minus488 + 0126 inc minus 000062 inc2 + 0255 age minus 00030 age2 minus 0055 male (076) (0005) (000004) (0037) (00004) (0117) n = 9275 R2 = 108 There are no important differences with the OLS estimates The largest relative change is in the coefficient on male but this variable is very insignificant using either estimation method

23

Week 10 Tutorial Exercises

Readings bull Read Chapter 9 thoroughly bull Make sure that you know the meanings of the Key Terms at the chapter end

Review Questions (these may or may not be discussed in tutorial classes) bull What is functional form misspecification What is its major consequence in regression

analysis As the word ldquomisspecificationrdquo suggests it is a mis-specified model with a wrong functional form for explanatory variables For example a log wage model that does not include age2 is mis-specified if the partial effect of age on log wage is upside-down-U shaped In linear regression models the functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables Hence the consequences of omitting important variables apply and the major one is estimation bias

bull How would you test for functional form misspecification The RESET may be used to test for functional form misspecification more specifically for neglected nonlinearities (nonlinear function of regressors) The RESET is based on the F-test for the joint significance of the squared and cubed predicted value (y-hat) in a ldquoexpandedrdquo model that includes all the original regressors as well as the squared and cubed y-hat where the latter represent possible nonlinear functions of the regressors

bull What are nested models And nonnested models Two models are nested if one is a restricted version of the other (by restricting the values of parameters) Two models are nonnested if neither nests the other

bull What is the purpose of testing one model against another The purpose of testing one model against another is to select a ldquobestrdquo or ldquopreferredrdquo model for statistical inference and prediction

bull How would you test for two nonnested models There are two strategies The first is to test exclusion restrictions in an expanded model that nest both candidate models and decide which model can be ldquoexcludedrdquo The second is to test the significance of the predicted value of one model in the other model (known as Davidson-MacKinnon test)

bull What is a proxy variable What are the conditions for a proxy variable to be valid in regression analysis A proxy variable is one that is used to represents the influence of an unobserved (and important) explanatory variable There are two conditions for the validity of a proxy The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved and the proxy) The second is that the conditional mean of the unobserved given other explanatory variables and the proxy only depends on the proxy

bull Can you analyse the consequences of measurement errors With a careful reading of Section 94 you should be able to analyse

bull In what circumstances missing observations will cause major concerns in regression analysis Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample Nonrandom samples may cause bias in the OLS estimation

bull What is exogenous sample selection What is endogenous sample selection

24

The exogenous sample selection is a nonrandom sampling scheme in which a sample is chosen on the basis of the explanatory variables It is easily seen that such sampling scheme does not violate MLR134 and hence does not introduce bias in the OLS estimation On the other hand the endogenous sample selection scheme selects a sample on the basis of the dependent variable and causes bias in the OLS estimation

bull What are outliers and influential observations Outliers are unusual observations that are far away from the centre of other observations A small number of outliers can have large impact on the OLS estimates of parameters These are called influential observations if the estimation results when including them are significantly different from the results when excluding them

bull Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1 2hellip n where the ldquoslop parameterrdquo contains a random variable bi α and β are constant parameters Assume that the usual MLR1-5 hold for yi = α + βxi + ui E(bi|xi) = 0 and Var(bi|xi) = ω2 If we regress y on x with a intercept will the estimator of the ldquoslope parameterrdquo be biased from β Will the usual OLS standard errors be valid for statistical inference As the model can be writted as yi = α + β xi + (bixi + ui) where the overall disturbance is (bixi + ui) Given the stated assumptions MLR1-4 hold for this model and the OLS estimator of the ldquoslope parameterrdquo is unbiased (from β) However MLR5 does not hold since the conditional variance of the overall disturbance var(bixi + ui) will be dependent on xi and the usual OLS standard errors will not be valid

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 91 bull There is functional form misspecification if β6 ne 0 or β7 ne 0 where these are the population

parameters on ceoten2 and comten2 respectively Therefore we test the joint significance of these variables using the R-squared form of the F test F = [(375 minus 353)2][(1 minus 375)(177 ndash 8)] = 297 With 2 and infin df the 10 critical value is 230 while the 5 critical value is 300 Thus the p-value is slightly above 05 which is reasonable evidence of functional form misspecification (Of course whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter)

Q2 Wooldridge 93 bull (i) Eligibility for the federally funded school lunch program is very tightly linked to being

economically disadvantaged Therefore the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty

bull (ii) We can use our usual reasoning on omitting important variables from a regression equation The variables log(expend) and lnchprg are negatively correlated school districts with poorer children spend on average less on schools Further β3 lt 0 From Table 32 omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β1 [ignoring the presence of log(enroll) in the model] So when we control for the poverty rate the effect of spending falls

bull (iii) Once we control for lnchprg the coefficient on log(enroll) becomes negative and has a t of about ndash217 which is significant at the 5 level against a two-sided alternative The coefficient implies that Δmath10 asymp minus(126100)(Δenroll) = minus0126(Δenroll) Therefore a 10 increase in enrollment leads to a drop in math10 of 126 percentage points

25

bull (iv) Both math10 and lnchprg are percentages Therefore a ten percentage point increase in lnchprg leads to about a 323 percentage point fall in math10 a sizeable effect

bull (v) In column (1) we are explaining very little of the variation in pass rates on the MEAP math test less than 3 In column (2) we are explaining almost 19 (which still leaves much variation unexplained) Clearly most of the variation in math10 is explained by variation in lnchprg This is a common finding in studies of school performance family income (or related factors such as living in poverty) are much more important in explaining student performance than are spending per student or other school characteristics

Q3 Wooldridge 95 bull The sample selection in this case is arguably endogenous Because prospective students may

look at campus crime as one factor in deciding where to attend college colleges with high crime rates have an incentive not to report crime statistics If this is the case then the chance of appearing in the sample is negatively related to u in the crime equation (For a given school size higher u means more crime and therefore a smaller probability that the school reports its crime figures)

Q4 Wooldridge C93 (jtrain_c9_3do) bull (i) If the grants were awarded to firms based on firm or worker characteristics grant could

easily be correlated with such factors that affect productivity In the simple regression model these are contained in u

bull (ii) The simple regression estimates using the 1988 data are log(scrap) = 409 + 057 grant (241) (406) n = 54 R2 = 0004 The coefficient on grant is actually positive but not statistically different from zero

bull (iii) When we add log(scrap87) to the equation we obtain log(scrap88) = 021 minus 254 grant88 + 831 log(scrap87) (089) (147) (044) n = 54 R2 = 873 where the years are indicated as suffixes to the variables for clarity The t statistic for H0 βgrant = 0 is minus254147 asymp minus173 We use the 5 critical value for 40 df in Table G2 -168 Because t = minus173 lt minus168 we reject H0 in favor of H1 βgrant lt 0 at the 5 level

bull (iv) The t statistic is (831 ndash 1)044 asymp minus384 which is a strong rejection of H0 bull (v) With the heteroskedasticity-robust standard error the t statistic for grant88 is

minus254142 asymp minus179 so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error The t statistic for H0 βlog(scrap87) = 1 is (831 ndash 1)071 asymp minus238 which is notably smaller than before but it is still pretty significant

26

Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes) bull What are the main features of time series data How do time series data differ from cross-

sectional data The main features include the following First time series data have a temporal ordering which matters in regression analysis because of the second feature Second many economic time series are serially correlated (or autocorrelated) meaning that future observation are dependent on present and past observations Third many economic time series contain trends and seasonality In comparison to cross-sectional data the major difference is that time series data are not ldquorandom samplerdquo Recall that for a random sample observations are required to be independent of one another

bull What is a stochastic process and its realisation A stochastic process (SP) is a random variable that depends on the time index At any fixed point in time the SP is a random variable (hence has a distribution) The SP can be viewed as a ldquorandom curverdquo with the horizontal axis being the time index where the outcome of the underlying experiment is a ldquocurverdquo (or a time series plot) The time series we observe is viewed as a realisation (or outcome) of the ldquorandom curverdquo or the SP

bull What is serial correlation or autocorrelation The serial (or auto) correlation of a time-series variable is the correlation between the variable at one point in time and the same variable at another point in time The serial (or auto) correlation of yt is usually denoted as Corr(yt yt-h) = Cov(yt yt-h)[Var(yt)Var(yt-h)]12

bull What is a finite distributed lag model What is the long-run propensity (LRP) How would you estimate the LRP and the associated standard error (say in STATA) Read ie_Slides10 pages 5-7 Read Section 102 Try Wooldridge 103

bull What are TS1-6 (assumptions about time series regression) How do they differ from the assumptions in MLR1-6 TS1-6 include 1) linear in parameter 2) no perfect collinearity 3) strict zero conditional mean 4) homoskedasticity 5) no serial correlation 6) normality These differ from MLR1-6 mainly in that MLR2 (random sampling) do not hold for time series data To ensure the unbiasedness of the OLS estimators TS3 is needed To ensure the validity of OLS standard errors t-stat and F-stat for statistical inference TS3-TS6 are needed These are very strong assumptions and can be relaxed when sample sizes are large

bull What are ldquostrictly exogenousrdquo regressors and ldquocontemporaneously exogenousrdquo regressors Strictly exogenous regressors satisfy the assumption TS3 whereas the contemporaneously exogenous regressors only satisfy the condition (1010) (or (z10) of the Slides) that is weaker than TS3

bull What is a trending time series What is a time trend A time series is trending if it has the tendency of growing (or shrinking) over time A time trend is a function of time index (eg linear quadratic etc) A time trend can be used to mimic (or model) the trending component of a time series

bull Why may a regression with trending time series produce ldquospuriousrdquo results First two time trends are always ldquocorrelatedrdquo because they grow together in the same (or opposite) direction Now consider two unrelated time series each of which contains a time trend Since the time trends in the two series are ldquocorrelatedrdquo regressing one on the other will produce a significant slope coefficient Such ldquostatistical significancerdquo is spurious because

27

it is purely induced by the time trends and has little to say about the true relationship between the two time series

bull Why would you include a time trend in regressions with trending variables Including a time trend in regressions allows one to study the relationship among time series variables that is not induced by the time trends in the time series

bull What is seasonality in a time series Give an example of time series variable with seasonality Seasonality in a time series is the fluctuation that repeats itself every year (or every week or every day) An example would be the daily revenue a pub which is likely to have week-day seasonality

bull For quarterly data how would you define seasonal dummy variables for a regression model We need to define three dummies For instance if we take the first quarter as the base we define dummy variables Q2 Q3 and Q4 for the second third and fourth quarters and include them in the regression model

Problem Set

Q1 Wooldridge 101 bull (i) Disagree Most time series processes are correlated over time and many of them strongly

correlated This means they cannot be independent across observations which simply represent different time periods Even series that do appear to be roughly uncorrelated ndash such as stock returns ndash do not appear to be independently distributed as they have dynamic forms of heteroskedasticity

bull (ii) Agree This follows immediately from Theorem 101 In particular we do not need the homoskedasticity and no serial correlation assumptions

bull (iii) Disagree Trending variables are used all the time as dependent variables in a regression model We do need to be careful in interpreting the results because we may simply find a spurious association between yt and trending explanatory variables Including a trend in the regression is a good idea with trending dependent or independent variables As discussed in Section 105 the usual R-squared can be misleading when the dependent variable is trending

bull (iv) Agree With annual data each time period represents a year and is not associated with any season

Q2 Wooldridge 102 bull We follow the hint and write

gGDPt-1 = α0 + δ0 intt-1 + δ1intt-2 + ut-1 and plug this into the right-hand-side of the intt equation int t = γ0 + γ1( α0 + δ0int t-1 + δ1int t-2 + ut-1 ndash 3) + vt = (γ0 + γ1 α0 ndash 3 γ1) + γ1 δ0int t-1 + γ1 δ1int t-2 + γ1ut-1 + vt Now by assumption u t-1 has zero mean and is uncorrelated with all right-hand-side variables in the previous equation except itself of course So Cov(intt ut-1) = E(int t sdot ut-1) = γ1E(ut-12) gt 0 because γ1 gt 0 If σ2 = E(ut2 ) for all t then Cov(intt ut-1) = γ1 σ2 This violates the strict exogeneity assumption TS3 While u t is uncorrelated with intt intt-1 and so on ut is correlated with intt+1

28

Q3 Wooldridge 107 bull (i) pet-1 and pet-2 must be increasing by the same amount as pet bull (ii) The long-run effect by definition should be the change in gfr when pe increases

permanently But a permanent increase means the level of pe increases and stays at the new level and this is achieved by increasing pet-1 pet-1 and pet by the same amount

Q4 Wooldridge C1010 (intdef_c10_10do) bull (i) The sample correlation between inf and def is only about 098 which is pretty small

Perhaps surprisingly inflation and the deficit rate are practically uncorrelated over this period Of course this is a good thing for estimating the effects of each variable on i3 as it implies almost no multicollinearity

bull (ii) The equation with the lags is i3t = 161 + 343 inf t + 382 inf t-1 minus 190 def t + 569 def t-1 (040) (125) (134) (221) (197) n = 55 R2 = 685 adjR2 = 660

bull (iii) The estimated LRP of i3 with respect to inf is 343 + 382 = 725 which is somewhat larger than 606 which we obtain from the static model in (1015) But the estimates are fairly close considering the size and significance of the coefficient on inft-1

bull (iv) The F statistic for significance of inf t-1 and def t-1 is about 522 with p-value asymp 009 So they are jointly significant at the 1 level It seems that both lags belong in the model

Page 23: UNSW -Econ2206 Solutions Semester 1 2011 - Introductory Eco No Metrics

23

Week 10 Tutorial Exercises

Readings bull Read Chapter 9 thoroughly bull Make sure that you know the meanings of the Key Terms at the chapter end

Review Questions (these may or may not be discussed in tutorial classes) bull What is functional form misspecification What is its major consequence in regression

analysis As the word ldquomisspecificationrdquo suggests it is a mis-specified model with a wrong functional form for explanatory variables For example a log wage model that does not include age2 is mis-specified if the partial effect of age on log wage is upside-down-U shaped In linear regression models the functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables Hence the consequences of omitting important variables apply and the major one is estimation bias

bull How would you test for functional form misspecification The RESET may be used to test for functional form misspecification more specifically for neglected nonlinearities (nonlinear function of regressors) The RESET is based on the F-test for the joint significance of the squared and cubed predicted value (y-hat) in a ldquoexpandedrdquo model that includes all the original regressors as well as the squared and cubed y-hat where the latter represent possible nonlinear functions of the regressors

bull What are nested models And nonnested models Two models are nested if one is a restricted version of the other (by restricting the values of parameters) Two models are nonnested if neither nests the other

bull What is the purpose of testing one model against another The purpose of testing one model against another is to select a ldquobestrdquo or ldquopreferredrdquo model for statistical inference and prediction

bull How would you test for two nonnested models There are two strategies The first is to test exclusion restrictions in an expanded model that nest both candidate models and decide which model can be ldquoexcludedrdquo The second is to test the significance of the predicted value of one model in the other model (known as Davidson-MacKinnon test)

bull What is a proxy variable What are the conditions for a proxy variable to be valid in regression analysis A proxy variable is one that is used to represents the influence of an unobserved (and important) explanatory variable There are two conditions for the validity of a proxy The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved and the proxy) The second is that the conditional mean of the unobserved given other explanatory variables and the proxy only depends on the proxy

bull Can you analyse the consequences of measurement errors With a careful reading of Section 94 you should be able to analyse

bull In what circumstances missing observations will cause major concerns in regression analysis Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample Nonrandom samples may cause bias in the OLS estimation

bull What is exogenous sample selection What is endogenous sample selection

24

The exogenous sample selection is a nonrandom sampling scheme in which a sample is chosen on the basis of the explanatory variables It is easily seen that such sampling scheme does not violate MLR134 and hence does not introduce bias in the OLS estimation On the other hand the endogenous sample selection scheme selects a sample on the basis of the dependent variable and causes bias in the OLS estimation

bull What are outliers and influential observations Outliers are unusual observations that are far away from the centre of other observations A small number of outliers can have large impact on the OLS estimates of parameters These are called influential observations if the estimation results when including them are significantly different from the results when excluding them

bull Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1 2hellip n where the ldquoslop parameterrdquo contains a random variable bi α and β are constant parameters Assume that the usual MLR1-5 hold for yi = α + βxi + ui E(bi|xi) = 0 and Var(bi|xi) = ω2 If we regress y on x with a intercept will the estimator of the ldquoslope parameterrdquo be biased from β Will the usual OLS standard errors be valid for statistical inference As the model can be writted as yi = α + β xi + (bixi + ui) where the overall disturbance is (bixi + ui) Given the stated assumptions MLR1-4 hold for this model and the OLS estimator of the ldquoslope parameterrdquo is unbiased (from β) However MLR5 does not hold since the conditional variance of the overall disturbance var(bixi + ui) will be dependent on xi and the usual OLS standard errors will not be valid

Problem Set (these will be discussed in tutorial classes)

Q1 Wooldridge 9.1
• There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test: F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] ≈ 2.97. With 2 and ∞ df, the 10% critical value is 2.30, while the 5% critical value is 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
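The F computation above can be checked as plain arithmetic, using the R-squared values reported in the solution:

```python
# R-squared form of the F statistic: F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/df]
r2_ur, r2_r = .375, .353      # R-squared with and without ceoten^2, comten^2
q, df = 2, 177 - 8            # 2 restrictions; df = n - k - 1 = 169
F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df)
print(round(F, 2))            # 2.97
```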

Q2 Wooldridge 9.3
• (i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.

• (ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So when we control for the poverty rate, the effect of spending falls.

• (iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative and has a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of .126 percentage points.


• (iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.

• (v) In column (1) we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2) we are explaining almost 19% (which still leaves much variation unexplained). Clearly, most of the variation in math10 is explained by variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than spending per student or other school characteristics.

Q3 Wooldridge 9.5
• The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report their crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4 Wooldridge C9.3 (jtrain_c9_3.do)
• (i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.

• (ii) The simple regression estimates using the 1988 data are log(scrap) = .409 + .057 grant, with standard errors (.241) and (.406), n = 54, R² = .0004. The coefficient on grant is actually positive, but not statistically different from zero.

• (iii) When we add log(scrap87) to the equation, we obtain log(scrap88) = .021 − .254 grant88 + .831 log(scrap87), with standard errors (.089), (.147) and (.044), n = 54, R² = .873, where the years are indicated as suffixes on the variables for clarity. The t statistic for H0: βgrant = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because t = −1.73 < −1.68, we reject H0 in favor of H1: βgrant < 0 at the 5% level.

• (iv) The t statistic is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
• (v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero than with the usual standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller in magnitude than before, but still pretty significant.
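The t statistics quoted in (iii)-(v) follow directly from the reported coefficients and standard errors:

```python
# t statistics for C9.3 from the reported estimates and SEs
t_grant     = -.254 / .147        # H0: beta on grant88 = 0, usual SE
t_lag       = (.831 - 1) / .044   # H0: beta on log(scrap87) = 1, usual SE
t_grant_rob = -.254 / .142        # robust SE
t_lag_rob   = (.831 - 1) / .071   # robust SE
print(round(t_grant, 2), round(t_lag, 2),
      round(t_grant_rob, 2), round(t_lag_rob, 2))   # -1.73 -3.84 -1.79 -2.38
```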


Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)
• What are the main features of time series data? How do time series data differ from cross-sectional data? The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations are dependent on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample": recall that, for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation? A stochastic process (SP) is a collection of random variables indexed by time. At any fixed point in time, the SP is a random variable (and hence has a distribution). The SP can be viewed as a "random curve", with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (or a time series plot). The time series we observe is viewed as a realisation (or outcome) of the "random curve", that is, of the SP.

• What is serial correlation or autocorrelation? The serial (or auto) correlation of a time series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial (or auto) correlation of yt is usually denoted Corr(yt, yt−h) = Cov(yt, yt−h)/[Var(yt)Var(yt−h)]^(1/2).
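As an illustration of the definition, consider an AR(1) process yt = ρyt−1 + et (an assumed example, with ρ = 0.7): its first-order autocorrelation Corr(yt, yt−1) equals ρ, which a long simulated series recovers.

```python
# Sample first-order autocorrelation of a simulated AR(1) series
import random

random.seed(0)
rho = 0.7
y = [0.0]
for _ in range(100_000):
    y.append(rho * y[-1] + random.gauss(0, 1))
y = y[1000:]                       # discard burn-in

a, b = y[1:], y[:-1]               # pairs (y_t, y_{t-1})
ma, mb = sum(a) / len(a), sum(b) / len(b)
cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
corr = cov / (sum((p - ma) ** 2 for p in a) ** 0.5 *
              sum((q - mb) ** 2 for q in b) ** 0.5)
print(corr)                        # close to rho = 0.7
```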

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)? Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.
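In Stata, one can run the FDL regression and use lincom on the sum of the lag coefficients; equivalently, one can reparameterize the model so the LRP appears as a single coefficient, whose usual standard error is then the LRP's standard error. A stdlib-only Python sketch of the reparameterization idea, with assumed coefficient values: for yt = a + d0·zt + d1·zt−1 + ut, write yt = a + θ·zt + d1·(zt−1 − zt) + ut, so that θ = d0 + d1 = LRP.

```python
# Estimating the LRP of a finite distributed lag model via reparameterization
import random

random.seed(3)
d0, d1 = 0.5, 0.3                  # assumed true lag coefficients; LRP = 0.8
z = [random.gauss(0, 1) for _ in range(5001)]
y = [1 + d0 * z[t] + d1 * z[t - 1] + random.gauss(0, 1) for t in range(1, 5001)]

# reparameterized regressors: constant, z_t, (z_{t-1} - z_t);
# the coefficient on z_t is theta = d0 + d1 = LRP
rows = [(1.0, z[t], z[t - 1] - z[t]) for t in range(1, 5001)]

# solve the 3x3 normal equations (X'X) beta = X'y by Gauss-Jordan elimination
k = 3
XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
M = [XtX[i] + [Xty[i]] for i in range(k)]
for c in range(k):
    p = max(range(c, k), key=lambda r: abs(M[r][c]))   # partial pivoting
    M[c], M[p] = M[p], M[c]
    for r in range(k):
        if r != c:
            f = M[r][c] / M[c][c]
            M[r] = [u - f * v for u, v in zip(M[r], M[c])]
beta = [M[i][k] / M[i][i] for i in range(k)]
print(beta[1])                     # estimated LRP, close to 0.8
```

The usual standard error reported for the zt coefficient in the reparameterized regression is exactly the standard error of the LRP estimate.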

• What are TS.1-6 (assumptions about time series regression)? How do they differ from the assumptions MLR.1-6? TS.1-6 comprise: 1) linearity in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR.1-6 mainly in that MLR.2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS.3 is needed. To ensure the validity of the OLS standard errors, t-statistics and F-statistics for statistical inference, TS.3-TS.6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors? Strictly exogenous regressors satisfy assumption TS.3, whereas contemporaneously exogenous regressors satisfy only condition (10.10) (or (z10) of the Slides), which is weaker than TS.3.

• What is a trending time series? What is a time trend? A time series is trending if it has a tendency to grow (or shrink) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results? First, two time trends are always "correlated" because they grow in the same (or the opposite) direction over time. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one series on the other will produce a significant slope coefficient. Such "statistical significance" is spurious, because it is purely induced by the time trends and says little about the true relationship between the two time series.
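The spurious-significance point is easy to demonstrate with two independently generated trending series (trend slopes and noise levels assumed for illustration):

```python
# Two unrelated trending series still give a "significant" slope
import random

random.seed(7)
n = 200
x = [0.5 * t + random.gauss(0, 5) for t in range(n)]   # trend + noise
y = [0.3 * t + random.gauss(0, 5) for t in range(n)]   # independent trend + noise

mx, my = sum(x) / n, sum(y) / n
sxx = sum((a - mx) ** 2 for a in x)
b1 = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
b0 = my - b1 * mx
ssr = sum((c - b0 - b1 * a) ** 2 for a, c in zip(x, y))
se_b1 = (ssr / (n - 2) / sxx) ** 0.5
print(b1 / se_b1)   # a huge t statistic despite no true relationship
```

Adding t itself as a regressor (or detrending both series first) removes this artefact, which is the point of the next question.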

• Why would you include a time trend in regressions with trending variables? Including a time trend in the regression allows one to study the relationship among the time series variables that is not induced by the time trends in those series.

• What is seasonality in a time series? Give an example of a time series variable with seasonality. Seasonality in a time series is fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have day-of-week seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model? We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
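Constructing those three dummies from a quarter index is mechanical; a minimal sketch:

```python
# Q2-Q4 dummies from a quarter index (1..4), with Q1 as the base category
quarters = [1, 2, 3, 4] * 2                    # two years of quarterly data
dummies = [(int(q == 2), int(q == 3), int(q == 4)) for q in quarters]
print(dummies[:4])   # [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
```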

Problem Set

Q1 Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated, such as stock returns, do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1; in particular, we do not need the homoskedasticity and no-serial-correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in regression models. We do need to be careful in interpreting the results, because we may simply find a spurious association between yt and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2 Wooldridge 10.2
• We follow the hint and write gGDPt−1 = α0 + δ0 intt−1 + δ1 intt−2 + ut−1, and plug this into the right-hand side of the intt equation:
intt = γ0 + γ1(α0 + δ0 intt−1 + δ1 intt−2 + ut−1 − 3) + vt = (γ0 + γ1α0 − 3γ1) + γ1δ0 intt−1 + γ1δ1 intt−2 + γ1ut−1 + vt.
Now, by assumption, ut−1 has zero mean and is uncorrelated with all the right-hand-side variables in the previous equation, except itself of course. So Cov(intt, ut−1) = E(intt · ut−1) = γ1E(ut−1²) > 0, because γ1 > 0. If σ² = E(ut²) for all t, then Cov(intt, ut−1) = γ1σ². This violates the strict exogeneity assumption TS.3: while ut is uncorrelated with intt, intt−1, and so on, ut is correlated with intt+1.
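A quick simulation check of the conclusion Cov(intt, ut−1) = γ1σ² > 0, with all parameter values assumed purely for illustration (γ1 = 0.5, σ = 1):

```python
# Feedback rule int_t = g0 + g1*(gGDP_{t-1} - 3) + v_t makes int_t depend on u_{t-1}
import random

random.seed(5)
g0, g1, a0, d0, d1, sigma = 1.0, 0.5, 2.0, 0.3, 0.2, 1.0   # assumed, g1 > 0
n = 200_000
u = [random.gauss(0, sigma) for _ in range(n)]
intr = [0.0, 0.0]                                          # int_0, int_1
for t in range(2, n):
    gGDP_prev = a0 + d0 * intr[t - 1] + d1 * intr[t - 2] + u[t - 1]
    intr.append(g0 + g1 * (gGDP_prev - 3) + random.gauss(0, 0.5))   # + v_t

x = intr[2:]                     # int_t for t >= 2
w = u[1:n - 1]                   # the matching u_{t-1}
mx, mw = sum(x) / len(x), sum(w) / len(w)
cov = sum((a - mx) * (b - mw) for a, b in zip(x, w)) / len(x)
print(cov)                       # close to g1 * sigma^2 = 0.5
```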


Q3 Wooldridge 10.7
• (i) pet−1 and pet−2 must be increasing by the same amount as pet.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. A permanent increase means that the level of pe increases and stays at the new level, and this is achieved by increasing pet, pet−1 and pet−2 all by the same amount.

Q4 Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is i3t = 1.61 + .343 inft + .382 inft−1 − .190 deft + .569 deft−1, with standard errors (.40), (.125), (.134), (.221) and (.197), n = 55, R² = .685, adjusted R² = .660.

• (iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than the .606 obtained from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inft−1.

• (iv) The F statistic for the joint significance of inft−1 and deft−1 is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.


Q2 Wooldridge 102 bull We follow the hint and write

gGDPt-1 = α0 + δ0 intt-1 + δ1intt-2 + ut-1 and plug this into the right-hand-side of the intt equation int t = γ0 + γ1( α0 + δ0int t-1 + δ1int t-2 + ut-1 ndash 3) + vt = (γ0 + γ1 α0 ndash 3 γ1) + γ1 δ0int t-1 + γ1 δ1int t-2 + γ1ut-1 + vt Now by assumption u t-1 has zero mean and is uncorrelated with all right-hand-side variables in the previous equation except itself of course So Cov(intt ut-1) = E(int t sdot ut-1) = γ1E(ut-12) gt 0 because γ1 gt 0 If σ2 = E(ut2 ) for all t then Cov(intt ut-1) = γ1 σ2 This violates the strict exogeneity assumption TS3 While u t is uncorrelated with intt intt-1 and so on ut is correlated with intt+1

28

Q3 Wooldridge 107 bull (i) pet-1 and pet-2 must be increasing by the same amount as pet bull (ii) The long-run effect by definition should be the change in gfr when pe increases

permanently But a permanent increase means the level of pe increases and stays at the new level and this is achieved by increasing pet-1 pet-1 and pet by the same amount

Q4 Wooldridge C1010 (intdef_c10_10do) bull (i) The sample correlation between inf and def is only about 098 which is pretty small

Perhaps surprisingly inflation and the deficit rate are practically uncorrelated over this period Of course this is a good thing for estimating the effects of each variable on i3 as it implies almost no multicollinearity

bull (ii) The equation with the lags is i3t = 161 + 343 inf t + 382 inf t-1 minus 190 def t + 569 def t-1 (040) (125) (134) (221) (197) n = 55 R2 = 685 adjR2 = 660

bull (iii) The estimated LRP of i3 with respect to inf is 343 + 382 = 725 which is somewhat larger than 606 which we obtain from the static model in (1015) But the estimates are fairly close considering the size and significance of the coefficient on inft-1

bull (iv) The F statistic for significance of inf t-1 and def t-1 is about 522 with p-value asymp 009 So they are jointly significant at the 1 level It seems that both lags belong in the model

Page 26: UNSW -Econ2206 Solutions Semester 1 2011 - Introductory Eco No Metrics

26

Week 11 Tutorial Exercises

Review Questions (these may or may not be discussed in tutorial classes)

• What are the main features of time series data? How do time series data differ from cross-sectional data?
The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations depend on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a "random sample". Recall that for a random sample, observations are required to be independent of one another.

• What is a stochastic process and its realisation?
A stochastic process (SP) is a collection of random variables indexed by time. At any fixed point in time, the SP is a random variable (and hence has a distribution). The SP can be viewed as a "random curve" with the horizontal axis being the time index, where the outcome of the underlying experiment is a "curve" (a time series plot). The time series we observe is viewed as a realisation (or outcome) of this "random curve", i.e. of the SP.

• What is serial correlation or autocorrelation?
The serial (or auto-) correlation of a time series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial correlation of y_t at lag h is usually defined as Corr(y_t, y_{t-h}) = Cov(y_t, y_{t-h}) / [Var(y_t) Var(y_{t-h})]^{1/2}.
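This formula can be checked numerically. The sketch below (Python, with a simulated AR(1) series; the 0.8 coefficient and sample size are arbitrary illustrations, not from the course) computes the lag-h sample autocorrelation exactly as defined above:

```python
import numpy as np

def autocorr(y, h):
    """Sample autocorrelation of y at lag h: Corr(y_t, y_{t-h})."""
    y = np.asarray(y, dtype=float)
    y0, yh = y[h:], y[:-h]
    # Cov(y_t, y_{t-h}) divided by [Var(y_t) Var(y_{t-h})]^(1/2)
    cov = np.mean((y0 - y0.mean()) * (yh - yh.mean()))
    return cov / np.sqrt(y0.var() * yh.var())

# An AR(1) series y_t = 0.8 y_{t-1} + e_t is serially correlated:
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.8 * y[t - 1] + rng.standard_normal()
print(round(autocorr(y, 1), 2))  # close to 0.8 for this process
```

The lag-1 autocorrelation of an AR(1) process equals its autoregressive coefficient, so the sample estimate should be near 0.8 here.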

• What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say, in STATA)?
Read ie_Slides10, pages 5-7. Read Section 10.2. Try Wooldridge 10.3.
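In STATA, after declaring the data with `tsset`, one would typically run something like `reg y x L.x L2.x` and then `lincom x + L.x + L2.x`, which reports the LRP (the sum of the lag coefficients) together with its standard error. As a hedged illustration of the same calculation, here is a Python sketch on simulated data; the variable names, coefficient values (so that the true LRP is 1.0) and noise level are all made up for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
z = rng.standard_normal(n)
# Simulated finite distributed lag model (illustrative coefficients):
# y_t = 1 + 0.5 z_t + 0.3 z_{t-1} + 0.2 z_{t-2} + u_t, so true LRP = 1.0
y = 1 + 0.5 * z[2:] + 0.3 * z[1:-1] + 0.2 * z[:-2] + 0.4 * rng.standard_normal(n - 2)

X = np.column_stack([np.ones(n - 2), z[2:], z[1:-1], z[:-2]])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
V = sigma2 * np.linalg.inv(X.T @ X)   # OLS variance matrix of b

g = np.array([0.0, 1.0, 1.0, 1.0])    # LRP = delta_0 + delta_1 + delta_2
lrp = g @ b                           # point estimate of the LRP
se_lrp = np.sqrt(g @ V @ g)           # SE of a linear combination: (g'Vg)^(1/2)
print(round(lrp, 2), round(se_lrp, 3))
```

The key point, which `lincom` automates, is that the standard error of a sum of coefficients needs the full covariance matrix V, not just the individual standard errors.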

• What are TS1-6 (the assumptions about time series regression)? How do they differ from the assumptions MLR1-6?
TS1-6 comprise: 1) linearity in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6 mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of OLS standard errors, t-statistics and F-statistics for statistical inference, TS3-TS6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

• What are "strictly exogenous" regressors and "contemporaneously exogenous" regressors?
Strictly exogenous regressors satisfy assumption TS3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS3.

• What is a trending time series? What is a time trend?
A time series is trending if it has a tendency to grow (or shrink) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

• Why may a regression with trending time series produce "spurious" results?
First, two time trends are always "correlated" because they grow in the same (or opposite) direction over time. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are "correlated", regressing one series on the other will produce a significant slope coefficient. Such "statistical significance" is spurious because it is purely induced by the time trends and says little about the true relationship between the two time series.
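This can be demonstrated with a small simulation. In the Python sketch below, two series are generated independently, each with its own linear trend (the trend slopes 0.5 and 0.3 and the sample size are arbitrary choices); regressing one on the other gives a huge t-statistic, which collapses once a time trend is added as a regressor:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
t = np.arange(n, dtype=float)
# Two unrelated series that each contain a linear time trend:
y = 0.5 * t + rng.standard_normal(n)
x = 0.3 * t + rng.standard_normal(n)

def t_stat(y, X, j):
    """OLS t-statistic on column j of X (X includes a constant)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    V = sigma2 * np.linalg.inv(X.T @ X)
    return b[j] / np.sqrt(V[j, j])

X1 = np.column_stack([np.ones(n), x])        # y regressed on x only
X2 = np.column_stack([np.ones(n), x, t])     # y regressed on x and a time trend
print(round(abs(t_stat(y, X1, 1)), 1))  # huge: spurious "significance"
print(round(abs(t_stat(y, X2, 1)), 1))  # far smaller once the trend is included
```

The second regression illustrates the point of the next answer: including the time trend removes the association that was induced purely by the trends.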

• Why would you include a time trend in regressions with trending variables?
Including a time trend in the regression allows one to study the relationship among the time series variables that is not induced by the time trends in those series.

• What is seasonality in a time series? Give an example of a time series variable with seasonality.
Seasonality in a time series is a fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to exhibit day-of-week seasonality.

• For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters, and include them in the regression model.
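A minimal sketch of this construction (Python; the three-year sample length is a hypothetical example):

```python
import numpy as np

# Hypothetical quarter labels for 3 years of quarterly observations:
quarter = np.tile([1, 2, 3, 4], 3)

# With Q1 as the base quarter, define three dummies Q2, Q3, Q4:
Q2 = (quarter == 2).astype(int)
Q3 = (quarter == 3).astype(int)
Q4 = (quarter == 4).astype(int)

# Regressor matrix: constant, (other regressors would go here), then dummies.
# Omitting the Q1 dummy avoids perfect collinearity with the constant.
X = np.column_stack([np.ones(len(quarter)), Q2, Q3, Q4])
print(X[:4])  # rows correspond to quarters 1-4 of the first year
```

Note that only three dummies are included: adding a fourth for the base quarter would make the columns sum to the constant, violating the no-perfect-collinearity assumption (TS2).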

Problem Set

Q1. Wooldridge 10.1
• (i) Disagree. Most time series processes are correlated over time, and many of them strongly so. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated, such as stock returns, do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.

• (ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no-serial-correlation assumptions.

• (iii) Disagree. Trending variables are used all the time as dependent variables in regression models. We do need to be careful in interpreting the results, however, because we may simply find a spurious association between y_t and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.

• (iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2. Wooldridge 10.2
• We follow the hint and write
  gGDP_{t-1} = α0 + δ0 int_{t-1} + δ1 int_{t-2} + u_{t-1},
and plug this into the right-hand side of the int_t equation:
  int_t = γ0 + γ1 (α0 + δ0 int_{t-1} + δ1 int_{t-2} + u_{t-1} − 3) + v_t
        = (γ0 + γ1 α0 − 3γ1) + γ1 δ0 int_{t-1} + γ1 δ1 int_{t-2} + γ1 u_{t-1} + v_t.
Now, by assumption, u_{t-1} has zero mean and is uncorrelated with all right-hand-side variables in the previous equation, except itself of course. So
  Cov(int_t, u_{t-1}) = E(int_t · u_{t-1}) = γ1 E(u_{t-1}²) > 0,
because γ1 > 0. If σ² = E(u_t²) for all t, then Cov(int_t, u_{t-1}) = γ1 σ². This violates the strict exogeneity assumption TS3: while u_t is uncorrelated with int_t, int_{t-1}, and so on, u_t is correlated with int_{t+1}.
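The result Cov(int_t, u_{t-1}) = γ1 σ² can be checked by simulation. The Python sketch below uses arbitrary illustrative parameter values (not from the exercise) with γ1 = 0.7 and σ² = 1, so the sample covariance should come out near 0.7:

```python
import numpy as np

# Hypothetical parameter values chosen for illustration only:
a0, d0, d1 = 1.0, 0.5, 0.2      # alpha_0, delta_0, delta_1
g0, g1 = 0.3, 0.7               # gamma_0, gamma_1 > 0
sigma = 1.0
T = 200_000

rng = np.random.default_rng(3)
u = sigma * rng.standard_normal(T)
v = sigma * rng.standard_normal(T)
gGDP = np.zeros(T)
intr = np.zeros(T)              # "int" is a Python builtin, hence "intr"
for t in range(2, T):
    intr[t] = g0 + g1 * (gGDP[t - 1] - 3) + v[t]
    gGDP[t] = a0 + d0 * intr[t] + d1 * intr[t - 1] + u[t]

# Sample Cov(int_t, u_{t-1}); discard a burn-in for the start-up values.
x, w = intr[1000:], u[999:-1]
cov = np.mean((x - x.mean()) * (w - w.mean()))
print(round(cov, 2))  # close to gamma_1 * sigma^2 = 0.7
```

The positive covariance confirms that the feedback from past GDP growth into the current interest rate breaks strict exogeneity even though u_t is contemporaneously uncorrelated with int_t.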


Q3. Wooldridge 10.7
• (i) pe_{t-1} and pe_{t-2} must be increasing by the same amount as pe_t.
• (ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means the level of pe increases and stays at the new level, and this is achieved by increasing pe_{t-2}, pe_{t-1} and pe_t by the same amount.

Q4. Wooldridge C10.10 (intdef_c10_10.do)
• (i) The sample correlation between inf and def is only about 0.098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.

• (ii) The equation with the lags is
  i3_t = 1.61 + 0.343 inf_t + 0.382 inf_{t-1} - 0.190 def_t + 0.569 def_{t-1}
         (0.40)  (0.125)       (0.134)          (0.221)        (0.197)
  n = 55, R² = 0.685, adjusted R² = 0.660.

• (iii) The estimated LRP of i3 with respect to inf is 0.343 + 0.382 = 0.725, which is somewhat larger than the 0.606 obtained from the static model in (10.15). But the estimates are fairly close considering the size and significance of the coefficient on inf_{t-1}.

• (iv) The F statistic for the joint significance of inf_{t-1} and def_{t-1} is about 5.22, with p-value ≈ 0.009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.
