Regression with One Regressor: Hypothesis Tests and Confidence Intervals


Page 1: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Regression with One Regressor:

Hypothesis Tests and Confidence Intervals

Page 2: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Hypothesis Testing
Confidence Intervals
Regression when X is Binary
Heteroskedasticity and Homoskedasticity
Weighted Least Squares
Summary and Assessment

Page 3: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Hypothesis Testing and the Standard Error of \hat{\beta}_1

Suppose a skeptic suggests that reducing the number of students in a class has no effect on learning or, specifically, test scores. The skeptic thus asserts the hypothesis

H_0: \beta_1 = 0.

We wish to test this hypothesis using data, and reach a tentative conclusion as to whether it is correct or not.

Page 4: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Null hypothesis and two-sided alternative:

H_0: \beta_1 = 0 vs. H_1: \beta_1 \neq 0

or, more generally,

H_0: \beta_1 = \beta_{1,0} vs. H_1: \beta_1 \neq \beta_{1,0},

where \beta_{1,0} is the hypothesized value under the null.

Null hypothesis and one-sided alternative:

H_0: \beta_1 = \beta_{1,0} vs. H_1: \beta_1 < \beta_{1,0}

In economics, it is almost always possible to come up with stories in which an effect could “go either way,” so it is standard to focus on two-sided alternatives.

Page 5: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

In general,

t = \frac{\text{estimator} - \text{hypothesized value}}{\text{standard error of the estimator}},

where the standard error of the estimator is the square root of an estimator of the variance of the estimator.

Applied to a hypothesis about \beta_1:

t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)},

where \beta_{1,0} is the hypothesized value of \beta_1.

Page 6: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Formula for SE(\hat{\beta}_1)

Recall the expression for the variance of \hat{\beta}_1 (large n):

var(\hat{\beta}_1) = \frac{var[(X_i - \mu_X) u_i]}{n (\sigma_X^2)^2} = \frac{\sigma_v^2}{n (\sigma_X^2)^2}, where v_i = (X_i - \mu_X) u_i.

The estimator of the variance of \hat{\beta}_1 replaces the unknown population values with estimators constructed from the data:

\hat{\sigma}^2_{\hat{\beta}_1} = \frac{1}{n} \times \frac{\frac{1}{n-2} \sum_{i=1}^n (X_i - \bar{X})^2 \hat{u}_i^2}{\left[ \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 \right]^2}

Page 7: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

where \hat{u}_i is the OLS residual. There is no reason to memorize this: it is computed automatically by regression software, and SE(\hat{\beta}_1) = \sqrt{\hat{\sigma}^2_{\hat{\beta}_1}} is reported by regression software. The formula is less complicated than it seems: the numerator estimates var(v) and the denominator estimates [var(X)]^2.
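To make the formula concrete, here is a minimal Python sketch (not part of the original slides) that computes SE(\hat{\beta}_1) directly from the expression above; the simulated data and all variable names are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 420
    x = rng.normal(20.0, 2.0, n)
    u = rng.normal(0.0, 1.0 + 0.5 * np.abs(x - 20.0), n)  # heteroskedastic errors
    y = 700.0 - 2.3 * x + u

    # OLS estimates computed by hand
    xbar, ybar = x.mean(), y.mean()
    beta1 = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()
    beta0 = ybar - beta1 * xbar
    uhat = y - beta0 - beta1 * x                          # OLS residuals

    # Heteroskedasticity-robust variance estimator from the formula above
    numerator = ((x - xbar) ** 2 * uhat ** 2).sum() / (n - 2)
    denominator = (((x - xbar) ** 2).sum() / n) ** 2
    se_beta1 = np.sqrt(numerator / (n * denominator))
    print(beta1, se_beta1)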

Page 8: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

The calculation of the t-statistic:

t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}

Reject at the 5% significance level if |t| > 1.96. The p-value is the probability in the tails of the standard normal distribution outside |t^{act}|:

p = 2\Phi(-|t^{act}|).

Both of the previous statements are based on a large-n approximation; typically n = 50 is large enough for the approximation to be excellent.

Page 9: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Example: Test Scores and STR, California data

Estimated regression line: \widehat{TestScore} = 698.9 - 2.28 \times STR

Regression software reports the standard errors: SE(\hat{\beta}_0) = 10.4, SE(\hat{\beta}_1) = 0.52.

The t-statistic testing \beta_{1,0} = 0 is

t = \frac{-2.28 - 0}{0.52} = -4.38.

The 1% two-sided critical value is 2.58, so we reject the null at the 1% significance level. Alternatively, we can compute the p-value.

Page 10: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

The p-value based on the large-n standard normal approximation to the t-statistic is p = 2\Phi(-4.38) \approx 0.00001.
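As a quick sanity check of that number, a short scipy sketch (using the t-statistic of about -4.38 computed above):

    from scipy.stats import norm

    t_act = -2.28 / 0.52                  # about -4.38
    p_value = 2 * norm.cdf(-abs(t_act))   # two-sided p-value under N(0,1)
    print(round(t_act, 2), p_value)       # about 1e-5, i.e. 0.00001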

Page 11: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Confidence intervals for \beta_1

In general, if the sampling distribution of an estimator is normal for large n, then a 95% confidence interval can be constructed as the estimator \pm 1.96 standard errors, that is,

\hat{\beta}_1 \pm 1.96 \times SE(\hat{\beta}_1).

Page 12: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Example: Test Scores and STR, California data

Estimated regression line: \widehat{TestScore} = 698.9 - 2.28 \times STR

Regression software reports the standard errors: SE(\hat{\beta}_0) = 10.4, SE(\hat{\beta}_1) = 0.52.

95% confidence interval for \beta_1:

-2.28 \pm 1.96 \times 0.52 = (-3.30, -1.26)

Equivalent statements: the 95% confidence interval does not include zero, and the hypothesis \beta_1 = 0 is rejected at the 5% level.
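For reference, a sketch of how regression software produces such an interval, using statsmodels on simulated data (the variable names and simulated coefficients are illustrative, not the actual California file):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    str_ = rng.normal(20.0, 2.0, 420)                # simulated student-teacher ratio
    testscr = 699.0 - 2.3 * str_ + rng.normal(0.0, 18.0, 420)

    X = sm.add_constant(str_)
    res = sm.OLS(testscr, X).fit(cov_type="HC1")     # heteroskedasticity-robust SEs
    print(res.params)                                # intercept and slope estimates
    print(res.conf_int(alpha=0.05))                  # 95% confidence intervals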

Page 13: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

A concise (and conventional) way to report regressions:

Put standard errors in parentheses below the estimated coefficients to which they apply.

\widehat{TestScore} = 698.9 - 2.28 \times STR,  R^2 = .05, SER = 18.6
                      (10.4)  (0.52)

This expression gives a lot of information. The estimated regression line is \widehat{TestScore} = 698.9 - 2.28 \times STR. The standard error of \hat{\beta}_0 is 10.4, and the standard error of \hat{\beta}_1 is 0.52. The R^2 is .05; the standard error of the regression (SER) is 18.6.

Page 15: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Regression when X is Binary

Sometimes a regressor is binary:

X = 1 if female, = 0 if male
X = 1 if treated (experimental drug), = 0 if not
X = 1 if small class size, = 0 if not

Binary regressors are sometimes called "dummy" variables. So far, \beta_1 has been called a "slope," but that doesn't make much sense if X is binary. How do we interpret regression with a binary regressor?

Page 16: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Interpreting regressions with a binary regressor

Y_i = \beta_0 + \beta_1 X_i + u_i, where X is binary (X_i = 0 or 1); that is:

when X_i = 0: E(Y_i | X_i = 0) = \beta_0
when X_i = 1: E(Y_i | X_i = 1) = \beta_0 + \beta_1

so:

\beta_1 = E(Y_i | X_i = 1) - E(Y_i | X_i = 0),

which is the population difference in group means.
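A short numeric sketch (simulated data, illustrative names) confirming that the OLS slope on a binary regressor equals the difference in group means:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    d = rng.integers(0, 2, 400)                      # binary regressor D
    y = 650.0 + 7.4 * d + rng.normal(0.0, 19.0, 400)

    res = sm.OLS(y, sm.add_constant(d)).fit()
    print(res.params[1])                             # OLS slope on D
    print(y[d == 1].mean() - y[d == 0].mean())       # difference in means: identical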

Page 17: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Example: TestScore and STR, California data

Let D_i = 1 if STR_i < 20, and D_i = 0 if STR_i \geq 20.

The OLS estimate of the regression line relating TestScore to D (with standard errors in parentheses) is:

\widehat{TestScore} = 650.0 + 7.4 D
                      (1.3)   (1.8)

Difference in means between groups = 7.4, with standard error 1.8.

Page 18: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Compare the regression results with the group means, computed directly:

Estimation: the difference in sample means, \bar{Y}_{small} - \bar{Y}_{large} = 7.4, is exactly the OLS slope \hat{\beta}_1 from the regression.

Test: the t-statistic for the hypothesis that the two group means are equal, t = 7.4/1.8 \approx 4.0, is the same as the regression t-statistic on D.

Page 19: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Other Regression Statistics

A natural question is how well the regression line “fits” or explains the data. There are two regression statistics that provide complementary measures of the quality of fit:

The regression R^2 measures the fraction of the variance of Y that is explained by X; it is unitless and ranges between zero (no fit) and one (perfect fit).

The standard error of the regression measures the fit – the typical size of a regression residual – in the units of Y.

The R^2

Page 20: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Write Y_i as the sum of the OLS prediction and the OLS residual:

Y_i = \hat{Y}_i + \hat{u}_i

The R^2 is the fraction of the sample variance of Y_i "explained" by the regression, that is, by \hat{Y}_i:

R^2 = \frac{ESS}{TSS},

where ESS = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 and TSS = \sum_{i=1}^n (Y_i - \bar{Y})^2.
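A minimal numeric sketch of this decomposition (simulated data; names are illustrative):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(0.0, 1.0, 200)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, 200)

    # OLS fit computed by hand
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    yhat = b0 + b1 * x

    ess = ((yhat - y.mean()) ** 2).sum()   # explained sum of squares
    tss = ((y - y.mean()) ** 2).sum()      # total sum of squares
    print(ess / tss)                       # the regression R^2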

Page 21: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Heteroskedasticity and Homoskedasticity

Heteroskedasticity, Homoskedasticity, and the Formula for the Standard Errors of \hat{\beta}_0 and \hat{\beta}_1

What do these two terms mean? Consequences of homoskedasticity. Implication for computing standard errors.

Page 22: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

What do these two terms mean? If Var(u|X = x) is constant, that is, if the variance of the conditional distribution of u given X does not depend on X, then u is said to be homoskedastic.

Otherwise, u is said to be heteroskedastic.
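One way to see the distinction is to simulate both cases and compare the spread of u at different values of X; a hedged sketch with purely illustrative numbers:

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.uniform(0.0, 10.0, 1000)
    u_homo = rng.normal(0.0, 1.0, 1000)    # Var(u|X=x) constant
    u_hetero = rng.normal(0.0, 0.3 * x)    # Var(u|X=x) grows with x

    # Compare the conditional spread in two slices of x
    for label, u in [("homoskedastic", u_homo), ("heteroskedastic", u_hetero)]:
        print(label, round(u[x < 2].std(), 2), round(u[x > 8].std(), 2))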

Page 23: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Homoskedasticity in a picture

E(u|X = x) = 0, u satisfies Least Squares Assumption #1.

The variance of u does not depend on x.

Page 24: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Heteroskedasticity in a picture

E(u|X = x) = 0, u satisfies Least Squares Assumption #1.

The variance of u depends on x.

Page 25: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Heteroskedastic or homoskedastic?

Page 26: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Is heteroskedasticity present in the class size data?

Page 27: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Recall the three least squares assumptions:

1. The conditional distribution of u given X has mean zero, that is, E(u|X = x) = 0.
2. (X_i, Y_i), i = 1, ..., n, are i.i.d.
3. Large outliers are rare.

Heteroskedasticity and homoskedasticity concern Var(u|X = x). Because we have not explicitly assumed homoskedastic errors, we have implicitly allowed for heteroskedasticity; that is, so far we have allowed u to be heteroskedastic.

Page 28: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

What if the errors are in fact homoskedastic? You can prove that OLS has the lowest variance among estimators that are linear in Y, a result called the Gauss-Markov theorem.

In addition, the formula for the variance of \hat{\beta}_1 and the OLS standard error simplifies. If Var(u_i | X_i = x) = \sigma_u^2 is constant, then (for large n)

var(\hat{\beta}_1) = \frac{\sigma_u^2}{n \sigma_X^2}.

Note: Var(\hat{\beta}_1) is inversely proportional to Var(X); more spread in X means more information about \beta_1.

Page 29: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

The numerator of Var(\hat{\beta}_1) is Var[(X_i - \mu_X) u_i]. Under homoskedasticity, Var[(X_i - \mu_X) u_i] = E[(X_i - \mu_X)^2 u_i^2] = \sigma_X^2 \sigma_u^2, which is what delivers the simpler formula above.

Page 30: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Gauss-Markov conditions:

(i) E(u_i | X_1, ..., X_n) = 0
(ii) Var(u_i | X_1, ..., X_n) = \sigma_u^2, with 0 < \sigma_u^2 < \infty
(iii) E(u_i u_j | X_1, ..., X_n) = 0 for i \neq j

Gauss-Markov Theorem: Under the Gauss-Markov conditions, the OLS estimator \hat{\beta}_1 is BLUE (Best Linear Unbiased Estimator). That is,

Var(\hat{\beta}_1 | X_1, ..., X_n) \leq Var(\tilde{\beta}_1 | X_1, ..., X_n)

for all linear conditionally unbiased estimators \tilde{\beta}_1.

Page 31: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

\tilde{\beta}_1 = \sum_{i=1}^n a_i Y_i,

where the weights a_1, ..., a_n can depend on X_1, ..., X_n but not on Y_1, ..., Y_n, is a linear estimator; if in addition E(\tilde{\beta}_1 | X_1, ..., X_n) = \beta_1, it is a linear conditionally unbiased estimator.

Page 32: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Proof of the Gauss-Markov Theorem

\tilde{\beta}_1 = \sum_{i=1}^n a_i Y_i is a linear estimator of \beta_1. For \tilde{\beta}_1 to be conditionally unbiased, we need

E(\tilde{\beta}_1 | X_1, ..., X_n) = \beta_0 \sum_{i=1}^n a_i + \beta_1 \sum_{i=1}^n a_i X_i = \beta_1,

which requires \sum_{i=1}^n a_i = 0 and \sum_{i=1}^n a_i X_i = 1.

Page 33: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

With the above two conditions for unbiasedness,

Var(\tilde{\beta}_1 | X_1, ..., X_n) = \sigma_u^2 \sum_{i=1}^n a_i^2.

Let a_i = \hat{w}_i + d_i, where \hat{w}_i = (X_i - \bar{X}) / \sum_{j=1}^n (X_j - \bar{X})^2 are the OLS weights; then \sum_{i=1}^n \hat{w}_i d_i = 0 and

\sum_{i=1}^n a_i^2 = \sum_{i=1}^n \hat{w}_i^2 + \sum_{i=1}^n d_i^2.

Page 34: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Therefore,

Var(\tilde{\beta}_1 | X_1, ..., X_n) = \sigma_u^2 \left( \sum_{i=1}^n \hat{w}_i^2 + \sum_{i=1}^n d_i^2 \right),

so \tilde{\beta}_1 has a greater conditional variance than \hat{\beta}_1 if d_i \neq 0 for any i = 1, ..., n.

If d_i = 0 for all i, then \tilde{\beta}_1 = \hat{\beta}_1, which proves that OLS is BLUE.

Page 35: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

The general formula for the standard error of \hat{\beta}_1 is the square root of

\hat{\sigma}^2_{\hat{\beta}_1} = \frac{1}{n} \times \frac{\frac{1}{n-2} \sum_{i=1}^n (X_i - \bar{X})^2 \hat{u}_i^2}{\left[ \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 \right]^2}   (heteroskedasticity-robust).

The special case under homoskedasticity is the square root of

\tilde{\sigma}^2_{\hat{\beta}_1} = \frac{s_{\hat{u}}^2}{\sum_{i=1}^n (X_i - \bar{X})^2}   (homoskedasticity-only),

where s_{\hat{u}}^2 = \frac{1}{n-2} \sum_{i=1}^n \hat{u}_i^2 is the homoskedasticity-only estimator of \sigma_u^2.

Page 36: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

The homoskedasticity-only formula for the standard error of and the “heteroskedasticity-robust” formula (the formula that is valid under heteroskedasticity) differ. In general, you get different standard errors using the different formulas.

Homoskedasticity-only standard errors are the default setting in regression software, and sometimes the only setting (e.g., Excel). To get the general "heteroskedasticity-robust" standard errors you must override the default.

If you don't override the default and there is in fact heteroskedasticity, you will get the wrong standard errors (and wrong t-statistics and confidence intervals).

Page 37: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

If you use the “, robust” option, STATA computes heteroskedasticity-robust standard errors.

Otherwise, STATA computes homoskedasticity-only standard errors.
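The statsmodels analogue of STATA's “, robust” option is the cov_type argument; a sketch comparing the two sets of standard errors on simulated heteroskedastic data (names and numbers illustrative):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    x = rng.uniform(1.0, 10.0, 500)
    y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, 500) * x   # heteroskedastic errors

    X = sm.add_constant(x)
    default = sm.OLS(y, X).fit()                # homoskedasticity-only SEs (the default)
    robust = sm.OLS(y, X).fit(cov_type="HC1")   # heteroskedasticity-robust SEs
    print(default.bse)                          # the two sets of SEs differ
    print(robust.bse)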

Page 38: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Weighted Least Squares

Since OLS under homoskedasticity is efficient, the traditional approach is to transform a heteroskedastic model into a homoskedastic one.

Suppose the conditional variance of u_i is a function of X_i:

Var(u_i | X_i) = \sigma^2 h(X_i).

Page 39: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Then we can divide both sides of the single-variable regression model by \sqrt{h(X_i)} to obtain

\frac{Y_i}{\sqrt{h(X_i)}} = \beta_0 \frac{1}{\sqrt{h(X_i)}} + \beta_1 \frac{X_i}{\sqrt{h(X_i)}} + \tilde{u}_i,

where \tilde{u}_i = u_i / \sqrt{h(X_i)} is homoskedastic.

The WLS estimator is the OLS estimator obtained by regressing Y_i / \sqrt{h(X_i)} on 1/\sqrt{h(X_i)} and X_i / \sqrt{h(X_i)}.

Page 40: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

However, h(X_i) is usually unknown, so we first have to estimate it to obtain \hat{h}(X_i), and then replace h(X_i) with \hat{h}(X_i). This is called feasible WLS.

More importantly, the functional form of h(·) is usually unknown, so there is no way to systematically estimate h(X_i). This is why, in practice, we usually just run OLS with robust standard errors.
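A hedged WLS sketch, assuming for illustration that h(X) = X^2 is known (the variance function and all data are made up for the example):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    x = rng.uniform(1.0, 10.0, 500)
    y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, 500) * x   # sd proportional to x

    h = x ** 2                                 # assumed h(X), so Var(u|X) = sigma^2 * h(X)
    res_wls = sm.WLS(y, sm.add_constant(x), weights=1.0 / h).fit()

    # Same thing by hand: divide Y, the constant, and X by sqrt(h(X)), then run OLS
    Z = np.column_stack([1.0 / np.sqrt(h), x / np.sqrt(h)])
    res_ols = sm.OLS(y / np.sqrt(h), Z).fit()
    print(res_wls.params, res_ols.params)      # identical coefficient estimates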

Page 41: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

The critical points:

If the errors are homoskedastic and you use the heteroskedastic formula for standard errors (the one we derived), you are OK.

If the errors are heteroskedastic and you use the homoskedasticity-only formula for standard errors, the standard errors are wrong.

The two formulas coincide (when n is large) in the special case of homoskedasticity.

The bottom line: you should always use the heteroskedasticity-based formulas; these are conventionally called heteroskedasticity-robust standard errors.

Page 42: Regression with One Regressor: Hypothesis Tests and Confidence Intervals


The Extended OLS Assumptions (II)

These consist of the three LS assumptions, plus two more:

1. E(u | X = x) = 0.
2. (X_i, Y_i), i = 1, ..., n, are i.i.d.
3. Large outliers are rare (E(Y^4) < \infty, E(X^4) < \infty).
4. u is homoskedastic.
5. u is distributed N(0, \sigma^2).

Assumptions 4 and 5 are more restrictive, so they apply to fewer cases in practice. However, if you make these assumptions, then certain mathematical calculations simplify and you can prove strong results – results that hold if these additional assumptions are true.

Page 43: Regression with One Regressor: Hypothesis Tests and Confidence Intervals


Efficiency of OLS, part II:

Under all five extended LS assumptions – including normally distributed errors – \hat{\beta}_1 has the smallest variance of all consistent estimators (linear or nonlinear functions of Y_1, ..., Y_n), as n \to \infty.

This is a pretty amazing result – it says that, if (in addition to LSA 1-3) the errors are homoskedastic and normally distributed, then OLS is a better choice than any other consistent estimator. And because an estimator that isn’t consistent is a poor choice, this says that OLS really is the best you can do – if all five extended LS assumptions hold. (The proof of this result is beyond the scope of this course and isn’t in SW – it is typically done in graduate courses.)

Page 44: Regression with One Regressor: Hypothesis Tests and Confidence Intervals


Some not-so-good things about OLS

The foregoing results are impressive, but these results – and the OLS estimator – have important limitations.

1. The GM theorem really isn't that compelling:

The condition of homoskedasticity often doesn't hold (homoskedasticity is special).

The result is only for linear estimators – only a small subset of estimators (more on this in a moment).

2. The strongest optimality result ("part II" above) requires homoskedastic normal errors – not plausible in applications (think about the hourly earnings data!).

Page 45: Regression with One Regressor: Hypothesis Tests and Confidence Intervals


Limitations of OLS, ctd.

3. OLS is more sensitive to outliers than some other estimators. In the case of estimating the population mean, if there are big outliers, then the median is preferred to the mean because the median is less sensitive to outliers – it has a smaller variance than the mean when there are outliers. Similarly, in regression, OLS can be sensitive to outliers, and if there are big outliers other estimators can be more efficient (have a smaller variance). One such estimator is the least absolute deviations (LAD) estimator, which solves

\min_{b_0, b_1} \sum_{i=1}^n |Y_i - b_0 - b_1 X_i|.

In virtually all applied regression analysis, OLS is used – and that is what we will do in this course too.
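LAD is median (0.5-quantile) regression, so one way to compute it is statsmodels' QuantReg; a sketch on simulated heavy-tailed data (illustrative throughout):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    x = rng.normal(0.0, 1.0, 300)
    y = 1.0 + 2.0 * x + rng.standard_t(2, 300)   # heavy-tailed errors, i.e. outliers

    X = sm.add_constant(x)
    lad = sm.QuantReg(y, X).fit(q=0.5)           # minimizes the sum of absolute deviations
    ols = sm.OLS(y, X).fit()
    print(lad.params)                            # LAD: less sensitive to the outliers
    print(ols.params)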

Page 46: Regression with One Regressor: Hypothesis Tests and Confidence Intervals


Inference if u is Homoskedastic and Normal: the Student t Distribution

Recall the five extended LS assumptions:

1. E(u | X = x) = 0.
2. (X_i, Y_i), i = 1, ..., n, are i.i.d.
3. Large outliers are rare (E(Y^4) < \infty, E(X^4) < \infty).
4. u is homoskedastic.
5. u is distributed N(0, \sigma^2).

If all five assumptions hold, then:

\hat{\beta}_0 and \hat{\beta}_1 are normally distributed for all n (!)

the t-statistic has a Student t distribution with n – 2 degrees of freedom – this holds exactly for all n (!)

Page 47: Regression with One Regressor: Hypothesis Tests and Confidence Intervals


Normality of the sampling distribution of \hat{\beta}_1 under assumptions 1–5:

\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^n (X_i - \bar{X}) u_i}{\sum_{i=1}^n (X_i - \bar{X})^2} = \frac{1}{n} \sum_{i=1}^n w_i u_i,

where w_i = \frac{X_i - \bar{X}}{\frac{1}{n} \sum_{j=1}^n (X_j - \bar{X})^2}.

What is the distribution of a weighted average of normals? Under assumptions 1–5, conditional on X_1, ..., X_n:

\hat{\beta}_1 - \beta_1 \sim N\left(0, \frac{1}{n^2} \sum_{i=1}^n w_i^2 \sigma_u^2 \right)   (*)

Substituting w_i into (*) yields the homoskedasticity-only variance formula.

Page 48: Regression with One Regressor: Hypothesis Tests and Confidence Intervals


In addition, under assumptions 1–5, under the null hypothesis the t-statistic has a Student t distribution with n – 2 degrees of freedom.

Why n – 2? Because we estimated 2 parameters, \beta_0 and \beta_1.

For n < 30, the t critical values can be a fair bit larger than the N(0,1) critical values. For n > 50 or so, the difference between the t_{n-2} and N(0,1) distributions is negligible. Recall the Student t table of 5% two-sided critical values:

degrees of freedom    5% critical value
10                    2.23
20                    2.09
30                    2.04
60                    2.00
∞                     1.96
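The table's critical values can be reproduced with scipy (a two-sided 5% test uses the 0.975 quantile):

    from scipy.stats import norm, t

    for df in (10, 20, 30, 60):
        print(df, round(t.ppf(0.975, df), 2))    # 2.23, 2.09, 2.04, 2.00
    print("normal", round(norm.ppf(0.975), 2))   # 1.96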

Page 49: Regression with One Regressor: Hypothesis Tests and Confidence Intervals


Practical implication: if n < 50 and you really believe that, for your application, u is homoskedastic and normally distributed, then use the t_{n-2} critical values instead of the N(0,1) critical values for hypothesis tests and confidence intervals.

In most econometric applications, there is no reason to believe that u is homoskedastic and normal – usually, there is good reason to believe that neither assumption holds.

Fortunately, in modern applications, n > 50, so we can rely on the large-n results presented earlier, based on the CLT, to perform hypothesis tests and construct confidence intervals using the large-n normal approximation.

Page 50: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Summary and Assessment

The initial policy question: suppose new teachers are hired so the student-teacher ratio falls by one student per class. What is the effect of this policy intervention (this "treatment") on test scores?

Does our regression analysis give a convincing answer? Not really: districts with low STR tend to be ones with lots of other resources and higher-income families, which provide kids with more learning opportunities outside school. This suggests that corr(STR_i, u_i) \neq 0,

so that E(u_i | X_i) \neq 0 and \hat{\beta}_1 is biased.

Page 51: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Digression on Causality

The original question (what is the quantitative effect of an intervention that reduces class size?) is a question about a causal effect: the effect on Y of applying a unit of the treatment is \beta_1.

But what, precisely, is a causal effect? The common-sense definition of causality isn't precise enough for our purposes. In this course, we define a causal effect as the effect that is measured in an ideal randomized controlled experiment.

Page 52: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Ideal Randomized Controlled Experiment

Ideal: subjects all follow the treatment protocol – perfect compliance, no errors in reporting, etc.!

Randomized: subjects from the population of interest are randomly assigned to a treatment or control group (so there are no confounding factors).

Controlled: having a control group permits measuring the differential effect of the treatment.

Experiment: the treatment is assigned as part of the experiment; the subjects have no choice, which means that there is no "reverse causality" in which subjects choose the treatment they think will work best.

Page 53: Regression with One Regressor: Hypothesis Tests and Confidence Intervals

Back to class size: what is an ideal randomized controlled experiment for measuring the effect on TestScore of reducing STR? How does our regression analysis of observational data differ from this ideal?

The treatment is not randomly assigned. In the US – in our observational data – districts with higher family incomes are likely to have both smaller classes and higher test scores.

As a result, it is plausible that E(u_i | X_i = x) \neq 0. If so, Least Squares Assumption #1 does not hold, and \hat{\beta}_1 is biased: does an omitted factor make class size seem more important than it really is?