Econometrics I, Testing - Stanford Universitydoubleh/eco270/testing.pdf · 2016. 12. 13. · Simple...

Econometrics I, Testing

Department of Economics, Stanford University

September, 2008

Part I

Hypothesis Testing

• Null hypothesis H0. Alternative hypothesis HA.

• Sample $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$.

• Rejection region: $R \subset \mathbb{R}^n$ such that $H_0$ is rejected if $x \in R$.

• Simple hypothesis; composite hypothesis.

• Simple H0 vs simple HA;

• Simple H0 vs Composite HA;

• Composite H0 vs Composite HA.

Principle of testing (of finding R):

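• Find a scalar statistic $t(x)$ such that under the null $H_0$:

$$t(x) \xrightarrow{p} 0,$$

while under the alternative $H_A$:

$$t(x) \xrightarrow{p} c > 0.$$

• Derive the (asymptotic or exact) distribution of $t(x)$ under $H_0$:

$$\text{Under } H_0: \quad P\left(a_n t(x) \le u\right) \to F(u),$$

where $a_n \to \infty$ as $n \to \infty$.

• Reject $H_0$ if $a_n t(x)$ is larger than the critical value $F^{-1}(1-\alpha)$, the $(1-\alpha)$th quantile of $F$.

• Under $H_A$: $a_n t(x) \to \infty$.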

Trinity of tests

• Likelihood ratio test.

• Need to estimate model under both H0 and HA.

• Wald test.

• No need to estimate under H0.

• Score function test.

• Only need to estimate under H0.

Properties of tests

• Type I error: error of rejecting H0 when it is true.

• Size: probability of type I error (under H0 by definition).

$$P\left(a_n t(x) \ge F^{-1}(1-\alpha)\right) \approx \alpha \quad \text{when } H_0 \text{ is true.}$$

• Type II error: error of accepting H0 when HA is true.

• Power: probability of rejecting H0 when HA is true.

• Therefore, power = 1 - P(type II error) = 1 - β.

• Power of asymptotic test is usually 1:

$$P\left(a_n t(x) \ge F^{-1}(1-\alpha)\right) \to 1 \quad \text{when } H_A \text{ is true.}$$

• Consistent test. Local asymptotic power.

Simple null versus simple alternative

• Definition 9.2.2: Let $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$ be the characteristics of two tests. The first test is better (more powerful) than the second test if $\alpha_1 \le \alpha_2$ and $\beta_1 \le \beta_2$, with a strict inequality holding for at least one of the two.

• Definition 9.2.4: $R$ is the most powerful test of size $\alpha$ if $\alpha(R) = \alpha$ and, for any test $R_1$ of size $\alpha$, $\beta(R) \le \beta(R_1)$. (It may not be unique.)

• Definition 9.5.2: $R$ is the most powerful test of level $\alpha$ if $\alpha(R) \le \alpha$ and, for any test $R_1$ of level $\alpha$ (that is, such that $\alpha(R_1) \le \alpha$), $\beta(R) \le \beta(R_1)$.

• "Level $\alpha$" is needed with a discrete sample, when $\alpha(R) \ne \alpha$ for all $R$. It is not needed with a randomized test.

• Randomized test: toss a coin before deciding which of $R_1$ and $R_2$ to use. Can achieve any desired $\alpha$.
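The randomization idea can be illustrated numerically. The sketch below is not from the slides: it assumes $X \sim$ Binomial$(n, p_0)$ under $H_0$ and rejects for large $X$, and the helper name `randomized_cutoff` is hypothetical. It rejects outright when $X > k$ and with probability `gamma` when $X = k$, which is one way to hit the size $\alpha$ exactly.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def randomized_cutoff(n, p0, alpha):
    """Return (k, gamma): reject if X > k, reject with prob. gamma if X == k.

    The size is P(X > k) + gamma * P(X = k), which equals alpha exactly.
    """
    tail = 0.0  # running value of P(X > k) under H0
    for k in range(n, -1, -1):
        pk = binom_pmf(k, n, p0)
        if tail + pk > alpha:
            return k, (alpha - tail) / pk
        tail += pk
    raise ValueError("alpha must be below 1")

n, p0, alpha = 10, 0.5, 0.05
k, gamma = randomized_cutoff(n, p0, alpha)
size = sum(binom_pmf(j, n, p0) for j in range(k + 1, n + 1)) + gamma * binom_pmf(k, n, p0)
print(k, round(gamma, 4), round(size, 4))
```

Without randomization, the attainable sizes in this assumed setup jump from about 0.011 (reject if $X > 8$) to about 0.055 (reject if $X > 7$), which is why "level $\alpha$" is needed for nonrandomized tests.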

Likelihood Ratio Test

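• Simple $H_0$ versus simple $H_1$: $H_0: \theta = \theta_0$, $H_1: \theta = \theta_1$. Reject if

$$\frac{L(x|\theta_1)}{L(x|\theta_0)} > c, \quad \text{or} \quad R = \left\{ x : \frac{L(x|H_1)}{L(x|H_0)} > c \right\}.$$

• Simple $H_0: \theta = \theta_0$, composite $H_1$, for example $H_1: \theta > \theta_0$:

$$LR = \frac{\sup_{\theta \in \Theta_0 \cup \Theta_1} L(x|\theta)}{L(x|\theta_0)} = \frac{L(x|\hat\theta_{MLE})}{L(x|\theta_0)} \ge 1.$$

$H_0 \cup H_1 = \{\theta : \theta \ge \theta_0\}$ is the parameter space over which we calculate the MLE.

• Composite $H_0$, composite $H_1$. Reject if

$$LR = \frac{\sup_{\theta \in \Theta_0 \cup \Theta_1} L(x|\theta)}{\sup_{\theta \in \Theta_0} L(x|\theta)} > c.$$

Use $H_0 \cup H_1$ in the numerator for convenience, because this is just the unconstrained MLE.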

Neyman-Pearson lemma

• In simple $H_0$ versus simple $H_1$ tests, the LR test is the most powerful test (of a given size).

• Bayes testing: loss matrix,

                State of nature
Decision        H0        H1
H0              0         γ2
H1              γ1        0

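• Bayesian expected loss

$$\begin{aligned}
\phi(R) &= \underbrace{\gamma_1 \pi(H_0)}_{\eta_0} P(x \in R | H_0) + \underbrace{\gamma_2 \pi(H_1)}_{\eta_1} P(x \in R^c | H_1) \\
&\equiv \eta_0 \alpha(R) + \eta_1 \beta(R) \\
&= \eta_0 \int_{x \in R} f(x|H_0)\,dx + \eta_1 \int_{x \in R^c} f(x|H_1)\,dx.
\end{aligned}$$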

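• Optimal $R$ that minimizes $\phi(R)$:

$$\begin{aligned}
R_0 &= \left\{ x : \eta_0 f(x|H_0) < \eta_1 f(x|H_1) \right\} = \left\{ x : \frac{L(x|H_1)}{L(x|H_0)} > \frac{\eta_0}{\eta_1} \right\} \\
&= \left\{ x : \log L(x|H_1) - \log L(x|H_0) > \log\left(\frac{\eta_0}{\eta_1}\right) \right\}.
\end{aligned}$$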

• Likelihood ratio test minimizes Bayes risk.

• It is also most powerful.

• φ (R) is a linear combination of size and type II error.

• No R1 s.t. α (R1) = α (R0) and β (R1) < β (R0).

• Otherwise φ (R1) < φ (R0).

• Frequentist: pick η0/η1 for the desired size.
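A small numerical check of the claim that the likelihood ratio region minimizes $\phi(R)$. This is only a sketch under assumptions that are not in the slides: a single observation $X \sim N(\mu, 1)$ with $\mu_0 = 0$, $\mu_1 = 1$, weights $\eta_0 = 2$, $\eta_1 = 1$, and a search restricted to one-sided regions $R = \{x > t\}$. The function names are hypothetical.

```python
from math import log
from statistics import NormalDist

Z = NormalDist()  # standard normal

def bayes_risk(t, eta0, eta1, mu0=0.0, mu1=1.0):
    """phi(R) = eta0 * alpha(R) + eta1 * beta(R) for R = {x > t}, X ~ N(mu, 1)."""
    alpha = 1.0 - Z.cdf(t - mu0)  # P(x in R | H0)
    beta = Z.cdf(t - mu1)         # P(x not in R | H1)
    return eta0 * alpha + eta1 * beta

def lr_threshold(eta0, eta1, mu0=0.0, mu1=1.0):
    """Solve log L(x|H1) - log L(x|H0) = log(eta0 / eta1) for x (unit variance)."""
    return log(eta0 / eta1) / (mu1 - mu0) + 0.5 * (mu0 + mu1)

eta0, eta1 = 2.0, 1.0
t_star = lr_threshold(eta0, eta1)
grid = [i / 100 for i in range(-300, 501)]
t_best = min(grid, key=lambda t: bayes_risk(t, eta0, eta1))
print(round(t_star, 4), t_best)
```

The grid minimizer should sit next to the threshold implied by $\log(\eta_0/\eta_1)$; changing $\eta_0/\eta_1$ moves the threshold and hence the size, which is the frequentist reading in the last bullet.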

• Example. H0 : µ = µ0, HA : µ = µ1. Assume µ1 > µ0.

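• $X_t$, $t = 1, \ldots, n$, i.i.d. $N(\mu, \sigma^2)$, with $\sigma^2$ known.

• Log likelihood ratio:

$$-\frac{1}{2\sigma^2} \sum_t (x_t - \mu_1)^2 + \frac{1}{2\sigma^2} \sum_t (x_t - \mu_0)^2 = \frac{n(\mu_1 - \mu_0)}{\sigma^2}\,\bar{x} + c,$$

where the constant $c$ does not depend on the data.

• Best rejection region: $\bar{x} > d$ such that

$$P(\bar{x} > d \mid H_0) = \alpha.$$

• Note that $d$ does not depend on the value of $\mu_1$, as long as $\mu_1 > \mu_0$.

• Under $H_0$:

$$\begin{aligned}
P(\bar{x} > d \mid H_0) &= P\left( \frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma} > \frac{\sqrt{n}(d - \mu_0)}{\sigma} \right) = P\left( Z > \frac{\sqrt{n}(d - \mu_0)}{\sigma} \right) \\
&= 1 - \Phi\left( \frac{\sqrt{n}(d - \mu_0)}{\sigma} \right) = \alpha.
\end{aligned}$$

Solve for $d$:

$$d = \Phi^{-1}(1 - \alpha)\,\frac{\sigma}{\sqrt{n}} + \mu_0.$$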
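The formula for $d$ is easy to evaluate and to check against the size requirement. A minimal sketch using only the Python standard library; the numbers ($\mu_0 = 0$, $\sigma = 1$, $n = 25$, $\alpha = 0.05$) and the function names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def critical_value(mu0, sigma, n, alpha):
    """d = Phi^{-1}(1 - alpha) * sigma / sqrt(n) + mu0."""
    return NormalDist().inv_cdf(1.0 - alpha) * sigma / sqrt(n) + mu0

def size(d, mu0, sigma, n):
    """P(xbar > d | H0), using xbar ~ N(mu0, sigma^2 / n) under H0."""
    return 1.0 - NormalDist(mu0, sigma / sqrt(n)).cdf(d)

d = critical_value(mu0=0.0, sigma=1.0, n=25, alpha=0.05)
print(round(d, 4), round(size(d, 0.0, 1.0, 25), 4))
```

Note that `critical_value` takes no argument for $\mu_1$, in line with the remark that $d$ does not depend on the alternative.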

• Simple Null vs Composite Alternative.

• H0 : θ = θ0 against H1 : θ ∈ Θ1.

• Power function for a test R

Q (θ) = P (x ∈ R|θ) .

• Q (θ0) = α. Q (θ1) = 1− β for θ1 ∈ Θ1.

• Definition 9.4.2. $R_1$ is uniformly better than $R_2$ if $Q_1(\theta_0) = Q_2(\theta_0)$, $Q_1(\theta) \ge Q_2(\theta)$ for all $\theta \in \Theta_1$, and $Q_1(\theta_1) > Q_2(\theta_1)$ for at least one $\theta_1 \in \Theta_1$.

• Definition 9.4.3. A test $R$ is uniformly most powerful (UMP) if it is uniformly weakly better than any other test with the same size (the same $Q(\theta_0)$).

• UMP tests often do not exist.

• Likelihood ratio test usually is UMP if UMP exists.

• Likelihood ratio test often used even without UMP.

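• LR test: reject if

$$\log L(\theta_0) - \sup_{\theta \in \{\theta_0\} \cup \Theta_1} \log L(\theta) < c.$$

• The second term is just the log likelihood evaluated at the maximum likelihood estimator:

$$\log L(\hat\theta) = \sup_{\theta \in \{\theta_0\} \cup \Theta_1} \log L(\theta).$$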

• An admissible test is a test R0 such that there is no other test R with the same size, where PR(R|θ) ≥ PR(R0|θ) for all θ, and PR(R|θ) > PR(R0|θ) for some θ.

• Example. H0 : µ = µ0, HA : µ > µ0.

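• $X_t \sim N(\mu, \sigma^2)$ with $\sigma^2$ known.

• Log likelihood ratio statistic:

$$\begin{aligned}
LR &= -\frac{1}{2\sigma^2} \sum_t (x_t - \mu_0)^2 - \sup_{\mu \ge \mu_0} \left[ -\frac{1}{2\sigma^2} \sum_t (x_t - \mu)^2 \right] \\
&= -\frac{1}{2\sigma^2}\, n (\bar{x} - \mu_0)^2 + \inf_{\mu \ge \mu_0} \frac{1}{2\sigma^2}\, n (\bar{x} - \mu)^2.
\end{aligned}$$

• If $\bar{x} \le \mu_0$, then $\inf_{\mu \ge \mu_0} \frac{1}{2\sigma^2} n (\bar{x} - \mu)^2 = \frac{1}{2\sigma^2} n (\bar{x} - \mu_0)^2$, so $LR = 0$.

• Do not reject.

• If $\bar{x} > \mu_0$, the infimum is 0 and $LR = -\frac{1}{2\sigma^2} n (\bar{x} - \mu_0)^2$; reject if $\bar{x} > d$.

• This test is UMP.

• It is the same test as for simple $H_0: \mu = \mu_0$ versus $H_1: \mu = \mu_1$ for any $\mu_1 > \mu_0$, with the same $d$ as before. Reject if

$$\bar{x} > d = \Phi^{-1}(1 - \alpha)\,\frac{\sigma}{\sqrt{n}} + \mu_0.$$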

• Power function, for µ > µ0,

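$$\begin{aligned}
\text{Power}(\mu) &= P(\bar{x} > d \mid \mu) = P\left( \frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma} \ge \Phi^{-1}(1 - \alpha) \,\Big|\, \mu \right) \\
&= P\left( \frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} \ge \Phi^{-1}(1 - \alpha) + \frac{\sqrt{n}(\mu_0 - \mu)}{\sigma} \,\Big|\, \mu \right) \\
&= P\left( Z \ge \Phi^{-1}(1 - \alpha) + \frac{\sqrt{n}(\mu_0 - \mu)}{\sigma} \right) \\
&= 1 - \Phi\left( \Phi^{-1}(1 - \alpha) + \frac{\sqrt{n}(\mu_0 - \mu)}{\sigma} \right).
\end{aligned}$$

$\text{Power}(\mu) \to 1$ as $n \to \infty$, or as $\mu \to \infty$.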

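• Example: $H_0: \mu = \mu_0$, $H_A: \mu \ne \mu_0$.

• $X_t \sim N(\mu, \sigma^2)$ with $\sigma^2$ known.

• Log likelihood ratio statistic:

$$\begin{aligned}
LR &= -\frac{1}{2\sigma^2} \sum_t (x_t - \mu_0)^2 - \sup_{\mu \ne \mu_0} \left[ -\frac{1}{2\sigma^2} \sum_t (x_t - \mu)^2 \right] \\
&= -\frac{1}{2\sigma^2}\, n (\bar{x} - \mu_0)^2 + \inf_{\mu \ne \mu_0} \frac{1}{2\sigma^2}\, n (\bar{x} - \mu)^2 \\
&= -\frac{1}{2\sigma^2}\, n (\bar{x} - \mu_0)^2.
\end{aligned}$$

• Reject if $|\bar{x} - \mu_0| > d$, where

$$P(|\bar{x} - \mu_0| > d \mid H_0) = \alpha.$$

• Not a UMP test. It is not the Neyman-Pearson test against, e.g., any $\mu > \mu_0$. But it is UMP among tests that assign equal power to values of $\mu$ equidistant from $\mu_0$.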
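To see concretely why the two-sided test is not UMP, one can compare its power with that of the one-sided test of the same size. A rough sketch with illustrative numbers ($\mu_0 = 0$, $\sigma = 1$, $n = 25$, $\alpha = 0.05$); the function names are assumptions introduced here.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def power_one_sided(mu, mu0, sigma, n, alpha):
    """Power of 'reject if xbar > d' with d = z_{1-alpha} * sigma / sqrt(n) + mu0."""
    shift = sqrt(n) * (mu - mu0) / sigma
    return 1.0 - Z.cdf(Z.inv_cdf(1.0 - alpha) - shift)

def power_two_sided(mu, mu0, sigma, n, alpha):
    """Power of 'reject if |xbar - mu0| > d' with d = z_{1-alpha/2} * sigma / sqrt(n)."""
    shift = sqrt(n) * (mu - mu0) / sigma
    z = Z.inv_cdf(1.0 - alpha / 2.0)
    return (1.0 - Z.cdf(z - shift)) + Z.cdf(-z - shift)

for mu in (-0.3, 0.0, 0.3):
    print(mu,
          round(power_one_sided(mu, 0.0, 1.0, 25, 0.05), 4),
          round(power_two_sided(mu, 0.0, 1.0, 25, 0.05), 4))
```

Both tests have size $\alpha$ at $\mu_0$. The one-sided test has more power for $\mu > \mu_0$ and almost none for $\mu < \mu_0$, so neither test dominates the other over $\mu \ne \mu_0$.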

• Composite H0 vs Composite H1: UMP test may exist.

• Example: X ∼ N (µ, 1), H0 : µ ≤ 0, H1 : µ > 0.

• Recall that $\alpha(R) = \sup_{\mu \in H_0} P(X \in R \mid \mu)$.

• The convention is not to use the "size function", but instead to use the "least favorable null distribution".

• Suppose you reject if $X > \Phi^{-1}(1 - \alpha) = Z_{1-\alpha}$. Then

$$\begin{aligned}
\alpha(R) &= \sup_{\mu \le 0} P(X > Z_{1-\alpha} \mid \mu) \\
&= \sup_{\mu \le 0} P(X - \mu > Z_{1-\alpha} - \mu) \\
&= \sup_{\mu \le 0} \left( 1 - \Phi(Z_{1-\alpha} - \mu) \right) = 1 - \Phi(Z_{1-\alpha}) = \alpha.
\end{aligned}$$

• By definition, any test of size $\alpha$ for composite $H_0$ against composite $H_1$ will also have size at most $\alpha$ at the least favorable null $\mu = 0$, and therefore the above test is UMP. It is the same as the one-sided test of $H_0: \mu = 0$ against $H_1: \mu > 0$.
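The supremum in the size calculation can be checked on a grid. A minimal sketch; the grid over $\mu \le 0$ and the name `rejection_prob` are choices made here for illustration.

```python
from statistics import NormalDist

Z = NormalDist()

def rejection_prob(mu, alpha):
    """P(X > z_{1-alpha} | mu) for X ~ N(mu, 1)."""
    return 1.0 - Z.cdf(Z.inv_cdf(1.0 - alpha) - mu)

alpha = 0.05
null_grid = [(0 - i) / 10 for i in range(0, 31)]  # mu = 0, -0.1, ..., -3
probs = [rejection_prob(mu, alpha) for mu in null_grid]
size = max(probs)
least_favorable = null_grid[probs.index(size)]
print(least_favorable, round(size, 4))
```

The rejection probability is increasing in $\mu$, so over the null it is largest at the boundary $\mu = 0$, the least favorable null.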

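• Wald tests: $H_0: \theta = \theta_0$. $H_A: \theta \ne \theta_0$.

• Typically $\sqrt{n}(\hat\theta - \theta) \xrightarrow{d} N(0, \Sigma)$.

• Intend to reject if $\hat\theta - \theta_0$ is sufficiently large.

• What metric to use to measure the distance $|\hat\theta - \theta_0|$?

• Use the quadratic norm:

$$a_n t(x) = n (\hat\theta - \theta_0)' \Sigma^{-1} (\hat\theta - \theta_0) \xrightarrow{d} \chi^2_{\dim(\theta)}.$$

• If $\hat\theta$ is the MLE: $\Sigma = H^{-1} S H^{-1}$, and if the model is also correctly specified, $\Sigma = -H^{-1} = S^{-1}$.

• Why use this particular weighting matrix? So that the limit has no nuisance parameters.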

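Theorem: Suppose $X$ is a $J$-vector distributed as $N(\mu, \Sigma)$. Then $(X - \mu)' \Sigma^{-1} (X - \mu) \sim \chi^2_J$.

Proof: $\Sigma$ is symmetric and positive definite, so use an eigenvalue decomposition:

$$\Sigma = H \Lambda H'.$$

$\Lambda$ is a diagonal matrix with the positive characteristic roots of $\Sigma$ on the main diagonal. $H$ is orthonormal: $HH' = I \Rightarrow H' = H^{-1}$. Let $\Sigma^{-1/2} = H \Lambda^{-1/2} H'$. Then

$$\begin{aligned}
\Sigma^{-1/2} \Sigma^{-1/2} &= H \Lambda^{-1/2} H' H \Lambda^{-1/2} H' = H \Lambda^{-1/2} \Lambda^{-1/2} H' = H \Lambda^{-1} H', \\
\Sigma^{-1} &= (H \Lambda H')^{-1} = H'^{-1} \Lambda^{-1} H^{-1} = H \Lambda^{-1} H' = \Sigma^{-1/2} \Sigma^{-1/2}, \\
\Sigma^{-1/2} \Sigma \Sigma^{-1/2} &= H \Lambda^{-1/2} H' H \Lambda H' H \Lambda^{-1/2} H' = HH' = I.
\end{aligned}$$

Therefore

$$\Sigma^{-1/2} (X - \mu) \sim N\left( 0, \Sigma^{-1/2} \Sigma \Sigma^{-1/2} \right) = N(0, I),$$

$$\left( \Sigma^{-1/2} (X - \mu) \right)' \left( \Sigma^{-1/2} (X - \mu) \right) \sim \chi^2_J.$$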
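The theorem can be checked by simulation for $J = 2$. The sketch below is an illustration under assumed values of $\mu$ and $\Sigma$, not part of the slides. For simplicity it draws from $N(\mu, \Sigma)$ with a hand-coded Cholesky factor and inverts $\Sigma$ in closed form, rather than using the eigenvalue decomposition from the proof. For 2 degrees of freedom the $\chi^2$ tail is $P(\chi^2_2 > q) = e^{-q/2}$, which gives the critical value without a table.

```python
import random
from math import log, sqrt

def quad_form(x, mu, sigma):
    """(x - mu)' Sigma^{-1} (x - mu) for a 2x2 covariance matrix Sigma."""
    (a, b), (_, c) = sigma
    det = a * c - b * b
    u, v = x[0] - mu[0], x[1] - mu[1]
    return (c * u * u - 2.0 * b * u * v + a * v * v) / det

def draw(mu, sigma, rng):
    """One draw from N(mu, Sigma) via the 2x2 Cholesky factor of Sigma."""
    (a, b), (_, c) = sigma
    l11 = sqrt(a)
    l21 = b / l11
    l22 = sqrt(c - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2)

rng = random.Random(0)
mu, sigma = (1.0, -2.0), ((2.0, 0.6), (0.6, 1.0))  # assumed example values
q = [quad_form(draw(mu, sigma, rng), mu, sigma) for _ in range(20000)]
crit = -2.0 * log(0.05)  # 95th percentile of chi-square with 2 d.o.f.
mean_q = sum(q) / len(q)
tail_freq = sum(v > crit for v in q) / len(q)
print(round(mean_q, 3), round(tail_freq, 4))
```

The simulated mean should be close to $J = 2$ and the tail frequency close to 0.05, whatever $\mu$ and $\Sigma$ are: the quadratic form has no nuisance parameters.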

• Wald test for linear combinations:

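• $H_0: A\theta = A\theta_0 = b$, $H_A: A\theta \ne A\theta_0 = b$.

• If $\sqrt{n}(\hat\theta - \theta) \xrightarrow{d} N(0, \Sigma)$, then under $H_0$:

$$\sqrt{n}(A\hat\theta - b) = \sqrt{n}\, A (\hat\theta - \theta) \xrightarrow{d} N(0, A \Sigma A').$$

• Quadratic norm based test statistic:

$$n (A\hat\theta - b)' (A \Sigma A')^{-1} (A\hat\theta - b) = n (\hat\theta - \theta_0)' A' (A \Sigma A')^{-1} A (\hat\theta - \theta_0) \xrightarrow{d} \chi^2_{\#\text{rows of } A}.$$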

• Not UMP.

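• Wald test for nonlinear functions:

$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, \Sigma),$$

$$H_0: g(\theta_0) = 0, \qquad H_1: g(\theta_0) \ne 0,$$

where $g(\theta) = (g_1(\theta), \ldots, g_J(\theta))'$ is $J \times 1$.

• By the Delta method,

$$\sqrt{n}\left( g(\hat\theta) - g(\theta_0) \right) \xrightarrow{d} N\left( 0, R(\theta_0) \Sigma R(\theta_0)' \right),$$

where the $J \times d_\theta$ matrix

$$R(\theta_0) = \frac{\partial g(\theta_0)}{\partial \theta'} = \begin{pmatrix} \frac{\partial g_1(\theta_0)}{\partial \theta_1} & \cdots & \frac{\partial g_1(\theta_0)}{\partial \theta_{d_\theta}} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_J(\theta_0)}{\partial \theta_1} & \cdots & \frac{\partial g_J(\theta_0)}{\partial \theta_{d_\theta}} \end{pmatrix}.$$

• Define $A = R(\theta_0) \Sigma R(\theta_0)'$. Under $H_0: g(\theta_0) = 0$,

$$\sqrt{n}\, g(\hat\theta) \xrightarrow{d} N(0, A).$$

• Let $A = H \Lambda H'$. Define $A^{-1/2} = H \Lambda^{-1/2} H'$. Then

$$\sqrt{n}\, A^{-1/2} g(\hat\theta) \xrightarrow{d} N\left( 0, A^{-1/2} A A^{-1/2} \right) = N(0, I),$$

and by the continuous mapping theorem

$$\left( \sqrt{n}\, A^{-1/2} g(\hat\theta) \right)' \left( \sqrt{n}\, A^{-1/2} g(\hat\theta) \right) \xrightarrow{d} \chi^2_J.$$

Therefore,

$$W = n\, g(\hat\theta)' A^{-1} g(\hat\theta) \xrightarrow{d} \chi^2_J.$$

Reject if $W > \chi^2_{J,\alpha}$.

• Also need $\hat{A} \xrightarrow{p} A$, where $\hat{A} = R(\hat\theta)\, \hat\Sigma\, R(\hat\theta)'$. Define the feasible statistic

$$\hat{W} = n\, g(\hat\theta)' \hat{A}^{-1} g(\hat\theta).$$

• Use a combination of Slutsky and CMT to show

$$\hat{W} \xrightarrow{d} \chi^2_J.$$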
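A small worked instance of the feasible Wald statistic, for one assumed restriction $g(\theta) = \theta_1 \theta_2 - 1$ (so $J = 1$ and $R(\theta) = (\theta_2, \theta_1)$). The restriction, the numbers, and the function name are hypothetical and chosen only for illustration. With $J = 1$ the $\chi^2_1$ tail probability can be written with the normal CDF, $P(\chi^2_1 > w) = 2(1 - \Phi(\sqrt{w}))$.

```python
from math import sqrt
from statistics import NormalDist

def wald_product_test(theta_hat, sigma_hat, n):
    """Wald statistic for H0: g(theta) = theta1 * theta2 - 1 = 0 (J = 1).

    sigma_hat estimates the asymptotic variance of sqrt(n) * (theta_hat - theta0).
    """
    t1, t2 = theta_hat
    g = t1 * t2 - 1.0
    r = (t2, t1)  # gradient of g: the 1 x 2 matrix R(theta_hat)
    (s11, s12), (s21, s22) = sigma_hat
    # A_hat = R Sigma_hat R', a scalar because J = 1
    a_hat = r[0] * (s11 * r[0] + s12 * r[1]) + r[1] * (s21 * r[0] + s22 * r[1])
    w = n * g * g / a_hat
    p_value = 2.0 * (1.0 - NormalDist().cdf(sqrt(w)))  # chi-square(1) tail
    return w, p_value

w, p = wald_product_test(theta_hat=(2.0, 0.6), sigma_hat=((1.0, 0.0), (0.0, 1.0)), n=100)
print(round(w, 3), round(p, 3))
```

Here $g(\hat\theta) = 0.2$ and $\hat{A} = 0.6^2 + 2^2 = 4.36$, so $\hat{W} = 100 \cdot 0.04 / 4.36 \approx 0.917$, well below the 5% critical value of about 3.84.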

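• Asymptotic LR test: $H_0: \theta = \theta_0$, $H_A: \theta \ne \theta_0$.

$$LR = \log L(\theta_0) - \log L(\hat\theta_{MLE}).$$

• One can prove that under $H_0$,

$$2LR \xrightarrow{d} -\chi^2_{\dim(\theta)}.$$

• Intuitively,

$$2LR \approx \sqrt{n}(\hat\theta_{MLE} - \theta_0)'\, \frac{1}{n} \frac{\partial^2 \log L(\theta)}{\partial\theta\, \partial\theta'}\, \sqrt{n}(\hat\theta_{MLE} - \theta_0).$$

• Not UMP.

• No longer a $\chi^2$ limit for multivariate inequality tests.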

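• Note that $\frac{\partial \log L(\hat\theta_{MLE})}{\partial\theta} = 0$. Using a second order Taylor expansion of $\log L(\theta_0)$ around $\hat\theta$,

$$\begin{aligned}
2LR &= (\hat\theta - \theta_0)'\, \frac{\partial^2 \log L(\theta^*)}{\partial\theta\, \partial\theta'}\, (\hat\theta - \theta_0) \\
&= \sqrt{n}(\hat\theta - \theta_0)'\, \frac{1}{n} \frac{\partial^2 \log L(\theta^*)}{\partial\theta\, \partial\theta'}\, \sqrt{n}(\hat\theta - \theta_0).
\end{aligned}$$

• Note that

$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N\left( 0, H^{-1} S H^{-1} \right) = N\left( 0, -H^{-1} \right).$$

The second equality holds if the model is correct.

$$\frac{1}{n} \frac{\partial^2 \log L(\theta^*)}{\partial\theta\, \partial\theta'} = \frac{1}{n} \sum_{i=1}^n \frac{\partial^2 \log f(X_i|\theta^*)}{\partial\theta\, \partial\theta'}.$$

As $\hat\theta \xrightarrow{p} \theta_0$, also $\theta^* \xrightarrow{p} \theta_0$, so

$$\frac{1}{n} \sum_{i=1}^n \frac{\partial^2 \log f(X_i|\theta^*)}{\partial\theta\, \partial\theta'} \xrightarrow{p} E\, \frac{\partial^2 \log f(X_i|\theta_0)}{\partial\theta\, \partial\theta'} \equiv H.$$

• Therefore by Slutsky and CMT, when the model is correct, for $Z \sim N(0, -H^{-1})$ and $\Sigma = -H^{-1}$:

$$-2LR \xrightarrow{d} -Z'HZ = Z' \Sigma^{-1/2} \Sigma^{-1/2} Z \sim \chi^2_{\dim(\theta)},$$

or equivalently $2LR \xrightarrow{d} -\chi^2_{\dim(\theta)}$.

• If the model is misspecified (but the parameter estimate is still consistent), then for $Z \sim N(0, H^{-1} S H^{-1})$,

$$-2LR \xrightarrow{d} -Z'HZ.$$

This is a kind of "weighted" chi-square, but no longer $\chi^2$. It can be simulated using consistent estimates $\hat{H}$ and $\hat{S}$.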

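• It can be shown, however, that as $n \to \infty$, under $H_0$:

$$2LR = 2\left[ \log L(X_n|\hat\theta) - \log L(X_n|\tilde\theta) \right] \xrightarrow{d} \chi^2_{df = d_R}.$$

Here $\hat\theta$ denotes the unrestricted MLE and $\tilde\theta$ the MLE restricted by $H_0$, so this $2LR$ is nonnegative. The degrees of freedom equal the number of restrictions.

• Even more generally, $H_0: g(\theta) = 0$, $H_1: g(\theta) \ne 0$,

$$\hat\theta = \arg\max_{\theta \in \Theta} L(X|\theta),$$

$$\tilde\theta = \arg\max_{\theta \in \Theta} L(X|\theta) \quad \text{subject to } g(\theta) = 0.$$

It is still true that

$$2LR = 2\left[ \log L(X_n|\hat\theta) - \log L(X_n|\tilde\theta) \right] \xrightarrow{d} \chi^2_{df = d_g},$$

where $d_g$ is the number of constraints in $g(\theta)$, as long as $\frac{\partial g(\theta_0)}{\partial\theta'}$ has full row rank.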
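The $\chi^2$ limit of $2LR$ can be illustrated by simulation. The sketch below assumes a Bernoulli($p$) model with the single restriction $p = p_0$ (so one degree of freedom); the model, sample size, and function name are illustrative assumptions. The 95% critical value of $\chi^2_1$ is obtained as the square of the 97.5% standard normal quantile.

```python
import random
from math import log
from statistics import NormalDist

def two_lr(k, n, p0):
    """2 * [log L(p_hat) - log L(p0)] for k successes in n Bernoulli trials."""
    def loglik(p):
        # Terms with a zero count are dropped, so 0 * log(0) is treated as 0.
        out = 0.0
        if k > 0:
            out += k * log(p)
        if k < n:
            out += (n - k) * log(1.0 - p)
        return out
    return 2.0 * (loglik(k / n) - loglik(p0))  # unrestricted MLE is k / n

rng = random.Random(1)
n, p0, reps = 200, 0.3, 2000
crit = NormalDist().inv_cdf(0.975) ** 2  # 95th percentile of chi-square(1)
rejections = 0
for _ in range(reps):
    k = sum(rng.random() < p0 for _ in range(n))  # data generated under H0
    rejections += two_lr(k, n, p0) > crit
rate = rejections / reps
print(round(crit, 3), rate)
```

The rejection frequency under $H_0$ should be roughly 0.05, though not exactly, because the data are discrete and the $\chi^2$ result is asymptotic.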

• Asymptotic power.

• Under $H_A$, typically $a_n t(x) \to \infty$ with probability approaching 1.

• Under HA, reject w.p. → 1.

• Asymptotic power is 1.

• These are called consistent tests.

• Local alternatives and local power.

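• Local alternative, $H_A: \theta = \theta_0 + \frac{c}{\sqrt{n}}$.

• Example: $H_0: \mu = \mu_0$, $H_A: \mu > \mu_0$. $X_t$ is i.i.d. such that

• $X_t \sim N(\mu, \sigma^2)$ with $\sigma^2$ known.

• It can be shown that $\bar{X}$ is a sufficient statistic that contains all the information about $\mu$.

• Under $H_0$: $\bar{X} \sim N\left( \mu_0, \frac{\sigma^2}{n} \right)$.

• Reject if $\bar{x} > d$, with $d = \frac{\sigma z_\alpha}{\sqrt{n}} + \mu_0$ for a size $\alpha$ test, where $z_\alpha = \Phi^{-1}(1 - \alpha)$, since

$$P\left( \frac{\sqrt{n}(\bar{X} - \mu_0)}{\sigma} > \frac{\sqrt{n}(d - \mu_0)}{\sigma} \,\Big|\, H_0 \right) = \alpha.$$

• Asymptotic power: for fixed $\mu_1 > \mu_0$,

$$\begin{aligned}
P(\bar{x} > d \mid \mu_1) &= P\left( \sqrt{n}\, \frac{\bar{x} - \mu_1}{\sigma} \ge z_\alpha + \sqrt{n}\, \frac{\mu_0 - \mu_1}{\sigma} \right) \\
&= 1 - \Phi\left( z_\alpha + \frac{\mu_0 - \mu_1}{\sigma} \sqrt{n} \right) \to 1.
\end{aligned}$$

$$P(\bar{x} > d \mid \mu_1) \to \begin{cases} 1 & \text{when } \mu_1 \to \infty \text{ holding } n \text{ fixed,} \\ 1 & \text{when } n \to \infty \text{ holding } \mu_1 > \mu_0 \text{ fixed.} \end{cases}$$

• Draw the shape of the power function when $n$ increases.

• If $n = \infty$, there should be no Type II error. Implications:

• Why fix $\alpha$? Since $\text{Power} = 1 - \Phi(Z_{1-\alpha} - \sqrt{n}\mu)$ (taking $\mu_0 = 0$ and $\sigma = 1$), if you really believe in $n \to \infty$, you can choose $\alpha \to 0$, $Z_{1-\alpha} \to \infty$ in such a way that $Z_{1-\alpha} - \sqrt{n}\mu \to -\infty$. Then both size $\to 0$ and $\text{Power}(\mu, n) \to 1$ as $n \to \infty$.

• Local power, for $\mu_1 = \mu_0 + \frac{c}{\sqrt{n}}$ with $c > 0$,

$$P(\bar{x} > d \mid \mu_1) = P\left( \sqrt{n}\, \frac{\bar{x} - \mu_1}{\sigma} \ge z_\alpha - \frac{c}{\sigma} \right) \to 1 - \Phi\left( z_\alpha - \frac{c}{\sigma} \right) > \alpha.$$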
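The contrast between a fixed alternative and a local alternative shows up clearly in numbers. A minimal sketch with assumed values ($\mu_0 = 0$, $\sigma = 1$, $\alpha = 0.05$, $c = 2$, fixed alternative $\mu_1 = 0.5$); the function name `power` is introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def power(mu1, mu0, sigma, n, alpha):
    """P(xbar > d | mu1) = 1 - Phi(z_alpha + sqrt(n) * (mu0 - mu1) / sigma)."""
    z_alpha = Z.inv_cdf(1.0 - alpha)
    return 1.0 - Z.cdf(z_alpha + sqrt(n) * (mu0 - mu1) / sigma)

mu0, sigma, alpha, c = 0.0, 1.0, 0.05, 2.0
for n in (10, 100, 10000):
    fixed = power(mu0 + 0.5, mu0, sigma, n, alpha)          # fixed alternative
    local = power(mu0 + c / sqrt(n), mu0, sigma, n, alpha)  # local alternative
    print(n, round(fixed, 4), round(local, 4))

local_limit = 1.0 - Z.cdf(Z.inv_cdf(1.0 - alpha) - c / sigma)
print(round(local_limit, 4))
```

Power against the fixed alternative climbs to 1 as $n$ grows, while power against the local alternative stays at $1 - \Phi(z_\alpha - c/\sigma)$, strictly between $\alpha$ and 1. This is what makes local power useful for comparing consistent tests.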

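• P-value: probability of rejection if the observed test statistic is used as the critical value.

• Test statistic: $T = T(X_1, \ldots, X_n)$. Reject if $T > c$.

• Let $F(\cdot)$ be the CDF of $T$ under the null $H_0$.

$$\text{Pvalue} = 1 - F(T)$$

• Reject if $T > c$, or equivalently if $\text{Pvalue} < \alpha$. Under $H_0$ the p-value is uniformly distributed on $[0, 1]$:

$$\begin{aligned}
P(\text{Pvalue} < x) &= P(1 - F(T) < x) = P(F(T) > 1 - x) \\
&= P\left( T > F^{-1}(1 - x) \right) = 1 - F\left( F^{-1}(1 - x) \right) = 1 - (1 - x) = x.
\end{aligned}$$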
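The identity $P(\text{Pvalue} < x) = x$ under $H_0$ can be checked by simulation. The sketch assumes, purely for illustration, that the null distribution $F$ of $T$ is standard normal; the function names are hypothetical.

```python
import random
from statistics import NormalDist

Z = NormalDist()  # assumed null distribution F of the statistic T

def p_value(t):
    """Pvalue = 1 - F(T)."""
    return 1.0 - Z.cdf(t)

def frac_below(pvals, x):
    """Empirical estimate of P(Pvalue < x)."""
    return sum(p < x for p in pvals) / len(pvals)

rng = random.Random(2)
pvals = [p_value(rng.gauss(0.0, 1.0)) for _ in range(20000)]  # T drawn under H0
for x in (0.01, 0.05, 0.10, 0.50):
    print(x, round(frac_below(pvals, x), 4))
```

Each printed frequency should be close to its threshold $x$, so rejecting when $\text{Pvalue} < \alpha$ gives a test of size approximately $\alpha$.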

• Relation between Testing and Confidence Set

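• Confidence set: a set $\hat{S}$ such that

$$P(\theta_0 \in \hat{S}) = 1 - \alpha \ \text{(exactly)} \quad \text{or} \quad \to 1 - \alpha \ \text{(asymptotically)}.$$

• Any test of $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$ can be converted into a confidence set.

• Consider a test statistic $T(X, \theta_0)$. Reject if $T(X, \theta_0) > C_{\theta_0, \alpha}$. Let

$$\hat{S} = \{ \theta_0 : \theta_0 \text{ cannot be rejected by the above test} \}.$$

Then

$$P_{\theta_0}(\theta_0 \in \hat{S}) = P_{\theta_0}\left( T(X, \theta_0) \le C_{\theta_0, \alpha} \right) = 1 - \alpha.$$

For example, in the binary case,

$$\frac{\sqrt{n}(\hat{p} - p)}{\sqrt{p(1 - p)}} \xrightarrow{d} N(0, 1), \qquad \frac{\sqrt{n}(\hat{p} - p)}{\sqrt{\hat{p}(1 - \hat{p})}} \xrightarrow{d} N(0, 1).$$

To test $H_0$: the true proportion equals $p$, against $H_1$: it differs from $p$, reject if

$$T_1(\hat{p}, p) = \left| \frac{\sqrt{n}(\hat{p} - p)}{\sqrt{p(1 - p)}} \right| > Z_{1-\alpha/2} \quad \text{or} \quad T_2(\hat{p}, p) = \left| \frac{\sqrt{n}(\hat{p} - p)}{\sqrt{\hat{p}(1 - \hat{p})}} \right| > Z_{1-\alpha/2}.$$

This implies two confidence sets:

$$S_1 = \left\{ p \text{ such that } \left| \frac{\sqrt{n}(\hat{p} - p)}{\sqrt{p(1 - p)}} \right| < Z_{1-\alpha/2} \right\},$$

$$S_2 = \left\{ p \text{ such that } \left| \frac{\sqrt{n}(\hat{p} - p)}{\sqrt{\hat{p}(1 - \hat{p})}} \right| < Z_{1-\alpha/2} \right\}.$$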
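Both confidence sets are intervals and can be computed in closed form. A sketch, assuming $S_1$ is the set built from the statistic with $p(1-p)$ in the denominator and $S_2$ the one with $\hat{p}(1-\hat{p})$; the function names and the example values ($\hat{p} = 0.1$, $n = 50$) are illustrative. For $S_1$, the condition $n(\hat{p} - p)^2 < z^2 p(1 - p)$ is a quadratic inequality in $p$, so the endpoints are its two roots.

```python
from math import sqrt
from statistics import NormalDist

def score_interval(p_hat, n, alpha):
    """S1: invert the test that uses p * (1 - p) in the denominator."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # n (p_hat - p)^2 < z^2 p (1 - p)  <=>  a p^2 + b p + c < 0
    a = n + z * z
    b = -(2.0 * n * p_hat + z * z)
    c = n * p_hat * p_hat
    root = sqrt(b * b - 4.0 * a * c)
    return (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)

def wald_interval(p_hat, n, alpha):
    """S2: invert the test that uses p_hat * (1 - p_hat) in the denominator."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half

lo1, hi1 = score_interval(0.1, 50, 0.05)
lo2, hi2 = wald_interval(0.1, 50, 0.05)
print(round(lo1, 4), round(hi1, 4))
print(round(lo2, 4), round(hi2, 4))
```

The two sets differ noticeably when $\hat{p}$ is near 0 or 1: $S_1$ is not centered at $\hat{p}$ and stays inside $[0, 1]$, while $S_2$ is symmetric around $\hat{p}$.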

• Exact finite sample pivotal test.

• The T-statistic is finite sample pivotal if $X_i$ is from a known location-scale family.

• Suppose $X_i \sim \frac{1}{\sigma} f\left( \frac{X_i - \mu}{\sigma} \right)$, so that $X_i = \mu + \sigma Z_i$, where $Z_i \sim f(\cdot)$ and $f(\cdot)$ is known. Then

$$T = \frac{\sqrt{n}(\bar{X} - \mu)}{\sqrt{\frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2}} = \frac{\sqrt{n}\, \bar{Z}}{\sqrt{\frac{1}{n-1} \sum_{i=1}^n (Z_i - \bar{Z})^2}}.$$

• This distribution can be simulated as long as you know $f(\cdot)$. It is pivotal since it is free of nuisance parameters.

• In general the T-statistic is asymptotically pivotal:

$$T \xrightarrow{d} N(0, 1).$$
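The pivotal property can be seen directly in code: for the same draws $Z_i$, the statistic $T$ is unchanged by $\mu$ and $\sigma$. The sketch below assumes, as an illustration only, that the known density $f$ is Uniform$(-1, 1)$; the function names are hypothetical.

```python
import random
from math import sqrt

def t_stat(z_draws, mu, sigma):
    """T computed from X_i = mu + sigma * Z_i, centered at the true mu."""
    x = [mu + sigma * z for z in z_draws]
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    return sqrt(n) * (xbar - mu) / sqrt(s2)

def simulate_quantile(n, reps, prob, seed=0):
    """Simulated quantile of T when Z_i ~ Uniform(-1, 1) (an assumed known f)."""
    rng = random.Random(seed)
    draws = sorted(
        t_stat([rng.uniform(-1.0, 1.0) for _ in range(n)], 0.0, 1.0)
        for _ in range(reps)
    )
    return draws[int(prob * reps)]

rng = random.Random(3)
z = [rng.uniform(-1.0, 1.0) for _ in range(8)]
t_a, t_b = t_stat(z, 0.0, 1.0), t_stat(z, 5.0, 3.0)
print(t_a, t_b)  # equal up to floating point rounding
q95 = simulate_quantile(n=8, reps=5000, prob=0.95)
print(round(q95, 3))
```

Because $T$ does not depend on $(\mu, \sigma)$, the simulated quantile can serve as an exact finite sample critical value for any member of the assumed family, up to simulation error.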
