Econometrics I, Testing - Stanford Universitydoubleh/eco270/testing.pdf · 2016. 12. 13. · Simple...
Transcript of Econometrics I, Testing - Stanford Universitydoubleh/eco270/testing.pdf · 2016. 12. 13. · Simple...
Econometrics I, Testing
Department of EconomicsStanford University
September, 2008
Part I
Hypothesis Testing
• Null hypothesis H0. Alternative hypothesis HA.
• Sample x = (x1, x2, . . . , xn) ∈ Rn.
• Rejection region: R ∈ Rn such that H0 is rejected if x ∈ R.
• Simple hypothesis; composite hypothesis.
• Simple H0 vs simple HA;
• Simple H0 vs Composite HA;
• Composite H0 vs Composite HA.
Principle of testing (of finding R):
• Find a scalar statistic t (x) such that under the null H0:
t (x)p−→ 0
while under the alternative HA:
t (x)p−→ c > 0.
• Derive (asymptotic or exact) distribution of t (x) under H0:
Under H0: P (ant (x) ≤ x)A∼ F (x)
where an →∞ when n→∞
• Reject H0 if ant (x) is larger than αth critical value of F (x).
• Under HA: ant (x)→∞.
Trinity of tests
• Likelihood ratio test.
• Need to estimate model under both H0 and HA.
• Wald test.
• No need to estimate under H0.
• Score function test.
• Only need to estimate under H0.
Properties of testings
• Type I error: error of rejecting H0 when it is true.
• Size: probability of type I error (under H0 by definition).
P(ant (x) ≥ F−1 (1− α)
)≈ α when H0 is true.
• Type II error: error of accepting H0 when HA is true.
• Power: probability of rejecting H0 when HA is true.
• Therefore, power = 1 - P(type II error) = 1 - β.
• Power of asymptotic test is usually 1:
P(ant (x) ≥ F−1 (1− α)
)→ 1 when HA is true.
• Consistent test. Local asymptotic power.
Simple null versus simple alternative
• Definition 9.2.2: Let (α1, β1) and (α2, β2) be thecharacteristics of two tests. The first test is better (morepowerful) than the second test if α1 ≤ α2 and β1 ≤ β2 with astrict inequality holding for at least one of the ≤.
• Definition 9.2.4: R is the most powerful test of size α ifα (R) = α and for any test R1 of size α, β (R) ≤ β (R1). (Itmay not be unique.)
• Definition 9.5.2: R is the most powerful test of level α ifα (R) ≤ α and for any test R1 of level α (that is, such thatα (R1) ≤ α, β (R) ≤ β (R1).
• “level α” is needed with discrete sample when α (R) 6= α forall R. Not needed with randomized test.
• Randomized test: toss a coin before deciding which of R1 andR2 to use. Can achieve any desired α.
Likelihood Ratio Test
• simple H0 versus simple H1: H0 : θ = θ0, H1 : θ = θ1, reject if
L (x |θ1)
L (x |θ0)> c or R = x :
L (x |H1)
L (x |H0)> c
• Simple H0 : θ = θ0, composite H1, for example, H1 : θ > θ0,
LR =supθ∈Θ0∪Θ1
L (x |θ)
L (x |θ0)=
L(X |θMLE
)L (X |θ0)
≥ 1.
H0 ∪ H1 = θ : θ ≥ θ0 parameter space, over which wecalculate MLE.
• Composite H0, composite H1. Reject if
LR =supθ∈Θ0∪Θ1
L (x |θ)
supθ∈Θ0L (x |θ) > c
use H0 ∪H1 for convenience because this is just unconstrainedMLE.
Neyman-pearson lemma
• In simple H0 versus simple H1 tests, the LR test is the mostpowerful test (of a given size),
• Bayes testing: loss matrix,
State of NatureDecision H0 H1
H0 0 γ2
H1 γ1 0
• Bayesian expected loss
φ (R) = γ1π (H0)︸ ︷︷ ︸η0
P (x ∈ R|H0) + γ2π (H1)︸ ︷︷ ︸η1
P (x ∈ Rc |H1)
≡η0α (R) + η1β (R)
=η0
∫x∈R
f (x|H0) dx + η1
∫x∈Rc
f (x|H1) dx.
• Optimal R that minimizes φ (R):
R0 =x : η0f (x|H0) ≤ η1f (x|H1) = x :L (x|H1)
L (x|H0)>η0
η1
=x : log L (x|H1)− log L (x|H0) > log
(η0
η1
).
• Likelihood ratio test minimizes Bayes risk.
• is also most powerful.
• φ (R) is a linear combination of size and type II error.
• No R1 s.t. α (R1) = α (R0) and β (R1) < β (R0).
• Otherwise φ (R1) < φ (R0).
• Frequentist: pick η0/η1 for the desired size.
• Example. H0 : µ = µ0, HA : µ = µ1. Assume µ1 > µ0.
• Xt , t = 1, . . . i.i.d. N(µ, σ2
). σ2 known.
• Log likelihood ratio:
− 1
2σ2
∑(xt − µ1)2 +
1
2σ2
∑(xt − µ0)2
=n (µ1 − µ0)
σ2x + c
• Best rejection region: x > d such that
P (x > d |H0) = α.
• Note that d does not depend on the value of µ1, as long as
µ1 > µ0
• Under the H0:
P (x > d |H0) =P
(√n (x − µ0)
σ>
√n (d − µ0)
σ
)=P
(Z >
√n (d − µ0)
σ
)= 1− Φ
(√n (d − µ0)
σ
)= α
Solve for d ,
d = Φ−1 (1− α)σ√n
+ µ0
• Simple Null vs Composite Alternative.
• H0 : θ = θ0 against H1 : θ ∈ Θ1.
• Power function for a test R
Q (θ) = P (x ∈ R|θ) .
• Q (θ0) = α. Q (θ1) = 1− β for θ1 ∈ Θ1.
• Definition 9.4.2. R1 is uniformly better than R2 ifQ1 (θ0) = Q2 (θ0), Q1 (θ) ≥ Q2 (θ) ∀θ ∈ Θ1, andQ1 (θ1) > Q2 (θ1) for at least one θ1 ∈ Θ1.
• Definition 9.4.3. A test R is uniformly most powerful (UMP)if it is uniformly weakly better than any other test with thesame size (the same Q (θ0)).
• UMP tests often do not exist.
• Likelihood ratio test usually is UMP if UMP exists.
• Likelihood ratio test often used even without UMP.
• LR test: reject if
log L (θ0)− supθ∈θ0∪Θ1
log L (θ) < c .
• The second part is just the maximum likelihood estimator:
log L(θ)
= supθ∈θ0∪Θ1
log L (θ) .
• An admissible test is a test R0 such that there is no other testR with the same size, where PR (R|θ) ≥ PR (R0|θ) for all θ,and PR (R|θ) > PR (R0|θ) for some θ.
• Example. H0 : µ = µ0, HA : µ > µ0.
• Xt ∼ N(µ, σ2
) with σ2 known.
• log likelihood ratio statistics:
LR =− 1
2σ2
∑(xt − µ0)2 − sup
µ≥µ0
− 1
2σ2
∑(xt − µ)2
=− 1
2σ2n (x − µ0)2 + inf
µ≥µ0
1
2σ2n (x − µ)2
.
• If x ≤ µ0, then infµ≥µ01
2σ2 n (x − µ)2 = 12σ2 n (x − µ0)2.
LR = 0.
• Do not reject.
• If x > µ0, inf = 0, LR = − 12σ2 n (x − µ0)2, reject if x > d .
• This test is UMP.
• Same test for simple H0 : µ = µ0 versus H1 : µ = µ1 for anyµ1 > µ0, same d as before. Reject if
x > d = Φ−1 (1− α)σ√n
+ µ0.
• Power function, for µ > µ0,
Power (µ) =P (x > d |µ) = P
(√n (x − µ0)
σ≥ Φ−1 (1− α) |µ
)=P
(√n (x − µ)
σ≥ Φ−1 (1− α) +
√n (µ0 − µ)
σ|µ)
=P
(Z ≥ Φ−1 (1− α) +
√n (µ0 − µ)
σ
)=1− Φ
(Φ−1 (1− α) +
√n (µ0 − µ)
σ
)Power (µ)→ 1 as n→∞, or µ→∞.
• Example: H0 : µ = µ0, HA : µ 6= µ0.
• Xt ∼ N(µ, σ2
) with σ2 known.
• log likelihood ratio statistics:
LR =− 1
2σ2
∑(xt − µ0)2 − sup
µ 6=µ0
− 1
2σ2
∑(xt − µ)2
=− 1
2σ2n (x − µ0)2 + inf
µ 6=µ0
1
2σ2n (x − µ)2
=− 1
2σ2n (x − µ0)2
.
• Reject if |x − µ0| > d where
P (|x − µ0| > d |H0) = α.
• Not a UMP test. Not Neyman-Pearson against, e.g. anyµ > µ0. But, UMP among tests that assign equal power to µequidistant from µ0.
• To determine d ,
P
(|√n (x − µ0)
σ| ≥√nd
σ
)= α
d = Φ−1 (1− α/2)σ√n.
• Power function, as n→∞ or |µ| → ∞.
P (|x − µ0| > d |µ) = P (x > µ0 + d |µ) + P (x < µ0 − d |µ)
= P (x − µ > µ0 − µ+ d |µ) + P (x − µ < µ0 − µ+ d |µ)
= P
(√n (x − µ)
σ>
√n (µ0 − µ+ d)
σ|µ)
+ P
(√n (x − µ)
σ<
√n (µ0 − µ− d)
σ|µ)
=1− Φ
(√n (µ0 − µ)
σ+ Φ−1 (1− α/2)
)+ Φ
(√n (µ0 − µ)
σ− Φ−1 (1− α/2)
)→ 1
• Composite H0 vs Composite H1: UMP test may exist.
• Example: X ∼ N (µ, 1), H0 : µ ≤ 0, H1 : µ > 0.
• Recall that α (R) = supµ∈H0P (X ∈ R|µ).
• The convention is not to use the ”size function”, but insteaduse the ”least favorable null distribution”.
• What if you reject if X > Φ−1 (1− α). Then
α (R) = supµ≤o
P (X > Z1−α|µ)
= supµ≤0
P (X − µ > Z1−α − µ)
= supµ≤0
(1− Φ (Z1−α − µ)) = 1− Φ (Z1−α) = α
• By definition, any test of size α for composite H0 againstcomposite H1 will also have size at most α at the leastfavorable null µ = 0, and therefore the above test is UMP.Same as the one sided test of H0 : µ = 0 against H1 : µ > 0.
• Wald tests: H0 : θ = θ0. HA : θ 6= θ0.
• Typically√n(θ − θ
)d−→ N (0,Σ).
• Intend to reject if θ − θ0 is sufficient large.
• What metric to use to measure distance |θ − θ0|?
• Use quadratic norm:
ant (x) = n(θ − θ0
)′Σ−1
(θ − θ0
)d−→ χ2
dim(θ).
• If θ is MLE: Σ = H−1SH−1, and if the model is also correctlyspecified, Σ = −H−1 = S−1.
• Why use this particular weighting matrix? So that limit hasno nuisance parameters.
Theorem: Suppose X is a J-vector distributed as N (µ,Σ), then(X − µ)′Σ−1 (X − µ) ∼ χ2
J .Proof: Σ is symmetric and positive definite. So use an eigenvalue-function decomposition:
Σ = HΛH ′
Λ is a diagonal matrix with positive characteris roots of Σ on themain diagonal. H is orthonormal: HH ′ = I ⇒ H ′ = H−1. LetΣ−1/2 = HΛ−1/2H ′. Then
Σ−1/2Σ−1/2 =HΛ−1/2H ′HΛ−1/2H ′ = HΛ−1/2Λ−1/2H ′ = HΛ−1H ′,
Σ−1 =(HΛH ′
)−1= H
′−1Λ−1H−1 = HΛ−1H ′ = Σ−1/2Σ−1/2.
Σ−1/2ΣΣ−1/2 =HΛ−1/2H ′HΛH ′HΛ−1/2H ′ = HH ′ = I .
Therefore
Σ−1/2 (X − µ) ∼ N(
0,Σ−1/2ΣΣ−1/2)
= N (0, I )(Σ−1/2 (X − µ)
)′ (Σ−1/2 (X − µ)
)∼ χ2
J .
• Wald test for linear combinations:
• H0 : Aθ = Aθ0 = b, HA : Aθ 6= Aθ0 = b.
• If√n(θ − θ
)d−→ N (0,Σ), then under the H0:
√n(Aθ − b
)=√nA(θ − θ
)d−→ N (0,AΣA′)
• Quadratic norm based test statistic
n(Aθ − b
)′ (AΣA′
)−1 (Aθ − b
)=n(θ − θ0
)′A′(AΣA′
)−1
A(θ − θ0
)d−→ χ2
rows of A
• Not UMP.
• Wald test for nonlinear functions
√n(θ − θ0
)d−→ N (0,Σ)
H0 : g (θ0) = 0 H1 : g (θ0) 6= 0.
where g (θ) = (g1 (θ) , . . . , gJ (θ))′ is J × 1
• By the Delta method,
√n(g(θ)− g (θ0)
)d−→ N
(0,R (θ0) ΣR (θ0)′
)where the J × dθ matrix,
R (θ0) =∂g (θ0)
∂θ′=
∂g1(θ0)∂θ1
· · · ∂g1(θ0)∂θdθ
· · · · · · · · ·∂gJ(θ0)∂θ1
· · · ∂gJ(θ0)∂θdθ
• Define A = R (θ0) ΣR (θ0)′. Under H0 : g (θ0) = 0,
√ng(θ)
d−→ N (0,A)
• Let A = HΩH ′. Define A−1/2HΛ−1/2H ′.
√nA−1/2g
(θ)
d−→ N(
0,A−1/2AA−1/2)
= N (0, I )
by the continuous mapping theorem(√nA−1/2g
(θ))′ (√
nA−1/2g(θ))
d−→ χ2J
Therefore,
W = ng(θ)′
A−1g(θ)
d−→ χ2J
Reject if W > χ2J,α.
• Also need Ap−→ A: A = R
(θ)
ΣR(θ)
. Define
W = ng(θ)′
A−1g(θ)
d−→ χ2J
• Use a combination of Slutsky and CMT to show
Wd−→ χ2
J
• Asymptotic LR test: H0 : θ = θ0, HA : θ 6= θ0.
LR = log L (θ0)− log L(θMLE
).
• One can prove that under H0,
2LRd−→ −χ2
dim(θ).
• Intuitively,
2LR ≈√n(θMLE − θ0
)′ 1
n
∂2 log L (θ)
∂θ∂θ′√n(θMLE − θ0
).
• Not UMP.
• Not longer χ2 limit for multivariate inequality test.
• Note that∂ log L(θMLE)
∂θ = 0, use a second order Taylorexpansion,
2LR =(θ − θ0
)′ ∂2 log L (θ∗)
∂θ∂θ′
(θ − θ0
)=√n(θ − θ0
)′ 1
n
∂2 log L (θ∗)
∂θ∂θ′√n(θ − θ0
)• Note that
√n(θ − θ0
)d−→ N
(0,H−1SH−1
)= N
(0,−H−1
)The 2nd equality holds if the model is correct.
1
n
∂2 log L (θ∗)
∂θ∂θ′=
1
n
n∑i=1
∂2 log f (Xi |θ∗)∂θ∂θ′
As θp−→ θ0, θ∗
p−→ θ0,
1
n
n∑i=1
∂2 log f (Xi |θ∗)∂θ∂θ′
p−→ E∂2 log f (Xi |θ0)
∂θ∂θ′≡ H.
• Therefore by Slutsky and CMT, when the model is correct, forZ ∼ N
(0,−H−1
):
2LRd−→ −Z ′HZ = Z ′Σ−1/2Σ−1/2Z ∼ χ2
dim(θ)
• If the model is misspecified (but the parameter is stillconsistent), then for Z ∼ N
(0,H−1SH−1
),
2LRd−→ −Z ′HZ
This is some kind of ”weighted” chi-square, but no longer χ2.It can be simulated using consistent estimates H and S .
• More generally, for θ the unconstrained MLE and θ theconstrained MLE:
LR =supθ∈H0∪H1
L (X |θ)
supθ∈H0L (X |θ)
=L(X |θ)
L(X |θ) .
• H0 : Rθ = γ, versus, H0 : Rθ 6= γ, then
θ = arg maxθ∈Θ
L (X |θ)
θ = arg maxθ∈Θ
L (X |θ) subject to Rθ = γ.
Find cα so that
P
R =
X :L(X |θ)
L(X |θ) > cα
|H0
= α.
Difficult to derive the exact finite sample distribution.
• It can be shown however, that as n→∞, under the H0:
2LR = 2[log L
(Xn|θ
)− log L
(Xn|θ
)] d−→ χ2df =dR
The degree of freedom is the number of restrictions.
• Even more generally, H0 : g (θ) = 0, H1 : g (θ) 6= 0,
θ = arg maxθ∈Θ
L (X |θ)
θ = arg maxθ∈Θ
L (X |θ) subject to g (θ) = 0.
It is still true that
2LR = 2[log L
(Xn|θ
)− log L
(Xn|θ
)] d−→ χ2df =dg
where dg is the number of constraints in g (θ) as long as the∂g(θ0)dθ has full row rank.
• Asymptotic power.
• Under HA, typically ant (x)−→∞ w.p. → 1.
• Under HA, reject w.p. → 1.
• Asymptotic power is 1.
• These are called consistent tests.
• Local alternatives and local power.
• Local alternative, HA : θ = θ0 + c√n
.
• Example: H0 : µ = µ0, HA : µ > µ0. Xt is i.i.d such that
• Xt ∼ N(µ, σ2
) with σ2 known.
• It can be shown that X is a sufficient statistic that containsall the information about µ
• Under the H0: X ∼ N(
0, σ2
n
).
• Reject if x > d : d = σzα√n
+ µ0 for size α test, since
P
(X√1/n
>d√1/n|H0
)= α.
• Asymptotic power: for fixed µ1 > µ0,
P (x > d |µ1) =P
(√nx − µ1
σ≥ zα +
√nµ0 − µ1
σ
)=1− Φ
(zα +
µ0 − µ1
σ
√n
)→ 1
P (x > d |µ1)→
1 when µ1 →∞ holding n fixed.1 when n→∞ holding µ > 0 fixed.
• Draw the shape of the power function when n increases.
• If n =∞, there should be no Type II error. Implications:
• Why fix α? Since Power = 1− Φ(Z1−α −
√nµ), if you really
believe in n→∞, you can choose α→ 0, Z1−α →∞ in sucha way that Z1−α −
√nµ→ −∞. Then both size→ 0 and
Power (µ, n)→ 1 as n→∞.
• Local power, for µ1 = µ0 + c√n
,
P (x > d |µ1) =P
(√nx − µ1
σ≥ zα −
c
σ
)→1− Φ
(zα −
c
σ
)> α.
• P-value: probability of rejection if the observed test statistic isused as the critical value.
• Test-statistic: T = T (X1, . . . ,Xn). Reject if T > c .
• Let F (·) be the CDF of T under the null H0.
Pvalue = 1− F (T )
• Reject if T > c, or if Pvalue < α.
P (Pvalue < x) = P (1− F (T ) < x) = P (F (T ) > 1− x)
=P (T > F−1 (1− x)) = 1− F(F−1 (1− x)
)= 1− (1− x) = x .
• Relation between Testing and Confidence Set
• Confidence Set: a set S such that
P (θ0 ∈ S)
= 1− α→ 1− α
• Any test of H0 : θ = θ0 versus H1 : θ 6= θ0 can be convertedinto a confidence set.
• Consider a test statistic T (X , θ0). Reject ifT (X , θ0) > Cθ0,α. Let
S = set of θ0 that can not be rejected by the above test..
Then
Pθ0 (θ0 ∈ S) = Pθ0 (T (X , θ0) ≤ Cθ0,α) = 1− α.
For example, in the Binary case,√n (p − p)√p (1− p)
d−→ N (0, 1)
√n (p − p)√p (1− p)
d−→ N (0, 1) .
If we test: H0 : p = p H1 : p 6= p. Reject if
T1 (p, p) =
∣∣∣∣√n (p − p)√p (1− p)
∣∣∣∣ > Z1−α/2 or
T2 (p, p) =
∣∣∣∣√n (p − p)√p (1− p)
∣∣∣∣ > Z1−α/2
This implies two confidence sets:
S1 =
p such that
∣∣∣∣√n (p − p)√p (1− p)
∣∣∣∣ < Z1−α/2
S1 =
p such that
∣∣∣∣√n (p − p)√p (1− p)
∣∣∣∣ < Z1−α/2
• Exact finite sample pivotal test.
• The T-statistic is finite sample pivotal if Xi is a knownlocation scale family.
• Suppose Xi ∼ 1σ f(Xi−µσ
). So that Xi = µ+ σZi , where
Zi ∼ f (·) and f (·) is known. Then
T =
√n(X − µ
)√1
n−1
∑ni=1
(Xi − X
)2=
√n(Z)√
1n−1
∑ni=1
(Zi − Z
)2
• This distribution can be simulated as long as you know f (·).It is pivotal since it is free of nuisance parameters.
• In general the T-statistic is asymptotically pivotal.
Td−→ N (0, 1) .