
ECE 645 – Hypothesis Testing (cont’d.)

J. V. Krogmeier

February 21, 2014

Contents

1 Composite Hypothesis Testing
2 Bayesian Formulation
  2.1 Specialization of Bayesian for uniform costs
  2.2 Example: Testing on the radius of a point in the plane
3 Uniformly Most Powerful Tests
  3.1 Example: UMP Tests Don’t Always Exist
  3.2 Example: UMP Testing of Location
  3.3 Unbiasedness
4 Locally Most Powerful Tests
  4.1 The general structure of LMP
5 Generalized Likelihood Ratio Tests

Page 3: ECE 645 { Hypothesis Testing (cont’d.)jvk/645/645... · This reduces the problem back to that of simple Bayesian hypothesis testing. 2.2 Example: Testing on the radius of a point

1 Composite Hypothesis Testing

• Previously we had the case where, under each hypothesis, there was only one possible distribution for the observation:

\[ H_0 : Y \sim P_0 \quad \text{vs.} \quad H_1 : Y \sim P_1. \]

This is said to be a simple hypothesis test.

• Now consider the case where under each hypothesis there are many possible distributions for the observation $Y$. Hypotheses of this type are known as composite hypotheses.

• Consider a family $\{P_\theta : \theta \in \Lambda\}$ of probability distributions on the observation space $\Gamma$. Here the parameter set is a disjoint union $\Lambda = \Lambda_0 \cup \Lambda_1$ and the two hypotheses are

\[ H_0 : Y \sim P_\theta,\ \theta \in \Lambda_0 \quad \text{vs.} \quad H_1 : Y \sim P_\theta,\ \theta \in \Lambda_1. \]

• Consider two approaches: Bayesian and non-Bayesian.


2 Bayesian Formulation

• Assume the parameter is a random variable (rv) $\Theta$ taking values in $\Lambda$ and that $P_\theta$ is the conditional distribution of $Y$ given $\Theta = \theta$.

• Wish to make a binary decision based on the observation $Y = y$ about which of the two sets $\Lambda_0$ and $\Lambda_1$ contains $\Theta = \theta$. Consider only non-randomized decision rules.

• Cost function. $C(i, \theta)$, for $i = 0, 1$ and $\theta \in \Lambda$, is the cost of choosing decision $i$ when $Y \sim P_\theta$. For simplicity, assume that $C$ is non-negative and bounded.

• Conditional risks (for a decision rule $\delta$):

\[ R_\theta(\delta) = E_\theta\{C(\delta(Y), \theta)\} \quad \text{for } \theta \in \Lambda, \]

where $E_\theta\{\cdot\}$ denotes expectation assuming $Y \sim P_\theta$.

• Average or Bayes risk:

\[ r(\delta) = E\{R_\Theta(\delta)\}, \]

and a Bayes rule is one that minimizes the Bayes risk.

• Using iterated expectations, $E\{X\} = E\{E\{X \mid Y\}\}$, and the definition $E_\theta\{C(\delta(Y), \theta)\} = E\{C(\delta(Y), \Theta) \mid \Theta = \theta\}$:

\[ r(\delta) = E\{E\{C(\delta(Y), \Theta) \mid \Theta\}\} = E\{C(\delta(Y), \Theta)\} = E\{E\{C(\delta(Y), \Theta) \mid Y\}\}. \]

• The last relation implies that $r(\delta)$ is minimized over $\delta$ if, for each $y \in \Gamma$, $\delta(y)$ is chosen to be the decision that minimizes the posterior cost

\[ E\{C(\delta(Y), \Theta) \mid Y = y\} = E\{C(\delta(y), \Theta) \mid Y = y\}. \]

• Since $\delta(y)$ can be only 0 or 1, a Bayes rule for this problem is given by

\[
\delta_B(y) =
\begin{cases} 1 \\ 0 \text{ or } 1 \\ 0 \end{cases}
\quad \text{if} \quad
E\{C(1, \Theta) \mid Y = y\}
\;\begin{matrix} < \\ = \\ > \end{matrix}\;
E\{C(0, \Theta) \mid Y = y\},
\]

i.e., $\delta_B$ chooses the hypothesis that is least costly, on the average, given the observation. For $\Lambda = \{0, 1\}$ this reduces to the Bayes rule for simple hypothesis testing, which also had the interpretation of minimizing the posterior cost.
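The posterior-cost rule can be sketched numerically. The block below is a minimal illustration under assumptions that are not part of these notes: a discrete prior on four parameter values, a Gaussian observation model, and a hypothetical cost that charges $|\theta|$ for a wrong decision. The names `posterior_costs`, `bayes_rule`, `THETAS`, `PRIOR`, and `cost` are illustrative only.

```python
import math

def gaussian_pdf(y, theta, sigma=1.0):
    """Density of N(theta, sigma^2) at y; plays the role of f_theta(y)."""
    z = (y - theta) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_costs(y, thetas, prior, cost, pdf=gaussian_pdf):
    """Return (E{C(0,Theta)|Y=y}, E{C(1,Theta)|Y=y}) for a discrete prior."""
    weights = [pdf(y, t) * p for t, p in zip(thetas, prior)]
    total = sum(weights)
    posterior = [w / total for w in weights]
    return tuple(
        sum(cost(i, t) * q for t, q in zip(thetas, posterior)) for i in (0, 1)
    )

def bayes_rule(y, thetas, prior, cost, pdf=gaussian_pdf):
    """Choose the decision with the smaller posterior cost (ties go to 0)."""
    cost0, cost1 = posterior_costs(y, thetas, prior, cost, pdf)
    return 1 if cost1 < cost0 else 0

# Hypothetical setup: Lambda_0 = {-2, -1}, Lambda_1 = {1, 2}, uniform prior.
THETAS = [-2.0, -1.0, 1.0, 2.0]
PRIOR = [0.25, 0.25, 0.25, 0.25]

def cost(i, theta):
    """Illustrative cost: zero if the decision is correct, |theta| if wrong."""
    truth = 1 if theta > 0 else 0
    return 0.0 if i == truth else abs(theta)

print(bayes_rule(1.5, THETAS, PRIOR, cost))   # decision for y = 1.5
print(bayes_rule(-1.5, THETAS, PRIOR, cost))  # decision for y = -1.5
```

Because the cost here depends on $\theta$ within each set, the rule cannot be written as a single likelihood ratio test with a fixed threshold; it has to compare the two posterior costs directly.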

2.1 Specialization of Bayesian for uniform costs

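• Suppose that the cost function is uniform over the two sets $\Lambda_0$ and $\Lambda_1$, i.e., say

\[ C(i, \theta) = c_{ij}, \quad \theta \in \Lambda_j. \]

• Then deciding 1 has posterior cost $c_{10} P(\Theta \in \Lambda_0 \mid Y = y) + c_{11} P(\Theta \in \Lambda_1 \mid Y = y)$ and deciding 0 has posterior cost $c_{00} P(\Theta \in \Lambda_0 \mid Y = y) + c_{01} P(\Theta \in \Lambda_1 \mid Y = y)$, so under the reasonable assumption $c_{11} < c_{01}$

\[
\delta_B(y) =
\begin{cases} 1 \\ 0 \text{ or } 1 \\ 0 \end{cases}
\quad \text{if} \quad
\frac{P(\Theta \in \Lambda_1 \mid Y = y)}{P(\Theta \in \Lambda_0 \mid Y = y)}
\;\begin{matrix} > \\ = \\ < \end{matrix}\;
\frac{c_{10} - c_{00}}{c_{01} - c_{11}}.
\]

• In addition, if we assume that $Y$ has conditional densities $f_Y(y \mid \Theta \in \Lambda_0)$ and $f_Y(y \mid \Theta \in \Lambda_1)$, then the test can be rewritten (using Bayes' formula, footnote 3) as

\[
\delta_B(y) =
\begin{cases} 1 \\ 0 \text{ or } 1 \\ 0 \end{cases}
\quad \text{if} \quad
L(y)
\;\begin{matrix} > \\ = \\ < \end{matrix}\;
\frac{\pi_0 (c_{10} - c_{00})}{\pi_1 (c_{01} - c_{11})},
\]

where $L(y) = f_Y(y \mid \Theta \in \Lambda_1) / f_Y(y \mid \Theta \in \Lambda_0)$ and $\pi_0 = P(\Theta \in \Lambda_0)$, $\pi_1 = P(\Theta \in \Lambda_1)$.

• This reduces the problem back to that of simple Bayesian hypothesis testing.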
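The uniform-cost reduction can be sketched as a likelihood ratio test between two mixture densities. This is a minimal illustration, assuming a discrete prior and Gaussian $f_\theta$ (neither is specified in these notes); `composite_density`, `bayes_threshold`, `decide`, and the numerical values are hypothetical.

```python
import math

def gaussian_pdf(y, theta, sigma=1.0):
    """Density of N(theta, sigma^2) at y; plays the role of f_theta(y)."""
    z = (y - theta) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def composite_density(y, thetas, weights):
    """f_Y(y | Theta in Lambda_j): f_theta(y) averaged over the prior
    restricted to Lambda_j (weights are the prior masses on thetas)."""
    mass = sum(weights)
    return sum(w * gaussian_pdf(y, t) for t, w in zip(thetas, weights)) / mass

def bayes_threshold(pi0, pi1, c00, c01, c10, c11):
    """Threshold pi0 (c10 - c00) / (pi1 (c01 - c11)); assumes c11 < c01."""
    return pi0 * (c10 - c00) / (pi1 * (c01 - c11))

def decide(y, null, alt, costs):
    """Compare L(y) with the Bayes threshold (ties go to 0)."""
    (thetas0, w0), (thetas1, w1) = null, alt
    pi0, pi1 = sum(w0), sum(w1)
    ratio = composite_density(y, thetas1, w1) / composite_density(y, thetas0, w0)
    return 1 if ratio > bayes_threshold(pi0, pi1, *costs) else 0

# Hypothetical example: pi0 = 0.6, pi1 = 0.4, costs (c00, c01, c10, c11).
NULL = ([-2.0, -1.0], [0.3, 0.3])
ALT = ([1.0, 2.0], [0.2, 0.2])
COSTS = (0.0, 1.0, 1.0, 0.0)

print(bayes_threshold(0.6, 0.4, *COSTS))
print(decide(2.0, NULL, ALT, COSTS), decide(0.0, NULL, ALT, COSTS))
```

Once the two mixture densities are formed, the decision logic is the same as for a simple Bayesian test between two known densities.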

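Footnote 3. Bayes' formula says, for $j = 0, 1$:

\[
P(\Theta \in \Lambda_j \mid Y = y) =
\frac{f_Y(y \mid \Theta \in \Lambda_j)\, P(\Theta \in \Lambda_j)}
{f_Y(y \mid \Theta \in \Lambda_0)\, P(\Theta \in \Lambda_0) + f_Y(y \mid \Theta \in \Lambda_1)\, P(\Theta \in \Lambda_1)}.
\]

2.2 Example: Testing on the radius of a point in the plane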


3 Uniformly Most Powerful Tests

The case considered here is that $\theta$ is modeled as a deterministic but unknown constant (or as random, but with unknown statistics). Then a Bayes test is not meaningful, so we look at Neyman-Pearson methods.

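• Say the parameter set is given as a disjoint union: $\Lambda = \Lambda_0 \cup \Lambda_1$. Hypothesis $H_0$ corresponds to a state of nature $P_\theta$ where $\theta \in \Lambda_0$. Similarly for $H_1$.

• Let $\delta(y)$ be a randomized decision rule for $H_0$ vs. $H_1$. Define

– False-alarm probabilities:

\[ P_F(\delta; \theta) = E_\theta\{\delta(Y)\}, \quad \text{for } \theta \in \Lambda_0. \]

– Detection probabilities:

\[ P_D(\delta; \theta) = E_\theta\{\delta(Y)\}, \quad \text{for } \theta \in \Lambda_1. \]

• A Uniformly Most Powerful (UMP) test of level $\alpha$ is one that maximizes $P_D(\delta; \theta)$ for every $\theta \in \Lambda_1$ subject to $P_F(\delta; \theta) \le \alpha$ for all $\theta \in \Lambda_0$.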


• UMP tests do not always exist.
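To make the definitions concrete, the sketch below evaluates $E_\theta\{\delta(Y)\}$ for a randomized rule. It assumes a binomial observation model, which is not part of these notes and is chosen only because the expectation is a finite sum; the parameters `n`, `k`, and `gamma` are hypothetical.

```python
from math import comb

def binom_pmf(y, n, theta):
    """P_theta(Y = y) for Y ~ Binomial(n, theta)."""
    return comb(n, y) * theta**y * (1.0 - theta) ** (n - y)

def randomized_rule(y, k, gamma):
    """delta(y): probability of deciding H1 when Y = y is observed."""
    if y > k:
        return 1.0
    if y == k:
        return gamma
    return 0.0

def expected_decision(theta, n, k, gamma):
    """E_theta{delta(Y)}: a false-alarm probability when theta is in
    Lambda_0 and a detection probability when theta is in Lambda_1."""
    return sum(
        randomized_rule(y, k, gamma) * binom_pmf(y, n, theta)
        for y in range(n + 1)
    )

# Hypothetical example: n = 10 trials, randomize with gamma = 0.5 at y = 7.
for theta in (0.3, 0.5, 0.8):
    print(theta, expected_decision(theta, n=10, k=7, gamma=0.5))
```

The same function of $\theta$ serves as $P_F$ on $\Lambda_0$ and as $P_D$ on $\Lambda_1$; the level constraint bounds it on one set while the UMP criterion asks for it to be as large as possible at every point of the other.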

3.1 Example: UMP Tests Don’t Always Exist

• $\Lambda = \Lambda_0 \cup \Lambda_1$.

• Suppose $H_0$ is simple, i.e., $\Lambda_0 = \{\theta_0\}$.

• Suppose that $P_\theta$ has a density $f_\theta(\cdot)$ for each $\theta \in \Lambda$ and consider the Neyman-Pearson problem for testing

\[ H_0 : Y \sim P_{\theta_0} \quad \text{vs.} \quad H_\theta : Y \sim P_\theta \]

for some fixed $\theta \in \Lambda_1$. To be clear, at the moment we are considering a simple hypothesis test.

• We know from the Neyman-Pearson lemma (NPL) that there exists a most powerful $\alpha$-level test for this problem with a critical region of the form

\[ \Gamma_\theta = \{ y \in \Gamma : f_\theta(y) > \tau f_{\theta_0}(y) \}, \]

where $\tau$ and a possible randomization are chosen to give a size $\alpha$ test. Also from the NPL we know that the test is essentially unique and that any other (essentially different) $\alpha$-level test will have smaller power.


• So for two distinct parameter values $\theta', \theta'' \in \Lambda_1$ the test with critical region $\Gamma_{\theta'}$ will have a smaller power for testing $H_0$ vs. $H_{\theta''}$ than the test with $\Gamma_{\theta''}$ (and vice versa) unless $\Gamma_{\theta'}$ and $\Gamma_{\theta''}$ are essentially identical.

A UMP test for

\[ H_0 \quad \text{vs.} \quad H_\theta : Y \sim P_\theta,\ \theta \in \Lambda_1 \]

exists if and only if the critical region $\Gamma_\theta$ is (essentially) the same for all $\theta \in \Lambda_1$.

Equivalently, we can say a UMP test exists if and only if the LRT for every $\theta \in \Lambda_1$ can be completely defined (including threshold) without knowledge of $\theta$.

3.2 Example: UMP Testing of Location

• Consider the family of distributions $\{P_\theta : \theta \in \Lambda\}$ where $\Lambda$ is a subset of $\mathbb{R}$ and $P_\theta$ is $\mathcal{N}(\theta, \sigma^2)$.

• Consider the hypothesis pair

\[ H_0 : \theta = \mu_0 \quad \text{vs.} \quad H_1 : \theta > \mu_0 \]

where $\mu_0$ is a fixed real number. Therefore, we have a simple null hypothesis $\Lambda_0 = \{\mu_0\}$ and a composite alternative $\Lambda_1 = (\mu_0, \infty)$.

• From the previous example (i.e., location testing with Gaussian error) we know that for each fixed $\theta \in \Lambda_1$ the most powerful $\alpha$-level test for $H_0$ versus $Y \sim \mathcal{N}(\theta, \sigma^2)$ has a critical region

\[ \Gamma_\theta = \{ y \in \Gamma : y > \sigma \Phi^{-1}(1 - \alpha) + \mu_0 \}. \]

This region does not depend upon $\theta$ (note that $\theta$ is restricted to be $> \mu_0$) and thus it gives a UMP test for $H_0 : \theta = \mu_0$ vs. $H_1 : \theta > \mu_0$.

• Let $\delta_1$ denote the decision rule for the critical region $\Gamma_\theta$ (as seen previously, randomization is not required). The detection probabilities are:

\[ P_D(\delta_1; \theta) = 1 - \Phi\left( \Phi^{-1}(1 - \alpha) - \frac{\theta - \mu_0}{\sigma} \right) \quad \text{for } \theta > \mu_0. \]
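The one-sided test and its power function are easy to evaluate numerically. The sketch below uses `statistics.NormalDist` from the Python standard library for $\Phi$ and $\Phi^{-1}$; the function names and the example values ($\mu_0 = 0$, $\sigma = 1$, $\alpha = 0.05$) are assumptions for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}

def ump_threshold(mu0, sigma, alpha):
    """Boundary of the critical region: sigma * Phi^{-1}(1 - alpha) + mu0."""
    return sigma * STD_NORMAL.inv_cdf(1.0 - alpha) + mu0

def delta1(y, mu0, sigma, alpha):
    """Non-randomized rule: decide H1 when y exceeds the threshold."""
    return 1 if y > ump_threshold(mu0, sigma, alpha) else 0

def power_delta1(theta, mu0, sigma, alpha):
    """P_D(delta1; theta) = 1 - Phi(Phi^{-1}(1 - alpha) - (theta - mu0)/sigma)."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(z - (theta - mu0) / sigma)

# The threshold is computed without any reference to theta.
print(ump_threshold(0.0, 1.0, 0.05))
for theta in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(theta, power_delta1(theta, 0.0, 1.0, 0.05))
```

Note that `ump_threshold` takes no `theta` argument, which is the computational counterpart of the statement that the critical region does not depend on $\theta$.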

• Now for the same family of distributions consider the hypothesis testing problem

\[ H_0 : \theta = \mu_0 \quad \text{vs.} \quad H_1 : \theta \ne \mu_0 \]

where $\mu_0$ is a fixed real number. Therefore, we have the same simple null hypothesis $\Lambda_0 = \{\mu_0\}$ and a new composite alternative $\Lambda_1 = (-\infty, \mu_0) \cup (\mu_0, \infty)$.

• For $\theta > \mu_0$ the critical region of the most powerful $\alpha$-level test is as before, but for $\theta < \mu_0$ it is different. With the two cases included in the same formula:

\[
\Gamma_\theta =
\begin{cases}
\{ y \in \Gamma : y > \sigma \Phi^{-1}(1 - \alpha) + \mu_0 \} & \text{for } \theta > \mu_0 \\
\{ y \in \Gamma : y < \sigma \Phi^{-1}(\alpha) + \mu_0 \} & \text{for } \theta < \mu_0.
\end{cases}
\]

In the sense that the critical region “switches” as a function of $\theta$ between these two cases, it depends upon $\theta$, and therefore:

No UMP test exists for this two-sided hypothesis testing problem.

• Let $\delta_2$ be the test corresponding to critical region $\Gamma_\theta$ with $\theta < \mu_0$. Then we can show

\[ P_D(\delta_2; \theta) = \Phi\left( \Phi^{-1}(\alpha) - \frac{\theta - \mu_0}{\sigma} \right) \quad \text{for } \theta < \mu_0. \]

• We can certainly extend the definitions of $P_D(\delta_1; \theta)$ and $P_D(\delta_2; \theta)$ to $\theta \in \mathbb{R}$ and then plot the two power functions together on the same axis. Note that $P_D(\delta_1; \theta)$ increases as $\theta$ increases while $P_D(\delta_2; \theta)$ decreases as $\theta$ increases. The curves cross when $\theta$ is such that

\[
\Phi\left( \Phi^{-1}(\alpha) - \frac{\theta - \mu_0}{\sigma} \right)
= 1 - \Phi\left( \Phi^{-1}(1 - \alpha) - \frac{\theta - \mu_0}{\sigma} \right),
\]

which happens for $\theta = \mu_0$, where both sides equal $\alpha$.

• This shows that neither test performs well outside of its region of optimality. A more reasonable test than either $\delta_1$ or $\delta_2$ would compare $|y - \mu_0|$ to a threshold, but this cannot be UMP for $H_0 : \theta = \mu_0$ vs. $H_1 : \theta \ne \mu_0$ because such a test does not exist.
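The comparison of the power curves can be tabulated directly. In the sketch below, the power functions of $\delta_1$ and $\delta_2$ follow the formulas above; the third function is an assumption, since the notes do not fix the threshold for the $|y - \mu_0|$ test: it uses $\sigma \Phi^{-1}(1 - \alpha/2)$, which gives size $\alpha$ with equal tails.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def power_delta1(theta, mu0, sigma, alpha):
    """Power of the upper one-sided test (critical region y > threshold)."""
    return 1.0 - PHI.cdf(PHI.inv_cdf(1.0 - alpha) - (theta - mu0) / sigma)

def power_delta2(theta, mu0, sigma, alpha):
    """Power of the lower one-sided test (critical region y < threshold)."""
    return PHI.cdf(PHI.inv_cdf(alpha) - (theta - mu0) / sigma)

def power_two_sided(theta, mu0, sigma, alpha):
    """Power of the assumed test |y - mu0| > sigma * Phi^{-1}(1 - alpha/2)."""
    z = PHI.inv_cdf(1.0 - alpha / 2.0)
    d = (theta - mu0) / sigma
    return 1.0 - PHI.cdf(z - d) + PHI.cdf(-z - d)

MU0, SIGMA, ALPHA = 0.0, 1.0, 0.05  # hypothetical example values
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(
        theta,
        round(power_delta1(theta, MU0, SIGMA, ALPHA), 4),
        round(power_delta2(theta, MU0, SIGMA, ALPHA), 4),
        round(power_two_sided(theta, MU0, SIGMA, ALPHA), 4),
    )
```

On each side of $\mu_0$ the two-sided test has less power than the one-sided test that is optimal there, but it avoids the collapse below $\alpha$ that the one-sided tests suffer on their wrong side.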

3.3 Unbiasedness

• The previous example illustrates how the UMP criterion is too strong for some problems, since it is not useful to aim for a criterion for which a test does not exist.

• Sometimes we can overcome the difficulty by applying more constraints to eliminate unreasonable tests.

• One such condition is to require unbiasedness, meaning we require (see footnote 4)

\[ P_D(\delta; \theta) \ge \alpha \]

for all $\theta \in \Lambda_1$ in addition to the constraint $P_F(\delta; \theta) \le \alpha$ for all $\theta \in \Lambda_0$. This would have eliminated both $\delta_1$ and $\delta_2$ from consideration in the previous example, since each has power below $\alpha$ on the side of $\mu_0$ opposite its region of optimality.
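As a worked check, consider the Gaussian two-sided problem with a test that compares $|y - \mu_0|$ to a threshold. The specific threshold $\sigma \Phi^{-1}(1 - \alpha/2)$ is an assumption made here for illustration (the notes do not fix it). Under that assumption the derivation below suggests that the power never falls below $\alpha$, so the test satisfies the unbiasedness condition.

```latex
% Assumed test: decide H_1 when |y - \mu_0| > \sigma z, with z = \Phi^{-1}(1 - \alpha/2) > 0.
% Write d = (\theta - \mu_0)/\sigma and let \phi denote the standard normal density.
\begin{align*}
\beta(d) &= P_\theta\big(|Y - \mu_0| > \sigma z\big)
          = 1 - \Phi(z - d) + \Phi(-z - d), \\
\beta(0) &= 1 - \Phi(z) + \Phi(-z) = 2\,\big(1 - \Phi(z)\big) = \alpha, \\
\beta'(d) &= \phi(z - d) - \phi(z + d).
\end{align*}
% For d > 0 we have |z - d| < z + d, so \phi(z - d) > \phi(z + d) and \beta is increasing.
% By symmetry, \beta is decreasing for d < 0.
% Hence \beta(d) \ge \beta(0) = \alpha for every d, i.e., for every \theta.
```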

Footnote 4. Recalling the actual definition of detection and false-alarm probabilities, we can give a more symmetric definition of unbiasedness. Namely, $\delta$ is unbiased in this composite binary hypothesis testing problem if

\[
E_\theta\{\delta(Y)\} \text{ is }
\begin{cases}
\ge \alpha & \text{for all } \theta \in \Lambda_1 \\
\le \alpha & \text{for all } \theta \in \Lambda_0.
\end{cases}
\]

4 Locally Most Powerful Tests

• Consider the case where $\Lambda$ is of the form $[\theta_0, \infty)$ with $\Lambda_0 = \{\theta_0\}$ and $\Lambda_1 = (\theta_0, \infty)$.

• This situation comes up in many signal detection problems in which $\theta_0 = 0$ and $\theta$ is a signal amplitude parameter.

• Often we are primarily interested in the case where, under $H_1$, $\theta$ is close to $\theta_0$. When $\theta$ is a signal amplitude parameter this would correspond to small signal strength (i.e., the low signal-to-noise ratio regime).

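• Consider a decision rule $\delta$. Then, subject to regularity conditions, we may expand $P_D(\delta; \theta)$ in a Taylor series about $\theta_0$:

\[ P_D(\delta; \theta) = P_D(\delta; \theta_0) + (\theta - \theta_0) P'_D(\delta; \theta_0) + O\big((\theta - \theta_0)^2\big), \]

where $P'_D(\delta; \theta_0) = \frac{\partial}{\partial \theta} P_D(\delta; \theta) \big|_{\theta = \theta_0}$.

• Note that $P_D(\delta; \theta_0) = P_F(\delta)$, so for all tests of size $\alpha$ we see that for $\theta$ near $\theta_0$

\[ P_D(\delta; \theta) \approx \alpha + (\theta - \theta_0) P'_D(\delta; \theta_0). \]

Conclude that for $\theta$ near $\theta_0$ we can achieve an approximate maximum power with size $\alpha$ by choosing $\delta$ to maximize $P'_D(\delta; \theta_0)$.

• A test which maximizes $P'_D(\delta; \theta_0)$ subject to a false-alarm constraint $P_F(\delta) \le \alpha$ is called an $\alpha$-level locally most powerful (LMP) test, or simply a locally optimum test.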
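The quality of the linear approximation can be checked on a case where the power function is known in closed form. The sketch below assumes the Gaussian one-sided test of Section 3.2 with $\mu_0$ renamed $\theta_0$; for that test the slope at $\theta_0$ works out to $\phi(\Phi^{-1}(1 - \alpha)) / \sigma$, where $\phi$ is the standard normal density. Function names and example values are illustrative.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def power(theta, theta0, sigma, alpha):
    """Exact power of the one-sided Gaussian test with size alpha."""
    z = PHI.inv_cdf(1.0 - alpha)
    return 1.0 - PHI.cdf(z - (theta - theta0) / sigma)

def power_slope_at_theta0(sigma, alpha):
    """P'_D at theta0: differentiating 1 - Phi(z - (theta - theta0)/sigma)
    with respect to theta and setting theta = theta0 gives phi(z)/sigma."""
    z = PHI.inv_cdf(1.0 - alpha)
    return PHI.pdf(z) / sigma

def linear_power(theta, theta0, sigma, alpha):
    """First-order approximation alpha + (theta - theta0) * P'_D."""
    return alpha + (theta - theta0) * power_slope_at_theta0(sigma, alpha)

for theta in (0.01, 0.1, 0.5, 1.0):
    exact = power(theta, 0.0, 1.0, 0.05)
    approx = linear_power(theta, 0.0, 1.0, 0.05)
    print(theta, exact, approx, abs(exact - approx))
```

The gap between the exact and linearized power grows with $\theta - \theta_0$, which is why the LMP criterion is only a statement about the weak-signal regime.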

4.1 The general structure of LMP

• Assume that $P_\theta$ has density $f_\theta$ for each $\theta \in \Lambda_1$. Then we can write

\[ P_D(\delta; \theta) = E_\theta\{\delta(Y)\} = \int_\Gamma \delta(y) f_\theta(y)\, dy. \]

• If the family $\{f_\theta(y) : \theta \in \Lambda_1\}$ is sufficiently regular that differentiation with respect to $\theta$ and integration with respect to $y$ may be interchanged, then

\[ P'_D(\delta; \theta_0) = \int_\Gamma \delta(y) \left. \frac{\partial}{\partial \theta} f_\theta(y) \right|_{\theta = \theta_0} dy. \]

• Comparison of this expression with our previous work on NP testing for simple hypotheses shows that the $\alpha$-level LMP problem is the same as the $\alpha$-level NP design problem where we replace $f_1(y)$ with

\[ \left. \frac{\partial}{\partial \theta} f_\theta(y) \right|_{\theta = \theta_0}. \]

• From this analogy (within regularity) an $\alpha$-level LMP test for $H_0 : \theta = \theta_0$ vs. $H_1 : \theta > \theta_0$ is given by

\[
\delta_{lo}(y) =
\begin{cases} 1 \\ \gamma \\ 0 \end{cases}
\quad \text{if} \quad
\left. \frac{\partial}{\partial \theta} f_\theta(y) \right|_{\theta = \theta_0}
\;\begin{matrix} > \\ = \\ < \end{matrix}\;
\eta f_{\theta_0}(y),
\]

where $\eta$ and $\gamma$ are chosen such that $P_F(\delta_{lo}) = \alpha$.
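The LMP test statistic, the ratio of $\partial f_\theta(y) / \partial \theta$ at $\theta_0$ to $f_{\theta_0}(y)$, can be sketched numerically. The block below approximates the derivative with a central difference, which assumes the density formula is valid slightly below $\theta_0$ as well; the Gaussian and Cauchy location families, the step `h`, and the threshold `eta = 0.8` are all illustrative assumptions, and no attempt is made to solve for the $\eta$ that gives a prescribed size $\alpha$.

```python
import math

def gaussian_pdf(y, theta, sigma=1.0):
    """Density of N(theta, sigma^2) at y."""
    z = (y - theta) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def cauchy_pdf(y, theta):
    """Density of a unit-scale Cauchy distribution located at theta."""
    return 1.0 / (math.pi * (1.0 + (y - theta) ** 2))

def lmp_statistic(pdf, y, theta0, h=1e-5):
    """Approximate [d f_theta(y) / d theta at theta0] / f_theta0(y)."""
    dfdtheta = (pdf(y, theta0 + h) - pdf(y, theta0 - h)) / (2.0 * h)
    return dfdtheta / pdf(y, theta0)

def lmp_decide(pdf, y, theta0, eta):
    """Non-randomized part of the LMP rule: 1 if the statistic exceeds eta."""
    return 1 if lmp_statistic(pdf, y, theta0) > eta else 0

# Gaussian: the statistic is (y - theta0) / sigma^2, monotone in y.
print(lmp_statistic(gaussian_pdf, 1.3, 0.0))
# Cauchy: the statistic is 2 (y - theta0) / (1 + (y - theta0)^2), not monotone.
print(lmp_statistic(cauchy_pdf, 1.0, 0.0), lmp_statistic(cauchy_pdf, 3.0, 0.0))
print(lmp_decide(cauchy_pdf, 1.0, 0.0, 0.8), lmp_decide(cauchy_pdf, 3.0, 0.0, 0.8))
```

In the Gaussian case the statistic is increasing in $y$, so thresholding it gives the same form of critical region as the UMP test of Section 3.2. In the Cauchy case the statistic peaks and then decays, so a very large observation can fall outside the critical region.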


5 Generalized Likelihood Ratio Tests

• In the absence of the applicability of any of the above, a test for composite hypotheses which is often used is to compare

\[ \frac{\max_{\theta \in \Lambda_1} f_\theta(y)}{\max_{\theta \in \Lambda_0} f_\theta(y)} \]

to a threshold.

• This is called a generalized likelihood ratio test (GLRT) or maximum likelihood test.
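A minimal sketch of the GLRT statistic follows, assuming the two-sided Gaussian location problem of Section 3.2 and replacing the maximization over $\Lambda_1$ with a search over a finite grid (both are assumptions for illustration, as are the names `glr`, `glrt`, and the threshold 2.0). For this family the maximizing $\theta$ is $y$ itself, so the ratio should be close to $\exp\big((y - \mu_0)^2 / (2\sigma^2)\big)$.

```python
import math

def gaussian_pdf(y, theta, sigma=1.0):
    """Density of N(theta, sigma^2) at y."""
    z = (y - theta) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def glr(y, pdf, alt_grid, null_grid):
    """Generalized likelihood ratio with each max taken over a finite grid."""
    numerator = max(pdf(y, t) for t in alt_grid)
    denominator = max(pdf(y, t) for t in null_grid)
    return numerator / denominator

def glrt(y, pdf, alt_grid, null_grid, threshold):
    """Decide H1 when the generalized likelihood ratio exceeds the threshold."""
    return 1 if glr(y, pdf, alt_grid, null_grid) > threshold else 0

# Hypothetical setup: mu0 = 0, alternative theta != 0 on a grid over [-5, 5].
NULL_GRID = [0.0]
ALT_GRID = [k * 0.01 for k in range(-500, 501) if k != 0]

print(glr(1.5, gaussian_pdf, ALT_GRID, NULL_GRID), math.exp(1.5**2 / 2.0))
print(glrt(1.5, gaussian_pdf, ALT_GRID, NULL_GRID, threshold=2.0))
print(glrt(0.3, gaussian_pdf, ALT_GRID, NULL_GRID, threshold=2.0))
```

Since the ratio is increasing in $|y - \mu_0|$ here, the GLRT in this example amounts to comparing $|y - \mu_0|$ to a threshold, which matches the "more reasonable" two-sided test mentioned in Section 3.2.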
