Detection and Estimation
Chapter 1. Hypothesis Testing
web.eecs.utk.edu/~hli31/ECE643_2015_files/Dec_Est_chpt1.pdf · 2015-01-21

Page 1:

Detection and Estimation
Chapter 1. Hypothesis Testing

Husheng Li

Min Kao Department of Electrical Engineering and Computer Science
University of Tennessee, Knoxville

Spring, 2015

Page 2:

Syllabus

Homework: 4 problems each week; hand in your homework after one week. 20%

Midterm and Final Exams: 50%

Project: 30%. Choose one topic such as large-scale hypothesis testing, quickest detection, robust detection, distributed detection, distributed estimation, or large covariance matrix estimation; the topic should be beyond your own research. Send me your topic right after the midterm; read the most important books and papers on that topic; apply it in an application of your own choosing; and present the survey and your own results before the final.

Page 3:

Textbook

The book by H. V. Poor will be the textbook. Other books are for reference.

Page 4:

The First Paper on Hypothesis Testing

In 1900, K. Pearson published the first paper on hypothesis testing, with the long title "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling".

In the paper, he discussed a criterion for judging whether the deviations of a set of samples can be explained by the randomness of sampling.

Page 5:

The First Paper on Hypothesis Testing: Continued

Pearson made a mistake in this paper, which was pointed out by the agricultural statistician R. A. Fisher (Pearson was 65 and Fisher was 32).

People in the early 1900s regarded Fisher as an infant claiming to be taller than his father.

Page 6:

Bayesian Hypothesis Testing

We assume that the two hypotheses H0 and H1 correspond to two distributions. The set of observations is denoted by Γ. The goal is to determine whether a set of observations is generated by H0 or H1.

A decision rule δ divides Γ into Γ0 and Γ1 such that

δ(y) = 1 if y ∈ Γ1, and δ(y) = 0 if y ∈ Γ0.

Cij means the cost incurred by choosing Hi while Hj is true. The conditional risk is defined as

Rj(δ) = C1jPj(Γ1) + C0jPj(Γ0),  j = 0, 1.
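As an illustration (not from the slides), the two conditional risks can be evaluated numerically for a simple threshold rule; the Gaussian densities, costs, and threshold below are assumed purely for this example.

```python
from statistics import NormalDist

# Illustrative (assumed) setup: p0 = N(0,1), p1 = N(1,1), and the threshold
# rule delta(y) = 1 iff y > t. C[i][j] = cost of choosing H_i when H_j is true.
C = [[0.0, 1.0],   # C00, C01
     [1.0, 0.0]]   # C10, C11
t = 0.5
P0, P1 = NormalDist(0, 1), NormalDist(1, 1)

# P_j(Gamma_1): probability that the rule decides 1 when H_j is true.
p0_gamma1 = 1 - P0.cdf(t)
p1_gamma1 = 1 - P1.cdf(t)

# R_j(delta) = C_1j * P_j(Gamma_1) + C_0j * P_j(Gamma_0)
R0 = C[1][0] * p0_gamma1 + C[0][0] * (1 - p0_gamma1)
R1 = C[1][1] * p1_gamma1 + C[0][1] * (1 - p1_gamma1)
print(round(R0, 4), round(R1, 4))  # → 0.3085 0.3085
```

With uniform costs and the midpoint threshold, the two conditional risks coincide by the symmetry of this particular setup.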

Page 7:

Bayesian Risk and Decision Rule

We assume that H0 and H1 have prior probabilities π0 and π1. Then, the Bayes risk is defined as

r(δ) = π0R0(δ) + π1R1(δ).

Thus the optimal decision rule is to minimize the Bayes risk.

Theorem: The decision rule minimizing the Bayes risk is to calculate the likelihood ratio L(y) = p1(y)/p0(y) and make the decision

δB(y) = 1 if L(y) ≥ τ, and δB(y) = 0 if L(y) < τ,

where τ is a threshold.
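A minimal numerical sketch of the theorem, assuming unit-variance Gaussian densities under the two hypotheses; the threshold formula in the docstring is the standard Bayes threshold expressed through the priors and costs.

```python
from statistics import NormalDist

def bayes_lrt(y, pi0=0.5, C=((0.0, 1.0), (1.0, 0.0)),
              p0=NormalDist(0, 1), p1=NormalDist(1, 1)):
    """Bayes-optimal rule: decide 1 iff L(y) = p1(y)/p0(y) >= tau, where
    tau = pi0*(C10 - C00) / (pi1*(C01 - C11)) is the standard Bayes threshold."""
    pi1 = 1 - pi0
    tau = pi0 * (C[1][0] - C[0][0]) / (pi1 * (C[0][1] - C[1][1]))
    L = p1.pdf(y) / p0.pdf(y)
    return 1 if L >= tau else 0

print(bayes_lrt(1.2), bayes_lrt(-0.3))  # → 1 0
```

With uniform costs and equal priors the threshold is 1, so the rule decides whichever density is larger at y.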

Page 8:

Bayesian and Frequentist Schools

The difference between the Bayesian and Frequentist schools is whether the unknown is fixed or random.

In the early 20th century, Bayesian statistics was disliked by the famous statisticians Neyman and Fisher. But it flourished in the late 20th century.

The main challenge of the Bayesian school is how to determine the prior distribution.

The empirical Bayes method is a kind of bridge between the two schools.

Page 9:

MiniMax Hypothesis Testing

The Bayesian risk based testing requires the prior distributions of the hypotheses. What if we do not know them?

If the prior distributions are not known, then we should minimize the worst case, i.e., the maximum risk:

δ∗ = arg minδ maxπ0 [π0R0(δ) + (1 − π0)R1(δ)].

Again, we can prove that likelihood ratio based testing is optimal for MiniMax hypothesis testing.
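The least favorable prior can be located numerically. The sketch below assumes uniform costs and N(0,1) vs. N(1,1) hypotheses, for which the Bayes rule reduces to a known threshold on y; V(π0) denotes the minimum Bayes risk at prior π0.

```python
from math import log
from statistics import NormalDist

P0, P1 = NormalDist(0, 1), NormalDist(1, 1)

def V(pi0):
    """Minimum Bayes risk at prior pi0 for N(0,1) vs N(1,1) with uniform costs:
    the Bayes rule decides 1 iff y >= t, where t = 1/2 + ln(pi0/(1-pi0))."""
    pi1 = 1 - pi0
    t = 0.5 + log(pi0 / pi1)
    return pi0 * (1 - P0.cdf(t)) + pi1 * P1.cdf(t)

# The least favorable prior pi_L maximizes V(pi0); locate it by a grid scan.
grid = [i / 1000 for i in range(1, 1000)]
piL = max(grid, key=V)
print(round(piL, 3))  # → 0.5 (by the symmetry of this example)
```

The minimax rule is then the Bayes rule designed against πL, whose maximum risk equals V(πL).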

Page 10:

Illustration of the MiniMax Rule

Page 11:

Randomized Decision Rule

What if V is not differentiable at the least favorable prior probability πL? In that case, a randomized decision rule is needed.

Page 12:

Neyman-Pearson Hypothesis Testing

Sometimes, the costs are defined directly on the error probabilities.
Type I error (false alarm): H0 is falsely rejected, with probability PF.
Type II error (missed detection): H1 is falsely rejected, with probability PM.
Detection probability: PD = 1 − PM.
Neyman-Pearson design criterion:

maxδ PD(δ),  s.t. PF(δ) ≤ α.
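For the common Gaussian shift-in-mean case, the Neyman-Pearson design can be carried out in closed form. The sketch below assumes unit-variance Gaussians under both hypotheses and sets the threshold from the standard normal quantile.

```python
from statistics import NormalDist

def np_test_gaussian(alpha, mu0=0.0, mu1=1.0, sigma=1.0):
    """Neyman-Pearson design for H0: N(mu0, sigma^2) vs H1: N(mu1, sigma^2),
    mu1 > mu0. The likelihood ratio test reduces to comparing y with a
    threshold t chosen so that PF = alpha."""
    z = NormalDist()
    t = mu0 + sigma * z.inv_cdf(1 - alpha)   # PF(t) = alpha by construction
    PD = 1 - z.cdf((t - mu1) / sigma)        # resulting detection probability
    return t, PD

t, PD = np_test_gaussian(alpha=0.05)
print(round(t, 3), round(PD, 3))
```

Tightening α lowers the threshold's false alarms but also the achievable PD, which is the trade-off the criterion formalizes.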

Page 13:

Neyman-Pearson Lemma

Theorem: Consider hypotheses with densities p1 and p0, and assume α > 0. Then the following statements are true:

Let δ̃ be any decision rule satisfying PF(δ̃) ≤ α, and let δ̃′ be any decision rule of the form

δ̃′(y) = 1 if p1(y) > ηp0(y), γ(y) if p1(y) = ηp0(y), 0 if p1(y) < ηp0(y),   (1)

where η and γ(y) are such that PF(δ̃′) = α. Then PD(δ̃′) ≥ PD(δ̃).

For every α ∈ (0, 1), there exists a decision rule δ̃NP satisfying (1) with γ(y) = γ0 (a constant) and PF(δ̃NP) = α.

Suppose that δ̃′′ is any α-level Neyman-Pearson decision rule. Then it must be of the form (1), except possibly on a subset of Γ of zero probability measure.
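For discrete observations, the boundary randomization γ(y) = γ0 genuinely matters. The sketch below constructs an α-level randomized test over a finite alphabet; the two-point distributions in the usage line are illustrative assumptions, not from the slides.

```python
def np_randomized(p0, p1, alpha):
    """Alpha-level randomized NP test over a finite alphabet.
    p0, p1: dicts mapping outcome -> probability under H0 / H1.
    Returns (eta, gamma0, PD) for the rule: decide 1 if p1(y) > eta*p0(y),
    decide 1 with probability gamma0 if p1(y) == eta*p0(y), else decide 0."""
    # Visit outcomes in order of decreasing likelihood ratio, accumulating PF.
    ratios = sorted(((p1[y] / p0[y], y) for y in p0), reverse=True)
    pf = 0.0
    for eta, y in ratios:
        if pf + p0[y] > alpha:
            # Accepting this outcome outright would overshoot alpha: randomize.
            gamma0 = (alpha - pf) / p0[y]
            pd = sum(p1[z] for r, z in ratios if r > eta) + gamma0 * p1[y]
            return eta, gamma0, pd
        pf += p0[y]
    return 0.0, 1.0, 1.0  # alpha covers everything: always decide 1

# Illustrative example: one observation y in {0, 1}.
eta, gamma0, PD = np_randomized({0: 0.8, 1: 0.2}, {0: 0.5, 1: 0.5}, alpha=0.1)
print(round(gamma0, 3), round(PD, 3))  # → 0.5 0.25
```

Here a deterministic test can only reach PF = 0.2 or PF = 0, so a coin flip with γ0 = 0.5 on the boundary outcome is what achieves PF = 0.1 exactly.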

Page 14:

Some Gossip

Neyman (in Poland) and Pearson (in the UK) collaborated between 1926 and 1936. Their paper (98 pages) on the likelihood ratio test was published in 1928. But their collaboration did not survive...

Page 15:

Composite Hypothesis Testing

In simple hypothesis testing, each hypothesis corresponds to only one distribution.
In composite hypothesis testing, each hypothesis may correspond to multiple distributions. Usually we consider a family of distributions indexed by a parameter θ.
In the Bayesian setup, we assume θ is random and the corresponding prior probability is known.

Page 16:

Composite Hypothesis Testing: Decomposition

In many situations, the parameter space can be decomposed into two disjoint sets Λ0 and Λ1, representing H0 and H1, respectively. The decision rule is still based on a likelihood ratio test, where the likelihood is calculated by

p(y | Θ ∈ Λj) = ∫Λj pθ(y)w(θ) dθ.
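The marginal likelihood above can be approximated by numerical integration. The sketch below assumes pθ is N(θ, 1) and uniform prior densities w(θ) on each parameter set; all numbers are illustrative.

```python
from statistics import NormalDist

def marginal_likelihood(y, lo, hi, w, n=2000):
    """Approximate p(y | Theta in [lo, hi]) = integral of p_theta(y) w(theta) dtheta
    by a midpoint Riemann sum; p_theta is taken to be N(theta, 1), an assumption
    made only for this illustration."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        theta = lo + (i + 0.5) * h
        total += NormalDist(theta, 1).pdf(y) * w(theta) * h
    return total

# Illustrative composite problem: H0: theta in [-1, 0], H1: theta in [0, 2],
# with uniform prior densities w(theta) on each set.
y = 1.5
p_H0 = marginal_likelihood(y, -1, 0, w=lambda t: 1.0)
p_H1 = marginal_likelihood(y, 0, 2, w=lambda t: 0.5)
print(p_H1 > p_H0)  # → True
```

The ratio p_H1 / p_H0 then plays the role of the likelihood ratio in the Bayesian composite test.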

Page 17:

Example 1

Page 18:

UMP Test

For composite hypothesis testing problems in which we do not have a prior distribution for the parameter, developing tests that satisfy precise analytical definitions of optimality is very often an elusive task. We can generalize the Neyman-Pearson test by defining the false alarm and detection probabilities as

PF(δ̃, θ) = Eθ[δ̃(Y)], θ ∈ Λ0,  and  PD(δ̃, θ) = Eθ[δ̃(Y)], θ ∈ Λ1.

The ideal case is a test that maximizes PD(δ̃, θ) for every θ ∈ Λ1 while satisfying PF(δ̃, θ) ≤ α for every θ ∈ Λ0. Such a test is called uniformly most powerful (UMP).
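The classical example where a UMP test exists is the one-sided Gaussian mean problem. The sketch below checks numerically that a single threshold test, which does not depend on the particular θ in Λ1, meets the false-alarm constraint over all of Λ0; the parameter values are illustrative.

```python
from statistics import NormalDist

# One-sided Gaussian mean testing: H0: theta <= 0 vs H1: theta > 0, Y ~ N(theta, 1).
# The threshold test "decide 1 iff y > t" with t = z_{1-alpha} does not depend on
# which theta in Lambda_1 is true -- the classical situation where a UMP test exists.
alpha = 0.05
t = NormalDist().inv_cdf(1 - alpha)

def prob_decide_1(theta):
    """P_theta(Y > t): this is PF for theta in Lambda_0, PD for theta in Lambda_1."""
    return 1 - NormalDist(theta, 1).cdf(t)

# PF is largest at the boundary theta = 0, where it equals alpha exactly.
print(all(prob_decide_1(th) <= alpha + 1e-12 for th in (-2.0, -1.0, -0.5, 0.0)))  # → True
```

Because the same test is simultaneously the NP-optimal test for every θ > 0, it is uniformly most powerful here.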

Page 19:

Example 2

Page 20:

What If a UMP Test Does Not Exist?

In many situations, a UMP test does not exist. This can be overcome by applying other constraints to eliminate unreasonable tests from consideration.
One such condition is unbiasedness.
We can try to optimize the derivative of PD at θ0 to obtain a locally optimal test, or we can use the generalized likelihood ratio test:

maxθ∈Λ1 pθ(y) / maxθ∈Λ0 pθ(y).
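As a sketch of the generalized likelihood ratio, the two maximizations can be carried out by grid search over each parameter set; the N(θ, 1) family and the intervals below are illustrative assumptions.

```python
from statistics import NormalDist

def glrt(y, lo0=-5.0, hi0=0.0, lo1=0.0, hi1=5.0, n=2000):
    """Generalized likelihood ratio max_{theta in Lambda1} p_theta(y) /
    max_{theta in Lambda0} p_theta(y) for Y ~ N(theta, 1), with each
    maximization done by grid search over an interval (a rough sketch)."""
    def max_pdf(lo, hi):
        return max(NormalDist(lo + i * (hi - lo) / n, 1).pdf(y)
                   for i in range(n + 1))
    return max_pdf(lo1, hi1) / max_pdf(lo0, hi0)

# Large positive y favors H1 (ratio > 1); negative y favors H0 (ratio < 1).
print(glrt(2.0) > 1, glrt(-2.0) < 1)  # → True True
```

The statistic is then compared with a threshold chosen, for example, to meet a false-alarm constraint.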
