Introduction to Detection Theory
Reading:
Ch. 3 in Kay-II.
Notes by Prof. Don Johnson on detection theory, see http://www.ece.rice.edu/~dhj/courses/elec531/notes5.pdf.
Ch. 10 in Wasserman.
EE 527, Detection and Estimation Theory, # 5 1
Introduction to Detection Theory (cont.)
We wish to make a decision on a signal of interest using noisy measurements. Statistical tools enable systematic solutions and optimal design.
Application areas include:
Communications,
Radar and sonar,
Nondestructive evaluation (NDE) of materials,
Biomedicine, etc.
Example: Radar Detection. We wish to decide on the presence or absence of a target.
Introduction to Detection Theory
We assume a parametric measurement model p(x | θ) [or p(x ; θ), which is the notation that we sometimes use in the classical setting].

In point estimation theory, we estimated the parameter θ given the data x.

Suppose now that we choose Θ0 and Θ1 that form a partition of the parameter space Θ:

Θ0 ∪ Θ1 = Θ,  Θ0 ∩ Θ1 = ∅.

In detection theory, we wish to identify which hypothesis is true (i.e. make the appropriate decision):

H0 : θ ∈ Θ0, null hypothesis
H1 : θ ∈ Θ1, alternative hypothesis.

Terminology: If θ can take only two values,

Θ = {θ0, θ1},  Θ0 = {θ0},  Θ1 = {θ1}

we say that the hypotheses are simple. Otherwise, we say that they are composite.

Composite Hypothesis Example: H0 : θ = 0 versus H1 : θ ∈ (0, ∞).
The Decision Rule
We wish to design a decision rule (function) φ(x) : X → {0, 1}:

φ(x) = 1, decide H1,
       0, decide H0

which partitions the data space X [i.e. the support of p(x | θ)] into two regions:

Rule φ(x):  X0 = {x : φ(x) = 0},  X1 = {x : φ(x) = 1}.
Let us define the probabilities of false alarm and miss:

PFA = E_{x|θ}[φ(X) | θ] = ∫_{X1} p(x | θ) dx,  for θ ∈ Θ0

PM = E_{x|θ}[1 − φ(X) | θ] = 1 − ∫_{X1} p(x | θ) dx = ∫_{X0} p(x | θ) dx,  for θ ∈ Θ1.

Then, the probability of detection (correctly deciding H1) is

PD = 1 − PM = E_{x|θ}[φ(X) | θ] = ∫_{X1} p(x | θ) dx,  for θ ∈ Θ1.
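These definitions are easy to check numerically. A minimal sketch, assuming a scalar measurement x ~ N(θ, 1), the decision region X1 = {x : x > γ}, and illustrative values θ0 = 0, θ1 = 2, γ = 1 (none of these numbers come from the notes):

```python
from statistics import NormalDist

# Q(z) = right-tail probability of a standard normal
Q = lambda z: 1.0 - NormalDist().cdf(z)

theta0, theta1, gamma = 0.0, 2.0, 1.0   # illustrative values

P_FA = Q(gamma - theta0)   # P[X in X1 | theta = theta0], false alarm
P_D  = Q(gamma - theta1)   # P[X in X1 | theta = theta1], detection
P_M  = 1.0 - P_D           # P[X in X0 | theta = theta1], miss
```

Here P_FA ≈ 0.159 and P_D ≈ 0.841: shifting the mean from θ0 to θ1 moves probability mass past the threshold, so the same region X1 has a much larger probability under H1.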
Note: PFA and PD/PM are generally functions of the parameter θ (where θ ∈ Θ0 when computing PFA and θ ∈ Θ1 when computing PD/PM).
More Terminology. Statisticians use the following terminology:

False alarm ≡ Type I error
Miss ≡ Type II error
Probability of detection ≡ Power
Probability of false alarm ≡ Significance level.
Bayesian Decision-theoretic Detection Theory
Recall (a slightly generalized version of) the posterior expected loss:

ρ(action | x) = ∫ L(θ, action) p(θ | x) dθ

that we introduced in handout # 4 when we discussed Bayesian decision theory. Let us now apply this theory to our easy example discussed here: hypothesis testing, where our action space consists of only two choices. We first assign a loss table:

decision rule     true state θ ∈ Θ1    true state θ ∈ Θ0
x ∈ X1            L(1 | 1) = 0         L(1 | 0)
x ∈ X0            L(0 | 1)             L(0 | 0) = 0

with the loss function described by the quantities L(declared | true):

L(1 | 0) quantifies the loss due to a false alarm,

L(0 | 1) quantifies the loss due to a miss,

L(1 | 1) and L(0 | 0) (losses due to correct decisions) are typically set to zero in real life. Here, we adopt zero losses for correct decisions.
Now, our posterior expected loss takes two values:

ρ0(x) = ∫_{Θ1} L(0 | 1) p(θ | x) dθ + ∫_{Θ0} L(0 | 0) p(θ | x) dθ   [second term is zero]
      = ∫_{Θ1} L(0 | 1) p(θ | x) dθ
      [L(0 | 1) is constant]
      = L(0 | 1) ∫_{Θ1} p(θ | x) dθ = L(0 | 1) P[θ ∈ Θ1 | x]

and, similarly,

ρ1(x) = ∫_{Θ0} L(1 | 0) p(θ | x) dθ
      [L(1 | 0) is constant]
      = L(1 | 0) ∫_{Θ0} p(θ | x) dθ = L(1 | 0) P[θ ∈ Θ0 | x].

We define the Bayes decision rule as the rule that minimizes the posterior expected loss; this rule corresponds to choosing the data-space partitioning as follows:

X1 = {x : ρ1(x) ≤ ρ0(x)}
or

X1 = {x : P[θ ∈ Θ1 | x] / P[θ ∈ Θ0 | x] = ∫_{Θ1} p(θ | x) dθ / ∫_{Θ0} p(θ | x) dθ ≥ L(1 | 0)/L(0 | 1)}   (1)

or, equivalently, upon applying the Bayes rule:

X1 = {x : ∫_{Θ1} p(x | θ) π(θ) dθ / ∫_{Θ0} p(x | θ) π(θ) dθ ≥ L(1 | 0)/L(0 | 1)}.   (2)
0-1 loss: For L(1 | 0) = L(0 | 1) = 1, we have the loss table

decision rule     true state θ ∈ Θ1    true state θ ∈ Θ0
x ∈ X1            L(1 | 1) = 0         L(1 | 0) = 1
x ∈ X0            L(0 | 1) = 1         L(0 | 0) = 0

yielding the following Bayes decision rule, called the maximum a posteriori (MAP) rule:

X1 = {x : P[θ ∈ Θ1 | x] / P[θ ∈ Θ0 | x] ≥ 1}   (3)

or, equivalently, upon applying the Bayes rule:

X1 = {x : ∫_{Θ1} p(x | θ) π(θ) dθ / ∫_{Θ0} p(x | θ) π(θ) dθ ≥ 1}.   (4)
Simple hypotheses. Let us specialize (1) to the case of simple hypotheses (Θ0 = {θ0}, Θ1 = {θ1}):

X1 = {x : p(θ1 | x)/p(θ0 | x) [posterior-odds ratio] ≥ L(1 | 0)/L(0 | 1)}.   (5)

We can rewrite (5) using the Bayes rule:

X1 = {x : p(x | θ1)/p(x | θ0) [likelihood ratio] ≥ π0 L(1 | 0) / (π1 L(0 | 1))}   (6)

where

π0 = π(θ0),  π1 = π(θ1) = 1 − π0

describe the prior probability mass function (pmf) of the binary random variable θ (recall that θ ∈ {θ0, θ1}). Hence, for binary simple hypotheses, the prior pmf of θ is the Bernoulli pmf.
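As a concrete sketch of the Bayes test (6), assume p(x | θ0) = N(0, 1), p(x | θ1) = N(1, 1), prior π0 = 0.7, and losses L(1 | 0) = 2, L(0 | 1) = 1 (all illustrative choices, not from the notes):

```python
from math import exp, pi, sqrt

def gauss_pdf(x, mean):
    # unit-variance Gaussian density
    return exp(-0.5 * (x - mean) ** 2) / sqrt(2.0 * pi)

def bayes_decide(x, pi0=0.7, L10=2.0, L01=1.0):
    # likelihood ratio p(x | theta1) / p(x | theta0)
    ratio = gauss_pdf(x, 1.0) / gauss_pdf(x, 0.0)
    threshold = pi0 * L10 / ((1.0 - pi0) * L01)   # pi0 L(1|0) / (pi1 L(0|1))
    return 1 if ratio >= threshold else 0          # 1 = decide H1
```

For these numbers the rule reduces to deciding H1 when x ≥ 1/2 + log(14/3) ≈ 2.04, so bayes_decide(3.0) returns 1 while bayes_decide(1.0) returns 0.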
Preposterior (Bayes) Risk
The preposterior (Bayes) risk for rule φ(x) is

E_{x,θ}[loss] = ∫_{X1} ∫_{Θ0} L(1 | 0) p(x | θ) π(θ) dθ dx + ∫_{X0} ∫_{Θ1} L(0 | 1) p(x | θ) π(θ) dθ dx.
How do we choose the rule φ(x) that minimizes the preposterior risk? Write

∫_{X1} ∫_{Θ0} L(1 | 0) p(x | θ) π(θ) dθ dx + ∫_{X0} ∫_{Θ1} L(0 | 1) p(x | θ) π(θ) dθ dx

= ∫_{X1} ∫_{Θ0} L(1 | 0) p(x | θ) π(θ) dθ dx − ∫_{X1} ∫_{Θ1} L(0 | 1) p(x | θ) π(θ) dθ dx
  + ∫_{X0} ∫_{Θ1} L(0 | 1) p(x | θ) π(θ) dθ dx + ∫_{X1} ∫_{Θ1} L(0 | 1) p(x | θ) π(θ) dθ dx

where the last two terms add up to ∫_X ∫_{Θ1} L(0 | 1) p(x | θ) π(θ) dθ dx = const, not dependent on φ(x); hence the risk equals

const + ∫_{X1} { L(1 | 0) ∫_{Θ0} p(x | θ) π(θ) dθ − L(0 | 1) ∫_{Θ1} p(x | θ) π(θ) dθ } dx

implying that X1 should be chosen as

X1 = {x : L(1 | 0) ∫_{Θ0} p(x | θ) π(θ) dθ − L(0 | 1) ∫_{Θ1} p(x | θ) π(θ) dθ < 0}
which, as expected, is the same as (2), since

minimizing the posterior expected loss ⟺ minimizing the preposterior risk for every x

as shown earlier in handout # 4.
0-1 loss: For the 0-1 loss, i.e. L(1 | 0) = L(0 | 1) = 1, the preposterior (Bayes) risk for rule φ(x) is

E_{x,θ}[loss] = ∫_{X1} ∫_{Θ0} p(x | θ) π(θ) dθ dx + ∫_{X0} ∫_{Θ1} p(x | θ) π(θ) dθ dx   (7)

which is simply the average error probability, with averaging performed over the joint probability density or mass function (pdf/pmf) of the data x and parameters θ.
Bayesian Decision-theoretic Detection for Simple Hypotheses

The Bayes decision rule for simple hypotheses is (6):

Λ(x) [likelihood ratio] = p(x | θ1)/p(x | θ0)  ≷  γ = π0 L(1 | 0) / (π1 L(0 | 1))   (8)

(decide H1 if Λ(x) ≥ γ, otherwise decide H0); see also Ch. 3.7 in Kay-II. [Recall that Λ(x) is the sufficient statistic for the detection problem, see p. 37 in handout # 1.] Equivalently,

log Λ(x) = log[p(x | θ1)] − log[p(x | θ0)]  ≷  log γ.
Minimum Average Error Probability Detection: In the familiar 0-1 loss case where L(1 | 0) = L(0 | 1) = 1, we know that the preposterior (Bayes) risk is equal to the average error probability, see (7). This average error probability greatly simplifies in the simple-hypothesis testing case:

av. error probability = ∫_{X1} p(x | θ0) π0 dx + ∫_{X0} p(x | θ1) π1 dx
                      = π0 ∫_{X1} p(x | θ0) dx [= PFA] + π1 ∫_{X0} p(x | θ1) dx [= PM]

where, as before, the averaging is performed over the joint pdf/pmf of the data x and parameters θ, and

π0 = π(θ0),  π1 = π(θ1) = 1 − π0  (the Bernoulli pmf).
In this case, our Bayes decision rule simplifies to the MAP rule (as expected, see (5) and Ch. 3.6 in Kay-II):

p(θ1 | x)/p(θ0 | x) [posterior-odds ratio]  ≷  1   (9)

or, equivalently, upon applying the Bayes rule:

p(x | θ1)/p(x | θ0) [likelihood ratio]  ≷  π0/π1   (10)

which is the same as

(4), upon substituting the Bernoulli pmf as the prior pmf for θ, and

(8), upon substituting L(1 | 0) = L(0 | 1) = 1.
Bayesian Decision-theoretic Detection Theory: Handling Nuisance Parameters

We apply the same approach as before: integrate the nuisance parameters (ψ, say) out!

Therefore, (1) still holds for testing

H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1

but p_{θ|x}(θ | x) is computed as follows:

p_{θ|x}(θ | x) = ∫ p_{θ,ψ|x}(θ, ψ | x) dψ

and, therefore,

∫_{Θ1} [∫ p_{θ,ψ|x}(θ, ψ | x) dψ] dθ / ∫_{Θ0} [∫ p_{θ,ψ|x}(θ, ψ | x) dψ] dθ  ≷  L(1 | 0)/L(0 | 1)   (11)
or, equivalently, upon applying the Bayes rule:

∫_{Θ1} ∫ p_{x|θ,ψ}(x | θ, ψ) π_{θ,ψ}(θ, ψ) dψ dθ / ∫_{Θ0} ∫ p_{x|θ,ψ}(x | θ, ψ) π_{θ,ψ}(θ, ψ) dψ dθ  ≷  L(1 | 0)/L(0 | 1).   (12)

Now, if θ and ψ are independent a priori, i.e.

π_{θ,ψ}(θ, ψ) = π_θ(θ) π_ψ(ψ)   (13)

then (12) can be rewritten as

∫_{Θ1} π(θ) [∫ p_{x|θ,ψ}(x | θ, ψ) π(ψ) dψ] dθ / ∫_{Θ0} π(θ) [∫ p_{x|θ,ψ}(x | θ, ψ) π(ψ) dψ] dθ  ≷  L(1 | 0)/L(0 | 1)   (14)

where the inner integral ∫ p_{x|θ,ψ}(x | θ, ψ) π(ψ) dψ = p(x | θ).
Simple hypotheses and independent priors for θ and ψ: Let us specialize (11) to the simple hypotheses (Θ0 = {θ0}, Θ1 = {θ1}):

∫ p_{θ,ψ|x}(θ1, ψ | x) dψ / ∫ p_{θ,ψ|x}(θ0, ψ | x) dψ  ≷  L(1 | 0)/L(0 | 1).   (15)
Now, if θ and ψ are independent a priori, i.e. (13) holds, then we can rewrite (14) [or (15) using the Bayes rule]:

∫ p_{x|θ,ψ}(x | θ1, ψ) π(ψ) dψ / ∫ p_{x|θ,ψ}(x | θ0, ψ) π(ψ) dψ  [integrated likelihood ratio]
= p(x | θ1)/p(x | θ0)  [same as (6)]  ≷  π0 L(1 | 0) / (π1 L(0 | 1))   (16)

where

π0 = π(θ0),  π1 = π(θ1) = 1 − π0.
Chernoff Bound on Average Error Probability for Simple Hypotheses

Recall that minimizing the average error probability

av. error probability = ∫_{X1} ∫_{Θ0} p(x | θ) π(θ) dθ dx + ∫_{X0} ∫_{Θ1} p(x | θ) π(θ) dθ dx

leads to the MAP decision rule:

X1* = {x : ∫_{Θ0} p(x | θ) π(θ) dθ − ∫_{Θ1} p(x | θ) π(θ) dθ < 0}.
In many applications, we may not be able to obtain a simple closed-form expression for the minimum average error probability, but we can bound it as follows:

min av. error probability = ∫_{X1*} ∫_{Θ0} p(x | θ) π(θ) dθ dx + ∫_{X0*} ∫_{Θ1} p(x | θ) π(θ) dθ dx

[using the definition of X1*]

= ∫_X min{ q0(x), q1(x) } dx   [X = data space]

≤ ∫_X [q0(x)]^λ [q1(x)]^{1−λ} dx

where q0(x) ≜ ∫_{Θ0} p(x | θ) π(θ) dθ and q1(x) ≜ ∫_{Θ1} p(x | θ) π(θ) dθ, which is the Chernoff bound on the minimum average error probability. Here, we have used the fact that

min{a, b} ≤ a^λ b^{1−λ},  for 0 ≤ λ ≤ 1, a, b ≥ 0.
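The bound is easy to verify numerically. A sketch with the "integrated" densities taken as two unit-variance Gaussians q0 = N(0, 1), q1 = N(A, 1), A = 2, and λ = 1/2 (illustrative choices; for Gaussians the single-letter integral has the known closed form exp(−λ(1−λ)A²/2)):

```python
from math import exp, pi, sqrt

def gauss(x, mean):
    # unit-variance Gaussian density
    return exp(-0.5 * (x - mean) ** 2) / sqrt(2.0 * pi)

A, lam, dx = 2.0, 0.5, 0.001
xs = [-10.0 + k * dx for k in range(int(20.0 / dx))]

# right-hand side of the Chernoff bound (Riemann sum over [-10, 10])
chernoff = sum(gauss(x, 0.0) ** lam * gauss(x, A) ** (1 - lam) for x in xs) * dx
closed_form = exp(-lam * (1 - lam) * A ** 2 / 2.0)

# left-hand side: integral of the pointwise minimum (the min error probability)
min_integral = sum(min(gauss(x, 0.0), gauss(x, A)) for x in xs) * dx
```

Since min{a, b} ≤ a^λ b^{1−λ} holds at every x, min_integral ≤ chernoff; numerically chernoff matches the closed form exp(−0.5) ≈ 0.607 while min_integral ≈ 0.317.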
When

x = [x1, x2, ..., xN]^T
with

x1, x2, ..., xN conditionally independent, identically distributed (i.i.d.) given θ and

simple hypotheses (i.e. Θ0 = {θ0}, Θ1 = {θ1}),

then

q0(x) = p(x | θ0) π0 = π0 ∏_{n=1}^N p(xn | θ0)

q1(x) = p(x | θ1) π1 = π1 ∏_{n=1}^N p(xn | θ1)
yielding

Chernoff bound for N conditionally i.i.d. measurements (given θ) and simple hypotheses

= ∫ [π0 ∏_{n=1}^N p(xn | θ0)]^λ [π1 ∏_{n=1}^N p(xn | θ1)]^{1−λ} dx

= π0^λ π1^{1−λ} ∏_{n=1}^N { ∫ [p(xn | θ0)]^λ [p(xn | θ1)]^{1−λ} dxn }

= π0^λ π1^{1−λ} { ∫ [p(x1 | θ0)]^λ [p(x1 | θ1)]^{1−λ} dx1 }^N

or, in other words,

(1/N) log(min av. error probability) ≤ (1/N) log(π0^λ π1^{1−λ}) + log ∫ [p(x1 | θ0)]^λ [p(x1 | θ1)]^{1−λ} dx1,  ∀λ ∈ [0, 1].
If π0 = π1 = 1/2 (which is almost always the case of interest when evaluating average error probabilities), we can say that, as N → ∞,

min av. error probability for N cond. i.i.d. measurements (given θ) and simple hypotheses

≈ f(N) exp( −N { −min_{λ∈[0,1]} log ∫ [p(x1 | θ0)]^λ [p(x1 | θ1)]^{1−λ} dx1 } )

where the braced quantity is the Chernoff information for a single observation and f(N) is a slowly-varying function compared with the exponential term:

lim_{N→∞} log f(N) / N = 0.

Note that the Chernoff information in the exponent term of the above expression quantifies the asymptotic behavior of the minimum average error probability.
We now give a useful result, taken from
K. Fukunaga, Introduction to Statistical Pattern Recognition,2nd ed., San Diego, CA: Academic Press, 1990
for evaluating a class of Chernoff bounds.
Lemma 1. Consider p1(x) = N(μ1, Σ1) and p2(x) = N(μ2, Σ2). Then

∫ [p1(x)]^λ [p2(x)]^{1−λ} dx = exp[−g(λ)]

where

g(λ) = (λ(1 − λ)/2) (μ2 − μ1)^T [λΣ1 + (1 − λ)Σ2]^{−1} (μ2 − μ1) + (1/2) log[ |λΣ1 + (1 − λ)Σ2| / (|Σ1|^λ |Σ2|^{1−λ}) ].
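In the scalar case (Σi reduced to variances σi²), Lemma 1 can be coded directly; the parameter values below are illustrative:

```python
from math import exp, log

def g(lam, mu1, var1, mu2, var2):
    # scalar (1-D) specialization of g(lambda) from Lemma 1
    s = lam * var1 + (1.0 - lam) * var2        # lam*Sigma1 + (1-lam)*Sigma2
    quad = 0.5 * lam * (1.0 - lam) * (mu2 - mu1) ** 2 / s
    return quad + 0.5 * log(s / (var1 ** lam * var2 ** (1.0 - lam)))

# equal unit variances, means 0 and A = 2, lam = 1/2:
bound = exp(-g(0.5, 0.0, 1.0, 2.0, 1.0))       # equals exp(-A^2/8) here
```

With equal variances the log term vanishes and exp(−g(1/2)) = exp(−A²/(8σ²)), the Chernoff-information factor that reappears in the DC-level example later in the notes.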
Probabilities of False Alarm (PFA) and Detection (PD) for Simple Hypotheses
[Figure: probability density functions of the test statistic under H0 and H1; the threshold γ splits the axis so that PFA is the right tail of the H0 pdf and PD is the right tail of the H1 pdf.]
PFA = P[Λ(x) [test statistic] > γ ⟺ x ∈ X1 | θ = θ0]

PD = P[test statistic > γ ⟺ x ∉ X0 | θ = θ1].
Comments:

(i) As the region X1 shrinks (i.e. γ → ∞), both of the above probabilities shrink towards zero.

(ii) As the region X1 grows (i.e. γ → 0), both of these probabilities grow towards unity.

(iii) Observations (i) and (ii) do not imply equality between PFA and PD; in most cases, as X1 grows, PD grows more rapidly than PFA (i.e. we had better be right more often than we are wrong).

(iv) However, the perfect case where our rule is always right and never wrong (PD = 1 and PFA = 0) cannot occur when the conditional pdfs/pmfs p(x | θ0) and p(x | θ1) overlap.

(v) Thus, to increase the detection probability PD, we must also allow for the false-alarm probability PFA to increase. This behavior

represents the fundamental tradeoff in hypothesis testing and detection theory and

motivates us to introduce a (classical) approach to testing simple hypotheses, pioneered by Neyman and Pearson (to be discussed next).
Neyman-Pearson Test for Simple Hypotheses

Setup:

Parametric data models p(x ; θ0), p(x ; θ1),

Simple hypothesis testing:

H0 : θ = θ0 versus H1 : θ = θ1.

No prior pdf/pmf on θ is available.

Goal: Design a test that maximizes the probability of detection

PD = P[X ∈ X1 ; θ = θ1]

(equivalently, minimizes the miss probability PM) under the constraint

PFA = P[X ∈ X1 ; θ = θ0] = α.

Here, we consider simple hypotheses; the classical version of testing composite hypotheses is much more complicated. The Bayesian version of testing composite hypotheses is trivial (as is everything else Bayesian, at least conceptually) and we have already seen it.
Solution. We apply the Lagrange-multiplier approach: maximize

L = PD + λ(PFA − α)
  = ∫_{X1} p(x ; θ1) dx + λ [∫_{X1} p(x ; θ0) dx − α]
  = ∫_{X1} [p(x ; θ1) + λ p(x ; θ0)] dx − λα.

To maximize L, set

X1 = {x : p(x ; θ1) + λ p(x ; θ0) > 0} = {x : p(x ; θ1)/p(x ; θ0) > −λ ≜ γ}.

Again, we find the likelihood ratio:

Λ(x) = p(x ; θ1)/p(x ; θ0).

Recall our constraint:

∫_{X1} p(x ; θ0) dx = PFA = α.
If we increase γ, PFA and PD go down. Similarly, if we decrease γ, PFA and PD go up. Hence, to maximize PD, choose γ so that PFA is as big as possible under the constraint.
Two useful ways for determining the threshold γ that achieves a specified false-alarm rate:

Find γ that satisfies

∫_{x : Λ(x) > γ} p(x ; θ0) dx = PFA = α

or, expressing γ in terms of the pdf/pmf of Λ(x) under H0:

∫_γ^∞ p_{Λ ; θ0}(l ; θ0) dl = α

or, perhaps, in terms of a monotonic function of Λ(x), say T(x) = monotonic function(Λ(x)).
Warning: We have been implicitly assuming that PFA is a continuous function of γ. Some (not insightful) technical adjustments are needed if this is not the case.

A way of handling nuisance parameters: We can utilize the integrated (marginal) likelihood ratio (16) under the Neyman-Pearson setup as well.
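A minimal sketch of setting the threshold: if the test statistic is N(0, 1) under H0 (an illustrative assumption), the γ achieving PFA = α is simply the inverse Q function:

```python
from statistics import NormalDist

nd = NormalDist()                      # standard normal, N(0, 1)
alpha = 0.05                           # specified false-alarm probability

gamma = nd.inv_cdf(1.0 - alpha)        # gamma = Q^{-1}(alpha)
achieved_P_FA = 1.0 - nd.cdf(gamma)    # P[T > gamma ; H0], equals alpha
```

For statistics without a closed-form tail inverse, the same idea works by bisection on γ, since PFA is monotonically decreasing in γ.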
Chernoff-Stein Lemma for Bounding the Miss Probability in Neyman-Pearson Tests of Simple Hypotheses

Recall the definition of the Kullback-Leibler (K-L) distance D(p ‖ q) from one pmf (p) to another (q):

D(p ‖ q) = Σ_k p_k log(p_k / q_k).
The complete proof of this lemma for the discrete (pmf) case is given in
Additional Reading: T.M. Cover and J.A. Thomas, Elementsof Information Theory. Second ed., New York: Wiley, 2006.
Setup for the Chernoff-Stein Lemma

Assume that x1, x2, ..., xN are conditionally i.i.d. given θ.

We adopt the Neyman-Pearson framework, i.e. obtain a decision threshold to achieve a fixed PFA. Let us study the asymptotic PM = 1 − PD as the number of observations N gets large.

To keep PFA constant as N increases, we need to make our decision threshold (γ, say) vary with N, i.e.

γ = γ_N(PFA).

Now, the miss probability is

PM = PM(γ) = PM(γ_N(PFA)).
Chernoff-Stein Lemma

The Chernoff-Stein lemma says:

lim_{PFA→0} lim_{N→∞} (1/N) log PM = −D( p(Xn | θ0) ‖ p(Xn | θ1) )

[the K-L distance for a single observation] where the K-L distance between p(xn | θ0) and p(xn | θ1)

D( p(Xn | θ0) ‖ p(Xn | θ1) ) = E_{p(xn | θ0)}[ log( p(Xn | θ0)/p(Xn | θ1) ) ]

[discrete (pmf) case] = Σ_{xn} p(xn | θ0) log[ p(xn | θ0)/p(xn | θ1) ]

does not depend on the observation index n, since the xn are conditionally i.i.d. given θ.

Equivalently, we can state that

PM ≈ f(N) exp[ −N D( p(Xn | θ0) ‖ p(Xn | θ1) ) ]

as PFA → 0 and N → ∞, where f(N) is a slowly-varying function compared with the exponential term (when PFA → 0 and N → ∞).
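For discrete pmfs the K-L distance above is a one-liner; the two Bernoulli pmfs below are illustrative:

```python
from math import log

def kl(p, q):
    # D(p || q) = sum_k p_k log(p_k / q_k)  (natural log, in nats)
    return sum(pk * log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

p0 = [0.9, 0.1]    # p(x_n | theta0)
p1 = [0.5, 0.5]    # p(x_n | theta1)

D = kl(p0, p1)     # per-observation exponential decay rate of P_M
```

Here D ≈ 0.368 nats, so the lemma predicts PM shrinking roughly like exp(−0.368 N) for this pair of models.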
Detection for Simple Hypotheses: Example

Known positive DC level in additive white Gaussian noise (AWGN), Example 3.2 in Kay-II.

Consider

H0 : x[n] = w[n],      n = 1, 2, ..., N  versus
H1 : x[n] = A + w[n],  n = 1, 2, ..., N

where

A > 0 is a known constant,

w[n] is zero-mean white Gaussian noise with known variance σ², i.e. w[n] ~ N(0, σ²).
The above hypothesis-testing formulation is EE-like: noise versus signal plus noise. It is similar to the on-off keying scheme in communications, which gives us an idea to rephrase it so that it fits our formulation on p. 4 (for which we have developed all the theory so far). Here is such an alternative formulation: consider a family of pdfs

p(x ; a) = (1/(2πσ²)^{N/2}) exp[ −(1/(2σ²)) Σ_{n=1}^N (x[n] − a)² ]   (17)

and the following (equivalent) hypotheses:

H0 : a = 0 (off) versus H1 : a = A (on).
Then, the likelihood ratio is

Λ(x) = p(x ; a = A)/p(x ; a = 0) = { (1/(2πσ²)^{N/2}) exp[ −(1/(2σ²)) Σ_{n=1}^N (x[n] − A)² ] } / { (1/(2πσ²)^{N/2}) exp[ −(1/(2σ²)) Σ_{n=1}^N x[n]² ] }.
Now, take the logarithm and, after simple manipulations, reduce our likelihood-ratio test to comparing

T(x) = x̄ ≜ (1/N) Σ_{n=1}^N x[n]

with a threshold γ. [Here, T(x) is a monotonic function of Λ(x).] If T(x) > γ, accept H1 (i.e. reject H0); otherwise accept H0 (well, not exactly; we will talk more about this decision on p. 59).

The choice of γ depends on the approach that we take. For the Bayes decision rule, γ is a function of π0 and π1. For the Neyman-Pearson test, γ is chosen to achieve (control) a desired PFA.
Bayesian decision-theoretic detection for 0-1 loss (corresponding to minimizing the average error probability):

log Λ(x) = −(1/(2σ²)) Σ_{n=1}^N (x[n] − A)² + (1/(2σ²)) Σ_{n=1}^N x[n]²  ≷  log(π0/π1)

(1/(2σ²)) Σ_{n=1}^N (x[n] − x[n] + A)(x[n] + x[n] − A)  ≷  log(π0/π1)   [difference of squares; x[n] − x[n] = 0]

2A ( Σ_{n=1}^N x[n] ) − A² N  ≷  2σ² log(π0/π1)

( Σ_{n=1}^N x[n] ) − AN/2  ≷  (σ²/A) log(π0/π1)   [since A > 0]

and, finally,

x̄  ≷  γ = (σ²/(NA)) log(π0/π1) + A/2

which, for the practically most interesting case of equiprobable hypotheses

π0 = π1 = 1/2   (18)

simplifies to

x̄  ≷  A/2

known as the maximum-likelihood test (i.e. the Bayes decision rule for 0-1 loss and a priori equiprobable hypotheses is defined as the maximum-likelihood test). This maximum-likelihood detector does not require knowledge of the noise variance σ² to declare its decision. However, knowledge of σ² is key to assessing the detection performance. Interestingly, these observations will carry over to a few maximum-likelihood tests that we will derive in the future.
Assuming (18), we now derive the minimum average error probability. First, note that X̄ | a = 0 ~ N(0, σ²/N) and X̄ | a = A ~ N(A, σ²/N). Then

min av. error prob. = (1/2) P[X̄ > A/2 | a = 0]   [= PFA]
                    + (1/2) P[X̄ ≤ A/2 | a = A]   [= PM]
 = (1/2) P[ X̄/(σ/√N) > (A/2)/(σ/√N) ; a = 0 ]
 + (1/2) P[ (X̄ − A)/(σ/√N) ≤ (A/2 − A)/(σ/√N) ; a = A ]   [(X̄ − A)/(σ/√N) is a standard normal random variable]
 = Q( √(N A²/(4σ²)) ).

Neyman-Pearson test: If T(x) > γ, decide H1 (i.e. reject H0); otherwise decide H0 (see also the discussion on p. 59).
Performance evaluation: Assuming (17), we have

T(x) | a ~ N(a, σ²/N).

Therefore, T(X) | a = 0 ~ N(0, σ²/N), implying

PFA = P[T(X) > γ ; a = 0] = Q( γ / √(σ²/N) )

and we obtain the decision threshold as follows:

γ = √(σ²/N) Q⁻¹(PFA).
Now, T(X) | a = A ~ N(A, σ²/N), implying

PD = P[T(X) > γ | a = A] = Q( (γ − A)/√(σ²/N) )
   = Q( Q⁻¹(PFA) − A/√(σ²/N) )
   = Q( Q⁻¹(PFA) − √(N A²/σ²) ),   N A²/σ² ≜ SNR = d².

Given the false-alarm probability PFA, the detection probability PD depends only on the deflection coefficient:

d² = N A²/σ² = { E[T(X) | a = A] − E[T(X) | a = 0] }² / var[T(X) | a = 0]

which is also (a reasonable definition for) the signal-to-noise ratio (SNR).
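A quick numeric sketch of PD = Q(Q⁻¹(PFA) − d); the values N = 25, A = 0.5, σ² = 4 are illustrative:

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()
Q = lambda z: 1.0 - nd.cdf(z)          # Gaussian right tail
Qinv = lambda p: nd.inv_cdf(1.0 - p)   # inverse Q function

N, A, sigma2 = 25, 0.5, 4.0
d = sqrt(N * A ** 2 / sigma2)          # deflection coefficient, d^2 = SNR

P_FA = 1e-2
P_D = Q(Qinv(P_FA) - d)                # detection probability at this P_FA
```

Here d = 1.25 and P_D ≈ 0.14; increasing N (hence d) raises P_D at the same P_FA, illustrating that the performance depends on the data only through d².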
Receiver Operating Characteristics (ROC)

PD = Q( Q⁻¹(PFA) − d ).

Comments:

As we raise the threshold γ, PFA goes down but so does PD.

The ROC should be above the 45° line; otherwise we can do better by flipping a coin.

Performance improves with d².
Typical Ways of Depicting the Detection Performance Under the Neyman-Pearson Setup

To analyze the performance of a Neyman-Pearson detector, we examine two relationships:

Between PD and PFA, for a given SNR, called the receiver operating characteristics (ROC).

Between PD and SNR, for a given PFA.
Here are examples of the two:
see Figs. 3.8 and 3.5 in Kay-II, respectively.
Asymptotic (as N → ∞ and PFA → 0) PD and PM for a Known DC Level in AWGN

We apply the Chernoff-Stein lemma, for which we need to compute the following K-L distance:

D( p(Xn | a = 0) ‖ p(Xn | a = A) ) = E_{p(xn | a = 0)}{ log[ p(Xn | a = 0)/p(Xn | a = A) ] }

where

p(xn | a = 0) = N(0, σ²)

log[ p(xn | a = 0)/p(xn | a = A) ] = −xn²/(2σ²) + (xn − A)²/(2σ²) = (1/(2σ²)) (A² − 2A xn).

Therefore, taking the expectation (E[xn | a = 0] = 0),

D( p(Xn | a = 0) ‖ p(Xn | a = A) ) = A²/(2σ²)

and the Chernoff-Stein lemma predicts the following behavior of the detection probability as N → ∞ and PFA → 0:

PD ≈ 1 − f(N) exp( −N A²/(2σ²) )   [i.e. PM ≈ f(N) exp( −N A²/(2σ²) )]
where f(N) is a slowly-varying function of N compared with the exponential term. In this case, the exact expression for PM (PD) is available and consistent with the Chernoff-Stein lemma:

PM = 1 − Q( Q⁻¹(PFA) − √(N A²/σ²) ) = Q( √(N A²/σ²) − Q⁻¹(PFA) )

≈ (as N → ∞) { 1 / ( [√(N A²/σ²) − Q⁻¹(PFA)] √(2π) ) } exp{ −[√(N A²/σ²) − Q⁻¹(PFA)]²/2 }

= { 1 / ( [√(N A²/σ²) − Q⁻¹(PFA)] √(2π) ) } exp{ −N A²/(2σ²) − [Q⁻¹(PFA)]²/2 + Q⁻¹(PFA) √(N A²/σ²) }

= { 1 / ( [√(N A²/σ²) − Q⁻¹(PFA)] √(2π) ) } exp{ −[Q⁻¹(PFA)]²/2 + Q⁻¹(PFA) √(N A²/σ²) } · exp[ −N A²/(2σ²) ]   (21)

where the last factor is as predicted by the Chernoff-Stein lemma and we have used the asymptotic formula (20). When PFA is small and N is large, the first two terms in the above expression make a slowly-varying function f(N) of N. Note that the exponential term in (21) does not depend on the false-alarm probability PFA (or, equivalently, on the choice of the decision threshold). The Chernoff-Stein lemma asserts that this is not a coincidence.
Comment. For detecting a known DC level in AWGN:

The exponential decay rate of PM is A²/(2σ²), which is different from (larger than, in this case) the exponential decay rate of the minimum average error probability: A²/(8σ²).
Decentralized Neyman-Pearson Detection for Simple Hypotheses

Consider a sensor-network scenario. Assumption: Observations made at N spatially distributed sensors (nodes) follow the same marginal probabilistic model:

Hi : xn ~ p(xn | θi)

where n = 1, 2, ..., N and i ∈ {0, 1} for binary hypotheses. Each node n makes a hard local decision dn based on its local observation xn and sends it to the headquarters (fusion center), which collects all the local decisions and makes the final global decision about H0 or H1. This structure is clearly suboptimal: it is easy to construct a better decision strategy in which each node sends its (quantized, in practice) likelihood ratio to the fusion center, rather than the decision only. However, such a strategy would have a higher communication (energy) cost.
We now go back to the decentralized detection problem. Suppose that each node n makes a local decision dn ∈ {0, 1}, n = 1, 2, ..., N, and transmits it to the fusion center. Then, the fusion center makes the global decision based on the likelihood ratio formed from the dn's. The simplest fusion scheme is based on the assumption that the dn's are conditionally independent given θ (which may not always be reasonable, but leads to an easy solution). We can now write

p(dn | θ1) = PD,n^{dn} (1 − PD,n)^{1−dn}   (Bernoulli pmf)

where PD,n is the nth sensor's local detection probability. Similarly,

p(dn | θ0) = PFA,n^{dn} (1 − PFA,n)^{1−dn}   (Bernoulli pmf)

where PFA,n is the nth sensor's local false-alarm probability. Now,

log Λ(d) = Σ_{n=1}^N log[ p(dn | θ1)/p(dn | θ0) ] = Σ_{n=1}^N log[ PD,n^{dn} (1 − PD,n)^{1−dn} / ( PFA,n^{dn} (1 − PFA,n)^{1−dn} ) ]  ≷  log γ.
To further simplify the exposition, we assume that all sensors have identical performance:

PD,n = PD,  PFA,n = PFA.

Define the number of sensors having dn = 1:

u1 = Σ_{n=1}^N dn.

Then, the log-likelihood ratio becomes

log Λ(d) = u1 log(PD/PFA) + (N − u1) log( (1 − PD)/(1 − PFA) )  ≷  log γ

or

u1 log[ PD (1 − PFA) / ( PFA (1 − PD) ) ]  ≷  log γ + N log( (1 − PFA)/(1 − PD) ).   (22)
Clearly, each node's local decision dn is meaningful only if PD > PFA, which implies

PD (1 − PFA) / ( PFA (1 − PD) ) > 1

the logarithm of which is therefore positive, and the decision rule (22) further simplifies to

u1  ≷  γ′.
The Neyman-Pearson performance analysis of this detector is easy: the random variable U1 is binomial given θ (i.e. conditional on the hypothesis) and, therefore,

P[U1 = u1] = (N choose u1) p^{u1} (1 − p)^{N − u1}

where p = PFA under H0 and p = PD under H1. Hence, the global false-alarm probability is

PFA,global = P[U1 > γ′ | θ0] = Σ_{u1 = ⌈γ′⌉}^{N} (N choose u1) PFA^{u1} (1 − PFA)^{N − u1}.
Clearly, PFA,global will be a discontinuous function of γ′ and, therefore, we should choose our PFA,global specification from the available discrete choices. But, if none of the candidate choices is acceptable, this means that our current system does not satisfy the requirements and a remedial action is needed, e.g. increasing the quantity (N) or improving the quality of the sensors (through changing PD and PFA), or both.
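The binomial-tail expression for PFA,global is easy to tabulate. A sketch, with N = 10 sensors, local PFA = 0.1, and a non-integer threshold γ′ = 4.5 as illustrative values:

```python
from math import comb, floor

def pfa_global(N, pfa_local, thresh):
    # P[U1 > thresh | H0], with U1 ~ binomial(N, pfa_local)
    lo = floor(thresh) + 1          # smallest integer count strictly above thresh
    return sum(comb(N, u) * pfa_local ** u * (1.0 - pfa_local) ** (N - u)
               for u in range(lo, N + 1))

p = pfa_global(10, 0.1, 4.5)        # chance that 5 or more of 10 nodes falsely alarm
```

Even though each node falsely alarms with probability 0.1, requiring 5 of 10 votes drives the global false-alarm probability down to about 1.6e-3; lowering the threshold to 0.5 (any single vote suffices) raises it to 1 − 0.9^10 ≈ 0.65.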
Testing Multiple Hypotheses

Suppose now that we choose Θ0, Θ1, ..., Θ_{M−1} that form a partition of the parameter space Θ:

Θ0 ∪ Θ1 ∪ ⋯ ∪ Θ_{M−1} = Θ,  Θi ∩ Θj = ∅ for i ≠ j.

We wish to distinguish among M > 2 hypotheses, i.e. identify which hypothesis is true:

H0 : θ ∈ Θ0 versus
H1 : θ ∈ Θ1 versus
⋮ versus
H_{M−1} : θ ∈ Θ_{M−1}

and, consequently, our action space consists of M choices. We design a decision rule φ(x) : X → {0, 1, ..., M − 1}:

φ(x) = 0, decide H0,
       1, decide H1,
       ⋮
       M − 1, decide H_{M−1}

where φ partitions the data space X [i.e. the support of p(x | θ)] into M regions:

Rule φ:  X0 = {x : φ(x) = 0}, ..., X_{M−1} = {x : φ(x) = M − 1}.
We specify the loss function using L(i | m), where, typically, the losses due to correct decisions are set to zero:

L(i | i) = 0,  i = 0, 1, ..., M − 1.

Here, we adopt zero losses for correct decisions. Now, our posterior expected loss takes M values:

ρm(x) = Σ_{i=0}^{M−1} ∫_{Θi} L(m | i) p(θ | x) dθ = Σ_{i=0}^{M−1} L(m | i) ∫_{Θi} p(θ | x) dθ,  m = 0, 1, ..., M − 1.
Then, the Bayes decision rule φ* is defined via the following data-space partitioning:

Xm* = {x : ρm(x) = min_{0≤l≤M−1} ρl(x)},  m = 0, 1, ..., M − 1

or, equivalently, upon applying the Bayes rule,

Xm* = {x : m = arg min_{0≤l≤M−1} Σ_{i=0}^{M−1} ∫_{Θi} L(l | i) p(x | θ) π(θ) dθ}
    = {x : m = arg min_{0≤l≤M−1} Σ_{i=0}^{M−1} L(l | i) ∫_{Θi} p(x | θ) π(θ) dθ}.
The preposterior (Bayes) risk for rule φ(x) is

E_{x,θ}[loss] = Σ_{m=0}^{M−1} Σ_{i=0}^{M−1} ∫_{Xm} ∫_{Θi} L(m | i) p(x | θ) π(θ) dθ dx
             = Σ_{m=0}^{M−1} ∫_{Xm} [ Σ_{i=0}^{M−1} L(m | i) ∫_{Θi} p(x | θ) π(θ) dθ ] [≜ hm(x)] dx.

Then, for an arbitrary partition {Xm} and hm(x) as defined above,

[ Σ_{m=0}^{M−1} ∫_{Xm} hm(x) dx ] − [ Σ_{m=0}^{M−1} ∫_{Xm*} hm(x) dx ] ≥ 0

which verifies that the Bayes decision rule φ* minimizes the preposterior (Bayes) risk.
Special Case: L(i | i) = 0 and L(m | i) = 1 for i ≠ m (0-1 loss), implying that ρm(x) can be written as

ρm(x) = Σ_{i=0, i≠m}^{M−1} ∫_{Θi} p(θ | x) dθ = [const, not a function of m] − ∫_{Θm} p(θ | x) dθ

and

Xm* = {x : m = arg max_{0≤l≤M−1} ∫_{Θl} p(θ | x) dθ} = {x : m = arg max_{0≤l≤M−1} P[θ ∈ Θl | x]}   (23)

which is the MAP rule, as expected.
Simple hypotheses: Let us specialize (23) to simple hypotheses (Θm = {θm}, m = 0, 1, ..., M − 1):

Xm* = {x : m = arg max_{0≤l≤M−1} p(θl | x)}

or, equivalently,

Xm* = {x : m = arg max_{0≤l≤M−1} [πl p(x | θl)]},  m = 0, 1, ..., M − 1

where

π0 = π(θ0), ..., π_{M−1} = π(θ_{M−1})

define the prior pmf of the M-ary discrete random variable θ (recall that θ ∈ {θ0, θ1, ..., θ_{M−1}}). If the πi, i = 0, 1, ..., M − 1, are all equal:

π0 = π1 = ⋯ = π_{M−1} = 1/M

the resulting test

Xm* = {x : m = arg max_{0≤l≤M−1} [(1/M) p(x | θl)]} = {x : m = arg max_{0≤l≤M−1} p(x | θl) [likelihood]}   (24)

is the maximum-likelihood test; this name is easy to justify after inspecting (24) and noting that the computation of the optimal decision region Xm* requires the maximization of the likelihood p(x | θ) with respect to the parameter θ ∈ {θ0, θ1, ..., θ_{M−1}}.
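A sketch of the M-ary MAP rule: decide the m maximizing πl p(x | θl). Three unit-variance Gaussian hypotheses with means −1, 0, 2 and a flat prior (illustrative choices) reduce it to the ML test (24):

```python
from math import exp, pi, sqrt

means = [-1.0, 0.0, 2.0]            # theta_0, theta_1, theta_2 (illustrative)
priors = [1 / 3, 1 / 3, 1 / 3]      # flat prior -> MAP reduces to ML

def map_decide(x):
    # pi_l * p(x | theta_l) for each hypothesis l
    scores = [pl * exp(-0.5 * (x - ml) ** 2) / sqrt(2.0 * pi)
              for pl, ml in zip(priors, means)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal variances the decision boundaries fall midway between adjacent means (here at −0.5 and 1.0).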
Summary: Bayesian Decision Approach versus Neyman-Pearson Approach

The Neyman-Pearson approach appears particularly suitable for applications where the null hypothesis can be formulated as absence of signal or, perhaps, absence of statistical difference between two data sets (treatment versus placebo, say).

In the Neyman-Pearson approach, the null hypothesis is treated very differently from the alternative. (If the null hypothesis is true, we wish to control the false-alarm rate, which is different from our desire to maximize the probability of detection when the alternative is true.) Consequently, our decisions should also be treated differently. If the likelihood ratio is large enough, we decide to accept H1 (or reject H0). However, if the likelihood ratio is not large enough, we decide not to reject H0 because, in this case, it may be that either

(i) H0 is true or
(ii) H0 is false but the test has low detection probability (power) (e.g. because the signal level is small compared with the noise or we collected too small a number of observations).

The Bayesian decision framework is suitable for communications applications as it can easily handle multiple hypotheses (unlike the Neyman-Pearson framework).

0-1 loss: In communications applications, we typically select a 0-1 loss, implying that all hypotheses are treated equally (i.e. we could change the roles of the null and alternative hypotheses without any problems). Therefore, interpretations of our decisions are also straightforward. Furthermore, in this case, the Bayes decision rule is also optimal in terms of minimizing the average error probability, which is one of the most popular performance criteria in communications.
P Values
Reporting "accept H0" or "accept H1" is not very informative. Instead, we could vary PFA and examine how our report would change.

Generally, if H1 is accepted for a certain specified PFA, it will be accepted for any P′FA > PFA. Therefore, there exists a smallest PFA at which H1 is accepted. This motivates the introduction of the p value.
To be more precise (and be able to handle composite hypotheses), here is a definition of the size of a hypothesis test.

Definition 1. The size α of a hypothesis test described by

Rule φ:  X0 = {x : φ(x) = 0},  X1 = {x : φ(x) = 1}

is defined as follows:

α = max_{θ∈Θ0} P[x ∈ X1 | θ] = max possible PFA.

A hypothesis test is said to have level α if its size is less than or equal to α. Therefore, a level-α test is guaranteed to have a false-alarm probability less than or equal to α.
Definition 2. Consider a Neyman-Pearson-type setup where our test

Rule φα:  X0,α = {x : φα(x) = 0},  X1,α = {x : φα(x) = 1}   (25)

achieves a specified size α, meaning that

α = max possible PFA = max_{θ∈Θ0} P[x ∈ X1,α | θ]   (composite hypotheses)

or, in the simple-hypothesis case (Θ0 = {θ0}, Θ1 = {θ1}):

α = PFA = P[x ∈ X1,α | θ = θ0]   (simple hypotheses).

We suppose that, for every α ∈ (0, 1), we have a size-α test with decision regions (25). Then, the p value for this test is the smallest level at which we can declare H1:

p value = inf{α : x ∈ X1,α}.

Informally, the p value is a measure of evidence for supporting H1. For example, p values less than 0.01 are considered very strong evidence supporting H1.

There are a lot of warnings (and misconceptions) regarding p values. Here are the most important ones.
Warning: A large p value is not strong evidence in favor of H0; a large p value can occur for two reasons:

(i) H0 is true or

(ii) H0 is false but the test has low detection probability (power).

Warning: Do not confuse the p value with

P[H0 | data] = P[θ ∈ Θ0 | x]

which is used in Bayesian inference. The p value is not the probability that H0 is true.

Theorem 1. Suppose that we have a size-α test of the form

declare H1 if and only if T(x) ≥ c.

Then, the p value for this test is

p value = max_{θ∈Θ0} P[T(X) ≥ T(x) | θ]

where x is the observed value of X. For Θ0 = {θ0}:

p value = P[T(X) ≥ T(x) | θ = θ0].
In words, Theorem 1 states that:

The p value is the probability that, under H0, a random data realization X is observed yielding a value of the test statistic T(X) that is greater than or equal to what has actually been observed (i.e. T(x)).

Note: This interpretation requires that we allow the experiment to be repeated many times. This is what Bayesians criticize by saying that data that have never been observed are used for inference.

Theorem 2. If the test statistic has a continuous distribution, then, under H0 : θ = θ0, the p value has a uniform(0, 1) distribution. Therefore, if we declare H1 (reject H0) when the p value is less than or equal to α, the probability of false alarm is α.

In other words, if H0 is true, the p value is like a random draw from a uniform(0, 1) distribution. If H1 is true and if we repeat the experiment many times, the random p values will concentrate closer to zero.
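Theorem 1 in code, assuming a test statistic that is N(0, 1) under H0 (an illustrative choice): the p value is just the right-tail probability at the observed statistic.

```python
from statistics import NormalDist

def p_value(t_observed):
    # P[T(X) >= T(x) | theta0] for T ~ N(0, 1) under H0
    return 1.0 - NormalDist().cdf(t_observed)

# Theorem 2 in action: "p value <= alpha" is exactly "T(x) >= Q^{-1}(alpha)",
# so thresholding the p value at alpha gives P_FA = alpha.
```

For example, an observed statistic of 2.0 gives a p value of about 0.023, so H1 would be declared at α = 0.05 but not at α = 0.01.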
Multiple Testing
We may conduct many hypothesis tests in some applications, e.g.

bioinformatics and

sensor networks.

Here, we perform many (typically binary) tests (one test per node in a sensor network, say). This is different from testing multiple hypotheses that we considered on pp. 54-58, where we performed a single test of multiple hypotheses. For a sensor-network related discussion on multiple testing, see

E.B. Ermis, M. Alanyali, and V. Saligrama, "Search and discovery in an uncertain networked world," IEEE Signal Processing Magazine, vol. 23, pp. 107-118, Jul. 2006.

Suppose that each test is conducted with false-alarm probability PFA = α. For example, in a sensor-network setup, each node conducts a test based on its local data.

Although the chance of false alarm at each node is only α, the chance of at least one falsely alarmed node is much higher, since there are many nodes. Here, we discuss two ways to deal with this problem.
Consider M hypothesis tests:

H0i versus H1i,  i = 1, 2, ..., M

and denote by p1, p2, ..., pM the p values for these tests. Then, the Bonferroni method does the following:

Given the p values p1, p2, ..., pM, accept H1i if pi < α/M.

If all the H0i are true (and the p values are independent), then

P{min{p1, p2, ..., pM} > x} = (1 − x)^M

yielding the proper p value to be attached to min{p1, p2, ..., pM} as

1 − (1 − min{p1, p2, ..., pM})^M   (26)

and, if min{p1, p2, ..., pM} is small and M is not too large, (26) will be close to M min{p1, p2, ..., pM}.

False Discovery Rate: Sometimes it is reasonable to control the false discovery rate (FDR), which we introduce below.
Suppose that we accept H1i for all i for which

pi < threshold

and let m0 + m1 = m be

number of true H0i hypotheses + number of true H1i hypotheses = total number of hypotheses (nodes, say).

# of different outcomes:

               H0 not rejected    H1 declared    total
H0 true        U                  V              m0
H1 true        T                  S              m1
total          m − R              R              m

Define the false discovery proportion (FDP) as

FDP = V/R, R > 0,
      0,   R = 0

which is simply the proportion of incorrect H1 decisions. Now, define

FDR = E[FDP].
The Benjamini-Hochberg (BH) Method

(i) Denote the ordered p values by p(1) ≤ p(2) ≤ ⋯ ≤ p(m).

(ii) Define

li = i α / (Cm m)  and  R = max{i : p(i) < li}

where Cm is defined to be 1 if the p values are independent and Cm = Σ_{i=1}^m (1/i) otherwise.

(iii) Define the BH rejection threshold T = p(R).

(iv) Accept all H1i for which p(i) ≤ T.

Theorem 4. (formulated and proved by Benjamini and Hochberg) If the above BH method is applied, then, regardless of how many null hypotheses are true and regardless of the distribution of the p values when the null hypothesis is false,

FDR = E[FDP] ≤ (m0/m) α ≤ α.