Likelihood, Bayesian and Decision Theory


Transcript of Likelihood, Bayesian and Decision Theory

Page 1: Likelihood, Bayesian and Decision Theory


1

Page 2: Likelihood, Bayesian and Decision Theory

Likelihood, Bayesian and Decision Theory

2

Kenneth Yu

Page 3: Likelihood, Bayesian and Decision Theory

History

• The likelihood principle was first introduced by R.A. Fisher in 1922. The law of likelihood was identified by Ian Hacking.

• "Modern statisticians are familiar with the notion that any finite body of data contains only a limited amount of information on any point under examination; that this limit is set by the nature of the data themselves…the statistician's task, in fact, is limited to the extraction of the whole of the available information on any particular issue." R. A. Fisher

3

Page 4: Likelihood, Bayesian and Decision Theory

Likelihood Principle
• All relevant information in the data is contained in the likelihood function L(θ | x) = P(X = x | θ).

Law of Likelihood
• The extent to which the evidence supports one parameter value over another can be measured by taking the ratio of their likelihoods.

• These two concepts allow us to use the likelihood for inferences on θ; a sketch of the ratio is given below.
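A short sketch of this ratio (standard form): the support for θ₁ over θ₂ given data x is
$$\frac{L(\theta_1 \mid x)}{L(\theta_2 \mid x)} = \frac{P(X = x \mid \theta_1)}{P(X = x \mid \theta_2)},$$
with values greater than 1 favoring θ₁.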

4

Page 5: Likelihood, Bayesian and Decision Theory

Motivation and Applications
• Likelihood (especially MLE) is used in a range of statistical models, such as structural equation modeling, confirmatory factor analysis, and linear models, to make inferences on the parameters of a model. Its importance comes from the need to find the "best" parameter value subject to error.

• This makes use of only the evidence and disregards the prior probability of the hypothesis. By making inferences on unknown parameters from our past observations, we are able to estimate the true Θ value for the population.

5

Page 6: Likelihood, Bayesian and Decision Theory

• The likelihood is a function of the form
$$L(\Theta \mid X) \in \{\, \alpha\, P(X \mid \Theta) : \alpha > 0 \,\}$$

• This represents how "likely" Θ is given the observed outcomes X. It is proportional to the probability of X occurring given the parameter Θ.

• Likelihood functions are equivalent if they differ only by a constant α (they are proportional). Inferences on the parameter Θ are the same if based on equivalent functions.

6

Page 7: Likelihood, Bayesian and Decision Theory

Maximum Likelihood Method

By Hanchao

7

Page 8: Likelihood, Bayesian and Decision Theory

Main topics include:
• 1. Why use the maximum likelihood method?
• 2. Likelihood function
• 3. Maximum likelihood estimators
• 4. How to calculate the MLE?

8

Page 9: Likelihood, Bayesian and Decision Theory

1. Why use Maximum Likelihood Method?

Difference between:

Method of Moments

& Method of Maximum likelihood

9

Page 10: Likelihood, Bayesian and Decision Theory

• Mostly, the two methods give the same results!

• However, the method of maximum likelihood does yield "good" estimators:

1. It is an after-the-fact calculation.
2. It provides more versatile methods for fitting parametric statistical models to data.
3. It is well suited to large data samples.

10

Page 11: Likelihood, Bayesian and Decision Theory

2. Likelihood Function
• Definition: Let $f(x_1, \ldots, x_n; \theta)$, $\theta \in \Theta \subseteq \mathbb{R}^k$, be the joint probability (or density) function of the n random variables $X_1, \ldots, X_n$ with sample values $x_1, \ldots, x_n$.

The likelihood function of the sample is given by
$$L(\theta; x_1, \ldots, x_n) = f(x_1, \ldots, x_n; \theta)$$

11

Page 12: Likelihood, Bayesian and Decision Theory

• If $X_1, \ldots, X_n$ are discrete iid random variables with probability function $p(x, \theta)$, then the likelihood function is given by
$$L(\theta) = P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i) = \prod_{i=1}^{n} p(x_i, \theta)$$

12

Page 13: Likelihood, Bayesian and Decision Theory

• In the continuous case, if the density is $f(x, \theta)$, then the likelihood function is given by
$$L(\theta) = \prod_{i=1}^{n} f(x_i, \theta)$$

i.e. Let $X_1, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ random variables. Find the likelihood function.
$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) = \frac{1}{(2\pi)^{n/2}\,\sigma^{n}} \exp\!\left(-\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^2}\right)$$

13

Page 14: Likelihood, Bayesian and Decision Theory

4. Procedure of one approach to finding the MLE

• 1). Define the likelihood function L(θ).
• 2). Take the natural logarithm (ln) of L(θ).
• 3). Differentiate ln L(θ) with respect to θ, and then equate the derivative to 0.
• 4). Solve for the parameter θ; the solution is the estimate $\hat{\theta}$.
• 5). Check whether it is a maximum (and a global maximum).

• Still confused? See the numerical sketch below.
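As an illustrative sketch (not from the slides), the same recipe can be carried out numerically in MATLAB; a hypothetical normal sample is assumed and fminsearch minimizes the negative log-likelihood:

x = [4.1 5.3 4.7 5.0 4.4];                         % hypothetical sample
% negative log-likelihood of N(mu, sigma^2), up to an additive constant; p = [mu, sigma]
negloglik = @(p) numel(x)*log(p(2)) + sum((x - p(1)).^2) / (2*p(2)^2);
phat = fminsearch(negloglik, [mean(x), std(x)])    % phat = [mu_hat, sigma_hat]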

14

Page 15: Likelihood, Bayesian and Decision Theory

Ex1: Suppose $X_1, \ldots, X_n$ are random samples from a Poisson distribution with parameter λ. Find the MLE $\hat{\lambda}$.

We have the pmf:
$$p(x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots;\ \lambda > 0$$

Hence, the likelihood function is:
$$L(\lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda}\,\lambda^{x_i}}{x_i!} = \frac{e^{-n\lambda}\,\lambda^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!}$$

15

Page 16: Likelihood, Bayesian and Decision Theory

Differentiating the log-likelihood with respect to λ results in:
$$\frac{d \ln L(\lambda)}{d\lambda} = -n + \frac{1}{\lambda}\sum_{i=1}^{n} x_i$$

Setting the result equal to zero:
$$-n + \frac{1}{\lambda}\sum_{i=1}^{n} x_i = 0$$

That is,
$$\sum_{i=1}^{n} x_i = n\lambda \quad\Longrightarrow\quad \lambda = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$$

Hence, the MLE of λ is:
$$\hat{\lambda} = \bar{X}$$

16
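As a quick worked illustration with made-up counts: if the observed sample were x = (2, 3, 1, 4), then $\hat{\lambda} = \bar{x} = (2 + 3 + 1 + 4)/4 = 2.5$.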

Page 17: Likelihood, Bayesian and Decision Theory

Ex2: Let $X_1, \ldots, X_n$ be $N(\mu, \sigma^2)$.
a) If μ is unknown and $\sigma^2 = \sigma_0^2$ is known, find the MLE for μ.
b) If $\mu = \mu_0$ is known and $\sigma^2$ is unknown, find the MLE for $\sigma^2$.
c) If μ and $\sigma^2$ are both unknown, find the MLE for $(\mu, \sigma^2)$.

• Ans: The likelihood function is:
$$L(\mu, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^2}\right)$$

17

Page 18: Likelihood, Bayesian and Decision Theory

So after taking the natural log we have:
$$\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^2}$$

a). When $\sigma^2 = \sigma_0^2$ is known, we only need to solve for the unknown parameter μ:
$$\frac{\partial \ln L(\mu, \sigma_0^2)}{\partial \mu} = \frac{2\sum_{i=1}^{n}(x_i - \mu)}{2\sigma_0^2} = 0
\;\Longrightarrow\; \sum_{i=1}^{n}(x_i - \mu) = 0
\;\Longrightarrow\; \sum_{i=1}^{n} x_i = n\mu
\;\Longrightarrow\; \hat{\mu} = \bar{x}$$

18

Page 19: Likelihood, Bayesian and Decision Theory

• b) When $\mu = \mu_0$ is known, we only need to solve for one parameter, $\sigma^2$:
$$\frac{\partial \ln L(\mu_0, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{\sum_{i=1}^{n}(x_i - \mu_0)^2}{2(\sigma^2)^2} = 0
\;\Longrightarrow\; \hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(X_i - \mu_0)^2}{n}$$

• c) When both μ and $\sigma^2$ are unknown, we differentiate with respect to both parameters and mostly follow the same steps as parts a) and b).

19
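A small illustrative sketch (not from the slides) of the closed-form estimates derived above, assuming a hypothetical sample:

x = [2.9 3.4 3.1 2.7 3.6];             % hypothetical data
mu_hat     = mean(x);                   % MLE of mu (cases a and c)
sigma2_hat = mean((x - mean(x)).^2);    % MLE of sigma^2 in case c (divides by n, not n-1)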

Page 20: Likelihood, Bayesian and Decision Theory

Reality example: Sound localization

[Figure: two microphones (Mic 1, Mic 2) connected to an MCU]

20

Page 21: Likelihood, Bayesian and Decision Theory

Robust Sound Localization, IEEE Transactions on Signal Processing, Vol. 53, No. 6, June 2005

[Figure: sound source, two microphones, noise and reverberations]

21

Page 22: Likelihood, Bayesian and Decision Theory

The ideality and the reality

[Figure: signals received at Mic 1 and Mic 2 for a source at 1 m, angle 60°, frequency 1 kHz]

22

Page 23: Likelihood, Bayesian and Decision Theory

The Fourier transform shows the noise

[Figure: amplitude spectrum of the received signal; horizontal axis is frequency (×100 Hz)]

23

Page 24: Likelihood, Bayesian and Decision Theory

Algorithm:

1. Signal collection (original signal samples in the time domain):
$$m_1(t) = s_1(t) + n_1(t)$$
$$m_2(t) = s_2(t) + n_2(t)$$

2. Cross-correlation (received signals after the DFT, in the frequency domain):
$$\tilde{\tau} = \arg\max_{\tau} \int M_1(\omega)\, M_2^{*}(\omega)\, e^{j\omega\tau}\, d\omega$$

24

Page 25: Likelihood, Bayesian and Decision Theory

• However, we have noise mixed into the signal, so the weighted cross-correlation algorithm becomes:
$$\tilde{\tau} = \arg\max_{\tau} \int W(\omega)\, M_1(\omega)\, M_2^{*}(\omega)\, e^{j\omega\tau}\, d\omega$$

• where the ML method supplies the weighting function W(ω), reducing the sensitivity to noise and reverberation:
$$W(\omega) = \frac{|M_1(\omega)|\,|M_2(\omega)|}{|N_1(\omega)|^2\,|M_2(\omega)|^2 + |N_2(\omega)|^2\,|M_1(\omega)|^2}$$

25
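An illustrative sketch (not from the paper or the slides) of an unweighted version of step 2 in MATLAB, assuming a hypothetical broadband source and a 10-sample delay between microphones; the ML weighting W(ω) is omitted for simplicity:

fs = 16000; N = 1024;
s  = randn(1, N);                                   % hypothetical broadband source
m1 = [zeros(1,10), s(1:N-10)] + 0.05*randn(1,N);    % mic 1: source delayed by 10 samples, plus noise
m2 = s + 0.05*randn(1,N);                           % mic 2: source plus noise
R  = real(ifft(fft(m1) .* conj(fft(m2))));          % circular cross-correlation via the DFT
[~, idx] = max(R);                                  % lag with the largest correlation
tau_hat = (idx - 1) / fs                            % estimated delay in seconds (about 10/fs)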

Page 26: Likelihood, Bayesian and Decision Theory

The disadvantages of MLE

• Complicated calculation (slow) -> it is almost the last approach one would use to solve a problem
• Approximate results (not exact)

References:
[1] Halupka, 2005, Robust sound localization in 0.18 μm CMOS.
[2] S. Zucker, 2003, Cross-correlation and maximum-likelihood analysis: a new approach to combining cross-correlation functions.
[3] Tamhane & Dunlop, "Statistics and Data Analysis: From Elementary to Intermediate", Chap. 15.
[4] Kandethody M. Ramachandran, Chris P. Tsokos, "Mathematical Statistics with Applications", pp. 235-252.

26

Page 27: Likelihood, Bayesian and Decision Theory

Likelihood ratio test

Ji Wang

27

Page 28: Likelihood, Bayesian and Decision Theory

Brief Introduction

• The likelihood ratio test was first introduced by Neyman and E. S. Pearson in 1928. This test method is widely used and often has some kind of optimality.

• In statistics, a likelihood ratio test is used to compare the fit of two models, one of which is nested within the other. This often occurs when testing whether a simplifying assumption for a model is valid, as when two or more model parameters are assumed to be related.

28

Page 29: Likelihood, Bayesian and Decision Theory

Introduction to the most powerful test

To test the hypothesis $H_0: \theta \in \Theta_0$ vs. $H_1: \theta \in \Theta_1$, suppose we have two test functions $Y_1$ and $Y_2$. If
$$E_{\theta}\,Y_1 \ge E_{\theta}\,Y_2 \quad \text{for all } \theta \in \Theta_1, \qquad (*)$$
then $Y_1$ is called more powerful than $Y_2$.

If there is a test function Y satisfying the inequality (*) against every test function $Y_2$ of the same level, then Y is called the uniformly most powerful test.

Page 30: Likelihood, Bayesian and Decision Theory

The advantage of the likelihood ratio test compared to the significance test

• The significance test can only deal with hypotheses stated for specific parameter values, such as
$$H_0: \theta = \theta_0 \quad \text{vs.} \quad H_1: \theta = \theta_1 \;(\theta_1 \neq \theta_0),$$
but it cannot handle the very common hypothesis
$$H_0: \theta \in \Theta_0 \quad \text{vs.} \quad H_1: \theta \in \Theta_1,$$
because we cannot use the method of the significance test to find the rejection region.

30

Page 31: Likelihood, Bayesian and Decision Theory

Definition of likelihood ratio test statistic

• $X_1, \ldots, X_n$ is a random sample from the family of distributions F = { f(x, θ) : θ ∈ Θ }. For the test
$$H_0: \theta \in \Theta_0 \quad \text{vs.} \quad H_1: \theta \in \Theta_1 = \Theta - \Theta_0,$$
let
$$\Lambda(X) = \frac{\max_{\theta \in \Theta_0} l(\theta; x_1, \ldots, x_n)}{\max_{\theta \in \Theta_1} l(\theta; x_1, \ldots, x_n)}$$

We call Λ(X) the likelihood ratio of the above-mentioned hypothesis. Sometimes it is also called the generalized likelihood ratio.

# From the definition of the likelihood ratio test statistic, we can see that if the value of Λ(X) is small, the data are more probable under the alternative hypothesis $H_1$ than under the null hypothesis $H_0$, so it is reasonable for us to reject the null hypothesis.

Thus, this test rejects $H_0$ if $\Lambda(X) \le C$.

31

Page 32: Likelihood, Bayesian and Decision Theory

The definition of likelihood ratio test

• We use Λ(X) as the test statistic for the test
$$H_0: \theta \in \Theta_0 \quad \text{vs.} \quad H_1: \theta \in \Theta_1,$$
and the rejection region is $\{\Lambda(X) \le C\}$, where C satisfies the inequality
$$P_{\theta_0}\{\Lambda(X) \le C\} \le \alpha \quad \text{for all } \theta_0 \in \Theta_0.$$
Then this test is the likelihood ratio test of level α.

# If we do not know the distribution of Λ(X) under the null hypothesis, it is very difficult for us to find the critical value of the LRT. However, if there is a statistic T(X) which is monotone in Λ(X), and we know its distribution under the null hypothesis, then we can base a significance test on T(X).

32

Page 33: Likelihood, Bayesian and Decision Theory

The steps to make a likelihood ratio test

• Step 1: Find the likelihood ratio Λ(X) of the sample $X_1, \ldots, X_n$.

• Step 2: Use Λ(X) as the test statistic, or some other statistic which is monotone in Λ(X).

• Step 3: Construct the rejection region by controlling the type I error at the significance level α.

A numerical sketch of these steps is given below.

33
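As an illustrative sketch (not from the slides): these steps for a Poisson mean, using the common large-sample form of the test in which $-2\ln\Lambda$ is compared with a chi-square critical value; the data and the critical value 3.84 (α = 0.05, 1 degree of freedom) are assumptions of the example.

x = [3 5 2 4 6 3 4];                                % hypothetical Poisson counts
lambda0 = 3;                                        % value specified by H0
lambda_hat = mean(x);                               % unrestricted MLE
loglik = @(lam) sum(x)*log(lam) - numel(x)*lam;     % log-likelihood up to a constant
stat = -2*(loglik(lambda0) - loglik(lambda_hat));   % -2 ln Lambda
reject = stat > 3.84                                % approximate level-0.05 test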

Page 34: Likelihood, Bayesian and Decision Theory

• Example: $X_1, \ldots, X_n$ are random samples having the pdf
$$f(x, \theta) = e^{-(x - \theta)}, \qquad x \ge \theta,\ \theta \in \mathbb{R}.$$
Please derive the rejection region for the hypothesis
$$H_0: \theta = 0 \quad \text{vs.} \quad H_1: \theta > 0$$
at the level α.

34

Page 35: Likelihood, Bayesian and Decision Theory

Solution:
● Step 1: The sample (joint) distribution is
$$f(x, \theta) = e^{-\sum_{i=1}^{n}(x_i - \theta)}\, I\{x_{(1)} \ge \theta\},$$
and it is also the likelihood function. The parameter space is $\Theta = [0, \infty)$ with $\Theta_0 = \{0\}$. Then we derive
$$\max_{\theta \in \Theta_0} l(\theta; x_1, \ldots, x_n) = e^{-\sum_{i=1}^{n} x_i}, \qquad
\max_{\theta \in \Theta_1} l(\theta; x_1, \ldots, x_n) = e^{-\sum_{i=1}^{n} x_i + n x_{(1)}}$$

35

Page 36: Likelihood, Bayesian and Decision Theory

● Step 2: The likelihood ratio test statistic is
$$\Lambda(X) = e^{-n X_{(1)}} = e^{-\frac{1}{2}\,(2n X_{(1)})}$$
We can just use $2n X_{(1)}$, because it is monotone in Λ(x).

● Step 3: Under the null hypothesis, $2n X_{(1)} \sim \chi^2(2)$, so the critical value $\chi^2_{\alpha}(2)$ is obtained by calculating
$$P_0\{2n X_{(1)} \ge C\} = \alpha.$$
That is to say, $2n X_{(1)}$ is the likelihood ratio test statistic and the rejection region is $\{2n X_{(1)} \ge \chi^2_{\alpha}(2)\}$.

36

Page 37: Likelihood, Bayesian and Decision Theory

Wald Sequential Probability Ratio Test

So far we assumed that the sample size is fixed in advance. What if it is not fixed?

Abraham Wald (1902-1950) developed the sequential probability ratio test (SPRT) by applying the idea of likelihood ratio testing; it samples sequentially, taking observations one at a time.

Xiao Yu

Page 38: Likelihood, Bayesian and Decision Theory

Hypotheses: $H_0: \theta = \theta_0$ vs. $H_1: \theta = \theta_1$

After n observations, the likelihood ratio is
$$\lambda_n(x_1, x_2, \ldots, x_n) = \frac{L_1(\theta_1 \mid x_1, x_2, \ldots, x_n)}{L_0(\theta_0 \mid x_1, x_2, \ldots, x_n)} = \frac{\prod_{i=1}^{n} f(x_i \mid \theta_1)}{\prod_{i=1}^{n} f(x_i \mid \theta_0)}$$

• If $\lambda_n(x_1, x_2, \ldots, x_n) \le A$, stop sampling and decide not to reject $H_0$.
• If $A < \lambda_n(x_1, x_2, \ldots, x_n) < B$, continue sampling.
• If $\lambda_n(x_1, x_2, \ldots, x_n) \ge B$, stop sampling and decide to reject $H_0$.

Here the constants satisfy A < 1 < B (commonly A = β/(1−α) and B = (1−β)/α).

Page 39: Likelihood, Bayesian and Decision Theory

SPRT for Bernoulli Parameter

• An electrical parts manufacturer receives a large lot of fuses from a vendor. The lot is regarded as "satisfactory" if the fraction defective p is no more than 0.1; otherwise it is regarded as "unsatisfactory".
$$H_0: p = p_0 = 0.1 \quad \text{vs.} \quad H_1: p = p_1 = 0.3$$

With $s_n$ defectives observed among the first n fuses, the likelihood ratio is
$$\lambda_n = \left(\frac{p_1}{p_0}\right)^{s_n} \left(\frac{1 - p_1}{1 - p_0}\right)^{n - s_n}$$

Page 40: Likelihood, Bayesian and Decision Theory

With α = 0.10 and β = 0.20, the boundaries are A = β/(1−α) ≈ 0.222 and B = (1−β)/α = 8. Taking logarithms, the SPRT continues sampling as long as
$$-1.504 < 1.350\, s_n - 0.251\, n < 2.079, \qquad \text{i.e.} \qquad -1.114 + 0.186\, n < s_n < 1.540 + 0.186\, n.$$
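An illustrative sketch (not from the slides) of running this SPRT in MATLAB, assuming a hypothetical sequence of inspection outcomes (1 = defective):

p0 = 0.1; p1 = 0.3; alpha = 0.10; beta_ = 0.20;
A = beta_/(1 - alpha); B = (1 - beta_)/alpha;       % Wald's stopping boundaries
x = [0 0 1 0 0 0 1 1 0 1 1];                        % hypothetical inspection results
loglam = 0; decision = 'continue sampling';
for n = 1:numel(x)
    % update the log likelihood ratio with the n-th observation
    loglam = loglam + x(n)*log(p1/p0) + (1 - x(n))*log((1-p1)/(1-p0));
    if loglam <= log(A), decision = 'do not reject H0'; break; end
    if loglam >= log(B), decision = 'reject H0'; break; end
end
decision                                            % displays the SPRT decision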

Page 41: Likelihood, Bayesian and Decision Theory

Fisher Information

The score is
$$\frac{d \ln f(X \mid \theta)}{d\theta}, \qquad \text{with} \qquad E\!\left[\frac{d \ln f(X \mid \theta)}{d\theta}\right] = 0.$$

$$I(\theta) = E\!\left[\left(\frac{d \ln f(X \mid \theta)}{d\theta}\right)^{2}\right] = -E\!\left[\frac{d^{2} \ln f(X \mid \theta)}{d\theta^{2}}\right]$$

Cramer-Rao Lower Bound:
$$\operatorname{Var}(\hat{\theta}) \ge \frac{1}{n\, I(\theta)}$$

Page 42: Likelihood, Bayesian and Decision Theory

Single-Parameter Bernoulli experiment

• The Fisher information contained in n independent Bernoulli trials may be calculated as follows. In the following, A represents the number of successes and B the number of failures.

42

$$I(\theta) = -E\!\left[\frac{\partial^{2}}{\partial\theta^{2}} \ln f(A; \theta)\right]
= -E\!\left[\frac{\partial^{2}}{\partial\theta^{2}} \ln\!\left(\frac{(A+B)!}{A!\,B!}\,\theta^{A}(1-\theta)^{B}\right)\right]
= -E\!\left[\frac{\partial^{2}}{\partial\theta^{2}}\big(A\ln\theta + B\ln(1-\theta)\big)\right]$$
$$= E\!\left[\frac{A}{\theta^{2}} + \frac{B}{(1-\theta)^{2}}\right]
= \frac{n\theta}{\theta^{2}} + \frac{n(1-\theta)}{(1-\theta)^{2}}
= \frac{n}{\theta(1-\theta)}$$

We can see it’s reciprocal of the variance of the number of successes in NBernoulli trials. The more the variance, the less the Fisher information.

Page 43: Likelihood, Bayesian and Decision Theory

Large Sample Inferences Based on the MLE’s

• For a large sample, the MLE is approximately normally distributed, $\hat{\theta} - \theta \approx N\!\left(0, \frac{1}{n I(\theta)}\right)$, where $\hat{\theta}$ solves $\frac{d \ln L(\theta)}{d\theta} = 0$. An approximate large-sample (1−α)-level confidence interval (CI) is given by
$$\hat{\theta} - z_{\alpha/2}\,\sqrt{\frac{1}{n I(\hat{\theta})}} \;\le\; \theta \;\le\; \hat{\theta} + z_{\alpha/2}\,\sqrt{\frac{1}{n I(\hat{\theta})}}$$

Plugging in the Fisher information of Bernoulli trials, we can see it is consistent with what we have learned.

43
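An illustrative sketch (not from the slides) of this interval for a Bernoulli parameter, assuming 13 successes in 20 hypothetical trials and using $I(\theta) = 1/(\theta(1-\theta))$:

n = 20; s = 13;                                   % hypothetical coin-flip data
theta_hat = s/n;                                  % MLE
se = sqrt(theta_hat*(1 - theta_hat)/n);           % sqrt(1/(n*I(theta_hat)))
ci = [theta_hat - 1.96*se, theta_hat + 1.96*se]   % approximate 95% CI (z_{0.025} = 1.96)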

Page 44: Likelihood, Bayesian and Decision Theory

Bayes' theorem

Thomas Bayes (1702-1761) - English mathematician and a Presbyterian minister born in London - a specific case of the theorem (Bayes' theorem) was published after his death (by Richard Price)

Jaeheun kim

Page 45: Likelihood, Bayesian and Decision Theory

Bayesian inference

• Bayesian inference is a method of statistical inference in which some kind of evidence or observations are used to calculate the probability that a hypothesis may be true, or else to update its previously-calculated probability.

• "Bayesian" comes from its use of the Bayes' theorem in the calculation process.

Page 46: Likelihood, Bayesian and Decision Theory

BAYES’ THEOREM

Bayes' theorem shows the relation between two conditional probabilities:
$$P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$

Page 47: Likelihood, Bayesian and Decision Theory

• We can obtain an updated probability (the posterior probability) from the initial probability (the prior probability) using new information.

• We call this updating process Bayes' theorem.

[Diagram: prior probability + new information → posterior probability, via Bayes' theorem]

Page 48: Likelihood, Bayesian and Decision Theory

MONTY HALL

Should we switch the door or stay?????

Page 49: Likelihood, Bayesian and Decision Theory

A contestant chose door 1 and then the host opened one of the other doors (door 3).

Would switching from door 1 to door 2 increase chances of winning the car?

http://en.wikipedia.org/wiki/Monty_Hall_problem

Page 50: Likelihood, Bayesian and Decision Theory

Let $D_i$ = {Door i conceals the car} and $O_j$ = {Host opens Door j after the contestant chooses Door 1}.

$$p(D_1) = p(D_2) = p(D_3) = \tfrac{1}{3}, \qquad p(O_3 \mid D_1) = \tfrac{1}{2}, \quad p(O_3 \mid D_2) = 1, \quad p(O_3 \mid D_3) = 0$$

(when you stay)
$$p(D_1 \mid O_3) = \frac{p(O_3 \mid D_1)\, p(D_1)}{p(O_3 \mid D_1)p(D_1) + p(O_3 \mid D_2)p(D_2) + p(O_3 \mid D_3)p(D_3)}
= \frac{\tfrac{1}{2}\cdot\tfrac{1}{3}}{\tfrac{1}{2}\cdot\tfrac{1}{3} + 1\cdot\tfrac{1}{3} + 0\cdot\tfrac{1}{3}} = \frac{1}{3}$$

(when you switch)
$$p(D_2 \mid O_3) = \frac{p(O_3 \mid D_2)\, p(D_2)}{p(O_3 \mid D_1)p(D_1) + p(O_3 \mid D_2)p(D_2) + p(O_3 \mid D_3)p(D_3)}
= \frac{1\cdot\tfrac{1}{3}}{\tfrac{1}{2}\cdot\tfrac{1}{3} + 1\cdot\tfrac{1}{3} + 0\cdot\tfrac{1}{3}} = \frac{2}{3}$$

So switching doubles the chance of winning the car.
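As an illustrative sketch (not from the slides), a quick Monte Carlo check of these two probabilities in MATLAB:

N = 1e5; stay_wins = 0; switch_wins = 0;
for t = 1:N
    car  = randi(3);                              % door hiding the car
    pick = 1;                                     % contestant always picks door 1
    opts = setdiff(1:3, [pick car]);              % doors the host may open
    host = opts(randi(numel(opts)));              % host opens one of them
    stay_wins   = stay_wins   + (pick == car);
    switch_wins = switch_wins + (setdiff(1:3, [pick host]) == car);
end
[stay_wins switch_wins] / N                       % approximately [1/3 2/3]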

Page 51: Likelihood, Bayesian and Decision Theory

15.3.1 Bayesian Estimation

Zhenrui & friends

Premises for doing a Bayesian estimation:

1. Prior knowledge about the unknown parameter θ
2. A probability distribution for θ: π(θ) (the prior distribution)

General equation:
$$\pi^{*}(\theta) = \frac{f(x_1, x_2, \ldots, x_n \mid \theta)\, \pi(\theta)}{\int f(x_1, x_2, \ldots, x_n \mid \theta)\, \pi(\theta)\, d\theta} = \frac{f(x_1, x_2, \ldots, x_n \mid \theta)\, \pi(\theta)}{f(x_1, x_2, \ldots, x_n)}$$

The denominator is the marginal p.d.f. of X1, X2, ..., Xn, just a normalizing constant that makes $\int \pi^{*}(\theta)\, d\theta = 1$.

π*(θ): posterior distribution
π(θ): prior distribution
θ: unknown parameter from a distribution with pdf/pmf f(x | θ); considered as a random variable in Bayesian estimation
f(x1, x2, ..., xn | θ): likelihood function of θ based on the observed values x1, x2, ..., xn

The mean µ* and variance σ*² of π*(θ) are called the posterior mean and variance, respectively. µ* can be used as a point estimate of θ (the Bayes estimate).

In short:
$$\pi^{*}(\theta) \propto f(X \mid \theta)\, \pi(\theta)$$

[Cartoon: "This must be a mistake!" "Trust me. I know this θ!"]

51

Page 52: Likelihood, Bayesian and Decision Theory

Bayesian Estimation continued

Conjugate Priors:

a family of prior distributions such that the posterior distribution is of the same form as the prior distribution

Examples of conjugate priors (from the textbook): Examples 15.25, 15.26

• The normal distribution is a conjugate prior on µ of N(µ, σ²) (if σ² is already known)

• The Beta distribution is a conjugate prior on p of a Binomial distribution Bin(n, p)

A question: If I only know the possible value range of θ, but cannot summarize it in the form of a probability distribution, can I still do a Bayesian estimation?

No! To apply Bayes' theorem, every term in the equation has to be a probability term: a prior π(θ): √ vs. a bare range for θ: ✗
$$\pi^{*}(\theta) \propto f(X \mid \theta)\, \pi(\theta)$$

52

Criticisms of the Bayesian approach: 1. Perceptions of prior knowledge differ from person to person ("subjective"). 2. The prior knowledge of θ may be too fuzzy to quantify in the form of a distribution.

Page 53: Likelihood, Bayesian and Decision Theory

15.3.2 Bayesian Testing

Simple vs. simple hypothesis test:
$$H_0: \theta = \theta_0 \quad \text{vs.} \quad H_1: \theta = \theta_1$$

Prior probabilities of H0 and H1: $\pi(\theta_0) = \pi_0$ and $\pi(\theta_1) = \pi_1 = 1 - \pi_0$.

Posterior probabilities: with
$$a = \pi_0\, f(x_1, x_2, \ldots, x_n \mid \theta_0), \qquad b = \pi_1\, f(x_1, x_2, \ldots, x_n \mid \theta_1),$$
we have
$$\pi_0^{*} = \frac{a}{a+b}, \qquad \pi_1^{*} = 1 - \pi_0^{*} = \frac{b}{a+b},$$
so that
$$\frac{\pi_1^{*}}{\pi_0^{*}} = \frac{\pi_1\, f(x_1, x_2, \ldots, x_n \mid \theta_1)}{\pi_0\, f(x_1, x_2, \ldots, x_n \mid \theta_0)}$$

A Bayesian test rejects $H_0$ if $\frac{\pi_1^{*}}{\pi_0^{*}} > k$, where k > 0 is a suitably chosen critical constant. A large value of k corresponds to a small value of α.

53

Page 54: Likelihood, Bayesian and Decision Theory

Bayesian Testing continued

Bayesian test vs. Neyman-Pearson likelihood ratio test (15.18)

Neyman-Pearson Lemma: reject $H_0$ if
$$\frac{L(\theta_1 \mid x_1, \ldots, x_n)}{L(\theta_0 \mid x_1, \ldots, x_n)} = \frac{f(x_1, \ldots, x_n \mid \theta_1)}{f(x_1, \ldots, x_n \mid \theta_0)} > k$$

Bayesian test: reject $H_0$ if
$$\frac{\pi_1^{*}}{\pi_0^{*}} = \frac{\pi_1\, f(x_1, \ldots, x_n \mid \theta_1)}{\pi_0\, f(x_1, \ldots, x_n \mid \theta_0)} > k,
\qquad \text{i.e.} \qquad
\frac{f(x_1, \ldots, x_n \mid \theta_1)}{f(x_1, \ldots, x_n \mid \theta_0)} > \frac{\pi_0}{\pi_1}\, k = k^{*}$$

The Bayesian test can be considered a specialized Neyman-Pearson likelihood ratio test where the probabilities of each hypothesis (H0 & H1) being true are known: π0 & π1.

If $\pi_0 = \pi_1 = 1/2$, then
$$\frac{\pi_1^{*}}{\pi_0^{*}} = \frac{f(x_1, \ldots, x_n \mid \theta_1)}{f(x_1, \ldots, x_n \mid \theta_0)},$$
and the Bayesian test becomes the Neyman-Pearson likelihood ratio test.

54

Page 55: Likelihood, Bayesian and Decision Theory

Bayesian Inference for one parameter

A biased coin
• Bernoulli random variable

• Prob(Head)= ϴ

• ϴ is unknown

Bingqi Cheng

Page 56: Likelihood, Bayesian and Decision Theory

Bayesian Statistics
Three Ingredients:

• Prior distribution: an initial guess or prior knowledge about the parameter ϴ; highly subjective.

• Likelihood function: fits or describes the distribution of the real data (e.g. a sequence of heads or tails when tossing the coin).

• Bayes' theorem: updates the prior distribution with the real data, giving the posterior distribution Prob(ϴ | data).

Page 57: Likelihood, Bayesian and Decision Theory

Prior Distribution

Beta distributions are conjugate priors to Bernoulli distributions: if the prior is Beta and the likelihood function is Bernoulli, then the posterior is Beta.

Page 58: Likelihood, Bayesian and Decision Theory

Prior Distribution

[Figure: prior density of ϴ over [0, 1]; x-axis ϴ, y-axis density]

Page 59: Likelihood, Bayesian and Decision Theory

Likelihood function
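As a sketch of the likelihood referred to here (standard Bernoulli form), assuming s heads are observed in n independent flips of the coin:
$$L(\theta \mid \text{data}) = \theta^{s}\,(1 - \theta)^{\,n - s}$$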

Page 60: Likelihood, Bayesian and Decision Theory

Posterior Distribution

Posterior distribution ∝ Prior distribution × Likelihood function

For this biased coin:

Page 61: Likelihood, Bayesian and Decision Theory

Calculation Steps
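A sketch of the calculation, assuming a Beta(a, b) prior and s heads in n flips (the standard conjugate update):
$$\pi^{*}(\theta) \propto \underbrace{\theta^{a-1}(1-\theta)^{b-1}}_{\text{prior}} \times \underbrace{\theta^{s}(1-\theta)^{n-s}}_{\text{likelihood}} = \theta^{a+s-1}(1-\theta)^{b+n-s-1},$$
which is the kernel of a Beta(a + s, b + n − s) density, so the posterior is Beta(a + s, b + n − s).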

Page 62: Likelihood, Bayesian and Decision Theory

Posterior Distribution

[Figure: posterior density of ϴ over [0, 1]; x-axis ϴ, y-axis density]
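An illustrative sketch (not from the slides) that reproduces such a posterior plot, assuming for illustration a uniform Beta(1, 1) prior and 13 heads in 20 flips, so the posterior is Beta(14, 8):

theta = linspace(0, 1, 501);
post  = theta.^13 .* (1 - theta).^7;        % prior x likelihood (unnormalized)
post  = post / trapz(theta, post);          % normalize numerically
plot(theta, post); xlabel('\theta'); ylabel('Density')
[~, i] = max(post); mode_theta = theta(i)   % posterior mode = 13/20 = 0.65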

Page 63: Likelihood, Bayesian and Decision Theory

Predictive Probability
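As a sketch under the same assumptions (Beta(a, b) prior, s heads in n flips), the predictive probability of a head on the next flip is the posterior mean:
$$P(\text{head on next flip} \mid \text{data}) = E[\theta \mid \text{data}] = \frac{a + s}{a + b + n},$$
which for the uniform prior and 13 heads in 20 flips equals 14/22 ≈ 0.636.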

Page 64: Likelihood, Bayesian and Decision Theory

Bayesian vs. M.L.E. with the calculus method

Back to the example of the biased coin: still we have 20 trials and get 13 heads.
$$f(p) = \binom{20}{13}\, p^{13}(1-p)^{7}$$
$$f'(p) = \binom{20}{13}\, p^{12}(1-p)^{6}\,(13 - 20p) = 0 \;\Longrightarrow\; \hat{p} = \frac{13}{20} = 0.65$$

Xiao Yu

64

Page 65: Likelihood, Bayesian and Decision Theory

65

Jeffreys Prior:
$$p(\theta) \propto \sqrt{\det I(\theta)}$$

Page 66: Likelihood, Bayesian and Decision Theory

Why bother to use Bayesian methods? With a large amount of data, the Bayesian computation is easier to handle.

• M.L.E. with the calculus method: find the parameter quickly and directly, if possible -> a huge step.
• Bayesian: initial guess + approximation + convergence -> another starting line + small steps + maybe not the best value.

66

Page 67: Likelihood, Bayesian and Decision Theory

• This is a Gaussian mixture; the observations are vectors and C is the covariance matrix. Finding the maximum likelihood estimate for a mixture by direct application of calculus is tough.
$$\log L(p, \mu, C \mid x_1, \ldots, x_n) = \sum_{j=1}^{n} \log\!\left( p_1\, \frac{1}{\sqrt{|2\pi C_1|}}\, e^{-(x_j - \mu_1)^{\top} C_1^{-1} (x_j - \mu_1)/2} \;+\; p_2\, \frac{1}{\sqrt{|2\pi C_2|}}\, e^{-(x_j - \mu_2)^{\top} C_2^{-1} (x_j - \mu_2)/2} \right)$$

67

Page 68: Likelihood, Bayesian and Decision Theory

Bayesian Learning

• The more evidence we have, the more we learn.

The more flips we do, the more we know about the probability to get a head, which is the parameter of binomial distribution.

An application: the EM (Expectation-Maximization) algorithm, which can beautifully handle some regression problems.

68

Page 69: Likelihood, Bayesian and Decision Theory

Two coins game: Suppose now that there are two coins which can be flipped. The probability of heads is p1 for the first coin, and p2 for the second coin. We decide on each flip which of the two coins will be flipped, and our objective is to maximize the number of heads that occur. (p1 and p2 are unknown.)

69

Page 70: Likelihood, Bayesian and Decision Theory

Matlab code for the strategy (flip coin 1 with probability equal to the posterior probability that p1 > p2, assuming independent uniform priors on p1 and p2):

function [] = twocoin2(p1, p2, n)
  H1 = 0; T1 = 0;                 % heads/tails observed so far on coin 1
  H2 = 0; T2 = 0;                 % heads/tails observed so far on coin 2
  syms ps1;
  syms ps2;
  for k = 1:n
    % posterior probability that p1 > p2 given the flips so far
    temp = int(ps2^H2*(1-ps2)^T2, 0, ps1);
    p(k) = double(int(ps1^H1*(1-ps1)^T1*temp, 0, 1) / ...
                  (beta(H1+1,T1+1)*beta(H2+1,T2+1)));
    if rand < p(k)                % flip coin 1 with that probability
      guess(k) = 1;
      y(k) = rand < p1;
      H1 = H1 + y(k);
      T1 = T1 + (1 - y(k));
    else                          % otherwise flip coin 2
      guess(k) = 2;
      y(k) = rand < p2;
      H2 = H2 + y(k);
      T2 = T2 + (1 - y(k));
    end
  end
  disp('Guesses: ')
  tabulate(guess)
  disp('Outcomes: ')
  tabulate(y)
  figure(2)
  plot(p)
end

p1 = 0.4, p2 = 0.6. The plotted value is P(p1 > p2 | H1, T1, H2, T2).

70

Page 71: Likelihood, Bayesian and Decision Theory

Statistical Decision Theory

ABRAHAM WALD (1902-1950)
• Hungarian mathematician
• Major contributions - geometry, econometrics, statistical sequential analysis, and decision theory
• Died in an airplane accident in 1950

Hans Schneeweiss, "Abraham Wald", Department of Statistics, University of Munich, Akademiestr. 1, 80799 München, Germany

71

Kicheon Park

Page 72: Likelihood, Bayesian and Decision Theory

Why is decision theory needed?

Limits of classical statistics:
I. Prior information and loss
II. Initial and final precision
III. Formulational inadequacy

72

Page 73: Likelihood, Bayesian and Decision Theory

Limits of Classical Statistics

• Prior information and loss - relevant effects from past experience and the losses from each possible decision

• Initial and final precision - before and after observing sample information, which is the result of a long series of identical experiments

• Formulational inadequacy - in the majority of problems there is a limit to how meaningful a decision can be reached

73

Page 74: Likelihood, Bayesian and Decision Theory

Classical statistics vs. Decision Theory

• Classical statistics - direct use of the sample information

• Decision theory - combines the sample information with other relevant aspects of the problem to reach the best decision

→ The goal of decision theory is to make decisions based not only on the available statistical knowledge but also on the uncertainties (θ) that are involved in the decision problem

74

Page 75: Likelihood, Bayesian and Decision Theory

Two types of relevant information

I. Knowledge of the possible consequences of the decisions → the loss resulting from each possible decision

II. Prior information → effects from past experience with similar situations

75

Page 76: Likelihood, Bayesian and Decision Theory

Statistical Decision Theory - Elements

• Sample space Χ, containing the observed data
• Unknown parameter θ
• Decision space, the set of possible decisions d

"Abraham Wald", Wolfowitz, Annals of Mathematical Statistics; "Statistics & Data Analysis", Tamhane & Dunlop, Prentice Hall; "Statistical Decision Theory", Berger, Springer-Verlag

Mun Sang Yue

76

Page 77: Likelihood, Bayesian and Decision Theory

Statistical Decision Theory - Eqns

77
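A sketch of the standard relations these elements satisfy (consistent with the worked example on the later slides, where L(d, θ) is the loss and δ is a decision rule): the risk function is the expected loss,
$$R(\delta, \theta) = E_{\theta}\big[L(\delta(X), \theta)\big] = \sum_{d} L(d, \theta)\, P(\delta \text{ chooses } d \mid \theta),$$
and with a prior π(θ) the Bayes risk of δ is $E_{\pi}\big[R(\delta, \theta)\big]$.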

Page 78: Likelihood, Bayesian and Decision Theory

Statistical Decision Theory – Decision Rules

A minimax decision rule minimizes the maximum risk: choose δ to achieve min over δ of { max over θ of R(δ, θ) }.

78

Page 79: Likelihood, Bayesian and Decision Theory

Statistical Decision Theory - Example

• A retailer must decide whether to purchase a large lot of items containing an unknown fraction p of defectives. Before making the decision of whether to purchase the lot (decision d1) or not to purchase the lot (decision d2), 2 items are randomly selected from the lot for inspection. The retailer wants to evaluate the two decision rules formulated below. Prior: π(p) = 2(1 - p)

79

Page 80: Likelihood, Bayesian and Decision Theory

Example - Continued

No. of Defectives x | Decision under Rule δ1 | Decision under Rule δ2
0 | d1 | d1
1 | d2 | d1
2 | d2 | d2

• Loss Functions
L(d1, p) = 8p - 1, and L(d2, p) = 2

• Risk Functions
– R(δ1, p) = L(d1, p) P(δ1 chooses d1 | p) + L(d2, p) P(δ1 chooses d2 | p) = (8p - 1) P(X=0 | p) + 2 P(X=1 or 2 | p)

– R(δ2, p) = L(d1, p) P(δ2 chooses d1 | p) + L(d2, p) P(δ2 chooses d2 | p) = (8p - 1) P(X=0 or 1 | p) + 2 P(X=2 | p)

80

Page 81: Likelihood, Bayesian and Decision Theory

Example - Continued

[Figure: risk functions R(δ1, p) and R(δ2, p) plotted against p]

max R(δ1, p) = 2.289, max R(δ2, p) = 3.329

81
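An illustrative sketch (not from the slides) that evaluates these two risk functions on a grid of p values, with X ~ Binomial(2, p) for the two inspected items:

p  = linspace(0, 1, 1001);
P0 = (1 - p).^2;                        % P(X = 0 | p)
P1 = 2*p.*(1 - p);                      % P(X = 1 | p)
P2 = p.^2;                              % P(X = 2 | p)
R1 = (8*p - 1).*P0 + 2*(P1 + P2);       % risk of rule delta_1
R2 = (8*p - 1).*(P0 + P1) + 2*P2;       % risk of rule delta_2
[max(R1) max(R2)]                       % approximately [2.289 3.329]; delta_1 is the minimax rule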

Page 82: Likelihood, Bayesian and Decision Theory

Statistical Decision Theory - Example

• A shipment of transistors was received by a radio company. A sampling plan was used to check the shipment as a whole, to ensure that the contractual requirement of a defect rate of at most 0.05 was not exceeded. A random sample of n transistors was chosen from the shipment and tested. Based upon X, the number of defective transistors in the sample, the shipment will be accepted or rejected.

82

Page 83: Likelihood, Bayesian and Decision Theory

Example (continued)

• The proportion of defective transistors in the shipment is θ.
• Decision Rule:

a1: accept the lot if X/n ≤ 0.05

a2: reject the lot if X/n > 0.05

• Loss Function: L(a1,θ) = 10*θ ; L(a2,θ) = 1

• π(θ) can be estimated based on prior experience
• R(δ, θ) can then be calculated

83

Page 84: Likelihood, Bayesian and Decision Theory

Summary

• Maximum Likelihood Estimation selects an estimate of the unknown parameter that maximizes the likelihood function.

• The Likelihood Ratio Test compares the likelihood of the observed outcomes under the null hypothesis to the likelihood under the alternate hypothesis.

• Bayesian methods treat unknown models or variables as random variables with known (prior) distributions, instead of as deterministic quantities that happen to be unknown.

84

Page 85: Likelihood, Bayesian and Decision Theory

Summary (Continued)

• Statistical decision theory moves statistics beyond its traditional role of just drawing inferences from incomplete information; the theory focuses on the problem of taking statistical actions rather than on inference alone.

“Here in the 21st Century … a combination of Bayesian and frequentist ideas will be needed to deal with our increasingly intense scientific environment.”

Bradley Efron, 164th ASA Presidential Address

85

Page 86: Likelihood, Bayesian and Decision Theory

Questions?

THANK YOU!

86