Likelihood, Bayesian and Decision Theory
Kenneth Yu
History

• The likelihood principle was first introduced by R. A. Fisher in 1922. The law of likelihood was identified by Ian Hacking.
• "Modern statisticians are familiar with the notion that any finite body of data contains only a limited amount of information on any point under examination; that this limit is set by the nature of the data themselves…the statistician's task, in fact, is limited to the extraction of the whole of the available information on any particular issue." R. A. Fisher
Likelihood Principle
• All relevant information in the data is contained in the likelihood function L(θ | x) = P(X = x | θ).

Law of Likelihood
• The extent to which the evidence supports one parameter value over another can be measured by taking the ratio of their likelihoods.

• These two concepts allow us to use the likelihood for inferences on θ.
Motivation and Applications

• Likelihood (especially the MLE) is used in a range of statistical models, such as structural equation modeling, confirmatory factor analysis, linear models, etc., to make inferences on the parameter of a function. Its importance comes from the need to find the "best" parameter value subject to error.
• This makes use of only the evidence and disregards the prior probability of the hypothesis. By making inferences on unknown parameters from our past observations, we are able to estimate the true θ value for the population.
• The likelihood is a function of the form:

L(θ | X) ∈ { α P(X | θ) : α > 0 }

• This represents how "likely" θ is if we have the prior outcomes X. It is proportional to the probability of X happening given the parameter θ.
• Likelihood functions are equivalent if they differ only by a constant α (they are proportional). Inferences on the parameter θ are the same if based on equivalent functions.
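The equivalence-up-to-a-constant point can be checked numerically. Below is a minimal Python sketch (the Bernoulli data counts and the constant α = 5 are made up for illustration): two proportional likelihood functions are maximized at exactly the same θ, so they lead to the same inference.

```python
def bernoulli_likelihood(theta, heads, tails):
    # L(theta | x) = theta^heads * (1 - theta)^tails
    return theta**heads * (1 - theta)**tails

def argmax_on_grid(f, grid):
    # Return the grid point where f is largest.
    return max(grid, key=f)

grid = [i / 1000 for i in range(1, 1000)]
heads, tails = 13, 7   # hypothetical coin-toss counts

L1 = lambda t: bernoulli_likelihood(t, heads, tails)
L2 = lambda t: 5.0 * bernoulli_likelihood(t, heads, tails)  # alpha = 5

# Proportional likelihoods give the same inference about theta:
assert argmax_on_grid(L1, grid) == argmax_on_grid(L2, grid)
```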
Maximum Likelihood Method

By Hanchao

Main topics include:
• 1. Why use the maximum likelihood method?
• 2. The likelihood function
• 3. Maximum likelihood estimators
• 4. How to calculate the MLE?
1. Why use the Maximum Likelihood Method?

Difference between the method of moments and the method of maximum likelihood:

• Mostly, the same!
• However, the method of maximum likelihood does yield "good" estimators:
1. It is an after-the-fact calculation.
2. It is a more versatile method for fitting parametric statistical models to data.
3. It is well suited to large data samples.
2. Likelihood Function

• Definition: Let f(x₁, …, xₙ; θ), θ ∈ Rᵏ, be the joint probability (or density) function of the n random variables X₁, …, Xₙ with sample values x₁, …, xₙ. The likelihood function of the sample is given by:

L(x₁, …, xₙ; θ) = f(x₁, …, xₙ; θ)
• If X₁, …, Xₙ are discrete iid random variables with probability function p(x, θ), then the likelihood function is given by:

L(θ) = P(X₁ = x₁, …, Xₙ = xₙ) = ∏ᵢ₌₁ⁿ P(Xᵢ = xᵢ) = ∏ᵢ₌₁ⁿ p(xᵢ, θ)
• In the continuous case, if the density is f(x, θ), then the likelihood function is given by:

L(θ) = ∏ᵢ₌₁ⁿ f(xᵢ, θ)

Example: Let X₁, …, Xₙ be iid N(μ, σ²) random variables. Find the likelihood function.

L(μ, σ²) = ∏ᵢ₌₁ⁿ (1/(√(2π) σ)) exp(−(xᵢ − μ)²/(2σ²)) = (2πσ²)^(−n/2) exp(−Σᵢ₌₁ⁿ (xᵢ − μ)²/(2σ²))
4. Procedure of one approach to find the MLE

• 1) Define the likelihood function L(θ).
• 2) Take the natural logarithm (ln) of L(θ).
• 3) Differentiate ln L(θ) with respect to θ, and then equate the derivative to 0.
• 4) Solve for the parameter θ; we obtain the estimate θ̂.
• 5) Check whether it is a maximum (ideally a global maximum).

• Still confused?
Ex. 1: Suppose X₁, …, Xₙ are random samples from a Poisson distribution with parameter λ. Find the MLE λ̂.

We have the pmf:

p(x) = e^(−λ) λˣ / x!,  x = 0, 1, 2, …;  λ > 0

Hence, the likelihood function is:

L(λ) = ∏ᵢ₌₁ⁿ e^(−λ) λ^(xᵢ) / xᵢ! = e^(−nλ) λ^(Σᵢ xᵢ) / ∏ᵢ₌₁ⁿ xᵢ!
Differentiating ln L(λ) with respect to λ results in:

d ln L(λ)/dλ = −n + (Σᵢ₌₁ⁿ xᵢ)/λ

Setting the result equal to zero:

−n + (Σᵢ₌₁ⁿ xᵢ)/λ = 0

That is, Σᵢ xᵢ = nλ. Hence, the MLE of λ is:

λ̂ = (1/n) Σᵢ₌₁ⁿ Xᵢ = X̄
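The closed-form result λ̂ = X̄ can be sanity-checked numerically. A small Python sketch (the sample below is invented for illustration) compares the analytic MLE against a grid search over the log-likelihood:

```python
import math

def poisson_log_lik(lam, xs):
    # ln L(lambda) = -n*lambda + (sum_i x_i) * ln(lambda) - sum_i ln(x_i!)
    n = len(xs)
    return -n * lam + sum(xs) * math.log(lam) - sum(math.lgamma(x + 1) for x in xs)

xs = [2, 0, 3, 1, 4, 2, 2, 1]   # hypothetical Poisson counts
mle = sum(xs) / len(xs)         # closed form: lambda_hat = x-bar = 1.875

# Grid search: no lambda on a fine grid beats x-bar.
grid = [i / 100 for i in range(1, 1001)]
best = max(grid, key=lambda lam: poisson_log_lik(lam, xs))
assert abs(best - mle) < 0.01
```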
Ex. 2: Let X₁, …, Xₙ be N(μ, σ²).
a) If μ is unknown and σ² = σ₀² is known, find the MLE for μ.
b) If μ = μ₀ is known and σ² is unknown, find the MLE for σ².
c) If μ and σ² are both unknown, find the MLEs for μ and σ².

• Ans: The likelihood function is:

L(μ, σ²) = (2πσ²)^(−n/2) exp(−Σᵢ₌₁ⁿ (xᵢ − μ)²/(2σ²))
So after taking the natural log we have:

ln L(μ, σ²) = −(n/2) ln(2π) − (n/2) ln(σ²) − Σᵢ₌₁ⁿ (xᵢ − μ)²/(2σ²)

a) When σ² = σ₀² is known, we only need to solve for the unknown parameter μ:

∂ ln L(μ, σ₀²)/∂μ = Σᵢ₌₁ⁿ (xᵢ − μ)/σ₀² = 0

so Σᵢ₌₁ⁿ (xᵢ − μ) = 0, i.e. nμ = Σᵢ xᵢ, giving:

μ̂ = x̄
• b) When μ = μ₀ is known, we only need to solve for the one parameter σ²:

∂ ln L(μ₀, σ²)/∂σ² = −n/(2σ²) + Σᵢ₌₁ⁿ (xᵢ − μ₀)²/(2σ⁴) = 0

σ̂² = (1/n) Σᵢ₌₁ⁿ (Xᵢ − μ₀)²

• c) When both μ and σ² are unknown, we need to differentiate with respect to both parameters, and mostly follow the same steps as in parts a) and b).
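A quick numerical check of these results in Python (the observations are made up for illustration): perturbing either estimate away from (x̄, σ̂²) can only decrease the log-likelihood.

```python
import math

def normal_log_lik(mu, sigma2, xs):
    # ln L(mu, sigma^2) = -(n/2) ln(2 pi sigma^2) - sum (x_i - mu)^2 / (2 sigma^2)
    n = len(xs)
    return (-n / 2) * math.log(2 * math.pi * sigma2) \
        - sum((x - mu) ** 2 for x in xs) / (2 * sigma2)

xs = [1.2, 0.7, 2.1, 1.5, 0.9, 1.8]   # hypothetical observations
n = len(xs)
mu_hat = sum(xs) / n                                   # mu_hat = x-bar
sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n    # joint MLE of sigma^2

# Nudging either estimate in any direction cannot raise the log-likelihood:
base = normal_log_lik(mu_hat, sigma2_hat, xs)
for eps in (-0.05, 0.05):
    assert normal_log_lik(mu_hat + eps, sigma2_hat, xs) <= base
    assert normal_log_lik(mu_hat, sigma2_hat + eps, xs) <= base
```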
Real-world example: sound localization

[Figure: two microphones (Mic1, Mic2) feeding an MCU]

Robust Sound Localization, IEEE Transactions on Signal Processing, Vol. 53, No. 6, June 2005

The ideal case vs. reality: noise and reverberation corrupt the signal from the sound source.

[Figure: signals received at Mic1 and Mic2 at 1 meter, angle 60°, frequency 1 kHz; the Fourier transform (amplitude vs. frequency, in units of 100 Hz) shows noise]
Algorithm:

1. Signal collection (original signal samples in the time domain):

m₁(t) = s₁(t) + n₁(t)
m₂(t) = s₂(t) + n₂(t)

2. Cross-correlation (received signals after the DFT, in the frequency domain):

τ̃ = argmax_τ ∫ M₁(ω) M₂*(ω) e^(jωτ) dω
• However, noise is mixed into the signal, so the weighted cross-correlation algorithm becomes:

τ̃ = argmax_τ ∫ W(ω) M₁(ω) M₂*(ω) e^(jωτ) dω

• where the ML method supplies the weighting function W(ω), used to reduce sensitivity to noise and reverberation:

W(ω) = |M₁(ω)| |M₂(ω)| / ( |N₁(ω)|² |M₂(ω)|² + |N₂(ω)|² |M₁(ω)|² )
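The frequency-domain details above are specific to the paper; the core idea of step 2, picking the delay that maximizes the cross-correlation of the two microphone signals, can be sketched in the time domain. Everything here (sampling rate, signal shape, the 5-sample delay) is a made-up illustration, not the paper's setup:

```python
import math

def estimate_delay(m1, m2, max_lag):
    # Choose the lag maximizing sum_t m1[t] * m2[t + lag]: the discrete,
    # time-domain analogue of the argmax over the integral above.
    def xcorr(lag):
        return sum(m1[t] * m2[t + lag]
                   for t in range(len(m1)) if 0 <= t + lag < len(m2))
    return max(range(-max_lag, max_lag + 1), key=xcorr)

# Hypothetical setup: a decaying 1 kHz burst sampled at 16 kHz reaches Mic2
# 5 samples after Mic1 (noise-free; the ML weighting above targets the noisy case).
fs, f = 16000, 1000
s = [math.sin(2 * math.pi * f * t / fs) * math.exp(-t / 40) for t in range(200)]
m1 = s
m2 = [0.0] * 5 + s   # delayed copy at Mic2

assert estimate_delay(m1, m2, max_lag=20) == 5
```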
The disadvantages of MLE
• Complicated calculation (slow) → it is often nearly the last approach used to solve a problem
• Approximate results (not exact)

References:
[1] Halupka, 2005, "Robust sound localization in 0.18 µm CMOS."
[2] S. Zucker, 2003, "Cross-correlation and maximum-likelihood analysis: a new approach to combining cross-correlation functions."
[3] Tamhane & Dunlop, Statistics and Data Analysis: From Elementary to Intermediate, Chap. 15.
[4] Kandethody M. Ramachandran and Chris P. Tsokos, Mathematical Statistics with Applications, pp. 235–252.
Likelihood ratio test

Ji Wang

Brief Introduction

• The likelihood ratio test was first proposed by Neyman and E. S. Pearson in 1928. This test method is widely used and often has some kind of optimality.
• In statistics, a likelihood ratio test is used to compare the fit of two models, one of which is nested within the other. This often occurs when testing whether a simplifying assumption for a model is valid, as when two or more model parameters are assumed to be related.
Introduction to the most powerful test

For the hypothesis H₀: θ ∈ Θ₀ vs. H₁: θ ∈ Θ₁, suppose we have two test functions Y₁ and Y₂. If

E_θ(Y₁) ≥ E_θ(Y₂) for every θ ∈ Θ₁,  (*)

then we call Y₁ more powerful than Y₂. If there is a test function Y satisfying the inequality (*) against every test function, then we call Y the uniformly most powerful test.
The advantage of the likelihood ratio test compared to the significance test

• The significance test can only deal with hypotheses on specific values, like:

H₀: θ = θ₀ vs. H₁: θ = θ₁

but cannot handle the very common hypothesis:

H₀: θ ∈ Θ₀ vs. H₁: θ ∈ Θ₁

because we cannot use the method of the significance test to find the rejection region.
Definition of the likelihood ratio test statistic

• X₁, …, Xₙ are iid samples from the family of distributions F = { f(x, θ) : θ ∈ Θ }. For the test

H₀: θ ∈ Θ₀ vs. H₁: θ ∈ Θ₁ = Θ − Θ₀,

let

Λ(X) = max_{θ∈Θ₀} l(x₁, …, xₙ | θ) / max_{θ∈Θ} l(x₁, …, xₙ | θ)

We call Λ(X) the likelihood ratio of the above-mentioned hypothesis. Sometimes it is also called the generalized likelihood ratio.

# From the definition of the likelihood ratio test statistic, we can see that if the value of Λ(X) is small, the alternative hypothesis H₁: θ ∈ Θ₁ is more plausible than the null hypothesis H₀: θ ∈ Θ₀, so it is reasonable for us to reject the null hypothesis.

Thus, this test rejects H₀ if Λ(X) ≤ C.
The definition of the likelihood ratio test

• We use Λ(X) as the test statistic for the test:

H₀: θ ∈ Θ₀ vs. H₁: θ ∈ Θ₁

The rejection region is {Λ(X) ≤ C}, where C satisfies the inequality

P_θ(Λ(X) ≤ C) ≤ α for every θ ∈ Θ₀.

Then this test is the likelihood ratio test of level α.

# If we do not know the distribution of Λ(X) under the null hypothesis, it is very difficult for us to find the critical value of the LRT. However, if there is a statistic T(X) which is monotone in Λ(X), and we know its distribution under the null hypothesis, then we can make a significance test based on T(X).
The steps to make a likelihood ratio test

• Step 1: Find the likelihood function of the sample X₁, …, Xₙ.
• Step 2: Find the likelihood ratio Λ(X), the test statistic, or some other statistic which is monotone in Λ(X).
• Step 3: Construct the rejection region by controlling the type I error at the significance level α.
• Example: X₁, …, Xₙ are random samples having the pdf:

f(x, θ) = e^(−(x−θ)),  x ≥ θ,  θ ∈ R.

Derive the rejection region for the hypothesis H₀: θ = 0 vs. H₁: θ > 0 at level α.
Solution:
● Step 1: The sample distribution is:

f(x, θ) = exp(−Σᵢ₌₁ⁿ (xᵢ − θ)) · I(x₍₁₎ ≥ θ),

and it is also the likelihood function. The parameter space is Θ = [0, ∞), with Θ₀ = {0}. Then we derive:

max_{Θ₀} l(x₁, …, xₙ) = e^(−Σᵢ xᵢ),  max_{Θ} l(x₁, …, xₙ) = e^(−Σᵢ xᵢ + n x₍₁₎)
● Step 2: The likelihood ratio test statistic is:

Λ(X) = e^(−Σᵢ xᵢ) / e^(−Σᵢ xᵢ + n x₍₁₎) = e^(−n X₍₁₎)

We can just use 2nX₍₁₎, because it is monotone in Λ(X).

● Step 3: Under the null hypothesis, 2nX₍₁₎ ~ χ²(2), so the critical value is obtained from P₀(2nX₍₁₎ ≥ C) = α, i.e. C = χ²_α(2). That is to say, 2nX₍₁₎ is the (transformed) likelihood ratio test statistic and the rejection region is:

{2nX₍₁₎ ≥ χ²_α(2)}
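The distributional claim in Step 3 can be verified by simulation. A Python sketch, assuming only that under H₀ the observations are standard exponentials: 2nX₍₁₎ should match the mean 2 and variance 4 of a χ²(2) variable.

```python
import random

random.seed(42)
n, reps = 10, 20000

# Under H0 (theta = 0) the X_i are standard exponentials, so X_(1) is the
# minimum of n Exp(1) draws, and 2*n*X_(1) should behave like chi-square
# with 2 df (mean 2, variance 4).
stats = []
for _ in range(reps):
    x_min = min(random.expovariate(1.0) for _ in range(n))
    stats.append(2 * n * x_min)

mean = sum(stats) / reps
var = sum((s - mean) ** 2 for s in stats) / reps
assert abs(mean - 2) < 0.1
assert abs(var - 4) < 0.3
```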
Wald Sequential Probability Ratio Test

So far we assumed that the sample size is fixed in advance. What if it is not fixed?

Abraham Wald (1902–1950) developed the sequential probability ratio test (SPRT) by applying the idea of likelihood ratio testing, sampling sequentially by taking observations one at a time.

Xiao Yu

Hypothesis: H₀: θ = θ₀ vs. H₁: θ = θ₁

After n observations, the likelihood ratio is:

Λₙ(x₁, x₂, …, xₙ) = L(θ₁ | x₁, …, xₙ) / L(θ₀ | x₁, …, xₙ) = ∏ᵢ₌₁ⁿ f(xᵢ | θ₁) / ∏ᵢ₌₁ⁿ f(xᵢ | θ₀)

• If Λₙ(x₁, x₂, …, xₙ) ≤ A, stop sampling and decide not to reject H₀.
• If A < Λₙ(x₁, x₂, …, xₙ) < B, continue sampling.
• If Λₙ(x₁, x₂, …, xₙ) ≥ B, stop sampling and decide to reject H₀.

Here A < 1 < B.
SPRT for a Bernoulli Parameter

• An electrical parts manufacturer receives a large lot of fuses from a vendor. The lot is regarded as "satisfactory" if the fraction defective p is no more than 0.1; otherwise it is regarded as "unsatisfactory".

H₀: p = p₀ = 0.1 vs. H₁: p = p₁ = 0.3

With sₙ defectives among the first n fuses:

Λₙ = (p₁/p₀)^(sₙ) ((1 − p₁)/(1 − p₀))^(n − sₙ)

With α = 0.10 and β = 0.20: ln A = ln(β/(1 − α)) = −1.504 and ln B = ln((1 − β)/α) = 2.079, so sampling continues while

−1.114 + 0.186 n < sₙ < 1.540 + 0.186 n.
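A Python sketch of this SPRT (the function name and simulation settings are mine, not from the slides): it accumulates the log likelihood ratio one fuse at a time and stops at Wald's boundaries ln A = ln(β/(1−α)) and ln B = ln((1−β)/α).

```python
import math
import random

def sprt_bernoulli(p_true, p0=0.1, p1=0.3, alpha=0.10, beta=0.20, max_n=10000):
    # Wald's boundaries on the log likelihood ratio.
    log_a = math.log(beta / (1 - alpha))   # about -1.504
    log_b = math.log((1 - beta) / alpha)   # about  2.079
    llr = 0.0
    for n in range(1, max_n + 1):
        defective = random.random() < p_true
        # log of f(x | p1) / f(x | p0) for one fuse
        llr += math.log(p1 / p0) if defective else math.log((1 - p1) / (1 - p0))
        if llr <= log_a:
            return "accept H0", n
        if llr >= log_b:
            return "reject H0", n
    return "undecided", max_n

random.seed(1)
# With p = p0 = 0.1 the test should accept H0 the vast majority of the time.
accepts = sum(sprt_bernoulli(p_true=0.1)[0] == "accept H0" for _ in range(500))
assert accepts > 350
```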
Fisher Information

I(θ) = E[(d ln f(X | θ)/dθ)²] = −E[d² ln f(X | θ)/dθ²]

The quantity d ln f(X | θ)/dθ is called the score; its expectation is zero:

E[d ln f(X | θ)/dθ] = 0

Cramér–Rao Lower Bound:

Var(θ̂) ≥ 1/(n I(θ))
Single-Parameter Bernoulli Experiment

• The Fisher information contained in n independent Bernoulli trials may be calculated as follows. In the following, A represents the number of successes and B the number of failures (A + B = n):

I(θ) = −E[d²/dθ² ln f(A; θ)] = −E[d²/dθ² ln((A + B)!/(A! B!) · θ^A (1 − θ)^B)]
     = −E[d²/dθ² (A ln θ + B ln(1 − θ))]
     = E[A/θ² + B/(1 − θ)²]
     = nθ/θ² + n(1 − θ)/(1 − θ)²
     = n/θ + n/(1 − θ) = n/(θ(1 − θ))

Up to the factor n², this is the reciprocal of the variance nθ(1 − θ) of the number of successes in n Bernoulli trials: the more the variance, the less the Fisher information.
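The closed form n/(θ(1 − θ)) can be cross-checked by computing the expectation −E[d² ln f/dθ²] directly: exactly over A ~ Bin(n, θ), with the second derivative taken by a central finite difference. A Python sketch (n = 20 and θ = 0.3 are arbitrary choices):

```python
import math

def bernoulli_fisher_info(theta, n, h=1e-5):
    # I(theta) = -E[ d^2/dtheta^2 ln f(A; theta) ], A ~ Bin(n, theta);
    # the expectation is exact, the second derivative a central difference.
    def log_pmf(a, t):
        return (math.lgamma(n + 1) - math.lgamma(a + 1) - math.lgamma(n - a + 1)
                + a * math.log(t) + (n - a) * math.log(1 - t))
    total = 0.0
    for a in range(n + 1):
        prob = math.exp(log_pmf(a, theta))
        second = (log_pmf(a, theta + h) - 2 * log_pmf(a, theta)
                  + log_pmf(a, theta - h)) / h ** 2
        total += prob * second
    return -total

n, theta = 20, 0.3
approx = bernoulli_fisher_info(theta, n)
exact = n / (theta * (1 - theta))   # the closed form derived above
assert abs(approx - exact) < 1e-3 * exact
```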
Large-Sample Inferences Based on the MLEs

For large n, the MLE θ̂ (the root of d ln L(θ)/dθ = 0) is approximately distributed as:

θ̂ ≈ N(θ, 1/(n I(θ)))

• An approximate large-sample (1 − α)-level confidence interval (CI) is given by:

[θ̂ − z_{α/2} / √(n I(θ̂)),  θ̂ + z_{α/2} / √(n I(θ̂))]

Plugging in the Fisher information of Bernoulli trials, we can see this is consistent with what we have learned.
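For Bernoulli data the interval reduces to the familiar θ̂ ± z_{α/2} √(θ̂(1 − θ̂)/n). A Python sketch, applied to the 13-heads-in-20-tosses coin example used later in these slides:

```python
import math

def bernoulli_wald_ci(successes, n, z=1.96):
    # theta_hat +/- z / sqrt(n * I(theta_hat)); per-trial I(theta) = 1/(theta(1-theta)),
    # so 1/sqrt(n * I(theta_hat)) = sqrt(theta_hat * (1 - theta_hat) / n).
    theta_hat = successes / n
    half = z * math.sqrt(theta_hat * (1 - theta_hat) / n)
    return theta_hat - half, theta_hat + half

lo, hi = bernoulli_wald_ci(13, 20)   # approximate 95% CI for 13 heads in 20 tosses
assert lo < 13 / 20 < hi
```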
Bayes' Theorem

Thomas Bayes (1702–1761)
- English mathematician and a Presbyterian minister, born in London
- A specific case of the theorem (Bayes' theorem) was published after his death (by Richard Price)

Jaeheun Kim

Bayesian Inference

• Bayesian inference is a method of statistical inference in which some kind of evidence or observations are used to calculate the probability that a hypothesis may be true, or else to update its previously calculated probability.
• "Bayesian" comes from its use of Bayes' theorem in the calculation process.
BAYES’ THEOREM
Bayes' theorem shows the relation between two conditional probabilities
)(
)()|()|(
)()|()()|()(
AP
BPBAPABP
APABPBPBAPBAP
• we can make updated probability(posterior probability) from the initial probability(prior probability) using new information.
• we call this updating process Bayes' Theorem
Prior prob.
New info.
Posterior prob.
Using Bayes thm
MONTY HALL

Should we switch doors or stay?

A contestant chose door 1 and then the host opened one of the other doors (door 3). Would switching from door 1 to door 2 increase the chances of winning the car?

http://en.wikipedia.org/wiki/Monty_Hall_problem
Let Dᵢ = {Door i conceals the car} and Oⱼ = {Host opens door j after the contestant chooses door 1}. Then:

p(D₁) = p(D₂) = p(D₃) = 1/3
p(O₃ | D₁) = 1/2,  p(O₃ | D₂) = 1,  p(O₃ | D₃) = 0

p(D₁ | O₃) = p(O₃ | D₁) p(D₁) / [p(O₃ | D₁) p(D₁) + p(O₃ | D₂) p(D₂) + p(O₃ | D₃) p(D₃)]
           = (1/2 · 1/3) / (1/2 · 1/3 + 1 · 1/3 + 0 · 1/3) = 1/3  (when you stay)

p(D₂ | O₃) = p(O₃ | D₂) p(D₂) / [p(O₃ | D₁) p(D₁) + p(O₃ | D₂) p(D₂) + p(O₃ | D₃) p(D₃)]
           = (1 · 1/3) / (1/2 · 1/3 + 1 · 1/3 + 0 · 1/3) = 2/3  (when you switch)
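The 1/3 vs. 2/3 answer can be confirmed by simulation. A minimal Python sketch (the contestant always starts at door 1, matching the setup above):

```python
import random

def monty_trial(switch, rng):
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = 1   # contestant always starts with door 1
    # Host opens a door that is neither the pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)
n = 20000
stay = sum(monty_trial(False, rng) for _ in range(n)) / n
swit = sum(monty_trial(True, rng) for _ in range(n)) / n
assert abs(stay - 1/3) < 0.02 and abs(swit - 2/3) < 0.02
```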
15.3.1 Bayesian Estimation

Zhenrui & friends

Premises for doing a Bayesian estimation:
1. Prior knowledge about the unknown parameter θ
2. The probability distribution of θ: π(θ) (the prior distribution)

General equation:

π*(θ) = f(x₁, x₂, …, xₙ | θ) π(θ) / ∫ f(x₁, x₂, …, xₙ | θ) π(θ) dθ = f(x₁, x₂, …, xₙ | θ) π(θ) / f(x₁, x₂, …, xₙ)

The denominator f(x₁, x₂, …, xₙ) is the marginal p.d.f. of X₁, X₂, …, Xₙ: just a normalizing constant to make ∫ π*(θ) dθ = 1.

π*(θ): posterior distribution
π(θ): prior distribution
θ: unknown parameter from a distribution with pdf/pmf f(x | θ); considered a r.v. in Bayesian estimation
f(x₁, x₂, …, xₙ | θ): likelihood function of θ based on the observed values x₁, x₂, …, xₙ

The µ* and σ*² of π*(θ) are called the posterior mean and variance, respectively. µ* can be used as a point estimate of θ (the Bayes estimate).

π*(θ) ∝ f(X | θ) π(θ)
Bayesian Estimation (continued)

Conjugate priors: a family of prior distributions such that the posterior distribution is of the same form as the prior distribution.

Examples of conjugate priors (from the textbook, Examples 15.25 and 15.26):
• The normal distribution is a conjugate prior on µ of N(µ, σ²) (if σ² is already known)
• The beta distribution is a conjugate prior on p of a binomial distribution Bin(n, p)

A question: If I only know the possible value range of θ, but can't summarize it in the form of a probability distribution, can I still do Bayesian estimation?

No! To apply Bayes' theorem, every term in the equation

π*(θ) ∝ f(X | θ) π(θ)

has to be a probability term: a prior distribution π(θ) works, but a bare range of θ values does not.

Criticisms of the Bayesian approach:
1. Perceptions of prior knowledge differ from person to person: 'subjective'.
2. It may be too fuzzy to quantify the prior knowledge of θ in the form of a distribution.
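The Beta/Binomial pairing above is easy to demonstrate: the update only shifts the Beta parameters, so the posterior stays in the prior's family. A Python sketch (the Beta(1, 1) prior and the 13-heads/7-tails data are illustrative choices):

```python
def beta_binomial_update(a, b, heads, tails):
    # Beta(a, b) prior on p + binomial data -> Beta(a + heads, b + tails) posterior.
    return a + heads, b + tails

# Uniform prior Beta(1, 1); observe 13 heads and 7 tails.
a, b = beta_binomial_update(1, 1, 13, 7)
posterior_mean = a / (a + b)   # Bayes estimate of p: 14/22
assert (a, b) == (14, 8)

# Updating one toss at a time lands on the same posterior: the Beta family
# is closed under the update, which is exactly what "conjugate" means here.
a2, b2 = 1, 1
for _ in range(13):
    a2, b2 = beta_binomial_update(a2, b2, 1, 0)
for _ in range(7):
    a2, b2 = beta_binomial_update(a2, b2, 0, 1)
assert (a2, b2) == (a, b)
```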
15.3.2 Bayesian Testing

Simple vs. simple hypothesis test:

H₀: θ = θ₀ vs. H₁: θ = θ₁

Prior probabilities of H₀ and H₁: π₀ = π(θ₀) and π₁ = π(θ₁) = 1 − π₀.

Let

a = π₀ f(x₁, x₂, …, xₙ | θ₀),  b = π₁ f(x₁, x₂, …, xₙ | θ₁)

Then the posterior probabilities are

π₀* = a/(a + b),  π₁* = b/(a + b) = 1 − π₀*

and the posterior odds ratio is

π₁*/π₀* = π₁ f(x₁, x₂, …, xₙ | θ₁) / (π₀ f(x₁, x₂, …, xₙ | θ₀))

A Bayesian test rejects H₀ if π₁*/π₀* > k, where k > 0 is a suitably chosen critical constant. A large value of k corresponds to a small value of α.
Bayesian Testing (continued): Bayesian test vs. Neyman–Pearson likelihood ratio test (15.18)

Neyman–Pearson lemma: reject H₀ if

L(θ₁ | x₁, x₂, …, xₙ) / L(θ₀ | x₁, x₂, …, xₙ) = f(x₁, x₂, …, xₙ | θ₁) / f(x₁, x₂, …, xₙ | θ₀) > k

Bayesian test: reject H₀ if

π₁*/π₀* = b/a = (π₁/π₀) · f(x₁, x₂, …, xₙ | θ₁) / f(x₁, x₂, …, xₙ | θ₀) > k,

i.e. f(x₁, x₂, …, xₙ | θ₁) / f(x₁, x₂, …, xₙ | θ₀) > k (π₀/π₁) = k*

The Bayesian test can be considered a specialized Neyman–Pearson likelihood ratio test where the probability of each hypothesis (H₀ and H₁) being true is known: π₀ and π₁.

If π₀ = π₁ = 1/2, then k* = k and the rejection condition becomes

f(x₁, x₂, …, xₙ | θ₁) / f(x₁, x₂, …, xₙ | θ₀) > k:

the Bayesian test becomes the Neyman–Pearson likelihood ratio test.
Bayesian Inference for One Parameter

A biased coin:
• Bernoulli random variable
• Prob(Head) = ϴ
• ϴ is unknown

Bingqi Cheng

Bayesian statistics: three ingredients
• Prior distribution: an initial guess or prior knowledge about the parameter ϴ (highly subjective)
• Likelihood function: fits or describes the distribution of the real data (e.g., a sequence of heads and tails when tossing the coin)
• Bayes' theorem: updates the prior distribution with the real data, giving the posterior distribution Prob(ϴ | data)
Prior Distribution

Beta distributions are the conjugate prior to Bernoulli distributions: if the prior is Beta and the likelihood function is Bernoulli, then the posterior is Beta.

[Figure: densities of the prior distribution, likelihood function, and posterior distribution over ϴ ∈ [0, 1]]

Calculation Steps

For this biased coin: prior distribution × likelihood function → posterior distribution.

[Figure: density of the posterior distribution over ϴ ∈ [0, 1]]
Predictive Probability
Bayesian vs. MLE with the calculus method

Back to the example of the biased coin: again we have 20 trials and get 13 heads.

f(p) = C(20, 13) p¹³ (1 − p)⁷

f′(p) = C(20, 13) p¹² (1 − p)⁶ (13 − 20p) = 0

p̂ = 13/20 = 0.65
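Comparing this MLE with the Bayes estimate under a uniform Beta(1, 1) prior, a Python sketch (exact arithmetic via fractions; the 10×-data comparison is my own illustration): the posterior mean 14/22 ≈ 0.636 is pulled slightly toward the prior mean 1/2, and the two estimates converge as data accumulate.

```python
from fractions import Fraction

heads, tails = 13, 7
mle = Fraction(heads, heads + tails)             # 13/20 = 0.65
# Uniform Beta(1, 1) prior -> Beta(14, 8) posterior; its mean is:
bayes = Fraction(heads + 1, heads + tails + 2)   # 14/22, about 0.636

assert float(mle) == 0.65
assert abs(float(bayes) - 14/22) < 1e-12

# With 10x the data at the same ratio, the two estimates nearly agree:
big_mle = Fraction(130, 200)
big_bayes = Fraction(131, 202)
assert abs(float(big_mle) - float(big_bayes)) < 0.01
```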
Xiao Yu

Jeffreys Prior:

p(θ) ∝ √(det I(θ))
Why bother to use Bayesian methods? With a large amount of data, the Bayesian computation can be easier to handle.

• MLE with the calculus method: find the parameter quickly and directly, if possible → one huge step.
• Bayesian: initial guess + approximation + convergence → a different starting line + small steps + maybe not the best value.
• This is a Gaussian mixture; the observations are vectors and C is the covariance matrix. Finding the maximum likelihood estimate for a mixture by direct application of calculus is tough:

log L(p₁, C₁, C₂ | x₁, …, xₙ) = Σⱼ₌₁ⁿ log( p₁ (1/√(2π C₁)) e^(−(xⱼ − μ₁)²/(2C₁)) + p₂ (1/√(2π C₂)) e^(−(xⱼ − μ₂)²/(2C₂)) )
Bayesian Learning

• The more evidence we have, the more we learn. The more flips we do, the more we know about the probability of getting a head, which is the parameter of the binomial distribution.

An application: the EM (Expectation–Maximization) algorithm, which can beautifully handle some regression problems.

Two-coins game: Suppose now that there are two coins which can be flipped. The probability of heads is p1 for the first coin and p2 for the second coin. We decide on each flip which of the two coins will be flipped, and our objective is to maximize the number of heads that occur (p1 and p2 are unknown).
Matlab code for the strategy:

function [] = twocoin2(p1,p2,n)
H1 = 0; T1 = 0;
H2 = 0; T2 = 0;
syms ps1;
syms ps2;
for k = 1:n,
    % posterior probability that coin 1 has the larger heads probability
    temp = int(ps2^H2*(1-ps2)^T2,0,ps1);
    p(k) = double(int(ps1^H1*(1-ps1)^T1*temp,0,1) ...
                  /(beta(H1+1,T1+1)*beta(H2+1,T2+1)));
    if rand < p(k),
        guess(k) = 1;
        y(k) = rand < p1;
        H1 = H1 + y(k);
        T1 = T1 + (1 - y(k));
    else
        guess(k) = 2;
        y(k) = rand < p2;
        H2 = H2 + y(k);
        T2 = T2 + (1 - y(k));
    end
end
disp('Guesses: ')
tabulate(guess)
disp('Outcomes: ')
tabulate(y)
figure(2)
plot(p)
end
p1 = 0.4, p2 = 0.6

[Figure: value of L(p1 > p2 | H1, T1, H2, T2) over the trials]
Statistical Decision Theory

ABRAHAM WALD (1902–1950)
• Hungarian mathematician
• Major contributions: geometry, econometrics, statistical sequential analysis, and decision theory
• Died in an airplane accident in 1950

Hans Schneeweiss, "Abraham Wald", Department of Statistics, University of Munich, Akademiestr. 1, 80799 München, Germany

Kicheon Park
Why is decision theory needed?

Limits of classical statistics:
I. Prior information and loss
II. Initial and final precision
III. Formulational inadequacy

Limits of Classical Statistics

• Prior information and loss: relevant effects from past experience and the losses from each possible decision
• Initial and final precision: before and after observation of sample information, which is the result of a long series of identical experiments
• Formulational inadequacy: a limit on reaching a meaningful decision in the majority of problems
Classical Statistics vs. Decision Theory

• Classical statistics: direct use of sample information
• Decision theory: combines the sample information with other relevant aspects of the problem to reach the best decision

→ The goal of decision theory is to make decisions based not only on the available statistical knowledge but also on the uncertainties (θ) that are involved in the decision problem.

Two types of relevant information:

I. Knowledge of the possible consequences of the decisions → the loss resulting from each possible decision
II. Prior information → effects from past experience with similar situations
Statistical Decision Theory - Elements

• Sample space 𝒳
• Unknown parameter θ
• Decision space D
• Loss function L(θ, d)

Sources: J. Wolfowitz, "Abraham Wald", Annals of Mathematical Statistics; Tamhane & Dunlop, Statistics & Data Analysis, Prentice Hall; Berger, Statistical Decision Theory, Springer-Verlag.

Mun Sang Yue
Statistical Decision Theory - Decision Rules

A decision rule δ maps the observed sample to a decision; its risk function R(δ, θ) is the expected loss. The minimax rule minimizes the maximum risk:

min_δ { max_θ R(δ, θ) }
Statistical Decision Theory - Example

• A retailer must decide whether to purchase a large lot of items containing an unknown fraction p of defectives. Before making the decision of whether to purchase the lot (decision d1) or not to purchase the lot (decision d2), 2 items are randomly selected from the lot for inspection. The retailer wants to evaluate two decision rules. Prior: π(p) = 2(1 − p).

Example - Continued

No. of defectives x   Decision rule δ1   Decision rule δ2
0                     d1                 d1
1                     d2                 d1
2                     d2                 d2

• Loss functions: L(d1, p) = 8p − 1, and L(d2, p) = 2
• Risk functions:

R(δ1, p) = L(d1, p) P(δ1 chooses d1 | p) + L(d2, p) P(δ1 chooses d2 | p)
         = (8p − 1) P(X = 0 | p) + 2 P(X = 1 or 2 | p)

R(δ2, p) = L(d1, p) P(δ2 chooses d1 | p) + L(d2, p) P(δ2 chooses d2 | p)
         = (8p − 1) P(X = 0 or 1 | p) + 2 P(X = 2 | p)
Example - Continued

[Figure: R(δ1, p) and R(δ2, p) plotted against p]

max_p R(δ1, p) = 2.289,  max_p R(δ2, p) = 3.329
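The two risk curves and their maxima can be reproduced numerically. A Python sketch using X ~ Bin(2, p) for the inspection sample, as in the rules above:

```python
def risk_delta1(p):
    # R(delta1, p) = (8p - 1) P(X=0|p) + 2 P(X=1 or 2|p), X ~ Bin(2, p)
    p0 = (1 - p) ** 2
    return (8 * p - 1) * p0 + 2 * (1 - p0)

def risk_delta2(p):
    # R(delta2, p) = (8p - 1) P(X=0 or 1|p) + 2 P(X=2|p)
    p01 = (1 - p) ** 2 + 2 * p * (1 - p)
    return (8 * p - 1) * p01 + 2 * p ** 2

grid = [i / 10000 for i in range(10001)]
max1 = max(risk_delta1(p) for p in grid)
max2 = max(risk_delta2(p) for p in grid)
assert abs(max1 - 2.289) < 0.005
assert abs(max2 - 3.329) < 0.005
assert max1 < max2   # delta1 has the smaller maximum risk (the minimax rule)
```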
Statistical Decision Theory - Example

• A shipment of transistors was received by a radio company. A sampling plan was used to check the shipment as a whole, to ensure that the contractual requirement of a 0.05 defect rate was not exceeded. A random sample of n transistors was chosen from the shipment and tested. Based upon X, the number of defective transistors in the sample, the shipment will be accepted or rejected.

Example - Continued

• The proportion of defective transistors in the shipment is θ.
• Decision rule:
  a1: accept the lot if X/n ≤ 0.05
  a2: reject the lot if X/n > 0.05
• Loss functions: L(a1, θ) = 10θ; L(a2, θ) = 1
• π(θ) can be estimated based on prior experience
• R(δ, θ) can then be calculated
Summary

• Maximum likelihood estimation selects the estimate of the unknown parameter that maximizes the likelihood function.
• The likelihood ratio test compares the likelihood of the observed outcomes under the null hypothesis to the likelihood under the alternative hypothesis.
• Bayesian methods treat unknown models or variables as random variables with known (prior) distributions, instead of as deterministic quantities that happen to be unknown.

Summary (continued)

• Statistical decision theory moves statistics beyond its traditional role of just drawing inferences from incomplete information. The theory focuses on the problem of statistical actions rather than inference.

"Here in the 21st Century … a combination of Bayesian and frequentist ideas will be needed to deal with our increasingly intense scientific environment."
Bradley Efron, 164th ASA Presidential Address

Questions?

THANK YOU!