Maximum Likelihood Estimation

Daphne Koller Parameter Estimation Maximum Likelihood Estimation Probabilistic Graphical Models Learning

description

Learning. Probabilistic Graphical Models. Parameter Estimation. Maximum Likelihood Estimation. Biased Coin Example. P is a Bernoulli distribution: $P(X=1) = \theta$, $P(X=0) = 1 - \theta$. Tosses are independent of each other. Tosses are sampled from the same distribution (identically distributed).

Transcript of Maximum Likelihood Estimation

Page 1: Maximum Likelihood Estimation

Daphne Koller

Parameter Estimation

Maximum Likelihood Estimation

Probabilistic Graphical Models

Learning

Page 2: Maximum Likelihood Estimation

Biased Coin Example

P is a Bernoulli distribution: $P(X=1) = \theta$, $P(X=0) = 1 - \theta$

Tosses are sampled IID from P:

• Tosses are independent of each other
• Tosses are sampled from the same distribution (identically distributed)

Page 3: Maximum Likelihood Estimation

IID as a PGM

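[Figure: plate model in which $\theta$ is the parent of X inside a plate labeled "Data m", alongside the equivalent unrolled network in which $\theta$ is the parent of X[1], . . ., X[M]]

$$P(x[m] \mid \theta) = \begin{cases} \theta & x[m] = x^1 \\ 1 - \theta & x[m] = x^0 \end{cases}$$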

Page 4: Maximum Likelihood Estimation

Maximum Likelihood Estimation

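• Goal: find $\theta \in [0,1]$ that predicts D well
• Prediction quality = likelihood of D given $\theta$

$$L(\theta : D) = P(D \mid \theta) = \prod_{m=1}^{M} P(x[m] \mid \theta)$$

[Figure: plot of the likelihood $L(\theta : \langle H, T, T, H, H \rangle)$ against $\theta$ from 0 to 1]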
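As a rough illustration of the likelihood curve for the toss sequence H, T, T, H, H, the sketch below evaluates the product of per-toss probabilities on a grid of $\theta$ values. The function name `likelihood` and the grid resolution are assumptions made for this example, not part of the slides.

```python
def likelihood(theta, tosses):
    """L(theta : D): product over tosses of P(x[m] | theta)."""
    result = 1.0
    for toss in tosses:
        # Heads has probability theta, tails has probability 1 - theta.
        result *= theta if toss == "H" else 1.0 - theta
    return result


tosses = ["H", "T", "T", "H", "H"]

# Evaluate the likelihood on a coarse grid over [0, 1] (assumed step 0.01).
grid = [i / 100 for i in range(101)]
best_theta = max(grid, key=lambda t: likelihood(t, tosses))

print(best_theta)  # 0.6
```

On this grid the maximum falls at 0.6, the fraction of heads in the sequence.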

Page 5: Maximum Likelihood Estimation

Maximum Likelihood Estimator

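• Observations: $M_H$ heads and $M_T$ tails

• Find $\theta$ maximizing likelihood

$$L(\theta : M_H, M_T) = \theta^{M_H} (1 - \theta)^{M_T}$$

• Equivalent to maximizing log-likelihood

$$l(\theta : M_H, M_T) = M_H \log \theta + M_T \log(1 - \theta)$$

• Differentiating the log-likelihood and solving for $\theta$:

$$\hat{\theta} = \frac{M_H}{M_H + M_T}$$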
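The differentiation step can be written out as follows. This is a sketch that assumes the natural logarithm, $0 < \theta < 1$, and at least one head and one tail:

$$\frac{d}{d\theta}\left[ M_H \log \theta + M_T \log(1 - \theta) \right] = \frac{M_H}{\theta} - \frac{M_T}{1 - \theta} = 0 \;\Longrightarrow\; M_H (1 - \theta) = M_T \theta \;\Longrightarrow\; \hat{\theta} = \frac{M_H}{M_H + M_T}$$

The second derivative, $-\frac{M_H}{\theta^2} - \frac{M_T}{(1 - \theta)^2}$, is negative, so this stationary point is a maximum. For example, 3 heads and 2 tails give $\hat{\theta} = 3/5 = 0.6$.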

Page 6: Maximum Likelihood Estimation

Sufficient Statistics

• For computing $\theta$ in the coin toss example, we only needed $M_H$ and $M_T$ since

$$L(\theta : D) = \theta^{M_H} (1 - \theta)^{M_T}$$

• $M_H$ and $M_T$ are sufficient statistics

Page 7: Maximum Likelihood Estimation

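Sufficient Statistics

• A function s from instances to a vector in $\mathbb{R}^k$ is a sufficient statistic if, for any two datasets D and D' and any $\theta \in \Theta$, we have

$$\sum_{x[i] \in D} s(x[i]) = \sum_{x[i] \in D'} s(x[i]) \;\Longrightarrow\; L(\theta : D) = L(\theta : D')$$

[Figure: mapping from datasets to statistics]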
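The definition can be checked numerically for the coin example. The sketch below builds two different toss sequences with the same head and tail counts and compares their log-likelihoods at a few values of $\theta$. The helper names and the two datasets are assumptions chosen for illustration.

```python
import math


def log_likelihood(theta, tosses):
    """Log of L(theta : D) for a list of 'H'/'T' tosses."""
    return sum(math.log(theta if t == "H" else 1.0 - theta) for t in tosses)


def statistic(tosses):
    """Sum of per-instance statistics, with s(H) = (1, 0) and s(T) = (0, 1)."""
    return (tosses.count("H"), tosses.count("T"))


d1 = ["H", "T", "T", "H", "H"]
d2 = ["T", "H", "H", "H", "T"]  # different order, same counts

same_statistic = statistic(d1) == statistic(d2)
same_likelihood = all(
    math.isclose(log_likelihood(t, d1), log_likelihood(t, d2))
    for t in (0.1, 0.3, 0.5, 0.9)
)

print(same_statistic, same_likelihood)  # True True
```

The check covers only a handful of $\theta$ values, so it illustrates the property rather than proving it.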

Page 8: Maximum Likelihood Estimation

Sufficient Statistic for Multinomial

• For a dataset D over variable X with k values, the sufficient statistics are counts $\langle M_1, \ldots, M_k \rangle$ where $M_i$ is the number of times that $X[m] = x^i$ in D

• Sufficient statistic s(x) is a tuple of dimension k
  – $s(x^i) = (0, \ldots, 0, 1, 0, \ldots, 0)$, with the 1 in position i

$$L(\theta : D) = \prod_{i=1}^{k} \theta_i^{M_i}$$
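A minimal sketch of how summing the per-instance tuples yields the counts. Values are encoded as 0-based integers here, which is an assumption of this example (the slides index values from 1), and the function names are hypothetical.

```python
def one_hot(i, k):
    """s(x^i): a k-tuple with a 1 in position i (0-based here), 0 elsewhere."""
    return tuple(1 if j == i else 0 for j in range(k))


def counts(data, k):
    """Sum the per-instance statistics to obtain <M_1, ..., M_k>."""
    total = [0] * k
    for i in data:
        for j, value in enumerate(one_hot(i, k)):
            total[j] += value
    return total


data = [0, 2, 1, 2, 2, 0]  # six observations of a variable with k = 3 values
print(counts(data, 3))  # [2, 1, 3]
```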

Page 9: Maximum Likelihood Estimation

Sufficient Statistic for Gaussian

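• Gaussian distribution: $P(X) \sim N(\mu, \sigma^2)$ if

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}$$

• Rewrite as

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -x^2 \frac{1}{2\sigma^2} + x \frac{\mu}{\sigma^2} - \frac{\mu^2}{2\sigma^2} \right)$$

• Sufficient statistics for Gaussian: $s(x) = \langle 1, x, x^2 \rangle$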
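To see why $\langle 1, x, x^2 \rangle$ is enough, the sketch below computes the Gaussian log-likelihood twice: once directly from the data, and once using only the summed statistics $(M, \sum x, \sum x^2)$ in the expanded form of the exponent. The function names, sample data, and parameter values are assumptions for illustration.

```python
import math


def log_lik_direct(data, mu, sigma):
    """Sum of log p(x) over the data, using the standard Gaussian density."""
    return sum(
        -math.log(math.sqrt(2 * math.pi) * sigma) - 0.5 * ((x - mu) / sigma) ** 2
        for x in data
    )


def sufficient_stats(data):
    """Sum of s(x) = <1, x, x^2> over the data: (M, sum x, sum x^2)."""
    return (len(data), sum(data), sum(x * x for x in data))


def log_lik_from_stats(stats, mu, sigma):
    """Same log-likelihood, computed from the summed statistics only."""
    m, s1, s2 = stats
    return (
        -m * math.log(math.sqrt(2 * math.pi) * sigma)
        - s2 / (2 * sigma**2)
        + s1 * mu / sigma**2
        - m * mu**2 / (2 * sigma**2)
    )


data = [1.0, 2.5, 0.5, 3.0]
stats = sufficient_stats(data)
agree = math.isclose(
    log_lik_direct(data, 1.5, 2.0), log_lik_from_stats(stats, 1.5, 2.0)
)
print(stats, agree)
```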

Page 10: Maximum Likelihood Estimation

Maximum Likelihood Estimation

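• MLE Principle: Choose $\theta$ to maximize $L(\theta : D)$

• Multinomial MLE:

$$\hat{\theta}_i = \frac{M_i}{\sum_{j=1}^{k} M_j}$$

• Gaussian MLE:

$$\hat{\mu} = \frac{1}{M} \sum_{m} x[m]$$

$$\hat{\sigma} = \sqrt{\frac{1}{M} \sum_{m} \left( x[m] - \hat{\mu} \right)^2}$$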
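Both closed forms are short to compute. The sketch below is one possible implementation using only the standard library; the function names and sample inputs are assumptions. Note that the Gaussian estimate divides by M, not M - 1, so it is the maximum likelihood estimate and not the unbiased sample standard deviation.

```python
import math


def multinomial_mle(counts):
    """theta_i = M_i / sum_j M_j for counts <M_1, ..., M_k>."""
    total = sum(counts)
    return [m_i / total for m_i in counts]


def gaussian_mle(data):
    """Return (mu_hat, sigma_hat), dividing by M as in the MLE."""
    m = len(data)
    mu_hat = sum(data) / m
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / m)
    return mu_hat, sigma_hat


theta_hat = multinomial_mle([2, 1, 3])
mu_hat, sigma_hat = gaussian_mle([1.0, 2.0, 3.0, 4.0])
print(theta_hat, mu_hat, sigma_hat)
```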

Page 11: Maximum Likelihood Estimation

Summary

• Maximum likelihood estimation is a simple principle for parameter selection given D

• Likelihood function uniquely determined by sufficient statistics that summarize D

• MLE has a closed-form solution for many parametric distributions
