
Maximum Likelihood & Method of Moments Estimation

Patrick Zheng 01/30/14

1

Introduction Goal: Find a good POINT estimate of a population parameter.

Data: We begin with a random sample of size n taken from the population. We shall estimate the parameter based on this sample.

Distribution: The initial step is to identify the probability distribution of the sample, which is characterized by the parameter.

The form of the distribution is usually easy to identify; the parameter is unknown.

2

Notations Sample: X₁, X₂, …, Xₙ. Distribution: Xᵢ iid ~ f(x; 𝜃). Parameter: 𝜃.

Examples: the distribution is normal (f = Normal) with unknown parameters 𝜇 and 𝜎² (𝜃 = (𝜇, 𝜎²)); the distribution is binomial (f = binomial) with unknown parameter p (𝜃 = p).

3

It’s important to have a good estimate!

The importance of point estimates lies in the fact that many statistical formulas are based on them, such as confidence intervals and formulas for hypothesis testing. A good estimate should: 1. Be unbiased. 2. Have small variance. 3. Be efficient. 4. Be consistent.

4

Unbiasedness An estimator is unbiased if its mean equals the parameter: it does not systematically overestimate or underestimate the target parameter. The sample mean X̄ and the sample proportion p̂ are unbiased estimators of the population mean and proportion, respectively.
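As a quick illustration (mine, not from the slides), a small R simulation showing that X̄ centers on 𝜇; the distribution, sample size, and seed are arbitrary choices:

# The sample mean is unbiased for mu: its average over many samples is mu.
set.seed(1)
mu <- 10; sigma <- 2; n <- 25
xbar <- replicate(10000, mean(rnorm(n, mu, sigma)))
mean(xbar)   # very close to mu = 10: no systematic over/underestimation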

5

Small variance We also prefer that the sampling distribution of the estimator have a small spread or variability, i.e., a small standard deviation.

6

Efficiency An estimator 𝜃̂ is said to be efficient if its Mean Square Error (MSE) is the smallest among all competitors, where

MSE(𝜃̂) = E(𝜃̂ − 𝜃)² = Bias²(𝜃̂) + var(𝜃̂), and Bias(𝜃̂) = E(𝜃̂) − 𝜃.

Relative Efficiency: R.E.(𝜃̂₁, 𝜃̂₂) = MSE(𝜃̂₂) / MSE(𝜃̂₁).

If R.E. > 1, 𝜃̂₁ is more efficient than 𝜃̂₂. If R.E. < 1, 𝜃̂₂ is more efficient than 𝜃̂₁.

7

Example: efficiency Suppose X₁, X₂, …, Xₙ iid ~ N(𝜇, 𝜎²).

If 𝜇̂₁ = X₁, then MSE(𝜇̂₁) = Bias²(𝜇̂₁) + var(𝜇̂₁) = 0 + 𝜎².

If 𝜇̂₂ = X̄ = (X₁ + X₂ + … + Xₙ)/n, then MSE(𝜇̂₂) = Bias²(𝜇̂₂) + var(𝜇̂₂) = 0 + 𝜎²/n.

Since R.E.(𝜇̂₁, 𝜇̂₂) = MSE(𝜇̂₂)/MSE(𝜇̂₁) = (𝜎²/n)/𝜎² = 1/n < 1,

𝜇̂₂ is more efficient than 𝜇̂₁.
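A simulation sketch of this comparison (mine, not from the slides); the parameter values are arbitrary:

# Estimate the MSEs of mu1.hat = X1 and mu2.hat = Xbar by Monte Carlo.
set.seed(1)
mu <- 0; sigma <- 1; n <- 20
sims <- replicate(10000, { x <- rnorm(n, mu, sigma); c(x[1], mean(x)) })
mse <- rowMeans((sims - mu)^2)
mse            # about sigma^2 = 1 and sigma^2/n = 0.05
mse[2]/mse[1]  # R.E.(mu1.hat, mu2.hat) is about 1/n = 0.05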

8

Consistency An estimator 𝜃̂ is said to be consistent if, as the sample size n goes to +∞, 𝜃̂ converges in probability to 𝜃:

∀𝜀 > 0, Pr(|𝜃̂ − 𝜃| > 𝜀) → 0 as n → +∞.

By Chebyshev’s inequality,

∀𝜀 > 0, Pr(|𝜃̂ − 𝜃| ≥ 𝜀) ≤ E(𝜃̂ − 𝜃)²/𝜀² = MSE(𝜃̂)/𝜀².

So if one can prove that the MSE of 𝜃̂ tends to 0 as n goes to +∞, then 𝜃̂ is consistent.

9

Example: Consistency Suppose X₁, X₂, …, Xₙ iid ~ N(𝜇, 𝜎²). The estimator 𝜇̂ = X̄ = (X₁ + X₂ + … + Xₙ)/n is consistent, since

∀𝜀 > 0, Pr(|𝜇̂ − 𝜇| ≥ 𝜀) ≤ E(𝜇̂ − 𝜇)²/𝜀² = MSE(𝜇̂)/𝜀² = 𝜎²/(n𝜀²) → 0 as n → +∞.
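A quick empirical check (mine, not from the slides) that the probability of a deviation larger than 𝜀 shrinks as n grows; all values are arbitrary:

# Pr(|Xbar - mu| > eps) for increasing n, estimated by simulation.
set.seed(1)
mu <- 0; sigma <- 1; eps <- 0.1
for (n in c(10, 100, 1000)) {
  p <- mean(replicate(5000, abs(mean(rnorm(n, mu, sigma)) - mu) > eps))
  cat("n =", n, " Pr(|Xbar - mu| > eps) ~", p, "\n")
}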

10

Point Estimation Methods There are many methods available for estimating the parameter(s) of interest. Three of the most popular methods of estimation are:

1. The method of moments (MM)
2. The method of maximum likelihood (ML)
3. The Bayesian method

11

1. The Method of Moments

12

The Method of Moments One of the oldest methods, with a very simple procedure. What is a moment? The k-th population moment is 𝜇′ₖ = E(Xᵏ), and the k-th sample moment is m′ₖ = (1/n)∑Xᵢᵏ. The method is based on the assumption that sample moments should provide GOOD ESTIMATES of the corresponding population moments.

13

How it works? Express the population moments in terms of the unknown parameters, set them equal to the corresponding sample moments, and solve the resulting equations for the parameters.

14

Example: normal distribution Suppose X₁, X₂, …, Xₙ iid ~ N(𝜇, 𝜎²).

Step 1: E(X) = 𝜇; E(X²) = 𝜇² + 𝜎².

Step 2: m′₁ = X̄; m′₂ = (1/n)∑Xᵢ².

Step 3: Set 𝜇′₁ = m′₁ and 𝜇′₂ = m′₂; therefore 𝜇 = X̄ and 𝜇² + 𝜎² = (1/n)∑Xᵢ².

Solving the two equations, we get 𝜇̂ = X̄ and 𝜎̂² = (1/n)∑Xᵢ² − X̄².
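A numerical sketch of these estimates in R (mine, not from the slides; the data are simulated):

# Method-of-moments estimates for N(mu, sigma^2).
set.seed(1)
x <- rnorm(50, mean = 3, sd = 2)
mu.mm     <- mean(x)                # mu = m'1
sigma2.mm <- mean(x^2) - mean(x)^2  # sigma^2 = m'2 - m'1^2
c(mu.mm, sigma2.mm)                 # estimates of mu = 3 and sigma^2 = 4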

15

Example: Bernoulli Distribution

X follows a Bernoulli distribution if P(X = x) = p for x = 1 and P(X = x) = 1 − p for x = 0. Since E(X) = p, setting E(X) equal to the first sample moment X̄ gives the MME p̂ = X̄.

16

Example: Poisson distribution Suppose X₁, X₂, …, Xₙ iid ~ Poi(𝜆). Since E(X) = 𝜆 and var(X) = 𝜆, two different moment conditions both yield estimators of 𝜆: the first moment gives 𝜆̂₁ = X̄, and the second (central) moment gives 𝜆̂₂ = (1/n)∑Xᵢ² − X̄². A numerical sketch follows.
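A minimal R sketch (mine, not from the slides) showing that the two estimators generally disagree on the same data:

# Two moment estimators of the same lambda.
set.seed(1)
x <- rpois(100, lambda = 4)
lambda1 <- mean(x)                # from E(X) = lambda
lambda2 <- mean(x^2) - mean(x)^2  # from var(X) = lambda
c(lambda1, lambda2)               # both near 4, but not equal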

17

Note The MME may not be unique. In general, the minimum number of moment conditions we need equals the number of parameters.

Question: Can these two estimators be combined in some optimal way?

Answer: Generalized method of moments.

18

Pros of Method of Moments Easy to compute, and it always works:

The method often provides estimators when other methods fail to do so or when estimators are hard to obtain (as in the case of the gamma distribution).

The MME is consistent.

19

Cons of Method of Moments MMEs are usually not the “best estimators” available; by best, we mean most efficient, i.e., achieving minimum MSE. Sometimes the MME may even be meaningless.

(see next page for example)

20

Sometimes, MME is meaningless Suppose we observe 3, 5, 6, 18 from U(0, 𝜃). Since E(X) = 𝜃/2,

the MME of 𝜃 is 2X̄ = 2 × (3 + 5 + 6 + 18)/4 = 16, which is

not acceptable, because we have already observed a value of 18.

21

2. The Method of Maximum Likelihood

22

The Method of Maximum Likelihood

Proposed by the geneticist and statistician Sir Ronald A. Fisher in 1922.

Idea: We attempt to find the values of the parameters which would have most likely produced the data that we in fact observed.

23

What is likelihood?

¾ E.g., Likelihood of 𝜃=1 is the chance of observing when 𝜃=1.

¾

1 2 nX ,X ,...,X

24

How to compute Likelihood?

For an iid sample X₁, X₂, …, Xₙ from f(x; 𝜃), the likelihood is the joint probability (or joint density) of the observed data, viewed as a function of 𝜃:

L(𝜃) = f(x₁; 𝜃) f(x₂; 𝜃) ⋯ f(xₙ; 𝜃).

25

Example of computing likelihood (discrete case)
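The worked example from this slide is not in the transcript; below is a minimal sketch in the same spirit, using Bernoulli data and parameter values of my own choosing:

# Discrete case: likelihood of p for the Bernoulli sample 1, 0, 1, 1.
x <- c(1, 0, 1, 1)
lik <- function(p) prod(p^x * (1 - p)^(1 - x))  # product of P(X = x_i)
lik(0.50)  # 0.0625
lik(0.75)  # 0.1055: these data are more likely under p = 0.75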

26

Example of computing likelihood (continuous case)
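This slide's example is likewise missing; a comparable sketch with normal data (hypothetical values, 𝜎 = 1 treated as known):

# Continuous case: likelihood of mu for a small normal sample.
x <- c(2.1, 1.7, 2.5)
lik <- function(mu) prod(dnorm(x, mean = mu, sd = 1))  # product of densities
lik(0)    # tiny
lik(2.1)  # much larger: values of mu near the data have higher likelihood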

27

Definition of MLE

The maximum likelihood estimate (MLE) of 𝜃 is the value 𝜃̂ that maximizes the likelihood L(𝜃). In general, the method of ML results in the problem of maximizing a function of one or several parameters; one way to do the maximization is to take the derivative and set it to zero.

28

Procedure to find MLE 1. Write down the likelihood L(𝜃). 2. Take the logarithm to get ln L(𝜃). 3. Differentiate ln L(𝜃) with respect to 𝜃 and set the derivative equal to 0. 4. Solve for 𝜃̂ and check that it is a maximum.

29

Example: Poisson Distribution Suppose X₁, X₂, …, Xₙ iid ~ Poi(𝜆). The likelihood is L(𝜆) = ∏ e^(−𝜆) 𝜆^(xᵢ) / xᵢ!, so ln L(𝜆) = −n𝜆 + (∑xᵢ) ln 𝜆 − ∑ ln(xᵢ!).

30

Example cont’d Setting d ln L(𝜆)/d𝜆 = ∑xᵢ/𝜆 − n = 0 and solving gives the MLE 𝜆̂ = X̄.

31

Example: Uniform Distribution Suppose X₁, X₂, …, Xₙ iid ~ U(0, 𝜃), so f(x; 𝜃) = 1/𝜃 for 0 ≤ x ≤ 𝜃. Then L(𝜃) = 1/𝜃ⁿ whenever 𝜃 ≥ max(xᵢ), and L(𝜃) = 0 otherwise.

32

Example cont’d Taking a derivative does not help here: L(𝜃) = 1/𝜃ⁿ is strictly decreasing in 𝜃, so it is maximized at the smallest admissible value, giving the MLE 𝜃̂ = max(X₁, …, Xₙ). For the earlier data 3, 5, 6, 18, the MLE is 18, unlike the MME of 16.

33

More than one parameter When 𝜃 = (𝜃₁, …, 𝜃ₖ), set each partial derivative ∂ ln L/∂𝜃ⱼ equal to 0 and solve the resulting system of equations; for N(𝜇, 𝜎²) this yields 𝜇̂ = X̄ and 𝜎̂² = (1/n)∑(Xᵢ − X̄)². A numerical sketch follows.
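A minimal two-parameter sketch in R (mine, not from the slides), maximizing the normal log-likelihood numerically; the log-sigma parameterization simply keeps 𝜎 positive during the search:

# Two-parameter MLE for N(mu, sigma^2) via optim(), which minimizes by
# default, so we pass the negative log-likelihood.
set.seed(1)
x <- rnorm(100, mean = 3, sd = 2)
negloglik <- function(par) {
  mu <- par[1]; sigma <- exp(par[2])  # exp() keeps sigma > 0
  -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}
fit <- optim(par = c(0, 0), fn = negloglik)
c(mu.hat = fit$par[1], sigma.hat = exp(fit$par[2]))  # near 3 and 2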

34

Pros of Method of ML When the sample size n is large (n > 30), the MLE is approximately unbiased, consistent, normally distributed, and efficient (under “regularity conditions”). “Efficient” means it produces a smaller MSE than other methods, including the method of moments.

The MLE is also more useful in statistical inference.

35

Cons of Method of ML The MLE can be highly biased for small samples. Sometimes the MLE has no closed-form solution, and numerical maximization can be sensitive to starting values, which might not give a global optimum; this is common when 𝜃 is of high dimension.

36

How to maximize Likelihood

1. Take the derivative and solve analytically (as shown above).

2. Apply numerical maximization techniques, including Newton’s method, quasi-Newton methods (Broyden 1970), and direct search methods (Nelder and Mead 1965). These methods can be carried out with the R functions optimize() and optim().

37

Newton’s Method A method for finding successively better approximations to the roots (or zeroes) of a real-valued function:

Pick an x₀ close to the root of a continuous function f(x). Take the derivative of f(x) to get f′(x). Plug into xₙ₊₁ = xₙ − f(xₙ)/f′(xₙ), where f′(xₙ) ≠ 0.

Repeat until the sequence converges, i.e., xₙ₊₁ ≈ xₙ.

38

Example Solve eˣ − 1 = 0.

Denote f(x) = eˣ − 1; let the starting point be x₀ = 0.1, with f′(x) = eˣ.

Iterate xₙ₊₁ = xₙ − f(xₙ)/f′(xₙ):

x₁ = x₀ − (e^0.1 − 1)/e^0.1 = 0.1 − 0.0951626 = 0.0048374

x₂ = x₁ − (e^x₁ − 1)/e^x₁ = …

Repeat until |xₙ₊₁ − xₙ| < 0.00001, giving x = 7.106 × 10^−… ≈ 0, the root.
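A short R version of this iteration (mine, not from the slides):

# Newton's method for f(x) = exp(x) - 1, starting from x0 = 0.1.
f  <- function(x) exp(x) - 1
fp <- function(x) exp(x)      # the derivative f'(x)
x <- 0.1
repeat {
  x.new <- x - f(x)/fp(x)     # Newton update
  if (abs(x.new - x) < 1e-5) break
  x <- x.new
}
x.new                         # essentially 0, the root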

39

Example: find MLE by Newton’s Method In the Poisson distribution, finding 𝜆̂ is equivalent to maximizing ln L(𝜆), i.e., finding the root of d ln L(𝜆)/d𝜆 = ∑xᵢ/𝜆 − n.

To implement Newton’s method here, define

f(𝜆) = d ln L(𝜆)/d𝜆 = ∑xᵢ/𝜆 − n

f′(𝜆) = −∑xᵢ/𝜆²

𝜆ₙ₊₁ = 𝜆ₙ − f(𝜆ₙ)/f′(𝜆ₙ)

Given x₁, x₂, …, xₙ and a starting value 𝜆₀, we can find 𝜆̂.

40

Example cont’d Suppose we collected a sample from Poi(𝜆):

18,10,8,13,7,17,11,6,7,7,10,10,12,4,12,4,12,10,7,14,13,7

Implement Newton’s method in R:
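The code from the slide did not survive the transcript; the following is a minimal sketch of such an implementation (variable names are mine):

# Newton's method for the Poisson score f(lambda) = sum(x)/lambda - n.
x <- c(18,10,8,13,7,17,11,6,7,7,10,10,12,4,12,4,12,10,7,14,13,7)
n <- length(x)
f  <- function(lam) sum(x)/lam - n    # d lnL / d lambda
fp <- function(lam) -sum(x)/lam^2     # its derivative
lam <- 5                              # arbitrary starting value
repeat {
  lam.new <- lam - f(lam)/fp(lam)     # Newton update
  if (abs(lam.new - lam) < 1e-5) break
  lam <- lam.new
}
lam.new                               # converges to mean(x) = 9.9545..., the MLE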

41


Use R function optim()
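The slide's optim() call is also missing; here is a minimal sketch under the same setup. Note that optim() minimizes by default, so the objective must be the negative log-likelihood −ln L(𝜆) (an annotation on the original slide flags exactly this point):

# MLE of lambda via optim(); pass -lnL(lambda) because optim() minimizes.
x <- c(18,10,8,13,7,17,11,6,7,7,10,10,12,4,12,4,12,10,7,14,13,7)
negloglik <- function(lam) -sum(dpois(x, lam, log = TRUE))
fit <- optim(par = 5, fn = negloglik, method = "Brent", lower = 0.01, upper = 50)
fit$par   # about 9.95 = mean(x), matching the Newton result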

42


The End! Thank you!

43