
Page 1: Mathematical Statistics - Nanyang Technological University

Mathematical Statistics

MAS 713

Chapter 6

Page 2: Mathematical Statistics - Nanyang Technological University

Previous lecture

Point estimators
- Estimation and sampling distribution
- Point estimation
- Properties of estimators

Sufficient Statistics
- Factorization Theorem

Any questions?

Page 3: Mathematical Statistics - Nanyang Technological University

This lecture

1 6.1 Maximum Likelihood Estimation
  6.1.1 Introduction
  6.1.3 Maximum Likelihood Principle

2 6.2 Cramér-Rao Lower Bound
  6.2.1 Introduction
  6.2.2 Examples

3 6.3 Method of Moments

4 6.4 Examples: MLE and Methods of Moments

Additional reading: Chapter 7

Page 4: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.1 Introduction

Intuition of MLE

A patient visits a physician and complains about the following symptoms: "I have a headache, I'm feeling weak and have no appetite."

The Doctor's diagnostic options:
1 You have a brain tumor.
2 You broke your foot.
3 You have a cold.

The Doctor's job is to determine the most likely illness. We'll revisit this example later.

Page 12: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.1 Introduction

We have seen that there are plenty of choices for an estimator θ̂ of an unknown parameter θ.

=⇒ How to choose θ̂?

One possible approach:

Given observations x1, x2, . . . , xn, choose the unknown parameter θ̂ = θ̂(x1, . . . , xn) in such a way that it maximizes the probability of the occurrence of our observed values x1, x2, . . . , xn.

=⇒ choose θ̂ such that

P(X1 = x1, . . . , Xn = xn | θ̂) = max_θ P(X1 = x1, . . . , Xn = xn | θ)

This is the intuition behind the Maximum Likelihood Estimator (MLE).

Page 13: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.3 Maximum Likelihood Principle

The Maximum Likelihood Principle

The main ingredients:
1 X: a random variable.
2 θ: parameter to estimate (restricted to a parameter space Sθ).
3 p(X; θ) (or p(X | θ)): a statistical model (pmf or pdf).
4 X1, . . . , Xn: a random sample from X.

We want to construct good estimators for θ.

Notation: Given observations x1, . . . , xn, we write

p(x | θ) = the joint probability mass function if X is discrete,
           the joint probability density function if X is continuous.

Page 14: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.3 Maximum Likelihood Principle

The Maximum Likelihood Principle

Definition
Let X = (X1, . . . , Xn) have joint pdf/pmf p(x; θ), where θ ∈ Sθ. The likelihood function (or simply likelihood) is defined by

Sθ ∋ θ ↦ L(θ) := L(θ; x) = p(x; θ)

Note: x is fixed and θ varies in Sθ.

The likelihood is a function of θ.
The likelihood is not a pdf/pmf (as a function of θ, for fixed x).
If the data are i.i.d., then

L(θ; x) = ∏_{i=1}^n p(xi; θ)

Page 15: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.3 Maximum Likelihood Principle

The Maximum Likelihood Principle

Choose θ̂ = θ̂(x) which maximizes the likelihood function, i.e.

L(θ̂; x) = max_{θ ∈ Sθ} L(θ; x)

By definition of the arg max, this means

θ̂(x) ∈ arg max_{θ ∈ Sθ} L(θ; x)

Definition of Maximum Likelihood Estimator (MLE)
Let X = (X1, . . . , Xn) be a random sample. If

θ̂(X) ∈ arg max_{θ ∈ Sθ} L(θ; X)

then we call θ̂(X) a Maximum Likelihood Estimator (MLE) for θ.
Note: the MLE may not be unique, or may not exist.

Remark: arg max_θ f(θ) is the set of points θ at which f attains its largest value.

Page 16: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.3 Maximum Likelihood Principle

Intuition of MLE

The data:
x = "I have a headache, I'm feeling weak and have no appetite."

The (discrete) parameter space Sθ:
1 You have a brain tumor.
2 You broke your foot.
3 You have a cold.

The likelihood under each parameter:
P("headache, weakness, no appetite" | θ = brain tumor) = 0.2
P("headache, weakness, no appetite" | θ = broken foot) = 0.05
P("headache, weakness, no appetite" | θ = cold) = 0.4

The ML estimate:
The likelihood of having a cold is the highest, so θ̂ = cold.
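This "pick the parameter with the highest likelihood of the data" step is literally an arg max over a finite set. A minimal Python sketch (the probabilities are the ones from the slide; the dictionary is just an illustration):

```python
# Likelihood of the observed symptoms under each candidate diagnosis
likelihood = {
    "brain tumor": 0.20,
    "broken foot": 0.05,
    "cold": 0.40,
}

# The ML estimate is the parameter value that maximizes the likelihood
theta_hat = max(likelihood, key=likelihood.get)
print(theta_hat)  # -> cold
```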


Page 20: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.3 Maximum Likelihood Principle

The Maximum Likelihood Principle

We may apply any monotone increasing function and still achieve maximization. Very often it is more convenient to consider the logarithm of the likelihood function (the log-likelihood function)

log L(θ; x) = log p(x | θ)

Since the logarithm is a monotone function, maximization of the likelihood and of the log-likelihood are equivalent; that is, θ̂ maximizes the likelihood function if and only if it also maximizes the log-likelihood function:

arg max_{θ ∈ Sθ} L(θ; X) = arg max_{θ ∈ Sθ} log L(θ; X)

or, in other words,

θ̂ ∈ arg max_{θ ∈ Sθ} L(θ; X) ⇐⇒ θ̂ ∈ arg max_{θ ∈ Sθ} log L(θ; X)

Page 21: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.3 Maximum Likelihood Principle

Maximum Likelihood Estimation

[Figure: the likelihood p(x; θ) and the log-likelihood log p(x; θ) plotted as functions of θ; both attain their maximum at the same point θ̂.]

Page 22: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.3 Maximum Likelihood Principle

Example
Suppose that X is a discrete random variable with the following probability mass function:

  x      0      1      2           3
  p(x)   2θ/3   θ/3    2(1−θ)/3    (1−θ)/3

where 0 < θ < 1 is a parameter. The following 10 independent observations were taken from such a distribution:
x = (x1, . . . , x10) = (3, 0, 2, 1, 3, 2, 1, 0, 2, 1).
Find a point estimate of θ using the MLE.

Page 23: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.3 Maximum Likelihood Principle

Solution:

The likelihood function given the observations
x = (x1, . . . , x10) = (3, 0, 2, 1, 3, 2, 1, 0, 2, 1) is given by

L(θ; x) = ∏_{i=1}^n p(xi | θ)
        = p(X = 3 | θ) p(X = 0 | θ) p(X = 2 | θ) p(X = 1 | θ) p(X = 3 | θ)
          × p(X = 2 | θ) p(X = 1 | θ) p(X = 0 | θ) p(X = 2 | θ) p(X = 1 | θ)
        = (2θ/3)² (θ/3)³ (2(1−θ)/3)³ ((1−θ)/3)²

=⇒ θ̂ ∈ arg max_{θ ∈ (0,1)} (2θ/3)² (θ/3)³ (2(1−θ)/3)³ ((1−θ)/3)²

Clearly, the likelihood function is not easy to maximize.
Let's look at the log-likelihood.

Page 24: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.3 Maximum Likelihood Principle

The log-likelihood function given the observations
x = (x1, . . . , x10) = (3, 0, 2, 1, 3, 2, 1, 0, 2, 1) is

log L(θ; x) = log ∏_{i=1}^n p(xi | θ)
            = 2 (log(2/3) + log θ) + 3 (log(1/3) + log θ) + 3 (log(2/3) + log(1−θ)) + 2 (log(1/3) + log(1−θ))
            = Constant + 5 log θ + 5 log(1−θ)

Setting the derivative to 0 and solving:

d log L(θ)/dθ = 5 (1/θ − 1/(1−θ)) = 0

θ̂ = θ̂(x) = 0.5
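As a numeric sanity check (a sketch assuming numpy and scipy are available; not part of the slides), we can maximize this log-likelihood directly over θ ∈ (0, 1) and compare with the closed-form answer:

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([3, 0, 2, 1, 3, 2, 1, 0, 2, 1])

def neg_log_lik(theta):
    # pmf from the example: p(0) = 2θ/3, p(1) = θ/3, p(2) = 2(1−θ)/3, p(3) = (1−θ)/3
    pmf = np.array([2 * theta / 3, theta / 3, 2 * (1 - theta) / 3, (1 - theta) / 3])
    return -np.sum(np.log(pmf[x]))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)  # ≈ 0.5, matching the analytic MLE
```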

Page 25: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.3 Maximum Likelihood Principle

Example: Estimating mean and variance in a normal population

Given a random sample X = (X1, . . . , Xn) of size n, where

Xi i.i.d.∼ N(µ, σ),

derive the Maximum Likelihood estimator for the mean and variance of a normal random variable.

Solution:

θ = (µ, σ²), Sθ = ℝ × (0, ∞)

We need to find

(µ̂, σ̂²) ∈ arg max_{(µ,σ²)} p(x | µ, σ²)

Notation: We write φ(x | µ, σ) for the pdf of an N(µ, σ)-distributed random variable, i.e.

φ(x | µ, σ) := (1/√(2πσ²)) e^(−(x−µ)²/(2σ²))

Page 27: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.3 Maximum Likelihood Principle

θ̂ := (µ̂, σ̂²) ∈ arg max_{µ,σ²} p(x | µ, σ²)

  (i.i.d.) = arg max_{µ,σ²} ∏_{i=1}^n p(xi | µ, σ²)

  = arg max_{µ,σ²} ∏_{i=1}^n φ(xi | µ, σ)

  = arg max_{µ,σ²} ∑_{i=1}^n log φ(xi | µ, σ)

  = arg max_{µ,σ²} ∑_{i=1}^n log( (1/√(2πσ²)) exp(−(xi − µ)²/(2σ²)) )

  = arg max_{µ,σ²} [ −(n/2)(log(2π) + log(σ²)) − ∑_{i=1}^n (xi − µ)²/(2σ²) ]

Page 33: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.3 Maximum Likelihood Principle

To find the maximizer, we calculate

∂/∂µ [ −(n/2)(log(2π) + log(σ²)) − ∑_{i=1}^n (xi − µ)²/(2σ²) ] = ∑_{i=1}^n (xi − µ)/σ².

Similarly, setting v := σ² and taking the derivative yields

∂/∂σ² [ −(n/2)(log(2π) + log(σ²)) − ∑_{i=1}^n (xi − µ)²/(2σ²) ]
  = ∂/∂v [ −(n/2)(log(2π) + log(v)) − ∑_{i=1}^n (xi − µ)²/(2v) ]
  = −(n/2)(1/v) + (1/(2v²)) ∑_{i=1}^n (xi − µ)²
  = −(n/2)(1/σ²) + (1/(2σ⁴)) ∑_{i=1}^n (xi − µ)²

Page 34: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.3 Maximum Likelihood Principle

Setting both derivatives equal to 0 implies

∑_{i=1}^n (xi − µ)/σ² = 0 =⇒ µ̂ = (1/n) ∑_{i=1}^n xi = x̄n

−(n/2)(1/v) + (1/(2v²)) ∑_{i=1}^n (xi − µ̂)² = 0 =⇒ v̂ = σ̂² = (1/n) ∑_{i=1}^n (xi − µ̂)² = (1/n) ∑_{i=1}^n (xi − x̄n)²

Therefore, we obtain the estimators

µ̂ = (1/n) ∑_{i=1}^n Xi = X̄n

σ̂² = (1/n) ∑_{i=1}^n (Xi − X̄n)²

Note: Don't forget, estimators are random variables!
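A quick simulation sketch (assuming numpy; not from the slides) confirming that these formulas are just the sample mean and the 1/n-normalized sample variance:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=10_000)  # true µ = 2, σ² = 9

mu_hat = x.mean()                        # MLE of µ: the sample mean
sigma2_hat = ((x - mu_hat) ** 2).mean()  # MLE of σ²: (1/n) ∑ (xi − x̄)²

# np.var uses the same 1/n normalization by default (ddof=0)
print(mu_hat, sigma2_hat, np.var(x))
```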

Page 35: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.3 Maximum Likelihood Principle

Note:

E[µ̂] = E[(1/n) ∑_{i=1}^n Xi] = µ =⇒ µ̂ is unbiased.

But one can show that

E[σ̂²] = E[(1/n) ∑_{i=1}^n (Xi − X̄n)²] = ((n−1)/n) σ² =⇒ σ̂² is biased.

Observe: In this setting, S² := (1/(n−1)) ∑_{i=1}^n (Xi − X̄n)² is an unbiased estimator for σ².
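The bias factor (n − 1)/n can be seen in a Monte Carlo experiment; a sketch (numpy assumed), averaging both estimators over many samples of size n = 5:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, reps = 5, 4.0, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
sigma2_mle = samples.var(axis=1, ddof=0)  # (1/n) ∑ (Xi − X̄n)²
s2 = samples.var(axis=1, ddof=1)          # (1/(n−1)) ∑ (Xi − X̄n)²

print(sigma2_mle.mean())  # ≈ ((n−1)/n) σ² = 3.2  (biased low)
print(s2.mean())          # ≈ σ² = 4.0            (unbiased)
```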

Page 36: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.3 Maximum Likelihood Principle

Some issues to consider:

1 How do we guarantee that the MLE exists?
2 How do we guarantee that the MLE is unique?
3 How do we guarantee that the calculation of the MLE is tractable?
4 Is the likelihood function convex (related to uniqueness)?
5 Boundary conditions?
6 Numerical sensitivity: in many cases the likelihood function is flat...

These are not statistical questions, but mathematical ones, from functional analysis, convex analysis, ...

Page 37: Mathematical Statistics - Nanyang Technological University

6.1 Maximum Likelihood Estimation 6.1.3 Maximum Likelihood Principle

Cramér-Rao Bound (CRLB)


Page 38: Mathematical Statistics - Nanyang Technological University

6.2 Cramér-Rao Lower Bound 6.2.1 Introduction

Cramér-Rao Lower Bound (CRLB)

The Cramér-Rao Lower Bound (CRLB) sets a lower bound on the variance of any unbiased estimator. This can be extremely useful in several ways:

1 If we find an estimator that achieves the CRLB, then we know that we have found a Minimum Variance Unbiased Estimator (MVUE)!
2 The CRLB provides a benchmark against which we can compare the performance of any unbiased estimator (we know we're doing very well if our estimator is "close" to the CRLB).
3 The CRLB enables us to rule out impossible estimators: we know that it is physically impossible to find an unbiased estimator that beats the CRLB. This is useful in feasibility studies.
4 The theory behind the CRLB can tell us if an estimator exists which achieves the bound.

Page 42: Mathematical Statistics - Nanyang Technological University

6.2 Cramér-Rao Lower Bound 6.2.1 Introduction

Cramér-Rao Lower Bound (CRLB)

Theorem: Cramér-Rao Lower Bound

If θ̂ is any unbiased estimator of θ based on the random sample X, then the variance of the error in the estimator is bounded from below by the inverse of the Fisher information I:

E[‖θ̂ − θ‖²] = Var(θ̂) ≥ I⁻¹,

where I is given by

I = −E[ d² log p(X | θ) / dθ² ].

Page 43: Mathematical Statistics - Nanyang Technological University

6.2 Cramér-Rao Lower Bound 6.2.1 Introduction

Cramér-Rao Bound (CRLB)

Definition: Efficient Estimator
An unbiased estimator θ̂ is called efficient if

Var(θ̂) = I⁻¹

An efficient estimator is an unbiased estimator with the minimal possible variance.

Theorem: Sufficient condition for efficiency
If θ̂ is an unbiased estimator of θ and

∂ log p(Y | θ)/∂θ = c(θ) (θ̂ − θ),

then θ̂ is an efficient estimator.

Page 44: Mathematical Statistics - Nanyang Technological University

6.2 Cramér-Rao Lower Bound 6.2.2 Examples

Example
Suppose that X ∼ Bin(m, p), where m is known. The pmf is given by

p(x; p) = (m choose x) p^x (1−p)^(m−x),  x = 0, 1, . . . , m.

Find the CRLB.

Note: The range of X depends on m, but not on the unknown parameter p. Also, the sample size equals n = 1.

Page 45: Mathematical Statistics - Nanyang Technological University

6.2 Cramér-Rao Lower Bound 6.2.2 Examples

Solution:

The log-likelihood is given by

log p(x; p) = log (m choose x) + x log p + (m − x) log(1 − p)

The first derivative is given by

∂ log p(x; p)/∂p = x/p − (m − x)/(1 − p)

The second derivative is given by

∂² log p(x; p)/∂p² = −x/p² − (m − x)/(1 − p)²

Page 46: Mathematical Statistics - Nanyang Technological University

6.2 Cramér-Rao Lower Bound 6.2.2 Examples

Therefore the Fisher information I satisfies

I := −E[ −X/p² − (m − X)/(1 − p)² ] = E[X]/p² + (m − E[X])/(1 − p)²
   = mp/p² + (m − mp)/(1 − p)²
   = m/(p(1 − p))

It follows that the CRLB is given by

Var(p̂) ≥ I⁻¹ = p(1 − p)/m
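The natural unbiased estimator here is p̂ = X/m, whose variance is mp(1 − p)/m² = p(1 − p)/m, so it attains the bound exactly. A Monte Carlo sketch (numpy assumed; the efficiency claim is standard but not spelled out on the slide):

```python
import numpy as np

rng = np.random.default_rng(2)
m, p, reps = 20, 0.3, 500_000

x = rng.binomial(m, p, size=reps)
p_hat = x / m  # unbiased estimator of p

crlb = p * (1 - p) / m
print(p_hat.var(), crlb)  # both ≈ 0.0105: p̂ attains the CRLB
```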

Page 47: Mathematical Statistics - Nanyang Technological University

6.2 Cramér-Rao Lower Bound 6.2.2 Examples

Cramér-Rao Bound (CRLB)

Example
Consider n observations such that

Yk = m + Wk,  k = 1, . . . , n,

where Wk i.i.d.∼ N(0, σ²).

1 Find the MLE for m.
2 Is m̂ an efficient estimator?

Page 49: Mathematical Statistics - Nanyang Technological University

6.2 Cramér-Rao Lower Bound 6.2.2 Examples

Cramér-Rao Bound (CRLB)

Solution:

1) As Yk i.i.d.∼ N(m, σ²), we know from Slide 18 that

m̂ = (∑_{i=1}^n Yi)/n = Ȳn

2) m̂ is unbiased, as E[m̂] = (1/n) ∑_{i=1}^n E[Yi] = m.
Moreover, from the calculation on Slides 16–17,

∂ log p(Y | m, σ²)/∂m = ∑_{i=1}^n (Yi − m)/σ² = (n/σ²) ( (1/n) ∑_{i=1}^n Yi − m ) = c (m̂ − m)

=⇒ m̂ is an efficient estimator.
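A simulation sketch (numpy assumed) of this efficiency statement: over many replications, the variance of m̂ = Ȳn matches the CRLB σ²/n.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, sigma2, reps = 10, 1.5, 2.0, 200_000

y = m + rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
m_hat = y.mean(axis=1)  # the MLE Ȳn in each replication

print(m_hat.var(), sigma2 / n)  # both ≈ 0.2: Ȳn attains the CRLB σ²/n
```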

Page 50: Mathematical Statistics - Nanyang Technological University

6.2 Cramér-Rao Lower Bound 6.2.2 Examples

Properties of MLE


Page 51: Mathematical Statistics - Nanyang Technological University

6.2 Cramér-Rao Lower Bound 6.2.2 Examples

The concept of MLE makes sense, but can we scientifically justify it?

Bad news: no optimum properties for finite samples.
Good news: it has a few attractive limiting properties.

Page 54: Mathematical Statistics - Nanyang Technological University

6.2 Cramér-Rao Lower Bound 6.2.2 Examples

Properties of MLE

What are the criteria for a "good" estimator?
- Unbiasedness.
- Consistency.
- (Asymptotic) normality.
- (Asymptotic) efficiency.

Page 58: Mathematical Statistics - Nanyang Technological University

6.2 Cramér-Rao Lower Bound 6.2.2 Examples

The MLE satisfies the following 4 asymptotic properties (under some additional regularity and integrability conditions):

Consistency: the sequence of MLEs converges in probability to the value being estimated,

lim_{n→∞} P(|θ̂(n) − θ| > ε) = 0  ∀ε > 0.

Asymptotic unbiasedness: the MLE satisfies

lim_{n→∞} E(θ̂(n) − θ) = 0

Page 59: Mathematical Statistics - Nanyang Technological University

6.2 Cramér-Rao Lower Bound 6.2.2 Examples

Asymptotic normality: a consistent estimator is called asymptotically normal if for some σ²∞ > 0 the limiting distribution of √n (θ̂(n) − θ) is N(0, σ²∞), i.e.

√n (θ̂(n) − θ) →d N(0, σ²∞) as n → ∞.

Asymptotic efficiency: moreover, we call a consistent estimator asymptotically efficient if σ²∞ = I⁻¹, meaning that

√n (θ̂(n) − θ) →d N(0, I⁻¹) as n → ∞.
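These limits can be watched in simulation. A sketch (numpy assumed, not from the slides) for the Bernoulli MLE p̂ = X̄n, whose per-observation Fisher information is I = 1/(p(1 − p)):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, reps = 2_000, 0.3, 100_000

x = rng.binomial(1, p, size=(reps, n))
p_hat = x.mean(axis=1)        # the MLE of p in each replication
z = np.sqrt(n) * (p_hat - p)  # √n (θ̂(n) − θ)

# Asymptotic efficiency predicts Var(z) → I⁻¹ = p(1 − p)
print(z.mean(), z.var(), p * (1 - p))  # ≈ 0, ≈ 0.21, 0.21
```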

Page 60: Mathematical Statistics - Nanyang Technological University

6.2 Cramér-Rao Lower Bound 6.2.2 Examples

Method of Moments Estimator


Page 61: Mathematical Statistics - Nanyang Technological University

6.3 Method of Moments

Method of Moments Estimator

Facts:
- Moments give good (but not always full!) information about a distribution.
- If the distribution has bounded support, then the moments uniquely determine the law.

Idea:
=⇒ match sample moments with population moments

Theorem: Law of Large Numbers
Let X1, . . . , Xn be i.i.d. random variables with E[|X1|] < ∞ and denote the mean µ = E[X1]. Then

(1/n) ∑_{i=1}^n Xi → µ in probability.
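A quick illustration of the theorem (a sketch, numpy assumed): the running sample mean of i.i.d. Exp(1) draws drifts toward µ = 1.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=100_000)  # i.i.d. with mean µ = 1

running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
print(running_mean[[9, 99, 9_999, 99_999]])  # approaches 1.0
```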

Page 62: Mathematical Statistics - Nanyang Technological University

6.3 Method of Moments

Method of Moments

Let X1, X2, . . . , Xn be a sample from a population with pdf or pmf p(x | θ1, θ2, . . . , θk). Let the unknown parameter θ = (θ1, θ2, . . . , θk) be k-dimensional.

The method of moments estimator is found by:
1) equating the first k sample moments to the corresponding k population moments,
2) solving the resulting system of simultaneous equations.

The k-th theoretical/population moment of this random variable is defined as

µk = E[X^k] = ∫ x^k p(x | θ1, θ2, . . . , θk) dx  if X is continuous,
µk = E[X^k] = ∑_x x^k p(x | θ1, θ2, . . . , θk)   if X is discrete.

If X1, X2, . . . , Xn are i.i.d. random variables from that distribution, the k-th sample moment is defined as

mk = (1/n) ∑_{i=1}^n Xi^k,

thus mk can be viewed as an estimator for µk. From the law of large numbers, we have mk → µk in probability as n → ∞.

Page 63: Mathematical Statistics - Nanyang Technological University

6.3 Method of Moments

Method of Moments:

E[X]   = (1/n) ∑_{i=1}^n Xi
E[X²]  = (1/n) ∑_{i=1}^n Xi²
  ...
E[X^k] = (1/n) ∑_{i=1}^n Xi^k

=⇒ Solve this set of k equations and find θ̂1, . . . , θ̂k.
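For a one-parameter model this system is a single equation, E_θ[X] = m1, which can be solved numerically when no closed form exists. A generic sketch (scipy assumed; mom_estimate and theoretical_mean are hypothetical names introduced here, not from the slides):

```python
import numpy as np
from scipy.optimize import brentq

def mom_estimate(data, theoretical_mean, lo, hi):
    """Solve theoretical_mean(theta) = sample mean for theta on (lo, hi)."""
    m1 = np.mean(data)
    return brentq(lambda theta: theoretical_mean(theta) - m1, lo, hi)

# Example: Exp(θ) has E[X] = 1/θ; here the true rate is θ = 0.5
data = np.random.default_rng(6).exponential(scale=2.0, size=10_000)
print(mom_estimate(data, lambda t: 1 / t, 1e-6, 100.0))  # ≈ 0.5
```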

Page 64: Mathematical Statistics - Nanyang Technological University

6.3 Method of Moments

Example
Suppose that X is a discrete random variable with the following probability mass function:

  x      0      1      2           3
  p(x)   2θ/3   θ/3    2(1−θ)/3    (1−θ)/3

where θ is a parameter in (0, 1). The following 10 independent observations were taken from such a distribution:
x = (x1, . . . , x10) = (3, 0, 2, 1, 3, 2, 1, 0, 2, 1).

Find a point estimate of θ using the method of moments and the MLE.

Page 65: Mathematical Statistics - Nanyang Technological University

6.3 Method of Moments

Solution:

We have only a single parameter to estimate
=⇒ we need to calculate only the first moment.

The theoretical mean value is

E[X] = ∑_{x=0}^3 x p(x; θ) = 0 · (2θ/3) + 1 · (θ/3) + 2 · (2(1−θ)/3) + 3 · ((1−θ)/3) = 7/3 − 2θ

The sample mean is

x̄ = (1/n) ∑_{i=1}^n xi = (3 + 0 + 2 + 1 + 3 + 2 + 1 + 0 + 2 + 1)/10 = 1.5

We solve the single equation

7/3 − 2θ = 1.5

and find that θ̂ = 5/12.
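Checking this in a few lines of Python (a sketch, numpy assumed): solve 7/3 − 2θ = x̄ for θ.

```python
import numpy as np

x = np.array([3, 0, 2, 1, 3, 2, 1, 0, 2, 1])
theta_mom = (7 / 3 - x.mean()) / 2
print(theta_mom, 5 / 12)  # both ≈ 0.41666...
```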

Page 66: Mathematical Statistics - Nanyang Technological University

6.3 Method of Moments

The likelihood function of X given the observations
x = (x1, . . . , x10) = (3, 0, 2, 1, 3, 2, 1, 0, 2, 1) is

L(θ; x) = ∏_{i=1}^n p(xi | θ)
        = p(X = 3 | θ) p(X = 0 | θ) p(X = 2 | θ) p(X = 1 | θ) p(X = 3 | θ)
          × p(X = 2 | θ) p(X = 1 | θ) p(X = 0 | θ) p(X = 2 | θ) p(X = 1 | θ)
        = (2θ/3)² (θ/3)³ (2(1−θ)/3)³ ((1−θ)/3)²

θ̂ = arg max_{θ ∈ (0,1)} (2θ/3)² (θ/3)³ (2(1−θ)/3)³ ((1−θ)/3)²

Clearly, the likelihood function is not easy to maximize.
Let's look at the log-likelihood.

Page 67: Mathematical Statistics - Nanyang Technological University

6.3 Method of Moments

The log-likelihood function of X given the observations
x = (x1, . . . , x10) = (3, 0, 2, 1, 3, 2, 1, 0, 2, 1) is

log L(θ) = log ∏_{i=1}^n p(xi | θ)
         = 2 (log(2/3) + log θ) + 3 (log(1/3) + log θ) + 3 (log(2/3) + log(1−θ)) + 2 (log(1/3) + log(1−θ))
         = Constant + 5 log θ + 5 log(1−θ)

Setting the derivative to 0 and solving:

d log L(θ)/dθ = 5 (1/θ − 1/(1−θ)) = 0

θ̂ = 0.5

(The Method of Moments yields θ̂ = 5/12, which is different from the MLE.)

Page 68: Mathematical Statistics - Nanyang Technological University

6.3 Method of Moments

Example

Use the Method of Moments to estimate the parameters µ and σ² for the normal density

p(x | µ, σ²) = (1/(√(2π) σ)) exp(−(x − µ)²/(2σ²))

based on an i.i.d. random sample X1, . . . , Xn.

Page 69: Mathematical Statistics - Nanyang Technological University

6.3 Method of Moments

Solution:

The first and second theoretical moments for the normal distribution are

µ1 = E[X] = µ
µ2 = E[X²] = σ² + µ².

The first and second sample moments are

m1 = (1/n) ∑_{i=1}^n Xi
m2 = (1/n) ∑_{i=1}^n Xi².

Page 70: Mathematical Statistics - Nanyang Technological University

6.3 Method of Moments

Solving the equations

µ = (1/n) ∑_{i=1}^n Xi
σ² + µ² = (1/n) ∑_{i=1}^n Xi²,

we obtain the Method of Moments estimators

µ̂ = (1/n) ∑_{i=1}^n Xi

σ̂² = (1/n) ∑_{i=1}^n Xi² − ((1/n) ∑_{i=1}^n Xi)² = (1/n) ∑_{i=1}^n Xi² − (X̄n)² = (1/n) ∑_{i=1}^n (Xi − X̄n)²

In this case the MLE and the MME yield the same estimators.
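The algebraic identity m2 − m1² = (1/n) ∑ (Xi − X̄n)² used in the last step is easy to verify numerically (numpy assumed):

```python
import numpy as np

x = np.random.default_rng(7).normal(size=1_000)
print(np.mean(x**2) - np.mean(x)**2)  # m2 − m1²
print(np.mean((x - x.mean())**2))     # (1/n) ∑ (xi − x̄)², equal up to rounding
```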

Page 71: Mathematical Statistics - Nanyang Technological University

6.3 Method of Moments

Example
Let X1, . . . , Xn be i.i.d. samples from a uniform distribution on the interval [a, b], that is

p(x | a, b) = 1/(b − a)  if a ≤ x ≤ b,
              0          otherwise.

Find the Method of Moments estimator for a, b.

Page 72: Mathematical Statistics - Nanyang Technological University

6.3 Method of Moments

Solution:

The first two moments are

µ1 = E[X] = ∫_a^b x · 1/(b − a) dx = (a + b)/2

µ2 = E[X²] = ∫_a^b x² · 1/(b − a) dx = (a² + ab + b²)/3.

The corresponding sample moments are

m1 = (1/n) ∑_{i=1}^n Xi

m2 = (1/n) ∑_{i=1}^n Xi²

Page 73: Mathematical Statistics - Nanyang Technological University

6.3 Method of Moments

We solve the equations

µ1 = m1
µ2 = m2

and obtain

â = m1 − √(3 (m2 − m1²))

b̂ = m1 + √(3 (m2 − m1²))
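A sketch (numpy assumed) applying these formulas to simulated Uniform[2, 5] data:

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.uniform(2.0, 5.0, size=10_000)

m1, m2 = x.mean(), np.mean(x**2)
half_width = np.sqrt(3 * (m2 - m1**2))  # √(3(m2 − m1²))
a_hat, b_hat = m1 - half_width, m1 + half_width
print(a_hat, b_hat)  # ≈ 2 and ≈ 5
```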

Page 74: Mathematical Statistics - Nanyang Technological University

6.4 Examples: MLE and Methods of Moments

Example
Let X1, . . . , Xn be i.i.d. samples from a beta distribution (X ∼ β(θ, 1)) with pdf

p(x | θ) = θ x^(θ−1),  0 ≤ x ≤ 1,  0 < θ < ∞

1 Find the MLE for θ.
2 Find the Method of Moments estimator for θ.

Page 75: Mathematical Statistics - Nanyang Technological University

6.4 Examples: MLE and Methods of Moments

The likelihood function is given by

p(x | θ) = ∏_{i=1}^n θ xi^(θ−1) = θ^n ∏_{i=1}^n xi^(θ−1) = θ^n (∏_{i=1}^n xi)^(θ−1)

The derivative of its logarithm is given by

d/dθ log p(x | θ) = d/dθ log( θ^n (∏_{i=1}^n xi)^(θ−1) )
                  = d/dθ ( n log θ + (θ − 1) ∑_{i=1}^n log xi )
                  = n/θ + ∑_{i=1}^n log xi

Page 76: Mathematical Statistics - Nanyang Technological University

6.4 Examples: MLE and Methods of Moments

Set the derivative equal to zero, solve for θ, and replace xi by Xi to obtain

θ̂ = −n / ∑_{i=1}^n log Xi

Is this the maximum? Let's calculate the second derivative:

d²/dθ² log p(x | θ) = d/dθ ( n/θ + ∑_{i=1}^n log xi ) = −n/θ² < 0,

so this is the MLE.

Page 77: Mathematical Statistics - Nanyang Technological University

6.4 Examples: MLE and Methods of Moments

The Method of Moments for θ:

The first moment of X ∼ β(θ, 1) is

E[X] = θ/(θ + 1)

The first sample moment is

m1 = (1/n) ∑_{i=1}^n Xi

We solve the equation

θ/(θ + 1) = (1/n) ∑_{i=1}^n Xi

which yields

θ̂ = ∑_{i=1}^n Xi / (n − ∑_{i=1}^n Xi)
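A sketch (numpy assumed) comparing the two estimators on simulated β(θ, 1) data:

```python
import numpy as np

rng = np.random.default_rng(9)
theta, n = 3.0, 10_000
x = rng.beta(theta, 1.0, size=n)  # density θ x^(θ−1) on [0, 1]

theta_mle = -n / np.sum(np.log(x))   # MLE: −n / ∑ log Xi
theta_mom = x.sum() / (n - x.sum())  # MoM: ∑ Xi / (n − ∑ Xi)
print(theta_mle, theta_mom)          # both ≈ 3
```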

Page 78: Mathematical Statistics - Nanyang Technological University

6.4 Examples: MLE and Methods of Moments

Objectives

Now you should be able to:
- Understand the likelihood principle
- Understand how to formulate the MLE procedure
- Apply the CRLB
- Understand how to formulate the Method of Moments estimation procedure

Put yourself to the test! Q7.1 p.355, Q7.2 p.355, Q7.6 p.355, Q7.8 p.355, Q7.10 p.355, Q7.15 p.355