July 3, 20151 732A36 Theory of Statistics Course within the Master’s program in Statistics and...

April 19, 2023 1

732A36 Theory of Statistics

Course within the Master’s program in Statistics and Data mining

Fall semester 2011


Department of Computer and Information Science (IDA) Linköpings universitet, Sweden

Course details

Course web: www.ida.liu.se/~732A36

Course responsible, tutor and examiner: Anders Nordgaard

Course period: Nov 2011-Jan 2012

Examination: Written exam in January 2012, compulsory assignments

Course literature: Garthwaite PH, Jolliffe IT and Jones B (2002). Statistical Inference. 2nd ed. Oxford University Press, Oxford. ISBN 0-19-857226-3

http://www.ida.liu.se/~732A36



Course contents

Statistical inference in general

Point estimation (unbiasedness, consistency, efficiency, sufficiency, completeness)

Information and likelihood concepts

Maximum-likelihood and Method-of-moment estimation

Classical hypothesis testing (Power functions, the Neyman-Pearson lemma, Maximum Likelihood Ratio Tests, Wald’s test)

Confidence intervals

…



Course contents, cont.

Statistical decision theory (Loss functions, Risk concepts, Prior distributions, Sequential tests)

Bayesian inference (Estimation, Hypothesis testing, Credible intervals, Predictive distributions)

Non-parametric inference

Computer intensive methods for estimation



Details about teaching and examination

Teaching is (as usual) sparse: a mixture of lectures and problem seminars

Lectures: Overview and some details of each chapter covered. No full coverage of the contents!

Problem seminars: Discussions about solutions to recommended exercises. Students should be prepared to provide solutions on the board!

Towards the end of the course a couple of larger compulsory assignments (whose solutions need to be worked out with the help of a computer) will be distributed.

The course concludes with a written exam



Prerequisites

Good understanding of calculus and algebra

Good understanding of the concepts of expectations (including variance calculations)

Familiarity with families of probability distributions (Normal, Exponential, Binomial, Poisson, Gamma (Chi-square), Beta, …)

Skills in computer programming (e.g. with R, SAS, Matlab)



Statistical inference in general

Population

Sample

Model

Conclusions about the population are drawn from the sample with assistance from a specified model



The two paradigms: Neyman-Pearson (frequentistic) and Bayesian

Population

Sample

Model

• Neyman-Pearson: Model specifies the probability distribution for data obtained in a sample, including a number of unknown population parameters

• Bayesian: Model specifies the probability distribution for data obtained in a sample and a probability distribution (prior) for each of the unknown population parameters of that distribution



How is inference made?

Point estimation: Find the “best” approximation of an unknown population parameter

Interval estimation: Find a range of values that with high certainty covers the unknown population parameter. Can be extended to regions if the parameter is multidimensional

Hypothesis testing: Give statements about the population (values of parameters, probability distributions, issues of independence,…) along with a quantitative measure of “certainty”



Tools for making inference

Criteria for a point estimate to be “good”

“Algorithmic” methods to find point estimates (Maximum Likelihood, Least Squares, Method-of-Moments)

Classical methods of constructing hypothesis tests (Neyman-Pearson lemma, Maximum Likelihood Ratio Test, …)

Classical methods to construct confidence intervals (regions)

Decision theory (make use of loss and risk functions, utility and cost) to find point estimates and hypothesis tests

Using prior distributions to construct tests, credible intervals and predictive distributions (Bayesian inference)



Tools for making inference…

Using theory of randomization to form non-parametric tests (tests not depending on any probability distribution behind data)

Computer intensive methods (bootstrap and cross-validation techniques)

Advanced models from data that make use of auxiliary information (explanatory variables): Generalized linear models, Generalized additive models, Spatio-temporal models, …



The univariate population-sample model

The population to be investigated is such that the values that come out in a sample $x_1, x_2, \dots$ are governed by a probability distribution

The probability distribution is represented by a probability density (or mass) function $f(x)$

Alternatively, the sample values can be seen as the outcomes of independent random variables $X_1, X_2, \dots$ all with probability density (or mass) function $f(x)$



Point estimation (frequentistic paradigm)

We have a sample $\mathbf{x} = (x_1, \dots, x_n)$ from a population

The population contains an unknown parameter $\theta$

The functional forms of the distributional functions may be known or unknown, but they depend on the unknown $\theta$.

Denote generally by $f(x; \theta)$ the probability density or mass function of the distribution

A point estimate of $\theta$ is a function of the sample values

$$\hat\theta = \hat\theta(x_1, \dots, x_n) = \hat\theta(\mathbf{x})$$

such that its values should be close to the unknown $\theta$.

“Standard” point estimates

The sample mean $\bar{x}$ is a point estimate of the population mean $\mu$:

$$\hat\mu = \hat\mu(x_1, \dots, x_n) = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The sample variance $s^2$ is a point estimate of the population variance $\sigma^2$:

$$\hat\sigma^2 = \hat\sigma^2(x_1, \dots, x_n) = s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

The sample proportion $p$ of a specific event (a specific value or range of values) is a point estimate of the corresponding population proportion $\pi$:

$$\hat\pi = \hat\pi(x_1, \dots, x_n) = p = \frac{\#\{x_i : \text{event is satisfied}\}}{n}$$
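The three standard estimates can be computed in a few lines. The following is only a minimal sketch: the function name `point_estimates`, the data, and the event "value greater than 3" are made up for illustration and do not come from the course material.

```python
from statistics import mean, variance

def point_estimates(sample, event):
    """Return (sample mean, sample variance, sample proportion of `event`)."""
    n = len(sample)
    x_bar = mean(sample)                          # (1/n) * sum of x_i
    s2 = variance(sample)                         # divides by n - 1
    p = sum(1 for x in sample if event(x)) / n    # share of observations satisfying the event
    return x_bar, s2, p

sample = [2, 4, 4, 6]                             # made-up data, for illustration only
x_bar, s2, p = point_estimates(sample, lambda x: x > 3)
print(x_bar, s2, p)
```

Note that `statistics.variance` uses the divisor $n-1$, in line with the definition of $s^2$ above.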



Assessing a point estimate

A point estimate has a sampling distribution

Replace the sample observations $x_1, \dots, x_n$ with their corresponding random variables $X_1, \dots, X_n$ in the functional expression:

$$\hat\theta = \hat\theta(X_1, \dots, X_n)$$

The point estimate is a random variable that is observed in the sample (point estimator)

As a random variable the point estimator must have a probability distribution that can be deduced from $f(x; \theta)$

The point estimator/estimate is assessed by investigating its sampling distribution, in particular the mean and the variance.



Unbiasedness

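A point estimator $\hat\theta$ is unbiased for $\theta$ if the mean of its sampling distribution is equal to $\theta$:

$$E(\hat\theta) = E(\hat\theta(X_1, \dots, X_n)) = \theta$$

The bias of a point estimate $\hat\theta$ for $\theta$ is

$$bias(\hat\theta) = E(\hat\theta) - \theta$$

Thus, a point estimate with bias = 0 is unbiased, otherwise it is biased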



Examples (within the univariate population-sample model)

The sample mean is always unbiased for estimating the population mean:

$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu$$

Is the sample mean an unbiased estimate of the population median?

Why do we divide by $n-1$ in the sample variance (and not by $n$)?

$$E\left(\sum_{i=1}^{n} (X_i - \bar{X})^2\right) = \sum_{i=1}^{n} E(X_i^2) - n E(\bar{X}^2)$$
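Since $E(X_i^2) = \sigma^2 + \mu^2$ and $E(\bar{X}^2) = \sigma^2/n + \mu^2$, the expectation above equals $(n-1)\sigma^2$, which is why the divisor $n-1$ gives an unbiased estimator. The sketch below checks this exactly by enumerating every possible sample. The population (a fair six-sided die), the sample size $n = 3$ and the function name are assumptions chosen only to keep the enumeration small.

```python
from fractions import Fraction
from itertools import product

def expected_variance_estimate(values, n, divisor):
    """Exact E[sum (X_i - Xbar)^2 / divisor] for X_1..X_n i.i.d. uniform on `values`."""
    total = Fraction(0)
    for sample in product(values, repeat=n):      # every equally likely sample
        x_bar = Fraction(sum(sample), n)
        total += sum((x - x_bar) ** 2 for x in sample) / divisor
    return total / len(values) ** n

values = [1, 2, 3, 4, 5, 6]                       # a fair die, for illustration only
n = 3
mu = Fraction(sum(values), len(values))
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)   # population variance

unbiased = expected_variance_estimate(values, n, n - 1)     # divisor n - 1
biased = expected_variance_estimate(values, n, n)           # divisor n
print(sigma2, unbiased, biased)
```

With the divisor $n$ the expectation comes out as $\frac{n-1}{n}\sigma^2$, so that estimator has bias $-\sigma^2/n$.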



Consistency

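A point estimator $\hat\theta$ is (weakly) consistent if

$$\Pr\left(|\hat\theta - \theta| > \varepsilon\right) \to 0 \text{ as } n \to \infty \text{ for any } \varepsilon > 0$$

Thus, the point estimator should converge in probability to $\theta$

Theorem: A point estimator is consistent if

$$bias(\hat\theta) \to 0 \text{ and } Var(\hat\theta) \to 0 \text{ as } n \to \infty$$

Proof: Use Chebyshev’s inequality in terms of

$$MSE(\hat\theta) = E\left((\hat\theta - \theta)^2\right) = Var(\hat\theta) + \left(bias(\hat\theta)\right)^2$$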
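To make the definition concrete, the probability $\Pr(|\hat\theta - \theta| > \varepsilon)$ can be estimated by simulation for increasing $n$. This is a rough sketch under assumptions that are not part of the slides: the estimator is the sample mean, the population is exponential with mean $\theta = 1$, and the seed, $\varepsilon$ and number of replicates are arbitrary.

```python
import random

def exceed_frequency(n, eps, theta=1.0, reps=500, seed=1):
    """Estimate Pr(|Xbar - theta| > eps) for samples of size n
    from an exponential population with mean theta (assumed example)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # random.expovariate takes the rate, i.e. 1 / mean
        x_bar = sum(rng.expovariate(1.0 / theta) for _ in range(n)) / n
        if abs(x_bar - theta) > eps:
            hits += 1
    return hits / reps

freq_small_n = exceed_frequency(n=10, eps=0.2)
freq_large_n = exceed_frequency(n=1000, eps=0.2)
print(freq_small_n, freq_large_n)
```

The estimated frequency should be clearly smaller for $n = 1000$ than for $n = 10$, as expected for a consistent estimator.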



Examples

The sample mean is a consistent estimator of the population mean. What probability law can be applied?

What do we require for the sample variance to be a consistent estimator of the population variance?

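$$Var(s^2) = \frac{1}{(n-1)^2} Var\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right) = \frac{1}{(n-1)^2}\left[ Var\left(\sum_{i=1}^{n} X_i^2\right) + n^2 Var(\bar{X}^2) - 2n\, Cov\left(\sum_{i=1}^{n} X_i^2, \bar{X}^2\right)\right] = \dots$$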



Efficiency

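Assume we have two unbiased estimators of $\theta$, i.e.

$$\hat\theta_1, \hat\theta_2 : \; E(\hat\theta_1) = E(\hat\theta_2) = \theta$$

If $Var(\hat\theta_1) \le Var(\hat\theta_2)$, with strict inequality for at least one value of $\theta$, then $\hat\theta_1$ is said to be more efficient than $\hat\theta_2$

The efficiency of an unbiased estimator is defined as

$$eff(\hat\theta_i) = \frac{\min_j Var(\hat\theta_j)}{Var(\hat\theta_i)} \le 1$$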



Example

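Let (with population mean $\mu$ and population variance $\sigma^2$)

$$\hat\mu_1 = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \, ; \quad n > 2 \quad \text{and} \quad \hat\mu_2 = \frac{X_1 + X_n}{2}$$

Both estimators are unbiased:

$$E(\hat\mu_1) = E(\bar{X}) = \mu \, ; \quad E(\hat\mu_2) = \frac{E(X_1) + E(X_n)}{2} = \frac{\mu + \mu}{2} = \mu$$

$$Var(\hat\mu_1) = Var(\bar{X}) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}$$

$$Var(\hat\mu_2) = \frac{1}{4}\left(Var(X_1) + Var(X_n)\right) = \frac{2\sigma^2}{4} = \frac{\sigma^2}{2}$$

$\hat\mu_1$ is more efficient than $\hat\mu_2$ since $\sigma^2/n < \sigma^2/2$ for $n > 2$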



Likelihood function

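For a sample $\mathbf{x}$ the likelihood function for $\theta$ is defined as

$$L(\theta; \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \theta)$$

the log-likelihood function is

$$l(\theta; \mathbf{x}) = \ln L(\theta; \mathbf{x}) = \sum_{i=1}^{n} \ln f(x_i; \theta)$$

Both measure how likely (or expected) the sample is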
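The two definitions translate directly into code. The sketch below is illustrative only: the function names, the made-up observations, and the choice of an exponential density with mean $\theta$ as the example $f(x; \theta)$ are assumptions, not something this slide specifies.

```python
import math

def likelihood(pdf, theta, sample):
    """L(theta; x): product of the density values f(x_i; theta)."""
    return math.prod(pdf(x, theta) for x in sample)

def log_likelihood(pdf, theta, sample):
    """l(theta; x): sum of ln f(x_i; theta)."""
    return sum(math.log(pdf(x, theta)) for x in sample)

def exp_pdf(x, theta):
    """Assumed example density: exponential with mean theta."""
    return math.exp(-x / theta) / theta

sample = [0.5, 1.2, 3.1]                  # made-up observations
for theta in (0.5, 1.6, 5.0):
    print(theta, likelihood(exp_pdf, theta, sample), log_likelihood(exp_pdf, theta, sample))
```

Working with the sum of logarithms rather than the product avoids numerical underflow when $n$ is large.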



Fisher information

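The (Fisher) Information about $\theta$ contained in a sample $\mathbf{x}$ is defined as

$$I_\theta = E\left[\left(\frac{\partial l(\theta; \mathbf{X})}{\partial\theta}\right)^2\right] = E\left[\left(\frac{\partial l(\theta; X_1, \dots, X_n)}{\partial\theta}\right)^2\right]$$

Theorem: Under some regularity conditions (interchangeability of integration and differentiation)

$$I_\theta = -E\left[\frac{\partial^2 l(\theta; \mathbf{X})}{\partial\theta^2}\right]$$

In particular the range of $X$ cannot depend on $\theta$ (such as in a population where $X \sim U(0, \theta)$)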



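Why is it a measure of information for $\theta$?

$L(\theta; \mathbf{x})$ and $l(\theta; \mathbf{x})$ are related to the probability of the obtained sample.

How does this probability change with $\theta$? This is measured by $\partial l / \partial\theta$

If $\partial l / \partial\theta$ is close to 0: The probability is not much affected by slight changes of $\theta$. The sample does not contain much information about $\theta$

If $\partial l / \partial\theta$ is largely positive or largely negative: The probability changes a lot if $\theta$ is slightly changed.

$\left(\partial l / \partial\theta\right)^2$ measures the amount of information about $\theta$ in the particular sample

$E\left[\left(\partial l / \partial\theta\right)^2\right]$ measures generally the amount of information about $\theta$ in a sample from the current distribution.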

April 19, 2023 25


Example

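$X \sim Exp(\theta)$, i.e. $f(x; \theta) = \theta^{-1} e^{-x/\theta}$

$$L(\theta; \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} \theta^{-1} e^{-x_i/\theta} = \theta^{-n} e^{-\frac{1}{\theta}\sum_{i=1}^{n} x_i}$$

$$l(\theta; \mathbf{x}) = \ln L(\theta; \mathbf{x}) = -n \ln\theta - \frac{1}{\theta}\sum_{i=1}^{n} x_i$$

$f(x; \theta)$ fulfills the regularity conditions

$$\frac{\partial l}{\partial\theta} = -\frac{n}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^{n} x_i$$

$$\frac{\partial^2 l}{\partial\theta^2} = \frac{n}{\theta^2} - \frac{2}{\theta^3}\sum_{i=1}^{n} x_i$$

$$I_\theta = -E\left[\frac{n}{\theta^2} - \frac{2}{\theta^3}\sum_{i=1}^{n} X_i\right] = -\frac{n}{\theta^2} + \frac{2}{\theta^3}\sum_{i=1}^{n} E(X_i) = -\frac{n}{\theta^2} + \frac{2n\theta}{\theta^3} = \frac{n}{\theta^2}$$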
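The result $I_\theta = n/\theta^2$ can be cross-checked numerically from the first definition of the information, $E[(\partial l/\partial\theta)^2]$. The sketch below integrates the squared score of a single observation with a plain midpoint rule and multiplies by $n$, using the fact that information adds over independent observations. The function name, the truncation point $60\theta$, the step count and the values $\theta = 2$, $n = 25$ are arbitrary choices for illustration.

```python
import math

def fisher_information_one_obs(theta, steps=200_000):
    """Approximate E[(d/dtheta ln f(X; theta))^2] for f(x; theta) = exp(-x/theta)/theta."""
    upper = 60.0 * theta                  # density mass beyond this point is negligible
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h                 # midpoint of the k-th subinterval
        score = -1.0 / theta + x / theta ** 2     # derivative of ln f(x; theta) w.r.t. theta
        density = math.exp(-x / theta) / theta
        total += score ** 2 * density * h
    return total

theta, n = 2.0, 25
info_sample = n * fisher_information_one_obs(theta)   # information adds over independent observations
print(info_sample, n / theta ** 2)
```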

April 19, 2023 26


Cramér-Rao inequality

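Under the same regularity conditions as for the previous theorem the following holds for any unbiased estimator $\hat\theta$:

$$Var(\hat\theta) \ge \frac{1}{I_\theta}$$

The lower bound is attained if and only if

$$\frac{\partial l}{\partial\theta} = I_\theta \left(\hat\theta - \theta\right)$$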

April 19, 2023 27


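Proof:

$$E(\hat\theta) = E\left(\hat\theta(X_1, \dots, X_n)\right) = \int_{y_1} \cdots \int_{y_n} \hat\theta(y_1, \dots, y_n)\, f(y_1; \theta) \cdots f(y_n; \theta)\, dy_1 \cdots dy_n$$

$$= \int_{y_1} \cdots \int_{y_n} \hat\theta(y_1, \dots, y_n)\, L(\theta; y_1, \dots, y_n)\, dy_1 \cdots dy_n = \int_{\mathbf{y}} \hat\theta(\mathbf{y})\, L(\theta; \mathbf{y})\, d\mathbf{y}$$

But as $\hat\theta$ is unbiased, $E(\hat\theta) = \theta$, so

$$1 = \frac{\partial E(\hat\theta)}{\partial\theta} = \frac{\partial}{\partial\theta} \int_{\mathbf{y}} \hat\theta(\mathbf{y})\, L(\theta; \mathbf{y})\, d\mathbf{y} = [\text{Regularity conditions}] = \int_{\mathbf{y}} \hat\theta(\mathbf{y})\, \frac{\partial L(\theta; \mathbf{y})}{\partial\theta}\, d\mathbf{y}$$

Now as $\frac{d \ln g(x)}{dx} = \frac{g'(x)}{g(x)}$:

$$\frac{\partial l(\theta; \mathbf{y})}{\partial\theta} = \frac{\partial L(\theta; \mathbf{y}) / \partial\theta}{L(\theta; \mathbf{y})} \;\Rightarrow\; \frac{\partial L(\theta; \mathbf{y})}{\partial\theta} = \frac{\partial l(\theta; \mathbf{y})}{\partial\theta}\, L(\theta; \mathbf{y})$$

$$\Rightarrow\; \int_{\mathbf{y}} \hat\theta(\mathbf{y})\, \frac{\partial l(\theta; \mathbf{y})}{\partial\theta}\, L(\theta; \mathbf{y})\, d\mathbf{y} = E\left(\hat\theta(\mathbf{X})\, \frac{\partial l(\theta; \mathbf{X})}{\partial\theta}\right) = 1$$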

April 19, 2023 28


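Hence $E\left(\hat\theta(\mathbf{X})\, \frac{\partial l(\theta; \mathbf{X})}{\partial\theta}\right) = 1$. On the other hand:

$$E\left(\frac{\partial l(\theta; \mathbf{X})}{\partial\theta}\right) = \int_{\mathbf{y}} \frac{\partial l(\theta; \mathbf{y})}{\partial\theta}\, L(\theta; \mathbf{y})\, d\mathbf{y} = \int_{\mathbf{y}} \frac{\partial L(\theta; \mathbf{y})}{\partial\theta}\, d\mathbf{y} = [\text{Regularity conditions}]$$

$$= \frac{\partial}{\partial\theta} \int_{\mathbf{y}} L(\theta; \mathbf{y})\, d\mathbf{y} = \frac{\partial}{\partial\theta} \int_{y_1} \cdots \int_{y_n} f(y_1; \theta) \cdots f(y_n; \theta)\, dy_1 \cdots dy_n = \frac{\partial}{\partial\theta} 1 = 0$$

The theoretical correlation between two variables $U$ and $V$,

$$\rho(U, V) = \frac{Cov(U, V)}{\sqrt{Var(U)\, Var(V)}},$$

satisfies $|\rho(U, V)| \le 1$, so that

$$\left(Cov(U, V)\right)^2 \le Var(U)\, Var(V), \quad \text{where } Cov(U, V) = E(UV) - E(U)E(V)$$

Let $U = \hat\theta(\mathbf{X})$ and $V = \frac{\partial l(\theta; \mathbf{X})}{\partial\theta}$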

April 19, 2023 29


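$$Cov\left(\hat\theta(\mathbf{X}), \frac{\partial l(\theta; \mathbf{X})}{\partial\theta}\right) = E\left(\hat\theta(\mathbf{X})\, \frac{\partial l(\theta; \mathbf{X})}{\partial\theta}\right) - E\left(\hat\theta(\mathbf{X})\right) E\left(\frac{\partial l(\theta; \mathbf{X})}{\partial\theta}\right) = 1 - \theta \cdot 0 = 1$$

$$\Rightarrow\; 1 = 1^2 \le Var\left(\hat\theta(\mathbf{X})\right) Var\left(\frac{\partial l(\theta; \mathbf{X})}{\partial\theta}\right) = Var\left(\hat\theta(\mathbf{X})\right) \left[ E\left(\left(\frac{\partial l(\theta; \mathbf{X})}{\partial\theta}\right)^2\right) - \left(E\left(\frac{\partial l(\theta; \mathbf{X})}{\partial\theta}\right)\right)^2 \right]$$

$$= Var\left(\hat\theta(\mathbf{X})\right) \left[ E\left(\left(\frac{\partial l(\theta; \mathbf{X})}{\partial\theta}\right)^2\right) - 0^2 \right] = Var\left(\hat\theta(\mathbf{X})\right) I_\theta$$

$$\Rightarrow\; Var(\hat\theta) \ge \frac{1}{E\left(\left(\partial l(\theta; \mathbf{X}) / \partial\theta\right)^2\right)} = \frac{1}{I_\theta}$$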

April 19, 2023 30


Example

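$X \sim Exp(\theta)$

$$Var(X) = E(X^2) - \left(E(X)\right)^2 = \int_0^\infty x^2\, \theta^{-1} e^{-x/\theta}\, dx - \theta^2$$

$$= \left[-x^2 e^{-x/\theta}\right]_0^\infty + \int_0^\infty 2x\, e^{-x/\theta}\, dx - \theta^2 = 0 + 2\theta \int_0^\infty x\, \theta^{-1} e^{-x/\theta}\, dx - \theta^2 = 2\theta^2 - \theta^2 = \theta^2$$

$$\Rightarrow\; Var(\bar{X}) = \frac{\theta^2}{n} = \frac{1}{I_\theta}$$

$\bar{X}$ as an estimator of $\theta$ attains the Cramér-Rao lower bound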
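A quick simulation can illustrate that the variance of $\bar{X}$ sits at the Cramér-Rao bound $\theta^2/n$ for an exponential population with mean $\theta$. This is only a sketch: the function name, the seed, the number of replicates and the values $\theta = 2$, $n = 20$ are arbitrary assumptions, and the simulated variance matches the bound only up to Monte Carlo error.

```python
import random
from statistics import variance

def simulated_variance_of_mean(theta, n, reps=4000, seed=7):
    """Monte Carlo estimate of Var(Xbar) for samples of size n from Exp with mean theta."""
    rng = random.Random(seed)
    means = [
        sum(rng.expovariate(1.0 / theta) for _ in range(n)) / n   # expovariate takes the rate
        for _ in range(reps)
    ]
    return variance(means)

theta, n = 2.0, 20
cr_bound = theta ** 2 / n                 # 1 / I_theta with I_theta = n / theta^2
sim_var = simulated_variance_of_mean(theta, n)
print(cr_bound, sim_var)
```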

April 19, 2023 31


Sufficiency

A function $T$ of the sample values of a sample $\mathbf{x}$, i.e. $T = T(\mathbf{x}) = T(x_1, \dots, x_n)$, is a statistic that is sufficient for the parameter $\theta$ if the conditional distribution of the sample random variables given $T$ does not depend on $\theta$, i.e.

$$f_{X_1, \dots, X_n \mid T(X_1, \dots, X_n) = t}(y_1, \dots, y_n) \text{ cannot be written as a function of } \theta$$

What does it mean in practice?

If $T$ is sufficient for $\theta$ then no more information about $\theta$ than what is contained in $T$ can be obtained from the sample.

It is enough to work with $T$ when deriving point estimates of $\theta$

April 19, 2023 32


Example

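Assume $\mathbf{x} = (x_1, x_2)$ is a sample from $Exp(\theta)$

Let $T = T(x_1, x_2) = x_1 + x_2$ and assume $T = t$ is observed, so that $x_2 = t - x_1$.

$$f_{X_1, X_2 \mid T = t}(y_1, y_2) = \; ?$$

$$f_{X_1, X_2 \mid T = t}(y_1, y_2) = \frac{f_{X_1, X_2, T}(y_1, y_2, t)}{f_T(t)} = \frac{f(y_1; \theta)\, f(t - y_1; \theta)}{f_T(t)} = \frac{\theta^{-1} e^{-y_1/\theta}\, \theta^{-1} e^{-(t - y_1)/\theta}}{f_T(t)} = \frac{\theta^{-2} e^{-t/\theta}}{f_T(t)}$$

Derive $f_T(t)$ by differentiating $F_T(t) = \Pr(T \le t)$:

$$\Pr(T \le t) = \Pr(X_1 + X_2 \le t) = \Pr(X_1 \le t,\; X_2 \le t - X_1) = \int_0^t \int_0^{t - y_1} \theta^{-1} e^{-y_1/\theta}\, \theta^{-1} e^{-y_2/\theta}\, dy_2\, dy_1$$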

April 19, 2023 33


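$$\int_0^t \int_0^{t - y_1} \theta^{-1} e^{-y_1/\theta}\, \theta^{-1} e^{-y_2/\theta}\, dy_2\, dy_1 = \int_0^t \theta^{-1} e^{-y_1/\theta} \left[-e^{-y_2/\theta}\right]_0^{t - y_1} dy_1 = \int_0^t \theta^{-1} e^{-y_1/\theta} \left(1 - e^{-(t - y_1)/\theta}\right) dy_1$$

$$= \int_0^t \theta^{-1} e^{-y_1/\theta}\, dy_1 - \int_0^t \theta^{-1} e^{-t/\theta}\, dy_1 = \left[-e^{-y_1/\theta}\right]_0^t - \theta^{-1} t\, e^{-t/\theta} = 1 - e^{-t/\theta} - \theta^{-1} t\, e^{-t/\theta}$$

$$\Rightarrow\; f_T(t) = \frac{d F_T(t)}{dt} = \theta^{-1} e^{-t/\theta} - \theta^{-1} e^{-t/\theta} + \theta^{-2} t\, e^{-t/\theta} = \theta^{-2} t\, e^{-t/\theta}$$

$$\Rightarrow\; f_{X_1, X_2 \mid T = t}(y_1, y_2) = \frac{\theta^{-2} e^{-t/\theta}}{\theta^{-2} t\, e^{-t/\theta}} = \frac{1}{t}, \text{ not depending on } \theta$$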

April 19, 2023 34


The factorization theorem:

$T$ is sufficient for $\theta$ if and only if the likelihood function can be written

$$L(\theta; \mathbf{x}) = K_1\left(T(\mathbf{x}); \theta\right) K_2(\mathbf{x})$$

i.e. can be factorized using two non-negative functions such that the first depends on $\mathbf{x}$ only through the statistic $T$ and also on $\theta$, and the second does not depend on $\theta$

April 19, 2023 35


Example, cont

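$X \sim Exp(\theta)$

Let $T = \sum_{i=1}^{n} x_i$

$$L(\theta; \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} \theta^{-1} e^{-x_i/\theta} = \theta^{-n} e^{-\frac{1}{\theta}\sum_{i=1}^{n} x_i} = \underbrace{\theta^{-n} e^{-T/\theta}}_{K_1(T;\, \theta)} \cdot \underbrace{1}_{K_2(\mathbf{x})}$$

$\Rightarrow\; T = \sum_{i=1}^{n} x_i$ is sufficient for $\theta$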
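One practical consequence of this factorization is that two samples with the same value of $T$ give the same likelihood for every $\theta$, since $K_2(\mathbf{x}) = 1$ here. The sketch below checks this for two made-up samples that both sum to 6. The function name, the samples and the grid of $\theta$ values are illustrative assumptions.

```python
import math

def exp_likelihood(theta, sample):
    """L(theta; x) for the exponential density f(x; theta) = exp(-x/theta)/theta."""
    return math.prod(math.exp(-x / theta) / theta for x in sample)

sample_a = [0.5, 1.5, 4.0]                # T = 6.0
sample_b = [2.0, 2.0, 2.0]                # T = 6.0, but different individual values
thetas = [0.5, 1.0, 3.0, 10.0]

# Ratio of the two likelihoods at each theta; sufficiency of T implies it stays at 1.
ratios = [exp_likelihood(t, sample_a) / exp_likelihood(t, sample_b) for t in thetas]
print(ratios)
```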
