732A36 Theory of Statistics

Course within the Master’s program in Statistics and Data mining

Fall semester 2011

Department of Computer and Information Science (IDA) Linköpings universitet, Sweden

Course details

Course web: www.ida.liu.se/~732A36

Course responsible, tutor and examiner: Anders Nordgaard

Course period: Nov 2011-Jan 2012

Examination: Written exam in January 2012, compulsory assignments

Course literature: Garthwaite PH, Jolliffe IT and Jones B (2002). Statistical Inference. 2nd ed. Oxford University Press, Oxford. ISBN 0-19-857226-3

Course contents

Statistical inference in general

Point estimation (unbiasedness, consistency, efficiency, sufficiency, completeness)

Information and likelihood concepts

Maximum-likelihood and method-of-moments estimation

Classical hypothesis testing (power functions, the Neyman-Pearson lemma, maximum likelihood ratio tests, Wald’s test)

Confidence intervals

…

Course contents, cont.

Statistical decision theory (Loss functions, Risk concepts, Prior distributions, Sequential tests)

Bayesian inference (Estimation, Hypothesis testing, Credible intervals, Predictive distributions)

Non-parametric inference

Computer intensive methods for estimation

Details about teaching and examination

Teaching is (as usual) sparse: A mixture between lectures and problem seminars

Lectures: Overview and some details of each chapter covered. The lectures do not cover the full contents!

Problem seminars: Discussions about solutions to recommended exercises. Students should be prepared to provide solutions on the board!

Towards the end of the course a couple of larger compulsory assignments (that need solutions to be worked out with the help of a computer) will be distributed.

The course ends with a written exam.

Prerequisites

Good understanding of calculus and algebra

Good understanding of the concepts of expectations (including variance calculations)

Familiarity with families of probability distributions (Normal, Exponential, Binomial, Poisson, Gamma (Chi-square), Beta, …)

Skills in computer programming (e.g. with R, SAS, Matlab)

Statistical inference in general

Population

Sample

Model

Conclusions about the population are drawn from the sample with assistance from a specified model.

The two paradigms: Neyman-Pearson (frequentistic) and Bayesian

Population

Sample

Model

• Neyman-Pearson: The model specifies the probability distribution for data obtained in a sample, including a number of unknown population parameters.

• Bayesian: The model specifies the probability distribution for data obtained in a sample and a probability distribution (prior) for each of the unknown population parameters of that distribution.

How is inference made?

Point estimation: Find the “best” approximation of an unknown population parameter.

Interval estimation: Find a range of values that with high certainty covers the unknown population parameter. Can be extended to regions if the parameter is multidimensional.

Hypothesis testing: Give statements about the population (values of parameters, probability distributions, issues of independence,…) along with a quantitative measure of “certainty”

Tools for making inference

Criteria for a point estimate to be “good”

“Algorithmic” methods to find point estimates (Maximum Likelihood, Least Squares, Method-of-Moments)

Classical methods of constructing hypothesis tests (Neyman-Pearson lemma, Maximum Likelihood Ratio Test, …)

Classical methods to construct confidence intervals (regions)

Decision theory (make use of loss and risk functions, utility and cost) to find point estimates and hypothesis tests

Using prior distributions to construct tests, credible intervals and predictive distributions (Bayesian inference)

Tools for making inference…

Using theory of randomization to form non-parametric tests (tests not depending on any probability distribution behind data)

Computer intensive methods (bootstrap and cross-validation techniques)

Advanced models from data that make use of auxiliary information (explanatory variables): Generalized linear models, Generalized additive models, Spatio-temporal models, …

The univariate population-sample model

The population to be investigated is such that the values that come out in a sample $x_1, x_2, \ldots$ are governed by a probability distribution.

The probability distribution is represented by a probability density (or mass) function $f(x)$.

Alternatively, the sample values can be seen as the outcomes of independent random variables $X_1, X_2, \ldots$, all with probability density (or mass) function $f(x)$.

Point estimation (frequentistic paradigm)

We have a sample $\mathbf{x} = (x_1, \ldots, x_n)$ from a population.

The population contains an unknown parameter $\theta$.

The functional forms of the distribution functions may be known or unknown, but they depend on the unknown $\theta$.

Denote generally by $f(x; \theta)$ the probability density or mass function of the distribution.

A point estimate of $\theta$ is a function of the sample values,

$\hat{\theta} = \hat{\theta}(x_1, \ldots, x_n) = \hat{\theta}(\mathbf{x})$

such that its values should be close to the unknown $\theta$.

“Standard” point estimates

The sample mean $\bar{x}$ is a point estimate of the population mean $\mu$:

$\hat{\mu}(x_1, \ldots, x_n) = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

The sample variance $s^2$ is a point estimate of the population variance $\sigma^2$:

$\hat{\sigma}^2(x_1, \ldots, x_n) = s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

The sample proportion $p$ of a specific event (a specific value or range of values) is a point estimate of the corresponding population proportion $\pi$:

$\hat{\pi}(x_1, \ldots, x_n) = p = \frac{\#\{x_i : \text{event is satisfied}\}}{n}$
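The three standard point estimates can be computed directly from their definitions. The following is a minimal sketch in Python; the sample values and the event "x > 5" are made up for illustration and are not part of the course material.

```python
from statistics import mean, variance

# Hypothetical sample values, made up for illustration only.
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]
n = len(sample)

# Sample mean: (1/n) * sum of x_i
x_bar = sum(sample) / n

# Sample variance: sum of squared deviations divided by n - 1
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Sample proportion of the (arbitrarily chosen) event "x_i > 5"
p = sum(1 for x in sample if x > 5) / n

# The standard library functions use the same definitions
assert abs(x_bar - mean(sample)) < 1e-12
assert abs(s2 - variance(sample)) < 1e-12
```

Note that `statistics.variance` divides by n - 1, which matches the definition of the sample variance used here.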

Assessing a point estimate

A point estimate has a sampling distribution.

Replace the sample observations $x_1, \ldots, x_n$ with their corresponding random variables $X_1, \ldots, X_n$ in the functional expression:

$\hat{\theta} = \hat{\theta}(X_1, \ldots, X_n)$

The point estimate is then a random variable that is observed in the sample (a point estimator).

As a random variable the point estimator must have a probability distribution that can be deduced from $f(x; \theta)$.

The point estimator/estimate is assessed by investigating its sampling distribution, in particular the mean and the variance.

Unbiasedness

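A point estimator is unbiased for $\theta$ if the mean of its sampling distribution is equal to $\theta$:

$E(\hat{\theta}) = E(\hat{\theta}(X_1, \ldots, X_n)) = \theta$

The bias of a point estimate for $\theta$ is

$bias(\hat{\theta}) = E(\hat{\theta}) - \theta$

Thus, a point estimate with bias = 0 is unbiased, otherwise it is biased.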

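Examples (within the univariate population-sample model)

The sample mean is always unbiased for estimating the population mean:

$E(\bar{X}) = E\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \mu$

Is the sample mean an unbiased estimate of the population median?

Why do we divide by $n-1$ in the sample variance (and not by $n$)?

$E\left(\sum_{i=1}^{n} (X_i - \bar{X})^2\right) = E\left(\sum_{i=1}^{n} X_i^2\right) - n E(\bar{X}^2) = \ldots$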
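The question about dividing by n - 1 can also be explored by simulation. The sketch below assumes a normal population with mean 0 and variance 4 (an arbitrary choice made here, not taken from the slides) and averages both versions of the variance estimate over many samples. In theory, dividing by n gives expectation (n - 1)/n times the true variance, while dividing by n - 1 gives the true variance.

```python
import random

random.seed(1)
n, reps = 5, 20000
sigma2 = 4.0  # true variance of the simulated N(0, 2^2) population (assumed)

sum_div_n = 0.0
sum_div_n1 = 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, 2.0) for _ in range(n)]
    x_bar = sum(xs) / n
    ss = sum((x - x_bar) ** 2 for x in xs)  # sum of squared deviations
    sum_div_n += ss / n
    sum_div_n1 += ss / (n - 1)

mean_div_n = sum_div_n / reps    # theory: (n - 1) / n * sigma2 = 3.2
mean_div_n1 = sum_div_n1 / reps  # theory: sigma2 = 4.0
```

With a small sample size such as n = 5 the downward bias of the "divide by n" version is clearly visible in the simulated averages.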

Consistency

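A point estimator is (weakly) consistent if

$\Pr(|\hat{\theta} - \theta| > \varepsilon) \to 0$ as $n \to \infty$ for any $\varepsilon > 0$

Thus, the point estimator should converge in probability to $\theta$.

Theorem: A point estimator is consistent if

$bias(\hat{\theta}) \to 0$ and $Var(\hat{\theta}) \to 0$ as $n \to \infty$

Proof: Use Chebyshev’s inequality in terms of

$MSE(\hat{\theta}) = E\left((\hat{\theta} - \theta)^2\right) = Var(\hat{\theta}) + (bias(\hat{\theta}))^2$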

Examples

The sample mean is a consistent estimator of the population mean. What probability law can be applied?

What do we require for the sample variance to be a consistent estimator of the population variance?

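$Var(s^2) = \frac{1}{(n-1)^2} Var\left(\sum_{i=1}^{n} X_i^2 - n \bar{X}^2\right) = \frac{1}{(n-1)^2} \left[ Var\left(\sum_{i=1}^{n} X_i^2\right) + n^2 Var(\bar{X}^2) - 2n \, Cov\left(\sum_{i=1}^{n} X_i^2, \bar{X}^2\right) \right] = \ldots$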
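For the first example, consistency of the sample mean follows from the weak law of large numbers, and it can be illustrated numerically. The sketch below assumes an exponential population with mean 1 and a tolerance of 0.2 (both arbitrary choices made here) and estimates the probability that the sample mean misses the population mean by more than the tolerance, for growing n.

```python
import random

random.seed(2)
mu, eps, reps = 1.0, 0.2, 2000

def exceed_prob(n):
    """Monte Carlo estimate of Pr(|X_bar - mu| > eps) for exponential samples with mean mu."""
    count = 0
    for _ in range(reps):
        # random.expovariate takes the rate, i.e. 1 / mean
        x_bar = sum(random.expovariate(1.0 / mu) for _ in range(n)) / n
        if abs(x_bar - mu) > eps:
            count += 1
    return count / reps

# Estimated exceedance probabilities for increasing sample sizes
probs = {n: exceed_prob(n) for n in (10, 100, 1000)}
```

The estimated probabilities should shrink towards 0 as n grows, which is what the definition of weak consistency requires.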

Efficiency

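Assume we have two unbiased estimators of $\theta$, i.e.

$\hat{\theta}_1, \hat{\theta}_2 : E(\hat{\theta}_1) = E(\hat{\theta}_2) = \theta$

If $Var(\hat{\theta}_1) \le Var(\hat{\theta}_2)$, with strict inequality for at least one value of $\theta$, then $\hat{\theta}_1$ is said to be more efficient than $\hat{\theta}_2$.

The efficiency of an unbiased estimator is defined as

$eff(\hat{\theta}_i) = \frac{\min_j Var(\hat{\theta}_j)}{Var(\hat{\theta}_i)}$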

Example

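Let

$\hat{\mu}_1 = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i ; \quad \hat{\mu}_2 = \frac{X_1 + X_n}{2}$ and $n > 2$

$E(\hat{\mu}_1) = E(\bar{X}) = \mu ; \quad E(\hat{\mu}_2) = \frac{E(X_1) + E(X_n)}{2} = \mu \Rightarrow$ Both estimators are unbiased

$Var(\hat{\mu}_1) = Var(\bar{X}) = \frac{1}{n^2} \cdot n \sigma^2 = \frac{\sigma^2}{n}$

$Var(\hat{\mu}_2) = \frac{1}{4} \left( Var(X_1) + Var(X_n) \right) = \frac{2 \sigma^2}{4} = \frac{\sigma^2}{2} > \frac{\sigma^2}{n}$ since $n > 2$

$\Rightarrow \hat{\mu}_1$ is more efficient than $\hat{\mu}_2$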
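The comparison of variances can be checked by simulation. The sketch below assumes that the two estimators in the example are the sample mean and the average of the first and last observations, and it uses a standard normal population with n = 10 (both choices are assumptions made for this illustration).

```python
import random

random.seed(3)
n, reps = 10, 20000

def var_of(values):
    """Sample variance (divisor len - 1) of a list of simulated estimates."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

est1, est2 = [], []
for _ in range(reps):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    est1.append(sum(xs) / n)           # sample mean
    est2.append((xs[0] + xs[-1]) / 2)  # average of first and last observation

var1 = var_of(est1)  # theory: sigma^2 / n = 0.1
var2 = var_of(est2)  # theory: sigma^2 / 2 = 0.5
```

Both estimators are centred on the population mean, but the simulated variance of the second one stays near 0.5 however large n is, since it only ever uses two observations.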

Likelihood function

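For a sample $\mathbf{x}$ the likelihood function for $\theta$ is defined as

$L(\theta; \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \theta)$

The log-likelihood function is

$l(\theta; \mathbf{x}) = \ln L(\theta; \mathbf{x}) = \sum_{i=1}^{n} \ln f(x_i; \theta)$

Both measure how likely (or expected) the sample is.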
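As a small illustration of the two definitions, the sketch below evaluates the likelihood and the log-likelihood for an exponential density with mean theta, f(x; theta) = exp(-x/theta)/theta. The density choice, the data values and the two theta values compared are all assumptions made for this example.

```python
import math

def likelihood(theta, xs):
    """L(theta; x): product of the densities f(x_i; theta)."""
    return math.prod(math.exp(-x / theta) / theta for x in xs)

def log_likelihood(theta, xs):
    """l(theta; x): sum of ln f(x_i; theta)."""
    return sum(-math.log(theta) - x / theta for x in xs)

# Hypothetical data, made up for illustration only.
xs = [0.8, 2.1, 1.4, 0.3, 3.0]

# The log of the product equals the sum of the logs
gap = abs(math.log(likelihood(1.5, xs)) - log_likelihood(1.5, xs))

# A theta near the sample mean makes this sample more likely than a distant one
l_near = log_likelihood(1.5, xs)
l_far = log_likelihood(5.0, xs)
```

Working with the sum of logs rather than the product is usually preferable numerically, because a product of many small densities can underflow.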

Fisher information

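The (Fisher) information about $\theta$ contained in a sample $\mathbf{x}$ is defined as

$I_\theta = E\left[ \left( \frac{\partial l(\theta; \mathbf{X})}{\partial \theta} \right)^2 \right] = E\left[ \left( \frac{\partial l(\theta; X_1, \ldots, X_n)}{\partial \theta} \right)^2 \right]$

Theorem: Under some regularity conditions (interchangeability of integration and differentiation)

$I_\theta = -E\left[ \frac{\partial^2 l(\theta; \mathbf{X})}{\partial \theta^2} \right]$

In particular the range of $X$ cannot depend on $\theta$ (such as in a population where $X \sim U(0, \theta)$).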

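Why is it a measure of information for $\theta$?

$L(\theta; \mathbf{x})$ and $l(\theta; \mathbf{x})$ are related to the probability of the obtained sample.

How does this probability change with $\theta$? This is measured by $\frac{\partial l}{\partial \theta}$.

If $\frac{\partial l}{\partial \theta}$ is close to 0: The probability is not much affected by slight changes of $\theta$. The sample does not contain much information about $\theta$.

If $\frac{\partial l}{\partial \theta}$ is largely positive or largely negative: The probability changes a lot if $\theta$ is slightly changed.

$\left( \frac{\partial l}{\partial \theta} \right)^2$ measures the amount of information about $\theta$ in the particular sample.

$E\left[ \left( \frac{\partial l}{\partial \theta} \right)^2 \right]$ measures the amount of information about $\theta$ generally in a sample from the current distribution.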

Example

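$X \sim Exp(\theta)$

$L(\theta; \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} \theta^{-1} e^{-x_i/\theta} = \theta^{-n} e^{-\frac{1}{\theta} \sum_{i=1}^{n} x_i}$

$l(\theta; \mathbf{x}) = \ln L(\theta; \mathbf{x}) = -n \ln \theta - \frac{1}{\theta} \sum_{i=1}^{n} x_i$

$f(x; \theta)$ fulfills the regularity conditions

$\frac{\partial l}{\partial \theta} = -\frac{n}{\theta} + \frac{1}{\theta^2} \sum_{i=1}^{n} x_i$

$\frac{\partial^2 l}{\partial \theta^2} = \frac{n}{\theta^2} - \frac{2}{\theta^3} \sum_{i=1}^{n} x_i$

$\Rightarrow I_\theta = -E\left[ \frac{n}{\theta^2} - \frac{2}{\theta^3} \sum_{i=1}^{n} X_i \right] = -\frac{n}{\theta^2} + \frac{2}{\theta^3} \sum_{i=1}^{n} E(X_i) = -\frac{n}{\theta^2} + \frac{2}{\theta^3} \cdot n \theta = \frac{n}{\theta^2}$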
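The result for the exponential example can be checked against the defining expression of the Fisher information, the expected squared derivative of the log-likelihood. The sketch below assumes the density f(x; theta) = exp(-x/theta)/theta (mean theta), and the values theta = 2 and n = 5 are arbitrary choices for the illustration.

```python
import random

random.seed(4)
theta, n, reps = 2.0, 5, 40000

def score(theta, xs):
    """Derivative of l(theta; x) with respect to theta for the exponential model."""
    return -len(xs) / theta + sum(xs) / theta ** 2

total = 0.0
for _ in range(reps):
    # random.expovariate takes the rate, i.e. 1 / theta
    xs = [random.expovariate(1.0 / theta) for _ in range(n)]
    total += score(theta, xs) ** 2

info_mc = total / reps       # Monte Carlo estimate of E[(dl/dtheta)^2]
info_exact = n / theta ** 2  # n / theta^2 = 1.25
```

The simulated average of the squared score should land close to n / theta^2, in line with the theorem that both expressions for the information agree under the regularity conditions.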

Cramér-Rao inequality

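Under the same regularity conditions as for the previous theorem the following holds for any unbiased estimator $\hat{\theta}$:

$Var(\hat{\theta}) \ge \frac{1}{I_\theta}$

The lower bound is attained if and only if

$\frac{\partial l}{\partial \theta} = I_\theta \cdot (\hat{\theta} - \theta)$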

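Proof:

$E(\hat{\theta}) = E(\hat{\theta}(X_1, \ldots, X_n)) = \int_{y_1} \cdots \int_{y_n} \hat{\theta}(y_1, \ldots, y_n) f(y_1; \theta) \cdots f(y_n; \theta) \, dy_1 \cdots dy_n$

$= \int_{y_1} \cdots \int_{y_n} \hat{\theta}(y_1, \ldots, y_n) L(\theta; y_1, \ldots, y_n) \, dy_1 \cdots dy_n = \int_{\mathbf{y}} \hat{\theta}(\mathbf{y}) L(\theta; \mathbf{y}) \, d\mathbf{y}$

But as $\hat{\theta}$ is unbiased, $E(\hat{\theta}) = \theta$.

$\frac{\partial E(\hat{\theta})}{\partial \theta} = \frac{\partial}{\partial \theta} \int_{\mathbf{y}} \hat{\theta}(\mathbf{y}) L(\theta; \mathbf{y}) \, d\mathbf{y} = [\text{Regularity conditions}] = \int_{\mathbf{y}} \hat{\theta}(\mathbf{y}) \frac{\partial L(\theta; \mathbf{y})}{\partial \theta} \, d\mathbf{y}$

Now as $\frac{d \ln g(x)}{dx} = \frac{g'(x)}{g(x)}$:

$\frac{\partial l(\theta; \mathbf{y})}{\partial \theta} = \frac{\partial L(\theta; \mathbf{y}) / \partial \theta}{L(\theta; \mathbf{y})} \Rightarrow \frac{\partial L(\theta; \mathbf{y})}{\partial \theta} = \frac{\partial l(\theta; \mathbf{y})}{\partial \theta} \cdot L(\theta; \mathbf{y})$

$\Rightarrow \int_{\mathbf{y}} \hat{\theta}(\mathbf{y}) \frac{\partial L(\theta; \mathbf{y})}{\partial \theta} \, d\mathbf{y} = \int_{\mathbf{y}} \hat{\theta}(\mathbf{y}) \frac{\partial l(\theta; \mathbf{y})}{\partial \theta} L(\theta; \mathbf{y}) \, d\mathbf{y} = E\left[ \hat{\theta}(\mathbf{X}) \frac{\partial l(\theta; \mathbf{X})}{\partial \theta} \right]$

On the other hand:

$\frac{\partial E(\hat{\theta})}{\partial \theta} = \frac{\partial \theta}{\partial \theta} = 1 \Rightarrow E\left[ \hat{\theta}(\mathbf{X}) \frac{\partial l(\theta; \mathbf{X})}{\partial \theta} \right] = 1$

The theoretical correlation between two variables $U$ and $V$,

$\rho(U, V) = \frac{Cov(U, V)}{\sqrt{Var(U) \cdot Var(V)}}$,

satisfies $|\rho(U, V)| \le 1 \Rightarrow (Cov(U, V))^2 \le Var(U) \cdot Var(V)$

$Cov(U, V) = E(U \cdot V) - E(U) \cdot E(V)$

Let $U = \hat{\theta}$ and $V = \frac{\partial l(\theta; \mathbf{X})}{\partial \theta}$

$E\left[ \frac{\partial l(\theta; \mathbf{X})}{\partial \theta} \right] = \int_{y_1} \cdots \int_{y_n} \frac{\partial l(\theta; \mathbf{y})}{\partial \theta} f(y_1; \theta) \cdots f(y_n; \theta) \, dy_1 \cdots dy_n = \int_{\mathbf{y}} \frac{\partial l(\theta; \mathbf{y})}{\partial \theta} L(\theta; \mathbf{y}) \, d\mathbf{y}$

$= \int_{\mathbf{y}} \frac{\partial L(\theta; \mathbf{y})}{\partial \theta} \, d\mathbf{y} = [\text{Regularity conditions}] = \frac{\partial}{\partial \theta} \int_{\mathbf{y}} L(\theta; \mathbf{y}) \, d\mathbf{y} = \frac{\partial}{\partial \theta} 1 = 0$

$Cov\left( \hat{\theta}, \frac{\partial l(\theta; \mathbf{X})}{\partial \theta} \right) = E\left[ \hat{\theta} \cdot \frac{\partial l(\theta; \mathbf{X})}{\partial \theta} \right] - E(\hat{\theta}) \cdot E\left[ \frac{\partial l(\theta; \mathbf{X})}{\partial \theta} \right] = 1 - \theta \cdot 0 = 1$

$\Rightarrow 1 = 1^2 \le Var(\hat{\theta}) \cdot Var\left( \frac{\partial l(\theta; \mathbf{X})}{\partial \theta} \right) = Var(\hat{\theta}) \cdot \left( E\left[ \left( \frac{\partial l(\theta; \mathbf{X})}{\partial \theta} \right)^2 \right] - \left( E\left[ \frac{\partial l(\theta; \mathbf{X})}{\partial \theta} \right] \right)^2 \right)$

$= Var(\hat{\theta}) \cdot \left( E\left[ \left( \frac{\partial l(\theta; \mathbf{X})}{\partial \theta} \right)^2 \right] - 0^2 \right) = Var(\hat{\theta}) \cdot I_\theta$

$\Rightarrow Var(\hat{\theta}) \ge \frac{1}{I_\theta} = I_\theta^{-1}$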

Example

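$X \sim Exp(\theta)$

$Var(X) = E(X^2) - (E(X))^2 = \int_0^\infty x^2 \theta^{-1} e^{-x/\theta} \, dx - \theta^2$

$= \left[ -x^2 e^{-x/\theta} \right]_0^\infty + \int_0^\infty 2x e^{-x/\theta} \, dx - \theta^2 = 0 + 2\theta \int_0^\infty x \theta^{-1} e^{-x/\theta} \, dx - \theta^2 = 2\theta^2 - \theta^2 = \theta^2$

$\Rightarrow Var(\bar{X}) = \frac{1}{n} Var(X) = \frac{\theta^2}{n} = \frac{1}{I_\theta}$

$\Rightarrow \bar{X}$ as an estimator of $\theta$ attains the Cramér-Rao lower bound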
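That the sample mean reaches the Cramér-Rao bound in the exponential example can be illustrated by comparing its simulated variance with theta^2 / n. The sketch below assumes the density with mean theta used in this example, and theta = 2 and n = 8 are arbitrary values chosen for the illustration.

```python
import random
from statistics import variance

random.seed(5)
theta, n, reps = 2.0, 8, 20000

# Simulate many sample means from exponential samples with mean theta
# (random.expovariate takes the rate, i.e. 1 / theta)
means = [
    sum(random.expovariate(1.0 / theta) for _ in range(n)) / n
    for _ in range(reps)
]

var_mc = variance(means)  # simulated Var(X_bar)
bound = theta ** 2 / n    # Cramér-Rao bound 1 / I_theta = theta^2 / n
```

A simulation like this only shows that the variance is close to the bound; the exact equality is what the calculation above establishes.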

Sufficiency

A function $T$ of the sample values of a sample $\mathbf{x}$, i.e. $T = T(\mathbf{x}) = T(x_1, \ldots, x_n)$, is a statistic that is sufficient for the parameter $\theta$ if the conditional distribution of the sample random variables given $T$ does not depend on $\theta$, i.e.

$f_{X_1, \ldots, X_n \mid T(X_1, \ldots, X_n) = t}(y_1, \ldots, y_n)$ cannot be written as a function of $\theta$

What does it mean in practice?

If $T$ is sufficient for $\theta$ then no more information about $\theta$ than what is contained in $T$ can be obtained from the sample.

It is enough to work with $T$ when deriving point estimates of $\theta$.

Example

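Assume $\mathbf{x} = (x_1, x_2)$ is a sample from $Exp(\theta)$.

Let $T = T(x_1, x_2) = x_1 + x_2$ and assume $T = t$ is observed.

$f_{X_1, X_2 \mid T = t}(y_1, y_2) = \frac{f_{X_1, X_2, T}(y_1, y_2, t)}{f_T(t)} = \frac{f_{X_1, X_2}(y_1, t - y_1)}{f_T(t)} = \frac{\theta^{-1} e^{-y_1/\theta} \cdot \theta^{-1} e^{-(t - y_1)/\theta}}{f_T(t)} = \frac{\theta^{-2} e^{-t/\theta}}{f_T(t)}$

$f_T(t) = \; ?$ Derive by differentiating $F_T(t) = \Pr(T \le t)$:

$\Pr(T \le t) = \Pr(X_1 + X_2 \le t) = \Pr(X_2 \le t - X_1) = \int_0^t \int_0^{t - y_1} \theta^{-1} e^{-y_1/\theta} \cdot \theta^{-1} e^{-y_2/\theta} \, dy_2 \, dy_1$

$= \int_0^t \theta^{-1} e^{-y_1/\theta} \left[ -e^{-y_2/\theta} \right]_0^{t - y_1} dy_1 = \int_0^t \theta^{-1} e^{-y_1/\theta} \left( 1 - e^{-(t - y_1)/\theta} \right) dy_1$

$= \int_0^t \theta^{-1} e^{-y_1/\theta} \, dy_1 - \int_0^t \theta^{-1} e^{-t/\theta} \, dy_1 = \left[ -e^{-y_1/\theta} \right]_0^t - \theta^{-1} t e^{-t/\theta} = 1 - e^{-t/\theta} - \theta^{-1} t e^{-t/\theta}$

$\Rightarrow f_T(t) = \theta^{-1} e^{-t/\theta} - \theta^{-1} e^{-t/\theta} + \theta^{-2} t e^{-t/\theta} = \theta^{-2} t e^{-t/\theta}$

$\Rightarrow f_{X_1, X_2 \mid T = t}(y_1, y_2) = \frac{\theta^{-2} e^{-t/\theta}}{\theta^{-2} t e^{-t/\theta}} = \frac{1}{t}$, not depending on $\theta$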

The factorization theorem:

$T$ is sufficient for $\theta$ if and only if the likelihood function can be written

$L(\theta; \mathbf{x}) = K_1(T(\mathbf{x}); \theta) \cdot K_2(\mathbf{x})$

i.e. can be factorized using two non-negative functions such that the first depends on $\mathbf{x}$ only through the statistic $T$ and also on $\theta$, and the second does not depend on $\theta$.

Example, cont

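$X \sim Exp(\theta)$

$L(\theta; \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} \theta^{-1} e^{-x_i/\theta} = \theta^{-n} e^{-\frac{1}{\theta} \sum_{i=1}^{n} x_i}$

Let $T = \sum_{i=1}^{n} x_i$

$\Rightarrow L(\theta; \mathbf{x}) = \theta^{-n} e^{-T/\theta} \cdot 1$ with $K_1(T; \theta) = \theta^{-n} e^{-T/\theta}$ and $K_2(\mathbf{x}) = 1$

$\Rightarrow T = \sum_{i=1}^{n} x_i$ is sufficient for $\theta$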
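One practical consequence of the factorization can be checked numerically: in the exponential model, two samples with the same sum give the same likelihood for every value of theta. The sketch below assumes the density f(x; theta) = exp(-x/theta)/theta (mean theta); the two samples and the theta values are made up for illustration.

```python
import math

def likelihood(theta, xs):
    """L(theta; x) for the exponential density exp(-x/theta)/theta."""
    return math.prod(math.exp(-x / theta) / theta for x in xs)

# Two hypothetical samples with different values but the same sum T = 6
a = [1.0, 2.0, 3.0]
b = [0.5, 0.5, 5.0]

# The likelihood ratio should be 1 for every theta, since K_2(x) = 1
ratios = [likelihood(th, a) / likelihood(th, b) for th in (0.5, 1.0, 2.0, 7.0)]
```

Because the likelihood depends on the data only through the sum, the individual observations carry no further information about theta once the sum is known.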
