July 3, 2015: 732A36 Theory of Statistics Course within the Master’s program in Statistics and...
April 19, 2023 1
732A36 Theory of Statistics
Course within the Master’s program in Statistics and Data mining
Fall semester 2011
Department of Computer and Information Science (IDA) Linköpings universitet, Sweden
Course details
Course web: www.ida.liu.se/~732A36
Course responsible, tutor and examiner: Anders Nordgaard
Course period: Nov 2011-Jan 2012
Examination: Written exam in January 2012, compulsory assignments
Course literature: “Garthwaite PH, Jolliffe IT and Jones B (2002). Statistical Inference. 2nd ed. Oxford University Press, Oxford. ISBN 0-19-857226-3”
Course contents
Statistical inference in general
Point estimation (unbiasedness, consistency, efficiency, sufficiency, completeness)
Information and likelihood concepts
Maximum-likelihood and Method-of-moment estimation
Classical hypothesis testing (Power functions, the Neyman-Pearson lemma, Maximum Likelihood Ratio Tests, Wald’s test)
Confidence intervals …
Course contents, cont.
Statistical decision theory (Loss functions, Risk concepts, Prior distributions, Sequential tests)
Bayesian inference (Estimation, Hypothesis testing, Credible intervals, Predictive distributions)
Non-parametric inference
Computer-intensive methods for estimation
Details about teaching and examination
Teaching is (as usual) sparse: a mixture of lectures and problem seminars.
Lectures: An overview and some details of each chapter covered. The lectures do not cover all of the contents!
Problem seminars: Discussions about solutions to recommended exercises. Students should be prepared to present solutions on the board!
Towards the end of the course, a couple of larger compulsory assignments (whose solutions must be worked out with the help of a computer) will be distributed.
The course is concluded by a written exam.
Prerequisites
Good understanding of calculus and algebra
Good understanding of the concept of expectation (including variance calculations)
Familiarity with families of probability distributions (Normal, Exponential, Binomial, Poisson, Gamma (Chi-square), Beta, …)
Skills in computer programming (e.g. with R, SAS or Matlab)
Statistical inference in general
[Figure: Population, Sample and Model]
Conclusions about the population are drawn from the sample with the assistance of a specified model.
The two paradigms: Neyman-Pearson (frequentist) and Bayesian
[Figure: Population, Sample and Model]
• Neyman-Pearson: The model specifies the probability distribution for data obtained in a sample, including a number of unknown population parameters.
• Bayesian: The model specifies the probability distribution for data obtained in a sample and a probability distribution (prior) for each of the unknown population parameters of that distribution.
How is inference made?
Point estimation: Find the “best” approximation of an unknown population parameter.
Interval estimation: Find a range of values that with high certainty covers the unknown population parameter. This can be extended to regions if the parameter is multidimensional.
Hypothesis testing: Give statements about the population (values of parameters, probability distributions, issues of independence, …) along with a quantitative measure of “certainty”.
Tools for making inference
Criteria for a point estimate to be “good”
“Algorithmic” methods to find point estimates (Maximum Likelihood, Least Squares, Method-of-Moments)
Classical methods of constructing hypothesis tests (Neyman-Pearson lemma, Maximum Likelihood Ratio Test, …)
Classical methods to construct confidence intervals (regions)
Decision theory (making use of loss and risk functions, utility and cost) to find point estimates and hypothesis tests
Using prior distributions to construct tests, credible intervals and predictive distributions (Bayesian inference)
Tools for making inference…
Using randomization theory to form non-parametric tests (tests that do not depend on any particular probability distribution behind the data)
Computer-intensive methods (bootstrap and cross-validation techniques)
Advanced models for data that make use of auxiliary information (explanatory variables): Generalized linear models, Generalized additive models, Spatio-temporal models, …
The univariate population-sample model
The population to be investigated is such that the values that come out in a sample $x_1, x_2, \dots$ are governed by a probability distribution.
The probability distribution is represented by a probability density (or mass) function $f(x)$.
Alternatively, the sample values can be seen as the outcomes of independent random variables $X_1, X_2, \dots$, all with probability density (or mass) function $f(x)$.
Point estimation (frequentist paradigm)
We have a sample $\mathbf{x} = (x_1, \dots, x_n)$ from a population.
The population contains an unknown parameter $\theta$.
The functional forms of the distribution functions may be known or unknown, but they depend on the unknown $\theta$.
Denote generally by $f(x;\theta)$ the probability density or mass function of the distribution.
A point estimate of $\theta$ is a function of the sample values,
$$\hat{\theta} = \hat{\theta}(\mathbf{x}) = \hat{\theta}(x_1, \dots, x_n),$$
such that its values should be close to the unknown $\theta$.
The sample mean $\bar{x}$ is a point estimate of the population mean $\mu$.
The sample variance $s^2$ is a point estimate of the population variance $\sigma^2$.
The sample proportion $\hat{p}$ of a specific event (a specific value or range of values) is a point estimate of the corresponding population proportion $p$.
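“Standard” point estimates
$$\hat{\mu} = \hat{\mu}(x_1, \dots, x_n) = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$\hat{\sigma}^2 = \hat{\sigma}^2(x_1, \dots, x_n) = s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
$$\hat{p} = \hat{p}(x_1, \dots, x_n) = \frac{\#\{x_i : \text{event is satisfied}\}}{n}$$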
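A minimal Python sketch of the three standard estimates, assuming the data sit in a plain list and that the event behind $\hat{p}$ is given as a predicate. The function name `standard_estimates` and the sample values are made up for illustration.

```python
import statistics


def standard_estimates(xs, event):
    """Return (x_bar, s^2, p_hat) for the sample xs.

    `event` is a predicate telling whether an observation satisfies the
    event whose population proportion p is being estimated.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    s2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)  # divisor n - 1
    p_hat = sum(1 for x in xs if event(x)) / n
    return x_bar, s2, p_hat


sample = [2.0, 4.0, 4.0, 5.0, 7.0]  # made-up data
x_bar, s2, p_hat = standard_estimates(sample, lambda x: x > 4)
print(x_bar, s2, p_hat)
# statistics.variance uses the same n - 1 divisor
print(statistics.variance(sample))
```

For this made-up sample, $\bar{x} = 4.4$, $s^2 = 13.2/4 = 3.3$ and $\hat{p} = 2/5$ for the event $x > 4$.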
Assessing a point estimate
A point estimate has a sampling distribution. Replace the sample observations $x_1, \dots, x_n$ with their corresponding random variables $X_1, \dots, X_n$ in the functional expression:
$$\hat{\theta} = \hat{\theta}(X_1, \dots, X_n)$$
The point estimate is then a random variable whose value is observed in the sample (a point estimator).
As a random variable, the point estimator must have a probability distribution that can be deduced from $f(x;\theta)$.
The point estimator/estimate is assessed by investigating its sampling distribution, in particular its mean and variance.
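The sampling distribution of an estimator can be illustrated by simulation. The sketch below assumes a hypothetical Normal population with $\mu = 10$ and $\sigma = 2$ (not part of the course material), draws many samples of size $n = 25$, and looks at the mean and variance of the resulting values of $\bar{X}$. The helper `sampling_distribution` is hypothetical.

```python
import random
import statistics


def sampling_distribution(estimator, draw, n, reps, seed=1):
    """Return `reps` simulated values of estimator(X_1, ..., X_n).

    `draw(rng)` generates one observation from the assumed population
    model f(x; theta).
    """
    rng = random.Random(seed)
    return [estimator([draw(rng) for _ in range(n)]) for _ in range(reps)]


# Hypothetical population: Normal with mu = 10 and sigma = 2; n = 25
values = sampling_distribution(statistics.fmean,
                               lambda rng: rng.gauss(10.0, 2.0),
                               n=25, reps=4000)
print(statistics.fmean(values))     # close to mu = 10
print(statistics.variance(values))  # close to sigma^2 / n = 4 / 25 = 0.16
```

Swapping `statistics.fmean` for another function of the sample (e.g. `statistics.median`) gives a simulated sampling distribution for that estimator instead.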
Unbiasedness
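A point estimator $\hat{\theta}$ is unbiased for $\theta$ if the mean of its sampling distribution is equal to $\theta$:
$$E(\hat{\theta}) = E\big(\hat{\theta}(X_1, \dots, X_n)\big) = \theta$$
The bias of a point estimate for $\theta$ is
$$\operatorname{bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$
Thus, a point estimate with bias = 0 is unbiased; otherwise it is biased.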
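Examples (within the univariate population-sample model)
The sample mean is always unbiased for estimating the population mean:
$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\cdot n\mu = \mu$$
Is the sample mean an unbiased estimate of the population median?
Why do we divide by $n-1$ in the sample variance (and not by $n$)?
$$E\left(\sum_{i=1}^{n} (X_i - \bar{X})^2\right) = \sum_{i=1}^{n} E(X_i^2) - nE(\bar{X}^2)$$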
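A quick simulation makes the divisor question concrete. This sketch assumes Normal(0, 1) data and $n = 5$ (arbitrary choices) and averages the divide-by-$n$ and divide-by-$(n-1)$ versions of the variance estimate over many simulated samples. The helper name is hypothetical.

```python
import random


def average_variance_estimates(n, reps, sigma=1.0, seed=2):
    """Average the divide-by-n and divide-by-(n - 1) variance estimates
    over `reps` simulated Normal(0, sigma^2) samples of size n."""
    rng = random.Random(seed)
    total_n, total_n1 = 0.0, 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        x_bar = sum(xs) / n
        ss = sum((x - x_bar) ** 2 for x in xs)  # sum of squared deviations
        total_n += ss / n
        total_n1 += ss / (n - 1)
    return total_n / reps, total_n1 / reps


biased, unbiased = average_variance_estimates(n=5, reps=20000)
print(biased)    # near (n - 1)/n * sigma^2 = 0.8
print(unbiased)  # near sigma^2 = 1.0
```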
Consistency
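A point estimator $\hat{\theta}$ is (weakly) consistent if
$$\Pr\big(|\hat{\theta} - \theta| > \varepsilon\big) \to 0 \quad \text{as } n \to \infty, \text{ for any } \varepsilon > 0$$
Thus, the point estimator should converge in probability to $\theta$.
Theorem: A point estimator is consistent if
$$\operatorname{bias}(\hat{\theta}) \to 0 \quad \text{and} \quad \operatorname{Var}(\hat{\theta}) \to 0 \quad \text{as } n \to \infty$$
Proof: Use Chebyshev’s inequality in terms of the mean squared error
$$\operatorname{MSE}(\hat{\theta}) = E\big((\hat{\theta} - \theta)^2\big) = \operatorname{Var}(\hat{\theta}) + \big(\operatorname{bias}(\hat{\theta})\big)^2,$$
which gives $\Pr(|\hat{\theta} - \theta| > \varepsilon) \le \operatorname{MSE}(\hat{\theta})/\varepsilon^2 \to 0$.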
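The Chebyshev bound can be compared with simulated exceedance probabilities. The sketch assumes Uniform(0, 1) data, for which $\bar{X}$ is unbiased with $\operatorname{MSE}(\bar{X}) = \operatorname{Var}(\bar{X}) = 1/(12n)$, and an arbitrary $\varepsilon = 0.05$. The helper `exceedance_probability` is hypothetical.

```python
import random


def exceedance_probability(n, eps, reps, seed=3):
    """Monte Carlo estimate of Pr(|X_bar - mu| > eps) for Uniform(0, 1) data."""
    rng = random.Random(seed)
    mu = 0.5  # mean of Uniform(0, 1)
    hits = 0
    for _ in range(reps):
        x_bar = sum(rng.random() for _ in range(n)) / n
        if abs(x_bar - mu) > eps:
            hits += 1
    return hits / reps


eps = 0.05
sigma2 = 1.0 / 12.0  # Var(X) for Uniform(0, 1)
for n in (10, 100, 1000):
    # Chebyshev: Pr(|X_bar - mu| > eps) <= MSE(X_bar) / eps^2 = sigma2 / (n eps^2)
    bound = min(1.0, sigma2 / (n * eps ** 2))
    print(n, exceedance_probability(n, eps, reps=1000), bound)
```

The bound is crude (it is capped at 1 for $n = 10$), but it tends to 0 as $n$ grows, which is all the consistency argument needs.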
Examples
The sample mean is a consistent estimator of the population mean. What probability law can be applied?
What do we require for the sample variance to be a consistent estimator of the population variance?
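$$\operatorname{Var}(s^2) = \operatorname{Var}\left(\frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)\right) = \frac{1}{(n-1)^2}\left(\operatorname{Var}\left(\sum_{i=1}^{n} X_i^2\right) + n^2\operatorname{Var}(\bar{X}^2) - 2n\operatorname{Cov}\left(\sum_{i=1}^{n} X_i^2, \bar{X}^2\right)\right) = \dots$$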
Efficiency
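Assume we have two unbiased estimators of $\theta$, i.e.
$$E(\hat{\theta}_1) = E(\hat{\theta}_2) = \theta$$
If $\operatorname{Var}(\hat{\theta}_1) \le \operatorname{Var}(\hat{\theta}_2)$ for all $\theta$, with strict inequality for at least one value of $\theta$, then $\hat{\theta}_1$ is said to be more efficient than $\hat{\theta}_2$.
The efficiency of an unbiased estimator is defined as
$$\operatorname{eff}(\hat{\theta}_i) = \frac{\min_j \operatorname{Var}(\hat{\theta}_j)}{\operatorname{Var}(\hat{\theta}_i)}$$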
Example
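Let $E(X) = \mu$ and $\operatorname{Var}(X) = \sigma^2$, and consider
$$\hat{\mu}_1 = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad \hat{\mu}_2 = \frac{X_1 + X_n}{2}$$
Both estimators are unbiased:
$$E(\hat{\mu}_1) = E(\bar{X}) = \mu; \qquad E(\hat{\mu}_2) = \frac{E(X_1) + E(X_n)}{2} = \mu$$
$$\operatorname{Var}(\hat{\mu}_1) = \frac{1}{n^2}\cdot n\operatorname{Var}(X) = \frac{\sigma^2}{n}; \qquad \operatorname{Var}(\hat{\mu}_2) = \frac{\operatorname{Var}(X_1) + \operatorname{Var}(X_n)}{4} = \frac{2\sigma^2}{4} = \frac{\sigma^2}{2}$$
$\hat{\mu}_1$ is more efficient than $\hat{\mu}_2$ since $\sigma^2/n < \sigma^2/2$ when $n > 2$.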
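The example can be checked by simulation. This sketch assumes Normal data with $\mu = 3$, $\sigma = 2$ and $n = 10$ (arbitrary values), so the theoretical variances are $\sigma^2/n = 0.4$ and $\sigma^2/2 = 2$, and the efficiency of $\hat{\mu}_2$ (with the minimum taken over just these two estimators) is $(\sigma^2/n)/(\sigma^2/2) = 2/n = 0.2$. The helper name is hypothetical.

```python
import random
import statistics


def compare_estimators(n, reps, mu=3.0, sigma=2.0, seed=4):
    """Simulate mu_hat_1 = X_bar and mu_hat_2 = (X_1 + X_n) / 2
    from Normal(mu, sigma^2) samples of size n."""
    rng = random.Random(seed)
    est1, est2 = [], []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        est1.append(sum(xs) / n)           # mu_hat_1: sample mean
        est2.append((xs[0] + xs[-1]) / 2)  # mu_hat_2: average of X_1 and X_n
    return est1, est2


n = 10
est1, est2 = compare_estimators(n, reps=5000)
v1, v2 = statistics.variance(est1), statistics.variance(est2)
print(statistics.fmean(est1), statistics.fmean(est2))  # both near mu = 3
print(v1, v2)            # near sigma^2/n = 0.4 and sigma^2/2 = 2.0
print(min(v1, v2) / v2)  # estimated eff(mu_hat_2), near 2/n = 0.2
```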
Likelihood function
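For a sample $\mathbf{x}$, the likelihood function for $\theta$ is defined as
$$L(\theta;\mathbf{x}) = \prod_{i=1}^{n} f(x_i;\theta)$$
and the log-likelihood function is
$$l(\theta;\mathbf{x}) = \ln L(\theta;\mathbf{x}) = \sum_{i=1}^{n} \ln f(x_i;\theta)$$
Both measure how likely (or expected) the sample is.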
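For concreteness, a sketch of $L$ and $l$ in Python, assuming for illustration an exponential model with mean $\theta$, $f(x;\theta) = \theta^{-1}e^{-x/\theta}$, and five made-up observations. Working with $l$ also avoids the numerical underflow that the product in $L$ can run into for large $n$.

```python
import math


def likelihood(theta, xs):
    """L(theta; x) = prod_i f(x_i; theta), with f(x; theta) = exp(-x/theta)/theta."""
    return math.prod(math.exp(-x / theta) / theta for x in xs)


def log_likelihood(theta, xs):
    """l(theta; x) = sum_i ln f(x_i; theta) = sum_i (-ln theta - x_i/theta)."""
    return sum(-math.log(theta) - x / theta for x in xs)


xs = [0.8, 2.1, 1.4, 0.3, 3.4]  # made-up observations, sample mean 1.6
for theta in (0.5, 1.0, 1.6, 3.0):
    print(theta, likelihood(theta, xs), log_likelihood(theta, xs))
```

Both columns rank the $\theta$ values in the same order (the logarithm is increasing), and among the four values tried both are largest at $\theta = 1.6$, the sample mean.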
Fisher information
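The (Fisher) information about $\theta$ contained in a sample $\mathbf{x}$ is defined as
$$I(\theta) = E\left[\left(\frac{\partial l(\theta;\mathbf{X})}{\partial\theta}\right)^2\right], \quad \mathbf{X} = (X_1, \dots, X_n)$$
Theorem: Under some regularity conditions (interchangeability of integration and differentiation),
$$I(\theta) = -E\left[\frac{\partial^2 l(\theta;\mathbf{X})}{\partial\theta^2}\right]$$
In particular, the range of $X$ cannot depend on $\theta$ (as it does in a population where $X \sim U(0,\theta)$).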
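Why is it a measure of information for $\theta$?
$L(\theta;\mathbf{x})$ and $l(\theta;\mathbf{x})$ are related to the probability of the obtained sample.
How does this probability change with $\theta$? This is measured by $\frac{\partial l(\theta;\mathbf{x})}{\partial\theta}$.
If $\partial l/\partial\theta$ is close to 0, the probability is not much affected by slight changes of $\theta$: the sample does not contain much information about $\theta$.
If $\partial l/\partial\theta$ is largely positive or largely negative, the probability changes a lot if $\theta$ is changed slightly.
$\left(\partial l/\partial\theta\right)^2$ measures the amount of information about $\theta$ in the particular sample.
$E\left[\left(\partial l/\partial\theta\right)^2\right]$ generally measures the amount of information about $\theta$ in a sample from the current distribution.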
Example
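$X \sim \operatorname{Exp}(\theta)$, i.e. $f(x;\theta) = \frac{1}{\theta}e^{-x/\theta}$ for $x > 0$, which fulfills the regularity conditions
$$L(\theta;\mathbf{x}) = \prod_{i=1}^{n} \frac{1}{\theta}e^{-x_i/\theta} = \theta^{-n}e^{-\sum_{i=1}^{n} x_i/\theta}$$
$$l(\theta;\mathbf{x}) = \ln L(\theta;\mathbf{x}) = -n\ln\theta - \frac{1}{\theta}\sum_{i=1}^{n} x_i$$
$$\frac{\partial l}{\partial\theta} = -\frac{n}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^{n} x_i; \qquad \frac{\partial^2 l}{\partial\theta^2} = \frac{n}{\theta^2} - \frac{2}{\theta^3}\sum_{i=1}^{n} x_i$$
$$I(\theta) = -E\left[\frac{n}{\theta^2} - \frac{2}{\theta^3}\sum_{i=1}^{n} X_i\right] = -\frac{n}{\theta^2} + \frac{2}{\theta^3}\sum_{i=1}^{n} E(X_i) = -\frac{n}{\theta^2} + \frac{2n\theta}{\theta^3} = \frac{n}{\theta^2}$$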
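The two expressions for $I(\theta)$ can be compared numerically. This sketch assumes $\theta = 2$ and $n = 10$ (arbitrary values) and uses Python's `random.expovariate`, which is parametrized by the rate $1/\theta$ rather than the mean $\theta$. Both Monte Carlo averages should be close to $n/\theta^2 = 2.5$. The helper name is hypothetical.

```python
import random
import statistics


def fisher_information_mc(theta, n, reps, seed=5):
    """Monte Carlo estimates of E[(dl/dtheta)^2] and -E[d2l/dtheta2]
    for n observations from Exp with mean theta."""
    rng = random.Random(seed)
    scores, curvatures = [], []
    for _ in range(reps):
        # random.expovariate takes the rate 1/theta, not the mean theta
        s = sum(rng.expovariate(1.0 / theta) for _ in range(n))  # sum of x_i
        scores.append(-n / theta + s / theta ** 2)              # dl/dtheta
        curvatures.append(n / theta ** 2 - 2 * s / theta ** 3)  # d2l/dtheta2
    first_form = statistics.fmean(v * v for v in scores)
    second_form = -statistics.fmean(curvatures)
    return first_form, second_form


theta, n = 2.0, 10
first_form, second_form = fisher_information_mc(theta, n, reps=20000)
print(first_form, second_form, n / theta ** 2)  # all three near 2.5
```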
Cramér-Rao inequality
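Under the same regularity conditions as for the previous theorem, the following holds for any unbiased estimator $\hat{\theta}$:
$$\operatorname{Var}(\hat{\theta}) \ge \frac{1}{I(\theta)}$$
The lower bound is attained if and only if
$$\frac{\partial l(\theta;\mathbf{x})}{\partial\theta} = I(\theta)\,(\hat{\theta} - \theta)$$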
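Proof:
Since $\hat{\theta}$ is unbiased,
$$\theta = E(\hat{\theta}) = E\big(\hat{\theta}(X_1, \dots, X_n)\big) = \int \cdots \int \hat{\theta}(y_1, \dots, y_n)\, f(y_1;\theta)\cdots f(y_n;\theta)\, dy_1 \cdots dy_n = \int \hat{\theta}(\mathbf{y})\, L(\theta;\mathbf{y})\, d\mathbf{y}$$
Differentiating with respect to $\theta$ (regularity conditions):
$$1 = \frac{d}{d\theta}\int \hat{\theta}(\mathbf{y})\, L(\theta;\mathbf{y})\, d\mathbf{y} = \int \hat{\theta}(\mathbf{y})\, \frac{\partial L(\theta;\mathbf{y})}{\partial\theta}\, d\mathbf{y}$$
Now, as $\frac{d \ln g(x)}{dx} = \frac{g'(x)}{g(x)}$,
$$\frac{\partial l(\theta;\mathbf{y})}{\partial\theta} = \frac{1}{L(\theta;\mathbf{y})}\frac{\partial L(\theta;\mathbf{y})}{\partial\theta} \quad\Rightarrow\quad \frac{\partial L(\theta;\mathbf{y})}{\partial\theta} = L(\theta;\mathbf{y})\,\frac{\partial l(\theta;\mathbf{y})}{\partial\theta}$$
$$\Rightarrow\quad 1 = \int \hat{\theta}(\mathbf{y})\, \frac{\partial l(\theta;\mathbf{y})}{\partial\theta}\, L(\theta;\mathbf{y})\, d\mathbf{y} = E\left[\hat{\theta}(\mathbf{X})\, \frac{\partial l(\theta;\mathbf{X})}{\partial\theta}\right]$$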
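Let $U = \hat{\theta}(\mathbf{X})$ and $V = \frac{\partial l(\theta;\mathbf{X})}{\partial\theta}$.
The theoretical correlation between two variables $U$ and $V$,
$$\rho(U,V) = \frac{\operatorname{Cov}(U,V)}{\sqrt{\operatorname{Var}(U)\operatorname{Var}(V)}},$$
satisfies $|\rho(U,V)| \le 1$, so that
$$\big(\operatorname{Cov}(U,V)\big)^2 \le \operatorname{Var}(U)\operatorname{Var}(V), \quad \text{where } \operatorname{Cov}(U,V) = E(UV) - E(U)E(V)$$
On the other hand:
$$E(U) = E(\hat{\theta}) = \theta; \qquad E(UV) = E\left[\hat{\theta}(\mathbf{X})\,\frac{\partial l(\theta;\mathbf{X})}{\partial\theta}\right] = 1$$
$$E(V) = \int \frac{\partial l(\theta;\mathbf{y})}{\partial\theta}\, L(\theta;\mathbf{y})\, d\mathbf{y} = \int \frac{\partial L(\theta;\mathbf{y})}{\partial\theta}\, d\mathbf{y} = \frac{d}{d\theta}\int L(\theta;\mathbf{y})\, d\mathbf{y} = \frac{d}{d\theta}1 = 0$$
(the interchange of differentiation and integration again uses the regularity conditions)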
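$$\operatorname{Cov}\left(\hat{\theta}, \frac{\partial l(\theta;\mathbf{X})}{\partial\theta}\right) = E\left[\hat{\theta}\,\frac{\partial l(\theta;\mathbf{X})}{\partial\theta}\right] - E(\hat{\theta})\,E\left[\frac{\partial l(\theta;\mathbf{X})}{\partial\theta}\right] = 1 - \theta\cdot 0 = 1$$
$$\Rightarrow\quad 1 \le \operatorname{Var}(\hat{\theta})\operatorname{Var}\left(\frac{\partial l(\theta;\mathbf{X})}{\partial\theta}\right)$$
$$\operatorname{Var}\left(\frac{\partial l(\theta;\mathbf{X})}{\partial\theta}\right) = E\left[\left(\frac{\partial l(\theta;\mathbf{X})}{\partial\theta}\right)^2\right] - \left(E\left[\frac{\partial l(\theta;\mathbf{X})}{\partial\theta}\right]\right)^2 = I(\theta) - 0 = I(\theta)$$
$$\Rightarrow\quad \operatorname{Var}(\hat{\theta}) \ge \frac{1}{I(\theta)}$$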
Example
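$X \sim \operatorname{Exp}(\theta)$:
$$\operatorname{Var}(X) = E(X^2) - \big(E(X)\big)^2 = \int_0^\infty x^2\,\frac{1}{\theta}e^{-x/\theta}\,dx - \theta^2 = \left[-x^2 e^{-x/\theta}\right]_0^\infty + \int_0^\infty 2x\,e^{-x/\theta}\,dx - \theta^2 = 0 + 2\theta^2 - \theta^2 = \theta^2$$
$$\operatorname{Var}(\bar{X}) = \frac{\theta^2}{n} = \frac{1}{I(\theta)}$$
$\Rightarrow$ $\bar{X}$ as an estimator of $\theta$ attains the Cramér-Rao lower bound.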
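A simulation check of the bound and of the covariance identity $\operatorname{Cov}(\hat{\theta}, \partial l/\partial\theta) = 1$ used in the proof. Assuming $\theta = 2$ and $n = 10$ (arbitrary values), $\operatorname{Var}(\bar{X})$ should be close to $1/I(\theta) = \theta^2/n = 0.4$ and the covariance close to 1. The helper `crlb_check` is hypothetical.

```python
import random
import statistics


def crlb_check(theta, n, reps, seed=6):
    """Simulate X_bar for Exp(mean theta) samples of size n.

    Returns the simulated Var(X_bar) and the simulated covariance between
    X_bar and the score dl/dtheta evaluated at the true theta.
    """
    rng = random.Random(seed)
    xbars, scores = [], []
    for _ in range(reps):
        s = sum(rng.expovariate(1.0 / theta) for _ in range(n))  # sum of x_i
        xbars.append(s / n)
        scores.append(-n / theta + s / theta ** 2)  # dl/dtheta
    mx, ms = statistics.fmean(xbars), statistics.fmean(scores)
    cov = sum((a - mx) * (b - ms) for a, b in zip(xbars, scores)) / (reps - 1)
    return statistics.variance(xbars), cov


theta, n = 2.0, 10
var_xbar, cov = crlb_check(theta, n, reps=20000)
print(var_xbar, theta ** 2 / n)  # simulated Var(X_bar) vs 1/I(theta) = 0.4
print(cov)                       # Cov(X_bar, dl/dtheta), should be near 1
```

Here the score equals $(n/\theta^2)(\bar{X} - \theta) = I(\theta)(\bar{X} - \theta)$, which is exactly the attainment condition, so the simulated covariance is $n/\theta^2$ times the simulated variance.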
Sufficiency
A function $T$ of the sample values of a sample $\mathbf{x}$, i.e. $T = T(\mathbf{x}) = T(x_1, \dots, x_n)$, is a statistic that is sufficient for the parameter $\theta$ if the conditional distribution of the sample random variables given $T$ does not depend on $\theta$, i.e.
$$f_{X_1, \dots, X_n \mid T(X_1, \dots, X_n) = t}(y_1, \dots, y_n) \text{ cannot be written as a function of } \theta$$
What does it mean in practice?
If $T$ is sufficient for $\theta$, then no more information about $\theta$ than what is contained in $T$ can be obtained from the sample.
It is enough to work with $T$ when deriving point estimates of $\theta$.
Example
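Assume $x_1, x_2$ is a sample from $\operatorname{Exp}(\theta)$.
Let $T = T(x_1, x_2) = x_1 + x_2$ and assume $T = t$ is observed.
$$f_{X_1, X_2 \mid T = t}(y_1, t - y_1) = \frac{f(y_1;\theta)\, f(t - y_1;\theta)}{f_T(t)} = \frac{\theta^{-1}e^{-y_1/\theta}\,\theta^{-1}e^{-(t - y_1)/\theta}}{f_T(t)} = \frac{\theta^{-2}e^{-t/\theta}}{f_T(t)}, \quad 0 < y_1 < t$$
$f_T(t) = ?$ Derive it by differentiating $F_T(t) = \Pr(T \le t)$:
$$F_T(t) = \Pr(X_1 + X_2 \le t) = \Pr(X_2 \le t - X_1,\ X_1 \le t) = \int_0^t \int_0^{t - y_1} \theta^{-1}e^{-y_1/\theta}\,\theta^{-1}e^{-y_2/\theta}\, dy_2\, dy_1$$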
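$$F_T(t) = \int_0^t \theta^{-1}e^{-y_1/\theta}\left[-e^{-y_2/\theta}\right]_0^{t - y_1} dy_1 = \int_0^t \theta^{-1}e^{-y_1/\theta}\left(1 - e^{-(t - y_1)/\theta}\right) dy_1 = \int_0^t \left(\theta^{-1}e^{-y_1/\theta} - \theta^{-1}e^{-t/\theta}\right) dy_1 = 1 - e^{-t/\theta} - t\,\theta^{-1}e^{-t/\theta}$$
$$f_T(t) = F_T'(t) = \theta^{-1}e^{-t/\theta} - \theta^{-1}e^{-t/\theta} + t\,\theta^{-2}e^{-t/\theta} = t\,\theta^{-2}e^{-t/\theta}$$
$$\Rightarrow\quad f_{X_1, X_2 \mid T = t}(y_1, t - y_1) = \frac{\theta^{-2}e^{-t/\theta}}{t\,\theta^{-2}e^{-t/\theta}} = \frac{1}{t}, \quad \text{not depending on } \theta$$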
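The result says that, given $T = t$, $X_1$ has density $1/t$ on $(0, t)$, so $X_1/T$ should be Uniform(0, 1) whatever the value of $\theta$. A simulation sketch with two arbitrary values of $\theta$; the helper `conditional_position` is a hypothetical name, and matching the Uniform(0, 1) mean and variance is only a partial check of the result.

```python
import random
import statistics


def conditional_position(theta, reps, seed):
    """Simulate U = X_1 / T, where T = X_1 + X_2 and X_1, X_2 are
    independent Exp(mean theta) draws."""
    rng = random.Random(seed)
    us = []
    for _ in range(reps):
        x1 = rng.expovariate(1.0 / theta)  # rate 1/theta, i.e. mean theta
        x2 = rng.expovariate(1.0 / theta)
        us.append(x1 / (x1 + x2))
    return us


results = {}
for theta, seed in ((0.5, 7), (5.0, 8)):
    us = conditional_position(theta, reps=20000, seed=seed)
    results[theta] = (statistics.fmean(us), statistics.variance(us))
    # Uniform(0, 1) has mean 1/2 and variance 1/12 (about 0.0833)
    print(theta, results[theta])
```

Different seeds are used for the two values of $\theta$ on purpose: with the same seed, the ratio $X_1/T$ would come out identical, because $\theta$ only rescales both draws.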
The factorization theorem:
$T$ is sufficient for $\theta$ if and only if the likelihood function can be written
$$L(\theta;\mathbf{x}) = K_1\big(T(\mathbf{x});\theta\big)\, K_2(\mathbf{x}),$$
i.e. it can be factorized using two non-negative functions such that the first depends on $\mathbf{x}$ only through the statistic $T$ (and also on $\theta$), and the second does not depend on $\theta$.
Example, cont
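$X \sim \operatorname{Exp}(\theta)$:
$$L(\theta;\mathbf{x}) = \prod_{i=1}^{n} \frac{1}{\theta}e^{-x_i/\theta} = \theta^{-n}e^{-\sum_{i=1}^{n} x_i/\theta}$$
Let $T(\mathbf{x}) = \sum_{i=1}^{n} x_i$, $K_1\big(T(\mathbf{x});\theta\big) = \theta^{-n}e^{-T(\mathbf{x})/\theta}$ and $K_2(\mathbf{x}) = 1$.
$\Rightarrow$ $T = \sum_{i=1}^{n} x_i$ is sufficient for $\theta$.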
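The factorization can be illustrated numerically: two different samples with the same $n$ and the same value of $T = \sum x_i$ give the same likelihood function, so they carry the same information about $\theta$. The sample values and the helper names `likelihood`, `k1` and `k2` are made up for illustration.

```python
import math


def likelihood(theta, xs):
    """L(theta; x) = prod_i exp(-x_i / theta) / theta."""
    return math.prod(math.exp(-x / theta) / theta for x in xs)


def k1(t, n, theta):
    """K_1(T(x); theta) = theta^(-n) * exp(-T(x) / theta)."""
    return theta ** (-n) * math.exp(-t / theta)


def k2(xs):
    """K_2(x) = 1: the factor that does not involve theta."""
    return 1.0


a = [1.0, 2.0, 3.0]  # made-up sample with T = 6
b = [0.5, 0.5, 5.0]  # different made-up sample, same n and same T = 6
for theta in (0.5, 1.0, 2.0, 4.0):
    print(theta, likelihood(theta, a), likelihood(theta, b),
          k1(sum(a), len(a), theta) * k2(a))
```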