ST 522 Slides
7/29/2019 ST 522 Slides
ST 522: Statistical Theory II
Subhashis Ghoshal, North Carolina State University
Useful Results from Calculus
We recapitulate some facts from calculus we need throughout.
Theorem (Binomial theorem)
(a+b)^n = C(n,0) a^n b^0 + C(n,1) a^(n−1) b^1 + ⋯ + C(n,n−1) a^1 b^(n−1) + C(n,n) a^0 b^n,
where C(n,k) = n!/(k!(n−k)!) is the binomial coefficient.
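As a quick numerical sanity check of the binomial theorem (the values 2, 3, 5 are arbitrary choices, not from the slides):

```python
import math

def binomial_expansion(a, b, n):
    """Sum of binomial-theorem terms C(n,k) * a^(n-k) * b^k for k = 0..n."""
    return sum(math.comb(n, k) * a**(n - k) * b**k for k in range(n + 1))

# The expansion reproduces (a + b)^n exactly for integer inputs.
assert binomial_expansion(2, 3, 5) == (2 + 3)**5  # 3125
assert binomial_expansion(1, 1, 10) == 2**10      # a binomial row sums to 2^n
```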
Common infinite series
Geometric series
a + ar + ⋯ + ar^(n−1) = a (r^n − 1)/(r − 1) = a (1 − r^n)/(1 − r), r ≠ 1.
Infinite Geometric series
a + ar + ar^2 + ⋯ = a/(1 − r), |r| < 1.
(1 − x)^(−1) = 1 + x + x^2 + ⋯, |x| < 1.
(1 + x)^(−1) = 1 − x + x^2 − x^3 + ⋯, |x| < 1.
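The finite and infinite geometric-series formulas can be verified numerically (a, r, n below are arbitrary illustrative values):

```python
# Partial sum a + ar + ... + ar^(n-1) versus the closed form a(1 - r^n)/(1 - r).
a, r, n = 2.0, 0.5, 10
partial = sum(a * r**k for k in range(n))
closed = a * (1 - r**n) / (1 - r)
assert abs(partial - closed) < 1e-12

# The infinite sum a/(1 - r) is the n -> infinity limit for |r| < 1:
# the remainder after n terms is a * r^n / (1 - r).
assert abs(a / (1 - r) - partial) <= a * abs(r)**n / (1 - r) + 1e-12
```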
Common infinite series (contd.)
Infinite binomial series
(1 − x)^(−2) = 1 + 2x + 3x^2 + 4x^3 + ⋯, |x| < 1.
(1 − x)^(−r) = 1 + Σ_{n=1}^∞ C(r+n−1, n) x^n, |x| < 1, where for any real number α, C(α, n) = α(α−1)⋯(α−n+1)/n!, the generalized binomial coefficient. In particular, C(r+n−1, n) = r(r+1)⋯(r+n−1)/n!. Also note that for α > 0, C(−α, r) = (−1)^r α(α+1)⋯(α+r−1)/r!.
Exponential series
e^x = 1 + x/1! + x^2/2! + ⋯
Logarithmic series
log(1 + x) = x − x^2/2 + x^3/3 − ⋯, |x| < 1.
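Truncating the exponential and logarithmic series gives accurate approximations inside their regions of validity; a quick check at the arbitrary point x = 0.7:

```python
import math

# Truncated exponential series 1 + x/1! + x^2/2! + ... approximates e^x.
x = 0.7
exp_series = sum(x**k / math.factorial(k) for k in range(20))
assert abs(exp_series - math.exp(x)) < 1e-12

# Truncated logarithmic series x - x^2/2 + x^3/3 - ... approximates
# log(1 + x) for |x| < 1 (convergence is slower, so more terms are used).
log_series = sum((-1)**(k + 1) * x**k / k for k in range(1, 200))
assert abs(log_series - math.log(1 + x)) < 1e-10
```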
Useful limits
lim_{n→∞} (1 + 1/n)^n = e.
lim_{n→∞} (1 + a_n/n)^n = e^a for any a_n → a.
lim_{x→0} (1 + ax)^(1/x) = e^a.
lim_{x→0} log(1 + x)/x = 1.
lim_{x→0} sin x / x = 1.
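These limits can be observed numerically by plugging in a large n or a small x (the particular values and tolerances below are arbitrary):

```python
import math

# (1 + 1/n)^n approaches e as n grows; the error is roughly e/(2n).
n = 10**6
assert abs((1 + 1/n)**n - math.e) < 1e-4

# (1 + a*x)^(1/x) approaches e^a as x -> 0.
a, x = 2.0, 1e-7
assert abs((1 + a*x)**(1/x) - math.exp(a)) < 1e-4

# log(1 + x)/x and sin(x)/x both approach 1 as x -> 0.
assert abs(math.log(1 + x)/x - 1) < 1e-6
assert abs(math.sin(x)/x - 1) < 1e-6
```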
Derivatives
d/dx x^n = n x^(n−1).
d/dx e^(ax) = a e^(ax).
d/dx a^x = a^x log a.
d/dx log x = 1/x.
d/dx sin x = cos x.
d/dx cos x = −sin x.
d/dx tan x = 1 + tan^2 x.
d/dx sin^(−1) x = 1/√(1 − x^2).
d/dx tan^(−1) x = 1/(1 + x^2).
d/dx (a f(x) + b g(x)) = a f′(x) + b g′(x).
d/dx f(x)g(x) = f′(x)g(x) + f(x)g′(x).
d/dx (f(x)/g(x)) = (f′(x)g(x) − f(x)g′(x))/g^2(x).
d/dx f(g(x)) = f′(g(x)) g′(x).
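A central-difference quotient can numerically confirm a few of these rules at an arbitrary point (x = 0.8 below):

```python
import math

def num_deriv(f, x, h=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.8

# d/dx tan x = 1 + tan^2 x
assert abs(num_deriv(math.tan, x) - (1 + math.tan(x)**2)) < 1e-5

# d/dx arcsin x = 1/sqrt(1 - x^2)
assert abs(num_deriv(math.asin, x) - 1/math.sqrt(1 - x**2)) < 1e-5

# Product rule for f(x) = x^3 sin x
def f(t):
    return t**3 * math.sin(t)

fprime = 3*x**2*math.sin(x) + x**3*math.cos(x)
assert abs(num_deriv(f, x) - fprime) < 1e-5
```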
Integration
∫ x^n dx = x^(n+1)/(n+1), n ≠ −1.
∫ x^(−1) dx = log x.
∫ e^(ax) dx = e^(ax)/a, a ≠ 0.
∫ f′(x)/f(x) dx = log f(x).
Integration by substitution
∫ g(f(x)) f′(x) dx = ∫ g(y) dy, y = f(x).
Integration by parts
∫ u(x)v(x) dx = u(x)V(x) − ∫ V(x)u′(x) dx,
where V(x) = ∫ v(x) dx; u(x) is called the first function and v(x) the second.
Integration (contd.)
Integration by partial fractions
Applies when integrating the ratio of two polynomials P(x) and Q(x), where the degree of P is less than the degree of Q without loss of generality. Factorize Q(x) into linear and quadratic factors. The ratio can then be written uniquely as a linear combination of reciprocals of the linear factors and linear-over-quadratic factors. The resulting expression can be integrated term by term. Consult any standard calculus text such as Apostol.
Definite Integral
∫_a^b f(x) dx = F(x)]_a^b = F(b) − F(a),
where F(x) = ∫ f(x) dx.
Order Statistics
Given a random sample, we are interested in the smallest, largest, or middle observations.
the highest flood waters
the lowest winter temperature recorded in the last 50 years
the median price of houses sold in the last month
the median salary of NBA players
Definition: Given a random sample X1, . . . , Xn, the sample order statistics are the sample values placed in ascending order:
X(1) = min_{1≤i≤n} X_i, X(2) = second smallest X_i, . . . , X(n) = max_{1≤i≤n} X_i.
Example: Suppose four numbers are observed as a sample of size 4. The sample values are x1 = 6, x2 = 9, x3 = 3, x4 = 8. What are the order statistics?
Order Statistics (contd.)
Order statistics are random variables themselves (as functions of a random sample).
Order statistics satisfy
X(1) ≤ X(2) ≤ ⋯ ≤ X(n).
Though the samples X1, . . . , Xn are independently and identically distributed, the order statistics X(1), . . . , X(n) are never independent because of the order restriction.
We will study their marginal distributions and joint distributions.
Order Statistics - Marginal distributions
Assume X1, . . . , Xn are from a continuous population with cdf F(x) and pdf f(x).
The nth order statistic, or the sample maximum, X(n) has the pdf
f_{X(n)}(x) = n [F(x)]^(n−1) f(x).
The first order statistic, or the sample minimum, X(1) has the pdf
f_{X(1)}(x) = n [1 − F(x)]^(n−1) f(x).
More generally, the jth order statistic X(j) has the pdf
f_{X(j)}(x) = n!/((j−1)!(n−j)!) f(x) [F(x)]^(j−1) [1 − F(x)]^(n−j).
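Both claims about the jth order statistic can be checked numerically for the uniform case F(x) = x on [0, 1], where X(j) ∼ Beta(j, n+1−j) with mean j/(n+1); the sample size, j, seed, and tolerances below are arbitrary choices:

```python
import math
import random

random.seed(0)
n, j, reps = 5, 2, 20000

# Monte Carlo: the mean of the j-th order statistic of n U(0,1) draws
# should be close to the Beta(j, n+1-j) mean j/(n+1).
vals = []
for _ in range(reps):
    sample = sorted(random.random() for _ in range(n))
    vals.append(sample[j - 1])
mean = sum(vals) / reps
assert abs(mean - j / (n + 1)) < 0.01

# Midpoint rule: the marginal pdf n!/((j-1)!(n-j)!) x^(j-1) (1-x)^(n-j)
# integrates to 1 over [0, 1].
coef = math.factorial(n) / (math.factorial(j - 1) * math.factorial(n - j))
m = 20000
total = sum(coef * ((k + 0.5)/m)**(j - 1) * (1 - (k + 0.5)/m)**(n - j) / m
            for k in range(m))
assert abs(total - 1) < 1e-4
```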
Order Statistics - Joint distributions
For 1 ≤ i < j ≤ n, the joint pdf of X(i) and X(j) is
f_{X(i),X(j)}(u, v) = n!/((i−1)!(j−i−1)!(n−j)!) f(u)f(v) [F(u)]^(i−1) [F(v) − F(u)]^(j−i−1) [1 − F(v)]^(n−j)
if −∞ < u < v < ∞; = 0 otherwise.
Special case: the joint pdf of X(1) and X(n).
The joint pdf of X(1), . . . , X(n) is
f_{X(1),...,X(n)}(u1, . . . , un) = n! f(u1) ⋯ f(un) 1{−∞ < u1 < ⋯ < un < ∞}.
Illustration
Example: X1, . . . , Xn are iid from Unif[0, 1]. Show that X(j) ∼ Beta(j, n + 1 − j). Compute E[X(j)] and Var[X(j)].
The joint pdf of X(1) and X(n).
Let n = 5. Derive the joint pdf of X(2) and X(4).
X(1)|X(n) ∼ X(n) Beta(1, n − 1). For any i < j, X(i)|X(j) ∼ X(j) Beta(i, j − i).
Let n = 5. Derive the joint pdf of X(1), . . . , X(5).
Example
Compute P(X(1) > 1, X(n) ≤ 2).
P(X(1) > x, X(n) ≤ y) = ∏_{i=1}^n P(x < X_i ≤ y) = [F(y) − F(x)]^n.
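The identity P(X(1) > x, X(n) ≤ y) = [F(y) − F(x)]^n can be checked by simulation in the U(0, 1) case, where F(t) = t (sample size, seed, and the points x, y below are arbitrary):

```python
import random

random.seed(1)
n, reps = 4, 50000
x, y = 0.2, 0.7

# Count samples whose minimum exceeds x and whose maximum is at most y.
hits = 0
for _ in range(reps):
    s = [random.random() for _ in range(n)]
    if min(s) > x and max(s) <= y:
        hits += 1

# For U(0,1), F(t) = t, so the formula gives (y - x)^n.
assert abs(hits / reps - (y - x)**n) < 0.01
```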
Common statistics based on order statistics
sample range: R = X(n) − X(1)
sample midrange: V = (X(n) + X(1))/2
sample median:
M = X((n+1)/2) if n is odd; (X(n/2) + X(n/2+1))/2 if n is even.
sample percentile: for any 0 < p < 1, the (100p)th sample percentile is the observation such that about np of the observations are less than this observation and n(1 − p) of the observations are larger.
The sample median M is the 50th sample percentile (the second sample quartile); denote by Q1 the 25th sample percentile (the first sample quartile) and by Q3 the 75th sample percentile (the third sample quartile); the interquartile range IQR = Q3 − Q1 describes the spread about the median.
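These statistics can be computed directly from the sorted sample; reusing the four-observation example from the earlier slide (x = 6, 9, 3, 8):

```python
data = [6, 9, 3, 8]
order = sorted(data)                    # order statistics X_(1) <= ... <= X_(n)
n = len(order)

sample_range = order[-1] - order[0]     # R = X_(n) - X_(1)
midrange = (order[-1] + order[0]) / 2   # V = (X_(n) + X_(1))/2

# Median: middle value for odd n, average of the two middle values for even n.
if n % 2 == 1:
    median = order[(n + 1)//2 - 1]
else:
    median = (order[n//2 - 1] + order[n//2]) / 2

assert order == [3, 6, 8, 9]
assert sample_range == 6
assert midrange == 6.0
assert median == 7.0
```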
Remarks
Sample Mean vs Sample Median
Sample Median vs Population Median
Principles of data reduction
Data X = (X1, . . . , Xn): probability distribution P completely or partially unknown.
Distribution often modeled by standard ones such as Poisson or normal.
A few parameters control the distribution of the data: P = P_θ.
Parameter θ: unknown, the object of interest.
Inference: any conclusion about parameter values based on data.
Three main inference problems: point estimation, hypothesis testing, interval estimation.
Statistic T = T(X): any function of the data; a summary measure of the data.
Statistics may be used as point estimators, test statistics, and upper and lower confidence limits.
Inductive reasoning
Role of probability theory: the extent of randomness of T is controlled by θ. Probabilistic characteristics such as expectation, variance, moments, and the distribution involve θ.
Conversely, the value of T reflects knowledge about θ. For instance, if T has expectation θ and θ is unknown, then θ can be estimated by T. Intuitively, if we observe a large value of T, we tend to conclude that θ must be large.
We need to assess the extent of the error.
Frequentist approach: randomness of the error means we must judge based on the average error over repeated sampling. Thus we need to study the sampling distribution of T.
Sufficiency
As T summarizes the data X, the first natural question is whether there is any loss of information due to summarization.
The data contain many pieces of information; some are relevant for θ and some are not.
Dropping irrelevant information is desirable, but dropping relevant information is undesirable.
How do we compare the amount of information about θ in the data and in T? Is it sufficient to consider only the reduced data T?
Definition (Sufficient statistic)
A statistic T is called sufficient if the conditional distribution of X given T is free of θ (that is, the conditional is a completely known distribution).
Example
Toss a coin 100 times. The probability of a head, p, is unknown. T = number of heads obtained.
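The defining property can be verified exactly for a small number of tosses (3 rather than 100, for brevity): given T = t, every pattern with t heads has conditional probability 1/C(n, t), whatever p is.

```python
from math import comb

def cond_prob(pattern, p):
    """P(X = pattern | T = t) for iid Bernoulli(p) tosses, where t = sum(pattern)."""
    n, t = len(pattern), sum(pattern)
    joint = p**t * (1 - p)**(n - t)                # P(X1, ..., Xn = pattern)
    prob_t = comb(n, t) * p**t * (1 - p)**(n - t)  # P(T = t)
    return joint / prob_t

# The conditional probability is 1/C(3, 2) = 1/3, free of p.
for p in (0.2, 0.5, 0.9):
    assert abs(cond_prob((1, 0, 1), p) - 1 / comb(3, 2)) < 1e-12
```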
Sufficiency principle
If T is sufficient, the extra information carried by X is worthless as far as θ is concerned. It is then only natural to consider inference procedures that do not use this extra irrelevant information. This leads to the principle of sufficiency.
Definition (Sufficiency principle)
Any inference procedure should depend on the data only through asufficient statistic.
How to check sufficiency?
Theorem (Neyman-Fisher Factorization theorem)
T is sufficient iff f(x; θ) can be written as the product g(T(x); θ)h(x), where the first factor depends on x only through T(x) and the second factor is free of θ.
Example
X1, . . . , Xn iid:
N(θ, 1).
Bin(1, θ).
Poi(θ).
N(μ, σ²), θ = (μ, σ).
Ga(α, β), θ = (α, β). (Includes exponential.)
U(0, θ), range of X depends on θ.
Exponential family
f(x; θ) = c(θ)h(x) exp[Σ_{j=1}^k w_j(θ) t_j(x)], θ = (θ1, . . . , θd), d ≤ k.
Theorem
Let X1, . . . , Xn be iid observations from the above exponential family. Then T(X) = (Σ_{i=1}^n t_1(X_i), . . . , Σ_{i=1}^n t_k(X_i)) is sufficient for θ = (θ1, . . . , θd).
Applications
Beta(α, β).
Curved exponential family: N(θ, θ²).
Old examples revisited: binomial, Poisson, normal, exponential, gamma (except uniform). Exercise.
More applications
Discrete uniform: P(X = x) = 1/θ, x = 1, . . . , θ, θ a positive integer.
f(x, θ) = e^(−(x−θ)), x > θ.
A universal example: iid with density f. The order statistic T = (X(1), . . . , X(n)) is sufficient.
Remarks
In the order statistics example, the dimension of T is the same as the dimension of the data. Still, this is a nontrivial reduction, as n! different values of the data correspond to one value of T.
Often one finds better reductions for specific parametric families, as seen in the many examples before.
Trivially, X is always sufficient for itself, with no gain.
When one statistic is a mathematical function of the other and vice versa (i.e., there is a one-to-one correspondence), they carry exactly the same amount of information, so they are equivalent.
More generally, if T is sufficient for θ and T = c(U), a mathematical function of some other statistic U, then U is also sufficient.
Examples of insufficiency
X1, X2 iid Poi(θ). T = X1 − X2 is not sufficient.
X1, . . . , Xn iid pmf f(x; θ). T = (X1, . . . , Xn−1) is not sufficient.
Minimal sufficiency
Maximum possible reduction.
Definition (Minimal sufficient statistic)
T is a minimal sufficient statistic if, given any other sufficient statistic T′, there is a function c(·) such that T = c(T′).
Equivalently, T is minimal sufficient if, given any other sufficient statistic T′, whenever x and y are two data values such that T′(x) = T′(y), then T(x) = T(y).
Checking minimal sufficiency
Theorem (Lehmann–Scheffé theorem)
A statistic T is minimal sufficient if the following property holds: for any two sample points x and y, f(x; θ)/f(y; θ) does not depend on θ if and only if T(x) = T(y).
Corollary
A minimal sufficient statistic is not unique, but any two are in one-to-one correspondence, so they are equivalent.
Examples
iid N(μ, σ²).
iid U(θ, θ + 1).
iid Cauchy(θ).
iid U(−θ, θ).
Minimal sufficiency in exponential family
Theorem
For iid observations from an exponential family
f(x; θ) = c(θ)h(x) exp[Σ_{j=1}^k w_j(θ) t_j(x)]
such that no affine (linear plus constant) relationship exists between w_1(θ), . . . , w_k(θ), the statistic T(X) = (Σ_{i=1}^n t_1(X_i), . . . , Σ_{i=1}^n t_k(X_i)) is minimal sufficient for θ = (θ1, . . . , θd).
Examples
N(μ, σ²).
Ga(α, β).
Be(α, β).
N(θ, θ²).
Be(θ, 1 − θ), 0 < θ < 1.
Ancillary statistic
Definition
A statistic T is called ancillary if its distribution does not depend on the parameter.
The induced family is a singleton, completely known, containing no information about θ: the opposite of sufficiency.
A function of an ancillary statistic is ancillary.
Examples
iid U(θ, θ + 1).
Location family: iid f(x − θ). Scale family: iid σ^(−1) f(x/σ).
iid N(θ, 1). X1, X2 iid N(0, σ²).
X1, . . . , Xn iid N(μ, σ²). T = ((X1 − X̄)/S, . . . , (Xn − X̄)/S), where S is the sample standard deviation, is ancillary.
Results
Location family f(x − θ). Suppose T is a location-invariant statistic, i.e., T(x1 + b, . . . , xn + b) = T(x1, . . . , xn). Then T is ancillary. In particular, the sample sd S is ancillary (and so are other estimates of scale).
Location-scale family σ^(−1) f((x − μ)/σ). Suppose T is a location-scale-invariant statistic, i.e., T(ax1 + b, . . . , axn + b) = T(x1, . . . , xn). Then T is ancillary. If T1 and T2 are such that T1(ax1 + b, . . . , axn + b) = aT1(x1, . . . , xn) and T2(ax1 + b, . . . , axn + b) = aT2(x1, . . . , xn), then T1/T2 is ancillary.
Question: An ancillary statistic does not contain any information about θ. Then why do we study it? It indicates how good the given sample is.
Example: X1, . . . , Xn iid U(θ − 1, θ + 1). θ is estimated by the midrange (X(1) + X(n))/2. The range R = X(n) − X(1) is ancillary.
Question: Can addition or removal of ancillary information change the information content about θ? Intuitively, one may think that an ancillary statistic contains no information about θ, so it should not change the information content. But this interpretation is false.
U(θ, θ + 1).
A more dramatic example: (X, Y) ∼ BVN(0, 0, 1, 1, ρ).
Completeness
Let a parametric family {f(x, θ) : θ ∈ Θ} be given. Let T be a statistic, with induced family of distributions f_T(t, θ), θ ∈ Θ.
Definition
A statistic T is called complete (for the family {f(x, θ) : θ ∈ Θ}), or equivalently the induced family f_T(t, θ), θ ∈ Θ, is called complete, if E_θ(g(T)) = 0 for all θ implies g(T) = 0 a.s. P_θ for all θ.
In other words, no non-constant function of T can have constant expectation (in θ).
Completeness depends not only on the statistic but also on the family. For instance, no nontrivial statistic is complete if the family is a singleton.
In order to find optimal estimators and tests, one sometimes needs to find complete sufficient statistics.
Examples
X ∼ Bin(n, θ), 0 < θ < 1.
X ∼ Poi(θ), θ > 0.
X ∼ N(θ, 1), −∞ < θ < ∞.
Theorem
Let X1, . . . , Xn be iid observations from the above exponential family. Then T(X) = (Σ_{i=1}^n t_1(X_i), . . . , Σ_{i=1}^n t_k(X_i)) is complete if the parameter space contains an open set in R^k (i.e., d = k).
A non-exponential example: iid U(0, ), T = X(n).
Useful facts
If T is complete and S = ψ(T) is a function of T, then S is also complete.
The constant statistic is complete for any family.
A non-constant ancillary statistic cannot be complete.
A statistic is called first-order ancillary if its expectation is free of θ. If a non-constant function of a statistic T is first-order ancillary, then T cannot be complete.
Connection with minimal sufficiency
Theorem
If T is complete and sufficient, and a minimal sufficient statisticexists, then T is also minimal sufficient.
As a consequence, in the search for complete sufficient statistics, it is enough to check completeness of a minimal sufficient statistic (if one exists and is easily found). This implies that no complete sufficient statistic exists for the U(θ, θ + 1) family or the Cauchy(θ) family.
Basu's theorem
A complete sufficient statistic T carries all relevant information about θ; an ancillary statistic S carries no information about θ. The following remarkable result shows that they are statistically independent.
Theorem (Basu's theorem)
A complete sufficient statistic is independent of all ancillary statistics.
Completeness cannot be dropped, even if T is minimal sufficient: iid U(θ, θ + 1).
Applications
iid exponential. Then T = Σ_{i=1}^n X_i and (W1, . . . , Wn) are independent, where W_j = X_j/T. Also calculate E(W_j).
iid normal. T = X̄ and the sample standard deviation S are independent.
iid U(0, θ). Then X(n) and X(1)/X(n) are independent. Also calculate E(X(1)/X(n)).
iid Ga(α, β), α > 0 known. Let U = (∏_{i=1}^n X_i)^(1/n). Then U/X̄ is ancillary, independent of X̄. Also E[(U/X̄)^k] = E(U^k)/E(X̄^k).
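The uniform example can be checked by simulation: since X(1)/X(n) is independent of the complete sufficient X(n), E(X(1)) = E(X(1)/X(n)) E(X(n)), which gives E(X(1)/X(n)) = (θ/(n+1))/(nθ/(n+1)) = 1/n. The sample size, seed, and the value θ = 3 below are arbitrary choices:

```python
import random

random.seed(2)
n, reps, theta = 5, 40000, 3.0

ratios = []
for _ in range(reps):
    s = sorted(random.uniform(0, theta) for _ in range(n))
    ratios.append(s[0] / s[-1])   # X_(1)/X_(n)

# Basu's theorem predicts E(X_(1)/X_(n)) = 1/n, regardless of theta.
mean_ratio = sum(ratios) / reps
assert abs(mean_ratio - 1 / n) < 0.01
```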
Likelihood
X ∼ f(·, θ), a pmf or pdf. X = x is observed.
Definition
The likelihood function is a function of the parameter with an observed sample, and is given by L(θ|x) = f(x, θ). Same expression, but now x is fixed and θ is variable.
Examples
Binomial experiment: decide to stop after 10 trials; 3 successes obtained.
Negative binomial experiment: decide to stop after 3 successes; 10 trials were needed.
Likelihood can be viewed as the degree of plausibility. An estimate of θ may be obtained by choosing the most plausible value, i.e., the value where the likelihood function is maximized. This leads to one of the most important methods of estimation: the maximum likelihood estimator (more details in Chapter 7). For instance, in either example above, the likelihood function is maximized at 0.3.
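Both likelihoods from the two stopping rules are proportional to θ^3 (1 − θ)^7, so a grid search confirms the common maximizer 0.3:

```python
from math import comb

# Binomial likelihood: 3 successes in 10 fixed trials.
def binom_lik(t):
    return comb(10, 3) * t**3 * (1 - t)**7

# Negative binomial likelihood: the 3rd success arrives on the 10th trial.
def negbin_lik(t):
    return comb(9, 2) * t**3 * (1 - t)**7

grid = [k / 1000 for k in range(1, 1000)]
assert max(grid, key=binom_lik) == 0.3
assert max(grid, key=negbin_lik) == 0.3
```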
More examples
iid Poisson(θ)
iid N(μ, σ²)
iid U(0, θ)
Exponential family
Bayesian approach
Suppose that θ can be considered as a random quantity with some marginal distribution π(θ), a pre-experiment assessment called the prior distribution. Then we can legitimately calculate the posterior distribution of θ given the data by Bayes' theorem. This posterior distribution will be the source of any inference about θ.
Theorem (Bayes' theorem)
π(θ|X) = π(θ)f(X, θ) / ∫ π(t)f(X, t) dt.
Examples
iid Bin(1, θ), prior U(0, 1).
iid Poi(θ), prior standard exponential.
Difficulty: θ is fixed, nonrandom.
How to specify a prior?
Bayesians' response:
Probability is a quantification of uncertainty of any type.
The arbitrariness of prior choice can be rectified to someextent by the use of automatic priors which arenon-informative. (More later)
Point Estimation
Find estimators for the unknown parameter θ or its function τ(θ).
Evaluate your estimators (are they good?)
Definition
A point estimator of θ is a function θ̂ = W(X1, . . . , Xn). Given a sample of realized observations, the number W(x1, . . . , xn) is called a point estimate of θ.
Methods of point estimation
method of moments
maximum likelihood estimator (MLE)
Bayes estimators
Method of Moments
Let X1, . . . , Xn be a sample from a population with pdf or pmf f(x|θ1, . . . , θk). Estimate θ = (θ1, . . . , θk) by solving the k equations formed by matching the first k sample and population raw moments:
m1 = (1/n) Σ_{i=1}^n X_i,   μ1′ = E(X);
m2 = (1/n) Σ_{i=1}^n X_i^2,   μ2′ = E(X^2);
. . . , . . .
mk = (1/n) Σ_{i=1}^n X_i^k,   μk′ = E(X^k).
Examples
X1, . . . , Xn iid N(μ, σ²), both μ and σ² unknown.
X1, . . . , Xn iid Bin(1, p).
X1, . . . , Xn iid Ga(α, β), with (α, β) unknown.
X1, . . . , Xn iid Unif(θ1, θ2), where θ1 < θ2, both unknown.
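For the normal case, matching E(X) = μ and E(X²) = μ² + σ² gives μ̂ = m1 and σ̂² = m2 − m1² (the divide-by-n sample variance). A sketch on an arbitrary made-up sample:

```python
data = [4.1, 5.0, 3.7, 6.2, 5.5, 4.9]   # illustrative sample, not from the slides
n = len(data)

m1 = sum(data) / n                      # first sample raw moment
m2 = sum(x**2 for x in data) / n        # second sample raw moment

mu_hat = m1                 # matches E(X) = mu
sigma2_hat = m2 - m1**2     # matches E(X^2) = mu^2 + sigma^2

# sigma2_hat equals the (biased, divide-by-n) sample variance.
assert abs(sigma2_hat - sum((x - mu_hat)**2 for x in data) / n) < 1e-12
```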
Features
Easy to implement
Computationally cheap
Converges to the parameter with increasing probability (called consistency)
Does not necessarily give the asymptotically most efficient estimator
Often used as an initial estimator in iterative methods
Maximum Likelihood Estimator
Recall that the likelihood function is
L(θ|X) = L(θ|X1, . . . , Xn) = ∏_{i=1}^n f(X_i|θ).
Definition
The maximum likelihood estimator (MLE) of θ is the location at which L(θ|X) attains its maximum as a function of θ. Its numerical value is often called the maximum likelihood estimate.
How to find the MLE?
We want to find the global maximum of L(θ|X). If L(θ|X) is differentiable in (θ1, . . . , θk), we solve the likelihood equations
∂L(θ|X)/∂θ_j = 0, j = 1, . . . , k.
The solutions to these likelihood equations locate only extreme points in the interior of Θ, and provide possible candidates for the MLE. They can be local or global minima, local or global maxima, or inflection points. Our job is to find a global maximum.
(d²/dθ²) L(θ) evaluated at θ = θ̂ being negative is sufficient for a local maximum. We also need to check the boundary points separately.
If there is only one local maximum, then it must be the unique global maximum.
Many examples fall into this category, so no further work is needed then.
How to find the MLE? (contd.)
In practice, we often work with log L(θ|X), i.e., solve
∂ log L(θ|X)/∂θ_j = 0, j = 1, . . . , k.
We consider several different situations:
one-parameter case
non-differentiable L(θ|X)
restricted-range MLE (e.g., Θ is not the whole real line)
discrete parameter space
two-parameter case
Examples: One-parameter case
X1, . . . , Xn iid N(θ, 1), with θ unknown.
X1, . . . , Xn iid Poi(θ).
X1, . . . , Xn iid Exp(θ).
(numerical/iterative method): X1, . . . , Xn iid Weibull(θ).
(numerical/iterative method): X1, . . . , Xn iid Gamma(α, 1).
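In the Poisson case the likelihood equation gives θ̂ = X̄, which can be confirmed by checking that the log-likelihood is larger at X̄ than at nearby values (the data below are arbitrary):

```python
from math import log, factorial

data = [2, 3, 0, 4, 1, 2]   # illustrative counts, not from the slides
n = len(data)
xbar = sum(data) / n

def loglik(theta):
    """Log-likelihood of iid Poi(theta): sum(x) log(theta) - n theta - sum(log x!)."""
    return sum(data) * log(theta) - n * theta - sum(log(factorial(x)) for x in data)

# The log-likelihood is strictly concave with its maximum at xbar.
for t in (xbar - 0.1, xbar + 0.1, 0.5, 3.0):
    assert loglik(xbar) > loglik(t)
```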
Restricted MLE
The parameter space Θ is a proper subset of the set of all possible values of the parameter. Special attention is needed to make sure the maximizer lies in Θ.
X1, . . . , Xn iid N(θ, 1), θ ≥ 0. But what if θ > 0?
X1, . . . , Xn iid N(θ, σ²), a ≤ θ ≤ b.
Non-differentiable likelihood
X1, . . . , Xn iid Unif(0, θ], θ > 0.
X1, . . . , Xn iid the exponential location family with pdf f(x) = e^(−(x−θ)) if x ≥ θ.
X1, . . . , Xn iid Unif(θ − 1/2, θ + 1/2).
Discrete parameter space
Example
Let X be a single observation taking values in {0, 1, 2} according to P_θ, where θ = 0 or 1. The distribution of X is summarized:
        x = 0   x = 1   x = 2
θ = 0    0.8     0.1     0.1
θ = 1    0.2     0.3     0.5
Examples: Two-parameter case
For a differentiable likelihood, this needs calculus of several variables in general, but often simple tricks help reduce the problem to one dimension.
X1, . . . , Xn iid N(μ, σ²).
X1, . . . , Xn iid the location-scale exponential family, with pdf f(x; μ, σ) = (1/σ) e^(−(x−μ)/σ) if x ≥ μ.
Remarks about the MLE
The MLE is the value of θ for which the observed sample x is most likely; it possesses some optimal properties (discussed later).
In exponential families, it coincides with the method of moments estimator.
The MLE can be numerically sensitive to variation in the data if the likelihood function is discontinuous.
If T is sufficient for θ, then the MLE must be a function of T.
The MLE is the value of θ that maximizes g(T(X), θ), where g(t, θ) is the pdf or pmf of T = T(X) at t.
Induced likelihood
If η = τ(θ) is a parametric function, then the likelihood for η is defined by
L*(η|X) = sup_{θ: τ(θ) = η} L(θ|X).
Theorem (Invariance Principle)
If θ̂ is the MLE of θ, then for any function τ(θ), the MLE of τ(θ) is τ(θ̂).
Examples
X1, . . . , Xn iid Bin(1, θ). Find the MLE of θ(1 − θ).
X1, . . . , Xn iid Poi(θ). Find the MLE of P(X ≤ 1).
X1, . . . , Xn iid N(μ, σ²).
Find the MLE of μ/σ.
Find the MLE of the population median.
Find the MLE of c = c(μ, σ) such that P_{μ,σ}(X > c) = 0.025 (the 97.5th percentile of the distribution of X).
EM-algorithm
Useful numerical algorithm to compute the MLE with
missing data.
Iterative method repeating the E-step (Expectation) and M-step (Maximization).
Given data Y, missing vital X. Augmented data (X, Y).
Actual likelihood L(θ|Y) = E[L(θ|X, Y)|Y].
Start with an initial estimator θ̂0.
Calculate E_{θ=θ̂0}(log L(θ|X, Y)|Y). Maximize with respect to θ to get the update θ̂1.
Repeat the procedure, replacing the old estimate by the new, until convergence.
Example
Multinomial((θ + 1)/2, θ/4, θ/4, 1/2 − θ).
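A sketch of EM for this multinomial, splitting the first cell (θ + 1)/2 = 1/2 + θ/2 into a known 1/2 part and a latent θ/2 part. The cell counts, the E-step formula b = y1 θ/(1 + θ), and the M-step formula θ = s/(2(s + y4)) are my own illustrative derivations under the parametrization printed above, not from the slides:

```python
from math import log

# Hypothetical cell counts for Multinomial((1+t)/2, t/4, t/4, 1/2 - t), 0 < t < 1/2.
y1, y2, y3, y4 = 60, 10, 12, 18

def obs_loglik(t):
    """Observed-data log-likelihood (up to the multinomial coefficient)."""
    return y1*log((1 + t)/2) + (y2 + y3)*log(t/4) + y4*log(0.5 - t)

t = 0.25                        # initial estimate
prev = obs_loglik(t)
for _ in range(200):
    # E-step: expected count from the t/2 part of the first cell.
    b = y1 * t / (1 + t)
    # M-step: complete-data MLE, from (b + y2 + y3)/t = y4/(1/2 - t).
    s = b + y2 + y3
    t = s / (2 * (s + y4))
    cur = obs_loglik(t)
    assert cur >= prev - 1e-9   # EM never decreases the observed-data log-likelihood
    prev = cur

assert 0 < t < 0.5
```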
Bayes Estimators
Recall that in the Bayesian approach θ is considered as a quantity whose variation can be described by a probability distribution (called the prior distribution). A sample is then taken from a population indexed by θ, and the prior distribution is updated with this sample information. The updated prior is called the posterior distribution.
Prior distribution of θ: π(θ)
Posterior distribution of θ: π(θ|X) = f(X|θ)π(θ)/m(X)
Marginal distribution of X: m(X) = ∫ f(X|θ)π(θ) dθ
The mean of the posterior distribution, E(θ|X), can be used as the Bayes estimator of θ.
Examples
X1, . . . , Xn iid Bin(1, θ). Assume the prior distribution on θ is Beta(α, β). Find the posterior distribution of θ and the Bayes estimator of θ.
Special case: π(θ) ∼ Unif(0, 1).
X1, . . . , Xn iid N(θ, 1), θ ∈ [0, 1], prior U[0, 1].
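For the first example the posterior works out to Beta(α + s, β + n − s), where s is the number of successes, with Bayes estimator (α + s)/(α + β + n). A quick check with arbitrary prior parameters and data:

```python
# Beta(alpha, beta) prior with iid Bernoulli(theta) data: the posterior is
# Beta(alpha + s, beta + n - s), where s = number of successes.
alpha, beta = 2.0, 3.0                     # illustrative prior parameters
data = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]     # illustrative sample
n, s = len(data), sum(data)

post_a, post_b = alpha + s, beta + n - s
bayes_est = post_a / (post_a + post_b)     # posterior mean E(theta | X)

assert (post_a, post_b) == (8.0, 7.0)
assert abs(bayes_est - 8/15) < 1e-12
# The posterior mean lies between the prior mean and the sample proportion.
assert min(alpha/(alpha+beta), s/n) < bayes_est < max(alpha/(alpha+beta), s/n)
```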
Conjugate family
Let F denote the class of pdfs or pmfs f(x|θ). A class Π of prior distributions is a conjugate family for F if the posterior distribution is in the class Π for all f ∈ F, all priors in Π, and all observation values x.
Examples:
The beta family is conjugate for the binomial family.
The normal family is conjugate for the normal family.
Subhashis Ghoshal, North Carolina State University ST 522: Statistical Theory II
Methods of Evaluating Estimators
Various criteria to evaluate and compare different point estimators:
mean squared error
best unbiased estimators or UMVUE (Uniform Minimum Variance Unbiased Estimator)
optimality for a general loss function and risk
Unbiasedness and Mean Squared Error
The bias of a point estimator W of θ is Bias_θ(W) = E_θ W − θ.
An estimator whose bias is equal to 0 is called unbiased.
An unbiased estimator satisfies E_θ W = θ for all θ.
The mean squared error (MSE) of an estimator W of θ is defined by E_θ(W − θ)².
The MSE is a function of θ, and has the representation
E_θ(W − θ)² = Var_θ W + (Bias_θ W)².
The MSE incorporates two components, one measuring the variability of the estimator (precision) and the other measuring its bias (accuracy). A small MSE implies small combined variance and bias. Unbiased estimators do a good job of controlling bias. A smaller MSE indicates a smaller probability of W being far from θ, because
P_θ(|W − θ| > ε) ≤ ε⁻² E_θ(W − θ)² = ε⁻² MSE(W)
by Chebyshev's inequality.
In general, there will not be one best estimator. Often the MSE functions of two estimators cross each other, showing that each estimator is better in only a portion of the parameter space.
Example
Let X1, X2 be iid from Bin(1, p) with 0 < p < 1. Compare three
estimators with respect to their MSE.
p̂1 = X1
p̂2 = (X1 + X2)/2
p̂3 = 0.5.
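The three MSE functions can be computed exactly (Python sketch):

```python
def mse_three_estimators(p):
    """Exact MSE of p1 = X1, p2 = (X1+X2)/2, p3 = 0.5 for X1, X2 iid Bin(1, p)."""
    mse1 = p * (1 - p)        # unbiased, so MSE = Var(X1)
    mse2 = p * (1 - p) / 2    # unbiased, Var of the mean of two observations
    mse3 = (0.5 - p) ** 2     # constant estimator: zero variance, pure squared bias
    return mse1, mse2, mse3
```

Near p = 0.5 the constant estimator p̂3 wins, while for p far from 0.5 it loses: no estimator dominates everywhere.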
Illustration
Let X1, . . . , Xn be iid N(μ, σ²). Show X̄ is unbiased for μ and S² is unbiased for σ², and compute their MSEs. What about non-normal distributions with mean μ and variance σ²?
Let X1, . . . , Xn be iid N(μ, σ²). Show the estimator σ̂² = (1/n) Σ_{i=1}^n (Xi − X̄)² is biased for σ², but it has a smaller MSE than S². More generally, find the MSE of cS².
Uniformly Minimum Variance Unbiased Estimator
If the estimator W is unbiased for τ(θ), then its MSE is equal to
Var_θ(W). Therefore, choosing a better unbiased estimator is equivalent to choosing the one with smaller variance.
Definition
An estimator W* is a best unbiased estimator of τ(θ) if it satisfies:
E_θ W* = τ(θ) for all θ;
for any other estimator W with E_θ W = τ(θ), we have Var_θ W* ≤ Var_θ W for all θ.
W* is also called a uniform minimum variance unbiased estimator (UMVUE).
Example
X1, . . . , Xn iid Poi(λ). Both X̄ and S² are unbiased for λ.
How to find a best unbiased estimator?
If B(θ) is a lower bound on the variance of any unbiased estimator of τ(θ), and if W* is an unbiased estimator satisfying Var_θ W* = B(θ), then W* is a UMVUE.
Cramér-Rao Inequality
Theorem
Let X be a sample with pdf f(x, θ). Suppose W(X) is an estimator satisfying
E_θ W(X) = τ(θ) for all θ;
Var_θ W(X) < ∞.
If differentiation under the integral sign can be carried out, then
Var_θ(W(X)) ≥ [τ′(θ)]² / E_θ((∂/∂θ) log f(X|θ))².
In the i.i.d. case, the bound reduces to [τ′(θ)]²/(n I(θ)), where
I(θ) = E_θ((∂/∂θ) log f(X|θ))²
is called the Fisher information (per observation).
Score function: s(X, θ) = (∂/∂θ) log f(X|θ) = (1/f(X|θ)) (∂/∂θ) f(X|θ).
Lemma (Expressions for I(θ))
If differentiation and integration are interchangeable,
I(θ) = E_θ(s(X, θ))² = Var_θ(s(X, θ))
= −E_θ[(∂²/∂θ²) log f(X, θ)]
= ∫ ((∂/∂θ) log f(x, θ))² f(x, θ) dx
= ∫ ((∂/∂θ) f(x, θ))² / f(x, θ) dx
= −∫ ((∂²/∂θ²) log f(x, θ)) f(x, θ) dx.
Examples
X1, . . . , Xn iid Poi(λ). Find the Fisher information number and a UMVUE for λ.
X1, . . . , Xn iid N(μ, σ²), μ unknown but σ² known. Find a UMVUE for μ using the Cramér-Rao bound.
When can we exchange differentiation and integration?
Yes for the exponential family.
Not always true for non-exponential families. We have to check whether (d/dθ) ∫ h(x)f(x, θ) dx and ∫ h(x)(∂/∂θ)[f(x, θ)] dx match.
Example
X1, . . . , Xn iid from Unif(0, θ).
The Cramér-Rao bound does not work here!
Attainability of the Cramér-Rao bound
The Cramér-Rao inequality says that if W achieves the variance bound then it is a UMVUE. In the one-parameter
exponential family case, we can find such an estimator. But there is no guarantee that this lower bound is sharp (attainable) in other situations. It is possible that the value of the Cramér-Rao bound is strictly smaller than the variance of any unbiased estimator.
Corollary
Let X1, . . . , Xn be iid with pdf f(x, θ), where f(x, θ) satisfies the assumptions of the Cramér-Rao bound theorem. Let L(θ|x) = ∏_{i=1}^n f(xi, θ) denote the likelihood function. If W(X) is unbiased for τ(θ), then W(X) attains the Cramér-Rao Lower Bound if and only if
a(θ)[W(X) − τ(θ)] = s(X, θ)
for some function a(θ).
Attainability in one-parameter exponential family
Theorem
Let X1, . . . , Xn be iid from a one-parameter exponential family with the pdf f(x, θ) = c(θ)h(x) exp{w(θ)T(x)}. Assume E_θ[T(X)] = τ(θ). Then n⁻¹ Σ_{i=1}^n T(Xi), as an unbiased estimator of τ(θ), attains the Cramér-Rao Lower Bound, i.e.
Var_θ(n⁻¹ Σ_{i=1}^n T(Xi)) = [τ′(θ)]² / (n I(θ)).
Examples
X1, . . . , Xn iid from Bin(1, θ). Find a UMVUE of θ and show it attains the Lower Bound.
X1, . . . , Xn iid N(μ, σ²), with (μ, σ²) both unknown. Consider estimation of σ². What is the Cramér-Rao Lower Bound and is it attainable?
Constructing UMVUE using Rao-Blackwell Method
An important method of finding/constructing UMVUEs with the help of conditioning on a complete and sufficient statistic.
Review of conditional expectation:
E(X) = E[E(X|Y)], for any X, Y.
Var(X) = Var[E(X|Y)] + E[Var(X|Y)], for any X, Y.
E(g(X)|Y) = ∫ g(x) f_{X|Y}(x|y) dx, and it is a function of Y.
Cov(E(X|Y), Y) = Cov(X, Y).
Rao-Blackwell Theorem
Theorem
Let W be unbiased for τ(θ) and T be a sufficient statistic for θ. Define φ(T) = E(W|T). Then the following hold:
E_θ φ(T) = τ(θ);
Var_θ φ(T) ≤ Var_θ W for all θ.
Thus, E(W|T) is a uniformly better unbiased estimator of τ(θ) than W.
Conditioning any unbiased estimator on a sufficient statistic will result in a uniform improvement, so we need only consider statistics that are functions of a sufficient statistic when searching for best unbiased estimators.
Examples
Let X1, X2 be iid N(θ, 1). Show X1 is unbiased for θ and E(X1|X̄) is uniformly better.
Let X1, . . . , Xn be iid Unif(0, θ). Show Y = (n + 1)X(1) is unbiased for θ and E(Y|X(n)) is uniformly better.
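A Monte Carlo sketch of the first example (Python; here E(X1|X̄) = X̄, with Var(X1) = 1 versus Var(X̄) = 1/2):

```python
import random
import statistics

def rao_blackwell_demo(theta=2.0, reps=20000, seed=42):
    """Simulate X1, X2 iid N(theta, 1) and compare the variance of the
    raw unbiased estimator X1 with its Rao-Blackwellization Xbar."""
    rng = random.Random(seed)
    x1s, xbars = [], []
    for _ in range(reps):
        x1 = rng.gauss(theta, 1)
        x2 = rng.gauss(theta, 1)
        x1s.append(x1)
        xbars.append((x1 + x2) / 2)
    return statistics.variance(x1s), statistics.variance(xbars)

v1, v2 = rao_blackwell_demo()
```

Both estimators are unbiased; conditioning on the sufficient statistic halves the variance.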
Uniqueness of UMVUE
Theorem
If W is a UMVUE of τ(θ), then W is unique.
UMVUE and unbiased estimators of zero
Theorem
If E_θ W = τ(θ), then W is the best unbiased estimator of τ(θ) if and only if W is uncorrelated with all unbiased estimators of 0.
Example
Let X be an observation from Unif(θ, θ + 1).
Show that X − 1/2 is unbiased for θ.
Show that h(X) = sin(2πX) is an unbiased estimator of zero.
Show X − 1/2 and h(X) are correlated. So X − 1/2 is not best.
Lehmann-Scheffé theorem
Theorem
Let T be a complete sufficient statistic for a parameter θ, and let φ(T) be any estimator based on T. Then φ(T) is the unique best unbiased estimator of its expected value.
Thus:
Find a complete sufficient statistic T for a parameter θ.
Find an unbiased estimator h(X) of τ(θ).
Then φ(T) = E(h(X)|T) is the best unbiased estimator of τ(θ).
Examples
Let X1, . . . , Xn be iid Bin(k, θ).
X1, . . . , Xn are iid from Unif(0, θ).
Find the UMVUE of θ.
Find the UMVUE of g(θ), where g is differentiable on (0, ∞).
Suppose X1, . . . , Xn are iid from Poi(λ).
Find the UMVUE of λ.
Find the UMVUE of g(λ) = λʳ, r ≥ 1 an integer.
Find the UMVUE of g(λ) = e^{−λ}.
More Examples
Suppose that the random variables Y1, . . . , Yn satisfy
Yi = θxi + εi, i = 1, . . . , n,
where x1, . . . , xn are fixed constants, and ε1, . . . , εn are iid N(0, σ²) with σ² known. Find the MLE of θ and show it is UMVUE.
Suppose X1, . . . , Xn are iid from Exp(λ), λ > 0.
Find the UMVUE for λ.
Find the UMVUE for τ(λ) = 1 − F(s) = P(X1 > s).
Find the UMVUE for e^{−1/λ}.
More Examples (contd.)
Suppose X1, . . . , Xn are iid from N(μ, σ²), both (μ, σ²) unknown.
Find the UMVUE for μ.
Find the UMVUE for σ².
Find the UMVUE for μ².
Normal probability. X1, . . . , Xn iid N(μ, 1). τ(μ) = P(X1 ≤ c) = Φ(c − μ).
Ridiculous UMVUE. X1, . . . , Xn iid Poi(λ). τ(λ) = e^{−3λ}.
Loss Function Optimality
Observations X1, . . . , Xn are iid with pdf f(x, θ), θ ∈ Θ. To evaluate an estimator δ(X), various loss functions can be used.
The loss function L(θ, δ) measures the closeness of δ and θ.
squared error loss: L(θ, δ) = (δ − θ)²
absolute error loss: L(θ, δ) = |δ − θ|
a loss that penalizes overestimation more than underestimation is
L(θ, δ) = (δ − θ)² I(δ < θ) + 10(δ − θ)² I(δ ≥ θ)
a loss that penalizes errors more when θ is near 0 than when |θ| is large:
L(θ, δ) = (δ − θ)² / (|θ| + 1)
Loss Function Optimality (contd.)
To compare estimators, we use the expected loss, called the risk function,
R(θ, δ) = E_θ L(θ, δ(X)).
If R(θ, δ1) < R(θ, δ2) for all θ, then δ1 is the preferred estimator because it performs better for all θ. In particular, for the squared error loss, the risk function is the MSE.
Example
X1, . . . , Xn iid from Bin(1, θ). Compare two estimators in terms of their MSE.
MLE: θ̂1 = X̄
Bayes estimator: prior π(θ) ∼ Beta(α, β) with α = β = √(n/4), giving
θ̂B = (Σ_{i=1}^n Xi + √(n/4)) / (n + √n).
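A sketch verifying that this Bayes estimator has constant risk, unlike the MLE (Python; n = 16 is chosen arbitrarily for illustration):

```python
import math

def risks(n, p):
    """Exact MSE of the MLE Xbar and of the Beta(sqrt(n)/2, sqrt(n)/2)-prior
    Bayes estimator (S + sqrt(n)/2) / (n + sqrt(n)), S ~ Bin(n, p)."""
    mse_mle = p * (1 - p) / n
    c = math.sqrt(n) / 2
    var = n * p * (1 - p) / (n + 2 * c) ** 2
    bias = c * (1 - 2 * p) / (n + 2 * c)
    # var + bias^2 simplifies to n / (4 (n + sqrt(n))^2), free of p
    return mse_mle, var + bias ** 2
```

The constant-risk property is what makes this Bayes estimator minimax; the MLE still wins for p near 0 or 1.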
Minimaxity
Risk functions are generally overlapping: one estimator cannot beat everyone else.
Example
X1, . . . , Xn iid N(μ, σ²). Consider the estimators of the form δb(X) = bS².
Minimaxity: compare the worst-case scenarios, i.e., compare the maximum risks. Find the estimator which has the smallest maximum risk: the minimax estimator.
Downsides:
Problems with unbounded risk: the maximum is infinity.
Not easy to find the minimax estimator.
Too pessimistic.
Bayes Rule
The Bayes risk is the average risk with respect to the prior π,
∫ R(θ, δ)π(θ) dθ.
By definition, the Bayes risk can be written as
∫ R(θ, δ)π(θ) dθ = ∫ [∫ L(θ, δ(x)) f(x|θ) dx] π(θ) dθ.
Note f(x|θ)π(θ) = π(θ|x)m(x), where π(θ|x) is the posterior distribution of θ and m(x) is the marginal distribution of X; then the Bayes risk becomes
∫ R(θ, δ)π(θ) dθ = ∫ [∫ L(θ, δ(x)) π(θ|x) dθ] m(x) dx.
The quantity ∫ L(θ, δ(x)) π(θ|x) dθ is called the posterior expected loss. To minimize the Bayes risk, we only need to find δ minimizing the posterior expected loss for each x.
Bayes Rule (contd.)
The Bayes rule with respect to a prior π is an estimator that yields the smallest value of the Bayes risk.
For squared error loss, the posterior expected loss is
∫ (θ − a)² π(θ|x) dθ = E((θ − a)² | x),
therefore the Bayes rule is E(θ|x).
For absolute error loss, the posterior expected loss is E(|θ − a| | x). The Bayes rule is the median of π(θ|x).
Examples
X1, . . . , Xn are iid from N(θ, σ²) and let π(θ) be N(μ, τ²). The values σ², μ, τ² are known.
X1, . . . , Xn are iid from Bin(1, θ) and let π(θ) be Beta(α, β).
Hypothesis Testing
Point estimation: provide a single estimate of θ.
Hypothesis testing: test a statement about θ.
A hypothesis is a statement about a population parameter.
Two complementary hypotheses in a hypothesis testing problem are called the null hypothesis and the alternative hypothesis. Let Θ0 be a subset of the parameter space, called the null region. The hypotheses are denoted by H0 and H1,
H0 : θ ∈ Θ0 vs H1 : θ ∈ Θ0ᶜ.
Illustration
Example
An ideal manufacturing process requires that all products are
non-defective. This is very seldom the case. The goal is to keep the proportion of defective items as low as possible. Let θ be the proportion of defective items, and 0.01 be the maximum acceptable proportion of defective items.
Statement 1: θ ≥ 0.01 (the proportion of defectives is unacceptably high)
Statement 2: θ < 0.01 (acceptable quality)
Example
Let θ be the average change in a patient's blood pressure after taking a drug. An experimenter might be interested in testing
H0 : θ = 0 (the drug has no effect on blood pressure)
H1 : θ ≠ 0 (there is some effect)
Different Types of Hypotheses
Simple hypotheses: both H0 and H1 consist of only one probability distribution.
Composite hypotheses: either H0 or H1 contains more than one possible distribution.
One-sided hypotheses: H1 : θ > θ0 or H1 : θ < θ0.
Two-sided hypotheses: H0 : θ = θ0 vs H1 : θ ≠ θ0.
Rejection region
A hypothesis testing procedure or hypothesis test is a rule that specifies:
for which sample values the decision is made to accept H0 as
true;
for which sample values H0 is rejected and H1 is accepted as true.
The subset of the sample space for which H0 will be rejected is R: the rejection region or critical region.
The complement of the rejection region is Rᶜ: the acceptance region.
The rejection region R of a hypothesis test is usually defined by a test statistic W(X), a function of the sample:
R = {X : W(X) > c} ⟹ reject H0.
Rᶜ = {X : W(X) ≤ c} ⟹ accept H0.
Methods of Evaluating Tests
In deciding to accept or reject the null hypothesis H0, we might make a mistake no matter what the decision is. There are two
types of errors:
Type I error: H0 is actually true, i.e. θ ∈ Θ0, but the test incorrectly decides to reject H0.
Type II error: H0 is actually false, i.e. θ ∈ Θ0ᶜ, but the test incorrectly decides to accept H0.

                 Decision
                 Accept H0          Reject H0
Truth  H0        Correct decision   Type I error
       H1        Type II error      Correct decision
Power Function
Definition
The power function of a hypothesis test with rejection region R is the function of θ defined by
β(θ) = P_θ(X ∈ R)
= probability of Type I error if θ ∈ Θ0;
= 1 − probability of Type II error if θ ∈ Θ0ᶜ.
Note P(Type I error) = β(θ) for θ ∈ Θ0, and P(Type II error) = 1 − β(θ) for θ ∈ Θ0ᶜ.
Ideal test: β(θ) = 0 for all θ ∈ Θ0; β(θ) = 1 for all θ ∈ Θ0ᶜ.
Good test:
β(θ) is near 0 (small) for most θ ∈ Θ0; β(θ) is near 1 (large) for most θ ∈ Θ0ᶜ.
Example (Binomial power function)
X ∼ Bin(5, θ).
H0 : θ ≤ 1/2 versus H1 : θ > 1/2.
Test 1: reject H0 if and only if all successes are observed, i.e. R = {5}.
Test 2: reject H0 if X = 3, 4, or 5.
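The two power functions can be tabulated directly (Python sketch):

```python
from math import comb

def power(theta, reject, n=5):
    """Power beta(theta) = P_theta(X in reject) for X ~ Bin(n, theta)."""
    return sum(comb(n, x) * theta ** x * (1 - theta) ** (n - x)
               for x in reject)

beta1 = lambda t: power(t, {5})          # Test 1: reject iff X = 5
beta2 = lambda t: power(t, {3, 4, 5})    # Test 2: reject iff X >= 3
```

Test 1 has a tiny Type I error (β1(1/2) = 1/32) but low power; Test 2 is more powerful at every θ at the price of a larger size.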
Likelihood Ratio Tests (LRT)
Definition
The likelihood ratio test statistic for testing H0 : θ ∈ Θ0 vs H1 : θ ∈ Θ0ᶜ is
λ(x) = sup_{Θ0} L(θ|x) / sup_Θ L(θ|x).
A likelihood ratio test (LRT) has a rejection region
R = {x : λ(x) ≤ c},
where c is any number satisfying 0 ≤ c ≤ 1. This should be reduced to the simplest possible form.
Rationale of LRT
The numerator of λ(x) is the maximum probability of the
observed sample, computed over parameters in H0. The denominator of λ(x) is the maximum probability of the observed sample over all possible parameters.
The numerator says which θ ∈ Θ0 makes the observed data most likely; the denominator says which θ ∈ Θ makes the observed data most likely.
The ratio of these two maxima is small if there are parameter points in H1 for which the observed sample is much more likely than for any parameter in H0. In this situation, the LRT criterion says H0 should be rejected and H1 accepted as true.
Relation between LRT and MLE
Let θ̂0 be the MLE of θ in the null set Θ0 (restricted maximization).
Let θ̂ be the MLE of θ in the full set Θ (unrestricted maximization). Then the LRT statistic, a function of x (not θ), is
λ(x) = sup_{Θ0} L(θ|x) / sup_Θ L(θ|x) = L(θ̂0|x) / L(θ̂|x).
In R = {x : λ(x) ≤ c}, different c give different rejection regions and hence different tests.
Examples
X1, . . . , Xn iid N(μ, σ²) with μ unknown (σ² known). Consider testing
H0 : μ = μ0 versus H1 : μ ≠ μ0, where μ0 is a number fixed by the experimenter prior to the experiment.
Find the LRT and its power function.
Comment on the decision rules given by different c's.
Let X1, . . . , Xn be a random sample from a location-exponential family
f(x, θ) = e^{−(x−θ)} if x ≥ θ,
where −∞ < θ < ∞. Consider testing H0 : θ ≤ θ0 versus H1 : θ > θ0. Find the LRT.
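For the normal-mean example, λ(x) = exp{−n(x̄ − μ0)²/(2σ²)}, so λ(x) ≤ c reduces to |x̄ − μ0| ≥ σ√(−2 log c)/√n; a quick check of the equivalence (Python, data hypothetical):

```python
import math

def lrt_lambda(xs, mu0, sigma):
    """LRT statistic for H0: mu = mu0 vs H1: mu != mu0 with sigma known:
    lambda(x) = exp(-n (xbar - mu0)^2 / (2 sigma^2))."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.exp(-n * (xbar - mu0) ** 2 / (2 * sigma ** 2))

def reject_equivalent(xs, mu0, sigma, c):
    """lambda(x) <= c  iff  |xbar - mu0| >= sigma * sqrt(-2 log c / n)."""
    n = len(xs)
    xbar = sum(xs) / n
    return abs(xbar - mu0) >= sigma * math.sqrt(-2 * math.log(c) / n)
```

This is the "reduce to the simplest possible form" step: a cutoff on λ becomes a cutoff on |x̄ − μ0|.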
LRT and sufficiency
Theorem
If T(X) is a sufficient statistic for θ, λ*(t) is the LRT statistic based on T, and λ(x) is the LRT statistic based on x, then
λ*(T(x)) = λ(x)
for every x in the sample space.
Thus the simplified expression for λ(x) should depend on x only through T(x) if T(X) is a sufficient statistic for θ.
Examples
X1, . . . , Xn iid N(μ, σ²) with σ² known. Test
H0 : μ = μ0 versus H1 : μ ≠ μ0.
Let X1, . . . , Xn be a random sample from a location-exponential family. Test H0 : θ ≤ θ0 versus H1 : θ > θ0.
Nuisance parameter case
Likelihood ratio tests are also useful when there are nuisance
parameters, which are present in the model but not of direct interest.
Example
X1, . . . , Xn iid N(μ, σ²), both μ and σ² unknown. Test H0 : μ ≤ μ0 versus H1 : μ > μ0.
Specify Θ and Θ0.
Find the LRT and the power function.
Bayesian Tests
Using the posterior density π(θ|x), compute
P(θ ∈ Θ0 | x) = P(H0 is true | x),
P(θ ∈ Θ0ᶜ | x) = P(H1 is true | x).
Decide in favor of the hypothesis which has the greater posterior probability: accept H0 if P(θ ∈ Θ0 | x) ≥ 1/2.
This does not work if Θ0 is a point and θ is given a prior density. One will need to put a prior mass at the point.
Example
Let X1, . . . , Xn be iid N(θ, σ²) and the prior distribution on θ be N(μ, τ²), where σ², μ, τ² are known. Test H0 : θ ≤ θ0 against H1 : θ > θ0.
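A sketch of the resulting test (Python; data hypothetical): the normal-normal posterior is itself normal, so P(θ ≤ θ0 | x) is a normal cdf evaluation, and we accept H0 when it is at least 1/2.

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def posterior_prob_H0(xs, sigma2, mu, tau2, theta0):
    """P(theta <= theta0 | x) under a N(mu, tau2) prior and N(theta, sigma2)
    data: the posterior is normal with precision n/sigma2 + 1/tau2."""
    n = len(xs)
    xbar = sum(xs) / n
    v = 1 / (n / sigma2 + 1 / tau2)              # posterior variance
    m = v * (n * xbar / sigma2 + mu / tau2)      # posterior mean
    return phi((theta0 - m) / math.sqrt(v))

# Hypothetical data: accept H0 iff the returned probability is >= 1/2
p0 = posterior_prob_H0([0.2, 0.8, 0.4, 0.6], sigma2=1.0, mu=0.0, tau2=1.0,
                       theta0=0.0)
```

Here the posterior mean shrinks x̄ toward the prior mean μ before the comparison with θ0.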
Unbiased Test
Definition
A test with power function β(θ) is unbiased if
β(θ′) ≥ β(θ″) for every θ′ ∈ Θ0ᶜ and θ″ ∈ Θ0.
In most problems, there are many unbiased tests.
Recall β(θ) = P_θ(reject H0). An unbiased test is one for which the probability of rejecting H0 when H0 is true is smaller than the probability of rejecting H0 when H0 is false.
Examples
X ∼ Bin(5, θ). Consider testing
H0 : θ ≤ 1/2 versus H1 : θ > 1/2
and reject H0 if X = 5.
X1, . . . , Xn ∼ N(μ, σ²), with σ² known. Consider testing
H0 : μ ≤ μ0 versus H1 : μ > μ0.
The LRT is unbiased.
Draw the graph of the power function.
Controlling Type I error
For a fixed sample size, it is usually impossible to make both types of error arbitrarily small. Common approach:
Control the Type I error probability at a specified level α.
Within this class of tests, make the Type II error probability as small as possible; equivalently, maximize the power.
Size α and level α tests
Definition
For 0 ≤ α ≤ 1, a test with power function β(θ) is a size α test if
sup_{θ ∈ Θ0} β(θ) = α.
Definition
For 0 ≤ α ≤ 1, a test with power function β(θ) is a level α test if
sup_{θ ∈ Θ0} β(θ) ≤ α.
If these relations hold only in the limit as n → ∞, we call the tests respectively asymptotically size (level) α. [More details in the final chapter]
Notations and remarks
Typical choices of α are: 0.01, 0.05, 0.10.
We use z_{α/2} to denote the point having probability α/2 to the right of it for the standard normal pdf. By convention, we have
P(Z > z_α) = α, where Z ∼ N(0, 1)
P(T_{n−1} > t_{n−1,α/2}) = α/2, where T_{n−1} ∼ t_{n−1}
P(χ²_p > χ²_{p,1−α}) = 1 − α, chi-square with d.f. p
Note −z_α = z_{1−α}.
Commonly used cutoffs: z_{0.05} = 1.645, z_{0.025} = 1.96, z_{0.01} = 2.33, z_{0.005} = 2.58.
How to specify H0 and H1?
If an experimenter expects an experiment to indicate a phenomenon, he or she should choose H1 to be the theory being proposed.
H1 is sometimes called the researcher's hypothesis. By using a level α test with small α, the experimenter is guarding against saying the data support the research hypothesis when it is false.
Announcing a new phenomenon when in fact nothing has happened is usually more serious than missing something new that has in fact occurred.
Similarly, in the judicial system evidence is collected to decide whether the accused is innocent or guilty. To prevent the possibility of penalizing an innocent person incorrectly, the test should be set up as H0 : innocent versus H1 : guilty.
How to choose the critical value of the LRT
In order to make an LRT a size α test, we choose c such that
sup_{θ ∈ Θ0} P_θ(λ(X) ≤ c) = α.
iid N(μ, σ²), σ² known. H0 : μ ≤ μ0 vs H1 : μ > μ0.
iid N(μ, σ²), σ² known. Consider testing H0 : μ = μ0 vs H1 : μ ≠ μ0.
Let X1, . . . , Xn be iid from N(μ, σ²), σ² unknown. Consider testing H0 : μ = μ0 versus H1 : μ ≠ μ0. Show that the LRT that rejects H0 if |X̄ − μ0| > t_{n−1,α/2} S/√n is a test of size α.
iid location-exponential distribution. Consider testing H0 : θ ≤ θ0 vs H1 : θ > θ0. Find the size α LRT.
Sample size calculation
For a fixed sample size, it is usually impossible to make both types of error probabilities arbitrarily small. But if we can choose the
sample size, it is possible to achieve the desired power level.
Example
iid N(μ, σ²), σ² known. Test H0 : μ ≤ μ0 vs H1 : μ > μ0. The LRT rejects H0 if (X̄ − μ0)/(σ/√n) > C, and has the power function
β(μ) = 1 − Φ(C + (μ0 − μ)/(σ/√n)).
Note β(μ) is increasing in μ.
Notes
The maximum Type I error probability is
sup_{μ ≤ μ0} β(μ) = β(μ0) = 1 − Φ(C).
For the size α test, C = z_α.
After C is chosen, it is possible to increase β(μ) for μ > μ0 by increasing the sample size n. Thus we can minimize the Type II error (remember: the Type I error is under control already). Draw the picture of the power function for small n and large n.
Assume C = z_α. How do we choose n such that the maximum Type II error is 0.2 if μ ≥ μ0 + σ?
Compute n if α = 0.05 in (3).
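A sketch of this computation (Python, Φ via the error function): with C = z_α, the power at μ = μ0 + σ is 1 − Φ(z_α − √n), so we take the smallest n making the Type II error at most 0.2.

```python
import math

def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def sample_size(alpha=0.05, beta_max=0.2):
    """Smallest n with Type II error <= beta_max at mu = mu0 + sigma for the
    size-alpha one-sided z test: power = 1 - Phi(z_alpha - sqrt(n))."""
    # invert Phi by bisection to get z_alpha (P(Z > z_alpha) = alpha)
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if phi(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    z_alpha = (lo + hi) / 2
    n = 1
    while 1 - phi(z_alpha - math.sqrt(n)) < 1 - beta_max:
        n += 1
    return n
```

With α = 0.05 the condition √n ≥ z_{0.05} + z_{0.2} ≈ 1.645 + 0.84 gives n = 7.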
Example
Let X ∼ Bin(n, θ). Testing:
H0 : θ ≥ 3/4 vs H1 : θ < 3/4.
The LRT for this problem is to reject H0 if X ≤ c.
Choose c and n such that the following hold simultaneously:
If θ = 3/4, we have Pr(reject H0 | θ) = 0.01 (control of Type I error);
If θ = 1/2, we have Pr(reject H0 | θ) = 0.99 (control of Type II error).
Most Powerful Tests
Given that the maximum probability of Type I error is less than or equal to α, the most powerful level α test minimizes the probability of Type II error, or, equivalently, maximizes the power function at a θ′ ∈ Θ0ᶜ.
If this occurs for all θ′ ∈ Θ0ᶜ, such a test is called the uniformly most powerful (UMP) level α test.
Test function
Given a rejection region R, define a test function φ on the sample space to be
φ(x) = 1 if x ∈ R, and φ(x) = 0 if x ∉ R.
Interpret φ(X) as the probability of rejecting the null hypothesis given the sample X.
This also opens the door to randomized tests, where φ(X) can even take values strictly between 0 and 1.
Note the expected value of φ is the power function: E_θ[φ(X)] = P_θ(X ∈ R) = β(θ).
Existence of UMP tests
Lemma (Neyman-Pearson)
Consider testing H0 : θ = θ0 versus H1 : θ = θ1, where the pdf or
pmf corresponding to θi is f(x, θi), i = 0, 1. Consider any test function φ satisfying
φ(x) = 1 if f(x, θ1) > k f(x, θ0),
φ(x) = 0 if f(x, θ1) < k f(x, θ0),
for some k ≥ 0, and E_{θ0} φ(X) = α. Then φ(X) is a UMP size α test;
if k > 0, any other UMP level α test must have size α and can differ from φ only on the set {x : f(x, θ1) = k f(x, θ0)}.
Subhashis Ghoshal, North Carolina State University ST 522: Statistical Theory II
Examples
X ∼ Bin(2, θ), one observation. H0 : θ = 1/2 versus H1 : θ = 3/4. Obtain the UMP level 1/8 test and the UMP level 1/2 test.
X ∼ Exp(θ), H0 : θ = 1 versus H1 : θ = 2.
X ∼ Cauchy(θ), H0 : θ = 0 versus H1 : θ = 1.
X ∼ Unif(0, θ), H0 : θ = 1 versus H1 : θ = 2.
X ∼ Unif(θ, θ + 1), H0 : θ = 0 versus H1 : θ = 2.
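A worked sketch of the binomial example (Python): the likelihood ratio f(x, 3/4)/f(x, 1/2) is increasing in x, so Neyman-Pearson tests reject for large x, with randomization used to hit sizes that are not attainable exactly.

```python
from math import comb

def pmf(x, theta, n=2):
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

# Likelihood ratios f(x, 3/4) / f(x, 1/2) for x = 0, 1, 2
ratios = [pmf(x, 0.75) / pmf(x, 0.5) for x in range(3)]

# Non-randomized sizes under theta = 1/2
size_reject_2 = pmf(2, 0.5)                  # reject iff X = 2: size 1/4
size_reject_12 = pmf(1, 0.5) + pmf(2, 0.5)   # reject iff X in {1, 2}: size 3/4

# Level 1/8 requires randomization: phi(2) = 1/2 gives size 1/8
size_level_eighth = 0.5 * pmf(2, 0.5)
# Level 1/2: phi(2) = 1, phi(1) = 1/2 gives size 1/4 + (1/2)(1/2) = 1/2
size_level_half = pmf(2, 0.5) + 0.5 * pmf(1, 0.5)
```

The computed ratios (1/4, 3/4, 9/4) confirm that the rejection region should favor the largest values of X.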
Sufficient statistic and UMP test
Let T(X) be a sufficient statistic for θ and g(t, θ) the pdf or pmf of T corresponding to θ. Then a UMP level α test φ(T) based on T is given by
φ(t) = 1 if g(t, θ1) > k g(t, θ0),
φ(t) = 0 if g(t, θ1) < k g(t, θ0),
for some k ≥ 0, where α = E_{θ0} φ(T).
Examples
UMP normal test for mean: X1, . . . , Xn iid from N(μ, σ²) with σ² known, H0 : μ = μ0 versus H1 : μ = μ1, where μ1 > μ0.
UMP normal test for variance: X1, . . . , Xn iid from N(μ0, σ²) with σ² unknown. H0 : σ² = σ0² versus H1 : σ² = σ1², where σ1² > σ0².
Comments
Discrete case: suppose θ has only two possible values θ0 or θ1, and X is a discrete variable taking finitely many values a1, . . . , ak with
P_{θi}(X = aj), j = 1, . . . , k; i = 0, 1. Test H0 : θ = θ0 vs H1 : θ = θ1. The rejection region R of the UMP level α test satisfies
max over R of Σ_{aj ∈ R} P_{θ1}(X = aj)
subject to Σ_{aj ∈ R} P_{θ0}(X = aj) ≤ α.
The N-P test is the LRT for H0 : θ = θ0 vs H1 : θ = θ1.
For simple hypotheses, the UMP level α test is unbiased, i.e. β(θ1) > β(θ0) = α.
UMP test for one-sided composite alternative
iid N(θ, 1). H0 : θ = θ0 vs H1 : θ > θ0.
Monotone Likelihood Ratio (MLR)
Definition
A family of pdfs or pmfs {g(t, θ) : θ ∈ Θ} for a univariate random variable T with real-valued parameter θ has a monotone likelihood ratio (MLR) if, for every θ2 > θ1, g(t, θ2)/g(t, θ1) is an increasing function of t on {t : g(t, θ1) > 0 or g(t, θ2) > 0}.
Examples
Normal, Poisson, Binomial all have the MLR property.
If T is from an exponential family with density f(t, θ) = h(t)c(θ)e^{w(θ)t}, then T has an MLR if w(θ) is a nondecreasing function of θ.
If X1, . . . , Xn iid from N(μ, σ²) with σ known, then X̄ has an MLR.
If X1, . . . , Xn iid from N(μ, σ²) with μ known, then Σ_{i=1}^n (Xi − μ)² has an MLR.
iid Unif(0, θ): T = X(n) has the MLR property.
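The MLR property for the binomial can be checked directly (Python sketch): for θ2 > θ1 the ratio g(t, θ2)/g(t, θ1) grows by the constant factor (θ2/θ1)·((1 − θ1)/(1 − θ2)) > 1 with each unit increase in t.

```python
from math import comb

def bin_pmf(t, theta, n=10):
    return comb(n, t) * theta ** t * (1 - theta) ** (n - t)

# MLR check: g(t, theta2)/g(t, theta1) should be increasing in t
theta1, theta2 = 0.3, 0.6
ratios = [bin_pmf(t, theta2) / bin_pmf(t, theta1) for t in range(11)]
is_increasing = all(a < b for a, b in zip(ratios, ratios[1:]))
```

Here the step factor is (0.6/0.3)·(0.7/0.4) = 3.5, so the ratio is strictly increasing in t.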
Stochastically increasing
Definition
A statistic T with family of pdfs {f(t, θ), θ ∈ Θ} is called stochastically increasing in θ if θ1 < θ2 implies that
P_{θ1}(T > c) ≤ P_{θ2}(T > c) for every c, or equivalently, F_{θ2}(c) ≤ F_{θ1}(c), where F_θ is the cdf.
Useful facts
Lemma
If a family T has the MLR property, then it is stochastically increasing in its parameter.
A location family T is stochastically increasing in its location parameter.
Let a test have rejection region R = {T > c}. If T has the MLR property, then the power function β(θ) = P_θ(T ∈ R) = P_θ(T > c) is non-decreasing in θ.
Karlin-Rubin Theorem
Theorem
Let T(X) be a sufficient statistic for θ, and suppose the family {g(t, θ) : θ ∈ Θ} of its pdfs/pmfs has the MLR property. Then:
For testing H0 : θ ≤ θ0 vs H1 : θ > θ0, the UMP level α test rejects H0 if and only if T > t0, where α = Pθ0(T > t0).
For testing H0 : θ ≥ θ0 vs H1 : θ < θ0, the UMP level α test rejects H0 if and only if T < t0, where α = Pθ0(T < t0).
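A sketch of the theorem in the simplest normal case (the numbers and the simulation check are ours): for X1, . . . , Xn iid N(μ, σ²) with σ known, X̄ is sufficient with MLR, so the UMP level α test of H0 : μ ≤ μ0 rejects iff X̄ > t0, where t0 = μ0 + z(1−α) σ/√n makes α = Pμ0(X̄ > t0).

```python
import random
from statistics import NormalDist, fmean

# Sketch: UMP level-alpha test of H0: mu <= mu0 vs H1: mu > mu0 for
# N(mu, sigma^2) with sigma known, via Karlin-Rubin: reject iff the
# sufficient statistic Xbar exceeds t0 = mu0 + z_{1-alpha}*sigma/sqrt(n).

random.seed(0)
mu0, sigma, n, alpha = 0.0, 1.0, 9, 0.05
t0 = mu0 + NormalDist().inv_cdf(1 - alpha) * sigma / n ** 0.5

def reject(sample):
    return fmean(sample) > t0

# Monte Carlo check of the size at the boundary mu = mu0.
reps = 100_000
size = sum(reject([random.gauss(mu0, sigma) for _ in range(n)])
           for _ in range(reps)) / reps
assert abs(size - alpha) < 0.01
```

The simulated rejection rate at μ = μ0 comes out close to α, as the choice of t0 guarantees.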
Examples
Let X1, . . . , Xn be iid from N(μ, σ²), σ² known.
Find the UMP level α test for testing H0 : μ ≤ μ0 vs H1 : μ > μ0.
Find the UMP level α test for testing H0 : μ ≥ μ0 vs H1 : μ < μ0.
Let X1, . . . , Xn be iid from N(μ0, σ²), σ² unknown, μ0 known. Find the UMP level α test for testing H0 : σ² ≤ σ0² vs H1 : σ² > σ0².
Nonexistence of UMP test
For many problems with a two-sided alternative, there is no UMP level α test, because the class of level α tests is so large that no single test dominates all the others in terms of power.
Instead, search for a UMP test within some subset of the class of level α tests, for example, the subset of all unbiased tests.
Example
Let X1, . . . , Xn be iid from N(μ, σ²), σ² known. Consider testing
H0 : μ = μ0 vs H1 : μ ≠ μ0. There is no UMP level α test.
Find the UMP level α test within the class of unbiased tests.
p-value
The choice of α is subjective. Different people may have different tolerance levels α.
If α is small, the decision is conservative.
If α is large, the decision is overly liberal.
If you reject (or accept) H0, is it a strong or a borderline rejection (acceptance)?
p-value (contd.)
Definition
A p-value is the smallest possible level α at which H0 would be rejected.
Note
A p-value is a test statistic, taking values 0 ≤ p(x) ≤ 1 for each sample x.
Small values of p(X) give evidence that H1 is true.
The smaller the p-value, the stronger the evidence for rejecting H0.
Rejecting H0 at level α is equivalent to the p-value being less than α.
p-value for composite null
A p-value p(X) is called valid if, for every θ ∈ Θ0 and every 0 ≤ α ≤ 1, we have Pθ(p(X) ≤ α) ≤ α.
Theorem
Let W(X) be a test statistic such that large values of W give evidence that H1 is true. For each sample point x, define
p(x) = sup_{θ ∈ Θ0} Pθ(W(X) ≥ W(x)).
Then p(X) is a valid p-value.
Examples
Two-sided normal p-value:
Let X1, . . . , Xn be iid from N(μ, σ²), σ² unknown. Consider testing H0 : μ = μ0 versus H1 : μ ≠ μ0, using the LRT statistic W(X) = |X̄ − μ0|/(S/√n).
Let μ0 = 1, n = 16, observed x̄ = 1.5, s² = 1. Do you reject the hypothesis μ = 1 at level 0.05? At level 0.1?
One-sided normal p-value:
In the above example, consider testing H0 : μ ≤ μ0 versus H1 : μ > μ0.
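A numerical answer to the two-sided question above, as a sketch (Monte Carlo stands in for exact t(15) tail probabilities): the observed statistic is W = |1.5 − 1|/(1/√16) = 2.0, and under H0 the statistic has the |t(n−1)| distribution, which is free of μ0 and σ.

```python
import random
from statistics import fmean, stdev

# Sketch: Monte Carlo two-sided p-value for W(X) = |Xbar - mu0|/(S/sqrt(n)).
# Observed W = |1.5 - 1| / (1/sqrt(16)) = 2.0; the null distribution of W
# is |t_{n-1}|, simulated here with standard normal data.

random.seed(0)
n, w_obs = 16, 2.0

def t_stat():
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    return abs(fmean(x)) / (stdev(x) / n ** 0.5)

reps = 100_000
p_value = sum(t_stat() >= w_obs for _ in range(reps)) / reps
assert 0.05 < p_value < 0.10
```

The p-value comes out roughly 0.06, so we reject μ = 1 at level 0.1 but not at level 0.05.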
p-value and sufficient statistic
Sometimes there is a non-trivial statistic S that is sufficient for the null model. Then defining a p-value by conditioning on the sufficient statistic effectively reduces the composite null to a point null:
p(x) = P(W(X) ≥ W(x) | S = S(x)).
Fisher's Exact Test
Let S1 and S2 be independent observations with S1 ∼ Bin(n1, p1) and S2 ∼ Bin(n2, p2). Consider testing H0 : p1 = p2 versus H1 : p1 > p2.
Goal: form an exact (non-asymptotic) level α test.
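A sketch of the conditional construction (the function name and the numbers are ours): given S = S1 + S2 = s, under H0 the count S1 is hypergeometric regardless of the common p, and the conditional p-value for H1 : p1 > p2 sums the upper tail of that pmf.

```python
from math import comb

# Fisher's exact (conditional) p-value for H1: p1 > p2. Under H0, given
# S1 + S2 = s, S1 is Hypergeometric(n1 + n2, n1, s), so the p-value is
# the upper-tail sum of that pmf starting at the observed s1.

def fisher_exact_pvalue(s1, n1, s2, n2):
    s = s1 + s2
    denom = comb(n1 + n2, s)
    hi = min(n1, s)
    return sum(comb(n1, k) * comb(n2, s - k) for k in range(s1, hi + 1)) / denom

# A balanced outcome gives no evidence against H0 ...
assert fisher_exact_pvalue(2, 5, 2, 5) > 0.5
# ... while a lopsided one gives a very small conditional p-value.
assert fisher_exact_pvalue(9, 10, 1, 10) < 0.01
```

Because the conditional distribution is free of the nuisance parameter p, rejecting when this p-value is below α gives an exact level α test.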
Interval Estimation
Interval estimate: (L(X), U(X)).
Confidence coefficient: min_θ Pθ(θ ∈ (L(X), U(X))) = 1 − α.
Method of inversion
One-to-one correspondence between tests and confidence intervals.
Hypothesis testing: fixing the parameter value asks which sample values (the acceptance region) are consistent with that fixed value.
Confidence set: fixing the sample value asks which parameter values make this sample most plausible.
For each θ0 ∈ Θ, let A(θ0) be the acceptance region of a level α test of H0 : θ = θ0. Define the set C(x) = {θ0 : x ∈ A(θ0)}. Then C(x) is a (1 − α)-confidence set.
Example
X1, . . . , Xn iid N(μ, σ²), σ unknown, μ the parameter of interest.
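A sketch of the inversion in the simpler known-σ case (an assumption on our part; the example above has σ unknown and would invert a t-test instead): inverting the level α z-test of H0 : μ = μ0 gives C(x) = {μ0 : |x̄ − μ0| ≤ z(1−α/2) σ/√n}, the usual (1 − α) interval.

```python
import random
from statistics import NormalDist, fmean

# Sketch: invert the two-sided z-test of H0: mu = mu0 (sigma known).
# The acceptance region |xbar - mu0| <= z * sigma/sqrt(n) inverts to the
# interval xbar -/+ z * sigma/sqrt(n) for mu.

random.seed(0)
sigma, n, alpha = 2.0, 25, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)

def ci(sample):
    half = z * sigma / n ** 0.5
    m = fmean(sample)
    return m - half, m + half

# Monte Carlo check of the coverage probability at mu = 1.
mu, reps = 1.0, 50_000
cover = sum(lo <= mu <= hi
            for lo, hi in (ci([random.gauss(mu, sigma) for _ in range(n)])
                           for _ in range(reps))) / reps
assert abs(cover - 0.95) < 0.01
```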
Method of inversion (contd.)
In general, inverting the acceptance region of a two-sided test gives a two-sided interval, and inverting the acceptance region of a one-sided test gives an interval that is open at one end.
Theorem
Let the acceptance region of a two-sided test be of the form A(θ) = {x : c1(θ) ≤ T(x) ≤ c2(θ)}, and let the cutoffs be symmetric, that is, Pθ(T(X) > c2(θ)) = α/2 and Pθ(T(X) < c1(θ)) = α/2.
If T has the MLR property, then both c1(θ) and c2(θ) are increasing in θ.
Examples
X1, . . . , Xn ∼ N(μ, σ²), both unknown.
Upper confidence bound for μ.
Lower confidence bound for μ.
X1, . . . , Xn ∼ Exp(θ). Invert the LRT.
Discrete case: X1, . . . , Xn ∼ Bin(1, θ). Obtain a lower confidence bound.
Pivot
Definition
A random quantity Q(X, θ) is called a pivotal quantity (or a pivot) if the distribution of Q(X, θ) does not depend on θ.
Note that this is different from an ancillary statistic, since Q(X, θ) also depends on θ and hence is not a statistic.
Examples
Location family
Scale family
Location-scale family
iid exponential. Gamma pivot.
Suppose a statistic T has density f(t, θ) = g(Q(t, θ)) |(∂/∂t)Q(t, θ)|. Then Q(T, θ) is a pivot.
Method of pivot
How to construct a confidence set using a pivotal quantity?
Find a, b such that Pθ(a ≤ Q(X, θ) ≤ b) = 1 − α.
Define C(x) = {θ : a ≤ Q(x, θ) ≤ b}.
Then Pθ(θ ∈ C(X)) = Pθ(a ≤ Q(X, θ) ≤ b) = 1 − α.
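A sketch of the three steps for iid exponential data, using our own choice of pivot: Q = nX(1)/θ has a standard Exp(1) distribution, so a and b have closed forms; the Gamma pivot 2∑Xi/θ from the examples works the same way but would need chi-square quantiles.

```python
import math
import random

# Sketch: pivot-based CI for the mean theta of iid Exp data. The minimum
# satisfies n*X_(1)/theta ~ Exp(1), so the Exp(1) quantiles give a, b:
#   a = -log(1 - alpha/2),  b = -log(alpha/2),
# and inverting a <= n*X_(1)/theta <= b yields [n*X_(1)/b, n*X_(1)/a].

random.seed(0)
n, alpha, theta = 5, 0.05, 2.0
a = -math.log(1 - alpha / 2)
b = -math.log(alpha / 2)

def ci(sample):
    q = len(sample) * min(sample)   # numerator of the pivot
    return q / b, q / a             # invert a <= n*X_(1)/theta <= b

# Monte Carlo check of the (exact) 95% coverage.
reps = 50_000
cover = sum(lo <= theta <= hi
            for lo, hi in (ci([random.expovariate(1 / theta) for _ in range(n)])
                           for _ in range(reps))) / reps
assert abs(cover - 0.95) < 0.01
```

This interval is exact for every n because the pivot's distribution is known exactly, not just asymptotically.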
Method of pivot (contd.)
When will C(x) be an interval?
If Q(x, θ) is monotone in θ, then C(x) is an interval.
Examples:
iid exponential.
iid N(μ, σ²), σ known. Interval for μ.
iid N(μ, σ²), σ unknown. Interval for μ.
iid N(μ, σ²), μ known. Interval for σ.
iid N(μ, σ²), μ unknown. Interval for σ.
Method of pivot (contd.)
If F(t, θ) is decreasing in θ for all t, define θL, θU by F(t, θL) = 1 − α2, F(t, θU) = α1, where α1 + α2 = α. Then [θL(T), θU(T)] is a (1 − α) CI for θ.
Similarly, if F(t, θ) is increasing in θ for all t, define θL, θU by F(t, θL) = α2, F(t, θU) = 1 − α1, where α1 + α2 = α. Then [θL(T), θU(T)] is a (1 − α) CI for θ.
Examples:
iid from f(x, θ) = e^{−(x−θ)} I(x > θ); X(1) is sufficient.
A (1 − α) CI is not unique. Among the many choices, we want to minimize the expected length.
iid N(μ, σ²), σ known.
iid N(μ, σ²), σ unknown.
iid exponential.
Asymptotic Evaluation
X1, . . . , Xn i.i.d. f(x, θ), n large. Mathematically, n → ∞.
The assumption n → ∞ makes life easier. Dependence of optimality on models or loss functions becomes less pronounced.
Because limit theorems become available, distributions can be found approximately. Limiting distributions are much simpler than the actual distributions.
Convergence in probability
Definition
We say that Yn →p c (Yn converges in probability to the constant c) if P(|Yn − c| > ε) → 0 as n → ∞ for all ε > 0.
Usual calculus rules apply to convergence in probability.
A possible method of showing convergence in probability is Chebyshev's inequality: P(|Yn − c| > ε) ≤ ε⁻² E(Yn − c)² = ε⁻² [var(Yn) + (E(Yn) − c)²], so it is enough to show that the right-hand side goes to 0.
If Yn = X̄n, then X̄n →p E(X) by the law of large numbers.
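A small simulation (ours, not from the slides) illustrating the last point: for Exp(1) data with E(X) = 1, the probability P(|X̄n − 1| > ε) shrinks as n grows.

```python
import random
from statistics import fmean

# Sketch: Xbar_n ->p E(X) for Exp(1) data, where E(X) = 1. The Monte Carlo
# estimate of P(|Xbar_n - 1| > eps) should drop as n increases.

random.seed(0)
eps, reps = 0.2, 5_000

def tail_prob(n):
    return sum(abs(fmean([random.expovariate(1.0) for _ in range(n)]) - 1) > eps
               for _ in range(reps)) / reps

p10, p1000 = tail_prob(10), tail_prob(1000)
assert p1000 < p10    # the tail probability decreases with n
assert p1000 < 0.01   # and is essentially zero for large n
```

Chebyshev's inequality predicts the same: here var(X̄n) = 1/n, so the tail probability is at most 1/(nε²).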
Convergence in distribution
Definition
If Yn is a sequence of random variables and F is a continuous cdf, we say that Yn converges in distribution to F if P(Yn ≤ x) → F(x) for all x. We also say that Yn →d Y, where Y is a random variable having cdf F.
The central limit theorem states that √n(X̄n − E(X)) converges in distribution to N(0, var(X)), i.e.,
P(√n(X̄n − E(X))/√var(X) ≤ x) → Φ(x)
for all x, where Φ stands for the standard normal cdf.
Another important result is Slutsky's theorem: If Yn →d Y and Zn →p c, then Yn + Zn →d Y + c, YnZn →d cY, and Yn/Zn →d Y/c if c ≠ 0.
Consistency
Definition
Let Wn = Wn(X1, . . . , Xn) be a sequence of estimators for τ(θ). We say that Wn is consistent for estimating τ(θ) if Wn →p τ(θ) under Pθ for all θ.
Theorem
If Eθ(Wn) → τ(θ) (in which case Wn is called asymptotically unbiased for τ(θ)) and varθ(Wn) → 0 for all θ, then Wn is consistent for τ(θ).
Examples
If X1, . . . , Xn are i.i.d. f with E(X) = μ and var(X) = σ², then X̄n is consistent for μ and S²n = ∑_{i=1}^n (Xi − X̄n)²/(n − 1) is consistent for σ².
∑_{i=1}^n (Xi − X̄n)²/n is consistent for σ² too.
(Invariance principle of consistency): If Wn is consistent for θ and g is a continuous function, then g(Wn) is consistent for g(θ).
The method of moments estimator is generally consistent.
The UMVUE is consistent: Let X1, . . . , Xn be i.i.d. f(x, θ) and let Wn be the UMVUE of τ(θ). Then Wn is consistent for τ(θ).
Consistency of the MLE: Let X1, . . . , Xn be i.i.d. f(x, θ), a parametric family satisfying some regularity conditions. Then the MLE θ̂n is consistent for θ.
Delta method
Theorem
If Tn is AN(θ, σ²(θ)/n), then g(Tn) is AN(g(θ), (g′(θ))² σ²(θ)/n).
A multivariate version is also true.
Combining the CLT and the delta method gives the asymptotic normality of many statistics of interest.
Efficiency
How do we distinguish between consistent estimators?
Suppose the estimators are asymptotically normal with the same asymptotic mean. Then we can compare their asymptotic variances.
Often one variance is smaller than another throughout.
If there is a lower bound, and that lower bound is attained, then the estimator attaining it is called asymptotically efficient. Clearly such an estimator is impossible to beat asymptotically: it is the best.
Efficiency bound
Cramer-Rao bound for the MSE of Tn in estimating τ(θ):
(τ′(θ) + b′n(θ))² / (nI(θ)),
where I(θ) is the Fisher information and bn(θ) the bias.
So if b′n(θ) → 0, then the bound for the asymptotic variance should be (τ′(θ))²/I(θ).
In particular, if τ(θ) = θ, the bound for the asymptotic variance is 1/I(θ).
Strictly speaking, this bound is not always valid, although it is nearly correct.
We can then define an estimator to be asymptotically efficient if its asymptotic variance is 1/I(θ).
Attaining efficiency bound
Theorem
The MLE θ̂n is AN(θ, 1/(nI(θ))).
More generally, τ(θ̂n) is AN(τ(θ), (τ′(θ))²/(nI(θ))).
The MLE is not the only possible asymptotically efficient estimator.
Any Bayes estimator is asymptotically efficient.
Method of moments estimators are asymptotically normal, but need not be asymptotically efficient.
Define the asymptotic efficiency of θ̂n ∼ AN(θ, v(θ)/n) by I⁻¹(θ)/v(θ).
Examples
Cauchy
Logistic
Mean versus median
Asymptotic distribution of likelihood ratio statistic
Theorem (Point null case)
Let X1, . . . , Xn be i.i.d. f(x|θ) and let λn(X) be the likelihood ratio for testing H0 : θ = θ0 vs H1 : θ ≠ θ0, where θ is d-dimensional. Then
−2 log λn(X) →d χ²_d.
Example: Poisson
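A sketch of the Poisson example (the numbers are ours): for H0 : λ = λ0 the MLE is λ̂ = X̄, so −2 log λn = 2n[X̄ log(X̄/λ0) − X̄ + λ0], which should behave like χ²1 under H0; in particular P(−2 log λn > 3.841) should be close to 0.05.

```python
import math
import random

# Sketch: simulate -2 log(lambda_n) for the Poisson point null and check
# that its upper tail matches the chi-squared(1) 95th percentile (3.841).

random.seed(0)
lam0, n, reps = 5.0, 100, 10_000

def poisson(lam):
    # Knuth's multiplicative method; fine for moderate lam.
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def lrt_stat():
    xbar = sum(poisson(lam0) for _ in range(n)) / n
    return 2 * n * (xbar * math.log(xbar / lam0) - xbar + lam0)

rate = sum(lrt_stat() > 3.841 for _ in range(reps)) / reps
assert abs(rate - 0.05) < 0.02
```

The simulated rejection rate under H0 is close to 0.05, matching the χ²1 approximation of the theorem.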
Asymptotic distribution of likelihood ratio statistic
Theorem (General case)
Let X