ST 522 Slides


Transcript of ST 522 Slides


    ST 522: Statistical Theory II

Subhashis Ghoshal, North Carolina State University


    Useful Results from Calculus

    We recapitulate some facts from calculus we need throughout.

Theorem (Binomial theorem)

$(a+b)^n = \binom{n}{0}a^n b^0 + \binom{n}{1}a^{n-1}b^1 + \cdots + \binom{n}{n-1}a^1 b^{n-1} + \binom{n}{n}a^0 b^n.$


    Common infinite series

Geometric series

$a + ar + \cdots + ar^{n-1} = a\,\frac{r^n - 1}{r - 1} = a\,\frac{1 - r^n}{1 - r}, \quad r \neq 1.$

Infinite geometric series

$a + ar + ar^2 + \cdots = a\,\frac{1}{1-r}, \quad |r| < 1.$
$(1 - x)^{-1} = 1 + x + x^2 + \cdots, \quad |x| < 1.$
$(1 + x)^{-1} = 1 - x + x^2 - x^3 + \cdots, \quad |x| < 1.$


    Common infinite series (contd.)

Infinite binomial series

$(1 - x)^{-2} = 1 + 2x + 3x^2 + 4x^3 + \cdots, \quad |x| < 1,$
$(1 - x)^{-r} = 1 + \sum_{n=1}^{\infty} \binom{r+n-1}{n} x^n, \quad |x| < 1,$

where for any real number $\alpha$, $\binom{\alpha}{n} = \alpha(\alpha - 1)\cdots(\alpha - n + 1)/n!$, the generalized binomial coefficient. In particular, $\binom{r+n-1}{n} = r(r+1)\cdots(r+n-1)/n!$. Also note that for $\alpha > 0$, $\binom{-\alpha}{r} = (-1)^r \alpha(\alpha+1)\cdots(\alpha+r-1)/r!$.

Exponential series

$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \cdots$

Logarithmic series

$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots, \quad |x| < 1.$


    Useful limits

$\lim_{n\to\infty}(1 + 1/n)^n = e.$
$\lim_{n\to\infty}(1 + a_n/n)^n = e^a$ whenever $a_n \to a$.
$\lim_{x\to 0}(1 + ax)^{1/x} = e^a.$
$\lim_{x\to 0} \frac{\log(1+x)}{x} = 1.$
$\lim_{x\to 0} \frac{\sin x}{x} = 1.$


    Derivatives

$\frac{d}{dx}\, x^n = nx^{n-1}.$
$\frac{d}{dx}\, e^{ax} = ae^{ax}.$
$\frac{d}{dx}\, a^x = a^x \log a.$
$\frac{d}{dx}\, \log x = 1/x.$
$\frac{d}{dx}\, \sin x = \cos x.$
$\frac{d}{dx}\, \cos x = -\sin x.$
$\frac{d}{dx}\, \tan x = 1 + \tan^2 x.$
$\frac{d}{dx}\, \sin^{-1} x = 1/\sqrt{1 - x^2}.$
$\frac{d}{dx}\, \tan^{-1} x = \frac{1}{1+x^2}.$
$\frac{d}{dx}(af(x) + bg(x)) = af'(x) + bg'(x).$
$\frac{d}{dx}\, f(x)g(x) = f'(x)g(x) + f(x)g'(x).$
$\frac{d}{dx}(f(x)/g(x)) = \frac{f'(x)g(x) - f(x)g'(x)}{g^2(x)}.$
$\frac{d}{dx}\, f(g(x)) = f'(g(x))g'(x).$


    Integration

$\int x^n\,dx = \frac{x^{n+1}}{n+1}, \quad n \neq -1.$
$\int x^{-1}\,dx = \log x.$
$\int e^{ax}\,dx = e^{ax}/a, \quad a \neq 0.$
$\int \frac{f'(x)}{f(x)}\,dx = \log f(x).$

Integration by substitution

$\int g(f(x))f'(x)\,dx = \int g(y)\,dy, \quad y = f(x).$

Integration by parts

$\int u(x)v(x)\,dx = u(x)V(x) - \int V(x)u'(x)\,dx,$

where $V(x) = \int v(x)\,dx$; $u(x)$ is called the first function and $v(x)$ the second.


    Integration (contd.)

Integration by partial fractions

Applies when integrating the ratio of two polynomials $P(x)$ and $Q(x)$, where (without loss of generality) the degree of $P$ is less than the degree of $Q$. Factorize $Q(x)$ into linear and quadratic factors. The ratio can then be written uniquely as a linear combination of reciprocals of the linear factors and of linear-over-quadratic terms. The resulting expression can be integrated term by term. Consult any standard calculus text such as Apostol.

Definite integral

$\int_a^b f(x)\,dx = F(x)\Big|_a^b = F(b) - F(a),$

where $F(x) = \int f(x)\,dx$.


    Order Statistics

Given a random sample, we are interested in the smallest, largest, or middle observations:

the highest flood waters
the lowest winter temperature recorded in the last 50 years
the median price of houses sold last month
the median salary of NBA players

Definition: Given a random sample $X_1, \ldots, X_n$, the sample order statistics are the sample values placed in ascending order,

$X_{(1)} = \min_{1\le i\le n} X_i$, $X_{(2)}$ = second smallest $X_i$, ..., $X_{(n)} = \max_{1\le i\le n} X_i$.

Example: Suppose four numbers are observed as a sample of size 4. The sample values are $x_1 = 6$, $x_2 = 9$, $x_3 = 3$, $x_4 = 8$. What are the order statistics?


    Order Statistics (contd.)

Order statistics are random variables themselves (as functions of a random sample).

Order statistics satisfy $X_{(1)} \le \cdots \le X_{(n)}$.

Though the samples $X_1, \ldots, X_n$ are independently and identically distributed, the order statistics $X_{(1)}, \ldots, X_{(n)}$ are never independent because of the order restriction.

We will study their marginal distributions and joint distributions.


    Order Statistics - Marginal distributions

Assume $X_1, \ldots, X_n$ are from a continuous population with cdf $F(x)$ and pdf $f(x)$.

The $n$th order statistic, or the sample maximum, $X_{(n)}$ has the pdf

$f_{X_{(n)}}(x) = n[F(x)]^{n-1} f(x).$

The first order statistic, or the sample minimum, $X_{(1)}$ has the pdf

$f_{X_{(1)}}(x) = n[1 - F(x)]^{n-1} f(x).$

More generally, the $j$th order statistic $X_{(j)}$ has the pdf

$f_{X_{(j)}}(x) = \frac{n!}{(j-1)!(n-j)!}\, f(x) [F(x)]^{j-1} [1 - F(x)]^{n-j}.$


Order Statistics - Joint distributions

For $1 \le i < j \le n$, the joint pdf of $X_{(i)}$ and $X_{(j)}$ is

$f_{X_{(i)},X_{(j)}}(u, v) = \frac{n!}{(i-1)!(j-i-1)!(n-j)!}\, f(u) f(v) [F(u)]^{i-1} [F(v) - F(u)]^{j-i-1} [1 - F(v)]^{n-j}$

if $-\infty < u < v < \infty$; $= 0$ otherwise.

Special case: the joint pdf of $X_{(1)}$ and $X_{(n)}$.

The joint pdf of $X_{(1)}, \ldots, X_{(n)}$ is

$f_{X_{(1)},\ldots,X_{(n)}}(u_1, \ldots, u_n) = n!\, f(u_1)\cdots f(u_n)\, 1\{-\infty < u_1 < \cdots < u_n < \infty\}.$


    Illustration

Example: $X_1, \ldots, X_n$ are iid from unif$[0, 1]$.

Show that $X_{(j)} \sim \mathrm{Beta}(j, n + 1 - j)$.
Compute $E[X_{(j)}]$ and $\mathrm{Var}[X_{(j)}]$.
The joint pdf of $X_{(1)}$ and $X_{(n)}$.
Let $n = 5$. Derive the joint pdf of $X_{(2)}$ and $X_{(4)}$.
$X_{(1)} \mid X_{(n)} \sim X_{(n)}\,\mathrm{Beta}(1, n - 1)$. For any $i < j$, $X_{(i)} \mid X_{(j)} \sim X_{(j)}\,\mathrm{Beta}(i, j - i)$.
Let $n = 5$. Derive the joint pdf of $X_{(1)}, \ldots, X_{(5)}$.
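
The Beta distribution claim above is easy to check by simulation. Below is a minimal sketch (not from the slides) comparing the empirical distribution of $X_{(j)}$ for iid Unif(0,1) samples against $\mathrm{Beta}(j, n+1-j)$; the sample size, seed, and the Kolmogorov-Smirnov check are illustrative choices.

```python
# Monte Carlo check that the j-th order statistic of n iid Unif(0,1)
# draws follows Beta(j, n+1-j).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, j, reps = 5, 2, 100_000

# j-th order statistic of each replicated sample (1-based j -> index j-1)
xj = np.sort(rng.uniform(size=(reps, n)), axis=1)[:, j - 1]

print("empirical mean:", xj.mean())        # should be near j/(n+1) = 1/3
print("theoretical mean:", j / (n + 1))
# Kolmogorov-Smirnov test against Beta(j, n+1-j)
print(stats.kstest(xj, stats.beta(j, n + 1 - j).cdf))
```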


    Example

Compute $P(X_{(1)} > 1, X_{(n)} \le 2)$.

In general,

$P(X_{(1)} > x, X_{(n)} \le y) = \prod_{i=1}^{n} P(x < X_i \le y) = [F(y) - F(x)]^n.$


    Common statistics based on order statistics

sample range: $R = X_{(n)} - X_{(1)}$
sample midrange: $V = (X_{(n)} + X_{(1)})/2$
sample median: $M = X_{((n+1)/2)}$ if $n$ is odd; $M = (X_{(n/2)} + X_{(n/2+1)})/2$ if $n$ is even.
sample percentile: For any $0 < p < 1$, the $(100p)$th sample percentile is the observation such that about $np$ of the observations are less than this observation and $n(1 - p)$ of the observations are larger.

The sample median $M$ is the 50th sample percentile (the second sample quartile).
Denote by $Q_1$ the 25th sample percentile (the first sample quartile).
Denote by $Q_3$ the 75th sample percentile (the third sample quartile).
The interquartile range IQR $= Q_3 - Q_1$ describes the spread about the median.
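
As a quick illustration, the sketch below computes these statistics for the four-observation sample from the earlier example. Note that numpy's percentile function interpolates between order statistics, which can differ slightly from the textbook definition of sample quartiles.

```python
# Minimal sketch computing the statistics defined above for one sample.
import numpy as np

x = np.array([6, 9, 3, 8])          # the sample from the earlier example
xs = np.sort(x)                      # order statistics x_(1) <= ... <= x_(n)

rng_ = xs[-1] - xs[0]                # sample range R = X_(n) - X_(1)
midrange = (xs[-1] + xs[0]) / 2      # sample midrange V
median = np.median(x)                # averages the two middle values (n even)
q1, q3 = np.percentile(x, [25, 75])  # numpy interpolates between order stats
print(rng_, midrange, median, q3 - q1)  # IQR = Q3 - Q1
```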


    Remarks

    Sample Mean vs Sample Median

    Sample Median vs Population Median


    Principles of data reduction

Data $X = (X_1, \ldots, X_n)$: probability distribution $P$ completely or partially unknown.
The distribution is often modeled by standard ones such as Poisson or normal.
A few parameters control the distribution of the data: $P = P_\theta$.
Parameter $\theta$: unknown, the object of interest.
Inference: any conclusion about parameter values based on data.
Three main inference problems: point estimation, hypothesis testing, interval estimation.
Statistic $T = T(X)$: any function of the data; a summary measure of the data.
Statistics may be used as point estimators, test statistics, and upper and lower confidence limits.


    Inductive reasoning

Role of probability theory: the extent of randomness of $T$ is controlled by $\theta$. Probabilistic characteristics such as expectation, variance, moments, and the distribution involve $\theta$.
Conversely, the value of $T$ reflects knowledge about $\theta$. For instance, if $T$ has expectation $\theta$ and $\theta$ is unknown, then $\theta$ can be estimated by $T$. Intuitively, if we observe a large value of $T$, we tend to conclude that $\theta$ must be large.
We need to assess the extent of the error.
Frequentist approach: randomness of the error means we need to judge based on the average error over repeated sampling. Thus we need to study the sampling distribution of $T$.


    Sufficiency

As $T$ summarizes the data $X$, the first natural question is whether there is any loss of information due to summarization.
The data contain many pieces of information; some are relevant for $\theta$ and some are not.
Dropping irrelevant information is desirable, but dropping relevant information is undesirable.
How do we compare the amount of information about $\theta$ in the data and in $T$? Is it sufficient to consider only the reduced data $T$?

Definition (Sufficient statistic)

A statistic $T$ is called sufficient if the conditional distribution of $X$ given $T$ is free of $\theta$ (that is, the conditional is a completely known distribution).


    Example

Toss a coin 100 times. The probability of head $p$ is unknown. $T$ = number of heads obtained.
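
A direct way to see that $T$ is sufficient here is to compute the conditional distribution of the toss sequence given $T = t$ and observe that it does not involve $p$. The sketch below does this by enumeration, with $n = 3$ tosses instead of 100 purely to keep the output small.

```python
# Sketch: the conditional probability of each coin-toss sequence given
# T = t is 1 / C(n, t), free of p -- exactly what sufficiency of T means.
from itertools import product
from math import comb

def check(p, n=3):
    for t in range(n + 1):
        seqs = [s for s in product([0, 1], repeat=n) if sum(s) == t]
        # P(sequence) = p^t (1-p)^(n-t); P(T = t) = C(n,t) p^t (1-p)^(n-t)
        cond = [(p**t * (1 - p)**(n - t))
                / (comb(n, t) * p**t * (1 - p)**(n - t)) for s in seqs]
        print(t, cond)  # every entry is 1/C(n,t), no matter what p is

check(0.3)
check(0.9)  # same conditional probabilities
```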


    Sufficiency principle

If $T$ is sufficient, the extra information carried by $X$ is worthless as far as $\theta$ is concerned. It is then only natural to consider inference procedures which do not use this extra irrelevant information. This leads to the principle of sufficiency.

Definition (Sufficiency principle)

Any inference procedure should depend on the data only through a sufficient statistic.


    How to check sufficiency?

    Theorem (Neyman-Fisher Factorization theorem)

$T$ is sufficient iff $f(x; \theta)$ can be written as the product $g(T(x); \theta)\,h(x)$, where the first factor depends on $x$ only through $T(x)$ and the second factor is free of $\theta$.


    Example

$X_1, \ldots, X_n$ iid

$N(\theta, 1)$.
$\mathrm{Bin}(1, \theta)$.
$\mathrm{Poi}(\theta)$.
$N(\mu, \sigma^2)$, $\theta = (\mu, \sigma)$.
$\mathrm{Ga}(\alpha, \beta)$, $\theta = (\alpha, \beta)$. (Includes the exponential.)
$U(0, \theta)$, where the range of $X$ depends on $\theta$.


Exponential family

$f(x; \theta) = c(\theta) h(x) \exp\Big[\sum_{j=1}^{k} w_j(\theta) t_j(x)\Big], \quad \theta = (\theta_1, \ldots, \theta_d), \; d \le k.$

Theorem

Let $X_1, \ldots, X_n$ be iid observations from the above exponential family. Then $T(X) = (\sum_{i=1}^n t_1(X_i), \ldots, \sum_{i=1}^n t_k(X_i))$ is sufficient for $\theta = (\theta_1, \ldots, \theta_d)$.


Applications

$\mathrm{beta}(\alpha, \beta)$.
Curved exponential family: $N(\theta, \theta^2)$.
Old examples revisited: binomial, Poisson, normal, exponential, gamma (except uniform). Exercise.


More applications

Discrete uniform: $P(X = x) = 1/\theta$, $x = 1, \ldots, \theta$, $\theta$ a positive integer.
$f(x, \theta) = e^{-(x-\theta)}$, $x > \theta$.
A universal example: iid with density $f$. The vector of order statistics $T = (X_{(1)}, \ldots, X_{(n)})$ is sufficient.


Remarks

In the order statistics example, the dimension of $T$ is the same as the dimension of the data. Still, this is a nontrivial reduction, as $n!$ different values of the data correspond to one value of $T$.
Often one finds better reductions for specific parametric families, as seen in the many examples before.
Trivially, $X$ is always sufficient for itself, but there is no gain.
When one statistic is a mathematical function of the other and vice versa (i.e., there is a one-to-one correspondence), then they carry exactly the same amount of information, so they are equivalent.
More generally, if $T$ is sufficient for $\theta$ and $T = c(U)$, a mathematical function of some other statistic $U$, then $U$ is also sufficient.


Examples of insufficiency

$X_1, X_2$ iid $\mathrm{Poi}(\theta)$. $T = X_1 - X_2$ is not sufficient.
$X_1, \ldots, X_n$ iid with pmf $f(x; \theta)$. $T = (X_1, \ldots, X_{n-1})$ is not sufficient.


Minimal sufficiency

Maximum possible reduction.

Definition (Minimal sufficient statistic)

$T$ is a minimal sufficient statistic if, given any other sufficient statistic $T'$, there is a function $c(\cdot)$ such that $T = c(T')$. Equivalently, $T$ is minimal sufficient if, given any other sufficient statistic $T'$, whenever $x$ and $y$ are two data values such that $T'(x) = T'(y)$, then $T(x) = T(y)$.


Checking minimal sufficiency

Theorem (Lehmann-Scheffé Theorem)

A statistic $T$ is minimal sufficient if the following property holds: for any two sample points $x$ and $y$, $f(x; \theta)/f(y; \theta)$ does not depend on $\theta$ if and only if $T(x) = T(y)$.

Corollary

A minimal sufficient statistic is not unique. But any two are in one-to-one correspondence, so they are equivalent.


Examples

iid $N(\mu, \sigma^2)$.
iid $U(\theta, \theta + 1)$.
iid Cauchy$(\theta)$.
iid $U(-\theta, \theta)$.


Minimal sufficiency in exponential family

Theorem

For iid observations from an exponential family

$f(x; \theta) = c(\theta) h(x) \exp\Big[\sum_{j=1}^{k} w_j(\theta) t_j(x)\Big],$

such that no affine (linear plus constant) relationship exists between $w_1(\theta), \ldots, w_k(\theta)$, the statistic $T(X) = (\sum_{i=1}^n t_1(X_i), \ldots, \sum_{i=1}^n t_k(X_i))$ is minimal sufficient for $\theta = (\theta_1, \ldots, \theta_d)$.


Examples

$N(\mu, \sigma^2)$.
$\mathrm{Ga}(\alpha, \beta)$.
$\mathrm{Be}(\alpha, \beta)$.
$N(\theta, \theta^2)$.
$\mathrm{Be}(\theta, 1 - \theta)$, $0 < \theta < 1$.


Ancillary statistic

Definition

A statistic $T$ is called ancillary if its distribution does not depend on the parameter.

The induced family is a singleton, completely known, and contains no information about $\theta$: the opposite of sufficiency.
A function of an ancillary statistic is ancillary.


Examples

iid $U(\theta, \theta + 1)$.
Location family: iid $f(x - \mu)$.
Scale family: iid $\sigma^{-1} f(x/\sigma)$.
iid $N(\theta, 1)$.
$X_1, X_2$ iid $N(0, \sigma^2)$.
$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$: $T = ((X_1 - \bar{X})/S, \ldots, (X_n - \bar{X})/S)$, where $S$ is the sample standard deviation, is ancillary.


Results

Location family $f(x - \mu)$: if $T$ is a location invariant statistic, i.e., $T(x_1 + b, \ldots, x_n + b) = T(x_1, \ldots, x_n)$, then $T$ is ancillary. In particular, the sample sd $S$ is ancillary (and so are other estimates of scale).
Location-scale family $\sigma^{-1} f((x - \mu)/\sigma)$: if $T$ is a location-scale invariant statistic, i.e., $T(ax_1 + b, \ldots, ax_n + b) = T(x_1, \ldots, x_n)$, then $T$ is ancillary. If $T_1$ and $T_2$ are such that $T_1(ax_1 + b, \ldots, ax_n + b) = aT_1(x_1, \ldots, x_n)$ and $T_2(ax_1 + b, \ldots, ax_n + b) = aT_2(x_1, \ldots, x_n)$, then $T_1/T_2$ is ancillary.


Question: An ancillary statistic does not contain any information about $\theta$. Then why do we study it? It indicates how good the given sample is.

Example: $X_1, \ldots, X_n$ iid $U(\theta - 1, \theta + 1)$. $\theta$ is estimated by the midrange $(X_{(1)} + X_{(n)})/2$. The range $R = X_{(n)} - X_{(1)}$ is ancillary.


Question: Can addition or removal of ancillary information change the information content about $\theta$? Intuitively, one may think that an ancillary statistic contains no information about $\theta$, so it should not change the information content. But this interpretation is false.

$U(\theta, \theta + 1)$.
A more dramatic example: $(X, Y) \sim \mathrm{BVN}(0, 0, 1, 1, \rho)$.


Completeness

Let a parametric family $\{f(x, \theta), \theta \in \Theta\}$ be given. Let $T$ be a statistic, with induced family of distributions $f^T(t, \theta)$, $\theta \in \Theta$.

Definition

A statistic $T$ is called complete (for the family $\{f(x, \theta), \theta \in \Theta\}$), or equivalently the induced family $f^T(t, \theta)$, $\theta \in \Theta$, is called complete, if $E_\theta(g(T)) = 0$ for all $\theta$ implies $g(T) = 0$ a.s. $P_\theta$ for all $\theta$.

In other words, no non-constant function of $T$ can have constant expectation (in $\theta$).
Completeness depends not only on the statistic, but also on the family. For instance, no nontrivial statistic is complete if the family is a singleton.
In order to find optimal estimators and tests, one sometimes needs to find complete sufficient statistics.


Examples

$X \sim \mathrm{bin}(n, \theta)$, $0 < \theta < 1$.
$X \sim \mathrm{Poi}(\lambda)$, $0 < \lambda$.
$X \sim N(\theta, 1)$, $-\infty < \theta < \infty$.


Theorem

Let $X_1, \ldots, X_n$ be iid observations from the above exponential family. Then $T(X) = (\sum_{i=1}^n t_1(X_i), \ldots, \sum_{i=1}^n t_k(X_i))$ is complete if the parameter space contains an open set in $\mathbb{R}^k$ (i.e., $d = k$).


A non-exponential example: iid $U(0, \theta)$, $T = X_{(n)}$.


Useful facts

If $T$ is complete and $S = \psi(T)$ is a function of $T$, then $S$ is also complete.
The constant statistic is complete for any family.
A non-constant ancillary statistic cannot be complete.
A statistic is called first order ancillary if its expectation is free of $\theta$. If a non-constant function of a statistic $T$ is first order ancillary, then $T$ cannot be complete.


Connection with minimal sufficiency

Theorem

If $T$ is complete and sufficient, and a minimal sufficient statistic exists, then $T$ is also minimal sufficient.

As a consequence, in the search for complete sufficient statistics, it is enough to check completeness of a minimal sufficient statistic (if one exists and is easily found). This implies no complete sufficient statistic exists for the $U(\theta, \theta + 1)$ family, or the Cauchy$(\theta)$ family.


Basu's theorem

A complete sufficient statistic $T$ carries all relevant information about $\theta$. An ancillary statistic $S$ carries no information about $\theta$. The following remarkable result shows that they are statistically independent.

Theorem (Basu's theorem)

A complete sufficient statistic is independent of all ancillary statistics.

Completeness cannot be dropped, even if $T$ is minimal sufficient: iid $U(\theta, \theta + 1)$.


Applications

iid exponential. Then $T = \sum_{i=1}^n X_i$ and $(W_1, \ldots, W_n)$ are independent, where $W_j = X_j/T$. Also calculate $E(W_j)$.
iid normal. $T = \bar{X}$ and the sample standard deviation $S$ are independent.
iid $U(0, \theta)$. Then $X_{(n)}$ and $X_{(1)}/X_{(n)}$ are independent. Also calculate $E(X_{(1)}/X_{(n)})$.
iid $\mathrm{Ga}(\alpha, \beta)$, $\alpha > 0$ known. Let $U = (\prod_{i=1}^n X_i)^{1/n}$. Then $U/\bar{X}$ is ancillary, independent of $\bar{X}$. Also $E[(U/\bar{X})^k] = E(U^k)/E(\bar{X}^k)$.


Likelihood

$X \sim f(\cdot, \theta)$, a pmf or pdf. $X = x$ is observed.

Definition

The likelihood function is a function of the parameter with an observed sample, given by $L(\theta|x) = f(x, \theta)$. It is the same expression, but now $x$ is fixed and $\theta$ is the variable.


Examples

Binomial experiment: decide to stop after 10 trials; 3 successes obtained.
Negative binomial experiment: decide to stop after 3 successes; 10 trials were needed.


Likelihood can be viewed as the degree of plausibility. An estimate of $\theta$ may be obtained by choosing the most plausible value, i.e., the value where the likelihood function is maximized. This leads to one of the most important methods of estimation: the maximum likelihood estimator (more details in Chapter 7). For instance, in either example above, the likelihood function is maximized at 0.3.
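
The point that both experiments give the same maximizer can be seen numerically: the two likelihoods differ only in the constants $\binom{10}{3}$ and $\binom{9}{2}$, so both are proportional to $p^3(1-p)^7$. A minimal sketch:

```python
# Both likelihoods from the previous slide are proportional to
# p^3 (1-p)^7, so both are maximized at p = 3/10 = 0.3.
import numpy as np

p = np.linspace(0.001, 0.999, 999)
binom_lik = p**3 * (1 - p)**7    # binomial: C(10,3) constant dropped
negbin_lik = p**3 * (1 - p)**7   # neg. binomial: C(9,2) dropped, same shape

print("argmax (binomial):", p[np.argmax(binom_lik)])
print("argmax (neg. binomial):", p[np.argmax(negbin_lik)])  # both 0.3
```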


More examples

iid Poisson$(\lambda)$
iid $N(\mu, \sigma^2)$
iid $U(0, \theta)$
Exponential family


Bayesian approach

Suppose that $\theta$ can be considered as a random quantity with some marginal distribution $\pi(\theta)$, a pre-experiment assessment called the prior distribution. Then we can legitimately calculate the posterior distribution of $\theta$ given the data by the Bayes theorem. This posterior distribution will be the source of any inference about $\theta$.

Theorem (Bayes theorem)

$\pi(\theta|X) = \frac{\pi(\theta) f(X, \theta)}{\int \pi(t) f(X, t)\,dt}.$


Examples

iid $\mathrm{Bin}(1, \theta)$, prior $U(0, 1)$.
iid $\mathrm{Poi}(\theta)$, prior standard exponential.


Difficulty: $\theta$ is fixed, nonrandom. How does one specify a prior?

The Bayesian's response:

Probability is a quantification of uncertainty of any type.
The arbitrariness of the prior choice can be rectified to some extent by the use of automatic priors which are non-informative. (More later.)


Point Estimation

Find estimators for the unknown parameter $\theta$ or its function $\tau(\theta)$.
Evaluate your estimators (are they good?)

Definition

A point estimator of $\theta$ is a function $\hat\theta = W(X_1, \ldots, X_n)$. Given a sample of realized observations, the number $W(x_1, \ldots, x_n)$ is called a point estimate of $\theta$.

Methods of point estimation

    method of moments

    maximum likelihood estimator (MLE)

    Bayes estimators


Method of Moments

Let $X_1, \ldots, X_n$ be a sample from a population with pdf or pmf $f(x|\theta_1, \ldots, \theta_k)$. Estimate $\theta = (\theta_1, \ldots, \theta_k)$ by solving the $k$ equations formed by matching the first $k$ sample and population raw moments:

$m_1 = \frac{1}{n}\sum_{i=1}^{n} X_i, \quad \mu_1' = E(X)$
$m_2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2, \quad \mu_2' = E(X^2)$
$\ldots$
$m_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k, \quad \mu_k' = E(X^k)$


Examples

$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$, both $\mu$ and $\sigma^2$ unknown.
$X_1, \ldots, X_n$ iid $\mathrm{Bin}(1, p)$.
$X_1, \ldots, X_n$ iid $\mathrm{Ga}(\alpha, \beta)$, with $(\alpha, \beta)$ unknown.
$X_1, \ldots, X_n$ iid $\mathrm{Unif}(\theta_1, \theta_2)$, where $\theta_1 < \theta_2$, both unknown.
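
A sketch of the method for the gamma example, assuming the shape-scale parametrization $E(X) = \alpha\beta$, $\mathrm{Var}(X) = \alpha\beta^2$ (the parametrization is an assumption, not stated on the slide):

```python
# Method-of-moments sketch for Ga(alpha, beta), shape-scale parametrization.
import numpy as np

rng = np.random.default_rng(2)
x = rng.gamma(shape=3.0, scale=2.0, size=10_000)

m1 = x.mean()                 # first sample raw moment
m2 = (x**2).mean()            # second sample raw moment
var = m2 - m1**2              # matching gives alpha*beta^2 = m2 - m1^2
beta_hat = var / m1           # beta = Var/E
alpha_hat = m1**2 / var       # alpha = E^2/Var
print(alpha_hat, beta_hat)    # near (3, 2)
```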


Features

Easy to implement.
Computationally cheap.
Converges to the parameter with increasing probability (called consistency).
Does not necessarily give the asymptotically most efficient estimator.
Often used as an initial estimator in iterative methods.


Maximum Likelihood Estimator

Recall that the likelihood function is

$L(\theta|X) = L(\theta|X_1, \ldots, X_n) = \prod_{i=1}^{n} f(X_i|\theta).$

Definition

The maximum likelihood estimator (MLE) of $\theta$ is the location at which $L(\theta|X)$ attains its maximum as a function of $\theta$. Its numerical value is often called the maximum likelihood estimate.


How to find the MLE?

We want to find the global maximum of $L(\theta|X)$. If $L(\theta|X)$ is differentiable in $(\theta_1, \ldots, \theta_k)$, we solve

$\frac{\partial}{\partial \theta_j} L(\theta|X) = 0, \quad j = 1, \ldots, k.$

The solutions to these likelihood equations locate only extreme points in the interior of $\Theta$, and provide possible candidates for the MLE. They can be local or global minima, local or global maxima, or inflection points. Our job is to find a global maximum.
$\frac{d^2}{d\theta^2} L(\theta)\big|_{\theta = \hat\theta} < 0$ is sufficient for a local maximum. We also need to check the boundary points separately.
If there is only one local maximum, then that must be the unique global maximum.
Many examples fall in this category, so no further work will be needed then.


How to find the MLE? (contd.)

In practice, we often work with $\log L(\theta|X)$, i.e., solve

$\frac{\partial}{\partial \theta_j} \log L(\theta|X) = 0, \quad j = 1, \ldots, k.$

We consider several different situations:

one-parameter case
non-differentiable $L(\theta|X)$
restricted-range MLE (e.g., $\Theta$ is not the whole real line)
discrete parameter space
two-parameter case


Examples: One-parameter case

$X_1, \ldots, X_n$ iid $N(\mu, 1)$, with $\mu$ unknown.
$X_1, \ldots, X_n$ iid $\mathrm{Poi}(\lambda)$.
$X_1, \ldots, X_n$ iid $\mathrm{Exp}(\lambda)$.
(numerical/iterative method): $X_1, \ldots, X_n$ iid Weibull$(\theta)$.
(numerical/iterative method): $X_1, \ldots, X_n$ iid gamma$(\alpha, 1)$.
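
For the gamma$(\alpha, 1)$ item, no closed form for $\hat\alpha$ exists, so one maximizes the log-likelihood numerically. A minimal sketch using scipy (the simulated data and optimizer settings are illustrative):

```python
# Numerical MLE sketch for the gamma(alpha, 1) example.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.5, scale=1.0, size=1000)

def negloglik(a):
    # log L(a) = (a-1) sum log x_i - sum x_i - n log Gamma(a)
    return -((a - 1) * np.log(x).sum() - x.sum() - len(x) * gammaln(a))

res = minimize_scalar(negloglik, bounds=(0.01, 20), method="bounded")
print("MLE of alpha:", res.x)   # near 2.5
```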


Restricted MLE

The parameter space $\Theta$ is a proper subset of the set of all possible values of the parameter. Special attention is needed to make sure the maximizer lies in $\Theta$.

$X_1, \ldots, X_n$ iid $N(\theta, 1)$, $\theta \ge 0$. But what if $\theta > 0$?
$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$, $a \le \mu \le b$.


Non-differentiable likelihood

$X_1, \ldots, X_n$ iid $\mathrm{Unif}(0, \theta]$, $\theta > 0$.
$X_1, \ldots, X_n$ iid the exponential location family with pdf $f(x) = e^{-(x - \theta)}$ if $x \ge \theta$.
$X_1, \ldots, X_n$ iid $\mathrm{Unif}(\theta - \frac12, \theta + \frac12)$.


Discrete parameter space

Example

Let $X$ be a single observation taking values in $\{0, 1, 2\}$ according to $P_\theta$, where $\theta = 0$ or $1$. The probability of $X$ is summarized by:

            x = 0   x = 1   x = 2
theta = 0    0.8     0.1     0.1
theta = 1    0.2     0.3     0.5


Examples: Two-parameter case

For a differentiable likelihood, this needs calculus of several variables in general, but often simple tricks help reduce to one dimension.

$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$.
$X_1, \ldots, X_n$ iid the location-scale exponential family, with pdf $f(x; \mu, \sigma) = \frac{1}{\sigma} e^{-(x-\mu)/\sigma}$ if $x \ge \mu$.


Remarks about the MLE

The MLE is the value of $\theta$ for which the observed sample $x$ is most likely; it possesses some optimal properties (discussed later).
In exponential families, it coincides with the method of moments estimator.
The MLE can be numerically sensitive to variation in the data if the likelihood function is discontinuous.
If $T$ is sufficient for $\theta$, then the MLE must be a function of $T$.
The MLE is the value of $\theta$ that maximizes $g(T(x), \theta)$, where $g(t, \theta)$ is the pdf or pmf of $T = T(X)$ at $t$.


Induced likelihood

If $\eta = \tau(\theta)$ is a parametric function, then the likelihood for $\eta$ is defined by

$L^*(\eta|X) = \sup_{\theta : \tau(\theta) = \eta} L(\theta|X).$

Theorem (Invariance Principle)

If $\hat\theta$ is the MLE of $\theta$, then for any function $\tau(\theta)$, the MLE of $\tau(\theta)$ is $\tau(\hat\theta)$.


Examples

$X_1, \ldots, X_n$ iid $\mathrm{Bin}(1, \theta)$. Find the MLE of $\theta(1 - \theta)$.
$X_1, \ldots, X_n$ iid $\mathrm{Poi}(\lambda)$. Find the MLE of $P(X \ge 1)$.
$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$.
Find the MLE of $\mu/\sigma$.
Find the MLE of the population median.
Find the MLE of $c = c(\mu, \sigma)$ such that $P_{\mu,\sigma}(X > c) = 0.025$ (the 97.5th percentile of the distribution of $X$).


EM-algorithm

A useful numerical algorithm to compute the MLE with missing data.

An iterative method repeating an E-step (Expectation) and an M-step (Maximization).
Given data $Y$, with vital $X$ missing. Augmented data: $(X, Y)$.
Actual likelihood: $L(\theta|Y) = E[L(\theta|X, Y)|Y]$.
Start with an initial estimate $\hat\theta_0$.
Calculate $E_{\theta = \hat\theta_0}(\log L(\theta|X, Y)|Y)$.
Maximize with respect to $\theta$ to get the update $\hat\theta_1$.
Repeat the procedure, replacing the old estimate by the new, until convergence.

Example

Multinomial$((\theta + 1)/2,\ \theta/4,\ \theta/4,\ 1/2 - \theta)$.
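
The slide's cell probabilities are garbled in this transcript; the sketch below assumes they are $((1+\theta)/2,\ \theta/4,\ \theta/4,\ 1/2-\theta)$ and splits the first cell into latent subcells with probabilities $1/2$ and $\theta/2$; both choices are my reconstruction rather than the slides' own text, and the observed counts are hypothetical.

```python
# EM sketch for a multinomial with cell probabilities (assumed)
# ((1+t)/2, t/4, t/4, 1/2 - t), 0 < t < 1/2, splitting cell 1 into
# latent subcells with probabilities 1/2 and t/2.
import numpy as np

x = np.array([125, 18, 20, 34])   # hypothetical observed cell counts

t = 0.25                          # initial estimate
for _ in range(50):
    # E-step: expected latent count in the t/2 subcell of cell 1
    z = x[0] * (t / 2) / (1 / 2 + t / 2)
    # M-step: maximize (z + x2 + x3) log t + x4 log(1/2 - t) in t
    t = (z + x[1] + x[2]) / (2 * (z + x[1] + x[2] + x[3]))
print("EM estimate:", t)
```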


Bayes Estimators

Recall, in the Bayesian approach $\theta$ is considered a quantity whose variation can be described by a probability distribution (called the prior distribution). A sample is then taken from a population indexed by $\theta$, and the prior distribution is updated with this sample information. The updated prior is called the posterior distribution.

Prior distribution of $\theta$: $\pi(\theta)$.
Posterior distribution of $\theta$: $\pi(\theta|X) = f(X|\theta)\pi(\theta)/m(X)$.
Marginal distribution of $X$: $m(X) = \int f(X|\theta)\pi(\theta)\,d\theta$.
The mean of the posterior distribution, $E(\theta|X)$, can be used as the Bayes estimator of $\theta$.


Examples

$X_1, \ldots, X_n$ iid $\mathrm{Bin}(1, \theta)$. Assume the prior distribution on $\theta$ is $\mathrm{Beta}(\alpha, \beta)$. Find the posterior distribution of $\theta$ and the Bayes estimator of $\theta$.
Special case: $\pi(\theta) \sim \mathrm{Unif}(0, 1)$.
$X_1, \ldots, X_n$ iid $N(0, \theta)$, $\theta \in [0, 1]$, prior $U[0, 1]$.
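
For the first example, conjugacy gives the posterior in closed form: $\mathrm{Beta}(\alpha + \sum x_i,\ \beta + n - \sum x_i)$, with posterior mean $(\alpha + \sum x_i)/(\alpha + \beta + n)$. A minimal sketch:

```python
# Conjugate-updating sketch: Bin(1, theta) likelihood with a Beta(a, b)
# prior gives a Beta(a + sum x, b + n - sum x) posterior; the Bayes
# estimator (posterior mean) is (a + sum x)/(a + b + n).
import numpy as np

rng = np.random.default_rng(4)
theta_true, n = 0.3, 50
x = rng.binomial(1, theta_true, size=n)

a, b = 1.0, 1.0                         # Beta(1,1) = Unif(0,1) special case
a_post, b_post = a + x.sum(), b + n - x.sum()
bayes_est = a_post / (a_post + b_post)  # posterior mean E(theta | x)
print(a_post, b_post, bayes_est)
```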


Conjugate family

Let $\mathcal{F}$ denote the class of pdfs or pmfs $f(x|\theta)$. A class $\Pi$ of prior distributions is a conjugate family for $\mathcal{F}$ if the posterior distribution is in the class $\Pi$ for all $f \in \mathcal{F}$, all priors in $\Pi$, and all observation values $x$.

Examples:

The beta family is conjugate for the binomial family.
The normal family is conjugate for the normal family.


Methods of Evaluating Estimators

Various criteria to evaluate and compare different point estimators:

mean squared error
best unbiased estimators, or UMVUE (Uniformly Minimum Variance Unbiased Estimator)
optimality for a general loss function and risk

Unbiasedness and Mean Squared Error

The bias of a point estimator $W$ of $\theta$ is $\mathrm{Bias}_\theta(W) = E_\theta W - \theta$.

An estimator whose bias is equal to 0 is called unbiased.
An unbiased estimator satisfies $E_\theta W = \theta$ for all $\theta$.
The mean squared error (MSE) of an estimator $W$ of $\theta$ is defined by $E_\theta(W - \theta)^2$.
The MSE is a function of $\theta$, and has the representation

$E_\theta(W - \theta)^2 = \mathrm{Var}_\theta W + (\mathrm{Bias}_\theta W)^2.$

The MSE incorporates two components, one measuring the variability of the estimator (precision) and the other measuring its bias (accuracy).
A small value of the MSE implies small combined variance and bias. Unbiased estimators do a good job of controlling bias.
A smaller MSE indicates a smaller probability for $W$ to be far from $\theta$, because

$P_\theta(|W - \theta| > \epsilon) \le \frac{1}{\epsilon^2}\, E_\theta(W - \theta)^2 = \frac{1}{\epsilon^2}\,\mathrm{MSE}(W)$

by the Chebyshev inequality.


In general, there will not be one best estimator. Often the MSEs of two estimators cross each other, showing that each estimator is better in only a portion of the parameter space.

Example

Let $X_1, X_2$ be iid from $\mathrm{Bin}(1, p)$ with $0 < p < 1$. Compare three estimators with respect to their MSE:

$\hat{p}_1 = X_1$
$\hat{p}_2 = (X_1 + X_2)/2$
$\hat{p}_3 = 0.5$
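
Computing the three MSE functions, $\mathrm{MSE}(\hat p_1) = p(1-p)$, $\mathrm{MSE}(\hat p_2) = p(1-p)/2$, and $\mathrm{MSE}(\hat p_3) = (0.5 - p)^2$, shows the crossing behavior: the constant wins near $p = 0.5$, the average wins elsewhere. A sketch:

```python
# Sketch comparing the three MSE curves; no estimator dominates on (0, 1).
import numpy as np

p = np.linspace(0.01, 0.99, 99)
mse1 = p * (1 - p)           # Var(X1), unbiased
mse2 = p * (1 - p) / 2       # Var of the two-observation average
mse3 = (0.5 - p)**2          # pure squared bias of the constant estimator

for q in (0.1, 0.5, 0.9):
    i = np.argmin(np.abs(p - q))
    print(p[i], mse1[i], mse2[i], mse3[i])
```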


Illustration

Let $X_1, \ldots, X_n$ be iid $N(\mu, \sigma^2)$. Show $\bar{X}$ is unbiased for $\mu$ and $S^2$ is unbiased for $\sigma^2$, and compute their MSEs. What about non-normal distributions with mean $\mu$ and variance $\sigma^2$?
Let $X_1, \ldots, X_n$ be iid $N(\mu, \sigma^2)$. Show the estimator $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2$ is biased for $\sigma^2$, but has a smaller MSE than $S^2$. More generally, find the MSE of $cS^2$.

Uniformly Minimum Variance Unbiased Estimator

If the estimator $W$ is unbiased for $\tau(\theta)$, then its MSE is equal to $\mathrm{Var}_\theta(W)$.

Therefore, choosing a better unbiased estimator is equivalent to choosing the one with smaller variance.

Definition

An estimator $W^*$ is a best unbiased estimator of $\tau(\theta)$ if it satisfies:

$E_\theta W^* = \tau(\theta)$ for all $\theta$;
for any other estimator $W$ with $E_\theta W = \tau(\theta)$, we have $\mathrm{Var}_\theta W^* \le \mathrm{Var}_\theta W$ for all $\theta$.

$W^*$ is also called a uniformly minimum variance unbiased estimator (UMVUE).


Example

$X_1, \ldots, X_n$ iid $\mathrm{Poi}(\lambda)$. Both $\bar{X}$ and $S^2$ are unbiased for $\lambda$.

How to find a best unbiased estimator?

If $B(\theta)$ is a lower bound on the variance of any unbiased estimator of $\tau(\theta)$, and if $W^*$ is unbiased and satisfies $\mathrm{Var}_\theta W^* = B(\theta)$, then $W^*$ is a UMVUE.

Cramér-Rao Inequality

Theorem

Let $X$ be a sample with pdf $f(x, \theta)$. Suppose $W(X)$ is an estimator satisfying

$E_\theta W(X) = \tau(\theta)$ for any $\theta$;
$\mathrm{Var}_\theta W(X) < \infty$.

If differentiation under the integral sign can be carried out, then

$\mathrm{Var}_\theta(W(X)) \ge \frac{[\tau'(\theta)]^2}{E_\theta\big(\frac{\partial}{\partial\theta} \log f(X|\theta)\big)^2}.$

In the i.i.d. case, the bound reduces to $[\tau'(\theta)]^2/(n I(\theta))$, where

$I(\theta) = E_\theta\Big(\frac{\partial}{\partial\theta} \log f(X|\theta)\Big)^2$

is called the Fisher information (per observation).

Score function: $s(X, \theta) = \frac{\partial}{\partial\theta} \log f(X|\theta) = \frac{1}{f(X|\theta)}\,\frac{\partial}{\partial\theta} f(X|\theta)$.

Lemma (Expressions for $I(\theta)$)

If differentiation and integration are interchangeable,

$I(\theta) = E_\theta(s(X, \theta))^2 = \mathrm{var}_\theta(s(X, \theta)) = -E_\theta\Big(\frac{\partial^2}{\partial\theta^2} \log f(X, \theta)\Big) = \int \Big(\frac{\partial}{\partial\theta} \log f(x, \theta)\Big)^2 f(x, \theta)\,dx = \int \frac{\big(\frac{\partial}{\partial\theta} f(x, \theta)\big)^2}{f(x, \theta)}\,dx = -\int \Big(\frac{\partial^2}{\partial\theta^2} \log f(x, \theta)\Big) f(x, \theta)\,dx.$


Examples

$X_1, \ldots, X_n$ iid $\mathrm{Poi}(\lambda)$. Find the Fisher information number and a UMVUE for $\lambda$.
$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$, $\mu$ unknown but $\sigma^2$ known. Find a UMVUE for $\mu$ using the Cramér-Rao bound.
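
For the Poisson example, $I(\lambda) = 1/\lambda$, so the bound for unbiased estimators of $\lambda$ is $\lambda/n$, attained by $\bar X$. The sketch below checks both facts by simulation (values chosen for illustration):

```python
# Sketch for the Poisson example: I(lambda) = 1/lambda, and the sample
# mean attains the Cramer-Rao bound lambda/n (so it is the UMVUE).
import numpy as np

rng = np.random.default_rng(5)
lam, n, reps = 4.0, 20, 100_000
xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)

# per-observation score for Poisson: s = x/lambda - 1
x = rng.poisson(lam, size=500_000)
print("I(lambda) estimate:", np.var(x / lam - 1), "vs 1/lambda =", 1 / lam)
print("Var(xbar):", xbar.var(), "vs CR bound lambda/n =", lam / n)
```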


When can we exchange differentiation and integration?

Yes, for the exponential family.
Not always true for a non-exponential family. We have to check directly whether $\frac{d}{d\theta}\int h(x) f(x, \theta)\,dx = \int h(x) \frac{\partial}{\partial\theta}[f(x, \theta)]\,dx$.

Example

$X_1, \ldots, X_n$ iid from $\mathrm{Unif}(0, \theta)$.

The Cramér-Rao bound does not work here!

Attainability of the Cramér-Rao bound

The Cramér-Rao inequality says that if $W$ achieves the variance bound then it is a UMVUE. In the one-parameter exponential family case, we can find such an estimator. But there is no guarantee that this lower bound is sharp (attainable) in other situations. It is possible that the value of the Cramér-Rao bound is strictly smaller than the variance of every unbiased estimator.

Corollary

Let $X_1, \ldots, X_n$ be iid with pdf $f(x, \theta)$, where $f(x, \theta)$ satisfies the assumptions of the Cramér-Rao bound theorem. Let $L(\theta|x) = \prod_{i=1}^n f(x_i, \theta)$ denote the likelihood function. If $W(X)$ is unbiased for $\tau(\theta)$, then $W(X)$ attains the Cramér-Rao Lower Bound if and only if

$a(\theta)[W(X) - \tau(\theta)] = s(X, \theta)$

for some function $a(\theta)$.


Attainability in one-parameter exponential family

Theorem

Let $X_1, \ldots, X_n$ be iid from a one-parameter exponential family with pdf $f(x, \theta) = c(\theta) h(x) \exp\{w(\theta) T(x)\}$. Assume $E[T(X)] = \tau(\theta)$. Then $n^{-1}\sum_{i=1}^n T(X_i)$, as an unbiased estimator of $\tau(\theta)$, attains the Cramér-Rao Lower Bound, i.e.

$\mathrm{Var}\Big(n^{-1}\sum_{i=1}^{n} T(X_i)\Big) = \frac{[\tau'(\theta)]^2}{n I(\theta)}.$


Examples

$X_1, \ldots, X_n$ iid from $\mathrm{Bin}(1, \theta)$. Find a UMVUE of $\theta$ and show it attains the Lower Bound.
$X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, with $(\mu, \sigma^2)$ both unknown. Consider estimation of $\sigma^2$. What is the Cramér-Rao Lower Bound and is it attainable?


Constructing UMVUE using the Rao-Blackwell Method

An important method of finding/constructing UMVUEs with the help of conditioning on a complete sufficient statistic.

Review of conditional expectation:

$E(X) = E[E(X|Y)]$, for any $X, Y$.
$\mathrm{Var}(X) = \mathrm{Var}[E(X|Y)] + E[\mathrm{Var}(X|Y)]$, for any $X, Y$.
$E(g(X)|Y) = \int g(x) f_{X|Y}(x|y)\,dx$, and it is a function of $Y$.
$\mathrm{Cov}(E(X|Y), Y) = \mathrm{Cov}(X, Y)$.


Rao-Blackwell Theorem

Theorem

Let $W$ be unbiased for $\tau(\theta)$ and $T$ be a sufficient statistic for $\theta$. Define $\phi(T) = E(W|T)$. Then the following hold:

$E_\theta \phi(T) = \tau(\theta)$;
$\mathrm{Var}_\theta \phi(T) \le \mathrm{Var}_\theta W$ for all $\theta$.

Thus, $E(W|T)$ is a uniformly better unbiased estimator of $\tau(\theta)$ than $W$.

Conditioning any unbiased estimator on a sufficient statistic results in a uniform improvement, so in the search for best unbiased estimators we need consider only statistics that are functions of a sufficient statistic.


Examples

Let $X_1, X_2$ be iid $N(\theta, 1)$. Show $X_1$ is unbiased for $\theta$ and $E(X_1|\bar{X})$ is uniformly better.
Let $X_1, \ldots, X_n$ be iid $\mathrm{Unif}(0, \theta)$. Show $Y = (n + 1)X_{(1)}$ is unbiased for $\theta$ and $E(Y|X_{(n)})$ is uniformly better.
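
For the uniform example, the conditional-distribution result from the order-statistics slides ($X_{(1)} \mid X_{(n)} \sim X_{(n)}\mathrm{Beta}(1, n-1)$) gives $E(Y \mid X_{(n)}) = (n+1)X_{(n)}/n$. The sketch below verifies that both estimators are unbiased and that conditioning shrinks the variance; the closed form of the conditional mean is my computation, not the slides'.

```python
# Rao-Blackwell sketch for Unif(0, theta): conditioning Y = (n+1) X_(1)
# on the sufficient X_(n) gives (n+1) X_(n) / n, with smaller variance.
import numpy as np

rng = np.random.default_rng(6)
n, theta, reps = 5, 3.0, 200_000
x = rng.uniform(0, theta, size=(reps, n))

y = (n + 1) * x.min(axis=1)            # unbiased but crude
rb = (n + 1) / n * x.max(axis=1)       # its Rao-Blackwellization

print(y.mean(), rb.mean())             # both near theta = 3
print(y.var(), ">", rb.var())          # conditioning reduces variance
```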


Uniqueness of UMVUE

Theorem

If $W$ is a UMVUE of $\tau(\theta)$, then $W$ is unique.


UMVUE and unbiased estimators of zero

Theorem

If $E_\theta W = \tau(\theta)$, then $W$ is the best unbiased estimator of $\tau(\theta)$ if and only if $W$ is uncorrelated with all unbiased estimators of 0.

Example

Let $X$ be an observation from $\mathrm{Unif}(\theta, \theta + 1)$.

Show that $X - \frac12$ is unbiased for $\theta$.
Show that $h(X) = \sin(2\pi X)$ is an unbiased estimator of zero.
Show $X - \frac12$ and $h(X)$ are correlated. So $X - \frac12$ is not best.


Lehmann-Scheffé theorem

Theorem

Let $T$ be a complete sufficient statistic for a parameter $\theta$, and let $\phi(T)$ be any estimator based on $T$. Then $\phi(T)$ is the unique best unbiased estimator of its expected value.

Thus:

Find a complete sufficient statistic $T$ for the parameter $\theta$.
Find an unbiased estimator $h(X)$ of $\tau(\theta)$.
Then $\phi(T) = E(h(X)|T)$ is the best unbiased estimator of $\tau(\theta)$.


Examples

Let $X_1, \ldots, X_n$ be iid $\mathrm{Bin}(k, \theta)$.
$X_1, \ldots, X_n$ iid from $\mathrm{Unif}(0, \theta)$: find the UMVUE of $\theta$; find the UMVUE of $g(\theta)$, where $g$ is differentiable on $(0, \infty)$.
Suppose $X_1, \ldots, X_n$ are iid from $\mathrm{Poi}(\lambda)$: find the UMVUE of $\lambda$; find the UMVUE of $g(\lambda) = \lambda^r$, $r \ge 1$ an integer; find the UMVUE of $g(\lambda) = e^{-\lambda}$.
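
For the last Poisson item, Rao-Blackwellizing the unbiased indicator $1\{X_1 = 0\}$ on the complete sufficient $T = \sum X_i$ gives the UMVUE $((n-1)/n)^T$ of $e^{-\lambda}$; this is a standard derivation, sketched here with a simulation check of unbiasedness.

```python
# Simulation check that ((n-1)/n)^T is unbiased for e^(-lambda),
# where T = sum of n iid Poisson(lambda) observations.
import numpy as np

rng = np.random.default_rng(7)
lam, n, reps = 2.0, 10, 200_000
t = rng.poisson(lam, size=(reps, n)).sum(axis=1)

umvue = ((n - 1) / n) ** t
print(umvue.mean(), "vs e^(-lambda) =", np.exp(-lam))
```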


More Examples

Suppose that the random variables $Y_1, \ldots, Y_n$ satisfy

$Y_i = \beta x_i + \epsilon_i, \quad i = 1, \ldots, n,$

where $x_1, \ldots, x_n$ are fixed constants, and $\epsilon_1, \ldots, \epsilon_n$ are iid $N(0, \sigma^2)$ with $\sigma^2$ known. Find the MLE of $\beta$ and show it is the UMVUE.
Suppose $X_1, \ldots, X_n$ are iid from $\exp(\lambda)$, $\lambda > 0$:

Find the UMVUE of $\lambda$.
Find the UMVUE of $\tau(\lambda) = 1 - F(s)$, where $F(s) = P(X_1 \le s)$.
Find the UMVUE of $e^{-1/\lambda}$.


More Examples (contd.)

Suppose $X_1, \ldots, X_n$ are iid from $N(\mu, \sigma^2)$, both $(\mu, \sigma^2)$ unknown:

Find the UMVUE of $\mu$.
Find the UMVUE of $\sigma^2$.
Find the UMVUE of $\mu^2$.

Normal probability: $X_1, \ldots, X_n$ iid $N(\mu, 1)$. $\tau(\mu) = P(X_1 \le c) = \Phi(c - \mu)$.
A ridiculous UMVUE: $X_1, \ldots, X_n$ iid $\mathrm{Poi}(\lambda)$. $\tau(\lambda) = e^{-3\lambda}$.

Loss Function Optimality

Observations $X_1, \ldots, X_n$ are iid with pdf $f(x, \theta)$, $\theta \in \Theta$. To evaluate an estimator $\hat\theta(X)$, various loss functions can be used.

The loss function measures the closeness of $\hat\theta$ and $\theta$:

squared error loss: $L(\theta, \hat\theta) = (\hat\theta - \theta)^2$
absolute error loss: $L(\theta, \hat\theta) = |\hat\theta - \theta|$
a loss that penalizes overestimation more than underestimation:

$L(\theta, \hat\theta) = (\hat\theta - \theta)^2 I(\hat\theta < \theta) + 10(\hat\theta - \theta)^2 I(\hat\theta \ge \theta)$

a loss that penalizes more if $\theta$ is near 0 than if $|\theta|$ is large:

$L(\theta, \hat\theta) = \frac{(\hat\theta - \theta)^2}{|\theta| + 1}$

Loss Function Optimality (contd.)

To compare estimators, we use the expected loss, called the risk function,

$R(\theta, \hat\theta) = E_\theta L(\theta, \hat\theta(X)).$

If $R(\theta, \hat\theta_1) < R(\theta, \hat\theta_2)$ for all $\theta \in \Theta$, then $\hat\theta_1$ is the preferred estimator because it performs better for all $\theta$. In particular, for the squared error loss, the risk function is the MSE.

Example

$X_1, \ldots, X_n$ iid from $\mathrm{Bin}(1, \theta)$. Compare two estimators in terms of their MSE:

MLE: $\hat\theta_1 = \bar{X}$
Bayes estimator: prior $\pi(\theta) \sim \mathrm{Beta}(\alpha, \beta)$ with $\alpha = \beta = \sqrt{n/4}$,

$\hat\theta_B = \frac{\sum_{i=1}^{n} X_i + \sqrt{n/4}}{n + \sqrt{n}}.$

Minimaxity

Risk functions are generally overlapping. One cannot beat everyone else.

Example

$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$. Consider the estimators of the form $\hat\delta_b(X) = bS^2$.

Minimaxity: compare the worst-case scenarios, i.e., compare the maximum risks. Find the estimator which has the smallest maximum risk: the minimax estimator.

Downsides:

Problems with unbounded risk: the maximum is infinity.
It is not easy to find the minimax estimator.
Too pessimistic.

Bayes Rule

The Bayes risk is the average risk with respect to the prior $\pi$,

$\int R(\theta, \hat\theta)\,\pi(\theta)\,d\theta.$

By definition, the Bayes risk can be written as

$\int R(\theta, \hat\theta)\pi(\theta)\,d\theta = \int \Big[\int L(\theta, \hat\theta(x)) f(x|\theta)\,dx\Big] \pi(\theta)\,d\theta.$

Note $f(x|\theta)\pi(\theta) = \pi(\theta|x)\, m(x)$, where $\pi(\theta|x)$ is the posterior distribution of $\theta$ and $m(x)$ is the marginal distribution of $X$; then the Bayes risk becomes

$\int R(\theta, \hat\theta)\pi(\theta)\,d\theta = \int \Big[\int L(\theta, \hat\theta(x)) \pi(\theta|x)\,d\theta\Big] m(x)\,dx.$

The quantity $\int L(\theta, \hat\theta(x)) \pi(\theta|x)\,d\theta$ is called the posterior expected loss. To minimize the Bayes risk, we only need to find $\hat\theta$ minimizing the posterior expected loss for each $x$.

Bayes Rule (contd.)

The Bayes rule with respect to a prior $\pi$ is an estimator that yields the smallest value of the Bayes risk.

For squared error loss, the posterior expected loss is

$\int (\theta - a)^2 \pi(\theta|x)\,d\theta = E\big((\theta - a)^2 \mid x\big),$

therefore the Bayes rule is $E(\theta|x)$.
For absolute error loss, the posterior expected loss is $E(|\theta - a| \mid x)$. The Bayes rule is the median of $\pi(\theta|x)$.


Examples

$X_1, \ldots, X_n$ are iid from $N(\theta, \sigma^2)$ and let $\pi(\theta)$ be $N(\mu, \tau^2)$. The values $\sigma^2, \mu, \tau^2$ are known.
$X_1, \ldots, X_n$ are iid from $\mathrm{Bin}(1, \theta)$ and let $\pi(\theta)$ be $\mathrm{Beta}(\alpha, \beta)$.


Hypothesis Testing

Point estimation provides a single estimate of $\theta$; hypothesis testing tests a statement about $\theta$.
A hypothesis is a statement about a population parameter.
The two complementary hypotheses in a hypothesis testing problem are called the null hypothesis and the alternative hypothesis. Let $\Theta_0$ be a subset of the parameter space, called the null region. The hypotheses are denoted by $H_0$ and $H_1$:

$H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_0^c$.

Illustration

Example

An ideal manufacturing process requires that all products are

non-defective. This is very seldom the case. The goal is to keep the proportion of defective items as low as possible. Let $\theta$ be the proportion of defective items, and let 0.01 be the maximum acceptable proportion of defective items.

Statement 1: $\theta \ge 0.01$ (the proportion of defectives is unacceptably high)
Statement 2: $\theta < 0.01$ (acceptable quality)

Example

Let $\theta$ be the average change in a patient's blood pressure after taking a drug. An experimenter might be interested in testing
$H_0: \theta = 0$ (the drug has no effect on blood pressure)
$H_1: \theta \ne 0$ (there is some effect)


Different Types of Hypotheses

Simple hypotheses: both $H_0$ and $H_1$ consist of only one probability distribution.
Composite hypotheses: either $H_0$ or $H_1$ contains more than one possible distribution.
One-sided hypotheses: $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$, or $H_0: \theta \ge \theta_0$ versus $H_1: \theta < \theta_0$.
Two-sided hypotheses: $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$.

Rejection region

A hypothesis testing procedure, or hypothesis test, is a rule that specifies:

for which sample values the decision is made to accept $H_0$ as true;
for which sample values $H_0$ is rejected and $H_1$ is accepted as true.

The subset of the sample space for which $H_0$ will be rejected is $R$: the rejection region or critical region.
The complement of the rejection region is $R^c$: the acceptance region.
The rejection region $R$ of a hypothesis test is usually defined by a test statistic $W(X)$, a function of the sample:

$R = \{x : W(x) > c\}$: reject $H_0$.
$R^c = \{x : W(x) \le c\}$: accept $H_0$.

Methods of Evaluating Tests

In deciding to accept or reject the null hypothesis $H_0$, we might make a mistake no matter what the decision is. There are two

types of errors:

Type I error: $H_0$ is actually true, i.e., $\theta \in \Theta_0$, but the test incorrectly decides to reject $H_0$.
Type II error: $H_0$ is actually false, i.e., $\theta \in \Theta_0^c$, but the test incorrectly decides to accept $H_0$.

                     Decision
                Accept H0          Reject H0
Truth  H0       Correct decision   Type I error
       H1       Type II error      Correct decision

Power Function

Definition

The power function of a hypothesis test with rejection region $R$ is the function of $\theta$ defined by

$\beta(\theta) = P_\theta(X \in R)$

$= \begin{cases} \text{probability of a Type I error} & \text{if } \theta \in \Theta_0 \\ 1 - \text{probability of a Type II error} & \text{if } \theta \in \Theta_0^c \end{cases}$

Note $P(\text{Type I error}) = \beta(\theta)$ for $\theta \in \Theta_0$, and $P(\text{Type II error}) = 1 - \beta(\theta)$ for $\theta \in \Theta_0^c$.
Ideal test: $\beta(\theta) = 0$ for all $\theta \in \Theta_0$; $\beta(\theta) = 1$ for all $\theta \in \Theta_0^c$.
Good test: $\beta(\theta)$ is near 0 (small) for most $\theta \in \Theta_0$; $\beta(\theta)$ is near 1 (large) for most $\theta \in \Theta_0^c$.


Example (Binomial power function)

$X \sim \mathrm{Bin}(5, \theta)$. Test $H_0: \theta \le \frac12$ versus $H_1: \theta > \frac12$.

Test 1: reject $H_0$ if and only if all successes are observed, i.e., $R = \{5\}$.
Test 2: reject $H_0$ if $X = 3, 4,$ or $5$.
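
The two power functions are $\beta_1(\theta) = \theta^5$ and $\beta_2(\theta) = P_\theta(X \ge 3)$. A minimal sketch tabulating them shows the trade-off: Test 2 has higher power everywhere but also a much larger Type I error at $\theta = 1/2$.

```python
# Power functions of the two binomial tests: beta1(t) = P(X = 5) = t^5
# and beta2(t) = P(X >= 3) for X ~ Bin(5, t).
from scipy import stats

for t in (0.3, 0.5, 0.7, 0.9):
    beta1 = t**5                              # Test 1: reject iff X = 5
    beta2 = 1 - stats.binom.cdf(2, 5, t)      # Test 2: reject iff X >= 3
    print(t, round(beta1, 4), round(beta2, 4))
```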

Likelihood Ratio Tests (LRT)

Definition

The likelihood ratio test statistic for testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_0^c$ is

$\lambda(x) = \frac{\sup_{\Theta_0} L(\theta|x)}{\sup_{\Theta} L(\theta|x)}.$

A likelihood ratio test (LRT) has a rejection region

$R = \{x : \lambda(x) \le c\},$

where $c$ is any number satisfying $0 \le c \le 1$. This should be reduced to the simplest possible form.

Rationale of LRT

The numerator of $\lambda(x)$ is the maximum probability of the

observed sample, computed over parameters in $H_0$. The denominator of $\lambda(x)$ is the maximum probability of the observed sample over all possible parameters.
The numerator asks which $\theta \in \Theta_0$ makes the observation of the data most likely; the denominator asks which $\theta \in \Theta$ makes the observation of the data most likely.
The ratio of these two maxima is small if there are parameter points in $H_1$ for which the observed sample is much more likely than for any parameter in $H_0$. In this situation, the LRT criterion says $H_0$ should be rejected and $H_1$ accepted as true.

Relation between LRT and MLE

Let $\hat\theta_0$ be the MLE of $\theta$ in the null set $\Theta_0$ (restricted maximization). Let $\hat\theta$ be the MLE of $\theta$ in the full set $\Theta$ (unrestricted maximization). Then the LRT statistic, a function of $x$ (not $\theta$), is

$\lambda(x) = \frac{\sup_{\Theta_0} L(\theta|x)}{\sup_{\Theta} L(\theta|x)} = \frac{L(\hat\theta_0|x)}{L(\hat\theta|x)}.$

In $R = \{x : \lambda(x) \le c\}$, a different $c$ gives a different rejection region and hence a different test.

Examples

$X_1, \ldots, X_n$ iid $N(\theta, \sigma^2)$ with $\theta$ unknown ($\sigma^2$ known). Consider testing

    H0 : = 0 versus H1 : = 0,where 0 is a number fixed by the experimenter prior to theexperiment.

    Find the LRT and its power function.

    Comment on the decision rules given by different cs.Let X1, . . . , Xn be a random sample from alocation-exponential family

    f(x, ) = e(x) if x

    ,

    where < < . Consider testing H0 : 0 versusH1 : > 0. Find the LRT.
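For the first example, the LRT simplifies to λ(x) = exp(-n(x̄ - μ0)²/(2σ²)), so rejecting for small λ(x) is the same as rejecting for large |x̄ - μ0|. A minimal numerical sketch (the data and function name are hypothetical):

    import numpy as np

    def normal_lrt(x, mu0, sigma):
        # lambda(x) = L(mu0|x) / L(xbar|x) = exp(-n (xbar - mu0)^2 / (2 sigma^2))
        n = len(x)
        xbar = np.mean(x)
        return np.exp(-n * (xbar - mu0) ** 2 / (2 * sigma ** 2))

    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.5, scale=1.0, size=20)   # true mean is 0.5
    print(normal_lrt(x, mu0=0.0, sigma=1.0))      # small value is evidence against H0: mu = 0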


    LRT and sufficiency

Theorem

If T(X) is a sufficient statistic for θ, λ*(t) is the LRT statistic based on T, and λ(x) is the LRT statistic based on x, then

λ*(T(x)) = λ(x)

for every x in the sample space.

Thus the simplified expression for λ(x) should depend on x only through T(x) if T(X) is a sufficient statistic for θ.


    Examples

X1, . . . , Xn iid N(μ, σ²) with σ² known. Test H0 : μ = μ0 versus H1 : μ ≠ μ0.

Let X1, . . . , Xn be a random sample from a location-exponential family. Test H0 : θ ≤ θ0 versus H1 : θ > θ0.


    Nuisance parameter case

Likelihood ratio tests are also useful when there are nuisance parameters, which are present in the model but not of direct interest.

    Example

X1, . . . , Xn iid N(μ, σ²), both μ and σ² unknown. Test

H0 : μ ≤ μ0 versus H1 : μ > μ0.

Specify Θ and Θ0.

Find the LRT and the power function.


    Bayesian Tests

Using the posterior density π(θ|x), compute

P(θ ∈ Θ0 | x) = P(H0 is true | x),
P(θ ∈ Θ0^c | x) = P(H1 is true | x).

Decide in favor of the hypothesis which has the greater posterior probability: accept H0 if P(θ ∈ Θ0 | x) ≥ 1/2.

This does not work if Θ0 is a point and θ is given a prior density; one will need to put a prior mass at the point.

    Example

Let X1, . . . , Xn be iid N(θ, σ²) and the prior distribution on θ be N(μ, τ²), where σ², μ, τ² are known. Test H0 : θ ≤ θ0 against H1 : θ > θ0.
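Under this conjugate setup the posterior of θ is again normal, so the test reduces to a single normal-cdf evaluation. A minimal sketch, assuming the standard normal-normal conjugate formulas (data are hypothetical):

    import numpy as np
    from scipy.stats import norm

    def posterior_prob_H0(x, sigma2, mu, tau2, theta0):
        # Posterior of theta is normal with precision n/sigma2 + 1/tau2
        n = len(x)
        post_prec = n / sigma2 + 1 / tau2
        post_mean = (n * np.mean(x) / sigma2 + mu / tau2) / post_prec
        return norm.cdf(theta0, loc=post_mean, scale=np.sqrt(1 / post_prec))

    x = np.array([0.2, 0.5, 1.1, 0.8])                # hypothetical data
    print(posterior_prob_H0(x, 1.0, 0.0, 1.0, 0.0))   # accept H0 if >= 1/2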


    Unbiased Test

    Definition

A test with power function β(θ) is unbiased if

β(θ′) ≥ β(θ″) for every θ′ ∈ Θ0^c and θ″ ∈ Θ0.

In most problems, there are many unbiased tests.

Recall β(θ) = P_θ(reject H0). An unbiased test says that the probability of rejecting H0 when H0 is true is smaller than the probability of rejecting H0 when H0 is false.


    Examples

X ~ Bin(5, θ). Consider testing H0 : θ ≤ 1/2 versus H1 : θ > 1/2, and reject H0 if X = 5.

X1, . . . , Xn iid N(μ, σ²), with σ² known. Consider testing H0 : μ ≤ μ0 versus H1 : μ > μ0. The LRT is unbiased.

Draw the graph of the power function.


    Controlling Type I error

For a fixed sample size, it is usually impossible to make both types of error arbitrarily small. Common approach:

Control the Type I error probability at a specified level α.

Within this class of tests, make the Type II error probability as small as possible; equivalently, maximize the power.


Size α and level α tests

    Definition

For 0 ≤ α ≤ 1, a test with power function β(θ) is a size α test if

sup_{θ ∈ Θ0} β(θ) = α.

Definition

For 0 ≤ α ≤ 1, a test with power function β(θ) is a level α test if

sup_{θ ∈ Θ0} β(θ) ≤ α.

If these relations hold only in the limit as n → ∞, we call the tests respectively asymptotically size (level) α. [More details in the final chapter.]


    Notations and remarks

Typical choices of α are 0.01, 0.05, 0.10.

We use z_{α/2} to denote the point having probability α/2 to the right of it for a standard normal pdf. By convention, we have

P(Z > z_α) = α, where Z ~ N(0, 1);
P(T_{n-1} > t_{n-1,α/2}) = α/2, where T_{n-1} ~ t_{n-1};
P(χ²_p > χ²_{p,1-α}) = 1 - α, chi square with d.f. p.

Note z_α = -z_{1-α}.

Commonly used cutoffs: z_{0.05} = 1.645, z_{0.025} = 1.96, z_{0.01} = 2.33, z_{0.005} = 2.58.
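These cutoffs can be reproduced numerically; a small sketch using scipy's inverse-cdf (ppf) functions, with the degrees of freedom chosen arbitrarily:

    from scipy.stats import norm, t, chi2

    alpha = 0.05
    z_alpha = norm.ppf(1 - alpha)          # upper-alpha point of N(0, 1): about 1.645
    t_cut = t.ppf(1 - alpha / 2, df=15)    # t cutoff with n - 1 = 15 d.f.
    chi_cut = chi2.ppf(1 - alpha, df=4)    # upper-alpha chi-square point, d.f. 4
    print(z_alpha, t_cut, chi_cut)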


    How to specify H0 and H1?

If an experimenter expects an experiment to indicate a phenomenon, H1 should be chosen to be the theory being proposed.

H1 is sometimes called the researcher's hypothesis. By using a level α test with small α, the experimenter is guarding against saying the data support the research hypothesis when it is false.

Announcing a new phenomenon when in fact nothing has happened is usually more serious than missing something new that has in fact occurred.

Similarly, in the judicial system evidence is collected to decide whether the accused is innocent or guilty. To prevent the possibility of penalizing an innocent person incorrectly, the test should be set up as H0: innocent versus H1: guilty.

How to choose the critical value of the LRT

In order to make an LRT a size α test, we choose c such that

sup_{θ ∈ Θ0} P_θ(λ(X) ≤ c) = α.

Examples:

X1, . . . , Xn iid N(μ, σ²), σ² known. H0 : μ ≤ μ0 vs H1 : μ > μ0.

X1, . . . , Xn iid N(μ, σ²), σ² known. Consider testing H0 : μ = μ0 vs H1 : μ ≠ μ0.

Let X1, . . . , Xn be iid from N(μ, σ²), σ² unknown. Consider testing H0 : μ = μ0 versus H1 : μ ≠ μ0. Show that the LRT that rejects H0 if |X̄ - μ0| > t_{n-1,α/2} S/√n is a test of size α.

iid location-exponential distribution. Consider testing H0 : θ ≤ θ0 vs H1 : θ > θ0. Find the size α LRT.


    Sample size calculation

For a fixed sample size, it is usually impossible to make both types of error probabilities arbitrarily small. But if we can choose the sample size, it is possible to attain the desired power level.

    Example

X1, . . . , Xn iid N(μ, σ²), σ² known. Test H0 : μ ≤ μ0 vs H1 : μ > μ0. The LRT that rejects H0 if (X̄ - μ0)/(σ/√n) > C has the power function

β(μ) = 1 - Φ(C + (μ0 - μ)/(σ/√n)).

Note β(μ) is increasing in μ.


    Notes

The maximum Type I error is

sup_{μ ≤ μ0} β(μ) = β(μ0) = 1 - Φ(C).

For the size α test, C = z_α.

After C is chosen, it is possible to increase β(μ) for μ > μ0 by increasing the sample size n. Thus we can minimize the Type II error (remember: the Type I error is under control already). Draw the picture of the power function for small n and large n.

Assume C = z_α. How to choose n such that the maximum Type II error is 0.2 if μ ≥ μ0 + σ?

Compute n if α = 0.05 in (3).
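A worked sketch of this calculation: with C = z_α, the power at μ = μ0 + σ is 1 - Φ(z_α - √n), so the maximum Type II error is at most 0.2 once √n ≥ z_α + z_{0.2}.

    import math
    from scipy.stats import norm

    alpha, beta2 = 0.05, 0.2          # Type I bound and target Type II bound
    z_alpha = norm.ppf(1 - alpha)     # about 1.645
    z_beta = norm.ppf(1 - beta2)      # about 0.842
    # Need sqrt(n) >= z_alpha + z_beta when mu = mu0 + sigma
    n = math.ceil((z_alpha + z_beta) ** 2)
    print(n)                          # 7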


    Example

Let X ~ Bin(n, θ). Consider testing

H0 : θ ≥ 3/4 vs H1 : θ < 3/4.

The LRT for this problem rejects H0 if X ≤ c. Choose c and n such that the following hold simultaneously (a numerical search is sketched below):

If θ = 3/4, Pr(reject H0 | θ) = 0.01 (control Type I error);

If θ = 1/2, Pr(reject H0 | θ) = 0.99 (control Type II error).
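Since X is discrete, the two conditions typically cannot hold exactly; one compromise, sketched here, searches for the smallest n (and a c) with P_{3/4}(X ≤ c) ≤ 0.01 and P_{1/2}(X ≤ c) ≥ 0.99 (the bound max_n = 200 is arbitrary):

    from scipy.stats import binom

    def find_design(max_n=200):
        # Smallest n (with some c) so that Type I error <= 0.01 at theta = 3/4
        # and power >= 0.99 at theta = 1/2, for the test rejecting when X <= c.
        for n in range(1, max_n + 1):
            for c in range(n + 1):
                if binom.cdf(c, n, 0.75) <= 0.01 and binom.cdf(c, n, 0.5) >= 0.99:
                    return n, c
        return None

    print(find_design())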


    Most Powerful Tests

Given that the maximum probability of Type I error is less than or equal to α, the most powerful level α test minimizes the probability of Type II error, or, equivalently, maximizes the power function at a θ ∈ Θ0^c.

If this occurs for all θ ∈ Θ0^c, such a test is called the uniformly most powerful (UMP) level α test.


    Test function

Given a rejection region R, define a test function φ on the sample space to be

φ(x) = 1 if x ∈ R, 0 if x ∉ R.

Interpret φ(X) as the probability of rejecting the null hypothesis given the sample X.

This also opens the door for randomized tests, where φ(X) can even take values strictly between 0 and 1.

Note the expected value of φ is the power function: E_θ[φ(X)] = P_θ(X ∈ R) = β(θ).


    Existence of UMP tests

    Lemma (Neyman-Pearson)

Consider testing H0 : θ = θ0 versus H1 : θ = θ1, where the pdf or pmf corresponding to θi is f(x, θi), i = 0, 1. Consider any test function φ satisfying

φ(x) = 1, if f(x, θ1) > k f(x, θ0),
φ(x) = 0, if f(x, θ1) < k f(x, θ0),

for some k ≥ 0, and E_{θ0} φ(X) = α. Then φ(X) is a UMP size α test, and if k > 0, any other UMP level α test must have size α and can differ from φ only on the set {x : f(x, θ1) = k f(x, θ0)}.


    Examples

X ~ Bin(2, θ), one observation. H0 : θ = 1/2 versus H1 : θ = 3/4. Obtain the UMP level 1/8 test and the UMP level 1/2 test.

X ~ Exp(θ), H0 : θ = 1 versus H1 : θ = 2.

X ~ Cauchy(θ), H0 : θ = 0 versus H1 : θ = 1.

X ~ Un(0, θ), H0 : θ = 1 versus H1 : θ = 2.

X ~ Un(θ, θ + 1), H0 : θ = 0 versus H1 : θ = 2.
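For the first example, the likelihood ratio f(x, 3/4)/f(x, 1/2) is increasing in x, so the Neyman-Pearson test rejects for large X and randomizes on the boundary point to hit the exact size. A small enumeration sketch with exact rational arithmetic:

    from fractions import Fraction
    from math import comb

    def binom_pmf(x, n, p):
        # Exact binomial pmf with rational arithmetic
        return comb(n, x) * p**x * (1 - p)**(n - x)

    p0, p1 = Fraction(1, 2), Fraction(3, 4)
    ratios = [binom_pmf(x, 2, p1) / binom_pmf(x, 2, p0) for x in range(3)]
    print(ratios)   # the ratios 1/4, 3/4, 9/4 increase in x

    # Level 1/8: P_{1/2}(X = 2) = 1/4 > 1/8, so reject at x = 2 only with
    # probability gamma solving (1/4) * gamma = 1/8, i.e. gamma = 1/2.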


    Sufficient statistic and UMP test

Let T(X) be a sufficient statistic for θ and let g(t, θ) be the pdf or pmf of T corresponding to θ. Then a UMP level α test φ(T) based on T is given by

φ(t) = 1, if g(t, θ1) > k g(t, θ0),
φ(t) = 0, if g(t, θ1) < k g(t, θ0),

for some k ≥ 0, where α = E_{θ0} φ(T).


    Examples

UMP normal test for mean: let X1, . . . , Xn be iid from N(μ, σ²) with σ² known, H0 : μ = μ0 versus H1 : μ = μ1, where μ1 > μ0.

UMP normal test for variance: let X1, . . . , Xn be iid from N(0, σ²) with σ² unknown. H0 : σ² = σ0² versus H1 : σ² = σ1², where σ1² > σ0².


    Comments

Discrete case: suppose θ has only two possible values θ0 or θ1, and X is a discrete variable taking finitely many values a1, . . . , ak with P_{θi}(X = aj), j = 1, . . . , k; i = 0, 1. H0 : θ = θ0 vs H1 : θ = θ1. The rejection region R of the UMP level α test satisfies

max_R Σ_{aj ∈ R} P_{θ1}(X = aj) subject to Σ_{aj ∈ R} P_{θ0}(X = aj) ≤ α.

The N-P test is the LRT for H0 : θ = θ0 vs H1 : θ = θ1.

For simple hypotheses, the UMP level α test is unbiased, i.e. β(θ1) > β(θ0) = α.


    UMP test for one-sided composite alternative

X1, . . . , Xn iid N(θ, 1). H0 : θ = θ0 vs H1 : θ > θ0.


    Monotone Likelihood Ratio (MLR)

    Definition

A family of pdfs or pmfs {g(t, θ) : θ ∈ Θ} for a univariate random variable T with real-valued parameter θ has a monotone likelihood ratio (MLR) if, for every θ2 > θ1, g(t, θ2)/g(t, θ1) is an increasing function of t on {t : g(t, θ1) > 0 or g(t, θ2) > 0}.


    Examples

    Normal, Poisson, Binomial all have the MLR property.

If T is from an exponential family with density f(t, θ) = h(t)c(θ)e^{w(θ)t}, then T has an MLR if w(θ) is a nondecreasing function of θ.

If X1, . . . , Xn are iid from N(μ, σ²) with σ known, then X̄ has an MLR.

If X1, . . . , Xn are iid from N(μ, σ²) with μ known, then Σ_{i=1}^n (Xi - μ)² has an MLR.

iid Unif(0, θ): T = X(n) has the MLR property.


    Stochastically increasing

    Definition

A statistic T with family of pdfs {f(t, θ), θ ∈ Θ} is called stochastically increasing in θ if θ1 < θ2 implies that

P_{θ1}(T > c) ≤ P_{θ2}(T > c) for every c,

or equivalently, F_{θ2}(c) ≤ F_{θ1}(c), where F_θ is the cdf.


    Useful facts

    Lemma

If a family of distributions for T has the MLR property, then T is stochastically increasing in its parameter.

A location family T is stochastically increasing in its location parameter.

Let a test have rejection region R = {T > c}. If T has the MLR property, then the power function β(θ) = P_θ(T ∈ R) = P_θ(T > c) is non-decreasing in θ.


    Karlin-Rubin Theorem

    Theorem

Let T(X) be a sufficient statistic for θ and suppose the family {g(t, θ), θ ∈ Θ} has the MLR property. Then:

For testing H0 : θ ≤ θ0 vs H1 : θ > θ0, the UMP level α test rejects H0 if and only if T > t0, where α = P_{θ0}(T > t0).

For testing H0 : θ ≥ θ0 vs H1 : θ < θ0, the UMP level α test rejects H0 if and only if T < t0, where α = P_{θ0}(T < t0).


    Examples

Let X1, . . . , Xn be iid from N(μ, σ²), σ² known.

Find the UMP level α test for testing H0 : μ ≤ μ0 vs H1 : μ > μ0.

Find the UMP level α test for testing H0 : μ ≥ μ0 vs H1 : μ < μ0.

Let X1, . . . , Xn be iid from N(μ0, σ²), σ² unknown, μ0 known. Find the UMP level α test for testing H0 : σ² ≤ σ0² vs H1 : σ² > σ0².
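For the first case, Karlin-Rubin with T = X̄ gives the UMP level α test rejecting when X̄ > μ0 + z_α σ/√n. A minimal sketch (the data and function name are hypothetical):

    import numpy as np
    from scipy.stats import norm

    def ump_one_sided(x, mu0, sigma, alpha=0.05):
        # Reject H0: mu <= mu0 when xbar exceeds mu0 + z_alpha * sigma / sqrt(n)
        n = len(x)
        cutoff = mu0 + norm.ppf(1 - alpha) * sigma / np.sqrt(n)
        return np.mean(x) > cutoff

    rng = np.random.default_rng(1)
    print(ump_one_sided(rng.normal(0.6, 1.0, size=25), mu0=0.0, sigma=1.0))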


    Nonexistence of UMP test

For many problems with a two-sided alternative, there is no UMP level α test, because the class of level α tests is so large that no one test dominates all the others in terms of power.

Instead, search for a UMP test within some subset of the class of level α tests, for example, the subset of all unbiased tests.


    Example

Let X1, . . . , Xn be iid from N(μ, σ²), σ² known. Consider testing

H0 : μ = μ0 vs H1 : μ ≠ μ0.

There is no UMP level α test.

Find the UMP level α test within the class of unbiased tests.


    p-value

The choice of α is subjective. Different people may have different tolerance levels α.

If α is small, the decision is conservative.

If α is large, the decision is overly liberal.

If you reject (or accept) H0, is it a strong or borderline rejection (acceptance)?


    p-value (contd.)

    Definition

A p-value is the smallest possible level α at which H0 would be rejected.

Note

A p-value is a test statistic, taking values 0 ≤ p(x) ≤ 1 for the sample x. Small values of p(X) give evidence that H1 is true.

The smaller the p-value, the stronger the evidence for rejecting H0.

Rejecting H0 at level α is equivalent to the p-value being less than α.


    p-value for composite null

A p-value is called valid if, for every θ ∈ Θ0 and every 0 ≤ α ≤ 1, we have P_θ(p(X) ≤ α) ≤ α.

    Theorem

Let W(X) be a test statistic such that large values of W give evidence that H1 is true. For each sample point x, define

p(x) = sup_{θ ∈ Θ0} P_θ(W(X) ≥ W(x)).

Then p(X) is a valid p-value.


    Examples

    Two-sided normal p-value:

Let X1, . . . , Xn be iid from N(μ, σ²), σ² unknown. Consider testing H0 : μ = μ0 versus H1 : μ ≠ μ0; use the LRT statistic W(X) = |X̄ - μ0|/(S/√n).

Let μ0 = 1, n = 16, observed x̄ = 1.5, s² = 1. Do you reject the hypothesis μ = 1 at level 0.05? At level 0.1? (See the sketch below.)

One-sided normal p-value: in the above example, consider testing H0 : μ ≤ μ0 versus H1 : μ > μ0.
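A quick check for the two-sided case: the observed statistic is W = |1.5 - 1|/(1/√16) = 2 with 15 degrees of freedom.

    from scipy.stats import t

    w = abs(1.5 - 1.0) / (1.0 / 16 ** 0.5)   # observed W = 2.0
    p_two_sided = 2 * t.sf(w, df=15)         # about 0.064
    print(p_two_sided)                       # reject at level 0.1, but not at 0.05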


    p-value and sufficient statistic

Sometimes there is a non-trivial sufficient statistic S for the null model. Then defining a p-value through conditioning on the sufficient statistic effectively reduces the composite null to a point null:

p(x) = P(W(X) ≥ W(x) | S = S(x)).


Fisher's Exact Test

Let S1 and S2 be independent observations with S1 ~ Bin(n1, p1) and S2 ~ Bin(n2, p2). Consider testing H0 : p1 = p2 versus H1 : p1 > p2. The goal is to form an exact (non-asymptotic) level α test. Under H0, conditionally on the sufficient statistic S = S1 + S2, S1 follows a hypergeometric distribution, which yields the conditional p-value.
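A sketch of the conditional p-value via the hypergeometric distribution (scipy's hypergeom takes the population size M = n1 + n2, the n1 trials of sample 1, and the total success count as the number of draws; the counts below are hypothetical):

    from scipy.stats import hypergeom

    def fisher_exact_pvalue(s1, n1, s2, n2):
        # Under H0: p1 = p2, given S = s1 + s2, S1 is hypergeometric.
        # One-sided p-value for H1: p1 > p2 is P(S1 >= observed s1 | S).
        M, n, N = n1 + n2, n1, s1 + s2
        return hypergeom.sf(s1 - 1, M, n, N)

    print(fisher_exact_pvalue(s1=9, n1=10, s2=4, n2=10))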


    Interval Estimation

Interval estimate: (L(X), U(X)).

Confidence coefficient: min_θ P_θ(θ ∈ (L(X), U(X))) = 1 - α.


    Method of inversion

One-to-one correspondence between tests and confidence intervals.

Hypothesis testing: fixing the parameter asks what sample values (in the appropriate region) are consistent with that fixed value.

Confidence set: fixing the sample value asks what parameter values make this sample most plausible.

For each θ0 ∈ Θ, let A(θ0) be the acceptance region of a level α test of H0 : θ = θ0. Define the set C(x) = {θ0 : x ∈ A(θ0)}. Then C(x) is a (1 - α)-confidence set.

Example

iid N(μ, σ²), σ² unknown, μ is the parameter of interest.
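Inverting the two-sided level α t-test of H0 : μ = μ0 over all μ0 gives the familiar interval x̄ ± t_{n-1,α/2} s/√n; a minimal sketch with simulated data:

    import numpy as np
    from scipy.stats import t

    def t_interval(x, alpha=0.05):
        # Invert the level-alpha two-sided t-test of H0: mu = mu0 over all mu0
        n = len(x)
        half = t.ppf(1 - alpha / 2, df=n - 1) * np.std(x, ddof=1) / np.sqrt(n)
        return np.mean(x) - half, np.mean(x) + half

    rng = np.random.default_rng(2)
    print(t_interval(rng.normal(1.0, 2.0, size=30)))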


    Method of inversion (contd.)

In general, inverting the acceptance region of a two-sided test gives a two-sided interval, and inverting the acceptance region of a one-sided test gives an interval open at one end.

    Theorem

Let the acceptance region of a two-sided test be of the form

A(θ) = {x : c1(θ) ≤ T(x) ≤ c2(θ)}

and let the cutoffs be symmetric, that is, P_θ(T(X) > c2(θ)) = α/2 and P_θ(T(X) < c1(θ)) = α/2.

If T has the MLR property, then both c1(θ) and c2(θ) are increasing in θ.


    Examples

X1, . . . , Xn ~ N(μ, σ²), both unknown.

Upper confidence bound for μ.

Lower confidence bound for μ.

X1, . . . , Xn ~ Exp(λ). Invert the LRT.

Discrete: X1, . . . , Xn ~ Bin(1, θ). Obtain a lower confidence bound.


    Pivot

    Definition

A random quantity Q(X, θ) is called a pivotal quantity (or a pivot) if the distribution of Q(X, θ) does not depend on θ.

Note this is different from an ancillary statistic, since Q(X, θ) depends also on θ and hence is not a statistic.


    Examples

Location family

Scale family

Location-scale family

iid exponential. Gamma pivot.

A statistic T has density f(t, θ) = g(Q(t, θ)) |(∂/∂t) Q(t, θ)|. Then Q(T, θ) is a pivot.


    Method of pivot

    How to construct a confidence set using a pivotal quantity?

Find a, b such that P_θ(a ≤ Q(X, θ) ≤ b) = 1 - α.

Define C(x) = {θ : a ≤ Q(x, θ) ≤ b}.

Then P_θ(θ ∈ C(X)) = P_θ(a ≤ Q(X, θ) ≤ b) = 1 - α.


    Method of pivot (contd.)

When will C(x) be an interval? If Q(x, θ) is monotone in θ, then C(x) is an interval.

    Examples:

iid exponential.

iid N(μ, σ²), σ known. Interval for μ.

iid N(μ, σ²), σ unknown. Interval for μ.

iid N(μ, σ²), μ known. Interval for σ.

iid N(μ, σ²), μ unknown. Interval for σ.
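For instance, with μ unknown, (n - 1)S²/σ² ~ χ²_{n-1} is a pivot for σ²; a minimal sketch with simulated data:

    import numpy as np
    from scipy.stats import chi2

    def sigma2_interval(x, alpha=0.05):
        # Pivot: (n - 1) S^2 / sigma^2 ~ chi-square with n - 1 d.f.
        n = len(x)
        s2 = np.var(x, ddof=1)
        lower = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, df=n - 1)
        upper = (n - 1) * s2 / chi2.ppf(alpha / 2, df=n - 1)
        return lower, upper

    rng = np.random.default_rng(3)
    print(sigma2_interval(rng.normal(0.0, 2.0, size=40)))  # should cover sigma^2 = 4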


    Method of pivot (contd.)

If F_θ(t) is decreasing in θ for all t, define θL, θU by F_{θL}(t) = 1 - α2, F_{θU}(t) = α1, α1 + α2 = α. Then [θL(T), θU(T)] is a (1 - α) CI for θ.

Similarly, if F_θ(t) is increasing in θ for all t, define θL, θU by F_{θL}(t) = α2, F_{θU}(t) = 1 - α1, α1 + α2 = α. Then [θL(T), θU(T)] is a (1 - α) CI for θ.

Examples:

iid from f(x, θ) = e^{-(x-θ)} I(x > θ). X(1) is sufficient.

The (1 - α) CI is not unique. Among many choices, we want to minimize the expected length.

iid N(μ, σ²), σ known.

iid N(μ, σ²), σ unknown.

iid exponential.


    Asymptotic Evaluation

X1, . . . , Xn i.i.d. f(x, θ), n large. Mathematically, n → ∞. The assumption n → ∞ makes life easier. The dependence of optimality on models or loss functions becomes less pronounced.

Because limit theorems become available, distributions can be found approximately. Limiting distributions are much simpler than actual distributions.


    Convergence in probability

    Definition

We say that Yn →p c (Yn converges in probability to a constant c) if P(|Yn - c| > ε) → 0 as n → ∞ for all ε > 0.

Usual calculus applies for convergence in probability.

A possible method of showing this is Chebychev's inequality: P(|Yn - c| > ε) ≤ ε^{-2} E(Yn - c)² = ε^{-2}[var(Yn) + (E(Yn) - c)²], so it is enough to show that the right-hand side goes to 0.

If Yn = X̄n, then X̄n →p E(X) by the law of large numbers.


    Convergence in distribution

    Definition

If Yn is a sequence of random variables and F is a continuous cdf, we say that Yn converges in distribution to F if P(Yn ≤ x) → F(x) for all x. We also say that Yn →d Y, where Y is a random variable having cdf F.

The central limit theorem states that √n(X̄n - E(X)) converges in distribution to N(0, var(X)), i.e.,

P(√n(X̄n - E(X))/√var(X) ≤ x) → Φ(x)

for all x, where Φ stands for the standard normal cdf.

Another important result is Slutsky's theorem: if Yn →d Y and Zn →p c, then Yn + Zn →d Y + c, YnZn →d cY, and Yn/Zn →d Y/c if c ≠ 0.
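A quick simulation makes the CLT statement concrete (exponential summands chosen arbitrarily):

    import numpy as np

    rng = np.random.default_rng(4)
    n, reps = 200, 10000
    x = rng.exponential(scale=1.0, size=(reps, n))   # E(X) = 1, var(X) = 1
    z = np.sqrt(n) * (x.mean(axis=1) - 1.0)          # should be close to N(0, 1)
    print(z.mean(), z.std())                         # near 0 and 1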

    Consistency

    Definition

Let Wn = Wn(X1, . . . , Xn) be a sequence of estimators for τ(θ). We say that Wn is consistent for estimating τ(θ) if Wn →p τ(θ) under P_θ for all θ.

Theorem

If E_θ(Wn) → τ(θ) (in which case Wn is called asymptotically unbiased for τ(θ)) and var_θ(Wn) → 0 for all θ, then Wn is consistent for τ(θ).


    Examples

If X1, . . . , Xn are i.i.d. f with E(X) = μ and var(X) = σ², then X̄n is consistent for μ and S²n = Σ_{i=1}^n (Xi - X̄n)²/(n - 1) is consistent for σ². Σ_{i=1}^n (Xi - X̄n)²/n is consistent for σ² too.

(Invariance principle of consistency): if Wn is consistent for θ and g is a continuous function, then g(Wn) is consistent for g(θ).

The method of moments estimator is generally consistent.

UMVUE is consistent: let X1, . . . , Xn be i.i.d. f(x, θ) and let Wn be the UMVUE of τ(θ). Then Wn is consistent for τ(θ).

Consistency of MLE: let X1, . . . , Xn be i.i.d. f(x, θ), a parametric family satisfying some regularity conditions. Then the MLE θ̂n is consistent for θ.


    Delta method

Theorem

If Tn is AN(θ, σ²(θ)/n), then g(Tn) is AN(g(θ), (g′(θ))² σ²(θ)/n).

A multivariate version is also true.

The combination of the CLT and the delta method gives asymptotic normality of many statistics of interest.
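For example, with g(t) = t² and X̄n AN(μ, σ²/n), the delta method gives g(X̄n) AN(μ², (2μ)²σ²/n). A quick simulation check with Exp(1) data (μ = σ = 1, so the asymptotic variance of g(X̄n) is 4/n):

    import numpy as np

    rng = np.random.default_rng(5)
    n, reps = 500, 20000
    xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    g = xbar ** 2                              # g(t) = t^2, g'(1) = 2
    print(g.var() * n, (2 * 1.0) ** 2 * 1.0)   # empirical vs delta-method variance: both near 4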


    Efficiency

How to distinguish between consistent estimators?

Let the estimators be asymptotically normal. The asymptotic means are the same, so we can compare asymptotic variances.

Often one variance is smaller than another throughout.

If there is a lower bound, and that lower bound is attained, then the estimator attaining it is called asymptotically efficient. Clearly such an estimator is impossible to beat: asymptotically the best.


    Efficiency bound

The Cramér-Rao bound for the MSE of Tn in estimating τ(θ):

(τ′(θ) + b′n(θ))² / (n I(θ)),

where I(θ) is the Fisher information and bn(θ) the bias.

So if bn(θ) → 0, the bound for the asymptotic variance should be (τ′(θ))²/I(θ).

In particular, if τ(θ) = θ, the bound for the asymptotic variance is 1/I(θ).

Strictly speaking, this bound is not valid, although it is nearly correct.

Then we can define an estimator to be asymptotically efficient if its asymptotic variance is 1/I(θ).


    Attaining efficiency bound

    Theorem

The MLE θ̂ is AN(θ, 1/(n I(θ))).

More generally, τ(θ̂) is AN(τ(θ), (τ′(θ))²/(n I(θ))).

The MLE is not the only possible asymptotically efficient estimator.

Any Bayes estimator is asymptotically efficient.

Method of moments estimators are asymptotically normal, but need not be asymptotically efficient.

Define the asymptotic efficiency of θ̃n AN(θ, v(θ)/n) by [I(θ)]^{-1}/v(θ).


    Examples

    Cauchy

    Logistic

    Mean versus median


    Asymptotic distribution of likelihood ratio statistic

    Theorem (Point null case)

Let X1, . . . , Xn be i.i.d. f(x|θ) and let λn(X) be the likelihood ratio for testing H0 : θ = θ0 vs H1 : θ ≠ θ0, where θ is d-dimensional. Then

-2 log λn(X) →d χ²_d.

    Example: Poisson
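For the Poisson case the statistic has the closed form -2 log λn = 2n(θ0 - X̄ + X̄ log(X̄/θ0)); simulating under H0 illustrates the χ²₁ limit (a sketch):

    import numpy as np

    rng = np.random.default_rng(6)
    theta0, n, reps = 3.0, 200, 10000
    xbar = rng.poisson(theta0, size=(reps, n)).mean(axis=1)
    stat = 2 * n * (theta0 - xbar + xbar * np.log(xbar / theta0))
    print(stat.mean(), np.mean(stat > 3.841))   # chi2_1: mean near 1, tail prob near 0.05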


    Asymptotic distribution of likelihood ratio statistic

Theorem (General case)

Let X