Geometrical Method of Asymptotic Conditional Inference Based on the Subset Parameters
Bo-Cheng Wei and Chih-Ling Tsai University of Minnesota
School of Statistics Technical Report No. 417
April 1983
University of Minnesota School of Statistics
Department of Applied Statistics St. Paul, Minnesota 55108
*The first author is the visiting scholar from The Nanjing Institute of Technology, The People's Republic of China.
Summary
Given a multiparameter curved exponential family with parameter vector
$\mu$, which can be partitioned into a parameter of interest $u$ and a
nuisance parameter $v$, we use differential geometry and the Edgeworth
expansion to derive the asymptotic conditional distribution, expectation
and variance of an efficient estimator $\hat{u}$ conditioned on an
efficient estimator $\hat{v}$. The asymptotic conditional variance of
$\hat{u}$ conditioned on an efficient estimator $\hat{v}$ and an ancillary
statistic is also derived. If the nuisance parameter $v$ does not exist,
the results are exactly the same as those given by Amari (1982b).
KEY WORDS: Ancillary Statistics; Conditional Inference; Curved Exponential
Family; Curvature; Differential Geometry in Statistics; Edgeworth Expansion;
Non-linear Model; QR-decomposition.
1. Introduction
Amari (1982b) derived the asymptotic conditional expectation and the
asymptotic conditional variance of an efficient estimator $\hat{\mu}$ given
an ancillary statistic in a multiparameter curved exponential family.
Often the underlying distribution depends not only on a set of parameters
$u$ which are of interest, but also on a set of nuisance parameters $v$. For
instance, we may wish to make inferences about the mean $u$ of a normal
population with unknown variance $v$. In the Bayesian approach, inference
about $u$ is completely determined by the posterior distribution of $u$,
obtained by "integrating out" the nuisance parameter $v$ from the joint
posterior distribution of $u$ and $v$. In calculating such probabilities, we
must have a posterior distribution for $v$. If no such information on $v$ can
be obtained, inference on $u$ can be made based on a sufficient statistic
for $u$.
The traditional conditionality principle specifies that if the minimal
sufficient statistic T contains a component a (called an ancillary
statistic) whose distribution is independent of $\mu = (u,v)$, then inference
about $\mu$ should be based only on the conditional distribution of T given
a. Amari constructed a differential-geometric approach to this type
of conditional inference. In this paper, we propose a differential geometry
method for obtaining the asymptotic conditional distribution of an efficient
estimator $\hat{u}$, given an efficient estimator $\hat{v}$ of the nuisance
parameter $v$, in the case of a multiparameter curved exponential family. The
exponential curvature of a model will be shown to play a fundamental role
in the asymptotic theory. Furthermore, the asymptotic conditional variance
of $\hat{u}$ given $\hat{v}$ and the ancillary statistic is also obtained.
Finally, the asymptotic conditional variance of $\hat{u}$ given $\hat{v}$ is
derived for the multiparameter non-linear model and the logistic regression
model.
Amari (1982) set an example of constructing a differential
geometrical framework in statistics. The present paper will follow this
structure and most of the notation used in Amari's paper.
Denote the set of distributions of the exponential family S by the density
functions
(1.1) $p(x,\theta) = c(X)\exp\{\theta^{T}X - \psi(\theta)\}$
where $X = (X_1,\dots,X_n)^T$ is a random vector in the sample space
$\mathcal{X}$ and $\theta = (\theta^1,\dots,\theta^n)^T$ is a vector parameter
specifying the distributions in S with respect to some given measure
$m(\cdot)$. We always assume that the necessary regularity conditions are
satisfied (see e.g., Barndorff-Nielsen, 1980). The set of distributions forms
an n-dimensional Riemannian manifold. Its Riemannian metric tensor
$g_{ij}(\theta)$ in the $\theta$-coordinate system at $\theta$ is given by the
Fisher information matrix as follows:
(1.2) $g_{ij}(\theta) = E(\partial_i \ell\, \partial_j \ell) = \partial_i \partial_j \psi(\theta)$
where $\ell$ and $\partial_i$ are abbreviations of $\ell(x,\theta) = \log p(x,\theta)$
and $\partial/\partial\theta^i$ respectively.
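The inverse of the matrix $g_{ij}$ is $g^{ij}$. A one-parameter family of
affine connections is given by
(1.3) $\Gamma^{\alpha}_{ijk}(\theta) = \frac{1-\alpha}{2}\, T_{ijk}(\theta)$
where
(1.4) $T_{ijk}(\theta) = E(\partial_i \ell\, \partial_j \ell\, \partial_k \ell)$.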
In the tangent space $T_\theta$ of S at $\theta$, $\partial_i$ (i = 1,...,n)
are n natural basis vectors under the $\theta$-coordinate system. The inner
product of two vectors $X = (X^i)$ and $Y = (Y^i)$ in the tangent space
$T_\theta$ at $\theta$ is given by
(1.5) $\langle X, Y \rangle = g_{ij}(\theta)\, X^i Y^j$
where Einstein's summation convention is used, as in the rest of this paper.
If the inner product of X and Y is zero, X and Y are orthogonal. A covariant
derivative of $X^i \in T_\theta$ with respect to the $\alpha$-connection is
given by
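(1.6) $\nabla^{\alpha}_k X^i = \dfrac{DX^i}{d\theta^k} = \dfrac{\partial X^i}{\partial\theta^k} + \Gamma^{\alpha\, i}_{kj} X^j$
If $\theta^k = \theta^k(\mu)$, $\mu = (\mu^1,\dots,\mu^m)$, denote
(1.7) $\nabla^{\alpha}_a X^i = \dfrac{DX^i}{d\mu^a} = \dfrac{DX^i}{d\theta^k}\,\dfrac{\partial\theta^k}{\partial\mu^a} = B^k_a \nabla^{\alpha}_k X^i$
We use the notation $\overset{e}{\nabla}_a X^i$ for $\alpha = 1$ and
$\overset{m}{\nabla}_a X^i$ for $\alpha = -1$ in the present paper.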
There are some advantages in using the expectation parameter $\eta$ in
statistics (Amari, 1982a), where the expectation of $X_i$ is given by
(1.8) $E X_i = \eta_i(\theta) = \partial_i \psi(\theta)$
The mapping (1.8) from $\theta$ to $\eta$ is one-to-one, so $\eta$ can also be
used as a coordinate system for S. Any vector X of the sample space
$\mathcal{X}$ can be treated as a vector in the $\eta$-coordinate system since
$EX = \eta$.
Denote the set of distributions of the curved exponential family M by the
density functions
(1.9) $p(X,\theta(\mu)) = c(X)\exp\{\theta(\mu)^{T}X - \psi(\theta(\mu))\}$
where $\mu = (\mu^1,\dots,\mu^m)^T$ is a vector parameter specifying M.
$\theta(\mu)$ and $\eta(\mu)$ are twice continuously differentiable vector
functions of $\mu$. M forms an m-dimensional submanifold embedded in S. In
the tangent subspace $T_\mu$ of M at $\mu$, $B^i_A$ (A = 1,...,m) are m basis
vectors of $T_\mu$, where
(1.10) $B^i_A(\mu) = \partial\theta^i(\mu)/\partial\mu^A = \partial_A \theta^i$
The Riemannian metric tensor of M at $\theta(\mu)$ is given by
(1.11) $g_{AB}(\mu) = B^i_A B^j_B\, g_{ij}(\theta(\mu))$.
The inverse of $g_{AB}$ is $g^{AB}$. Note that $g_{AC}\, g^{CB} = \delta_A^B$,
where $\delta_A^B$ is the Kronecker delta.
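The $\alpha$-connection of the curved exponential family is given by
(1.12) $\Gamma^{\alpha}_{ABC}(\mu) = (\partial_A B^i_B)\, B^j_C\, g_{ij} + \frac{1-\alpha}{2}\, T_{ABC}$
where
(1.13) $T_{ABC} = B^i_A B^j_B B^k_C\, T_{ijk}$.
Note that
(1.14)
where $H^{\alpha\, i}_{AB} = \nabla^{\alpha}_A B^i_B$ is called the $\alpha$-curvature.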
Notes on the index rule. In the present paper, we use the notation
$w = (\mu', a)$, $\mu' = (u', v')$, $\mu = (u, v)$. The indices are used as
follows:
$\alpha, \beta, \gamma$ run from 1 to n for $w$.
A, B, C run from 1 to m for $\mu$, $\mu'$.
a, b, c run from 1 to k for $u$, $u'$.
p, q, r run from k+1 to m for $v$, $v'$.
$\kappa, \lambda, \sigma$ run from m+1 to n for $a$.
The tensor notation is used for matrices since it can be easily
generalized for multi-index arithmetic. The superscript denotes the row
index and the subscript denotes the column index.
2. Subset Parameters of a Curved Exponential Family
Many authors are interested in ancillary statistics and the associated
conditional inference (see, e.g., Efron and Hinkley, 1978; Hinkley, 1980;
Barndorff-Nielsen, 1980). The ancillary statistic can be used to recover
information loss. However, sometimes one might pay more attention to the
parameter itself. Suppose $\mu$ is partitioned into two parts: $\mu = (u,v)$,
where $u = (u^1,\dots,u^k)^T$ is the parameter of interest and
$v = (v^{k+1},\dots,v^{k+l})^T$, $k + l = m$. By Amari's methods, the
Riemannian manifold S can be decomposed into two parts at any point
$\theta(\mu)$: the submanifold M and its orthogonal complement, an ancillary
space. Many inferences can be made on the basis of this orthogonality. We
will rotate the coordinate system $\mu = (u,v)$ to $\mu' = (u',v')$ to get an
orthonormal basis in the tangent subspace $T_\mu$, so that M can be decomposed
into two orthogonal submanifolds. Inference can then be made for $u'$ and $v'$.
Let B be the n × m matrix of $B^i_A$, i = 1,...,n; A = 1,...,m. We can form
the QR decomposition of B proposed by Bates and Watts (1980):
(2.1) $B = QR$ or $B^i_A = Q^i_D R^D_A$
where Q is an n × m matrix with orthonormal column vectors, that is,
(2.2) $Q^i_A Q^j_B\, g_{ij} = \delta_{AB}$,
and R is an m × m upper triangular matrix. Note that the QR decomposition
used in the present paper is slightly different from the ordinary QR
decomposition, since the inner product here is based on (1.5). An ordinary
QR decomposition program therefore cannot be used directly for computing,
although the procedure is almost the same.
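One way to carry out such a decomposition is classical Gram-Schmidt with the inner product (1.5) in place of the Euclidean one. The following is a minimal sketch, not the authors' program: the function names `inner` and `metric_qr` and the small matrices `g` and `B` are hypothetical choices made only for illustration, and `g` is assumed to be symmetric positive definite.

```python
def inner(x, y, g):
    """Inner product <x, y> = g_ij x^i y^j, as in (1.5)."""
    n = len(x)
    return sum(g[i][j] * x[i] * y[j] for i in range(n) for j in range(n))


def metric_qr(B, g):
    """Classical Gram-Schmidt QR of the n x m matrix B under the metric g.

    Returns (Q, R) with B = Q R, columns of Q orthonormal with respect to g,
    and R upper triangular. Assumes B has full column rank.
    """
    n, m = len(B), len(B[0])
    cols = [[B[i][A] for i in range(n)] for A in range(m)]  # columns of B
    q_cols = []
    R = [[0.0] * m for _ in range(m)]
    for A, b in enumerate(cols):
        v = list(b)
        for D, q in enumerate(q_cols):
            R[D][A] = inner(q, b, g)  # component of b along q_D
            v = [vi - R[D][A] * qi for vi, qi in zip(v, q)]
        R[A][A] = inner(v, v, g) ** 0.5  # g-norm of what is left
        q_cols.append([vi / R[A][A] for vi in v])
    Q = [[q_cols[A][i] for A in range(m)] for i in range(n)]
    return Q, R


# Hypothetical example: n = 3, m = 2, with a non-identity metric.
g = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 3.0]]
B = [[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
Q, R = metric_qr(B, g)
```

When g is the identity matrix this reduces to the ordinary QR decomposition, which is the sense in which the two procedures are almost the same.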
Transform the coordinate system $\mu$ to $\mu'$:
(2.3) $\mu' = R\mu$ or $\mu'^A = R^A_C\, \mu^C$
(2.4) $\mu = L\mu'$ or $\mu^A = L^A_C\, \mu'^C$
where $L = R^{-1}$. $\mu'$ is partitioned into two parts, $\mu' = (u',v')$,
corresponding to u and v. By (2.3) and (2.4), the partitioned equations are
given by
(2.5) $u'^a = R^a_c u^c + R^a_r v^r$, $\quad v'^p = R^p_r v^r$
(2.6) $u^a = L^a_c u'^c + L^a_r v'^r$, $\quad v^p = L^p_r v'^r$
where
$L^A_B = \begin{pmatrix} L^a_b & L^a_q \\ L^p_b & L^p_q \end{pmatrix}$
and
$R^A_B = \begin{pmatrix} R^a_b & R^a_q \\ R^p_b & R^p_q \end{pmatrix}$,
a, b run from 1 to k and p, q run from k+1 to m, and
(2.7) $L^q_b = R^q_b = 0$ $\quad (1 \le b \le k,\; k+1 \le q \le m)$.
Note that (1.10)-(1.14) hold for the $\mu'$ coordinate system by adding a
prime (') to the related quantities.
After the QR decomposition and transformation, the basis of the tangent
subspace $T_{\mu'}$ of M at $\mu'$ becomes orthonormal. In fact, by (1.11), the
metric tensor $g'_{AB}$ of M with respect to the coordinate $\mu'$ can be
represented by
(2.8) $g'_{AB} = B'^i_A B'^j_B\, g_{ij}$
By (2.1) and (2.4),
(2.9) $B'^i_A = \dfrac{\partial\theta^i}{\partial\mu'^A} = \dfrac{\partial\theta^i}{\partial\mu^C}\,\dfrac{\partial\mu^C}{\partial\mu'^A} = B^i_C L^C_A = Q^i_D R^D_C L^C_A = Q^i_D \delta^D_A = Q^i_A$
(2.2) shows that
(2.10) $g'_{AB} = \delta_{AB}$.
Obviously, $g'^{AB} = \delta^{AB}$. Therefore, when we lower or raise any index
of a tensor by multiplying by the metric tensor $g'_{AB}$ or its inverse
$g'^{AB}$ in the $\mu'$ coordinate system, there is no numerical change for
that tensor. For example, the value of $T'^{ABC}$ is equal to the value of
$T'_{ABC}$.
Since the tangent vectors $B'^i_A$ (A = 1,...,m) are orthonormal, the tangent
subspace $T_{u'}$ spanned by $B'^i_a = \partial\theta^i/\partial u'^a$
(a = 1,...,k) is orthogonal to the tangent subspace $T_{v'}$ spanned by
$B'^i_p = \partial\theta^i/\partial v'^p$ (p = k+1,...,m). These two tangent
subspaces correspond to two submanifolds $M_{u'}$ and $M_{v'}$ at $\mu'$. We
can study the parameters u' and v' instead of u and v, then come back by
(2.5). Note that the upper triangular matrices L and R give us advantages
through (2.7).
Since the transformation matrices L and R and the metric tensors
$g_{AB}$ and $g^{AB}$ relate the $\mu$-coordinates to the $\mu'$-coordinates,
they are important in discussing the behavior of u' and v'. The following
formulas are useful:
(2.11) $L^C_A L^D_B\, g_{CD} = \delta_{AB}$
(2.12) $R^C_A R^D_B\, \delta_{CD} = g_{AB}$
(2.13) $R^A_C R^B_D\, g^{CD} = \delta^{AB}$
(2.14) $L^A_C L^B_D\, \delta^{CD} = g^{AB}$
(2.11) comes from (2.8), (2.9) and (1.11); in fact
$\delta_{AB} = g'_{AB} = B^i_C L^C_A B^j_D L^D_B\, g_{ij} = L^C_A L^D_B\, g_{CD}$
By multiplying by the inverse of L in (2.11), (2.12) can be proved:
$R^A_E R^B_F\, \delta_{AB} = R^A_E L^C_A R^B_F L^D_B\, g_{CD} = \delta^C_E \delta^D_F\, g_{CD} = g_{EF}$
Then, taking the inverses of (2.11) and (2.12), (2.13) and (2.14) can be
obtained.
Corresponding to u and v, we partition the related matrices $g_{AB}$ and
$g^{AB}$:
$g_{AB} = \begin{pmatrix} g_{ab} & g_{aq} \\ g_{pb} & g_{pq} \end{pmatrix}$,
$\quad g^{AB} = \begin{pmatrix} g^{ab} & g^{aq} \\ g^{pb} & g^{pq} \end{pmatrix}$
where a, b run from 1 to k and p, q run from k+1 to m. Equations
(2.11)-(2.14) are not necessarily true when restricted to the indices
a, b, c, ... or p, q, r, .... In fact, by (2.7), the partitioned equations
for (2.14) have the following form:
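(2.15) $g^{ab} = L^a_c L^b_d\, \delta^{cd} + L^a_r L^b_s\, \delta^{rs}$
(2.16) $g^{ap} = L^a_r L^p_s\, \delta^{rs}$
(2.17) $g^{pq} = L^p_r L^q_s\, \delta^{rs}$
By (2.15)-(2.17), more useful formulas can be obtained:
(2.18) $\bar{g}_{pq} = R^r_p R^s_q\, \delta_{rs}$
(2.19) $g^{ab} - g^{ap}\, \bar{g}_{pq}\, g^{qb} = L^a_c L^b_d\, \delta^{cd}$
(2.20) $\delta^{pq} = R^p_r R^q_s\, g^{rs}$
where $\bar{g}_{pq} = (g^{pq})^{-1}$.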
3. Conditional Distribution
Suppose we sample $X_1,\dots,X_N$ independently from the curved exponential
family with density $p(X,\theta(\mu))$ at $\mu$, so that
$\bar{X} = N^{-1}\sum_{i=1}^{N} X_i$ is a sufficient statistic. Let $\hat{\mu}$
be a consistent, first order efficient estimator based on $\bar{X}$ and let
$a(\hat{\mu})$ be the associated ancillary family (Amari, 1981). Obviously,
$\hat{\mu}' = R\hat{\mu}$ is also a consistent, first order efficient estimator
and $a(\hat{\mu}')$ is its associated ancillary family.
First, we concentrate on the estimator $\hat{\mu}'$ of the parameter
$\mu' = R\mu$. Suppose the local coordinate system $(\mu', a)$ in some
neighborhood around $\eta(\mu')$ has been constructed. The a-coordinate
corresponds to the ancillary space. The coordinate of a point of M is
$(\mu', 0)$. Put $w = (\mu', a) = (u', v', a)$ (see Section 1 for the index
rule). When dealing with $\bar{X}$ as a point of the $\eta$-coordinate system,
$\bar{X} = \eta(\hat{\mu}', \hat{a})$ and $\hat{w} = (\hat{\mu}', \hat{a})$
form a sufficient statistic (Amari, 1981).
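Note that (1.10)-(1.14) hold for the w-coordinate system by replacing
A, B, C with $\alpha, \beta, \gamma$. For example, the metric tensor in the
w-coordinate system is written as
$g_{\alpha\beta} = B^i_\alpha B^j_\beta\, g_{ij}$
with $g_{\alpha\beta} = g'_{AB} = \delta_{AB}$ if $\alpha, \beta$ run from 1 to
m, and $g_{\alpha\beta} = g_{\kappa\lambda}$ if $\alpha, \beta$ run from m+1 to
n. Otherwise, $g_{\alpha\beta} = g_{A\kappa} = 0$.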
In order to obtain the Edgeworth expansion of the estimators, $\hat{w}$ has
to be standardized to $\tilde{w}$:
(3.1) $\tilde{w} = \sqrt{N}\,(\hat{w} - E\hat{w})$
By Amari's papers (1981, 1982b),
(3.2)
(3.3)
where
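(3.4) $C_{\beta\gamma\delta} = \overset{m}{\Gamma}_{\beta\gamma\delta} = \partial_\beta(\partial_\gamma \eta_i)(\partial_\delta \eta_j)\, g^{ij}$
When $\beta, \gamma, \delta \le m$, the symbol $b'^A$ is the bias of $\hat{\mu}'$:
(3.5) $b'^A = -\frac{1}{2}\, C'^A_{BC}\, \delta^{BC} - \frac{1}{2}\, C'^A_{\kappa\lambda}\, g^{\kappa\lambda}$
where $C'^A_{\kappa\lambda}$ is a curvature tensor of the ancillary space. It
vanishes when $\hat{\mu}$ is an ML estimator. (3.5) shows that the bias of the
ML estimator is independent of the a-coordinate. Define:
(3.6) $\bar{w} = \sqrt{N}\,(\hat{w} - w)$
then by (3.1)
(3.7) $\tilde{w}^\alpha = \bar{w}^\alpha - b^\alpha/\sqrt{N}$
This means that if an error of order $O(1/\sqrt{N})$ is tolerated, $\tilde{w}$
can be replaced by $\bar{w}$ without loss. $\tilde{w}$ is useful for eliminating
the bias term.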
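Amari gave Edgeworth expansions for $\tilde{w}$, $\tilde{\mu}'$ and $\tilde{a}$.
We only need the expansion of $\tilde{\mu}' = (\tilde{u}', \tilde{v}')$:
(3.8) $p(\tilde{\mu}') = \psi(\tilde{\mu}')\{1 + \frac{1}{6\sqrt{N}}\, K'_{ABC}\, H'^{ABC}(\tilde{\mu}') + O(1/N)\}$
where
$\psi(\tilde{\mu}') = (1/\sqrt{2\pi})^m \exp\{-\frac{1}{2}\, \delta_{AB}\, \tilde{\mu}'^A \tilde{\mu}'^B\}$,
$K'_{ABC} = T'_{ABC} - C'_{ABC} - C'_{BCA} - C'_{CAB}$,
and $H'^{ABC}(\tilde{\mu}')$ are multidimensional Hermite polynomials of
$\tilde{\mu}'$ (Amari, 1981):
(3.9) $H'^{ABC}(\tilde{\mu}') = \tilde{\mu}'^A \tilde{\mu}'^B \tilde{\mu}'^C - \delta^{AB}\tilde{\mu}'^C - \delta^{BC}\tilde{\mu}'^A - \delta^{CA}\tilde{\mu}'^B$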
By integrating (3.8) with respect to $\tilde{u}'$ and using
(3.10) $K'_{ABC} H'^{ABC} = K'_{abc} H'^{abc} + 3K'_{abp} H'^{ab} H'^{p} + 3K'_{apq} H'^{a} H'^{pq} + K'_{pqr} H'^{pqr}$
where $H'^{ab} = \tilde{u}'^a \tilde{u}'^b - \delta^{ab}$ and $H'^{a} = \tilde{u}'^a$,
the expansion of $\tilde{v}'$ can be obtained as
(3.11) $p(\tilde{v}') = \psi(\tilde{v}')\{1 + \frac{1}{6\sqrt{N}}\, K'_{pqr}\, H'^{pqr}(\tilde{v}') + O(1/N)\}$.
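In order to return to $\tilde{\mu}$ and $\tilde{v}$ from (3.8) and (3.11), we
use (2.3), (2.5) and (2.12). The expansions $p(\tilde{\mu})$ and $p(\tilde{v})$
are given by
(3.12) $p(\tilde{\mu}) = c\exp\{-\frac{1}{2}\, g_{AB}\, \tilde{\mu}^A \tilde{\mu}^B\}\{1 + \frac{1}{6\sqrt{N}}\, K'_{ABC}\, H'^{ABC}(\tilde{\mu}' \leftarrow \tilde{\mu}) + O(1/N)\}$,
(3.13) $p(\tilde{v}) = c\exp\{-\frac{1}{2}\, \delta_{pq} R^p_r R^q_s\, \tilde{v}^r \tilde{v}^s\}\{1 + \frac{1}{6\sqrt{N}}\, K'_{pqr}\, H'^{pqr}(\tilde{v}' \leftarrow \tilde{v}) + O(1/N)\}$,
where we always denote the normalizing constant by the same notation c
without loss of generality. $H'^{ABC}(\tilde{\mu}' \leftarrow \tilde{\mu})$
abbreviates the result of substituting $\tilde{\mu}' = R\tilde{\mu}$ into (3.9).
(3.12) shows that the distribution of $\tilde{\mu}$ is asymptotically normal
with covariance $g^{AB}$. By (2.18), $R^p_r R^q_s\, \delta_{pq}$ equals the
inverse of $g^{rs}$. So $p(\tilde{v})$ is asymptotically normal with covariance
$g^{rs}$. It is the marginal distribution of $\tilde{\mu}$ in the asymptotic
sense. Therefore, the first theorem can be obtained by using (3.10) and
(2.20).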
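Theorem 1. The conditional distribution of $\tilde{u}$ given $\tilde{v}$ is
given by
(3.14) $p(\tilde{u}\,|\,\tilde{v}) = Q(\tilde{u},\tilde{v})\{1 + \frac{1}{6\sqrt{N}}[K'_{abc} H'^{abc}(\tilde{u}' \leftarrow \tilde{u}) + 3K'_{abp} R^p_r H'^{ab}(\tilde{u}' \leftarrow \tilde{u}) H^{r}(\tilde{v}) + 3K'_{apq} R^p_r R^q_s H'^{a}(\tilde{u}' \leftarrow \tilde{u}) H^{rs}(\tilde{v})] + O(1/N)\}$
where
$Q(\tilde{u},\tilde{v}) = c\exp\{-\frac{1}{2}(\tilde{u}^a - L^a_p R^p_q \tilde{v}^q)\, R^c_a R^d_b\, \delta_{cd}\, (\tilde{u}^b - L^b_r R^r_s \tilde{v}^s)\}$.
$p(\tilde{u}\,|\,\tilde{v})$ is asymptotically normal with
(3.15) $E(\tilde{u}^a\,|\,\tilde{v}) = L^a_p R^p_q\, \tilde{v}^q + O(1/\sqrt{N}) = L^a_p R^p_q\, \bar{v}^q + O(1/\sqrt{N})$
(3.16) $\mathrm{Var}(\tilde{u}^a, \tilde{u}^b\,|\,\tilde{v}) = L^a_c L^b_d\, \delta^{cd} + O(1/\sqrt{N})$.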
Remark 1. (2.15)-(2.19) show that
(3.17) $L^a_p R^p_q = g^{ap}\, \bar{g}_{pq}$
(3.18) $(R^c_a R^d_b\, \delta_{cd})^{-1} = L^a_c L^b_d\, \delta^{cd} = g^{ab} - g^{ap}\, \bar{g}_{pq}\, g^{qb}$
They match the ordinary conditional expectations and covariances in the
multinormal case.
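The identities in Remark 1 can be checked numerically in the smallest case k = 1, m = 2. The sketch below is only an illustration under stated assumptions: it starts from an arbitrary upper triangular R (the numbers are hypothetical), sets L to its inverse, builds the covariance as L times its transpose, and compares the L, R expressions with the usual normal-theory regression coefficient and conditional variance. The function names are not from the paper.

```python
def upper_inverse_2x2(R):
    """Inverse of a 2 x 2 upper triangular matrix [[r11, r12], [0, r22]]."""
    (r11, r12), (_, r22) = R
    return [[1.0 / r11, -r12 / (r11 * r22)], [0.0, 1.0 / r22]]


def remark1_check(R):
    """Compare both sides of the Remark 1 identities for k = 1, m = 2.

    Index 0 plays the role of u and index 1 the role of v.
    """
    L = upper_inverse_2x2(R)
    # Covariance of the estimator: g^{AB} = sum_C L^A_C L^B_C.
    g_inv = [[sum(L[A][C] * L[B][C] for C in range(2)) for B in range(2)]
             for A in range(2)]
    coef_lr = L[0][1] * R[1][1]              # L^a_p R^p_q
    coef_normal = g_inv[0][1] / g_inv[1][1]  # g^{ap} (g^{pq})^{-1}
    var_lr = L[0][0] ** 2                    # L^a_c L^b_d delta^{cd}
    var_normal = g_inv[0][0] - g_inv[0][1] ** 2 / g_inv[1][1]
    return coef_lr, coef_normal, var_lr, var_normal


# Hypothetical upper triangular R.
coef_lr, coef_normal, var_lr, var_normal = remark1_check([[2.0, 1.0], [0.0, 4.0]])
```

The two coefficients agree, as do the two variances, which is the multinormal matching that the remark describes.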
Remark 2. $\tilde{u}$ and $\tilde{v}$ can be replaced by $\bar{u}$ and
$\bar{v}$ on the right-hand side of (3.14), except in the first term
$Q(\tilde{u},\tilde{v})$, to eliminate the bias term without affecting the
order of magnitude of the error. By (3.15) and (3.16), the expressions for
the expectation and covariance with error $O(1/\sqrt{N})$ are independent of
the a-coordinate.
Remark 3. By Amari's paper (1982b) and (1.14),
$K'_{abc} = -3\langle \nabla'^{\alpha}_a B'^i_b,\, B'^j_c \rangle = -3\,\Gamma'^{\alpha}_{abc}$ $\quad (\alpha = -1/3)$.
$\langle \nabla'^{\alpha}_a B'^i_b,\, B'^j_c \rangle$ is the projection of the
$\alpha$-curvature of the submanifold $M_{u'}$ onto the tangent subspace
$T_{u'}$ (Amari, 1982a). It is not a tensor, so $K'_{abc}$ depends on the
coordinate system.
$K'_{abp} = \langle \nabla'_a B'^i_b,\, B'^j_p \rangle = H'_{abp}$ is the
intrinsic curvature tensor of $M_{u'}$. So $K'_{abp}$ is an invariant.
$K'_{apq} = -\langle \overset{m}{\nabla}{}'_p B'^i_q,\, B'^j_a \rangle = -\overset{m}{H}{}'_{pqa}$
is the intrinsic curvature tensor of $M_{v'}$. $K'_{apq}$ is also an
invariant.
None of these quantities depends on the ancillary space.
4. Conditional Expectation and Covariance
It might be hard to calculate the conditional expectation and covariance
more precisely by using (3.14), since the expression involves some complex
calculations. By (2.6), the calculation can be done by using the conditional
distribution $p(\tilde{u}'\,|\,\tilde{v}')$.
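By (3.10), the distribution of $\tilde{\mu}'$ can be rewritten as
(4.1) $p(\tilde{\mu}') = \psi(\tilde{u}')\psi(\tilde{v}')\{1 + \frac{1}{6\sqrt{N}}[K'_{abc} H'^{abc}(\tilde{u}') + 3K'_{abp} H'^{ab}(\tilde{u}') H'^{p}(\tilde{v}') + 3K'_{apq} H'^{a}(\tilde{u}') H'^{pq}(\tilde{v}') + K'_{pqr} H'^{pqr}(\tilde{v}')] + O(1/N)\}$
The conditional distribution of $\tilde{u}'$ given $\tilde{v}'$ is obtained by
dividing (4.1) by (3.11):
(4.2) $p(\tilde{u}'\,|\,\tilde{v}') = \psi(\tilde{u}')\{1 + \frac{1}{6\sqrt{N}}[K'_{abc} H'^{abc}(\tilde{u}') + 3K'_{abp} H'^{ab}(\tilde{u}') H'^{p}(\tilde{v}') + 3K'_{apq} H'^{a}(\tilde{u}') H'^{pq}(\tilde{v}')] + O(1/N)\}$
Taking (2.6) into account, it is easy to compute the conditional expectation
and covariance of $\tilde{u}$ by (4.2) with the aid of the orthogonality of
the multidimensional Hermite polynomials.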
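$E(\tilde{u}^a\,|\,\tilde{v}) = E(L^a_c \tilde{u}'^c + L^a_r \tilde{v}'^r\,|\,\tilde{v}') = L^a_r R^r_s\, \tilde{v}^s + L^a_c\, E\{H'^{c}(\tilde{u}')\,|\,\tilde{v}'\}$
Note that
$\int H'^{c}(\tilde{u}')\, p(\tilde{u}'\,|\,\tilde{v}')\, d\tilde{u}' = \frac{1}{2\sqrt{N}}\, K'_{apq}\, \delta^{ac}\, H'^{pq}(\tilde{v}') + O(1/N)$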
Replacing $\tilde{v}$ by $\bar{v}$ in $H'^{pq}(\tilde{v}' \leftarrow \tilde{v})$
and taking (2.5) and (2.20) into account, we obtain
Theorem 2. The conditional expectation of an efficient estimator $\hat{u}$
given $\hat{v}$ is obtained by
(4.3) $E(\hat{u}^a\,|\,\hat{v}) = u^a + \frac{b^a}{N} + L^a_r R^r_s\left(\Delta\hat{v}^s - \frac{b^s}{N}\right) - \frac{1}{2}\sum_{c=1}^{k} L^a_c R^p_r R^q_s\, \overset{m}{H}{}'_{pqc}\left(\Delta\hat{v}^r \Delta\hat{v}^s - g^{rs}/N\right) + O(N^{-3/2})$
where $\Delta\hat{v}^r = \hat{v}^r - v^r$.
The fourth term on the right-hand side of (4.3) is expressed in terms of
relative curvature. It adjusts the ordinary conditional expectation of the
multinormal case (the first three terms).
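In order to compute the covariance, the following formula is needed:
(4.4) $\mathrm{Var}(\tilde{u}^a, \tilde{u}^b\,|\,\tilde{v}) = E(\tilde{u}^a \tilde{u}^b\,|\,\tilde{v}) - E(\tilde{u}^a\,|\,\tilde{v})\, E(\tilde{u}^b\,|\,\tilde{v})$
Note that $\tilde{u}'^a \tilde{u}'^b = H'^{ab} + \delta^{ab}$ and
$\int H'^{ab}(\tilde{u}')\, p(\tilde{u}'\,|\,\tilde{v}')\, d\tilde{u}' = \frac{1}{2\sqrt{N}}\, H'^{p}(\tilde{v}')\, K'_{cdp}\, (\delta^{ac}\delta^{bd} + \delta^{ad}\delta^{bc}) + O(1/N)$
After some simple calculation, we obtain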
Theorem 3. The conditional covariance of an efficient estimator $\hat{u}$
given $\hat{v}$ is obtained by
(4.5) $\mathrm{Var}(\hat{u}^a, \hat{u}^b\,|\,\hat{v}) = \frac{1}{N}\{L^a_c L^b_d\, \delta^{cd} + \sum_{c=1}^{k}\sum_{d=1}^{k} L^a_c L^b_d R^p_r\, H'_{cdp}\, \Delta\hat{v}^r\} + O(N^{-2})$
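The "simple calculation" behind the curvature term of (4.5) can be sketched in the primed coordinates as follows. This is a reading of the argument rather than the authors' own derivation; it assumes the Hermite orthogonality relation $E[H'^{ab}H'^{cd}] = \delta^{ac}\delta^{bd} + \delta^{ad}\delta^{bc}$ under the standard normal weight $\psi$, the symmetry of $K'_{cdp}$ in c and d, and $H'^{p}(\tilde{v}') = \tilde{v}'^{p}$.

```latex
\begin{align*}
E(\tilde u'^a \tilde u'^b \mid \tilde v')
  &= \delta^{ab} + \frac{1}{2\sqrt{N}}\, K'_{cdp}\, H'^{p}(\tilde v')
     \left(\delta^{ac}\delta^{bd} + \delta^{ad}\delta^{bc}\right) + O(1/N) \\
  &= \delta^{ab} + \frac{1}{\sqrt{N}}\, K'_{abp}\, \tilde v'^{p} + O(1/N), \\
E(\tilde u'^a \mid \tilde v')\, E(\tilde u'^b \mid \tilde v')
  &= O(1/N), \\
\operatorname{Var}(\tilde u'^a, \tilde u'^b \mid \tilde v')
  &= \delta^{ab} + \frac{1}{\sqrt{N}}\, K'_{abp}\, \tilde v'^{p} + O(1/N).
\end{align*}
```

Mapping back through (2.6), where the $\tilde{v}'$ part is fixed by the conditioning, multiplies this by $L^a_c L^b_d$; writing $\tilde{v}'^{p} = R^p_r \tilde{v}^r$, identifying $K'_{cdp}$ with $H'_{cdp}$ as in Remark 3, and rescaling from the standardized to the original estimators gives the leading terms of (4.5).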
Similarly, the conditional distribution, expectation and covariance
of $\hat{u}$ given $\hat{v}$ and $\hat{a}$ can also be obtained. For example,
we have
Theorem 4. The conditional covariance of an efficient estimator $\hat{u}$
given $\hat{v}$ and $\hat{a}$ is obtained by
(4.6) $\mathrm{Var}(\hat{u}^a, \hat{u}^b\,|\,\hat{v}, \hat{a}) = \frac{1}{N}\{L^a_c L^b_d\, \delta^{cd} + \sum_{c=1}^{k}\sum_{d=1}^{k} L^a_c L^b_d\, (H'_{cdp} R^p_r\, \Delta\hat{v}^r + H'_{cd\kappa}\, \hat{a}^{\kappa})\} + O(N^{-2})$
This result is similar to (4.5). If $\hat{v}$ is not given, that is, k = m,
then $H'_{cdp} = 0$ and (4.6) reduces to Amari's result (1982b); see the
appendix for details.
5. Examples in Non-Linear Model and Logistic Regression Model
Example 1: Non-Linear Model
Draper and Smith (1981) defined a model as non-linear in the parameters
if that model cannot be written as
(5.1)
where $g_j(X)$ is any function of the independent variable X. Let $y_{ij}$
(i = 1,...,n, j = 1,...,N) be collected at the corresponding experimental
settings $X_i$ (i = 1,...,n). It is assumed that the relationship between the
responses and the experimental settings can be represented by an equation
of the form
(5.2) $y_{ij} = \theta(X_i,\mu) + \varepsilon_{ij}$
where $\mu = (u^1,\dots,u^k, v^{k+1},\dots,v^m)$ is a set of unknown
parameters and $\varepsilon_{ij}$ is an additive, normally distributed error
component with
$E(\varepsilon_{ij}) = 0$, i = 1,...,n, j = 1,...,N
and $E(\varepsilon_{ij}\varepsilon_{lj}) = \delta_{il}$, i, l = 1,...,n,
j = 1,...,N. The probability density of $y_j = (y_{1j},\dots,y_{nj})^T$ is
(5.3) $P(y_j,\theta) = c\exp\{-\frac{1}{2}(y_j - \theta)^T (y_j - \theta)\}$.
The set of such probability density functions $P(y,\theta)$, which belongs to
the exponential family, forms an n-dimensional manifold S. Hence, the metric
tensor $g_{ij}$ and the $\alpha$-connection $\Gamma^{\alpha}_{ijk}$ can be
calculated by equations (1.2) and (1.3):
(5.4) $g_{ij}(\theta) = E(\partial_i \ell\, \partial_j \ell) = \delta_{ij}$
(5.5) $\Gamma^{\alpha}_{ijk}(\theta) = \frac{1-\alpha}{2}\, T_{ijk} = \frac{1-\alpha}{2}\, E(\partial_i \ell\, \partial_j \ell\, \partial_k \ell) = 0$.
Because $\theta^i$ is also a function of the parameter $\mu$,
$P(y,\theta(\mu))$ is an m-dimensional curved exponential family within a
larger n-parameter exponential family.
The metric tensor and $\alpha$-connection over the m-dimensional submanifold
are calculated from equations (1.11) and (1.12) as
(5.6) $g_{AB}(\mu) = B^i_A B^j_B\, g_{ij}(\theta(\mu)) = B^i_A B^j_B\, \delta_{ij}$
and
(5.7) $\Gamma^{\alpha}_{ABC} = (\partial_A B^i_B)\, B^j_C\, g_{ij} + \frac{1}{2}(1-\alpha)\, T_{ABC} = (\partial_A B^i_B)\, B^j_C\, \delta_{ij}$
because $T_{ABC} = B^i_A B^j_B B^k_C\, T_{ijk} = 0$.
We apply equation (3.5), equation (5.7), Remark 3 and Theorem 4 to the
multiparameter non-linear model case and get the following result:
(5.8) $\mathrm{Var}(\hat{u}\,|\,\hat{v}) = \frac{1}{N}\{\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} - \tilde{L}\,[(\Delta\hat{v}^T)][A^T_{\cdot\cdot}]\,\tilde{L}^T\} + O(N^{-2})$
where
$LL^T = \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$;
$\Sigma_{11}$, $\Sigma_{12}$ and $\Sigma_{22}$ are the k by k, k by (m-k) and
(m-k) by (m-k) submatrices of $LL^T$, respectively;
$\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ is the conditional
variance of $\hat{u}$ given $\hat{v}$ for the non-linear model with linear
approximation; $\tilde{L}$ is the first k by k submatrix of L;
$A^T_{\cdot\cdot}$ contains the first k by k submatrices of the last (m-k)
components in the parameter effects array $A^T$, which was defined by Bates
and Watts (1980); and $[\cdot][\cdot]$ is the bracket multiplication, which
was also defined by them.
Example 2: Logistic Regression Model
Given a sample of n independent binomial responses $Y_i \sim B(n_i, p_i)$, the
log likelihood function for the sample is the sum of the individual
likelihood contributions:
$\ell(\theta, y) = \sum_{i=1}^{n} \ell(\theta^i, y_i) = y_i\theta^i - a(\theta) + b(y)$
where
$b(y) = \sum_{i=1}^{n} \log\binom{n_i}{y_i}$, $\quad a(\theta) = \sum_{i=1}^{n} n_i \log(1 + e^{\theta^i})$
and
$\theta^i = \log\dfrac{p_i}{1 - p_i}$.
The logistic regression specifies the relationship
$\theta = \mathrm{logit}(P) = X\mu$
where $P = (P_1,\dots,P_n)^T$, $\theta = (\theta^1,\dots,\theta^n)^T$,
$\mu = (\mu^1,\dots,\mu^m)^T$, $X = (X_1,\dots,X_m)$ and
$X_a = (X^1_a,\dots,X^n_a)^T$, a = 1,...,m.
Therefore, the set of densities of the logistic regression model belongs
to the curved exponential family.
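The metric tensor $g_{ij}$ and the $\alpha$-connection $\Gamma^{\alpha}_{ijk}$
over the n-dimensional manifold can be calculated as
$g_{ij}(\theta) = E(\partial_i \ell\, \partial_j \ell) = \delta_{ij}\, \dfrac{n_j \exp(\theta^j)}{(1 + \exp(\theta^j))^2}$
and
$\Gamma^{\alpha}_{ijk}(\theta) = \frac{1-\alpha}{2}\, E(\partial_i \ell\, \partial_j \ell\, \partial_k \ell) = \frac{1-\alpha}{2}\, \delta_{ij}\delta_{jk}\, \dfrac{n_j \exp(\theta^j)(1 - \exp(\theta^k))}{(1 + \exp(\theta^j))^3}$
The metric tensor $g_{ab}$, the $\alpha$-connection $\Gamma^{\alpha}_{abc}$ and
the $\alpha$-curvature $H^{\alpha\, i}_{ab}$ over the m-dimensional submanifold
can be calculated as
$g_{ab} = X^i_a X^j_b\, g_{ij}$,
$\Gamma^{\alpha}_{abc} = X^i_a X^j_b X^k_c\, \Gamma^{\alpha}_{ijk}$
and
$H^{\alpha\, i}_{ab} = \Gamma^{\alpha\, i}_{jk}\, X^j_a X^k_b$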
Now we apply Remark 3 and Theorem 4 to the logistic regression model with N
independent, identical replications at each experimental point x and get
the following result:
$H'_{abp} = H'^{i}_{ab}\, B'^{j}_p\, g_{ij} = 0$
and
$\mathrm{Var}(\hat{u}^a, \hat{u}^b\,|\,\hat{v}) = \frac{1}{N}\{L^a_c L^b_d\, \delta^{cd}\} + O(1/N^2)$.
Therefore, the conditional variance of $\hat{u}$ given $\hat{v}$ in the
logistic regression model is independent of the exponential curvature.
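For a concrete feel of the quantities in this example, the metric over the submanifold can be computed directly from the design matrix, since it is a weighted cross-product of the columns of X with binomial weights on the diagonal. The sketch below is illustrative only: the function names and the small design matrix are hypothetical, and it covers the metric alone, not the connection or curvature terms.

```python
import math


def binomial_weight(n_i, theta_i):
    """Diagonal Fisher information n_i e^theta / (1 + e^theta)^2 for one response."""
    e = math.exp(theta_i)
    return n_i * e / (1.0 + e) ** 2


def logistic_metric(X, mu, n_trials):
    """Metric over the regression parameters: sum_i X[i][a] X[i][b] w_i.

    X is a list of rows (one per observation), mu the coefficient vector,
    and n_trials the binomial sizes n_i. theta = X mu is the logit.
    """
    rows, m = len(X), len(mu)
    theta = [sum(X[i][a] * mu[a] for a in range(m)) for i in range(rows)]
    w = [binomial_weight(n_trials[i], theta[i]) for i in range(rows)]
    return [[sum(X[i][a] * X[i][b] * w[i] for i in range(rows))
             for b in range(m)] for a in range(m)]


# Hypothetical design: intercept plus one covariate, four trials per setting.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
g_ab = logistic_metric(X, [0.0, 0.0], [4, 4, 4])
```

At mu = 0 every weight equals n_i / 4, so with four trials per setting the metric is simply the cross-product matrix of the design columns.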
Acknowledgement
We would like to express our sincere gratitude to Professor D. V. Hinkley,
who has given us a great deal of useful advice in the preparation of this
paper. We also thank Professor R. D. Cook for his helpful discussion and
correspondence on this topic.
APPENDIX
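The proof of Theorem 4:
The Edgeworth expansion of the density function of $\tilde{w}$ is given by
(A.1) $p(\tilde{w}) = \psi(\tilde{u}')\psi(\tilde{v}')\phi(\tilde{a})\{1 + \frac{1}{6\sqrt{N}}\, K_{\alpha\beta\gamma}\, H^{\alpha\beta\gamma}(\tilde{w}) + O(1/N)\}$
where $\phi(\tilde{a}) = c\exp\{-\frac{1}{2}\, g_{\kappa\lambda}\, \tilde{a}^{\kappa}\tilde{a}^{\lambda}\}$.
Since $g_{A\kappa} = 0$ $(1 \le A \le m,\; m < \kappa \le n)$ and $g_{ap} = 0$
$(1 \le a \le k,\; k < p \le m)$, $K_{\alpha\beta\gamma} H^{\alpha\beta\gamma}$
can be decomposed as
(A.2) $K_{\alpha\beta\gamma} H^{\alpha\beta\gamma} = K'_{abc} H'^{abc} + 3K'_{abp} H'^{ab} H'^{p} + 3K'_{apq} H'^{a} H'^{pq} + K'_{pqr} H'^{pqr}$
$\quad + 3K'_{ab\kappa} H'^{ab} H^{\kappa} + 6K'_{ap\kappa} H'^{a} H'^{p} H^{\kappa} + 3K'_{pq\kappa} H'^{pq} H^{\kappa}$
$\quad + 3K'_{a\kappa\lambda} H'^{a} H^{\kappa\lambda} + 3K'_{p\kappa\lambda} H'^{p} H^{\kappa\lambda} + K_{\kappa\lambda\sigma} H^{\kappa\lambda\sigma}$
(Amari, 1981). Integrating (A.1) with respect to $\tilde{u}'$, the expansion
of the density $p(\tilde{v}', \tilde{a})$ can be written as
(A.3) $p(\tilde{v}', \tilde{a}) = \psi(\tilde{v}')\phi(\tilde{a})\{1 + \frac{1}{6\sqrt{N}}[K'_{pqr} H'^{pqr} + 3K'_{pq\kappa} H'^{pq} H^{\kappa} + 3K'_{p\kappa\lambda} H'^{p} H^{\kappa\lambda} + K_{\kappa\lambda\sigma} H^{\kappa\lambda\sigma}] + O(1/N)\}$.
By (A.1), (A.2) and (A.3), the expansion of the conditional distribution of
$\tilde{u}'$ is obtained as
(A.4) $p(\tilde{u}'\,|\,\tilde{v}', \tilde{a}) = \psi(\tilde{u}')\{1 + \frac{1}{6\sqrt{N}}[K'_{abc} H'^{abc} + 3K'_{abp} H'^{ab} H'^{p} + 3K'_{apq} H'^{a} H'^{pq}$
$\quad + 3K'_{ab\kappa} H'^{ab} H^{\kappa} + 6K'_{ap\kappa} H'^{a} H'^{p} H^{\kappa} + 3K'_{a\kappa\lambda} H'^{a} H^{\kappa\lambda}] + O(1/N)\}$.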
By the orthogonality of the multidimensional Hermite polynomials, it is easy
to calculate
(A.5)
(A.6) $E(\tilde{u}'^a \tilde{u}'^b\,|\,\tilde{v}', \tilde{a}) = \delta^{ab} + \dfrac{\delta^{ac}\delta^{bd}}{\sqrt{N}}\,[K'_{cdp} H'^{p} + K'_{cd\kappa} H^{\kappa}] + O(1/N)$
Theorem 4 then follows from (A.5), (A.6) and (4.4). If $\hat{v}$ is not given,
then k = m and (4.6) reduces to
(A.7)
where
$H'_{CD\kappa} = \langle \nabla'_C B'^i_D,\, B^j_\kappa \rangle = L^E_C L^F_D\, \langle \nabla_E B^i_F,\, B^j_\kappa \rangle = L^E_C L^F_D\, H_{EF\kappa}$
noting that •
(A.7) reduces to
This is the same as Amari's result (1982b).
REFERENCES
AMARI, S. (1982a). Differential geometry of curved exponential families -
curvatures and information loss. Ann. Statist. 10, 357-385.
AMARI, S. (1982b). Geometrical theory of asymptotic ancillarity and
conditional inference. Biometrika 69, 1-17.
AMARI, S. and KUMON, M. (1981). Differential geometry of Edgeworth expansions
in curved exponential family. Technical Report, METR 81-7, University
of Tokyo.
BARNDORFF-NIELSEN, O. (1980). Conditionality resolutions. Biometrika 67, 293-310.
BATES, D.M. and WATTS, D.G. (1980). Relative curvature measures of nonlinearity.
J. Roy. Statist. Soc. B 42, 1-25.
DRAPER, N. and SMITH, H. (1981). Applied Regression Analysis, 2nd ed., John
Wiley & Sons, Inc.
EFRON, B. and HINKLEY, D.V. (1978). Assessing the accuracy of the maximum
likelihood estimator: Observed versus expected Fisher information (with
discussion). Biometrika 65, 457-87.
HINKLEY, D.V. (1980). Likelihood as approximate pivotal distribution.
Biometrika 67, 287-92.
Bo-Cheng Wei
Department of Basic Sciences
Nanjing Institute of Technology
People's Republic of China
(After September 30, 1983)
Chih-Ling Tsai
Department of Applied Statistics and Operations Research
New York University
100 Trinity Place
New York, NY 10006
(After August 30, 1983)