A Likelihood Ratio Test of Independence of Components for High-dimensional Normal Vectors
A PROJECT
SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL
OF THE UNIVERSITY OF MINNESOTA
BY
Lin Zhang
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
MASTER OF SCIENCE
Yongcheng Qi
May, 2013
Acknowledgements
I would like to thank my advisor, Yongcheng Qi, for his support and guidance over these
years. I thank Dr. Kang Ling James, who gave me cherished advice and encouragement
on my project. I thank Dr. Barry James and Dr. Richard Green for serving
on my project committee, taking the time to read this paper, and hearing the defense.
I also thank Xingguo Li for his generous help with the software.
Abstract
Consider a p-variate normal population. We are interested in testing the indepen-
dence of its components based on a random sample of size n from this population. In
classic multivariate analysis, the dimension p is fixed or relatively small compared with
the sample size n; the likelihood ratio test (LRT) is an effective way to test the
hypothesis of independence, and the limiting distribution of the LRT is a chi-square
distribution. When p goes to infinity, the chi-square approximation of the LRT may
be invalid. In multivariate analysis, testing the independence of grouped components
is one topic of interest. When the grouping is well balanced and the number of groups
is fixed, the LRT, properly normalized, has a normal limit as proved in the literature.
In practice, grouping can be unbalanced, and the number of groups can be arbitrarily
large. In this project, we prove that the LRT statistic converges to a normal distribution
under quite general conditions. Simulation results including histograms and compar-
isons of sizes and powers of tests with those in the classical chi-square approximations
are presented as well.
Keywords: likelihood ratio test, high-dimensional normal vector, block diagonal
covariance matrix.
Contents
Acknowledgements i
Dedication ii
Abstract iii
List of Tables vi
List of Figures vii
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Description of the Model . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Likelihood Ratio test 3
2.1 Likelihood Ratio Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Wilks’ Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3 Main Result 7
4 Some Lemmas 9
4.1 Multivariate Gamma Function and ξ(x) function . . . . . . . . . . . . . 9
4.2 Moments of Wilks' Statistic . . . . . . . . . . . . . . . . . . . . . . . 11
4.3 Infinitesimal and estimates . . . . . . . . . . . . . . . . . . . . . . . . . 11
5 Proof of Theorem 17
Chapter 1
Introduction
1.1 Background
In classic statistical inference, the likelihood ratio test (LRT) is a widely used method
for hypothesis testing. An advantage of the LRT is that one does not have to estimate
the variance of the test statistic. It is well known that the asymptotic distribution
of the LRT is chi-square under certain regularity conditions when the dimension p is a
small constant or is negligible compared with the sample size n. However, the chi-square
approximation does not fit the distribution of the LRT well in the high-dimensional
case, especially when p grows with the sample size n. Many modern datasets have
dimensions that are proportionally large compared with the sample size; financial data,
consumer data, modern manufacturing data and multimedia data all share this feature.
To deal with it, Schott (2001, 2005, 2007), Ledoit and Wolf (2002), Bai et al. (2009),
Chen et al. (2010), Jiang et al. (2012), and Jiang and Yang (2013) derived different
methods to study the classical likelihood ratio test when the dimension p is large.
In this project, we are interested in testing the independence of grouped components
of a high-dimensional normal vector. The same problem was considered by Jiang and
Yang (2013) and Jiang and Qi (2013) when the number of groups in the partition is fixed.
The aim of this project is to extend the test to arbitrary partitions, allowing the number
of groups to change with the sample size n. The results are widely applicable in
practice. For example, in the analysis of microarray data on genes, it is meaningful to
check whether there is correlation among pieces of genes.
1.2 Description of the Model
This problem can be abstracted as the following statistical model. For a multivariate
normal distribution Np(µ,Σ), we partition the set of p variates into k subsets and ask
whether the k subsets are mutually independent; equivalently, we want to test whether
variables in different subsets are dependent. More specifically, let x1, x2, ..., xn be
i.i.d. R^p-valued random vectors with normal distribution Np(µ,Σ).
We write the block diagonal covariance matrix Σ0 as

    Σ0 = \begin{pmatrix} Σ_{11} & 0 & \cdots & 0 \\ 0 & Σ_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Σ_{kk} \end{pmatrix}.
Then we want to test the hypotheses

    H0 : Σ = Σ0  vs  Ha : Σ ≠ Σ0.  (1.1)
Let Wn be Wilks' likelihood ratio test (LRT) statistic (to be given in (2.11)). The
traditional theory of multivariate analysis (Theorem 11.2.5 of Muirhead (1982)) shows
that −nρ log Wn converges to a chi-square distribution as n tends to infinity with p
fixed, where ρ is given in (2.12). Jiang and Yang (2013) showed that the chi-square
approximation is no longer valid when p → ∞. In fact, their results show that a central
limit theorem (CLT) holds, i.e., (log Wn − µn)/σn converges to the standard normal
distribution for a fixed number of groups k, where µn and σn can be expressed explicitly
as functions of the sample size and the partition.

In this project we prove the CLT for the LRT, allowing k to change with n and
allowing the partition to be unbalanced, in the sense that the numbers of components
within subsets are not necessarily proportional.
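As a concrete illustration of this model, the following Python sketch (the thesis simulations themselves were done in R; the helper name `block_diag_cov` and the equicorrelated blocks are our own illustrative choices) assembles a block diagonal Σ0 for an unbalanced partition and draws a sample of size n from Np(0, Σ0):

```python
import numpy as np

rng = np.random.default_rng(0)

def block_diag_cov(blocks):
    """Assemble the block diagonal covariance Sigma_0 from within-group blocks."""
    p = sum(b.shape[0] for b in blocks)
    sigma0 = np.zeros((p, p))
    pos = 0
    for b in blocks:
        q = b.shape[0]
        sigma0[pos:pos + q, pos:pos + q] = b
        pos += q
    return sigma0

# Unbalanced partition p = 3 + 77, as in the last row of Table 6.1.
# Within each group we use an equicorrelated block (correlation 0.5),
# an arbitrary positive definite choice made purely for illustration.
q = [3, 77]
blocks = [0.5 * np.eye(qi) + 0.5 * np.ones((qi, qi)) for qi in q]
sigma0 = block_diag_cov(blocks)

n = 100
x = rng.multivariate_normal(np.zeros(sum(q)), sigma0, size=n)
print(sigma0.shape, x.shape)  # (80, 80) (100, 80)
```

Under H0 the groups x^(1) and x^(2) generated this way are independent by construction, since the off-diagonal blocks of Σ0 are zero.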
The rest of the paper is organized as follows. In section 2, we introduce the LRT
and give a brief literature review. In section 3, we state our main result. In section 4,
we present a few lemmas, and in section 5, we provide the proof of the theorem given
in section 3.
Chapter 2
Likelihood Ratio test
2.1 Likelihood Ratio Test
Let the p-component vector x be distributed according to Np(µ,Σ). We partition x
into k subvectors:

    x = (x^{(1)}, \ldots, x^{(k)})',  (2.1)

where x^{(i)} has dimension q_i, with p = \sum_{i=1}^k q_i. The mean vector µ and the
covariance matrix Σ are partitioned similarly:

    µ = (µ^{(1)}, \ldots, µ^{(k)})'  (2.2)

and

    Σ = \begin{pmatrix} Σ_{11} & Σ_{12} & \cdots & Σ_{1k} \\ Σ_{21} & Σ_{22} & \cdots & Σ_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ Σ_{k1} & Σ_{k2} & \cdots & Σ_{kk} \end{pmatrix}.

The null hypothesis is that the subvectors x^{(1)}, ..., x^{(k)} are mutually independently
distributed, i.e., the density of x factors into the product of the density functions of
x^{(1)}, ..., x^{(k)}:

    H0 : f(x | µ, Σ) = \prod_{i=1}^k f(x^{(i)} | µ^{(i)}, Σ_{ii}).  (2.3)
If x^{(1)}, ..., x^{(k)} are independent subvectors, then the covariance matrix is block diagonal
and denoted by Σ0:

    Σ0 = \begin{pmatrix} Σ_{11} & 0 & \cdots & 0 \\ 0 & Σ_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Σ_{kk} \end{pmatrix}
with Σ_{ii} unspecified for 1 ≤ i ≤ k. Given a sample of size n, with x_1, ..., x_n the n
observations on x, the likelihood ratio is

    Λ_n = \frac{\max_{µ,Σ_0} L(µ, Σ_0)}{\max_{µ,Σ} L(µ, Σ)},  (2.4)

where

    L(µ, Σ) = \prod_{i=1}^n \frac{1}{(2π)^{p/2} |Σ|^{1/2}} \exp\Big\{ −\frac{1}{2}(x_i − µ)' Σ^{-1} (x_i − µ) \Big\}.  (2.5)
Here L(µ, Σ0) is L(µ, Σ) with Σ_{ij} = 0 for all 1 ≤ i ≠ j ≤ k, and the maxima are taken
over all vectors µ and positive definite Σ and Σ0. According to Theorem
11.2.2 of Muirhead (1982), we have

    \max_{µ,Σ} L(µ, Σ) = \frac{1}{(2π)^{pn/2} |\hat{Σ}|^{n/2}} \exp\Big( −\frac{1}{2} pn \Big),  (2.6)

where

    \hat{Σ} = \frac{1}{n−1} A = \frac{1}{n−1} \sum_{i=1}^n (x_i − \bar{x})(x_i − \bar{x})'.  (2.7)
Under the null hypothesis,

    \max_{µ,Σ_0} L(µ, Σ0) = \prod_{i=1}^k \max L_i(µ^{(i)}, Σ_{ii})
    = \prod_{i=1}^k \frac{1}{(2π)^{q_i n/2} |\hat{Σ}_{ii}|^{n/2}} \exp\Big( −\frac{1}{2} q_i n \Big)
    = \frac{1}{(2π)^{pn/2} \prod_{i=1}^k |\hat{Σ}_{ii}|^{n/2}} \exp\Big( −\frac{1}{2} pn \Big),

where

    \hat{Σ}_{ii} = \frac{1}{n−1} \sum_{j=1}^n (x_j^{(i)} − \bar{x}^{(i)})(x_j^{(i)} − \bar{x}^{(i)})'.  (2.8)

If we partition A and \hat{Σ} in the same way as Σ, we see that

    \hat{Σ}_{ii} = \frac{1}{n−1} A_{ii}.
Then the likelihood ratio becomes

    Λ_n = \frac{\max_{µ,Σ_0} L(µ, Σ0)}{\max_{µ,Σ} L(µ, Σ)} = \frac{|\hat{Σ}|^{n/2}}{\prod_{i=1}^k |\hat{Σ}_{ii}|^{n/2}} = \frac{|A|^{n/2}}{\prod_{i=1}^k |A_{ii}|^{n/2}}.  (2.9)

The critical region of the likelihood ratio test is

    Λ_n ≤ Λ_n(α),  (2.10)

where Λ_n(α) is a number such that the probability of (2.10) is α when Σ = Σ0.
2.2 Wilks' Statistic
Now we employ Wilks' statistic to perform the test. Let

    W_n = \frac{|A|}{\prod_{i=1}^k |A_{ii}|};  (2.11)

W_n can be expressed entirely in terms of sample correlation coefficients. Note
that Λ_n = W_n^{n/2} is a monotonically increasing function of W_n, so the critical region can
be equivalently written as W_n ≤ W_n(α). Notice that W_n = 0 if p ≥ n, since the matrix
A is not of full rank in this case. Thus the LRT of level α for testing H0 in
(2.3) rejects when Λ_n ≤ c_α, or equivalently when W_n ≤ c_α^{2/n}. Set
    f = \frac{1}{2}\Big( p^2 − \sum_{i=1}^k q_i^2 \Big)  and  ρ = 1 − \frac{2\big( p^3 − \sum_{i=1}^k q_i^3 \big) + 9\big( p^2 − \sum_{i=1}^k q_i^2 \big)}{6(n+1)\big( p^2 − \sum_{i=1}^k q_i^2 \big)}.  (2.12)

When n → ∞ while all q_i's remain fixed, the traditional χ² approximation to the
distribution of Λ_n is given by Theorem 11.2.5 of Muirhead (1982):

    −2ρ \log Λ_n  →_d  χ²_f.

When p is large or is proportional to n, this chi-square approximation may fail
(Jiang and Yang (2013)).
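To make (2.11)–(2.12) concrete, here is a small Python sketch (the function name is ours) that computes log W_n from a sample and the chi-square statistic −2ρ log Λ_n = −nρ log W_n together with its degrees of freedom f; log-determinants are computed via `slogdet` for numerical stability:

```python
import numpy as np

def wilks_chi2_stat(x, q):
    """Compute -2 rho log Lambda_n = -n rho log W_n and its chi-square df f,
    following (2.11)-(2.12); x is an (n, p) sample, q the list of group sizes."""
    n, p = x.shape
    assert sum(q) == p
    xc = x - x.mean(axis=0)
    a = xc.T @ xc                       # A = (n - 1) * sample covariance
    log_wn = np.linalg.slogdet(a)[1]    # log|A|
    pos = 0
    for qi in q:                        # subtract log|A_ii| block by block
        log_wn -= np.linalg.slogdet(a[pos:pos + qi, pos:pos + qi])[1]
        pos += qi
    f = (p**2 - sum(qi**2 for qi in q)) / 2
    num = 2 * (p**3 - sum(qi**3 for qi in q)) + 9 * (p**2 - sum(qi**2 for qi in q))
    rho = 1 - num / (6 * (n + 1) * (p**2 - sum(qi**2 for qi in q)))
    stat = -n * rho * log_wn            # -2 rho log Lambda_n
    return stat, f, log_wn

rng = np.random.default_rng(1)
x = rng.standard_normal((200, 10))      # a null sample, Sigma = I
stat, f, log_wn = wilks_chi2_stat(x, [3, 2, 2, 3])
print(f)  # 37.0
```

By Fischer's inequality |A| ≤ ∏|A_ii| for a positive definite A, so log W_n ≤ 0 and the statistic is nonnegative.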
Considering the insufficiency of the LRT chi-square approximation when p is large, Bai
et al. (2009) introduced a corrected likelihood ratio test for covariance matrices of
Gaussian populations when the dimension is large compared to the sample size. They also
developed an LRT for testing H0 : Σ = I_p for a high-dimensional normal distribution
Np(µ,Σ). In their derivation, the dimension p is no longer a fixed constant, but rather
a variable that goes to infinity along with the sample size n, with the ratio between
p = p_n and n converging to a constant y, i.e.,

    \lim_{n→∞} p_n/n = y ∈ (0, 1).

Jiang and Yang (2013) further extended the work of Bai et al. (2009) to cover the case
y = 1, and obtained the CLT of the LRT used for testing independence of k groups of
components for high-dimensional data, where k is a fixed number.
Chapter 3
Main Result
In this chapter, we give the main result of the project.

Let k ≥ 2 and let q_1, ..., q_k be an arbitrary partition of the dimension p, i.e.,
p = \sum_{i=1}^k q_i. Let

    Σ = (Σ_{ij})_{p×p}  (3.1)

be the (positive definite) covariance matrix, where Σ_{ij} is a q_i × q_j sub-matrix for all
1 ≤ i, j ≤ k. Consider the hypotheses

    H0 : Σ = Σ0  vs  Ha : Σ ≠ Σ0,  (3.2)

which are equivalent to hypothesis (2.3). Let S be the sample covariance matrix, and
partition A := (n−1)S in the same way:

    A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ A_{21} & A_{22} & \cdots & A_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ A_{k1} & A_{k2} & \cdots & A_{kk} \end{pmatrix},

where A_{ij} is a q_i × q_j matrix. Wilks (1935) showed that the likelihood ratio statistic
for testing (2.3) is

    Λ_n = \frac{|A|^{n/2}}{\prod_{i=1}^k |A_{ii}|^{n/2}} = W_n^{n/2}.  (3.3)

When p ≥ n, the matrix A is not of full rank, and therefore Λ_n is degenerate.

We have the following theorem.
Theorem. Let p satisfy p < n − 1 and p → ∞ as n → ∞, and let q_1, ..., q_k be k
positive integers such that p = \sum_{i=1}^k q_i and \max_i q_i/p ≤ 1 − δ for a fixed δ ∈ (0, 1/2)
and all large n. Let W_n be the Wilks likelihood ratio statistic in (3.3). Then

    \frac{\log W_n − µ_n}{σ_n}  →_d  N(0, 1)  (3.4)

as n → ∞, where

    µ_n = c \log\Big( 1 − \frac{p}{n−1} \Big) − \sum_{i=1}^k c_i \log\Big( 1 − \frac{q_i}{n−1} \Big),

    σ_n^2 = 2\Big[ −\log\Big( 1 − \frac{p}{n−1} \Big) + \sum_{i=1}^k \log\Big( 1 − \frac{q_i}{n−1} \Big) \Big],

with c = p − n + 3/2 and c_i = q_i − n + 3/2.
Remarks
• In the theorem above, the integers k, q_1, ..., q_k and p may all depend on the sample
size n. Compared with Jiang and Yang (2013) and Jiang and Qi (2013), the constraints
on the partition q_1, ..., q_k are relaxed.
• The assumption that \max_i q_i/p ≤ 1 − δ for some δ ∈ (0, 1/2) rules out the situation
where \max_i q_i/p → 1 along the entire sequence or any subsequence.
• The theorem covers all of the following situations.
1. Complete independence: q_1 = ⋯ = q_p = 1 (so k = p), which satisfies the
condition (Theorem 6 of [6]).
2. Relatively balanced partitions, i.e., the group sizes are mutually
proportional (Theorem 2 of [6]).
3. Unbalanced partitions, i.e., some of the groups may dominate the partition.
This case is examined in the simulations of Chapter 6.
Chapter 4
Some Lemmas
4.1 Multivariate Gamma Function and ξ(x) function
For two sequences of numbers a_n and b_n, the notation a_n = O(b_n) as n → ∞ means
\limsup |a_n/b_n| < ∞; a_n = o(b_n) as n → ∞ means \lim a_n/b_n = 0; and a_n ∼ b_n
as n → ∞ stands for \lim a_n/b_n = 1.

Throughout the paper, Γ(x) stands for the Gamma function, defined for x > 0 by the
convergent improper integral Γ(x) = \int_0^∞ u^{x−1} e^{−u}\,du. Define the multivariate
Gamma function as

    Γ_p(x) := π^{p(p−1)/4} \prod_{i=1}^p Γ\Big( x − \frac{1}{2}(i−1) \Big)  (4.1)

for x > (p − 1)/2; see p. 62 in Muirhead (1982).
Define

    ξ(x) = −2(\log(1 − x) + x),  x ∈ [0, 1).  (4.2)

ξ(x) is nonnegative on its domain, with ξ(0) = 0 and ξ'(x) = 2x/(1 − x). We have

    ξ(x) = ξ(x) − ξ(0) = \int_0^x ξ'(t)\,dt = \int_0^x \frac{2t}{1 − t}\,dt.

Substituting t = ux (so dt = x\,du), we obtain

    ξ(x) = 2 \int_0^1 \frac{u x^2}{1 − ux}\,du.  (4.3)
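The equivalence between the closed form (4.2) and the integral form (4.3) is easy to spot-check numerically; the sketch below (our own, using a simple midpoint rule) compares the two:

```python
import numpy as np

def xi(x):
    # closed form (4.2): xi(x) = -2(log(1 - x) + x)
    return -2 * (np.log1p(-x) + x)

def xi_integral(x, m=200000):
    # midpoint rule for the integral form (4.3): 2 * int_0^1 u x^2 / (1 - u x) du
    u = (np.arange(m) + 0.5) / m
    return 2 * np.mean(u * x**2 / (1 - u * x))

for x in (0.1, 0.5, 0.9):
    print(x, xi(x), xi_integral(x))
```

The two columns agree to several decimal places for each x in [0, 1).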
Lemma 4.1.1. Let p satisfy p < n − 1 and p → ∞ as n → ∞, and let q_1, ..., q_k be a
partition of p, i.e., p = \sum_{i=1}^k q_i. Then

    σ_n^2 = ξ\Big( \frac{p}{n−1} \Big) − \sum_{i=1}^k ξ\Big( \frac{q_i}{n−1} \Big),  (4.4)

and

    \Big[ 1 − \sum_{i=1}^k \Big( \frac{q_i}{p} \Big)^2 \Big] ξ\Big( \frac{p}{n−1} \Big) ≤ σ_n^2 ≤ ξ\Big( \frac{p}{n−1} \Big).  (4.5)

Furthermore, if \max_i q_i/p ≤ 1 − δ for some δ ∈ (0, 1/2) and all large n, we have

    δ\, ξ\Big( \frac{p}{n−1} \Big) ≤ σ_n^2 ≤ ξ\Big( \frac{p}{n−1} \Big).  (4.6)
Proof. Since ξ(x) is nonnegative on its domain, it follows from (4.4) that σ_n^2 ≤ ξ(p/(n−1)).
By the integral expression (4.3) of ξ(x),

    σ_n^2 = ξ\Big( \frac{p}{n−1} \Big) − \sum_{i=1}^k ξ\Big( \frac{q_i}{n−1} \Big)
    = 2 \int_0^1 \frac{u (p/(n−1))^2}{1 − u\,p/(n−1)}\,du − \sum_{i=1}^k 2 \int_0^1 \frac{u (q_i/(n−1))^2}{1 − u\,q_i/(n−1)}\,du
    ≥ 2 \int_0^1 \frac{u \big[ (p/(n−1))^2 − \sum_{i=1}^k (q_i/(n−1))^2 \big]}{1 − u\,p/(n−1)}\,du   (using q_i ≤ p)
    = 2 \Big[ 1 − \sum_{i=1}^k \Big( \frac{q_i}{p} \Big)^2 \Big] \int_0^1 \frac{u (p/(n−1))^2}{1 − u\,p/(n−1)}\,du
    = \Big[ 1 − \sum_{i=1}^k \Big( \frac{q_i}{p} \Big)^2 \Big] ξ\Big( \frac{p}{n−1} \Big).

This proves (4.5). Since \max_i q_i/p ≤ 1 − δ and \sum_i q_i = p, we have

    \sum_{i=1}^k \Big( \frac{q_i}{p} \Big)^2 ≤ \Big( \max_i \frac{q_i}{p} \Big) \sum_{i=1}^k \frac{q_i}{p} = \max_i \frac{q_i}{p} ≤ 1 − δ,

which combined with (4.5) implies

    σ_n^2 ≥ δ\, ξ\Big( \frac{p}{n−1} \Big).

This completes the proof of Lemma 4.1.1.
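The identity (4.4) and the bounds (4.5) can be checked numerically for any concrete partition; the following sketch (our own; the partition is the second setting of Table 6.1, with n = 200 as an assumed sample size) does so:

```python
import numpy as np

def xi(x):
    # xi(x) = -2(log(1 - x) + x), nonnegative on [0, 1)
    return -2 * (np.log1p(-x) + x)

def sigma2(n, q):
    # sigma_n^2 via (4.4)
    p = sum(q)
    return xi(p / (n - 1)) - sum(xi(qi / (n - 1)) for qi in q)

n, q = 200, [30, 20, 20, 10]
p = sum(q)
s2 = sigma2(n, q)
# bounds from (4.5)
lo = (1 - sum((qi / p) ** 2 for qi in q)) * xi(p / (n - 1))
hi = xi(p / (n - 1))
print(lo, s2, hi)   # lo <= s2 <= hi
```

The printed values confirm that σ_n² is positive and squeezed between the two bounds.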
4.2 Moments of Wilks' Statistic
Lemma 4.2.1. (Theorem 11.2.3 of Muirhead (1982)) Let p = \sum_{i=1}^k q_i and let W_n be
Wilks' likelihood ratio statistic defined in (3.3). Then

    E W_n^t = \frac{Γ_p\big( \frac{n−1}{2} + t \big)}{Γ_p\big( \frac{n−1}{2} \big)} \prod_{i=1}^k \frac{Γ_{q_i}\big( \frac{n−1}{2} \big)}{Γ_{q_i}\big( \frac{n−1}{2} + t \big)}  (4.7)

for any t > (p − n)/2, where Γ_p(x) is defined in (4.1).
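The moment formula (4.7) can be verified against a Monte Carlo estimate of E W_n^t under H0; the sketch below (our own, with n, q, and t chosen arbitrarily) computes the exact value through log-gamma functions and compares it with a simulation average:

```python
import numpy as np
from math import lgamma, pi, log

def log_mv_gamma(p, x):
    # log of the multivariate Gamma function (4.1)
    return p * (p - 1) / 4 * log(pi) + sum(lgamma(x - i / 2) for i in range(p))

def log_moment(n, q, t):
    # log E W_n^t from the moment formula (4.7)
    p, a = sum(q), (n - 1) / 2
    val = log_mv_gamma(p, a + t) - log_mv_gamma(p, a)
    for qi in q:
        val += log_mv_gamma(qi, a) - log_mv_gamma(qi, a + t)
    return val

def log_wn(x, q):
    # log W_n = log|A| - sum_i log|A_ii|
    xc = x - x.mean(axis=0)
    a = xc.T @ xc
    ld = np.linalg.slogdet(a)[1]
    pos = 0
    for qi in q:
        ld -= np.linalg.slogdet(a[pos:pos + qi, pos:pos + qi])[1]
        pos += qi
    return ld

rng = np.random.default_rng(2)
n, q, t = 50, [2, 3], 1.0
exact = np.exp(log_moment(n, q, t))
mc = np.mean([np.exp(t * log_wn(rng.standard_normal((n, sum(q))), q))
              for _ in range(2000)])
print(exact, mc)   # the two agree up to Monte Carlo error
```

For t = 1 the formula reduces to a ratio of products of shifted arguments, e.g. here E W_n = (23 · 22.5)/(24.5 · 24), which the exact value reproduces.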
4.3 Infinitesimal and estimates
Lemma 4.3.1. Let r_n → ∞ and r_n/n → 0 as n → ∞, and let q be an integer that may
change with n, with r_n ≤ q < n − 1. Then

    \lim_{n→∞} \sup_{r_n ≤ q < n−1} \frac{q / [(n−1)(n−1−q)]}{ξ(q/(n−1))} = 0,  (4.8)

and furthermore,

    \lim_{n→∞} \sup_{r_n ≤ q < n−1} \frac{\big\{ q / [(n−1)(n−1−q)] \big\}^2}{ξ(q/(n−1))} = 0.  (4.9)
Proof. Set a = q/(n−1). Since q/[(n−1)(n−1−q)] = a/(n−1−q), it suffices to prove

    \sup_{r_n ≤ q < n−1} \frac{a}{(n−1−q)\,ξ(a)} → 0.

Recall from (4.3) that

    ξ(x) = 2 \int_0^1 \frac{u x^2}{1 − ux}\,du = 2x^2 η(x),  where  η(x) = \int_0^1 \frac{u}{1 − ux}\,du,

so that a/[(n−1−q)ξ(a)] = 1/[2(n−1−q)\,a\,η(a)]. The function η(x) is monotonically
increasing on [0, 1), with η(x) ≥ 1/2, and η(x) → ∞ as x → 1. Let h be any constant
in (0, 1/2) and split the range of q into three cases.

When r_n ≤ q < h(n−1), we use η(a) ≥ 1/2:

    \sup_{r_n ≤ q < h(n−1)} \frac{a}{(n−1−q)ξ(a)} ≤ \sup_{r_n ≤ q < h(n−1)} \frac{1}{n−1−q} \cdot \frac{n−1}{q} = \sup_{r_n ≤ q < h(n−1)} \frac{1}{q(1−a)} ≤ \frac{1}{r_n(1−h)}.

When h(n−1) ≤ q ≤ (1−h)(n−1), we have h ≤ a ≤ 1−h, hence a(1−a) ≥ h(1−h);
using η(a) ≥ 1/2 and n−1−q = (n−1)(1−a),

    \sup_{h(n−1) ≤ q ≤ (1−h)(n−1)} \frac{a}{(n−1−q)ξ(a)} = \sup \frac{1}{2(n−1)\,a(1−a)\,η(a)} ≤ \frac{1}{(n−1)\,h(1−h)}.

When (1−h)(n−1) < q < n−1, we have a > 1−h, and since n−1−q ≥ 1 and η is
increasing,

    \sup_{(1−h)(n−1) < q < n−1} \frac{a}{(n−1−q)ξ(a)} ≤ \frac{1}{2(1−h)\,η(1−h)}.

Now select h = h_n = 1/\sqrt{n}. The first two bounds tend to zero because r_n → ∞ and
(n−1)h_n(1−h_n) → ∞, and the third tends to zero because η(1−h_n) → ∞. Combining
the three cases gives (4.8). Since

    \frac{q}{(n−1)(n−1−q)} ≤ 1  for r_n ≤ q < n−1,

the ratio in (4.9) is at most the ratio in (4.8), and (4.9) follows from (4.8).
This completes the proof of Lemma 4.3.1.
Lemma 4.3.2. Let r_n → ∞ and r_n/n → 0 as n → ∞. Then

    \sum_{i=1}^q \Big( \frac{1}{n−i} − \frac{1}{n−1} \Big) = −\log\Big( 1 − \frac{q}{n−1} \Big) − \frac{q}{n−1} + O\Big( \frac{q}{(n−1)(n−1−q)} \Big)  (4.10)

and

    \sum_{i=1}^q \big( \log(n−1) − \log(n−i) \big) = \Big( n − q − \frac{1}{2} \Big) \log\Big( 1 − \frac{q}{n−1} \Big) + q + O\Big( \frac{q}{(n−1)(n−q−1)} \Big)  (4.11)

uniformly on r_n ≤ q < n−1.
Proof. By the partial sums of the harmonic series,

    \sum_{i=1}^k \frac{1}{i} = \log k + γ + \frac{1}{2k} + τ(k),

where γ is the Euler–Mascheroni constant and τ(k) = O(1/k^2) as k → ∞; see "Sloane's
A082912: Sum of a(n) terms of harmonic series is > 10^n", The On-Line Encyclopedia of
Integer Sequences, OEIS Foundation.

First, note that

    \sum_{i=1}^q \Big( \frac{1}{n−i} − \frac{1}{n−1} \Big) = \sum_{i=1}^{n−1} \frac{1}{i} − \sum_{i=1}^{n−1−q} \frac{1}{i} − \frac{q}{n−1}
    = −\log\Big( 1 − \frac{q}{n−1} \Big) − \frac{q}{n−1} − \frac{q}{2(n−1)(n−1−q)} + τ(n−1) − τ(n−1−q).

To show (4.10), it suffices to show τ(n−1−q) = O(q/((n−1)(n−1−q))) uniformly; the
term τ(n−1) = O(1/(n−1)^2) is handled by the same argument. Note that τ(k) ≤ c/k^2,
k ≥ 1, for some c > 0. We can also verify that

    \frac{n−1}{q(n−1−q)} ≤ 2.

As a result,

    τ(n−1−q) ≤ \frac{c}{(n−1−q)^2} = c \cdot \frac{n−1}{q(n−1−q)} \cdot \frac{q}{(n−1)(n−1−q)} ≤ \frac{2cq}{(n−1)(n−1−q)}.  (4.12)

This finishes the proof of (4.10).
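The harmonic-number expansion used in the proof of (4.10) above is easy to check numerically; in fact the remainder τ(k) behaves like −1/(12k²), which is consistent with the bound τ(k) = O(1/k²):

```python
from math import log

# Euler-Mascheroni constant (truncated)
GAMMA = 0.5772156649015329

def tau(k):
    # remainder tau(k) = H_k - log k - gamma - 1/(2k)
    h = sum(1.0 / i for i in range(1, k + 1))
    return h - log(k) - GAMMA - 1.0 / (2 * k)

for k in (10, 100, 1000):
    # tau(k) * k^2 approaches -1/12 ~ -0.0833
    print(k, tau(k) * k * k)
```

This is only a sanity check; the proof itself uses nothing beyond τ(k) ≤ c/k².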
To show (4.11), we apply the Stirling formula. Recall that the Stirling formula from
Ahlfors (1979) is equivalent to

    \log Γ(x) = \Big( x − \frac{1}{2} \Big) \log x − x + \frac{1}{2} \log(2π) + \frac{1}{12x} + O\Big( \frac{1}{x^2} \Big)  (4.13)

as x → ∞. Taking x = n−1 and x = n−q−1 respectively and taking the difference, we have,
as n → ∞,

    \log Γ(n−1) − \log Γ(n−q−1)
    = \Big( n − \frac{3}{2} \Big) \log(n−1) − \Big( n − q − \frac{3}{2} \Big) \log(n−q−1) − q + \frac{1}{12}\Big( \frac{1}{n−1} − \frac{1}{n−q−1} \Big) + O\Big( \frac{1}{(n−1)^2} \Big) + O\Big( \frac{1}{(n−q−1)^2} \Big)
    = \Big( n − \frac{3}{2} \Big) \log(n−1) − \Big( n − q − \frac{3}{2} \Big) \log(n−q−1) − q − \frac{q}{12(n−1)(n−q−1)} + O\Big( \frac{1}{(n−1)^2} \Big) + O\Big( \frac{1}{(n−q−1)^2} \Big)
    = \Big( n − \frac{3}{2} \Big) \log(n−1) − \Big( n − q − \frac{3}{2} \Big) \log(n−q−1) − q + O\Big( \frac{q}{(n−1)(n−q−1)} \Big).

In the last step, we have used the estimate

    \frac{1}{(n−1)^2} + \frac{1}{(n−q−1)^2} = O\Big( \frac{q}{(n−1)(n−q−1)} \Big),  (4.14)

which holds uniformly for r_n ≤ q < n−1 since 1/(n−1)^2 ≤ q/[(n−1)(n−q−1)] and, by the
bound (n−1)/[q(n−1−q)] ≤ 2 used in the proof of (4.10), 1/(n−q−1)^2 ≤ 2q/[(n−1)(n−q−1)].

Note that for any integer m ≥ 1, Γ(m) = (m−1)!. Then we have

    \sum_{i=1}^q \big( \log(n−1) − \log(n−i) \big)
    = q \log(n−1) − \big[ \log Γ(n−1) − \log Γ(n−q−1) \big] + \log\Big( 1 − \frac{q}{n−1} \Big)
    = q \log(n−1) − \Big( n − \frac{3}{2} \Big) \log(n−1) + \Big( n − q − \frac{3}{2} \Big) \log(n−q−1) + q + O\Big( \frac{q}{(n−1)(n−q−1)} \Big) + \log\Big( 1 − \frac{q}{n−1} \Big)
    = \Big( n − q − \frac{1}{2} \Big) \log\Big( 1 − \frac{q}{n−1} \Big) + q + O\Big( \frac{q}{(n−1)(n−q−1)} \Big).

This finishes the proof of (4.11).
Lemma 4.3.3. (Lemma 2.1 from Jiang and Qi (2013)) Let b := b(x) be a real-valued
function defined on (0, ∞). Then

    \log \frac{Γ(x+b)}{Γ(x)} = (x+b) \log(x+b) − x \log x − b − \frac{b}{2x} + O\Big( \frac{b^2 + |b| + 1}{x^2} \Big)  (4.15)

holds uniformly on b ∈ [−lx, lx] for any given constant l ∈ (0, 1) as x → ∞. Furthermore,
as x → ∞,

    \log\Big[ \frac{Γ(x+b)}{Γ(x)} \cdot \frac{Γ(z)}{Γ(z+b)} \Big] = b(\log x − \log z) + \frac{b^2 − b}{2}\Big( \frac{1}{x} − \frac{1}{z} \Big) + O\Big( \frac{|b|^3 (x−z)}{x^3} \Big) + O\Big( \frac{b^2 + |b| + 1}{x^2} \Big)  (4.16)

uniformly over b ∈ [−lx/2, lx/2] and z ∈ [x/2, x] for any l ∈ (0, 1).
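The expansion (4.15) can be checked numerically with the standard library's log-gamma function; the sketch below (our own, with b fixed at an arbitrary 0.7) shows that the error after removing the main terms is indeed of order 1/x²:

```python
from math import lgamma, log

def lhs(x, b):
    # exact log Gamma(x + b) - log Gamma(x)
    return lgamma(x + b) - lgamma(x)

def rhs_main(x, b):
    # main terms of (4.15); the error should be O((b^2 + |b| + 1) / x^2)
    return (x + b) * log(x + b) - x * log(x) - b - b / (2 * x)

for x in (50.0, 500.0, 5000.0):
    b = 0.7
    err = abs(lhs(x, b) - rhs_main(x, b))
    print(x, err * x * x)   # stays bounded as x grows
```

The scaled error err · x² stabilizes around a small constant, confirming the O(1/x²) rate for fixed b.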
Lemma 4.3.4. Let t = t_n be a bounded variable with respect to n, and let r_n → ∞ and
r_n/n → 0 as n → ∞. Then

    \log\Big\{ \Big[ \frac{Γ(\frac{n−1}{2})}{Γ(\frac{n−1}{2}+t)} \Big]^q \frac{Γ_q(\frac{n−1}{2}+t)}{Γ_q(\frac{n−1}{2})} \Big\}
    = t\Big[ \Big( q − n + \frac{3}{2} \Big) \log\Big( 1 − \frac{q}{n−1} \Big) − \frac{n−2}{n−1}\,q \Big] + \frac{t^2}{2}\,ξ\Big( \frac{q}{n−1} \Big)
    + (|t| + t^2)\,O\Big( \frac{q}{(n−1)(n−q−1)} \Big) + O\Big( \frac{|t|^3 q^2}{n^3} \Big) + O\Big( \frac{q(t^2 + |t| + 1)}{n^2} \Big)

holds uniformly for r_n ≤ q < n−1.

Proof. From the definition of Γ_p(x) in (4.1) we have

    \log\Big\{ \Big[ \frac{Γ(\frac{n−1}{2})}{Γ(\frac{n−1}{2}+t)} \Big]^q \frac{Γ_q(\frac{n−1}{2}+t)}{Γ_q(\frac{n−1}{2})} \Big\} = −\sum_{i=1}^q \log\Big[ \frac{Γ(\frac{n−1}{2}+t)}{Γ(\frac{n−1}{2})} \cdot \frac{Γ(\frac{n−i}{2})}{Γ(\frac{n−i}{2}+t)} \Big].  (4.17)

Applying Lemma 4.3.3 with x = (n−1)/2, z = (n−i)/2 and b = t to the summands, we have

    \log\Big[ \frac{Γ(\frac{n−1}{2}+t)}{Γ(\frac{n−1}{2})} \cdot \frac{Γ(\frac{n−i}{2})}{Γ(\frac{n−i}{2}+t)} \Big]
    = t\Big( \log\frac{n−1}{2} − \log\frac{n−i}{2} \Big) + (t^2 − t)\Big( \frac{1}{n−1} − \frac{1}{n−i} \Big) + O\Big( \frac{|t|^3 (i−1)}{n^3} \Big) + O\Big( \frac{t^2 + |t| + 1}{n^2} \Big)
    = t\Big[ \log(n−1) − \log(n−i) + \Big( \frac{1}{n−i} − \frac{1}{n−1} \Big) \Big] + t^2\Big( \frac{1}{n−1} − \frac{1}{n−i} \Big) + O\Big( \frac{|t|^3 (i−1)}{n^3} \Big) + O\Big( \frac{t^2 + |t| + 1}{n^2} \Big)

uniformly for 1 ≤ i ≤ q. Summing over i, using \sum_{i=1}^q (i−1) = q(q−1)/2 ≤ q^2/2 for the
third term, and applying Lemma 4.3.2 together with the identity \frac{1}{2}ξ(q/(n−1)) =
−\log(1 − q/(n−1)) − q/(n−1), we obtain

    \sum_{i=1}^q \log[\cdots]
    = t\Big[ \Big( n − q − \frac{1}{2} \Big) \log\Big( 1 − \frac{q}{n−1} \Big) + q − \log\Big( 1 − \frac{q}{n−1} \Big) − \frac{q}{n−1} \Big] − \frac{t^2}{2}\,ξ\Big( \frac{q}{n−1} \Big)
    + (|t| + t^2)\,O\Big( \frac{q}{(n−1)(n−1−q)} \Big) + O\Big( \frac{|t|^3 q^2}{n^3} \Big) + O\Big( \frac{q(t^2 + |t| + 1)}{n^2} \Big)
    = −t\Big[ \Big( q − n + \frac{3}{2} \Big) \log\Big( 1 − \frac{q}{n−1} \Big) − \frac{n−2}{n−1}\,q \Big] − \frac{t^2}{2}\,ξ\Big( \frac{q}{n−1} \Big)
    + (|t| + t^2)\,O\Big( \frac{q}{(n−1)(n−1−q)} \Big) + O\Big( \frac{|t|^3 q^2}{n^3} \Big) + O\Big( \frac{q(t^2 + |t| + 1)}{n^2} \Big),

which together with (4.17) completes the proof of the lemma.
Chapter 5
Proof of Theorem
To prove (3.4), it suffices to show that

    E \exp\Big[ \Big( \frac{\log W_n − µ_n}{σ_n} \Big) s \Big] = \exp\Big( −\frac{µ_n s}{σ_n} \Big)\,E\big[ W_n^{s/σ_n} \big] \longrightarrow e^{s^2/2}

as n → ∞ for all s with |s| ≤ 1, with µ_n and σ_n as in the theorem; equivalently,
writing t = s/σ_n,

    \log E W_n^t = µ_n t + \frac{σ_n^2 t^2}{2} + o(1)  (5.1)

as n → ∞ with |σ_n t| ≤ 1. By Lemma 4.2.1,

    E W_n^t = \frac{Γ_p(\frac{n−1}{2}+t)}{Γ_p(\frac{n−1}{2})} \prod_{i=1}^k \frac{Γ_{q_i}(\frac{n−1}{2})}{Γ_{q_i}(\frac{n−1}{2}+t)}
    = \Big[ \frac{Γ(\frac{n−1}{2})}{Γ(\frac{n−1}{2}+t)} \Big]^p \frac{Γ_p(\frac{n−1}{2}+t)}{Γ_p(\frac{n−1}{2})} \Big/ \prod_{i=1}^k \Big[ \frac{Γ(\frac{n−1}{2})}{Γ(\frac{n−1}{2}+t)} \Big]^{q_i} \frac{Γ_{q_i}(\frac{n−1}{2}+t)}{Γ_{q_i}(\frac{n−1}{2})}.

By taking logarithms on both sides, we have

    \log E W_n^t = −\sum_{i=1}^p \log\Big[ \frac{Γ(\frac{n−1}{2}+t)}{Γ(\frac{n−1}{2})} \cdot \frac{Γ(\frac{n−i}{2})}{Γ(\frac{n−i}{2}+t)} \Big] + \sum_{i=1}^k \sum_{j=1}^{q_i} \log\Big[ \frac{Γ(\frac{n−1}{2}+t)}{Γ(\frac{n−1}{2})} \cdot \frac{Γ(\frac{n−j}{2})}{Γ(\frac{n−j}{2}+t)} \Big].  (5.2)
For the first sum on the right-hand side, set c = p − n + 3/2 and g(t) = t^2 + |t| + 1. By
Lemma 4.3.4 with q = p, we have

    −\sum_{i=1}^p \log\Big[ \frac{Γ(\frac{n−1}{2}+t)}{Γ(\frac{n−1}{2})} \cdot \frac{Γ(\frac{n−i}{2})}{Γ(\frac{n−i}{2}+t)} \Big]
    = t\Big[ c \log\Big( 1 − \frac{p}{n−1} \Big) − \frac{n−2}{n−1}\,p \Big] + \frac{t^2}{2}\,ξ\Big( \frac{p}{n−1} \Big)
    + (|t| + t^2)\,O\Big( \frac{p}{(n−1)(n−p−1)} \Big) + O\Big( \frac{|t|^3 p^2}{n^3} \Big) + O\Big( \frac{p\,g(t)}{n^2} \Big)
    = t\,c \log\Big( 1 − \frac{p}{n−1} \Big) − t\,\frac{n−2}{n−1}\,p + \frac{t^2}{2}\,ξ\Big( \frac{p}{n−1} \Big)
    + (|t| + t^2)\,O\Big( \frac{p}{(n−1)(n−p−1)} \Big) + O\Big( \frac{p\,g(t)}{n^2} \Big),

where in the last step O(|t|^3 p^2 / n^3) = O(p\,g(t)/n^2) because t is bounded and p < n.

For the second sum on the right-hand side of (5.2), set c_i = q_i − n + 3/2 and recall
that p = \sum_i q_i and \max_i q_i/p ≤ 1 − δ for some δ ∈ (0, 1/2). Again using Lemma 4.3.4,
now with q = q_i, and summing over i, we have

    \sum_{i=1}^k \sum_{j=1}^{q_i} \log\Big[ \frac{Γ(\frac{n−1}{2}+t)}{Γ(\frac{n−1}{2})} \cdot \frac{Γ(\frac{n−j}{2})}{Γ(\frac{n−j}{2}+t)} \Big]
    = −\sum_{i=1}^k \Big\{ t\Big[ c_i \log\Big( 1 − \frac{q_i}{n−1} \Big) − \frac{n−2}{n−1}\,q_i \Big] + \frac{t^2}{2}\,ξ\Big( \frac{q_i}{n−1} \Big) \Big\}
    + (|t| + t^2) \sum_{i=1}^k O\Big( \frac{q_i}{(n−1)(n−1−q_i)} \Big) + O\Big( \frac{|t|^3 \sum_i q_i^2}{n^3} \Big) + O\Big( \frac{p\,g(t)}{n^2} \Big)
    = −t \sum_{i=1}^k c_i \log\Big( 1 − \frac{q_i}{n−1} \Big) + t\,\frac{n−2}{n−1}\,p − \frac{t^2}{2} \sum_{i=1}^k ξ\Big( \frac{q_i}{n−1} \Big)
    + (|t| + t^2) \sum_{i=1}^k O\Big( \frac{q_i}{(n−1)(n−1−q_i)} \Big) + O\Big( \frac{p\,g(t)}{n^2} \Big),

where O(|t|^3 \sum_i q_i^2 / n^3) = O(p\,g(t)/n^2) for the same reason as above.
Therefore,

    \log E W_n^t = t\Big[ c \log\Big( 1 − \frac{p}{n−1} \Big) − \sum_{i=1}^k c_i \log\Big( 1 − \frac{q_i}{n−1} \Big) \Big] + \frac{t^2}{2}\Big[ ξ\Big( \frac{p}{n−1} \Big) − \sum_{i=1}^k ξ\Big( \frac{q_i}{n−1} \Big) \Big]
    + (|t| + t^2)\Big[ O\Big( \frac{p}{(n−1)(n−p−1)} \Big) + \sum_{i=1}^k O\Big( \frac{q_i}{(n−1)(n−1−q_i)} \Big) \Big] + O\Big( \frac{p\,g(t)}{n^2} \Big)
    = t\,µ_n + \frac{t^2}{2}\,σ_n^2
    + (|t| + t^2)\Big[ O\Big( \frac{p}{(n−1)(n−p−1)} \Big) + \sum_{i=1}^k O\Big( \frac{q_i}{(n−1)(n−1−q_i)} \Big) \Big] + O\Big( \frac{p\,g(t)}{n^2} \Big),

where

    µ_n = c \log\Big( 1 − \frac{p}{n−1} \Big) − \sum_{i=1}^k c_i \log\Big( 1 − \frac{q_i}{n−1} \Big),

    σ_n^2 = ξ\Big( \frac{p}{n−1} \Big) − \sum_{i=1}^k ξ\Big( \frac{q_i}{n−1} \Big).
Now it is sufficient to prove that, as n → ∞,

    (|t| + t^2)\Big[ O\Big( \frac{p}{(n−1)(n−p−1)} \Big) + \sum_{i=1}^k O\Big( \frac{q_i}{(n−1)(n−1−q_i)} \Big) \Big] + O\Big( \frac{p\,g(t)}{n^2} \Big) = o(1)  (5.3)

for all |σ_n t| ≤ 1. Recall the constraints p ≤ n − 2, p → ∞ and \max_i q_i/p ≤ 1 − δ for
some δ ∈ (0, 1/2).

We can verify that

    \sum_{i=1}^k O\Big( \frac{q_i}{(n−1)(n−1−q_i)} \Big) ≤ \sum_{i=1}^k O\Big( \frac{q_i}{(n−1)(n−1−p)} \Big) = O\Big( \frac{p}{(n−1)(n−1−p)} \Big).

Equation (5.3) thus reduces to

    (|t| + t^2)\,O\Big( \frac{p}{(n−1)(n−p−1)} \Big) + O\Big( \frac{p\,g(t)}{n^2} \Big) = o(1).  (5.4)

From (4.6) of Lemma 4.1.1, we have

    |t| ≤ \frac{1}{σ_n} = O\Big( \frac{1}{\sqrt{ξ(p/(n−1))}} \Big).
By (4.8) and (4.9) of Lemma 4.3.1 with q = p,

    |t|\,O\Big( \frac{p}{(n−1)(n−1−p)} \Big) = O\Big( \frac{p/[(n−1)(n−1−p)]}{\sqrt{ξ(p/(n−1))}} \Big) = o(1),  (5.5)

    t^2\,O\Big( \frac{p}{(n−1)(n−1−p)} \Big) = O\Big( \frac{p/[(n−1)(n−1−p)]}{ξ(p/(n−1))} \Big) = o(1).  (5.6)
Recall g(t) = t^2 + |t| + 1, and fix a small constant θ ∈ (0, 1). When p/(n−1) ≤ θ,
we have ξ(p/(n−1)) ≥ (p/(n−1))^2, hence t^2 = O\big( ((n−1)/p)^2 \big), and

    O\Big( \frac{p\,g(t)}{n^2} \Big) = O\Big( \frac{p t^2}{n^2} \Big) + O\Big( \frac{p|t|}{n^2} \Big) + O\Big( \frac{p}{n^2} \Big)
    = O\Big( \frac{p}{n^2} \cdot \frac{(n−1)^2}{p^2} \Big) + O\Big( \frac{p}{n^2} \cdot \frac{n−1}{p} \Big) + o(1)
    = O\Big( \frac{1}{p} \Big) + O\Big( \frac{1}{n} \Big) + o(1) = o(1).

When θ ≤ p/(n−1) < 1, by the increasing property of ξ(x) in (4.2), ξ(p/(n−1)) ≥ ξ(θ),
so

    \frac{p t^2}{n^2} ≤ \frac{p}{n^2} \cdot \frac{O(1)}{ξ(θ)} → 0,

and similarly

    \frac{p|t|}{n^2} ≤ \frac{p}{n^2} \cdot \frac{O(1)}{\sqrt{ξ(θ)}} → 0,

so again

    O\Big( \frac{p\,g(t)}{n^2} \Big) = O\Big( \frac{p t^2}{n^2} \Big) + O\Big( \frac{p|t|}{n^2} \Big) + O\Big( \frac{p}{n^2} \Big) = o(1).

This completes the proof of (3.4).
This completes the proof of (3.4).
Chapter 6
Simulation
In this chapter, we compare the performance of the chi-square approximation and the
normal approximation through a finite-sample simulation study. We plot histograms
of the chi-square statistics used for the chi-square approximation and compare them
with their corresponding limiting chi-square curves. Similarly, we plot histograms of the
statistic used for the normal approximation and compare them with the standard
normal curve.

We also report estimated sizes and powers of the LRT tests based on the
chi-square approximation and the normal approximation. All simulations were done
in R, and the histograms and the estimates of sizes and powers are based on
10,000 replicates.
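For readers who wish to reproduce a size estimate, here is a Python sketch (the thesis itself used R; the helper names, the sample size n = 100, the reduced replicate count, and the two-sided normal rejection rule are our own illustrative choices), using the centering µn and scaling σn from the Theorem in Chapter 3:

```python
import numpy as np

def mu_sigma(n, q):
    # mu_n = c log(1 - p/(n-1)) - sum_i c_i log(1 - q_i/(n-1)),
    # sigma_n^2 = 2[-log(1 - p/(n-1)) + sum_i log(1 - q_i/(n-1))],
    # with c = p - n + 3/2 and c_i = q_i - n + 3/2 (Theorem, Chapter 3)
    p = sum(q)
    c = p - n + 1.5
    mu = c * np.log1p(-p / (n - 1))
    s2 = -2 * np.log1p(-p / (n - 1))
    for qi in q:
        mu -= (qi - n + 1.5) * np.log1p(-qi / (n - 1))
        s2 += 2 * np.log1p(-qi / (n - 1))
    return mu, np.sqrt(s2)

def log_wn(x, q):
    # log W_n = log|A| - sum_i log|A_ii|
    xc = x - x.mean(axis=0)
    a = xc.T @ xc
    ld = np.linalg.slogdet(a)[1]
    pos = 0
    for qi in q:
        ld -= np.linalg.slogdet(a[pos:pos + qi, pos:pos + qi])[1]
        pos += qi
    return ld

rng = np.random.default_rng(3)
n, q = 100, [3, 2, 2, 3]
mu, sigma = mu_sigma(n, q)
reps = 2000   # fewer than the thesis's 10,000 replicates, to keep the sketch fast
z = np.array([(log_wn(rng.standard_normal((n, sum(q))), q) - mu) / sigma
              for _ in range(reps)])
size = np.mean(np.abs(z) > 1.959964)   # two-sided 5% critical value of N(0,1)
print(round(size, 3))                  # typically near 0.05
```

Under H0 the standardized statistic z should be approximately standard normal, so the empirical rejection rate should lie near the nominal 5% level.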
Table 6.1: Size and Power of LRT for Specified Normal Distribution

                                                 Size under H0           Power under H1
                                                 CLT      χ² approx.     CLT      χ² approx.
  p = 10, q1 = 3, q2 = 2, q3 = 2, q4 = 3         0.0359   0.049          1        0.95
  p = 80, q1 = 30, q2 = 20, q3 = 20, q4 = 10     0.0482   0.0814         1        1
  p = 80, q1 = 3, q2 = 2, q3 = 2, q4 = 73        0.0487   0.2462         0.9987   0.9806
  p = 80, q1 = 3, q2 = 77                        0.0423   0.4877         0.9942   0.8024

Sizes (type I error rates) are estimated from 10,000 simulations from Np(0, Ip). Powers are
estimated under the alternative Σ = 0.15Jp + 0.85Ip, where Jp is the p × p matrix with all
entries equal to 1.
[Figure 6.1 collects, for the settings of Table 6.1, paired density histograms: panels
titled "CLT of p = 10" (histogram of log Wn) and "Approximation of Chi-square" for the
first setting, and panels "Histogram of z" (the normal-approximation statistic, against
the standard normal curve) and "Histogram of k" (the chi-square statistic, against its
limiting chi-square curve) for the remaining settings.]

Figure 6.1: Histograms of CLT and chi-square Approximation
References
[1] Ahlfors, L. V. (1979). Complex Analysis, 3rd Ed. McGraw-Hill.
[2] Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd
Ed. John Wiley & Sons.
[3] Bai, Z., Jiang, D., Yao, J. and Zheng, S. (2009). Corrections to LRT on large-
dimensional covariance matrix by RMT. Ann. Statist. 37, 3822-3840.
[4] Chen, S., Zhang, L. and Zhong, P. (2010). Tests for high-dimensional covariance
matrices. J. Amer. Statist. Assoc. 105, 810-819.
[5] Ledoit, O. and Wolf, M. (2002). Some hypothesis tests for the covariance matrix when
the dimension is large compared to the sample size. Ann. Statist. 30, 1081-1102.
[6] Jiang, D., Jiang, T. and Yang, F. (2012). Likelihood ratio tests for covariance
matrices of high-dimensional normal distributions. J. Statist. Plann. Inference 142,
2241-2256.
[7] Jiang, T. and Yang, F. (2013). Central limit theorems for classical likelihood ratio
tests for high-dimensional normal distributions. Ann. Statist. 41, 2029-2074.
[8] Jiang, T. and Qi, Y. (2013). Central limit theorems for classical likelihood ratio
tests for high-dimensional normal distributions II. Preprint.
[9] Morrison, D. F. (2004). Multivariate Statistical Methods, 4th Ed. Duxbury Press.
[10] Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New
York.
[11] Schott, J. R. (2007). A test for the equality of covariance matrices when the dimen-
sion is large relative to the sample sizes. Comput. Statist. Data Anal. 51, 6535-6542.
[12] Schott, J. R. (2005). Testing for complete independence in high dimensions.
Biometrika 92, 951-956.
[13] Schott, J. R. (2001). Some tests for the equality of covariance matrices. J. Statist.
Plann. Inference 94, 25-36.
[14] Wilks, S. S. (1935). On the independence of k sets of normally distributed statistical
variables. Econometrica 3, 309-326.