
METRON - International Journal of Statistics, 2010, vol. LXVIII, n. 3, pp. 235-250

CHRISTOPHE LEY – DAVY PAINDAVEINE

On Fisher information matrices and profile log-likelihood functions in generalized skew-elliptical models

Summary - In recent years, the skew-normal models introduced in Azzalini (1985) have enjoyed an amazing success, although an important literature has reported that they exhibit, in the vicinity of symmetry, singular Fisher information matrices and stationary points in the profile log-likelihood function for skewness, with the usual unpleasant consequences for inference. For general multivariate skew-symmetric and skew-elliptical models, the open problem of determining which symmetric kernels lead to each such singularity has been solved in Ley and Paindaveine (2010). In the present paper, we provide a simple proof that, in generalized skew-elliptical models involving the same skewing scheme as in the skew-normal distributions, Fisher information matrices, in the vicinity of symmetry, are singular for Gaussian kernels only. Then we show that if the profile log-likelihood function for skewness always has a point of inflection in the vicinity of symmetry, the generalized skew-elliptical distribution considered is actually skew-(multi)normal. In addition, we show that the class of multivariate skew-t distributions (as defined in Azzalini and Capitanio 2003), which was not covered by Ley and Paindaveine (2010), does not suffer from singular Fisher information matrices in the vicinity of symmetry. Finally, we briefly discuss the implications of our results on inference.

Key Words - Generalized skew-elliptical distributions; Profile likelihood; Singular Fisher information matrix; Skewness; Skew-normal distributions; Skew-t distributions.

The first author was supported by a Mandat d'Aspirant of the Fonds National de la Recherche Scientifique, Communauté française de Belgique. The second author was supported by an A.R.C. contract of the Communauté française de Belgique. Davy Paindaveine is also member of ECORE, the association between CORE and ECARES.

Received July 2009 and revised December 2009.

1. Introduction

Azzalini (1985) introduced the so-called skew-normal model, which embeds the univariate normal distributions into a flexible parametric class of (possibly) skewed distributions. More formally, a random variable X is said to be skew-normal with location parameter μ ∈ R, scale parameter σ ∈ R_0^+ and skewness parameter λ ∈ R if it admits the pdf

x ↦ 2σ^{-1} φ((x − μ)/σ) Φ(λ(x − μ)/σ),   x ∈ R,   (1.1)

where φ and Φ respectively denote the probability density function (pdf) and cumulative distribution function (cdf) of the standard normal distribution. An intensive study of these distributions has revealed some nice stochastic properties, which led to numerous generalizations of that univariate model. To cite a few, Azzalini and Dalla Valle (1996) introduced the multivariate skew-normal distribution by replacing the normal kernel φ with its k-variate extension, Branco and Dey (2001) and Azzalini and Capitanio (2003) defined multivariate skew-t distributions, while Azzalini and Capitanio (1999) and Branco and Dey (2001) proposed skew-elliptical distributions based on elliptically symmetric kernels. Azzalini and Capitanio (2003) also established a link between the distinct constructions of skew-elliptical distributions, and extended the latter into a more general class of skewed distributions containing all the pre-cited examples. This generalization is very much in the spirit of the so-called generalized skew-elliptical distributions analyzed in Genton and Loperfido (2005), associated with pdfs of the form

x ↦ 2|Σ|^{-1/2} f(Σ^{-1/2}(x − μ)) Π(Σ^{-1/2}(x − μ)),   x ∈ R^k,   (1.2)

where μ ∈ R^k is a location parameter, Σ is a scatter parameter belonging to the class S_k of symmetric and positive definite k × k matrices (throughout, |A| stands for the determinant of A, and A^{1/2}, for A ∈ S_k, denotes the symmetric square root of A, although any other square root could be used), f (the symmetric kernel) is a spherically symmetric pdf, and where the mapping Π : R^k → [0, 1] satisfies Π(−x) = 1 − Π(x) for all x ∈ R^k. The densities in (1.2) in turn can be generalized by relaxing the spherical symmetry assumption on f into a weaker central symmetry one (under which f(−x) = f(x) for all x ∈ R^k); see Wang et al. (2004). For further information about models of skewed distributions and related topics, we refer the reader to the review paper Azzalini (2005).

Besides the quite appealing stochastic properties, the skew-normal model, as already noticed in Azzalini (1985), suffers from two closely related inferential problems: at λ = 0, corresponding to the symmetric situation, (i) the profile log-likelihood function for λ always admits a stationary point, and consequently, (ii) the Fisher information matrix for the three parameters in (1.1) is singular. These two unpleasant features are treated in, e.g., Pewsey (2000) or Azzalini and Genton (2008). Most authors show that, if a normal kernel is used, in some specific class of skewed densities containing those of (1.1), the problems in (i) and (ii) occur. However, from an inferential point of view, it is more important to derive an "iff" result, stating the exact conditions under which such singularities happen. Some "iff" results have been provided in parametric subclasses such as the skew-exponential one (DiCiccio and Monti 2004) or the skew-t one (Gomez et al. 2007, DiCiccio and Monti 2011), until Ley and Paindaveine (2010) determined the collection of symmetric kernels bringing such singularities in broad semiparametric classes of multivariate skew-symmetric distributions.

In this paper, we first provide a simple and direct proof that, in generalized skew-elliptical models involving the same skewing scheme as in the skew-normal model, Fisher information matrices, in the vicinity of symmetry, are singular for Gaussian kernels only (Section 2). Then, after discussing the behavior of the first derivative of the profile log-likelihood function for skewness, we investigate the second derivative of that function; more precisely, we show that, if the profile log-likelihood function has a systematic point of inflection in the vicinity of symmetry ("systematic" here meaning for any sample of any sample size), then the generalized skew-elliptical distribution considered is actually skew-(multi)normal (Section 3). Still within the class of generalized skew-elliptical distributions, we then turn our attention towards the multivariate skew-t distributions as defined in Azzalini and Capitanio (2003); these distributions do not belong to the class of densities considered in Ley and Paindaveine (2010), but are the commonly used form of skew-t distributions in the literature. We show (Section 4) that the "iff" result from DiCiccio and Monti (2011) concerning the aforementioned singularities actually extends from the univariate to the multivariate setup. Finally, we briefly discuss the impact of our results on inference (Section 5).

2. On the singularity of Fisher information matrices

As mentioned in the Introduction, there exist many distinct generalizations of the skew-normal distributions described in Azzalini (1985). We now focus on generalized skew-elliptical distributions involving the same skewing scheme as in the skew-normal model. More precisely, within the class of densities of type (1.2), we consider densities of the form

x ↦ f^Π_{g;μ,Σ,λ}(x) := 2|Σ|^{-1/2} g(‖Σ^{-1/2}(x − μ)‖) Π(λ′Σ^{-1/2}(x − μ)),   x ∈ R^k,   (2.3)

where μ ∈ R^k is a location parameter, Σ ∈ S_k a scatter parameter, λ ∈ R^k a skewness parameter, where the radial density g : R_0^+ → R^+ satisfies (a) ∫_{R^k} g(‖x‖) dx = 1 and (b) ∫_{R^k} ‖x‖^2 g(‖x‖) dx = k, and where the skewing function Π is described in Assumption (A) below. Condition (b) allows one to identify the elliptically symmetric part (obtained for λ = 0) of (2.3); on the one hand, it guarantees that Σ and g are identifiable, and on the other hand it ensures that the (multi)normal case is associated with the Gaussian kernel g(r) := c_k exp(−r^2/2), where c_k := (2π)^{-k/2} is the normalizing constant determined by Condition (a). The functions g and Π further need to satisfy

Assumption (A). (i) The radial density g : R_0^+ → R^+ belongs to the collection G of a.e. positive, continuously differentiable functions for which Conditions (a)-(b) above hold and for which, letting ϕ_g := −g′/g, ∫_{R^k} ϕ_g^2(‖x‖) g(‖x‖) dx and ∫_{R^k} ‖x‖^2 ϕ_g^2(‖x‖) g(‖x‖) dx are finite. (ii) The skewing function Π : R → [0, 1] is a continuously differentiable function that satisfies Π(−x) = 1 − Π(x) for all x ∈ R, and Π′(0) ≠ 0.
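To fix ideas, here is a minimal numerical sketch of a density of type (2.3); it assumes NumPy and SciPy are available, and the function names (gse_pdf, g_gauss) are ours, purely for illustration. With the Gaussian kernel g(r) = c_k exp(−r^2/2) and Π = Φ, it reduces to the multivariate skew-normal density underlying (1.1).

import numpy as np
from scipy.stats import norm

def gse_pdf(x, mu, Sigma, lam, g, Pi):
    # generalized skew-elliptical density (2.3):
    # 2 |Sigma|^{-1/2} g(||Sigma^{-1/2}(x - mu)||) Pi(lam' Sigma^{-1/2}(x - mu))
    x, mu, Sigma, lam = (np.asarray(a, dtype=float) for a in (x, mu, Sigma, lam))
    vals, vecs = np.linalg.eigh(Sigma)                      # symmetric square root of Sigma
    Sigma_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    z = Sigma_inv_sqrt @ (x - mu)
    return 2.0 / np.sqrt(np.linalg.det(Sigma)) * g(np.linalg.norm(z)) * Pi(lam @ z)

# Gaussian kernel and Pi = Phi (standard normal cdf) give the skew-(multi)normal case
k = 2
c_k = (2 * np.pi) ** (-k / 2)
g_gauss = lambda r: c_k * np.exp(-r ** 2 / 2)
value = gse_pdf([0.5, -0.2], np.zeros(k), np.eye(k), [1.0, 0.0], g_gauss, norm.cdf)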

We adopt the following notation. Let vec Σ denote the k^2-vector obtained by stacking the columns of Σ on top of each other, and vech Σ be the k(k+1)/2-subvector of vec Σ where only upper diagonal entries in Σ are considered (in order to avoid heavy notations, we will write ∇_Σ instead of ∇_{vech Σ} in what follows). Define Σ^{⊗2} := Σ ⊗ Σ, where ⊗ stands for the usual Kronecker product, and let I_ℓ and K_k be the ℓ-dimensional identity matrix and the k^2 × k^2 commutation matrix, respectively. Finally, put J_k := (vec I_k)(vec I_k)′ and let P_k be the matrix such that P_k′(vech A) = vec A for any k × k symmetric matrix A.

Under Assumption (A), the scores for μ, vech Σ and λ, in the vicinity of symmetry (that is, at λ = 0), are the quantities d_{g;μ}(x), d_{g;Σ}(x) and d_{g;λ}(x), respectively, in

ℓ_{g;μ,Σ,λ}(x) := ( d_{g;μ}(x)′, d_{g;Σ}(x)′, d_{g;λ}(x)′ )′,

with

d_{g;μ}(x) := [ϕ_g(‖Σ^{-1/2}(x − μ)‖) / ‖Σ^{-1/2}(x − μ)‖] Σ^{-1}(x − μ),

d_{g;Σ}(x) := (1/2) P_k (Σ^{⊗2})^{-1/2} vec( [ϕ_g(‖Σ^{-1/2}(x − μ)‖) / ‖Σ^{-1/2}(x − μ)‖] Σ^{-1/2}(x − μ)(x − μ)′ Σ^{-1/2} − I_k ),

d_{g;λ}(x) := 2Π′(0) Σ^{-1/2}(x − μ),

where the factor 2 in the λ-score follows from the fact that Π(0) = 1/2. The corresponding Fisher information matrix is then given by

Γ_g := ∫_{R^k} ℓ_{g;μ,Σ,λ}(x) ℓ_{g;μ,Σ,λ}′(x) f^Π_{g;μ,Σ,0}(x) dx,

a matrix that naturally partitions into

Γ_g = [ Γ_{g;μμ}    0           Γ_{g;μλ}
        0           Γ_{g;ΣΣ}    0
        Γ_{g;λμ}    0           Γ_{g;λλ} ],

with

Γ_{g;μμ} := (1/k) E[ϕ_g^2(‖Z‖)] Σ^{-1},   Γ_{g;μλ} := 2Π′(0) Σ^{-1/2} =: Γ_{g;λμ},

Γ_{g;λλ} := (4/k) (Π′(0))^2 E[‖Z‖^2] I_k,

and

Γ_{g;ΣΣ} := (1/4) P_k (Σ^{⊗2})^{-1/2} { (k(k+2))^{-1} E[‖Z‖^2 ϕ_g^2(‖Z‖)] (I_{k^2} + K_k + J_k) − J_k } (Σ^{⊗2})^{-1/2} P_k′;

here, Z stands for a random vector with the same distribution as Σ^{-1/2}(X − μ), where X has pdf f^Π_{g;μ,Σ,0}. The zero blocks in Γ_g can easily be obtained by noticing that the score in vech Σ is symmetric with respect to x − μ, while the scores in μ and λ are anti-symmetric with respect to the same quantity.
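For intuition, the (μ, λ)-blocks can be estimated by straightforward Monte Carlo; the sketch below (our own illustration, assuming NumPy) does so at μ = 0 and Σ = I_k for the Gaussian kernel, for which ϕ_g(r) = r and, with Π = Φ, Π′(0) = φ(0). The estimated submatrix built from the μ- and λ-scores then has determinant numerically equal to zero, anticipating Theorem 2.1 below.

import numpy as np

rng = np.random.default_rng(0)
k, n = 2, 500_000
Z = rng.standard_normal((n, k))          # spherical draws from the Gaussian kernel (mu = 0, Sigma = I_k)

phi_g = lambda r: r                      # -g'/g for the Gaussian kernel g(r) = c_k exp(-r^2/2)
Pi_prime_0 = 1 / np.sqrt(2 * np.pi)      # Pi = Phi, so Pi'(0) = phi(0)

r = np.linalg.norm(Z, axis=1)
d_mu = (phi_g(r) / r)[:, None] * Z       # mu-score at (mu, Sigma, lambda) = (0, I_k, 0)
d_lam = 2 * Pi_prime_0 * Z               # lambda-score at the same parameter value

scores = np.hstack([d_mu, d_lam])        # stacked (mu, lambda)-scores
Gamma_sub = scores.T @ scores / n        # Monte Carlo estimate of the (mu, lambda)-block of Gamma_g
print(np.linalg.det(Gamma_sub))          # ~ 0: the Gaussian kernel makes this block singular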

The expression of Γ_g above enables us to state the following result.

Theorem 2.1. Let Assumption (A) hold. Then, Γ_g is singular iff g is the Gaussian kernel (i.e., g(r) = c_k exp(−r^2/2) for all r > 0).

We here propose the following simple and direct proof of this result.

Proof of Theorem 2.1. Since Γ_{g;ΣΣ} is invertible (see, e.g., Hallin and Paindaveine 2006), Γ_g is singular iff

Γ_g^{sub} := [ Γ_{g;μμ}   Γ_{g;μλ}
               Γ_{g;λμ}   Γ_{g;λλ} ]

is singular. Now, since Γ_{g;μμ} = (1/k) E[ϕ_g^2(‖Z‖)] Σ^{-1} is invertible, we have that |Γ_g^{sub}| = |Γ_{g;μμ}| |Γ_{g;λλ} − Γ_{g;λμ} Γ_{g;μμ}^{-1} Γ_{g;μλ}|. Hence, Γ_g is singular iff Γ_{g;λλ} − Γ_{g;λμ} Γ_{g;μμ}^{-1} Γ_{g;μλ} = 4(Π′(0))^2 (E[‖Z‖^2] E[ϕ_g^2(‖Z‖)] − k^2) I_k / (k E[ϕ_g^2(‖Z‖)]) is singular, which, under Assumption (A), is the case iff

E[‖Z‖^2] E[ϕ_g^2(‖Z‖)] = k^2.   (2.4)

Now, by applying the Cauchy-Schwarz inequality and by integrating by parts, we obtain

(E[‖Z‖^2] E[ϕ_g^2(‖Z‖)])^{1/2} ≥ E[‖Z‖ ϕ_g(‖Z‖)] = ( ∫_0^∞ r^{k-1} g(r) dr )^{-1} ( ∫_0^∞ r ϕ_g(r) r^{k-1} g(r) dr ) = k.

Equation (2.4) indicates that the Cauchy-Schwarz inequality here is an equality, which implies that there exists some real number η such that ϕ_g(r) = ηr for all r > 0. Solving the latter differential equation yields g(r) = γ exp(−ηr^2/2). Since ∫_{R^k} g(‖x‖) dx = 1 and ∫_{R^k} ‖x‖^2 g(‖x‖) dx = k, we must have γ = c_k and η = 1, which establishes the result.
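The integration-by-parts identity E[‖Z‖ ϕ_g(‖Z‖)] = k used above holds for any admissible radial density and is easy to check numerically; the following sketch (ours, assuming SciPy) verifies it by quadrature for the Student radial density g_{k,ν} that appears in Section 4, using that the normalizing constant cancels in the ratio.

import numpy as np
from scipy.integrate import quad

k, nu = 3, 5.0
g = lambda r: (1 + r ** 2 / nu) ** (-(nu + k) / 2)    # Student radial density, up to a constant
phi_g = lambda r: (nu + k) * r / (nu + r ** 2)        # phi_g = -g'/g

num, _ = quad(lambda r: r * phi_g(r) * r ** (k - 1) * g(r), 0, np.inf)
den, _ = quad(lambda r: r ** (k - 1) * g(r), 0, np.inf)
print(num / den)   # = E[||Z|| phi_g(||Z||)] = k (here 3), as integration by parts predicts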

Theorem 2.1 states that, in the semiparametric class of generalized skew-elliptical densities of type (2.3), only the skew-(multi)normal parametric submodel may suffer from the inferential problems usually associated with singular Fisher information matrices. We refer to Ley and Paindaveine (2010) for a discussion of such issues (see also Section 5 below), as well as for possible extensions of Theorem 2.1 to the skew-symmetric setup. Here, we only stress that the simple proof above can be straightforwardly adapted to the setup where Σ, in the argument of the mapping Π in (2.3), is replaced with the identity matrix I_k (Genton and Loperfido 2005 also consider generalized skew-elliptical distributions of this form).

3. On the profile log-likelihood function for skewness

For a given random sample X^(n) := (X_1, . . . , X_n) from (2.3), we define the profile log-likelihood function for the skewness parameter λ as

λ ↦ L^Π_{g;λ}(X^(n)) := sup_{μ∈R^k, Σ∈S_k} L^Π_{g;μ,Σ,λ}(X^(n)),   λ ∈ R^k,   (3.5)

where L^Π_{g;μ,Σ,λ}(X^(n)) := ∑_{i=1}^n log f^Π_{g;μ,Σ,λ}(X_i) is the standard log-likelihood function associated with X^(n). This expression can be rewritten under the more tractable form L^Π_{g;λ}(X^(n)) = L^Π_{g;μ̂_g(λ),Σ̂_g(λ),λ}(X^(n)), where μ̂_g(λ) and Σ̂_g(λ) stand for the MLEs of μ and Σ at fixed λ. This equality implies that ∇_λ L^Π_{g;λ}(X^(n)) = ∇_λ L^Π_{g;μ,Σ,λ}(X^(n)) |_{(μ,Σ,λ)=(μ̂_g(λ),Σ̂_g(λ),λ)} (see Ley and Paindaveine 2010 for explicit calculations, or Barndorff-Nielsen and Cox 1994 for general results on profile log-likelihood functions). Consequently, a necessary and sufficient condition for the profile log-likelihood function to always admit a stationary point at λ = 0 is that

∇_λ L^Π_{g;λ}(X^(n)) |_{λ=0} = 2Π′(0) (Σ̂_g(0))^{-1/2} ∑_{i=1}^n (X_i − μ̂_g(0)) = 0

for any sample X^(n). Since Σ̂_g(0) ∈ S_k, hence is invertible, this means that μ̂_g(0), the MLE for the location parameter μ at λ = 0, must coincide, for any X^(n), with X̄^(n) := (1/n) ∑_{i=1}^n X_i. The result in Theorem 3.1 below — which is in line with the statement of Theorem 2.1 — thus directly follows from a well-known characterization property of (multi)normal distributions which can be traced back to Gauss (see Azzalini and Genton 2007 for a recent account).

Theorem 3.1. Let Assumption (A) hold. Then, the profile log-likelihood function for skewness in (3.5) admits, for any sample X^(n) of size n ≥ 3, a stationary point at λ = 0 iff g is the Gaussian kernel.
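In the Gaussian case, the stationary point is immediate to verify numerically: with g the Gaussian kernel and Π = Φ, the MLEs at λ = 0 are the sample mean and standard deviation, and the λ-gradient displayed above vanishes identically. A small univariate sketch (ours, assuming NumPy and SciPy):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=50)         # any sample will do; the argument is sample-free

# at lambda = 0 the model is N(mu, sigma^2), so the MLEs at fixed lambda = 0 are
mu_hat, sigma_hat = x.mean(), x.std()

# lambda-derivative of the log-likelihood at (mu_hat, sigma_hat, 0):
# d/d_lambda log Phi(lambda * z_i) |_{lambda = 0} = 2 phi(0) z_i, with z_i = (x_i - mu_hat) / sigma_hat
z = (x - mu_hat) / sigma_hat
grad_lambda = np.sum(2 * norm.pdf(0) * z)
print(grad_lambda)                        # 0 up to rounding, since mu_hat is the sample mean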

Although Theorem 3.1 only considers the first derivative of the profile log-likelihood function, it also shows that, in the class of generalized skew-elliptical densities of type (2.3), only the skew-(multi)normal ones may exhibit a systematic saddle point at λ = 0 in the profile log-likelihood function for skewness (since a saddle point is in particular a stationary point). In contrast with this, we now investigate the conditions under which the profile log-likelihood function for skewness has a systematic point of inflection at λ = 0, without requiring anything on the first derivative (hence, in particular, without requiring that this point of inflection is a saddle point).

To avoid any confusion, we will say that x_0 ∈ R^k is a point of inflection for h : R^k → R (of class C^1) if there is no neighborhood of (x_0′, h(x_0))′ ∈ R^{k+1} in which the graph G_h of h is fully contained in one of the two (closed) halfspaces determined by the hyperplane tangent to G_h at (x_0′, h(x_0))′. Defining Assumption (A′) as the reinforcement of Assumption (A) obtained by further requiring that both g and Π are of class C^2 (which clearly implies that Π′′(0) = 0), we have the following result.

Theorem 3.2. Let Assumption (A′) hold. Then, if the profile log-likelihood function for skewness in (3.5) admits a point of inflection at λ = 0 for any sample X^(n), g is the Gaussian kernel.

Proof of Theorem 3.2. First note that, under Assumption (A′), the profile log-likelihood function for skewness is of class C^2. Hence, if the latter has a systematic point of inflection at λ = 0, there exists a non-zero k-vector v such that

v′ [ (∇_λ∇_λ′) L^Π_{g;λ}(X^(n)) |_{λ=0} ] v = 0   (3.6)

for any sample X^(n); throughout, we denote by (∇_a∇_b′) h_{a,b}(·) the matrix whose entry (i, j) is given by ∂^2 h_{a,b}(·) / ∂a_i ∂b_j. By using the main result in Patefield (1977), we can write

(∇_λ∇_λ′) L^Π_{g;λ}(X^(n)) = [ (∇_λ∇_λ′) L^Π_{g;α,λ}(X^(n)) − (∇_λ∇_α′) L^Π_{g;α,λ}(X^(n)) [ (∇_α∇_α′) L^Π_{g;α,λ}(X^(n)) ]^{-1} (∇_α∇_λ′) L^Π_{g;α,λ}(X^(n)) ] |_{(α,λ)=(α̂_g(λ),λ)},

where α = (μ′, (vech Σ)′)′ and α̂_g(λ) stands for the MLE of α at fixed λ. By (3.6), we then have

v′ { [ (1/n) (∇_λ∇_λ′) L^Π_{g;α,λ}(X^(n)) − (1/n) (∇_λ∇_α′) L^Π_{g;α,λ}(X^(n)) [ (1/n) (∇_α∇_α′) L^Π_{g;α,λ}(X^(n)) ]^{-1} (1/n) (∇_α∇_λ′) L^Π_{g;α,λ}(X^(n)) ] |_{(α,λ)=(α̂_g(0),0)} } v = 0   (3.7)

for any sample X^(n). Now, writing M_g(x) := (∇_x∇_x′) log g(‖x‖), direct computations show that

(∇_λ∇_μ′) L^Π_{g;μ,Σ,λ}(X^(n)) |_{λ=0} = −2n Π′(0) Σ^{-1/2},

(∇_μ∇_μ′) L^Π_{g;μ,Σ,λ}(X^(n)) |_{λ=0} = ∑_{i=1}^n Σ^{-1/2} M_g(Σ^{-1/2}(X_i − μ)) Σ^{-1/2},

and

(∇_λ∇_λ′) L^Π_{g;μ,Σ,λ}(X^(n)) |_{λ=0} = −4(Π′(0))^2 Σ^{-1/2} [ ∑_{i=1}^n (X_i − μ)(X_i − μ)′ ] Σ^{-1/2},

whereas it is easy to show (by using the expression for Γ_{n;ξ,f_1} on page 2246 of Paindaveine 2008) that, for some mappings N_{g;Σ} and P_{g;Σ} satisfying E[N_{g;Σ}(X − μ)] = 0 and E[P_{g;Σ}(X − μ)] = 0 if X admits the pdf f^Π_{g;μ,Σ,0},

(∇_μ∇_Σ′) L^Π_{g;μ,Σ,λ}(X^(n)) |_{λ=0} = ∑_{i=1}^n N_{g;Σ}(X_i − μ),

and

(∇_λ∇_Σ′) L^Π_{g;μ,Σ,λ}(X^(n)) |_{λ=0} = ∑_{i=1}^n P_{g;Σ}(X_i − μ).

Since (1/n) (∇_μ∇_Σ′) L^Π_{g;μ,Σ,λ}(X^(n)) |_{λ=0} and (1/n) (∇_λ∇_Σ′) L^Π_{g;μ,Σ,λ}(X^(n)) |_{λ=0} converge to zero in probability as n → ∞, substituting in (3.7) and taking the limit in probability as n → ∞ yields

−4(Π′(0))^2 v′ ( Σ^{-1/2} E[(X − μ)(X − μ)′] Σ^{-1/2} + (E[M_g(Σ^{-1/2}(X − μ))])^{-1} ) v = 0.

Dividing by −4(Π′(0))^2, letting, as before, Z stand for a random vector with the same distribution as Σ^{-1/2}(X − μ), where X has pdf f^Π_{g;μ,Σ,0}, and integrating by parts in the second term, yields

v′ ( (1/k) E[‖Z‖^2] I_k + ( −(1/k) E[ϕ_g^2(‖Z‖)] I_k )^{-1} ) v = [ (E[‖Z‖^2] E[ϕ_g^2(‖Z‖)] − k^2) / (k E[ϕ_g^2(‖Z‖)]) ] ‖v‖^2 = 0.

Since v is a non-zero vector, we must have that E[‖Z‖^2] E[ϕ_g^2(‖Z‖)] = k^2, which coincides with (2.4). We therefore obtain, in the same way as in the proof of Theorem 2.1, that g must be the Gaussian kernel, which concludes the proof.
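The only analytical ingredient in the proof that may be less familiar is Patefield's (1977) identity, which expresses the Hessian of a profile log-likelihood through a Schur complement of the full Hessian. A minimal check of that identity on a toy quadratic log-likelihood (our own illustration, assuming NumPy; it is not tied to the skew-elliptical setting) reads as follows.

import numpy as np

rng = np.random.default_rng(0)
p, q = 3, 2                                   # dim(alpha) = p nuisance parameters, dim(lambda) = q
A = rng.standard_normal((p + q, p + q))
H = A @ A.T + (p + q) * np.eye(p + q)         # positive definite Hessian of -L
L = lambda a, l: -0.5 * np.concatenate([a, l]) @ H @ np.concatenate([a, l])

H_aa, H_al, H_ll = H[:p, :p], H[:p, p:], H[p:, p:]

def profile(l):
    # alpha-hat(lambda) maximizes L at fixed lambda: alpha-hat = -H_aa^{-1} H_al lambda
    return L(-np.linalg.solve(H_aa, H_al @ l), l)

# numerical Hessian of the profile at lambda = 0 vs. the Schur-complement formula
eps, E = 1e-4, np.eye(q)
num = np.array([[(profile(eps * (E[i] + E[j])) - profile(eps * (E[i] - E[j]))
                  - profile(eps * (E[j] - E[i])) + profile(-eps * (E[i] + E[j]))) / (4 * eps ** 2)
                 for j in range(q)] for i in range(q)])
print(np.allclose(num, -(H_ll - H_al.T @ np.linalg.solve(H_aa, H_al)), atol=1e-4))   # True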

This result shows that, in the class of generalized skew-elliptical densities of type (2.3), a systematic point of inflection (at λ = 0) again can happen at skew-(multi)normal densities only. Similarly to the previous section, the theorem remains valid if Σ is replaced with I_k in the argument of Π in (2.3). The proof is actually easier in this setup since (∇_λ∇_Σ′) L^Π_{g;μ,Σ,λ}(X^(n)) |_{λ=0} is then exactly zero — and not only an o_P(n) quantity, as in the proof of Theorem 3.2.

Strengthening Theorem 3.2 into an "iff" result would require an investigation of higher-order derivatives of the profile log-likelihood function for skewness. This has been achieved for the univariate case in Chiogna (2005), where it is shown that the profile function, under skew-normal distributions, always admits a point of inflection in the vicinity of symmetry. Hence, the desired "iff" statement is formally established in the scalar case, and should extend to the general multinormal setup. Finally, whereas Theorem 3.1 can be shown to hold in broader multivariate skew-symmetric models (see Ley and Paindaveine 2010 for a precise statement), Theorem 3.2 seems to be limited to generalized skew-elliptical distributions, mainly due to the very subtle interplay, in the multivariate skew-symmetric setup, between the nature of points of inflection and the rank of the corresponding Hessian matrices.

4. Investigating the case of multivariate skew-t distributions

In this section, we study the possible singularity of Fisher information matrices in multivariate skew-t models. It should be noted that the commonly adopted form of those distributions (namely the one from Azzalini and Capitanio 2003) is based on a skewing function that is not of the form Π(λ′·) (see the pdf (4.8) below), which explains why the important case of multivariate skew-t densities does not enter the framework of the previous sections and has to be treated in this separate section.


For any ν > 0 and any positive integer k, consider the mapping

r ↦ g_{k,ν}(r) = [ Γ((ν + k)/2) / ((πν)^{k/2} Γ(ν/2)) ] (1 + r^2/ν)^{-(ν+k)/2},   r > 0,

which is such that x ↦ g_{k,ν}(‖x‖), x ∈ R^k, is the pdf of the k-dimensional standard t distribution with ν degrees of freedom. Also denote by y ↦ T_η(y), y ∈ R, the cdf of the scalar t distribution with η degrees of freedom. Multivariate skew-t densities — as defined by Azzalini and Capitanio (2003) — then have densities of the form

x ↦ f^ST_{μ,Σ,λ,ν}(x) := 2|Σ|^{-1/2} g_{k,ν}(‖Σ^{-1/2}(x − μ)‖) T_{ν+k}( λ′σ^{-1}(x − μ) [ (ν + k) / (‖Σ^{-1/2}(x − μ)‖^2 + ν) ]^{1/2} ),   x ∈ R^k,   (4.8)

where μ ∈ R^k is the location parameter, Σ ∈ S_k the scatter parameter, λ ∈ R^k the skewness parameter, ν > 0 the tail weight parameter, and where σ is the k × k diagonal matrix with diagonal entries σ_{ii} = Σ_{ii}^{1/2}, i = 1, . . . , k. When ν → ∞, (4.8) becomes the density of a k-dimensional skew-normal distribution.
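For concreteness, a direct numerical transcription of (4.8) could look as follows; this is our own sketch, assuming NumPy and SciPy, and the name skew_t_pdf is purely illustrative.

import numpy as np
from scipy.special import gammaln
from scipy.stats import t as student_t

def skew_t_pdf(x, mu, Sigma, lam, nu):
    # multivariate skew-t density in the Azzalini-Capitanio parameterization of (4.8)
    x, mu, Sigma, lam = (np.asarray(a, dtype=float) for a in (x, mu, Sigma, lam))
    k = mu.size
    vals, vecs = np.linalg.eigh(Sigma)
    z = (vecs @ np.diag(vals ** -0.5) @ vecs.T) @ (x - mu)   # Sigma^{-1/2}(x - mu)
    Q = z @ z                                                # squared radius
    # radial density g_{k,nu} evaluated at ||z|| (in log form)
    log_g = (gammaln((nu + k) / 2) - gammaln(nu / 2)
             - (k / 2) * np.log(np.pi * nu) - ((nu + k) / 2) * np.log1p(Q / nu))
    sigma_diag = np.sqrt(np.diag(Sigma))                     # sigma_ii = Sigma_ii^{1/2}
    arg = (lam / sigma_diag) @ (x - mu) * np.sqrt((nu + k) / (Q + nu))
    return 2.0 / np.sqrt(np.linalg.det(Sigma)) * np.exp(log_g) * student_t.cdf(arg, df=nu + k)

# example call at a single point in dimension k = 2
val = skew_t_pdf([0.3, -0.1], [0.0, 0.0], np.eye(2), [2.0, 1.0], nu=5.0)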

Azzalini and Genton (2008) provided evidence that, for any finite value of ν, the univariate skew-t distributions do not suffer, in the vicinity of symmetry, from a singular Fisher information matrix — and hence not from a systematic stationary point in the profile log-likelihood function for skewness. DiCiccio and Monti (2011) formally proved that statement. As for the multivariate setup, Azzalini and Genton (2008) showed that those singularities in the vicinity of symmetry also appear in multivariate skew-normal distributions, and conjectured that, as in the univariate case, no member from the multivariate skew-t class should suffer from those problems. We now prove that their conjecture actually holds.

For multivariate skew-t distributions with fixed ν value, the score for (μ′, (vech Σ)′, λ′)′, in the vicinity of symmetry, is given by

ℓ^ST_{g_{k,ν};μ,Σ,λ}(x) := ( d^ST_{g_{k,ν};μ}(x)′, d^ST_{g_{k,ν};Σ}(x)′, d^ST_{g_{k,ν};λ}(x)′ )′ = ( d_{g_{k,ν};μ}(x)′, d_{g_{k,ν};Σ}(x)′, d^ST_{g_{k,ν};λ}(x)′ )′,

with

d_{g_{k,ν};μ}(x) = [ (1 + k/ν) / (1 + ‖Σ^{-1/2}(x − μ)‖^2/ν) ] Σ^{-1}(x − μ),

d_{g_{k,ν};Σ}(x) = (1/2) P_k (Σ^{⊗2})^{-1/2} vec( [ (1 + k/ν) / (1 + ‖Σ^{-1/2}(x − μ)‖^2/ν) ] Σ^{-1/2}(x − μ)(x − μ)′ Σ^{-1/2} − I_k ),

d^ST_{g_{k,ν};λ}(x) = 2 t_{ν+k}(0) σ^{-1}(x − μ) [ (1 + k/ν) / (1 + ‖Σ^{-1/2}(x − μ)‖^2/ν) ]^{1/2},   (4.9)

where t_{ν+k}(0) stands for the derivative of y ↦ T_{ν+k}(y) at 0; the superscript ST makes the notation somewhat heavy, but actually stresses the fact that ℓ_{g_{k,ν};μ,Σ,λ}(x) (that is, the score associated with the g = g_{k,ν} version of the ℓ_{g;μ,Σ,λ}(x) score from Section 2) and ℓ^ST_{g_{k,ν};μ,Σ,λ}(x) do not coincide, due to the fact that the two skewing mechanisms are of a different nature. As suggested in (4.9), they differ only through the λ-part of the scores, though.

Now, note that the new λ-score d^ST_{g_{k,ν};λ}(x) remains an anti-symmetric function of x − μ. Hence, the symmetry properties, still with respect to x − μ, of d^ST_{g_{k,ν};μ}(x) = d_{g_{k,ν};μ}(x) and d^ST_{g_{k,ν};Σ}(x) = d_{g_{k,ν};Σ}(x) entail that the resulting Fisher information matrix takes the form

Γ^ST_{g_{k,ν}} = [ Γ_{g_{k,ν};μμ}      0                  Γ^ST_{g_{k,ν};μλ}
                   0                   Γ_{g_{k,ν};ΣΣ}     0
                   Γ^ST_{g_{k,ν};λμ}   0                  Γ^ST_{g_{k,ν};λλ} ],

with the same quantities Γ_{g_{k,ν};μμ} and Γ_{g_{k,ν};ΣΣ} as in Section 2 (here evaluated at the radial density g_{k,ν}). Note that the finiteness of Γ^ST_{g_{k,ν}} requires that ν > 2, a condition that guarantees that the parent elliptically symmetric t distribution has finite second-order moments; see Assumption (A).

We can now state the following result.

Theorem 4.1. Fix ν ∈ (2, ∞) and an arbitrary positive integer k. Then, at any parameter value (μ′, (vech Σ)′, 0′)′ ∈ R^k × R^{k(k+1)/2} × {0}, the Fisher information matrix associated with the fixed-ν subclass of multivariate skew-t densities in (4.8), namely Γ^ST_{g_{k,ν}}, is non-singular.

Proof of Theorem 4.1. First note that, for ν > 2, the radial density g_{k,ν} satisfies the g-part of Assumption (A), which entails that (see Hallin and Paindaveine 2006) Γ_{g_{k,ν};μμ} and Γ_{g_{k,ν};ΣΣ} are invertible. It is therefore sufficient to show that the (μ, λ)-submatrix of Γ^ST_{g_{k,ν}} is non-singular. Denoting by Z a k-variate random vector with the same distribution as Σ^{-1/2}(X − μ), where X admits the pdf f^ST_{μ,Σ,0,ν}, we easily obtain that

Γ_{g_{k,ν};μμ} = (1 + k/ν)^2 Σ^{-1/2} E[ Z Z′ / (1 + ‖Z‖^2/ν)^2 ] Σ^{-1/2},

Γ^ST_{g_{k,ν};λλ} = 4 (t_{ν+k}(0))^2 (1 + k/ν) σ^{-1} Σ^{1/2} E[ Z Z′ / (1 + ‖Z‖^2/ν) ] Σ^{1/2} σ^{-1},

and

Γ^ST_{g_{k,ν};μλ} = 2 t_{ν+k}(0) (1 + k/ν)^{3/2} Σ^{-1/2} E[ Z Z′ / (1 + ‖Z‖^2/ν)^{3/2} ] Σ^{1/2} σ^{-1} = (Γ^ST_{g_{k,ν};λμ})′.

Since Γ_{g_{k,ν};μμ} is invertible, we have (as in the proof of Theorem 2.1) that the (μ, λ)-submatrix of Γ^ST_{g_{k,ν}} is non-singular iff

Γ^ST_{g_{k,ν};λλ.μ} := Γ^ST_{g_{k,ν};λλ} − Γ^ST_{g_{k,ν};λμ} Γ_{g_{k,ν};μμ}^{-1} Γ^ST_{g_{k,ν};μλ}

is non-singular. Now, letting

A_{k,ν} := E[ Z Z′ / (1 + ‖Z‖^2/ν) ] − E[ Z Z′ / (1 + ‖Z‖^2/ν)^{3/2} ] ( E[ Z Z′ / (1 + ‖Z‖^2/ν)^2 ] )^{-1} E[ Z Z′ / (1 + ‖Z‖^2/ν)^{3/2} ]

and

b^{(r)}_{k,ν} := E[ Z_1^2 / (1 + ‖Z‖^2/ν)^r ],

simple algebra and the fact that Z has a spherically symmetric distribution yield

Γ^ST_{g_{k,ν};λλ.μ} = 4 (t_{ν+k}(0))^2 (1 + k/ν) σ^{-1} Σ^{1/2} A_{k,ν} Σ^{1/2} σ^{-1}
= (4 / b^{(2)}_{k,ν}) ( b^{(1)}_{k,ν} b^{(2)}_{k,ν} − (b^{(3/2)}_{k,ν})^2 ) (t_{ν+k}(0))^2 (1 + k/ν) σ^{-1} Σ σ^{-1}.

The Cauchy-Schwarz inequality and the fact that ‖Z‖ is an absolutely continuous random variable entail that b^{(1)}_{k,ν} b^{(2)}_{k,ν} − (b^{(3/2)}_{k,ν})^2 > 0. This allows one to conclude, since the invertibility of σ and Σ then guarantees that Γ^ST_{g_{k,ν};λλ.μ} is non-singular.
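The strict inequality b^{(1)}_{k,ν} b^{(2)}_{k,ν} − (b^{(3/2)}_{k,ν})^2 > 0 invoked at the end of the proof is also easy to confirm by simulation; the rough Monte Carlo sketch below (ours, assuming NumPy) does so for k = 3 and ν = 5, drawing Z from the k-variate standard t distribution.

import numpy as np

rng = np.random.default_rng(1)
k, nu, n = 3, 5.0, 200_000

# Z spherical with radial density g_{k,nu}: Z = N(0, I_k) / sqrt(chi^2_nu / nu)
Z = rng.standard_normal((n, k)) / np.sqrt(rng.chisquare(nu, size=(n, 1)) / nu)

def b(r):
    # b^{(r)}_{k,nu} = E[ Z_1^2 / (1 + ||Z||^2/nu)^r ]
    return np.mean(Z[:, 0] ** 2 / (1 + np.sum(Z ** 2, axis=1) / nu) ** r)

print(b(1) * b(2) - b(1.5) ** 2)   # strictly positive, as the Cauchy-Schwarz argument requires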

Since, as ν → ∞, the pdf f^ST_{μ,Σ,λ,ν}(x) in (4.8) tends to the pdf of a skew-(multi)normal density, the resulting Fisher information matrix at λ = 0 becomes singular in the limit. In this sense, Theorem 4.1 requires the fixed value of ν to be finite.

Now, one might argue that, nice as it is, the non-singularity result of Theorem 4.1 is obtained only for fixed values of ν, and that one should also investigate a possible singularity of Fisher information matrices (still, in the vicinity of symmetry) in the class of multivariate skew-t densities indexed by μ, Σ, λ, and ν. It is indeed unrealistic to assume that the tail weight parameter ν is known in practice, so that ν should enter score functions and Fisher information matrices when performing inference for such distributions. The rest of this section therefore deals with the ν-unspecified case.


The parameter ν enters the score function ℓ^ST_{g_{k,ν};μ,Σ,λ}(x) through a further component given by (in the vicinity of symmetry)

d^ST_{g_{k,ν};ν}(x) := c_ν − (1/2) log(1 + ‖Σ^{-1/2}(x − μ)‖^2/ν) + [ (ν + k) / (2ν^2) ] ‖Σ^{-1/2}(x − μ)‖^2 / (1 + ‖Σ^{-1/2}(x − μ)‖^2/ν),   (4.10)

where c_ν := (d/dν) log( Γ((ν + k)/2) / ((πν)^{k/2} Γ(ν/2)) ), which gives rise to a Fisher information matrix of the form

Γ^ST = [ Γ_{g_{k,ν};μμ}      0                   Γ^ST_{g_{k,ν};μλ}   0
         0                   Γ_{g_{k,ν};ΣΣ}      0                   Γ^ST_{g_{k,ν};Σν}
         Γ^ST_{g_{k,ν};λμ}   0                   Γ^ST_{g_{k,ν};λλ}   0
         0                   Γ^ST_{g_{k,ν};νΣ}   0                   Γ^ST_{g_{k,ν};νν} ];

again, the extra zero blocks in Γ^ST follow from the fact that the ν-score in (4.10) is symmetric with respect to x − μ.

Theorem 4.2. Fix an arbitrary positive integer k. Then, at any parameter value (μ′, (vech Σ)′, 0′, ν)′ ∈ R^k × R^{k(k+1)/2} × {0} × (2, ∞), the Fisher information matrix associated with the class of multivariate skew-t densities in (4.8), namely Γ^ST, is non-singular.

Proof of Theorem 4.2. Clearly, it is sufficient to show that the (μ, λ)- and (Σ, ν)-submatrices of Γ^ST are non-singular. Since the (μ, λ)-submatrix was shown to be non-singular in the proof of Theorem 4.1, we may focus on the (Σ, ν)-submatrix. As already stated in the proof of Theorem 4.1, Γ_{g_{k,ν};ΣΣ} is invertible, so that the (Σ, ν)-submatrix is non-singular iff Γ^ST_{g_{k,ν};νν.Σ} := Γ^ST_{g_{k,ν};νν} − Γ^ST_{g_{k,ν};νΣ} Γ_{g_{k,ν};ΣΣ}^{-1} Γ^ST_{g_{k,ν};Σν} is non-singular, that is, iff the latter is non-zero.

Consider the random variable Y := d^ST_{g_{k,ν};ν}(X) − Γ^ST_{g_{k,ν};νΣ} Γ_{g_{k,ν};ΣΣ}^{-1} d_{g_{k,ν};Σ}(X), where X has pdf f^ST_{μ,Σ,0,ν}. Clearly, the expectation and variance of Y are equal to 0 and Γ^ST_{g_{k,ν};νν.Σ}, respectively. If Γ^ST_{g_{k,ν};νν.Σ} = 0, we must therefore have that

d^ST_{g_{k,ν};ν}(x) − Γ^ST_{g_{k,ν};νΣ} Γ_{g_{k,ν};ΣΣ}^{-1} d_{g_{k,ν};Σ}(x) = 0   a.e. over R^k.

This, however, cannot hold, since replacing, in d_{g_{k,ν};Σ}(x) and d^ST_{g_{k,ν};ν}(x), Σ^{-1/2}(x − μ) with sΣ^{-1/2}(x − μ) (s ∈ R) and letting s go to infinity indeed reveals that d_{g_{k,ν};Σ}(x), unlike d^ST_{g_{k,ν};ν}(x), is a bounded function of s. Therefore, we must have that Γ^ST_{g_{k,ν};νν.Σ} > 0, which establishes the result.
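As an empirical companion to Theorem 4.2, one can also evaluate the full information matrix numerically in the univariate case k = 1, here parameterized in (μ, σ, λ, ν) (a smooth reparameterization that does not affect singularity). The sketch below (ours, assuming NumPy and SciPy) differentiates the log-density by finite differences and integrates the outer products of scores by quadrature; the resulting determinant is positive.

import numpy as np
from scipy import integrate
from scipy.stats import t as student_t

def logpdf(x, mu, sigma, lam, nu):
    # univariate version of (4.8): (2/sigma) t_nu(z) T_{nu+1}(lam z sqrt((nu+1)/(z^2+nu)))
    z = (x - mu) / sigma
    return (np.log(2.0) - np.log(sigma) + student_t.logpdf(z, df=nu)
            + student_t.logcdf(lam * z * np.sqrt((nu + 1) / (z ** 2 + nu)), df=nu + 1))

def score(x, theta, eps=1e-5):
    # central finite differences of the log-density in (mu, sigma, lambda, nu)
    s = np.zeros(4)
    for i in range(4):
        tp, tm = np.array(theta, float), np.array(theta, float)
        tp[i] += eps
        tm[i] -= eps
        s[i] = (logpdf(x, *tp) - logpdf(x, *tm)) / (2 * eps)
    return s

theta0 = (0.0, 1.0, 0.0, 5.0)   # symmetry (lambda = 0), nu = 5

def info_entry(i, j):
    integrand = lambda x: score(x, theta0)[i] * score(x, theta0)[j] * np.exp(logpdf(x, *theta0))
    return integrate.quad(integrand, -np.inf, np.inf, limit=200)[0]

fisher = np.array([[info_entry(i, j) for j in range(4)] for i in range(4)])
print(np.linalg.det(fisher))    # positive, i.e. non-singular, in line with Theorem 4.2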


Theorems 4.1 and 4.2 prove that the conjecture of Azzalini and Genton (2008) is correct. Finally, note that the proof of Theorem 4.2 did not require an exact determination of the Fisher information matrix Γ^ST, which is known to be difficult (see Azzalini and Capitanio 2003).

5. Implications for inference

We have shown in the previous sections that, for the class of generalized skew-elliptical densities of type (2.3) and for the class of multivariate skew-t distributions, singular Fisher information matrices and systematic stationary points in the profile log-likelihood function for skewness, both in the vicinity of symmetry, only occur for skew-(multi)normal distributions (here, we of course consider that the skew-normal distributions are embedded in the class of skew-t distributions, that is, we allow the number of degrees of freedom to be infinite in that class). We now provide a short overview of the implications of our findings on inference.

Azzalini (1985) proposed an alternative parameterization, the so-called centred parameterization, in order to get rid of the singularities of the univariate skew-normal model. Arellano-Valle and Azzalini (2008) adopted the same approach for the multivariate skew-normal densities. Our results strongly alleviate the necessity of such a rather cumbersome reparameterization for broad classes of skewed densities which do not contain the skew-normal ones, which appears to be a welcome feature (see Azzalini and Genton 2008, Section 1.2). Moreover, for distributions other than the skew-normal, the more regular behavior of the profile log-likelihood function for skewness should remove estimation problems encountered in the neighborhood of λ = 0. Non-singular Fisher information matrices also allow for the use of the standard asymptotic theory of MLEs, and hence avoid further mathematical complications.

As for hypothesis testing, the singularities related to the skew-normal densities imply that the Le Cam optimal tests for symmetry about an unspecified centre against skew-normal alternatives coincide with the trivial test, that is, the test discarding the observations and rejecting the null of symmetry at level α whenever an auxiliary Bernoulli variable with parameter α takes value one. It can be shown that non-singular information matrices protect from such "worst" Le Cam optimal tests for symmetry (see Ley and Paindaveine 2010 for more details); our findings therefore encourage the construction of Le Cam optimal tests for symmetry against non-Gaussian generalized skew-elliptical alternatives.

Further inferential aspects related to singular information matrices can be found in Rotnitzky et al. (2000) or Bottai (2003).


REFERENCES

Arellano-Valle, R. B. and Azzalini, A. (2008) The centred parametrization for the multivariate skew-normal distribution, Journal of Multivariate Analysis, 99, 1362–1382.

Azzalini, A. (1985) A class of distributions which includes the normal ones, Scandinavian Journal of Statistics, 12, 171–178.

Azzalini, A. (2005) The skew-normal distribution and related multivariate families (with discussion), Scandinavian Journal of Statistics, 32, 159–188.

Azzalini, A. and Capitanio, A. (1999) Statistical applications of the multivariate skew-normal distributions, Journal of the Royal Statistical Society B, 61, 579–602.

Azzalini, A. and Capitanio, A. (2003) Distributions generated by perturbation of symmetry with emphasis on a multivariate skew-t distribution, Journal of the Royal Statistical Society B, 65, 367–389.

Azzalini, A. and Dalla Valle, A. (1996) The multivariate skew-normal distribution, Biometrika, 83, 715–726.

Azzalini, A. and Genton, M. G. (2007) On Gauss's characterization of the normal distribution, Bernoulli, 13, 169–174.

Azzalini, A. and Genton, M. G. (2008) Robust likelihood methods based on the skew-t and related distributions, International Statistical Review, 76, 106–129.

Barndorff-Nielsen, O. E. and Cox, D. R. (1994) Inference and asymptotics, Chapman and Hall, London.

Bottai, M. (2003) Confidence regions when the Fisher information is zero, Biometrika, 90, 73–84.

Branco, M. D. and Dey, D. K. (2001) A general class of multivariate skew-elliptical distributions, Journal of Multivariate Analysis, 79, 99–113.

Chiogna, M. (2005) A note on the asymptotic distribution of the maximum likelihood estimator for the scalar skew-normal distribution, Statistical Methods and Applications, 14, 331–341.

DiCiccio, T. J. and Monti, A. C. (2004) Inferential aspects of the skew exponential power distribution, Journal of the American Statistical Association, 99, 439–450.

DiCiccio, T. J. and Monti, A. C. (2011) Inferential aspects of the skew t-distribution, Manuscript in preparation.

Genton, M. G. and Loperfido, N. (2005) Generalized skew-elliptical distributions and their quadratic forms, Annals of the Institute of Statistical Mathematics, 57, 389–401.

Hallin, M. and Paindaveine, D. (2006) Semiparametrically efficient rank-based inference for shape. I. Optimal rank-based tests for sphericity, Annals of Statistics, 34, 2707–2756.

Gomez, H. W., Venegas, O. and Bolfarine, H. (2007) Skew-symmetric distributions generated by the distribution function of the normal distribution, Environmetrics, 18, 395–407.

Ley, C. and Paindaveine, D. (2010) On the singularity of multivariate skew-symmetric models, Journal of Multivariate Analysis, 101, 1434–1444.

Paindaveine, D. (2008) A canonical definition of shape, Statistics and Probability Letters, 78, 2240–2247.

Patefield, W. M. (1977) On the maximized likelihood function, Sankhya Series B, 39, 92–96.

Pewsey, A. (2000) Problems of inference for Azzalini's skew-normal distribution, Journal of Applied Statistics, 27, 859–870.

Rotnitzky, A., Cox, D. R., Bottai, M. and Roberts, J. (2000) Likelihood-based inference with singular information matrix, Bernoulli, 6, 243–284.

Wang, J., Boyer, J. and Genton, M. G. (2004) A skew-symmetric representation of multivariate distributions, Statistica Sinica, 14, 1259–1270.

CHRISTOPHE LEY
E.C.A.R.E.S.
Institut de Recherche en Statistique
Département de Mathématique
Université Libre de Bruxelles
Brussels (Belgium)
[email protected]

DAVY PAINDAVEINE
E.C.A.R.E.S.
Institut de Recherche en Statistique
Département de Mathématique
Université Libre de Bruxelles
Brussels (Belgium)
[email protected]