Multidimensional multiple group IRT models with skew normal latent trait distributions

Juan L. Padilla (a), Caio L.N. Azevedo (a,∗), Victor H. Lachos (a)

(a) Department of Statistics, University of Campinas, Brazil

Abstract

Item response theory (IRT) models are among the most important statistical tools for psychometric data analysis. Their applicability ranges from educational assessment to biological assays. IRT models combine at least two sets of unknown quantities: the latent traits (person parameters) and the item parameters (related to the measurement instruments of interest). Multidimensional item response theory (MIRT) models are quite useful for analyzing data sets involving multiple skills or latent traits, which occur in many applications. However, most works in the literature adopt the usual assumption of a multivariate (symmetric) normal distribution for the latent traits and do not deal with the multiple group framework (few groups with many subjects in each one). In general, they consider a limited number of model fit assessment tools and do not investigate the dimensionality of the measurement instrument in detail; they also handle model nonidentifiability differently from the approach presented here, and only for one group models. In this work, we propose a MIRT multiple group model with multivariate skew normal distributions, under the centered parameterization, for the latent traits of each group, and we present simple and feasible conditions for model identification. A full Bayesian approach for parameter estimation, structural selection (model comparison and determination of the dimensionality of the measurement instrument) and model fit assessment is developed through Markov chain Monte Carlo (MCMC) algorithms. The developed tools are illustrated through the analysis of a real data set from the first stage of the University of Campinas 2013 admission exam.

∗Corresponding author. Email address: [email protected] (Caio L.N. Azevedo)

Preprint submitted to Journal of Multivariate Analysis May 30, 2017

Keywords: Item response theory, multidimensional models, multivariate skew normal distribution, centered parameterization, Bayesian inference, MCMC algorithms, model fit assessment.

1. Introduction

Item response theory (IRT) models are among the most important psychometric tools for data analysis. Their applicability ranges from educational assessment to biological assays. IRT models combine at least two sets of unknown quantities: the latent traits (person parameters) and the item parameters (related to the measurement instruments of interest, such as a cognitive test, a genetic experiment or a psychiatric questionnaire, among other examples). Multidimensional item response theory (MIRT) models are quite useful for analyzing data sets involving multiple skills or latent traits, which occur in many applications. However, most works in the literature adopt the usual assumption of a multivariate (symmetric) normal distribution for the latent traits and do not deal with the multiple group framework (few groups with many subjects in each one). In general, they also consider a limited number of model fit assessment tools and do not investigate the dimensionality of the measurement instrument in detail; they also handle model nonidentifiability differently from the approach presented here, and only for one group models. In particular, the assumption of multivariate (symmetric) normality is also considered for unidimensional models, with some underlying correlation structure for the latent traits, as in Andrade & Tavares (2005) and Azevedo et al. (2016), but not for MIRT models. To the best of our knowledge, the above issues have not been considered simultaneously in any work in the literature.

Bayesian inference, model identification and model fit assessment/model comparison tools are discussed in Beguin & Glas (2001) and Bolt & Lall (2003). A new MCMC algorithm is proposed in Fu et al. (2009). Some tools for testing dimensionality are discussed in Levy et al. (2009), Beguin & Glas (2001) and Bartolucci (2007). Torre & Patz (2005) discuss the gain in latent trait estimation when the underlying correlation structure of the latent traits is taken into account. In de Jong & Steenkamp (2003), a graded multilevel finite mixture MIRT model is presented with group-specific multivariate (symmetric) normal distributions, along with developments about model identification, model fit assessment and model comparison through Bayesian inference. Sheng & Wikle (2008) presented a hierarchical one group MIRT model built from unidimensional models, along with developments about model identification, model fit assessment and model comparison through Bayesian inference. The work of Beguin & Glas (2001) presents a multiple group MIRT model, but all developments and applications are made for the one group MIRT model, and in a different way from the one we consider in this work. Only the work of de Jong & Steenkamp (2003) deals with a multiple group MIRT model. More discussions about MIRT models can be found in Reckase (2009). To the best of our knowledge, none of the works in the literature considers a latent trait distribution different from the multivariate normal, nor do they address all the mentioned issues simultaneously.

In this work, we propose a new MIRT multiple group model with group-specific multivariate skew normal distributions for the latent traits. We consider a slightly different version of the multivariate skew normal distribution under the centered parameterization developed by Arellano-Valle & Azzalini (2008). That is, a new parameterization of the multivariate skew normal distribution is introduced. Also, we explore two types of covariance matrix for the latent traits: a diagonal matrix and an unstructured covariance matrix. We present simple conditions for model identification that allow for either uncorrelated or correlated latent factors, while imposing no restrictions, or only a few, on the item parameters (which can be data-driven and/or driven by the experimental design). A full Bayesian approach for parameter estimation, structural selection (model comparison and determination of the dimensionality of the measurement instrument) and model fit assessment is developed through MCMC algorithms. Approaches for comparing and assessing the test dimensionality are proposed based on Bayesian measures of model complexity, as in Spiegelhalter et al. (2002), and on posterior predictive checking (see Levy et al., 2009). Also, mechanisms for assessing model fit globally, per group and per item, as in Azevedo et al. (2012), are developed. The developed tools are illustrated through the analysis of a real data set from the first stage of the 2013 University of Campinas entrance exam.

This paper is outlined as follows. In Section 2, we present a new parameterization of the multivariate skew normal distribution, introduce the multiple group skew MIRT model and discuss model identifiability. In Section 3, the prior and posterior distributions are presented, the MCMC algorithm is given and the model fit assessment and model comparison tools are discussed. In Section 4, the analysis of a real data set is presented. Finally, in Section 5, some additional comments are presented.

2. The Model

In this section, we present a new model, which is the usual multidimensional three-parameter probit model (see Reckase, 2009) with the assumption of a multivariate skew normal distribution, under a new centered parameterization, for the latent traits. First, we introduce this new parameterization and afterwards we propose a new MIRT model. Some useful and necessary notation is also introduced. Finally, the identification issues are discussed.

2.1. A new parameterization of the multivariate skew normal distributions

Similarly to the dichotomous unidimensional IRT (UIRT) models, for the dichotomous MIRT models it is necessary to fix the mean (vector) and the variance (covariance matrix) of the latent traits distribution and/or to impose some restrictions on the discrimination and difficulty parameters (see Reckase, 2009; Beguin & Glas, 2001; Azevedo et al., 2011). In the usual univariate skew normal distribution, the mean and variance cannot be fixed without fixing the value of the asymmetry parameter (Azzalini, 1985). Therefore, Azevedo et al. (2011) considered the centered version of the skew normal distribution to define a skew unidimensional IRT model, see also Azzalini (1985) and Pewsey (2000). For our model, we need to impose restrictions on the mean vector and covariance matrix of the latent traits distributions and, depending on the covariance structure, on the discrimination parameters (more details about the identification aspects are discussed in Subsection 2.3).

To the best of our knowledge, none of the parameterizations of the multivariate skew normal distribution provides a mean vector and a covariance matrix that are simultaneously free from the asymmetry parameters (see Genton, 2004), except the one proposed by Arellano-Valle & Azzalini (2008). However, we will consider a slightly different version of this multivariate skew normal distribution under the centered parameterization (MSNCP), since in our parameterization the respective density can be easily obtained, a useful stochastic representation can always be defined and there are no restrictions on the parameter space. Also, for the unidimensional case, the MSNCP distribution considered here becomes the one proposed by Azzalini (1985), which was used by Azevedo et al. (2011) and Santos et al. (2013) within the IRT context. Therefore, our model is a generalization of those proposed by Santos et al. (2013), Azevedo et al. (2011) and Azevedo et al. (2012). The MSNCP proposed in this work is based on the distribution presented by Lachos (2004), while the MSNCP distribution of Arellano-Valle & Azzalini (2008) is developed from the distribution proposed by Azzalini & Capitanio (1999).

First, let us present the definition of the multivariate skew normal distribution considered in Lachos (2004), henceforth denoted by MSN (multivariate skew normal) distribution.

Definition 1. A $D$-dimensional random vector, say, $Z$ has a MSN distribution if its density is given by:

$$p_Z(z \mid \mu, \Sigma, \lambda) = 2\,\phi_D(z; \mu, \Sigma)\,\Phi\!\left(\lambda^{\top}\Sigma^{-1/2}(z - \mu)\right)\mathbb{1}_{\mathbb{R}^D}(z), \tag{1}$$

where $\phi_D(\cdot\,; \mu, \Sigma)$ is the density of a $D$-dimensional normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, $\Phi$ is the cdf of the standard (symmetric) normal distribution, $\mu$ is the location parameter, $\Sigma$ is the dispersion matrix and $\lambda$ is the vector of asymmetry parameters. Let us denote by $Z \sim SN_D(\mu, \Sigma, \lambda)$ a random vector that follows a $D$-dimensional MSN distribution. Notice that, if $\mu = 0$ and $\Sigma = I_D$, we have that

$$E(Z) = \mu_Z = \sqrt{\frac{2}{\pi}}\,\delta \quad \text{and} \quad \mathrm{Cov}(Z) = \Sigma_Z = I_D - \mu_Z\mu_Z^{\top},$$

where $I_D$ stands for the $D$-dimensional identity matrix and $\delta = \lambda / \sqrt{1 + \lambda^{\top}\lambda}$. Therefore, this parameterization is not useful to build our MIRT model, since it does not allow the latent trait scale to be determined without fixing the asymmetry parameter (see also Azevedo et al., 2011; Santos et al., 2013). In addition, since we want to let the data indicate the entire behavior of the latent traits distributions, this parameterization is not useful, even when the location vector and the dispersion matrix are fixed. For further details about the MSN distributions, see Genton (2004) and Lachos (2004).

Now let us introduce our centered parameterization for the MSN distribution. The idea is similar to the unidimensional case (see Azevedo et al., 2012; Santos et al., 2013; Azzalini, 1985). First, we consider a random vector such that $Z \sim SN_D(0, I_D, \lambda)$, and then we define the following transformation:

$$\theta = \mu_\theta + \Psi_\theta^{1/2\top}\left[\Sigma_Z^{1/2\top}\right]^{-1}(Z - \mu_Z), \tag{2}$$

where $(\cdot)^{1/2}$ stands for the Cholesky decomposition, $\mu_\theta$ is the mean vector and $\Psi_\theta$ is the covariance matrix. Let us define $\theta \sim SNCP_D(\mu_\theta, \Psi_\theta, \delta_\theta)$, where $SNCP_D$ represents a $D$-variate skew normal distribution under the centered parameterization and $\delta_\theta \equiv \delta$. Therefore, in our parameterization, the mean vector and covariance matrix are directly defined and the identification restrictions can be easily considered. In addition, these quantities can be directly estimated. Finally, if $D = 1$, we have the distribution defined in Azevedo et al. (2012) and Santos et al. (2013). Also, we have:

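$$\mu_\theta = \begin{pmatrix} \mu_{\theta_1} \\ \mu_{\theta_2} \\ \vdots \\ \mu_{\theta_D} \end{pmatrix}, \quad
\Psi_\theta = \begin{pmatrix}
\psi_{\theta_1} & \psi_{\theta_{12}} & \cdots & \psi_{\theta_{1D}} \\
\psi_{\theta_{12}} & \psi_{\theta_2} & \cdots & \psi_{\theta_{2D}} \\
\vdots & \vdots & \ddots & \vdots \\
\psi_{\theta_{1D}} & \psi_{\theta_{2D}} & \cdots & \psi_{\theta_D}
\end{pmatrix} \quad \text{and} \quad
\delta_\theta = \begin{pmatrix} \delta_{\theta_1} \\ \delta_{\theta_2} \\ \vdots \\ \delta_{\theta_D} \end{pmatrix}. \tag{3}$$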

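Another way to write expression (2) is

$$\theta = \alpha_\theta + \Sigma_\theta^{\top} Z, \tag{4}$$

where $\alpha_\theta = \mu_\theta - \Psi_\theta^{1/2\top}\left(\Sigma_Z^{1/2\top}\right)^{-1}\mu_Z$ and $\Sigma_\theta^{\top} = \Psi_\theta^{1/2\top}\left(\Sigma_Z^{1/2\top}\right)^{-1}$. Therefore, by using some properties of the MSN distribution, see Lachos (2004), we may conclude that $\theta \sim SN_D\left(\alpha_\theta, \Sigma_\theta^{\top}\Sigma_\theta, \lambda\right)$.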
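The mapping from the centered parameters to the quantities in (4) can be sketched numerically. The snippet below is only an illustration under stated assumptions: the function name `centered_to_direct` is hypothetical, the Cholesky convention is taken to be $A = A^{1/2\top}A^{1/2}$ (so the lower factor returned by NumPy plays the role of $A^{1/2\top}$), and $\lambda$ is recovered by inverting $\delta = \lambda/\sqrt{1+\lambda^{\top}\lambda}$.

```python
import numpy as np


def centered_to_direct(mu, psi, delta):
    """Map (mu_theta, Psi_theta, delta_theta) to (alpha_theta, Sigma_theta^T, lambda)."""
    mu = np.asarray(mu, dtype=float)
    psi = np.asarray(psi, dtype=float)
    delta = np.asarray(delta, dtype=float)
    if delta @ delta >= 1.0:
        raise ValueError("delta must satisfy delta' delta < 1")
    dim = mu.size
    # Moments of Z ~ SN_D(0, I_D, lambda), as given after Definition 1.
    mu_z = np.sqrt(2.0 / np.pi) * delta
    sigma_z = np.eye(dim) - np.outer(mu_z, mu_z)
    # Assumption: np.linalg.cholesky returns the lower factor L with A = L L^T,
    # which is used here as A^{1/2 T}.
    chol_psi = np.linalg.cholesky(psi)
    chol_z = np.linalg.cholesky(sigma_z)
    sigma_t = chol_psi @ np.linalg.inv(chol_z)   # Sigma_theta^T in (4)
    alpha = mu - sigma_t @ mu_z                  # alpha_theta in (4)
    # Inverse of delta = lambda / sqrt(1 + lambda' lambda).
    lam = delta / np.sqrt(1.0 - delta @ delta)
    return alpha, sigma_t, lam
```

Under this convention, $\alpha_\theta + \Sigma_\theta^{\top}\mu_Z$ returns $\mu_\theta$ and $\Sigma_\theta^{\top}\Sigma_Z\Sigma_\theta$ returns $\Psi_\theta$, which is what makes the mean vector and covariance matrix free of the asymmetry parameters.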

Theorem 2. From equations (1) to (4) and from some properties of the MSN distribution, see Lachos (2004), it is possible to conclude that the density of the MSNCP is given by

$$p(\theta \mid \eta_\theta) = 2\,\phi_D\!\left(\theta; \alpha_\theta, \Sigma_\theta^{\top}\Sigma_\theta\right)\Phi\!\left\{\lambda_\theta^{\top}\left(\Sigma_\theta^{\top}\right)^{-1}(\theta - \alpha_\theta)\right\}\mathbb{1}_{\mathbb{R}^D}(\theta), \tag{5}$$

where $\eta_\theta = (\mu_\theta, \mathrm{vech}(\Psi_\theta), \delta_\theta)^{\top}$ and vech extracts all elements on and below the main diagonal.

The following lemma, regarding the distribution of an affine transformation of the MSN distribution, will be quite useful for finding the marginal distributions of a vector distributed as a MSNCP. Its proof can be found in Lachos (2004).

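Lemma 3. Let $\theta \sim SN_D(\mu_\theta, \Psi_\theta, \delta_\theta)$, and let $C$ be a $(D \times k)$ full rank matrix. Then $C^{\top}\theta \sim SN_k\left(C^{\top}\mu_\theta, C^{\top}\Psi_\theta C, \delta^{*}\right)$, where

$$\delta^{*} = \left(C^{\top}\Sigma_\theta^{\top}\Sigma_\theta C\right)^{-\frac{1}{2}\top} C^{\top}\left(\Sigma_\theta^{\top}\Sigma_\theta\right)^{\frac{1}{2}\top}\delta_\theta.$$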

Theorem 4. Let us consider two sets of random variables $(\theta_1, \ldots, \theta_q)$ and $(\theta_{q+1}, \ldots, \theta_D)$ forming the vectors $\theta^{(1)} = (\theta_1, \ldots, \theta_q)^{\top}$ and $\theta^{(2)} = (\theta_{q+1}, \ldots, \theta_D)^{\top}$. These variables form the random vector

$$\theta = \left[\theta^{(1)}, \theta^{(2)}\right]^{\top} = (\theta_1, \ldots, \theta_q, \theta_{q+1}, \ldots, \theta_D)^{\top}. \tag{6}$$

Now let us assume that the $D$ variables have a joint skew normal distribution under the centered parameterization with null mean vector, identity covariance matrix and skewness coefficient $\delta_\theta = \left[\delta_\theta^{(1)}, \delta_\theta^{(2)}\right]^{\top}$. Then the distribution of the partitioned vector $\theta^{(i)}$, for $i = 1, 2$ and such that $D^{(1)} + D^{(2)} = D$, is $SNCP_{D^{(i)}}\left(0, I_{D^{(i)}}, \delta_\theta^{(i)}\right)$, with $D^{(i)}$ being the dimension of the vector $\theta^{(i)}$.

Proof. In order to prove this result, recall the equivalence between the MSN and MSNCP distributions. That is, if $\theta \sim SNCP_D(\mu_\theta, \Psi_\theta, \delta)$ then $\theta \sim SN_D\left(\alpha_\theta, \Sigma_\theta^{\top}\Sigma_\theta, \lambda\right)$, where $(\alpha_\theta, \Sigma_\theta, \lambda)$ were already defined. Taking $C^{\top}$ from Lemma 3 to be the matrix that selects from $\theta$ the elements forming partition $i$, we arrive at the desired conclusion.

Figure 1 presents the contour plots for a bivariate MSNCP distribution (that is, $D = 2$) for different values of the correlation between $\theta_1$ and $\theta_2$ (Corre($\theta_1, \theta_2$)) and of $\delta_\theta$, with $\mu_\theta = (0, 0)^{\top}$ and $\psi_{\theta_1} = \psi_{\theta_2} = 1$. It can be seen that the contours do not depict an elliptical behaviour, except for the cases in which the skewness coefficient is the same for both components.

An important result is the stochastic representation of density (5), which can be deduced from the one presented in Lachos (2004) for a random vector with MSN distribution. That is, if $\theta \sim SNCP_D(\mu_\theta, \Psi_\theta, \delta_\theta)$, then

$$\theta \mid (T = t) \sim N_D\!\left(\alpha_\theta + \Sigma_\theta^{\top}\delta_\theta t,\; \Sigma_\theta^{\top}\left(I_D - \delta_\theta\delta_\theta^{\top}\right)\Sigma_\theta\right), \tag{7}$$

where $\alpha_\theta$ and $\Sigma_\theta$ are as defined before, $T \sim HN(0, 1)$ and $N_D(\mu, \Psi)$ stands for a $D$-variate normal distribution with mean vector $\mu$ and covariance matrix $\Psi$. More details about the MSNCP can be found in Padilla (2014). Next, we present our multiple group MIRT model and the related identification issues.
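The stochastic representation (7) suggests a simple way to simulate from the MSNCP: draw $T$ from a half-normal distribution and then draw $\theta$ from the conditional normal. The following is a minimal sketch, assuming the Cholesky convention $A = A^{1/2\top}A^{1/2}$ and a hypothetical function name `sample_sncp`; it is not the authors' implementation.

```python
import numpy as np


def sample_sncp(n, mu, psi, delta, rng):
    """Draw n vectors from SNCP_D(mu, psi, delta) using representation (7)."""
    mu = np.asarray(mu, dtype=float)
    psi = np.asarray(psi, dtype=float)
    delta = np.asarray(delta, dtype=float)
    dim = mu.size
    # Quantities from (2) and (4); lower Cholesky factors stand for A^{1/2 T}.
    mu_z = np.sqrt(2.0 / np.pi) * delta
    sigma_z = np.eye(dim) - np.outer(mu_z, mu_z)
    sigma_t = np.linalg.cholesky(psi) @ np.linalg.inv(np.linalg.cholesky(sigma_z))
    alpha = mu - sigma_t @ mu_z
    # T ~ HN(0, 1): absolute value of a standard normal draw.
    t = np.abs(rng.standard_normal(n))
    # theta | T = t ~ N_D(alpha + Sigma^T delta t, Sigma^T (I - delta delta^T) Sigma).
    cond_mean = alpha + t[:, None] * (sigma_t @ delta)
    cond_cov = sigma_t @ (np.eye(dim) - np.outer(delta, delta)) @ sigma_t.T
    noise = rng.standard_normal((n, dim)) @ np.linalg.cholesky(cond_cov).T
    return cond_mean + noise
```

A quick sanity check is that the sample mean and sample covariance of a large draw should be close to the supplied `mu` and `psi`, whatever the value of `delta`, which is the defining property of the centered parameterization.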

2.2. A new MIRT multiple group model with skew normal latent traits distributions under the centered parameterization

One or more different tests are administered to the (randomly selected) subjects of each group. The tests have common items and the structure can be recognized as an incomplete block design (see Montgomery, 2004). We will assume that each group has a reasonable number of subjects. In summary, we are dealing with a set of $n$ subjects clustered in $K$ groups, with $n_k$ subjects in group $k$, and $n = \sum_{k=1}^{K} n_k$. The subjects of each group $k$ answer $I_k$ items, and $\sum_{k=1}^{K} I_k \geq I$, where $I$ is the total number of items.

[Figure 1: six contour-plot panels, with panel titles δ = (−0.7, 0.7), Corre(θ1, θ2) = 0; δ = (−0.95, 0), Corre(θ1, θ2) = 0; δ = (0.5, 0.5), Corre(θ1, θ2) = 0.6; δ = (0, 0.95), Corre(θ1, θ2) = 0.6; δ = (0, −0.95), Corre(θ1, θ2) = 0; δ = (−0.5, −0.5), Corre(θ1, θ2) = 0.6.]

Figure 1: Contour plots of the bivariate MSNCP for different values of Corre(θ1, θ2) and δ.

The following notation will be introduced: $\theta_{djk}$ is the latent trait of subject $j$ ($j = 1, \ldots, n_k$) belonging to group $k$ ($k = 1, \ldots, K$), related to dimension $d$ ($d = 1, \ldots, D$); $\theta_{.jk} = (\theta_{1jk}, \ldots, \theta_{Djk})^{\top}$ is the vector of the latent traits of subject $j$ of group $k$; $\theta_{..k} = (\theta_{.1k}, \ldots, \theta_{.n_k k})$ is the vector of all latent traits of the subjects of group $k$; and $\theta_{...} = (\theta_{..1}, \ldots, \theta_{..K})^{\top}$ is the vector with all latent traits. $Y_{ijk}$ is the response of subject $j$ of group $k$ to item $i$ ($i = 1, \ldots, I_k$); $Y_{.jk} = (Y_{1jk}, \ldots, Y_{I_k jk})^{\top}$ is the response vector of subject $j$ of group $k$; $Y_{..k} = (Y_{.1k}^{\top}, \ldots, Y_{.n_k k}^{\top})^{\top}$ is the response vector of all subjects of group $k$; $Y_{...} = (Y_{..1}^{\top}, \ldots, Y_{..K}^{\top})^{\top}$ is the whole response set; and $y_{ijk}, y_{.jk}, y_{..k}, y_{...}$ are the respective observed values. $\zeta_i$ is the vector of parameters of item $i$; $\zeta = (\zeta_1^{\top}, \ldots, \zeta_I^{\top})^{\top}$ is the whole set of item parameters; $\eta_{\theta_k}$ is the vector with the population parameters of group $k$; and $\eta_\theta = (\eta_{\theta_1}^{\top}, \ldots, \eta_{\theta_K}^{\top})^{\top}$ is the whole set of population parameters.

The MIRT multiple group model with multivariate skew normal distribution under the centered parameterization (MSNCP) is given by

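$$\begin{aligned}
Y_{ijk} \mid (\theta_{.jk}, \zeta_i) &\sim \text{Bernoulli}(P_{ijk}), \\
P_{ijk} = P(Y_{ijk} = 1 \mid \theta_{.jk}, \zeta_i) &= c_i + (1 - c_i)\,\Phi\!\left(a_i^{\top}\theta_{.jk} - b_i\right) \\
&= c_i + (1 - c_i)\,\Phi\!\left(\sum_{d=1}^{D} a_{id}\theta_{djk} - b_i\right), \\
\theta_{.jk} \mid \eta_{\theta_k} &\sim SNCP_D\!\left(\mu_{\theta_k}, \Psi_{\theta_k}, \delta_{\theta_k}\right),
\end{aligned}$$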

where $\mu_{\theta_k}$, $\Psi_{\theta_k}$ and $\delta_{\theta_k}$ are as in (3), considering the index $k$, $\zeta_i = (a_i^{\top}, b_i, c_i)^{\top}$, $a_i = (a_{i1}, \ldots, a_{iD})^{\top}$ and $\eta_{\theta_k} = (\mu_{\theta_k}, \mathrm{vech}(\Psi_{\theta_k}), \delta_{\theta_k})$. For more details concerning the interpretation of the item parameters and the so-called multidimensional item parameters, the reader is referred to Reckase (2009). Notice that for $K = 1$ we also have a new model, that is, a one group MIRT model with a multivariate skew normal distribution under the centered parameterization. As mentioned before, two structures for the covariance matrix will be considered in this work. In the first case, we assume that the covariance matrix of the reference group is an identity matrix, that is, $\Psi_{\theta_1} = I_D$, while the other matrices are diagonal, that is, $\Psi_{\theta_k} = \mathrm{diag}(\psi_{\theta_k 1}, \ldots, \psi_{\theta_k D})$, $k = 2, \ldots, K$. In the second case, the covariance matrix of the reference group is assumed to be a correlation matrix, whereas for the other groups a usual covariance matrix is considered.
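As a small illustration of the response function above, the probability of a correct response under the three-parameter probit link can be computed as follows. This is only a sketch: the function names are hypothetical and the standard normal cdf is obtained from `math.erf`.

```python
import math

import numpy as np


def normal_cdf(x):
    """Standard normal cdf, Phi(x), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def response_probability(theta, a, b, c):
    """P_ijk = c_i + (1 - c_i) * Phi(a_i' theta_.jk - b_i)."""
    eta = float(np.dot(a, theta)) - b   # linear predictor
    return c + (1.0 - c) * normal_cdf(eta)
```

For instance, with $a_i = (1, 0.5)^{\top}$, $\theta_{.jk} = (0.2, -0.4)^{\top}$, $b_i = 0$ and $c_i = 0.2$, the linear predictor is zero, so the probability is $0.2 + 0.8 \times 0.5 = 0.6$.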

2.3. Model identification

Similarly to the usual multiple group model (MGM), it is necessary to establish a reference group, for example, the first. To accomplish that, we can fix the mean vector and the covariance matrix of the first group at some specific values and/or impose restrictions on the difficulty and discrimination parameters.

In this work, we consider two scenarios of interest concerning the covariance structure of the latent traits, as mentioned before, and each situation must be treated in a different way in terms of model identification, as we will show ahead.

For the class of MIRT models, two conditions must hold in order to ensure model identification, namely: invariance against linear transformations (IALT) and invariance against rotations (IAR) (see Rivers, 2003; Matos, 2008). For MIRT models, these aspects are related to the linear predictor, that is, to $\sum_{d=1}^{D} a_{id}\theta_{djk} - b_i = a_i^{\top}\theta_{.jk} - b_i$. With $A$ a non-singular real matrix and $\beta$ a real vector, we have:

$$a_i^{\top}\theta_{.jk} - b_i = a_i^{\top}(\theta_{.jk} - \beta + \beta) - b_i = \sum_{d=1}^{D} a_{id}\theta^{*}_{djk} - b^{*}_i = a_i^{\top}\theta^{*}_{.jk} - b^{*}_i, \tag{8}$$

where $\theta^{*}_{.jk} = \theta_{.jk} + \beta$ and $b^{*}_i = a_i^{\top}\beta + b_i$. The second type of transformation, that is, the IAR, is related, with $A$ an orthogonal matrix, to

$$a_i^{\top}\theta_{.jk} - b_i = a_i^{\top}A^{\top}A\,\theta_{.jk} - b_i = \sum_{d=1}^{D} a^{*}_{id}\theta^{*}_{djk} - b_i = (a^{*}_i)^{\top}\theta^{*}_{.jk} - b_i, \tag{9}$$

where $\theta^{*}_{.jk} = A\theta_{.jk}$ and $a^{*}_i = Aa_i$. Notice that, combining the two sets of transformations and due to some properties of the multivariate skew normal distribution, see Lachos (2004), we have that

$$\theta^{*}_{.jk} \mid \eta_{\theta_k} \sim SNCP_D\!\left(\beta + A\mu_{\theta_k},\; A\Psi_{\theta_k}A^{\top},\; \delta_{\theta_k}\right). \tag{10}$$

Therefore, these sorts of transformations change the mean vector and the covariance matrix, but do not affect the vector of asymmetry parameters of the latent trait distribution. Then, the idea to identify the model is to restrict the mean vector and the covariance matrix of the reference group (similarly to the unidimensional MGM) and/or to restrict some (or all) item parameters belonging to the test applied to the reference group, in such a way that transformations as in (8) and (9) are no longer possible. Therefore, by combining these restrictions with the linking design (a structure of common items among the tests) the model will be identified, see Bock & Zimowski (1997) and Santos et al. (2013) for further details. When $K = 1$, that is, for the one group MIRT model, the restrictions are similar, see Azevedo et al. (2011), but, in this case, the reference group is the only group.
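The two invariances in (8) and (9) can be checked numerically. The sketch below combines them in a single map, $\theta^{*} = A\theta + \beta$, $a^{*} = Aa$ and $b^{*} = b + (a^{*})^{\top}\beta$; this particular combination and the function name `transform` are illustrative assumptions, chosen so that the linear predictor is left unchanged.

```python
import numpy as np


def transform(theta, a, b, rotation, shift):
    """Rotate and translate the latent trait, adjusting the item parameters to compensate."""
    theta_star = rotation @ theta + shift   # theta* = A theta + beta
    a_star = rotation @ a                   # a* = A a, with A orthogonal
    b_star = b + a_star @ shift             # b* absorbs the translation
    return theta_star, a_star, b_star
```

Because $A^{\top}A = I_D$, the quantity $(a^{*})^{\top}\theta^{*} - b^{*}$ equals $a^{\top}\theta - b$ for any orthogonal `rotation` and any `shift`, so the likelihood cannot distinguish the two parameter sets unless restrictions are imposed.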

To solve the problem of IALT, it suffices to fix the mean vector of the latent trait distribution of the reference group, whatever the selected structure of the covariance matrix, as we can see from (10). On the other hand, the approach for solving the IAR depends on the structure adopted for the covariance matrix.

Diagonal covariance matrix: uncorrelated factors

In this case, as explained in Beguin & Glas (2001), some additional restrictions on the item discrimination parameters are necessary, essentially because any orthogonal transformation as in (9) is feasible. For example, we can impose that some items load on specific latent trait dimensions and/or that they are positively or negatively related to some specific dimensions. This choice depends on the situation. For example, in an educational assessment, it is reasonable to expect that each item either loads on some specific dimensions (or even on all of them) and/or is positively correlated with some (or all) latent traits (since it is not expected that having high values in any latent trait will decrease the probability of a correct response). On the other hand, in psychiatric studies, the items of a questionnaire (measurement instrument) can be grouped in such a way that they are related to specific symptoms. Therefore, each item will load on only one specific dimension which, in turn, is related to some specific symptoms. Another situation is one where a specific symptom prevents the presence of another symptom. In this case, it is expected that the higher the latent trait for the former symptom, the smaller the probability of manifesting the latter, so some discrimination parameters must be positive and others negative. In conclusion, by assuming a diagonal covariance matrix for the groups, some particular interactions between items and latent traits need to be considered. However, in general, this is not a difficult task and the restrictions can be drawn from the data, from the experiment and/or from a specialist. In summary, if we fix some discrimination parameters at zero for some items or fix the sign of some of them (related to the reference group), the model is identified. Naturally, if we consider these two sets of restrictions simultaneously, the model is also identified.

Full covariance matrix: correlated factors

Recall that, in this case, the covariance matrix of the reference group is, in fact, a correlation matrix. Orthogonal transformations as in (9), applied to the covariance matrix of the reference group, are then not possible. As a result, the model is identified, provided that all correlations are different from zero, regardless of any restrictions imposed on the item parameters. If at least one correlation is equal to zero, the model is no longer identified, and we have a pattern similar to that of the previous case. We prove this result next.

Proposition 5. Let $R$ be an orthogonal matrix, different from the identity or a permutation matrix, and let $\Gamma$ be a correlation matrix. Then the product $\Gamma^{*} = R\Gamma R^{\top}$ is such that at least one element of its main diagonal is different from 1. That is, the matrix $\Gamma^{*}$ is not a correlation matrix.

Proof. We seek to prove that, for any orthogonal matrix $R$, different from the identity or a permutation matrix, and any correlation matrix $\Gamma$, the matrix $\Gamma^{*} = R\Gamma R^{\top}$ is no longer a correlation matrix. We will present the proof for the cases $D = 2$ and $D = 3$. For the other cases, the proof is straightforward. Notice, however, that in general it is usual to consider, at most, a five dimensional model, that is, $D = 5$. To prove that $\Gamma^{*}$ is not a correlation matrix, it suffices to prove that it has at least one element in its main diagonal different from one.

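• 2 × 2 matrices. We have that:

$$\begin{aligned}
R\Gamma R^{\top} &= \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{pmatrix}
\begin{pmatrix} 1 & \gamma \\ \gamma & 1 \end{pmatrix}
\begin{pmatrix} r_{11} & r_{21} \\ r_{12} & r_{22} \end{pmatrix} \\
&= \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{pmatrix}
\left( \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & \gamma \\ \gamma & 0 \end{pmatrix} \right)
\begin{pmatrix} r_{11} & r_{21} \\ r_{12} & r_{22} \end{pmatrix} \\
&= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} +
\begin{pmatrix} r_{12}\gamma & r_{11}\gamma \\ r_{22}\gamma & r_{21}\gamma \end{pmatrix}
\begin{pmatrix} r_{11} & r_{21} \\ r_{12} & r_{22} \end{pmatrix} \\
&= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} +
\begin{pmatrix} 2r_{11}r_{12}\gamma & r_{12}r_{21}\gamma + r_{11}r_{22}\gamma \\ r_{12}r_{21}\gamma + r_{11}r_{22}\gamma & 2r_{21}r_{22}\gamma \end{pmatrix}.
\end{aligned}$$

Then, we want to prove that there are no real numbers $r_{ij}$ and $\gamma$ such that

$$2\gamma r_{12}r_{11} = 0; \qquad 2\gamma r_{21}r_{22} = 0, \tag{11}$$

under the restrictions:

$$r_{11}^2 + r_{12}^2 = 1; \qquad r_{21}^2 + r_{22}^2 = 1; \qquad r_{11}r_{21} + r_{12}r_{22} = 0,$$

which are valid since $R$ is an orthogonal matrix. Let us assume that there are real numbers $r_{ij}$ and $\gamma$ such that (11) holds. This is only possible if $\gamma = 0$, which leads to the matrix $\Gamma$ being an identity matrix and clearly violates one of the assumptions of the proposition; or if at least two elements of the matrix $R$ are equal to zero, which also violates one of the assumptions of the proposition, since $R$ would then be either an identity matrix or a permutation matrix. Therefore, there are no real numbers $r_{ij}$ and $\gamma$ such that (11) holds, which implies that $\Gamma^{*}$ cannot be a correlation matrix. It is worthwhile to mention that the identity matrix does not change the covariance matrix and a permutation matrix can only permute the positions of the dimensions in the covariance matrix, see Equation (10). Therefore, these two cases are not relevant for model identification. Also, the results obtained in the simulation studies (see Padilla, 2014) indicate that the model is identified under the aforementioned restrictions.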

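• For 3 × 3 matrices, we have:

$$R = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix}, \qquad
\Gamma = \begin{pmatrix} 1 & \gamma_1 & \gamma_2 \\ \gamma_1 & 1 & \gamma_3 \\ \gamma_2 & \gamma_3 & 1 \end{pmatrix},$$

and the elements of the main diagonal of $R\Gamma R^{\top}$ will be:

$$\begin{aligned}
&1 + 2\left(r_{12}r_{11}\gamma_1 + r_{11}r_{13}\gamma_2 + r_{12}r_{13}\gamma_3\right); \\
&1 + 2\left(r_{21}r_{22}\gamma_1 + r_{21}r_{23}\gamma_2 + r_{23}r_{22}\gamma_3\right); \\
&1 + 2\left(r_{31}r_{32}\gamma_1 + r_{31}r_{33}\gamma_2 + r_{32}r_{33}\gamma_3\right).
\end{aligned} \tag{12}$$

We seek to prove that, provided $R$ is an orthogonal matrix and $\gamma_i \neq 0$, $\forall i = 1, 2, 3$, there are no real numbers $r_{ij}$, $\forall i, j = 1, 2, 3$, and $\gamma_i$, $\forall i = 1, 2, 3$, such that

$$\begin{aligned}
r_{12}r_{11}\gamma_1 + r_{11}r_{13}\gamma_2 + r_{12}r_{13}\gamma_3 &= 0; \\
r_{21}r_{22}\gamma_1 + r_{21}r_{23}\gamma_2 + r_{23}r_{22}\gamma_3 &= 0; \\
r_{31}r_{32}\gamma_1 + r_{31}r_{33}\gamma_2 + r_{32}r_{33}\gamma_3 &= 0,
\end{aligned} \tag{13}$$

holds. However, similarly to the 2 × 2 case, the orthogonal matrices $R$ that satisfy (13) are the permutation and the identity matrices, which would be a contradiction. Therefore, the result follows.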

Then, in this case, that is, when the covariance matrix of the reference group is a correlation matrix with nonzero off-diagonal elements, it is not necessary to impose restrictions on the discrimination parameters.
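The 2 × 2 computation in the proof can be illustrated numerically. The sketch below, with a hypothetical function name and a plane rotation chosen as one example of an orthogonal matrix, returns the main diagonal of $R\Gamma R^{\top}$; for a rotation by angle $\omega$ this diagonal is $(1 - \gamma\sin 2\omega,\; 1 + \gamma\sin 2\omega)$.

```python
import numpy as np


def rotated_diagonal(gamma, angle):
    """Main diagonal of R Gamma R^T for a 2 x 2 correlation matrix and a plane rotation."""
    rotation = np.array([[np.cos(angle), -np.sin(angle)],
                         [np.sin(angle), np.cos(angle)]])
    corr = np.array([[1.0, gamma],
                     [gamma, 1.0]])
    return np.diag(rotation @ corr @ rotation.T)
```

With $\gamma = 0.5$ and $\omega = \pi/4$ the diagonal becomes $(0.5, 1.5)$, so the rotated matrix is no longer a correlation matrix, whereas with $\gamma = 0$ the diagonal stays at $(1, 1)$ for every angle, which is the unidentified case.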

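In conclusion, the structure assumed for the latent traits distribution is:

$$\begin{aligned}
\theta_{.j1} \mid \eta_{\theta_1} &\sim SNCP_D\left(0, \Psi_{\theta_1}, \delta_{\theta_1}\right), \\
\theta_{.jk} \mid \eta_{\theta_k} &\sim SNCP_D\left(\mu_{\theta_k}, \Psi_{\theta_k}, \delta_{\theta_k}\right), \quad k = 2, \ldots, K.
\end{aligned}$$

Two situations, as mentioned before, are considered for the covariance matrices of the latent traits: 1) $\Psi_{\theta_1} = I_D$ and $\Psi_{\theta_k} = \mathrm{diag}(\psi_{\theta_k 1}, \ldots, \psi_{\theta_k D})$, $k = 2, \ldots, K$; and 2), with $k = 2, \ldots, K$,

$$\Psi_{\theta_1} = \begin{pmatrix}
1 & \psi_{\theta_1 12} & \cdots & \psi_{\theta_1 1D} \\
\psi_{\theta_1 12} & 1 & \cdots & \psi_{\theta_1 2D} \\
\vdots & \vdots & \ddots & \vdots \\
\psi_{\theta_1 1D} & \psi_{\theta_1 2D} & \cdots & 1
\end{pmatrix}; \qquad
\Psi_{\theta_k} = \begin{pmatrix}
\psi_{\theta_k 1} & \psi_{\theta_k 12} & \cdots & \psi_{\theta_k 1D} \\
\psi_{\theta_k 12} & \psi_{\theta_k 2} & \cdots & \psi_{\theta_k 2D} \\
\vdots & \vdots & \ddots & \vdots \\
\psi_{\theta_k 1D} & \psi_{\theta_k 2D} & \cdots & \psi_{\theta_k D}
\end{pmatrix}.$$

While in situation 1) it is necessary to impose additional restrictions on the discrimination parameters of some items, besides the restrictions imposed on the latent traits distribution of the reference group, in situation 2) no further restrictions are necessary.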

3. Bayesian inference and Gibbs sampling algorithm

Regardless of the prior distributions adopted, the structure of the covariance matrices and the likelihood considered (original or augmented), the marginal posterior distributions of interest are not analytically obtainable. The use of MCMC algorithms, however, enables one to obtain numerical approximations. Some MCMC algorithms were compared in Padilla et al. (2017) according to the Effective Sample Size criterion, see Sahu (2002). In this work, we consider the algorithm selected in that work, which corresponds to the augmented data scheme proposed by Sahu (2002), combined with the convergence acceleration algorithm proposed by Gonzalez (2004). More details of this algorithm can be found in Appendix 8.

3.1. Augmented likelihood and prior and posterior distributions

The initial step, following Sahu (2002), is to define two sets of augmented variables, say $Z_{...} = (Z_{111}, \ldots, Z_{I_K n_K K})^{\top}$ and $U_{...} = (U_{111}, \ldots, U_{I_K n_K K})^{\top}$, such that

$$Z_{ijk} \overset{i.i.d.}{\sim} N\!\left(a_i^{\top}\theta_{.jk} - b_i, 1\right) \;\perp\; U_{ijk} \overset{i.i.d.}{\sim} \text{Bernoulli}(c_i), \quad \forall i, j, k,$$

where i.i.d. means that the variables are independent and identically distributed.

To handle incomplete block designs, an indicator variable is introduced that defines the set of administered items for each occasion and subject. This indicator variable is defined as follows:

$$I_{ijk} = \begin{cases} 1, & \text{item } i \text{ administered to subject } j \text{ of group } k, \\ 0, & \text{missing by design.} \end{cases}$$

Nonselective missing responses due to uncontrolled events, such as nonresponse or errors in recording data, are marked by another indicator, which is defined as

$$V_{ijk} = \begin{cases} 1, & \text{observed response of subject } j \text{ of group } k \text{ on item } i, \\ 0, & \text{otherwise.} \end{cases}$$

It is assumed that the missing data are missing at random (MAR), such that the distribution of patterns of missing data does not depend on the unobserved data. When the MAR assumption does not hold and the missing data cannot be ignored, a missing data model can be defined to model the pattern of missingness explicitly. In the case of MAR, the observed data can be used to make valid inferences about the model parameters.

To ease the notation, let the indicator matrix $I = (I_{111}, \ldots, I_{I_K n_K K})^{\top}$ represent both cases of missing data (it should not be confused with the identity matrix $I_{(\cdot)}$).

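Therefore, using the usual conditional independence assumptions, we have that the augmented likelihood is given by:

$$p(z_{...}, u_{...} \mid \theta_{...}, \zeta, y_{...}) \propto \prod_{k=1}^{K}\prod_{j=1}^{n_k}\prod_{i \mid I_{ijk}=1}\left\{ \exp\left\{-0.5\left(z_{ijk} - a_i^{\top}\theta_{.jk} + b_i\right)^2\right\} \times c_i^{u_{ijk}}(1 - c_i)^{1 - u_{ijk}}\,\mathbb{1}_{(z_{ijk}, u_{ijk}, y_{ijk})} \right\}, \tag{14}$$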

where $z_{...} = (z_{111}, \ldots, z_{I_K n_K K})^{\top}$ and $u_{...} = (u_{111}, \ldots, u_{I_K n_K K})^{\top}$. Here $\mathbb{1}_{(z_{ijk}, u_{ijk}, y_{ijk})}$ stands for the indicator function representing the sample space defined in Sahu (2002). That is, if $y_{ijk} = 0$, we have that $u_{ijk} = 0$ and $z_{ijk}$ must be negative, that is, $Z_{ijk} \sim N(a_i^{\top}\theta_{.jk} - b_i, 1)\mathbb{1}_{(z_{ijk} < 0)}$. If $y_{ijk} = 1$ and $u_{ijk} = 0$, then $Z_{ijk} \sim N(a_i^{\top}\theta_{.jk} - b_i, 1)\mathbb{1}_{(z_{ijk} \geq 0)}$; otherwise $Z_{ijk} \sim N(a_i^{\top}\theta_{.jk} - b_i, 1)$. When $y_{ijk} = 1$, once $z_{ijk}$ has been sampled, we check whether it is negative; if it is, we simply set $u_{ijk} = 1$. Otherwise, $u_{ijk}$ is drawn from a Bernoulli($c_i$).
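The sampling rules just described can be sketched for a single triple $(i, j, k)$. This is a minimal illustration, not the algorithm of Sahu (2002) or of the appendix: the function names are hypothetical, `eta` stands for $a_i^{\top}\theta_{.jk} - b_i$, and the truncated normal draws use the inverse-cdf method, which is an assumption of this sketch.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def _clip(p, eps=1e-12):
    """Keep a probability strictly inside (0, 1) so that inv_cdf is defined."""
    return min(max(p, eps), 1.0 - eps)


def draw_z_u(y, eta, c, u, rng):
    """One draw of (z_ijk, u_ijk) given y_ijk, eta, c_i and the current u_ijk."""
    p_neg = STD_NORMAL.cdf(-eta)   # P(Z < 0) when Z ~ N(eta, 1)
    v = rng.random()
    if y == 0:
        # y = 0 forces u = 0 and a draw truncated to the negative half-line.
        z = min(eta + STD_NORMAL.inv_cdf(_clip(v * p_neg)), 0.0)
        return z, 0
    if u == 0:
        # y = 1 and u = 0: draw truncated to the non-negative half-line.
        z = max(eta + STD_NORMAL.inv_cdf(_clip(p_neg + v * (1.0 - p_neg))), 0.0)
    else:
        # y = 1 and u = 1: untruncated normal draw.
        z = rng.gauss(eta, 1.0)
    if z < 0:
        return z, 1                # a negative z with y = 1 is only compatible with u = 1
    return z, int(rng.random() < c)
```

In a full sampler this step would be applied to every administered and observed response, that is, to every triple with $I_{ijk} = 1$, before updating the latent traits and the item parameters.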

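The joint prior distribution of the parameters is given by $p(\theta, \zeta, \eta) = p(\theta \mid \eta_\theta)\,p(\eta_\theta \mid \eta_\eta)\,p(\zeta \mid \eta_\zeta)$, where $\eta_\eta$ and $\eta_\zeta$ are the hyperparameters associated with $\eta_\theta$ and $\zeta$, respectively.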

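The prior distribution of the latent traits will be considered through the stochastic representation given by (7), that is:

$$\begin{aligned}
p(\theta_{...}, t_{..} \mid \eta_\theta) &= p(\theta_{...} \mid t_{..}, \eta_\theta)\,p(t_{..}) = \prod_{k=1}^{K}\prod_{j=1}^{n_k} p(\theta_{.jk} \mid t_{jk}, \eta_{\theta_k})\,p(t_{jk}) \\
&\propto \prod_{k=1}^{K}\prod_{j=1}^{n_k}\left\{ \exp\left\{-0.5\left(\theta_{.jk} - \mu^{*}_{\theta_{jk}}\right)^{\top}\left(\Psi^{*}_{\theta_k}\right)^{-1}\left(\theta_{.jk} - \mu^{*}_{\theta_{jk}}\right)\right\}\mathbb{1}_{\mathbb{R}^D}(\theta_{.jk}) \times \exp\left\{-\frac{t_{jk}^2}{2}\right\}\mathbb{1}_{(0,\infty)}(t_{jk}) \right\},
\end{aligned}$$

where $t_{..} = (t_{11}, \ldots, t_{n_K K})^{\top}$, $\mu^{*}_{\theta_{jk}} = \alpha_{\theta_k} + \Sigma_{\theta_k}^{\top}\delta_{\theta_k}t_{jk}$, $\Psi^{*}_{\theta_k} = \Sigma_{\theta_k}^{\top}\left(I - \delta_{\theta_k}\delta_{\theta_k}^{\top}\right)\Sigma_{\theta_k}$, $\alpha_{\theta_k} = \mu_{\theta_k} - \Psi_{\theta_k}^{1/2\top}\left(\Sigma_{Z_k}^{1/2\top}\right)^{-1}\mu_{Z_k}$ and $\Sigma_{\theta_k}^{\top} = \Psi_{\theta_k}^{1/2\top}\left(\Sigma_{Z_k}^{1/2\top}\right)^{-1}$. Then, the joint prior distribution for $(\theta_{...}^{\top}, t_{..}^{\top}, \zeta^{\top}, \eta_\theta^{\top})^{\top}$ assumed here is

$$\begin{aligned}
p(\theta_{...}, t_{..}, \zeta, \eta_\theta) &\propto p(\theta_{...} \mid t_{..}, \eta_\theta)\,p(t_{..})\,p(\zeta \mid \eta_\zeta)\,p(\eta_\theta) \\
&= \prod_{k=1}^{K}\prod_{j=1}^{n_k}\left\{p(\theta_{.jk} \mid t_{jk}, \eta_{\theta_k})\,p(t_{jk})\right\}\prod_{k=1}^{K} p(\eta_{\theta_k})\prod_{i=1}^{I}\left\{p(\zeta_i \mid \eta_\zeta)\right\} \\
&= \prod_{k=1}^{K}\prod_{j=1}^{n_k}\left\{p(\theta_{.jk} \mid t_{jk}, \eta_{\theta_k})\,p(t_{jk})\right\}\prod_{k=1}^{K}\left\{p(\mu_{\theta_k})\,p(\Psi_{\theta_k})\,p(\delta_{\theta_k})\right\} \times \prod_{i=1}^{I}\left\{p(a_i, b_i)\,p(c_i)\right\},
\end{aligned} \tag{15}$$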

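where $\eta_\zeta$ are the hyperparameters associated with the vector $\zeta$. We assume that $\mu_{\theta_k} \overset{i.i.d.}{\sim} N_D(\mu_\mu, \Psi_\mu)$ and $\Psi_{\theta_k} \overset{i.i.d.}{\sim} IW(\tau, \Psi_\Psi)$, $k = 1, 2, \dots, K$, where $IW(\tau, \Psi_\Psi)$ stands for an Inverse-Wishart distribution with degrees of freedom $\tau$ and dispersion matrix $\Psi_\Psi$. The priors for the asymmetry vectors are based on the beta distribution. Let $\delta_{d\theta_k}$ denote the element in the $d$th position of the vector $\delta_{\theta_k}$ and let $\delta_{(-d)\theta_k}$ denote the vector $\delta_{\theta_k}$ with the element in position $d$ removed. Then

$$
p(\delta_{d\theta_k} \mid \alpha_{\delta_1}, \alpha_{\delta_2}) \propto \left( \sqrt{1 - \delta_{(-d)\theta_k}^\top \delta_{(-d)\theta_k}} + \delta_{d\theta_k} \right)^{\alpha_{\delta_1} - 1} \times \left( \sqrt{1 - \delta_{(-d)\theta_k}^\top \delta_{(-d)\theta_k}} - \delta_{d\theta_k} \right)^{\alpha_{\delta_2} - 1} \mathbb{1}_{(\delta_{d\theta_k} \in A_{\delta_{dk}})}, \tag{16}
$$

where $(\alpha_{\delta_1}, \alpha_{\delta_2})$ is a set of hyperparameters and $A_{\delta_{dk}} = (-1, 1)$, for $d = 1, \dots, D$ and $k = 1, \dots, K$. The prior chosen for the item parameter vector $\zeta_i$ is $\zeta_i = (a_i, b_i)^\top \overset{i.i.d.}{\sim} N_D(\mu_{\zeta_i}, \Psi_{\zeta_i})\, \mathbb{1}_{A_{a_i}}(a_i)\, \mathbb{1}_{(-\infty,\infty)}(b_i)$, where $\mathbb{1}_{A_{a_i}}(a_i)$ and $\mathbb{1}_{(-\infty,\infty)}(b_i)$ are the indicator functions associated with the item parameters $a_i$ and $b_i$, respectively, and $A_{a_i}$ is an appropriate set, for example, $A_{a_i} = (\mathbb{R}^+)^D$ or $A_{a_i} = \mathbb{R}^D$, depending on the situation (see Subsection 2.3), and $c_i \overset{i.i.d.}{\sim} \text{beta}(\kappa_1, \kappa_2)$, for $i = 1, \dots, I$. Therefore, from (14) and (15), the joint posterior distribution is given in equation (17).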

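$$
\begin{aligned}
p(z_{\dots}, u_{\dots}, \theta_{\dots}, \zeta, \eta_\theta \mid y_{\dots})
&\propto \prod_{k=1}^{K} \prod_{j=1}^{n_k} \prod_{i \mid I_{ijk}=1} \left\{ \exp\left\{ -0.5 \left( z_{ijk} - a_i^\top \theta_{.jk} + b_i \right)^2 \right\} \times c_i^{u_{ijk}} (1 - c_i)^{1 - u_{ijk}} \, \mathbb{1}_{(z_{ijk}, u_{ijk}, y_{ijk})} \right\} \\
&\quad \times \prod_{k=1}^{K} \prod_{j=1}^{n_k} \left\{ \exp\left\{ -0.5 \left( \theta_{.jk} - \mu^*_{\theta_{jk}} \right)^\top \left( \Psi^*_{\theta_k} \right)^{-1} \left( \theta_{.jk} - \mu^*_{\theta_{jk}} \right) \right\} \mathbb{1}_{\mathbb{R}^D}(\theta_{.jk}) \times \exp\left\{ -\frac{t_{jk}^2}{2} \right\} \mathbb{1}_{(0,\infty)}(t_{jk}) \right\} \\
&\quad \times \prod_{k=1}^{K} \exp\left\{ -0.5 \left( \mu_{\theta_k} - \mu_\mu \right)^\top \Psi_\mu^{-1} \left( \mu_{\theta_k} - \mu_\mu \right) \right\} \mathbb{1}_{\mathbb{R}^D}(\mu_{\theta_k}) \\
&\quad \times \prod_{k=1}^{K} |\Psi_{\theta_k}|^{-\frac{\tau + D + 1}{2}} \exp\left\{ -\frac{1}{2} \mathrm{tr}\left( \Psi_\Psi \Psi_{\theta_k}^{-1} \right) \right\} \\
&\quad \times \prod_{i=1}^{I} \exp\left\{ -0.5 \left( a_i - \mu_a \right)^\top \Psi_a^{-1} \left( a_i - \mu_a \right) \right\} \mathbb{1}_{A_{a_i}}(a_i) \\
&\quad \times \prod_{i=1}^{I} \exp\left\{ -\frac{(b_i - \mu_b)^2}{2 \psi_b} \right\} \mathbb{1}_{(-\infty,\infty)}(b_i) \\
&\quad \times \prod_{i=1}^{I} c_i^{\kappa_1 - 1} (1 - c_i)^{\kappa_2 - 1} \mathbb{1}_{(0,1)}(c_i).
\end{aligned} \tag{17}
$$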

This posterior distribution has an intractable form, and it is not possible to obtain the marginal posterior distributions analytically. Some full conditional distributions, however, are either known, and thus easily sampled from, or can be sampled using an auxiliary algorithm such as Metropolis-Hastings, see Gamerman & Lopes (2006). Here, we use Metropolis-Hastings when the full conditional distribution is unknown. Therefore, our algorithm is a Metropolis-Hastings within Gibbs sampling scheme, as in Patz & Junker (1999). The technical details about the full conditional distributions and the MCMC steps can be found in Appendix 8. To develop our algorithm, we need to define a kernel density for the parameters $\Psi_{\theta_k}$ and $\delta_{\theta_k}$, that is:

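$$
q\left( \Psi_{\theta_k}^{(t-1)}, \Psi_{\theta_k} \right) \sim IW\left( 2D; \Psi_{\theta_k}^{(t-1)} \right), \qquad q\left( \delta_{\theta_k}^{(t-1)}, \delta_{\theta_k} \right) \sim U\left( g_1(\delta_{\theta_k}^{(t-1)}), g_2(\delta_{\theta_k}^{(t-1)}) \right),
$$

where

$$
g_1(\delta_{\theta_k}^{(t-1)}) = \max\left\{ -\sqrt{1 - \delta_{(-d)k}^{(t-1)\top} \delta_{(-d)k}^{(t-1)}},\; \delta_{dk}^{(t-1)} - 0.01 \right\}, \qquad g_2(\delta_{\theta_k}^{(t-1)}) = \min\left\{ \sqrt{1 - \delta_{(-d)k}^{(t-1)\top} \delta_{(-d)k}^{(t-1)}},\; \delta_{dk}^{(t-1)} + 0.01 \right\},
$$

and $\delta_{(-d)k}$ stands for the vector $\delta_{\theta_k}$ after removing the element in position $d$.

Let (.) denote the set of all necessary parameters. The Metropolis-Hastings within Gibbs sampling algorithm is defined as follows, where GS indicates that the full conditional distribution is known and can be simulated directly, and MH indicates that this distribution is not known and is simulated by using the Metropolis-Hastings algorithm: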

1. Start the algorithm by choosing suitable initial values.

Repeat steps 2–11:

2. Simulate $U_{ijk}$ from $U_{ijk} \mid (.)$, $i = 1, \dots, I_k$, $j = 1, \dots, n_k$, $k = 1, \dots, K$ (GS).

3. Simulate $Z_{ijk}$ from $Z_{ijk} \mid (.)$, $i = 1, \dots, I_k$, $j = 1, \dots, n_k$, $k = 1, \dots, K$ (GS).

4. Simulate $T_{jk}$ from $T_{jk} \mid (.)$, $j = 1, \dots, n_k$, $k = 1, \dots, K$ (GS).

5. Simulate $\theta_{.jk}$ from $\theta_{.jk} \mid (.)$, $j = 1, \dots, n_k$, $k = 1, \dots, K$ (GS).

6. Simulate $\mu_{\theta_k}$ from $\mu_{\theta_k} \mid (.)$, $k = 1, \dots, K$ (GS).

7. Simulate $\Psi_{\theta_k}$ from $\Psi_{\theta_k} \mid (.)$, $k = 1, \dots, K$ (MH).

8. Simulate $\delta_{\theta_k}$ from $\delta_{\theta_k} \mid (.)$, $k = 1, \dots, K$ (MH).

9. Simulate $(a_i^\top, b_i)^\top$ from $(a_i^\top, b_i)^\top \mid (.)$, $i = 1, \dots, I$ (GS).

10. Use the convergence acceleration algorithm of Gonzalez (2004) to update the values $(a_i^\top, b_i)^\top$, $i = 1, \dots, I$ (GS).

11. Simulate $c_i$ from $c_i \mid (.)$, $i = 1, \dots, I$ (GS).
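As an illustration of the proposal used in step 8, the bounds $g_1$ and $g_2$ of the uniform kernel can be computed componentwise. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, the step size 0.01 is the value quoted in the kernel definition, and only the proposal draw is shown (the acceptance step is omitted).

```python
import math
import random


def delta_proposal_bounds(delta, d, step=0.01):
    """Return (g1, g2) for component d of the current vector delta."""
    # Squared norm of delta with component d removed.
    rest = sum(x * x for i, x in enumerate(delta) if i != d)
    # Largest absolute value component d may take while delta'delta <= 1.
    radius = math.sqrt(max(0.0, 1.0 - rest))
    g1 = max(-radius, delta[d] - step)
    g2 = min(radius, delta[d] + step)
    return g1, g2


def propose_delta_component(delta, d, rng, step=0.01):
    """Draw a candidate for component d from U(g1, g2), leaving the rest fixed."""
    g1, g2 = delta_proposal_bounds(delta, d, step)
    proposal = list(delta)
    proposal[d] = rng.uniform(g1, g2)
    return proposal
```

Because the interval $(g_1, g_2)$ depends on the current value, the window is asymmetric near the boundary of the parameter space, so an acceptance ratio would in general need to include the proposal densities $1/(g_2 - g_1)$ evaluated in both directions.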

Note that the MH algorithm is needed only to simulate $\Psi_{\theta_k}$ and $\delta_{\theta_k}$, which are components of $\eta_{\theta_k}$, $k = 1, \dots, K$ (the population parameters). Simulation studies (not presented here) indicated that the above MCMC algorithm converges with a burn-in of 5,000, a thinning interval of 50 and a total number of simulations of 55,000. This produces a valid MCMC sample of size (55,000 − 5,000)/50 = 1,000. In addition, in those studies, many scenarios of interest, related to the number of subjects, test size, number of dimensions of the test, underlying latent trait distribution and number of groups, were considered. In all scenarios, all parameters were properly recovered.

3.2. Model fit assessment and model comparison: posterior predictive checking and statistics of model comparison

Besides using model selection criteria for selecting the best model, as in Azevedo et al. (2012), in our case, concerning the test dimensionality, the latent trait distribution and the item response function (probit model), the fit of the general MIRT model can be evaluated using Bayesian posterior predictive tests and/or appropriate plots based on the observed and the replicated data (see Sinharay et al., 2006). The literature about posterior predictive checks for Bayesian item response models shows several diagnostics for evaluating the model fit. A general discussion can be found in, among others, Stern & Sinharay (2005), Sinharay (2006), and Fox (2004, 2005, 2010). Examples where these techniques were successfully applied are Beguin & Glas (2001), Sheng & Wikle (2007), Fragoso & Curi (2013), Azevedo et al. (2012), Santos et al. (2013), Azevedo et al. (2015) and Azevedo et al. (2011).

The usual posterior predictive tests and plots can be generalized to make them applicable to the MIRT model. Each posterior predictive test is based on a discrepancy measure, defined in such a way that a specific assumption or the general fit of the model can be evaluated. The main idea is to generalize the well-known discrepancy measures to a multidimensional multiple group structure. The plots, on the other hand, generally display a comparison between the predicted and observed data. These procedures can be carried out at the population level (general fit), as in Sinharay et al. (2006), per group, as in Santos et al. (2013) and Azevedo et al. (2012), or per item, as in Sinharay (2006) and Azevedo et al. (2012a). An example where the model fit was considered for each one of these three levels can be found in Santos et al. (2013).

In general, let $y^{obs}_{(\cdot)}$ be the matrix of observed responses, and $y^{rep}_{(\cdot)}$ the matrix of replicated responses generated from its posterior predictive distribution, where $(\cdot)$ represents a convenient index, employed whenever necessary. The posterior predictive distribution of the response data of group $k$ is represented by

$$
p\left( y^{rep}_k \mid y^{obs}_k \right) = \int p\left( y^{rep}_k \mid \vartheta_k \right) p\left( \vartheta_k \mid y^{obs}_k \right) d\vartheta_k,
$$

where $\vartheta_k$ denotes the set of model parameters related to group $k$. From $y^{(\cdot)}_k$, $y^{(\cdot)}_{ilk}$ is available, which is the response to item $i$ of the subjects belonging to group $k$ with a score $l$. One approach commonly employed is to plot an appropriate comparison between the replicated and observed data, for example, the predicted and observed score distributions (at population and/or group level) and the predicted and observed proportions of correct answers for each item. Another approach is, given a discrepancy measure $D\left( y_{(\cdot)}, \vartheta_{(\cdot)} \right)$, to use the replicated data to evaluate whether the discrepancy value given the observed data is typical under the model. A p-value can be defined in order to quantify the extremeness of the observed discrepancy value,

$$
p_0\left( y^{(obs)}_{(\cdot)} \right) = P\left( D\left( y^{(rep)}_{(\cdot)}, \vartheta_{(\cdot)} \right) \geq D\left( y^{(obs)}_{(\cdot)}, \vartheta_{(\cdot)} \right) \mid y^{(obs)}_{(\cdot)} \right),
$$

where the probability is taken over the joint posterior of $(y^{(rep)}_{(\cdot)}, \vartheta_{(\cdot)})$. The discrepancy measure can be defined at the population, group or item level. Here, p-values based on a chi-square distance and predicted distributions of scores are considered (see Fox, 2004, 2010; Azevedo et al., 2012). The chi-square posterior predictive check compares the predicted score distribution with the observed score distribution. The discrepancy considered here is a slight modification of the one presented in Sinharay (2006). At the item, group and population levels, respectively, it is given by

$$
D_i(y) = \sum_{l} \sum_{k} \frac{\left( n_{ilk} - E(N_{ilk}) \right)^2}{E(N_{ilk})}, \qquad
D_k(y) = \sum_{l} \sum_{i} \frac{\left( n_{ilk} - E(N_{ilk}) \right)^2}{E(N_{ilk})}, \qquad
D(y) = \sum_{l} \sum_{k} \sum_{i} \frac{\left( n_{ilk} - E(N_{ilk}) \right)^2}{E(N_{ilk})},
$$

where $N_{ilk}$ is the number of subjects with a score $l$ in group $k$ who answer item $i$ correctly, $n_{ilk}$ is its observed value, and $E(.)$ stands for the respective expectation (which is calculated using the posterior predictive distribution). The posterior predictive check is evaluated using the MCMC output. Naturally, when an item is not presented to a given group and/or a given subject, the related quantities are skipped in the calculations above through the matrix $I$ defined earlier.
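The chi-square discrepancy and the associated posterior predictive p-value can be approximated from MCMC output. The sketch below is only illustrative and rests on assumptions: counts are passed as flat lists over the cells being summed (for example, all $(l, k)$ pairs for one item), expected counts are supplied per MCMC draw (a single posterior predictive mean can be repeated for every draw), and the function names are hypothetical.

```python
def chi_square_discrepancy(counts, expected):
    """Sum of (n - E)^2 / E over cells with a positive expected count."""
    return sum((n - e) ** 2 / e for n, e in zip(counts, expected) if e > 0)


def posterior_predictive_p_value(observed, replicated, expected):
    """Share of MCMC draws in which D(y_rep, .) >= D(y_obs, .).

    observed   : observed counts, one value per cell
    replicated : replicated counts, one list of cells per MCMC draw
    expected   : expected counts, one list of cells per MCMC draw
    """
    exceed = 0
    for rep_counts, exp_counts in zip(replicated, expected):
        d_rep = chi_square_discrepancy(rep_counts, exp_counts)
        d_obs = chi_square_discrepancy(observed, exp_counts)
        exceed += d_rep >= d_obs
    return exceed / len(replicated)
```

Values of the p-value close to 0 or 1 would suggest that the observed discrepancy is atypical under the fitted model.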

The predicted score distribution is easily calculated using the MCMC output. In each iteration, a sample of the score distribution is obtained. This is accomplished by generating response data from the sampled parameters according to the model. Subsequently, the number of subjects can be calculated for each possible score in each group. For each possible score, the median and the 95% equi-tailed credibility interval are calculated to evaluate the score distribution. For the proportion of correct responses to each item, a similar procedure is used: the number of subjects who correctly answer the item is calculated at each possible score.
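The summary of the predicted score distribution described above can be sketched as follows. This is a hedged illustration: `replicated_scores` is a hypothetical input holding, for each MCMC draw, the total scores of the simulated subjects, and the interval is taken from order statistics without interpolation, which is one of several reasonable conventions.

```python
from statistics import median


def equal_tailed_interval(values, level=0.95):
    """Empirical equi-tailed interval based on order statistics."""
    ordered = sorted(values)
    m = len(ordered)
    lower_index = int((1 - level) / 2 * (m - 1))
    upper_index = (m - 1) - lower_index
    return ordered[lower_index], ordered[upper_index]


def summarize_score_frequencies(replicated_scores, max_score):
    """Median and 95% interval of the frequency of each score across draws."""
    per_draw = []
    for scores in replicated_scores:
        freq = [0] * (max_score + 1)
        for s in scores:
            freq[s] += 1  # number of simulated subjects with total score s
        per_draw.append(freq)
    summary = []
    for score in range(max_score + 1):
        column = [freq[score] for freq in per_draw]
        low, high = equal_tailed_interval(column)
        summary.append((score, median(column), low, high))
    return summary
```

The resulting tuples `(score, median, lower, upper)` can then be plotted against the observed score frequencies, per group or for the full sample.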

Another posterior predictive checking technique is proposed here to determine the test dimensionality. It is known that the tetrachoric correlation matrix can be helpful to determine the test dimensionality, see Reckase (2009). The idea is to compare the eigenvalues of the tetrachoric correlation matrix of the replicated data with those obtained from the observed data. For example, if we have three competing MIRT models, that is, one-, two- and three-dimensional models, the model that produces predicted eigenvalues most similar to those obtained from the observed data should be the most appropriate model. In other words, the test dimensionality should correspond to the model with the predicted eigenvalues closest to the observed eigenvalues. The underlying idea is similar to those presented above, that is, in each MCMC iteration we calculate the eigenvalues of the tetrachoric correlation matrix of the replicated data. Then, we have the posterior distribution of the eigenvalues, which can be used to calculate the posterior mean or median and the respective HPD intervals.
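The final comparison step can be made concrete once the eigenvalues have been computed. The sketch below assumes the predicted eigenvalues per MCMC draw are already available for each candidate model, and it uses the squared Euclidean distance between posterior mean and observed eigenvalues as the closeness measure. That distance is an assumption made here for illustration; the text compares the eigenvalues graphically and does not prescribe a numerical criterion. All names and numbers are hypothetical.

```python
def mean_eigenvalues(draws):
    """Average each ordered eigenvalue over the MCMC draws."""
    m = len(draws)
    return [sum(draw[r] for draw in draws) / m for r in range(len(draws[0]))]


def closest_model(observed, predicted_by_model):
    """Return the label of the model whose mean predicted eigenvalues are nearest."""
    def distance(label):
        means = mean_eigenvalues(predicted_by_model[label])
        return sum((p - o) ** 2 for p, o in zip(means, observed))
    return min(predicted_by_model, key=distance)


# Hypothetical eigenvalues (largest three only), two draws per model.
observed = [9.0, 3.0, 1.0]
predicted = {
    "1-dim": [[9.5, 1.2, 1.0], [9.3, 1.0, 1.1]],
    "2-dim": [[9.1, 2.9, 1.0], [8.9, 3.1, 1.0]],
    "3-dim": [[8.0, 3.0, 2.0], [8.2, 3.2, 1.8]],
}
best = closest_model(observed, predicted)
```

In practice the posterior intervals of each eigenvalue would also be inspected, since a single distance ignores the uncertainty in the predicted values.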

3.3. Statistics of model comparison

In this work, we considered the model comparison statistics presented in Spiegelhalter et al. (2002) and successfully used by Santos et al. (2013), Azevedo et al. (2011) and Bazan et al. (2006), which are based on the concept of Bayesian deviance, see Dempster et al. (1977). The deviance, say $D(\vartheta)$, is given by

$$
D(\vartheta) = -2 \ln\left[ L(\theta_{\dots}, \zeta)\, p(\theta \mid \eta_\theta) \right],
$$

where $L(\theta_{\dots}, \zeta) = \prod_{k=1}^{K} \prod_{j=1}^{n_k} \prod_{i \mid I_{ijk}=1} P_{ijk}^{y_{ijk}} (1 - P_{ijk})^{1 - y_{ijk}}$ is the original likelihood, $p(\theta \mid \eta_\theta) = \prod_{k=1}^{K} \prod_{j=1}^{n_k} p(\theta_{.jk} \mid \eta_{\theta_k})$ and $p(\theta_{.jk} \mid \eta_{\theta_k})$ is as in (5). Also, let $\vartheta^{(m)}$, $m = 1, \dots, M$, be the $m$-th value of the valid simulated MCMC sample.

The $\rho_D$ statistic (also named effective number of parameters) is given by $\rho_D = \overline{D}(\vartheta) - D(\overline{\vartheta})$, where $\overline{\vartheta}$ is the vector with the posterior expectation of each parameter, based on the valid MCMC sample, and $\overline{D}(\vartheta) = \frac{1}{M} \sum_{m=1}^{M} D\left( \vartheta^{(m)} \right)$. In addition, the deviance information criterion (DIC) is given by $DIC = D(\overline{\vartheta}) + 2\rho_D$. The posterior expectations of the Akaike information criterion (EAIC) and of the Bayesian information criterion (EBIC) are, respectively, defined as $EAIC = \overline{D}(\vartheta) + 2p$ and $EBIC = \overline{D}(\vartheta) + p \ln(N)$, where $p$ is the number of parameters and $N$ is the number of observations, that is, $N = \sum_{k=1}^{K} \sum_{j=1}^{n_k} \sum_{i=1}^{I} I_{ijk}$. Here we follow the suggestions of Spiegelhalter et al. (2002) by taking $p = \rho_D$.
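These statistics are simple functions of the deviance evaluated along the chain. The following minimal sketch assumes that the deviance at each retained draw and the deviance at the posterior mean have already been computed; the function and argument names are hypothetical.

```python
import math


def model_comparison(deviance_draws, deviance_at_mean, n_obs, p=None):
    """Compute rho_D, DIC, EAIC and EBIC from MCMC deviance values.

    deviance_draws   : D(theta^(m)) for each retained MCMC draw
    deviance_at_mean : D evaluated at the posterior mean of the parameters
    n_obs            : number of observed responses N
    p                : number of parameters; defaults to rho_D as in the text
    """
    d_bar = sum(deviance_draws) / len(deviance_draws)  # posterior mean deviance
    rho_d = d_bar - deviance_at_mean
    if p is None:
        p = rho_d
    return {
        "rho_D": rho_d,
        "DIC": deviance_at_mean + 2 * rho_d,
        "EAIC": d_bar + 2 * p,
        "EBIC": d_bar + p * math.log(n_obs),
    }
```

For all three criteria, smaller values indicate the preferred model among the candidates fitted to the same data.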

4. Real data analysis

We analyzed a part of the 2013 first stage of the admission exam of the University of Campinas (see https://www.comvest.unicamp.br/, in Portuguese). The test is composed of 48 multiple-choice items. Item 43 was excluded, since it presented a negative biserial correlation, but the original numbering was preserved. We selected a sample of 3,000 candidates (among those who answered all items on the test) spread uniformly over six areas (Arts, Biological and Health Sciences, Exact and Earth Sciences, Humanities, Medicine and Technological), which correspond to groups 1, 2, 3, 4, 5 and 6, respectively. Therefore, we have six groups with $n_k = 500$, $k = 1, \dots, 6$, submitted to the same 47-item test in a complete design (every subject answered all items of the test). Each item on the test is supposed to be related, by design, to at least one of the following fields: Biology, Chemistry, Geography, Geology, History, Mathematics, Physics and Sociology. In Table 1, we present a classification of each item concerning these fields. Table 1 also presents the posterior estimates for the parameters $a_i$ for each item. Further details regarding the estimates of the parameter vector $\zeta_i$ are presented in Figure 6.

Table 1: Comvest dataset. Questions skills of the exam.

Item  Skill        ai1    ai2      Item  Skill        ai1    ai2
Q1    Philosophy   .512   .138     Q25   Chemistry    .877   .941
Q2    History      .554   .226     Q26   Chemistry    .728   .605
Q3    History      .772   .258     Q27   Chemistry    1.032  1.056
Q4    History      .996   .407     Q28   Chemistry    1.019  1.011
Q5    History      .797   .332     Q29   Chemistry    .986   1.240
Q6    History      1.258  .546     Q30   Chemistry    1.684  1.452
Q7    History      .956   .519     Q31   Physics      .715   .939
Q8    History      1.217  .491     Q32   Physics      .690   .914
Q9    History      .910   .463     Q33   Physics      .937   1.159
Q10   Geography    1.205  1.198    Q34   Physics      .716   1.058
Q11   Geography    .159   .104     Q35   Physics      .918   .772
Q12   Geography    .590   .274     Q36   Physics      1.132  1.633
Q13   Geography    .479   .301     Q37   Mathematics  .409   .586
Q14   Geography    .608   .430     Q38   Mathematics  .972   1.398
Q15   Geography    .513   .433     Q39   Mathematics  .484   .815
Q16   Geography    .959   .620     Q40   Mathematics  .861   1.164
Q17   Sociology    .707   .201     Q41   Mathematics  1.468  1.901
Q18   Geography    .926   1.086    Q42   Mathematics  1.312  1.783
Q19   Biology      1.476  1.061    Q44   Mathematics  1.040  1.623
Q20   Biology      .604   .352     Q45   Mathematics  .643   1.374
Q21   Biology      .812   .438     Q46   Mathematics  .936   1.252
Q22   Biology      1.463  1.296    Q47   Mathematics  .827   1.013
Q23   Biology      .544   .382     Q48   Mathematics  .968   1.541
Q24   Biology      .162   .038

[Figure 2. Left panel, "Full sample": eigenvalue (vertical axis, 0 to 10) against dimension (horizontal axis, 1 to 5) for the models 1-MSNCP, 2-MSNCP and 3-MSNCP and for the observed data. Right panel, "Full Sample": frequency (vertical axis, 0 to 250) against score (horizontal axis, 0 to 50), showing the expected score, the credibility interval and the observed score.]

Figure 2: Comvest dataset. Full sample tetrachoric eigenvalues and predictive scores.

The goal is to analyze the items and the subjects, identifying the test dimensionality and the differences among the groups, and to select the most suitable group-specific multivariate latent trait distribution (either multivariate symmetric normal or MSNCP). The reference group considered here is the second one.

We fitted a total of six models, varying according to the latent trait distribution (normal or skew normal under the centered parameterization) and the number of dimensions (1, 2 or 3) of the test. The values of the hyperparameters, when pertinent, were the same for all models, as were the values of the parameters related to the kernel densities. We considered $\mu_{\zeta_i} = (0.5, \dots, 0.5, 0)^\top$ and $\Psi_{\zeta_i} = \text{diag}(0.5, \dots, 0.5, 2)$; $\kappa_1 = 100$ and $\kappa_2 = 300$; $\alpha_{\delta_1} = 0.5$ and $\alpha_{\delta_2} = 0.5$; $\tau \to 0$ and $|\Psi_\Psi| \to 0$; $\mu_\mu = 0$ and $\Psi_\mu = (5, \dots, 5)^\top$. We observed that the two-dimensional model with multivariate centered skew normal latent traits (2-MSNCP) is selected by EDIC and EAIC, whereas EBIC selected the one-dimensional model with univariate centered skew normal latent traits (1-MSNCP). While the asymmetry of at least one of the latent trait distributions is detected, the test dimensionality varies between one and two. Figures 2 and 3 display the eigenvalues of the tetrachoric correlation matrices for the observed response matrices associated with the whole sample and with each group, respectively. These figures also display the eigenvalues of the tetrachoric correlation matrices for response matrices simulated according to the fitted asymmetric models of dimensions 1, 2 and 3.

[Figure 3. Six panels, "Group 1" to "Group 6": eigenvalue (vertical axis, 0 to 10; 0 to 15 for Group 4) against dimension (horizontal axis, 1 to 5), with legend entries 1-PM3CSN, 2-PM3CSN, 3-PM3CSN and observed.]

Figure 3: Comvest dataset. Eigenvalues tetrachoric correlation matrix.

From Figures 2 and 3, we can see that the eigenvalues of the tetrachoric correlation matrix predicted from model 2-MSNCP are the closest to those obtained from the observed (tetrachoric) correlation matrix, compared with the other models. Therefore, we choose the model 2-MSNCP. Figures 2 and 4 present the observed and predicted score distributions, with the 95% credibility intervals, at population level and group level, respectively. We can see that the model, in both cases, fits the data well. The results of the item-fit analysis, based on the p-value for the chi-square distance, are shown in Figure 10. We also computed the p-values at group level, which ranged from 0.12 to 0.95, indicating that the model fitted almost all of the items well. Figure 5 presents the item fit plots for some selected items, chosen according to whether or not they were well fitted by the model. In conclusion, we can say that the model is properly fitted to almost all of the items.

[Figure 4. Six panels, "Group 1" to "Group 6": frequency (vertical axis, 0 to 50; 0 to 60 for Group 6) against score (horizontal axis, 0 to 50).]

Figure 4: Comvest dataset. Scores distribution per group.

In Tables 2 and 3, we present the estimates of the population parameters. We can notice that group 5 presents the largest population means (for the two dimensions), that the variances (for the two dimensions) are similar among the groups and that, for groups 1, 2 and 5, the latent trait distributions are skewed for at least one dimension. In Figures 7 and 8, we compare the empirical densities of the estimated $\theta_{j.}$ (for both the symmetric and the skew two-dimensional models) with their respective theoretical densities, that is, with either the multivariate normal density or the MSNCP density with parameters equal to those presented in Tables 2 and 3. From Figures 7 and 8, we can see that the empirical densities are very similar to the theoretical ones. In Figure 9, we present the box-plots of the posterior distributions of the asymmetry parameters for each group. There we can notice that the asymmetry for some groups can be considered different from 0.

From Figure 6, we can conclude that most of the items present reasonable discrimination power and that they are difficult for most of the groups (see the estimates of the population means in Table 2). Also, except for Item 22, all items present guessing estimates compatible with a pattern of random choice (for subjects with low latent trait level) for an item with four or five alternatives. Also, from Table 1, we can notice that the factor loadings show no pattern concerning the fields, since their magnitudes are, in general, very similar within each field. Even after rotating them, no pattern was observed. Therefore, it is not easy to provide interpretations for the two retained factors. Figure 6 presents the posterior expectations and the equi-tailed 95% credibility intervals for the multidimensional discrimination

$$
MULDISC = \sqrt{\sum_{d=1}^{D} a_{id}^2}
$$

and the multidimensional difficulty

$$
MULTDIF = \frac{b_i}{\sqrt{\sum_{d=1}^{D} a_{id}^2}},
$$

which have interpretations similar to those of their unidimensional versions. In Figure 10 we plot these estimates. See Reckase (2009) for further details.
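The two summaries can be computed directly from the item parameters. The sketch below assumes, following Reckase (2009) and the parameterization $a_i^\top \theta - b_i$ used here, that the multidimensional difficulty is $b_i$ divided by MULDISC, so that both quantities reduce to their unidimensional versions when $D = 1$. The function names and the value of `b` in the example are hypothetical (Table 1 reports only the discrimination estimates).

```python
import math


def muldisc(a):
    """Multidimensional discrimination: Euclidean norm of the loading vector."""
    return math.sqrt(sum(x * x for x in a))


def multdif(a, b):
    """Multidimensional difficulty: b scaled by the multidimensional discrimination."""
    return b / muldisc(a)


# Loadings of item Q1 from Table 1; b = 0.4 is a made-up value for illustration.
q1_disc = muldisc([0.512, 0.138])
q1_dif = multdif([0.512, 0.138], 0.4)
```

With these definitions, an item with a large MULDISC discriminates well along its direction of best measurement, and a positive MULTDIF indicates an item that is difficult relative to the reference group.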

[Figure 5. Eight panels, "Item 14", "Item 18", "Item 26", "Item 28", "Item 35", "Item 36", "Item 41" and "Item 48": proportion correct (vertical axis, 0.0 to 0.8) against raw score (horizontal axis, 0 to 40).]

Figure 5: Comvest dataset. Item fit plots.

[Figure 6. Six panels, "a1", "a2", "b", "c", "DISCM" and "DIFICM": estimate (vertical axis) against item (horizontal axis, 0 to 40).]

Figure 6: Comvest dataset. Posterior expectations and equi-tailed 95% credibility intervals of the item parameters.

[Figure 7. Density panels for each dimension and group (first panel titled "Dimension 1−Group 1"): density (vertical axis, 0.0 to 1.0) against latent trait (horizontal axis, −4 to 4), with legend entries SKD, SKED, SD and SED.]

Figure 7: Comvest dataset. Empirical and theoretical posterior densities: groups 1, 2 and 3. SKD, SKED, SD and SED denote skewed density, skewed empirical density, symmetrical density and symmetric empirical density, respectively.

[Figure 8. Density panels for each dimension and group: density (vertical axis, 0.0 to 1.0) against latent trait (horizontal axis, −4 to 4), with legend entries SKD, SKED, SD and SED.]

Figure 8: Comvest dataset. Empirical and theoretical posterior densities: groups 4, 5 and 6. SKD, SKED, SD and SED denote skewed density, skewed empirical density, symmetrical density and symmetric empirical density, respectively.

[Figure 9. Two panels, "Dimension 1" and "Dimension 2": box-plots (vertical axis, −1.0 to 1.0) for groups 1 to 6 (horizontal axis).]

Figure 9: Comvest dataset. Box-plots of the posterior distributions of δθk.

[Figure 10. Left panel: Bayesian p-value (vertical axis, 0.0 to 1.0) against item (horizontal axis, 1 to 45). Right panel: DIFICM (vertical axis) against DISCM (horizontal axis, 0.5 to 1.5).]

Figure 10: Comvest dataset. Bayesian p-value per item; multidimensional difficulty vs multidimensional discrimination.

Table 2: Comvest dataset. Posterior expectation and equi-tailed 95% credibility intervals.

                    Skew model                        Symmetric model
Group  Parameter    Mean    SD    CI(95%)             Mean    SD    CI(95%)
1      µθ1          .15     .08   [.00, .33]          .22     .07   [.09, .35]
1      µθ2          -0.25   .12   [-0.53, -0.06]      -0.29   .09   [-0.49, -0.13]
1      σ²θ1         .87     .00   [.87, .87]          .56     .03   [.53, .60]
1      σ²θ2         .67     .01   [.67, .67]          1.10    .15   [.88, 1.27]
1      ρθ           -0.03   .00   [-0.03, -0.03]      -0.22   .09   [-0.37, -0.05]
1      δθ1          -0.37   .19   [-0.72, .18]        -       -     -
1      δθ2          .79     .12   [.49, .94]          -       -     -
2      ρθ           -0.20   .05   [-0.31, -0.07]      -0.42   .10   [-0.59, -0.23]
2      δθ1          -0.76   .37   [-0.99, -0.17]      -       -     -
2      δθ2          .24     .20   [-0.13, .70]        -       -     -
3      µθ1          .08     .09   [-0.11, .26]        .01     .10   [-0.20, .20]
3      µθ2          .57     .09   [.39, .73]          .60     .09   [.43, .76]
3      σ²θ1         .95     .05   [.87, .99]          .78     .05   [.69, .81]
3      σ²θ2         1.12    .15   [.89, 1.27]         .94     .11   [.78, 1.17]
3      ρθ           .02     .06   [-0.05, .11]        -0.13   .12   [-0.39, .00]
3      δθ1          -0.47   .21   [-0.80, .00]        -       -     -
3      δθ2          -0.20   .53   [-0.89, .59]        -       -     -
4      µθ1          .22     .08   [.06, .41]          .24     .08   [.09, .39]
4      µθ2          -0.16   .09   [-0.35, .02]        -0.18   .10   [-0.40, .02]
4      σ²θ1         1.05    .14   [.78, 1.45]         .83     .08   [.73, .95]
4      σ²θ2         1.08    .13   [.89, 1.24]         1.05    .15   [.86, 1.34]
4      ρθ           -0.08   .07   [-0.18, .01]        -0.17   .06   [-0.23, -0.06]
4      δθ1          -0.08   .36   [-0.68, .51]        -       -     -
4      δθ2          -0.30   .39   [-0.86, .57]        -       -     -
5      µθ1          .67     .09   [.50, .85]          .59     .10   [.37, .77]
5      µθ2          .49     .12   [.23, .71]          .47     .14   [.21, .73]
5      σ²θ1         .96     .11   [.83, 1.08]         .88     .06   [.79, .95]
5      σ²θ2         .99     .09   [.81, 1.05]         .88     .20   [.61, 1.05]
5      ρθ           .38     .06   [.32, .47]          .14     .06   [.10, .26]
5      δθ1          -0.73   .16   [-0.98, -0.49]      -       -     -
5      δθ2          -0.44   .32   [-0.86, .09]        -       -     -

Table 3: Cont. of Table 2.

                    Skew model                        Symmetric model
Group  Parameter    Mean    SD    CI(95%)             Mean    SD    CI(95%)
6      µθ1          -0.39   .08   [-0.53, -0.24]      -0.36   .07   [-0.50, -0.21]
6      µθ2          .06     .09   [-0.10, .22]        .10     .08   [-0.06, .26]
6      σ²θ1         .59     .03   [.56, .62]          .43     .03   [.41, .45]
6      σ²θ2         .74     .03   [.67, .77]          .60     .06   [.52, .70]
6      ρθ           -0.06   .01   [-0.07, -0.05]      -0.11   .04   [-0.18, -0.07]
6      δθ1          -0.05   .61   [-0.88, .86]        -       -     -
6      δθ2          .10     .45   [-0.64, .76]        -       -     -

5. Conclusions and Comments

In this work, a multidimensional multiple group IRT model with a multivariate skew normal latent trait distribution under the centered parameterization was presented. Bayesian inference for parameter estimation, model comparison and model fit assessment was developed through MCMC algorithms. Simulation studies indicate that the proposed model produces more accurate results than the symmetric one when the underlying latent trait distribution corresponds to the MSNCP. Moreover, the model/MCMC algorithm recovered all parameters properly (simulation study not shown here). The developed tools were illustrated through an analysis of a real data set related to the first stage of the University of Campinas admission exam. In this example, a two-dimensional MIRT model with a skew normal distribution for the latent traits was selected, which indicated that three of the six groups presented asymmetric behavior. Also, this model fitted the data quite well.

As future research, we suggest considering other distributions for the latent traits, such as the univariate and multivariate skew-t distributions, and/or other link functions (item response functions). Also, other estimation methods, such as CADEM, see Azevedo et al. (2012b), and the Metropolis-Hastings Robbins-Monro algorithm, see Cai (2010), can be considered. Other techniques for determining the test dimensionality, such as RJMCMC algorithms, could be explored. In addition, other tools for model fit assessment, such as residual analysis, can be developed.

6. Acknowledgments

The authors are thankful to CAPES (Coordenação de Aperfeiçoamento de Pessoal de Ensino Superior) for the financial support.

7. Bibliography

Andrade, D.F. and Tavares, H.R. Item response theory for longitudinal data: population parameter estimation. Journal of Multivariate Analysis, 95(1):1–22, 2005.

Arellano-Valle, R.B. and Azzalini, A. The centred parametrization for the multivariate skew-normal distribution. Journal of Multivariate Analysis, 99(7):1362–1382, 2008.

Azevedo, C.L.N., Andrade, D.F. and Fox, J.-P. A Bayesian generalized multiple group IRT model with model-fit assessment tools. Computational Statistics & Data Analysis, 56(12):4399–4412, 2012.

Azevedo, C.L.N., Bolfarine, H. and Andrade, D.F. Bayesian inference for a skew-normal IRT model under the centred parameterization. Computational Statistics & Data Analysis, 55(1):353–365, 2011.

Azevedo, C.L.N., Bolfarine, H. and Andrade, D.F. Parameter recovery for a skew-normal IRT model under a Bayesian approach: hierarchical framework, prior and kernel sensitivity and sample size. Journal of Statistical Computation and Simulation, 82(11):1679–1699, 2012.

Azevedo, C.L.N., Andrade, D.F. and Fox, J.-P. CADEM: A conditional augmented data EM algorithm for fitting one parameter probit models. Brazilian Journal of Probability and Statistics, 27:245–262, 2012.

Azevedo, C.L.N., Fox, J.-P. and Andrade, D.F. Longitudinal multiple-group IRT modelling: covariance pattern selection using MCMC and RJMCMC. International Journal of Quantitative Research in Education, 2(3-4):213–243, 2015.

Azevedo, C.L.N., Fox, J.-P. and Andrade, D.F. Bayesian longitudinal item response modeling with restricted covariance pattern structures. Statistics and Computing, 26(1-2):443–460, 2016.

Azzalini, A. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 12(2):171–178, 1985.

Azzalini, A. and Capitanio, A. Statistical applications of the multivariate skew normal distribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3):579–602, 1999.

Bartolucci, F. A class of multidimensional IRT models for testing unidimensionality and clustering items. Psychometrika, 72(2):141–157, 2007.

Bazan, J.L., Branco, M. and Bolfarine, H. A skew item response model. Bayesian Analysis, 1(4):861–892, 2006.

Beguin, A. and Glas, C.A.W. MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4):541–561, 2001.

Bock, R.D. and Zimowski, M.F. Multiple group IRT. In Handbook of modern item response theory, pages 433–448. Springer, 1997.

Bolt, D.M. and Lall, V.F. Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement, 27(6):395–414, 2003.

Cai, L. Metropolis-Hastings Robbins-Monro algorithm for confirmatory item factor analysis. Journal of Educational and Behavioral Statistics, 35(3):307–335, 2010.

De Jong, M.G. and Steenkamp, J.-B. Finite mixture multilevel multidimensional ordinal IRT models for large scale cross-cultural research. Psychometrika, 75(1):3–32, 2010.

De la Torre, J. and Patz, R.J. Making the most of what we have: A practical application of multidimensional item response theory in test scoring. Journal of Educational and Behavioral Statistics, 30(3):295–311, 2005.

Dempster, A.P., Laird, N.M. and Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 1–38, 1977.

Fox, J.-P. Multilevel IRT model assessment. New developments in categorical data analysis for the social and behavioral sciences, pages 227–252, 2005.

Fox, J.-P. Bayesian item response modeling: Theory and applications. Springer Science & Business Media, 2010.

Fox, J.-P. and Glas, C.A.W. Bayesian modification indices for IRT models. Statistica Neerlandica, 59(1):95–106, 2005.

Fragoso, T.M. and Curi, M. Improving psychometric assessment of the Beck Depression Inventory using multidimensional item response theory. Biometrical Journal, 55(4):527–540, 2013.

Fu, Z.H., Tao, J. and Shi, N.-Z. Bayesian estimation in the multidimensional three-parameter logistic model. Journal of Statistical Computation and Simulation, 79(6):819–835, 2009.

Gamerman, D. and Lopes, H.F. Markov chain Monte Carlo: stochastic simulation for Bayesian inference. CRC Press, 2006.

Genton, M.G. Skew-elliptical distributions and their applications: a journey beyond normality. CRC Press, 2004.

Horn, R.A. and Johnson, C.R. Matrix analysis. Cambridge University Press, 2012.

Lachos, V.H. Skew linear mixed models. PhD thesis, 2004.

Leon-Gonzalez, R. Data augmentation in the Bayesian multivariate probit model. 2004. http://eprints.whiterose.ac.uk/9887/1/SERP2004001.pdf.

Levy, R., Mislevy, R.J. and Sinharay, S. Posterior predictive model checking for multidimensionality in item response theory. Applied Psychological Measurement, 33(7):519–537, 2009.

Matos, G.S. Multidimensional IRT models with skew distributions for the latent traits. PhD thesis.

Montgomery, D.C. Design and analysis of experiments. John Wiley & Sons, 2008.

Padilla, J.L. Multidimensional multiple group skew item response theory models for dichotomous responses under a Bayesian approach. Master’s thesis (in Portuguese), 2014.

Padilla, J.L., Azevedo, C.L.N. and Lachos, V.H. Parameter recovery for a skew multidimensional item response model: a comparison of MCMC algorithms and measurement of some effects of interest, manuscript under preparation.

Patz, R.J. and Junker, B.W. Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24(4):342–366, 1999.

Pewsey, A. Problems of inference for Azzalini’s skew-normal distribution. Journal of Applied Statistics, 27(7):859–870, 2000.

Reckase, M. Multidimensional item response theory, volume 150. Springer, 2009.

Rivers, D. Identification of multidimensional spatial voting models. Typescript. Stanford University, 2003.

Sahu, S.K. Bayesian estimation and model choice in item response models. Journal of Statistical Computation and Simulation, 72(3):217–232, 2002.

Santos, J.R., Azevedo, C.L.N. and Bolfarine, H. A multiple group item response theory model with centered skew-normal latent trait distributions under Bayesian framework. Journal of Applied Statistics, 40(10):2129–2149, 2013.

Sheng, Y. and Wikle, C.K. Bayesian multidimensional IRT models with a hierarchical structure. Educational and Psychological Measurement, 68(3):413–430, 2008.

Sheng, Y. and Wikle, C.K. Comparing multiunidimensional and unidimensional item response theory models. Educational and Psychological Measurement, 67(6):899–919, 2007.

Sinharay, S. Bayesian item fit analysis for unidimensional item response theory models. British Journal of Mathematical and Statistical Psychology, 59(2):429–449, 2006.

Sinharay, S., Johnson, M.S. and Stern, H.S. Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30(4):298–321, 2006.

Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and Van Der Linde, A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4):583–639, 2002.

Stern, H.S. and Sinharay, S. Bayesian model checking and model diagnostics. Handbook of Statistics, 25:171–192, 2005.

8. Appendix

The algorithm is described as follows:

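• Step 1: Simulate the augmented variables $U_{ijk}^{(t)}$ from

$$
U_{ijk} \mid (\cdot) = 0 \times \mathbb{1}_{\{0\}}(y_{ijk}) + 1 \times \mathbb{1}_{\{1\}}(y_{ijk})\,\mathbb{1}_{(-\infty,0)}\big(z_{ijk}^{(t-1)}\big) + \mathrm{Bernoulli}\big(c_i^{(t-1)}\big)\,\mathbb{1}_{\{1\}}(y_{ijk})\,\mathbb{1}_{(0,\infty)}\big(z_{ijk}^{(t-1)}\big), \tag{18}
$$

for $i = 1, \ldots, I$, $j = 1, \ldots, n_k$ and $k = 1, \ldots, K$.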
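The rule in (18) can be sketched for a single response as follows. This is only an illustration under the assumption of scalar inputs for one (i, j, k) triple; the function name `draw_u` and its arguments are hypothetical and not part of the original text.

```python
import random

def draw_u(y, z_prev, c_prev, rng=random):
    """Draw the augmented indicator u for one response, following (18).

    y      : observed binary response (0 or 1)
    z_prev : augmented variable z from the previous iteration
    c_prev : guessing parameter of the item from the previous iteration
    """
    if y == 0:
        return 0  # incorrect response: u is fixed at 0
    if z_prev < 0:
        return 1  # correct response with z < 0: u is fixed at 1
    # correct response with z > 0: u ~ Bernoulli(c)
    return 1 if rng.random() < c_prev else 0
```

In a full sampler this function would be applied to every presented item-subject pair at each iteration.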

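• Step 2: Simulate the augmented variables $Z_{ijk}^{(t)}$ from

$$
\begin{aligned}
Z_{ijk} \mid (\cdot) \sim{} & N_{(-\infty,0)}\big(a_i^{\top(t-1)}\theta_{.jk}^{(t-1)} - b_i^{(t-1)}, 1\big)\,\mathbb{1}_{\{0\}}(y_{ijk}) \\
& + N_{(0,\infty)}\big(a_i^{\top(t-1)}\theta_{.jk}^{(t-1)} - b_i^{(t-1)}, 1\big)\,\mathbb{1}_{\{1\}}(y_{ijk})\,\mathbb{1}_{\{0\}}\big(u_{ijk}^{(t)}\big) \\
& + N\big(a_i^{\top(t-1)}\theta_{.jk}^{(t-1)} - b_i^{(t-1)}, 1\big)\,\mathbb{1}_{\{1\}}(y_{ijk})\,\mathbb{1}_{\{1\}}\big(u_{ijk}^{(t)}\big),
\end{aligned} \tag{19}
$$

where $N_{[a,b]}(\mu, \psi)$ stands for a normal distribution truncated to the interval $[a, b]$, with untruncated mean $\mu$ and untruncated variance $\psi$, for $i = 1, \ldots, I$, $j = 1, \ldots, n_k$ and $k = 1, \ldots, K$.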
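A minimal sketch of the draw in (19) for one response is given below, assuming unit variance and using inverse-CDF sampling for the truncated cases. The names `draw_truncated_normal`, `draw_z` and `eta` (the linear predictor, the discrimination vector times the latent trait minus the difficulty) are hypothetical; inverse-CDF sampling is one convenient choice and not necessarily the method used by the authors.

```python
import random
from math import inf
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for its cdf and inverse cdf

def draw_truncated_normal(mean, lower, upper, rng=random):
    """Draw from N(mean, 1) restricted to (lower, upper) by inverting the CDF."""
    p_lo = _STD.cdf(lower - mean)
    p_hi = _STD.cdf(upper - mean)
    p = p_lo + rng.random() * (p_hi - p_lo)
    p = min(max(p, 1e-12), 1.0 - 1e-12)   # keep p strictly inside (0, 1)
    x = mean + _STD.inv_cdf(p)
    # Guard against rounding when the interval lies far in a tail.
    return min(max(x, lower), upper)

def draw_z(y, u, eta, rng=random):
    """Draw z for one response given y, u and the linear predictor eta."""
    if y == 0:
        return draw_truncated_normal(eta, -inf, 0.0, rng)  # negative side
    if u == 0:
        return draw_truncated_normal(eta, 0.0, inf, rng)   # positive side
    return rng.gauss(eta, 1.0)                             # untruncated
```

The inverse-CDF approach loses accuracy when the truncation region is far from the mean; a rejection sampler designed for tails would be preferable in that situation.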

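• Step 3: Simulate the latent variables $T_{jk}^{(t)}$ from $T_{jk} \mid (\cdot) \sim HN\big(\psi_{T_{jk}}^{(t-1)} t_{jk}^{(t-1)}, \psi_{T_{jk}}^{(t-1)}\big)$ for $j = 1, \ldots, n_k$, $k = 1, \ldots, K$, mutually independently, where

$$
\begin{aligned}
\psi_{T_{jk}}^{(t-1)} &= \Big(1 + \delta_{\theta_k}^{\top(t-1)}\big(I_D - \delta_{\theta_k}^{(t-1)}\delta_{\theta_k}^{\top(t-1)}\big)^{-1}\delta_{\theta_k}^{(t-1)}\Big)^{-1}, \\
t_{jk}^{(t-1)} &= \delta_{\theta_k}^{\top(t-1)}\big(I_D - \delta_{\theta_k}^{(t-1)}\delta_{\theta_k}^{\top(t-1)}\big)^{-1}\Big(\Sigma_{Z_k}^{1/2\top(t-1)}\Big)^{-1}\Big(\Psi_{\theta_k}^{1/2\top(t-1)}\Big)^{-1}\Big(\theta_{jk}^{(t-1)} - \mu_{\theta_k}^{(t-1)}\Big) \\
&\quad + \delta_{\theta_k}^{\top(t-1)}\big(I_D - \delta_{\theta_k}^{(t-1)}\delta_{\theta_k}^{\top(t-1)}\big)^{-1}\Big(\Sigma_{Z_k}^{1/2\top(t-1)}\Big)^{-1}\varepsilon_{\theta_k}^{(t-1)},
\end{aligned}
$$

where

$$
\begin{aligned}
\varepsilon_{\theta_k}^{(t-1)} &= -\Big(\Sigma_{Z_k}^{1/2\top(t-1)}\Big)^{-1}\mu_{Z_k}, \\
\mu_{Z_k} &= \sqrt{\frac{2}{\pi}}\,\delta_{\theta_k}^{(t-1)}, \\
\Sigma_{Z_k} &= I_D - \delta_{\theta_k}^{\top(t-1)}\delta_{\theta_k}^{(t-1)}.
\end{aligned} \tag{20}
$$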
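As a worked check that is not part of the original text, the scalar $\psi_{T_{jk}}$ of Step 3 can be simplified with the Sherman-Morrison identity. The derivation below assumes $s = \delta_{\theta_k}^{\top}\delta_{\theta_k} < 1$, so that the inverse exists; iteration superscripts are dropped for readability.

```latex
% Assumption: s = \delta^{\top}\delta < 1, with \delta = \delta_{\theta_k}.
\begin{aligned}
(I_D - \delta\delta^{\top})^{-1} &= I_D + \frac{\delta\delta^{\top}}{1 - s}, \\
\delta^{\top}(I_D - \delta\delta^{\top})^{-1}\delta &= s + \frac{s^{2}}{1 - s} = \frac{s}{1 - s}, \\
\psi_{T_{jk}} &= \Big(1 + \frac{s}{1 - s}\Big)^{-1} = 1 - s .
\end{aligned}
```

Under this assumption the scale depends only on the group through $\delta_{\theta_k}$, not on the subject $j$, and no matrix inversion is needed to evaluate it.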

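• Step 4: Simulate $\theta_{jk}^{(t)} \mid (\cdot)$ from $N_D\big(\Xi_{\theta_k}^{-1(t-1)}\varrho_{\theta_k}^{(t-1)}, \Xi_{\theta_k}^{-1(t-1)}\big)$, for $j = 1, \ldots, n_k$, $k = 1, \ldots, K$, mutually independently, where

$$
\begin{aligned}
\Xi_{\theta_k}^{(t-1)} &= a_{jk}^{\top(t-1)}a_{jk}^{(t-1)} + \Big(\Psi_{\theta_k}^{1/2\top(t-1)}\Sigma_{Z_k}^{-1/2\top(t-1)}\big(I_D - \delta_{\theta_k}^{(t-1)}\delta_{\theta_k}^{\top(t-1)}\big)\Sigma_{Z_k}^{-1/2(t-1)}\Psi_{\theta_k}^{1/2(t-1)}\Big)^{-1}, \\
\varrho_{\theta_k}^{(t-1)} &= a_{jk}^{\top(t-1)}z_{.jk} + a_{jk}^{\top(t-1)}b_{jk}^{(t-1)} \\
&\quad + \Big(\Psi_{\theta_k}^{1/2\top(t-1)}\Sigma_{Z_k}^{-1/2\top(t-1)}\big(I_D - \delta_{\theta_k}^{(t-1)}\delta_{\theta_k}^{\top(t-1)}\big)\Sigma_{Z_k}^{-1/2(t-1)}\Psi_{\theta_k}^{1/2(t-1)}\Big)^{-1} \\
&\quad \times \Big(\mu_{\theta_k}^{(t-1)} + \Psi_{\theta_k}^{1/2\top(t-1)}\varepsilon_{\theta_k}^{(t-1)} + \Psi_{\theta_k}^{1/2\top(t-1)}\Sigma_{Z_k}^{-1/2\top(t-1)}\delta_{\theta_k}^{(t-1)}t_{jk}^{(t)}\Big),
\end{aligned} \tag{21}
$$

where $a_{jk}$ stands for the $(I_k \times D)$ matrix containing the parameters $a_{i.}$ of every item $i$ answered by subject $j$ of group $k$, and $b_{jk}$ stands for the $(I_k \times 1)$ vector containing the parameters $b_i$ of every item $i$ answered by subject $j$ of group $k$.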
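Steps 4 and 5 both draw from a multivariate normal written in terms of a precision matrix and a linear term. The sketch below illustrates one way to perform such a draw with a Cholesky factor of the precision matrix, using only the standard library. The function names and the list-of-lists matrix representation are assumptions made for illustration; the inputs `Xi` and `rho` stand for already assembled versions of the precision matrix and linear term.

```python
import math
import random

def cholesky(A):
    """Lower-triangular L with A = L L^T, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L x = b for lower-triangular L."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i]
    return x

def backward_solve_t(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def canonical_mean(Xi, rho):
    """Mean Xi^{-1} rho, computed without forming the inverse."""
    L = cholesky(Xi)
    return backward_solve_t(L, forward_solve(L, rho))

def draw_canonical_normal(Xi, rho, rng=random):
    """Draw from N(Xi^{-1} rho, Xi^{-1})."""
    L = cholesky(Xi)
    mean = backward_solve_t(L, forward_solve(L, rho))
    e = [rng.gauss(0.0, 1.0) for _ in rho]
    noise = backward_solve_t(L, e)  # covariance (L L^T)^{-1} = Xi^{-1}
    return [m + v for m, v in zip(mean, noise)]
```

Working with the Cholesky factor of the precision matrix avoids an explicit inverse, which matters when the draw is repeated for every subject at every iteration.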

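• Step 5: Simulate $\mu_{\theta_k}^{(t)} \mid (\cdot)$ from $N_D\big(\Xi_{\mu_k}^{-1(t-1)}\varrho_{\mu_k}^{(t-1)}, \Xi_{\mu_k}^{-1(t-1)}\big)$, for $k = 1, \ldots, K$, mutually independently, where

$$
\begin{aligned}
\Xi_{\mu_k}^{(t-1)} &= \Psi_{\mu}^{-1} + n_k\Big(\Psi_{\theta_k}^{1/2\top(t-1)}\Sigma_{Z_k}^{-1/2\top(t-1)}\big(I_D - \delta_{\theta_k}^{(t-1)}\delta_{\theta_k}^{\top(t-1)}\big)\Sigma_{Z_k}^{-1/2(t-1)}\Psi_{\theta_k}^{1/2(t-1)}\Big)^{-1}, \\
\varrho_{\mu_k}^{(t-1)} &= \Big(\Psi_{\theta_k}^{1/2\top(t-1)}\Sigma_{Z_k}^{-1/2\top(t-1)}\big(I_D - \delta_{\theta_k}^{(t-1)}\delta_{\theta_k}^{\top(t-1)}\big)\Sigma_{Z_k}^{-1/2(t-1)}\Psi_{\theta_k}^{1/2(t-1)}\Big)^{-1} \\
&\quad \times \sum_{j=1}^{n_k}\Big(\theta_{jk}^{(t)} - \Psi_{\theta_k}^{1/2\top(t-1)}\varepsilon_{\theta_k}^{(t-1)} + \Psi_{\theta_k}^{1/2\top(t-1)}\Sigma_{Z_k}^{-1/2\top(t-1)}\delta_{\theta_k}^{(t-1)}t_{jk}^{(t)}\Big).
\end{aligned} \tag{22}
$$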

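• Step 6: Simulate $\Psi_{\theta_k}^{(t)} \mid (\cdot)$ for $k = 1, \ldots, K$, mutually independently. A MH step is required:

  1. Draw $\Psi_{0\theta_k}^{(c)}$ from $q\big(\Psi_{\theta_k}^{(t-1)}\big)$.

  2. Let $\Psi_{\theta_k}^{(c)} = D^{-1/2}\Psi_{0\theta_k}^{(c)}D^{-1/2}$, with $D$ the diagonal matrix whose elements are equal to the principal diagonal of $\Psi_{0\theta_k}^{(c)}$. Note that this step is only required in the case that $\Psi_{\theta_k}^{(c)}$ is a correlation matrix; otherwise $\Psi_{\theta_k}^{(c)} = \Psi_{0\theta_k}^{(c)}$.

  3. Accept $\Psi_{\theta_k}^{(t)} = \Psi_{\theta_k}^{(c)}$ with probability $\pi_j\big(\Psi_{\theta_k}^{(c)}, \Psi_{\theta_k}^{(t-1)}\big) = \min\{R_{\Psi_k}, 1\}$, where

$$
R_{\Psi_k} = \frac{\prod_{j=1}^{n_k} p\big(\theta_{jk}^{(t)} \mid \mu_{\theta_k}^{(t)}, \Psi_{\theta_k}^{(c)}, \delta_{\theta_k}^{(t-1)}, t_{jk}^{(t)}\big)\, p\big(\Psi_{\theta_k}^{(c)}\big)\, q\big(\Psi_{\theta_k}^{(t-1)} \mid \Psi_{\theta_k}^{(c)}\big)}{\prod_{j=1}^{n_k} p\big(\theta_{jk}^{(t)} \mid \mu_{\theta_k}^{(t)}, \Psi_{\theta_k}^{(t-1)}, \delta_{\theta_k}^{(t-1)}, t_{jk}^{(t)}\big)\, p\big(\Psi_{\theta_k}^{(t-1)}\big)\, q\big(\Psi_{\theta_k}^{(c)} \mid \Psi_{\theta_k}^{(t-1)}\big)}, \tag{23}
$$

  and $p(\cdot)$ and $q(\cdot)$ stand for the prior and transition densities previously defined.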
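The rescaling in item 2 and the acceptance rule in item 3 of Step 6 can be sketched as below. The helper names `to_correlation` and `mh_accept` are hypothetical, matrices are represented as lists of lists, and the acceptance test works on the log scale (an implementation choice, assumed here for numerical stability); the computation of the log ratio itself is left to the caller.

```python
import math
import random

def to_correlation(psi0):
    """Rescale a positive-definite matrix to unit diagonal: D^{-1/2} Psi0 D^{-1/2}."""
    n = len(psi0)
    d = [math.sqrt(psi0[i][i]) for i in range(n)]
    return [[psi0[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]

def mh_accept(log_ratio, rng=random):
    """Return True with probability min{R, 1}, where log_ratio = log R."""
    return log_ratio >= 0.0 or rng.random() < math.exp(log_ratio)
```

For example, `to_correlation([[4.0, 1.0], [1.0, 9.0]])` keeps the off-diagonal structure while forcing ones on the diagonal, which is the operation needed when the candidate must be a correlation matrix.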

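• Step 7: Simulate $\delta_{\theta_k}^{(t)} \mid (\cdot)$ for $k = 1, \ldots, K$, mutually independently. A MH step is required. For a general $D$, first draw $\delta_{1k}^{(t)} \mid \delta_{2k}^{(t-1)}, \ldots, \delta_{Dk}^{(t-1)}$, then draw $\delta_{2k}^{(t)} \mid \delta_{1k}^{(t)}, \ldots, \delta_{Dk}^{(t-1)}$, and repeat this process for all the $D$ elements.

  1. Draw $\delta_{1k}^{(c)}$ from $q\big(\delta_{1k}^{(t-1)}, \delta_{2k}^{(t-1)}, \ldots, \delta_{Dk}^{(t-1)}\big)$.

  2. Accept $\delta_{1k}^{(t)} = \delta_{1k}^{(c)}$ with probability $\pi_{1k}\big(\delta_{1k}^{(c)}, \delta_{1k}^{(t-1)}\big) = \min\{R_{\delta_{1k}}, 1\}$, where

$$
\begin{aligned}
R_{\delta_{1k}} ={} & \frac{\prod_{j=1}^{n_k} p\big(\theta_{jk}^{(t)} \mid \mu_{\theta_k}^{(t)}, \Psi_{\theta_k}^{(t)}, \delta_{1k}^{(c)}, \delta_{2k}^{(t-1)}, \ldots, \delta_{Dk}^{(t-1)}, t_{jk}^{(t)}\big)}{\prod_{j=1}^{n_k} p\big(\theta_{jk}^{(t)} \mid \mu_{\theta_k}^{(t)}, \Psi_{\theta_k}^{(t)}, \delta_{1k}^{(t-1)}, \delta_{2k}^{(t-1)}, \ldots, \delta_{Dk}^{(t-1)}, t_{jk}^{(t)}\big)} \\
& \times \frac{p\big(\delta_{1k}^{(c)} \mid \delta_{2k}^{(t-1)}, \ldots, \delta_{Dk}^{(t-1)}\big)\, q\big(\delta_{1k}^{(t-1)} \mid \delta_{1k}^{(c)}, \delta_{2k}^{(t-1)}, \ldots, \delta_{Dk}^{(t-1)}\big)}{p\big(\delta_{1k}^{(t-1)} \mid \delta_{2k}^{(t-1)}, \ldots, \delta_{Dk}^{(t-1)}\big)\, q\big(\delta_{1k}^{(c)} \mid \delta_{1k}^{(t-1)}, \delta_{2k}^{(t-1)}, \ldots, \delta_{Dk}^{(t-1)}\big)}.
\end{aligned}
$$

  Then repeat these steps for $\delta_{2k}^{(t)}, \ldots, \delta_{Dk}^{(t)}$. Here $p(\cdot)$ and $q(\cdot)$ stand for the prior and transition densities previously defined.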
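The element-by-element updating of Step 7 can be sketched as a component-wise Metropolis-Hastings sweep. The code below is an illustration only: it assumes a symmetric uniform random-walk proposal, so the transition densities cancel in the ratio, and `log_target` is a hypothetical user-supplied function standing in for the log of the likelihood term times the prior. Neither choice is taken from the original text.

```python
import math
import random

def componentwise_mh(delta, log_target, step=0.1, rng=random):
    """One sweep over the D components, updating each in turn.

    delta      : current vector (list of floats)
    log_target : hypothetical function returning log(likelihood x prior)
    step       : half-width of the symmetric uniform random-walk proposal
    """
    current = list(delta)
    current_lp = log_target(current)
    for d in range(len(current)):
        cand = list(current)
        cand[d] = current[d] + rng.uniform(-step, step)  # propose one component
        cand_lp = log_target(cand)
        log_r = cand_lp - current_lp                     # symmetric q cancels
        if log_r >= 0.0 or rng.random() < math.exp(log_r):
            current, current_lp = cand, cand_lp          # accept the candidate
    return current
```

Returning `-math.inf` from `log_target` outside the admissible region is a simple way to keep every component inside its support, since such candidates are always rejected.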

• Step 8: Simulate the item parameter $\zeta_i^{(t)}$ from $\zeta_i \mid (\cdot) \sim N_{D+1}\big(\Lambda_i^{(t-1)}, \zeta_j^{(t-1)}\big)$, for $i = 1, \ldots, I$, mutually independently, where

$$
\begin{aligned}
\Lambda_i^{(t-1)} &= \big(\Psi_{0a,b}^{-1} + H_{i.}^{\top}H_{i.}\big)^{-1}, \\
\zeta_j^{(t-1)} &= H_{i.}^{\top}z_{i..}^{(t)} + \Psi_{0a,b}^{-1}\mu_{0a,b}, \\
H_{i.} &= \big[\theta^{(t)}, -1_N\big] \bullet I_i,
\end{aligned} \tag{24}
$$

where $I_i$ is an $(n \times (D + 1))$ matrix with elements, in each row, equal to 1 or 0 according to whether or not the item has been presented to the corresponding subject, and “$\bullet$” stands for the Hadamard product; see Horn & Johnson (2012).
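To make the role of the Hadamard product in (24) concrete, the sketch below builds the masked design matrix row by row. It assumes that each row of the indicator matrix is constant (all ones or all zeros), so the mask is passed as a single 0/1 value per subject; the function name `build_design` and this compact representation are illustrative assumptions.

```python
def build_design(theta, presented):
    """Rows [theta_j1, ..., theta_jD, -1], zeroed for subjects who did not see the item.

    theta     : list of n latent trait vectors, each of length D
    presented : list of n indicators (1 if the item was presented, else 0)
    """
    design = []
    for row, seen in zip(theta, presented):
        full_row = list(row) + [-1.0]
        design.append([v if seen else 0.0 for v in full_row])
    return design
```

Rows of zeros contribute nothing to the cross-products of the design matrix, so subjects who were not presented with the item drop out of the update for that item.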

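The implementation of the convergence acceleration algorithm of Leon-Gonzalez (2004) is achieved as follows. Fix

$$
\big(\tilde{a}_{i2}^{(t)}, \ldots, \tilde{a}_{iD}^{(t)}, \tilde{b}_i^{(t)}\big) = \big(a_{i2}^{(t)}/a_{i1}^{(t)}, \ldots, a_{iD}^{(t)}/a_{i1}^{(t)}, b_i^{(t)}/a_{i1}^{(t)}\big).
$$

Then, simulate $\nu$ from $f(\nu)$, where $f(\nu) \sim U[0.5, 1.5]$, and fix

$$
a_{i1}^{(t)} = \nu a_{i1}^{(t)}, \quad a_{i2}^{(t)} = \nu a_{i1}^{(t)}\tilde{a}_{i2}^{(t)}, \quad \ldots, \quad a_{iD}^{(t)} = \nu a_{i1}^{(t)}\tilde{a}_{iD}^{(t)}, \quad b_i^{(t)} = \nu a_{i1}^{(t)}\tilde{b}_i^{(t)},
$$

with probability

$$
pr = \min\left\{\frac{L\big(y \mid \nu a_{i1}^{(t)}, \ldots, \nu b_i^{(t)}\big)\,\pi\big(\nu a_{i1}^{(t)}, \ldots, \nu b_i^{(t)}\big)\,f(1/\nu)}{L\big(y \mid a_{i1}^{(t)}, \ldots, b_i^{(t)}\big)\,\pi\big(a_{i1}^{(t)}, \ldots, b_i^{(t)}\big)\,f(\nu)}\,\big|\nu^{D+1}\big|,\ 1\right\},
$$

and set

$$
a_{i1}^{(t)} = a_{i1}^{(t)}, \quad a_{i2}^{(t)} = a_{i1}^{(t)}\tilde{a}_{i2}^{(t)}, \quad \ldots, \quad a_{iD}^{(t)} = a_{i1}^{(t)}\tilde{a}_{iD}^{(t)}, \quad b_i^{(t)} = a_{i1}^{(t)}\tilde{b}_i^{(t)},
$$

with probability $(1 - pr)$, where $L\big(y \mid a_{i1}^{(t)}, \ldots, b_i^{(t)}\big)$ is the original likelihood.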
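Read literally, the acceleration move rescales all parameters of item i by a common factor drawn uniformly between 0.5 and 1.5 and accepts the rescaled values with probability pr. The sketch below follows that reading; `log_post` is a hypothetical function returning the log of likelihood times prior for one item, and the function names are illustrative assumptions rather than the authors' implementation.

```python
import math
import random

def uniform_density(x, lo=0.5, hi=1.5):
    """Density of the uniform distribution on [lo, hi]."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

def scale_move(a, b, log_post, rng=random):
    """Propose (nu * a, nu * b) and accept it with probability pr.

    a        : list of the D discrimination parameters of one item
    b        : difficulty parameter of the same item
    log_post : hypothetical function (a, b) -> log(likelihood x prior)
    """
    nu = rng.uniform(0.5, 1.5)
    dens_ratio = uniform_density(1.0 / nu) / uniform_density(nu)
    if dens_ratio == 0.0:
        return a, b                      # f(1/nu) = 0, so pr = 0: keep values
    a_new = [nu * x for x in a]
    b_new = nu * b
    log_r = (log_post(a_new, b_new) - log_post(a, b)
             + math.log(dens_ratio)
             + (len(a) + 1) * math.log(nu))   # Jacobian term |nu^(D+1)|
    pr = 1.0 if log_r >= 0.0 else math.exp(log_r)
    if rng.random() < pr:
        return a_new, b_new
    return a, b
```

One consequence of the formula as written is that any factor below 2/3 is always rejected, because its reciprocal then exceeds 1.5 and falls outside the support of the uniform density.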

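• Step 9: Draw the guessing parameters $c_i^{(t)}$ from

$$
c_i \mid (\cdot) \sim \mathrm{beta}\left(\kappa_1 + \sum_{k=1}^{K}\sum_{j=1}^{n_k} I_{ijk}u_{ijk};\ \kappa_2 + \sum_{k=1}^{K}\sum_{j=1}^{n_k} I_{ijk}(1 - u_{ijk})\right).
$$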
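The conjugate update of Step 9 can be sketched as follows for a single item, assuming that the indicators over all groups and subjects have been flattened into two parallel lists. The function names and this flattened layout are assumptions made for illustration.

```python
import random

def beta_parameters(u, presented, kappa1, kappa2):
    """Shape parameters of the beta full conditional for one item.

    u         : augmented indicators u_ijk, flattened over groups and subjects
    presented : indicators I_ijk (1 if the item was presented to the subject)
    """
    ones = sum(p * v for p, v in zip(presented, u))
    zeros = sum(p * (1 - v) for p, v in zip(presented, u))
    return kappa1 + ones, kappa2 + zeros

def draw_guessing(u, presented, kappa1, kappa2, rng=random):
    """Draw the guessing parameter from its beta full conditional."""
    shape1, shape2 = beta_parameters(u, presented, kappa1, kappa2)
    return rng.betavariate(shape1, shape2)
```

Subjects who were not presented with the item have indicator 0 and therefore leave both shape parameters unchanged.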
