
Advanced Review

STATIS and DISTATIS: optimum multitable principal component analysis and three way metric multidimensional scaling

Hervé Abdi,1∗ Lynne J. Williams,2 Dominique Valentin3 and Mohammed Bennani-Dosse4

STATIS is an extension of principal component analysis (PCA) tailored to handle multiple data tables that measure sets of variables collected on the same observations, or, alternatively, as in a variant called dual-STATIS, multiple data tables where the same variables are measured on different sets of observations. STATIS proceeds in two steps: First, it analyzes the between data table similarity structure and derives from this analysis an optimal set of weights that are used to compute a linear combination of the data tables, called the compromise, that best represents the information common to the different data tables; Second, the PCA of this compromise gives an optimal map of the observations. Each of the data tables also provides a map of the observations that is in the same space as the optimum compromise map. In this article, we present STATIS, explain the criteria that it optimizes, review the recent inferential extensions to STATIS, and illustrate it with a detailed example.

We also review, and present in a common framework, the main developments of STATIS such as (1) X-STATIS or partial triadic analysis (PTA), which is used when all data tables collect the same variables measured on the same observations (e.g., at different times or locations), (2) COVSTATIS, which handles multiple covariance matrices collected on the same observations, (3) DISTATIS, which handles multiple distance matrices collected on the same observations and generalizes metric multidimensional scaling to three way distance matrices, (4) canonical-STATIS (CANOSTATIS), which generalizes discriminant analysis and combines it with DISTATIS to analyze multitable discriminant analysis problems, (5) power-STATIS, which uses alternative criteria to find STATIS optimal weights, (6) ANISOSTATIS, which extends STATIS to give specific weights to each variable rather than to each whole table, (7) (K + 1)-STATIS (or external-STATIS), which extends STATIS (and PLS-methods and Tucker inter-battery analysis) to the analysis of the relationships of several data sets and one external data set, and (8) double-STATIS (or DO-ACT), which generalizes (K + 1)-STATIS and analyzes two sets of data tables, and STATIS-4, which generalizes double-STATIS to more than two sets of data.

∗Correspondence to: [email protected]
1School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
2Rotman Research Institute, Baycrest, Toronto, Ontario, Canada
3Université de Bourgogne, Dijon, France
4Université de Rennes 2, Rennes, France


These recent developments are illustrated by small examples. © 2012 Wiley Periodicals, Inc.

How to cite this article: WIREs Comput Stat 2012, 4:124–167. doi: 10.1002/wics.198

Keywords: STATIS; STATIS-4; DISTATIS; COVSTATIS; power-STATIS; anisotropic STATIS; ANISOSTATIS; double-STATIS; dual-STATIS; canonical-STATIS; SUM-PCA; RV-PCA; multiple factor analysis; multiblock correspondence analysis; multidimensional scaling; INDSCAL; multiblock barycentric discriminant analysis; co-inertia analysis; STATICO; COSTATIS; barycentric discriminant analysis; generalized singular value decomposition; principal component analysis; consensus PCA; multitable PCA; multiblock PCA; X-STATIS; partial triadic analysis; PTA

INTRODUCTION

STATIS is an acronym which stands for the French expression ‘Structuration des Tableaux à Trois Indices de la Statistique’ (which can, approximately, be translated as ‘structuring three way statistical tables’). This technique is also known under the acronym ‘ACT’, which stands for the French ‘Analyse Conjointe de Tableaux’ (joint analysis of tables). Behind these rather obscure acronyms is a generalization of principal component analysis (PCA) whose goal is to analyze several data sets of variables collected on the same set of observations, or—as in its dual version called dual-STATIS—several sets of observations measured on the same set of variables. As such, STATIS is part of the multitable (also called multiblock or consensus analysis1–15) PCA family, which comprises related techniques such as multiple factor analysis (MFA), multiblock discriminant correspondence analysis (MUDICA), and SUM-PCA.

STATIS originated from the work of Escoufier15,16 and was first described by L’Hermier des Plantes (Ref 17; see also Refs 18–20; for an early introduction see Ref 21). A related approach is known as Procrustes matching by congruence coefficients in the English speaking community.22

The goals of STATIS are (1) to compare and analyze the relationships between the different data sets, (2) to integrate these data sets into an optimum weighted average called a compromise (sometimes also called a consensus), which is then analyzed via PCA to reveal the common structure between the observations, and finally (3) to project each of the original data sets onto the compromise to analyze communalities and discrepancies.

STATIS is a popular method for analyzing multiple sets of variables measured on the same observations, and it has been recently used in various domains, such as sensory and consumer science research,12,13,23–29 chemometry and process monitoring,30 ecology,31–33 computer vision,34 hydrology,35,36 information science,37 neuroimaging,38–42 medicine,3 statistical quality control,43,44 and molecular biology.10,45–48

In addition to being used in several domains of application, STATIS is also a vigorous domain of theoretical developments, which are explored later in this article.

When To Use It

STATIS is used when several sets of variables have been measured on the same set of observations. The number and/or nature of the variables used to describe the observations can vary from one set of variables to the other, but the observations should be the same in all the data sets.

For example, the data sets can be measurements taken on the same observations (individuals or objects) at different occasions. In this case, the first data set corresponds to the data collected at time 1, the second one to the data collected at time 2, and so on. The goal of the analysis, then, is to evaluate how the positions of the observations change over time.

In another example, the data sets can be measurements taken on the same observations by different subjects or groups of subjects. In this case, the first data set corresponds to the first subject, the second one to the second subject, and so on. The goal of the analysis, then, is to evaluate if there is an agreement between the subjects or groups of subjects.

The Main Idea

The general idea behind STATIS is to analyze the structure of the individual data sets (i.e., the relation between the individual data sets) and to derive from this structure an optimal set of weights for computing the best common representation of the observations, called the compromise, or also sometimes the consensus. To compute this compromise, the elements of each table are multiplied by the optimal weight of this table and the compromise is obtained by the addition of these ‘weighted’ K tables (i.e., the compromise is a linear combination of the tables). These weights are chosen so that the compromise provides the best representation (in a least square sense) of the whole set of tables. The PCA of the compromise decomposes the variance of the compromise into a set of new orthogonal variables called principal components (also often called dimensions, axes, factors, or even latent variables) ordered by the amount of variance that each component explains. The coordinates of the observations on the components are called factor scores, and these can be used to plot maps of the observations in which the observations are represented as points such that the distances in the map best reflect the similarities between the observations. The position of the observations ‘as seen by’ each data set can also be represented as points in the compromise. As the components are obtained by combining the original variables, each variable contributes a certain amount to each component. This quantity, called the loading of a variable on a component, reflects the importance of that variable for this component and can also be used to plot maps of the variables that reflect their association. Finally, as a byproduct of the computation of the optimal weights, the data sets can also be represented as points in a multidimensional space. A sketch of the technique is provided in Figure 1.

NOTATIONS AND PRELIMINARIES

Matrices are denoted by boldface uppercase letters (e.g., X), vectors by boldface lowercase letters (e.g., q), and elements of vectors and matrices by italic lowercase letters with appropriate indices if needed (e.g., x_{ij} is an element of X). Blocks of variables (i.e., tables) are considered as submatrices of larger matrices and are represented in brackets separated by vertical bars (e.g., a matrix X made of two submatrices X[1] and X[2] is written X = [X[1] | X[2]]). The identity matrix is denoted by I, and a vector of ones is denoted by 1 (indices may be used to specify the dimensions if the context is ambiguous). The transpose of a matrix is denoted ^T. The inverse of a matrix is denoted ^{−1}. When applied to a square matrix, the diag operator takes the diagonal elements of this matrix and stores them into a column vector; when applied to a vector, the diag operator stores the elements of this vector on the diagonal elements of a diagonal matrix. The vec operator transforms a matrix into a vector by stacking the elements of the matrix into a column vector.

[Figure 1 omitted from this transcript. Caption: FIGURE 1 | The different steps of STATIS. The panels sketch the K tables X[1], …, X[K] (J variables in K studies, measured on I observations), the inner product matrix C = ⟨S[k], S[k′]⟩ with its inner product map and the α weights (Step 1: PCA of C), and the compromise and the tables in the compromise (Step 2: GPCA of X weighted by a).]


The standard product between matrices is implicitly denoted by simple juxtaposition or by × when it needs to be explicitly stated (e.g., XY = X × Y is the product of matrices X and Y). The Hadamard or element-wise product is denoted by ◦ (e.g., X ◦ Y).

The raw data consist of K data sets collected on the same observations. Each data set is also called a table, a subtable, or sometimes a block or a study (in this article, we prefer the term table or occasionally block). The data for each table are stored in an I × J[k] rectangular data matrix denoted Y[k], where I is the number of observations and J[k] the number of variables collected on the observations for the kth table. The total number of variables is denoted J (i.e., J = Σ_k J[k]). Each data matrix is, in general, preprocessed (e.g., centered, normalized), and the preprocessed data matrices actually used in the analysis are denoted X[k] (the preprocessing steps are detailed below in Section ‘More on Preprocessing’).

The K data matrices X[k], each of dimensions I rows by J[k] columns, are concatenated into the complete I by J data matrix denoted X:

X = [X[1] | ··· | X[k] | ··· | X[K]].    (1)

A mass, denoted m_i, is assigned to each observation. These masses are collected in the mass vector, denoted m, and in the diagonal elements of the mass matrix denoted M, which is obtained as

M = diag{m}.    (2)

Masses are positive or null elements whose sum equals one. Often, equal masses are chosen, with m_i = 1/I.

To each matrix X[k], we associate its cross-product matrix, defined as

S[k] = X[k] X[k]^T.    (3)

A cross-product matrix of a table expresses the pattern of relationships between the observations as seen by this table. Note that, because of the block structure of X, the cross-product of X can be expressed as

XX^T = [X[1] | ··· | X[k] | ··· | X[K]] × [X[1] | ··· | X[k] | ··· | X[K]]^T = Σ_k X[k] X[k]^T = Σ_k S[k].    (4)

Singular Value Decomposition and Generalized Singular Value Decomposition

STATIS is part of the PCA family, and therefore the main analytical tool for STATIS is the singular value decomposition (SVD) and the generalized singular value decomposition (GSVD) of a matrix (see, for tutorials, e.g., Refs 49–54). We briefly describe these two methods below.

Singular Value Decomposition
Recall that the SVD of a given I × J matrix X decomposes it into three matrices as:

X = UΔV^T with U^T U = V^T V = I,    (5)

where U is the I by L matrix of the normalized left singular vectors (with L being the rank of X), V the J by L matrix of the normalized right singular vectors, and Δ the L by L diagonal matrix of the L singular values; also, γ_ℓ, u_ℓ, and v_ℓ are, respectively, the ℓth singular value, left singular vector, and right singular vector. Matrices U and V are orthonormal matrices (i.e., U^T U = V^T V = I). The SVD is closely related to and generalizes the well-known eigendecomposition, as U is also the matrix of the normalized eigenvectors of XX^T, V is the matrix of the normalized eigenvectors of X^T X, and the singular values are the square roots of the eigenvalues of XX^T and X^T X (these two matrices have the same eigenvalues).

Key property: the SVD provides the best reconstitution (in a least squares sense) of the original matrix by a matrix of lower rank.
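This key property can be checked numerically. A minimal NumPy sketch (the toy data and the target rank R are ours; nothing below is part of the original article):

```python
# Minimal sketch: SVD of a matrix and its best rank-R least-squares
# reconstruction (the key property above). Data are arbitrary toy values.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))        # I = 12 observations, J = 5 variables

U, delta, Vt = np.linalg.svd(X, full_matrices=False)
# U: I x L, delta: the L singular values, Vt: L x J (rows = right singular vectors)

R = 2                               # target rank
X_hat = U[:, :R] @ np.diag(delta[:R]) @ Vt[:R, :]   # best rank-R approximation

# Orthonormality: U^T U = V^T V = I
assert np.allclose(U.T @ U, np.eye(len(delta)))
assert np.allclose(Vt @ Vt.T, np.eye(len(delta)))
```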

Generalized Singular Value Decomposition
The generalized singular value decomposition (GSVD) generalizes the SVD of a matrix by incorporating two additional positive definite matrices (recall that a positive definite matrix is a square symmetric matrix whose eigenvalues are all positive) that represent ‘constraints’ to be incorporated in the decomposition (formally, these matrices are constraints on the orthogonality of the singular vectors; see Refs 50, 53 for more details). Specifically, let M denote an I by I positive definite matrix representing the ‘constraints’ imposed on the rows of an I by J matrix X, and A a J by J positive definite matrix representing the ‘constraints’ imposed on the columns of X. Matrix M is almost always a diagonal matrix of the ‘masses’ of the observations (i.e., the rows), whereas matrix A implements a metric on the variables and is often, but not always, diagonal. Obviously, when M = A = I, the GSVD reduces to the plain SVD. The GSVD of X, taking into account M and A, is expressed as (compare with Eq (5)):

X = PΔQ^T with P^T M P = Q^T A Q = I,    (6)

where P is the I by L matrix of the normalized left generalized singular vectors (with L being the rank of X), Q the J by L matrix of the normalized right generalized singular vectors, and Δ the L by L diagonal matrix of the L generalized singular values. The GSVD implements the whole class of generalized PCA, which includes (with a proper choice of matrices M and A and preprocessing of X) techniques such as discriminant analysis, correspondence analysis, canonical variate analysis, etc. With the so-called ‘triplet notation’, which is used as a general framework to formalize multivariate techniques, the GSVD of X under the constraints imposed by M and A is equivalent to the statistical analysis of the triplet (X, A, M).33,55–59

Key property: the GSVD provides the best reconstitution (in a least squares sense) of the original matrix by a matrix of lower rank, under the constraints imposed by two positive definite matrices. The generalized singular vectors are orthonormal with respect to their respective matrices of constraints.
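The text does not spell out how the GSVD is computed in practice. One standard route (an assumption on our part, valid for the positive definite diagonal M and A typically used in STATIS) is to absorb the constraints into X, take a plain SVD, and transform back. A minimal sketch:

```python
# Sketch: GSVD of X under row constraints M and column constraints A
# (Eq 6), computed via a plain SVD of M^(1/2) X A^(1/2). Assumes M and A
# are diagonal positive definite matrices, the common case in STATIS.
import numpy as np

def gsvd(X, M, A):
    m, a = np.diag(M), np.diag(A)
    X_tilde = np.sqrt(m)[:, None] * X * np.sqrt(a)      # M^(1/2) X A^(1/2)
    U, delta, Vt = np.linalg.svd(X_tilde, full_matrices=False)
    P = U / np.sqrt(m)[:, None]                         # M^(-1/2) U
    Q = Vt.T / np.sqrt(a)[:, None]                      # A^(-1/2) V
    return P, delta, Q                                  # X = P diag(delta) Q^T

rng = np.random.default_rng(1)
I, J = 12, 5
X = rng.normal(size=(I, J))
M = np.eye(I) / I                                       # equal masses
A = np.eye(J)                                           # identity metric
P, delta, Q = gsvd(X, M, A)
assert np.allclose(P @ np.diag(delta) @ Q.T, X)         # reconstruction
assert np.allclose(P.T @ M @ P, np.eye(len(delta)))     # P^T M P = I
assert np.allclose(Q.T @ A @ Q, np.eye(len(delta)))     # Q^T A Q = I
```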

THE DIFFERENT STEPS OF STATIS

STATIS comprises two main steps. First, the similarity structure of the set of tables is analyzed; this step is sometimes called the analysis of the inter-structure of the tables.21,60,61 It provides a set of optimal weights, which are used in the second step. The second step performs a generalized PCA (i.e., a GSVD) of X that integrates the weights as constraints on the tables and their variables; this step is sometimes called the analysis of the intra-structure of the tables.

Step One: Analyzing the Between-Table Structure

The first step of STATIS analyzes the similarity structure of the set of the K tables. The similarity between two data matrices X[k] and X[k′] is denoted c_{k,k′}. This similarity is evaluated by first transforming the matrices into I by I cross-product matrices denoted S[k] and S[k′], which are computed as

S[k] = X[k] X[k]^T and S[k′] = X[k′] X[k′]^T.    (7)

The similarity measure c_{k,k′} (akin to a coefficient of correlation between square matrices16,62) is called the Hilbert–Schmidt inner product,63 or simply, here, the inner product (it is also called the scalar product, the vector covariance—denoted COVV—or the Frobenius product) between the matrices S[k] and S[k′]. The inner product between matrices S[k] and S[k′] is denoted ⟨S[k], S[k′]⟩, and so the coefficient c_{k,k′} is obtained as

c_{k,k′} = ⟨S[k], S[k′]⟩
         = trace{X[k] X[k]^T × X[k′] X[k′]^T}
         = trace{S[k] × S[k′]}
         = vec{S[k]}^T × vec{S[k′]}
         = Σ_i^I Σ_j^I s_{i,j,k} s_{i,j,k′}.    (8)

Geometrically, the inner product can also be interpreted as a scalar product between two positive semidefinite matrices (recall that a square matrix is positive semidefinite if its eigenvalues are all positive or null) S[k] and S[k′], and is therefore proportional to the cosine between the matrices (because the set of positive semidefinite matrices is a vector space). The inner product is also used to define the norm of a matrix (e.g., S[k]) as the square root of the inner product of this matrix with itself. Formally, the norm of a cross-product matrix S[k] is denoted ‖S[k]‖ and is defined as:

‖S[k]‖² = ⟨S[k], S[k]⟩.    (9)

Because all matrices S[k] are positive semidefinite, the inner product between two of these matrices is always equal to or larger than zero (see Appendix for a proof). Note that when the matrices S[k] and S[k′] are normalized such that the sum of squares of their elements is one, the inner product between these two matrices is equal to the cosine between these matrices. This cosine is also known as the RV coefficient, introduced by Escoufier as a measure of similarity between squared symmetric matrices (specifically, positive semidefinite matrices,64 see also Ref 65) and as a theoretical tool to analyze multivariate techniques (see Refs 66, 67 for reviews).

The inner products between all K matrices are collected into a K by K inner product matrix denoted C, where c_{k,k′} gives the value of the inner product between tables k and k′. Matrix C can also be computed from the K by I² matrix Z defined as

Z = [vec{S[1]} | ··· | vec{S[k]} | ··· | vec{S[K]}]^T.    (10)


With this matrix Z, we compute C as:

C = ZZ^T.    (11)

This shows that C is a positive semidefinite matrix and that, therefore, its eigenvalues are all positive or null and its eigenvectors are real and orthogonal to each other. Consequently, C can be eigendecomposed as

C = UΓU^T with U^T U = I,    (12)

where Γ is the diagonal matrix of the eigenvalues of C. So, the eigendecomposition of C provides the PCA of the similarity structure between the tables (stored in X), which can now be represented as points in a PCA map by using for coordinates their factor scores, computed as:

G = UΓ^{1/2}.    (13)

Note that the eigenvectors of C could also have been obtained from the SVD of Z as

Z = UΓ^{1/2}V^T.    (14)

This approach, called RV-PCA, is sometimes used to analyze multiple correlation or covariance matrices. It provides the similarity structure of the tables and information on which pairs of observations contribute to the pattern of similarity measured by ⟨S[k], S[k′]⟩.
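As an illustration of this first step, here is a minimal NumPy sketch (toy tables; all names are ours) that builds C from the cross-product matrices (Eqs (8) and (11)) and derives the table factor scores G of Eq (13):

```python
# Sketch of Step One: inner product matrix C between cross-product
# matrices, then the PCA map of the tables (Eqs 8, 11-13). Toy data.
import numpy as np

rng = np.random.default_rng(2)
I, K = 12, 5
tables = [rng.normal(size=(I, rng.integers(3, 6))) for _ in range(K)]
S = [Xk @ Xk.T for Xk in tables]                 # cross-products (Eq 3)

# <S[k], S[k']> = trace(S[k] S[k']) = elementwise sum for symmetric matrices
C = np.array([[np.sum(Sk * Sl) for Sl in S] for Sk in S])

gamma, U = np.linalg.eigh(C)                     # eigendecomposition of C
order = np.argsort(gamma)[::-1]                  # eigenvalues, descending
gamma, U = gamma[order], U[:, order]
G = U * np.sqrt(np.clip(gamma, 0, None))         # table factor scores (Eq 13)
```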

Optimum Weights from the First Eigenvector of C

In addition to providing a visual representation of the similarity structure, the eigendecomposition of matrix C also provides optimal weights for combining the tables into a compromise.

The eigendecomposition of matrix C always gives a first eigenvector whose elements have the same sign (which is, for convenience, considered positive). This property is a consequence of the Perron–Frobenius theorem, which states that positive semidefinite matrices whose elements are all positive always have a first eigenvector with all elements having the same sign (see Ref 68, p. 34ff).

In fact, the value of a table on the first eigenvector reflects its overall similarity to all the other tables (i.e., its ‘communality’). This suggests using the values of this first eigenvector to weight the tables, in order to give more importance to tables that represent the group well and less importance to idiosyncratic tables. It can be shown (see Section ‘What Does STATIS Optimize?’ and Appendix) that this procedure provides an optimal representation of the set of tables. So, if the first eigenvector of C is denoted u_1, the set of optimal weights for the tables is stored in a K by 1 vector denoted α and computed by rescaling u_1 such that the sum of the elements of α is equal to one:

α = u_1 × (u_1^T 1)^{−1}.    (15)

For convenience, the α weights can be gathered in a J by 1 vector denoted a, where each variable is assigned the α weight of the matrix to which it belongs. Specifically, a is constructed as:

a = [α_1 1[1]^T, …, α_k 1[k]^T, …, α_K 1[K]^T],    (16)

where 1[k] stands for a J[k] vector of ones. Alternatively, the weights can be stored as the diagonal elements of a diagonal matrix denoted A, obtained as

A = diag{a} = diag{[α_1 1[1]^T, …, α_k 1[k]^T, …, α_K 1[K]^T]}.    (17)
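These formulas translate directly into code. A minimal sketch (the helper name alpha_weights and the toy numbers of variables are ours):

```python
# Sketch: optimal table weights from the first eigenvector of C (Eq 15),
# then the per-variable weight vector a and matrix A (Eqs 16-17).
import numpy as np

def alpha_weights(C):
    gamma, U = np.linalg.eigh(C)
    u1 = U[:, np.argmax(gamma)]          # first eigenvector of C
    if u1.sum() < 0:                     # Perron-Frobenius: one common sign
        u1 = -u1
    return u1 / u1.sum()                 # rescale so the weights sum to one

Z = np.abs(np.random.default_rng(3).normal(size=(4, 10)))
C = Z @ Z.T                              # toy inner product matrix (PSD)
alpha = alpha_weights(C)

J_k = [3, 5, 2, 4]                       # hypothetical variables per table
a = np.repeat(alpha, J_k)                # vector a (Eq 16)
A = np.diag(a)                           # diagonal matrix A (Eq 17)
```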

Step Two: Generalized PCA of X

GSVD of X
After the weights have been collected, they are used to compute the GSVD of X under the constraints provided by M (masses for the observations) and A (optimum weights for the K tables). This GSVD is expressed as:

X = PΔQ^T with P^T M P = Q^T A Q = I.    (18)

This GSVD corresponds to a GPCA of matrix X and, consequently, provides factor scores to describe the observations and factor loadings to describe the variables. Each column of P and Q refers to a principal component, also called a dimension (because the numbers in these columns are often used as coordinates to plot maps; see Ref 50 for more details). In PCA, Eq. (18) is often rewritten as

X = FQ^T with F = PΔ,    (19)

where F stores the factor scores (describing the observations) and Q stores the loadings (describing the variables). Note, incidentally, that in the triplet notation, STATIS is equivalent to the statistical analysis of the triplet (X, A, M).

Because matrix X comprises K tables, each of them comprising J[k] variables, the matrix Q of the right singular vectors can be partitioned in the same way as X. Specifically, Q can be expressed as a column block matrix:

Q = [Q[1]^T | ··· | Q[k]^T | ··· | Q[K]^T]^T,    (20)

where Q[k] is a J[k] by L (with L being the rank of X) matrix storing the right singular vectors corresponding to the variables of matrix X[k]. With this in mind, Eq. (18) can be re-expressed as:

X = [X[1] | ··· | X[k] | ··· | X[K]] = PΔQ^T
  = PΔ([Q[1]^T | ··· | Q[k]^T | ··· | Q[K]^T]^T)^T
  = PΔ[Q[1]^T | ··· | Q[k]^T | ··· | Q[K]^T]
  = [PΔQ[1]^T | ··· | PΔQ[k]^T | ··· | PΔQ[K]^T].    (21)

Note that the pattern in Eq. (18) does not completely generalize to Eq. (21) because, if we define A[k] as

A[k] = α_k I,    (22)

we have, in general, Q[k]^T A[k] Q[k] ≠ I.

Factor Scores
The factor scores for X represent the best compromise (i.e., the best common representation) for the set of the K matrices. Recall that these factor scores, called the compromise factor scores, are computed (cf. Eqs (18) and (19)) as

F = PΔ.    (23)

Factor scores can be used to plot the observations as done in standard PCA, for which each column of F represents a dimension. Note that the variance of the factor scores of the observations is computed using their masses (stored in matrix M) and can be found as the diagonal of the matrix F^T M F. This variance is equal, for each dimension, to the square of the singular value of this dimension, as shown by

F^T M F = ΔP^T M P Δ = Δ².    (24)

As in standard PCA, F can be obtained from X by combining Eqs (18) and (23) to get:

F = PΔ = XAQ.    (25)

Taking into account the block structure of X, A, and Q, Eq. (25) can also be rewritten as (cf. Eq. (22)):

F = XAQ = [X[1] | ··· | X[k] | ··· | X[K]] × A × [Q[1]^T | ··· | Q[k]^T | ··· | Q[K]^T]^T
  = Σ_k X[k] A[k] Q[k] = Σ_k α_k X[k] Q[k].    (26)

This equation suggests that the partial factor scores for a table can be defined as the projection of this table onto its right singular vectors (i.e., Q[k]). Specifically, the partial factor scores for the kth study are stored in a matrix denoted F[k], computed as

F[k] = X[k] Q[k].    (27)

Note that the compromise factor scores matrix is the barycenter of the partial factor scores, because it is the weighted average (where the weights are given by the α_k’s) of the partial factor scores (cf. Eq. (25)):

Σ_k α_k F[k] = Σ_k α_k X[k] Q[k] = F.    (28)
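A minimal sketch of Eqs (25)–(28) on toy, equal-mass data (the stand-in α weights are ours): it computes the compromise factor scores F = XAQ, the partial factor scores F[k], and checks the barycentric property:

```python
# Sketch: compromise and partial factor scores, and the barycentric
# property of Eq (28). Toy data; alpha is a stand-in for optimal weights.
import numpy as np

rng = np.random.default_rng(4)
I = 12
J_k = [3, 5, 4]                                  # variables per table
tables = [rng.normal(size=(I, J)) for J in J_k]
alpha = np.array([0.4, 0.35, 0.25])              # stand-in alpha weights

X = np.hstack(tables)                            # X = [X[1] | ... | X[K]]
a = np.repeat(alpha, J_k)

# GSVD of X under M = I/I and A = diag(a), via a weighted plain SVD:
X_tilde = np.sqrt(1 / I) * X * np.sqrt(a)        # M^(1/2) X A^(1/2)
U, delta, Vt = np.linalg.svd(X_tilde, full_matrices=False)
Q = Vt.T / np.sqrt(a)[:, None]                   # A^(-1/2) V

F = X @ np.diag(a) @ Q                           # compromise scores (Eq 25)
starts = np.cumsum([0] + J_k)
F_k = [tables[k] @ Q[starts[k]:starts[k + 1]] for k in range(len(J_k))]

# Eq (28): the alpha-weighted average of the partial factor scores
# recovers the compromise factor scores.
assert np.allclose(sum(al * Fk for al, Fk in zip(alpha, F_k)), F)
```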

Also, as in standard PCA, the elements of Q are loadings and can be plotted either on their own or along with the factor scores as a biplot.69,70

As the loadings come in blocks (i.e., the loadings correspond to the variables of a table), it is practical to create a biplot with the partial factor scores (i.e., F[k]) for a block and the loadings (i.e., Q[k]) for this block. In doing so, it is often practical to normalize the loadings such that their variance is commensurable with the variance of the factor scores. This can be achieved, for example, by normalizing, for each dimension, the loadings of a block such that their variance is equal to the square of the singular value of the dimension, or even to the singular value itself (as illustrated in the example that we present in a following section). These biplots are helpful for understanding the statistical structure of each block, even though the relative positions of the factor scores and the loadings are not directly interpretable, because only the projections of observations on the loading vectors can be meaningfully interpreted in a biplot.69,70

An alternative pictorial representation of the variables and the components plots the correlations between the original variables of X and the factor scores. These correlations are plotted as two-dimensional maps in which a circle of radius one (called the circle of correlation50) is also plotted. The closer to the circle a variable is, the better this variable is ‘explained’ by the components used to create the plot (see Ref 71 for an example).

Contribution of Observations, Variables, and Tables to a Dimension

In STATIS, just as in standard PCA, the importance of a dimension (i.e., principal component) is reflected by its eigenvalue, which indicates how much of the total inertia (i.e., variance) of the data is explained by this component.

In order to better understand the relationships between components and observations, variables, and tables, and also to help interpret a component, we can evaluate how much an observation, a variable, or a whole table contributes to the inertia extracted by a component. In order to do so, we compute descriptive statistics called contributions50 (p. 437ff). The stability of these descriptive statistics can also be assessed by cross-validation techniques such as the bootstrap, and this approach can then be used to select the relevant elements for a dimension.

Contribution of an Observation to a Dimension
As stated in Eq. (24), the variance of the factor scores for a given dimension is equal to the eigenvalue (i.e., the square of the singular value) associated with this dimension. If we denote by λ_ℓ the eigenvalue of a given dimension, we can rewrite Eq. (24) as

λ_ℓ = Σ_i m_i × f_{i,ℓ}²,    (29)

where m_i and f_{i,ℓ} are, respectively, the mass of the ith observation and the factor score of the ith observation for the ℓth dimension.

As all the terms m_i × f_{i,ℓ}² are positive or null, we can evaluate the contribution of an observation to a dimension as the ratio of its squared weighted factor score to the dimension eigenvalue. Formally, the contribution of observation i to component ℓ, denoted ctr_{i,ℓ}, is computed as

ctr_{i,ℓ} = (m_i × f_{i,ℓ}²) / λ_ℓ.    (30)

Contributions take values between 0 and 1 and, for a given component, the contributions of all observations sum to 1. The larger a contribution, the more the observation contributes to the component. A useful heuristic is to base the interpretation of a component on the observations whose contributions are larger than the average contribution. Observations with high contributions and whose factor scores have different signs can then be contrasted to help interpret the component. Alternatively (as described in a later section), we can use bootstrap ratios and keep for consideration the observations with large bootstrap ratios for a given dimension.

Contributions of a Variable to a Dimension
In a manner similar to the observations, we can find the important variables for a given dimension by computing variable contributions. The variance of the loadings for the variables is equal to one when the α weights are taken into account (cf. Eq. (18)). So, if we denote by a_j the α weight for the jth variable (cf. Eq. (16)), we have

1 = Σ_j a_j × q_{j,ℓ}²,    (31)

where q_{j,ℓ} is the loading of the jth variable for the ℓth dimension. As all terms a_j × q_{j,ℓ}² are positive or null, we can evaluate the contribution of a variable to a dimension as its squared weighted loading for this dimension. Formally, the contribution of variable j to component ℓ, denoted ctr_{j,ℓ}, is computed as

ctr_{j,ℓ} = a_j × q_{j,ℓ}².    (32)

Variable contributions take values between 0 and 1 and, for a given component, the contributions of all variables sum to 1. The larger the contribution of a variable to a component, the more this variable contributes to this component. Variables with high contributions and whose loadings have different signs can then be contrasted to help interpret the component.

Contribution of a Table to a Dimension
Specific to multiblock analysis is the notion of a table contribution. As a table comprises several variables, the contribution of a table can simply be defined as the sum of the contributions of its variables (a simple consequence of the Pythagorean theorem, which states that squared lengths are additive). So the contribution of table k to component ℓ, denoted ctr_{k,ℓ}, is defined as

ctr_{k,ℓ} = Σ_j^{J[k]} ctr_{j,ℓ}.    (33)


Table contributions take values between 0 and 1 and, for a given component, the contributions of all tables sum to 1. The larger the contribution of a table to a component, the more this table contributes to this component. An alternative approach re-scales the contributions so that their sum for a dimension is equal to the eigenvalue of this dimension. These re-scaled contributions are called partial inertias and are denoted I_partial. The partial inertias are obtained by multiplying the contributions for a dimension by the dimension eigenvalue.
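These contribution formulas can be computed in a few lines. A sketch of a hypothetical helper whose argument names follow the notation used here:

```python
# Sketch: contributions of observations, variables, and tables to each
# dimension (Eqs 29-33), plus partial inertias.
import numpy as np

def contributions(F, Q, a, J_k, m):
    """F: I x L factor scores, Q: J x L loadings, a: J per-variable alpha
    weights, J_k: list of table sizes, m: I masses."""
    eig = (m[:, None] * F**2).sum(axis=0)        # lambda_l (Eq 29)
    ctr_obs = m[:, None] * F**2 / eig            # Eq 30 (columns sum to 1)
    ctr_var = a[:, None] * Q**2                  # Eq 32 (columns sum to 1)
    starts = np.cumsum([0] + list(J_k))
    ctr_tab = np.array([ctr_var[s:e].sum(axis=0)
                        for s, e in zip(starts[:-1], starts[1:])])  # Eq 33
    partial_inertia = ctr_tab * eig              # rescaled by the eigenvalue
    return ctr_obs, ctr_var, ctr_tab, partial_inertia
```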

STATIS with Cross-Product Matrices

The traditional approach to STATIS focuses on the cross-product matrices S[k] and on the computation of the factor scores for the observations. In this context, the compromise is computed as the optimal linear combination of the S[k] matrices, with weights provided by the elements of α. Specifically, the compromise cross-product matrix is denoted S[+] and is computed as

S[+] = Σ_k^K α_k S[k].    (34)

Note that S[+] can also be directly computed from X as

S[+] = XAX^T.    (35)

The compromise matrix, being a weighted sum of cross-product matrices, is also a cross-product matrix (i.e., it is a positive semidefinite matrix), and therefore its eigendecomposition amounts to a PCA. The generalized eigendecomposition of the compromise, under the constraints provided by matrix M, gives:

S[+] = PΛP^T with P^T M P = I.    (36)

Eqs (35) and (18) together indicate that the generalized eigenvectors of the compromise are the left generalized singular vectors of X (cf. Eq. (18)) and that the eigenvalues of S[+] are the squares of the singular values of X (i.e., Λ = Δ²). The loadings can be computed by rewriting Eq. (18) as

Q = X^T M P Δ^{−1}.    (37)

Similarly, the compromise factor scores can be computed from S[+] (cf. Eq. (25)) as

F = S[+] M P Δ^{−1}.    (38)

In this context, the loadings for the variables from table k are obtained from Eq. (37) as

Q[k] = X[k]^T M P Δ^{−1}.    (39)

The factor scores for table k are obtained from Eqs (39) and (27) as

F[k] = X[k] Q[k] = X[k] X[k]^T M P Δ^{−1} = S[k] M P Δ^{−1}.    (40)

STATIS as Simple PCA

STATIS can also be computed as the simple PCA of the set of the X[k] matrices, each weighted by the square root of its respective α weight (this assumes, as is the case in general for STATIS, that matrix M is equal to (1/I) I). Specifically, we define the matrix

X̃ = [√α_1 X[1] | ··· | √α_k X[k] | ··· | √α_K X[K]],    (41)

whose (simple) SVD is given by

X̃ = P̃ΔQ̃^T with P̃^T P̃ = Q̃^T Q̃ = I.    (42)

Then the factor scores for the observations can be obtained as

F = P̃Δ.    (43)

The loadings for table k are obtained as

Q[k] = (1/√α_k) Q̃[k].    (44)
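This simple-PCA route gives the most compact computational summary of STATIS. An end-to-end sketch (our own toy implementation, assuming equal masses and the sum-of-squares table normalization favored in this article):

```python
# Sketch: full STATIS pipeline via Eqs (41)-(44): optimal weights from C,
# then a plain SVD of the sqrt(alpha)-weighted concatenated tables.
import numpy as np

def statis(tables):
    S = [Xk @ Xk.T for Xk in tables]                     # Eq 3
    C = np.array([[np.sum(Sk * Sl) for Sl in S] for Sk in S])   # Eq 8
    gamma, U = np.linalg.eigh(C)
    u1 = U[:, np.argmax(gamma)]
    u1 = u1 if u1.sum() > 0 else -u1
    alpha = u1 / u1.sum()                                # Eq 15

    X_w = np.hstack([np.sqrt(al) * Xk
                     for al, Xk in zip(alpha, tables)])  # Eq 41
    P, delta, Vt = np.linalg.svd(X_w, full_matrices=False)
    F = P * delta                                        # factor scores (Eq 43)
    J_k = [Xk.shape[1] for Xk in tables]
    starts = np.cumsum([0] + J_k)
    Q_k = [Vt.T[starts[k]:starts[k + 1]] / np.sqrt(alpha[k])
           for k in range(len(tables))]                  # Eq 44
    F_k = [tables[k] @ Q_k[k] for k in range(len(tables))]      # Eq 27
    return alpha, F, Q_k, F_k

# Toy run on centered, table-normalized data (see 'More on Preprocessing'):
rng = np.random.default_rng(5)
tables = []
for J in (4, 5, 6):
    Y = rng.normal(size=(12, J))
    Y = Y - Y.mean(axis=0)                               # center columns
    tables.append(Y / np.sqrt((Y**2).sum()))             # total variance = 1
alpha, F, Q_k, F_k = statis(tables)
```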

More on Preprocessing

The preprocessing step is a crucial part of the analysis and can be performed on the columns, on the rows, or on the whole matrix.

Column and Row Preprocessing
Most of the time, each variable is centered (i.e., the mean of each variable is zero) and normalized (i.e., the sum of the squared elements of each column is equal to one, I, or even I − 1). In some cases, the normalization affects the rows of the matrix; the sum of each row can then be equal to one (e.g., as in correspondence analysis,53 see also Ref 72 for an explicit integration of correspondence analysis and STATIS), or the sum of squares of the elements of a given row can be equal to one (e.g., as in Hellinger/Bhattacharyya analysis73–77).


Whole Table Preprocessing
A STATIS analysis of a set of tables with very different scales or factorial structures will, in general, favor the tables with the largest variance. A large variance can be due to a large number of variables (compared to the other tables) or to a strong factorial structure. So, in the same way that normalizing variables (e.g., using Z-scores) makes it possible to compare variables measured on different scales, normalizing data tables makes it possible to compare data tables that have different structures or different scales.

There are different ways to normalize data tables. The simplest one is to divide all entries of the data table by its number of columns or, better, by the square root of the number of columns (as done in the so-called ‘Tucker-1’ model and in ‘consensus PCA’78–80). Another straightforward transformation divides all entries of the data table by the square root of the total sum of squares of its elements. This sets the total variance of each table to one and guarantees that all tables participate equally in the analysis. This normalization is used, for example, in the SUM-PCA technique, which originated in the chemometrics tradition.14,15 In this article we favor this normalization because of its simplicity. A closely related procedure is to normalize X[k] by the square root of the norm of the matrix Y[k] Y[k]^T. Specifically, here X[k] is obtained as

X[k] = Y[k] × ‖Y[k] Y[k]^T‖^{−1/2}.    (45)

This normalization ensures that the cross-product matrices S[k] all have a norm equal to one and, therefore, that the inner product between cross-product matrices is equal to the RV coefficient. A more sophisticated alternative to the normalization problem originated in the multiple factor analysis framework.81,82 Here, each table is normalized by dividing each of its entries by the first singular value of this table. This normalization guarantees that all data tables have a first component of unit variance and, therefore, that all tables can contribute equally to the first dimension of the common structure.
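For concreteness, the normalizations just described fit in one line each (a sketch; the function names are ours, and Y is assumed already column-centered):

```python
# Sketch: four whole-table normalizations for a centered table Y.
import numpy as np

def norm_columns(Y):
    return Y / np.sqrt(Y.shape[1])                   # 'Tucker-1'/consensus PCA

def norm_sum_of_squares(Y):
    return Y / np.sqrt((Y**2).sum())                 # SUM-PCA: total variance 1

def norm_cross_product(Y):
    return Y / np.sqrt(np.linalg.norm(Y @ Y.T))      # Eq 45: ||S[k]|| = 1

def norm_first_singular_value(Y):
    return Y / np.linalg.svd(Y, compute_uv=False)[0] # MFA-style normalization
```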

SUPPLEMENTARY ELEMENTS (ALSO KNOWN AS OUT OF SAMPLE)

As in standard PCA, we can use the results of the analysis to compute approximate statistics (e.g., factor scores, loadings, or optimum weights) for new elements (i.e., elements that have not been used in the analysis). These new elements are called supplementary elements50,83 or ‘out of sample’ elements.84 By contrast with the supplementary elements, the elements actually used in the analysis are called active elements. The statistics for the supplementary elements are obtained by projecting these elements onto the active space. In the STATIS framework, we can have supplementary rows and columns (as in PCA) but also supplementary tables. Supplementary rows for which we have values for all J variables and supplementary variables for which we have measurements for all I observations are projected in the same way as in PCA (see, e.g., Ref 50, p. 436ff). Computing statistics for supplementary tables, however, is specific to STATIS.

Supplementary Rows and Columns

Because STATIS is a generalized PCA, we can add supplementary rows and columns as in standard PCA (see Ref 50 for details). Note, incidentally, that this procedure assumes that the supplementary rows and columns are scaled in a manner comparable to the rows and columns of the original matrix X. Specifically, from Eqs (27) and (18), we can compute the factor scores, denoted f_sup, for a supplementary observation (i.e., a supplementary row of dimensions 1 by J recording measurements on the same variables as the whole matrix X). This supplementary row is represented by a 1 by J vector denoted r_sup^T (which has been preprocessed in the same way as X); the supplementary factor scores are computed as

f_sup^T = r_sup^T A Q.    (46)

Loadings, denoted q_sup, can be computed for a new column, which is itself denoted by an I by 1 vector o_sup (note that o_sup needs to have been preprocessed in a way comparable with the tables). These loadings are obtained, in a way similar to Eq. (46), as

q_sup^T = o_sup^T M P Δ^{−1}.    (47)

Supplementary Partial Observation

In some cases, we have supplementary observations for only one (or some) table(s). In this case, called a partial observation, we can obtain the partial factor scores for this observation from Eq. (27). Specifically, let x_sup[k]^T denote a 1 by J[k] vector of measurements collected on the J[k] variables of table k (x_sup[k]^T should have been preprocessed in the same way as the whole matrix X[k]). The partial factor scores for this supplementary observation from table k are obtained as:

f_sup[k]^T = x_sup[k]^T × Q[k].    (48)


Note that the factor scores for a supplementary observation collected on all tables can be obtained as the weighted (with the α weights) average of the partial factor scores (see Eqs (28) and (46)). Specifically, we can show that

Σ_k^K α_k f_sup[k]^T = Σ_k^K x_sup[k]^T (α_k × I) Q[k] = Σ_k^K x_sup[k]^T A[k] Q[k] = x_sup^T A Q = f_sup^T.    (49)

In order to compute the loadings for a supplementary variable for a specific table, it suffices to preprocess this variable like the variables of this table (and this includes the whole table normalization if it was used) and then to use Eq. (47).

Supplementary Tables

Because STATIS involves tables, it is of interest to be able to project a whole table as a supplementary element. This table will include new variables measured on the same observations described by the active tables. Such a table is represented by the I by J_sup matrix Y_sup. The matrix Y_sup is preprocessed in the same manner (e.g., centered, normalized) as the Y[k] matrices to give the supplementary matrix X_sup. This matrix will provide supplementary factor scores and loadings for the compromise solution, factor scores for the inner-product map, and one supplementary α value, as described below.

Factor Scores
In order to obtain the factor scores for a new table, the first step is to obtain (from Eq. (47)) the supplementary loadings, which are computed as

Q_sup = X_sup^T M P Δ^{−1}.    (50)

Then, using Eq. (27), we obtain the supplementary factor scores for the new table X_sup as

F_sup = X_sup Q_sup = X_sup X_sup^T M P Δ^{−1} = S_sup M P Δ^{−1}.    (51)

Inner Product Map Projection
When evaluating a supplementary table, it is important to evaluate its similarity to the set of the active tables. This is done by computing the inner product between this supplementary table and all the other tables and then projecting the vector made of these inner products onto the eigenvectors of the inner product matrix.

The first step is to compute the cross-product matrix associated with X_sup as

S_sup = X_sup X_sup^T.    (52)

Then the K inner products between S_sup and each of the K cross-product matrices S[k] are computed (see Eq. (8)) and stored in the K by 1 vector denoted c_sup:

c_sup = [⟨S_sup, S[1]⟩, …, ⟨S_sup, S[k]⟩, …, ⟨S_sup, S[K]⟩]^T.    (53)

The factor scores of the supplementary table in the inner product map are stored in the 1 by L vector denoted g_sup^T, computed as (see Eq. (13)):

g_sup^T = c_sup^T U Γ^{−1/2}.    (54)

Alternatively, if we denote by z_sup the I² by 1 vector version of S_sup (i.e., z_sup = vec{S_sup}), we can, using Eq. (14), compute g_sup as

g_sup^T = z_sup^T V = vec{S_sup}^T V = vec{X_sup X_sup^T}^T V.    (55)

STATIS Weight: α_sup
The value of the STATIS weight for a supplementary table is denoted α_sup and is obtained by combining Eqs (15) and (55) to give:

α_sup = γ_1^{−1/2} g_{sup,1} × (u_1^T 1)^{−1},    (56)

where g_{sup,1} is the value of the supplementary factor score for the first dimension of the inner product matrix and γ_1 is the first eigenvalue of the inner product matrix.

INFERENTIAL ASPECTS

STATIS is a descriptive multivariate technique, but it is often important to be able to complement the descriptive conclusions of an analysis by assessing whether its results are reliable and replicable. For example, a standard question in the PCA framework is to find the number of reliable components, and most of the approaches used in PCA will also work for STATIS. For example, we can make use of the informal ‘scree’ (also known as ‘elbow’) test and the more formal RESS, PRESS, and Q² statistics (see, for details, e.g., Ref 50, p. 440ff).

In addition to standard PCA methodology, two problems are specific to STATIS.

The first is to decide whether the set of tables is sufficiently homogeneous to be considered a random sample from a single population. If so, integrating these tables into a compromise is relevant. If not, then the set of tables may need to be partitioned into homogeneous sets of tables, and a new analysis can be run on each of these homogeneous sets. This approach has been developed analytically recently85–88 and can, under standard (but somewhat stringent) assumptions (e.g., normality, random sampling of the tables, etc.), provide point estimates, statistical tests, and confidence intervals. This recent important line of work is likely to be expanded to more general cases in the near future.

A promising nonparametric alternative to this analytical approach is to use cross-validation and resampling techniques such as permutation tests,89 the jackknife,90 or the bootstrap.91–93 For example, a null hypothesis test for the significance of the first or second eigenvalue of the inner product matrix could be obtained by permuting the elements of the columns of the matrices X[k] (the permutation needs to be done per column in order to keep the column normalization) and computing the related matrices S[k], which are then used to compute a ‘permuted’ matrix Z (see Eq. (10)). This permuted Z is, in turn, used to create a ‘permuted’ version of matrix C whose eigenvalues will correspond to eigenvalues obtained when the data are random (permuting the entries of C itself will not lead to the correct test, because C will not, in general, be positive definite under permutation of its elements). Repeating this procedure a large number of times will provide a sampling distribution of the eigenvalues of C under the null hypothesis, and an observed rare value for the analysis under consideration can then be judged significant (see Ref 42 for an example).
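A minimal sketch of this permutation scheme (our own implementation of the procedure just described):

```python
# Sketch: null distribution of the eigenvalues of C obtained by permuting
# each column of each X[k] independently (preserves column normalization).
import numpy as np

def permuted_eigenvalues(tables, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    null_eigs = []
    for _ in range(n_perm):
        perm = [np.column_stack([rng.permutation(col) for col in Xk.T])
                for Xk in tables]
        S = [Xk @ Xk.T for Xk in perm]
        C = np.array([[np.sum(Sk * Sl) for Sl in S] for Sk in S])
        null_eigs.append(np.sort(np.linalg.eigvalsh(C))[::-1])
    return np.array(null_eigs)   # n_perm x K: compare the observed
                                 # eigenvalues of C to these distributions
```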

Another interesting problem is to evaluate whether STATIS improves over a simple multitable analysis in which all table weights are equal (i.e., all equal to 1/K). This question can be addressed with techniques such as the jackknife or the bootstrap, as illustrated below.

Jackknife for the Optimum Weights

Recall that the optimal weights (i.e., the α’s stored in vector α) are used to compute the compromise as a linear combination of the original tables. These weights are computed to optimize the similarity between the tables and the compromise. In some cases we can consider that these tables are fixed (i.e., they constitute the population of interest) and that the observations are random (specifically, independent and identically distributed random variables, or ‘i.i.d.’ in statistical jargon). In such cases, replications would keep the same tables but would sample new (i.e., different) observations. An obvious question, then, is: How different would the new α values be? If we assume that the observations are i.i.d., we can use estimation techniques such as the jackknife90 or the bootstrap.91 As an illustration, we present how the jackknife could be used to estimate the values of α when the tables are considered as a fixed factor and the observations as a random factor.

For the jackknife, we leave out each observation in turn and compute an estimate of α without this observation. We then integrate all these estimates to derive an unbiased estimate of α and of its variability (we follow here the notations and approach detailed in Ref 90).

Specifically, if we consider the ith observation, we denote by α_{−i} the vector of optimum weights computed without this observation. We then compute the pseudo-value of α, denoted α*_i, as:

α*_i = Iα − (I − 1)α_{−i}.    (57)

The jackknife estimator of α is denoted α* and is computed as the mean of the pseudo-values:

α* = (1/I) Σ_i α*_i = Iα − ((I − 1)/I) Σ_i α_{−i}    (58)

(note that some values of α*_i could be negative; in such unlikely cases, the negative values are set to zero).

The vector variance of the α*_i is computed as

σ²_{α*_i} = (1/(I − 1)) Σ_i (α*_i − α*)².    (59)

The standard error of the mean is obtained as

σ_{α*} = √(σ²_{α*_i} / I).    (60)

When the observations are i.i.d., this standard error can be used to compute confidence intervals, as the sampling distribution of the α’s (i.e., the elements of α*) converges to a normal distribution. The standard error can also be used to compute pseudo-t values, called jackknife ratios, in order to test null hypotheses. In the case of STATIS, one interesting null hypothesis for an α_k would be to test whether its value differs from the chance value of 1/K.
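A sketch of this jackknife (Eqs (57)–(60)), reusing the alpha_weights helper sketched earlier; the clipping of negative values follows the note after Eq (58):

```python
# Sketch: jackknife estimate and standard error of the alpha weights,
# leaving out one observation (row of every table) at a time.
import numpy as np

def jackknife_alpha(tables, alpha_weights):
    def alphas(tbls):
        S = [Xk @ Xk.T for Xk in tbls]
        C = np.array([[np.sum(Sk * Sl) for Sl in S] for Sk in S])
        return alpha_weights(C)

    I = tables[0].shape[0]
    alpha = alphas(tables)
    pseudo = np.array([I * alpha
                       - (I - 1) * alphas([np.delete(Xk, i, axis=0)
                                           for Xk in tables])
                       for i in range(I)])       # pseudo-values (Eq 57)
    alpha_star = pseudo.mean(axis=0)             # jackknife estimator (Eq 58)
    var = pseudo.var(axis=0, ddof=1)             # Eq 59
    se = np.sqrt(var / I)                        # standard error (Eq 60)
    return np.clip(alpha_star, 0, None), se
```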

Volume 4, March/Apr i l 2012 © 2012 Wiley Per iodica ls, Inc. 135

Page 13: STATIS and DISTATIS: optimum multitable principal component analysis …herve/abdi_Wires_AWVB2012_Final.pdf · 2012-04-18 · Advanced Review STATIS and DISTATIS: optimum multitable

Advanced Review wires.wiley.com/compstats

Bootstrap for the Factor Scores

If we assume that the tables constitute a random factor (i.e., the tables are i.i.d. and sampled from a potentially infinite set of tables) and if we consider the observations as a fixed factor, we may want to estimate the stability of the compromise factor scores.

Such an evaluation could be done, for example, by using a jackknife or a bootstrap approach.

We briefly sketch here a possible bootstrap approach for the factor scores (see Ref 93 for the problem of bootstrapping in the context of the SVD, and Ref 94 for a review of the bootstrap in the PCA context). The main idea is to use the property of Eq. (28), which indicates that the compromise factor scores are the weighted (by the α’s) average of the factor scores of the tables. Therefore, we can obtain bootstrap confidence intervals by repeatedly sampling with replacement from the set of tables and computing, for each of these samples, an estimate of the α weights and of the compromise factor scores (this approach corresponds to the partial bootstrap of Ref 94; see Ref 42 for an alternative approach using split-half resampling). From these estimates we can also compute bootstrap ratios for each dimension by dividing the mean of the bootstrap estimates by their standard deviation. These bootstrap ratios are akin to t statistics and can be used to detect observations that reliably contribute to a given dimension. So, for example, for a given dimension and a given observation, a value of the bootstrap ratio larger than 2 will be considered reliable (by analogy with a t larger than 2, which would be ‘significant’ at p < .05). When evaluating bootstrap ratios, the multiple comparisons problem needs to be taken into account by using, for example, a Bonferroni-type correction95: instead of using critical values corresponding to, say, p < .05, we would use values corresponding to p < .05/I.

More formally, in order to compute a bootstrap estimate, we need to generate a bootstrap sample. To do so, we first take a sample of integers with replacement from the set of integers from 1 to K. Recall that, when sampling with replacement, any element from the set can be sampled zero, one, or more than one time. We call this set B (for bootstrap). For example, with 5 elements, a possible bootstrap set could be B = {1, 5, 1, 3, 3}. We then generate a new data set (i.e., a new X matrix comprising K tables) using the matrices X[k] with these indices. So with K = 5, this would give the following bootstrap set:

{X[1], X[5], X[1], X[3], X[3]}.    (61)

We then use this set of matrices to compute a matrix of inner products (denoted C*_1, see Eq. (8)) from which we derive a set of α weights (denoted α*_1, see Eq. (15)). From these weights we then compute a compromise cross-product matrix (denoted S*_{[+]1}, see Eq. (34)). Finally, we compute the bootstrapped factor scores (denoted F*_1) from this compromise by projecting the bootstrapped compromise onto the compromise as a supplementary element (see Eqs (40) and (51)). Interestingly, as a consequence of the barycentric properties of the factor scores (see Eq. (28)), this last step can also be obtained directly by computing F*_1 as the weighted (by the α*’s) average of the corresponding partial factor scores. We then repeat the procedure a large number of times (e.g., L = 1000) and generate L bootstrapped matrices of factor scores F*_ℓ. From these bootstrapped matrices of factor scores, we can derive confidence intervals and estimate the mean factor scores as the mean of the bootstrapped factor scores. Formally, F̄* denotes the bootstrap estimated factor scores and is computed as

F̄* = (1/L) Σ_ℓ F*_ℓ.    (62)

In a similar way, the bootstrapped estimate of the variance is obtained as the variance of the F*_ℓ matrices. Formally, σ²_{F*} denotes the bootstrapped estimate of the variance and is computed as

σ²_{F*} = (1/L) Σ_ℓ (F*_ℓ − F̄*) ◦ (F*_ℓ − F̄*),    (63)

where ◦ denotes the Hadamard product. Bootstrap ratios are computed by dividing the bootstrapped mean by the bootstrapped standard deviation (denoted σ_{F*}, the square root of the bootstrapped estimate of the variance).

Bootstrapped Confidence Intervals
The bootstrap factor scores (i.e., the F*ℓ's) can also be used to compute confidence intervals (CIs) for the observations. For a given dimension, the bootstrapped factor scores of an observation can be ordered from the smallest to the largest, and a confidence interval for a given p value can be obtained by trimming the upper and lower p/2 proportion of the distribution. In general, for a given dimension, the bootstrap ratios and the CIs will agree in detecting the relevant observations. In addition, CIs can also be plotted directly on the factor scores map as confidence ellipsoids or confidence convex hulls which comprise a 1 − p proportion of the bootstrapped factor scores (see Ref 42 and our example below for an illustration). When the ellipsoids or convex hulls of two observations do not overlap, these two observations can be considered as reliably different. As with the bootstrap ratios, in order to correct for the potential problem of multiple comparisons, a Bonferroni type of correction can also be implemented when plotting hulls or ellipsoids (see Ref 42 for details).
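A sketch of the percentile trimming just described, assuming the array of bootstrapped factor scores produced by the previous sketch; the Bonferroni-type correction is exposed as an optional argument.

```python
import numpy as np

def percentile_ci(F_boot, p=0.05, n_comparisons=1):
    """Percentile confidence intervals from bootstrapped factor scores."""
    p_adj = p / n_comparisons                           # Bonferroni-type correction
    lower = np.percentile(F_boot, 100 * p_adj / 2, axis=0)
    upper = np.percentile(F_boot, 100 * (1 - p_adj / 2), axis=0)
    return lower, upper                                 # each is I x n_components
```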

WHAT DOES STATIS OPTIMIZE?

Intuitively, the goal of the STATIS method is to find a set of weights (i.e., the αk's) such that the factor scores of matrix X are as close as possible to the set of the partial factor scores of the K tables comprising X. This intuition can be formalized by two main types of criteria (which are equivalent, as they correspond to maximizing the same quantities). We express these criteria formally in this section and give the proofs in the Appendix.

Largest Variance of the Compromise
The first criterion can be expressed in several equivalent ways. The simplest one indicates that the variance of the compromise is maximal (and therefore that the factor scores of X have maximal variance, as they are obtained from its GSVD). Formally, this amounts to maximizing the following criterion:

$$\mathcal{V} = \|S_{[+]}\|^{2} = \left\langle S_{[+]},\ S_{[+]} \right\rangle \quad\text{with}\quad \boldsymbol{\alpha}^{\mathsf{T}}\boldsymbol{\alpha} = 1. \tag{64}$$

This criterion can be re-expressed as a minimization problem, for which STATIS seeks the minimum distance between the individual cross-product matrices and the compromise. Namely, STATIS minimizes:

$$\mathcal{D} = \sum_{k}^{K} \left\| S_{[k]} - \alpha_{k} S_{[+]} \right\|^{2} \quad\text{with}\quad \boldsymbol{\alpha}^{\mathsf{T}}\boldsymbol{\alpha} = 1. \tag{65}$$

Compromise Most Similar to the Tables
An alternative criterion maximized by STATIS is the overall similarity between the tables and the compromise. Specifically, STATIS maximizes21:

$$\mathcal{S} = \sum_{k}^{K} \left\langle S_{[k]},\ S_{[+]} \right\rangle^{2} \quad\text{with}\quad \boldsymbol{\alpha}^{\mathsf{T}}\boldsymbol{\alpha} = 1. \tag{66}$$

Note that, for the STATIS criterion, the similarity between two cross-product matrices is evaluated by the squared inner product of these two matrices.

DUAL-STATIS
In dual-STATIS, the data consist of K sets of observations measured on the same set of variables. Here, instead of computing K cross-product matrices between the observations, we compute K covariance matrices between the variables (one per set of observations). The dual-STATIS approach then follows the same steps as standard STATIS and provides a compromise map for the variables (instead of the observations as in STATIS) and partial loadings for each table.

AN EXAMPLE
A typical example of using STATIS is the description of a set of products by a group of expert assessors. This type of data could be analyzed using a standard PCA, but this approach obviously neglects the inter-assessor differences. STATIS has the advantage of providing a compromise space for the products as well as evaluating differences between the assessors. We illustrate STATIS with a fictitious example from wine tasting.

For our example, we selected twelve wines made from Sauvignon Blanc grapes coming from three wine regions (four wines from each region): New Zealand, France, and Canada. We then asked 10 expert assessors to evaluate these wines. The assessors were asked (1) to evaluate the wines on 9-point rating scales, using four variables considered as standard for the evaluation of these wines (cat-pee, passion-fruit, green pepper, and mineral) and, (2) if they felt the need, to add some variables of their own (some assessors chose none, some chose one or two more variables). The raw data are presented in Table 1. The goals of the analysis were twofold: (1) to obtain a typology of the wines, and (2) to discover the agreement (if any) between the assessors.

For the example, the data consist of K = 10 tables (one for each assessor) shown in Table 1. For example, the first table, denoted Y[1], is equal to

$$
Y_{[1]} =
\begin{bmatrix}
8 & 6 & 7 & 4 & 1 & 6 \\
7 & 5 & 8 & 1 & 2 & 8 \\
6 & 5 & 6 & 5 & 3 & 4 \\
9 & 6 & 8 & 4 & 3 & 5 \\
2 & 2 & 2 & 8 & 7 & 3 \\
3 & 4 & 4 & 9 & 6 & 1 \\
5 & 3 & 5 & 4 & 8 & 3 \\
5 & 2 & 4 & 8 & 7 & 4 \\
8 & 6 & 8 & 4 & 4 & 7 \\
4 & 6 & 2 & 5 & 3 & 4 \\
8 & 4 & 8 & 1 & 3 & 3 \\
5 & 3 & 6 & 4 & 4 & 2
\end{bmatrix}. \tag{67}
$$


TABLE 1 | Raw Data (Tables Y[1] through Y[10]). [Table body not reproduced: for each of the 10 assessors it lists the 9-point ratings of the 12 wines (NZ 1–4, FR 1–4, CA 1–4) on that assessor's variables. Assessor 1 rated V1–V6; Assessor 2: V1–V4, V7, V8; Assessor 3: V1–V4, V9, V10; Assessor 4: V1–V4, V8; Assessor 5: V1–V4, V11, V12; Assessor 6: V1–V4, V13; Assessor 7: V1–V4; Assessor 8: V1–V4, V14, V5; Assessor 9: V1–V4, V15; Assessor 10: V1–V4, for a total of 53 variables. Assessor 1's ratings are given in Eq. (67).]
V1, Cat Pee; V2, Passion Fruit; V3, Green Pepper; V4, Mineral; V5, Smoky; V6, Citrus; V7, Tropical; V8, Leafy; V9, Grassy; V10, Flinty; V11, Vegetal; V12, Hay; V13, Melon; V14, Grass; V15, Peach.


Each table was then preprocessed by first centering and normalizing each column and then normalizing the whole table such that the sum of the squared values of all of its elements was equal to one. For example, X[1], the preprocessed matrix for Assessor 1, is equal to:

$$
X_{[1]} =
\begin{bmatrix}
0.12 & 0.13 & 0.07 & -0.04 & -0.18 & 0.11 \\
0.07 & 0.05 & 0.13 & -0.18 & -0.12 & 0.23 \\
0.01 & 0.05 & 0.02 & 0.01 & -0.07 & -0.01 \\
0.18 & 0.13 & 0.13 & -0.04 & -0.07 & 0.05 \\
-0.21 & -0.18 & -0.20 & 0.16 & 0.15 & -0.07 \\
-0.16 & -0.03 & -0.09 & 0.21 & 0.10 & -0.19 \\
-0.05 & -0.11 & -0.04 & -0.04 & 0.21 & -0.07 \\
-0.05 & -0.18 & -0.09 & 0.16 & 0.15 & -0.01 \\
0.12 & 0.13 & 0.13 & -0.04 & -0.01 & 0.17 \\
-0.10 & 0.13 & -0.20 & 0.01 & -0.07 & -0.01 \\
0.12 & -0.03 & 0.13 & -0.18 & -0.07 & -0.07 \\
-0.05 & -0.11 & 0.02 & -0.04 & -0.01 & -0.13
\end{bmatrix}. \tag{68}
$$
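This two-step normalization can be sketched as follows (our own helper, not code from the paper); applied to Y[1] of Eq. (67), it reproduces X[1] of Eq. (68) up to rounding.

```python
import numpy as np

def preprocess(Y):
    """Center and normalize each column, then give the table a Frobenius norm of 1."""
    X = Y - Y.mean(axis=0)              # center each column
    X = X / np.linalg.norm(X, axis=0)   # normalize each column to norm 1
    return X / np.linalg.norm(X)        # sum of squared elements equals 1
```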

Analyzing the Between-Table Structure
To each X[k] matrix, we associated its I × I cross-product matrix, denoted S[k]. For example, the S[k] matrix for Assessor 1 is obtained as (cf. Eq. (3)):

$$
S_{[1]} = X_{[1]}X_{[1]}^{\mathsf{T}} =
\begin{bmatrix}
0.08 & 0.08 & 0.02 & 0.07 & -0.11 & -0.08 & -0.07 & -0.07 & 0.06 & 0.00 & 0.03 & -0.03 \\
0.08 & 0.13 & 0.01 & 0.06 & -0.11 & -0.12 & -0.05 & -0.07 & 0.08 & -0.02 & 0.05 & -0.03 \\
0.02 & 0.01 & 0.01 & 0.01 & -0.02 & -0.01 & -0.02 & -0.02 & 0.01 & 0.01 & 0.01 & -0.00 \\
0.07 & 0.06 & 0.01 & 0.07 & -0.11 & -0.07 & -0.04 & -0.06 & 0.07 & -0.02 & 0.04 & -0.02 \\
-0.11 & -0.11 & -0.02 & -0.11 & 0.17 & 0.12 & 0.07 & 0.11 & -0.10 & 0.03 & -0.08 & 0.03 \\
-0.08 & -0.12 & -0.01 & -0.07 & 0.12 & 0.12 & 0.04 & 0.07 & -0.08 & 0.03 & -0.06 & 0.02 \\
-0.07 & -0.05 & -0.02 & -0.04 & 0.07 & 0.04 & 0.06 & 0.05 & -0.04 & -0.02 & -0.01 & 0.02 \\
-0.07 & -0.07 & -0.02 & -0.06 & 0.11 & 0.07 & 0.05 & 0.09 & -0.05 & -0.01 & -0.05 & 0.01 \\
0.06 & 0.08 & 0.01 & 0.07 & -0.10 & -0.08 & -0.04 & -0.05 & 0.08 & -0.02 & 0.02 & -0.04 \\
0.00 & -0.02 & 0.01 & -0.02 & 0.03 & 0.03 & -0.02 & -0.01 & -0.02 & 0.07 & -0.04 & -0.01 \\
0.03 & 0.05 & 0.01 & 0.04 & -0.08 & -0.06 & -0.01 & -0.05 & 0.02 & -0.04 & 0.07 & 0.02 \\
-0.03 & -0.03 & -0.00 & -0.02 & 0.03 & 0.02 & 0.02 & 0.01 & -0.04 & -0.01 & 0.02 & 0.03
\end{bmatrix}. \tag{69}
$$

Next, the inner products between all pairs of S[k] tables (i.e., the coefficients c_{k,k′}) are computed and stored in matrix C. Using Eqs (8), (10), and (11), we get the following C matrix:

$$
C =
\begin{bmatrix}
0.51 & 0.44 & 0.40 & 0.38 & 0.34 & 0.40 & 0.41 & 0.35 & 0.48 & 0.52 \\
0.44 & 0.52 & 0.36 & 0.41 & 0.30 & 0.42 & 0.40 & 0.39 & 0.43 & 0.54 \\
0.40 & 0.36 & 0.42 & 0.39 & 0.33 & 0.35 & 0.33 & 0.31 & 0.42 & 0.46 \\
0.38 & 0.41 & 0.39 & 0.56 & 0.37 & 0.40 & 0.37 & 0.39 & 0.46 & 0.51 \\
0.34 & 0.30 & 0.33 & 0.37 & 0.35 & 0.30 & 0.27 & 0.29 & 0.37 & 0.39 \\
0.40 & 0.42 & 0.35 & 0.40 & 0.30 & 0.48 & 0.39 & 0.34 & 0.45 & 0.51 \\
0.41 & 0.40 & 0.33 & 0.37 & 0.27 & 0.39 & 0.46 & 0.32 & 0.41 & 0.46 \\
0.35 & 0.39 & 0.31 & 0.39 & 0.29 & 0.34 & 0.32 & 0.43 & 0.38 & 0.44 \\
0.48 & 0.43 & 0.42 & 0.46 & 0.37 & 0.45 & 0.41 & 0.38 & 0.60 & 0.54 \\
0.52 & 0.54 & 0.46 & 0.51 & 0.39 & 0.51 & 0.46 & 0.44 & 0.54 & 0.68
\end{bmatrix}. \tag{70}
$$
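A sketch of this step, assuming the list of preprocessed tables from the previous sketch; the inner product ⟨S[k], S[k′]⟩ is computed as a trace, as in Eq. (8).

```python
import numpy as np

def inner_product_matrix(X_tables):
    """Cross-product matrices S[k] (Eq. (3)) and their inner products C (Eqs (8)-(11))."""
    S = [X @ X.T for X in X_tables]     # I x I cross-products
    K = len(S)
    C = np.array([[np.trace(S[k] @ S[l]) for l in range(K)] for k in range(K)])
    return S, C
```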

The eigendecomposition of C is then obtained as (cf. Eq. (12)):

$$
C = U\Lambda U^{\mathsf{T}} =
\begin{bmatrix}
0.325 & -0.212 \\
0.326 & -0.359 \\
0.289 & 0.212 \\
0.326 & 0.561 \\
0.253 & 0.443 \\
0.311 & -0.237 \\
0.294 & -0.391 \\
0.279 & 0.124 \\
0.349 & 0.137 \\
0.389 & -0.165
\end{bmatrix}
\times \operatorname{diag}\{4.135,\ 0.224\} \times
\begin{bmatrix}
0.325 & -0.212 \\
0.326 & -0.359 \\
0.289 & 0.212 \\
0.326 & 0.561 \\
0.253 & 0.443 \\
0.311 & -0.237 \\
0.294 & -0.391 \\
0.279 & 0.124 \\
0.349 & 0.137 \\
0.389 & -0.165
\end{bmatrix}^{\mathsf{T}}
\tag{71}
$$

for the first two components.

From the eigendecomposition of C, we obtain the factor scores (for the first two components) of the tables for the inner product matrix as (cf. Eq. (13)):

$$
G = U\Lambda^{\frac{1}{2}} =
\begin{bmatrix}
0.662 & -0.100 \\
0.662 & -0.170 \\
0.588 & 0.100 \\
0.662 & 0.265 \\
0.515 & 0.209 \\
0.633 & -0.112 \\
0.598 & -0.185 \\
0.567 & 0.059 \\
0.710 & 0.065 \\
0.791 & -0.078
\end{bmatrix}.
$$

The projections of the assessors onto the first and second components are shown in Figure 2 (note that the elements of the first column of U, denoted u1, are all positive). From this visual display, we can see that the 10 tables of the assessors should all receive similar weights in the compromise.

FIGURE 2 | Inner product map. Projection of the assessors onto the first and second components. Note that the first dimension is positive because all the inner products are positive.

Permutation Test: Significance of the First Two Eigenvalues
When using STATIS, two questions are important: (1) Is the first dimension of the C matrix significant? In practice, it seems to always be the case; however, it is useful to confirm that it is indeed so for a specific analysis. And (2) is the second dimension larger than chance? This last question is particularly relevant for the STATIS model, because a positive answer suggests that there are clusters of tables and that separate STATIS analyses (i.e., one per cluster) would be more appropriate85–88 than an overall analysis.

Because the data at hand do not seem to follow the assumptions of the analytical approach, we decided to evaluate the question of the significance of the eigenvalues with a permutation test. To do so, we started by permuting the entries within each column of the matrices X[k]. For example, the first permuted matrix, denoted X*[1], would be:

$$
X^{*}_{[1]} =
\begin{bmatrix}
-0.16 & -0.11 & 0.13 & 0.16 & -0.07 & 0.17 \\
-0.05 & 0.05 & -0.09 & 0.21 & 0.15 & -0.19 \\
-0.05 & 0.13 & -0.20 & -0.04 & -0.01 & -0.01 \\
0.12 & -0.11 & -0.20 & 0.01 & -0.07 & 0.23 \\
0.07 & 0.05 & 0.07 & -0.04 & 0.15 & -0.07 \\
0.12 & 0.13 & 0.02 & -0.04 & -0.07 & -0.07 \\
-0.05 & 0.13 & -0.04 & 0.01 & 0.21 & -0.13 \\
-0.10 & -0.18 & 0.13 & -0.04 & -0.12 & -0.07 \\
0.12 & 0.13 & 0.02 & -0.18 & -0.07 & -0.01 \\
0.18 & -0.03 & 0.13 & -0.04 & -0.01 & 0.05 \\
0.01 & -0.18 & 0.13 & 0.16 & -0.18 & -0.01 \\
-0.21 & -0.03 & -0.09 & -0.18 & 0.10 & 0.11
\end{bmatrix}. \tag{72}
$$

We then computed all the corresponding permuted cross-product matrices (denoted S*[k]). For example, S*[1] is equal to (see Eq. (69)):

$$
S^{*}_{[1]} = X^{*}_{[1]}X^{*\mathsf{T}}_{[1]} =
\begin{bmatrix}
0.11 & -0.02 & -0.04 & 0.01 & -0.04 & -0.04 & -0.05 & 0.04 & -0.06 & -0.01 & 0.07 & 0.01 \\
-0.02 & 0.12 & 0.02 & -0.05 & 0.02 & -0.00 & 0.07 & -0.03 & -0.05 & -0.04 & -0.01 & -0.03 \\
-0.04 & 0.02 & 0.06 & 0.02 & -0.01 & 0.01 & 0.02 & -0.04 & 0.02 & -0.04 & -0.05 & 0.03 \\
0.01 & -0.05 & 0.02 & 0.12 & -0.04 & -0.01 & -0.06 & -0.03 & -0.00 & 0.01 & 0.01 & 0.01 \\
-0.04 & 0.02 & -0.01 & -0.04 & 0.04 & 0.01 & 0.04 & -0.02 & 0.01 & 0.01 & -0.03 & -0.01 \\
-0.04 & -0.00 & 0.01 & -0.01 & 0.01 & 0.04 & 0.01 & -0.02 & 0.04 & 0.02 & -0.01 & -0.04 \\
-0.05 & 0.07 & 0.02 & -0.06 & 0.04 & 0.01 & 0.08 & -0.04 & -0.00 & -0.03 & -0.06 & 0.01 \\
0.04 & -0.03 & -0.04 & -0.03 & -0.02 & -0.02 & -0.04 & 0.08 & -0.02 & 0.00 & 0.07 & 0.00 \\
-0.06 & -0.05 & 0.02 & -0.00 & 0.01 & 0.04 & -0.00 & -0.02 & 0.07 & 0.03 & -0.04 & -0.01 \\
-0.01 & -0.04 & -0.04 & 0.01 & 0.01 & 0.02 & -0.03 & 0.00 & 0.03 & 0.05 & 0.02 & -0.04 \\
0.07 & -0.01 & -0.05 & 0.01 & -0.03 & -0.01 & -0.06 & 0.07 & -0.04 & 0.02 & 0.11 & -0.06 \\
0.01 & -0.03 & 0.03 & 0.01 & -0.01 & -0.04 & 0.01 & 0.00 & -0.01 & -0.04 & -0.06 & 0.11
\end{bmatrix}.
$$


With these permuted cross-product matrices, we computed a permuted inner product matrix, denoted C*, equal to (cf. Eq. (70)):

$$
C^{*} =
\begin{bmatrix}
0.24 & 0.09 & 0.08 & 0.10 & 0.07 & 0.12 & 0.11 & 0.10 & 0.09 & 0.05 \\
0.09 & 0.22 & 0.08 & 0.11 & 0.12 & 0.11 & 0.10 & 0.12 & 0.06 & 0.05 \\
0.08 & 0.08 & 0.22 & 0.08 & 0.12 & 0.05 & 0.08 & 0.10 & 0.07 & 0.08 \\
0.10 & 0.11 & 0.08 & 0.26 & 0.07 & 0.11 & 0.11 & 0.08 & 0.09 & 0.12 \\
0.07 & 0.12 & 0.12 & 0.07 & 0.25 & 0.05 & 0.06 & 0.12 & 0.06 & 0.08 \\
0.12 & 0.11 & 0.05 & 0.11 & 0.05 & 0.30 & 0.13 & 0.09 & 0.11 & 0.08 \\
0.11 & 0.10 & 0.08 & 0.11 & 0.06 & 0.13 & 0.32 & 0.10 & 0.10 & 0.05 \\
0.10 & 0.12 & 0.10 & 0.08 & 0.12 & 0.09 & 0.10 & 0.24 & 0.10 & 0.06 \\
0.09 & 0.06 & 0.07 & 0.09 & 0.06 & 0.11 & 0.10 & 0.10 & 0.23 & 0.11 \\
0.05 & 0.05 & 0.08 & 0.12 & 0.08 & 0.08 & 0.05 & 0.06 & 0.11 & 0.30
\end{bmatrix}. \tag{73}
$$

The eigendecomposition of C* gave the following eigenvalues:

$$
\Lambda^{*} = \operatorname{diag}\{[\,1.070\ \ 0.298\ \ 0.282\ \ 0.192\ \ 0.178\ \ 0.143\ \ 0.129\ \ 0.109\ \ 0.097\ \ 0.084\,]\}. \tag{74}
$$

We repeated this procedure 10,000 times and found that the first eigenvalue of matrix C was larger than the first eigenvalue of all 10,000 permuted C* matrices. From that pattern we can conclude that the first eigenvalue is highly significant (i.e., p < 0.0001). By contrast, all but four of the 10,000 second eigenvalues turned out to be larger than the second eigenvalue of C (i.e., p = 0.9996), and this pattern strongly suggests that there is no cluster (or subpopulation or 'segment') in the sample of assessors used for this example.
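A sketch of this permutation scheme, reusing the `inner_product_matrix` helper defined earlier; the observed eigenvalues are then compared with the columns of the returned array to obtain the p values reported above.

```python
import numpy as np

def permutation_eigenvalues(X_tables, n_perm=10000, seed=0):
    rng = np.random.default_rng(seed)
    eigs = np.empty((n_perm, len(X_tables)))
    for p in range(n_perm):
        # permute the entries within each column of each table, independently
        X_perm = [np.apply_along_axis(rng.permutation, 0, X) for X in X_tables]
        _, C_star = inner_product_matrix(X_perm)   # helper from the earlier sketch
        eigs[p] = np.sort(np.linalg.eigvalsh(C_star))[::-1]
    return eigs   # column 0 holds the first eigenvalue of each permuted C*
```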

Finding the Optimal Weights from the First Eigenvector of C
From u1 (see Eq. (71)), we find the optimum set of weights, denoted α, by rescaling u1 so that the sum of its elements is equal to 1 (cf. Eq. (15)):

$$
\boldsymbol{\alpha} = \mathbf{u}_{1} \times (\mathbf{u}_{1}^{\mathsf{T}}\mathbf{1})^{-1} =
\begin{bmatrix}
0.325 \\ 0.326 \\ 0.289 \\ 0.326 \\ 0.253 \\ 0.311 \\ 0.294 \\ 0.279 \\ 0.349 \\ 0.389
\end{bmatrix}
\times 3.141^{-1} =
\begin{bmatrix}
0.104 \\ 0.104 \\ 0.092 \\ 0.104 \\ 0.081 \\ 0.099 \\ 0.094 \\ 0.089 \\ 0.111 \\ 0.124
\end{bmatrix}. \tag{75}
$$

There are K = 10 values in α (see Eqs (75), (2), and (15)–(17)). These values are stored in the J = 53 × 1 vector a, which can itself be used to fill in the diagonal elements of the 53 × 53 diagonal matrix A:

$$
\mathbf{a} =
\begin{bmatrix}
\mathbf{1}_{[1]} \times 0.104 \\
\mathbf{1}_{[2]} \times 0.104 \\
\mathbf{1}_{[3]} \times 0.092 \\
\mathbf{1}_{[4]} \times 0.104 \\
\mathbf{1}_{[5]} \times 0.081 \\
\mathbf{1}_{[6]} \times 0.099 \\
\mathbf{1}_{[7]} \times 0.094 \\
\mathbf{1}_{[8]} \times 0.089 \\
\mathbf{1}_{[9]} \times 0.111 \\
\mathbf{1}_{[10]} \times 0.124
\end{bmatrix}
\quad\text{and}\quad
\mathbf{A} = \operatorname{diag}\{\mathbf{a}\}, \tag{76}
$$

where 1[k] is a J[k] × 1 vector of ones.
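In code, this expansion is a one-line repeat; the per-table column counts below are read off the headers of Table 1 and are our bookkeeping, not the paper's notation.

```python
import numpy as np

alpha = np.array([0.104, 0.104, 0.092, 0.104, 0.081,
                  0.099, 0.094, 0.089, 0.111, 0.124])   # Eq. (75)
J_k = [6, 6, 6, 5, 6, 5, 4, 6, 5, 4]   # variables per assessor (sums to J = 53)
a = np.repeat(alpha, J_k)              # J x 1 vector a of Eq. (76)
A = np.diag(a)                         # 53 x 53 diagonal matrix A
```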

Generalized PCA of X
The 12 × 12 diagonal mass matrix M for the observations (with equal masses set to mᵢ = 1/I ≈ 0.08) is given as

$$
\mathbf{M} = \operatorname{diag}\{\mathbf{m}\} = \operatorname{diag}\left\{ \frac{1}{12} \times \mathbf{1} \right\} = \operatorname{diag}\{[\,0.08\ \ 0.08\ \ \cdots\ \ 0.08\,]\}. \tag{77}
$$

The GSVD of X with matrices M (Eq. (77)) and A (Eq. (76)) gives X = PΔQᵀ. The eigenvalues (denoted λ) are equal to the squares of the singular values and are often used to decide informally upon the number of components to keep for further inspection. The eigenvalues and the percentage of the inertia that they explain are given in Table 2.

The pattern of the eigenvalue distribution suggests keeping two or three dimensions for further examination, and we decided (somewhat arbitrarily) to keep only the first two dimensions for this example. So, in the remainder of this example we show only these first two dimensions.
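The GSVD itself is not in numpy, but it reduces to a plain SVD of the doubly weighted matrix; the sketch below uses this standard reduction (our choice of implementation, not the paper's).

```python
import numpy as np

def gsvd(X, m, a):
    """GSVD of X under metrics M = diag(m) and A = diag(a), via a plain SVD."""
    Xw = np.sqrt(m)[:, None] * X * np.sqrt(a)[None, :]   # M^(1/2) X A^(1/2)
    U, s, Vt = np.linalg.svd(Xw, full_matrices=False)
    P = U / np.sqrt(m)[:, None]      # generalized left singular vectors  (P'MP = I)
    Q = Vt.T / np.sqrt(a)[:, None]   # generalized right singular vectors (Q'AQ = I)
    return P, s, Q, P * s            # last term: factor scores F = P diag(s)
```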

The matrix Q of the right singular vectors (loadings for the variables) is given in Table 3; the matrix P of the left singular vectors and the matrix Δ of the singular values are given below:

$$
\mathbf{P} =
\begin{bmatrix}
-1.098 & 0.460 \\
-0.921 & 0.185 \\
-0.876 & -1.297 \\
-1.275 & -0.507 \\
1.557 & -0.405 \\
1.415 & -0.351 \\
0.958 & 0.663 \\
1.068 & 0.968 \\
-0.781 & 0.810 \\
0.083 & -2.212 \\
-0.535 & 1.612 \\
0.406 & 0.073
\end{bmatrix},
\qquad
\boldsymbol{\Delta} = \operatorname{diag}\{[\,0.229\ \ 0.089\,]\}. \tag{78}
$$

Factor Scores
The factor scores for X give the best two-dimensional representation of the compromise of the K tables. Using Eqs (23), (25), and (78), we obtain:

$$
\mathbf{F} = \mathbf{P}\boldsymbol{\Delta} = \mathbf{X}\mathbf{A}\mathbf{Q} =
\begin{bmatrix}
-0.252 & 0.041 \\
-0.211 & 0.016 \\
-0.201 & -0.115 \\
-0.292 & -0.045 \\
0.357 & -0.036 \\
0.324 & -0.031 \\
0.220 & 0.059 \\
0.245 & 0.086 \\
-0.179 & 0.072 \\
0.019 & -0.196 \\
-0.123 & 0.143 \\
0.093 & 0.006
\end{bmatrix}. \tag{79}
$$

In the F matrix, each row represents a wine and each column a component. Figure 3a shows the wines in the space created by the first two components. The first component (with an eigenvalue equal to λ1 = 0.229² = 0.053) explains 63% of the inertia and contrasts the French and New Zealand wines. The second component (with an eigenvalue of λ2 = 0.089² = 0.008) explains 9% of the inertia and is more delicate to interpret from the wines alone (its interpretation will become clearer after looking at the loadings).

Partial Factor Scores
The partial factor scores (which are the projections of a table onto the compromise) are computed from Eq. (27). For example, for the first assessor, the partial factor scores of the 12 wines are obtained as:

$$
\mathbf{F}_{[1]} = \mathbf{X}_{[1]}\mathbf{Q}_{[1]} =
\begin{bmatrix}
-0.275 & 0.044 \\
-0.312 & 0.173 \\
-0.057 & -0.028 \\
-0.250 & 0.125 \\
0.409 & -0.201 \\
0.311 & -0.217 \\
0.185 & 0.048 \\
0.266 & -0.033 \\
-0.244 & 0.126 \\
0.050 & -0.266 \\
-0.170 & 0.203 \\
0.086 & 0.026
\end{bmatrix}. \tag{80}
$$

TABLE 2 | Eigenvalues and Percentage of Explained Inertia

Component         1      2      3      4      5      6      7      8      9     10     11     12
Eigenvalue (λ)  0.053  0.008  0.006  0.004  0.004  0.002  0.002  0.002  0.001  0.001  0.001  0
  cumulative    0.053  0.060  0.066  0.071  0.074  0.077  0.079  0.081  0.082  0.083  0.083  0.083
% Inertia (τ)     63      9      7      5      5      3      2      2      1      1      1    0
  cumulative      63     72     79     84     89     92     94     96     97     98    100   100


TABLE 3 | α Weights, Loadings, and Squared Loadings. [Table body not reproduced cleanly: for each assessor's variables, the table lists the α weight of that assessor's table (the values of Eq. (75), repeated for each variable of the table), the loadings (Q) on Dimensions 1 and 2, and the corresponding squared loadings.]
V1, Cat Pee; V2, Passion Fruit; V3, Green Pepper; V4, Mineral; V5, Smoky; V6, Citrus; V7, Tropical; V8, Leafy; V9, Grassy; V10, Flinty; V11, Vegetal; V12, Hay; V13, Melon; V14, Cut Grass; V15, Peach.


FIGURE 3 | Compromise of the 10 tables. (a) Factor scores (wines). (b) Assessors' partial factor scores projected into the compromise as supplementary elements. Each assessor is represented by a dot, and for each wine a line connects the wine factor scores to the partial factor scores of a given assessor for this wine. (λ1 = 0.053, τ1 = 63%; λ2 = 0.008, τ2 = 9%.)

FIGURE 4 | Partial factor scores and variable loadings for the first two dimensions of the compromise space (panels a through j, one per assessor). The loadings have been re-scaled to have a variance equal to the singular values of the compromise analysis.

These partial factor scores are displayed in Figure 4a. From Figure 4, we can see that, for all the assessors, Component 1 separates the New Zealand from the French Sauvignon Blancs, a pattern replicating the one seen in the compromise (Figure 3a). However, the assessors show large inter-individual differences in how they rated the Canadian wines.

The original variables are analyzed, as in standard PCA, by computing loadings, which are given in Table 3 for the first two dimensions. The loadings are also plotted in a biplot fashion in Figure 4. Here we show the partial factor scores (of the wines) along with the loadings for each assessor (which we have re-scaled so that their variance, for each dimension, is equal to the singular value of the compromise). From these plots, we can see that, for all the assessors, the New Zealand wines are rated as having a more cat-pee aroma, with some green pepper and passion fruit, while the French wines are rated as being more mineral, smoky, or hay-like.

Determining the Importance of the Tables in the Compromise
There are two ways to determine which tables play the largest role in the compromise: contributions and partial inertias. The contribution of a table reflects the proportion of the variance of a dimension that can be attributed to this table (see Eq. (33)). The larger the contribution of a table to a component, the more important this table is for this component. The contributions for the tables are:

$$
\mathrm{ctr}_{k,\ell} =
\begin{bmatrix}
0.106 & 0.117 \\
0.106 & 0.079 \\
0.082 & 0.122 \\
0.107 & 0.057 \\
0.062 & 0.043 \\
0.097 & 0.100 \\
0.084 & 0.177 \\
0.076 & 0.121 \\
0.124 & 0.079 \\
0.155 & 0.105
\end{bmatrix}. \tag{81}
$$

The contributions can also be plotted to obtain a visual representation of the importance of the studies. Figure 5 shows the relative contributions of each of the tables to Components 1 and 2. From this figure, we can see that Assessor 10 contributes the most to the first component of the compromise, while Assessor 7 contributes most to the second component. Assessor 5, by contrast, contributes the least to both Components 1 and 2.

FIGURE 5 | Contributions of the tables to the compromise. The sizes of the assessors' icons are proportional to their contributions to Components 1 and 2.

In addition to using contributions, we can determine a table's importance with the partial inertia, which gives the proportion of the compromise variance (i.e., inertia) explained by the table. This is obtained by multiplying, for each dimension, the table contribution by the eigenvalue of the dimension. For our example, the partial inertias, denoted I_partial, are:

$$
\mathbf{I}_{\mathrm{partial}} =
\begin{bmatrix}
0.0056 & 0.0009 \\
0.0056 & 0.0006 \\
0.0043 & 0.0010 \\
0.0056 & 0.0004 \\
0.0033 & 0.0003 \\
0.0051 & 0.0008 \\
0.0044 & 0.0014 \\
0.0040 & 0.0010 \\
0.0065 & 0.0006 \\
0.0082 & 0.0008
\end{bmatrix}. \tag{82}
$$

Like the contributions, the partial inertias can be plotted to get a visual representation. From Figure 6, we see that Assessor 10 accounts for the most inertia on the first dimension, while Assessor 5 accounts for the lowest proportion of the total variance.

FIGURE 6 | Partial inertias of the studies. The sizes of the assessors' icons are proportional to their explained inertia for Components 1 and 2.
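A sketch of both importance measures; our assumption, consistent with the fact that the columns of Eq. (81) sum to 1, is that the contribution of a table is the α-weighted sum of the squared loadings of its variables (cf. Eq. (33)).

```python
import numpy as np

def table_importance(Q, a, J_k, eigenvalues):
    """Q: J x d loadings; a: J weights; J_k: columns per table; eigenvalues: d values."""
    # split the J rows of Q into one block of rows per table
    blocks = np.split(np.arange(len(a)), np.cumsum(J_k)[:-1])
    ctr = np.array([(a[b, None] * Q[b, :] ** 2).sum(axis=0) for b in blocks])
    partial_inertia = ctr * np.asarray(eigenvalues)[None, :]  # contribution x eigenvalue
    return ctr, partial_inertia                               # each is K x d
```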

Supplementary Table
Because we would like to know what qualities of the wines are associated with the assessors' ratings, we wanted to include some chemical components of the wines as variables, namely, titratable acidity, pH, alcohol, and residual sugar. The values for these variables are shown in Table 4. However, because these properties are qualitatively different from the assessors' ratings (i.e., they did not come from the same population), we did not want to include them as active elements in the analysis and therefore projected them as a supplementary table.

To obtain the factor scores, we first computethe supplementary loadings, which are obtained as



TABLE 4 | Supplementary Table: Chemical Properties of the Wines

        Titratable Acidity    pH    Alcohol    Residual Sugar
NZ1            5.60          3.38    14.00          3.00
NZ2            5.30          3.53    13.50          3.60
NZ3            6.20          3.27    14.00          3.00
NZ4            8.50          3.19    13.50          3.90
F1             5.00          3.60    12.50          1.50
F2             5.88          3.00    12.50          2.00
F3             4.50          3.33    13.00          0.80
F4             5.60          3.40    12.00          2.10
CA1            7.60          3.30    13.00          2.80
CA2            5.70          3.43    13.50          2.10
CA3            6.20          3.30    12.50          2.50
CA4            6.90          2.20    13.00          2.00

To obtain the factor scores, we first compute the supplementary loadings (cf. Eq. (50)):

$$
\mathbf{Q}_{\sup} = \mathbf{X}_{\sup}^{\mathsf{T}}\mathbf{M}\mathbf{P}\boldsymbol{\Delta}^{-1} =
\begin{bmatrix}
-0.337 & 0.054 \\
-0.059 & 0.036 \\
-0.460 & 0.782 \\
-0.537 & 0.124
\end{bmatrix}. \tag{83}
$$

From these loadings, we can see that, on Component 1, the New Zealand Sauvignon Blancs are more acidic, have a greater alcohol content, and have more residual sugar than the French wines.

Next, we compute the supplementary factor scores for the first two components as (cf. Eq. (51)):

$$
\mathbf{F}_{\sup} = \mathbf{X}_{\sup}\mathbf{Q}_{\sup} = \mathbf{X}_{\sup}\mathbf{X}_{\sup}^{\mathsf{T}}\mathbf{M}\mathbf{P}\boldsymbol{\Delta}^{-1} =
\begin{bmatrix}
-0.133 & -0.181 \\
-0.124 & -0.101 \\
-0.158 & -0.184 \\
-0.289 & -0.125 \\
0.191 & 0.131 \\
0.120 & 0.123 \\
0.231 & 0.061 \\
0.168 & 0.210 \\
-0.094 & -0.004 \\
-0.001 & -0.070 \\
0.052 & 0.106 \\
0.038 & 0.035
\end{bmatrix}. \tag{84}
$$

In a manner similar to the biplot approach we used for the assessors (see Figure 4), we plotted together, in a biplot way, the supplementary partial factor scores and the loadings for the chemical properties of the wines (see Figure 7). This figure confirms the interpretation that we reached from the numerical values.
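A sketch of this projection, assuming `X_sup` has been preprocessed like the active tables and that `m`, `P`, and `s` come from the GSVD sketch above.

```python
import numpy as np

def project_supplementary(X_sup, m, P, s):
    """Supplementary loadings (Eq. (83)) and factor scores (Eq. (84))."""
    Q_sup = X_sup.T @ np.diag(m) @ P @ np.diag(1.0 / s)   # X_sup' M P Delta^(-1)
    F_sup = X_sup @ Q_sup                                 # project the rows
    return Q_sup, F_sup
```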

Inner Product Map Projection
To evaluate the similarity between the supplementary and the active tables, we first compute the cross-product matrix associated with Xsup (cf. Eq. (52)):

$$
\mathbf{S}_{\sup} = \mathbf{X}_{\sup}\mathbf{X}_{\sup}^{\mathsf{T}} =
\begin{bmatrix}
0.06 & 0.05 & 0.06 & 0.02 & -0.03 & -0.04 & -0.02 & -0.05 & -0.01 & 0.02 & -0.03 & -0.04 \\
0.05 & 0.07 & 0.04 & 0.02 & -0.01 & -0.04 & -0.03 & -0.02 & -0.01 & 0.01 & -0.01 & -0.08 \\
0.06 & 0.04 & 0.06 & 0.05 & -0.05 & -0.04 & -0.03 & -0.06 & 0.01 & 0.02 & -0.03 & -0.01 \\
0.02 & 0.02 & 0.05 & 0.18 & -0.11 & -0.04 & -0.14 & -0.06 & 0.08 & -0.02 & -0.01 & 0.02 \\
-0.03 & -0.01 & -0.05 & -0.11 & 0.09 & 0.02 & 0.08 & 0.06 & -0.03 & 0.01 & 0.02 & -0.06 \\
-0.04 & -0.04 & -0.04 & -0.04 & 0.02 & 0.04 & 0.03 & 0.04 & -0.01 & -0.02 & 0.02 & 0.05 \\
-0.02 & -0.03 & -0.03 & -0.14 & 0.08 & 0.03 & 0.13 & 0.04 & -0.06 & 0.03 & -0.00 & -0.02 \\
-0.05 & -0.02 & -0.06 & -0.06 & 0.06 & 0.04 & 0.04 & 0.08 & -0.01 & -0.01 & 0.04 & -0.03 \\
-0.01 & -0.01 & 0.01 & 0.08 & -0.03 & -0.01 & -0.06 & -0.01 & 0.05 & -0.01 & 0.01 & 0.01 \\
0.02 & 0.01 & 0.02 & -0.02 & 0.01 & -0.02 & 0.03 & -0.01 & -0.01 & 0.02 & -0.01 & -0.04 \\
-0.03 & -0.01 & -0.03 & -0.01 & 0.02 & 0.02 & -0.00 & 0.04 & 0.01 & -0.01 & 0.02 & -0.01 \\
-0.04 & -0.08 & -0.01 & 0.02 & -0.06 & 0.05 & -0.02 & -0.03 & 0.01 & -0.04 & -0.01 & 0.21
\end{bmatrix}. \tag{85}
$$

Then, we compute the K inner products between Ssup and each of the S[k] matrices (cf. Eq. (8)):

$$
c_{\sup,k} = \left\langle \mathbf{S}_{\sup},\ \mathbf{S}_{[k]} \right\rangle = \operatorname{trace}\left\{ \mathbf{S}_{\sup}\mathbf{S}_{[k]} \right\}, \tag{86}
$$


FIGURE 7 | Supplementary table: chemical components of the wines. Supplementary partial factor scores and loadings (cf. Figure 3a).

which gives:

$$
\mathbf{c}_{\sup} = [\,0.247\ \ 0.269\ \ 0.233\ \ 0.281\ \ 0.205\ \ 0.297\ \ 0.254\ \ 0.230\ \ 0.317\ \ 0.326\,]. \tag{87}
$$

The factor scores of the supplementary table in the inner product map are obtained as (cf. Eqs (54) and (55)):

$$
\mathbf{g}_{\sup} = \mathbf{c}_{\sup}\mathbf{U}\boldsymbol{\Lambda}^{-\frac{1}{2}} = [\,0.417\ \ -0.006\ \ -0.008\ \ 0.136\ \ -0.038\ \ -0.031\ \ -0.051\ \ 0.013\ \ -0.038\ \ 0.058\,]. \tag{88}
$$

When gsup is projected onto the inner product map (Figure 8), we can see that the chemical properties of the wines are most similar to the ratings of Assessors 1, 8, 9, and 10. This suggests that, for this sample of Sauvignon Blanc wines, these four assessors may be quite sensitive to the chemical properties of the wines.

FIGURE 8 | Projection of the supplementary table into the inner product map.

To get the α weight of the supplementary table, we compute αsup (from Eqs (56), (71), and (75)) as:

$$
\alpha_{\sup} = \gamma_{1}^{-\frac{1}{2}} \times g_{\sup,1} \times (\mathbf{u}_{1}^{\mathsf{T}}\mathbf{1})^{-1} = 4.135^{-\frac{1}{2}} \times 0.417 \times 3.141^{-1} = \frac{0.417}{2.034 \times 3.141} = 0.065. \tag{89}
$$

Inferential Steps

Jackknife
To estimate the reliability of the α weights, we use the jackknife procedure as described in Eqs (57)–(60). This gives the following estimates of the elements of the vector α:

$$
\bar{\boldsymbol{\alpha}}^{*} = \frac{1}{I}\sum_{i}^{I}\boldsymbol{\alpha}^{*}_{i} = I\boldsymbol{\alpha} - \frac{I-1}{I}\sum_{i}^{I}\boldsymbol{\alpha}_{-i} \tag{90}
$$

$$
= [\,0.105\ \ 0.105\ \ 0.089\ \ 0.103\ \ 0.076\ \ 0.100\ \ 0.092\ \ 0.087\ \ 0.113\ \ 0.129\,].
$$


Then, the standard error of α* (cf. Eq. (60)) is obtained as:

$$
\hat{\boldsymbol{\sigma}}_{\alpha^{*}} = [\,0.016\ \ 0.014\ \ 0.016\ \ 0.023\ \ 0.017\ \ 0.019\ \ 0.016\ \ 0.018\ \ 0.016\ \ 0.010\,]. \tag{91}
$$

Finally, using a critical value of 2 (for an approximate value of p < .05), we find, according to the jackknife estimator, the following 95% confidence intervals for the α weights of each table:

$$
\begin{bmatrix}
0.137 & 0.134 & 0.121 & 0.150 & 0.109 & 0.138 & 0.124 & 0.122 & 0.146 & 0.149 \\
0.073 & 0.077 & 0.057 & 0.057 & 0.043 & 0.061 & 0.059 & 0.052 & 0.081 & 0.109
\end{bmatrix}. \tag{92}
$$

With one (slight) exception, the value of 0.100 is contained within all confidence intervals. This value corresponds to an equal weighting scheme of αk = 1/K. This suggests that the STATIS solution differs little from a simple multitable approach. Also, only the 10th assessor can be considered as having a 'commonality' higher than the group average.
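A sketch of this leave-one-wine-out procedure; the pseudo-value formula follows the standard jackknife and our reconstruction of Eqs (57)–(60), which are not reproduced in this section. (For full fidelity, each reduced data set should also be re-preprocessed; we skip that step here for brevity.)

```python
import numpy as np

def jackknife_alpha(X_tables, alpha_full):
    """Jackknife estimate and standard error of the alpha weights."""
    I = X_tables[0].shape[0]
    pseudo = []
    for i in range(I):
        X_minus = [np.delete(X, i, axis=0) for X in X_tables]   # drop observation i
        _, C_minus = inner_product_matrix(X_minus)              # earlier sketch
        _, evecs = np.linalg.eigh(C_minus)
        u1 = np.abs(evecs[:, -1])
        alpha_minus = u1 / u1.sum()
        pseudo.append(I * alpha_full - (I - 1) * alpha_minus)   # pseudo-values
    pseudo = np.stack(pseudo)
    return pseudo.mean(axis=0), pseudo.std(axis=0, ddof=1) / np.sqrt(I)
```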

Bootstrap
To estimate the stability of the compromise factor scores, we used a bootstrap approach. We generated 1000 bootstrap samples that gave 1000 estimated bootstrapped factor scores. For example, for the first bootstrapped sample, we sampled with replacement in the set of the integers from 1 to 10 and obtained the following bootstrap set

$$
\mathbb{B} = \{\,6\ \ 4\ \ 4\ \ 2\ \ 9\ \ 3\ \ 1\ \ 1\ \ 2\ \ 8\,\}. \tag{93}
$$

We then computed the C*₁ matrix using the following inner product matrices:

$$
\left\{\, S_{[6]}\ \ S_{[4]}\ \ S_{[4]}\ \ S_{[2]}\ \ S_{[9]}\ \ S_{[3]}\ \ S_{[1]}\ \ S_{[1]}\ \ S_{[2]}\ \ S_{[8]} \,\right\}. \tag{94}
$$

From the eigendecomposition of C*₁, we obtained the following bootstrapped values for the α weights:

$$
\boldsymbol{\alpha}^{*}_{1} = [\,0.096\ \ 0.103\ \ 0.103\ \ 0.103\ \ 0.109\ \ 0.090\ \ 0.102\ \ 0.102\ \ 0.103\ \ 0.088\,]. \tag{95}
$$

With these weights, we then computed the bootstrapped estimate of the factor scores as

$$
\mathbf{F}^{*}_{1} = \sum_{k\in\mathbb{B}} \alpha^{*}_{1,k}\,\mathbf{F}_{[k]} = 0.096\,\mathbf{F}_{[6]} + 0.103\,\mathbf{F}_{[4]} + 0.103\,\mathbf{F}_{[4]} + 0.103\,\mathbf{F}_{[2]} + 0.109\,\mathbf{F}_{[9]} + 0.090\,\mathbf{F}_{[3]} + 0.102\,\mathbf{F}_{[1]} + 0.102\,\mathbf{F}_{[1]} + 0.103\,\mathbf{F}_{[2]} + 0.088\,\mathbf{F}_{[8]}
$$

$$
= \begin{bmatrix}
-0.262 & -0.020 \\
-0.219 & -0.022 \\
-0.203 & 0.108 \\
-0.289 & 0.005 \\
0.352 & 0.045 \\
0.328 & 0.050 \\
0.228 & -0.035 \\
0.246 & -0.091 \\
-0.163 & -0.055 \\
0.027 & 0.185 \\
-0.137 & -0.133 \\
0.092 & -0.038
\end{bmatrix}. \tag{96}
$$

From the bootstrap estimates, we can also compute bootstrap ratios, which, like t statistics, can be used to find the observations that reliably contribute to a given component. To get the bootstrap ratios, we first computed the mean of the bootstrap samples, which is equal to (see Eq. (62))

$$
\bar{\mathbf{F}}^{*} = \frac{1}{L}\sum_{\ell}^{L}\mathbf{F}^{*}_{\ell} =
\begin{bmatrix}
-0.252 & -0.041 \\
-0.211 & -0.016 \\
-0.201 & 0.116 \\
-0.292 & 0.045 \\
0.356 & 0.036 \\
0.325 & 0.031 \\
0.219 & -0.060 \\
0.245 & -0.086 \\
-0.179 & -0.073 \\
0.019 & 0.196 \\
-0.122 & -0.143 \\
0.094 & -0.006
\end{bmatrix}, \tag{97}
$$

and the standard deviations as (see Eq. (63)):

$$
\boldsymbol{\sigma}_{F^{*}} =
\begin{bmatrix}
0.010 & 0.029 \\
0.023 & 0.028 \\
0.030 & 0.022 \\
0.023 & 0.037 \\
0.022 & 0.032 \\
0.022 & 0.037 \\
0.026 & 0.031 \\
0.022 & 0.028 \\
0.030 & 0.036 \\
0.021 & 0.030 \\
0.021 & 0.018 \\
0.021 & 0.033
\end{bmatrix}. \tag{98}
$$

Then, the bootstrap ratios are computed by dividing the bootstrap mean by its standard deviation. This gives the following bootstrap ratios:

$$
\mathrm{Bootratio} =
\begin{bmatrix}
-24.204 & -1.385 \\
-9.258 & -0.561 \\
-6.820 & 5.347 \\
-12.721 & 1.208 \\
16.041 & 1.121 \\
15.079 & 0.833 \\
8.328 & -1.951 \\
11.155 & -3.084 \\
-6.035 & -2.004 \\
0.898 & 6.478 \\
-5.955 & -7.769 \\
4.510 & -0.172
\end{bmatrix}. \tag{99}
$$

These bootstrap ratios can be plotted as a bar chart to give a visual representation of which observations most reliably contribute to a component (see Figure 9). To be conservative and to take into account the multiple comparisons problem, we chose a bootstrap ratio critical value of ±3. This value corresponds roughly to a Bonferroni-corrected p value for J = 53 comparisons (i.e., the corrected p is equal to 0.05/53 ≈ 0.001, which approximately corresponds to a t value of 3).

FIGURE 9 | Bootstrap ratio plot for Components 1 and 2.

From the bootstrap ratios shown in Figure 9 (see also Figure 10), we can see that all of the wines (except the second Canadian Sauvignon Blanc) contribute reliably to the first component, with the New Zealand wines separated from the French wines. This confirms our previous interpretation of Figure 3a. However, only four wines contribute reliably to the second component, with the third New Zealand Sauvignon Blanc grouping with the second Canadian wine, and the fourth French wine grouping together with the third Canadian wine.

We also used the set of bootstrapped factor scores to obtain the 95% confidence intervals (CIs) around the factor scores (see Figure 10). Here, around each wine, we fitted an ellipsoid that comprises 95% of the bootstrapped factor scores. This ellipsoid represents the possible positions of a wine for replications of the analysis (assuming that the assessors were randomly sampled from a population of assessors). When the ellipses of two wines do not overlap, these two wines can be considered as reliably differentiated by the assessors.

FIGURE 10 | Bootstrap confidence ellipses plotted on Components 1 and 2.

RECENT DEVELOPMENTS AND VARIATIONS OVER STATIS

Recently, several papers have been dedicated to exploring developments of various aspects of the STATIS approach. We briefly review some of the most relevant ones.

X-STATIS or Partial Triadic Analysis
Partial triadic analysis (PTA), also called X-STATIS, was first named and described by Jaffrenou.96 This simple variation over STATIS can be used when all the X[k] tables measure the same variables for the same observations32,33,97–101 (e.g., at different times or locations) and can also be seen as a particular simplified case of Tucker three-mode factor analysis,102,103 hence the name partial (for simplified) triadic (for three-mode) analysis (the X-STATIS name is also used because it is a STATIS analysis on the X[k] matrices, see below). The procedure for PTA mirrors the STATIS procedure with two small differences: (1) the inner product matrix used for the computation of the α weights is obtained from the X[k] matrices rather than from the S[k] matrices; and (2) the compromise is obtained as the weighted average of the X[k] matrices (instead of the weighted average of the S[k] matrices).

In PTA, the α weights are obtained from the eigendecomposition of the inner product matrix



C′, whose general term is obtained as (compare with Eq. (8))

$$
c'_{k,\ell} = \operatorname{vec}\{X_{[k]}\}^{\mathsf{T}} \times \operatorname{vec}\{X_{[\ell]}\} \tag{100}
$$

(note that, contrary to standard STATIS, the terms c′k,ℓ are not guaranteed to be non-negative, but PTA requires them to be non-negative; a negative c′k,ℓ indicates that, e.g., STATIS should be used rather than PTA, but also that the reason for the negative value needs to be explored). The first eigenvector of C′ gives, after re-scaling (see Eq. (15)), α′, the vector of the optimum weights. These weights are then used to compute the compromise, denoted X[+], which is obtained as the weighted average of the X[k] matrices (instead of the weighted average of the S[k] matrices). Specifically, X[+] is computed as

$$
X_{[+]} = \sum_{k}^{K} \alpha'_{k} X_{[k]}. \tag{101}
$$

The SVD of X[+] provides factor scores for the observations and loadings for the variables. Note that the loadings are computed for the common variables (i.e., not for the K sets of variables). The loadings for the variables of an X[k] table can be recovered by projecting this table as a supplementary table (from Eq. (50)).
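A sketch of the whole PTA pipeline as just described, assuming the K tables share the same I × J shape; `pta` is our name for the helper.

```python
import numpy as np

def pta(X_tables):
    """X-STATIS/PTA: weights from C' (Eq. (100)), compromise (Eq. (101)), SVD."""
    K = len(X_tables)
    vecs = [X.ravel() for X in X_tables]           # vec{X[k]}
    C_prime = np.array([[vecs[k] @ vecs[l] for l in range(K)] for k in range(K)])
    _, evecs = np.linalg.eigh(C_prime)
    u1 = evecs[:, -1]                              # first eigenvector of C'
    alpha = u1 / u1.sum()                          # rescale as in Eq. (15)
    X_plus = sum(a * X for a, X in zip(alpha, X_tables))   # compromise (Eq. (101))
    P, s, Qt = np.linalg.svd(X_plus, full_matrices=False)
    return alpha, X_plus, P * s, Qt.T              # weights, compromise, F, Q
```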


PTA maximizes a criterion akin to (but different from) the criterion of STATIS. Specifically, PTA maximizes (see Ref 100; compare with Eq. (66)):

$$
\mathcal{S}_{\mathrm{pta}} = \sum_{k}^{K} \left( \operatorname{vec}\{\alpha'_{k}X_{[k]}\}^{\mathsf{T}} \times \operatorname{vec}\{X_{[+]}\} \right)^{2} \quad\text{with}\quad \boldsymbol{\alpha}'^{\mathsf{T}}\boldsymbol{\alpha}' = 1. \tag{102}
$$

Recent models called STATICO32,33,98 and COSTATIS33 generalize PTA to two sets of tables by combining co-inertia analysis (which relates one set of tables to the other set) with PTA (which integrates the information within one set of tables). These models are closely related to double-STATIS, which is described later in this article. They can also be seen as a generalization of Tucker inter-battery analysis,104 partial least square correlation,105,106 and STATIS.

Example X-STATIS
As an illustration, the results of a PTA on the four variables common to all the wines are shown in Figure 11. Here the results are very close to those of the STATIS analysis restricted to these four variables (not shown here, but the two figures would be almost identical). In practice, this similarity of the results between PTA and STATIS is often observed, but the computational load of PTA is smaller (a feature of interest for large data sets such as brain imaging or '-omics' data). In PTA, the K tables are often considered as i.i.d. observations and, in this case, a bootstrap resampling scheme on the K matrices X[k] can be used to estimate the variability of factor scores and loadings (and to derive confidence intervals and bootstrap ratios). Here, because the variables are put in correspondence, the variability of the estimates is likely to be smaller in PTA than in plain STATIS because of the larger number of measurements per variable. However, in PTA as in STATIS, when using the bootstrap to estimate the variability of the factor scores or the loadings, the estimation of the α vector needs to be performed for each bootstrap sample, and this can add to the computational load of the procedure.

FIGURE 11 | X-STATIS or partial triadic analysis. Biplot of the wines and the four common variables.

COVSTATIS and DISTATIS
COVSTATIS and DISTATIS21,27,34,42 are straightforward adaptations of the STATIS methodology to the problem of integrating, respectively, covariance (or correlation) and distance matrices. Here the raw data consist of K covariance or distance matrices collected on the same observations. As such, COVSTATIS and DISTATIS can be considered as three-way extensions of metric multidimensional scaling (see, e.g., Ref 107 for an earlier, closely related approach).


COVSTATIS
COVSTATIS is used to analyze a set of covariance (or correlation) matrices. When dealing with a set of covariance matrices, the problem is similar to the cross-product approach for STATIS (we have covariance matrices instead of cross-products), and a first idea could be to use the covariance or correlation matrices directly in lieu of the S[k] in Eqs (34)–(40). However, when dealing with covariance matrices, the problem of having different units for the different covariance matrices needs to be eliminated by some normalization scheme (e.g., by ensuring that all the covariance matrices have a norm of 1, or have a first eigenvalue equal to 1).

An additional problem remains, however, because each covariance (or correlation) matrix can be interpreted as a space in which points are scattered around an origin, but each matrix has its own origin (see Ref 108 for a detailed account of this problem, and also Refs 20, 21, 34). Covariance matrices with different origins cannot be meaningfully compared, for the same reason that variables with different means cannot be compared. In order to guarantee that all covariance matrices have a common origin, each matrix is double centered (this is equivalent to subtracting the mean for simple variables). This is obtained by first computing an I by I centering matrix, denoted Ξ, and defined as

$$
\boldsymbol{\Xi} = \left( \mathbf{I} - \mathbf{1}\mathbf{m}^{\mathsf{T}} \right). \tag{103}
$$

So, if R[k] is a (normalized) covariance or correlation matrix, the corresponding double-centered cross-product matrix (i.e., S[k]) is obtained as

$$
\mathbf{S}_{[k]} = \frac{1}{2}\,\boldsymbol{\Xi}\,\mathbf{R}_{[k]}\,\boldsymbol{\Xi}^{\mathsf{T}}. \tag{104}
$$

These S[k] matrices are then used as in the standard cross-product approach of the STATIS procedure.21

With COVSTATIS, the distances on the map between observations are approximately inversely proportional to their covariances or correlations in the original data matrices.

DISTATIS
DISTATIS is used when we have K distance matrices collected on the same set of I observations. The main idea behind DISTATIS is to transform the distance matrices into cross-product matrices and then use the cross-product approach to STATIS. This idea, introduced in Ref 21, was developed later in Refs 27, 34, 42 and subsequently used in different domains such as sensory evaluation,27–29,109–112 brain imaging,42,113 survey analysis,3 computer science,114 and molecular biology.46

Distance matrices cannot be directly analyzed with STATIS because distance matrices, having a diagonal of zero values, are not positive semidefinite (i.e., they are not cross-product matrices). To alleviate this problem, a traditional approach is to first transform the distance matrices into cross-product matrices. This transformation (originally proposed by Young and Householder,115 see also Ref 116) constitutes the basis of metric multidimensional scaling117 and is particularly appropriate for squared Euclidean distance matrices, because it is completely reversible in the sense that a squared Euclidean distance matrix can be perfectly reconstituted from its corresponding cross-product matrix and vice versa.115 When this transformation is applied to a non-Euclidean distance, it gives a symmetric matrix that has both positive and negative eigenvalues (the larger the proportion of negative eigenvalues, the more 'non-Euclidean' the distance). For a non-Euclidean matrix, a cross-product matrix can be obtained by keeping only the eigenvectors associated with positive eigenvalues and then reconstituting the cross-product matrix with these eigenvectors and eigenvalues (transforming this cross-product matrix back into a distance matrix will provide a squared Euclidean distance matrix).

In order to transform a distance matrix into its corresponding cross-product matrix, we use a double centering similar to the one described by Eq. (104). Specifically, a squared distance matrix D[k] is transformed into a cross-product matrix R[k] by the following transformation, called 'double centering' and computed as (compare to Eq. (104); note the minus sign in front of the right-hand term):

$$
\mathbf{R}_{[k]} = -\frac{1}{2}\,\boldsymbol{\Xi}\,\mathbf{D}_{[k]}\,\boldsymbol{\Xi}^{\mathsf{T}}. \tag{105}
$$

The R[k] matrices are often normalized prior to the analysis such that, for example, the sum of their squared elements is equal to one (à la SUM-PCA) or their first eigenvalue is equal to one (à la MFA).

The normalized cross-product matrices are denoted S[k] and are then used as the entries for the cross-product approach of STATIS. Detailed computational examples of DISTATIS can be found in Refs 27, 28, and a short example can be found in this article in the section on canonical-STATIS.

With DISTATIS, the distances on the map between two observations are approximately proportional to their distances in the original data matrices.

So, in summary, with DISTATIS, we start with K distance matrices D[k] of dimensions I by I. These distance matrices are then transformed into K cross-product matrices S[k], which are then plugged into the cross-product approach of STATIS.
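A sketch of this transformation; the mass vector defaults to equal masses, and the normalization shown is the à la MFA variant (first eigenvalue set to one).

```python
import numpy as np

def distance_to_crossproduct(D, masses=None):
    """Double-center a squared distance matrix into a normalized cross-product matrix."""
    I = D.shape[0]
    m = np.full(I, 1.0 / I) if masses is None else masses
    Xi = np.eye(I) - np.outer(np.ones(I), m)   # centering matrix (Eq. (103))
    R = -0.5 * Xi @ D @ Xi.T                   # double centering (Eq. (105))
    return R / np.linalg.eigvalsh(R)[-1]       # first eigenvalue set to 1 (a la MFA)
```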

CANOSTATIS: Canonical STATIS
Recently, a new method called canonical-STATIS (CANOSTATIS118) has been proposed that integrates features of DISTATIS and discriminant analysis (as well as multiblock barycentric discriminant analysis5,119). In this framework, the data consist of K tables, denoted X[k], each of dimensions I by J[k], collecting measurements on the same I observations (the X[k] matrices are supposed to have been adequately normalized). As an additional requirement of CANOSTATIS, each X[k] matrix is required to be full rank on its columns (which implies that J[k] < I for all k; this requirement comes from discriminant analysis, which is part of CANOSTATIS, see below for more explanations). For convenience, the columns of X[k] are supposed to be centered and normalized (and the whole matrix itself can also be normalized).

In CANOSTATIS, the I observations come from N a priori known groups. The problem here is to explore the relationship between these groups in order, for example, to assess if these groups are differentiated, and also to assess if the observations can reliably (or not) be assigned to their respective groups.

If the data were structured into only one table (and if the data matrix were full rank on its columns), the problem would correspond to a classic problem of discriminant analysis.120 An equivalent way of performing discriminant analysis is to compute the so-called Mahalanobis distance77 between the N groups and to compute a multidimensional scaling117 analysis of this distance matrix, followed by a projection of the I observations as supplementary elements. Canonical-STATIS generalizes this discriminant analysis approach by computing, for each of the K tables, an N by N Mahalanobis distance matrix (see Ref 77 for a detailed example of this computation). In order to do so, the first step is to compute, for each table, two matrices: (1) the N by J[k] matrix of the group means, denoted X̄[k] (i.e., an entry of X̄[k] collects the mean of one column of X[k] for one of the N groups), and (2) the J[k] by J[k] matrix, denoted Σ[k], of the covariances between the variables of X[k] (i.e., the (j, j′) entry of Σ[k] contains the covariance between the jth and the j′th variables of X[k]). The computation of the Mahalanobis distance matrix requires the inverse of the covariance matrix. This inverse is denoted Σ⁻¹[k] and can be computed only when Σ[k] is full rank (hence the requirement that the X[k] matrices be full rank on their columns, because these two statements are equivalent). With these notations, the distance matrix for the kth table is denoted D[k] and is computed as (see the next section for an example of this computation)

$$
\mathbf{D}_{[k]} = \operatorname{diag}\left\{\bar{\mathbf{X}}_{[k]}\boldsymbol{\Sigma}^{-1}_{[k]}\bar{\mathbf{X}}^{\mathsf{T}}_{[k]}\right\} \times \mathbf{1}^{\mathsf{T}} + \mathbf{1} \times \operatorname{diag}\left\{\bar{\mathbf{X}}_{[k]}\boldsymbol{\Sigma}^{-1}_{[k]}\bar{\mathbf{X}}^{\mathsf{T}}_{[k]}\right\}^{\mathsf{T}} - 2 \times \bar{\mathbf{X}}_{[k]}\boldsymbol{\Sigma}^{-1}_{[k]}\bar{\mathbf{X}}^{\mathsf{T}}_{[k]}. \tag{106}
$$

These K distance matrices are then used as the input for a DISTATIS analysis. This provides an inner product map that shows how the tables relate to each other in terms of their discrimination, and a compromise that gives the best common discrimination pattern. On this map, the discrimination pattern of each table can also be projected. As with DISTATIS (and in metric multidimensional scaling42,117,121), the original observations can also be projected onto the group map to assess the quality of the discrimination.

The constraint that each X[k] matrix be full rank is necessary only if the (full) inverse of Σ[k] is computed. Using a pseudo-inverse will provide the same distance matrix (cf. Ref 122, Lemma 2.2.4.iii) and does not require the full-rank assumption. Alternatively, when an X[k] is not full rank, some form of regularization or ridge could also work (see Ref 137). These approaches are likely to constitute future developments of CANOSTATIS.

So, in brief, in CANOSTATIS we have groups of observations described by several tables. For each table we compute a Mahalanobis distance matrix, a step which is equivalent to performing a linear discriminant analysis. We then use these distance matrices as the input to a DISTATIS analysis in order to integrate them and represent them as a compromise map.
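A sketch of Eq. (106) for one table; following the remark above, a pseudo-inverse is used so the sketch also works for rank-deficient tables. Note that the scale of the distances depends on the normalization chosen for the covariance matrix (numpy's `np.cov` divides by I − 1).

```python
import numpy as np

def mahalanobis_groups(X, groups):
    """N x N Mahalanobis distance matrix between group means (Eq. (106))."""
    labels = np.unique(groups)
    Xbar = np.stack([X[groups == g].mean(axis=0) for g in labels])  # group means
    Sigma_inv = np.linalg.pinv(np.cov(X, rowvar=False))             # (pseudo-)inverse
    W = Xbar @ Sigma_inv @ Xbar.T
    d = np.diag(W)
    return d[:, None] + d[None, :] - 2 * W
```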

Example Canonical-STATIS
As an illustration, we use the same data as in the STATIS example, with the X[k] matrices now being the I = 12 by J[k] tables and the countries of origin of the wines defining the groups of wines to be classified. We have N = 3 groups corresponding to the three countries of origin of the wines (i.e., New Zealand, France, and Canada). The goal of the analysis is to evaluate if the assessors differentiate the wines according to the country of origin.

If we look at the first assessor, the data matrix X[1] is given by Eq. (68). The N = 3 by J[1] = 6 matrix of means X̄[1] is equal to

$$
\bar{\mathbf{X}}_{[1]} =
\begin{bmatrix}
0.0929 & 0.0922 & 0.0859 & -0.0609 & -0.1109 & 0.0950 \\
-0.1161 & -0.1252 & -0.1039 & 0.1218 & 0.1524 & -0.0856 \\
0.0232 & 0.0329 & 0.0181 & -0.0609 & -0.0416 & -0.0101
\end{bmatrix}. \tag{107}
$$

The variance–covariance matrix is equal to

$$
\boldsymbol{\Sigma}_{[1]} =
\begin{bmatrix}
0.0085 & 0.0010 & 0.0078 & -0.0040 & 0.0007 & 0.0035 \\
0.0010 & 0.0073 & -0.0008 & 0.0020 & -0.0010 & 0.0023 \\
0.0078 & -0.0008 & 0.0103 & -0.0054 & 0.0009 & 0.0025 \\
-0.0040 & 0.0020 & -0.0054 & 0.0086 & -0.0007 & -0.0027 \\
0.0007 & -0.0010 & 0.0009 & -0.0007 & 0.0020 & 0.0001 \\
0.0035 & 0.0023 & 0.0025 & -0.0027 & 0.0001 & 0.0112
\end{bmatrix}. \tag{108}
$$


The Mahalanobis distance matrix between the three countries of origin for the first assessor is equal to

$$
\mathbf{D}_{[1]} =
\begin{bmatrix}
0 & 51.0810 & 4.3609 \\
51.0810 & 0 & 29.9891 \\
4.3609 & 29.9891 & 0
\end{bmatrix}, \tag{109}
$$

which is then transformed into a cross-product matrix (that, for convenience, has been normalized, à la MFA, to have a first eigenvalue of 1) equal to:

$$
\mathbf{S}_{[1]} =
\begin{bmatrix}
0.3239 & -0.4426 & 0.1187 \\
-0.4426 & 0.6318 & -0.1892 \\
0.1187 & -0.1892 & 0.0706
\end{bmatrix}. \tag{110}
$$

All 10 assessors are processed in a similar fashion. From their cross-product matrices, we obtain a matrix of inner products whose eigendecomposition shows the relationships between the assessors (see Figure 12). From this eigendecomposition, we also obtain the compromise, whose analysis in turn gives the compromise factor scores map shown in Figure 13a.

FIGURE 12 | Canonical-STATIS. PCA of the inner product map: the assessors.

In order to assess the quality of the discrimination, we performed bootstrap resampling with 1000 iterations and projected the 95% confidence intervals onto the compromise map (see Figure 13b). The figure indicates that the regions are well discriminated and that this discrimination is reliable.

FIGURE 13 | Canonical-STATIS. Left panel: compromise factor scores for the three wine regions. Right panel: compromise factor scores for the three wine regions with bootstrapped 95% confidence intervals.

Power STATIS
Recently, Benasseni and Bennani Dosse123 looked for alternative criteria for STATIS-like techniques. Specifically, they explored and generalized Criterion 2 (see Eq. (66) and the Appendix) and, instead of maximizing $\mathcal{S} = \sum_k \langle \mathbf{S}_{[k]}, \mathbf{S}_{[+]} \rangle^2$ (see Eq. (66)), they decided to explore the solution of the more general problem

$$\mathcal{S}' = \sum_k^K \big\langle \mathbf{S}_{[k]}, \mathbf{S}_{[+]} \big\rangle^p \quad \text{with } \alpha_k \geq 0 \text{ and } \boldsymbol{\alpha}^{\mathsf{T}}\boldsymbol{\alpha} = 1, \tag{111}$$

where p can be any non-negative real number.

Of particular interest is the case of p = 1, which we call power-1 STATIS (because it maximizes the criterion with p = 1). In this case, the quantity to maximize corresponds to the sum of the inner products between the tables and the compromise. Formally, in power-1 STATIS, the criterion to be maximized reduces to

$$\mathcal{S}_1 = \sum_k^K \big\langle \mathbf{S}_{[k]}, \mathbf{S}_{[+]} \big\rangle \quad \text{with } \alpha_k \geq 0 \text{ and } \boldsymbol{\alpha}^{\mathsf{T}}\boldsymbol{\alpha} = 1. \tag{112}$$

This expression seems to capture the intuition of a least squares minimization better than the S criterion, because the inner product already behaves like a 'squared quantity' and squaring it again seems superfluous. Also, this minimization problem has a very simple and elegant solution because the vector of optimal weights can then be obtained as

$$\boldsymbol{\alpha}_{[1]} = \mathbf{C}\mathbf{1} \times \|\mathbf{C}\mathbf{1}\|^{-1} \tag{113}$$

(where 1 is a K by 1 vector of ones).

[FIGURE 13 | Canonical-STATIS. Left panel: Compromise factor scores for the three wine regions. Right panel: Compromise factor scores for the three wine regions with bootstrapped 95% confidence intervals.]

For our example, the α computed with standard STATIS and the α[1] computed from power-1 STATIS were almost identical (an effect also observed by Benasseni and Bennani Dosse123 with several examples), and this suggests that the specific choice of the criterion (a matter of important theoretical interest) may not be a crucial problem for practical applications.

Power-1 STATIS can also be straightforwardly adapted to handle X-STATIS by replacing matrix C in Eq. (113) by C′ from Eq. (100).
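Because Eq. (113) involves only a matrix-vector product, the power-1 weights are trivial to compute; here is a two-line NumPy sketch (function name ours):

```python
import numpy as np

def power1_weights(C):
    """Power-1 STATIS weights (cf. Eq. (113)): no eigendecomposition needed.
    C is the K x K inner-product matrix between cross-product matrices."""
    v = C @ np.ones(C.shape[0])     # C1: the row sums of C
    return v / np.linalg.norm(v)    # alpha_[1] = C1 / ||C1||
```

The same two lines, applied to the J by J Hadamard-squared covariance matrix in place of C, give the weights of the power-1 version of ANISOSTATIS described below (cf. Eq. (122)).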

ANISOSTATIS
STATIS gives the same weight to each variable of a given table (namely αk; see, e.g., Eqs (16) and (17)). A more formal way of expressing this condition is to say that STATIS is an isotropic technique. An interesting extension of STATIS is to relax the isotropic condition and to assign one weight per variable (whatever the table) instead of having the same weight for all variables of a table. This extension of STATIS is anisotropic and is called ANISOSTATIS.124 So for ANISOSTATIS, the goal is to find values for the vector a (cf., Eq. (16)) that will make the map of the compromise closer to the set of tables. Interestingly, when generalizing STATIS, the two equivalent criteria (i.e., Eqs (64) and (66)) maximized by STATIS give rise to two different solutions. In addition, it is also possible to derive a power-1 criterion analogous to the criterion used by power-1 STATIS.123 We present these three different criteria below. All three criteria can also be directly adapted to work with PTA.

ANISOSTATIS Criterion 1
The first criterion maximized by ANISOSTATIS, equivalent to Eq. (64), is expressed as:

$$\mathcal{V}_1 = \|\mathbf{S}_{[+]}\|^2 = \big\langle \mathbf{S}_{[+]}, \mathbf{S}_{[+]} \big\rangle = \Big\langle \sum_k \mathbf{X}_{[k]}\mathbf{A}_{[k]}\mathbf{X}^{\mathsf{T}}_{[k]},\; \sum_k \mathbf{X}_{[k]}\mathbf{A}_{[k]}\mathbf{X}^{\mathsf{T}}_{[k]} \Big\rangle \tag{114}$$

with the constraint that aᵀa = 1.

The solution of this problem involves the covariance matrix of X with all of its elements squared. This J by J covariance matrix is denoted Σ and computed as:

$$\boldsymbol{\Sigma} = \mathbf{X}^{\mathsf{T}}\mathbf{M}\mathbf{X}. \tag{115}$$

Each entry of Σ stores the covariance between two variables of X. This matrix is then used to compute a matrix denoted C, which is called the Hadamard squared covariance matrix and whose elements are the squares of the elements of Σ. Specifically, C is computed as

$$\mathbf{C} = \boldsymbol{\Sigma} \circ \boldsymbol{\Sigma} = \big(\mathbf{X}^{\mathsf{T}}\mathbf{M}\mathbf{X}\big) \circ \big(\mathbf{X}^{\mathsf{T}}\mathbf{M}\mathbf{X}\big) \tag{116}$$

(where ∘ denotes the Hadamard, i.e., element-wise, product between two matrices). This matrix C is positive semi-definite (because Σ is positive semi-definite) and all its elements, being squared numbers, are positive.

The solution of the first criterion is obtained directly from the eigendecomposition of C, which is expressed as

$$\mathbf{C} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{\mathsf{T}} \quad \text{with} \quad \mathbf{U}^{\mathsf{T}}\mathbf{U} = \mathbf{I}. \tag{117}$$

As stated by the Perron–Frobenius theorem, the elements of the first eigenvector of C can be chosen to be all positive, and so the optimum weight vector is obtained by rescaling this first eigenvector such that the sum of its elements is equal to one. All the other steps of ANISOSTATIS can then be performed in the same way as for the standard version of STATIS.

As an illustration, the set of optimum weights obtained with Criterion V1 is given in Table 5, and the factor score solution is shown in Figure 14.
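A NumPy sketch of this first criterion follows (function and variable names are ours; the mass matrix M defaults to (1/I)I, one common choice):

```python
import numpy as np

def anisostatis_weights_c1(X, M=None):
    """ANISOSTATIS Criterion 1 weights, a sketch following Eqs (115)-(117).

    X: the I x J concatenated (preprocessed) data table; M: the I x I mass
    matrix (defaults to identity/I).  The weights are the first eigenvector
    of the Hadamard-squared covariance matrix, rescaled to sum to one
    (Perron-Frobenius guarantees it can be taken non-negative).
    """
    I = X.shape[0]
    if M is None:
        M = np.eye(I) / I
    Sigma = X.T @ M @ X               # Eq. (115): covariance matrix
    C = Sigma * Sigma                 # Eq. (116): Hadamard square
    lam, U = np.linalg.eigh(C)        # ascending eigenvalues
    u1 = U[:, -1]                     # eigenvector of the largest eigenvalue
    u1 = u1 if u1.sum() > 0 else -u1  # fix the (arbitrary) sign
    return u1 / u1.sum()              # rescale so the weights sum to one
```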

TABLE 5 | Numerical Values of the a Vector of the Optimum Weights for Criterion 1 of ANISOSTATIS

                 V1      V2      V3      V4      V5      V6
Assessor 1    0.0215  0.0178  0.0174  0.0139  0.0196  0.0138
Assessor 2    0.0220  0.0216  0.0181  0.0160  0.0135  0.0124
Assessor 3    0.0229  0.0052  0.0179  0.0107  0.0184  0.0169
Assessor 4    0.0283  0.0200  0.0263  0.0158  0.0122
Assessor 5    0.0215  0.0118  0.0179  0.0044  0.0158  0.0076
Assessor 6    0.0271  0.0227  0.0218  0.0159  0.0126
Assessor 7    0.0295  0.0229  0.0293  0.0128
Assessor 8    0.0195  0.0158  0.0138  0.0051  0.0196  0.0137
Assessor 9    0.0274  0.0172  0.0247  0.0180  0.0254
Assessor 10   0.0375  0.0283  0.0297  0.0287

[FIGURE 14 | ANISOSTATIS. Map of the factor scores for Dimensions 1 and 2 using Criterion 1 of ANISOSTATIS (because the α weights are very similar for all three criteria of ANISOSTATIS, this map also corresponds to the solutions obtained with Criteria 2 and 3).]

ANISOSTATIS Criterion 2
The second criterion of ANISOSTATIS generalizes the criterion expressed in Eq. (66) and is expressed similarly as

$$\mathcal{S}' = \sum_k^K \big\langle \mathbf{S}_{[k]}, \mathbf{S}_{[+]} \big\rangle^2 \quad \text{with } \mathbf{a}^{\mathsf{T}}\mathbf{a} = 1. \tag{118}$$

The solution of this problem involves a new J by K matrix denoted B, in which a value of 1 in row j and column k indicates that the jth variable belongs to the kth table and a value of 0 that it does not. The solution is obtained by first computing the (positive semidefinite) matrix

$$(\mathbf{C}\mathbf{B}) \times (\mathbf{C}\mathbf{B})^{\mathsf{T}}, \tag{119}$$

which is then eigendecomposed as

$$(\mathbf{C}\mathbf{B}) \times (\mathbf{C}\mathbf{B})^{\mathsf{T}} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^{\mathsf{T}} \quad \text{with} \quad \mathbf{V}^{\mathsf{T}}\mathbf{V} = \mathbf{I}. \tag{120}$$

Note that this eigendecomposition can also be more efficiently obtained from the singular value decomposition of the matrix CB.

As stated by the Perron–Frobenius theorem, the elements of the first eigenvector of (CB) × (CB)ᵀ can be chosen to be all positive, and the optimum weight vector is obtained by rescaling this first eigenvector to a sum of 1. All the other steps of STATIS can then be performed in the same way as for the standard version.

As an additional benefit of using the singular value decomposition of CB, we can plot the table factor scores in a manner similar to the plot of matrix C for standard STATIS (although in this case, because we are using the squared values of a covariance matrix, the first singular value is in general very much larger than the second, and it is therefore preferable to plot Dimensions 2 versus 3 in order to explore the similarity structure of the tables).
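The following sketch (names ours) builds the indicator matrix B from a vector giving the table membership of each variable and extracts the Criterion 2 weights from the SVD of CB:

```python
import numpy as np

def anisostatis_weights_c2(C, table_index):
    """ANISOSTATIS Criterion 2 weights via the SVD of CB (cf. Eqs 119-120).

    C: the J x J Hadamard-squared covariance matrix; table_index: length-J
    integer array giving, for each variable, the index of its table.
    """
    J = C.shape[0]
    K = int(table_index.max()) + 1
    B = np.zeros((J, K))
    B[np.arange(J), table_index] = 1.0  # B[j, k] = 1 iff variable j is in table k
    V, delta, Wt = np.linalg.svd(C @ B)  # left singular vectors of CB are the
                                         # eigenvectors of (CB)(CB)^T
    v1 = V[:, 0]
    v1 = v1 if v1.sum() > 0 else -v1     # Perron-Frobenius sign fix
    return v1 / v1.sum()                 # rescale to a sum of one
```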

TABLE 6 | Numerical Values of the a Vector of the Optimum Weights for Criterion 2 of ANISOSTATIS

                 V1      V2      V3      V4      V5      V6
Assessor 1    0.0209  0.0177  0.0170  0.0146  0.0198  0.0137
Assessor 2    0.0213  0.0212  0.0176  0.0163  0.0145  0.0128
Assessor 3    0.0221  0.0059  0.0168  0.0109  0.0188  0.0175
Assessor 4    0.0278  0.0203  0.0262  0.0160  0.0133
Assessor 5    0.0207  0.0120  0.0173  0.0053  0.0164  0.0088
Assessor 6    0.0261  0.0228  0.0208  0.0162  0.0132
Assessor 7    0.0280  0.0233  0.0281  0.0142
Assessor 8    0.0187  0.0158  0.0140  0.0061  0.0198  0.0144
Assessor 9    0.0265  0.0171  0.0238  0.0185  0.0253
Assessor 10   0.0369  0.0280  0.0298  0.0292

The set of optimum weights obtained with Criterion S′ is given in Table 6. As can be seen by comparing Tables 5 and 6, the optimum weights are almost identical for these two versions of ANISOSTATIS, and the factor scores are also almost identical (and so Figure 14 also shows the plot of the factor scores obtained with Criterion S′).

ANISOSTATIS Criterion 3. Power-STATIS
In a manner equivalent to power-1 STATIS, the second criterion can be modified so that it involves the sum of the inner products of the cross-product matrices with the compromise rather than the sum of the squared inner products. Specifically, the criterion to maximize is then (compare with Eqs (111) and (118)):

$$\mathcal{S}'_1 = \sum_k^K \big\langle \mathbf{S}_{[k]}, \mathbf{S}_{[+]} \big\rangle \quad \text{with } \mathbf{a}^{\mathsf{T}}\mathbf{a} = 1. \tag{121}$$

Like power-1 STATIS, the power-1 version of ANISOSTATIS does not require the computation of an eigendecomposition (an important feature when dealing with very large matrices, because the eigendecomposition computational time depends on the cube of the size of the matrix to be decomposed). Specifically, the vector a of optimal weights is obtained as (compare with Eq. (113))

$$\mathbf{a} = \mathbf{C}\mathbf{1} \times \|\mathbf{C}\mathbf{1}\|^{-1} \tag{122}$$

(where 1 is a J by 1 vector of ones). This vector of optimal weights is then used as in standard STATIS to compute factor scores and loadings.

The set of optimum weights obtained with Criterion S′1 is given in Table 7. As can be seen by comparing Tables 5–7, the optimum weights are almost identical for all three versions of ANISOSTATIS and, in fact, the factor scores are also almost identical (and so Figure 14 also shows the plot of the factor scores obtained with Criterion 3).

TABLE 7 | Numerical Values of the a Vector of the Optimum Weights for Criterion 3 (Power 1) of ANISOSTATIS

                 V1      V2      V3      V4      V5      V6
Assessor 1    0.0210  0.0176  0.0169  0.0147  0.0197  0.0137
Assessor 2    0.0212  0.0212  0.0175  0.0164  0.0145  0.0127
Assessor 3    0.0221  0.0061  0.0169  0.0110  0.0188  0.0175
Assessor 4    0.0278  0.0203  0.0262  0.0160  0.0135
Assessor 5    0.0208  0.0120  0.0174  0.0054  0.0165  0.0090
Assessor 6    0.0261  0.0226  0.0208  0.0161  0.0133
Assessor 7    0.0280  0.0233  0.0281  0.0142
Assessor 8    0.0187  0.0158  0.0142  0.0063  0.0198  0.0145
Assessor 9    0.0264  0.0170  0.0238  0.0185  0.0252
Assessor 10   0.0367  0.0278  0.0297  0.0291

(K + 1)-STATIS
An interesting extension to STATIS explores the relationship of K tables with one external table. As an illustration, in our example we have 10 tables representing the assessors and one table representing the chemical properties of the wines. An important question is to explicitly evaluate the similarities among the assessors relative to the way they relate to the chemical properties of the wines. This problem is called a (K + 1) table problem. Here, we have a set of K tables (i.e., the K assessors), each of dimensions I by J[k], and one external table [i.e., the (K + 1)th table of the chemical properties of the wines] represented by an I by M matrix denoted H. A recent extension of STATIS called (K + 1)-STATIS or external-STATIS has been proposed to handle this type of question.125

The goal of the method is to analyze the pattern of similarity between the K tables in relation to the (K + 1)th table. In addition to generalizing STATIS, this method can also be seen as an extension of Tucker inter-battery analysis,104 co-inertia analysis,32,33 or partial least squares correlation.105,106,126 The basic idea is to compute the optimal α weights using correlation or covariance matrices between each of the K tables and the (K + 1)th table (instead of the cross-product matrices). The optimal weights thus computed are then used in lieu of the α weights of STATIS to compute a standard solution for STATIS (based on the K matrices). The (K + 1)th matrix is projected as a supplementary element in the solution.

Formally, we have K matrices X[k] (each of order I by J[k]) and one I by M 'external matrix' H, with all these tables measuring variables on the same set of I observations. These matrices are supposed to have been preprocessed in an appropriate way. The first step of the analysis is to compute a correlation or covariance matrix (depending upon the type of preprocessing) between each of the X[k] matrices and H. These matrices, denoted W[k] (each of order M by J[k]), are computed as

$$\mathbf{W}_{[k]} = \mathbf{H}^{\mathsf{T}}\mathbf{X}_{[k]}. \tag{123}$$

For the first step of STATIS (i.e., finding the optimum weights), we use the cross-product matrices computed from the W[k] matrices as

$$\mathbf{S}^{*}_{[k]} = \mathbf{W}_{[k]}\mathbf{W}^{\mathsf{T}}_{[k]} = \mathbf{H}^{\mathsf{T}}\mathbf{X}_{[k]}\mathbf{X}^{\mathsf{T}}_{[k]}\mathbf{H}. \tag{124}$$

As the term X[k]Xᵀ[k] defines the standard cross-product (see Eq. (3)), this last equation can be rewritten as:

$$\mathbf{S}^{*}_{[k]} = \mathbf{H}^{\mathsf{T}}\mathbf{S}_{[k]}\mathbf{H}. \tag{125}$$

This makes explicit the relationships between (K + 1)-STATIS and standard STATIS.

From the cross-product matrices S*[k], we then compute the inner product matrix (cf., Eq. (11)) denoted C*, whose generic term is

$$c^{*}_{k,k'} = \big\langle \mathbf{S}^{*}_{[k]}, \mathbf{S}^{*}_{[k']} \big\rangle = \big\langle \mathbf{H}^{\mathsf{T}}\mathbf{X}_{[k]}\mathbf{X}^{\mathsf{T}}_{[k]}\mathbf{H},\; \mathbf{H}^{\mathsf{T}}\mathbf{X}_{[k']}\mathbf{X}^{\mathsf{T}}_{[k']}\mathbf{H} \big\rangle. \tag{126}$$

The eigendecomposition of matrix C* gives (cf., Eq. (120)):

$$\mathbf{C}^{*} = \mathbf{U}^{*}\boldsymbol{\Lambda}^{*}\mathbf{U}^{*\mathsf{T}} \quad \text{with} \quad \mathbf{U}^{*\mathsf{T}}\mathbf{U}^{*} = \mathbf{I}. \tag{127}$$

From the first eigenvector of C*, denoted u*₁, we obtain the set of optimal weights for (K + 1)-STATIS in the same way as for STATIS (cf., Eq. (15)) as:

$$\boldsymbol{\alpha}^{*} = \mathbf{u}^{*}_{1} \times \big(\mathbf{u}^{*\mathsf{T}}_{1}\mathbf{1}\big)^{-1}. \tag{128}$$

The elements of α* are then used in lieu of the elements of α, and the rest of the STATIS procedure is applied with these weights.
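A compact NumPy sketch of this first step follows (function name ours; the tables are assumed to have been preprocessed). It returns the α* weights of Eq. (128).

```python
import numpy as np

def k_plus_1_statis_weights(X_tables, H):
    """Sketch of the first step of (K+1)-STATIS (cf. Eqs 123-128).

    X_tables: list of K preprocessed I x J[k] arrays; H: the I x M
    external table.  Returns alpha*, the first eigenvector of C*
    rescaled so that its elements sum to one.
    """
    K = len(X_tables)
    # Cross-products of the W[k] = H^T X[k] matrices (Eq. 124).
    S_star = [H.T @ X @ X.T @ H for X in X_tables]
    C = np.zeros((K, K))
    for k in range(K):
        for kp in range(K):
            # Inner product between symmetric matrices (Eq. 126).
            C[k, kp] = np.sum(S_star[k] * S_star[kp])
    lam, U = np.linalg.eigh(C)
    u1 = U[:, -1]                     # first eigenvector of C*
    u1 = u1 if u1.sum() > 0 else -u1  # fix the sign
    return u1 / u1.sum()              # Eq. (128)
```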

As an illustration, we took the matrix of chemical properties of the wines (e.g., Xsup; see Table 4) as our H matrix. The first covariance matrix W[1] is

$$\mathbf{W}_{[1]} = \mathbf{H}^{\mathsf{T}}\mathbf{X}_{[1]} =
\begin{bmatrix}
-0.065 & 0.056 & 0.218 & 0.096 \\
-0.105 & 0.119 & 0.099 & 0.200 \\
0.016 & 0.011 & 0.218 & 0.096 \\
0.326 & -0.023 & 0.099 & 0.251 \\
-0.146 & 0.148 & -0.139 & -0.162 \\
-0.027 & -0.101 & -0.139 & -0.076 \\
-0.213 & 0.036 & -0.020 & -0.283 \\
-0.065 & 0.065 & -0.258 & -0.059 \\
0.205 & 0.023 & -0.020 & 0.062 \\
-0.051 & 0.077 & 0.099 & -0.059 \\
0.016 & 0.023 & -0.139 & 0.010 \\
0.110 & -0.433 & -0.020 & -0.076
\end{bmatrix}^{\mathsf{T}}
\times
\begin{bmatrix}
0.12 & 0.13 & 0.07 & -0.04 & -0.18 & 0.11 \\
0.07 & 0.05 & 0.13 & -0.18 & -0.12 & 0.23 \\
0.01 & 0.05 & 0.02 & 0.01 & -0.07 & -0.01 \\
0.18 & 0.13 & 0.13 & -0.04 & -0.07 & 0.05 \\
-0.21 & -0.18 & -0.20 & 0.16 & 0.15 & -0.07 \\
-0.16 & -0.03 & -0.09 & 0.21 & 0.10 & -0.19 \\
-0.05 & -0.11 & -0.04 & -0.04 & 0.21 & -0.07 \\
-0.05 & -0.18 & -0.09 & 0.16 & 0.15 & -0.01 \\
0.12 & 0.13 & 0.13 & -0.04 & -0.01 & 0.17 \\
-0.10 & 0.13 & -0.20 & 0.01 & -0.07 & -0.01 \\
0.12 & -0.03 & 0.13 & -0.18 & -0.07 & -0.07 \\
-0.05 & -0.11 & 0.02 & -0.04 & -0.01 & -0.13
\end{bmatrix}$$

$$= \begin{bmatrix}
0.118 & 0.100 & 0.109 & -0.036 & -0.080 & 0.036 \\
0.008 & 0.029 & -0.028 & 0.000 & 0.005 & 0.097 \\
0.088 & 0.154 & 0.069 & -0.090 & -0.148 & 0.098 \\
0.151 & 0.142 & 0.140 & -0.090 & -0.163 & 0.135
\end{bmatrix}. \tag{129}$$

From W[1], we can compute S*[1] as

$$\mathbf{S}^{*}_{[1]} = \mathbf{W}_{[1]}\mathbf{W}^{\mathsf{T}}_{[1]} =
\begin{bmatrix}
0.0037 & 0.0003 & 0.0043 & 0.0057 \\
0.0003 & 0.0009 & 0.0010 & 0.0011 \\
0.0043 & 0.0010 & 0.0063 & 0.0075 \\
0.0057 & 0.0011 & 0.0075 & 0.0096
\end{bmatrix}. \tag{130}$$

This cross-product matrix and the nine other ones are then used to compute the inner product matrix C* (whose first off-diagonal element, for example, was equal to c*₁,₂ = 0.00039; note that the elements of C* are likely to be small, and the use of the RV coefficient rather than the inner product will eliminate this scale problem).

From the eigendecomposition of C*, we obtain the first eigenvector, which, in turn, is used to compute the value of α* (compare with Eq. (75)) as:

$$\boldsymbol{\alpha}^{*} = \mathbf{u}^{*}_{1} \times \big(\mathbf{1}^{\mathsf{T}}\mathbf{u}^{*}_{1}\big)^{-1} =
\begin{bmatrix}
0.2910 \\ 0.3108 \\ 0.2596 \\ 0.3322 \\ 0.2216 \\ 0.3426 \\ 0.2867 \\ 0.2742 \\ 0.3916 \\ 0.4047
\end{bmatrix} \times 3.1149^{-1} =
\begin{bmatrix}
0.0934 \\ 0.0998 \\ 0.0833 \\ 0.1066 \\ 0.0711 \\ 0.1100 \\ 0.0920 \\ 0.0880 \\ 0.1257 \\ 0.1299
\end{bmatrix}. \tag{131}$$

This vector α* is then plugged into the standard STATIS algorithm (in lieu of α), and this provides factor scores and loadings. As an illustration, we show in Figure 15 the factor scores for the compromise and the H table. The comparison of these figures with Figures 3a and 7 indicates that, here, taking into account the similarity of the assessors with the chemical properties of the wines did not change the overall solution much. This is probably because the (fictitious) assessors were all roughly equally sensitive to the chemical properties of the wines.

[FIGURE 15 | (K + 1)-STATIS. (a) Compromise factor scores. (b) Factor scores for the H matrix: chemical components of the wines (compare with Figures 3a and 7).]

DOUBLE-STATIS or DO-ACT
A new method, called DO-ACT (for double-act) or double-STATIS, further extends the (K + 1)-STATIS methodology and generalizes Tucker inter-battery analysis,104 co-inertia analysis,32,33 and partial least squares correlation.105,106,126 In the double-STATIS approach, we have one series of K matrices X[k] (each of order I by J[k]) and one series of L matrices H[ℓ] (each of order I by M[ℓ]), with all these matrices collecting measures on the same I observations. The goal of the analysis is to find two compromises, one for the X[k]'s and one for the H[ℓ]'s, such that these two compromises are as similar as possible (with the similarity being measured by the inner product between these two compromises). So, more formally, if we denote by S[k] and T[ℓ] the I by I cross-product matrices of, respectively, X[k] and H[ℓ], and by S[+] and T[+] the corresponding I by I compromise cross-product matrices, we are looking for two sets of weights (α for the S[k]'s and β for the T[ℓ]'s) such that the two compromise matrices are computed as (cf., Eq. (34))

$$\mathbf{S}_{[+]} = \sum_k^K \alpha_k \mathbf{S}_{[k]} \quad \text{and} \quad \mathbf{T}_{[+]} = \sum_\ell^L \beta_\ell \mathbf{T}_{[\ell]} \quad \text{with} \quad \boldsymbol{\alpha}^{\mathsf{T}}\boldsymbol{\alpha} = \boldsymbol{\beta}^{\mathsf{T}}\boldsymbol{\beta} = 1. \tag{132}$$

In a manner analogous to standard STATIS, the solution of this problem requires the computation of a K by L inner product matrix. But, by contrast with standard STATIS, this matrix (being K by L) is obviously not, in general, a square symmetric matrix anymore, as its generic term gives the inner product between an S[k] matrix and a T[ℓ] matrix (instead of between two S[k] matrices). Specifically, the value of the element ck,ℓ of the K by L inner product matrix C is computed as (compare with Eq. (8)):

$$\begin{aligned}
c_{k,\ell} &= \big\langle \mathbf{S}_{[k]}, \mathbf{T}_{[\ell]} \big\rangle \\
&= \operatorname{trace}\big\{\mathbf{X}_{[k]}\mathbf{X}^{\mathsf{T}}_{[k]} \times \mathbf{H}_{[\ell]}\mathbf{H}^{\mathsf{T}}_{[\ell]}\big\} \\
&= \operatorname{trace}\big\{\mathbf{S}_{[k]} \times \mathbf{T}_{[\ell]}\big\} \\
&= \operatorname{vec}\big\{\mathbf{S}_{[k]}\big\}^{\mathsf{T}} \times \operatorname{vec}\big\{\mathbf{T}_{[\ell]}\big\} \\
&= \sum_i^I \sum_j^I s_{i,j,k}\, t_{i,j,\ell}.
\end{aligned} \tag{133}$$

The singular value decomposition (as opposed to the eigenvalue decomposition used for standard STATIS) of C provides a set of left singular vectors (corresponding to the X[k]'s), a set of right singular vectors (corresponding to the H[ℓ]'s), and a set of singular values:

$$\mathbf{C} = \mathbf{U}\boldsymbol{\Delta}\mathbf{V}^{\mathsf{T}} \quad \text{with} \quad \mathbf{U}^{\mathsf{T}}\mathbf{U} = \mathbf{V}^{\mathsf{T}}\mathbf{V} = \mathbf{I}. \tag{134}$$

The first left singular vector (denoted u₁) and the first right singular vector (denoted v₁) provide, after rescaling such that their respective sums are equal to one, the values of the two sets of optimal weights as (compare with Eq. (15)):

$$\boldsymbol{\alpha} = \mathbf{u}_1 \times \big(\mathbf{u}^{\mathsf{T}}_1\mathbf{1}\big)^{-1} \quad \text{and} \quad \boldsymbol{\beta} = \mathbf{v}_1 \times \big(\mathbf{v}^{\mathsf{T}}_1\mathbf{1}\big)^{-1}. \tag{135}$$

Each of the compromises in double-STATIS can then be analyzed as in standard STATIS. A detailed example is provided in Ref 127. The relationships between the different extensions of STATIS for two sets of tables, such as STATICO, COSTATIS, and double-STATIS, remain to be explored.
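The weight computation for double-STATIS is again a few lines of NumPy (function name ours; the cross-product matrices are assumed to have been computed and normalized beforehand):

```python
import numpy as np

def do_act_weights(S_tables, T_tables):
    """Sketch of the DO-ACT / double-STATIS weights (cf. Eqs 133-135).

    S_tables: K cross-product matrices (each I x I) for the X[k]'s;
    T_tables: L cross-product matrices (each I x I) for the H[l]'s.
    """
    K, L = len(S_tables), len(T_tables)
    C = np.zeros((K, L))
    for k in range(K):
        for l in range(L):
            C[k, l] = np.sum(S_tables[k] * T_tables[l])  # <S[k], T[l]>
    U, delta, Vt = np.linalg.svd(C)                      # Eq. (134)
    u1, v1 = U[:, 0], Vt[0]
    u1 = u1 if u1.sum() > 0 else -u1                     # fix the signs
    v1 = v1 if v1.sum() > 0 else -v1
    return u1 / u1.sum(), v1 / v1.sum()                  # alpha, beta (Eq. 135)
```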

The double-STATIS approach has also recently been extended to multiple sets of data matrices.60,61 This extension, called STATIS-4, uses an iterative process to compute a compromise for each of the sets of tables and a global compromise (i.e., the compromise of the compromises).

RELATED METHODS
STATIS is part of the multitable family and also of the Procrustes family. A thorough evaluation of the complex relationships between all these techniques is beyond the scope of this article, but some directions for comparisons could be of interest. STATIS is also closely related to generalized canonical correlation analysis (GCCA). We will look first at the relationship between STATIS and GCCA and then at the relationships between STATIS and the other techniques.

STATIS and Generalized Canonical Correlation Analysis
An important anisotropic technique is generalized canonical correlation analysis (GCCA). In GCCA,128–130 the goal is to maximize the inner product of the factor scores of the compromise under a constraint that makes it equivalent to maximizing the correlations between the compromise factor scores and the partial factor scores. Specifically, in GCCA, we seek a matrix of J weights by L dimensions (with L being the number of dimensions of the solution), Wgcca, such that

$$\mathbf{F}_{\text{gcca}} = \mathbf{X}\mathbf{W}_{\text{gcca}} \quad \text{with} \quad \operatorname{trace}\big\{\mathbf{F}^{\mathsf{T}}_{\text{gcca}}\mathbf{F}_{\text{gcca}}\big\} = \max \tag{136}$$

under the constraints that

$$\mathbf{W}^{\mathsf{T}}_{\text{gcca}}\,\widetilde{\boldsymbol{\Sigma}}\,\mathbf{W}_{\text{gcca}} = \mathbf{I}, \tag{137}$$

where Σ̃ is a J by J block diagonal matrix with each J[k] by J[k] diagonal block equal to (1/I) Xᵀ[k]X[k] (note that, here, as in CANOSTATIS, the inverse can be replaced by a pseudoinverse129,130).

The set of Wgcca weights is obtained from the following GSVD of X:

$$\mathbf{X} = \mathbf{P}_{\text{gcca}}\boldsymbol{\Delta}_{\text{gcca}}\mathbf{Q}^{\mathsf{T}}_{\text{gcca}} \quad \text{with} \quad \mathbf{P}^{\mathsf{T}}_{\text{gcca}}\mathbf{M}\mathbf{P}_{\text{gcca}} = \mathbf{Q}^{\mathsf{T}}_{\text{gcca}}\widetilde{\boldsymbol{\Sigma}}^{-1}\mathbf{Q}_{\text{gcca}} = \mathbf{I}. \tag{138}$$

The factor scores Fgcca are obtained as

$$\mathbf{F}_{\text{gcca}} = \mathbf{P}_{\text{gcca}}\boldsymbol{\Delta}_{\text{gcca}} = \mathbf{X}\widetilde{\boldsymbol{\Sigma}}^{-1}\mathbf{Q}_{\text{gcca}} = \mathbf{X}\mathbf{W}_{\text{gcca}} \quad \text{with} \quad \mathbf{W}_{\text{gcca}} = \widetilde{\boldsymbol{\Sigma}}^{-1}\mathbf{Q}_{\text{gcca}}. \tag{139}$$

So, in brief, GCCA and ANISOSTATIS are both anisotropic techniques, but the criteria they maximize are quite different: maximum correlation for GCCA and maximum squared inner product (or inner product, depending upon the version) for ANISOSTATIS. Further work is needed to develop the comparison between GCCA, STATIS, and their variants.

STATIS and GPA
The relationship between STATIS and generalized Procrustes analysis (GPA) is examined in Refs 12,131. An obvious advantage of STATIS over GPA is that STATIS is an eigendecomposition technique and therefore does not require multiple iterations to reach a consensus and is also guaranteed to converge. A main point of difference between the two techniques is the space they try to fit: STATIS considers the whole space (i.e., the whole cross-product matrices) in which the K tables are, whereas GPA considers specific dimensions (i.e., the factor scores) and is, in general, restricted to a subset of these dimensions.

STATIS and Multiblock Analyses
STATIS is also part of the multitable or multiblock family of PCA extensions.2–15 The well-known members of this family (such as multiple factor analysis, SUM-PCA, consensus PCA, and multiblock correspondence analysis) reduce to the PCA of a matrix X in which each X[k] has been normalized in a specific way for each technique (see the section on table normalization). STATIS differs essentially by the choice of the optimum weights, which is an additional step after table normalization has been performed (see, e.g., Ref 132 for a comparison between STATIS and multiple factor analysis). In this sense, STATIS encompasses all these methods as particular cases (i.e., when all the weights in STATIS are equal and when the adequate table normalization has been performed).

STATIS (and in particular DISTATIS) can also be seen as a constrained, simplified version of INDSCAL (see Ref 13 for a comparison between STATIS and INDSCAL, and Ref 133 for a geometrical interpretation of INDSCAL and multitable analyses), because STATIS uses isotropic weights on the tables, whereas INDSCAL uses anisotropic weights on the dimensions (in addition, STATIS, being an eigenvector-based technique, is guaranteed to converge, whereas INDSCAL is an iterative technique whose convergence is not always certain). DISTATIS, compared to INDSCAL, is obviously also an isotropic technique tailored for Euclidean distance matrices, whereas INDSCAL can handle non-Euclidean similarity matrices. ANISOSTATIS and INDSCAL are both anisotropic, but ANISOSTATIS weights the variables of the tables whereas INDSCAL weights the dimensions of the solutions.

COMPUTER PACKAGES

Several statistical packages implement STATIS and some of its variants. A very comprehensive package (oriented toward ecologists) is the R package ade4.56,134 STATIS-4 has been programmed in S (and is likely to run with R) and is available from Ref 60. Also, all the techniques described in this article (with the exception of STATIS-4) have been programmed in MATLAB and are available from the senior author's home page at www.utdallas.edu/~herve. In addition, R programs implementing the techniques described in this article, written by Derek Beaton, are available from this address. A simple DISTATIS program written by Guillaume Blancher is also available at this address.

CONCLUSION

STATIS is a simple, elegant, and robust technique that can be used to integrate multiple data tables collected on the same set of observations. In addition to being of great practical importance, STATIS is also a domain of recent vigorous theoretical developments, and its simplicity (both theoretical and computational) makes it an ideal tool for the very large data sets of modern science.

APPENDIX

THE INNER PRODUCT BETWEEN POSITIVE SEMI-DEFINITE MATRICES IS POSITIVE OR NULL

Here we show that the inner product between positive semidefinite matrices is always larger than or equal to zero. Recall that a matrix S is positive semidefinite if there is a matrix X such that S can be written as the product XXᵀ. Let S and T be two positive semidefinite matrices. The inner product between these two matrices is equal to

$$\big\langle \mathbf{S}, \mathbf{T} \big\rangle = \operatorname{trace}\{\mathbf{S}\mathbf{T}\}. \tag{140}$$

Because T is positive semidefinite, the right-hand side of Eq. (140) can be rewritten as

$$\operatorname{trace}\big\{\mathbf{T}^{\frac{1}{2}}\mathbf{S}\mathbf{T}^{\frac{1}{2}}\big\}. \tag{141}$$

Because S is positive semidefinite, it can be expressed as S = (S^{1/2})(S^{1/2})ᵀ; therefore, the matrix T^{1/2}ST^{1/2} = (T^{1/2}S^{1/2})(T^{1/2}S^{1/2})ᵀ is also positive semidefinite, and therefore all its eigenvalues are positive or null. Thus its trace, being equal to the sum of its eigenvalues, is also positive or null, and this completes the proof.
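A quick numerical illustration of this result (our own check, not part of the original proof):

```python
import numpy as np

# The inner product between two positive semidefinite matrices is
# non-negative: build two random PSD matrices and verify trace{ST} >= 0.
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(5, 3)), rng.normal(size=(5, 4))
S, T = X @ X.T, Y @ Y.T              # two PSD matrices
print(np.trace(S @ T) >= 0)          # always True
```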

STATIS. CRITERION 1

Preliminaries
First we note that the quadratic expression αᵀCα can be rewritten as:

$$\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{C}\boldsymbol{\alpha} = \sum_k^K \sum_m^K \alpha_k \alpha_m c_{k,m}. \tag{142}$$

This can be obtained by developing αᵀCα to express it as a sum.

Some properties of the trace operator will be useful (see, for more details, e.g., Refs 135,136):

• Property P1: trace{α × S} = α × trace{S} for any scalar α.
• Property P2: trace{S + T} = trace{S} + trace{T}.
• Property P3: Combining Properties P1 and P2, we have trace{∑ₖ αₖ S[k]} = ∑ₖ αₖ trace{S[k]}.
• Property P4: trace{ST} = trace{TS} = trace{TᵀSᵀ}.

Maximum Variance of Compromise
Here we show that Criterion 1 of STATIS maximizes:

$$\mathcal{V} = \|\mathbf{S}_{[+]}\|^2 = \big\langle \mathbf{S}_{[+]}, \mathbf{S}_{[+]} \big\rangle \quad \text{with} \quad \boldsymbol{\alpha}^{\mathsf{T}}\boldsymbol{\alpha} = 1. \tag{143}$$


First we show that V = αᵀCα. This is shown by developing Eq. (143) as

$$\begin{aligned}
\big\langle \mathbf{S}_{[+]}, \mathbf{S}_{[+]} \big\rangle
&= \operatorname{trace}\big\{\mathbf{S}_{[+]}\mathbf{S}_{[+]}\big\} \\
&= \operatorname{trace}\Big\{\Big(\sum_k^K \alpha_k \mathbf{S}_{[k]}\Big)\Big(\sum_m^K \alpha_m \mathbf{S}_{[m]}\Big)\Big\} \\
&= \operatorname{trace}\Big\{\sum_k^K \sum_m^K \alpha_k \alpha_m \mathbf{S}_{[k]}\mathbf{S}_{[m]}\Big\} \\
&= \sum_k^K \sum_m^K \alpha_k \alpha_m \operatorname{trace}\big\{\mathbf{S}_{[k]}\mathbf{S}_{[m]}\big\} \quad \text{(from P3)} \\
&= \sum_k^K \sum_m^K \alpha_k \alpha_m \big\langle \mathbf{S}_{[k]}, \mathbf{S}_{[m]} \big\rangle \\
&= \sum_k^K \sum_m^K \alpha_k \alpha_m c_{k,m} \\
&= \boldsymbol{\alpha}^{\mathsf{T}}\mathbf{C}\boldsymbol{\alpha} \quad \text{(from Eq. (142))}.
\end{aligned} \tag{144}$$

Second, we show that αᵀCα is maximal when α is the first eigenvector of C under the constraint that αᵀα = 1 (see Ref 136, p. 161, or Ref 50, p. 455, for more details).

To do so, we start by writing the Lagrangian

$$\mathcal{L} = \tfrac{1}{2}\big\{\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{C}\boldsymbol{\alpha} - \lambda\big(\boldsymbol{\alpha}^{\mathsf{T}}\boldsymbol{\alpha} - 1\big)\big\}. \tag{145}$$

Differentiating L with respect to α gives

$$\frac{\partial \mathcal{L}}{\partial \boldsymbol{\alpha}} = \mathbf{C}\boldsymbol{\alpha} - \lambda\boldsymbol{\alpha}. \tag{146}$$

The expression L is maximal when its derivative vanishes, and this implies that

$$\mathbf{C}\boldsymbol{\alpha} = \lambda\boldsymbol{\alpha}, \tag{147}$$

which shows that α is an eigenvector of C, and so αᵀCα = λ. Therefore the maximum of L is reached when α is the first eigenvector of C. Note also that, because of the Perron–Frobenius theorem, the elements of α can always be chosen to be positive (see, e.g., Ref 68, p. 34ff).

Alternative Version: Minimizing Dissimilarity
Criterion V from Eq. (143) can be rewritten as a minimization problem for which we minimize the sum of the squared distances between the cross-product matrices and the compromise. Specifically, we want to minimize:

$$\mathcal{D} = \sum_k^K \big\|\mathbf{S}_{[k]} - \alpha_k\mathbf{S}_{[+]}\big\|^2 \quad \text{with} \quad \boldsymbol{\alpha}^{\mathsf{T}}\boldsymbol{\alpha} = 1. \tag{148}$$

Developing and simplifying Eq. (148) gives

$$\mathcal{D} = \sum_k^K \big\|\mathbf{S}_{[k]}\big\|^2 - \boldsymbol{\alpha}^{\mathsf{T}}\mathbf{C}\boldsymbol{\alpha}, \tag{149}$$

which, obviously, reaches its minimum when αᵀCα reaches its maximum.

STATIS. CRITERION 2

Here we show that STATIS maximizes the following criterion21

$$\mathcal{S} = \sum_k^K \big\langle \mathbf{S}_{[k]}, \mathbf{S}_{[+]} \big\rangle^2 \quad \text{with} \quad \boldsymbol{\alpha}^{\mathsf{T}}\boldsymbol{\alpha} = 1. \tag{150}$$

This is obtained by first developing and simplifying Eq. (150) to obtain

$$\sum_k^K \big\langle \mathbf{S}_{[k]}, \mathbf{S}_{[+]} \big\rangle^2
= \sum_k^K \sum_m^K \alpha_k \alpha_m \sum_n^K \operatorname{trace}\big\{\mathbf{S}_{[k]}\mathbf{S}_{[n]}\big\} \times \operatorname{trace}\big\{\mathbf{S}_{[n]}\mathbf{S}_{[m]}\big\}
= \sum_k^K \sum_m^K \alpha_k \alpha_m \sum_n^K c_{k,n} \times c_{n,m}. \tag{151}$$

The term $\sum_n^K c_{k,n} \times c_{n,m}$ corresponds to the element at row k and column m of C², and therefore Eq. (151) can be rewritten as (cf., Eq. (142)):

$$\mathcal{S} = \boldsymbol{\alpha}^{\mathsf{T}}\mathbf{C}^2\boldsymbol{\alpha}. \tag{152}$$

Because C² has the same eigenvectors as C (and its eigenvalues are the squares of the eigenvalues of C), the approach used previously shows that S reaches its maximum when α is the first eigenvector of C², which is also the first eigenvector of C.


ACKNOWLEDGMENT

We would like to thank two anonymous reviewers, and also particularly Yoshio Takane, for their very thorough and helpful comments on a draft of this article.

REFERENCES
1. Abdi H. Multivariate analysis. In: Lewis-Beck M, Bryman A, Futing T, eds. Encyclopedia for Research Methods for the Social Sciences. Thousand Oaks, CA: Sage; 2003, 699–702.
2. Acar E, Yener B. Unsupervised multiway data analysis: a literature survey. IEEE Trans Knowledge Data Eng 2009, 21:6–19.
3. Arcidiacono C, Sarnacchiaro P, Velleman R. Testing fidelity to a new psychological intervention for family members of substance misusers during implementation in Italy. J Subst Abuse 2008, 13:361–381.
4. Stanimirova I, Boucon C, Walczak B. Relating gas chromatographic profiles to sensory measurements describing the end products of the Maillard reaction. Talanta 2011, 83:1239–1246.
5. Williams LJ, Abdi H, French R, Orange JB. A tutorial on multi-block discriminant correspondence analysis (MUDICA): a new method for analyzing discourse data from clinical populations. J Speech Lang Hearing Res 2010, 53:1372–1393.
6. Daszykowski M, Walczak B. Methods for the exploratory analysis of two-dimensional chromatographic data. Talanta 2011, 83:1088–1097.
7. Carlier A, Lavit C, Pages M, Pernin M, Turlot J. A comparative review of methods which handle a set of indexed data tables. In: Coppi R, Bollasco S, eds. Multiway Data Analysis. Amsterdam: North Holland; 1989, 85–101.
8. Derks EPPA, Westerhuis JA, Smilde AK, King BM. An introduction to multi-blocks component analysis by means of a flavor language case study. Food Quality Prefer 2003, 14:497–506.
9. Escofier B, Pages J. Multiple factor analysis. Comp Stat Data Anal 1990, 18:121–140.
10. Guebel DV, Canovas M, Torres N. Model identification in presence of incomplete information by generalized principal component analysis: application to the common and differential responses of Escherichia coli to multiple pulse perturbation in continuous high-biomass density culture. Biotechnol Bioeng 2009, 104:785–795.
11. Hassani S, Martens M, Qannari EM, Hanafi M. Analysis of -omic data: graphical interpretation and validation tools in multi-block methods. Chem Intell Lab Syst 2010, 104:140–153.
12. Meyners M, Kunert J, Qannari EM. Comparing generalized Procrustes analysis and STATIS. Food Quality Prefer 2000, 11:77–83.
13. Qannari EM, Wakeling I, MacFie JH. A hierarchy of models for analyzing sensory data. Food Quality Prefer 1995, 6:309–314.
14. Smilde AK, Westerhuis JA, de Jong S. A framework for sequential multiblock component methods. J Chem 2003, 17:323–337.
15. Van Deun K, Smilde AK, van der Werf MJ, Kiers HAL, Van Mechelen IV. A structured overview of simultaneous component based data integration. BMC Bioinform 2009, 10:246–261.
16. Escoufier Y. L'analyse conjointe de plusieurs matrices de données. In: Jolivet M, ed. Biométrie et Temps. Paris: Société Française de Biométrie; 1980, 59–76.
17. L'Hermier des Plantes H. Structuration des Tableaux à Trois Indices de la Statistique. Thèse de troisième cycle. Université de Montpellier, 1976.
18. L'Hermier des Plantes H, Thiebaut E. Étude de la pluviosité au moyen de la méthode STATIS. Revue Stat Appl 1977, 25:57–81.
19. Lavit C. Application de la méthode STATIS. Stat Anal Données 1985, 10:103–116.
20. Lavit C. Analyse Conjointe de Tableaux Quantitatifs. Paris: Masson; 1988.
21. Lavit C, Escoufier Y, Sabatier R, Traissac P. The ACT (STATIS) method. Comp Stat Data Anal 1994, 18:97–119.
22. Korth B, Tucker LR. Procrustes matching by congruence coefficients. Psychometrika 1976, 41:531–535.
23. Genard M, Souty M, Holmes S, Reich M, Breuils L. Correlations among quality parameters of peach fruits. J Sci Food Agric 1994, 66:241–245.
24. Qannari EM, Wakeling I, Courcous P, MacFie JH. Defining the underlying dimensions. Food Quality Prefer 2000, 11:151–154.
25. Schlich P. Defining and validating assessor compromises about product distances and attribute correlations. In: Næs T, Risvik E, eds. Multivariate Analysis of Data in Sensory Sciences. New York: Elsevier; 1996, 229–306.
26. Chaya C, Perez-Hugalde C, Judez L, Wee CS, Guinard JX. Use of the STATIS method to analyze time-intensity profiling data. Food Quality Prefer 2003, 15:3–12.
27. Abdi H, Valentin D, Chollet S, Chrea C. Analyzing assessors and products in sorting tasks: DISTATIS, theory and applications. Food Quality Prefer 2007, 18:627–640.


28. Abdi H, Valentin D. Some new and easy ways to describe, compare, and evaluate products and assessors. In: Valentin D, Nguyen DZ, Pelletier L, eds. New Trends in Sensory Evaluation of Food and Non-Food Products. Ho Chi Minh (Vietnam): Vietnam National University-Ho Chi Minh City Publishing House; 2007, 5–18.
29. Blancher G, Clavier B, Egoroff C, Duineveld K, Parcon J. A method to investigate the stability of a sorting map. Food Quality Prefer 2011, in press.
30. Gourvenec S, Stanimirova I, Saby CA, Airiau CY, Massart DL. Monitoring batch processes with the STATIS approach. J Chem 2005, 19:288–300.
31. Stanimirova I, Walczak B, Massart DL, Simeonov V, Saby CA, di Crescenzo E. STATIS, a three-way method for data analysis: application to environmental data. Chem Intell Lab Syst 2004, 73:219–233.
32. Thioulouse J, Simier M, Chessel D. Simultaneous analysis of a sequence of paired ecological tables. Ecology 2004, 85:272–283.
33. Thioulouse J. Simultaneous analysis of a sequence of paired ecological tables: a comparison of several methods. Ann Appl Stat 2011, in press.
34. Abdi H, Valentin D, O'Toole AJ, Edelman B. DISTATIS: the analysis of multiple distance matrices. Proceedings of the IEEE Computer Society: International Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 2005, 42–47.
35. Fournier M, Motelay-Massei A, Massei N, Aubert M, Bakalowicz M, Dupond JP. Investigation of transport processes inside Karst aquifer by means of STATIS. Ground Water 2009, 47:391–400.
36. Gudmundsson L, Tallaksen LM, Stahl K. Spatial cross-correlation patterns of European low, mean and high flows. Hydrol Process 2011, 25:1034–1045.
37. Enachescu C, Postelnicu T. Patterns in journal citation data revealed by exploratory multivariate analysis. Scientometrics 2003, 56:43–59.
38. Kherif F, Poline J-P, Meriaux S, Benali H, Flandin G, Brett M. Group analysis in functional neuroimaging: selecting subjects using similarity measures. NeuroImage 2003, 20:2197–2208.
39. Shinkareva SV, Ombao HC, Sutton BP, Mohanty A, Miller GA. Classification of functional brain images with a spatio-temporal dissimilarity map. NeuroImage 2006, 33:63–71.
40. Shinkareva SV, Mason RA, Malave VL, Wang W, Mitchell TM. Using fMRI brain activation to identify cognitive states associated with perception of tools and dwellings. PLoS ONE 2008, 3:e1394. doi:10.1371/journal.pone.0001394.
41. Shinkareva SV, Malave SV, Just MA, Mitchell TM. Exploring commonalities across participants in the neural representation of objects. Human Brain Map 2011, in press.
42. Abdi H, Dunlop JP, Williams LJ. How to compute reliability estimates and display confidence and tolerance intervals for pattern classifiers using the Bootstrap and 3-way multidimensional scaling (DISTATIS). NeuroImage 2009, 45:89–95.
43. Scepi G. Parametric and non parametric multivariate quality control charts. In: Lauro C, Antoch J, Esposito Vinzi V, Saporta G, eds. Multivariate Total Quality Control. Berlin: Physica-Verlag; 2002, 163–189.
44. Niang N, Fogliatto F, Saporta G. Contrôle multivarié de procédés par lots à l'aide de STATIS. Proceedings of the 41st "Journées de Statistique," Bordeaux (France), 2009.
45. Coquet R, Troxler L, Wipff G. The STATIS method: characterization of conformational states of flexible molecules from molecular dynamics simulation in solution. J Mol Graph 1996, 14:206–212.
46. Marquez EJ, Knowles LL. Correlated evolution of multivariate traits: detecting co-divergence across multiple dimensions. J Evol Biol 2007, 20:2334–2348.
47. Pavoine S, Bailly X. New analysis for consistency among markers in the study of genetic diversity: development and application to the description of bacterial diversity. BMC Evol Biol 2007, 7:156–172.
48. Raymond O, Fiasson JL, Jay M. Synthetic taxonomy of Rosa races using ACT-STATIS. Zeitschrift Naturforschung C 2000, 55:399–409.
49. Yanai H, Takeuchi K, Takane Y. Projection Matrices, Generalized Inverse Matrices, and Singular Value Decomposition. New York: Springer-Verlag; 2011.
50. Abdi H, Williams LJ. Principal component analysis. Wiley Interdiscip Rev: Comp Stat 2010, 2:433–459.
51. Abdi H. Singular value decomposition (SVD) and generalized singular value decomposition (GSVD). In: Salkind N, ed. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage; 2007, 907–912.
52. de Leeuw J. Derivatives of generalized eigensystems with applications. UCLA Depart Stat Papers 2007, 1–28.
53. Greenacre M. Theory and Applications of Correspondence Analysis. London: Academic Press; 1984.
54. Takane Y. Relationships among various kinds of eigenvalue and singular value decompositions. In: Yanai H, Okada A, Shigemasu K, Kano Y, Meulman J, eds. New Developments in Psychometrics. Tokyo: Springer-Verlag; 2002, 45–46.
55. Caillez F, Pages JP. Introduction à l'Analyse des Données. Paris: SMASH; 1976.
56. Dray S, Dufour AB. The ade4 package: implementing the duality diagram for ecologists. J Stat Softw 2007, 22:1–20.
57. Escoufier Y. Operators related to a data matrix: a survey. COMPSTAT: Proceedings in Computational Statistics; 17th Symposium Held in Rome, Italy, 2006. New York: Physica Verlag; 2007, 285–297.
58. Holmes S. Multivariate analysis: the French way. In: Nolan D, Speed T, eds. Festschrift for David Freedman. Beachwood: IMS; 2006, 1–14.
59. de la Cruz O, Holmes S. The duality diagram in data analysis: examples of modern applications. Ann Appl Stat 2011, in press.
60. Sabatier R, Vivien M. A new linear method for analyzing four-way multiblocks tables: STATIS-4. J Chem 2008, 22:299–407.
61. Vivien M, Sune F. Two four-way multiblock methods used for comparing two consumer panels of children. Food Quality Prefer 2009, 20:472–481.
62. Renyi A. On measures of dependence. Acta Math Hungar 1959, 10:441–451.
63. Horn RA, Johnson CR. Matrix Analysis. Cambridge: Cambridge University Press; 2006.
64. Escoufier Y. Le traitement des variables vectorielles. Biometrics 1973, 29:751–760.
65. Robert P, Escoufier Y. A unifying tool for linear multivariate statistical methods: the RV-coefficient. Appl Stat 1976, 25:257–265.
66. Abdi H. RV coefficient and congruence coefficient. In: Salkind NJ, ed. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage; 2007, 849–853.
67. Abdi H. Congruence: congruence coefficient, RV coefficient, and Mantel coefficient. In: Salkind NJ, ed. Encyclopedia of Research Design. Thousand Oaks, CA: Sage; 2010, 222–229.
68. Rencher AC. Methods of Multivariate Analysis. New York: John Wiley & Sons; 2002.
69. Greenacre M. Biplots in Practice. Barcelona: Fundación BBVA; 2010.
70. Gower JC, Lubbe S, le Roux N. Understanding Biplots. New York: John Wiley & Sons; 2011.
71. Abdi H, Valentin D. STATIS. In: Salkind NJ, ed. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage; 2007, 955–962.
72. Gaertner JC, Chessel D, Bertrand J. Stability of spatial structures of demersal assemblages: a multitable approach. Aquatic Living Resour 1998, 11:75–85.
73. Bhattacharyya A. On a measure of divergence between two multinomial populations. Sankhya 1941, 7:401–406.
74. Escofier B. Analyse factorielle et distances répondant au principe d'équivalence distributionnelle. Revue Stat Appl 1978, 26:29–37.
75. Domenges D, Volle M. Analyse factorielle sphérique: une exploration. Ann l'INSEE 1979, 35:3–83.
76. Rao CR. Use of Hellinger distance in graphical displays. In: Tiit E-M, Kollo T, Niemi H, eds. Multivariate Statistics and Matrices in Statistics. Leiden: Brill Academic Publisher; 1995, 143–161.
77. Abdi H. Distance. In: Salkind N, ed. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage; 2007, 280–284.
78. Tucker LR. The extension of factor analysis to three-dimensional matrices. In: Frederiksen N, Gulliken H, eds. Contributions to Mathematical Psychology. New York: Holt; 1964, 110–182.
79. Westerhuis JA, Kourti T, MacGregor JF. Analysis of multiblock and hierarchical PCA and PLS models. J Chem 1998, 12:301–321.
80. Næs T, Brockhoff PB, Tomic O. Statistics for Sensory and Consumer Science. London: John Wiley & Sons; 2010.
81. Escofier B, Pages J. Analyses Factorielles Simples et Multiples: Objectifs, Méthodes, Interprétation. Paris: Dunod; 1990.
82. Abdi H, Valentin D. Multiple factor analysis. In: Salkind NJ, ed. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage; 2007, 657–663.
83. Lebart L, Piron M, Morineau A. Statistique Exploratoire Multidimensionnelle: Visualisations et Inférences en Fouille de Données. Paris: Dunod; 2006.
84. Gower JC. Adding a point to vector diagrams in multivariate analysis. Biometrika 1968, 55:582–585.
85. Lazraq A, Hanafi M, Cleroux R, Allaire J, Lepage Y. Une approche inférentielle pour la validation du compromis de la méthode STATIS. J Soc Française Stat 2008, 149:97–109.
86. Areia A, Oliveira MM, Mexia JT. Models for a series of studies based on geometrical representation. Stat Methodol 2008, 5:277–288.
87. Oliveira MM, Mexia JT. Modelling series of studies with a common structure. Comp Stat Data Anal 2007, 51:5876–5885.
88. Oliveira MM, Mexia JT. ANOVA-like analysis of matched series of studies with a common structure. J Stat Plan Infer 2007, 137:1862–1870.
89. Good P. Permutation, Parametric, and Bootstrap Tests of Hypotheses. New York: Springer-Verlag; 2005.
90. Abdi H, Williams LJ. Jackknife. In: Salkind NJ, ed. Encyclopedia of Research Design. Thousand Oaks, CA: Sage; 2010, 655–660.
91. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman and Hall; 1993.
92. Chernick MR. Bootstrap Methods: A Guide for Practitioners and Researchers. New York: John Wiley & Sons; 2008.
93. Milan M. Applications of the parametric bootstrap to models that incorporate a singular value decomposition. Appl Stat 1995, 44:31–49.
94. Lebart L. Which bootstrap for principal axes methods? In: Selected Contributions in Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization; 2007, 581–588.
95. Abdi H. The Bonferonni and Sidak corrections for multiple comparisons. In: Salkind N, ed. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage; 2007, 104–107.


96. Jaffrenou PA. Sur l’Analyse des Familles Finies de Vari-ables Vectorielles: Bases Algebriques et Applications ala Description Statistique, These de Troisieme Cycle.Universite de Lyon, 1978.

97. Thioulouse J, Chessel D. Les analyses multitableauxen ecologie factorielle. I. De la theorie d’etat a latypologie de fonctionnement par l’analyse triadique.Acta Oecol, Oecol General 1987, 8:463–480.

98. Simier M, Blanc L, Pellegrin F, Nandris D. Approchesimultanee de K couples de tableaux: applica-tion a l’etude ds relations pathologie vegetale-environnement. Revue Stat Appl 1999, 47:31–46.

99. Rolland A, Bertrand F, Maumy M, Jacquet S. Assess-ing phytoplankton structure and spatio-temporaldynamics in a freshwater ecosystem using a pow-erful multiway statistical analysis. Water Res 2009,43:3155–3168.

100. Bertrand F, Maumy M. Using partial triadic anal-ysis for depicting the temporal evolution of spatialstructures: assessing phytoplankton structure and suc-cession in a water reservoir. Case Studies Business,Industry Govern Stat 2010, 4:23–43.

101. Mendes S, Gomez JF, Pereira MJ, Azeiteiro UM,Galindo-Villardon MP. The efficiency of the partialtriadic analysis methods: an ecological application.Biometr Lett 2010, 47:83–106.

102. Tucker LR. Some mathematical notes on three-modefactor analysis. Psychometrika 1966, 31:279–311.

103. Kiers HAL. Hierarchical relations among three-waymethods. Psychometrika 1991, 56:449–470.

104. Tucker LR. An inter-battery method of factor analysis.Psychometrika 1958, 23:111–136.

105. Krishnan A, Williams LJ, McIntosh AR, Abdi H. Par-tial Least Squares (PLS) methods for neuroimaging: atutorial and review. NeuroImage 2011, 56:455–475.

106. McIntosh AR, Lobaugh NJ. Partial least squares anal-ysis of neuroimaging data: applications and advances.NeuroImage 2004, 23:250–263.

107. Tucker LR, Messick S. An individual difference modelfor multidimensional scaling. Psychometrika 1963,28:333–367.

108. Horst P. Factor Analysis of Data Matrices. New York:Holt; 1965.

109. Ballester J, Abdi H, Langlois J, Peyron D, ValentinD. The odor of colors: can wine experts and novicesdistinguish the odors of white, red, and rose wines?Chem Percep 2009, 2:203–213.

110. Chollet S, Lelievre M, Abdi H, Valentin D. Sort andBeer: Everything you wanted to know about the sort-ing task but did not dare to ask. Food Quality Prefer2011, 22:507–520.

111. Santosa M, Abdi H, Guinard JX. A modified sortingtask to investigate consumer perceptions of extra virginolive oils. Food Quality Prefer 2010, 21:881–892.

112. Lelievre M, Chollet S, Abdi H, Valentin D. Beer trainedand untrained assessors rely more on vision than ontaste when they categorize beers. Chem Percep 2009,2:143–153.

113. Churchill N, Oder A, Abdi H, Tam F, Lee W, ThomasC, Graham S, Strother S. Optimizing correction ofhead motion and physiological noise in single-subjectfMRI analyses: 1. Standard temporal-based methods.Human Brain Mapp 2011, in press.

114. Caspi Y, Axelrod A, Matsushita Y, Gamliel A.Dynamic stills and clip trailers. Visual Comp 2006,22:642–652.

115. Young G, Householder AS. Discussion of a set ofpoints in terms of their mutual distances. Psychome-trika 1938, 3:19–22.

116. Torgerson W. Theory and Methods of Scaling. NewYork: John Wiley & Sons; 1958.

117. Abdi H. Metric multidimensional scaling. In: SalkindN, ed. Encyclopedia of Measurement and Statistics.Thousand Oaks, CA: Sage; 2007, 598–605.

118. Vallejo-Arboleda A, Vicente-Villardon JL, Galindo-Villardon MP. Canonical STATIS: Biplot analysis ofmulti-table group structured data based on STATIS-ACT methodology. Comp Stat Data Anal 2007,51:4193–4205.

119. Abdi H, Williams LJ. Barycentric discriminant analysis(BADIA). In: Salkind NJ, ed. Encyclopedia of ResearchDesign. Thousand Oaks, CA: Sage; 2010, 64–75.

120. Lachenbruch PA. Discriminant Analysis. New York:Macmillan; 1975.

121. Pele J, Abdi H, Moreau M, Thybert D, Chabbert M.Multidimensional scaling reveals the main evolution-ary pathways of class A G-protein-coupled receptors.PLoS One 2011, 6:1–10. doi:10.1371/journal.pone.0019094.

122. Rao CR, Mitra SK. Generalized Inverse of Matricesand its Applications. New York: John Wiley & Sons;1971.

123. Benasseni J, Bennani Dosse M. Analyzing multiset databy the power STATIS-ACT method. Adv Data AnalClassif 2011, 5, in press.

124. Bennani Dosse M, Groenen P, Abdi H. AnisotropicSTATIS, submitted.

125. Sauzay L, Hanafi M, Qannari EM, Schlich P. Analysede K + 1 tableaux a l’aide de la methode STATIS:application en evaluation sensorielle, 9ieme JourneesEuropeennes Agro-industrie et Methodes Statistiques.Montpellier (France). 2006, 1–23.

126. Abdi H. Partial least square regression, projectionon latent structure regression, PLS-Regression. WileyInterdiscip Rev: Comp Stat 2010, 2:97–106.

127. Vivien M, Sabatier R. A generalization of STATIS-ACT strategy: DO-ACT for two multiblocks tables.Comp Stat Data Anal 2004, 46:155–171.


128. Horst P. Generalized canonical correlations and their applications to experimental data. J Clin Psychol 1961, 17:331–347.
129. Takane Y, Yanai M, Hwang H. An improved method for generalized constrained canonical correlation analysis. Comp Stat Data Anal 2006, 50:221–241.
130. Takane Y, Hwang H, Abdi H. Regularized multiple set canonical correlation analysis. Psychometrika 2008, 73:753–775.
131. Gower JC, Dijksterhuis GB. Procrustes Problems. Oxford: Oxford University Press; 2004.
132. Pages J. Éléments de comparaison entre l'analyse factorielle multiple et la méthode STATIS. Revue Stat Appl 1996, 44:81–95.
133. Husson F, Pages J. INDSCAL model: geometrical interpretation and methodology. Comp Stat Data Anal 2006, 50:358–378.
134. Dray S, Dufour AB, Chessel D. The ade4 package-II: two-table and K-table methods. R News 2007, 7:47–52.
135. Harville DA. Matrix Algebra from a Statistician's Perspective. New York: Springer-Verlag; 2008.
136. Gentle JE. Matrix Algebra: Theory, Computation and Applications in Statistics. New York: Springer-Verlag; 2007.
137. Mkhadri A, Celeux G, Nasroallah A. Regularization in discriminant analysis: an overview. Comp Stat Data Anal 1997, 23:403–423.
